Computer architecture for string searching

US12019701B2 · US · B2

Patent metadata
Publication number: US-12019701-B2
Application number: US-202117386477-A
Country: US
Kind code: B2
Filing date: Jul 27, 2021
Priority date: Jul 27, 2021
Publication date: Jun 25, 2024
Grant date: Jun 25, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An embodiment of the present invention is a prime representation data structure in a computer architecture. The prime representation data structure has a plurality of records, where each record contains a prime representation and the prime representation is a product of two or more selected prime factors. Each of the selected prime factors is associated with an n-gram of a domain representation of a domain string. The domain representation of the domain string is a domain string of ordered, contiguous domain characters. The n-gram is a subset of n of the ordered, contiguous domain characters in the domain string. The computer architecture performs string searching and includes one or more central processing units (CPUs) with one or more operating systems, one or more input/output device interfaces, one or more memories, and one or more input/output devices. The architecture further includes the prime representation data structure, one or more prime target query data structures, and a search process performed by one or more of the CPUs. The CPUs can be organized in a hierarchical structure. The prime target query data structure has one or more target prime queries. Each target prime query is the product of one or more target selected prime factors. Each target selected prime factor is associated with a target n-gram of a target domain representation of a target domain string. The search process, performed by one or more of the CPUs, determines whether one or more of the target selected prime factors is common with one of the selected prime factors. By performing this efficient test, the computer system can determine whether one or more small strings are included in one or more large strings.
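The mechanics described in the abstract (n-grams mapped to primes, strings stored as products, queries tested against those products) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not IBM's implementation: the bigram length, the rule that assigns primes to n-grams in order of first appearance, and the names `PrimeIndex`, `divides` and `shares_every_factor` are all hypothetical. `divides` mirrors the remainder-of-zero test in claim 12; `shares_every_factor` mirrors the per-factor test in claim 11.

```python
from math import prod


def ngrams(s: str, n: int) -> list[str]:
    """Ordered, contiguous n-character substrings of s, repeats included."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def first_primes(count: int) -> list[int]:
    """First `count` primes by trial division (adequate for a small sketch)."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


class PrimeIndex:
    """Hypothetical mapping process: each distinct n-gram gets its own prime."""

    def __init__(self, n: int = 2, pool_size: int = 1000) -> None:
        self.n = n
        self._pool = iter(first_primes(pool_size))
        self.prime_of: dict[str, int] = {}

    def factors(self, s: str) -> list[int]:
        """Selected prime factors for every n-gram of s, in order."""
        out = []
        for gram in ngrams(s, self.n):
            if gram not in self.prime_of:
                # Unseen n-gram: draw the next unused prime, so it cannot
                # divide any product built before it was assigned.
                self.prime_of[gram] = next(self._pool)
            out.append(self.prime_of[gram])
        return out

    def represent(self, s: str) -> int:
        """Prime representation: product of the primes of all n-grams of s.

        Note: a string shorter than n has no n-grams and maps to 1.
        """
        return prod(self.factors(s))


def divides(large_rep: int, query_rep: int) -> bool:
    """Claim 12 style: large product leaves remainder 0 when divided by query product."""
    return large_rep % query_rep == 0


def shares_every_factor(large_rep: int, query_factors: list[int]) -> bool:
    """Claim 11 style: every target prime also occurs among the large string's primes.

    Relies on every entry of query_factors being prime.
    """
    return all(large_rep % p == 0 for p in query_factors)
```

For example, with `idx = PrimeIndex(n=2)` and `ls = idx.represent("string searching")`, `divides(ls, idx.represent("search"))` is `True`. Two properties of this sketch are worth noting. Divisibility only shows that every query n-gram occurs in the large string, not that the n-grams occur contiguously: `"abxbc"` contains both bigrams of `"abc"`, so the test passes although `"abc"` is not a substring. The two tests also differ on repeated n-grams: for `"aaa"` against `"aab"`, the per-factor test passes while the product test fails, because the query needs the prime for `"aa"` twice.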

First claim

Opening claim text (preview).

We claim:

1. A data structure comprising: a prime representation data structure on a network server, in a computer architecture, the prime representation data structure having a plurality of records, each record containing a prime representation, the prime representation being a product of two or more selected prime factors, each of the selected prime factors associated with an n-gram of a domain representation of a domain string, the domain representation of the domain string being a domain string of ordered, contiguous domain characters, and the n-gram being a subset of n number of the ordered, contiguous domain characters in the domain string; and a prime target query data structure, the prime target query data structure containing one or more target prime queries, each target prime query having one or more target selected prime factors, each target selected prime factor associated with a target n-gram of a target domain representation of a target domain string, wherein the target domain string is a short string (SS) in the domain.

2. The prime representation data structure, as in claim 1, where one or more of the prime factors is in a compressed format.

3. The prime representation data structure, as in claim 2, where the compressed format of one or more of the prime factors is an exponential form, the exponential form having a base and the base having an exponent.

4. The prime representation data structure, as in claim 3, one or more of the prime factors is a Euler totient of a prime number, where the base of the exponential form is chosen from a set of small prime numbers.

5. The prime representation data structure, as in claim 4, where the set of small prime numbers are a subset of the set of the first 15 prime numbers.

6. The prime representation data structure, as in claim 1, where the domain characters are selected from the group consisting of: a character of a natural language, a letter of a natural language, a representation of a character in a natural language, an “American Standard Code for Information Interchange” (ASCII) character, a number, and a phoneme of a natural language.

7. The prime representation data structure, as in claim 1, stored in a selection from the group consisting of: an internal computer memory, an external computer memory, a hard drive, a computer chip, a read only memory (ROM), a random-access memory (RAM), one or more memories distributed over a network, and a memory on a network server.

8. The prime representation data structure, as in claim 1, where one or more of the prime representations are associated with a selection from the group consisting of: a public database, a literary work, a compendium of technical data, one or more documents on a network server, a dictionary, a thesaurus, a natural language document translating a first word in a first natural language to one or more second words in a second natural language, and a technical document translating a first functional description into a second functional description.

9. A string searching computer architecture comprising: one or more central processing units (CPUs) with one or more operating systems and one or more input/output device interfaces, one or more memories, and one or more input/output devices; a prime representation data structure on a network server, the prime representation data structure having a plurality of records, each record containing a prime representation, the prime representation being a product of two or more selected prime factors, each of the selected prime factors associated with an n-gram of a domain representation of a domain string, the domain representation of the domain string being a domain string of ordered, contiguous domain characters and the n-gram being a subset of n number of ordered, contiguous domain characters of the domain string; one or more prime target query data structures, the prime target query data structure containing one or more target prime queries, each target prime query having one or more target selected prime factors, each target selected prime factor associated with a target n-gram of a target domain representation of a target domain string, wherein the target domain string is a short string (SS) in the domain; and a search process performed by one or more of the CPUs to determine whether one or more of the target selected prime factors is common with one of the selected prime factors.

10. The string searching computer architecture, as in claim 9, where the domain representation is a large string (LS) in a domain.

11. The string search computer architecture, as in claim 10, where the SS is contained in the LS responsive to each of the target selected prime factors is common with at least one of the selected prime factors.

12. The string search computer architecture, as in claim 10, where the SS is contained in the LS responsive to the prime representation divided by the target selected prime factors has a modulo of zero.

13. The string searching computer architecture, as in claim 9, further comprising: a domain representation data structure with a plurality of domain records, the domain representation data structure stored in one or more of the memories, each of the domain records containing one or more of the domain representations; an n-gram process that creates one or more of the n-grams associated with one or more of the domain representations; and a mapping process that uniquely associates one of the selected prime factors with one of the n-grams and creates one of the prime representations of a corresponding domain representation by multiplying together each of the selected prime factors that is associated with one of the n-grams contained in the corresponding domain representation.

14. The string searching computer architecture, as in claim 9, further comprising a selected prime number data structure that stores a plurality of selected prime factors.

15. The string searching computer architecture, as in claim 14, where each of the selected prime factors is selected to have a Euler's totient that can be defined in an exponential form, the exponential form being a base and an exponent, and where each of the bases is a small prime number.

16. The string searching computer architecture, as in claim 15, where each of the small prime numbers is selected from a subset of the first 15 prime numbers.

17. The string searching computer architecture, as in claim 9, where the mapping process performs the following steps instead: associating a first selected prime factor with each of the respective n-grams; associating a second prime factor with the respective n-gram; creating a first prime representation of a corresponding domain representation by multiplying together each of the first selected prime factors; and creating a second prime representation of a corresponding domain representation by multiplying together each of the second selected prime factors.

18. The string searching computer architecture, as in claim 9, where the prime representation data structure is stored in a selection from the group consisting of: an internal computer memory, an external computer memory, an external hard drive, a computer chip, a read only memory (ROM), a random-access memory (RAM), and a memory on a network server.

19. A method of creating a prime representation data structure on a network server, comprising the steps of: creating a plurality of prime records stored in one or more of the memories, wherein each record contains a prime representation, the prime representati…
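Claims 3 to 5 and 15 to 16 describe storing prime factors in a compressed "exponential form" built from Euler's totient and a set of small prime bases, without spelling out the encoding. One plausible reading, which is an assumption rather than the patent's stated method, is to select primes p whose totient φ(p) = p - 1 factors completely over the first 15 primes (2 through 47), so that φ(p) can be stored as a short list of (base, exponent) pairs. The names `totient_exponents` and `select_primes` are hypothetical.

```python
from typing import List, Optional, Tuple

# First 15 primes; claims 5 and 16 draw bases from a subset of these.
SMALL_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]


def is_prime(m: int) -> bool:
    """Trial-division primality test (adequate for small candidates)."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def totient_exponents(p: int, bases: List[int] = SMALL_PRIMES) -> Optional[List[Tuple[int, int]]]:
    """Factor phi(p) = p - 1 over `bases` (assumes p is prime).

    Returns (base, exponent) pairs, or None if p - 1 has a prime factor
    outside `bases`, i.e. no compact form exists under this reading.
    """
    rest = p - 1
    pairs: List[Tuple[int, int]] = []
    for b in bases:
        e = 0
        while rest % b == 0:
            rest //= b
            e += 1
        if e:
            pairs.append((b, e))
    return pairs if rest == 1 else None


def select_primes(count: int, bases: List[int] = SMALL_PRIMES, start: int = 3) -> List[int]:
    """Hypothetical selection: the next `count` primes whose totient factors over `bases`.

    `bases` must include 2: p - 1 is even for every odd prime p, so without 2
    no candidate qualifies and the loop would never finish.
    """
    chosen: List[int] = []
    m = start
    while len(chosen) < count:
        if is_prime(m) and totient_exponents(m, bases) is not None:
            chosen.append(m)
        m += 1
    return chosen
```

With the default bases, `select_primes(5)` returns `[3, 5, 7, 11, 13]`, and `totient_exponents(17)` gives `[(2, 4)]`, since φ(17) = 16 = 2^4. If the totient must instead be a single power b^e, only b = 2 works (for an odd base, b^e + 1 is even), which limits the choice to Fermat primes such as 3, 5, 17, 257 and 65537; `select_primes(4, bases=[2])` returns `[3, 5, 17, 257]`.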

Assignees

IBM

Classifications (CPC titles)

  • Management therefor

  • Lexical analysis, e.g. tokenisation or collocates

  • Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30)

  • Character encoding

  • Unicode

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12019701B2 cover?
A prime representation data structure for string searching. Each record stores the product of primes assigned to the n-grams of a domain string; a target query built the same way from a short string is tested against those products, so the system can determine whether one or more small strings are included in one or more large strings.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F17/144. Mapped technology areas include Physics.
When was this patent published?
The B2 grant was published on Jun 25, 2024. Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).