Methods and apparatus related to automatically rewriting strings of text
US-2016062969-A1 · Mar 3, 2016 · US
US10002128B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10002128-B2 |
| Application number | US-201514927274-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 29, 2015 |
| Priority date | Sep 9, 2015 |
| Publication date | Jun 19, 2018 |
| Grant date | Jun 19, 2018 |
Official abstract text for this publication.
A computerized system for transforming an input string includes a dictionary with tokens and associated scores. A chart parser generates a chart parse of the input string by, for each position within the input string, (i) identifying a string of at least one consecutive character in the input string that begins at that position and matches one of the tokens and (ii) unless the identified string is a single character matching the start character for another entry in the chart parse, creating an entry corresponding to the identified string. A partition selection module determines a selected partition of the input string. The selected partition includes an array of tokens selected from the chart parse such that their concatenation matches the input string. The selected partition is a minimum score partition, where the score is based on a sum of the tokens' associated scores from the dictionary.
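The two stages described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `chart_parse` and `min_score_partition`, the half-open `(start, end)` entry format, and the dynamic-programming search are all assumptions made for the example. The exclusion rule for single characters is read here as "drop a one-character entry when a longer entry begins at the same position", which is only one possible interpretation of the abstract's wording.

```python
import math
from typing import Dict, List, Optional, Tuple

Entry = Tuple[int, int]  # (start, end) offsets into the input, end exclusive


def chart_parse(text: str, scores: Dict[str, float]) -> List[Entry]:
    """Collect every dictionary token that begins at each position of text."""
    max_len = max(map(len, scores), default=0)
    entries: List[Entry] = []
    for start in range(len(text)):
        last_end = min(len(text), start + max_len)
        found = [(start, end) for end in range(start + 1, last_end + 1)
                 if text[start:end] in scores]
        # Assumed reading of the exclusion rule: a single-character match is
        # skipped when a longer entry starts at the same position.
        longer = [e for e in found if e[1] - e[0] > 1]
        entries.extend(longer if longer else found)
    return entries


def min_score_partition(
    text: str, scores: Dict[str, float], entries: List[Entry]
) -> Optional[Tuple[List[str], float]]:
    """Pick chart entries whose concatenation equals text with the lowest
    summed score. Returns None if the entries cannot cover the whole input."""
    n = len(text)
    best = [math.inf] * (n + 1)   # best[i]: lowest score covering text[:i]
    back: List[Optional[int]] = [None] * (n + 1)
    best[0] = 0.0
    # Sorting by start guarantees best[start] is final before it is extended.
    for start, end in sorted(entries):
        if best[start] == math.inf:
            continue
        candidate = best[start] + scores[text[start:end]]
        if candidate < best[end]:
            best[end] = candidate
            back[end] = start
    if best[n] == math.inf:
        return None
    tokens: List[str] = []
    pos = n
    while pos > 0:
        start = back[pos]
        tokens.append(text[start:pos])
        pos = start
    return tokens[::-1], best[n]
```

With a hypothetical dictionary that scores "new" at 2.0, "york" at 2.5, and "newy" and "ork" at 9.0 each, the input "newyork" yields the partition ["new", "york"] with a total score of 4.5, since the alternative "newy" + "ork" sums to 18.0.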
Opening claim text (preview).
The invention claimed is:
1. A computerized system for transforming an input text string, the input text string being an ordered set of characters, the system comprising: a dictionary data store configured to store a plurality of tokens, wherein each token is associated with a score, and wherein each token is a string of one or more characters; a chart parser configured to generate a chart parse of the input text string, wherein: the chart parse includes a plurality of entries, each entry includes (i) an indication of a start character of the entry within the input text string and (ii) an indication of an end character of the entry within the input text string, and the chart parser is configured to, for each position within the input text string, (i) identify a string of at least one consecutive character in the input text string that begins at that position and matches one of the plurality of tokens and (ii) unless the identified string is a single character matching the start character for another entry in the chart parse, create an entry corresponding to the identified string; a partition selection module configured to determine a selected partition of the input text string based on the entries of the chart parse, wherein: the selected partition includes an array of tokens such that a concatenation of the array of tokens matches the ordered set of characters of the input text string, each of the array of tokens is selected from the chart parse, a score of the selected partition is based on a sum of, for each token of the array of tokens, the score specified by the dictionary data store, and the selected partition is a minimum score partition; a data store storing application state records; a set generation module configured to, in response to a set of tokens, select records from the data store to form a consideration set of records; and a results generation module configured to respond to the user device with a subset of the consideration set of records, wherein the subset identifies application states of applications that are relevant to a search query from the user device, wherein the input text string is based on the search query from the user device.
2. The system of claim 1 wherein the chart parser is configured to, for each position within the input text string, (i) identify a string of consecutive characters in the input text string that ends at that position and matches one of the plurality of tokens and (ii) unless the identified string is a single character matching the end character for another entry in the chart parse, create an entry corresponding to the identified string.
3. The system of claim 1 wherein, for each entry, the indication of the start character of the entry within the input text string and the indication of the end character of the entry within the input text string are specified as one of: a numerical start position within the input text string and a numerical end position within the input text string; the numerical start position within the input text string and a numerical length; and the numerical length and the numerical end position within the input text string.
4. The system of claim 1 further comprising a hash map configured to store hash values of the set of tokens from the dictionary data store, wherein: the chart parser is configured to calculate a hash value of a candidate token from the input text string; and presence of the calculated hash value in the hash map indicates that the candidate token matches one of the set of tokens.
5. The system of claim 4 wherein the set of tokens is a proper subset of the plurality of tokens in the dictionary data store, and wherein the set of tokens is selected from the plurality of tokens based on a domain of the input text string.
6. The system of claim 5 wherein the scores associated with the set of tokens are dependent on the domain.
7. The system of claim 1 wherein the chart parser is configured to generate a second chart parse by, for each position within the input text string, (i) identify a string of consecutive characters in the input text string that ends at that position and matches one of the plurality of tokens and (ii) unless the identified string is a single character matching the end character for another entry in the second chart parse, create an entry in the second chart parse corresponding to the identified string.
8. The system of claim 7 wherein the partition selection module is configured to: determine a first partition of the input text string having a first score using the entries of the chart parse; determine a second partition of the input text string having a second score using the entries of the second chart parse; designate the first partition as the selected partition in response to the first score being lower than the second score; and designate the second partition as the selected partition in response to the second score being lower than the first score.
9. The system of claim 1 wherein the score of the selected partition is equal to the sum of, for each token of the array of tokens, the score specified by the dictionary data store.
10. The system of claim 1 wherein, for each token in the dictionary data store, the associated score is based on frequency of occurrence of the token.
11. The system of claim 10 wherein, for each token in the dictionary data store, the associated score is calculated by taking an inverse logarithm of the frequency of occurrence of the token.
12. A search system comprising: the system of claim 1; and a set processing module configured to assign a score to each record of the consideration set of records, wherein the subset is selected based on the assigned scores.
13. A search system comprising: the system of claim 1; an intake module configured to generate the application state records from source data, wherein the source data includes a text string used as the input text string; and a set processing module configured to assign a score to each record of the consideration set of records.
14. A computerized method for transforming an input text string, the input text string being an ordered set of characters, the method comprising: storing a plurality of tokens in a dictionary data store, wherein each token is associated with a score, and wherein each token is a string of one or more characters; generating a chart parse of the input text string, wherein: the chart parse includes a plurality of entries; each entry includes (i) an indication of a start character of the entry within the input text string and (ii) an indication of an end character of the entry within the input text strings, and generating the chart parse includes, for each position within the input text string, (i) identifying a string of at least one consecutive character in the input text string that begins at that position and matches one of the plurality of tokens and (ii) unless the identified string is a single character matching the start character for another entry in the chart parse, creating an entry corresponding to the identified string; determining a selected partition of the input text string based on the entries of the chart parse, wherein: the selected partition includes an array of tokens such that a concatenation of the array of tokens matches the ordered set of characters of the input text string, each of the array of tokens is selected from the chart parse, a score of the selected partition is based on a sum of, for each token of the array of tokens, the score specified by the dictionary data store, and the selected partition is a minimum score partition;
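Claims 4 and 11 describe two supporting mechanisms: matching candidate tokens through stored hash values, and deriving a token's score from its frequency of occurrence. The sketch below illustrates both under stated assumptions. The helper names (`token_hash`, `build_hash_index`, `matches`, `score_from_frequency`) are hypothetical, the choice of a 64-bit BLAKE2b digest is arbitrary, and "inverse logarithm" is read here as the reciprocal of the natural logarithm of a raw count. The claim text does not fix the hash function, the logarithm base, or whether the frequency is a count or a ratio.

```python
import hashlib
import math
from typing import Iterable, Set


def token_hash(token: str) -> int:
    """Stable 64-bit hash of a token (hash function chosen for illustration)."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def build_hash_index(tokens: Iterable[str]) -> Set[int]:
    """Store only the hash values of the dictionary tokens."""
    return {token_hash(t) for t in tokens}


def matches(candidate: str, index: Set[int]) -> bool:
    """A candidate is treated as a dictionary token when its hash is present.
    Hash collisions could produce false positives; this sketch ignores them."""
    return token_hash(candidate) in index


def score_from_frequency(count: int) -> float:
    """Assumed reading of 'inverse logarithm': 1 / ln(count).
    Frequent tokens receive lower scores, so a minimum-score partition
    favors them. Counts below 2 are rejected to avoid dividing by zero."""
    if count < 2:
        raise ValueError("count must be at least 2 under this reading")
    return 1.0 / math.log(count)
```

Under this reading, a token seen 1000 times scores lower than one seen 10 times, which fits a system that selects the minimum-score partition.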
Phrasal analysis, e.g. finite state techniques or chunking · CPC title
Processing of non-Latin text (kana-to-kanji conversion G06F40/129; vowelisation G06F40/232) · CPC title
Lexical analysis, e.g. tokenisation or collocates · CPC title
Physics · mapped topic