Document retrieval using internal dictionary-hierarchies to adjust per-subject match results
US-2016299967-A1 · Oct 13, 2016 · US
US9317606B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9317606-B1 |
| Application number | US-201313757271-A |
| Country | US |
| Kind code | B1 |
| Filing date | Feb 1, 2013 |
| Priority date | Feb 3, 2012 |
| Publication date | Apr 19, 2016 |
| Grant date | Apr 19, 2016 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A computer implemented method and system for spell correcting terms within a string of terms that a computer system receives from a computer readable data string representative of a user search query.
Opening claim text (preview).
What is claimed is: 1. A computer implemented method, comprising: receiving a search query; identifying a plurality of search terms of the search query; determining a model term score for each of the identified search terms, the model term score for a given search term of the identified search terms being indicative of a probability of misspelling of the given search term, wherein determining the model term score for the given search term includes determining whether the given search term is in a dictionary of a language model and wherein the model term score for the given search term is determined independent of any alternative suggestions that are alternative spellings of the given search term; selecting candidate search terms from the identified search terms of the search query based on the model term scores, the candidate search terms being a subset of the identified search terms of the search query, wherein selecting a candidate search term of the candidate search terms comprises selecting the candidate search term based on the model term score for the candidate search term being more indicative of misspelling than the model term scores of other of the identified search terms not selected as one of the candidate search terms; and providing the candidate search terms to a spell correction system. 2. The computer implemented method as recited in claim 1 , wherein determining the model term score for each of the identified search terms further includes: determining whether each of the identified search terms is in a dictionary of a language model; and determining a language model term score for each of the identified search terms found in the dictionary of the language model. 3. The computer implemented method as recited in claim 2 , wherein determining the model term score for each of the identified search terms further includes: determining an n-gram model term score for each of the identified search terms not found in the dictionary of the language model, where each n-gram model term score has a value more indicative of misspelling than any of the language model term scores. 4. The computer implemented method as recited in claim 3 , wherein the language model term score is based on the probability of misspelling estimated by the language model using term frequency, and where determining the n-gram model term score is based on the probability of a character sequence estimated by the character n-gram model. 5. The computer implemented method as recited in claim 4 , wherein selecting the candidate search terms comprises selecting a predetermined number of the identified search terms having model term scores that satisfy a threshold, and where providing the candidate search terms to the spell correction system comprises providing the predetermined number of the identified search terms to the spell correction system and receiving alternative versions of the candidate search terms from the spell correction system. 6. The computer implemented method as recited in claim 5 , further comprising: determining a total number of terms M to be spell corrected; determining a context number of terms that is desired on either side of each of N terms having model term scores that satisfy the threshold; multiplying the context number of terms by 2 resulting in a product (P); adding one to the product resulting in (P+1); dividing total number of terms M by the result (P+1) thereby providing the number N; and providing the context number of terms on either side of each of the N terms having model term scores that satisfy the threshold to the spell correction system. 7. The computer implemented method as recited in claim 6 , further comprising: providing an improved score to one of two or more terms having the same model term scores based on proximity to a term with a model term score that satisfies the threshold. 8. The computer implemented method as recited in claim 1 , further comprising: receiving a suggestion from the spell correction system, the suggestion representative of suggesting an alternative spelling to one or more of the identified search terms. 9. The computer implemented method in claim 1 , wherein determining the model term score for each of the identified search terms further includes: determining if each of the identified search terms is in a language model and determining a language model term score for each of the identified search terms determined to be in the language model, wherein the language model term score is based on the probability of misspelling estimated by the language model using term frequency; and determining an n-gram model term score for each of the identified search terms received and determined to not be in the language model, where determining the n-gram model term score is based on the probability of a character sequence estimated by the character n-gram model. 10. The computer implemented method of claim 9 , wherein each n-gram model term score has a value more indicative of misspelling than any of the language model term scores. 11. The computer implemented method as recited in claim 10 , further comprising: determining a total number of search terms to be spell corrected; determining a context number of terms that is desired on either side of each of N terms having model term scores that satisfy the threshold; multiplying the context number of terms by 2 resulting in a product (P); adding one to the product resulting in (P+1); dividing total number terms by the result (P+1) thereby providing the number N; and providing the context number of terms on either side of each of the N terms having model term scores that satisfy the threshold to the spell correction system. 12. The computer implemented method as recited in claim 11 , further comprising: providing a higher score to one of two or more terms having the same model term score based on order of appearance. 13. The computer implemented method as recited in claim 11 , further comprising: providing a higher score to one of two or more terms having the same model term score based on proximity to a term with a model term score that satisfies the threshold. 14. The computer implemented method as recited in claim 9 , further comprising: receiving a suggestion from the correction system, the suggestion representative of suggesting an alternative spelling to one or more of the search terms. 15. A system comprising one or more processors configured to perform operations comprising: receiving a search query; identifying a plurality of search terms of the search query; determining a model term score for each of the identified search terms, the model term score for a given search term of the identified search terms being indicative of a probability of misspelling of the given search term, wherein determining the model term score for the given search term includes determining whether the given search term is in a dictionary of a language model and wherein the model term score for the given search term is determined independent of any alternative suggestions that are alternative spellings of the given search term; selecting candidate search terms from the identified search terms of the search query based on the model term scores, the candidate search terms being a subset of the identified search terms of the search query, wherein selecting a candidate search term of the candidate search terms comprises selecting the candidate search term based on the model term score for the candidate search term being more indicative of misspelling than the model term scores of other of the identified search terms not selecte
Orthographic correction, e.g. spell checking or vowelisation · CPC title
Syntactic pre-processing, e.g. stopword elimination, stemming · CPC title
Indexing; Web crawling techniques · CPC title
Physics · mapped topic
Related publications grouped by family.
Answers are generated from the same data shown on this page.