Text auto-correction via N-grams

US9779080B2 · US · B2

Patent metadata
Publication number: US-9779080-B2
Application number: US-201213544941-A
Country: US
Kind code: B2
Filing date: Jul 9, 2012
Priority date: Jul 9, 2012
Publication date: Oct 3, 2017
Grant date: Oct 3, 2017

Abstract

An input text string is received that contains characters or words. The input text string can be completed or corrected using content scores based on n-grams. In addition, a subsequent text string and a preceding text string for the input text string are also identified, again using n-gram scores. A corrected text string is created by inserting the preceding text string before the input text string and appending the subsequent text string after the input text string.
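To make the extension step concrete, here is a minimal Python sketch that assumes a toy bigram model counted from a made-up three-sentence corpus. At each step it takes the most frequent follower (or predecessor) word, which loosely mirrors the highest-probability transitions described in claims 6 and 7. This is a simplification: the claims condition on the full N-word input and a categorical topic, and the sketch models neither. `BigramModel`, `extend` and `CORPUS` are hypothetical names and do not come from the patent.

```python
from collections import Counter, defaultdict


class BigramModel:
    """Toy bigram counts; a hypothetical stand-in for the patent's n-gram scores."""

    def __init__(self, sentences):
        self.follow = defaultdict(Counter)   # follow[w][v]: times v came right after w
        self.precede = defaultdict(Counter)  # precede[v][w]: times w came right before v
        for sentence in sentences:
            words = sentence.lower().split()
            for w, v in zip(words, words[1:]):
                self.follow[w][v] += 1
                self.precede[v][w] += 1

    @staticmethod
    def _argmax(counts):
        # Highest-count neighbour (maximum-likelihood transition); None if unseen.
        return counts.most_common(1)[0][0] if counts else None

    def best_next(self, word):
        return self._argmax(self.follow.get(word))

    def best_prev(self, word):
        return self._argmax(self.precede.get(word))


def extend(model, phrase):
    """Add up to N-1 preceding and N-1 subsequent words around an N-word phrase."""
    words = phrase.lower().split()
    if not words:
        return ""
    budget = len(words) - 1

    # Subsequent phrase: walk forward from the last input word.
    after, last = [], words[-1]
    for _ in range(budget):
        last = model.best_next(last)
        if last is None:
            break
        after.append(last)

    # Preceding phrase: walk backward from the first input word.
    before, first = [], words[0]
    for _ in range(budget):
        first = model.best_prev(first)
        if first is None:
            break
        before.insert(0, first)

    # Corrected string = preceding phrase + input phrase + subsequent phrase.
    return " ".join(before + words + after)


CORPUS = [
    "please send the quarterly report today",
    "please send the quarterly report today",
    "we reviewed the quarterly report yesterday",
]

model = BigramModel(CORPUS)
print(extend(model, "quarterly report"))  # the quarterly report today
```

Greedy one-word lookups keep the example short. A closer match to claim 1 would score whole (N−1)-word continuations against all N input words, for example with a beam search over a higher-order model trained per topic.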

First claim

Opening claim text (preview).

What is claimed is:

1. A method for text auto-correction, the method comprising: receiving an input text string on an electronic text input interface device, the input text string comprising N words and a categorical topic; generating a subsequent text string comprising a plurality of N−1 subsequent words forming a subsequent phrase within the categorical topic by determining probabilities that the N−1 subsequent words follow the N words in the input text string; generating a preceding text string comprising a plurality of N−1 preceding words forming a preceding phrase for the input text string within the categorical topic by determining probabilities that the N−1 preceding words precede the N words in the input text string; creating a corrected text string by inserting the preceding phrase before the input text string and appending the subsequent phrase after the input text string; and displaying the corrected text string on the electronic text input interface device.

2. The method of claim 1, wherein: the step of receiving the input text string further comprises receiving an input text string comprising the N words forming an input phrase; and the method further comprises replacing the input phrase with a substitute input phrase comprising N words based on N-gram content scores associated with the input phrase comprising the N words and the substitute input phrase expressing a probability of accuracy of content.

3. The method of claim 2, wherein the step of replacing the input phrase with the substitute input phrase further comprises: identifying a plurality of sub-phrases in the input phrase, each sub-phrase comprising a number of words less than the N words forming the input phrase; identifying a plurality of substitute sub-phrases, each substitute sub-phrase associated with one of the sub-phrases in the input phrase; forming a plurality of candidate substitute input phrases, each candidate substitute input phrase comprising a unique combination of the input phrase and at least one of the plurality of substitute sub-phrases; assigning a content score to the input phrase and to each one of the candidate substitute input phrases; selecting the candidate substitute input phrase having a highest content score as the substitute input phrase; and replacing the input phrase with the substitute input phrase only if a substitute input phrase content score is higher than an input phrase content score.

4. The method of claim 1, wherein: the step of receiving the input text string comprises: receiving the N words comprising a first language; and translating the N words to a second language; and the steps of generating the subsequent text string and the preceding text string further comprise generating a subsequent text string comprising a plurality of N−1 subsequent words in the second language forming a subsequent phrase and generating a preceding text string comprising a plurality of N−1 preceding words in the second language forming a preceding phrase.

5. The method of claim 1, wherein: the step of receiving the input text string further comprises receiving an input text string comprising a plurality of N words forming an input phrase in a first language; and the method further comprises: replacing the input phrase with a substitute input phrase based on content scores associated with the input phrase and the substitute input phrase expressing a probability of accuracy of content; and translating the substitute input phrase into a second language.

6. The method of claim 1, wherein the method further comprises locating the input text string in a state graph comprising a plurality of states and a plurality of transitions between pairs of states, each state comprising a given text string and each transition comprising a probability that text strings in a given pair of states associated with that transition comprise a sequence of text strings.

7. The method of claim 6, wherein: the step of locating the input text string further comprises locating the input text string in an input state; and the steps of generating a subsequent text string and a preceding text string further comprise: generating a subsequent state comprising the subsequent text string such that a transition between the input state and the subsequent state comprises a highest probability associated with a sequence comprising the input text string followed by the subsequent text string; and generating a preceding state comprising the preceding text string such that a transition between the input state and the preceding state comprises a highest probability associated with a sequence comprising the preceding text string followed by the input text string.

8. The method of claim 1, wherein: the step of receiving the input text string further comprises receiving at least one core word; and the method further comprises completing the input phrase containing the core word.

9. The method of claim 8, wherein: the step of receiving at least one core word further comprises receiving a plurality of core words; and the input phrase comprises the plurality of core words such that the core words are non-contiguous words in the input phrase.

10. A non-transitory computer-readable storage medium containing a computer-readable code that when read by a computer causes the computer to perform a method for text auto-correction, the method comprising: receiving an input text string on an electronic text input interface device, the input text string comprising N words and a categorical topic; generating a subsequent text string comprising a plurality of N−1 subsequent words forming a subsequent phrase within the categorical topic by determining probabilities that the N−1 subsequent words follow the N words in the input text string; generating a preceding text string comprising a plurality of N−1 preceding words forming a preceding phrase for the input text string within the categorical topic by determining probabilities that the N−1 preceding words precede the N words in the input text string; creating a corrected text string by inserting the preceding phrase before the input text string and appending the subsequent phrase after the input text string; and displaying the corrected text string on the electronic text input interface device.

11. The non-transitory computer-readable medium of claim 10, wherein: the step of receiving the input text string further comprises receiving an input text string comprising the N words forming an input phrase; and the method further comprises replacing the input phrase with a substitute input phrase comprising N words based on content scores associated with the input phrase and the substitute input phrase expressing a probability of accuracy of content.

12. The non-transitory computer-readable medium of claim 11, wherein the step of replacing the input phrase with the substitute input phrase further comprises: identifying a plurality of sub-phrases in the input phrase, each sub-phrase comprising a number of words less than the N words forming the input phrase; identifying a plurality of substitute sub-phrases, each substitute sub-phrase associated with one of the sub-phrases in the input phrase; forming a plurality of candidate substitute input phrases, each candidate substitute input phrase comprising a unique combination of the input phrase and at least one of the plurality of substitute sub-phrases; assigning a content score to each one of the candidate substitute input phrases; and selecting the candidate substitute input phrase having a highest content score as the substitute input phrase.

13. The non-tra
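Claims 3 and 12 describe a substitution step. The input phrase is split into shorter sub-phrases, candidate phrases are built from substitute sub-phrases, and everything is scored. Claim 3 adds that the input is replaced only when the best candidate scores higher than the input itself. The Python sketch below makes three assumptions the patent does not specify: sub-phrases are single words, substitutes come from a hand-written confusion set, and the "content score" is an add-one smoothed bigram log-probability. `train`, `content_score`, `substitute`, `CORPUS` and `CONFUSIONS` are hypothetical names chosen for illustration.

```python
import math
from collections import Counter
from itertools import product


def train(sentences):
    """Count unigrams and bigrams from a toy corpus (hypothetical training data)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def content_score(words, unigrams, bigrams):
    """Add-one smoothed bigram log-probability, used here as the 'content score'."""
    vocab = len(unigrams) + 1  # one extra slot for unseen words
    return sum(
        math.log((bigrams[(w, v)] + 1) / (unigrams[w] + vocab))
        for w, v in zip(words, words[1:])
    )


def substitute(phrase, confusions, unigrams, bigrams):
    words = phrase.lower().split()
    # Sub-phrases here are single words (each shorter than the N-word phrase);
    # each one may be kept or swapped for one of its listed substitutes.
    options = [[w] + confusions.get(w, []) for w in words]
    # Every combination except the unchanged input is a candidate substitute phrase.
    candidates = [list(c) for c in product(*options) if list(c) != words]
    if not candidates:
        return " ".join(words)

    def score(ws):
        return content_score(ws, unigrams, bigrams)

    best = max(candidates, key=score)
    # Replace only if the best candidate beats the input phrase's own score.
    return " ".join(best) if score(best) > score(words) else " ".join(words)


CORPUS = [
    "i received a letter from the bank",
    "a letter from the bank arrived",
    "fill in the form today",
]
CONFUSIONS = {"form": ["from"], "from": ["form"]}  # hypothetical confusion set

unigrams, bigrams = train(CORPUS)
print(substitute("letter form the", CONFUSIONS, unigrams, bigrams))  # letter from the
print(substitute("the form today", CONFUSIONS, unigrams, bigrams))   # the form today
```

The second call shows the claim 3 guard at work. "the form today" is left unchanged because swapping in "from" lowers its score. Enumerating every combination with `itertools.product` grows exponentially with the number of substitutable sub-phrases, so a practical system would likely prune candidates.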

Classifications

  • G06F40/232 · Primary

    Orthographic correction, e.g. spell checking or vowelisation · CPC title

  • Converting codes to words; Guess-ahead of partial word inputs · CPC title

  • G06F17/273 · Primary

  • Physics · mapped topic

Frequently asked questions

Who is the assignee on this patent?
Caskey Sasha P, Kanevsky Dimitri, Kozloski James R, and 2 more
What technology area does this patent fall under?
Primary CPC classification G06F40/232. Mapped technology areas include Physics.
When was this patent published?
Published October 3, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).