Adaptive generation of out-of-dictionary personalized long words

US9411800B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9411800-B2
Application number: US-16308208-A
Country: US
Kind code: B2
Filing date: Jun 27, 2008
Priority date: Jun 27, 2008
Publication date: Aug 9, 2016
Grant date: Aug 9, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system is provided, including a display unit, a memory unit, and a processor. The processor is configured to calculate a mutual information value between a first chunk and a second chunk, and to add a new word to a language unit when a condition involving the mutual information value is satisfied. The new word is a combination of the first chunk and the second chunk. The processor is also configured to add the new word into an n-gram store. The n-gram store includes a plurality of n-grams and associated frequency or count information. The processor is also configured to alter the frequency or count information based on the new word.
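The abstract does not say how the mutual information value is estimated or what the condition is. As a minimal sketch, assuming pointwise mutual information computed from raw counts and a simple threshold test, the merge decision could look like the following. The function names, the `threshold` and `min_pair_count` parameters, and their default values are hypothetical and are not taken from the patent.

```python
import math


def pointwise_mutual_information(pair_count, first_count, second_count, total):
    """Return PMI (in bits) of two chunks, estimated from raw counts.

    Assumption: PMI = log2(P(first, second) / (P(first) * P(second))).
    The patent only says "mutual information value"; this estimator is
    one common choice, not necessarily the one the patent uses.
    """
    if min(pair_count, first_count, second_count, total) <= 0:
        return float("-inf")  # no evidence that the chunks co-occur
    p_pair = pair_count / total
    p_first = first_count / total
    p_second = second_count / total
    return math.log2(p_pair / (p_first * p_second))


def should_merge(pair_count, first_count, second_count, total,
                 threshold=3.0, min_pair_count=2):
    """Decide whether first+second should become a new word.

    Both `threshold` and `min_pair_count` are illustrative values.
    """
    if pair_count <= min_pair_count:
        return False  # too few observations of the pair
    pmi = pointwise_mutual_information(
        pair_count, first_count, second_count, total)
    return pmi > threshold


if __name__ == "__main__":
    # Hypothetical counts: the pair was seen 8 times in 1000 chunk pairs.
    print(should_merge(pair_count=8, first_count=10,
                       second_count=16, total=1000))
```

A high value means the two chunks appear together far more often than chance would predict, which is the intuition behind treating them as a single out-of-dictionary word.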

First claim

Opening claim text (preview).

The invention claimed is:

1. A method implemented by a device that executes a word building application, the method comprising: receiving, by the device, letters to initiate segmentation into a first letter set and a second letter set, each of the first and second letter sets comprising one or more of the letters; determining, by the device, a statistical relationship between the first letter set and the second letter set; determining, by the device, whether the statistical relationship satisfies a condition; responsive to satisfying the condition, adding, by the device, a word composed of the first letter set and the second letter set into a data store associated with the device containing one or more words and one or more bigrams, each of the words having associated count information and each of the bigrams composed of a leading data word and one or more trailing data words, each bigram configured to have associated count information; identifying, by the device, a first bigram from a set of bigrams in the data store, each bigram of the set including the first letter set as a trailing data word; transforming, by the device, a leading data word of the first bigram and the word into a new bigram to add to the data store associated with the device, the new bigram composed of: the leading data word of the first bigram as a leading data word of the new bigram and the word as a trailing data word of the new bigram, computing, by the device, updated count information for the new bigram using a proportional adjustment that describes a relationship between the word following the leading data word of the first bigram relative to the first letter set following the leading data word of the first bigram; and analyzing, by the device, received user generated content using the data store associated with the device to predict text associated with the user generated content.

2. A method as recited in claim 1, further comprising: associating a first count with the first letter set corresponding to the number of instances of the first letter set as the leading data word; associating a new count with the word corresponding to the number of instances the first letter set is the leading data word and the second letter set is the trailing data word; and responsive to adding the word into the data store, updating the first count with a value determined from subtracting the new count from the first count.

3. A method as recited in claim 1, further comprising: identifying count information for the first bigram of the set; determining a delta value for the first bigram of the set; and responsive to determining a positive delta value indicating a relationship between the first bigram of the set and the word, computing updated associated count information for the first bigram of the set.

4. A method as recited in claim 1, further comprising: identifying a second set of bigrams from the one or more bigrams in the data store, each bigram of the second set having the associated count information and including the second letter set as the leading data word; determining a delta value for a first bigram of the second set; and responsive to determining a positive delta value indicating a relationship between the first bigram of the second set and the word, computing updated associated count information for the first bigram of the second set.

5. A method as recited in claim 4, further comprising: creating a second new bigram, the second new bigram composed of the word as a leading data word of the second new bigram and a trailing data word of the first bigram of the set as a trailing data word of the second new bigram.

6. A method as recited in claim 4, wherein computing updated count information for the first bigram of the second set uses a proportional adjustment that describes a relationship between the word preceding the trailing data word of the first bigram relative to the second letter set preceding the trailing data word of the first bigram.

7. A method as recited in claim 1, wherein the leading data word is composed of: a respective letter set, a respective word, or a respective phrase.

8. A method as recited in claim 1, wherein the one or more trailing words are composed of: a respective letter set, a respective word, or a respective phrase.

9. A method as recited in claim 1, wherein to satisfy the condition further includes determining whether count information associated with the first letter set being followed by the second letter set is greater than a threshold value.

10. A method as recited in claim 1, wherein each bigram of the set of bigrams has associated count information.

11. A method as recited in claim 1, further comprising: displaying the predicted text.

12. A method as recited in claim 11, further comprising providing the new bigram to a text prediction application responsive to receiving an indication of user input.

13. A method as recited in claim 1, further comprising providing the updated count information to a text prediction application, a handwriting recognition application, and/or a spelling checker application.

14. A word building system, comprising: a device that includes at least a memory and a processor to implement a word building service that is configured to: receive letters to initiate segmentation into a first letter set and a second letter set, each of the first and second letter sets comprising one or more of the letters; determine a statistical relationship between the first letter set and the second letter set; determine whether the statistical relationship satisfies a condition; responsive to satisfying the condition, add a word composed of the first letter set and the second letter set into a data store associated with the device containing one or more words and one or more bigrams, each of the one or more words and one or more bigrams having associated count information, the one or more bigrams composed of a respective leading data word and one or more respective trailing data words; transform a leading data word of the first bigram and the word into a new bigram to add into the data store, the new bigram composed of: the leading data word of a first bigram as a leading data word of the new bigram; and the word as a trailing data word of the new bigram; compute updated count information for the new bigram using a proportional adjustment that describes a relationship between the word following the leading data word of the first bigram relative to the first letter set following the leading data word of the first bigram; and analyze received user generated content using the data store associated with the device to predict text associated with the user generated content.

15. A word building system as recited in claim 14, wherein the word building service is further configured to: associate a first count with the first letter set corresponding to the number of instances of the first letter set as the leading data word; associate a new count with the word corresponding to the number of instances the first letter set is the leading data word and the second letter set is the trailing data word; and responsive to adding the word into the data store, updating the first count with a value determined from subtracting the new count from the first count.

16. A word building system as recited in claim 14, wherein the word building service is further configured to: identify a set of bigrams from the one or more bigrams in the data store, each bigram of the set having the associated count information and including the fir
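Claims 1 to 3 describe count bookkeeping after a new word is added, but the preview does not give the exact formula for the "proportional adjustment". The sketch below is one possible reading, under stated assumptions: the share of a leading bigram's count that moves to the new bigram equals count(first, second) / count(first), rounded down. The function name `add_merged_word`, the use of `Counter` as the data store, and that ratio are all hypothetical.

```python
from collections import Counter


def add_merged_word(unigrams, bigrams, first, second):
    """Add the word first+second and rebalance counts in place.

    `unigrams` maps word -> count; `bigrams` maps (leading, trailing) -> count.
    Returns the new word, or None if the pair was never observed.
    """
    pair_count = bigrams[(first, second)]
    first_count = unigrams[first]
    if pair_count <= 0 or first_count <= 0:
        return None

    word = first + second

    # Claim 2: the new word takes the count of the (first, second) pair,
    # and that amount is subtracted from the first letter set's count.
    unigrams[word] += pair_count
    unigrams[first] = first_count - pair_count

    # Claims 1 and 3 (assumed formula): for every bigram (lead, first),
    # move a proportional share of its count to the new bigram (lead, word).
    for (lead, trail), count in list(bigrams.items()):
        if trail != first:
            continue
        delta = count * pair_count // first_count  # hypothetical adjustment
        if delta > 0:  # only a positive delta triggers an update
            bigrams[(lead, word)] += delta
            bigrams[(lead, first)] = count - delta

    # Handling of the original (first, second) bigram and of bigrams led by
    # `second` (claims 4 to 6) is left out of this sketch.
    return word


if __name__ == "__main__":
    unigrams = Counter({"data": 10, "base": 6})
    bigrams = Counter({("data", "base"): 4, ("the", "data"): 5})
    add_merged_word(unigrams, bigrams, "data", "base")
    print(unigrams, bigrams)
```

Integer floor division keeps all counts whole; a real implementation might use fractional counts or a different rounding rule.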

Assignees

Inventors

Classifications

  • G06F40/274 (Primary)

    Converting codes to words; Guess-ahead of partial word inputs · CPC title

  • G06F17/276 (Primary)

    Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9411800B2 cover?
A system is provided, including a display unit, a memory unit, and a processor. The processor is configured to calculate a mutual information value between a first chunk and a second chunk, and to add a new word to a language unit when a condition involving the mutual information value is satisfied. The new word is a combination of the first chunk and the second chunk. The processor is also con…
Who is the assignee on this patent?
Morin Frederic, Yu Wei, Eisenhart F James, and 2 more
What technology area does this patent fall under?
Primary CPC classification G06F40/274. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 9, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).