Context-based utterance recognition

US9633653B1 · US · B1

Patent metadata
Field                 Value
Publication number    US-9633653-B1
Application number    US-201514675104-A
Country               US
Kind code             B1
Filing date           Mar 31, 2015
Priority date         Dec 27, 2011
Publication date      Apr 25, 2017
Grant date            Apr 25, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In some implementations, a digital work provider may provide language model information related to a plurality of different contexts, such as a plurality of different digital works. For example, the language model information may include language model difference information identifying a plurality of sequences of one or more words in a digital work that have probabilities of occurrence that differ from probabilities of occurrence in a base language model by a threshold amount. The language model difference information corresponding to a particular context may be used in conjunction with the base language model to recognize an utterance made by a user of a user device. In some examples, the recognition is performed on the user device. In other examples, the utterance and associated context information are sent over a network to a recognition computing device that performs the recognition.
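The idea of language model difference information can be illustrated with a minimal sketch. It assumes unigram relative frequencies and an absolute-difference comparison against the threshold; the abstract does not specify either. The function names `ngram_probabilities` and `difference_info`, the toy probabilities, and the threshold value are hypothetical and not taken from the patent.

```python
from collections import Counter


def ngram_probabilities(words, n=1):
    """Relative frequency of each n-gram (a tuple of words) in a token list."""
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def difference_info(work_probs, base_probs, threshold):
    """Keep only n-grams whose probability in the work differs from the
    base model by more than `threshold` (absolute difference is an
    assumption made for this sketch)."""
    diff = {}
    for gram, p_work in work_probs.items():
        p_base = base_probs.get(gram, 0.0)
        if abs(p_work - p_base) > threshold:
            diff[gram] = p_work
    return diff


# Toy base model and toy "digital work" (illustrative values only).
base_probs = {("the",): 0.5, ("whale",): 0.01, ("sea",): 0.15}
work_words = "the whale the sea the whale".split()

work_probs = ngram_probabilities(work_words)
diff = difference_info(work_probs, base_probs, threshold=0.1)
print(diff)
```

In this toy example only "whale" is retained, because it is far more frequent in the work than in the base model. Storing just such entries keeps the per-work information small compared with a full language model.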

First claim

Opening claim text (preview).

The invention claimed is:

1. A device comprising: a processor; and one or more computer-readable media to store processor-executable instructions that, when executed, program the one or more processors to: identify a plurality of n-grams associated with at least a portion of a digital work, an n-gram of the plurality of n-grams comprising a sequence of at least one or more words, one or more phonemes, one or more syllables, one or more letters, or one or more base pairs; associate a first probability of occurrence with the n-gram based at least in part on a frequency of occurrence in at least the portion of the digital work; determine language model difference information based at least in part on the first probability of occurrence associated with the n-gram differing from a second probability of occurrence of the n-gram in a base language model by more than a threshold amount; and determine a word based at least in part on a captured utterance, the base language model, and the language model difference information.

2. The device as recited in claim 1, wherein the processor-executable instructions further program the one or more processors to generate the base language model based at least in part on at least one of a webpage, an electronic book, a news feed, a social network site, a microblog, or a closed captioning feed.

3. The device as recited in claim 1, wherein the processor-executable instructions further program the one or more processors to generate the base language model from a plurality of digital works.

4. The device as recited in claim 1, further comprising a communication interface, and wherein the processor-executable instructions further program the one or more processors to send, via the communication interface, the language model difference information and the base language model to a speech recognizing computing device that provides an utterance recognition service.

5. The device as recited in claim 1, further comprising a communication interface, and wherein the processor-executable instructions further program the one or more processors to send, via the communication interface, the language model difference information to a user device in association with providing the digital work to the user device.

6. The device as recited in claim 1, further comprising a communication interface, and wherein the processor-executable instructions further program the one or more processors to: receive, via the communication interface, the captured utterance from a user device; and send, via the communication interface, information associated with the word to the user device.

7. The device as recited in claim 6, wherein the processor-executable instructions further program the one or more processors to receive, via the communication interface, context information associated with the utterance, the context information identifying the language model difference information.

8. The device as recited in claim 1, further comprising a communication interface, and wherein the processor-executable instructions further program the one or more processors to: receive, via the communication interface, user information corresponding to the digital work from a plurality of user devices; and wherein the first probability of occurrence associated with the n-gram is weighted based at least in part on the user information.

9. The device as recited in claim 8, wherein the user information includes at least one of information corresponding to a user highlight of the digital work or information corresponding to a user annotation to the digital work.

10. A method executable by one or more computing processors to perform operations comprising: identify a plurality of n-grams included in user information associated with at least a portion of a digital work, an n-gram of the plurality of n-grams comprising a sequence of one or more words; associating a first probability of occurrence with the n-gram based at least in part on a frequency of occurrence of the n-gram in at least the user information; determining language model difference information based at least in part on the first probability of occurrence associated with the n-gram differs from a second probability of occurrence of the n-gram in a base language model by more than a threshold amount; and determining a word based at least in part on a captured utterance, the base language model, and the language model difference information.

11. The method as recited in claim 10, wherein the base language model includes a probability-weighted distribution of n-gram sequences for a language associated with the digital work.

12. The method as recited in claim 10, wherein determining that the first probability of occurrence associated with the n-gram differs from the second probability of occurrence of the n-gram in the base language model by more than the threshold amount comprises determining that the first probability of occurrence associated with the n-gram differs from the second probability of occurrence of the n-gram in the base language model by more than a predetermined distance between the first probability of occurrence and the second probability of occurrence.

13. The method as recited in claim 10, further comprising generating the base language model based at least in part on a plurality of digital works.

14. The method as recited in claim 10, further comprising generating the base language model based at least in part on at least one of a webpage, an electronic book, a news feed, a social network site, a microblog, or a closed captioning feed.

15. The method as recited in claim 10, further comprising determining language model difference information for the digital work based at least in part on the second probability of occurrence associated with the n-gram differing from the first probability of occurrence of the n-gram by more than the threshold amount.

16. The method as recited in claim 10, wherein identifying the plurality of n-grams included in the user information associated with the at least the portion of the digital work comprises identifying the plurality of n-grams included in at least one of user highlights, user annotations, or user-created content.

17. One or more non-transitory computer-readable media maintaining instructions executable by one or more processors to perform operations comprising: determining an n-gram comprising a sequence of one or more words based at least in part on parsing a plurality of digital works, wherein the plurality of digital works are associated with a particular subject matter category; associating a first probability of occurrence with the n-gram based at least in part on a frequency of occurrence of the n-gram in the plurality of digital works; and determining language model difference information based at least in part on the first probability of occurrence associated with the n-gram differs from a second probability of occurrence of the n-gram in a base language model by more than a threshold amount; and determining a word based at least in part on a captured utterance, the base language model and the language model difference information.

18. The one or more non-transitory computer-readable media as recited in claim 17, wherein the base language model includes a probability-weighted distribution of n-gram sequences for a language associated with the plurality of digital works.

19. The one or more non-transitory computer-readable media as recited in claim 17, the operations further comprising: sending the language mod
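Each independent claim ends by determining a word from a captured utterance, the base language model, and the language model difference information. One way this could work is sketched below, under stated assumptions: the recognizer is assumed to have already produced candidate words with acoustic scores, the difference information is assumed to override the base probability for the n-grams it contains, and scores are combined by adding logarithms. None of these choices is stated in the claims. The names `ngram_probability` and `pick_word`, the `floor` value, and all numbers are hypothetical.

```python
import math


def ngram_probability(gram, base_probs, diff_info, floor=1e-6):
    """Use the context-specific probability when the difference
    information has an entry; otherwise fall back to the base model
    (or a small floor for unseen n-grams)."""
    if gram in diff_info:
        return diff_info[gram]
    return base_probs.get(gram, floor)


def pick_word(candidates, base_probs, diff_info):
    """candidates maps each candidate word to an acoustic score in (0, 1].
    Returns the word with the highest combined log score."""
    def score(word):
        lm = ngram_probability((word,), base_probs, diff_info)
        return math.log(candidates[word]) + math.log(lm)
    return max(candidates, key=score)


# Illustrative values only: two acoustically similar candidates.
base_probs = {("wail",): 0.02, ("whale",): 0.01}
diff_info = {("whale",): 0.3}          # "whale" is common in this work
candidates = {"wail": 0.55, "whale": 0.45}

without_context = pick_word(candidates, base_probs, {})
with_context = pick_word(candidates, base_probs, diff_info)
print(without_context, with_context)
```

With the base model alone the sketch selects "wail"; with the difference information for the work applied, it selects "whale". This mirrors the claimed use of context to disambiguate an utterance.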

Assignees

Amazon Tech Inc

Inventors

Classifications

  • G10L15/22 · Primary

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title

  • Probabilistic grammars, e.g. word n-grams · CPC title

  • of application context · CPC title

  • Training · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9633653B1 cover?
In some implementations, a digital work provider may provide language model information related to a plurality of different contexts, such as a plurality of different digital works. For example, the language model information may include language model difference information identifying a plurality of sequences of one or more words in a digital work that have probabilities of occurrence that di…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 25, 2017 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).