Context based language model selection

US9047870B2 · US · B2

Patent metadata
Publication number: US-9047870-B2
Application number: US-201113249181-A
Country: US
Kind code: B2
Filing date: Sep 29, 2011
Priority date: Dec 23, 2009
Publication date: Jun 2, 2015
Grant date: Jun 2, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title: What the patent document calls the invention.
  2. Abstract: A short plain-language summary of the technical disclosure.
  3. Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
  4. Key dates: Filing, priority, publication, and grant dates set the timeline.
  5. First independent claim: The legal scope of protection; read this for what is actually claimed.
  6. CPC / IPC classifications: Technology tags used to group this patent with similar filings.
  7. Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, computer program products and systems are described for speech-to-text conversion. A voice input is received from a user of an electronic device and contextual metadata is received that describes a context of the electronic device at a time when the voice input is received. Multiple base language models are identified, where each base language model corresponds to a distinct textual corpus of content. Using the contextual metadata, an interpolated language model is generated based on contributions from the base language models. The contributions are weighted according to a weighting for each of the base language models. The interpolated language model is used to convert the received voice input to a textual output. The voice input is received at a computer server system that is remote to the electronic device. The textual output is transmitted to the electronic device.
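
The weighting step in the abstract can be illustrated with a minimal sketch. It assumes toy unigram models and a fixed lookup table from context to weights; the model names, vocabulary, probabilities, and the `interpolate` and `pick_word` helpers are all hypothetical and are not taken from the patent, which does not specify the model type or how weights are computed.

```python
from typing import Dict, Iterable

# Hypothetical unigram base models: each maps a word to P(word | model).
# A real system would use far larger n-gram or neural models.
BASE_MODELS: Dict[str, Dict[str, float]] = {
    "maps": {"main": 0.5, "street": 0.4, "mane": 0.1},
    "web_search": {"main": 0.1, "street": 0.4, "mane": 0.5},
}

# Hypothetical weights per device context; each row sums to 1.
CONTEXT_WEIGHTS: Dict[str, Dict[str, float]] = {
    "address_field": {"maps": 0.75, "web_search": 0.25},
    "search_box": {"maps": 0.25, "web_search": 0.75},
}


def interpolate(context: str) -> Dict[str, float]:
    """Linear interpolation: P(word) = sum_i w_i * P_i(word)."""
    weights = CONTEXT_WEIGHTS[context]
    vocab = set().union(*(model.keys() for model in BASE_MODELS.values()))
    return {
        word: sum(w * BASE_MODELS[name].get(word, 0.0) for name, w in weights.items())
        for word in vocab
    }


def pick_word(candidates: Iterable[str], context: str) -> str:
    """Choose the candidate transcription the interpolated model prefers."""
    model = interpolate(context)
    return max(candidates, key=lambda word: model.get(word, 0.0))
```

With these toy numbers, the homophones "main" and "mane" resolve differently by context: `pick_word(["main", "mane"], "address_field")` returns `"main"`, while the same call with `"search_box"` returns `"mane"`.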

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented speech-to-text conversion method, comprising: receiving a voice input provided by a user of an electronic device and contextual metadata that describes a context of the electronic device at a time when the voice input was received, the voice input received by a service running on the electronic device that is capable of providing, from voice or typed input, text output to multiple different applications on the electronic device, and is arranged to select a particular application of the multiple different applications to receive the text output, and the contextual metadata identifying text for a form field displayed to a user and to which the voice input was directed; identifying a plurality of base language models, wherein each base language model corresponds to a distinct textual corpus of content, and wherein each base language model is trained based on clusters identified in a bipartite cluster graph having clusters that correspond to particular categories of queries entered to a search engine by multiple different client devices, the clusters including search queries and corresponding search results, in the form of web pages, extracted from a historical log that are paired based on the web sites being top results for particular corresponding queries; selecting a particular base language model, from among the identified plurality of base language models, the selection based at least in part on the text corresponding to the field of the form displayed to the user and to which the voice input was directed; and using the selected particular base language model to convert the received voice input to a textual output, wherein the service is: able to (a) receive typed input in a typed mode and voice input in a spoken mode, and adopts the spoken mode based on a user selection before receiving the voice input, and (b) in response to receiving typed or voice input, provide text output to a first application, and arranged so that a particular instance of the service is external to the multiple different applications and provides text to different ones of the multiple different applications in a manner that speech-to-text conversion by the service is transparent to the different ones of the multiple different applications.

2. The method of claim 1, wherein the voice input is received at a computer server system that is remote to the electronic device, the method further comprising: transmitting the textual output to the electronic device.

3. The method of claim 1, wherein the form field is identified from a position of a cursor on the electronic device when the voice input was received at the electronic device.

4. The method of claim 1, further comprising: generating an interpolated language model from a plurality of the base language models using determined weights for each of the base language models, wherein the weights are based at least in part on the contextual metadata, and the base language models are formed from queries entered by multiple users into a search engine.

5. The method of claim 1, further comprising: building one or more of the base language models based on text input data collected from a plurality of users and metadata that corresponds to the text input data.

6. The method of claim 5, wherein for a particular text input data the corresponding metadata identifies an input field that corresponds to the text input data.

7. The method of claim 5, wherein the text input data and the metadata are formed as individual pairs, the method further comprising: forming a bipartite cluster graph of the individual pairs.

8. The method of claim 7, further comprising identifying clusters in the graph and using the clusters to generate a language model.

9. The method of claim 8, further comprising training the base language models by using sample voice utterances from a plurality of users of a plurality of electronic devices.

10. A computer-implemented system for converting speech to text, the system comprising: one or more computer processors; and one or more computer-readable devices including instructions that, when executed by the one or more computer processors, implement: an application of an operating system on distributed electronic devices, the application programmed to obtain, via a single instance of the application, both typed input and voice input, and to generate for a determined one of multiple different applications executable on a particular electronic device, text from either the typed input of the voice input depending on a user selection to place the particular electronic device in a typed input mode or a voice input mode, wherein voice input is accompanied by contextual metadata that describes a position of a cursor on a display of the particular electronic device at a time when the voice input is obtained; a plurality of base language models, each base language model corresponding to a particular semantic category, wherein the system is programmed to identify a particular base language model from the plurality of base language models, the identification based at least in part on the position of the cursor on the display of the particular electronic device at the time when the voice input is obtained, wherein each base language model is trained based on clusters identified in a bipartite cluster graph having clusters that correspond to particular categories of queries entered to a search engine by multiple different client devices, the clusters including search queries and corresponding search results, in the form of web pages, extracted from a historical log that are paired based on the web sites being top results for particular corresponding queries.

11. The system of claim 10, further comprising an interpolated language model that is linked to the plurality of base language models, wherein each link between the interpolated language model and each of the base language models is associated with a weight, and the weight for each link between the interpolated language mode and a base language model is based on an accuracy of the base language model in associating a voice input with a text output representing a conversion of the voice input into text.

12. The system of claim 11, wherein the weights represent likelihoods of usage in the interpolated language model matching usage in the particular base language model.

13. The system of claim 11, wherein the weights are a function of the semantic category.

14. The system of claim 13, further comprising: a network interface configured to: receive a voice input; and cause the interpolated language model to be applied to the voice input to generate a text output.

15. The system of claim 14, wherein the network interface is further configured to: use metadata received with the voice input to match to the semantic category to determine weights for the base language models from the interpolated language model.

16. The system of claim 15, wherein the system is configured to dynamically apply the weights for the plurality of base language models in real-time substantially as the voice input is received by the network interface.

17. A non-transitory computer-readable storage device encoded with a computer program product, the computer program product including instructions for speech-to-text conversion that, when executed, cause data processing apparatus to perform operations comprising: receiving a voice input provided by a user of an electronic device and contextual metadata that describes a context of the electronic device at a time when th
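
The selection step of claim 1, which chooses a base language model from the text of the form field that received the voice input, might be sketched as follows. This is only an illustration under stated assumptions: the keyword table, the model names, the default fallback, and the `select_base_model` function are hypothetical, and simple keyword overlap stands in for whatever matching the patented method actually uses.

```python
from typing import Dict, Set

# Hypothetical table: base-model name -> keywords expected in a field label.
MODEL_KEYWORDS: Dict[str, Set[str]] = {
    "addresses": {"address", "street", "city", "destination"},
    "contacts": {"to", "recipient", "name"},
    "web_search": {"search", "query"},
}

# Hypothetical fallback when the field label matches no keywords.
DEFAULT_MODEL = "web_search"


def select_base_model(field_text: str) -> str:
    """Pick the model whose keywords overlap most with the field label."""
    tokens = set(field_text.lower().replace(":", " ").split())
    best_name, best_overlap = DEFAULT_MODEL, 0
    for name, keywords in MODEL_KEYWORDS.items():
        overlap = len(tokens & keywords)
        if overlap > best_overlap:
            best_name, best_overlap = name, overlap
    return best_name
```

For example, `select_base_model("Street address:")` returns `"addresses"`, `select_base_model("To:")` returns `"contacts"`, and a label with no known keywords, such as `"Comments"`, falls back to `"web_search"`.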

Classifications

  • Execution procedure of a spoken command · CPC title

  • by partitioning the display area of the touch-screen or the surface of the digitising tablet into independently controllable areas, e.g. virtual keyboards or menus · CPC title

  • using context dependencies, e.g. language models · CPC title

  • G10L15/26 · Primary

    Speech to text systems (G10L15/08 takes precedence) · CPC title

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9047870B2 cover?
Methods, computer program products and systems are described for speech-to-text conversion. A voice input is received from a user of an electronic device and contextual metadata is received that describes a context of the electronic device at a time when the voice input is received. Multiple base language models are identified, where each base language model corresponds to a distinct textual co…
Who is the assignee on this patent?
Ballinger Brandon M, Schalkwyk Johan, Cohen Michael H, and 3 more
What technology area does this patent fall under?
Primary CPC classification G10L15/26. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 2, 2015 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).