Speech recognition using an operating system hooking component for context-aware recognition models

US9489375B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9489375-B2
Application number  US-201213526789-A
Country             US
Kind code           B2
Filing date         Jun 19, 2012
Priority date       Jun 19, 2011
Publication date    Nov 8, 2016
Grant date          Nov 8, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Inputs provided into user interface elements of an application are observed. Records are made of the inputs and the state(s) the application was in while the inputs were provided. For each state, a corresponding language model is trained based on the input(s) provided to the application while the application was in that state. When the application is next observed to be in a previously-observed state, a language model associated with the application's current state is applied to recognize speech input provided by a user and thereby to generate speech recognition output that is provided to the application. An application's state at a particular time may include the user interface element(s) that are displayed and/or in focus at that time, and is determined by an operating system hooking component embedded in the automatic speech recognition system.
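The flow described in the abstract (record inputs per application state, train one model per state, apply the matching model when that state recurs) can be illustrated with a minimal sketch. It is not the patented implementation. The class name `StateLanguageModels`, the state key (application name plus focused element id), the add-one smoothed unigram scoring, and the idea of choosing among candidate transcriptions are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical state key: (application name, focused UI element id).
State = tuple[str, str]


class StateLanguageModels:
    """One unigram model per observed application state (illustrative only)."""

    def __init__(self) -> None:
        self._counts: dict[State, Counter[str]] = defaultdict(Counter)

    def observe(self, state: State, text: str) -> None:
        # Record an input together with the state it was provided in.
        self._counts[state].update(text.lower().split())

    def has_model(self, state: State) -> bool:
        # Membership test does not create an entry in the defaultdict.
        return state in self._counts

    def score(self, state: State, hypothesis: str) -> float:
        # Product of add-one smoothed unigram probabilities for this state.
        counts = self._counts[state]
        total = sum(counts.values())
        vocab = len(counts) + 1  # +1 reserves probability mass for unseen words
        p = 1.0
        for word in hypothesis.lower().split():
            p *= (counts[word] + 1) / (total + vocab)
        return p

    def pick(self, state: State, hypotheses: list[str]) -> str:
        # Apply the current state's model to choose among candidate transcriptions.
        if not self.has_model(state):
            return hypotheses[0]  # no model yet: keep the first candidate
        return max(hypotheses, key=lambda h: self.score(state, h))


models = StateLanguageModels()
models.observe(("ehr", "dosage_field"), "two milligrams")
models.observe(("ehr", "dosage_field"), "ten milligrams")
models.observe(("ehr", "name_field"), "John Too")

# The same acoustically ambiguous candidates resolve differently per state.
dosage_choice = models.pick(("ehr", "dosage_field"), ["too milligrams", "two milligrams"])
name_choice = models.pick(("ehr", "name_field"), ["two", "too"])
```

In this sketch the dosage field prefers "two milligrams" because "two" was typed there before, while the name field prefers "too". A real recognizer would combine such a model with acoustic scores rather than choosing between finished strings.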

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method performed by at least one computer processor executing computer program instructions tangibly stored on at least one non-transitory computer-readable medium, the method comprising using the at least one computer processor to perform operations of:
    receiving, by an automatic speech recognition system executed by the at least one computer processor, a first plurality of inputs into a user interface element of a target application while the target application is in a first state;
    training, by the automatic speech recognition system, a first language model based on the first plurality of inputs, comprising:
        identifying, by an operating system hooking component included in the speech recognition system executed by the at least one computer processor, a state of the user interface element by intercepting messages between the said user interface element and the computer processor's operating system while the user interface element is displayed in a foreground of a graphical user interface; and
        associating, by the automatic speech recognition system, the first language model with the user interface element;
    determining, by the automatic speech recognition system, that the target application is in the first state, wherein determining further comprises determining that the user interface element is in the identified state;
    applying, by the automatic speech recognition system, the first language model to a first speech input in response to determining that the target application is in the first state;
    receiving, by the automatic speech recognition system, a second plurality of inputs into the target application while the target application is in a second state that differs from the first state;
    training, by the automatic speech recognition system, a second language model based on the second plurality of inputs;
    determining, by the automatic speech recognition system, that the target application is in the second state; and
    applying, by the automatic speech recognition system, the second language model to second speech input in response to determining that the target application is in the second state.

2. The method of claim 1, further comprising providing, to the target application, a result of applying the first language model to the first speech input.

3. The method of claim 2, wherein providing further comprises emulating a keyboard event to submit at least one of text and a control sequence to the target application.

4. The method of claim 2, wherein providing further comprises: posting the result to a clipboard buffer maintained by an operating system executing the target application; and receiving, by the target application, the result from the clipboard buffer.

5. The method of claim 2, wherein providing further comprises: generating an operating system message including the result; and transmitting the operating system message to the target application.

6. The method of claim 2, wherein providing further comprises emulating a pointing device event to submit at least one of text and a control sequence to the target application.

7. The method of claim 2, wherein providing further comprises: identifying, by an operating system hooking component, a user interface having input focus; and providing the result to the user interface having input focus.

8. The method of claim 1, wherein receiving further comprises receiving the first plurality of inputs from a text-based input device.

9. The method of claim 1, wherein receiving further comprises receiving the first plurality of inputs from a pointing device.

10. The method of claim 1, wherein receiving further comprises receiving the first plurality of inputs from a speech input device.

11. The method of claim 1, wherein receiving further comprises receiving a speech-based input and a text-based input in the first plurality of inputs.

12. The method of claim 1, wherein receiving further comprises receiving a speech-based input and input from a pointing device in the first plurality of inputs.

13. The method of claim 1, wherein training the first language model further comprises: receiving a second plurality of inputs into a second copy of the target application, while the second copy of the target application is in the first state and executing on a different computing device than a computing device on which the first plurality of inputs is received; and modifying the first language model based on the second plurality of inputs.

14. The method of claim 1, wherein training the first language model comprises: identifying a pattern of use of a user interface associated with the first plurality of inputs; and modifying the first language model based on the pattern of use.

15. The method of claim 1, further comprising modifying a resource accessed by the language model, based on the first plurality of inputs.

16. The method of claim 1, further comprising configuring a parameter governing whether to interpret an utterance as a grammar, based on the first plurality of inputs.

17. The method of claim 1, further comprising configuring a parameter governing whether to interpret an utterance as text, based on the first plurality of inputs.

18. The method of claim 1, further comprising: identifying, in the plurality of inputs, a plurality of input values; identifying a frequency with which each of the plurality of input values occurs in the plurality of inputs; and training the first language model based on the identified frequency.

19. The method of claim 1, further comprising: identifying, for one of the plurality of inputs, an input value; determining that the input value is an instance of a concept; identifying, in the plurality of inputs, a number of instances of the concept; identifying a frequency with which the concept occurs in the plurality of inputs; and training the first language model based on the identified frequency.

20. The method of claim 1, wherein training the first language model further comprises associating a probability with a word in the language model.

21. The method of claim 1, further comprising modifying the language model, based on a type of a user interface element provided by the target application and receiving the first plurality of inputs.

22. The method of claim 1, further comprising: identifying, by an operating system hooking component, a user interface element displayed in a foreground of a graphical user interface; and associating the first language model with the identified user interface element.

23. The method of claim 1, further comprising: identifying, by an operating system hooking component, a state of a user interface element displayed in a foreground of a graphical user interface; and associating the first language model with the identified user interface element.

24. The method of claim 1, further comprising: identifying, by an operating system hooking component, a user interface element into which one of the first plurality of inputs is provided; and associating the first language model with the identified user interface element.

25. The method of claim 1, further comprising: identifying, by an operating system hooking component, a target application associated with a user interface element into which one of the first plurality of inputs is provided; and associating the first language model with the identifie
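
Claims 18 to 20 describe training from how often input values, or the concepts they are instances of, occur, and attaching probabilities to entries in the model. The sketch below shows one way this could look. It is an illustration under stated assumptions: the claim text does not say how concepts are detected, so the regular-expression detectors, the `<date>` and `<number>` labels, and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical concept detectors; the claims do not specify how concepts are found.
CONCEPT_PATTERNS = {
    "<date>": re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$"),
    "<number>": re.compile(r"^\d+$"),
}


def concept_of(value: str) -> str:
    """Return the concept label for a value, or the value itself if none matches."""
    for label, pattern in CONCEPT_PATTERNS.items():
        if pattern.match(value):
            return label
    return value


def train_probabilities(inputs: list[str]) -> dict[str, float]:
    """Map each concept (or literal value) to its relative frequency in the inputs."""
    counts = Counter(concept_of(v.strip()) for v in inputs)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


# Inputs observed in one UI element while the application was in one state.
observed = ["01/02/2012", "3/4/2011", "42", "normal"]
probabilities = train_probabilities(observed)
```

Here the two date-shaped values collapse into a single `<date>` concept with probability 0.5, so the model learns that this field usually receives a date rather than memorizing the specific dates typed.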

Assignees

Inventors

Classifications

  • Converting codes to words; Guess-ahead of partial word inputs · CPC title

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • using icons (graphical or visual programming using iconic symbols G06F8/34) · CPC title

  • updating or merging of old and new templates; Mean values; Weighting · CPC title

  • G10L15/063 (Primary): Training · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9489375B2 cover?
Inputs provided into user interface elements of an application are observed. Records are made of the inputs and the state(s) the application was in while the inputs were provided. For each state, a corresponding language model is trained based on the input(s) provided to the application while the application was in that state. When the application is next observed to be in a previously-observed…
Who is the assignee on this patent?
Koll Detlef, Finke Michael, Mmodal Ip Llc
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 8, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).