Speech recognition using an operating system hooking component for context-aware recognition models

US10325589B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10325589-B2
Application number: US-201815981263-A
Country: US
Kind code: B2
Filing date: May 16, 2018
Priority date: Jun 19, 2011
Publication date: Jun 18, 2019
Grant date: Jun 18, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Inputs provided into user interface elements of an application are observed. Records are made of the inputs and the state(s) the application was in while the inputs were provided. For each state, a corresponding language model is trained based on the input(s) provided to the application while the application was in that state. When the application is next observed to be in a previously-observed state, a language model associated with the application's current state is applied to recognize speech input provided by a user and thereby to generate speech recognition output that is provided to the application. An application's state at a particular time may include the user interface element(s) that are displayed and/or in focus at that time, and is determined by an operating system hooking component embedded in the automatic speech recognition system.
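The flow the abstract describes (record inputs per application state, train one model per state, then apply the model that matches the current state) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class `StateLanguageModels`, its methods, the state tuples, and the use of plain unigram counts are all assumptions made for the example, and the operating system hooking component that would supply the state is not modeled.

```python
from collections import Counter, defaultdict


class StateLanguageModels:
    """Hypothetical store of one unigram model per application state."""

    def __init__(self):
        # Maps a state key (any hashable value) to word counts seen in that state.
        self._counts = defaultdict(Counter)

    def observe(self, state, text):
        """Record an input that was provided while the application was in `state`."""
        self._counts[state].update(text.lower().split())

    def has_model(self, state):
        """True if at least one input has been observed in `state`."""
        return state in self._counts

    def probability(self, state, word):
        """Relative frequency of `word` among the words observed in `state`."""
        counts = self._counts.get(state)
        if not counts:
            return 0.0
        return counts[word.lower()] / sum(counts.values())

    def pick(self, state, candidates):
        """Choose the candidate word that is most probable in the current state.

        Falls back to the first candidate when the state was never observed.
        """
        if not self.has_model(state):
            return candidates[0]
        return max(candidates, key=lambda w: self.probability(state, w))


# Assumed state keys: (application name, focused user interface element).
models = StateLanguageModels()
models.observe(("records_app", "dosage_field"), "two milligrams")
models.observe(("records_app", "dosage_field"), "two tablets")
models.observe(("records_app", "name_field"), "Tu Nguyen")

# The same ambiguous audio resolves differently depending on the state.
in_dosage = models.pick(("records_app", "dosage_field"), ["to", "two", "tu"])
in_name = models.pick(("records_app", "name_field"), ["to", "two", "tu"])
unseen = models.pick(("records_app", "notes_field"), ["to", "two", "tu"])
```

Keying the models by a hashable state value keeps lookup trivial. A real system would have to decide what counts as the "same" state, which the dependent claims address separately.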

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method performed by at least one computer processor executing computer program instructions tangibly stored on at least one non-transitory computer-readable medium, the method comprising using the at least one computer processor to perform operations of: receiving, by an automatic speech recognition system executed by the at least one computer processor, a first plurality of inputs into an application while the application is in a first state; analyzing, by the automatic speech recognition system, a frequency with which one or more actions occur on the application, wherein analyzing the frequency with which the one or more actions occur on the application comprises: identifying, in the first plurality of inputs, a plurality of input values, and analyzing a frequency with which each of the identified plurality of input values occurs in the first plurality of inputs; training, by the automatic speech recognition system, a first language model based on the first plurality of inputs and the analyzed frequency with which the one or more actions occur on the application; determining, by the automatic speech recognition system, that the application is in the first state; and applying, by the automatic speech recognition system, the trained first language model to a first speech input in response to determining that the application is in the first state; wherein the identification is performed by an operating system hooking component included in the speech recognition system by intercepting messages between a user interface and the computer processor's operating system.

2. The method of claim 1, wherein analyzing the frequency with which the one or more actions occur on the application is independent of an input used to initiate the one or more actions.

3. The method of claim 1, wherein training the first language model further comprises associating a probability with a word in the first language model based on preceding words of the word.

4. The method of claim 1, wherein applying further comprises applying the trained first language model to the first speech input if a number of the first plurality of inputs associated with the trained first language model exceeds a predefined threshold.

5. The method of claim 1, wherein the one or more actions are opening or closing a file, executing a command, and modifying a record.

6. The method of claim 1, wherein training the first language model further comprises: receiving a second plurality of inputs into a second copy of the application, while the second copy of the application is in the first state and executing on a different computing device than a computing device on which the first plurality of inputs is received; and modifying the first language model based on the received second plurality of inputs.

7. The method of claim 1, wherein training the first language model further comprises: identifying a pattern of use of a user interface associated with the first plurality of inputs; and modifying the first language model based on the identified pattern of use.

8. The method of claim 1, wherein training the first language model further comprises associating a probability of occurrence of a word with the word in the first language model.

9. The method of claim 1, wherein determining that the application is in the first state further comprises: analyzing application data to determine that the application is in the first state; comparing the determined first state of the application to a state associated with the trained first language model; and determining that the determined first state of the application and the state associated with the trained first language model are substantially same states.

10. The method of claim 1, wherein determining that the application is in the first state further comprises: comparing application data of the application to application data associated with the trained first language model; and determining that the application data of the application and the application data associated with the trained first language model are substantially same data.

11. The method of claim 1, wherein applying further comprises applying the trained first language model to the first speech input after achieving a degree of confidence in a level of accuracy of the trained first language model.

12. An automated speech recognition system comprising: means for receiving a first plurality of inputs into an application while the application is in a first state; means for analyzing a frequency with which one or more actions occur on the application, wherein the means for analyzing the frequency with which the one or more actions occur on the application identifies, in the first plurality of inputs, a plurality of input values, and further analyzes a frequency with which each of the identified plurality of input values occurs in the first plurality of inputs; means for training a first language model based on the first plurality of inputs and the analyzed frequency with which the one or more actions occur on the application; means for determining that the application is in the first state; and means for applying the trained first language model to a first speech input in response to determining that the application is in the first state; wherein the identification is performed by an operating system hooking component included in the speech recognition system by intercepting messages between a user interface and the system's operating system.

13. The automated speech recognition system of claim 12, further comprising a means for receiving the first plurality of inputs from at least one of a text-based input device, a pointing device, and a speech input device.

14. The automated speech recognition system of claim 12, further comprising means for providing, to the application, a result of applying the trained first language model to the first speech input.

15. The automated speech recognition system of claim 12, further comprising means for modifying a resource accessed by the trained first language model, based on the first plurality of inputs.

16. A non-transitory computer readable medium storing computer program instructions which, when executed by at least one computer processor, cause the at least one computer processor to: receive, by an automatic speech recognition system executed by the at least one computer processor, a first plurality of inputs into an application while the application is in a first state; analyze, by the automatic speech recognition system, a frequency with which one or more actions occur on the application, wherein the computer program instructions, when executed by the at least one computer processor, further cause the at least one computer processor to: identify, in the first plurality of inputs, a plurality of input values, and analyze a frequency with which each of the identified plurality of input values occurs in the first plurality of inputs; train, by the automatic speech recognition system, a first language model based on the first plurality of inputs and the analyzed frequency with which the one or more actions occur on the application; determine, by the automatic speech recognition system, that the application is in the first state; and apply, by the automatic speech recognition system, the trained first language model to a first speech input in response to determining that the application is in the first state; wherein the identification is performed by an operating system hooking component included in the speech recognition
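Three ideas in the claims lend themselves to a small worked example: counting how often each input value occurs (claim 1), conditioning a word's probability on the preceding word (claim 3), and applying a model only once enough inputs have been collected (claim 4). The sketch below is one possible reading under stated assumptions. The function names, the `<s>` start marker, the bigram simplification of "preceding words", and the threshold value `MIN_INPUTS = 3` are all hypothetical and do not come from the patent.

```python
from collections import Counter

MIN_INPUTS = 3  # hypothetical value for the "predefined threshold" of claim 4


def value_frequencies(inputs):
    """Relative frequency of each distinct input value in the recorded inputs."""
    counts = Counter(inputs)
    total = len(inputs)
    return {value: count / total for value, count in counts.items()}


def bigram_probabilities(inputs):
    """Estimate P(word | previous word) from the recorded inputs.

    A bigram model is an assumed simplification; "<s>" marks the start of an input.
    """
    pair_counts = Counter()
    prefix_counts = Counter()
    for text in inputs:
        words = ["<s>"] + text.split()
        for prev, word in zip(words, words[1:]):
            pair_counts[(prev, word)] += 1
            prefix_counts[prev] += 1
    return {
        pair: count / prefix_counts[pair[0]]
        for pair, count in pair_counts.items()
    }


def should_apply(inputs, threshold=MIN_INPUTS):
    """Apply the model only if the number of training inputs exceeds the threshold."""
    return len(inputs) > threshold


# Inputs assumed to have been recorded while the application was in one state.
recorded = [
    "no known allergies",
    "no known allergies",
    "penicillin allergy",
    "no allergies",
]

frequencies = value_frequencies(recorded)
bigrams = bigram_probabilities(recorded)
ready = should_apply(recorded)
```

With these four inputs, "no known allergies" accounts for half of the recorded values, and "no" starts three of the four inputs, so a recognizer biased by this model would favor those readings in that state.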

Assignees

Inventors

Classifications

  • Converting codes to words; Guess-ahead of partial word inputs

  • using context dependencies, e.g. language models

  • Multi-language systems; Localisation; Internationalisation

  • updating or merging of old and new templates; Mean values; Weighting

  • Execution arrangements for user interfaces

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10325589B2 cover?
Inputs provided into user interface elements of an application are observed. Records are made of the inputs and the state(s) the application was in while the inputs were provided. For each state, a corresponding language model is trained based on the input(s) provided to the application while the application was in that state. When the application is next observed to be in a previously-observed…
Who is the assignee on this patent?
Mmodal Ip Llc
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 18, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).