US9076445B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9076445-B1 |
| Application number | US-201213705228-A |
| Country | US |
| Kind code | B1 |
| Filing date | Dec 5, 2012 |
| Priority date | Dec 30, 2010 |
| Publication date | Jul 7, 2015 |
| Grant date | Jul 7, 2015 |
Official abstract text for this publication.
Methods, systems, and apparatuses, including computer programs encoded on a computer storage medium, for adjusting language models. In one aspect, a method includes accessing audio data. Information that indicates a first context is accessed, the first context being associated with the audio data. At least one term is accessed. Information that indicates a second context is accessed, the second context being associated with the term. A similarity score is determined that indicates a degree of similarity between the second context and the first context. A language model is adjusted based on the accessed term and the determined similarity score to generate an adjusted language model. Speech recognition is performed on the audio data using the adjusted language model to select one or more candidate transcriptions for a portion of the audio data.
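The adjustment step in the abstract can be illustrated with a toy model. The following is a minimal sketch, not the patented implementation: it assumes a unigram language model stored as a dictionary, a similarity score already scaled to [0, 1], and a linear boost rule. The names `adjust_language_model`, `pick_transcription`, and `max_boost` are hypothetical and do not come from the publication.

```python
import math


def adjust_language_model(unigram_probs, context_terms, similarity, max_boost=4.0):
    """Return a renormalized copy of unigram_probs with context_terms boosted.

    Assumption: the boost factor grows linearly with the similarity score,
    from 1.0 (no change) up to max_boost, a hypothetical tuning constant.
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be in [0, 1]")
    factor = 1.0 + (max_boost - 1.0) * similarity
    boosted = {
        word: prob * factor if word in context_terms else prob
        for word, prob in unigram_probs.items()
    }
    total = sum(boosted.values())
    return {word: prob / total for word, prob in boosted.items()}


def pick_transcription(candidates, model):
    """Pick the candidate text with the best acoustic + language model log score.

    candidates is a list of (text, acoustic_log_prob) pairs.
    Words missing from the model get a small floor probability.
    """
    def score(candidate):
        text, acoustic_log_prob = candidate
        lm_log_prob = sum(math.log(model.get(w, 1e-9)) for w in text.split())
        return acoustic_log_prob + lm_log_prob

    return max(candidates, key=score)[0]


# Two acoustically identical candidates; "flower" was previously typed
# in a context that may or may not resemble the current one.
base_model = {"buy": 0.5, "flour": 0.3, "flower": 0.2}
candidates = [("buy flour", -1.0), ("buy flower", -1.0)]

dissimilar = adjust_language_model(base_model, {"flower"}, similarity=0.0)
similar = adjust_language_model(base_model, {"flower"}, similarity=1.0)

print(pick_transcription(candidates, dissimilar))  # buy flour
print(pick_transcription(candidates, similar))     # buy flower
```

With a similarity of 0 the model is unchanged and the more common word wins. With a similarity of 1 the previously used term is boosted enough to change the selected transcription.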
Opening claim text (preview).
What is claimed is:
1. A method comprising: obtaining audio data; accessing first context information associated with the audio data, wherein the first context information indicates (i) a first geographical location, and (ii) a first time; accessing second context information associated with one or more previously typed or previously transcribed terms, wherein the second context information indicates (i) a second geographical location and (ii) a second time; determining a similarity score for the first context information and the second context information based on (i) a degree of a similarity of the second geographical location to the first geographical location and (ii) a degree of a similarity of the second time to the first time; adjusting a language model based on the similarity score to adjust a likelihood that the language model indicates the one or more previously typed or previously transcribed terms as a candidate transcription of the audio data; determining a transcription of the audio data using the adjusted language model; and outputting the transcription that was determined using the adjusted language model.
2. The method of claim 1, wherein obtaining the audio data comprises receiving the audio data over a network from client device; and wherein outputting the transcription determined using the adjusted language model comprises providing the transcription to the client device over the network.
3. The method of claim 1, wherein accessing the first context information comprises accessing information that indicates a geographical location where the audio data was recorded and a time when the audio data was recorded.
4. The method of claim 1, wherein accessing the second context information comprises accessing second context information that is associated with one or more terms previously transcribed for other audio, the second context information indicating (i) a particular geographical location where the other audio was input, and (ii) a time when the other audio was input at the particular geographical location.
5. The method of claim 1, wherein obtaining the audio data comprises obtaining audio data for an utterance of a user; wherein accessing the first context information comprises accessing information that indicates a geographical location of a device when the audio data was recorded by the device and a time when the audio data was recorded by the device; and wherein accessing the second context information comprises accessing second context information associated with one or more previously transcribed terms that were previously transcribed from previously received audio data for a previous utterance of the user, the second context information indicating a geographical location of the device when the previous utterance of the user was input to the device and a time when the previous utterance of the user was input to the device.
6. The method of claim 1, wherein the first time indicates a first day of week when the audio data was recorded and the second time indicates a second day of week when the one or more previously typed or previously transcribed terms were input; and wherein determining the similarity score comprises determining the similarity score based on a similarity of the second day of week to the first day of week.
7. The method of claim 1, wherein the first time indicates a first time of day when the audio data was recorded and the second time indicates a second time of day when the one or more previously typed or previously transcribed terms were input; and wherein determining the similarity score comprises determining the similarity score based on a similarity of the second time of day to the first time of day.
8. The method of claim 1, wherein determining the similarity score comprises determining the similarity score based on a distance between the second geographical location and the first geographical location.
9. The method of claim 1, wherein accessing the first context information comprises accessing information that indicates a geographical location indicated by a Global Positioning System (GPS) receiver of a device that receives the audio data.
10. The method of claim 1, wherein adjusting the language model based on the similarity score comprises changing one or more weighting values in the language model that correspond to the one or more previously typed or previously transcribed terms.
11. The method of claim 10, wherein changing the one or more weighting values comprises changing the one or more weighting values such that a magnitude of the change in the one or more weighting values is based on the similarity score.
12. The method of claim 1, wherein adjusting the language model based on the similarity score comprises increasing the likelihood by an amount that is based on the similarity score.
13. A system comprising: one or more processors; and a non-transitory computer-readable medium coupled to the one or more processors having instructions stored thereon which, when executed by the one or more processors, cause the system to perform operations comprising: obtaining audio data; accessing first context information associated with the audio data, wherein the first context information indicates (i) a first geographical location, and (ii) a first time; accessing second context information associated with one or more previously typed or previously transcribed terms, wherein the second context information indicates (i) a second geographical location and (ii) a second time; determining a similarity score for the first context information and the second context information based on (i) a degree of a similarity of the second geographical location to the first geographical location and (ii) a degree of a similarity of the second time to the first time; adjusting a language model based on the similarity score to adjust a likelihood that the language model indicates the one or more previously typed or previously transcribed terms as a candidate transcription of the audio data; determining a transcription of the audio data using the adjusted language model; and outputting the transcription that was determined using the adjusted language model.
14. The system of claim 13, wherein the first time indicates a first day of week when the audio data was recorded and the second time indicates a second day of week when the one or more previously typed or previously transcribed terms were input; and wherein determining the similarity score comprises determining the similarity score based on a similarity of the second day of week to the first day of week.
15. The system of claim 13, wherein the first time indicates a first time of day when the audio data was recorded and the second time indicates a second time of day when the one or more previously typed or previously transcribed terms were input; and wherein determining the similarity score comprises determining the similarity score based on a similarity of the second time of day to the first time of day.
16. The system of claim 13, wherein determining the similarity score comprises determining the similarity score based on a distance between the second geographical location and the first geographical location.
17. A non-transitory computer storage medium storing a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: obtaining audio data; accessing first context information associated with the audio data, wherein the first context information that
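
Claims 6 to 8 base the similarity score on day of week, time of day, and geographical distance. One way these components might be computed and combined is sketched below. The claims do not specify any formulas, so everything here is an assumption made for illustration: the exponential distance decay, the `scale_km` constant, the wrap-around handling of time of day, the exact-match rule for day of week, and the way the parts are multiplied together. All function names are hypothetical.

```python
import math
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(min(1.0, a)))


def location_similarity(loc1, loc2, scale_km=5.0):
    """1.0 at zero distance, decaying exponentially (assumed decay, assumed scale)."""
    return math.exp(-haversine_km(*loc1, *loc2) / scale_km)


def time_of_day_similarity(t1, t2):
    """1.0 for the same minute of the day, 0.0 for times twelve hours apart."""
    minutes1 = t1.hour * 60 + t1.minute
    minutes2 = t2.hour * 60 + t2.minute
    diff = abs(minutes1 - minutes2)
    diff = min(diff, 1440 - diff)  # 23:50 and 00:10 are 20 minutes apart
    return 1.0 - diff / 720.0


def day_of_week_similarity(t1, t2):
    """1.0 when both timestamps fall on the same weekday, else 0.0 (assumed rule)."""
    return 1.0 if t1.weekday() == t2.weekday() else 0.0


def context_similarity(loc1, t1, loc2, t2):
    """Combine location and time similarity into one score in [0, 1].

    Assumption: the two time components are averaged, then multiplied by
    the location component, so a distant location suppresses the score.
    """
    time_sim = 0.5 * time_of_day_similarity(t1, t2) + 0.5 * day_of_week_similarity(t1, t2)
    return location_similarity(loc1, loc2) * time_sim


# Hypothetical contexts: (latitude, longitude) pairs and timestamps.
current_loc, current_time = (37.422, -122.084), datetime(2012, 12, 5, 9, 0)
earlier_loc, earlier_time = (37.423, -122.085), datetime(2012, 11, 28, 9, 30)

score = context_similarity(current_loc, current_time, earlier_loc, earlier_time)
print(f"similarity score: {score:.3f}")
```

A score produced this way could then set the size of the weighting change described in claims 10 to 12, so that terms used in closer contexts receive larger adjustments.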
using context dependencies, e.g. language models · CPC title
using natural language modelling · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title
for comparison or discrimination · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title