Techniques for updating an automatic speech recognition system using finite-state transducers
US-2017125012-A1 · May 4, 2017 · US
US9934777B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9934777-B1 |
| Application number | US-201615248211-A |
| Country | US |
| Kind code | B1 |
| Filing date | Aug 26, 2016 |
| Priority date | Jul 1, 2016 |
| Publication date | Apr 3, 2018 |
| Grant date | Apr 3, 2018 |
Official abstract text for this publication.
User-specific language models (LMs) that include internal word indexes to a word table specific to the user-specific LM rather than a word table specific to a system-wide LM. When the system-wide LM is updated, the word table of the user-specific LM may be updated to translate the user-specific indices to system-wide indices. This prevents having to update the internal indices of the user-specific LM every time the system-wide LM is updated.
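The indirection described in the abstract can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the `UserLM` class, the `build_user_lm` and `refresh_word_table` helpers, the arc tuples standing in for an FST, and the sample vocabularies are all assumptions made for the example. The user-specific model stores only local indices, and a per-user word table translates them to whatever indices the current system-wide table uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class UserLM:
    """Hypothetical user-specific LM: arcs carry local word indices only."""
    words: List[str] = field(default_factory=list)                 # local index -> word
    system_ids: List[Optional[int]] = field(default_factory=list)  # local index -> system index
    arcs: List[Tuple[int, int, int]] = field(default_factory=list)  # (src, dst, local index)


def refresh_word_table(lm: UserLM, system_table: Dict[str, int]) -> None:
    """Recompute the local-to-system translation; the arcs are not touched."""
    lm.system_ids = [system_table.get(word) for word in lm.words]


def build_user_lm(phrases: List[str], system_table: Dict[str, int]) -> UserLM:
    """Build one linear path per phrase, labelling arcs with local indices."""
    lm = UserLM()
    local_index: Dict[str, int] = {}
    next_state = 1  # state 0 is the shared start state
    for phrase in phrases:
        src = 0
        for word in phrase.lower().split():
            if word not in local_index:
                local_index[word] = len(lm.words)
                lm.words.append(word)
            lm.arcs.append((src, next_state, local_index[word]))
            src = next_state
            next_state += 1
    refresh_word_table(lm, system_table)
    return lm


# Illustrative system-wide word tables (assumed data).
system_v1 = {"play": 0, "abbey": 1, "road": 2}
lm = build_user_lm(["Abbey Road"], system_v1)
arcs_before = list(lm.arcs)

# The system-wide LM is updated and its indices shift.
system_v2 = {"play": 0, "the": 1, "abbey": 2, "road": 3}
refresh_word_table(lm, system_v2)
```

After the update only `lm.system_ids` changes; the arcs of the user-specific model still hold the same local indices, which is the saving the abstract describes.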
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for creating and using a user-specific language model, the method comprising, during a training period: identifying a first finite state transducer (FST) for automatic speech recognition (ASR), wherein the first FST corresponds to an ASR language model and is configured for use in processing audio data from a plurality of user devices; identifying a first table corresponding to the first FST, wherein the first table includes a first plurality of indexed entries, a first entry of the first plurality of indexed entries corresponding to a first word represented in the ASR language model; identifying music collection data associated with a first user profile, the music collection data including a plurality of word strings associated with a music collection corresponding to the first user profile, the plurality of word strings selected from among artist names, song titles and album titles; generating a second table corresponding to words in the plurality of word strings, wherein the second table includes: a first entry including a first word in the plurality of word strings and a first index value corresponding to a third entry in the first table corresponding to the first word, and a second entry including a second word in the plurality of word strings and a second index value corresponding to a fourth entry in the first table corresponding to the second word; generating a second FST corresponding to the music collection data, wherein the second FST includes a third index value corresponding to the first entry and a fourth index value corresponding to the second entry; and storing the second table and the second FST.

2. The computer-implemented method of claim 1, further comprising, during a runtime period: receiving audio data associated with the first user profile; generating a modified second FST by substituting the first index value for the third index value and the second index value for the fourth index value; writing the first FST and the modified second FST into memory; performing ASR using the first FST and the modified second FST; and determining ASR output including the first word and the second word.

3. The computer-implemented method of claim 1, further comprising, during the training period: determining that a third word in the music collection data is not represented in the first table; performing grapheme-to-phoneme processing to determine a third FST representing an estimated pronunciation of the third word; and storing an association between the third FST and the second FST, wherein creating the second table further comprises creating a third entry including a reference to the third FST.

4. The computer-implemented method of claim 3, further comprising, during a runtime period: receiving audio data associated with the first user profile; identifying a third table associated with an updated ASR language model FST to be used during the runtime period; identifying a fifth entry in the third table corresponding to the third word; generating a modified second table including an updated third entry including a fifth index value to the fifth entry; generating a modified second FST by substituting the first index value for the third index value, the second index value for the fourth index value, and the fifth index value for the reference; and performing ASR using the updated ASR language model FST and the modified second FST.

5. A computer-implemented method, comprising: identifying a first language model configured for speech processing corresponding to multiple devices; identifying a first table representing words corresponding to the first language model; identifying a plurality of word strings associated with a first user profile; creating a second language model configured for speech processing corresponding to the plurality of word strings, the second language model including a plurality of references to a second table; generating a second table representing words of the plurality of word strings, the second table including at least: a first entry including a first word in the plurality of word strings and a first index value corresponding to a third entry in the first table, the third entry corresponding to the first word, and a second entry including a second word in the plurality of word strings and a second index value corresponding to a fourth entry in the first table, the fourth entry corresponding to the second word; generating a second language model configured for speech processing corresponding to the first user profile, the second language model including a third index value corresponding to the first entry and a fourth index value corresponding to the second entry; and storing the second table and the second language model as associated with the first user profile.

6. The computer-implemented method of claim 5, further comprising: determining that at least a portion of the first table has changed resulting in an updated first table including a fifth entry corresponding to the first word and a sixth entry corresponding to the second word; generating an updated second table, the updated second table including at least: an updated first entry including the first word and a fifth index value corresponding to the fifth entry, and an updated second entry including the second word and a second index value corresponding to the sixth entry, wherein, after creating the updated second table, the third index value points to the updated first entry and the fourth index value points to the updated second entry.

7. The computer-implemented method of claim 5, further comprising: identifying a second plurality of word strings associated with a second user profile; generating a third language model configured for speech processing corresponding to the second plurality of word strings, the third language model including a second plurality of references to a third table; generating a third table representing words of the second plurality of word strings, the third table including at least: a fifth entry including the first word and the first index value, and a sixth entry including a third word in the second plurality of word strings and a fifth index value corresponding to a fifth entry in the first table corresponding to the third word; generating a third language model configured for speech processing corresponding to the second user profile, the third language model including a sixth index value corresponding to the fifth entry and a seventh index value corresponding to the sixth entry; and storing the third table and the third language model as associated with the second user profile.

8. The computer-implemented method of claim 5, further comprising: receiving audio data associated with the first user profile; generating a modified second language model by substituting the first index value for the third index value and the second index value for the fourth index value; writing the first language model and the modified second language model into memory; performing speech processing using the first language model and the modified second language model; and determining speech processing output including the first word and the second word.

9. The computer-implemented method of claim 5, further comprising: determining that a third word in the plurality of word strings is not represented in the first table; and performing grapheme-to-phoneme processing to determine pronunciation data representing an estimated pronunciation of the third word, wherein creating the second table further comprises creating a third entry including a reference to the pronunciation data.

10. The c
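
The runtime substitution in claims 2 and 8, together with the out-of-vocabulary handling in claims 3, 4 and 9, might look roughly like the sketch below. Everything named here is hypothetical: the dictionary-based table entries, the `pron_ref` strings standing in for a grapheme-to-phoneme result, the arc tuples standing in for an FST, and the sample words and index values. It only illustrates the substitution step under those assumptions.

```python
from typing import Dict, List, Tuple, Union

Label = Union[int, str]
Arc = Tuple[int, int, Label]


def substitute_indices(arcs: List[Tuple[int, int, int]],
                       user_table: List[dict]) -> List[Arc]:
    """Return a modified copy of the arcs labelled with system-wide indices.

    Entries with no system-wide index keep their pronunciation reference
    (a placeholder for the G2P-derived data mentioned in the claims).
    """
    modified: List[Arc] = []
    for src, dst, local_idx in arcs:
        entry = user_table[local_idx]
        if entry["system_index"] is not None:
            label: Label = entry["system_index"]
        else:
            label = entry["pron_ref"]
        modified.append((src, dst, label))
    return modified


def adopt_updated_table(user_table: List[dict],
                        system_table: Dict[str, int]) -> None:
    """Point each entry at the updated system-wide table where the word exists."""
    for entry in user_table:
        idx = system_table.get(entry["word"])
        if idx is not None:
            entry["system_index"] = idx


# Assumed user table: one known word, one word the system table lacks.
user_table = [
    {"word": "daft", "system_index": 7, "pron_ref": None},
    {"word": "zedd", "system_index": None, "pron_ref": "g2p:zedd"},
]
user_arcs = [(0, 1, 0), (0, 2, 1)]  # labelled with local indices

first_pass = substitute_indices(user_arcs, user_table)

# A later system-wide table that now contains the previously unknown word.
adopt_updated_table(user_table, {"daft": 8, "zedd": 42})
second_pass = substitute_indices(user_arcs, user_table)
```

In this sketch the stored arcs are never rewritten; each runtime pass produces a fresh modified copy, and the pronunciation reference is replaced by a real index once the updated table includes the word.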
Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title
Speech classification or search · CPC title
Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Phonemes, fenemes or fenones being the recognition units · CPC title