Multi-feature balancing for natural language processors
US-2024419910-A1 · Dec 19, 2024 · US
US9594744B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9594744-B2 |
| Application number | US-201313829482-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 14, 2013 |
| Priority date | Nov 28, 2012 |
| Publication date | Mar 14, 2017 |
| Grant date | Mar 14, 2017 |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for transcribing utterances into written text are disclosed. The methods, systems, and apparatus include actions of obtaining a lexicon model that maps phones to spoken text and obtaining a language model that assigns probabilities to written text. The actions further include generating a transducer that maps the written text to the spoken text, the transducer mapping multiple items of the written text to an item of the spoken text. Additionally, the actions include constructing a decoding network for transcribing utterances into written text, by composing the lexicon model, the inverse of the transducer, and the language model.
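The role of the inverted transducer can be illustrated with a minimal Python sketch. It is not the patented implementation: plain dictionaries stand in for finite-state transducers, the lexicon step (phones to spoken words) is assumed to have already run, and the expansion pairs, the unigram probabilities, and the names `EXPANSIONS`, `WRITTEN_LM`, `invert`, and `transcribe` are all hypothetical.

```python
from itertools import product

# Hypothetical written -> spoken expansions. Several written items
# share one spoken form, as the abstract describes.
EXPANSIONS = [
    ("10", "ten"),
    ("ten", "ten"),
    ("St.", "street"),
    ("street", "street"),
    ("St.", "saint"),
    ("saint", "saint"),
]

# Toy unigram "language model" over written tokens (made-up probabilities).
WRITTEN_LM = {
    "10": 0.4, "ten": 0.1, "St.": 0.3,
    "street": 0.1, "saint": 0.05, "main": 0.05,
}


def invert(pairs):
    """Swap the two sides of the relation: spoken -> set of written items."""
    inverse = {}
    for written, spoken in pairs:
        inverse.setdefault(spoken, set()).add(written)
    return inverse


def transcribe(spoken_words, inverse, lm):
    """Pick the written sequence the language model scores highest.

    Each spoken word is replaced by its written candidates from the
    inverted relation; words without an entry pass through unchanged.
    """
    options = [sorted(inverse.get(word, {word})) for word in spoken_words]

    def score(sequence):
        probability = 1.0
        for token in sequence:
            probability *= lm.get(token, 1e-6)  # small floor for unseen tokens
        return probability

    return max(product(*options), key=score)


inverse = invert(EXPANSIONS)
print(transcribe(["ten", "main", "street"], inverse, WRITTEN_LM))
# -> ('10', 'main', 'St.')
```

Because the language model scores written text, the written form is chosen during decoding rather than by a separate post-processing step. A real system would express the same idea as a composition of weighted transducers instead of enumerating candidates.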
Opening claim text (preview).
The invention claimed is:
1. A computer-implemented method comprising: obtaining a lexicon model that maps phones to spoken text, wherein spoken text includes only words that are spelled out; obtaining a language model that assigns probabilities to written text, wherein written text includes (i) words that are spelled out and (ii) sequences of characters that include abbreviations that are not spelled out words, acronyms that are not spelled out words, and numeric symbols; obtaining grammar rules for expanding written text into spoken text; generating, using the grammar rules, expansions of written text that includes sequences of characters that include abbreviations that are not spelled out words, acronyms that are not spelled out words, and numeric symbols into spoken text that includes only words that are spelled out, the expansions including mappings of multiple different items of written text to a particular item of the spoken text; generating a transducer that includes the expansions, where the transducer maps the written text to the spoken text, the transducer mapping the multiple different items of written text to the particular item of the spoken text; constructing a decoding network for transcribing utterances into written text, by composing the lexicon model, the inverse of the transducer, and the language model; and generating, by an automated speech recognizer, a transcription of a subsequently received utterance into written text using the decoding network.
2. The method of claim 1, further comprising: obtaining an utterance of a user; and transcribing the utterance to an item of the written text using the decoding network.
3. The method of claim 1, wherein the spoken text comprises a textual representation of a spoken language.
4. The method of claim 1, wherein the written text comprises a textual representation of a written language.
5. The method of claim 1, wherein obtaining the grammar rules comprises: obtaining the grammar rules from a text-to-speech system.
6. The method of claim 1, wherein obtaining the language model comprises obtaining spoken text of the language model; generating written text from the spoken text of the language model using the inverse of the transducer; and adding the generated written text to the language model.
7. The method of claim 1, where obtaining the lexicon model comprises obtaining a context dependency model mapping context dependent phones to context independent phones; obtaining a pronunciation lexicon mapping sequences of the context independent phones to the spoken text; and constructing the lexicon model by composing the context dependency model and the pronunciation lexicon.
8. The method of claim 1, wherein words that are spelled out comprise lexical items.
9. The method of claim 1, wherein the sequences of characters that represent words, but that are not, spelled out words comprise non-lexical items.
10. The method of claim 1, wherein the transducer maps the multiple different items of written text that includes (i) words that are spelled out and (ii) sequences of characters that include abbreviations that are not spelled out words, acronyms that are not spelled out words, and numeric symbols into the particular item of the spoken text that includes only words that are spelled out.
11. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: obtaining a lexicon model that maps phones to spoken text, wherein spoken text includes only words that are spelled out; obtaining a language model that assigns probabilities to written text, wherein written text includes (i) words that are spelled out and (ii) sequences of characters that include abbreviations that are not spelled out words, acronyms that are not spelled out words, and numeric symbols; obtaining grammar rules for expanding written text into spoken text; generating, using the grammar rules, expansions of written text that includes sequences of characters that include abbreviations that are not spelled out words, acronyms that are not spelled out words, and numeric symbols into spoken text that includes only words that are spelled out, the expansions including mappings of multiple different items of written text to a particular item of the spoken text; generating a transducer that includes the expansions, where the transducer maps the written text to the spoken text, the transducer mapping the multiple different items of written text to the particular item of the spoken text; constructing a decoding network for transcribing utterances into written text, by composing the lexicon model, the inverse of the transducer, and the language model; and generating, by an automated speech recognizer, a transcription of a subsequently received utterance into written text using the decoding network.
12. The system of claim 11, wherein the operations further comprise: obtaining an utterance of a user; and transcribing the utterance to an item of the written text using the decoding network.
13. The system of claim 11, wherein the spoken text comprises a textual representation of a spoken language.
14. The system of claim 11, wherein the written text comprises a textual representation of a written language.
15. The system of claim 11, wherein obtaining the grammar rules comprises: obtaining the grammar rules from a text-to-speech system.
16. The system of claim 11, wherein obtaining the language model comprises obtaining spoken text of the language model; generating written text from the spoken text of the language model using the inverse of the transducer; and adding the generated written text to the language model.
17. The system of claim 11, where obtaining the lexicon model comprises obtaining a context dependency model mapping context dependent phones to context independent phones; obtaining a pronunciation lexicon mapping sequences of the context independent phones to the spoken text; and constructing the lexicon model by composing the context dependency model and the pronunciation lexicon.
18. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising: obtaining a lexicon model that maps phones to spoken text, wherein spoken text includes only words that are spelled out; obtaining a language model that assigns probabilities to written text, wherein written text includes (i) words that are spelled out and (ii) sequences of characters that include abbreviations that are not spelled out words, acronyms that are not spelled out words, and numeric symbols; obtaining grammar rules for expanding written text into spoken text; generating, using the grammar rules, expansions of written text that includes sequences of characters that include abbreviations that are not spelled out words, acronyms that are not spelled out words, and numeric symbols into spoken text that includes only words that are spelled out, the expansions including mappings of multiple different items of written text to a particular item of the spoken text; generating a transducer that includes the expansions, where the transducer maps the written text to the spoken text, the transducer mapping the multiple different items of written text to the particular item of the spoken text; constructing a decoding n
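The lexicon construction recited in claims 7 and 17 (composing a context dependency model with a pronunciation lexicon) can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the `left-center+right` triphone label format, the phone symbols, the one-entry `PRONUNCIATION_LEXICON`, and the function names are hypothetical, and ordinary functions stand in for the transducers a real recognizer would compose.

```python
# Hypothetical pronunciation lexicon: context-independent phone
# sequences -> spoken (spelled-out) words.
PRONUNCIATION_LEXICON = {
    ("t", "eh", "n"): "ten",
    ("s", "t", "r", "iy", "t"): "street",
}


def strip_context(triphone):
    """Context dependency step: 'left-center+right' -> 'center'.

    The label format is an assumption chosen for this sketch.
    """
    return triphone.split("-")[1].split("+")[0]


def lexicon_model(context_dependent_phones):
    """Composition of the two steps: context-dependent phones -> spoken word.

    Returns None when the resulting phone sequence is not in the lexicon.
    """
    independent = tuple(strip_context(p) for p in context_dependent_phones)
    return PRONUNCIATION_LEXICON.get(independent)


print(lexicon_model(["#-t+eh", "t-eh+n", "eh-n+#"]))
# -> ten
```

Chaining the two mappings mirrors the composition in the claims: the output side of the first step (context-independent phones) is the input side of the second.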
Probabilistic grammars, e.g. word n-grams · CPC title
Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title
Phrasal analysis, e.g. finite state techniques or chunking · CPC title
Recognition networks (G10L15/142, G10L15/16 take precedence) · CPC title