Speech transcription including written text

US9594744B2 · US · B2

Patent metadata
Publication number: US-9594744-B2
Application number: US-201313829482-A
Country: US
Kind code: B2
Filing date: Mar 14, 2013
Priority date: Nov 28, 2012
Publication date: Mar 14, 2017
Grant date: Mar 14, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for transcribing utterances into written text are disclosed. The methods, systems, and apparatus include actions of obtaining a lexicon model that maps phones to spoken text and obtaining a language model that assigns probabilities to written text. Further includes generating a transducer that maps the written text to the spoken text, the transducer mapping multiple items of the written text to an item of the spoken text. Additionally, the actions include constructing a decoding network for transcribing utterances into written text, by composing the lexicon model, the inverse of the transducer, and the language model.
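The composition described in the abstract can be illustrated with a toy sketch. It assumes each model is a finite relation over whole tokens. A production decoder would instead use weighted finite-state transducers over symbol sequences.

- The lexicon L, the written-to-spoken transducer T and the language model G below are hypothetical examples.
- `invert`, `compose` and `transcribe` are illustrative helpers, not an API from the patent.
- T maps both "10" and "ten" to spoken "ten". Its inverse therefore offers both written forms, and G chooses between them.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Relation = Set[Tuple[str, str]]  # (input item, output item) pairs


def invert(rel: Relation) -> Relation:
    """Swap the input and output sides of a relation (the transducer inverse)."""
    return {(b, a) for (a, b) in rel}


def compose(r1: Relation, r2: Relation) -> Relation:
    """Relational composition: keep (a, c) whenever (a, b) is in r1 and (b, c) is in r2."""
    by_input = defaultdict(set)
    for b, c in r2:
        by_input[b].add(c)
    return {(a, c) for (a, b) in r1 for c in by_input[b]}


# Hypothetical toy models; each item is a whole token, not a symbol sequence.
lexicon: Relation = {                 # L: phones -> spoken text
    ("t eh n", "ten"),
    ("d aa k t er", "doctor"),
}
written_to_spoken: Relation = {       # T: written text -> spoken text (many-to-one)
    ("10", "ten"), ("ten", "ten"),
    ("Dr.", "doctor"), ("doctor", "doctor"),
}
language_model: Dict[str, float] = {  # G: probabilities over written text
    "10": 0.6, "ten": 0.4, "Dr.": 0.7, "doctor": 0.3,
}

# Decoding network: phones -> written text via L composed with inverse(T).
# G is applied as a score on the written side, which is what composing with
# a weighted acceptor amounts to in this single-token setting.
network = compose(lexicon, invert(written_to_spoken))


def transcribe(phones: str) -> str:
    """Return the most probable written item (ValueError if phones are unknown)."""
    candidates = [w for (p, w) in network if p == phones]
    return max(candidates, key=lambda w: language_model.get(w, 0.0))


print(transcribe("t eh n"))
print(transcribe("d aa k t er"))
```

In a full recognizer these compositions are carried out on weighted transducers. There, G's weights are combined with acoustic scores during the search instead of being looked up after the fact.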

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method comprising: obtaining a lexicon model that maps phones to spoken text, wherein spoken text includes only words that are spelled out; obtaining a language model that assigns probabilities to written text, wherein written text includes (i) words that are spelled out and (ii) sequences of characters that include abbreviations that are not spelled out words, acronyms that are not spelled out words, and numeric symbols; obtaining grammar rules for expanding written text into spoken text; generating, using the grammar rules, expansions of written text that includes sequences of characters that include abbreviations that are not spelled out words, acronyms that are not spelled out words, and numeric symbols into spoken text that includes only words that are spelled out, the expansions including mappings of multiple different items of written text to a particular item of the spoken text; generating a transducer that includes the expansions, where the transducer maps the written text to the spoken text, the transducer mapping the multiple different items of written text to the particular item of the spoken text; constructing a decoding network for transcribing utterances into written text, by composing the lexicon model, the inverse of the transducer, and the language model; and generating, by an automated speech recognizer, a transcription of a subsequently received utterance into written text using the decoding network.

2. The method of claim 1, further comprising: obtaining an utterance of a user; and transcribing the utterance to an item of the written text using the decoding network.

3. The method of claim 1, wherein the spoken text comprises a textual representation of a spoken language.

4. The method of claim 1, wherein the written text comprises a textual representation of a written language.

5. The method of claim 1, wherein obtaining the grammar rules comprises: obtaining the grammar rules from a text-to-speech system.

6. The method of claim 1, wherein obtaining the language model comprises obtaining spoken text of the language model; generating written text from the spoken text of the language model using the inverse of the transducer; and adding the generated written text to the language model.

7. The method of claim 1, where obtaining the lexicon model comprises obtaining a context dependency model mapping context dependent phones to context independent phones; obtaining a pronunciation lexicon mapping sequences of the context independent phones to the spoken text; and constructing the lexicon model by composing the context dependency model and the pronunciation lexicon.

8. The method of claim 1, wherein words that are spelled out comprise lexical items.

9. The method of claim 1, wherein the sequences of characters that represent words, but that are not, spelled out words comprise non-lexical items.

10. The method of claim 1, wherein the transducer maps the multiple different items of written text that includes (i) words that are spelled out and (ii) sequences of characters that include abbreviations that are not spelled out words, acronyms that are not spelled out words, and numeric symbols into the particular item of the spoken text that includes only words that are spelled out.

11. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: obtaining a lexicon model that maps phones to spoken text, wherein spoken text includes only words that are spelled out; obtaining a language model that assigns probabilities to written text, wherein written text includes (i) words that are spelled out and (ii) sequences of characters that include abbreviations that are not spelled out words, acronyms that are not spelled out words, and numeric symbols; obtaining grammar rules for expanding written text into spoken text; generating, using the grammar rules, expansions of written text that includes sequences of characters that include abbreviations that are not spelled out words, acronyms that are not spelled out words, and numeric symbols into spoken text that includes only words that are spelled out, the expansions including mappings of multiple different items of written text to a particular item of the spoken text; generating a transducer that includes the expansions, where the transducer maps the written text to the spoken text, the transducer mapping the multiple different items of written text to the particular item of the spoken text; constructing a decoding network for transcribing utterances into written text, by composing the lexicon model, the inverse of the transducer, and the language model; and generating, by an automated speech recognizer, a transcription of a subsequently received utterance into written text using the decoding network.

12. The system of claim 11, wherein the operations further comprise: obtaining an utterance of a user; and transcribing the utterance to an item of the written text using the decoding network.

13. The system of claim 11, wherein the spoken text comprises a textual representation of a spoken language.

14. The system of claim 11, wherein the written text comprises a textual representation of a written language.

15. The system of claim 11, wherein obtaining the grammar rules comprises: obtaining the grammar rules from a text-to-speech system.

16. The system of claim 11, wherein obtaining the language model comprises obtaining spoken text of the language model; generating written text from the spoken text of the language model using the inverse of the transducer; and adding the generated written text to the language model.

17. The system of claim 11, where obtaining the lexicon model comprises obtaining a context dependency model mapping context dependent phones to context independent phones; obtaining a pronunciation lexicon mapping sequences of the context independent phones to the spoken text; and constructing the lexicon model by composing the context dependency model and the pronunciation lexicon.

18. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising: obtaining a lexicon model that maps phones to spoken text, wherein spoken text includes only words that are spelled out; obtaining a language model that assigns probabilities to written text, wherein written text includes (i) words that are spelled out and (ii) sequences of characters that include abbreviations that are not spelled out words, acronyms that are not spelled out words, and numeric symbols; obtaining grammar rules for expanding written text into spoken text; generating, using the grammar rules, expansions of written text that includes sequences of characters that include abbreviations that are not spelled out words, acronyms that are not spelled out words, and numeric symbols into spoken text that includes only words that are spelled out, the expansions including mappings of multiple different items of written text to a particular item of the spoken text; generating a transducer that includes the expansions, where the transducer maps the written text to the spoken text, the transducer mapping the multiple different items of written text to the particular item of the spoken text; constructing a decoding n…
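Two steps from the claims can be sketched in the same token-level style: the grammar-rule expansion of claim 1 and the language-model augmentation of claim 6.

Several parts of the sketch are assumptions:

- The rule tables (`ABBREVIATIONS` and a number verbalizer limited to 0 to 99) are hypothetical stand-ins for the grammar rules a text-to-speech system might supply (claim 5).
- `expand`, `build_transducer` and `augment_language_model` are illustrative names.
- Each generated written variant inherits the count of its spoken entry. The claims do not say how the added written text is weighted.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical grammar rules: an abbreviation table plus a verbalizer for 0-99.
ABBREVIATIONS: Dict[str, str] = {"Dr.": "doctor", "St.": "street"}
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def verbalize_number(n: int) -> str:
    """Spell out an integer in [0, 99] as spoken words."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def expand(written: str) -> str:
    """Expand one written token into spoken text made of spelled-out words."""
    if written in ABBREVIATIONS:
        return ABBREVIATIONS[written]
    if written.isascii() and written.isdigit() and int(written) < 100:
        return verbalize_number(int(written))
    return written  # assumed to be a spelled-out word already


def build_transducer(written_vocab: List[str]) -> Set[Tuple[str, str]]:
    """T as (written, spoken) pairs; distinct written items may share a spoken item."""
    return {(w, expand(w)) for w in written_vocab}


def augment_language_model(counts: Counter,
                           transducer: Set[Tuple[str, str]]) -> Counter:
    """Add written variants of spoken LM entries, found via the inverse of T."""
    inverse: Dict[str, List[str]] = {}
    for written, spoken in transducer:
        inverse.setdefault(spoken, []).append(written)
    augmented = Counter(counts)
    for spoken, count in counts.items():
        for written in inverse.get(spoken, []):
            if written != spoken:             # the spoken form is already counted
                augmented[written] += count   # assumption: variant inherits the count
    return augmented


T = build_transducer(["10", "ten", "Dr.", "doctor", "St.", "street"])
lm = augment_language_model(Counter({"ten": 5, "doctor": 2}), T)
print(sorted(T))
print(lm)
```

Running the sketch shows two things:

- T pairs both "10" and "ten" with spoken "ten", which is the many-to-one mapping recited in claim 1.
- The augmented counts gain written entries such as "10" and "Dr." next to their spoken forms.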

Assignees

Google Inc

Inventors

Classifications

  • Probabilistic grammars, e.g. word n-grams · CPC title

  • Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams · CPC title

  • Speech to text systems (G10L15/08 takes precedence) · CPC title

  • G06F40/289 (primary)

    Phrasal analysis, e.g. finite state techniques or chunking · CPC title

  • G10L15/083 (primary)

    Recognition networks (G10L15/142, G10L15/16 take precedence) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9594744B2 cover?
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for transcribing utterances into written text are disclosed. The methods, systems, and apparatus include actions of obtaining a lexicon model that maps phones to spoken text and obtaining a language model that assigns probabilities to written text. Further includes generating a transducer that map…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/289. Mapped technology areas include Physics.
When was this patent published?
Published Mar 14, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).