Cross-language speech recognition and translation

US10229674B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10229674-B2
Application number  US-201514714046-A
Country             US
Kind code           B2
Filing date         May 15, 2015
Priority date       May 15, 2015
Publication date    Mar 12, 2019
Grant date          Mar 12, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Technologies are described herein for cross-language speech recognition and translation. An example method of speech recognition and translation includes receiving an input utterance in a first language, the input utterance having at least one name of a named entity included therein and being pronounced in a second language, utilizing a customized language model to process at least a portion of the input utterance, and identifying the at least one name of the named entity from the input utterance utilizing a phonetic representation of the at least one name of the named entity. The phonetic representation has a pronunciation of the at least one name in the second language.
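
As a rough illustration of the idea in the abstract, the sketch below keeps two phoneme sequences per name: one for its first-language pronunciation, and one for its second-language pronunciation written with first-language phonemes. A recognized phoneme sequence is then matched against both. The names `LexiconEntry`, `CROSS_LANGUAGE_LEXICON`, and `identify_name`, and the ARPAbet-style phoneme strings, are assumptions made for this example and do not come from the patent. A real recognizer would score candidates acoustically rather than require an exact match.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Phonemes = Tuple[str, ...]


@dataclass(frozen=True)
class LexiconEntry:
    name: str
    native: Phonemes          # pronunciation of the name in the first language
    cross_language: Phonemes  # second-language pronunciation, in first-language phonemes


# Hypothetical entries; the phoneme strings are illustrative only.
CROSS_LANGUAGE_LEXICON = [
    LexiconEntry("Jorge", ("JH", "AO", "R", "JH"), ("HH", "AO", "R", "HH", "EY")),
    LexiconEntry("Paris", ("P", "EH", "R", "IH", "S"), ("P", "AA", "R", "IY")),
]


def identify_name(heard: Phonemes) -> Optional[Tuple[str, str]]:
    """Return (name, matched variant) for an exact phoneme match, else None."""
    for entry in CROSS_LANGUAGE_LEXICON:
        if heard == entry.cross_language:
            return entry.name, "second-language pronunciation"
        if heard == entry.native:
            return entry.name, "first-language pronunciation"
    return None
```

Storing both variants side by side reflects the abstract's point that the same written name can correspond to different phoneme sequences depending on which language's pronunciation the speaker uses.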

First claim

Opening claim text (preview).

What is claimed is:

1. A device for speech recognition comprising at least one processor execute instructions, wherein the instructions configure the at least one processor to: receive an input utterance in a first language, the input utterance having at least one name of a named entity including a pronunciation in a second language; identify the at least one name using a named entity source; utilize a customized language model, stored in a database, to process a first portion of the input utterance including the at least one name in response to the at least one name being identified using the named entity source; utilize a generic language model corresponding to the first language to process a second portion of the input utterance; identify, by accessing a cross-language lexicon, that the at least one name of the named entity from the input utterance is pronounced in the second language; map letters from the at least one name to a phonetic representation using the pronunciation in the second language and a set of language rules for the first language; store, in the database, the phonetic representation of the at least one name of the named entity to be used in the customized language model, the phonetic representation using phonemes of the first language to represent the pronunciation of the at least one name in the second language, wherein the phonemes of the first language used to represent the pronunciation of the at least one name in the second language differ from phonemes of the first language used to represent a pronunciation of the at least one name in the first language; and output the phonetic representation to a communication application for displaying or transmitting text of the input utterance or a translation of the input utterance.

2. The device of claim 1, wherein the at least one processor is further configured to: create an output utterance based on the input utterance, the output utterance comprising one or more of: a phonetic representation of the at least one name of the named entity the second language; or a phonetic representation of the at least one name of the named entity in the first language.

3. The device of claim 1, wherein the customized language model comprises a context-free language model or an n-gram language model.

4. The device of claim 1, wherein the at least one processor is further configured to: retrieve the phonetic representation from a lexicon of phonetic pronunciations of names for named entities, the lexicon including a plurality of pronunciations in both the first language and the second language for the same names of named entities.

5. The device of claim 1, wherein the at least one processor is further configured to output an output utterance comprising the at least one name of the named entity to a communication application in operative communication with a remote computer.

6. The device of claim 1, wherein the at least one processor is further configured to generate the customized language model based on names in a contact list of the device.

7. A method of speech recognition and translation for processing utterances in both a first language and a second language, the method comprising performing computer-implemented operations at a computing network including: categorizing names of named entities associated with a first user, the names being in the first language; constructing a lexicon of phonetic pronunciations of the names for the named entities, the lexicon including a mapping of letters from the names to respective phonetic representations of the names for the named entities pronounced in the second language using phonemes of the first language; constructing a customized language model for each type of named entity of the named entities to be used to process the names; storing the customized language model for each type of named entity of the named entities in a database with the phonetic representations of the names; and processing an utterance received from the first user in the first language including a name of a named entity including a pronunciation in the second language using a customized language model corresponding to the named entity in response to the name being identified using the lexicon and using on a set of language rules for the first language, wherein phonemes of the first language used to represent a pronunciation of the name of the named entity in the second language differ from phonemes of the first language used to represent a pronunciation of the name of the named entity in the first language; and outputting the phonetic representation corresponding to phonemes of the first language used to represent the name of the named entity in the second language to a communication application for displaying or transmitting text of the utterance or a translation of the utterance.

8. The method of claim 7, further comprising: collecting the names of the named entities from one or more sources of named entities; the one or more sources of named entities being associated with the first user.

9. The method of claim 8, wherein the one or more sources of named entities comprises at least one of: a contact list associated with the first user; location information associated with the first user; conversation data associated with the first user; or social media data associated with the first user.

10. The method of claim 8, wherein the utterances received from the first user are created in a communication application, and wherein the one or more sources of named entities are retrieved from the communication application.

11. The method of claim 7, wherein categorizing the named entities comprises categorizing named entities as a name of a person or a name of a geographic location.

12. The method of claim 11, wherein categorizing the named entities further comprises categorizing named entities as out of vocabulary (OOV) entities.

13. The method of claim 7, wherein constructing the lexicon of phonetic pronunciations comprises: mapping letters of a second name of a second named entity using a set of language rules for the first language; converting the mapped letters of the second name to a standard phonetic representation; converting the standard phonetic representation to a phonetic representation of pronunciation in the second language; and adding the phonetic representation of the pronunciation to the lexicon of phonetic pronunciations.

14. The method of claim 7, further comprising: categorizing new names of named entities associated with a second user, the new names being in the second language; and constructing a lexicon of phonetic pronunciations for the named entities, the lexicon including respective phonetic representations of the names for the named entities pronounced in the first language using phonemes of the second language.

15. The method of claim 14, further comprising: constructing the customized language model for at least one type of named entity of the new names of named entities.

16. The method of claim 15, further comprising: translating utterances received from the second user in the second language to new output utterances in the first language, the new output utterances comprising at least one phonetic pronunciation of a new name of the named entities in the first language.

17. A speech recognition and translation system configured to translate a first utterance in a first language into a second utterance in a second language, the system comprising at least one computer including at least one processor to execute instructions, wherein the in
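
The lexicon-construction steps recited in claim 13 (map letters with first-language rules, convert to a standard phonetic representation, convert that to the second-language pronunciation, add the result to the lexicon) can be sketched as a small pipeline. Everything below is a toy under stated assumptions: the rule tables `FIRST_LANGUAGE_RULES`, `TO_STANDARD`, and `SECOND_LANGUAGE_SHIFT`, the ASCII phoneme labels, and the function names are hypothetical and are not taken from the patent, which does not publish concrete rule sets on this page.

```python
from typing import Dict, List

# Hypothetical letter-to-phoneme rules for the first language (one symbol per letter).
FIRST_LANGUAGE_RULES: Dict[str, str] = {"j": "JH", "o": "OW", "s": "S", "e": "EH"}

# Hypothetical mapping from first-language labels to a shared "standard" inventory.
TO_STANDARD: Dict[str, str] = {"JH": "dZ", "OW": "oU", "S": "s", "EH": "E"}

# Hypothetical substitutions modelling how the second language realises those sounds.
SECOND_LANGUAGE_SHIFT: Dict[str, str] = {"dZ": "x", "oU": "o", "E": "e"}


def letters_to_phonemes(name: str) -> List[str]:
    """Step 1: map letters using the first-language rules (unknown letters are skipped)."""
    return [FIRST_LANGUAGE_RULES[ch] for ch in name.lower() if ch in FIRST_LANGUAGE_RULES]


def to_standard(phonemes: List[str]) -> List[str]:
    """Step 2: convert the mapped letters to the standard phonetic representation."""
    return [TO_STANDARD[p] for p in phonemes]


def to_second_language(standard: List[str]) -> List[str]:
    """Step 3: convert the standard form to the second-language pronunciation."""
    return [SECOND_LANGUAGE_SHIFT.get(s, s) for s in standard]


def add_to_lexicon(lexicon: Dict[str, List[str]], name: str) -> List[str]:
    """Step 4: run the three conversions and store the result under the name."""
    pronunciation = to_second_language(to_standard(letters_to_phonemes(name)))
    lexicon[name] = pronunciation
    return pronunciation


lexicon: Dict[str, List[str]] = {}
add_to_lexicon(lexicon, "Jose")
```

Keeping each conversion as a separate function mirrors the order of the claimed steps and makes it possible to swap in a different rule table for another language pair without touching the rest of the pipeline.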

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • Named entity recognition

  • Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules

  • using lexical or orthographic knowledge sources

  • G10L15/187 (Primary)

    Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams

  • Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice (G10L15/14 takes precedence)

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10229674B2 cover?
Technologies are described herein for cross-language speech recognition and translation. An example method of speech recognition and translation includes receiving an input utterance in a first language, the input utterance having at least one name of a named entity included therein and being pronounced in a second language, utilizing a customized language model to process at least a portion of…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/187. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 12, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).