US11113481B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11113481-B2 |
| Application number | US-201916621578-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 2, 2019 |
| Priority date | May 2, 2019 |
| Publication date | Sep 7, 2021 |
| Grant date | Sep 7, 2021 |
Official abstract text for this publication.
Techniques described herein may serve to increase the language coverage of an automated assistant system, i.e. they may serve to increase the number of queries in one or more non-native languages for which the automated assistant is able to deliver reasonable responses. For example, techniques are described herein for training and utilizing a machine translation model to map a plurality of semantically-related natural language inputs in one language to one or more canonical translations in another language. In various implementations, the canonical translations may be selected and/or optimized for determining an intent of the speaker by the automated assistant, so that one or more responsive actions can be performed based on the speaker's intent. Put another way, the canonical translations may be specifically formatted for indicating the intent of the speaker to the automated assistant.
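The flow in the abstract (many phrasings in one language, one canonical translation, one intent) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the lookup table is a toy stand-in for the trained machine translation model, and the German phrases, canonical strings, intent names, and function names are hypothetical and do not come from the patent.

```python
from typing import Dict, Optional

# Toy stand-in for the trained translation model (assumption): several
# syntactically different German phrasings map to ONE canonical English string.
CANONICAL_TRANSLATIONS: Dict[str, str] = {
    "stell einen wecker auf sieben uhr": "set alarm 7 am",
    "weck mich um sieben uhr": "set alarm 7 am",
    "ich möchte um sieben geweckt werden": "set alarm 7 am",
    "wie ist das wetter heute": "weather today",
    "wie wird das wetter heute": "weather today",
}

# Hypothetical mapping from canonical translation to assistant intent.
INTENT_BY_CANONICAL: Dict[str, str] = {
    "set alarm 7 am": "set_alarm",
    "weather today": "get_weather",
}


def translate_to_canonical(recognized_text: str) -> Optional[str]:
    """Return the canonical translation, or None if the phrase is unknown."""
    return CANONICAL_TRANSLATIONS.get(recognized_text.strip().lower())


def determine_intent(canonical: str) -> Optional[str]:
    """Look up the intent associated with a canonical translation."""
    return INTENT_BY_CANONICAL.get(canonical)


def handle_utterance(recognized_text: str) -> str:
    """Speech-recognition output -> canonical translation -> intent name."""
    canonical = translate_to_canonical(recognized_text)
    if canonical is None:
        return "fallback"
    return determine_intent(canonical) or "fallback"


if __name__ == "__main__":
    for phrase in ("Weck mich um sieben Uhr", "Wie ist das Wetter heute"):
        print(phrase, "->", handle_utterance(phrase))
```

Because every phrasing collapses to the same canonical string, the intent lookup only needs one entry per intent rather than one per phrasing, which is the coverage benefit the abstract describes.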
Opening claim text (preview).
What is claimed is:
1. A method implemented using one or more processors, comprising: capturing a spoken utterance at a microphone to generate audio data, wherein the spoken utterance is spoken in a first language; performing speech recognition processing on the audio data to generate speech-recognition output; applying the speech-recognition output as input across a trained encoder-decoder machine translation model to generate output, wherein the output comprises a canonical second language translation of the speech-recognition output from the first language; determining an intent conveyed by the spoken utterance based on the canonical second language translation; and performing one or more responsive actions based on the intent; wherein the encoder-decoder machine translation model was trained previously to map a plurality of syntactically-distinct but semantically-similar phrases in the first language to the same canonical second language translation, wherein the encoder-decoder machine translation model was trained previously by: processing the plurality of phrases in the first language based on the encoder-decoder machine translation model to generate a plurality of syntactically-distinct but semantically-similar second language translations; and comparing the plurality of syntactically-distinct but semantically-similar second language translations to the canonical second language translation to generate corresponding errors, whereby the encoder-decoder machine translation model was trained previously based on the corresponding errors.
2. The method of claim 1, wherein a decoder portion of the encoder-decoder machine translation model is trained to map one or more semantic embeddings representing the plurality of syntactically-different but semantically-related phrases in the first language to the same canonical translation.
3. The method of claim 1, wherein an encoder portion of the encoder-decoder machine translation model is trained to map the plurality of syntactically-different but semantically-related phrases in the first language to a lower number of semantic embeddings.
4. The method of claim 1, further comprising: capturing another spoken utterance at the same microphone or a different microphone to generate additional audio data, wherein the another spoken utterance is spoken in a third language; performing speech recognition processing on the additional audio data to generate additional speech-recognition output; applying the additional speech-recognition output as input across the trained machine translation model to generate additional output, wherein the additional output comprises the canonical second language translation of the speech-recognized text; determining an additional intent conveyed by the another spoken utterance based on the canonical second language translation; and performing one or more additional responsive actions based on the additional intent; wherein the encoder-decoder machine translation model is trained to map a plurality of semantically-related phrases in the third language to the same canonical second language translation, wherein the canonical second language translation varies syntactically from at least some of the plurality of semantically-related phrases in the third language.
5. The method of claim 1, wherein the trained encoder-decoder machine translation model includes a word piece vocabulary that is shared across multiple languages.
6. The method of claim 1, wherein determining the intent comprises performing natural language processing on the canonical second language translation to determine the intent.
7. A method of training an encoder-decoder machine translation model to map a plurality of syntactically-distinct but semantically-similar phrases in a first language to a single canonical second language translation, comprising: processing the plurality of syntactically-distinct but semantically-similar phrases in the first language based on the encoder-decoder machine translation model to generate a plurality of syntactically-distinct but semantically-similar second language translations; comparing the plurality of syntactically-distinct but semantically-similar second language translations to the single canonical second language translation to generate corresponding errors; and training the encoder-decoder machine translation model based on the corresponding errors.
8. The method of claim 7, wherein the training comprises training a decoder portion of the encoder-decoder machine translation model to map one or more semantic embeddings representing the plurality of syntactically-distinct but semantically-related phrases in the first language to the single canonical second language translation.
9. The method of claim 7, wherein the training comprises training an encoder portion of the encoder-decoder machine translation model to map the plurality of syntactically-distinct but semantically-related phrases in the first language to a lower number of semantic embeddings.
10. A system comprising one or more processors and memory storing instructions that, in response to execution of the instructions by the one or more processors, cause the one or more processors to: capture a spoken utterance at a microphone to generate audio data, wherein the spoken utterance is spoken in a first language; perform speech recognition processing on the audio data to generate speech-recognition output; apply the speech-recognition output as input across a trained encoder-decoder machine translation model to generate output, wherein the output comprises a canonical second language translation of the speech-recognition output from the first language; determine an intent conveyed by the spoken utterance based on the canonical second language translation; and perform one or more responsive actions based on the intent; wherein the encoder-decoder machine translation model was trained previously to map a plurality of syntactically-distinct but semantically-similar phrases in the first language to the same canonical second language translation, wherein the encoder-decoder machine translation model was trained previously by: processing the plurality of phrases in the first language based on the encoder-decoder machine translation model to generate a plurality of syntactically-distinct but semantically-similar second language translations; and comparing the plurality of syntactically-distinct but semantically-similar second language translations to the canonical second language translation to generate corresponding errors, whereby the encoder-decoder machine translation model was trained previously based on the corresponding errors.
11. The system of claim 10, wherein a decoder portion of the encoder-decoder machine translation model is trained to map one or more semantic embeddings representing the plurality of syntactically-different but semantically-related phrases in the first language to the same canonical second language translation.
12. The system of claim 10, wherein an encoder portion of the encoder-decoder machine translation model is trained to map the plurality of syntactically-different but semantically-related phrases in the first language to a lower number of semantic embeddings.
13. The system of claim 10, wherein the intent is determined based on a mapping between the canonical second language translation and the intent.
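The compare-and-generate-errors step recited in claim 7 can be sketched in Python as follows. This is only an illustration under stated assumptions: the claims as shown here do not name a loss function, so a token-level edit distance is swapped in as a simple, inspectable error measure (a real encoder-decoder model would more likely be trained with a differentiable loss). The function names and example strings are hypothetical.

```python
from typing import List, Tuple


def token_edit_distance(hypothesis: str, reference: str) -> int:
    """Levenshtein distance over whitespace-separated tokens."""
    hyp, ref = hypothesis.split(), reference.split()
    prev = list(range(len(ref) + 1))  # distance from empty hypothesis
    for i, hyp_tok in enumerate(hyp, start=1):
        curr = [i]  # distance to empty reference
        for j, ref_tok in enumerate(ref, start=1):
            cost = 0 if hyp_tok == ref_tok else 1
            curr.append(min(prev[j] + 1,          # delete hyp token
                            curr[j - 1] + 1,      # insert ref token
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def corresponding_errors(translations: List[str], canonical: str) -> List[int]:
    """Compare each model translation with the single canonical translation."""
    return [token_edit_distance(t, canonical) for t in translations]


def build_training_pairs(phrases: List[str], canonical: str) -> List[Tuple[str, str]]:
    """Pair every first-language phrase with the same canonical target."""
    return [(phrase, canonical) for phrase in phrases]


if __name__ == "__main__":
    canonical = "set alarm 7 am"
    # Hypothetical outputs of an untrained model for three source phrasings.
    translations = ["set alarm 7 am", "set an alarm for 7 am", "wake me up at 7 am"]
    print(corresponding_errors(translations, canonical))
```

An error of zero means the model already emits the canonical form for that phrasing. Nonzero errors are the signal that training would use to pull the other translations toward the same target.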
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title
Statistical methods, e.g. probability models · CPC title
Discourse or dialogue representation · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Translation evaluation · CPC title