Translation and speech recognition method, apparatus, and device

US11735184B2 · US · B2

Patent metadata
Publication number: US-11735184-B2
Application number: US-202016937349-A
Country: US
Kind code: B2
Filing date: Jul 23, 2020
Priority date: Jul 24, 2019
Publication date: Aug 22, 2023
Grant date: Aug 22, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A speech recognition method including performing speech recognition on an inputted speech to obtain a first text, correcting the first text according to an obtained mapping relationship between words in different languages to obtain at least one second text, and in response to determining that the at least one second text corresponds to the same language, outputting the first text, or in response to determining that the at least one second text corresponds to different languages, determining an output text according to first probability values corresponding to each of the at least one second text. By combining the mapping relationships between words in different languages in correcting the initial ASR result, the present application ensures the accuracy of the final speech recognition result.
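The branching described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Candidate` type, the per-candidate `languages` tags, and the function name `choose_output` are all hypothetical, and the sketch reads "corresponds to the same language" as "all words across all corrected candidates belong to one language".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A corrected ("second") text. All fields are illustrative assumptions."""
    text: str
    languages: frozenset  # hypothetical language tags of the words in the text
    first_prob: float     # probability that the first text is corrected to this text


def choose_output(first_text, candidates):
    """Pick the output text following one reading of the abstract."""
    if not candidates:
        raise ValueError("at least one second text is required")
    # Collect every language that appears in any candidate.
    seen_languages = set().union(*(c.languages for c in candidates))
    if len(seen_languages) == 1:
        # All candidates are in the same language: keep the raw ASR result.
        return first_text
    # Candidates span different languages: use the first probability values.
    return max(candidates, key=lambda c: c.first_prob).text
```

With monolingual candidates the function returns the first text unchanged; when any candidate mixes languages, the candidate with the highest first probability value is returned.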

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: one or more processors; and one or more memories storing thereon computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising: initializing parameters of a machine translation model according to parameters of a language model; training the machine translation model using training samples to obtain a trained machine translation model; performing speech recognition on an inputted speech to obtain a first text; correcting, by inputting the first text into the trained machine translation model, the first text according to a mapping relationship between words in different languages to obtain at least one second text; obtaining respective first probability values predicted by the trained machine translation model corresponding to respective second texts of the at least one second text; and determining an output text at least according to the respective first probability values corresponding to the respective second texts of the at least one second text, a respective first probability value representing a probability that the first text is corrected to a respective second text in the at least one second text, the determining the output text at least according to the respective first probability values corresponding to the respective second texts including: inputting the at least one second text into the language model to determine respective second probability values corresponding to the respective second texts of the at least one second text using the language model, a respective second probability value representing a reasonableness of grammar and semantics of the respective second text; determining the output text according to the respective first probability values and the respective second probability values corresponding to the respective second texts; in response to determining that the first text is consistent with a particular second text having a largest summed probability value, outputting the first text, a respective summed probability value of the respective second text representing a weighted sum of the respective first probability value and the respective second probability value corresponding to the respective second text; and in response to determining that the first text is inconsistent with the particular second text having the largest summed probability value, outputting the particular second text having the largest summed probability value.

2. The apparatus according to claim 1, wherein the training the machine translation model using the training samples to obtain the trained machine translation model comprises: acquiring a speech sample containing a plurality of languages; performing speech recognition on the speech sample to obtain a plurality of text candidates; forming a training sample from annotated texts corresponding to the plurality of text candidates and the speech sample; and training the machine translation model using the training sample to obtain the trained machine translation model.

3. The apparatus according to claim 2, wherein the correcting the first text comprises: inputting the first text into the trained machine translation model; and correcting the first text using the trained machine translation model.

4. The apparatus according to claim 1, wherein: the machine translation model is composed of an encoder and a decoder; and the encoder or the decoder includes any one of neural network models including: a recurrent neural network model, a long short-term memory network model, and a bidirectional long short-term memory network model.

5. The apparatus according to claim 2, wherein the acts further comprise: acquiring corpus samples corresponding to each of the plurality of languages; and training the language model using the corpus samples corresponding to each of the plurality of languages.

6. The method according to claim 1, further comprising predicting the respective first probability values using the trained machine translation model.

7. The apparatus according to claim 1, wherein: the mapping relationship between words in different languages comprises a mapping relationship between words in different dialects of a same language; and the at least one second text corresponds to the same language refers to that the at least one second text corresponds to a same dialect of the same language.

8. The apparatus according to claim 1, wherein the acts further comprise: in response to determining that the at least one second text includes a word not corresponding to the first language, determining a target second text according to the respective first probability values corresponding to each of the at least one second text; and translating the target second text into a second language.

9. A method comprising: initializing parameters of a machine translation model according to parameters of a language model; training the machine translation model using training samples to obtain a trained machine translation model; performing speech recognition on an inputted speech to obtain a first text; correcting, by inputting the first text into the trained machine translation model, the first text according to a mapping relationship between words in different languages to obtain at least one second text; obtaining respective first probability values predicted by the trained machine translation model corresponding to respective second texts of the at least one second text; and determining an output text at least according to the respective first probability values corresponding to the respective second texts of the at least one second text, a respective first probability value representing a probability that the first text is corrected to a respective second text in the at least one second text, the determining the output text at least according to the respective first probability values corresponding to the respective second texts including: inputting the at least one second text into the language model to determine respective second probability values corresponding to the respective second texts of the at least one second text using the language model, a respective second probability value representing a reasonableness of grammar and semantics of the respective second text; determining the output text according to the respective first probability values and the respective second probability values corresponding to the respective second texts; in response to determining that the first text is consistent with a particular second text having a largest summed probability value, outputting the first text, a respective summed probability value of the respective second text representing a weighted sum of the respective first probability value and the respective second probability value corresponding to the respective second text; and in response to determining that the first text is inconsistent with the particular second text having the largest summed probability value, outputting the particular second text having the largest summed probability value.

10. The method according to claim 9, wherein the training the machine translation model using the training samples to obtain the trained machine translation model comprises: acquiring a speech sample containing a plurality of languages; performing speech recognition on the speech sample to obtain a plurality of text candidates; forming a training sample from annotated texts corresponding to the plurality of text candidates and the speech sample; and training the machine translation model using the training sample to obtain the trained machine translation mode
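The selection step recited in claims 1 and 9 (a weighted sum of the translation-model probability and the language-model probability, followed by a consistency check against the first text) might look roughly like the sketch below. The claims do not specify the weights or how "consistent" is tested, so the default weights `w_mt` and `w_lm`, the whitespace and case normalization in `_normalize`, and the function names are assumptions made purely for illustration.

```python
def _normalize(text):
    # Assumed consistency test: compare texts ignoring case and extra spaces.
    return " ".join(text.casefold().split())


def select_output(first_text, second_texts, first_probs, second_probs,
                  w_mt=0.5, w_lm=0.5):
    """Return the output text given corrected candidates and their scores.

    first_probs:  per-candidate probabilities from the machine translation model
    second_probs: per-candidate probabilities from the language model
    w_mt, w_lm:   hypothetical weights for the weighted sum
    """
    if not second_texts:
        raise ValueError("at least one second text is required")
    if not (len(second_texts) == len(first_probs) == len(second_probs)):
        raise ValueError("each second text needs both probability values")

    # Summed probability value: weighted sum of the two scores per candidate.
    summed = [w_mt * p1 + w_lm * p2
              for p1, p2 in zip(first_probs, second_probs)]
    best_index = max(range(len(summed)), key=summed.__getitem__)
    best_text = second_texts[best_index]

    # Consistent with the best candidate: output the first text unchanged.
    if _normalize(best_text) == _normalize(first_text):
        return first_text
    # Otherwise output the best-scoring corrected text.
    return best_text
```

In this sketch the language-model score can overturn the translation model's preferred correction, which is the practical effect of combining the two probability values before comparing against the first text.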

Assignees

Alibaba Group Holding Ltd

Inventors

Classifications

  • G10L15/26 (Primary)

    Speech to text systems (G10L15/08 takes precedence) · CPC title

  • using context dependencies, e.g. language models · CPC title

  • G10L15/005 (Primary)

    Language recognition · CPC title

  • Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title

  • Data-driven translation · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11735184B2 cover?
A speech recognition method including performing speech recognition on an inputted speech to obtain a first text, correcting the first text according to an obtained mapping relationship between words in different languages to obtain at least one second text, and in response to determining that the at least one second text corresponds to the same language, outputting the first text, or in respon…
Who is the assignee on this patent?
Alibaba Group Holding Ltd
What technology area does this patent fall under?
Primary CPC classification G10L15/26. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 22, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).