Method and system for evaluating and improving live translation captioning systems

US11715475B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11715475-B2
Application number  US-202117479349-A
Country             US
Kind code           B2
Filing date         Sep 20, 2021
Priority date       Sep 20, 2021
Publication date    Aug 1, 2023
Grant date          Aug 1, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media for evaluating and improving live translation captioning systems. An exemplary method includes: displaying a word in a first language; receiving a first audio sequence, the first audio sequence comprising a verbal description of the word; generating a first translated text in a second language; displaying the first translated text; receiving a second audio sequence, the second audio sequence comprising a guessed word based on the first translated text; generating a second translated text in the first language; determining a matching score between the word and the second translated text; determining a performance score of the live translation captioning system based on the matching score.
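
The round trip described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the `Pipeline` callable, the `guesser` callback, the string-similarity metric (`difflib.SequenceMatcher`), and the threshold-based aggregate are all hypothetical, since the abstract does not say how the matching score or the performance score is computed.

```python
from difflib import SequenceMatcher
from typing import Callable, Sequence

# Hypothetical interface: (audio, source_lang, target_lang) -> translated text.
# In the patent this role is played by the ASR + MT pipeline.
Pipeline = Callable[[bytes, str, str], str]


def matching_score(word: str, guess: str) -> float:
    """Similarity in [0, 1] between the target word and the back-translated guess.

    Assumption: a character-level ratio; the abstract does not name a metric.
    """
    return SequenceMatcher(None, word.strip().lower(), guess.strip().lower()).ratio()


def play_round(
    word: str,
    description_audio: bytes,
    guesser: Callable[[str], bytes],
    pipeline: Pipeline,
    first_lang: str,
    second_lang: str,
) -> float:
    """Run one describe-and-guess round and return its matching score."""
    # Player 1 describes the word; the caption is shown in the second language.
    first_translated = pipeline(description_audio, first_lang, second_lang)
    # Player 2 reads the caption and speaks a guess in the second language.
    guess_audio = guesser(first_translated)
    # The guess is translated back into the first language.
    second_translated = pipeline(guess_audio, second_lang, first_lang)
    return matching_score(word, second_translated)


def performance_score(scores: Sequence[float], threshold: float = 0.8) -> float:
    """Assumed aggregate: fraction of rounds whose matching score exceeds the threshold."""
    if not scores:
        return 0.0
    return sum(s > threshold for s in scores) / len(scores)
```

Modeling the second player as a callback keeps the sketch testable with a stubbed pipeline; in a live session the guess would arrive as recorded audio from the second user interface.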

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for evaluating performance of a live translation captioning system, comprising: displaying a word in a first language on a first user interface; receiving a first audio sequence, the first audio sequence comprising a verbal description of the word in the first language; generating a first translated text in a second language by feeding the first audio sequence into a pipeline comprising an Automatic Speech Recognition (ASR) subsystem and a machine translation (MT) subsystem; displaying the first translated text on a second user interface; receiving a second audio sequence, the second audio sequence comprising a guessed word based on the first translated text; generating a second translated text in the first language by feeding the second audio sequence into the pipeline; determining a matching score between the word and the second translated text; determining a performance score of the live translation captioning system based on the matching score.

2. The method of claim 1, wherein the generating the first translated text comprises: generating a first text sequence by feeding the first audio sequence into the ASR subsystem; and generating the first translated text in the second language by feeding the first text sequence into the MT subsystem corresponding to the second language.

3. The method of claim 2, wherein the first audio sequence comprises a plurality of audio segments, and the ASR subsystem is configured to generate an output when each of the plurality of audio segments is fed in.

4. The method of claim 3, wherein the feeding the first text sequence into the MT subsystem comprises: feeding every k-th output generated by the ASR subsystem into the MT subsystem, wherein k is a positive integer.

5. The method of claim 3, wherein the feeding the first text sequence into the MT subsystem comprises: feeding the output generated by the ASR subsystem into the MT subsystem if t seconds have elapsed since a most recent output of the ASR subsystem was fed into the MT subsystem, wherein t is a positive integer.

6. The method of claim 1, wherein the generating the second translated text comprises: generating a second text sequence by feeding the second audio sequence into the ASR subsystem; and generating the second translated text by feeding the second text sequence into the MT subsystem.

7. The method of claim 1, wherein the ASR subsystem comprises a sequence-to-sequence ASR model trained based on a joint set of corpora from a plurality of languages.

8. The method of claim 1, wherein the ASR subsystem comprises a plurality of ASR models respectively trained based on training samples from a plurality of languages.

9. The method of claim 1, wherein the MT subsystem comprises a multilingual neural machine translation model trained based on a joint set of corpora from a plurality of languages.

10. The method of claim 1, wherein the MT subsystem comprises a plurality of MT models respectively trained based on training samples from a plurality of languages.

11. The method of claim 1, wherein the method further comprises selecting the word from a plurality of word candidates in a first language based on ambiguity scores of the plurality of word candidates, wherein the ambiguity scores of the plurality of word candidates are determined by: feeding each of the plurality of word candidates into an online dictionary to obtain returned entries; and determining an ambiguity score of the word based on a number of returned entries.

12. The method of claim 1, wherein the receiving the first audio sequence comprises continuously receiving audio signals, and the generating the first translated text in the second language comprises streaming the continuous audio signals into the pipeline and obtaining a stream of translated phrases in the second language.

13. The method of claim 12, wherein the displaying the first translated text on the second user interface comprises a live captioning of the stream of translated phrases.

14. The method of claim 1, wherein the determining a performance score of the live translation captioning system based on the matching score comprises: in response to the matching score being greater than a threshold, increasing a performance score of the live translation captioning system, wherein the increase is inversely proportional to a time spent between generating the first translated text and generating the second translated text.

15. A system for evaluating performance of a live translation captioning system, the system comprising: one or more processors; and a memory storing instructions that, when executed by the one or more processors, cause the system to perform: displaying a word in a first language on a first user interface; receiving a first audio sequence, the first audio sequence comprising a verbal description of the word in the first language; generating a first translated text in a second language by feeding the first audio sequence into a pipeline comprising an Automatic Speech Recognition (ASR) subsystem and a machine translation (MT) subsystem; displaying the first translated text on a second user interface; receiving a second audio sequence, the second audio sequence comprising a guessed word based on the first translated text; generating a second translated text in the first language by feeding the second audio sequence into the pipeline; determining a matching score between the word and the second translated text; determining a performance score of the live translation captioning system based on the matching score.
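
The dependent claims on ASR-to-MT feeding (claims 4 and 5), word selection (claim 11), and time-weighted scoring (claim 14) can be illustrated with a short Python sketch. Everything named here is hypothetical: the function names, the `lookup` callable standing in for an online dictionary, the choice to prefer the least ambiguous candidate, the decision to forward the very first ASR output under the time-based policy, and the `gain` constant. The claims fix none of these details.

```python
from typing import Callable, Iterable, Iterator, Sequence, Tuple


def every_kth(asr_outputs: Iterable[str], k: int) -> Iterator[str]:
    """Claim 4 style: forward every k-th ASR output to the MT subsystem."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    for i, text in enumerate(asr_outputs, start=1):
        if i % k == 0:
            yield text


def every_t_seconds(timed_outputs: Iterable[Tuple[float, str]], t: int) -> Iterator[str]:
    """Claim 5 style: forward an ASR output once t seconds have elapsed
    since the most recently forwarded output.

    Assumption: the first output is always forwarded.
    """
    if t < 1:
        raise ValueError("t must be a positive integer")
    last_sent = None
    for timestamp, text in timed_outputs:
        if last_sent is None or timestamp - last_sent >= t:
            last_sent = timestamp
            yield text


def pick_word(candidates: Sequence[str], lookup: Callable[[str], Sequence[str]]) -> str:
    """Claim 11 style: ambiguity score = number of dictionary entries returned.

    Assumption: the least ambiguous candidate is preferred.
    """
    return min(candidates, key=lambda word: len(lookup(word)))


def update_performance(
    score: float,
    match: float,
    elapsed_seconds: float,
    threshold: float = 0.8,
    gain: float = 1.0,
) -> float:
    """Claim 14 style: on a match above the threshold, add an increment
    inversely proportional to the time between the two translations."""
    if match > threshold and elapsed_seconds > 0:
        score += gain / elapsed_seconds
    return score
```

Both feeding policies trade caption latency against translation load: a larger k or t sends fewer partial transcripts to the MT subsystem but updates the caption less often.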

Assignees

Beijing Didi Infinity Technology & Dev Co Ltd

Inventors

Classifications

  • G10L15/32 · Primary

    Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems · CPC title

  • using very large corpora, e.g. the web · CPC title

  • Translation evaluation · CPC title

  • G06F40/58 · Primary

    Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title

  • Training · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11715475B2 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media for evaluating and improving live translation captioning systems. An exemplary method includes: displaying a word in a first language; receiving a first audio sequence, the first audio sequence comprising a verbal description of the word; generating a first translated text in a second language; displa…
Who is the assignee on this patent?
Beijing Didi Infinity Technology & Dev Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L15/32. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 1, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).