Method for providing translation information, non-transitory computer-readable recording medium, and translation information providing apparatus
US-2018357224-A1 · Dec 13, 2018 · US
US11715475B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11715475-B2 |
| Application number | US-202117479349-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 20, 2021 |
| Priority date | Sep 20, 2021 |
| Publication date | Aug 1, 2023 |
| Grant date | Aug 1, 2023 |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on computer storage media for evaluating and improving live translation captioning systems. An exemplary method includes: displaying a word in a first language; receiving a first audio sequence, the first audio sequence comprising a verbal description of the word; generating a first translated text in a second language; displaying the first translated text; receiving a second audio sequence, the second audio sequence comprising a guessed word based on the first translated text; generating a second translated text in the first language; determining a matching score between the word and the second translated text; determining a performance score of the live translation captioning system based on the matching score.
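The round-trip evaluation described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Pipeline` callable stands in for the ASR and MT subsystems, the names `matching_score` and `evaluate_round` are hypothetical, and the similarity metric (character-level similarity from Python's `difflib`) is an assumption, since the abstract does not say how the matching score is computed.

```python
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical stand-in for the ASR + MT pipeline: (audio, target_language) -> text.
Pipeline = Callable[[bytes, str], str]


def matching_score(word: str, back_translated: str) -> float:
    """Similarity in [0, 1] between the displayed word and the back-translated guess.

    Assumption: the metric is not specified in the source; a case-insensitive
    character-level similarity ratio is used here for illustration.
    """
    a = word.strip().lower()
    b = back_translated.strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def evaluate_round(
    word: str,
    description_audio: bytes,
    guess_audio: bytes,
    pipeline: Pipeline,
    first_lang: str,
    second_lang: str,
) -> float:
    """Run one describe-and-guess round and return its matching score."""
    # The describer's speech (first language) becomes text in the second language.
    first_translated = pipeline(description_audio, second_lang)
    # In a real system this text would be shown to the guesser as a caption.
    _ = first_translated
    # The guesser's speech (second language) is translated back to the first language.
    second_translated = pipeline(guess_audio, first_lang)
    return matching_score(word, second_translated)
```

A score near 1 suggests the guesser recovered the word through the translated captions; how scores from many rounds are aggregated into a performance score is left open here.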
Opening claim text (preview).
The invention claimed is:
1. A method for evaluating performance of a live translation captioning system, comprising: displaying a word in a first language on a first user interface; receiving a first audio sequence, the first audio sequence comprising a verbal description of the word in the first language; generating a first translated text in a second language by feeding the first audio sequence into a pipeline comprising an Automatic Speech Recognition (ASR) subsystem and a machine translation (MT) subsystem; displaying the first translated text on a second user interface; receiving a second audio sequence, the second audio sequence comprising a guessed word based on the first translated text; generating a second translated text in the first language by feeding the second audio sequence into the pipeline; determining a matching score between the word and the second translated text; determining a performance score of the live translation captioning system based on the matching score.
2. The method of claim 1, wherein the generating the first translated text comprises: generating a first text sequence by feeding the first audio sequence into the ASR subsystem; and generating the first translated text in the second language by feeding the first text sequence into the MT subsystem corresponding to the second language.
3. The method of claim 2, wherein the first audio sequence comprises a plurality of audio segments, and the ASR subsystem is configured to generate an output when each of the plurality of audio segments is fed in.
4. The method of claim 3, wherein the feeding the first text sequence into the MT subsystem comprises: feeding every k-th output generated by the ASR subsystem into the MT subsystem, wherein k is a positive integer.
5. The method of claim 3, wherein the feeding the first text sequence into the MT subsystem comprises: feeding the output generated by the ASR subsystem into the MT subsystem if t seconds have elapsed since a most recent output of the ASR subsystem was fed into the MT subsystem, wherein t is a positive integer.
6. The method of claim 1, wherein the generating the second translated text comprises: generating a second text sequence by feeding the second audio sequence into the ASR subsystem; and generating the second translated text by feeding the second text sequence into the MT subsystem.
7. The method of claim 1, wherein the ASR subsystem comprises a sequence-to-sequence ASR model trained based on a joint set of corpora from a plurality of languages.
8. The method of claim 1, wherein the ASR subsystem comprises a plurality of ASR models respectively trained based on training samples from a plurality of languages.
9. The method of claim 1, wherein the MT subsystem comprises a multilingual neural machine translation model trained based on a joint set of corpora from a plurality of languages.
10. The method of claim 1, wherein the MT subsystem comprises a plurality of MT models respectively trained based on training samples from a plurality of languages.
11. The method of claim 1, wherein the method further comprises selecting the word from a plurality of word candidates in a first language based on ambiguity scores of the plurality of word candidates, wherein the ambiguity scores of the plurality of word candidates are determined by: feeding each of the plurality of word candidates into an online dictionary to obtain returned entries; and determining an ambiguity score of the word based on a number of returned entries.
12. The method of claim 1, wherein the receiving the first audio sequence comprises continuously receiving audio signals, and the generating the first translated text in the second language comprises streaming the continuous audio signals into the pipeline and obtaining a stream of translated phrases in the second language.
13. The method of claim 12, wherein the displaying the first translated text on the second user interface comprises a live captioning of the stream of translated phrases.
14. The method of claim 1, wherein the determining a performance score of the live translation captioning system based on the matching score comprises: in response to the matching score being greater than a threshold, increasing a performance score of the live translation captioning system, wherein the increase is inversely proportional to a time spent between generating the first translated text and generating the second translated text.
15. A system for evaluating performance of a live translation captioning system, the system comprising: one or more processors; and a memory storing instructions that, when executed by the one or more processors, cause the system to perform: displaying a word in a first language on a first user interface; receiving a first audio sequence, the first audio sequence comprising a verbal description of the word in the first language; generating a first translated text in a second language by feeding the first audio sequence into a pipeline comprising an Automatic Speech Recognition (ASR) subsystem and a machine translation (MT) subsystem; displaying the first translated text on a second user interface; receiving a second audio sequence, the second audio sequence comprising a guessed word based on the first translated text; generating a second translated text in the first language by feeding the second audio sequence into the pipeline; determining a matching score between the word and the second translated text; determining a performance score of the live translation captioning system based on the matching score.
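The two ASR-to-MT feeding policies of claims 4 and 5 and the score update of claim 14 can be illustrated with a short sketch. All function names, the `scale` constant, and the use of timestamped tuples are assumptions made for illustration; the claims do not prescribe an implementation. In particular, forwarding the very first ASR output under the time-based policy is an assumption, because claim 5 only defines the condition relative to a previously forwarded output.

```python
from typing import Iterable, Iterator, Tuple


def every_kth(asr_outputs: Iterable[str], k: int) -> Iterator[str]:
    """Claim 4 style: forward every k-th ASR output to the MT subsystem."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    for index, text in enumerate(asr_outputs, start=1):
        if index % k == 0:
            yield text


def every_t_seconds(
    timed_outputs: Iterable[Tuple[float, str]], t: float
) -> Iterator[str]:
    """Claim 5 style: forward an ASR output only if at least t seconds have
    elapsed since the most recently forwarded output.

    Assumption: the first output is always forwarded.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    last_forwarded = None
    for timestamp, text in timed_outputs:
        if last_forwarded is None or timestamp - last_forwarded >= t:
            last_forwarded = timestamp
            yield text


def update_performance(
    score: float,
    match: float,
    threshold: float,
    elapsed_seconds: float,
    scale: float = 1.0,  # hypothetical proportionality constant
) -> float:
    """Claim 14 style: if the matching score exceeds the threshold, increase
    the performance score by an amount inversely proportional to the time
    between the two translations."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    if match > threshold:
        score += scale / elapsed_seconds
    return score
```

Both policies reduce how often partial ASR hypotheses reach the MT subsystem; the count-based one depends on how frequently the ASR emits output, while the time-based one bounds the MT call rate directly.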
Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems · CPC title
using very large corpora, e.g. the web · CPC title
Translation evaluation · CPC title
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title
Training · CPC title