Speech translation processing apparatus
US-2024370669-A1 · Nov 7, 2024 · US
US9710463B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9710463-B2 |
| Application number | US-201314099079-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 6, 2013 |
| Priority date | Dec 6, 2012 |
| Publication date | Jul 18, 2017 |
| Grant date | Jul 18, 2017 |
Official abstract text for this publication.
A two-way speech-to-speech (S2S) translation system actively detects a wide variety of common error types and resolves them through user-friendly dialog with the user(s). Examples include features including one or more of detecting out-of-vocabulary (OOV) named entities and terms, sensing ambiguities, homophones, idioms, ill-formed input, etc. and interactive strategies for recovering from such errors. In some examples, different error types are prioritized and systems implementing the approach can include an extensible architecture for implementing these decisions.
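The prioritized detect-then-clarify flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the toy vocabulary, the homophone table, the priority values, the prompt wording, and the names `Detection`, `analyze`, and `clarification_prompt`. Only two of the listed error types (OOV words and homophones) are covered.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical toy resources; a real system would use ASR/MT vocabularies.
VOCAB = {"i", "need", "to", "see", "the", "doctor", "fair", "is", "high"}
HOMOPHONES: Dict[str, List[str]] = {"fair": ["fare"], "fare": ["fair"]}


@dataclass
class Detection:
    error_type: str  # e.g. "oov" or "homophone"
    span: str        # the word flagged as a potential error
    priority: int    # lower value is resolved first (assumed ordering)


def detect_oov(words: List[str]) -> List[Detection]:
    """Flag words that are missing from the toy vocabulary."""
    return [Detection("oov", w, priority=0) for w in words if w not in VOCAB]


def detect_homophone(words: List[str]) -> List[Detection]:
    """Flag words that have a known homophone."""
    return [Detection("homophone", w, priority=1) for w in words if w in HOMOPHONES]


# Extensible list of detectors: adding an error type means adding a function.
DETECTORS: List[Callable[[List[str]], List[Detection]]] = [detect_oov, detect_homophone]


def analyze(utterance: str) -> List[Detection]:
    """Run every detector and return detections ordered by priority."""
    words = utterance.lower().split()
    found = [d for detect in DETECTORS for d in detect(words)]
    return sorted(found, key=lambda d: d.priority)


def clarification_prompt(d: Detection) -> str:
    """Build a user-facing question for one detection (wording is illustrative)."""
    if d.error_type == "oov":
        return f'I did not recognize "{d.span}". Is it a name?'
    if d.error_type == "homophone":
        options = " or ".join([d.span] + HOMOPHONES[d.span])
        return f"Did you mean {options}?"
    return f'Please rephrase "{d.span}".'


if __name__ == "__main__":
    for detection in analyze("see okafor the fair"):
        print(detection.error_type, "->", clarification_prompt(detection))
```

Keeping the detectors in a plain list is one simple way to model the "extensible architecture" the abstract mentions; the patent itself does not specify this structure.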
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for linguistic processing for speech-to-speech translation, the method comprising: receiving a linguistic input comprising a sequence of words in a first language from a first user, the linguistic input comprising a first audio input including a speech utterance by the first user; determining a first data representation of the linguistic input; processing, using a computer-implemented analyzer, the first data representation to identify at least part of the data representation as being potentially associated with an error of processing of the linguistic input, wherein the processing comprises identifying said part as at least one characteristic of (a) including out-of-vocabulary (OOV) words, (b) representing a named entity, (c) including a homophone, (d) having an ambiguous word sense, and (e) including an idiom in the first language; performing further processing, using a computer-implemented recovery strategy processor, of the identified at least part of the first data representation to form a modified data representation of the linguistic input; using a machine translator to form a second data representation of the modified data representation; and processing the second data representation using the recovery strategy processor to refine the second data representation through at least one of automated processing or user-assisted processing; determining a linguistic output from the refined second data representation, the linguistic output comprising a sequence of words in a second language; and providing the linguistic output to a second user, the linguistic output comprising a synthesized second audio signal including speech output, wherein identifying said part as including an idiom in the first language comprises performing rule-based idiom expansion and performing statistical idiom detection.

2. The method of claim 1 wherein the first data representation comprises a text representation in the first language.

3. The method of claim 1 wherein the method comprises speech-to-speech translation, and wherein the linguistic input comprises a first audio input including a speech utterance by the first user and the linguistic output comprises a synthesized second audio signal including speech output.

4. The method of claim 3 wherein determining the first data representation of the linguistic input comprises recognizing, using a speech to text module of the computer, the speech utterance in the first audio signal to form the first text representation, and wherein determining the linguistic output comprises using a text to speech module of the computer to form the second audio signal from the modified data representation.

5. The method of claim 1 wherein performing the further processing includes selecting and performing a recovery strategy according the identified characteristic.

6. The method of claim 5 wherein performing the recovery strategy includes soliciting and receiving input for the recovery strategy from a user.

7. The method of claim 6 wherein the user from whom the input for the recovery strategy is solicited and received is the first user.

8. The method of claim 6 wherein performing the recovery strategy includes soliciting and receiving input for the recovery strategy from one or both of the first user and a second user to whom a linguistic output based on the second data representation is presented.

9. The method of claim 5 wherein performing the recovery strategy includes identifying a part of the first data representation with a corresponding part of the linguistic input and wherein forming the data representing the linguistic output comprises forming said data to transfer the part of the linguistic input to a linguistic output without translation.

10. The method of claim 9 wherein the method comprises a speech-to-speech translation system, and wherein the linguistic input comprises a first audio input signal including a speech utterance by the first user and the linguistic output comprises a synthesized second audio signal including speech output, and wherein the second audio signal further comprises a part of the audio input signal.

11. The method of claim 1 wherein performing the further processing includes performing a constrained linguistic translation of the linguistic input.

12. The method of claim 5 wherein performing the recovery strategy includes soliciting and receiving input for the recovery strategy from the first user for disambiguation of a homophone, ambiguous word sense, or an idiom in the first language.

13. The method of claim 1, wherein processing the first data representation further comprises at least one of: tagging one or more unambiguous words over a span of words within the first data representation; tagging one or more ambiguous words over the span of words within the first data representation; and predicting the one or more tagged ambiguous words based, at least in part, upon the one or more tagged unambiguous words.

14. The method of claim 13, wherein predicting the one or more tagged ambiguous words comprises one or more of: (i) determining a phrase pair associated with each of the one or more tagged ambiguous words; (ii) searching an inventory of source phrase keywords; (iii) determining a prediction based upon a supervised model; and (iv) receiving input from the first user.

15. The method of claim 1, wherein performing rule-based idiom expansion comprises performing rule-based idiom expansion comprising performing pronoun expansion and performing verb expansion.

16. The method of claim 1, wherein processing the first data representation further comprises identifying one or more incomplete utterances within the first data representation by identifying fragments with ungrammatical structure.

17. The method of claim 1, wherein the processing further comprises identifying said part as at least two characteristics of (a) including out-of-vocabulary (OOV) words, (b) representing a named entity, (c) including a homophone, (d) having an ambiguous word sense, and (e) including an idiom in the first language.

18. Software stored on a non-transitory computer-readable medium comprising instructions for causing a computer processor to perform a linguistic processing for speech-to-speech translation including: receiving first data representing a linguistic input comprising a sequence of words in a first language from a first user, the linguistic input comprising a first audio input including a speech utterance by the first user; determining a first data representation of the linguistic input; processing, using a computer-implemented analyzer, the first data representation to identify at least part of the data representation as being potentially associated with an error of processing of the linguistic input; performing further processing, using a computer-implemented recovery strategy processor, of the identified as least part of the first data representation to form a modified data representation of the linguistic input, wherein the processing comprises identifying said part as at least one characteristic of (a) including out-of-vocabulary (OOV) words, (b) representing a named entity, (c) including a homophone, (d) having an ambiguous word sense, and (e) including an idiom in the first language; using a machine translator to form a second data representation of the modified data representation; processing the second data representation using the recovery strategy processor to refine the second data representation through at least one of automated proce
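
The claims name rule-based idiom expansion with pronoun expansion and verb expansion but, in the text shown here, do not say how it is carried out. One plausible reading, offered only as a sketch, is to expand an idiom template into its surface variants and match them against the recognized text. The template syntax (`<POSS>`, `<V:lemma>`), the word lists, and the function names `expand_idiom` and `find_idioms` are all hypothetical, and the statistical idiom detection that claim 1 also requires is not modeled.

```python
from itertools import product
from typing import Dict, List, Set

# Hypothetical expansion tables (assumptions, not from the patent).
POSSESSIVES: List[str] = ["my", "your", "his", "her", "its", "our", "their"]
VERB_FORMS: Dict[str, List[str]] = {
    "pull": ["pull", "pulls", "pulled", "pulling"],
}


def expand_idiom(template: str) -> Set[str]:
    """Expand a template such as '<V:pull> <POSS> leg' into surface strings.

    '<POSS>' is replaced by each possessive pronoun (pronoun expansion) and
    '<V:lemma>' by each inflected form of the verb (verb expansion).
    """
    slots: List[List[str]] = []
    for token in template.split():
        if token == "<POSS>":
            slots.append(POSSESSIVES)
        elif token.startswith("<V:") and token.endswith(">"):
            slots.append(VERB_FORMS[token[3:-1]])
        else:
            slots.append([token])
    return {" ".join(choice) for choice in product(*slots)}


def find_idioms(utterance: str, templates: List[str]) -> List[str]:
    """Return every expanded idiom variant that occurs in the utterance."""
    # Pad with spaces so that matches respect word boundaries.
    text = " " + " ".join(utterance.lower().split()) + " "
    hits: List[str] = []
    for template in templates:
        for surface in sorted(expand_idiom(template)):
            if f" {surface} " in text:
                hits.append(surface)
    return hits


if __name__ == "__main__":
    templates = ["<V:pull> <POSS> leg"]
    print(find_idioms("He was pulling my leg", templates))
```

A detected span could then be routed to a recovery strategy, for example asking the first user to rephrase the idiom literally, which is the kind of user-assisted disambiguation claim 12 describes.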
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title
Assessment or evaluation of speech recognition systems · CPC title
Feedback of the input speech · CPC title
Physics · mapped topic