System and method for incrementally updating a reordering model for a statistical machine translation system
US-2016140111-A1 · May 19, 2016 · US
US10460040B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10460040-B2 |
| Application number | US-201615194249-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 27, 2016 |
| Priority date | Jun 27, 2016 |
| Publication date | Oct 29, 2019 |
| Grant date | Oct 29, 2019 |
Official abstract text for this publication.
Exemplary embodiments relate to techniques for improving machine translation systems. The machine translation system may apply one or more models for translating material from a source language into a destination language. The models are initially trained using training data. According to exemplary embodiments, supplemental training data is used to train the models, where the supplemental training data uses in-domain material to improve the quality of output translations. In-domain data may include data that relates to the same or similar topics as those expected to be encountered in a translation of material from the source language into the destination language. In-domain data may include material previously translated from the source language into the destination language, material similar to previous translations, and destination language material that has previously been the subject of a request for translation into the source language.
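The idea in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the patented implementation: it assumes an add-one smoothed bigram model stands in for the language model, and that a fixed list of candidate strings stands in for the translation model's destination language hypotheses. The names `BigramLM`, `update`, `log_prob`, and `select_hypothesis`, as well as all sample sentences, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


class BigramLM:
    """Toy add-one smoothed bigram language model (illustrative assumption)."""

    def __init__(self):
        self.context_counts = Counter()  # how often each token starts a bigram
        self.bigram_counts = Counter()   # how often each (prev, cur) pair occurs
        self.vocab = set()

    @staticmethod
    def _tokens(sentence):
        return ["<s>"] + sentence.lower().split() + ["</s>"]

    def update(self, sentences):
        """Fold more destination language sentences into the existing counts."""
        for sentence in sentences:
            tokens = self._tokens(sentence)
            self.vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def log_prob(self, sentence):
        """Add-one smoothed log-probability of a sentence."""
        tokens = self._tokens(sentence)
        v = len(self.vocab) + 1  # one extra slot for unseen tokens
        return sum(
            math.log((self.bigram_counts[(p, c)] + 1) / (self.context_counts[p] + v))
            for p, c in zip(tokens, tokens[1:])
        )


def select_hypothesis(lm, hypotheses):
    """Pick the hypothesis the language model scores highest."""
    return max(hypotheses, key=lm.log_prob)


# Hypothetical candidates produced by a translation model.
hypotheses = ["a great game", "a large game"]

lm = BigramLM()
lm.update(["a large house", "a large car"])        # initial training data
print(select_hypothesis(lm, hypotheses))           # a large game

# Hypothetical in-domain supplemental data, e.g. earlier translation output.
lm.update(["what a great game last night", "great game today"])
print(select_hypothesis(lm, hypotheses))           # a great game
```

In this sketch the supplemental sentences only add to the existing counts, so the model is modified without retraining from scratch. A production system would likely use a far larger model and a different smoothing scheme.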
Opening claim text (preview).
The invention claimed is:

1. A method comprising: accessing a translation system, the translation system configured to generate a machine translation of source material from a source language into a destination language, the translation system being trained using destination language training data and comprising: a translation model configured to receive the source material and generating one or more destination language hypotheses for the source material, and a language model configured to select one of the destination language hypotheses based on an analysis of the destination language training data; analyzing supplemental destination language training data for training the language model, the supplemental destination language training data comprising one or more of: monolingual destination language material that has been previously machine translated from the source language, or destination language material for which translation into the source language has been previously requested; and based on the analyzing, modifying the language model to account for the supplemental destination language training data.

2. The method of claim 1, wherein the supplemental destination language training data comprises posts from a social network.

3. The method of claim 1 wherein the translation model is configured to be trained using bilingual training data comprising material in the source language and material in the destination language, and the language model is configured to be trained using monolingual training data consisting of material in the destination language.

4. The method of claim 1, wherein the supplemental destination language training data contains training material in one or more domains associated with the source language.

5. The method of claim 1, wherein the supplemental destination language training data comprises untranslated destination language material that includes topics similar to topics found in translated destination language material.

6. The method of claim 1, wherein: the translation system applies a model selected from a plurality of models for translating the source material into the destination material; the plurality of models comprise: a first language model targeted to a first demographic group, and a second language model targeted to a second demographic group; and further comprising: analyzing demographic information of an originator of a request to translate the source material into the destination language; selecting the first language model or the second language model based on the demographic information; and applying the selected language model to translate the source material.

7. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to: access a translation system, the translation system configured to generate a machine translation of source material from a source language into a destination language, the translation system being trained using destination language training data and comprising: a translation model configured to receive the source material and generating one or more destination language hypotheses for the source material, and a language model configured to select one of the destination language hypotheses based on an analysis of the destination language training data; analyze supplemental destination language training data for training the language model, the supplemental destination language training data comprising one or more of: monolingual destination language material that has been previously machine translated from the source language, or destination language material for which translation into the source language has been previously requested; and based on the analyzing, modify the language model to account for the supplemental destination language training data.

8. The medium of claim 7, wherein the supplemental destination language training data comprises posts from a social network.

9. The medium of claim 7, wherein the translation model is configured to be trained using bilingual training data comprising material in the source language and material in the destination language, and the language model is configured to be trained using monolingual training data consisting of material in the destination language.

10. The medium of claim 7, wherein the supplemental destination language training data contains training material in one or more domains associated with the source language.

11. The medium of claim 7, wherein the supplemental destination language training data comprises untranslated destination language material that includes topics similar to topics found in translated destination language material.

12. The medium of claim 7, wherein: the translation system applies a model selected from a plurality of models for translating the source material into the destination material; the plurality of models comprise: a first language model targeted to a first demographic group, and a second language model targeted to a second demographic group; and further storing instructions for: analyzing demographic information of an originator of a request to translate the source material into the destination language; selecting the first language model or the second language model based on the demographic information; and applying the selected language model to translate the source material.

13. An apparatus comprising: a non-transitory computer-readable medium configured to store logic for implementing a translation system, the translation system configured to generate a machine translation of source material from a source language into a destination language, the translation system being trained using destination language training data and comprising: a translation model configured to receive the source material and generating one or more destination language hypotheses for the source material, and a language model configured to select one of the destination language hypotheses based on an analysis of the destination language training data; a processor configured to: analyze supplemental destination language training data for training the language model, the supplemental destination language training data comprising one or more of: monolingual destination language material that has been previously machine translated from the source language, or destination language material for which translation into the source language has been previously requested; and based on the analyzing, modify the language model to account for the supplemental destination language training data.

14. The apparatus of claim 13, wherein the supplemental destination language training data comprises posts from a social network.

15. The apparatus of claim 13, wherein the supplemental destination language training data contains training material in one or more domains associated with the source language.

16. The apparatus of claim 13, wherein the supplemental destination language training data comprises untranslated destination language material that includes topics similar to topics found in translated destination language material.

17. The apparatus of claim 13, wherein: the translation system applies a model selected from a plurality of models for translating the source material into the destination material; the plurality of models comprise: a first language model targeted to a first demographic group, and a second language model targeted to a second demographic group; and the p
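
The demographic-specific selection recited in claims 6 and 12 (and begun in claim 17) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the claims do not say which demographic attribute is used, so the age field, the two group names, the 25-year threshold, and the word-list scorers below are all hypothetical stand-ins for real language models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A "language model" here is any function that scores a hypothesis string.
Scorer = Callable[[str], float]


@dataclass
class TranslationRequest:
    source_text: str
    requester_age: int  # hypothetical demographic attribute of the originator


def demographic_group(request: TranslationRequest) -> str:
    """Map the originator's demographic information to a group (assumed split)."""
    return "under_25" if request.requester_age < 25 else "25_and_over"


def lexicon_scorer(preferred_words: set) -> Scorer:
    """Toy stand-in for a language model: counts words from a preferred lexicon."""
    return lambda text: float(sum(w in preferred_words for w in text.lower().split()))


def choose_translation(
    request: TranslationRequest,
    hypotheses: List[str],
    models: Dict[str, Scorer],
) -> str:
    """Select the group-specific model, then apply it to pick a hypothesis."""
    scorer = models[demographic_group(request)]
    return max(hypotheses, key=scorer)


models = {
    "under_25": lexicon_scorer({"awesome", "cool"}),
    "25_and_over": lexicon_scorer({"excellent", "pleasant"}),
}
hypotheses = ["that was awesome", "that was excellent"]

print(choose_translation(TranslationRequest("...", 19), hypotheses, models))  # that was awesome
print(choose_translation(TranslationRequest("...", 47), hypotheses, models))  # that was excellent
```

Keeping the models in a dictionary keyed by group makes the selection step a simple lookup, which mirrors the claim's separation of analyzing the demographic information, selecting a model, and applying it.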
Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title
using statistical methods · CPC title
using very large corpora, e.g. the web · CPC title
Statistical methods, e.g. probability models · CPC title
Physics · mapped topic