Using meta-information in neural machine translation
US-2017323203-A1 · Nov 9, 2017 · US
US10713593B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10713593-B2 |
| Application number | US-201615394708-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 29, 2016 |
| Priority date | Nov 4, 2016 |
| Publication date | Jul 14, 2020 |
| Grant date | Jul 14, 2020 |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on computer storage media for performing machine learning tasks. One method includes receiving (i) a model input, and (ii) data identifying a first machine learning task to be performed on the model input to generate a first type of model output for the model input; augmenting the model input with an identifier for the first machine learning task to generate an augmented model input; and processing the augmented model input using a machine learning model, wherein the machine learning model has been trained on training data to perform a plurality of machine learning tasks including the first machine learning task, and wherein the machine learning model has been configured through training to process the augmented model input to generate a machine learning model output of the first type for the model input.
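The augmentation step in the abstract can be illustrated with a minimal sketch. The function name `augment_with_task` and the `<task:...>` token spelling are hypothetical and chosen only for illustration. The abstract does not say where the identifier is placed, so prepending is used here as one possible choice.

```python
from typing import List


def augment_with_task(tokens: List[str], task_id: str) -> List[str]:
    """Return a new token list with a task-identifier token added in front.

    Assumption: the identifier is encoded as a single extra token and is
    prepended. The "<task:...>" spelling is illustrative only.
    """
    return [f"<task:{task_id}>"] + list(tokens)


# The augmented input is what a single multi-task model would consume.
augmented = augment_with_task(["Hello", "world"], "translate_to_es")
print(augmented)
```

In this sketch the model itself needs no per-task switch: the requested task travels with the input as an ordinary token.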
Claim text.
What is claimed is:

1. A computer-implemented method comprising: receiving (i) a model input comprising text in a source language, and (ii) data identifying a target language that the text in the source language is to be translated into by the machine learning model; augmenting the model input with an identifier that identifies at least the target language to generate an augmented model input; and processing the augmented model input using a machine learning model to generate a model output that is a translation of the model input into the target language, wherein the machine learning model has been trained on training data to translate model inputs into a plurality of different languages including the target language, and wherein the machine learning model comprises: an encoder neural network; and a decoder neural network that is shared between the plurality of different languages and that is configured to generate outputs from a shared vocabulary that includes outputs from all of the plurality of different languages.
2. The method of claim 1, wherein augmenting the model input with an identifier comprises prepending a token identifier that identifies at least the target language to the model input.
3. The method of claim 1, wherein the training data comprises a plurality of paired datasets, wherein each of the paired datasets comprises an input dataset paired with an output dataset, and wherein the plurality of paired datasets does not include a pairing of datasets comprising an input dataset in the source language paired with an output dataset in the target language.
4. The method of claim 1, wherein the encoder neural network and the decoder neural network comprise respective recurrent neural networks.
5. The method of claim 1, wherein the machine learning model has been trained on the training data to translate model inputs in a first plurality of different languages including the source language into any of the plurality of different languages that include the target language.
6. The method of claim 5, wherein the identifier identifies both the source language and the target language.
7. The method of claim 5, wherein the encoder neural network is shared among the first plurality of different languages.
8. A system comprising one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving (i) a model input comprising text in a source language, and (ii) data identifying a target language that the text in the source language is to be translated into by the machine learning model; augmenting the model input with an identifier that identifies at least the target language to generate an augmented model input; and processing the augmented model input using a machine learning model to generate a model output that is a translation of the model input into the target language, wherein the machine learning model has been trained on training data to translate model inputs into a plurality of different languages including the target language, and wherein the machine learning model comprises: an encoder neural network; and a decoder neural network that is shared between the plurality of different languages and that is configured to generate outputs from a shared vocabulary that includes outputs from all of the plurality of different languages.
9. The system of claim 8, wherein the encoder neural network and the decoder neural network comprise respective recurrent neural networks.
10. The system of claim 8, wherein the decoder neural network comprises an attention mechanism.
11. The system of claim 8, wherein the augmented model input comprises a model input with a prepended token identifier for at least the target language.
12. The system of claim 8, wherein the machine learning model has been trained on the training data to translate model inputs in a first plurality of different languages including the source language into any of the plurality of different languages that include the target language.
13. The system of claim 12, wherein the identifier identifies both the source language and the target language.
14. The system of claim 12, wherein the encoder neural network is shared among the first plurality of different languages.
15. One or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: receiving (i) a model input comprising text in a source language, and (ii) data identifying a target language that the text in the source language is to be translated into by the machine learning model; augmenting the model input with an identifier that identifies at least the target language to generate an augmented model input; and processing the augmented model input using a machine learning model to generate a model output that is a translation of the model input into the target language, wherein the machine learning model has been trained on training data to translate model inputs into a plurality of different languages including the target language, and wherein the machine learning model comprises: an encoder neural network; and a decoder neural network that is shared between the plurality of different languages and that is configured to generate outputs from a shared vocabulary that includes outputs from all of the plurality of different languages.
16. The computer-readable storage media of claim 15, wherein augmenting the model input with an identifier comprises prepending a token identifier that identifies at least the target language to the model input.
17. The computer-readable storage media of claim 15, wherein the training data comprises a plurality of paired datasets, wherein each of the paired datasets comprises an input dataset paired with an output dataset, and wherein the plurality of paired datasets does not include a pairing of datasets comprising an input dataset in the source language paired with an output dataset in the target language.
18. The computer-readable storage media of claim 15, wherein the machine learning model has been trained on the training data to translate model inputs in a first plurality of different languages including the source language into any of the plurality of different languages that include the target language.
19. The computer-readable storage media of claim 17, wherein the identifier identifies both the source language and the target language.
20. The computer-readable storage media of claim 17, wherein the encoder neural network is shared among the first plurality of different languages.
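As a rough illustration of the data-preparation side of these claims, the sketch below prepends a target-language token (claims 2, 11, 16), builds a shared output vocabulary as a union over languages (claims 1, 8, 15), and checks whether a requested translation direction is absent from the paired training datasets (claims 3, 17). All names (`prepend_target_token`, `build_shared_vocab`, `is_unseen_direction`), the `<2xx>` token spelling, and the toy data are assumptions made for illustration. The claims do not prescribe a token format or a vocabulary-building procedure, and the encoder and decoder networks are not modeled here.

```python
from typing import Dict, Iterable, List, Set, Tuple


def prepend_target_token(source_tokens: List[str], target_lang: str) -> List[str]:
    """Prepend a token naming the target language (token format is assumed)."""
    return [f"<2{target_lang}>"] + list(source_tokens)


def build_shared_vocab(tokens_by_lang: Dict[str, Iterable[str]]) -> Set[str]:
    """Union of output tokens over all languages: one vocabulary for one decoder."""
    vocab: Set[str] = set()
    for tokens in tokens_by_lang.values():
        vocab.update(tokens)
    return vocab


def is_unseen_direction(trained_pairs: Set[Tuple[str, str]], src: str, tgt: str) -> bool:
    """True if no paired dataset covers src -> tgt directly."""
    return (src, tgt) not in trained_pairs


# Hypothetical training directions: Portuguese and Spanish are each paired
# only with English, never with each other.
trained_pairs = {("en", "es"), ("es", "en"), ("en", "pt"), ("pt", "en")}

# Toy per-language output tokens, for illustration only.
shared_vocab = build_shared_vocab({
    "en": ["good", "morning"],
    "es": ["buenos", "dias"],
    "pt": ["bom", "dia"],
})

# Requesting pt -> es: the direction is unseen in training, yet the input
# is prepared exactly like any other request.
model_input = prepend_target_token(["bom", "dia"], "es")
print(model_input, is_unseen_direction(trained_pairs, "pt", "es"))
```

The point of the sketch is that an unseen direction needs no special handling at input time: the same token-prepending step is used whether or not the training data paired those two languages.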
Combinations of networks · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Supervised learning · CPC title