US11809834B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11809834-B2 |
| Application number | US-202117459041-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 27, 2021 |
| Priority date | Jul 26, 2018 |
| Publication date | Nov 7, 2023 |
| Grant date | Nov 7, 2023 |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for machine translation using neural networks. In some implementations, a text in one language is translated into a second language using a neural network model. The model can include an encoder neural network comprising a plurality of bidirectional recurrent neural network layers. The encoding vectors are processed using a multi-headed attention module configured to generate multiple attention context vectors for each encoding vector. A decoder neural network generates a sequence of decoder output vectors using the attention context vectors. The decoder output vectors can represent distributions over various language elements of the second language, allowing a translation of the text into the second language to be determined based on the sequence of decoder output vectors.
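The multi-headed attention step in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes each head owns one hypothetical weight vector (`head_params`), scores its own contiguous chunk of every encoding vector with a dot product, and returns a softmax-weighted sum of those chunks as that head's context vector. The function names and the scoring rule are illustrative assumptions; a trained model would use learned projections and a decoder-dependent query.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_chunks(vector, num_heads):
    """Split a vector into num_heads contiguous, non-overlapping chunks."""
    if len(vector) % num_heads != 0:
        raise ValueError("vector length must be divisible by num_heads")
    size = len(vector) // num_heads
    return [vector[i * size:(i + 1) * size] for i in range(num_heads)]


def multi_head_contexts(encodings, head_params):
    """Return one context vector per head (hypothetical helper).

    encodings:   list of encoding vectors, all the same length
    head_params: one weight vector per head (an assumed stand-in for a
                 learned parameter set), each as long as one chunk
    """
    num_heads = len(head_params)
    chunked = [split_chunks(e, num_heads) for e in encodings]
    contexts = []
    for h, params in enumerate(head_params):
        keys = [chunks[h] for chunks in chunked]
        # Score this head's chunk of every encoding vector.
        scores = [sum(p * k for p, k in zip(params, key)) for key in keys]
        weights = softmax(scores)
        # The context is the attention-weighted sum of the chunks.
        context = [
            sum(w * key[d] for w, key in zip(weights, keys))
            for d in range(len(params))
        ]
        contexts.append(context)
    return contexts
```

Because each head sees only its own chunk, different heads can attend to different source positions for the same decoding step. The per-head context vectors can then be concatenated before being passed to the decoder.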
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for performing machine translation of text from a first language to a second language, the method comprising: generating, by one or more processors, a set of encoding vectors from a series of feature vectors representing characteristics of a text segment in the first language, by processing the feature vectors with an encoder neural network comprising a set of bidirectional recurrent neural network layers, each encoding vector of the set having a predetermined number of values; generating, by the one or more processors, multiple context vectors for each encoding vector based on multiple sets of parameters, the multiple sets of parameters being respectively used to generate the context vectors from different subsets of each encoding vector; generating, by the one or more processors, a sequence of output vectors using a decoder neural network that receives the context vectors, the decoder neural network comprising a recurrent neural network, the output vectors representing distributions over language elements of the second language; and determining, by the one or more processors, a translation of the text segment into the second language based on the sequence of output vectors.
2. The method of claim 1, further comprising: providing the translation to a client device in response to a translation request.
3. The method of claim 1, wherein the recurrent neural network in the decoder neural network does not perform self-attention.
4. The method of claim 1, wherein the set of bidirectional recurrent neural network layers of the encoder neural network comprise long short-term memory (LSTM) layers.
5. The method of claim 1, wherein the parameters are weighting values, and the different sets of the parameters are applicable to different non-overlapping continuous chunks of the encoding vectors.
6. The method of claim 1, wherein the decoder neural network is configured to receive the context vectors, concatenated together, at a softmax layer providing output of the decoder neural network.
7. The method of claim 1, wherein two or more of the language elements of the second language comprise characters, word pieces, words, or phrases.
8. The method of claim 1, wherein the encoder neural network and the decoder neural network include a normalization layer between each recurrent hidden neural network layer.
9. The method of claim 1, wherein at least one of the encoder neural network and the decoder neural network have been trained using synchronous training.
10. The method of claim 1, wherein at least one of the encoder neural network and the decoder neural network have been trained using a learning rate that increases gradually during training.
11. The method of claim 1, wherein at least one of the encoder neural network and the decoder neural network have been trained using label smoothing that introduces variability into target labels.
12. The method of claim 1, wherein: the encoder neural network comprises a first encoder module and a second encoder module, the first encoder module and the second encoder module having different neural network topologies; the first encoder module uses a transformer layer structure with layers that each include (i) a self-attention network sub-layer and (ii) a feed-forward network sub-layer; and the second encoder module includes a series of bidirectional recurrent neural network layers each providing normalization before processing by a next recurrent layer.
13. A system comprising: one or more processors; and one or more data storage devices storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: generating a set of encoding vectors from a series of feature vectors representing characteristics of a text segment in a first language by processing the series of feature vectors with an encoder neural network comprising a set of bidirectional recurrent neural network layers, each encoding vector of the set having a predetermined number of values; generating multiple context vectors for each encoding vector based on multiple sets of parameters, the multiple sets of parameters being respectively used to generate the context vectors from different subsets of each encoding vector; generating a sequence of output vectors using a decoder neural network that receives the context vectors, the decoder neural network comprising a recurrent neural network, the output vectors representing distributions over language elements of a second language; and determining a translation of the text segment into the second language based on the sequence of output vectors.
14. A computer-implemented method for performing machine translation of text from a first language to a second language, the method comprising: obtaining, by a machine translation module, a source embedding associated with a text segment in the first language, the source embedding having a selected dimensionality; processing, with an encoder neural network of the machine translation module, the source embedding to obtain a set of encoding vectors, the encoder neural network comprising a set of bidirectional recurrent neural network layers including a plurality of forward-propagating layers and a plurality of backward-propagating layers; normalizing, by the machine translation module, the set of encoding vectors; processing, by a transformer encoder of the machine translation module, the normalized set of encoding vectors to obtain transformed data; generating output vectors using a decoder neural network from the transformed data, the decoder neural network comprising a recurrent neural network, the output vectors representing distributions over language elements of the second language; and determining, by the machine translation module, a translation of the text segment into the second language based on the output vectors.
15. The method of claim 14, further comprising providing the translation to a client device in response to a translation request.
16. The method of claim 14, wherein the recurrent neural network of the decoder neural network does not perform self-attention.
17. The method of claim 14, wherein feature extraction performed by the encoder neural network is cascaded to the transformer encoder.
18. The method of claim 14, wherein the set of bidirectional recurrent neural network layers of the encoder neural network comprise long short-term memory (LSTM) layers.
19. The method of claim 14, wherein processing the normalized set of encoding vectors includes performing multi-headed attention.
20. The method of claim 14, wherein: the transformer encoder comprises a first encoder module and a second encoder module, the first encoder module and the second encoder module having different neural network topologies; the first encoder module uses a transformer layer structure with layers that each include (i) a self-attention network sub-layer and (ii) a feed-forward network sub-layer; and the second encoder module includes a series of bidirectional recurrent neural network layers each providing normalization before processing by a next recurrent layer.
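Two of the training techniques named in the dependent claims, a learning rate that increases gradually during training and label smoothing of target labels, can be sketched as follows. The claims do not specify a schedule or a smoothing distribution, so the linear ramp, the uniform spreading of probability mass, and all function and parameter names here are assumptions chosen for illustration.

```python
def warmup_learning_rate(step, peak_rate, warmup_steps):
    """Ramp the rate linearly up to peak_rate, then hold it (assumed schedule).

    step is zero-based; the peak is reached at step warmup_steps - 1.
    """
    if warmup_steps <= 0:
        return peak_rate
    return peak_rate * min(1.0, (step + 1) / warmup_steps)


def smooth_labels(target_index, vocab_size, epsilon):
    """Build a smoothed target distribution (assumed uniform smoothing).

    The true class keeps 1 - epsilon of the probability mass and the
    remaining epsilon is shared equally among all other classes.
    """
    if vocab_size < 2:
        raise ValueError("vocab_size must be at least 2")
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must be in [0, 1)")
    off_value = epsilon / (vocab_size - 1)
    distribution = [off_value] * vocab_size
    distribution[target_index] = 1.0 - epsilon
    return distribution
```

For example, `smooth_labels(1, 5, 0.1)` leaves 0.9 on the target class and 0.025 on each of the four others, so the model is trained against a softened target instead of a one-hot vector.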
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Supervised learning · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title
Learning methods · CPC title