US11138392B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11138392-B2 |
| Application number | US-201916521780-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 25, 2019 |
| Priority date | Jul 26, 2018 |
| Publication date | Oct 5, 2021 |
| Grant date | Oct 5, 2021 |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for machine translation using neural networks. In some implementations, a text in one language is translated into a second language using a neural network model. The model can include an encoder neural network comprising a plurality of bidirectional recurrent neural network layers. The encoding vectors are processed using a multi-headed attention module configured to generate multiple attention context vectors for each encoding vector. A decoder neural network generates a sequence of decoder output vectors using the attention context vectors. The decoder output vectors can represent distributions over various language elements of the second language, allowing a translation of the text into the second language to be determined based on the sequence of decoder output vectors.
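The multi-headed attention step in the abstract can be illustrated with a minimal sketch. It assumes that each head attends over one contiguous chunk of every encoding vector, as claim 6 describes, and it uses a dot-product score against a projected decoder state. The scoring function, the `head_params` matrices, and all function names are hypothetical choices made for illustration; the abstract does not specify them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vec):
    return [dot(row, vec) for row in matrix]


def split_chunks(vec, num_heads):
    """Split a vector into num_heads non-overlapping contiguous chunks."""
    size = len(vec) // num_heads
    return [vec[h * size:(h + 1) * size] for h in range(num_heads)]


def multi_head_context(encodings, query, head_params):
    """Return one attention context vector per head.

    encodings:   list of T encoding vectors, each of dimension D
    query:       decoder state of dimension D // H (assumed shape)
    head_params: list of H square matrices, one parameter set per head
    """
    num_heads = len(head_params)
    chunked = [split_chunks(e, num_heads) for e in encodings]
    contexts = []
    for h, weights_h in enumerate(head_params):
        # Each head applies its own parameters to the query.
        q_h = matvec(weights_h, query)
        # Each head sees only its own chunk of every encoding vector.
        chunks_h = [chunks[h] for chunks in chunked]
        attn = softmax([dot(q_h, c) for c in chunks_h])
        size = len(chunks_h[0])
        contexts.append(
            [sum(a * c[i] for a, c in zip(attn, chunks_h)) for i in range(size)]
        )
    return contexts


identity = [[1.0, 0.0], [0.0, 1.0]]
encodings = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
contexts = multi_head_context(encodings, [1.0, 0.0], [identity, identity])
concatenated = [value for ctx in contexts for value in ctx]
```

In this sketch the per-head context vectors are concatenated before being passed on, which matches the concatenation described in claim 9. A trained system would learn the per-head parameters rather than use identity matrices.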
Opening claim text (preview).
What is claimed is:

1. A method for performing machine translation of a text from a first language to a second language, the method being performed by one or more computers, the method comprising: obtaining, by the one or more computers, a series of feature vectors representing characteristics of the text in a first language; generating, by the one or more computers, encoding vectors from the feature vectors by processing the feature vectors with an encoder neural network comprising a plurality of bidirectional recurrent neural network layers, each encoding vector having a predetermined number of values; processing, by the one or more computers, the encoding vectors using a multi-headed attention module configured to generate multiple attention context vectors for each encoding vector, wherein the multi-headed attention module includes multiple sets of parameters, and the multiple sets of parameters are respectively used to generate attention context vectors from different subsets of each encoding vector; generating, by the one or more computers, a sequence of output vectors using a decoder neural network that receives the attention context vectors, the decoder neural network comprising a plurality of unidirectional recurrent neural network layers, the output vectors representing distributions over various language elements of the second language; and determining, by the one or more computers, a translation of the text into the second language based on the sequence of output vectors.
2. The method of claim 1, further comprising: storing data indicating the translation in a data retrieval system; accessing the stored data indicating the translation; and providing the translation to one or more client devices over a communication network.
3. The method of claim 1, wherein each of the plurality of bidirectional recurrent neural network layers in the encoder neural network comprises a forward layer and a backward layer; and wherein, for each of the plurality of bidirectional recurrent neural network layers in the encoder neural network, the outputs of the forward layer and the backward layer are concatenated before being fed into the next layer.
4. The method of claim 1, wherein the plurality of bidirectional recurrent neural network layers of the encoder neural network comprise long short-term memory (LSTM) layers.
5. The method of claim 4, wherein the encoder neural network is configured to not apply a non-linearity to the output of the LSTM layers.
6. The method of claim 1, wherein the parameters of the multi-headed attention module are weighting values, and the multi-headed attention module applies the different sets of the parameters to different non-overlapping continuous chunks of the encoding vectors.
7. The method of claim 1, wherein the multi-headed attention module comprises multiple chunk processors, each chunk processor comprising a separately trained neural network, each of the chunk processors generating a different one of the attention context vectors for each encoding vector.
8. The method of claim 1, wherein the multi-headed attention module generates the attention context vectors for a processing step based on (i) the encoding vector output by the encoder neural network for the processing step and (ii) a state of a first layer of the decoder neural network.
9. The method of claim 1, wherein the decoder neural network is configured to receive the attention context vectors, concatenated together, at each of the unidirectional recurrent neural network layers and at a softmax layer providing output of the decoder neural network.
10. The method of claim 1, wherein the encoder neural network and the decoder neural network include LSTM elements or gated recurrent unit (GRU) elements.
11. The method of claim 1, wherein language elements of the second language comprise characters, word pieces, words, or phrases.
12. The method of claim 1, wherein the encoder neural network and the decoder neural network applies per-gate layer normalization for each LSTM cell of the LSTM layers.
13. The method of claim 1, wherein the encoder neural network and the decoder neural network include a normalization layer between each recurrent hidden neural network layer, the normalization layers configured to shift activations to a range that avoids saturation of a squashing function for propagation to a subsequent neural network layer.
14. The method of claim 1, wherein the encoder neural network, multi-headed attention module, and/or the decoder neural network have been trained using synchronous training.
15. The method of claim 1, wherein the encoder neural network, multi-headed attention module, and/or the decoder neural network have been trained using a learning rate that increases gradually over the course of training.
16. The method of claim 1, wherein the encoder neural network, multi-headed attention module, and/or the decoder neural network have been trained using label smoothing that introduces variability into target labels.
17. The method of claim 16, wherein label smoothing manipulates an input vector for a neural network by altering or replacing one or more elements of the input vector.
18. The method of claim 1, wherein the encoder neural network comprises a first encoder module and a second encoder module, wherein the first encoder module and the second encoder module have different neural network topologies; wherein the first encoder module uses a transformer layer structure and has layers that each include (i) a self-attention network sub-layer and (ii) a feed-forward network sub-layer; and wherein the second encoder module includes a series of bidirectional recurrent neural network layers each providing normalization before processing by the next recurrent layer.
19. A system comprising: one or more computers; and one or more data storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations that include: obtaining, by the one or more computers, a series of feature vectors representing characteristics of a text in a first language; generating, by the one or more computers, encoding vectors from the feature vectors by processing the feature vectors with an encoder neural network comprising a plurality of bidirectional recurrent neural network layers, each encoding vector having a predetermined number of values; processing, by the one or more computers, the encoding vectors using a multi-headed attention module configured to generate multiple attention context vectors for each encoding vector, wherein the multi-headed attention module includes multiple sets of parameters, and the multiple sets of parameters are respectively used to generate attention context vectors from different subsets of each encoding vector; generating, by the one or more computers, a sequence of output vectors using a decoder neural network that receives the attention context vectors, the decoder neural network comprising a plurality of unidirectional recurrent neural network layers, the output vectors representing distributions over various language elements of a second language; and determining, by the one or more computers, a translation of the text into the second language based on the sequence of output vectors.
20. One or more non-transitory computer-readable media storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations that include: obtaining, by the one or mo
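
The training techniques named in claims 15 and 16 can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the label smoothing shown is the conventional form that mixes a one-hot target with a uniform distribution, and the learning-rate schedule is a simple linear ramp that holds at a peak value. The function names, the `epsilon` default, and the hold-after-warmup behavior are all hypothetical.

```python
def smooth_labels(target_index, vocab_size, epsilon=0.1):
    """Conventional label smoothing (an assumed form).

    Moves a fraction epsilon of the probability mass from the correct
    label to a uniform distribution over the whole vocabulary.
    """
    uniform_share = epsilon / vocab_size
    distribution = [uniform_share] * vocab_size
    distribution[target_index] += 1.0 - epsilon
    return distribution


def warmup_learning_rate(step, peak_rate, warmup_steps):
    """Linear ramp to peak_rate, then constant (an assumed schedule)."""
    if step >= warmup_steps:
        return peak_rate
    return peak_rate * (step + 1) / warmup_steps


smoothed = smooth_labels(target_index=2, vocab_size=4, epsilon=0.1)
schedule = [warmup_learning_rate(s, peak_rate=1.0, warmup_steps=4) for s in range(6)]
```

With these inputs the correct label keeps most of the probability mass while every other label receives a small nonzero share, and the learning rate rises over the first four steps before levelling off.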
Combinations of networks · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Supervised learning · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title