Machine translation using neural network models

US11138392B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11138392-B2
Application number: US-201916521780-A
Country: US
Kind code: B2
Filing date: Jul 25, 2019
Priority date: Jul 26, 2018
Publication date: Oct 5, 2021
Grant date: Oct 5, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for machine translation using neural networks. In some implementations, a text in one language is translated into a second language using a neural network model. The model can include an encoder neural network comprising a plurality of bidirectional recurrent neural network layers. The encoding vectors are processed using a multi-headed attention module configured to generate multiple attention context vectors for each encoding vector. A decoder neural network generates a sequence of decoder output vectors using the attention context vectors. The decoder output vectors can represent distributions over various language elements of the second language, allowing a translation of the text into the second language to be determined based on the sequence of decoder output vectors.
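
The last sentence of the abstract says a translation can be determined from the sequence of decoder output vectors, each representing a distribution over language elements. As a minimal sketch of one way that step could work, the Python below picks the most probable element at every step (greedy decoding). The abstract does not name a decoding strategy, so greedy selection, the toy vocabulary, the end-of-sequence marker `</s>`, and the function name `greedy_decode` are all assumptions made for illustration.

```python
from typing import List, Sequence


def greedy_decode(output_vectors: Sequence[Sequence[float]],
                  vocab: Sequence[str],
                  eos: str = "</s>") -> List[str]:
    """Pick the most probable language element at each step, stopping at EOS.

    Each output vector is treated as a probability distribution over `vocab`.
    Greedy selection is an assumption; other search strategies are possible.
    """
    tokens: List[str] = []
    for dist in output_vectors:
        # Index of the largest probability in this step's distribution.
        best = max(range(len(dist)), key=dist.__getitem__)
        token = vocab[best]
        if token == eos:
            break
        tokens.append(token)
    return tokens


# Hypothetical three-element vocabulary and three decoder steps.
VOCAB = ["</s>", "hola", "mundo"]
OUTPUTS = [
    [0.1, 0.7, 0.2],  # step 1: "hola" is most probable
    [0.2, 0.1, 0.7],  # step 2: "mundo" is most probable
    [0.8, 0.1, 0.1],  # step 3: end of sequence
]
TRANSLATION = " ".join(greedy_decode(OUTPUTS, VOCAB))
```

Here the language elements are whole words; claim 11 notes they could equally be characters, word pieces, or phrases, in which case only the vocabulary and the joining step would change.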

First claim

Opening claim text (preview).

What is claimed is:

1. A method for performing machine translation of a text from a first language to a second language, the method being performed by one or more computers, the method comprising: obtaining, by the one or more computers, a series of feature vectors representing characteristics of the text in a first language; generating, by the one or more computers, encoding vectors from the feature vectors by processing the feature vectors with an encoder neural network comprising a plurality of bidirectional recurrent neural network layers, each encoding vector having a predetermined number of values; processing, by the one or more computers, the encoding vectors using a multi-headed attention module configured to generate multiple attention context vectors for each encoding vector, wherein the multi-headed attention module includes multiple sets of parameters, and the multiple sets of parameters are respectively used to generate attention context vectors from different subsets of each encoding vector; generating, by the one or more computers, a sequence of output vectors using a decoder neural network that receives the attention context vectors, the decoder neural network comprising a plurality of unidirectional recurrent neural network layers, the output vectors representing distributions over various language elements of the second language; and determining, by the one or more computers, a translation of the text into the second language based on the sequence of output vectors.

2. The method of claim 1, further comprising: storing data indicating the translation in a data retrieval system; accessing the stored data indicating the translation; and providing the translation to one or more client devices over a communication network.

3. The method of claim 1, wherein each of the plurality of bidirectional recurrent neural network layers in the encoder neural network comprises a forward layer and a backward layer; and wherein, for each of the plurality of bidirectional recurrent neural network layers in the encoder neural network, the outputs of the forward layer and the backward layer are concatenated before being fed into the next layer.

4. The method of claim 1, wherein the plurality of bidirectional recurrent neural network layers of the encoder neural network comprise long short-term memory (LSTM) layers.

5. The method of claim 4, wherein the encoder neural network is configured to not apply a non-linearity to the output of the LSTM layers.

6. The method of claim 1, wherein the parameters of the multi-headed attention module are weighting values, and the multi-headed attention module applies the different sets of the parameters to different non-overlapping continuous chunks of the encoding vectors.

7. The method of claim 1, wherein the multi-headed attention module comprises multiple chunk processors, each chunk processor comprising a separately trained neural network, each of the chunk processors generating a different one of the attention context vectors for each encoding vector.

8. The method of claim 1, wherein the multi-headed attention module generates the attention context vectors for a processing step based on (i) the encoding vector output by the encoder neural network for the processing step and (ii) a state of a first layer of the decoder neural network.

9. The method of claim 1, wherein the decoder neural network is configured to receive the attention context vectors, concatenated together, at each of the unidirectional recurrent neural network layers and at a softmax layer providing output of the decoder neural network.

10. The method of claim 1, wherein the encoder neural network and the decoder neural network include LSTM elements or gated recurrent unit (GRU) elements.

11. The method of claim 1, wherein language elements of the second language comprise characters, word pieces, words, or phrases.

12. The method of claim 1, wherein the encoder neural network and the decoder neural network applies per-gate layer normalization for each LSTM cell of the LSTM layers.

13. The method of claim 1, wherein the encoder neural network and the decoder neural network include a normalization layer between each recurrent hidden neural network layer, the normalization layers configured to shift activations to a range that avoids saturation of a squashing function for propagation to a subsequent neural network layer.

14. The method of claim 1, wherein the encoder neural network, multi-headed attention module, and/or the decoder neural network have been trained using synchronous training.

15. The method of claim 1, wherein the encoder neural network, multi-headed attention module, and/or the decoder neural network have been trained using a learning rate that increases gradually over the course of training.

16. The method of claim 1, wherein the encoder neural network, multi-headed attention module, and/or the decoder neural network have been trained using label smoothing that introduces variability into target labels.

17. The method of claim 16, wherein label smoothing manipulates an input vector for a neural network by altering or replacing one or more elements of the input vector.

18. The method of claim 1, wherein the encoder neural network comprises a first encoder module and a second encoder module, wherein the first encoder module and the second encoder module have different neural network topologies; wherein the first encoder module uses a transformer layer structure and has layers that each include (i) a self-attention network sub-layer and (ii) a feed-forward network sub-layer; and wherein the second encoder module includes a series of bidirectional recurrent neural network layers each providing normalization before processing by the next recurrent layer.

19. A system comprising: one or more computers; and one or more data storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations that include: obtaining, by the one or more computers, a series of feature vectors representing characteristics of a text in a first language; generating, by the one or more computers, encoding vectors from the feature vectors by processing the feature vectors with an encoder neural network comprising a plurality of bidirectional recurrent neural network layers, each encoding vector having a predetermined number of values; processing, by the one or more computers, the encoding vectors using a multi-headed attention module configured to generate multiple attention context vectors for each encoding vector, wherein the multi-headed attention module includes multiple sets of parameters, and the multiple sets of parameters are respectively used to generate attention context vectors from different subsets of each encoding vector; generating, by the one or more computers, a sequence of output vectors using a decoder neural network that receives the attention context vectors, the decoder neural network comprising a plurality of unidirectional recurrent neural network layers, the output vectors representing distributions over various language elements of a second language; and determining, by the one or more computers, a translation of the text into the second language based on the sequence of output vectors.

20. One or more non-transitory computer-readable media storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations that include: obtaining, by the one or mo
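
Claims 1, 6, 8, and 9 together describe a multi-headed attention module in which each head has its own parameters, works on a different non-overlapping contiguous chunk of every encoding vector, is driven by a decoder state, and contributes a context vector that is concatenated with the others. A minimal Python sketch of that idea for a single decoder step follows. The claims do not specify a scoring function, so scaled dot-product scoring is an assumption here, as are the per-head projection matrices `head_params`, the helper names, and the toy values.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_chunks(vec: Vector, num_heads: int) -> List[Vector]:
    """Split a vector into non-overlapping contiguous chunks, one per head."""
    size, rem = divmod(len(vec), num_heads)
    if rem:
        raise ValueError("vector length must be divisible by num_heads")
    return [vec[h * size:(h + 1) * size] for h in range(num_heads)]


def matvec(matrix: List[Vector], vec: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def multi_head_context(encodings: List[Vector],
                       decoder_state: Vector,
                       head_params: List[List[Vector]]) -> Vector:
    """Return the concatenated per-head context vectors for one decoder step.

    head_params[h] is a hypothetical projection matrix owned by head h; it maps
    the decoder state to a query the same size as that head's chunk.
    """
    num_heads = len(head_params)
    chunked = [split_chunks(enc, num_heads) for enc in encodings]
    context: Vector = []
    for h, params in enumerate(head_params):
        query = matvec(params, decoder_state)
        keys = [chunks[h] for chunks in chunked]  # head h sees only chunk h
        scale = math.sqrt(len(query))
        # Assumed scoring: scaled dot product between query and each chunk.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
        weights = softmax(scores)
        # Weighted sum of the chunks gives this head's context vector.
        head_context = [sum(w * key[d] for w, key in zip(weights, keys))
                        for d in range(len(query))]
        context.extend(head_context)  # concatenate the heads' contexts
    return context


# Toy input: two encoding vectors of four values, two heads, chunk size two.
ENCODINGS = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
DECODER_STATE = [1.0, 0.0]
HEAD_PARAMS = [
    [[4.0, 0.0], [0.0, 0.0]],  # head 0 parameters
    [[0.0, 0.0], [4.0, 0.0]],  # head 1 parameters
]
CONTEXT = multi_head_context(ENCODINGS, DECODER_STATE, HEAD_PARAMS)
```

With these toy values both heads put most of their attention weight on the first encoding vector, and the result has four values because two chunk-sized context vectors are concatenated. In a trained system the per-head parameters would be learned rather than hand-set.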

Assignees

Google LLC

Inventors

Classifications

  • Combinations of networks

  • Recurrent networks, e.g. Hopfield networks

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

  • Supervised learning

  • Auto-encoder networks; Encoder-decoder networks

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11138392B2 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for machine translation using neural networks. In some implementations, a text in one language is translated into a second language using a neural network model. The model can include an encoder neural network comprising a plurality of bidirectional recurrent neural network layers. The encoding vecto…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06F40/44. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 5, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).