Machine translation using neural network models

US11809834B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11809834-B2
Application number	US-202117459041-A
Country	US
Kind code	B2
Filing date	Aug 27, 2021
Priority date	Jul 26, 2018
Publication date	Nov 7, 2023
Grant date	Nov 7, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for machine translation using neural networks. In some implementations, a text in one language is translated into a second language using a neural network model. The model can include an encoder neural network comprising a plurality of bidirectional recurrent neural network layers. The encoding vectors are processed using a multi-headed attention module configured to generate multiple attention context vectors for each encoding vector. A decoder neural network generates a sequence of decoder output vectors using the attention context vectors. The decoder output vectors can represent distributions over various language elements of the second language, allowing a translation of the text into the second language to be determined based on the sequence of decoder output vectors.
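The multi-headed attention step in the abstract can be illustrated with a minimal sketch. This is a generic illustration, not the patented implementation. The dot-product scoring, the `query` vector (standing in for a decoder state), and the helper names `split_heads` and `multi_head_context` are all assumptions. Each head works on its own contiguous chunk of every encoding vector and produces one context vector.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_heads(vector, num_heads):
    """Split a vector into num_heads equal, non-overlapping contiguous chunks."""
    size = len(vector) // num_heads
    if size * num_heads != len(vector):
        raise ValueError("vector length must be divisible by num_heads")
    return [vector[i * size:(i + 1) * size] for i in range(num_heads)]


def multi_head_context(query, encodings, num_heads):
    """Return one context vector per head (hypothetical dot-product attention)."""
    query_chunks = split_heads(query, num_heads)
    encoding_chunks = [split_heads(e, num_heads) for e in encodings]
    contexts = []
    for h in range(num_heads):
        # Head h only sees chunk h of every encoding vector.
        keys = [chunks[h] for chunks in encoding_chunks]
        scores = [sum(q * k for q, k in zip(query_chunks[h], key)) for key in keys]
        weights = softmax(scores)
        # Context vector: attention-weighted sum of this head's chunks.
        context = [
            sum(w * key[d] for w, key in zip(weights, keys))
            for d in range(len(keys[0]))
        ]
        contexts.append(context)
    return contexts


if __name__ == "__main__":
    encodings = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
    query = [1.0, 0.0, 0.0, 1.0]
    for head, context in enumerate(multi_head_context(query, encodings, 2)):
        print(head, context)
```

In a full model the context vectors from all heads would be passed on to the decoder. Here they are simply returned as a list so that each head's output can be inspected.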

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for performing machine translation of text from a first language to a second language, the method comprising: generating, by one or more processors, a set of encoding vectors from a series of feature vectors representing characteristics of a text segment in the first language, by processing the feature vectors with an encoder neural network comprising a set of bidirectional recurrent neural network layers, each encoding vector of the set having a predetermined number of values; generating, by the one or more processors, multiple context vectors for each encoding vector based on multiple sets of parameters, the multiple sets of parameters being respectively used to generate the context vectors from different subsets of each encoding vector; generating, by the one or more processors, a sequence of output vectors using a decoder neural network that receives the context vectors, the decoder neural network comprising a recurrent neural network, the output vectors representing distributions over language elements of the second language; and determining, by the one or more processors, a translation of the text segment into the second language based on the sequence of output vectors.

2. The method of claim 1, further comprising: providing the translation to a client device in response to a translation request.

3. The method of claim 1, wherein the recurrent neural network in the decoder neural network does not perform self-attention.

4. The method of claim 1, wherein the set of bidirectional recurrent neural network layers of the encoder neural network comprise long short-term memory (LSTM) layers.

5. The method of claim 1, wherein the parameters are weighting values, and the different sets of the parameters are applicable to different non-overlapping continuous chunks of the encoding vectors.

6. The method of claim 1, wherein the decoder neural network is configured to receive the context vectors, concatenated together, at a softmax layer providing output of the decoder neural network.

7. The method of claim 1, wherein two or more of the language elements of the second language comprise characters, word pieces, words, or phrases.

8. The method of claim 1, wherein the encoder neural network and the decoder neural network include a normalization layer between each recurrent hidden neural network layer.

9. The method of claim 1, wherein at least one of the encoder neural network and the decoder neural network have been trained using synchronous training.

10. The method of claim 1, wherein at least one of the encoder neural network and the decoder neural network have been trained using a learning rate that increases gradually during training.

11. The method of claim 1, wherein at least one of the encoder neural network and the decoder neural network have been trained using label smoothing that introduces variability into target labels.

12. The method of claim 1, wherein: the encoder neural network comprises a first encoder module and a second encoder module, the first encoder module and the second encoder module having different neural network topologies; the first encoder module uses a transformer layer structure with layers that each include (i) a self-attention network sub-layer and (ii) a feed-forward network sub-layer; and the second encoder module includes a series of bidirectional recurrent neural network layers each providing normalization before processing by a next recurrent layer.

13. A system comprising: one or more processors; and one or more data storage devices storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: generating a set of encoding vectors from a series of feature vectors representing characteristics of a text segment in a first language by processing the series of feature vectors with an encoder neural network comprising a set of bidirectional recurrent neural network layers, each encoding vector of the set having a predetermined number of values; generating multiple context vectors for each encoding vector based on multiple sets of parameters, the multiple sets of parameters being respectively used to generate the context vectors from different subsets of each encoding vector; generating a sequence of output vectors using a decoder neural network that receives the context vectors, the decoder neural network comprising a recurrent neural network, the output vectors representing distributions over language elements of a second language; and determining a translation of the text segment into the second language based on the sequence of output vectors.

14. A computer-implemented method for performing machine translation of text from a first language to a second language, the method comprising: obtaining, by a machine translation module, a source embedding associated with a text segment in the first language, the source embedding having a selected dimensionality; processing, with an encoder neural network of the machine translation module, the source embedding to obtain a set of encoding vectors, the encoder neural network comprising a set of bidirectional recurrent neural network layers including a plurality of forward-propagating layers and a plurality of backward-propagating layers; normalizing, by the machine translation module, the set of encoding vectors; processing, by a transformer encoder of the machine translation module, the normalized set of encoding vectors to obtain transformed data; generating output vectors using a decoder neural network from the transformed data, the decoder neural network comprising a recurrent neural network, the output vectors representing distributions over language elements of the second language; and determining, by the machine translation module, a translation of the text segment into the second language based on the output vectors.

15. The method of claim 14, further comprising providing the translation to a client device in response to a translation request.

16. The method of claim 14, wherein the recurrent neural network of the decoder neural network does not perform self-attention.

17. The method of claim 14, wherein feature extraction performed by the encoder neural network is cascaded to the transformer encoder.

18. The method of claim 14, wherein the set of bidirectional recurrent neural network layers of the encoder neural network comprise long short-term memory (LSTM) layers.

19. The method of claim 14, wherein processing the normalized set of encoding vectors includes performing multi-headed attention.

20. The method of claim 14, wherein: the transformer encoder comprises a first encoder module and a second encoder module, the first encoder module and the second encoder module having different neural network topologies; the first encoder module uses a transformer layer structure with layers that each include (i) a self-attention network sub-layer and (ii) a feed-forward network sub-layer; and the second encoder module includes a series of bidirectional recurrent neural network layers each providing normalization before processing by a next recurrent layer.
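Claims 10 and 11 mention a learning rate that increases gradually during training and label smoothing of target labels. The claims do not give formulas, so the following is only a sketch under stated assumptions: a linear warm-up schedule and uniform label smoothing, both common choices. The function names and parameters (`warmup_learning_rate`, `smooth_labels`, `peak_rate`, `warmup_steps`, `epsilon`) are hypothetical.

```python
def warmup_learning_rate(step, peak_rate, warmup_steps):
    """Linearly ramp the learning rate up to peak_rate, then hold it there.

    Assumption: linear warm-up. The claim only says the rate
    "increases gradually during training".
    """
    if warmup_steps <= 0:
        return peak_rate
    return peak_rate * min(1.0, (step + 1) / warmup_steps)


def smooth_labels(target_index, num_classes, epsilon):
    """Build a smoothed target distribution over num_classes elements.

    Assumption: uniform smoothing. A mass of epsilon is spread evenly
    over all classes and the remaining 1 - epsilon is added to the
    true class, so the result still sums to 1.
    """
    off_value = epsilon / num_classes
    distribution = [off_value] * num_classes
    distribution[target_index] += 1.0 - epsilon
    return distribution


if __name__ == "__main__":
    for step in (0, 4, 9, 50):
        print(step, warmup_learning_rate(step, peak_rate=0.1, warmup_steps=10))
    print(smooth_labels(target_index=1, num_classes=4, epsilon=0.1))
```

With smoothing, the training target for a language element is no longer a one-hot vector, which is one way to introduce the variability into target labels that claim 11 describes.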

Assignees

Google LLC

Inventors

Classifications

G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/0455
Auto-encoder networks; Encoder-decoder networks · CPC title
G06F40/58 (Primary)
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title
G06N3/08
Learning methods · CPC title

Patent family

Related publications grouped by family.

View patent family 69178459

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11809834B2 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for machine translation using neural networks. In some implementations, a text in one language is translated into a second language using a neural network model. The model can include an encoder neural network comprising a plurality of bidirectional recurrent neural network layers. The encoding vecto…

Who is the assignee on this patent?
Google LLC

What technology area does this patent fall under?
Primary CPC classification: G06F40/58. Mapped technology area: Physics (CPC section G).

When was this patent published?
Publication date: Nov 7, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).