What technology area does this patent fall under?

Primary CPC classification G10L15/16. Mapped technology areas include Physics.

When was this patent published?

Publication date: Nov 17, 2020 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Sequence-to-sequence convolutional architecture

US10839790B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10839790-B2
Application number	US-201715848199-A
Country	US
Kind code	B2
Filing date	Dec 20, 2017
Priority date	Feb 6, 2017
Publication date	Nov 17, 2020
Grant date	Nov 17, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Exemplary embodiments relate to improvements to neural networks for translation and other sequence-to-sequence tasks. A convolutional neural network may include multiple blocks, each having a convolution layer and gated linear units; gating may determine what information passes through to the next block level. Residual connections, which add the input of a block back to its output, may be applied around each block. Further, an attention may be applied to determine which word is most relevant to translate next. By applying repeated passes of the attention to multiple layers of the decoder, the decoder is able to work on the entire structure of a sentence at once (with no temporal dependency). In addition to better accuracy, this configuration is better at capturing long-range dependencies, better models the hierarchical syntax structure of a sentence, and is highly parallelizable and thus faster to run on hardware.
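The block structure the abstract describes (a convolution, a gated linear unit, and a residual connection around the block) can be illustrated with a minimal pure-Python sketch. It is not taken from the patent: the function name `glu_conv_block`, the weight layout, and the symmetric zero padding are assumptions made only for illustration. The convolution produces twice as many channels as the input, one half is gated by the sigmoid of the other half, and the block input is added back to the result.

```python
import math


def sigmoid(z):
    """Logistic function used as the gate."""
    return 1.0 / (1.0 + math.exp(-z))


def glu_conv_block(x, weights, bias):
    """One block: 1-D convolution -> gated linear unit -> residual add.

    x:       list of T vectors, each of length d (the block input)
    weights: weights[o][j][i] for output channel o (0..2d-1), kernel
             offset j (0..k-1), input channel i (0..d-1); hypothetical layout
    bias:    list of 2d biases
    """
    T, d = len(x), len(x[0])
    k = len(weights[0])
    pad = k // 2  # symmetric zero padding keeps the length at T (assumes odd k)
    zero = [0.0] * d
    padded = [zero] * pad + x + [zero] * pad

    out = []
    for t in range(T):
        # Convolution producing 2d values for position t.
        y = []
        for o in range(2 * d):
            s = bias[o]
            for j in range(k):
                for i in range(d):
                    s += weights[o][j][i] * padded[t + j][i]
            y.append(s)
        a, b = y[:d], y[d:]
        # Gate: sigmoid(b) decides how much of a passes to the next level.
        gated = [a[i] * sigmoid(b[i]) for i in range(d)]
        # Residual connection: add the block input back to its output.
        out.append([gated[i] + x[t][i] for i in range(d)])
    return out
```

Each position is computed independently of the others, which is the property the abstract refers to when it calls the configuration highly parallelizable. A decoder would typically pad only on the left so that a position cannot see later positions; that variant is omitted here.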

First claim

Opening claim text (preview).

The invention claimed is:
1. A method comprising: receiving an input sequence of data; providing the input sequence to a convolutional neural network comprising: an entirely convolutional encoder configured to encode the input sequence; and an entirely convolutional decoder comprising a plurality of layers, each layer associated with a respective attention that applies multiple attention passes in each of a plurality of time steps as part of a determination of a next part of the input sequence to which the decoder attends, wherein each attention pass comprises computing a conditional input for the respective decoder layer, and adding the conditional input to an output of the respective decoder layer; and applying the convolutional neural network to generate an output sequence representing a translation of the input sequence of data from a first language into a second language.
2. The method of claim 1, wherein the convolutional neural network is arranged hierarchically, and at least one of the encoder or the decoder applies one or more non-linearities to determine which elements of a given hierarchical level are passed through to a next hierarchical level.
3. The method of claim 1, wherein computations in the decoder are parallelized.
4. The method of claim 1, wherein at least one of the encoder or the decoder is made up of a plurality of blocks, each of the plurality of blocks comprising at least one convolution and at least one non-linearity.
5. The method of claim 4, further comprising a residual connection that adds the input of a respective block to the output of the respective block.
6. The method of claim 1, wherein: the respective attention computes a context vector for its respective decoder layer, and the convolutional neural network accounts for contexts computed for preceding layers of the decoder at a given time step and at previous time steps that are within a receptive field of the respective decoder network layer.
7. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to: receive an input sequence of data; provide the input sequence to a convolutional neural network comprising: an entirely convolutional encoder configured to encode the input sequence; and an entirely convolutional decoder comprising a plurality of layers, each layer associated with a respective attention that applies multiple attention passes in each of a plurality of time steps as part of a determination of a next part of the input sequence to which the decoder attends, wherein each attention pass comprises computing a conditional input for the respective decoder layer, and adding the conditional input to an output of the respective decoder layer; and apply the convolutional neural network to generate an output sequence representing a translation of the input sequence of data from a first language into a second language.
8. The medium of claim 7, wherein the convolutional neural network is arranged hierarchically, and at least one of the encoder or the decoder applies one or more nonlinearities to determine which elements of a given hierarchical level are passed through to a next hierarchical level.
9. The medium of claim 7, wherein computations in the decoder are parallelized.
10. The medium of claim 7, wherein at least one of the encoder or the decoder is made up of a plurality of blocks, each of the plurality of blocks comprising at least one convolution and at least non-linearity.
11. The medium of claim 10, further comprising a residual connection that adds the input of a respective block to the output of the respective block.
12. The medium of claim 7, wherein: the respective attention computes a context vector for its respective decoder network layer, and the convolutional neural network accounts for contexts computed for preceding layers of the decoder at a given time step and at previous time steps that are within a receptive field of the respective decoder layer.
13. An apparatus comprising: a non-transitory computer-readable medium configured to store an input sequence of data; and a hardware processor circuit configured to provide the input sequence to a convolutional neural network comprising: an entirely convolutional encoder configured to encode the input sequence; and an entirely convolutional decoder comprising a plurality of layers, each layer associated with a respective attention that applies multiple attention passes in each of a plurality of time steps as part of a determination of a next part of the input sequence to which the decoder attends, wherein each attention pass comprises computing a conditional input for the respective decoder layer, and adding the conditional input to an output of the respective decoder layer, wherein the hardware processor circuit is configured to apply the convolutional neural network to generate an output sequence representing a translation of the input sequence of data from a first language into a second language.
14. The apparatus of claim 13, wherein the convolutional neural network is arranged hierarchically, and at least one of the encoder or the decoder applies one or more nonlinearities to determine which elements of a given hierarchical level are passed through to a next hierarchical level.
15. The apparatus of claim 13, wherein computations in the decoder are parallelized.
16. The apparatus of claim 13, wherein at least one of the encoder or the decoder is made up of a plurality of blocks, each of the plurality of blocks comprising at least one convolution and at least non-linearity.
17. The apparatus of claim 16, further comprising a residual connection that adds the input of a respective block to the output of the respective block.
18. The method of claim 1, wherein the conditional input is computed as a weighted sum of outputs of the encoder and an embedding of the input sequence of data.
19. The medium of claim 7, wherein the conditional input is computed as a weighted sum of outputs of the encoder and an embedding of the input sequence of data.
20. The apparatus of claim 13, wherein the conditional input is computed as a weighted sum of outputs of the encoder and an embedding of the input sequence of data.
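One way to picture a single attention pass as recited in claims 1 and 18 is the sketch below. It is an illustration under stated assumptions, not the claimed implementation: the claims do not say how attention weights are scored, so dot-product scoring followed by a softmax is assumed here, and the names `attention_pass` and `decoder_layer_with_attention` are hypothetical. The conditional input is a weighted sum over source positions of the encoder output plus the input embedding, and it is then added to the decoder layer's output.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pass(decoder_state, encoder_outputs, input_embeddings):
    """Compute the conditional input for one decoder position.

    decoder_state:    vector of length d for the current decoder layer
    encoder_outputs:  list of m vectors (one per source position)
    input_embeddings: list of m vectors (embedding of the input sequence)
    Returns (conditional_input, attention_weights).
    """
    # Assumption: dot-product scores between decoder state and encoder outputs.
    scores = [sum(h * z for h, z in zip(decoder_state, enc))
              for enc in encoder_outputs]
    weights = softmax(scores)
    d = len(decoder_state)
    cond = [0.0] * d
    # Weighted sum of (encoder output + input embedding) over source positions.
    for w, enc, emb in zip(weights, encoder_outputs, input_embeddings):
        for i in range(d):
            cond[i] += w * (enc[i] + emb[i])
    return cond, weights


def decoder_layer_with_attention(layer_output, encoder_outputs, input_embeddings):
    """Add the conditional input to the output of the decoder layer."""
    cond, _ = attention_pass(layer_output, encoder_outputs, input_embeddings)
    return [h + c for h, c in zip(layer_output, cond)]
```

Calling `decoder_layer_with_attention` once per decoder layer, at every time step, would give the repeated attention passes the claims describe, with each layer computing its own conditional input.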

Assignees

Facebook Inc

Inventors

Classifications

G06N3/044
Recurrent networks, e.g. Hopfield networks · CPC title
G06N3/045
Combinations of networks · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/0455
Auto-encoder networks; Encoder-decoder networks · CPC title
G06N3/09
Supervised learning · CPC title

Patent family

Related publications grouped by family.

View patent family 63444972

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10839790B2 cover?: Exemplary embodiments relate to improvements to neural networks for translation and other sequence-to-sequence tasks. A convolutional neural network may include multiple blocks, each having a convolution layer and gated linear units; gating may determine what information passes through to the next block level. Residual connections, which add the input of a block back to its output, may be appli…
Who is the assignee on this patent?: Facebook Inc

Related patents

Speech recognition with attention-based recurrent neural networks

Processing sequences using convolutional neural networks

Convolutional, long short-term memory, fully connected deep neural networks
