Sequence-to-sequence convolutional architecture

US10839790B2 · US · B2

Patent metadata
Publication number: US-10839790-B2
Application number: US-201715848199-A
Country: US
Kind code: B2
Filing date: Dec 20, 2017
Priority date: Feb 6, 2017
Publication date: Nov 17, 2020
Grant date: Nov 17, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Exemplary embodiments relate to improvements to neural networks for translation and other sequence-to-sequence tasks. A convolutional neural network may include multiple blocks, each having a convolution layer and gated linear units; gating may determine what information passes through to the next block level. Residual connections, which add the input of a block back to its output, may be applied around each block. Further, an attention may be applied to determine which word is most relevant to translate next. By applying repeated passes of the attention to multiple layers of the decoder, the decoder is able to work on the entire structure of a sentence at once (with no temporal dependency). In addition to better accuracy, this configuration is better at capturing long-range dependencies, better models the hierarchical syntax structure of a sentence, and is highly parallelizable and thus faster to run on hardware.
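The block structure described in the abstract (a convolution, a gated linear unit, and a residual connection around the block) can be illustrated with a minimal NumPy sketch. This is not the patented implementation: the kernel width, the zero "same" padding, the convention of splitting channels in half for the gate, and the names `conv1d`, `glu`, and `conv_glu_block` are all assumptions made for illustration.

```python
import numpy as np

def glu(x):
    """Gated linear unit: split channels in half; the sigmoid of the
    second half decides how much of the first half passes through."""
    a, b = np.split(x, 2, axis=-1)
    return a * (1.0 / (1.0 + np.exp(-b)))

def conv1d(x, w, b):
    """1-D convolution over time with zero 'same' padding (assumed).
    x: (T, d_in), w: (k, d_in, d_out) with odd k, b: (d_out,)."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    T = x.shape[0]
    out = np.stack([np.einsum('kd,kdo->o', xp[t:t + k], w) for t in range(T)])
    return out + b

def conv_glu_block(x, w, b):
    """One block: convolution -> GLU -> residual add of the block input.
    The convolution emits 2*d channels so the GLU returns d channels."""
    return x + glu(conv1d(x, w, b))

# Hypothetical sizes: 5 time steps, 4 channels, kernel width 3.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
w = rng.normal(size=(3, 4, 8)) * 0.1
b = np.zeros(8)
y = conv_glu_block(x, w, b)   # same shape as x, so blocks can be stacked
```

Because every time step is computed from a fixed window of the input rather than from the previous step's output, the per-position computations are independent of one another, which is the property the abstract ties to parallel execution. A decoder would additionally need padding that hides future positions; that detail is omitted here.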

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising: receiving an input sequence of data; providing the input sequence to a convolutional neural network comprising: an entirely convolutional encoder configured to encode the input sequence; and an entirely convolutional decoder comprising a plurality of layers, each layer associated with a respective attention that applies multiple attention passes in each of a plurality of time steps as part of a determination of a next part of the input sequence to which the decoder attends, wherein each attention pass comprises computing a conditional input for the respective decoder layer, and adding the conditional input to an output of the respective decoder layer; and applying the convolutional neural network to generate an output sequence representing a translation of the input sequence of data from a first language into a second language.

2. The method of claim 1, wherein the convolutional neural network is arranged hierarchically, and at least one of the encoder or the decoder applies one or more non-linearities to determine which elements of a given hierarchical level are passed through to a next hierarchical level.

3. The method of claim 1, wherein computations in the decoder are parallelized.

4. The method of claim 1, wherein at least one of the encoder or the decoder is made up of a plurality of blocks, each of the plurality of blocks comprising at least one convolution and at least one non-linearity.

5. The method of claim 4, further comprising a residual connection that adds the input of a respective block to the output of the respective block.

6. The method of claim 1, wherein: the respective attention computes a context vector for its respective decoder layer, and the convolutional neural network accounts for contexts computed for preceding layers of the decoder at a given time step and at previous time steps that are within a receptive field of the respective decoder network layer.

7. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to: receive an input sequence of data; provide the input sequence to a convolutional neural network comprising: an entirely convolutional encoder configured to encode the input sequence; and an entirely convolutional decoder comprising a plurality of layers, each layer associated with a respective attention that applies multiple attention passes in each of a plurality of time steps as part of a determination of a next part of the input sequence to which the decoder attends, wherein each attention pass comprises computing a conditional input for the respective decoder layer, and adding the conditional input to an output of the respective decoder layer; and apply the convolutional neural network to generate an output sequence representing a translation of the input sequence of data from a first language into a second language.

8. The medium of claim 7, wherein the convolutional neural network is arranged hierarchically, and at least one of the encoder or the decoder applies one or more nonlinearities to determine which elements of a given hierarchical level are passed through to a next hierarchical level.

9. The medium of claim 7, wherein computations in the decoder are parallelized.

10. The medium of claim 7, wherein at least one of the encoder or the decoder is made up of a plurality of blocks, each of the plurality of blocks comprising at least one convolution and at least non-linearity.

11. The medium of claim 10, further comprising a residual connection that adds the input of a respective block to the output of the respective block.

12. The medium of claim 7, wherein: the respective attention computes a context vector for its respective decoder network layer, and the convolutional neural network accounts for contexts computed for preceding layers of the decoder at a given time step and at previous time steps that are within a receptive field of the respective decoder layer.

13. An apparatus comprising: a non-transitory computer-readable medium configured to store an input sequence of data; and a hardware processor circuit configured to provide the input sequence to a convolutional neural network comprising: an entirely convolutional encoder configured to encode the input sequence; and an entirely convolutional decoder comprising a plurality of layers, each layer associated with a respective attention that applies multiple attention passes in each of a plurality of time steps as part of a determination of a next part of the input sequence to which the decoder attends, wherein each attention pass comprises computing a conditional input for the respective decoder layer, and adding the conditional input to an output of the respective decoder layer, wherein the hardware processor circuit is configured to apply the convolutional neural network to generate an output sequence representing a translation of the input sequence of data from a first language into a second language.

14. The apparatus of claim 13, wherein the convolutional neural network is arranged hierarchically, and at least one of the encoder or the decoder applies one or more nonlinearities to determine which elements of a given hierarchical level are passed through to a next hierarchical level.

15. The apparatus of claim 13, wherein computations in the decoder are parallelized.

16. The apparatus of claim 13, wherein at least one of the encoder or the decoder is made up of a plurality of blocks, each of the plurality of blocks comprising at least one convolution and at least non-linearity.

17. The apparatus of claim 16, further comprising a residual connection that adds the input of a respective block to the output of the respective block.

18. The method of claim 1, wherein the conditional input is computed as a weighted sum of outputs of the encoder and an embedding of the input sequence of data.

19. The medium of claim 7, wherein the conditional input is computed as a weighted sum of outputs of the encoder and an embedding of the input sequence of data.

20. The apparatus of claim 13, wherein the conditional input is computed as a weighted sum of outputs of the encoder and an embedding of the input sequence of data.
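To make the attention pass recited in claims 1 and 18 more concrete, here is a minimal NumPy sketch of one pass for one decoder layer. The claims state only that the conditional input is a weighted sum of encoder outputs and an embedding of the input sequence, and that it is added to the decoder layer's output. The dot-product scoring, the softmax normalization, and the names `attention_pass`, `dec_out`, `enc_out`, and `src_emb` are assumptions chosen for illustration, not details confirmed by the claim text.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pass(dec_out, enc_out, src_emb):
    """One attention pass for a single decoder layer (illustrative only).
    dec_out: (T_tgt, d) output of the decoder layer
    enc_out: (T_src, d) encoder outputs
    src_emb: (T_src, d) embedding of the input sequence
    """
    # Assumed scoring: dot product between decoder and encoder states.
    scores = dec_out @ enc_out.T                  # (T_tgt, T_src)
    weights = softmax(scores, axis=-1)            # each row sums to 1
    # Conditional input: weighted sum of encoder outputs plus embeddings.
    conditional = weights @ (enc_out + src_emb)   # (T_tgt, d)
    # The conditional input is added to the decoder layer's output.
    return dec_out + conditional, weights

# Hypothetical sizes: 2 target positions, 3 source positions, width 4.
rng = np.random.default_rng(0)
dec_out = rng.normal(size=(2, 4))
enc_out = rng.normal(size=(3, 4))
src_emb = rng.normal(size=(3, 4))
new_dec_out, weights = attention_pass(dec_out, enc_out, src_emb)
```

In a multi-layer decoder, a pass like this would be repeated once per layer, so each higher layer sees the conditional inputs already added by the layers below it. All target positions are handled in one matrix product here, consistent with the parallelized decoder computations of claims 3, 9, and 15.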

Assignees

Facebook Inc

Inventors

Classifications

  • Recurrent networks, e.g. Hopfield networks

  • Combinations of networks

  • Convolutional networks [CNN, ConvNet]

  • Auto-encoder networks; Encoder-decoder networks

  • Supervised learning

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10839790B2 cover?
Exemplary embodiments relate to improvements to neural networks for translation and other sequence-to-sequence tasks. A convolutional neural network may include multiple blocks, each having a convolution layer and gated linear units; gating may determine what information passes through to the next block level. Residual connections, which add the input of a block back to its output, may be appli…
Who is the assignee on this patent?
Facebook Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/16. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 17, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).