Attention-based sequence transduction neural networks

US10452978B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10452978-B2
Application number  US-201816021971-A
Country             US
Kind code           B2
Filing date         Jun 28, 2018
Priority date       May 23, 2017
Publication date    Oct 22, 2019
Grant date          Oct 22, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output sequence from an input sequence. In one aspect, one of the systems includes an encoder neural network configured to receive the input sequence and generate encoded representations of the network inputs, the encoder neural network comprising a sequence of one or more encoder subnetworks, each encoder subnetwork configured to receive a respective encoder subnetwork input for each of the input positions and to generate a respective subnetwork output for each of the input positions, and each encoder subnetwork comprising: an encoder self-attention sub-layer that is configured to receive the subnetwork input for each of the input positions and, for each particular input position in the input order: apply an attention mechanism over the encoder subnetwork inputs using one or more queries derived from the encoder subnetwork input at the particular input position.
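
The attention step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say how queries are compared with the other inputs, so the scaled dot product and softmax used here are assumptions, as is the choice to use the inputs themselves as queries, keys, and values. The names `softmax`, `dot`, `attend`, and `self_attention` and the toy data are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw comparison scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values):
    """Attention output for one position: a weighted sum of the values."""
    scale = math.sqrt(len(query))  # assumption: scaled dot-product comparison
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def self_attention(inputs):
    """Apply attention at every input position over all input positions.

    Assumption: queries, keys, and values are the inputs themselves
    (identity derivation); a trained network would learn these mappings.
    """
    return [attend(x, inputs, inputs) for x in inputs]

# Hypothetical toy sequence: three input positions, two features each.
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs = self_attention(inputs)
for position, out in enumerate(outputs):
    print(position, [round(x, 3) for x in out])
```

Each output is a weighted average of all input vectors, so every position can draw on every other position in a single step, without processing the sequence in order.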

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to implement a sequence transduction neural network for transducing an input sequence having a respective network input at each of a plurality of input positions in an input order into an output sequence having a respective network output at each of a plurality of output positions in an output order, the sequence transduction neural network comprising: an encoder neural network configured to receive the input sequence and generate a respective encoded representation of each of the network inputs in the input sequence, the encoder neural network comprising a sequence of one or more encoder subnetworks, each encoder subnetwork configured to receive a respective encoder subnetwork input for each of the plurality of input positions and to generate a respective subnetwork output for each of the plurality of input positions, and each encoder subnetwork comprising: an encoder self-attention sub-layer that is configured to receive the subnetwork input for each of the plurality of input positions and, for each particular input position in the input order: apply a self-attention mechanism over the encoder subnetwork inputs at the plurality of input positions to generate a respective output for the particular input position, wherein applying a self-attention mechanism comprises: determining a query from the subnetwork input at the particular input position, determining keys derived from the subnetwork inputs at the plurality of input positions, determining values derived from the subnetwork inputs at the plurality of input positions, and using the determined query, keys, and values to generate the respective output for the particular input position; and a decoder neural network configured to receive the encoded representations and generate the output sequence.

2. The system of claim 1, wherein the encoder neural network further comprises: an embedding layer configured to: for each network input in the input sequence, map the network input to an embedded representation of the network input, and combine the embedded representation of the network input with a positional embedding of the input position of the network input in the input order to generate a combined embedded representation of the network input; and provide the combined embedded representations of the network inputs as the encoder subnetwork inputs for a first encoder subnetwork in the sequence of encoder subnetworks.

3. The system of claim 1, wherein the respective encoded representations of the network inputs are the encoder subnetwork outputs generated by the last encoder subnetwork in the sequence.

4. The system of claim 1, wherein the sequence of one or more encoder subnetworks includes at least two encoder subnetworks, and wherein, for each encoder subnetwork other than a first encoder subnetwork in the sequence, the encoder subnetwork input is the encoder subnetwork output of a preceding encoder subnetwork in the sequence.

5. The system of claim 1, wherein at least one of the encoder subnetworks further comprises: a position-wise feed-forward layer that is configured to: for each input position: receive an input at the input position, and apply a sequence of transformations to the input at the input position to generate an output for the input position.

6. The system of claim 5, wherein the sequence comprises two learned linear transformations separated by an activation function.

7. The system of claim 5, wherein the at least one encoder subnetwork further comprises: a residual connection layer that combines the outputs of the position-wise feed-forward layer with the inputs to the position-wise feed-forward layer to generate an encoder position-wise residual output, and a layer normalization layer that applies layer normalization to the encoder position-wise residual output.

8. The system of claim 1, wherein each encoder subnetwork further comprises: a residual connection layer that combines the outputs of the encoder self-attention sub-layer with the inputs to the encoder self-attention sub-layer to generate an encoder self-attention residual output, and a layer normalization layer that applies layer normalization to the encoder self-attention residual output.

9. The system of claim 1, wherein each encoder self-attention sub-layer comprises a plurality of encoder self-attention layers.

10. The system of claim 9, wherein each encoder self-attention layer is configured to: apply a learned query linear transformation to each encoder subnetwork input at each input position to generate a respective query for each input position, apply a learned key linear transformation to each encoder subnetwork input at each input position to generate a respective key for each input position, apply a learned value linear transformation to each encoder subnetwork input at each input position to generate a respective value for each input position, and for each input position, determine a respective input-position specific weight for the input position by applying a comparison function between the query for the input position and the keys generated for the plurality of input positions, and determine an initial encoder self-attention output for the input position by determining a weighted sum of the values weighted by the corresponding input-position specific weights for the plurality of input positions, the values being generated for the plurality of input positions.

11. The system of claim 10, wherein the encoder self-attention sub-layer is configured to, for each input position, combine the initial encoder self-attention outputs for the input position generated by the encoder self-attention layers to generate the output for the encoder self-attention sub-layer.

12. The system of claim 9, wherein the encoder self-attention layers operate in parallel.

13. The system of claim 1, wherein the decoder neural network auto-regressively generates the output sequence, by at each of a plurality of generation time steps, generating a network output at an output position corresponding to the generation time step conditioned on the encoded representations and network outputs at output positions preceding the output position in the output order.

14. The system of claim 13, wherein the decoder neural network comprises a sequence of decoder subnetworks, each decoder subnetwork configured to, at each generation time step, receive a respective decoder subnetwork input for each of the plurality of output positions preceding the corresponding output position and to generate a respective decoder subnetwork output for each of the plurality of output positions preceding the corresponding output position.

15. The system of claim 14, wherein the decoder neural network further comprises: an embedding layer configured to, at each generation time step: for each network output at output positions preceding the corresponding output position in the output order: map the network output to an embedded representation of the network output, and combine the embedded representation of the network output with a positional embedding of the corresponding output position of the network output in the output order to generate a combined embedded representation of the network output; and provide the combined embedded representations of the network output as input to a first decoder subnetwork in the sequence of decoder subnetworks.

16. The system of cl
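
As a reading aid for claims 5 through 12, the following sketch wires one encoder subnetwork together: several self-attention layers with query, key, and value linear transformations, a combination of their outputs, and a position-wise feed-forward layer, each followed by a residual connection and layer normalization. It is a sketch under stated assumptions, not the claimed system. The claims leave the comparison function, the way attention layer outputs are combined, and the activation function open; the scaled dot product with softmax, concatenation followed by a linear map, and ReLU used here are assumptions. All function names, the toy dimensions, and the random (untrained) matrices standing in for learned transformations are hypothetical.

```python
import math
import random

def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def random_matrix(rows, cols, rng):
    """Stand-in for a learned linear transformation (random, untrained)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def softmax(scores):
    """Turn raw comparison scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def layer_norm(vec, eps=1e-5):
    """Normalize one position's vector to zero mean and roughly unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]

def add(a, b):
    """Element-wise sum, used for the residual connections."""
    return [x + y for x, y in zip(a, b)]

def attention_layer(inputs, w_q, w_k, w_v):
    """One self-attention layer: per-position query, key, value, then weighted sums."""
    queries = [matvec(w_q, x) for x in inputs]
    keys = [matvec(w_k, x) for x in inputs]
    values = [matvec(w_v, x) for x in inputs]
    scale = math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        # Comparison function (assumed): scaled dot product followed by softmax.
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs

def encoder_subnetwork(inputs, layers, w_out, w_ff1, w_ff2):
    """Self-attention and feed-forward, each followed by residual add and layer norm."""
    per_layer = [attention_layer(inputs, *weights) for weights in layers]
    result = []
    for pos, x in enumerate(inputs):
        # Combine the attention layers (assumed): concatenate, then a linear map.
        combined = matvec(w_out, [v for layer in per_layer for v in layer[pos]])
        attn_out = layer_norm(add(x, combined))
        # Position-wise feed-forward: linear, activation (ReLU assumed), linear.
        hidden = [max(0.0, h) for h in matvec(w_ff1, attn_out)]
        result.append(layer_norm(add(attn_out, matvec(w_ff2, hidden))))
    return result

rng = random.Random(0)
d_model, n_layers, d_head, d_ff = 4, 2, 2, 8  # hypothetical toy sizes
layers = [tuple(random_matrix(d_head, d_model, rng) for _ in range(3))
          for _ in range(n_layers)]
w_out = random_matrix(d_model, n_layers * d_head, rng)
w_ff1 = random_matrix(d_ff, d_model, rng)
w_ff2 = random_matrix(d_model, d_ff, rng)

inputs = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(3)]
outputs = encoder_subnetwork(inputs, layers, w_out, w_ff1, w_ff2)
print(len(outputs), len(outputs[0]))
```

The attention layers here run one after another for simplicity. Because each one depends only on the shared inputs and its own matrices, they could equally run in parallel, as claim 12 recites.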

Assignees

Google LLC

Inventors

Classifications

  • G06N3/045 · Primary

    Combinations of networks · CPC title

  • Physics · mapped topic

  • G06N3/04 · Primary

    Architecture, e.g. interconnection topology · CPC title

  • G06N3/08 · Primary

    Learning methods · CPC title

  • G06N3/0455 · Primary

    Auto-encoder networks; Encoder-decoder networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10452978B2 cover?
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output sequence from an input sequence. In one aspect, one of the systems includes an encoder neural network configured to receive the input sequence and generate encoded representations of the network inputs, the encoder neural network comprising a sequence of one or more encode…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/045. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 22, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).