US10452978B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10452978-B2 |
| Application number | US-201816021971-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 28, 2018 |
| Priority date | May 23, 2017 |
| Publication date | Oct 22, 2019 |
| Grant date | Oct 22, 2019 |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output sequence from an input sequence. In one aspect, one of the systems includes an encoder neural network configured to receive the input sequence and generate encoded representations of the network inputs, the encoder neural network comprising a sequence of one or more encoder subnetworks, each encoder subnetwork configured to receive a respective encoder subnetwork input for each of the input positions and to generate a respective subnetwork output for each of the input positions, and each encoder subnetwork comprising: an encoder self-attention sub-layer that is configured to receive the subnetwork input for each of the input positions and, for each particular input position in the input order: apply an attention mechanism over the encoder subnetwork inputs using one or more queries derived from the encoder subnetwork input at the particular input position.
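The self-attention step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the helper names (`matvec`, `softmax`, `self_attention`) and the projection matrices `w_query`, `w_key`, `w_value` are hypothetical. The scaled dot product followed by a softmax is one assumed way of comparing queries with keys, since the text itself does not fix that choice.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(inputs, w_query, w_key, w_value):
    """Return one output vector per input position.

    inputs: list of vectors, one per input position (the subnetwork inputs).
    w_query, w_key, w_value: hypothetical learned projection matrices.
    """
    queries = [matvec(w_query, x) for x in inputs]
    keys = [matvec(w_key, x) for x in inputs]
    values = [matvec(w_value, x) for x in inputs]
    scale = math.sqrt(len(keys[0]))  # assumed scaling by sqrt(key dimension)
    dim = len(values[0])

    outputs = []
    for q in queries:
        # Compare this position's query with the key of every position.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted sum of the values, one weight per input position.
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
        )
    return outputs

# Toy usage with identity projections (illustrative values only).
identity = [[1.0, 0.0], [0.0, 1.0]]
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(inputs, identity, identity, identity)
```

Each entry of `attended` is a weighted average of the value vectors, so every position's output can draw on every other position in the input sequence.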
Opening claim text (preview).
What is claimed is:
1. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to implement a sequence transduction neural network for transducing an input sequence having a respective network input at each of a plurality of input positions in an input order into an output sequence having a respective network output at each of a plurality of output positions in an output order, the sequence transduction neural network comprising: an encoder neural network configured to receive the input sequence and generate a respective encoded representation of each of the network inputs in the input sequence, the encoder neural network comprising a sequence of one or more encoder subnetworks, each encoder subnetwork configured to receive a respective encoder subnetwork input for each of the plurality of input positions and to generate a respective subnetwork output for each of the plurality of input positions, and each encoder subnetwork comprising: an encoder self-attention sub-layer that is configured to receive the subnetwork input for each of the plurality of input positions and, for each particular input position in the input order: apply a self-attention mechanism over the encoder subnetwork inputs at the plurality of input positions to generate a respective output for the particular input position, wherein applying a self-attention mechanism comprises: determining a query from the subnetwork input at the particular input position, determining keys derived from the subnetwork inputs at the plurality of input positions, determining values derived from the subnetwork inputs at the plurality of input positions, and using the determined query, keys, and values to generate the respective output for the particular input position; and a decoder neural network configured to receive the encoded representations and generate the output sequence.
2. The system of claim 1, wherein the encoder neural network further comprises: an embedding layer configured to: for each network input in the input sequence, map the network input to an embedded representation of the network input, and combine the embedded representation of the network input with a positional embedding of the input position of the network input in the input order to generate a combined embedded representation of the network input; and provide the combined embedded representations of the network inputs as the encoder subnetwork inputs for a first encoder subnetwork in the sequence of encoder subnetworks.
3. The system of claim 1, wherein the respective encoded representations of the network inputs are the encoder subnetwork outputs generated by the last encoder subnetwork in the sequence.
4. The system of claim 1, wherein the sequence of one or more encoder subnetworks includes at least two encoder subnetworks, and wherein, for each encoder subnetwork other than a first encoder subnetwork in the sequence, the encoder subnetwork input is the encoder subnetwork output of a preceding encoder subnetwork in the sequence.
5. The system of claim 1, wherein at least one of the encoder subnetworks further comprises: a position-wise feed-forward layer that is configured to: for each input position: receive an input at the input position, and apply a sequence of transformations to the input at the input position to generate an output for the input position.
6. The system of claim 5, wherein the sequence comprises two learned linear transformations separated by an activation function.
7. The system of claim 5, wherein the at least one encoder subnetwork further comprises: a residual connection layer that combines the outputs of the position-wise feed-forward layer with the inputs to the position-wise feed-forward layer to generate an encoder position-wise residual output, and a layer normalization layer that applies layer normalization to the encoder position-wise residual output.
8. The system of claim 1, wherein each encoder subnetwork further comprises: a residual connection layer that combines the outputs of the encoder self-attention sub-layer with the inputs to the encoder self-attention sub-layer to generate an encoder self-attention residual output, and a layer normalization layer that applies layer normalization to the encoder self-attention residual output.
9. The system of claim 1, wherein each encoder self-attention sub-layer comprises a plurality of encoder self-attention layers.
10. The system of claim 9, wherein each encoder self-attention layer is configured to: apply a learned query linear transformation to each encoder subnetwork input at each input position to generate a respective query for each input position, apply a learned key linear transformation to each encoder subnetwork input at each input position to generate a respective key for each input position, apply a learned value linear transformation to each encoder subnetwork input at each input position to generate a respective value for each input position, and for each input position, determine a respective input-position specific weight for the input position by applying a comparison function between the query for the input position and the keys generated for the plurality of input positions, and determine an initial encoder self-attention output for the input position by determining a weighted sum of the values weighted by the corresponding input-position specific weights for the plurality of input positions, the values being generated for the plurality of input positions.
11. The system of claim 10, wherein the encoder self-attention sub-layer is configured to, for each input position, combine the initial encoder self-attention outputs for the input position generated by the encoder self-attention layers to generate the output for the encoder self-attention sub-layer.
12. The system of claim 9, wherein the encoder self-attention layers operate in parallel.
13. The system of claim 1, wherein the decoder neural network auto-regressively generates the output sequence, by at each of a plurality of generation time steps, generating a network output at an output position corresponding to the generation time step conditioned on the encoded representations and network outputs at output positions preceding the output position in the output order.
14. The system of claim 13, wherein the decoder neural network comprises a sequence of decoder subnetworks, each decoder subnetwork configured to, at each generation time step, receive a respective decoder subnetwork input for each of the plurality of output positions preceding the corresponding output position and to generate a respective decoder subnetwork output for each of the plurality of output positions preceding the corresponding output position.
15. The system of claim 14, wherein the decoder neural network further comprises: an embedding layer configured to, at each generation time step: for each network output at output positions preceding the corresponding output position in the output order: map the network output to an embedded representation of the network output, and combine the embedded representation of the network output with a positional embedding of the corresponding output position of the network output in the output order to generate a combined embedded representation of the network output; and provide the combined embedded representations of the network output as input to a first decoder subnetwork in the sequence of decoder subnetworks.
16. The system of cl
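
As an illustration of claims 5 through 7, the sketch below applies a position-wise feed-forward layer, a residual connection, and layer normalization to each input position independently. It rests on several assumptions that the claims do not specify: ReLU as the activation function, layer normalization without a learned gain or bias, and hypothetical names and toy weights (`feed_forward_block`, `w1`, `b1`, `w2`, `b2`).

```python
import math

def linear(weights, bias, vector):
    """Learned linear transformation: weights (list of rows) times vector, plus bias."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def relu(vector):
    """Assumed activation function; the claims only require some activation."""
    return [max(0.0, x) for x in vector]

def layer_norm(vector, eps=1e-5):
    """Normalize one vector to zero mean and roughly unit variance (no learned gain or bias)."""
    mean = sum(vector) / len(vector)
    var = sum((x - mean) ** 2 for x in vector) / len(vector)
    return [(x - mean) / math.sqrt(var + eps) for x in vector]

def feed_forward_block(positions, w1, b1, w2, b2):
    """Apply the same feed-forward, residual, and normalization steps at every position."""
    outputs = []
    for x in positions:
        # Two linear transformations separated by an activation (claim 6).
        hidden = relu(linear(w1, b1, x))
        transformed = linear(w2, b2, hidden)
        # Residual connection: combine the layer output with its input (claim 7).
        residual = [a + b for a, b in zip(x, transformed)]
        # Layer normalization of the residual output (claim 7).
        outputs.append(layer_norm(residual))
    return outputs

# Toy weights: model dimension 2, hidden dimension 3 (illustrative values only).
w1 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]
b1 = [0.0, 0.1, -0.1]
w2 = [[0.2, -0.1, 0.3], [0.4, 0.0, -0.2]]
b2 = [0.05, -0.05]
block_out = feed_forward_block([[1.0, 2.0], [1.0, 2.0], [3.0, -1.0]], w1, b1, w2, b2)
```

Because the same weights are applied at every position and no position sees another, identical inputs at different positions produce identical outputs. Mixing information across positions is left to the self-attention sub-layer.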
Combinations of networks · CPC title
Physics · mapped topic
Architecture, e.g. interconnection topology · CPC title
Learning methods · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title