Systems and methods for simultaneous translation with integrated anticipation and controllable latency (STACL)
US-2020104371-A1 · Apr 2, 2020 · US
US11468324B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11468324-B2 |
| Application number | US-202016831206-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 26, 2020 |
| Priority date | Oct 14, 2019 |
| Publication date | Oct 11, 2022 |
| Grant date | Oct 11, 2022 |
Official abstract text for this publication.
A processor-implemented method includes: using an encoder, determining, for each of a plurality of tokens included in an input sequence, a self-attention weight based on a token and one or more tokens that precede the token in the input sequence; using the encoder, determining context information corresponding to the input sequence based on the determined self-attention weights; and using a decoder, determining an output sequence corresponding to the input sequence based on the determined context information.
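The attention pattern described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the softmax normalization, and the use of a precomputed score matrix are assumptions made for illustration. The abstract only states that each token's weight depends on the token and the tokens that precede it. Here the first token, which has no predecessors, attends only to itself.

```python
import math

def causal_self_attention_weights(scores):
    """Turn raw pairwise scores into weights that never look ahead.

    scores[i][j] is an (assumed) similarity score between token i and token j.
    Row i of the result is non-zero only for positions 0..i.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        visible = scores[i][: i + 1]            # token i and the tokens before it
        peak = max(visible)                      # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        # Tokens that follow token i receive zero weight.
        row = [e / total for e in exps] + [0.0] * (n - i - 1)
        weights.append(row)
    return weights

def context_vectors(weights, values):
    """Context information per token: the weighted sum of value vectors."""
    dim = len(values[0])
    return [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
        for row in weights
    ]
```

In a full encoder-decoder model the scores and values would come from learned projections, and a decoder would consume the context vectors to produce the output sequence. Both are omitted here.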
Opening claim text (preview).
What is claimed is:
1. A processor-implemented method comprising: using an encoder, determining, for each of a plurality of tokens included in an input sequence, a self-attention weight based on a token and one or more tokens that precede the token in the input sequence; using the encoder, determining context information corresponding to the input sequence based on the determined self-attention weights; and using a decoder, determining an output sequence corresponding to the input sequence based on the determined context information.
2. The method of claim 1, further comprising training the encoder and the decoder based on the determined output sequence.
3. The method of claim 2, wherein the determining of the self-attention weight comprises: masking token relationships between the token and each of tokens that follow the token in the input sequence; and determining the self-attention weight based on a result of the masking.
4. The method of claim 2, wherein the determining of the self-attention weight comprises: determining the self-attention weight based on the token and each of a preset number of the tokens that precede the token in the input sequence.
5. The method of claim 2, wherein the determining of the self-attention weight comprises: determining the self-attention weight using two or more of the tokens included in the input sequence.
6. The method of claim 2, wherein the determining of the self-attention weight comprises: determining the self-attention weight based on the token and each of remaining tokens excluding a preset number of tokens among the tokens that precede the token in the input sequence.
7. The method of claim 2, wherein the training of the encoder and the decoder comprises: training the encoder and the decoder such that a loss between a true sequence corresponding to the input sequence and the output sequence is less than or equal to a threshold.
8. The method of claim 2, wherein the encoder and the decoder correspond to a transformer model.
9. The method of claim 2, wherein either one or both of the input sequence or the output sequence is any one of speech data, sentence data, image data, biodata, and handwriting data.
10. A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, configure the one or more processors to perform the method of claim 2.
11. A processor-implemented method comprising: using an encoder, determining, each time a token included in an input sequence is input or obtained, a self-attention weight based on an input token and one or more tokens that precede the input token in the input sequence; determining context information corresponding to the currently input tokens based on the determined self-attention weight; and using a decoder, determining an output sequence corresponding to the currently input tokens based on the determined context information.
12. The method of claim 11, wherein the determining of the self-attention weight comprises: masking token relationships between the token and each of tokens that follow the token among the currently input tokens; and determining the self-attention weight based on a result of the masking.
13. The method of claim 11, wherein the determining of the context information comprises: updating the context information each time the token of the input sequence is input.
14. The method of claim 11, wherein the determining of the self-attention weight comprises: determining the self-attention weight based on the token and each of a preset number of the tokens that precede the token among the currently input tokens.
15. The method of claim 11, wherein the determining of the self-attention weight comprises: determining the self-attention weight using two or more tokens among the currently input tokens.
16. The method of claim 11, wherein the determining of the self-attention weight comprises: determining the self-attention weight based on the token and each of remaining tokens excluding a preset number of tokens among the tokens that precede the token among the currently input tokens.
17. An apparatus comprising: one or more processors configured to: determine, for each of a plurality of tokens included in an input sequence, a self-attention weight based on a token and one or more tokens that precede the token in the input sequence; determine, context information corresponding to the input sequence based on the determined self-attention weight; and determine, an output sequence corresponding to the input sequence based on the determined context information.
18. The apparatus of claim 17, wherein the one or more processors is configured to train, based on the determined output sequence, an encoder for the determining of the self-attention weight and the determining of the context information and a decoder for the determining of the output sequence.
19. The apparatus of claim 18, wherein, for the determining of the self-attention weight, the one or more processors is configured to: mask token relationships between the token and each of tokens that follow the token in the input sequence; and determine the self-attention weight based on a result of the masking.
20. The apparatus of claim 18, wherein, for the determining of the self-attention weight, the one or more processors is configured to: determine the self-attention weight based on the token and each of a preset number of tokens that precede the token in the input sequence.
21. The apparatus of claim 18, wherein, for the determining of the self-attention weight, the one or more processors is configured to: determine the self-attention weight using two or more of the tokens included in the input sequence.
22. The apparatus of claim 18, wherein, for the determining of the self-attention weight, the one or more processors is configured to: determine the self-attention weight based on the token and each of remaining tokens excluding a preset number of tokens among the tokens that precede the token in the input sequence.
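The token-selection variants and the per-token update recited in the claims can be sketched as follows. This is a hedged illustration, not the claimed implementation. The helper names, the `StreamingEncoder` class, the dot-product scoring, and the reuse of raw token vectors as queries, keys, and values are all assumptions. In particular, the claims do not say which preceding tokens a "preset number" refers to. The sketch assumes the nearest ones for the windowed variant and the earliest ones for the exclusion variant.

```python
import math

def all_preceding(i):
    """Token i and every token before it (tokens after i are masked out)."""
    return list(range(i + 1))

def last_k_preceding(i, k):
    """Token i plus at most k preceding tokens (assumed: the nearest ones)."""
    return list(range(max(0, i - k), i + 1))

def drop_first_k_preceding(i, k):
    """Token i plus its preceding tokens, excluding k of them (assumed: the earliest)."""
    return list(range(min(k, i), i + 1))

class StreamingEncoder:
    """Hypothetical incremental encoder: one new context vector per incoming token."""

    def __init__(self, select=all_preceding):
        self.select = select      # maps position i to the positions it may attend to
        self.tokens = []          # token vectors received so far
        self.context = []         # context vectors, extended on every push

    def push(self, vec):
        self.tokens.append(vec)
        i = len(self.tokens) - 1
        positions = self.select(i)
        # Dot-product scores against the selected tokens (an assumed scoring rule).
        scores = [sum(a * b for a, b in zip(vec, self.tokens[p])) for p in positions]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        ctx = [
            sum(w * self.tokens[p][d] for w, p in zip(weights, positions))
            for d in range(len(vec))
        ]
        self.context.append(ctx)
        return weights, ctx
```

Because no token attends to tokens that follow it, context vectors computed for earlier tokens stay valid when a new token arrives, so each push only adds one entry. A windowed encoder could be created with `StreamingEncoder(select=lambda i: last_k_preceding(i, 2))`.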
Combinations of networks · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Supervised learning · CPC title
Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title