Model learning device, model learning method, and program
US-2021216818-A1 · Jul 15, 2021 · US
US11734298B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11734298-B2 |
| Application number | US-202117304190-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 16, 2021 |
| Priority date | Mar 23, 2021 |
| Publication date | Aug 22, 2023 |
| Grant date | Aug 22, 2023 |
Official abstract text for this publication.
According to one embodiment, an information processing device includes: an encoder including a first layer and a second layer which are coupled in series; and a decoder. The encoder is configured to: generate, based on first data, a first key and a first value in the first layer, and a second key and a second value in the second layer; and generate, based on second data different from the first data, a first query in the first layer, and a second query in the second layer. The decoder is configured to: generate third data which is included in the first data and is not included in the second data, based on the first key, the first value, the first query, the second key, the second value, and the second query.
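The encoder side of the abstract can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the layer body (a linear map followed by `tanh`), the projection matrices `wk`, `wv`, `wq`, the sharing of layer weights between both inputs, and all toy numbers are assumptions made for the example. It only shows that each of the two serially coupled layers yields a key and value from the first data and a query from the second data.

```python
import math

def matmul(a, b):
    """Multiply a (n x k) by b (k x m); both are nested lists of floats."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def encoder_layer(h, w):
    """One encoder layer: a linear map followed by tanh.

    Assumption: this is a stand-in; the source does not specify the layer body here.
    """
    return [[math.tanh(x) for x in row] for row in matmul(h, w)]

def encode(first_data, second_data, layers):
    """Run both inputs through the layers in series.

    Returns per-layer keys and values (from first_data) and
    per-layer queries (from second_data).
    """
    keys, values, queries = [], [], []
    h_first, h_second = first_data, second_data
    for layer in layers:
        # The output of one layer feeds the next (layers coupled in series).
        h_first = encoder_layer(h_first, layer["w"])
        h_second = encoder_layer(h_second, layer["w"])
        keys.append(matmul(h_first, layer["wk"]))
        values.append(matmul(h_first, layer["wv"]))
        queries.append(matmul(h_second, layer["wq"]))
    return keys, values, queries

# Hypothetical toy weights: two layers, 2-dimensional features.
identity = [[1.0, 0.0], [0.0, 1.0]]
layers = [
    {"w": [[0.5, -0.2], [0.1, 0.3]], "wk": identity, "wv": identity, "wq": identity},
    {"w": [[0.4, 0.6], [-0.3, 0.2]], "wk": identity, "wv": identity, "wq": identity},
]

first_data = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three feature vectors
second_data = [[0.5, 0.5], [1.0, -1.0]]            # two feature vectors

keys, values, queries = encode(first_data, second_data, layers)
print(len(keys), len(values), len(queries))
```

Here `keys[0]`, `values[0]`, `queries[0]` play the role of the first key, first value, and first query, and index `1` gives the second-layer counterparts.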
Opening claim text (preview).
What is claimed is:

1. An information processing device comprising: an encoder including a first layer and a second layer coupled in series; and a decoder, the encoder being configured to: generate, based on first data, a first key and a first value in the first layer, and a second key and a second value in the second layer; and generate, based on second data different from the first data, a first query in the first layer, and a second query in the second layer, and the decoder being configured to: generate third data which is included in the first data and is not included in the second data, based on the first key, the first value, the first query, the second key, the second value, and the second query.

2. The information processing device of claim 1, wherein the decoder includes a first attention layer, a first neural network layer, a second attention layer, and a second neural network layer, the first attention layer is configured to generate fourth data by executing a first attention operation based on the first query, the first key and the first value, the first neural network layer is configured to generate fifth data by executing a first multiply-accumulate operation based on the fourth data, the second attention layer is configured to generate sixth data by executing a second attention operation based on the second query, the second key and the second value, and the second neural network layer is configured to generate the third data by executing a second multiply-accumulate operation based on the sixth data.

3. The information processing device of claim 2, wherein the second attention layer is configured to generate the sixth data by executing the second attention operation based on a third query based on the fifth data and the second query, the second key, and the second value.

4. The information processing device of claim 3, wherein the second attention layer is configured to generate the third query by executing a residual connection between the fifth data and the second query.

5. The information processing device of claim 2, wherein the second neural network layer is configured to generate the third data by executing the second multiply-accumulate operation based on seventh data based on the fifth data and the sixth data.

6. The information processing device of claim 5, wherein the second attention layer is configured to generate the seventh data by executing a residual connection between the fifth data and the sixth data.

7. The information processing device of claim 2, wherein the decoder further includes a third neural network layer, the third data is independent from the fifth data, and the third neural network layer is configured to generate eighth data by executing a third multiply-accumulate operation based on the fifth data and the third data.

8. The information processing device of claim 2, wherein each of the first neural network layer and the second neural network layer is configured to use a feed-forward network.

9. The information processing device of claim 2, wherein the first attention operation and the second attention operation include source-target attention operations.

10. The information processing device of claim 1, wherein the encoder is configured to: generate, based on the first data, the first key and the first value by executing a third attention operation in the first layer, and the second key and the second value by executing a fourth attention operation in the second layer, and generate, based on the second data, the first query by executing a fifth attention operation in the first layer, and the second query by executing a sixth attention operation in the second layer.

11. The information processing device of claim 10, wherein the third attention operation, the fourth attention operation, the fifth attention operation and the sixth attention operation include self-attention operations.

12. The information processing device of claim 1, further comprising: a storage configured to correlate and nonvolatilely store the first key and the first value, and to correlate and nonvolatilely store the second key and the second value, wherein the decoder is configured to load the first key, the first value, the second key and the second value from the storage.

13. The information processing device of claim 1, wherein the encoder includes a first encoder and a second encoder, the first encoder includes a third layer and a fourth layer coupled in series, the third layer being the first layer, and the fourth layer being the second layer, the second encoder includes a fifth layer and a sixth layer coupled in series, the fifth layer being the first layer, and the sixth layer being the second layer, the first encoder is configured to generate, based on the first data, the first key and the first value in the third layer, and the second key and the second value in the fourth layer, and the second encoder is configured to generate, based on the second data, the first query in the fifth layer, and the second query in the sixth layer.

14. The information processing device of claim 13, wherein the first key, the second key, the first query and the second query each have an identical number of dimensions.

15. The information processing device of claim 13, wherein the first encoder is configured to generate, based on the second data, a third query in the third layer, and a fourth query in the fourth layer, the third query is identical to the first query, and the fourth query is identical to the second query.

16. The information processing device of claim 13, wherein the first encoder is configured to generate, based on the second data, a third query in the third layer, and a fourth query in the fourth layer, the third query is different from the first query, and the fourth query is different from the second query.

17. An information processing method comprising: generating, based on first data, a first key, a first value, a second key and a second value; generating, based on second data different from the first data, a first query, and a second query; and generating third data which is included in the first data and is not included in the second data, based on the first key, the first value, the first query, the second key, the second value, and the second query.

18. The information processing method of claim 17, wherein the generating the third data includes: generating fourth data by executing a first attention operation based on the first query, the first key and the first value; generating fifth data by executing a first multiply-accumulate operation based on the fourth data; generating sixth data by executing a second attention operation based on the second query, the second key and the second value; and generating the third data by executing a second multiply-accumulate operation based on the sixth data.

19. A generating method of a learning model, comprising: generating, based on first data, a first key, a first value, a second key and a second value; generating, based on second data different from the first data, a first query, and a second query; generating third data which is included in the first data and is not included in the second data, based on the first key, the first value, the first query, the second key, the second value, and the second query; computing a loss function, based on the generated third data; updating a parameter, based on the computed loss function; and repeating, based on the u
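The decoder data flow recited in claims 2 to 4 might be sketched as below. This is an illustrative reading under stated assumptions, not the patented implementation: scaled dot-product attention is assumed for the "attention operation", a plain matrix product stands in for each "multiply-accumulate operation", the residual connection is taken to be element-wise addition, and the helper names, weights `w1` and `w2`, and toy inputs are all hypothetical.

```python
import math

def matmul(a, b):
    """Multiply a (n x k) by b (k x m); both are nested lists of floats."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention (assumed form): softmax(q k^T / sqrt(d)) v."""
    d = len(k[0])
    scores = [[sum(x * y for x, y in zip(qi, kj)) / math.sqrt(d) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)

def add(a, b):
    """Element-wise sum, used here as the residual connection."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def decode(q1, k1, v1, q2, k2, v2, w1, w2):
    fourth = attention(q1, k1, v1)          # first attention layer
    fifth = matmul(fourth, w1)              # first multiply-accumulate
    third_query = add(fifth, q2)            # residual: fifth data + second query
    sixth = attention(third_query, k2, v2)  # second attention layer
    return matmul(sixth, w2)                # second multiply-accumulate -> third data

# Hypothetical toy inputs: 3 key/value rows, 2 query rows, 2-dimensional features.
k1 = v1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k2 = v2 = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]
q1 = [[1.0, 0.0], [0.0, 1.0]]
q2 = [[0.2, 0.1], [-0.1, 0.3]]
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[0.5, 0.0], [0.0, 0.5]]

third_data = decode(q1, k1, v1, q2, k2, v2, w1, w2)
print(third_data)
```

The output has one row per query row, since the queries (derived from the second data) select from the keys and values (derived from the first data) at each stage.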
Auto-encoder networks; Encoder-decoder networks · CPC title
Supervised learning · CPC title
Feedforward networks · CPC title
Data format conversion from or to a database · CPC title
Natural language query formulation · CPC title