Method and apparatus with model training and/or sequence recognition

US11468324B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11468324-B2
Application number  US-202016831206-A
Country             US
Kind code           B2
Filing date         Mar 26, 2020
Priority date       Oct 14, 2019
Publication date    Oct 11, 2022
Grant date          Oct 11, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A processor-implemented method includes: using an encoder, determining, for each of a plurality of tokens included in an input sequence, a self-attention weight based on a token and one or more tokens that precede the token in the input sequence; using the encoder, determining context information corresponding to the input sequence based on the determined self-attention weights; and using a decoder, determining an output sequence corresponding to the input sequence based on the determined context information.
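The abstract's encoder step can be illustrated with a short sketch. The following is not the patent's implementation: it assumes that queries, keys, and values are the raw token vectors (no learned projections), uses scaled dot-product scoring, and leaves the decoder out entirely. The names `causal_self_attention`, `softmax`, and `dot` are hypothetical. It only shows how each token's self-attention weights can depend on that token and the tokens before it, and how context information follows from those weights.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def causal_self_attention(tokens):
    """Return (weights, context) for a list of token vectors.

    Assumption: queries, keys, and values are the token vectors themselves,
    which keeps the sketch short. A real encoder would use learned projections.
    """
    dim = len(tokens[0])
    weights, context = [], []
    for i, query in enumerate(tokens):
        # Only positions 0..i are visible: the token and its predecessors.
        visible = tokens[: i + 1]
        scores = [dot(query, key) / math.sqrt(dim) for key in visible]
        row = softmax(scores)
        # Pad with zeros so every row has one weight per input position.
        weights.append(row + [0.0] * (len(tokens) - i - 1))
        # Context for token i is the weighted sum of the visible vectors.
        context.append(
            [sum(w * v[d] for w, v in zip(row, visible)) for d in range(dim)]
        )
    return weights, context

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = causal_self_attention(tokens)
```

In this sketch the first token has no predecessor, so its whole weight falls on itself. Every later row spreads its weight over the token and the tokens before it, and positions that follow always receive zero.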

First claim

Opening claim text (preview).

What is claimed is:

1. A processor-implemented method comprising: using an encoder, determining, for each of a plurality of tokens included in an input sequence, a self-attention weight based on a token and one or more tokens that precede the token in the input sequence; using the encoder, determining context information corresponding to the input sequence based on the determined self-attention weights; and using a decoder, determining an output sequence corresponding to the input sequence based on the determined context information.

2. The method of claim 1, further comprising training the encoder and the decoder based on the determined output sequence.

3. The method of claim 2, wherein the determining of the self-attention weight comprises: masking token relationships between the token and each of tokens that follow the token in the input sequence; and determining the self-attention weight based on a result of the masking.

4. The method of claim 2, wherein the determining of the self-attention weight comprises: determining the self-attention weight based on the token and each of a preset number of the tokens that precede the token in the input sequence.

5. The method of claim 2, wherein the determining of the self-attention weight comprises: determining the self-attention weight using two or more of the tokens included in the input sequence.

6. The method of claim 2, wherein the determining of the self-attention weight comprises: determining the self-attention weight based on the token and each of remaining tokens excluding a preset number of tokens among the tokens that precede the token in the input sequence.

7. The method of claim 2, wherein the training of the encoder and the decoder comprises: training the encoder and the decoder such that a loss between a true sequence corresponding to the input sequence and the output sequence is less than or equal to a threshold.

8. The method of claim 2, wherein the encoder and the decoder correspond to a transformer model.

9. The method of claim 2, wherein either one or both of the input sequence or the output sequence is any one of speech data, sentence data, image data, biodata, and handwriting data.

10. A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, configure the one or more processors to perform the method of claim 2.

11. A processor-implemented method comprising: using an encoder, determining, each time a token included in an input sequence is input or obtained, a self-attention weight based on an input token and one or more tokens that precede the input token in the input sequence; determining context information corresponding to the currently input tokens based on the determined self-attention weight; and using a decoder, determining an output sequence corresponding to the currently input tokens based on the determined context information.

12. The method of claim 11, wherein the determining of the self-attention weight comprises: masking token relationships between the token and each of tokens that follow the token among the currently input tokens; and determining the self-attention weight based on a result of the masking.

13. The method of claim 11, wherein the determining of the context information comprises: updating the context information each time the token of the input sequence is input.

14. The method of claim 11, wherein the determining of the self-attention weight comprises: determining the self-attention weight based on the token and each of a preset number of the tokens that precede the token among the currently input tokens.

15. The method of claim 11, wherein the determining of the self-attention weight comprises: determining the self-attention weight using two or more tokens among the currently input tokens.

16. The method of claim 11, wherein the determining of the self-attention weight comprises: determining the self-attention weight based on the token and each of remaining tokens excluding a preset number of tokens among the tokens that precede the token among the currently input tokens.

17. An apparatus comprising: one or more processors configured to: determine, for each of a plurality of tokens included in an input sequence, a self-attention weight based on a token and one or more tokens that precede the token in the input sequence; determine, context information corresponding to the input sequence based on the determined self-attention weight; and determine, an output sequence corresponding to the input sequence based on the determined context information.

18. The apparatus of claim 17, wherein the one or more processors is configured to train, based on the determined output sequence, an encoder for the determining of the self-attention weight and the determining of the context information and a decoder for the determining of the output sequence.

19. The apparatus of claim 18, wherein, for the determining of the self-attention weight, the one or more processors is configured to: mask token relationships between the token and each of tokens that follow the token in the input sequence; and determine the self-attention weight based on a result of the masking.

20. The apparatus of claim 18, wherein, for the determining of the self-attention weight, the one or more processors is configured to: determine the self-attention weight based on the token and each of a preset number of tokens that precede the token in the input sequence.

21. The apparatus of claim 18, wherein, for the determining of the self-attention weight, the one or more processors is configured to: determine the self-attention weight using two or more of the tokens included in the input sequence.

22. The apparatus of claim 18, wherein, for the determining of the self-attention weight, the one or more processors is configured to: determine the self-attention weight based on the token and each of remaining tokens excluding a preset number of tokens among the tokens that precede the token in the input sequence.
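Several dependent claims differ only in which earlier tokens a token may attend to, and claims 11 and 13 describe updating the result each time a token arrives. The sketch below is one possible reading, not the patent's implementation. The helper `visibility_mask`, its parameters `window` and `skip`, and the class `StreamingEncoder` are all hypothetical names. In particular, treating the "preset number" of claims 4, 14, and 20 as the nearest preceding tokens, and the excluded tokens of claims 6, 16, and 22 as the nearest preceding tokens, is an assumption, since the claim text does not say which tokens are meant. Scoring again uses raw token vectors with scaled dot products.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def visibility_mask(length, window=None, skip=0):
    """mask[i][j] is True if token i may attend to token j.

    window: if set, only the `window` nearest preceding tokens are visible
            (assumed reading of the "preset number" of preceding tokens).
    skip:   the `skip` nearest preceding tokens are hidden
            (assumed reading of "excluding a preset number of tokens").
    """
    mask = []
    for i in range(length):
        row = []
        for j in range(length):
            if j > i:
                allowed = False   # tokens that follow are always masked
            elif j == i:
                allowed = True    # a token always sees itself
            else:
                distance = i - j
                allowed = distance > skip and (window is None or distance <= window)
            row.append(allowed)
        mask.append(row)
    return mask

class StreamingEncoder:
    """Computes attention for the newest token each time one arrives."""

    def __init__(self):
        self.tokens = []
        self.context = []  # one context vector per token received so far

    def push(self, token):
        """Add a token, update the context, and return its attention weights."""
        self.tokens.append(token)
        dim = len(token)
        # The new token scores itself and every token received before it.
        scores = [
            sum(q * k for q, k in zip(token, key)) / math.sqrt(dim)
            for key in self.tokens
        ]
        row = softmax(scores)
        self.context.append(
            [sum(w * key[d] for w, key in zip(row, self.tokens)) for d in range(dim)]
        )
        return row

causal = visibility_mask(4)              # mask only the tokens that follow
windowed = visibility_mask(4, window=2)  # at most 2 preceding tokens visible
skipping = visibility_mask(4, skip=1)    # nearest preceding token hidden

encoder = StreamingEncoder()
rows = [encoder.push(t) for t in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
```

Because a streaming encoder never sees tokens that have not arrived yet, the masking of following tokens is implicit in `push`: each returned row has exactly one weight per token received so far.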

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

  • Combinations of networks · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • Supervised learning · CPC title

  • G06F40/40 · Primary

    Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11468324B2 cover?
A processor-implemented method includes: using an encoder, determining, for each of a plurality of tokens included in an input sequence, a self-attention weight based on a token and one or more tokens that precede the token in the input sequence; using the encoder, determining context information corresponding to the input sequence based on the determined self-attention weights; and using a dec…
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F40/40. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 11, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).