Transformer acceleration device and operation method thereof

US2025278270A1 · US · A1

Patent metadata
Publication number: US-2025278270-A1
Application number: US-202418958633-A
Country: US
Kind code: A1
Filing date: Nov 25, 2024
Priority date: Mar 4, 2024
Publication date: Sep 4, 2025
Grant date: (not listed)

Abstract

Official abstract text for this publication.

A transformer acceleration device configured to execute a transformer comprising a first plurality of decoder layers and a second plurality of decoder layers is disclosed. The transformer acceleration device comprises a first memory bank array configured to store a first plurality of weight matrices corresponding to the first plurality of decoder layers and a first plurality of key-value vector pairs corresponding to the second plurality of decoder layers, and a second memory bank array configured to store a second plurality of weight matrices corresponding to the second plurality of decoder layers, and a second plurality of key-value vector pairs corresponding to the first plurality of decoder layers.
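The cross-placement described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions: the `BankArray` class and `plan_layout` function are hypothetical names, and the even/odd split of layers reflects the alternating-order variant mentioned in the claims rather than a required arrangement.

```python
from dataclasses import dataclass, field


@dataclass
class BankArray:
    """Hypothetical model of one memory bank array: which layers' data it holds."""
    weight_layers: list = field(default_factory=list)  # layers whose weight matrices are stored here
    kv_layers: list = field(default_factory=list)      # layers whose key-value pairs are stored here


def plan_layout(num_layers):
    """Cross-place weights and key-value pairs over two bank arrays.

    Assumption: layers alternate between the two groups
    (even index -> first group, odd index -> second group).
    """
    first, second = BankArray(), BankArray()
    for layer in range(num_layers):
        if layer % 2 == 0:
            # First group: weights in the first array, key-value pairs in the second.
            first.weight_layers.append(layer)
            second.kv_layers.append(layer)
        else:
            # Second group: weights in the second array, key-value pairs in the first.
            second.weight_layers.append(layer)
            first.kv_layers.append(layer)
    return first, second


first, second = plan_layout(4)
print(first)
print(second)
```

In this sketch, no layer's weight matrices and key-value pairs ever share a bank array, which is the property the abstract describes.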

First claim

Opening claim text (preview).

What is claimed is:

1. A transformer acceleration device configured to execute a transformer comprising a first plurality of decoder layers and a second plurality of decoder layers, the transformer acceleration device comprising: a first memory bank array configured to store a first plurality of weight matrices corresponding to the first plurality of decoder layers and a first plurality of key-value vector pairs corresponding to the second plurality of decoder layers; and a second memory bank array configured to store a second plurality of weight matrices corresponding to the second plurality of decoder layers, and a second plurality of key-value vector pairs corresponding to the first plurality of decoder layers.

2. The transformer acceleration device of claim 1, further comprising a first processing circuit configured to: calculate the first plurality of key-value vector pairs based on the first plurality of weight matrices; and calculate the second plurality of key-value vector pairs based on the second plurality of weight matrices.

3. The transformer acceleration device of claim 2, further comprising a second processing circuit configured to: perform attention calculation for the second plurality of decoder layers based on the first plurality of key-value vector pairs; and perform attention calculation for the first plurality of decoder layers based on the second plurality of key-value vector pairs.

4. The transformer acceleration device of claim 1, wherein: a number of the first plurality of decoder layers and a number of the second plurality of decoder layers correspond to each other.

5. The transformer acceleration device of claim 3, wherein: an order of the first plurality of decoder layers and the second plurality of decoder layers is alternating with each other.

6. The transformer acceleration device of claim 1, wherein: a first capacity of the first plurality of weight matrices and the first plurality of key-value vector pairs corresponds to a second capacity of the second plurality of weight matrices and the second plurality of key-value vector pairs.

7. The transformer acceleration device of claim 1, wherein: the first plurality of key-value vector pairs comprises: a first plurality of key vectors corresponding to a first token stream provided from an external device; and a first plurality of value vectors corresponding to the first token stream, and the second plurality of key-value vector pairs comprises: a second plurality of key vectors corresponding to the first token stream; and a second plurality of value vectors corresponding to the first token stream.

8. The transformer acceleration device of claim 7, wherein: the first plurality of key-value vector pairs further comprises: a third plurality of key vectors corresponding to a second token stream provided from the external device; and a third plurality of value vectors corresponding to the second token stream, and the second plurality of key-value vector pairs further comprises: a fourth plurality of key vectors corresponding to the second token stream; and a fourth plurality of value vectors corresponding to the second token stream.

9. The transformer acceleration device of claim 7, wherein: each of the first plurality of key vectors, the first plurality of value vectors, the second plurality of key vectors, and second plurality of value vectors has smaller capacity than that of one memory cell row of memory banks included in the first memory bank array and the second memory bank array.

10. A transformer acceleration device configured to operate based on a first activation vector corresponding to a first input token, the transformer acceleration device comprising: a first memory bank array configured to store a first plurality of weight matrices; a second memory bank array configured to store a first plurality of preceding key-value vector pairs corresponding to a first plurality of preceding tokens for the first input token; a first processing circuit configured to generate a first key-value vector pair based on the first activation vector and the first plurality of weight matrices; and a second processing circuit configured to perform first attention calculation based on the first key-value vector pair and the first plurality of preceding key-value vector pairs.

11. The transformer acceleration device of claim 10, wherein: the first key-value vector pair comprises a first key vector and a first value vector, the first plurality of preceding key-value vector pairs comprises a first plurality of preceding key vectors and a first plurality of preceding value vectors, and the second processing circuit is configured to: generate a first plurality of attention scores respectively corresponding to the first key vector and the first plurality of preceding key vectors; and generate a first attention vector by accumulating the first value vector and the first plurality of preceding value vectors based on the first plurality of attention scores.

12. The transformer acceleration device of claim 11, wherein the first processing circuit is further configured to store the first key vector and the first value vector in the second memory bank array.

13. The transformer acceleration device of claim 12, wherein: the transformer acceleration device is further configured to operate based on a second activation vector, the second activation vector being generated based on the first attention vector, the second memory bank array is further configured to store a second plurality of weight matrices, the first memory bank array is further configured to store a second plurality of preceding key-value vector pairs corresponding to the first plurality of preceding tokens, respectively, the first processing circuit is further configured to generate a second key-value vector pair based on the second activation vector and the second plurality of weight matrices, and the second processing circuit is further configured to perform a second attention calculation based on the second key-value vector pair and the second plurality of preceding key-value vector pairs.

14. The transformer acceleration device of claim 13, wherein the second processing circuit is configured to store the second key-value vector pair in the first memory bank array.

15. The transformer acceleration device of claim 11, wherein the transformer acceleration device is further configured to generate a first output token corresponding to the first input token and the first plurality of preceding tokens based on the first attention vector.

16. An operation method of a transformer acceleration device including a plurality of memory bank arrays, the method comprising: generating a first activation vector based on a first input token; reading a first plurality of weight matrices from a first memory bank array, the first memory bank array being a memory bank array of the plurality of memory bank arrays; generating a first key-value vector pair based on the first plurality of weight matrices and the first activation vector; storing the first key-value vector pair in a second memory bank array, the second memory bank array being a memory bank array, of the plurality of memory bank arrays, different from the first memory bank array; generating a second activation vector based on the first key-value vector pair; and generating a first output token corresponding to the first input token based on the second activation vector.

17. The operation method of cl
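The per-token flow in claims 10 to 12 and 16 (read weights from one bank array, generate a key-value pair, attend over the preceding pairs held in the other bank array, then store the new pair there) can be sketched in plain Python. This is a rough illustration, not the patented implementation: the claims do not say how attention scores are computed, so the query projection `Wq`, the scaled dot product, and the softmax are assumptions borrowed from standard transformer attention, and `decode_step`, `weight_bank`, and `kv_bank` are hypothetical names.

```python
import math


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def decode_step(activation, weight_bank, kv_bank):
    """One attention step for one decoder layer.

    weight_bank: dict with hypothetical keys "Wq", "Wk", "Wv"; stands in for
                 the bank array that holds this layer's weight matrices.
    kv_bank:     list of (key, value) pairs for preceding tokens; stands in
                 for the other bank array. The new pair is appended to it.
    """
    q = matvec(weight_bank["Wq"], activation)  # assumption: query is not named in the claims
    k = matvec(weight_bank["Wk"], activation)  # new key vector
    v = matvec(weight_bank["Wv"], activation)  # new value vector

    keys = [pk for pk, _ in kv_bank] + [k]
    values = [pv for _, pv in kv_bank] + [v]

    # One score per key vector (every preceding key plus the current key).
    # Assumption: scaled dot product followed by a numerically stable softmax.
    scale = 1.0 / math.sqrt(len(q))
    logits = [scale * sum(a * b for a, b in zip(q, key)) for key in keys]
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    scores = [e / total for e in exps]

    # Attention vector: score-weighted accumulation of the value vectors.
    attention = [sum(s * val[i] for s, val in zip(scores, values))
                 for i in range(len(v))]

    kv_bank.append((k, v))  # store the new pair alongside the preceding pairs
    return attention, scores


identity = [[1.0, 0.0], [0.0, 1.0]]
weight_bank = {"Wq": identity, "Wk": identity, "Wv": identity}
kv_bank = []  # no preceding tokens yet

attention_1, scores_1 = decode_step([1.0, 0.0], weight_bank, kv_bank)
attention_2, scores_2 = decode_step([0.0, 1.0], weight_bank, kv_bank)
print(attention_1, scores_1)
print(attention_2, scores_2)
```

With no preceding tokens, the first step attends only to its own pair, so its single score is 1.0 and the attention vector equals the new value vector. The second step attends over two pairs and weights the matching key more heavily.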

Assignees

Samsung Electronics Co Ltd, Naver Corp

Inventors

Classifications

  • G06N3/0455 (Primary): Auto-encoder networks; Encoder-decoder networks

  • G06N3/063 (Primary): using electronic means

  • Generative networks

  • Combinations of networks

  • Details of memory controller

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025278270A1 cover?
A transformer acceleration device configured to execute a transformer comprising a first plurality of decoder layers and a second plurality of decoder layers is disclosed. The transformer acceleration device comprises a first memory bank array configured to store a first plurality of weight matrices corresponding to the first plurality of decoder layers and a first plurality of key-value vector…
Who is the assignee on this patent?
Samsung Electronics Co Ltd, Naver Corp
What technology area does this patent fall under?
Primary CPC classification G06N3/0455. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 4, 2025 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).