Block data load with transpose into memory

US12229570B2 · US · B2

Patent metadata
Publication number: US-12229570-B2
Application number: US-202217952270-A
Country: US
Kind code: B2
Filing date: Sep 25, 2022
Priority date: Sep 25, 2022
Publication date: Feb 18, 2025
Grant date: Feb 18, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Block data load with transpose techniques are described. In one example, an input is received, at a control unit, specifying an instruction to load a block of data to at least one memory module using a transpose operation. Responsive to the receiving the input by the control unit, the block of data is caused to be loaded to the at least one memory module by transposing the block of data to form a transposed block of data and storing the transposed block of data in the at least one memory.
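The abstract's idea of loading a block in transposed form can be illustrated in software. The sketch below is only an analogy, not the patented hardware: the function name `load_block_transposed`, the flat-list layout, and the `memory_module` list standing in for a memory module are all assumptions made for illustration. It writes each element of a row-major block directly to its transposed position in the destination, so no separate transposed copy is built first.

```python
def load_block_transposed(source, rows, cols, dest):
    """Copy a rows x cols row-major block from `source` into `dest` as its
    cols x rows transpose, writing each element straight to its final slot.

    Hypothetical helper for illustration; not an API from the patent.
    """
    if len(source) < rows * cols or len(dest) < rows * cols:
        raise ValueError("source and dest must hold rows * cols elements")
    for r in range(rows):
        for c in range(cols):
            # Element (r, c) of the block becomes element (c, r) of the result.
            dest[c * rows + r] = source[r * cols + c]
    return dest


block = [1, 2, 3,
         4, 5, 6]           # 2 x 3 block, row major
memory_module = [0] * 6     # stands in for the destination memory module
load_block_transposed(block, 2, 3, memory_module)
print(memory_module)        # [1, 4, 2, 5, 3, 6], i.e. the 3 x 2 transpose
```

The only storage touched is the source block and the destination, which is the software counterpart of performing the transpose as part of the load rather than as a separate step.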

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: responsive to receiving, by a control circuit of a processor, a single transpose and load instruction specifying a plurality of blocks of data to be processed in a transposed form by a processor array of the processor, fetching, by the control circuit of the processor, the data for processing by storing the transposed form of the data in at least one memory module circuit of the processor array without generating an intermediate representation of the transposed form of the data; and processing, by the processor array, the transposed form of the data stored in the at least one memory module circuit.

2. The method of claim 1, wherein: the processor is part of a single instruction multiple data (SIMD) processor unit that includes the processor array and the at least one memory module circuit of the processor array.

3. The method of claim 1, wherein the plurality of blocks of data includes a first matrix that is column major and the transposed form of the data includes a second matrix that is row major.

4. The method of claim 1, wherein the plurality of blocks of data includes a first matrix that is row major and the transposed form of the data includes a second matrix that is column major.

5. A processor comprising: a control circuit configured to: receive a single transpose and load instruction specifying a plurality of blocks of data to be maintained in at least one memory module circuit in a transposed form that is suitable for processing; and responsive to receiving the single transpose and load instruction, fetch the data by storing the transposed form of the data in the at least one memory module circuit without maintaining an intermediate representation of the transposed form of the data; and a processor array configured to process the transposed form of the data.

6. The processor of claim 5, wherein the processor array includes the at least one memory module circuit.

7. The processor of claim 6, wherein the processor array includes at least one processor element corresponding to the at least one memory module circuit.

8. The processor of claim 5, wherein the plurality of blocks of data includes a first matrix that is column major and the transposed form of the data includes a second matrix that is row major.

9. The processor of claim 5, wherein the plurality of blocks of data includes a first matrix that is row major and the transposed form of the data includes a second matrix that is column major.

10. The processor of claim 5, wherein the plurality of blocks of data includes training data for training a machine-learning model.

11. The processor of claim 10, wherein the processor array is further configured to train the machine-learning model based on the transposed form of the data.

12. The processor of claim 11, wherein the processor array is further configured to: process subsequent data using the trained machine-learning model; and output a result of processing the subsequent data.

13. The processor of claim 12, wherein, in being configured to process the subsequent data, the processor array is configured to perform matrix multiplication.

14. A device comprising: a central processing unit configured to execute an application to issue a single transpose and load instruction specifying a plurality of blocks of data to be processed in a transposed form; a processor array configured to process the transposed form of the data from at least one memory module circuit of the processor array; and a control circuit configured to: receive the single transpose and load instruction from the central processing unit; and responsive to receiving the single transpose and load instruction, fetch the data by storing the transposed form of the data in the at least one memory module circuit without maintaining an intermediate representation of the transposed form of the data.

15. The device of claim 14, wherein the plurality of blocks of data includes a first matrix that is column major and the transposed form of the data includes a second matrix that is row major, or the first matrix is row major and the second matrix is column major.

16. The device of claim 14, wherein the control circuit includes fixed pattern remapping logic that transposes the data into the transposed form without maintaining the intermediate representation.

17. The device of claim 16, wherein the fixed pattern remapping logic includes corner turn logic that transposes the data into the transposed form without maintaining the intermediate representation.

18. The device of claim 14, wherein the control circuit transposes the data into the transposed form as part of the storing the transposed form of the data in the at least one memory module circuit without maintaining the intermediate representation.

19. The device of claim 14, further comprising a data pool that maintains the plurality of blocks of the data, and the control circuit is configured to fetch the data from the data pool by storing the transposed form of the data in the at least one memory module circuit without maintaining an intermediate representation of the transposed form of the data.

20. The device of claim 19, wherein the data pool is configured as external storage coupled to the processor array, and the at least one memory module circuit is configured as one or more registers of the processor array.
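Claims 3, 4, 8, 9 and 15 relate the transposed form to row-major and column-major layouts. As background, and not as a statement of how the claimed circuit works, the sketch below checks a standard fact about the two layouts: storing a matrix in column-major order yields the same linear element sequence as storing its transpose in row-major order. The helper names `to_row_major`, `to_column_major`, and `transpose` are hypothetical and chosen only for this illustration.

```python
def to_row_major(matrix):
    """Flatten a list-of-rows matrix row by row."""
    return [value for row in matrix for value in row]


def to_column_major(matrix):
    """Flatten a list-of-rows matrix column by column."""
    rows, cols = len(matrix), len(matrix[0])
    return [matrix[r][c] for c in range(cols) for r in range(rows)]


def transpose(matrix):
    """Return the transpose as a new list-of-rows matrix."""
    return [list(column) for column in zip(*matrix)]


a = [[1, 2, 3],
     [4, 5, 6]]

column_major_a = to_column_major(a)
row_major_a_t = to_row_major(transpose(a))

# The column-major layout of A matches the row-major layout of its transpose.
print(column_major_a == row_major_a_t)   # True
```

This equivalence is why a transpose and a change of storage order can be described interchangeably for a given block of data.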

Assignees

Advanced Micro Devices Inc

Inventors

Classifications

  • single instruction multiple data [SIMD] multiprocessors · CPC title

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • Cooling means · CPC title

  • Register arrangements · CPC title

  • Arithmetic instructions · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12229570B2 cover?
Block data load with transpose techniques are described. In one example, an input is received, at a control unit, specifying an instruction to load a block of data to at least one memory module using a transpose operation. Responsive to the receiving the input by the control unit, the block of data is caused to be loaded to the at least one memory module by transposing the block of data to form…
Who is the assignee on this patent?
Advanced Micro Devices Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/3887. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 18, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).