Two-dimensional zero padding in a stream of matrix elements

US12045617B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12045617-B2
Application number: US-202217670611-A
Country: US
Kind code: B2
Filing date: Feb 14, 2022
Priority date: Jul 15, 2013
Publication date: Jul 23, 2024
Grant date: Jul 23, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Software instructions are executed on a processor within a computer system to configure a streaming engine with stream parameters to define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array and a specified width for two selected dimensions of the array. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. When either selected dimension in the stream of vectors exceeds a respective specified width, the streaming engine inserts null elements into each portion of a respective vector for the selected dimension that exceeds the specified width in the stream of vectors. Stream vectors that are completely null are formed by the streaming engine without accessing the system memory for respective data.
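The padding behavior in the abstract can be illustrated with a small software model. This is only a sketch under simplifying assumptions: the array is two-dimensional, memory is a flat row-major list, and each stream vector holds exactly one padded row. The names `CountingMemory`, `padded_stream`, `row_len`, `padded_row_len`, `num_rows`, and `padded_num_rows` are hypothetical and do not come from the patent.

```python
from typing import Iterator, List


class CountingMemory:
    """Hypothetical flat memory model that counts how many elements are read."""

    def __init__(self, data: List[int]) -> None:
        self.data = data
        self.reads = 0

    def read(self, start: int, count: int) -> List[int]:
        self.reads += count
        return self.data[start:start + count]


def padded_stream(mem: CountingMemory, row_len: int, padded_row_len: int,
                  num_rows: int, padded_num_rows: int,
                  null: int = 0) -> Iterator[List[int]]:
    """Yield one vector per row, padded with nulls in both dimensions."""
    for row in range(padded_num_rows):
        if row < num_rows:
            # Real row: fetch row_len elements, then append nulls locally.
            data = mem.read(row * row_len, row_len)
            yield data + [null] * (padded_row_len - row_len)
        else:
            # Row lies wholly in the padding region: no memory access.
            yield [null] * padded_row_len


mem = CountingMemory([1, 2, 3, 4, 5, 6])  # a 2 x 3 matrix, row-major
vectors = list(padded_stream(mem, 3, 4, 2, 3))
print(vectors)    # [[1, 2, 3, 0], [4, 5, 6, 0], [0, 0, 0, 0]]
print(mem.reads)  # 6
```

In this model the read counter stays at six even though twelve elements are streamed, which mirrors the abstract's statement that completely null vectors are formed without accessing memory.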

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising: receiving a set of parameters that defines: a first size of an array in a first dimension; a second size of the array in the first dimension; a third size of the array in a second dimension; and a fourth size of the array in the second dimension; until the third size is met: retrieving, by a circuit, from a memory, a respective first set of data along the first dimension, wherein the respective first set of data has the first size, wherein the circuit includes a set of multiplexers; determining whether to align the respective first set of data using the set of multiplexers; and appending, using the set of multiplexers, a respective first set of null elements to the respective first set of data without accessing the memory to produce a respective second set of data having the second size; and after the third size is met and until the fourth size is met: producing, using the set of multiplexers, a respective second set of null elements having the second size without accessing the memory; and providing, by the circuit, the array having the second size in the first dimension and the fourth size in the second dimension including the respective second sets of data and the respective second sets of null elements.

2. The method of claim 1, wherein the set of parameters specifies a null value for the respective first sets of null elements and the respective second sets of null elements.

3. The method of claim 1, wherein each null value of the respective first sets of null elements and the respective second sets of null elements has a value of zero.

4. The method of claim 1, wherein: the retrieving of the respective first set of data includes: generating a first set of addresses; translating the first set of addresses using a table look-aside buffer to generate a second set of addresses; and retrieving the respective first set of data from the memory using the second set of addresses; and the appending of the respective first set of null elements to the respective first set of data does not include an access of the table look-aside buffer.

5. The method of claim 4, wherein the producing of the respective second set of null elements does not include an access of the table look-aside buffer.

6. The method of claim 1, wherein the providing of the array includes providing a stream of vectors corresponding to the array.

7. The method of claim 6, wherein the set of parameters specifies an element size of the array, a number of elements in each vector of the stream of vectors, and a number of vectors in the stream of vectors.

8. The method of claim 1, wherein the receiving of the set of parameters, the retrieving of the respective first sets of data, the appending, the producing of the respective second sets of null data, and the providing of the array are performed in response to a stream open command that specifies a register storing the set of parameters from among a set of registers.

9. The method of claim 1 comprising generating the respective first set of null elements by: maintaining a count based on the first size of the array in the first dimension; and generating a null element based on the count reaching zero.

10. The method of claim 1 comprising generating the respective second set of null elements by: maintaining a count based on the third size of the array in the second dimension; and generating a null element based on the count reaching zero.

11. The method of claim 1, wherein: the memory is an L2 cache memory; and the retrieving of the respective first sets of data retrieves data from the L2 cache memory by bypassing an L1 cache memory coupled to the L2 cache memory.

12. A device comprising: a register configured to store a set of parameters that defines: a first size of an array in a first dimension; a second size of the array in the first dimension; a third size of the array in a second dimension; and a fourth size of the array in the second dimension; a memory; and a circuit coupled to the register and the memory that includes an alignment network, wherein the circuit is configured to: until the third size is met: retrieve, from the memory, a respective first set of data along the first dimension, wherein the respective first set of data has the first size; determine whether to utilize the alignment network to reorder the respective first set of data; and append a respective first set of null elements to the respective first set of data without accessing the memory to produce a respective second set of data having the second size; and after the third size is met and until the fourth size is met: utilize the alignment network to produce a respective second set of null elements having the second size without accessing the memory; and provide the array, such that the array has the second size in the first dimension and the fourth size in the second dimension and includes the respective second sets of data and the respective second sets of null elements.

13. The device of claim 12, wherein the set of parameters specifies a null value for the respective first sets of null elements and the respective second sets of null elements.

14. The device of claim 12, wherein each null value of the respective first sets of null elements and the respective second sets of null elements has a value of zero.

15. The device of claim 12, wherein: the circuit includes a table look-aside buffer; the circuit is configured to retrieve the respective first set of data by: generating a first set of addresses; translating the first set of addresses using the table look-aside buffer to generate a second set of addresses; and retrieving the respective first set of data from the memory using the second set of addresses; and the circuit is configured to append the respective first set of null elements to the respective first set of data without an access of the table look-aside buffer.

16. The device of claim 15, wherein the circuit is configured to produce the respective second set of null elements without an access of the table look-aside buffer.

17. The device of claim 12, wherein the circuit is configured to provide the array as a stream of vectors.

18. The device of claim 17, wherein the set of parameters specifies an element size of the array, a number of elements in each vector of the stream of vectors, and a number of vectors in the stream of vectors.

19. The device of claim 12, wherein the circuit is configured to generate the respective first set of null elements by: maintaining a count based on the first size of the array in the first dimension; and generating a null element based on the count reaching zero.

20. A device comprising: a processor core comprising a register configured to store a set of parameters that defines: a first size of an array in a first dimension; a second size of the array in the first dimension; a third size of the array in a second dimension; and a fourth size of the array in the second dimension; a level one (L1) cache memory coupled to the processor core; a level two (L2) cache memory coupled to the L1 cache memory; and a circuit coupled between the processor core and the L2 cache memory in parallel with the L1 cache memory, wherein the circuit is configured to: receive the set of parameters from the processor core; until the third size is met: retrieve, from L2 cache memory, a respective first set of data along the first dimension, wherein the res
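Claims 9, 10, and 19 describe generating null elements from counts that reach zero. The following is a minimal software sketch of that idea, assuming one count per dimension and a flat list standing in for memory. The function name `element_stream` and the variables `elems_left` and `rows_left` are hypothetical; the claimed circuit uses multiplexers or an alignment network rather than a loop.

```python
from typing import Iterator, List


def element_stream(data: List[int], first_size: int, second_size: int,
                   third_size: int, fourth_size: int,
                   null: int = 0) -> Iterator[int]:
    """Emit elements one at a time, switching to nulls when a count hits zero."""
    source = iter(data)
    rows_left = third_size          # count based on the third size
    for _ in range(fourth_size):
        elems_left = first_size     # count based on the first size
        for _ in range(second_size):
            if rows_left == 0 or elems_left == 0:
                yield null          # a count reached zero: generate a null
            else:
                yield next(source)  # real data element
                elems_left -= 1
        if rows_left > 0:
            rows_left -= 1


out = list(element_stream([1, 2, 3, 4, 5, 6], 3, 4, 2, 3))
print(out)  # [1, 2, 3, 0, 4, 5, 6, 0, 0, 0, 0, 0]
```

With a first size of 3, a second size of 4, a third size of 2, and a fourth size of 3, each real row gains one trailing null and one further row consists entirely of nulls.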

Assignees

Inventors

Classifications

  • controlled by a single instruction for multiple data lanes [SIMD] · CPC title

  • using a mask · CPC title

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

  • of multiple operands or results {(addressing multiple banks G06F12/06)} · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12045617B2 cover?
Software instructions are executed on a processor within a computer system to configure a streaming engine with stream parameters to define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array and a specified width for two selected dimensions of the array. Data is fetched from a memory coupled to the streaming engine responsive to the str…
Who is the assignee on this patent?
Texas Instruments Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/30036. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 23, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).