Tensor compression
US-11461625-B1 · Oct 4, 2022 · US
| Field | Value |
|---|---|
| Publication number | US-2024378058-A1 |
| Application number | US-202418779177-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jul 22, 2024 |
| Priority date | Jul 15, 2013 |
| Publication date | Nov 14, 2024 |
| Grant date | — |
Abstract
Software instructions are executed on a processor within a computer system to configure a streaming engine with stream parameters to define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array and a specified width for two selected dimensions of the array. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. When either selected dimension in the stream of vectors exceeds a respective specified width, the streaming engine inserts null elements into each portion of a respective vector for the selected dimension that exceeds the specified width in the stream of vectors. Stream vectors that are completely null are formed by the streaming engine without accessing the system memory for respective data.
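The behaviour the abstract describes can be modeled in a few lines. The following is a minimal Python sketch under stated assumptions, not the patented hardware. It treats the two selected dimensions as the lanes and rows of a 2-D stream. It models each row fetch as one read from a toy memory that counts accesses, and it uses 0 as the null value. `StreamParams`, `CountingMemory`, `stream_vectors` and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamParams:
    # Hypothetical field names; the abstract does not define a parameter layout.
    size0: int    # lanes per vector (inner selected dimension)
    size1: int    # vectors in the stream (outer selected dimension)
    width0: int   # lanes at index >= width0 are null elements
    width1: int   # vectors at index >= width1 are completely null
    stride1: int  # elements between consecutive rows in memory


class CountingMemory:
    """Toy flat memory that counts how many reads it serves."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0

    def read(self, addr: int, count: int) -> list:
        self.reads += 1
        return self.data[addr:addr + count]


def stream_vectors(mem: CountingMemory, base: int, p: StreamParams, null=0):
    """Yield one vector per row, padding past the specified widths with nulls."""
    valid = min(p.width0, p.size0)
    for row in range(p.size1):
        if row >= p.width1 or valid == 0:
            # The whole vector lies beyond a specified width, so it is
            # built from null elements without touching memory.
            yield [null] * p.size0
            continue
        data = mem.read(base + row * p.stride1, valid)
        # Lanes past width0 are filled with null elements.
        yield data + [null] * (p.size0 - valid)
```

Suppose memory holds 0..99 and the parameters are `size0=4`, `size1=3`, `width0=3`, `width1=2` and `stride1=10`. The stream is then `[[0, 1, 2, 0], [10, 11, 12, 0], [0, 0, 0, 0]]`, and the memory is read only twice. The third, completely null vector costs no memory access.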
Claims
1. A circuit device comprising: a memory control circuit configured to couple to a memory; and a processor core that includes a register, wherein the processor core is configured to: store a set of values in the register that specify a size of a data portion of an array in a first dimension and a size of a null element portion of the array in the first dimension; and based on an instruction: cause the memory control circuit to receive a set of data elements of the data portion of the array from the memory; cause the memory control circuit to produce a set of null elements of the null element portion without accessing the memory; and cause the memory control circuit to provide the set of data elements and the set of null elements to the processor core.
2. The circuit device of claim 1, wherein the set of values includes a first value that specifies a combined size of the data portion and the null element portion in the first dimension.
3. The circuit device of claim 2, wherein the first value specifies the combined size in terms of a number of bytes.
4. The circuit device of claim 2, wherein the set of values includes a second value that specifies the size of the data portion in the first dimension such that the size of the null element portion is based on a difference between the first value and the second value.
5. The circuit device of claim 4, wherein the set of values includes a third value that specifies that the second value is in the first dimension.
6. The circuit device of claim 1, wherein the set of values specifies a size of the data portion in a second dimension and a size of the null element portion in the second dimension.
7. The circuit device of claim 1, wherein the set of values specifies a value for the set of null elements.
8. The circuit device of claim 1, wherein the instruction is a stream open instruction that specifies the register.
9. The circuit device of claim 1, wherein the memory control circuit is configured to provide the set of data elements and the set of null elements as a vector.
10. The circuit device of claim 1, wherein the memory control circuit includes: an interface configured to couple to the memory and configured to receive the set of data elements of the data portion of the array from the memory; a storage circuit coupled to the interface and configured to store the set of data elements of the data portion of the array from the memory; and a set of multiplexers coupled between the storage circuit and the processor core.
11. The circuit device of claim 1 further comprising the memory.
12. The circuit device of claim 1 wherein the memory is a cache memory.
13. The circuit device of claim 1 wherein the memory is a level-two (L2) cache memory.
14. A method comprising: storing in a register, a set of values that specify a size of a data portion of an array in a first dimension and a size of a null element portion of the array in the first dimension; receiving an instruction by a processor core; and based on the instruction: causing a memory control circuit to receive a portion of the data portion of the array from a memory; causing the memory control circuit to produce a portion of the null element portion of the array without accessing the memory; and causing the memory control circuit to provide the portion of the data portion and the portion of the null element portion to the processor core.
15. The method of claim 14, wherein the set of values includes: a first value that specifies a combined size of the data portion and the null element portion in the first dimension; and a second value that specifies the size of the data portion in the first dimension.
16. The method of claim 15, wherein the set of values includes a third value that specifies that the second value is in the first dimension.
17. The method of claim 14, wherein the set of values specifies a size of the data portion in a second dimension and a size of the null element portion in the second dimension.
18. The method of claim 14, wherein the set of values specifies a value for the set of null elements.
19. The method of claim 14, wherein the instruction is a stream open instruction that specifies the register.
20. The method of claim 14, wherein the portion of the data portion and the portion of the null element portion are provided to the processor core as a vector.
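Claims 2 to 4, 7 and 10 describe the register contents and the data path in terms of three things: a combined size, a data-portion size and a null value. Claim 4 derives the null-portion size from the difference between the first two. The following is a minimal sketch of that arithmetic, not the claimed circuit. It makes three assumptions: sizes are byte-granular and limited to the first dimension, the element size is fixed, and a per-lane select stands in for the multiplexers of claim 10. `NullRegion`, `lane_mask`, `mux_vector` and their fields are hypothetical; the claims do not specify a register layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NullRegion:
    # Hypothetical encoding of the "set of values" in claims 2-4 and 7.
    total_bytes: int  # first value: combined data + null size, in bytes
    data_bytes: int   # second value: data-portion size, in bytes
    fill: int = 0     # value used for the null elements

    def null_bytes(self) -> int:
        # Claim 4: the null-portion size is based on the difference
        # between the first value and the second value.
        if not 0 <= self.data_bytes <= self.total_bytes:
            raise ValueError("data_bytes must lie between 0 and total_bytes")
        return self.total_bytes - self.data_bytes


def lane_mask(reg: NullRegion, elem_bytes: int) -> list:
    """True for lanes that carry data, False for null lanes."""
    reg.null_bytes()  # validates the two sizes
    if reg.total_bytes % elem_bytes or reg.data_bytes % elem_bytes:
        raise ValueError("sizes must be whole multiples of elem_bytes")
    data_lanes = reg.data_bytes // elem_bytes
    return [lane < data_lanes for lane in range(reg.total_bytes // elem_bytes)]


def mux_vector(stored: list, reg: NullRegion, elem_bytes: int) -> list:
    """Per-lane select between stored data and the fill value.

    Only the data lanes read from `stored`; null lanes never touch it,
    mirroring null elements produced without accessing memory.
    """
    mask = lane_mask(reg, elem_bytes)
    return [stored[lane] if keep else reg.fill for lane, keep in enumerate(mask)]
```

Consider a 16-byte vector of 4-byte elements with 12 data bytes and a fill value of -1. Here `null_bytes()` returns 4 and the mask is `[True, True, True, False]`. `mux_vector([7, 8, 9], ...)` returns `[7, 8, 9, -1]`, so only the three data elements have to come from storage.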
CPC classification titles:
- controlled by a single instruction for multiple data lanes [SIMD]
- using a mask
- Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)}
- of multiple operands or results {(addressing multiple banks G06F12/06)}
- Parallel decoding, e.g. parallel decode units