Neural network optimization device for edge device meeting on-demand instruction and method using the same
US-2024386276-A1 · Nov 21, 2024 · US
US9870338B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9870338-B2 |
| Application number | US-201113992209-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 23, 2011 |
| Priority date | Dec 23, 2011 |
| Publication date | Jan 16, 2018 |
| Grant date | Jan 16, 2018 |
Official abstract text for this publication.
Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed compression and repeat in response to a single vector packed compression and repeat instruction that includes a first and second source vector register operand, a destination vector register operand, and an opcode are described.
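As a reading aid, the core behaviour named in the abstract (and spelled out in claims 1 and 2) can be modelled in a few lines of Python. This is a minimal sketch, not the patented circuitry. The function name `compress_and_repeat` is hypothetical. Index 0 stands for the least significant packed data element position. Counts are assumed to be non-negative integers. The destination register's fixed width is ignored here.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def compress_and_repeat(src1: Sequence[T], src2: Sequence[int]) -> List[T]:
    """Hypothetical software model: element i of src1 is emitted src2[i] times.

    Index 0 is treated as the least significant packed data element position,
    and output elements are written to consecutive positions starting there.
    Register width limits are deliberately ignored in this sketch.
    """
    if len(src1) != len(src2):
        raise ValueError("source vectors must have the same number of elements")
    out: List[T] = []
    for value, count in zip(src1, src2):
        if count < 0:
            # Assumption: counts are unsigned in hardware, so reject negatives.
            raise ValueError("repeat counts must be non-negative")
        out.extend([value] * count)  # 0 drops the element, >1 repeats it
    return out


print(compress_and_repeat([10, 20, 30, 40], [2, 0, 1, 3]))
# → [10, 10, 30, 40, 40, 40]
```

A count of 0 removes an element from the output (the "compression"), and a count above 1 duplicates it (the "repeat").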
Opening claim text (preview).
What is claimed is:

1. A method comprising: decoding an instruction; executing the decoded instruction to determine, for each packed data element position of a first source vector register, a number of times that packed data element position's packed data element is to be stored in a destination vector register based solely on the value of a corresponding packed data element position of a second source vector register, wherein the number of times is up to a plurality of times and store each packed data element of a packed data element position of the first source vector register into the destination vector register the value number of times based on the determination of the corresponding data element of the second source vector register's value.

2. The method of claim 1, wherein the storing begins at the least significant packed data element position of the destination vector register and the packed data elements are stored in consecutive packed data element positions of the destination vector register.

3. The method of claim 1, wherein the executing and storing steps further comprise: determining a value of a least significant packed data element position of the second source vector register; determining if that value is greater than 0; if the value is greater than 0, storing a corresponding packed data element position of the first source vector register's packed data element value number of times, wherein these packed data elements are stored in consecutively beginning at a least significant packed data element position of the destination vector register; and if the value is 0, determining a value of a next least significant packed data element position of the second source vector register; if the value of the next least significant data element position is greater than 0, storing a corresponding packed data element position of the first source vector register's packed data element value number of times, wherein these packed data elements are stored in consecutively beginning at a least significant packed data element position of the destination vector register that has not been previously written to.

4. The method of claim 3, further comprising: repeating the determining and storing steps until all of the packed data element positions of the second source vector register's values have been evaluated.

5. The method of claim 4, further comprising: writing a preset value into all unused packed data element positions of the destination vector register after all of the packed data element positions of the first source vector register have been written into the destination vector register.

6. The method of claim 5, wherein the preset value is a value of all 1s.

7. The method of claim 1, providing a programmer visible exception when all of packed data element positions of the destination vector register have been written to, but there are still packed data elements from the first source vector register that are to be written to the destination vector register.

8. The method of claim 1, wherein the vector registers are all a same size of 128-bit, 256-bit, or 512-bit.

9. An article of manufacture comprising: a non-transitory machine-readable storage medium having stored thereon an occurrence of an instruction, wherein the instruction's format specifies as its source operands a first and second vector register and specifies as its destination a single vector register, and wherein the instruction format includes an opcode which instructs a machine, responsive to the single occurrence of the single instruction, to cause a determination, for each packed data element position of the first source vector register, a number of times that packed data element position's packed data element is to be stored in the destination vector register based solely on the value of a corresponding packed data element position of the second source vector register, storage of each packed data element of a packed data element position of the first source vector register into the destination vector register the value number of times based on the determination of the corresponding data element of the second source vector register's value, wherein the number of times is up to a plurality of times.

10. The article of manufacture of claim 9, wherein the storing begins at the least significant packed data element position of the destination vector register and the packed data elements are stored in consecutive packed data element positions of the destination vector register.

11. The article of manufacture of claim 9, further to cause: a determination of a value of a least significant packed data element position of the second source vector register; a determination of if that value is greater than 0; if the value is greater than 0, storage of a corresponding packed data element position of the first source vector register's packed data element value number of times, wherein these packed data elements are stored in consecutively beginning at a least significant packed data element position of the destination vector register; and if the value is 0, a determination of a value of a next least significant packed data element position of the second source vector register; if the value of the next least significant data element position is greater than 0, storage of a corresponding packed data element position of the first source vector register's packed data element value number of times, wherein these packed data elements are stored in consecutively beginning at a least significant packed data element position of the destination vector register that has not been previously written to.

12. The article of manufacture of claim 9, further to: repeat until all of the packed data element position of the second source vector register's values have been evaluated.

13. The article of manufacture of claim 9, further to: write a preset value into all unused packed data element positions of the destination vector register after all of the packed data element positions of the first source vector register have been written into the destination vector register.

14. The article of manufacture of claim 9, wherein the preset value is a value of all 1s.

15. The article of manufacture of claim 9, wherein the vector registers are all a same size of 128-bit, 256-bit, or 512-bit.

16. An apparatus comprising: a hardware decoder to decode a single instruction that includes a first and second source vector register operand, a destination vector register operand, and an opcode; execution circuitry to execute the decoded single instruction to determine, for each packed data element position of the first source vector register, a number of times that packed data element position's packed data element is to be stored in the destination vector register based solely on the value of a corresponding packed data element position of the second source vector register and store each packed data element of a packed data element position of the first source vector register into the destination vector register the value number of times based on the determination of the corresponding data element of the second source vector register's value, wherein the number of times is up to a plurality of times.

17. The apparatus of claim 16, wherein the storage begins at the least significant packed data element position of the destination vector register and the packed data elements are stored in consecutive packed data element positions of the destination vector register.

18. The apparatus of claim 17, wherein the exec
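The dependent claims add register-level detail: a least-significant-first scan (claims 3 and 4), an all-1s fill for unused positions (claims 5 and 6), an exception on overflow (claim 7), and 128-, 256- or 512-bit registers (claims 8 and 15). The Python sketch below models those rules under stated assumptions. The claims do not fix an element width, so the 32-bit default is an assumption. The names `vpcompress_repeat` and `DestinationOverflow` are hypothetical, and index 0 stands for the least significant position.

```python
from typing import List, Sequence


class DestinationOverflow(Exception):
    """Hypothetical stand-in for the programmer-visible exception of claim 7."""


def vpcompress_repeat(src1: Sequence[int], src2: Sequence[int],
                      reg_bits: int = 512, elem_bits: int = 32) -> List[int]:
    """Hypothetical software model of the claimed instruction.

    Index 0 is the least significant packed data element position.
    The 32-bit default element width is an assumption, not from the claims.
    """
    if reg_bits not in (128, 256, 512):
        raise ValueError("claims 8 and 15 recite 128-, 256- or 512-bit registers")
    if elem_bits <= 0 or reg_bits % elem_bits:
        raise ValueError("element width must evenly divide the register width")
    lanes = reg_bits // elem_bits
    if len(src1) != lanes or len(src2) != lanes:
        raise ValueError(f"each source register holds exactly {lanes} elements")

    all_ones = (1 << elem_bits) - 1  # preset fill value of all 1s (claims 5-6)
    dest: List[int] = []

    # Evaluate every position of the second source, least significant first (claims 3-4).
    for pos in range(lanes):
        count = src2[pos]
        if count < 0:
            raise ValueError("repeat counts are assumed to be non-negative")
        if count == 0:
            continue  # a count of 0 writes nothing for this element
        for _ in range(count):
            if len(dest) == lanes:
                # Destination is full but copies remain to be written (claim 7).
                raise DestinationOverflow(f"no room for source element at position {pos}")
            dest.append(src1[pos])  # next position not previously written to

    # Fill the positions that were never written (claim 5).
    dest.extend([all_ones] * (lanes - len(dest)))
    return dest


# 128-bit registers with 32-bit elements give 4 lanes.
print(vpcompress_repeat([7, 8, 9, 10], [1, 0, 2, 0], reg_bits=128, elem_bits=32))
# → [7, 9, 9, 4294967295]
```

Raising as soon as a copy has no free destination position is one reading of claim 7. The fill happens only after every count has been evaluated, so the function never produces a partially filled result.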
CPC classification titles:

- comprising a single central processing unit
- Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
- Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
- Loop control instructions; iterative instructions, e.g. LOOP, REPEAT
- using a mask