US9513871B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9513871-B2 |
| Application number | US-201113977257-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 30, 2011 |
| Priority date | Dec 30, 2011 |
| Publication date | Dec 6, 2016 |
| Grant date | Dec 6, 2016 |
Official abstract text for this publication.
A method of an aspect includes receiving a floating point round-off amount determination instruction. The instruction indicates a source of one or more floating point data elements, indicates a number of fraction bits after a radix point, and indicates a destination storage location. A result including one or more result floating point data elements is stored in the destination storage location in response to the floating point round-off amount determination instruction. Each of the one or more result floating point data elements includes a difference between a corresponding floating point data element of the source in a corresponding position, and a rounded version of the corresponding floating point data element of the source that has been rounded to the indicated number of the fraction bits. Other methods, apparatus, systems, and instructions are disclosed.
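The operation summarized in the abstract can be illustrated in software. The following is a minimal sketch, not the patented hardware implementation: the function names `round_to_fraction_bits` and `round_off_amount` are hypothetical, the fraction bits are assumed to be binary digits after the radix point, inputs are assumed finite, and round-to-nearest-even is assumed because the abstract does not name a rounding mode.

```python
def round_to_fraction_bits(x: float, fraction_bits: int) -> float:
    """Round x to the given number of binary fraction bits.

    Assumption: round-half-to-even, which is what Python's built-in
    round() applies to a float. Only finite inputs are supported.
    """
    scale = 2.0 ** fraction_bits          # scaling by a power of two is exact
    return round(x * scale) / scale


def round_off_amount(x: float, fraction_bits: int) -> float:
    """Return the difference between x and its rounded version."""
    return x - round_to_fraction_bits(x, fraction_bits)


if __name__ == "__main__":
    # 1.75 rounded to 1 fraction bit is 2.0, so the round-off amount is -0.25.
    print(round_off_amount(1.75, 1))
    # 1.625 rounded to 2 fraction bits is 1.5, so the round-off amount is 0.125.
    print(round_off_amount(1.625, 2))
```

The sign convention follows the abstract: the result is the source element minus its rounded version, so it is negative when rounding moves the value up.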
Opening claim text (preview).
What is claimed is:
1. A method performed by a processor comprising: receiving, at a decode unit of the processor, and decoding with the decode unit, a floating point round-off amount determination instruction of an instruction set of the processor, the floating point round-off amount determination instruction indicating a source of one or more floating point data elements, indicating a number of fraction bits after a radix point, and indicating a destination storage location which is a register of the processor; and storing, with an execution unit of the processor, a result including one or more result floating point data elements in the destination storage location which is the register of the processor in response to the floating point round-off amount determination instruction, each of the one or more result floating point data elements including a difference between a corresponding floating point data element of the source in a corresponding position and a rounded version of the corresponding floating point data element of the source that has been rounded to the indicated number of the fraction bits.
2. The method of claim 1, further comprising: determining a rounded version of a floating point data element of the source that has been rounded to the indicated number of the fraction bits; and subtracting the rounded version from the floating point data element of the source.
3. The method of claim 1, wherein receiving comprises receiving the floating point round-off amount determination instruction that explicitly specifies the number of the fraction bits.
4. The method of claim 3, wherein receiving comprises receiving the floating point round-off amount determination instruction that has an immediate that includes a plurality of bits to explicitly specify the number of the fraction bits.
5. The method of claim 1, wherein receiving comprises receiving the floating point round-off amount determination instruction that includes a packed data operation mask specifier and a data element broadcast control.
6. The method of claim 1, wherein receiving comprises receiving the floating point round-off amount determination instruction that indicates a packed data operation mask, and wherein storing the result comprises conditionally storing the one or more result floating point data elements, which each include the difference between the corresponding floating point data element of the source and the rounded version of the corresponding floating point data element of the source, according to the packed data operation mask.
7. The method of claim 1, wherein receiving comprises receiving the floating point round-off amount determination instruction that indicates the source of a single floating point data element, and wherein storing comprises storing a result packed data including a plurality of packed result floating point data elements, each of the result floating point data elements including a difference between the single floating point data element of the source and a rounded version of the single floating point data element of the source that has been rounded to the indicated number of the fraction bits.
8. The method of claim 1, wherein receiving comprises receiving the floating point round-off amount determination instruction that indicates the source of a plurality of packed floating point data elements, and wherein storing comprises storing the result including a corresponding plurality of packed result floating point data elements.
9. The method of claim 1, wherein receiving comprises receiving the floating point round-off amount determination instruction that indicates the source of a single scalar floating point data element, and wherein storing comprises storing the result including a single corresponding result floating point data element.
10. The method of claim 1, wherein receiving comprises receiving the floating point round-off amount determination instruction that indicates the source that includes one of: (1) at least eight double precision floating point data elements; and (2) at least sixteen single precision floating point data elements.
11. An apparatus comprising: a plurality of registers; a decode unit to decode a floating point round-off amount determination instruction of an instruction set, the floating point round-off amount determination instruction to indicate a source of one or more floating point data elements, to indicate a number of fraction bits after a radix point, and to indicate a destination; a floating point execution unit coupled with the decode unit and coupled with the plurality of the registers, the floating point execution unit operable, in response to the floating point round-off amount determination instruction, to store a result that is to include one or more result floating point data elements in the destination, each of the one or more result floating point data elements to include a difference between a corresponding floating point data element of the source in a corresponding position and a rounded version of the corresponding floating point data element of the source rounded to the indicated number of the fraction bits.
12. The apparatus of claim 11, wherein the floating point round-off amount determination instruction is to explicitly specify the number of the fraction bits.
13. The apparatus of claim 12, wherein the floating point round-off amount determination instruction is to have an immediate including a plurality of bits to explicitly specify the number of the fraction bits.
14. The apparatus of claim 11, further comprising a packed data operation mask register, and wherein the floating point round-off amount determination instruction is to indicate the packed data operation mask register.
15. The apparatus of claim 11, wherein the floating point round-off amount determination instruction is to indicate the source of a single floating point data element, and wherein the execution unit, in response to the instruction, is to broadcast the single floating point data element.
16. The apparatus of claim 11, further comprising a packed data operation mask register, and wherein the floating point round-off amount determination instruction is to include a packed data operation mask specifier and a data element broadcast control.
17. The apparatus of claim 11, wherein the execution unit, in response to the instruction, is to store a packed data result that is to include a plurality of packed result floating point data elements.
18. The apparatus of claim 11, wherein the packed data result is to include one of at least eight double precision floating point data elements and at least sixteen single precision floating point data elements.
19. The apparatus of claim 11, wherein the execution unit, in response to the instruction, is to store the result that is to include a single scalar floating point data element.
20. The apparatus of claim 11, wherein the execution unit comprises a floating point multiply and add unit.
21. The apparatus of claim 11, wherein the execution unit comprises circuitry.
22. The apparatus of claim 21, wherein the destination comprises a register of the apparatus.
23. A system comprising: an interconnect; a processor coupled with the interconnect, the processor including a decode unit to decode a floating point round-off amount determination instruction of an instruction set that is to indicate a source of one or more floating point data elements, to indicate a number of frac
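The packed, masked, and broadcast variants recited in claims 5 through 8 can be modeled in the same spirit. This is a rough software sketch under stated assumptions, not the claimed apparatus: `packed_round_off_amount` and its parameters are hypothetical names, lanes are modeled as Python list elements, round-to-nearest-even is assumed, and masked-off lanes are assumed to keep the prior destination value, a behavior the claim text shown here does not specify.

```python
from typing import List, Optional, Sequence


def round_off_amount(x: float, fraction_bits: int) -> float:
    """Difference between x and x rounded to `fraction_bits` binary fraction bits."""
    scale = 2.0 ** fraction_bits
    return x - round(x * scale) / scale   # round() is half-to-even (assumed mode)


def packed_round_off_amount(
    source: Sequence[float],
    fraction_bits: int,
    dest: Sequence[float],
    mask: Optional[Sequence[bool]] = None,
) -> List[float]:
    """Apply the round-off amount operation lane by lane.

    A single-element source is broadcast to every lane (compare claim 7).
    Lanes whose mask bit is False keep the old destination value; this
    merging behavior is an assumption made for the sketch.
    """
    lanes = len(dest)
    if len(source) == 1:
        source = list(source) * lanes      # broadcast the single element
    if len(source) != lanes:
        raise ValueError("source and destination lane counts differ")
    if mask is None:
        mask = [True] * lanes              # no mask: every lane is written
    if len(mask) != lanes:
        raise ValueError("mask and destination lane counts differ")
    return [
        round_off_amount(s, fraction_bits) if keep else old
        for s, keep, old in zip(source, mask, dest)
    ]


if __name__ == "__main__":
    old = [9.0, 9.0, 9.0, 9.0]
    print(packed_round_off_amount([1.75, 1.625, 3.0, 0.5], 1, old,
                                  mask=[True, False, True, True]))
    print(packed_round_off_amount([1.625], 2, [0.0] * 4))
```

In the first call the second lane is masked off and keeps its old value of 9.0; in the second call the single source element 1.625 is broadcast to all four lanes.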
Rounding · CPC title
having multiple operands in a single register · CPC title
according to data content, e.g. floating-point registers, address registers · CPC title
Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers {(G06F7/4806, G06F7/4824, G06F7/49, G06F7/491, G06F7/544 take precedence)} · CPC title
Format conversion instructions, e.g. Floating-Point to Integer, decimal conversion · CPC title