Multiply add functional unit capable of executing SCALE, ROUND, GETEXP, ROUND, GETMANT, REDUCE, RANGE and CLASS instructions
US-9606770-B2 · Mar 28, 2017 · US
US10073695B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10073695-B2 |
| Application number | US-201615366320-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 1, 2016 |
| Priority date | Dec 30, 2011 |
| Publication date | Sep 11, 2018 |
| Grant date | Sep 11, 2018 |
Official abstract text for this publication.
A method of an aspect includes receiving a floating point round-off amount determination instruction. The instruction indicates a source of one or more floating point data elements, indicates a number of fraction bits after a radix point, and indicates a destination storage location. A result including one or more result floating point data elements is stored in the destination storage location in response to the floating point round-off amount determination instruction. Each of the one or more result floating point data elements includes a difference between a corresponding floating point data element of the source in a corresponding position, and a rounded version of the corresponding floating point data element of the source that has been rounded to the indicated number of the fraction bits. Other methods, apparatus, systems, and instructions are disclosed.
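The per-element arithmetic described in the abstract can be illustrated with a minimal sketch. This is not the patented hardware or any particular instruction's exact semantics; the function name `round_off_amount` is hypothetical, round-half-to-even is assumed as the rounding mode, and only finite inputs are handled.

```python
def round_off_amount(x: float, fraction_bits: int) -> float:
    """Return x minus x rounded to `fraction_bits` bits after the radix point.

    Assumptions (not from the source): round-half-to-even rounding and
    finite inputs only; special values (NaN, infinities) are not modeled.
    """
    scale = 2.0 ** fraction_bits
    # Scaling by a power of two is exact, so rounding the scaled value to an
    # integer is the same as rounding x to `fraction_bits` fraction bits.
    rounded = round(x * scale) / scale  # Python's round() is half-to-even
    return x - rounded


def packed_round_off_amount(src: list[float], fraction_bits: int) -> list[float]:
    """Apply the same operation to every element, position by position."""
    return [round_off_amount(x, fraction_bits) for x in src]
```

For example, with two fraction bits, 1.3125 (binary 1.0101) rounds to 1.25 (binary 1.01), so the round-off amount is 0.0625.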
Opening claim text (preview).
What is claimed is:
1. An apparatus comprising: a plurality of packed data registers; a decode unit to decode an instruction of an instruction set, the instruction to indicate a first storage location that is to store one or more floating point data elements, to indicate a number of fraction bits after a radix point, and to indicate a destination packed data register of the plurality of packed data registers; and a floating point execution unit coupled with the decode unit and coupled with the plurality of the packed data registers, the floating point execution unit operable, in response to the decode of the instruction, to store a result that is to include one or more result floating point data elements in the destination packed data register, each of the one or more result floating point data elements to include a difference between a corresponding floating point data element of the first storage location in a corresponding position and a rounded version of the corresponding floating point data element of the first storage location rounded to the indicated number of the fraction bits.
2. The apparatus of claim 1, wherein the instruction is to explicitly specify the number of the fraction bits.
3. The apparatus of claim 2, wherein the instruction is to have an immediate including a plurality of bits to explicitly specify the number of the fraction bits.
4. The apparatus of claim 1, further comprising a packed data operation mask register, and wherein the instruction is to indicate the packed data operation mask register.
5. The apparatus of claim 1, wherein the instruction is to indicate the first storage location that is to store a single floating point data element, and wherein the execution unit, in response to the instruction, is to broadcast the single floating point data element.
6. The apparatus of claim 1, further comprising a packed data operation mask register, and wherein the instruction is to include a packed data operation mask specifier and a data element broadcast control.
7. The apparatus of claim 1, wherein the execution unit, in response to the instruction, is to store a packed data result that is to include a plurality of packed result floating point data elements.
8. The apparatus of claim 7, wherein the packed data result is to include one of at least eight double precision floating point data elements and at least sixteen single precision floating point data elements.
9. The apparatus of claim 1, wherein the execution unit, in response to the instruction, is to store the result that is to include a single scalar floating point data element.
10. The apparatus of claim 1, wherein the execution unit comprises circuitry.
11. A processor comprising: a plurality of registers; a fetch unit to fetch an instruction of an instruction set of the processor, the instruction to indicate a source of one or more floating point data elements, to indicate a number of fraction bits after a radix point, and to indicate a destination register of the processor; and a floating point execution unit coupled with the fetch unit and coupled with the plurality of the registers, the floating point execution unit including at least some circuitry, the floating point execution unit to perform the instruction to store a result that is to include one or more result floating point data elements in the destination register of the processor, each of the one or more result floating point data elements to include a difference between a corresponding floating point data element of the source in a corresponding position and a rounded version of the corresponding floating point data element of the source rounded to the indicated number of the fraction bits.
12. The processor of claim 11, wherein the execution unit, in response to the instruction, is to store a packed data result that is to include a plurality of packed result floating point data elements.
13. The processor of claim 12, further comprising a packed data operation mask register, and wherein the instruction is to indicate the packed data operation mask register.
14. The processor of claim 13, wherein the instruction is to indicate the source of a single floating point data element, and wherein the execution unit, in response to the instruction, is to broadcast the single floating point data element.
15. An apparatus comprising: a plurality of packed data registers; a decode unit to decode an instruction of an instruction set, the instruction to indicate a 512-bit source packed data that is to have sixteen single precision floating point data elements, to have an immediate that has a plurality of bits to indicate a number of fraction bits after a radix point, to indicate a source packed data operation mask, and to indicate a 512-bit destination packed data register; and a floating point execution unit coupled with the decode unit and coupled with the plurality of the packed data registers, the floating point execution unit operable, in response to the decode of the instruction, to store a 512-bit result packed data that is to include sixteen result single precision floating point data elements in the 512-bit destination packed data register, each of the sixteen result single precision floating point data elements that corresponds to an unmasked mask element of the source packed data operation mask to include a difference between a corresponding single precision floating point data element of the 512-bit source packed data in a corresponding position and a rounded version of the corresponding single precision floating point data element of the 512-bit source packed data rounded to the indicated number of the fraction bits.
16. The apparatus of claim 15, wherein the immediate is an 8-bit immediate, wherein the plurality of bits of the immediate to indicate the number of fraction bits are bits [7:4] of the 8-bit immediate.
17. A processor comprising: a plurality of packed data registers; a decode unit to decode an instruction of an instruction set, the instruction to indicate a 256-bit source packed data that is to have four double precision floating point data elements, to have an immediate that has a plurality of bits to indicate a number of fraction bits after a radix point, to indicate a source packed data operation mask, and to indicate a 256-bit destination packed data register; and a floating point execution unit coupled with the decode unit, and coupled with the plurality of the packed data registers, the floating point execution unit operable, in response to the decode of the instruction, to store a 256-bit result packed data that is to include four result double precision floating point data elements in the 256-bit destination packed data register, each of the four result double precision floating point data elements that corresponds to an unmasked mask element of the source packed data operation mask to include a difference between a corresponding double precision floating point data element of the 256-bit source packed data in a corresponding position and a rounded version of the corresponding double precision floating point data element of the 256-bit source packed data rounded to the indicated number of the fraction bits.
18. The processor of claim 17, wherein the immediate is an 8-bit immediate, wherein the plurality of bits of the immediate to indicate the number of fraction bits are bits [7:4] of the 8-bit immediate, wherein bit [2] of the immediate is to provide floating point rounding control, and wherein bits [1:0] of the immediate are to provide a floating point rounding mode specifie
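The immediate layout recited in claims 16 and 18 and the masked behavior recited in claims 15 and 17 can be sketched as follows. This is an illustrative model only: the names `Imm8Fields`, `decode_imm8`, and `masked_round_off` are hypothetical, the sketch extracts the rounding-control and rounding-mode fields without interpreting them, and it assumes that a mask bit of 1 means "unmasked", that masked elements keep the destination's prior value, and that rounding is half-to-even. None of those assumptions come from the claim text shown here.

```python
from typing import List, NamedTuple, Sequence


class Imm8Fields(NamedTuple):
    fraction_bits: int     # bits [7:4]: number of fraction bits
    rounding_control: int  # bit [2]: floating point rounding control
    rounding_mode: int     # bits [1:0]: rounding mode specifier


def decode_imm8(imm8: int) -> Imm8Fields:
    """Split an 8-bit immediate into the fields named in the claims."""
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in 8 bits")
    return Imm8Fields((imm8 >> 4) & 0xF, (imm8 >> 2) & 0x1, imm8 & 0x3)


def masked_round_off(src: Sequence[float], mask: int,
                     dest: Sequence[float], fraction_bits: int) -> List[float]:
    """Write x - round(x) only to positions whose mask bit is set.

    Assumptions: mask bit i == 1 means element i is unmasked; masked
    positions keep the old destination value; finite inputs only.
    """
    scale = 2.0 ** fraction_bits
    out = list(dest)
    for i, x in enumerate(src):
        if (mask >> i) & 1:
            out[i] = x - round(x * scale) / scale  # half-to-even rounding
    return out
```

With `imm8 = 0x25`, for instance, the sketch yields two fraction bits, a rounding-control bit of 1, and a rounding-mode field of 1.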
Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers {(G06F7/4806, G06F7/4824, G06F7/49, G06F7/491, G06F7/544 take precedence)} · CPC title
according to data content, e.g. floating-point registers, address registers · CPC title
Rounding · CPC title
Arithmetic instructions · CPC title
with variable precision · CPC title