Streaming engine with separately selectable element and group duplication
US-11860790-B2 · Jan 2, 2024 · US
US2018321937A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2018321937-A1 |
| Application number | US-201715586032-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 3, 2017 |
| Priority date | May 3, 2017 |
| Publication date | Nov 8, 2018 |
| Grant date | — |
Official abstract text for this publication.
Disclosed embodiments relate to instructions for dual-destination type conversion, accumulation, and atomic memory operations. In one example, a system includes a memory, a processor including: a fetch circuit to fetch the instruction from a code storage, the instruction including an opcode, a first destination identifier, and a source identifier to specify a source vector register, the source vector register including a plurality of single precision floating point data elements, a decode circuit to decode the fetched instruction, and an execution circuit to execute the decoded instruction to: convert the elements of the source vector register into double precision floating point values, store a first half of the double precision floating point values to a first location identified by the first destination identifier, and store a second half of the double precision floating point values to a second location.
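The widen-and-split behavior described in the abstract can be sketched in software as follows. This is a minimal illustration, not the patented hardware: the function names `to_single` and `convert_dual_dest` are hypothetical, a Python list stands in for a vector register, and the sketch assumes the "first half" means the lower-indexed elements, which the abstract does not specify.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def convert_dual_dest(src):
    """Widen each single-precision element to double, then split into two halves.

    Assumption of this sketch: the first half is the lower-indexed elements.
    """
    # Every single-precision value is exactly representable as a double,
    # so the widening step loses no information.
    widened = [float(v) for v in src]
    half = len(widened) // 2
    return widened[:half], widened[half:]


# A 512-bit source register holds 16 single-precision (32-bit) elements.
src = [to_single(0.1 * i) for i in range(16)]

# 16 doubles need 1024 bits, so the results fill two 512-bit destinations.
dst1, dst2 = convert_dual_dest(src)
```

Sixteen 32-bit inputs widen to sixteen 64-bit outputs, which is twice the width of the source register; that is why the results are written to two locations.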
Opening claim text (preview).
What is claimed is:
1. A system used to execute an instruction, the system comprising: a memory; a processor comprising: a fetch circuit to fetch the instruction from a code storage, the instruction comprising an opcode, a first destination identifier, and a source identifier to specify a source vector register, the source vector register comprising a plurality of single precision floating point data elements; a decode circuit to decode the fetched instruction; and an execution circuit to execute the decoded instruction to: convert the elements of the source vector register into double precision floating point values, store a first half of the double precision floating point values to a first location identified by the first destination identifier, and store a second half of the double precision floating point values to a second location.
2. The system of claim 1, wherein the instruction further comprises a second destination identifier, and wherein the second location is identified by the second destination identifier.
3. The system of claim 1, wherein the second location is the source vector register.
4. The system of claim 3, wherein the source vector register, the first destination vector register, and the second destination vector register are 512-bit vector registers.
5. The system of claim 1, wherein the execution circuit is further to add each vector element of the first half of the double precision floating point values to data previously stored in the first location and to store a first sum to the first location, and to add each vector element of the second half of the double precision floating point values to data previously stored in the second location and to store a second sum to the second location.
6. The system of claim 1, wherein the locations identified by the first destination identifier and the second destination identifier are in the memory.
7. The system of claim 6, wherein the execution circuit is further to: perform a first atomic read-modify-write to read first data stored in the first location, add the first half of the double precision floating point values to the first data, and store double precision floating point sums to the first location; and perform a second atomic read-modify-write to read second data stored in the second location, add the second half of the double precision floating point values to the second data, and store double precision floating point sums to the second location.
8. The system of claim 1, wherein the execution circuit is to convert all elements of the source vector register in parallel.
9. The system of claim 1, wherein the opcode is to specify that only a lower half of the source vector register is to be converted and stored to the first location.
10. The system of claim 1, wherein the opcode is to specify that only an upper half of the source vector register is to be converted and stored to the first location.
11. A method of executing an instruction, the method comprising: fetching the instruction from a code storage, the instruction comprising an opcode, a first destination identifier, and a source identifier to specify a source vector register comprising a plurality of single precision floating point data elements; decoding the fetched instruction by a decode circuit; and executing, by an execution circuit, the decoded instruction to: convert the elements of the source vector register into double precision floating point values, store a first half of the double precision floating point values to a first location identified by the first destination identifier, and store a second half of the double precision values to a second location.
12. The method of claim 11, wherein the instruction further comprises a second destination identifier, and wherein the second location is identified by the second destination identifier.
13. The method of claim 11, wherein the second location is the source vector register.
14. The method of claim 13, wherein the source vector register, the first destination vector register, and the second destination vector register are 512-bit vector registers.
15. The method of claim 11, further comprising: adding, by the execution circuit, each of the first half of the double precision floating point values to data previously stored in the first location, and adding each of the second half of the double precision floating point values to data previously stored in the second location.
16. The method of claim 11, wherein the locations identified by the first destination identifier and the second destination identifier are in the memory.
17. The method of claim 16, further comprising accumulating results in the first location and the second location by: performing a first atomic read-modify-write to read first data stored in the first location, add the first half of the double precision floating point values to the first data, and store double precision floating point results to the first location; and performing a second atomic read-modify-write to read second data stored in the second location, add the second half of the double precision floating point values to the second data, and store double precision floating point results to the second location.
18. An apparatus for executing an instruction, the apparatus comprising: means for fetching an instruction, the means for fetching to fetch the instruction from a code storage, the instruction comprising an opcode, a first destination identifier, and a source identifier to specify a source vector register, the source vector register comprising a plurality of single precision floating point data elements; means for decoding to decode the fetched instruction; and means for executing the decoded instruction to: convert the elements of the source vector register into double precision floating point values, store a first half of the double precision floating point values to a first location identified by the first destination identifier, and store a second half of the double precision floating point values to a second location.
19. The apparatus of claim 18, wherein the instruction further comprises a second destination identifier, and wherein the second location is identified by the second destination identifier.
20. The apparatus of claim 18, wherein the second location is the source vector register.
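The accumulating variants in claims 5 through 7 and 15 through 17 add the widened values to data already held at each destination, with each memory update performed as an atomic read-modify-write. The following is a rough software model only: `accumulate_atomic` and `convert_accumulate` are hypothetical names, a dictionary stands in for memory, and a `threading.Lock` is an assumed stand-in for whatever atomicity mechanism the hardware provides.

```python
import threading

# Assumption of this sketch: a single lock models hardware atomicity.
_lock = threading.Lock()


def accumulate_atomic(memory, addr, values):
    """Model one atomic read-modify-write: read, add element-wise, store back."""
    with _lock:
        old = memory[addr]  # read the data previously stored
        memory[addr] = [o + v for o, v in zip(old, values)]  # add and store


def convert_accumulate(memory, addr1, addr2, src):
    """Widen the source elements, then accumulate each half into its own location."""
    widened = [float(v) for v in src]  # single-to-double widening is exact
    half = len(widened) // 2
    accumulate_atomic(memory, addr1, widened[:half])  # first read-modify-write
    accumulate_atomic(memory, addr2, widened[half:])  # second read-modify-write


# Two memory locations, each holding eight doubles from earlier work.
memory = {"a": [1.0] * 8, "b": [2.0] * 8}
src = [float(i) for i in range(16)]  # small integers are exact in single precision
convert_accumulate(memory, "a", "b", src)
```

In this model the two updates are separately atomic, matching the claim language of a first and a second read-modify-write; nothing here implies that the pair of updates is atomic as a whole.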
with variable precision · CPC title
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
Instruction prefetching · CPC title
Decoding the operand specifier, e.g. specifier format · CPC title
using a mask · CPC title