Instructions for dual destination type conversion, mixed precision accumulation, and mixed precision atomic memory operations

US2018321937A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2018321937-A1
Application number  US-201715586032-A
Country             US
Kind code           A1
Filing date         May 3, 2017
Priority date       May 3, 2017
Publication date    Nov 8, 2018
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed embodiments relate to instructions for dual-destination type conversion, accumulation, and atomic memory operations. In one example, a system includes a memory, a processor including: a fetch circuit to fetch the instruction from a code storage, the instruction including an opcode, a first destination identifier, and a source identifier to specify a source vector register, the source vector register including a plurality of single precision floating point data elements, a decode circuit to decode the fetched instruction, and an execution circuit to execute the decoded instruction to: convert the elements of the source vector register into double precision floating point values, store a first half of the double precision floating point values to a first location identified by the first destination identifier, and store a second half of the double precision floating point values to a second location.
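
The conversion described in the abstract can be modeled in scalar C. This is only an illustrative sketch, not the patented hardware or any real intrinsic: the function name cvt_ps2pd_dual is hypothetical, the 16-lane width assumes a 512-bit source register holding 32-bit floats, and mapping the "first half" to the lower-indexed elements is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define SRC_LANES 16               /* assumed: 512-bit register / 32-bit float */
#define DST_LANES (SRC_LANES / 2)  /* 512-bit register / 64-bit double */

/* Hypothetical scalar model of the dual-destination conversion:
 * every single precision source element is widened to double precision,
 * and the widened values are split across two destinations. */
static void cvt_ps2pd_dual(const float src[SRC_LANES],
                           double dst_first[DST_LANES],
                           double dst_second[DST_LANES])
{
    assert(src && dst_first && dst_second);
    for (size_t i = 0; i < DST_LANES; ++i) {
        /* Assumption: "first half" means the lower-indexed elements. */
        dst_first[i]  = (double)src[i];
        dst_second[i] = (double)src[i + DST_LANES];
    }
}
```

Widening doubles the storage needed per element, which is why one full-width source register fills two full-width destinations.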

First claim

Opening claim text (preview).

What is claimed is:

1. A system used to execute an instruction, the system comprising: a memory; a processor comprising: a fetch circuit to fetch the instruction from a code storage, the instruction comprising an opcode, a first destination identifier, and a source identifier to specify a source vector register, the source vector register comprising a plurality of single precision floating point data elements; a decode circuit to decode the fetched instruction; and an execution circuit to execute the decoded instruction to: convert the elements of the source vector register into double precision floating point values, store a first half of the double precision floating point values to a first location identified by the first destination identifier, and store a second half of the double precision floating point values to a second location.

2. The system of claim 1, wherein the instruction further comprises a second destination identifier, and wherein the second location is identified by the second destination identifier.

3. The system of claim 1, wherein the second location is the source vector register.

4. The system of claim 3, wherein the source vector register, the first destination vector register, and the second destination vector register are 512-bit vector registers.

5. The system of claim 1, wherein the execution circuit is further to add each vector element of the first half of the double precision floating point values to data previously stored in the first location and to store a first sum to the first location, and to add each vector element of the second half of the double precision floating point values to data previously stored in the second location and to store a second sum to the second location.

6. The system of claim 1, wherein the locations identified by the first destination identifier and the second destination identifier are in the memory.

7. The system of claim 6, wherein the execution circuit is further to: perform a first atomic read-modify-write to read first data stored in the first location, add the first half of the double precision floating point values to the first data, and store double precision floating point sums to the first location; and perform a second atomic read-modify-write to read second data stored in the second location, add the second half of the double precision floating point values to the second data, and store double precision floating point sums to the second location.

8. The system of claim 1, wherein the execution circuit is to convert all elements of the source vector register in parallel.

9. The system of claim 1, wherein the opcode is to specify that only a lower half of the source vector register is to be converted and stored to the first location.

10. The system of claim 1, wherein the opcode is to specify that only an upper half of the source vector register is to be converted and stored to the first location.

11. A method of executing an instruction, the method comprising: fetching the instruction from a code storage, the instruction comprising an opcode, a first destination identifier, and a source identifier to specify a source vector register comprising a plurality of single precision floating point data elements; decoding the fetched instruction by a decode circuit; and executing, by an execution circuit, the decoded instruction to: convert the elements of the source vector register into double precision floating point values, store a first half of the double precision floating point values to a first location identified by the first destination identifier, and store a second half of the double precision values to a second location.

12. The method of claim 11, wherein the instruction further comprises a second destination identifier, and wherein the second location is identified by the second destination identifier.

13. The method of claim 11, wherein the second location is the source vector register.

14. The method of claim 13, wherein the source vector register, the first destination vector register, and the second destination vector register are 512-bit vector registers.

15. The method of claim 11, further comprising: adding, by the execution circuit, each of the first half of the double precision floating point values to data previously stored in the first location, and adding each of the second half of the double precision floating point values to data previously stored in the second location.

16. The method of claim 11, wherein the locations identified by the first destination identifier and the second destination identifier are in the memory.

17. The method of claim 16, further comprising accumulating results in the first location and the second location by: performing a first atomic read-modify-write to read first data stored in the first location, add the first half of the double precision floating point values to the first data, and store double precision floating point results to the first location; and performing a second atomic read-modify-write to read second data stored in the second location, add the second half of the double precision floating point values to the second data, and store double precision floating point results to the second location.

18. An apparatus for executing an instruction, the apparatus comprising: means for fetching an instruction, the means for fetching to fetch the instruction from a code storage, the instruction comprising an opcode, a first destination identifier, and a source identifier to specify a source vector register, the source vector register comprising a plurality of single precision floating point data elements; means for decoding to decode the fetched instruction; and means for executing the decoded instruction to: convert the elements of the source vector register into double precision floating point values, store a first half of the double precision floating point values to a first location identified by the first destination identifier, and store a second half of the double precision floating point values to a second location.

19. The apparatus of claim 18, wherein the instruction further comprises a second destination identifier, and wherein the second location is identified by the second destination identifier.

20. The apparatus of claim 18, wherein the second location is the source vector register.
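
The accumulation variant (claims 5 and 15) and the atomic memory variant (claims 7 and 17) can be approximated in software as follows. This is a rough sketch under stated assumptions, not the claimed circuit: all function names are hypothetical, the 16-lane width assumes 512-bit registers, the "first half" is assumed to be the lower-indexed elements, and the atomic read-modify-write is emulated per element with a C11 compare-exchange loop, whereas the claims describe one atomic operation per destination.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SRC_LANES 16               /* assumed: 512-bit register / 32-bit float */
#define DST_LANES (SRC_LANES / 2)

/* Mixed precision accumulation: widen each float, then add it to the
 * double precision value already held in the matching destination lane. */
static void cvt_accumulate_dual(const float src[SRC_LANES],
                                double acc_first[DST_LANES],
                                double acc_second[DST_LANES])
{
    assert(src && acc_first && acc_second);
    for (size_t i = 0; i < DST_LANES; ++i) {
        acc_first[i]  += (double)src[i];
        acc_second[i] += (double)src[i + DST_LANES];
    }
}

/* Atomically add `addend` to *target. C11 has no atomic_fetch_add for
 * floating point types, so this retries a compare-exchange until it wins. */
static void atomic_add_double(_Atomic double *target, double addend)
{
    double seen = atomic_load(target);
    while (!atomic_compare_exchange_weak(target, &seen, seen + addend)) {
        /* On failure `seen` is refreshed with the current value; retry. */
    }
}

/* Atomic memory variant, approximated one element at a time. */
static void cvt_atomic_accumulate_dual(const float src[SRC_LANES],
                                       _Atomic double mem_first[DST_LANES],
                                       _Atomic double mem_second[DST_LANES])
{
    assert(src && mem_first && mem_second);
    for (size_t i = 0; i < DST_LANES; ++i) {
        atomic_add_double(&mem_first[i],  (double)src[i]);
        atomic_add_double(&mem_second[i], (double)src[i + DST_LANES]);
    }
}
```

Accumulating in double precision keeps rounding error from growing as quickly as it would if many single precision inputs were summed in single precision.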

Assignees

Intel Corp

Inventors

Classifications

  • with variable precision

  • controlled by a single instruction for multiple data lanes [SIMD]

  • Instruction prefetching

  • Decoding the operand specifier, e.g. specifier format

  • using a mask

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018321937A1 cover?
Disclosed embodiments relate to instructions for dual-destination type conversion, accumulation, and atomic memory operations. In one example, a system includes a memory, a processor including: a fetch circuit to fetch the instruction from a code storage, the instruction including an opcode, a first destination identifier, and a source identifier to specify a source vector register, the source …
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/30014. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 8, 2018 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).