What technology area does this patent fall under?

Primary CPC classification G06F9/3001. Mapped technology areas include Physics.

When was this patent published?

Publication date Dec 4, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Systems, apparatuses, and methods for chained fused multiply add

US10146535B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10146535-B2
Application number	US-201615299420-A
Country	US
Kind code	B2
Filing date	Oct 20, 2016
Priority date	Oct 20, 2016
Publication date	Dec 4, 2018
Grant date	Dec 4, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand. Execution circuitry executes the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.
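The accumulation chain described in the abstract can be illustrated with a minimal Python sketch. It is only one reading of the abstract, not the claimed hardware: the function name `chained_fma`, the use of plain Python lists for packed registers, and the pairing of one scalar sub-element per source operand are all assumptions made for illustration.

```python
def chained_fma(sources, scalar_sub_elements, initial):
    """Illustrative model of a chained multiply-accumulate.

    sources             -- list of packed source operands (each a list of numbers)
    scalar_sub_elements -- one sub-element of the scalar value per source (assumed pairing)
    initial             -- starting accumulator, same length as each source
    """
    if len(sources) != len(scalar_sub_elements):
        raise ValueError("need one scalar sub-element per source operand")

    acc = list(initial)  # first iteration adds to the initial value
    for src, sub in zip(sources, scalar_sub_elements):
        if len(src) != len(acc):
            raise ValueError("source and accumulator lengths differ")
        # Later iterations add to the result of the previous iteration.
        acc = [a + x * sub for a, x in zip(acc, src)]
    return acc


if __name__ == "__main__":
    packed_sources = [[1, 2, 3, 4], [5, 6, 7, 8]]
    sub_elements = [10, 100]
    print(chained_fma(packed_sources, sub_elements, [0, 0, 0, 1]))
```

The sketch models only the data flow (multiply, then accumulate across iterations). It does not model element widths, rounding, or how a single instruction encodes its operands.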

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value; a register file having a plurality of packed data registers including registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand; and execution circuitry to execute the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.

2. The apparatus of claim 1, wherein the instruction to define the first size and the second size, and the second size is a multiple of two of the first size.

3. The apparatus of claim 1, wherein the execution circuitry to perform the packed fused multiply accumulate operations by, for each packed data element group of the sources of the first type, multiply each packed data element of the packed data element group of the sources of the first type by a corresponding packed data sub-element from the scalar value, wherein the scalar value corresponds to packed data position of the scalar for that source of the first type, add each of the multiplications of the packed data element group of the sources of the first type to generate an iteration result, add to the iteration result an iteration result of the previous iteration or an initial value, and store a final result into a packed data element position of the destination that corresponds to the location of the scalar packed data element, wherein the final result is a summation of all of the iteration results and the initial value.

4. The apparatus of claim 1, wherein when the first size is half of the second size, a first addition is performed on each of the multiplications and a second addition is performed on a result of the first addition and a result from a previous iteration.

5. The apparatus of claim 1, wherein when the first size is half of the second size, a single addition and saturation check is performed on each of the multiplications a result from a previous iteration.

6. The apparatus of claim 1, wherein when the first size is a quarter of the second size, a first addition is performed on each of the multiplications and a second addition is performed on a result of the first addition and a result from a previous iteration.

7. The apparatus of claim 1, wherein when the first size is a quarter of the second size, a single addition and saturation check is performed on each of the multiplications a result from a previous iteration.

8. A method comprising: hardware decoding a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value; and executing the decoded single instruction with execution circuitry to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.

9. The method of claim 8, wherein the instruction to define the first size and the second size, and the second size is a multiple of two of the first size.

10. The method of claim 9, wherein the execution circuitry to perform the packed fused multiply accumulate operations by, for each packed data element group of the sources of the first type, multiply each packed data element of the packed data element group of the sources of the first type by a corresponding packed data sub-element from the scalar value, wherein the scalar value corresponds to packed data position of the scalar for that source of the first type, add each of the multiplications of the packed data element group of the sources of the first type to generate an iteration result, add to the iteration result an iteration result of the previous iteration or an initial value, and store a final result into a packed data element position of the destination that corresponds to the location of the scalar packed data element, wherein the final result is a summation of all of the iteration results and the initial value.

11. The method of claim 9, wherein when the first size is half of the second size, a first addition is performed on each of the multiplications and a second addition is performed on a result of the first addition and a result from a previous iteration.

12. The method of claim 9, wherein when the first size is half of the second size, a single addition and saturation check is performed on each of the multiplications a result from a previous iteration.

13. The method of claim 9, wherein when the first size is a quarter of the second size, a first addition is performed on each of the multiplications and a second addition is performed on a result of the first addition and a result from a previous iteration.

14. The method of claim 9, wherein when the first size is a quarter of the second size, a single addition and saturation check is performed on each of the multiplications a result from a previous iteration.

15. A non-transitory machine-readable medium storing an instruction which when executed by a processor causes the processor to perform a method, the method comprising: hardware decoding a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value; and executing the decoded single instruction with execution circuitry to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.

16. The non-transitory machine-readable medium of claim 15, wherein the instruction to define the first size and the second size, and the second size is a multiple of two of the first size.

17. The non-transitory machine-readable medium of claim 15, wherein the execution circuitry to perform the packed fused multiply accumulate operations by, for each packed data element group of the sources of the first type, multiply each packed data element of the packed data element group of the sources of the first type by a corresponding packed data sub-element from the scalar
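The dependent claims describe two ways of folding an iteration's products into the running result when the first size is half of the second size: two successive additions, or a single addition followed by a saturation check. The Python sketch below shows one possible reading of that distinction. The function names, the 16-bit source and 32-bit accumulator widths, and the signed interpretation are assumptions chosen for illustration and are not taken from the claim text.

```python
# Assumed widths: 16-bit signed source elements, 32-bit signed accumulator.
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def saturate_i32(value):
    """Clamp a Python int to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))


def step_two_additions(group, sub_elements, previous):
    """First addition sums the products; second addition folds in the previous result.

    Overflow behaviour of this variant is not modelled here
    (Python integers are unbounded).
    """
    products = [x * s for x, s in zip(group, sub_elements)]
    partial = sum(products)      # first addition
    return partial + previous    # second addition


def step_single_addition_saturating(group, sub_elements, previous):
    """One multi-operand addition over the products and the previous result,
    followed by a saturation check."""
    products = [x * s for x, s in zip(group, sub_elements)]
    return saturate_i32(sum(products + [previous]))


if __name__ == "__main__":
    print(step_two_additions([2, 3], [4, 5], 10))
    print(step_single_addition_saturating([32767, 32767], [32767, 32767], INT32_MAX))
```

With small inputs both variants give the same value. They differ only when the sum leaves the accumulator range, where the saturating variant clamps to the nearest representable bound.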

Assignees

Intel Corp

Inventors

Classifications

G06F9/30112
comprising data of variable length · CPC title
G06F7/5443
Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title
G06F9/3016
Decoding the operand specifier, e.g. specifier format · CPC title
G06F9/3001 · Primary
Arithmetic instructions · CPC title
G06F9/30109
having multiple operands in a single register · CPC title
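The link between a CPC symbol such as G06F9/3001 and a broad technology area such as Physics comes from the symbol's leading section letter. As a rough sketch, a symbol can be split into its hierarchical parts as follows. The function name `parse_cpc`, the regular expression, and the returned dictionary layout are illustrative assumptions, not part of any official tool, and the pattern covers only the plain `section/class/subclass/group/subgroup` form used on this page.

```python
import re

# CPC section letters and their titles.
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General Tagging of New Technological Developments",
}

# Hypothetical pattern: section letter, two-digit class, subclass letter,
# main group, slash, subgroup.
_CPC_PATTERN = re.compile(r"([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})")


def parse_cpc(symbol):
    """Split a CPC symbol like 'G06F9/3001' into its hierarchical parts."""
    match = _CPC_PATTERN.fullmatch(symbol.strip())
    if match is None:
        raise ValueError(f"not a recognised CPC symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup = match.groups()
    return {
        "section": section,
        "section_title": CPC_SECTIONS[section],
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": main_group,
        "subgroup": subgroup,
    }


if __name__ == "__main__":
    print(parse_cpc("G06F9/3001"))
```

All five classifications listed above share the section letter G, which is why the page maps this patent to Physics.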

Patent family

Related publications grouped by family.

View patent family 61970281

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10146535B2 cover?: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source f…
Who is the assignee on this patent?: Intel Corp