Who is the assignee on this patent?

Corbal Jesus, Forsyth Andrew T, Espasa Roger, and 3 more

What technology area does this patent fall under?

Primary CPC classification G06F9/3001. Mapped technology areas include Physics.

When was this patent published?

Publication date: Aug 15, 2017 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Super multiply add (super madd) instruction

US9733935B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9733935-B2
Application number	US-201113976404-A
Country	US
Kind code	B2
Filing date	Dec 23, 2011
Priority date	Dec 23, 2011
Publication date	Aug 15, 2017
Grant date	Aug 15, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of processing an instruction is described that includes fetching and decoding the instruction. The instruction has separate destination address, first operand source address and second operand source address components. The first operand source address identifies a location of a first mask pattern in mask register space. The second operand source address identifies a location of a second mask pattern in the mask register space. The method further includes fetching the first mask pattern from the mask register space; fetching the second mask pattern from the mask register space; merging the first and second mask patterns into a merged mask pattern; and, storing the merged mask pattern at a storage location identified by the destination address.
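The fetch, merge, and store steps in the abstract can be sketched as a small software model. This is only an illustrative sketch, not the patented implementation: the abstract does not say how the two mask patterns are merged, so the bitwise OR default below is a placeholder assumption, and the names `merge_masks`, `mask_regs`, `dest`, `src1`, and `src2` are hypothetical.

```python
def merge_masks(mask_regs, dest, src1, src2, merge=lambda a, b: a | b):
    """Model the abstract's steps on a list standing in for mask register space.

    mask_regs  -- list of ints, each int holding one mask pattern (assumption)
    dest       -- index identified by the destination address
    src1, src2 -- indices identified by the two operand source addresses
    merge      -- how two patterns are combined; bitwise OR is a placeholder,
                  since the abstract does not define the merge operation
    """
    first = mask_regs[src1]            # fetch the first mask pattern
    second = mask_regs[src2]           # fetch the second mask pattern
    merged = merge(first, second)      # merge into a single pattern
    mask_regs[dest] = merged           # store at the destination location
    return merged


regs = [0] * 8                         # eight hypothetical mask registers
regs[1] = 0b0011
regs[2] = 0b0100
print(bin(merge_masks(regs, dest=0, src1=1, src2=2)))  # prints 0b111
```

Passing a different `merge` callable (for example, one that concatenates bit fields) changes the combining rule without touching the fetch and store steps.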

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: a first register to store a first vector input operand; a second register to store a second vector input operand; a third register to store a third vector input operand; a fourth register to store a packed data structure containing a first scalar input operand and a second scalar input operand; a decoder to decode a single instruction, having a first field specifying the first register, a second field specifying the second register, a third field specifying the third register, and a fourth field specifying the fourth register, into a decoded single instruction; and an execution unit comprising a multiplier coupled to the first register, the second register, the third register, and the fourth register, the execution unit to execute the decoded single instruction to, for each element position, multiply the first scalar input operand with an element of the first vector input operand to produce a first value, multiply the second scalar input operand with a corresponding element of the second vector input operand to produce a second value, and add the first value, the second value, and a corresponding element of the third vector input operand to produce a result, and store in parallel a result for each element position of the first vector input operand, the second vector input operand, and the third vector input operand into a corresponding element position of a resultant register.

2. The processor of claim 1, wherein said multiplier has a first input to receive the first vector input operand, a second input to receive the first scalar input operand, a third input to receive the second vector input operand, and a fourth input to receive the second scalar input operand such that the first values and the second values are calculated substantially simultaneously.

3. The processor of claim 1, wherein said execution unit includes microcode to loop through said multiplier twice, a first loop to calculate the first values and a second loop to calculate the second values.

4. The processor of claim 1, wherein said single instruction separately identifies a sign for each one of said first values, second values, and elements of the third vector input operand.

5. The processor of claim 4, wherein said signs are provided in an immediate operand of the single instruction.

6. The processor of claim 1, wherein individual locations of the first scalar input operand and the second scalar input operand within said packed data structure are determined from information placed in an immediate operand of the single instruction.

7. The processor of claim 1, wherein the execution unit is to execute the decoded single instruction to further apply a write mask to the resultant register, and an instruction format of the single instruction includes a field to indicate the write mask.

8. A method, comprising: storing a first vector input operand in a first register; storing a second vector input operand in a second register; storing a third vector input operand in a third register; storing a packed data structure containing a first scalar input operand and a second scalar input operand in a fourth register; decoding a single instruction, having a first field specifying the first register, a second field specifying the second register, a third field specifying the third register, and a fourth field specifying the fourth register, into a decoded single instruction with a decoder of a processor; and executing the decoded single instruction with an execution unit of the processor to, for each element position, multiply the first scalar input operand with an element of the first vector input operand to produce a first value, multiply the second scalar input operand with a corresponding element of the second vector input operand to produce a second value, and add the first value, the second value, and a corresponding element of the third vector input operand to produce a result, and store in parallel a result for each element position of the first vector input operand, the second vector input operand, and the third vector input operand into a corresponding element position of a resultant register.

9. The method of claim 8, wherein the executing comprises calculating the first values and the second values substantially simultaneously.

10. The method of claim 8, wherein the executing comprises calculating the first values in a first microcode loop and then calculating the second values in a second microcode loop.

11. The method of claim 8, wherein the executing comprises applying a write mask to the resultant register, and an instruction format of the single instruction includes a field to indicate the write mask.

12. The method of claim 8, wherein said single instruction provides in an immediate value information sufficient to individually extract each of the first scalar input operand and the second scalar input operand from said packed data structure.

13. The method of claim 8, wherein said single instruction comprises an instruction format with a fifth field that specifies the resultant register.

14. The method of claim 8, wherein the execution unit is to not loop through a multiplier a plurality of times when executing the single instruction.

15. A non-transitory machine readable medium that stores code that when executed by a machine causes the machine to perform a method comprising: storing a first vector input operand in a first register; storing a second vector input operand in a second register; storing a third vector input operand in a third register; storing a packed data structure containing a first scalar input operand and a second scalar input operand in a fourth register; decoding a single instruction, having a first field specifying the first register, a second field specifying the second register, a third field specifying the third register, and a fourth field specifying the fourth register, into a decoded single instruction with a decoder of a processor; and executing the decoded single instruction with an execution unit of the processor to, for each element position, multiply the first scalar input operand with an element of the first vector input operand to produce a first value, multiply the second scalar input operand with a corresponding element of the second vector input operand to produce a second value, and add the first value, the second value, and a corresponding element of the third vector input operand to produce a result, and store in parallel a result for each element position of the first vector input operand, the second vector input operand, and the third vector input operand into a corresponding element position of a resultant register.

16. The non-transitory machine readable medium of claim 15, wherein the executing comprises calculating the first values and the second values substantially simultaneously.

17. The non-transitory machine readable medium of claim 15, wherein the executing comprises calculating the first values in a first microcode loop and then calculating the second values in a second microcode loop.

18. The non-transitory machine readable medium of claim 15, wherein the executing comprises applying a write mask to the resultant register, and an instruction format of the single instruction includes a field to indicate the write mask.

19. The non-transitory machine readable medium of claim 15, wherein the single instruction provides in an immediate value information sufficient to individually extract each of the first scalar input operand and th
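The per-element arithmetic in the independent claims amounts to result[i] = a·V1[i] + b·V2[i] + V3[i], with the two scalars a and b taken from a packed structure. The following is a minimal scalar reference model of that arithmetic, offered as a sketch under stated assumptions rather than as the instruction's actual encoding or hardware behavior. The function name, parameter names, the representation of signs as +1/-1 multipliers, and the choice to leave masked-off elements unchanged are all assumptions; the claims only state that signs, scalar locations, and a write mask can be specified.

```python
def super_madd(v1, v2, v3, packed, sel_a=0, sel_b=1,
               signs=(1, 1, 1), write_mask=None, dest=None):
    """Software model of the claimed multiply-add on Python lists.

    v1, v2, v3   -- the three vector input operands (equal length)
    packed       -- packed data structure holding the two scalar operands
    sel_a, sel_b -- positions of the scalars in `packed` (hypothetical
                    stand-in for the immediate-operand selection)
    signs        -- +1/-1 for the first value, second value, and v3 element
                    (hypothetical stand-in for the sign controls)
    write_mask   -- per-element booleans; masked-off elements keep the
                    prior `dest` value (an assumption, not from the claims)
    dest         -- prior contents of the resultant register
    """
    n = len(v1)
    if not (len(v2) == n and len(v3) == n):
        raise ValueError("vector operands must have the same length")
    a, b = packed[sel_a], packed[sel_b]      # extract the two scalars
    out = list(dest) if dest is not None else [0] * n
    mask = write_mask if write_mask is not None else [True] * n
    for i in range(n):                       # hardware does this in parallel
        if mask[i]:
            first = a * v1[i]                # first value
            second = b * v2[i]               # second value
            out[i] = signs[0] * first + signs[1] * second + signs[2] * v3[i]
    return out


print(super_madd([1, 2, 3, 4], [10, 20, 30, 40],
                 [100, 200, 300, 400], packed=[2, 3]))
# prints [132, 264, 396, 528]
```

The loop is sequential only because this is a software model; the claims describe the per-element results being stored in parallel, and they leave open whether the two products are computed simultaneously or in two microcode passes through the multiplier.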

Classifications

G06F9/30018: Bit or string instructions
G06F9/30145: Instruction analysis, e.g. decoding, instruction word fields
G06F9/3001 (primary): Arithmetic instructions
G06F9/30101: Special purpose registers
G06F9/30036: Instructions to perform operations on packed data, e.g. vector, tile or matrix operations

Patent family

Related publications grouped by family.

View patent family 48669254

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9733935B2 cover?: A method of processing an instruction is described that includes fetching and decoding the instruction. The instruction has separate destination address, first operand source address and second operand source address components. The first operand source address identifies a location of a first mask pattern in mask register space. The second operand source address identifies a location of a seco…