Predication in a vector processor

US9575756B2 · US · B2

Patent metadata
Publication number: US-9575756-B2
Application number: US-201213569349-A
Country: US
Kind code: B2
Filing date: Aug 8, 2012
Priority date: Aug 3, 2012
Publication date: Feb 21, 2017
Grant date: Feb 21, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments relate to vector processor predication in an active memory device. An aspect includes a system for vector processor predication in an active memory device. The system includes memory in the active memory device and a processing element in the active memory device. The processing element is configured to perform a method including decoding an instruction with a plurality of sub-instructions to execute in parallel. One or more mask bits are accessed from a vector mask register in the processing element. The one or more mask bits are applied by the processing element to predicate operation of a unit in the processing element associated with at least one of the sub-instructions.
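The predication described in the abstract can be pictured with a minimal software sketch. This is an illustration only, not the patented hardware: the function name `predicated_op`, the four-lane width, and the choice to leave the destination unchanged in masked-off lanes are all assumptions made for the example.

```python
from typing import Callable, List


def predicated_op(
    op: Callable[[int, int], int],
    a: List[int],
    b: List[int],
    dest: List[int],
    mask: List[int],
) -> List[int]:
    """Apply `op` lane by lane under control of `mask`.

    Assumption for this sketch: a lane whose mask bit is 0 is blocked,
    so its destination value is left as it was.
    """
    return [op(x, y) if m else d for x, y, d, m in zip(a, b, dest, mask)]


# Hypothetical four-lane example: only lanes 0 and 2 are enabled.
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
dest = [0, 0, 0, 0]
mask = [1, 0, 1, 0]

result = predicated_op(lambda x, y: x + y, a, b, dest, mask)
print(result)  # [11, 0, 33, 0]
```

In hardware the same mask bits could also gate clocks or data to the blocked unit, which a list comprehension cannot show; the sketch only captures which lanes produce a result.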

First claim

Opening claim text (preview).

What is claimed is:

1. A system for vector processor predication in an active memory device, the system comprising: memory in the active memory device; and a processing element in the active memory device, the processing element comprising a vector mask register, an arithmetic logic unit, and a load store unit, the processing element configured to perform a method comprising: setting one or more mask bits in the vector mask register in the processing element; applying the one or more mask bits by the processing element to predicate operation of the arithmetic logic unit or the load-store unit in the processing element associated with at least one of a plurality of sub-instructions; performing a compare of operands in the processing element using predication of a compare instruction to perform less than a maximum supported number of comparisons in parallel based on the one or more mask bits; storing compare results of the compare instruction as mask bit values of the vector mask register; analyzing a compare instruction syntax bit of the compare instruction to select between performing an OR-reduction and an AND-reduction on the mask bit values stored in response to performing less than the maximum supported number of comparisons in parallel by the predication of the compare instruction; reducing the mask bit values to a summary condition by performing a logical OR combination of the compare results based on determining that the OR-reduction is selected by the compare instruction syntax bit; reducing the mask bit values to the summary condition by performing a logical AND combination of the compare results based on determining that the AND-reduction is selected by the compare instruction syntax bit; writing the summary condition to a condition register; and using the summary condition of the condition register to determine a branch direction of a conditional branch instruction in the processing element.

2. The system of claim 1, wherein applying the one or more mask bits by the processing element to predicate operation further comprises blocking one or more of: execution of at least one element of the sub-instructions and execution of at least one execution slot operating on a sub-element of at least one of the sub-instructions.

3. The system of claim 1, wherein applying the one or more mask bits by the processing element to predicate operation further comprises blocking one or more of: a memory access sub-instruction and part of an arithmetic operation.

4. The system of claim 1, wherein the processing element is further configured to perform: performing one or more of clock gating and data gating to one or more of: the arithmetic logic unit, the load-store unit, a vector computation register file, and a scalar computation register file based on the one or more mask bits.

5. The system of claim 1, wherein the processing element is further configured to perform: populating mask bit values of the vector mask register from one or more of: the memory and the arithmetic logic unit; and performing logical operations by the processing element on the mask bit values to modify the mask bit values of the vector mask register.

6. The system of claim 1, wherein performing the logical OR combination of the compare results further comprises including a current value of the condition register in the logical OR combination of the compare results, and performing the logical AND combination of the compare results further comprises including the current value of the condition register in the logical AND combination of the compare results.

7. A system for vector processor predication in an active memory device, the system comprising: memory in the active memory device, wherein the active memory device is a three-dimensional memory cube and the memory is divided into three-dimensional blocked regions as memory vaults; and a processing element in the active memory device, the processing element comprising a vector mask register, an arithmetic logic unit, and a load store unit, the processing element configured to perform a method comprising: fetching, in the processing element, an instruction from an instruction buffer in the processing element; decoding, in the processing element, the instruction comprising a plurality of sub-instructions to execute in parallel; setting one or more mask bits in the vector mask register in the processing element; applying the one or more mask bits by the processing element to predicate operation of the arithmetic logic unit or the load-store unit in the processing element associated with at least one of the sub-instructions; performing a compare of operands in the processing element using predication of a compare instruction to perform less than a maximum supported number of comparisons in parallel based on the one or more mask bits; storing compare results of the compare instruction as mask bit values of the vector mask register; analyzing a compare instruction syntax bit of the compare instruction to select between performing an OR-reduction and an AND-reduction on the mask bit values stored in response to performing less than the maximum supported number of comparisons in parallel by the predication of the compare instruction; reducing the mask bit values to a summary condition by performing a logical OR combination of the compare results based on determining that the OR-reduction is selected by the compare instruction syntax bit; reducing the mask bit values to the summary condition by performing a logical AND combination of the compare results based on determining that the AND-reduction is selected by the compare instruction syntax bit; writing the summary condition to a condition register; using the summary condition of the condition register to determine a branch direction of a conditional branch instruction in the processing element; and accessing the memory through one or more memory controllers in the active memory device for data operated upon by the instruction.

8. The system of claim 7, wherein applying the one or more mask bits by the processing element to predicate operation further comprises blocking one or more of: execution of at least one element of the sub-instructions and execution of at least one execution slot operating on a sub-element of at least one of the sub-instructions.

9. The system of claim 7, wherein applying the one or more mask bits by the processing element to predicate operation further comprises blocking one or more of: a memory access sub-instruction to prevent an access of the memory, and part of an arithmetic operation.

10. The system of claim 7, wherein the vector mask register is comprised of a plurality of vector mask entries, each comprising a plurality of elements of the mask bits, forming two-dimensional vector masks in the vector mask register, and further comprising: generating multiple mask bits per cycle per element based on single instruction, multiple data-in-space compare operations to form the two-dimensional vector masks in the vector mask register; and using the two-dimensional vector masks with two-dimensional vector data, the two-dimensional vector masks corresponding to data sub-elements in the two-dimensional vector data to predicate.

11. The system of claim 7, wherein the processing element is further configured to perform: performing one or more of clock gating and data gating to one or more of: the arithmetic logic unit, the load-store unit, a vector computation register file, and a scalar computation register file based on the one or more mask bits.

12. The system of claim 7, wherein the processing element is further configured to perform: populating mask
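The compare-and-reduce sequence recited in claims 1 and 6 can be sketched in software as follows. This is a hedged illustration under stated assumptions, not the claimed implementation: the function names, the `use_and` flag standing in for the compare instruction syntax bit, the less-than comparison, the four-lane width, and the choice to reduce only over lanes where a comparison was performed are all hypothetical.

```python
from typing import List, Optional


def predicated_compare(
    a: List[int], b: List[int], old_mask: List[int], predicate: List[int]
) -> List[int]:
    """Compare only the lanes enabled by `predicate` (assumed: a < b).

    Enabled lanes store the compare result as the new mask bit;
    disabled lanes keep their previous mask bit (an assumption).
    """
    return [int(x < y) if p else m
            for x, y, m, p in zip(a, b, old_mask, predicate)]


def reduce_mask(
    mask: List[int],
    predicate: List[int],
    use_and: bool,
    cond_reg: Optional[bool] = None,
) -> bool:
    """Reduce the compared lanes to one summary condition.

    `use_and` plays the role of the syntax bit: True selects the
    AND-reduction, False the OR-reduction. If `cond_reg` is given, the
    current condition register value joins the combination (claim 6).
    """
    terms = [bool(m) for m, p in zip(mask, predicate) if p]
    if cond_reg is not None:
        terms.append(cond_reg)
    return all(terms) if use_and else any(terms)


# Hypothetical example: three of four lanes take part in the compare.
a = [1, 5, 3, 9]
b = [2, 4, 4, 8]
predicate = [1, 1, 1, 0]
new_mask = predicated_compare(a, b, [0, 0, 0, 0], predicate)

or_summary = reduce_mask(new_mask, predicate, use_and=False)
and_summary = reduce_mask(new_mask, predicate, use_and=True)

# The summary condition then picks the branch direction.
direction = "taken" if or_summary else "not taken"
print(new_mask, or_summary, and_summary, direction)
# [1, 0, 1, 0] True False taken
```

Note that Python's `all` returns True for an empty list, so an AND-reduction with no enabled lanes yields True in this sketch; the claims do not say how the hardware treats that case.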

Assignees

Inventors

Classifications

  • Bit or string instructions

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations

  • Group selection circuits, e.g. for memory block selection, chip selection, array selection

  • for non-native instruction execution, e.g. executing a command; for Java instruction set

  • controlled by a single instruction for multiple data lanes [SIMD]

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9575756B2 cover?
Embodiments relate to vector processor predication in an active memory device. An aspect includes a system for vector processor predication in an active memory device. The system includes memory in the active memory device and a processing element in the active memory device. The processing element is configured to perform a method including decoding an instruction with a plurality of sub-instr…
Who is the assignee on this patent?
Fleischer Bruce M, Fox Thomas W, Jacobson Hans M, and 2 more
What technology area does this patent fall under?
Primary CPC classification G06F9/30036. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 21, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).