System and method of loop vectorization by compressing indexes and data elements from iterations based on a control mask

US9740493B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9740493-B2
Application numberUS-201213994549-A
CountryUS
Kind codeB2
Filing dateSep 28, 2012
Priority dateSep 28, 2012
Publication dateAug 22, 2017
Grant dateAug 22, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Loop vectorization methods and apparatus are disclosed. An example method includes generating a first control mask for a set of iterations of a loop by evaluating a condition of the loop, wherein generating the first control mask includes setting a bit of the control mask to a first value when the condition indicates that an operation of the loop is to be executed, and setting the bit of the first control mask to a second value when the condition indicates that the operation of the loop is to be bypassed. The example method also includes compressing indexes corresponding to the first set of iterations of the loop according to the first control mask.

First claim

Opening claim text (preview).

The invention claimed is: 1. A method comprising: generating a first control mask for a first set of iterations of a loop by evaluating a condition of the loop, wherein generating the first control mask includes: setting a bit of the first control mask to a first value when the condition indicates that an operation of the loop is to be executed; and setting the bit of the first control mask to a second value when the condition indicates that the operation of the loop is to be bypassed; compressing, by executing an instruction with a processor, indexes corresponding to the first set of iterations of the loop according to the first control mask by: maintaining a first one of the indexes when a first bit of the first control mask associated with the first one of the indexes has the first value; and discarding, before the operation is executed, a second one of the indexes when a second bit of the first control mask associated with the second one of the indexes has the second value; populating an array with the compressed indexes, wherein the operation of the loop is to be performed on data elements corresponding to the compressed indexes in the array; and shifting indexes at higher order positions of the array to lower order positions of the array when the operation of the loop has been performed on data elements corresponding to the indexes at lower order positions of the array. 2. The method of claim 1 , further including compressing data elements corresponding to the indexes, wherein the indexes indicate at which memory locations results of the operation are to be stored for respective ones of the data elements. 3. The method of claim 2 , further including: loading the data elements into a first register; and loading the indexes corresponding to the first set of iterations of the loop into a second register. 4. The method of claim 1 , further including when a number of indexes that have been entered into the array meets a threshold, loading data elements corresponding to an amount of the indexes in the array into a register on which the operation is to be performed. 5. The method of claim 4 , further including, when the number of indexes that have been entered into the array does not meet the threshold, evaluating the condition for a second set of iterations of the loop without performing the operation on the data elements corresponding to the amount of the indexes in the array. 6. At least one tangible machine readable storage medium comprising instructions that, when executed, cause a machine to at least: generate a first control mask for a first set of iterations of a loop by evaluating a condition of the loop, wherein generating the first control mask includes: setting a bit of the first control mask to a first value when the condition indicates that an operation of the loop is to be executed; and setting the bit of the first control mask to a second value when the condition indicates that the operation of the loop is to be bypassed; compress indexes corresponding to the first set of iterations of the loop according to the first control mask by: maintaining a first one of the indexes when a first bit of the first control mask associated with the first one of the indexes has the first value; and discarding, before the operation is executed, a second one of the indexes when a second bit of the first control mask associated with the second one of the indexes has the second value; populate an array with the compressed indexes, wherein the operation of the loop is to be performed on data elements corresponding to the compressed indexes in the array; and shift indexes at higher order positions of the array to lower order positions of the array when the operation of the loop has been performed on data elements corresponding to the indexes at lower order positions of the array. 7. At least one tangible machine readable storage medium as defined in claim 6 , wherein the instructions, when executed, cause the machine to: load data elements into a first register; and load the indexes corresponding to the first set of iterations of the loop into a second register. 8. At least one tangible machine readable storage medium as defined in claim 6 , wherein the instructions, when executed, cause the machine to compress data elements corresponding to the indexes, wherein the indexes indicate at which memory locations results of the operation are to be stored for respective ones of the data elements. 9. At least one tangible machine readable storage medium as defined in claim 6 , wherein the instructions, when executed, cause the machine to, when a number of indexes that have been entered into the array meets a threshold, load data elements corresponding to an amount of the indexes in the array into a register on which the operation is to be performed. 10. At least one tangible machine readable storage medium as defined in claim 9 , wherein the instructions, when executed, cause the machine to, when the number of indexes that have been entered into the array does not meet the threshold, evaluate the condition for a second set of iterations of the loop without performing the operation on the data elements corresponding to the amount of the indexes in the array. 11. An apparatus comprising: a control mask generator to generate a first control mask for a first set of iterations of a loop by evaluating a condition of the loop, the control mask generator to generate the first control mask by: setting a bit of the first control mask to a first value when the condition indicates that an operation of the loop is to be executed; and setting the bit of the first control mask to a second value when the condition indicates that the operation of the loop is to be bypassed; an index loader to load indexes corresponding to the first set of iterations of the loop into a first register; a data compressor to compress the indexes in the first register according to the first control mask by: maintaining a first one of the indexes when a first bit of the first control mask associated with the first one of the indexes has the first value; and discarding, before the operation is executed, a second one of the indexes when a second bit of the first control mask associated with the second one of the indexes has the second value; an array populater to populate an array with the compressed indexes; an array evaluator to determine whether a number of indexes that have been entered into the array meets a threshold; and a register loader to load data elements corresponding to an amount of the indexes in the array into a register on which the operation is to be performed, by computation performer circuitry, when the number of indexes meets the threshold, at least one of the control mask generator, the index loader, the data compressor, the array populater, and the array evaluator is implemented by hardware. 12. The apparatus of claim 11 , further including a data loader to load data elements corresponding to the indexes into a second register, wherein the data compressor is to compress the data elements in the second register according to the first control mask, and the indexes indicate memory locations at which results of the operation are to be stored for respective ones of the data elements. 13. The apparatus as defined in claim 11 , wherein the control mask generator is to evaluate the condition for a second set of iterations of the loop without performing the operation on the data elements corresponding to the amount of the indexes in the array when the number of indexes that have been entered into the array does not meet the threshold. 14. Th

Assignees

Inventors

Classifications

  • G06F8/452Primary

    Loops · CPC title

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • Special arrangements thereof, e.g. mask or switch · CPC title

  • G06F8/4441Primary

    Reducing the execution time required by the program code · CPC title

  • G06F9/38Primary

    Concurrent instruction execution, e.g. pipeline or look ahead · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9740493B2 cover?
Loop vectorization methods and apparatus are disclosed. An example method includes generating a first control mask for a set of iterations of a loop by evaluating a condition of the loop, wherein generating the first control mask includes setting a bit of the control mask to a first value when the condition indicates that an operation of the loop is to be executed, and setting the bit of the fi…
Who is the assignee on this patent?
Hughes Christopher J, Plotnikov Mikhail, Naraikin Andrey, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F8/452. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Aug 22 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).