Vectorization in an optimizing compiler

US9047077B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9047077-B2
Application numberUS-201313791929-A
CountryUS
Kind codeB2
Filing dateMar 9, 2013
Priority dateFeb 21, 2013
Publication dateJun 2, 2015
Grant dateJun 2, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An optimizing compiler includes a vectorization mechanism that optimizes a computer program by substituting code that includes one or more vector instructions (vectorized code) for one or more scalar instructions. The cost of the vectorized code is compared to the cost of the code with only scalar instructions. When the cost of the vectorized code is less than the cost of the code with only scalar instructions, the vectorization mechanism determines whether the vectorized code will likely result in processor stalls. If not, the vectorization mechanism substitutes the vectorized code for the code with only scalar instructions. When the vectorized code will likely result in processor stalls, the vectorization mechanism does not substitute the vectorized code, and the code with only scalar instructions remains in the computer program.

First claim

Opening claim text (preview).

The invention claimed is: 1. An apparatus comprising: at least one processor; a memory coupled to the at least one processor; a computer program residing in the memory, the computer program including a plurality of instructions in an instruction set that includes a plurality of vector instructions that operate on data in a plurality of vector registers; and an optimizing compiler residing in the memory and executed by the at least one processor, the optimizing compiler including a vectorization mechanism that analyzes a portion of the computer program, generates from a first set of instructions in the computer program that does not include any of the plurality of vector instructions a second set of instructions that includes at least one of the plurality of vector instructions, determines when a second cost of processing the second set of instructions is less than a first cost of processing the first set of instructions, and when the second cost is less than the first cost, determining when the second set of instructions is likely to cause a processor running the computer program to stall when executing the second set of instructions by determining a ratio of vector instructions in the second set of instructions to total instructions in the second set of instructions, and when the second set of instructions is not likely to cause the processor running the computer program to stall when executing the second set of instructions, the vectorization mechanism substitutes the second set of instructions for the first set of instructions in the computer program, and when the second cost is not less than the first cost or when the second set of instructions is likely to cause the processor running the computer program to stall when executing the second set of instructions, the vectorization mechanism does not substitute the second set of instructions for the first set of instructions in the computer program. 2. The apparatus of claim 1 wherein the vectorization mechanism determines when the second set of instructions is likely to cause the processor running the computer program to stall by determining when the ratio is less than a predetermined threshold value, the second set of instructions is not likely to cause the processor running the computer program to stall, and by determining when the ratio is greater than or equal to the predetermined threshold, the second set of instructions is likely to cause the processor running the computer program to stall. 3. The apparatus of claim 2 wherein the vectorization mechanism increases the second cost by a predetermined multiplier factor when the second cost exceeds the predetermined threshold. 4. The apparatus of claim 1 wherein the vectorization mechanism determines when the second set of instructions is likely to cause a processor running the computer program to stall when executing the second set of instructions only when a number of instructions in the second set of instructions exceeds a predetermined threshold. 5. The apparatus of claim 1 wherein the vectorization mechanism determines when the second set of instructions is likely to cause the processor running the computer program to stall by determining a ratio of cost of vector instructions in the second set of instructions to total cost of all instructions in the second set of instructions, and when the ratio is less than a predetermined threshold value, the second set of instructions is not likely to cause the processor running the computer program to stall, and when the ratio is greater than or equal to the predetermined threshold, the second set of instructions is likely to cause the processor running the computer program to stall. 6. An article of manufacture comprising software stored on a non-transitory computer readable storage medium, the software comprising: an optimizing compiler including a vectorization mechanism that analyzes a portion of a computer program that includes a plurality of instructions for execution on at least one processor that includes a vector processing unit that processes a plurality of vector instructions that operate on data in a plurality of vector registers, generates from a first set of instructions in the computer program that does not include any of the plurality of vector instructions a second set of instructions that includes at least one of the plurality of vector instructions, determines when a second cost of processing the second set of instructions is less than a first cost of processing the first set of instructions, and when the second cost is less than the first cost, determining when the second set of instructions is likely to cause a processor running the computer program to stall when executing the second set of instructions by determining a ratio of vector instructions in the second set of instructions to total instructions in the second set of instructions, and when the second set of instructions is not likely to cause the processor running the computer program to stall when executing the second set of instructions, the vectorization mechanism substitutes the second set of instructions for the first set of instructions in the computer program, and when the second cost is not less than the first cost or when the second set of instructions is likely to cause the at least one processor to stall when executing the second set of instructions, the vectorization mechanism does not substitute the second set of instructions for the first set of instructions in the computer program. 7. The article of manufacture of claim 6 wherein the vectorization mechanism determines when the second set of instructions is likely to cause the processor running the computer program to stall by determining when the ratio is less than a predetermined threshold value, the second set of instructions is not likely to cause the processor running the computer program to stall, and by determining when the ratio is greater than or equal to the predetermined threshold, the second set of instructions is likely to cause the processor running the computer program to stall. 8. The article of manufacture of claim 7 wherein the vectorization mechanism increases the second cost by a predetermined multiplier factor when the second cost exceeds the predetermined threshold. 9. The article of manufacture of claim 6 wherein the vectorization mechanism determines when the second set of instructions is likely to cause a processor running the computer program to stall when executing the second set of instructions only when a number of instructions in the second set of instructions exceeds a predetermined threshold. 10. The article of manufacture of claim 6 wherein the vectorization mechanism determines when the second set of instructions is likely to cause the processor running the computer program to stall by determining a ratio of cost of vector instructions in the second set of instructions to total cost of all instructions in the second set of instructions, and when the ratio is less than a predetermined threshold value, the second set of instructions is not likely to cause the processor running the computer program to stall, and when the ratio is greater than or equal to the predetermined threshold, the second set of instructions is likely to cause the processor running the computer program to stall. 11. An apparatus comprising: at least one processor; a memory coupled to the at least one processor; a computer program residing in the memory, the computer program including a plurality of instructions in an instruction set that includes a plurality of vector instructions that operate on data in a plurality of vector registers; and an optimizing compiler residing in the memory and executed by the at least o

Assignees

Inventors

Classifications

  • G06F8/443Primary

    Optimisation · CPC title

  • G06F9/3001Primary

    Arithmetic instructions · CPC title

  • Compilation · CPC title

  • Dependency analysis; Data or control flow analysis · CPC title

  • Prevention of errors by analysis, debugging or testing of software · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9047077B2 cover?
An optimizing compiler includes a vectorization mechanism that optimizes a computer program by substituting code that includes one or more vector instructions (vectorized code) for one or more scalar instructions. The cost of the vectorized code is compared to the cost of the code with only scalar instructions. When the cost of the vectorized code is less than the cost of the code with only sca…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F8/443. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jun 02 2015 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).