Reducing the number of sequential operations in an application to be performed on a shared memory cell

US9449360B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9449360-B2
Application number  US-201113997056-A
Country             US
Kind code           B2
Filing date         Dec 28, 2011
Priority date       Dec 28, 2011
Publication date    Sep 20, 2016
Grant date          Sep 20, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and apparatuses to reduce the number of sequential operations such as atomic operations in an application to be performed on a shared memory cell may be provided. A translation unit can detect in the application multiple atomic operations to be performed on the same memory and replace the multiple atomic operations with an equivalent single atomic operation. In some implementations, the application includes shader code. In some implementations, each of the multiple atomic operations increments a value stored at the same memory by an update amount. The translation unit may calculate the partial prefix sum over all the atomic operations and replace the multiple atomic operations with a single atomic operation to increment the value stored at memory by the sum of the update amounts.
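The idea in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the patented implementation: `SharedCell`, `naive`, and `combined` are hypothetical names, and the atomic increment is simulated with a lock rather than a GPU atomic. It shows that a locally computed prefix sum plus one atomic add of the total leaves the cell in the same state, and gives each update the same pre-increment value, as issuing one atomic operation per update.

```python
import threading
from itertools import accumulate


class SharedCell:
    """Hypothetical shared memory cell with a lock-simulated atomic fetch-and-add."""

    def __init__(self, value=0):
        self.value = value
        self.atomic_ops = 0  # how many atomic operations were issued
        self._lock = threading.Lock()

    def fetch_add(self, amount):
        with self._lock:
            old = self.value
            self.value += amount
            self.atomic_ops += 1
            return old


def naive(cell, updates):
    """One atomic operation per update; returns the old value each one observed."""
    return [cell.fetch_add(u) for u in updates]


def combined(cell, updates):
    """Prefix sum computed locally, then a single atomic add of the total."""
    inclusive = list(accumulate(updates))   # running sums of the update amounts
    exclusive = [0] + inclusive[:-1]        # offset of each update within the batch
    total = inclusive[-1] if inclusive else 0
    base = cell.fetch_add(total)            # the only atomic operation
    return [base + offset for offset in exclusive]


updates = [3, 1, 4, 1, 5]
a, b = SharedCell(10), SharedCell(10)
naive_result = naive(a, updates)
combined_result = combined(b, updates)
print(naive_result, combined_result, a.atomic_ops, b.atomic_ops)
```

In this sketch the prefix sum is what lets each original increment recover the value it would have observed, while the contended cell is touched only once.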

First claim

Opening claim text (preview).

The invention claimed is:

1. A system comprising: a storage device to store an application; a central processor to load the application from the storage device to a translation unit; a graphics driver including: the translation unit to detect in the application a plurality of atomic operations that are to include all atomic operations in the application to be performed sequentially on a same address in memory, wherein each of the plurality of atomic operations is to increment a value stored in the memory by an update amount, and to replace the plurality of atomic operations in the application with an equivalent single atomic operation to produce a translated application by computation of a localized partial prefix sum of the update amounts and replacement of the plurality of atomic operations in the application with an atomic operation to increment the value stored in the memory by an amount equal to a sum of the update amounts; and a compiler to compile the translated application; and a graphics processor to execute the translated application.

2. The system of claim 1, wherein the application includes shader code.

3. The system of claim 1, wherein the localized partial prefix sum of the update amounts is to be up to a SIMD execution engine length.

4. The system of claim 1, wherein the compiler includes a just-in-time compiler.

5. The system of claim 1, wherein the graphics processor includes a SIMD architecture.

6. A computer implemented method comprising: detecting in an application a plurality of atomic operations including all atomic operations in the application to be performed sequentially on a same address in memory, wherein each of the plurality of atomic operations increments a value stored in the memory by an update amount; replacing the plurality of atomic operations in the application with an equivalent single atomic operation to produce a translated application by computing a localized partial prefix sum of the update amounts and replacing the plurality of atomic operations in the application with an atomic operation to increment the value stored in the memory by an amount equal to a sum of the update amounts; and compiling the translated application.

7. The computer implemented method of claim 6, wherein the application includes shader code.

8. The computer implemented method of claim 6, wherein the localized partial prefix sum of the update amounts is up to a SIMD execution engine length.

9. The computer implemented method of claim 6, further including executing the translated application.

10. A non-transitory computer readable medium comprising a set of instructions which, if executed by a processor, cause a computer to: detect in an application a plurality of atomic operations that are to include all atomic operations in the application to be performed sequentially on a same address in memory, wherein each of the plurality of atomic operations is to increment a value stored in the memory by an update amount; replace the plurality of atomic operations in the application with an equivalent single atomic operation to produce a translated application by computation of a localized partial prefix sum of the update amounts and replacement of the plurality of atomic operations in the application with an atomic operation to increment the value stored in the memory by an amount equal to a sum of the update amounts; and compile the translated application.

11. The computer readable medium of claim 10, wherein the application includes shader code.

12. The computer readable medium of claim 10, wherein the localized partial prefix sum of the update amounts is to be up to a SIMD execution engine length.

13. The computer readable medium of claim 10, wherein the instructions, if executed, cause a computer to compute partial prefix sums of a plurality of update amounts outside a critical section.

14. The computer readable medium of claim 10, wherein the instructions, if executed, cause a computer to execute the translated application.

15. A system comprising: a storage device to store an application; a translation unit to: detect in the application a plurality of atomic operations that are to include all atomic operations in the application to be performed sequentially on a same address in memory, wherein each of the plurality of atomic operations is to increment a value stored in the memory by an update amount; and replace the plurality of atomic operations in the application with an equivalent single atomic operation to produce a translated application by computation of a localized partial prefix sum of the update amounts and replacement of the plurality of atomic operations in the application with an atomic operation to increment the value stored in the memory by an amount equal to a sum of the update amounts; and a compiler to compile the translated application.

16. The system of claim 15, wherein the application includes shader code.

17. The system of claim 15, wherein the localized partial prefix sum of the update amounts is to be up to a SIMD execution engine length.

18. The system of claim 15, wherein the compiler comprises a just-in-time compiler.

19. The system of claim 15, further comprising a graphics processor to execute the translated application.

20. The system of claim 15, wherein the translation unit is located in a graphics driver.
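The detect-and-replace step recited in the independent claims can be pictured as a pass over an instruction list. The following is only a toy sketch under stated assumptions: the `Instr` tuple, the op names (`atomic_add`, `mul`, `load`), and `merge_atomic_adds` are all hypothetical, update amounts are assumed to be constants, and an address is rewritten only when every instruction touching it is an atomic increment. A real translation unit working on shader code would need a far more careful analysis.

```python
from collections import namedtuple

# Hypothetical toy instruction: op name, target address (or None), integer operand.
Instr = namedtuple("Instr", ["op", "addr", "amount"])


def merge_atomic_adds(program):
    """Fold all atomic_add instructions on one address into a single atomic_add.

    An address is rewritten only if every instruction that touches it is an
    atomic_add; otherwise its instructions are left unchanged.
    """
    totals, mergeable = {}, {}
    for ins in program:
        if ins.addr is None:
            continue
        is_add = ins.op == "atomic_add"
        mergeable[ins.addr] = mergeable.get(ins.addr, True) and is_add
        if is_add:
            totals[ins.addr] = totals.get(ins.addr, 0) + ins.amount

    emitted, out = set(), []
    for ins in program:
        if ins.op == "atomic_add" and mergeable.get(ins.addr):
            if ins.addr not in emitted:
                emitted.add(ins.addr)
                # Single increment by the sum of all update amounts.
                out.append(Instr("atomic_add", ins.addr, totals[ins.addr]))
            continue  # later increments were folded into the first one
        out.append(ins)
    return out


program = [
    Instr("atomic_add", 0x10, 2),
    Instr("mul", None, 0),
    Instr("atomic_add", 0x10, 5),
    Instr("atomic_add", 0x20, 1),
    Instr("load", 0x20, 0),       # a non-atomic access keeps 0x20 untouched
    Instr("atomic_add", 0x10, 1),
]
translated = merge_atomic_adds(program)
for ins in translated:
    print(ins)
```

Here the three increments on address `0x10` collapse into one increment by 8, while address `0x20` is left alone because a plain load also touches it.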

Assignees

Inventors

Classifications

  • Mutual exclusion algorithms · CPC title

  • G06T1/20 · Primary

    Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • based on arbitration (arbitration in handling access to a common bus or bus system G06F13/36) · CPC title

  • Memory management · CPC title

  • G06F9/4552 · Primary

    Involving translation to a different instruction set architecture, e.g. just-in-time translation in a JVM · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9449360B2 cover?
Methods and apparatuses to reduce the number of sequential operations such as atomic operations in an application to be performed on a shared memory cell may be provided. A translation unit can detect in the application multiple atomic operations to be performed on the same memory and replace the multiple atomic operations with an equivalent single atomic operation. In some implementations, th…
Who is the assignee on this patent?
Janczak Tomasz, Targowski Marek, Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 20, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).