US9449360B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9449360-B2 |
| Application number | US-201113997056-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 28, 2011 |
| Priority date | Dec 28, 2011 |
| Publication date | Sep 20, 2016 |
| Grant date | Sep 20, 2016 |
Official abstract text for this publication.
Methods and apparatuses may be provided to reduce the number of sequential operations, such as atomic operations, that an application performs on a shared memory cell. A translation unit can detect in the application multiple atomic operations to be performed on the same memory and replace them with an equivalent single atomic operation. In some implementations, the application includes shader code. In some implementations, each of the multiple atomic operations increments a value stored at the same memory by an update amount. The translation unit may calculate the partial prefix sum over all the atomic operations and replace the multiple atomic operations with a single atomic operation that increments the value stored at the memory by the sum of the update amounts.
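The replacement described in the abstract can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not the patented translation unit: the `AtomicAdd` record, the `coalesce_atomic_adds` pass, and the `run` interpreter are hypothetical names invented here, and the "application" is modeled as a flat list of operations rather than real shader code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicAdd:
    """Hypothetical IR node: atomically add `amount` to the cell at `address`."""
    address: str
    amount: int


def coalesce_atomic_adds(ops):
    """Merge each run of consecutive AtomicAdd ops on one address into a single op."""
    out = []
    for op in ops:
        prev = out[-1] if out else None
        if (isinstance(op, AtomicAdd) and isinstance(prev, AtomicAdd)
                and prev.address == op.address):
            # Same cell as the previous atomic: fold the update amounts together.
            out[-1] = AtomicAdd(op.address, prev.amount + op.amount)
        else:
            out.append(op)
    return out


def run(ops, memory):
    """Toy interpreter: apply the ops to `memory` and count the atomics issued."""
    issued = 0
    for op in ops:
        if isinstance(op, AtomicAdd):
            memory[op.address] = memory.get(op.address, 0) + op.amount
            issued += 1
    return issued


if __name__ == "__main__":
    program = [AtomicAdd("counter", 1), AtomicAdd("counter", 2), AtomicAdd("counter", 3)]
    translated = coalesce_atomic_adds(program)
    before, after = {}, {}
    print(run(program, before), before)    # atomics issued and final memory, original
    print(run(translated, after), after)   # atomics issued and final memory, translated
```

Both versions leave the same value in memory; the translated list simply issues fewer atomic operations, which is the saving the abstract describes.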
Opening claim text (preview).
The invention claimed is:
1. A system comprising: a storage device to store an application; a central processor to load the application from the storage device to a translation unit; a graphics driver including: the translation unit to detect in the application a plurality of atomic operations that are to include all atomic operations in the application to be performed sequentially on a same address in memory, wherein each of the plurality of atomic operations is to increment a value stored in the memory by an update amount, and to replace the plurality of atomic operations in the application with an equivalent single atomic operation to produce a translated application by computation of a localized partial prefix sum of the update amounts and replacement of the plurality of atomic operations in the application with an atomic operation to increment the value stored in the memory by an amount equal to a sum of the update amounts; and a compiler to compile the translated application; and a graphics processor to execute the translated application.
2. The system of claim 1, wherein the application includes shader code.
3. The system of claim 1, wherein the localized partial prefix sum of the update amounts is to be up to a SIMD execution engine length.
4. The system of claim 1, wherein the compiler includes a just-in-time compiler.
5. The system of claim 1, wherein the graphics processor includes a SIMD architecture.
6. A computer implemented method comprising: detecting in an application a plurality of atomic operations including all atomic operations in the application to be performed sequentially on a same address in memory, wherein each of the plurality of atomic operations increments a value stored in the memory by an update amount; replacing the plurality of atomic operations in the application with an equivalent single atomic operation to produce a translated application by computing a localized partial prefix sum of the update amounts and replacing the plurality of atomic operations in the application with an atomic operation to increment the value stored in the memory by an amount equal to a sum of the update amounts; and compiling the translated application.
7. The computer implemented method of claim 6, wherein the application includes shader code.
8. The computer implemented method of claim 6, wherein the localized partial prefix sum of the update amounts is up to a SIMD execution engine length.
9. The computer implemented method of claim 6, further including executing the translated application.
10. A non-transitory computer readable medium comprising a set of instructions which, if executed by a processor, cause a computer to: detect in an application a plurality of atomic operations that are to include all atomic operations in the application to be performed sequentially on a same address in memory, wherein each of the plurality of atomic operations is to increment a value stored in the memory by an update amount; replace the plurality of atomic operations in the application with an equivalent single atomic operation to produce a translated application by computation of a localized partial prefix sum of the update amounts and replacement of the plurality of atomic operations in the application with an atomic operation to increment the value stored in the memory by an amount equal to a sum of the update amounts; and compile the translated application.
11. The computer readable medium of claim 10, wherein the application includes shader code.
12. The computer readable medium of claim 10, wherein the localized partial prefix sum of the update amounts is to be up to a SIMD execution engine length.
13. The computer readable medium of claim 10, wherein the instructions, if executed, cause a computer to compute partial prefix sums of a plurality of update amounts outside a critical section.
14. The computer readable medium of claim 10, wherein the instructions, if executed, cause a computer to execute the translated application.
15. A system comprising: a storage device to store an application; a translation unit to: detect in the application a plurality of atomic operations that are to include all atomic operations in the application to be performed sequentially on a same address in memory, wherein each of the plurality of atomic operations is to increment a value stored in the memory by an update amount; and replace the plurality of atomic operations in the application with an equivalent single atomic operation to produce a translated application by computation of a localized partial prefix sum of the update amounts and replacement of the plurality of atomic operations in the application with an atomic operation to increment the value stored in the memory by an amount equal to a sum of the update amounts; and a compiler to compile the translated application.
16. The system of claim 15, wherein the application includes shader code.
17. The system of claim 15, wherein the localized partial prefix sum of the update amounts is to be up to a SIMD execution engine length.
18. The system of claim 15, wherein the compiler comprises a just-in-time compiler.
19. The system of claim 15, further comprising a graphics processor to execute the translated application.
20. The system of claim 15, wherein the translation unit is located in a graphics driver.
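The claims tie the replacement to a "localized partial prefix sum" whose extent is "up to a SIMD execution engine length". One plausible reading, sketched below in Python, is that each SIMD-wide group of lanes computes a prefix sum of its update amounts locally (outside any critical section), issues one atomic add for the group total, and derives each lane's individual result from the returned base value. This interpretation, the function names, and the `simd_width` parameter are assumptions made for illustration; the claims do not spell out this procedure.

```python
from itertools import accumulate


def sequential_fetch_add(counter, amounts):
    """Reference behavior: one atomic fetch-and-add per lane, in lane order."""
    results = []
    for amount in amounts:
        results.append(counter)   # value each lane would observe before its add
        counter += amount
    return counter, results


def coalesced_fetch_add(counter, amounts, simd_width=8):
    """One atomic add per SIMD-wide chunk, using a local prefix sum per chunk."""
    results = []
    atomics_issued = 0
    for start in range(0, len(amounts), simd_width):
        chunk = amounts[start:start + simd_width]
        inclusive = list(accumulate(chunk))   # local partial prefix sum
        exclusive = [0] + inclusive[:-1]      # per-lane offset within the chunk
        base = counter                        # old value returned by the single atomic
        counter += inclusive[-1]              # single atomic add of the chunk total
        atomics_issued += 1
        results.extend(base + offset for offset in exclusive)
    return counter, results, atomics_issued


if __name__ == "__main__":
    amounts = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
    print(sequential_fetch_add(100, amounts))
    print(coalesced_fetch_add(100, amounts, simd_width=4))
```

In this model the per-lane results and the final counter match the sequential reference, while the number of atomic operations drops from one per lane to one per chunk.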
Mutual exclusion algorithms · CPC title
Processor architectures; Processor configuration, e.g. pipelining · CPC title
based on arbitration (arbitration in handling access to a common bus or bus system G06F13/36) · CPC title
Memory management · CPC title
Involving translation to a different instruction set architecture, e.g. just-in-time translation in a JVM · CPC title