Epoch-based determination of completion of barrier termination command
US-2021026568-A1 · Jan 28, 2021 · US
US12340259B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12340259-B2 |
| Application number | US-202117380424-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 20, 2021 |
| Priority date | Jul 20, 2021 |
| Publication date | Jun 24, 2025 |
| Grant date | Jun 24, 2025 |
Abstract
Various embodiments include a parallel processing computer system that provides multiple memory synchronization domains in a single parallel processor to reduce unneeded synchronization operations. During execution, one execution kernel may synchronize with one or more other execution kernels by processing outstanding memory references. The parallel processor tracks memory references for each domain to each portion of local and remote memory. During synchronization, the processor synchronizes the memory references for a specific domain while refraining from synchronizing memory references for other domains. As a result, synchronization operations between kernels complete in a reduced amount of time relative to prior approaches.
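The domain-scoped barrier described in the abstract and in claim 1 can be illustrated with a minimal Python sketch. It assumes a processor with four domains and three apertures named `local_vidmem`, `peer_gpu`, and `system_memory`. These names, the `DomainTracker` class, and the `BarrierCommand` tuple are hypothetical and do not come from the patent. The tracker records which apertures each domain has accessed. A barrier issued from one domain then flushes only those apertures and skips apertures touched solely by other domains.

```python
from typing import NamedTuple

# Hypothetical aperture names; the patent only distinguishes apertures
# internal to the processor from apertures external to it.
APERTURES = ("local_vidmem", "peer_gpu", "system_memory")


class BarrierCommand(NamedTuple):
    domain: int
    flush: frozenset    # apertures whose outstanding references must be flushed
    skipped: frozenset  # apertures touched only by other domains


class DomainTracker:
    """Hypothetical per-domain record of which apertures have been accessed."""

    def __init__(self, num_domains: int = 4):
        # Four domains, matching the domain count recited in claim 11.
        self.touched = {d: set() for d in range(num_domains)}

    def record_access(self, domain: int, aperture: str) -> None:
        if aperture not in APERTURES:
            raise ValueError(f"unknown aperture: {aperture}")
        self.touched[domain].add(aperture)

    def barrier_command(self, domain: int) -> BarrierCommand:
        mine = self.touched[domain]
        # Union of apertures accessed by every other domain.
        others = set().union(*(s for d, s in self.touched.items() if d != domain))
        return BarrierCommand(domain, frozenset(mine), frozenset(others - mine))


tracker = DomainTracker()
tracker.record_access(0, "local_vidmem")
tracker.record_access(1, "peer_gpu")
tracker.record_access(1, "system_memory")

cmd = tracker.barrier_command(0)
print(sorted(cmd.flush))    # ['local_vidmem']
print(sorted(cmd.skipped))  # ['peer_gpu', 'system_memory']
```

In this sketch, a domain-0 barrier never waits on the peer or system-memory traffic generated by domain 1. This skipping of unrelated traffic is the source of the reduced synchronization time that the abstract describes.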
Claims (preview)
What is claimed is:
1. A computer-implemented method for synchronizing threads executing in a processor, the method comprising: determining that a first thread has executed a first memory barrier instruction, wherein the first thread is included in a first set of threads associated with a first domain representing a first set of resources included in the processor that synchronize with one another; determining a set of memory apertures associated with the first domain that have been accessed by at least one thread included in the first set of threads; generating a memory barrier command that: specifies references to the set of memory apertures associated with the first domain to flush in response to the memory barrier command, and excludes references to other memory apertures that are excluded from being flushed in response to the memory barrier command, wherein the other memory apertures are accessed by threads associated with a second domain representing a second set of resources included in the processor that synchronize with one another, and wherein the second set of resources is different from the first set of resources; and transmitting the memory barrier command to the set of memory apertures.
2. The computer-implemented method of claim 1, wherein the set of memory apertures includes a first memory aperture that is internal to the processor and excludes a second memory aperture that is external to the processor.
3. The computer-implemented method of claim 1, wherein the set of memory apertures includes a first memory aperture that is internal to the processor and a second memory aperture that is external to the processor.
4. The computer-implemented method of claim 1, wherein the set of memory apertures includes a first memory aperture that is external to the processor and a second memory aperture that is external to the processor and is exclusive of the first memory aperture.
5. The computer-implemented method of claim 1, wherein: the set of memory apertures includes a first subset of memory apertures that includes at least one of a first memory aperture that is internal to the processor or a second memory aperture that is external to the processor, the set of memory apertures further includes a second subset of memory apertures that includes at least one of a third memory aperture that is internal to the processor or a fourth memory aperture that is external to the processor, and the first subset of memory apertures is not identical to the second subset of memory apertures.
6. The computer-implemented method of claim 1, wherein the first memory barrier instruction includes a domain identifier associated with the first domain.
7. The computer-implemented method of claim 1, further comprising: combining the first memory barrier instruction with a second memory barrier instruction executed by a second thread included in a second set of threads associated with a third domain, wherein the set of memory apertures further includes memory apertures that have been accessed by at least one thread included in the second set of threads.
8. The computer-implemented method of claim 7, wherein the first memory barrier instruction includes a first domain identifier associated with the first domain, and the second memory barrier instruction includes a second domain identifier associated with the second domain.
9. The computer-implemented method of claim 1, further comprising determining that each memory reference associated with the set of memory apertures has reached a serialization point.
10. The computer-implemented method of claim 1, wherein the first domain is identified by task metadata associated with the first thread.
11. The computer-implemented method of claim 1, wherein the first domain and the second domain are included in a set of domains comprising four domains.
12. The computer-implemented method of claim 1, wherein a first memory aperture included in the set of memory apertures includes a synchronization point, and further comprising: determining that a memory reference executed by the first thread has reached the synchronization point; and indicating that memory barrier command has completed.
13. One or more non-transitory computer-readable media storing program instructions that, when executed by one or more processors, cause the one or more processors to perform steps of: determining that a first thread has executed a first memory barrier instruction, wherein the first thread is included in a first set of threads associated with a first domain representing a first set of resources included in a first processor of the one or more processors that synchronize with one another; determining a set of memory apertures associated with the first domain that have been accessed by at least one thread included in the first set of threads; generating a memory barrier command that: specifies references to the set of memory apertures associated with the first domain to flush in response to the memory barrier command, and excludes references to other memory apertures that are excluded from being flushed in response to the memory barrier command, wherein the other memory apertures are accessed by threads associated with a second domain representing a second set of resources included in the first processor that synchronize with one another, and wherein the second set of resources is different from the first set of resources; and transmitting the memory barrier command to the set of memory apertures.
14. The one or more non-transitory computer-readable media of claim 13, wherein the set of memory apertures includes a first memory aperture that is internal to the first processor and excludes a second memory aperture that is external to the first processor.
15. The one or more non-transitory computer-readable media of claim 13, wherein the set of memory apertures includes a first memory aperture that is internal to the first processor and a second memory aperture that is external to the first processor.
16. The one or more non-transitory computer-readable media of claim 13, wherein the first memory barrier instruction includes a domain identifier associated with the first domain.
17. The one or more non-transitory computer-readable media of claim 13, wherein the steps further comprise: combining the first memory barrier instruction with a second memory barrier instruction executed by a second thread included in a second set of threads associated with a third domain, wherein the set of memory apertures further includes memory apertures that have been accessed by at least one thread included in the second set of threads.
18. The one or more non-transitory computer-readable media of claim 17, wherein the first memory barrier instruction includes a first domain identifier associated with the first domain, and the second memory barrier instruction includes a second domain identifier associated with the second domain.
19. The one or more non-transitory computer-readable media of claim 13, wherein the steps further comprise determining that each memory reference associated with the set of memory apertures has reached a serialization point.
20. A system, comprising: a memory storing instructions; and a processor that is coupled to the memory and, when executing the instructions: determines that a first thread has executed a first memory barrier instruction, wherein the first thread is included in a first set of threads associated with a first domain representing a first set of resources includ…
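Claims 7 and 9 add two refinements: barrier instructions from different domains may be combined into one command, and that command is complete once every memory reference in the targeted apertures has reached a serialization point. The following Python sketch models this under simplifying assumptions. Each aperture keeps a set of pending reference IDs, and `serialize()` stands in for a reference reaching the serialization point. The `Aperture` class and the `combine` and `barrier_complete` helpers are hypothetical illustrations, not the patent's hardware mechanism.

```python
class Aperture:
    """Hypothetical memory aperture with references still in flight."""

    def __init__(self, name: str):
        self.name = name
        self.pending = set()  # reference IDs not yet at the serialization point

    def issue(self, ref_id: int) -> None:
        self.pending.add(ref_id)

    def serialize(self, ref_id: int) -> None:
        # The reference has reached this aperture's serialization point.
        self.pending.discard(ref_id)


def combine(*flush_sets) -> frozenset:
    """Merge the aperture sets of several barrier instructions (cf. claim 7)."""
    return frozenset().union(*flush_sets)


def barrier_complete(apertures: dict, flush_set) -> bool:
    """True once no reference is pending in any flushed aperture (cf. claim 9)."""
    return all(not apertures[name].pending for name in flush_set)


aps = {n: Aperture(n) for n in ("local_vidmem", "peer_gpu", "system_memory")}
aps["local_vidmem"].issue(1)   # from a thread in the first domain
aps["peer_gpu"].issue(2)       # from a thread in another domain joining the barrier
aps["system_memory"].issue(3)  # from a domain not covered by this barrier

flush = combine({"local_vidmem"}, {"peer_gpu"})
print(barrier_complete(aps, flush))  # False
aps["local_vidmem"].serialize(1)
aps["peer_gpu"].serialize(2)
print(barrier_complete(aps, flush))  # True; reference 3 is still pending but not waited on
```

Because completion is checked per aperture, this sketch waits on every pending reference in a flushed aperture, including references from other domains that share it. Whether real hardware filters further by domain within an aperture is not stated in the claims shown here.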
CPC classification titles:
- from multiple instruction streams, e.g. multistreaming
- controlled by a single instruction for multiple threads [SIMT] in parallel
- Synchronisation or serialisation instructions
- Barrier synchronisation