US12223353B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12223353-B2 |
| Application number | US-202318481489-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 5, 2023 |
| Priority date | Mar 15, 2019 |
| Publication date | Feb 11, 2025 |
| Grant date | Feb 11, 2025 |
Official abstract text for this publication.
Apparatuses to synchronize lanes that diverge or threads that drift are disclosed. In one embodiment, a graphics multiprocessor includes a queue having an initial state of groups with a first group having threads of first and second instruction types and a second group having threads of the first and second instruction types. A regroup engine (or regroup circuitry) regroups threads into a third group having threads of the first instruction type and a fourth group having threads of the second instruction type.
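The regrouping described in the abstract can be illustrated with a minimal software sketch. This is not the patented circuitry: the `Thread` class, the `regroup_by_instruction_type` function, and the instruction-type strings are hypothetical names chosen for illustration, and the sketch does not model the order in which regrouped groups are reinserted into the queue.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Thread:
    """Hypothetical stand-in for a hardware thread awaiting one instruction."""
    tid: int
    instr_type: str  # e.g. "fma" or "load_store" (illustrative labels)


def regroup_by_instruction_type(groups):
    """Flatten mixed groups and rebuild one group per instruction type.

    Groups come out in order of first appearance of each instruction type;
    a real design would pick the insertion order by its own policy.
    """
    buckets = defaultdict(list)
    for group in groups:
        for thread in group:
            buckets[thread.instr_type].append(thread)
    return list(buckets.values())


# Initial state: two groups, each mixing both instruction types.
first_group = [Thread(0, "fma"), Thread(1, "load_store")]
second_group = [Thread(2, "fma"), Thread(3, "load_store")]

# Regrouped state: one group per instruction type.
third_group, fourth_group = regroup_by_instruction_type([first_group, second_group])
print([t.tid for t in third_group])   # threads issuing "fma"
print([t.tid for t in fourth_group])  # threads issuing "load_store"
```

After regrouping, every thread in a group issues the same instruction type, which is the condition the abstract associates with reduced divergence.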
Opening claim text (preview).
What is claimed is:

1. A graphics multiprocessor, comprising: a queue having an initial state of groups with a first group having threads of first and second instruction types and a second group having threads of the first and second instruction types; and a regroup circuitry to regroup threads from the initial state of groups of first and second groups into a regrouped state of groups including a third group having threads of the first instruction type and a fourth group having threads of the second instruction type based on an instruction type and to determine an order of inserting the third group and the fourth group into the queue to minimize divergence between threads.

2. The graphics multiprocessor of claim 1, wherein the regroup circuitry to select one or more threads from one or more groups that are set to execute an instruction and combine the one or more threads into a single group.

3. The graphics multiprocessor of claim 1, wherein each of the first instruction type and the second instruction type comprise one of a load/store instruction, an integer instruction, a floating point instruction, an integer mac instruction, an integer add instruction, a floating point add instruction, a floating point fused multiply-add (fma) instruction, a floating point sine instruction, or a floating point cosine instruction.

4. The graphics multiprocessor of claim 1, further comprising: a thread scheduler coupled to the queue; and a plurality of processing resources coupled to the thread scheduler.

5. The graphics multiprocessor of claim 4, wherein the thread scheduler is configured to schedule the first instruction type of the third group for execution on a first processing resource with full utilization of this first processing resource.

6. The graphics multiprocessor of claim 4, wherein the thread scheduler is configured to schedule the second instruction type of the fourth group for execution on a second processing resource with full utilization of this second processing resource.

7. The graphics multiprocessor of claim 1, wherein the regroup circuitry utilizes regrouping policies and an order that a new regrouped group is inserted in the queue is optimized depending on latencies.

8. A graphics processor, comprising: one or more processing resources to process groupings of threads; and thread control circuitry coupled to the one or more processing resources, the thread control circuitry is configured to determine groupings of instantiated threads, to determine progress of the threads for executing a task on the one or more processing resources, and to determine drift between threads.

9. The graphics processor of claim 8, wherein the thread control circuitry is further configured to determine whether a drift between threads exceeds a threshold drift.

10. The graphics processor of claim 8, wherein the thread control circuitry is further configured to accelerate at least one thread that lags other threads by at least the threshold drift.

11. The graphics processor of claim 8, wherein the thread control circuitry accelerates the at least one thread by applying a higher priority to this at least one thread than other threads.

12. The graphics processor of claim 8, wherein the processing resources to process threads for a single instruction multiple data (SIMD) execution model.

13. A method for scheduling optimization of threads of a graphics processing unit, a graphics multiprocessor, or a graphic processor, the method comprising: starting processing of groupings of threads on one or more processing resources; monitoring, with a thread control circuitry of the graphics processing unit, the graphics multiprocessor, or the graphic processor, progress of each thread for a group; and determining, with the thread control circuitry, drift between threads.

14. The method of claim 13, further comprising: determining, with the graphics processing unit, the graphics multiprocessor, or the graphic processor, the groupings of threads.

15. The method of claim 13, further comprising: determining, with the thread control circuitry, whether a drift between threads exceeds a threshold drift.

16. The method of claim 13, further comprising: rescheduling, with the thread control circuitry, at least one thread that lags other threads.

17. The method of claim 13, wherein the thread control circuitry provides a higher priority level for the at least one thread that lags other threads to reschedule the at least one thread.
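The drift monitoring in claims 8 to 17 can be sketched in software as follows. The claims do not define how drift is measured, so this sketch assumes drift is the gap between a thread's progress counter and that of the furthest-ahead thread; `find_lagging_threads`, `assign_priorities`, and the progress values are hypothetical and only illustrate the threshold test and the priority boost.

```python
def find_lagging_threads(progress, threshold):
    """Return ids of threads that trail the leading thread by at least `threshold`.

    `progress` maps thread id -> progress counter (assumed: instructions retired).
    """
    if not progress:
        return []
    lead = max(progress.values())
    return sorted(tid for tid, done in progress.items() if lead - done >= threshold)


def assign_priorities(progress, threshold, base=0, boosted=1):
    """Give lagging threads a higher priority level than the rest."""
    lagging = set(find_lagging_threads(progress, threshold))
    return {tid: (boosted if tid in lagging else base) for tid in progress}


# Hypothetical progress counters for four threads in one group.
progress = {0: 100, 1: 98, 2: 60, 3: 100}
lagging = find_lagging_threads(progress, threshold=20)
priorities = assign_priorities(progress, threshold=20)
print(lagging)
print(priorities)
```

Here only thread 2 trails the leader by at least the threshold, so it alone receives the boosted priority; a scheduler could then favor it until the drift falls back under the threshold.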
from multiple instruction streams, e.g. multistreaming · CPC title
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title
Concurrent instruction execution, e.g. pipeline or look ahead · CPC title
Processor architectures; Processor configuration, e.g. pipelining · CPC title
single instruction multiple data [SIMD] multiprocessors · CPC title