User-configurable memory allocation
US-2023102843-A1 · Mar 30, 2023 · US
US11954758B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11954758-B2 |
| Application number | US-202217652478-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 24, 2022 |
| Priority date | Feb 24, 2022 |
| Publication date | Apr 9, 2024 |
| Grant date | Apr 9, 2024 |
Official abstract text for this publication.
This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for dynamic wave pairing. A graphics processor may allocate one or more GPU workloads to one or more wave slots of a plurality of wave slots. The graphics processor may select a first execution slot of a plurality of execution slots for executing the one or more GPU workloads. The selection may be based on one of a plurality of granularities. The graphics processor may execute, at the selected first execution slot, the one or more GPU workloads at the one of the plurality of granularities.
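The four steps in the abstract (allocate, select a granularity, select an execution slot, execute) can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation. The granularity values, the `max_granularity` field, the arrival-order allocation, and the "first idle slot" policy are all hypothetical. The only rule taken from the claims is that workloads associated with a texture block or a branching block get the smaller granularity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical granularities (fibers per issue); the numeric values are assumptions.
FINE = 64     # "first granularity": the smaller one
COARSE = 128  # "second granularity": the larger one


@dataclass
class Workload:
    wave_id: int
    touches_texture_or_branch: bool  # associated with a texture or branching block


@dataclass
class ExecutionSlot:
    slot_id: int
    max_granularity: int  # assumed capability of the slot, not from the source
    busy: bool = False


def allocate(workloads: List[Workload], num_wave_slots: int) -> Dict[int, Workload]:
    """Assign workloads to wave slots in arrival order (assumed policy)."""
    if len(workloads) > num_wave_slots:
        raise ValueError("not enough wave slots")
    return {slot: w for slot, w in enumerate(workloads)}


def select_granularity(workloads: List[Workload]) -> int:
    """Pick the smaller granularity if any workload touches a texture/branching block."""
    if any(w.touches_texture_or_branch for w in workloads):
        return FINE
    return COARSE


def select_execution_slot(slots: List[ExecutionSlot], granularity: int) -> Optional[ExecutionSlot]:
    """Return the first idle slot able to run at the requested granularity (assumed policy)."""
    for s in slots:
        if not s.busy and s.max_granularity >= granularity:
            return s
    return None


def dispatch(workloads: List[Workload], slots: List[ExecutionSlot],
             num_wave_slots: int = 4) -> Optional[Tuple[int, int, List[int]]]:
    """Run the allocate / select / select / execute sequence once."""
    wave_slots = allocate(workloads, num_wave_slots)
    granularity = select_granularity(workloads)
    slot = select_execution_slot(slots, granularity)
    if slot is None:
        return None  # no suitable execution slot is idle
    slot.busy = True  # stands in for executing the workloads at this slot
    return slot.slot_id, granularity, sorted(wave_slots)
```

In this sketch, two workloads with no texture or branching association are dispatched together at the coarse granularity, while a single texture-bound workload falls back to the fine granularity. A real scheduler would also release slots on completion, which is omitted here.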
Opening claim text (preview).
What is claimed is:
1. An apparatus for graphics processing, comprising: a memory; and at least one processor coupled to the memory and configured to: allocate one or more graphics processing unit (GPU) workloads to one or more wave slots of a plurality of wave slots; select one of a plurality of granularities based on whether the one or more GPU workloads are associated with or are unassociated with a texture block or a branching block; select a first execution slot of a plurality of execution slots for executing the one or more GPU workloads, the selection of the first execution slot being based on the one of the plurality of granularities; and execute, at the selected first execution slot, the one or more GPU workloads at the one of the plurality of granularities.
2. The apparatus of claim 1, wherein the plurality of wave slots comprise a plurality of first wave slots and a plurality of second wave slots, the one or more GPU workloads are allocated based on the plurality of first wave slots and the plurality of second wave slots, and a first GPU workload of the one or more GPU workloads is allocated to a first wave slot of the plurality of first wave slots and a second GPU workload of the one or more GPU workloads is allocated to a second wave slot of the plurality of second wave slots.
3. The apparatus of claim 2, wherein the first GPU workload and the second GPU workload are consecutive in sequence.
4. The apparatus of claim 2, wherein the first GPU workload and the second GPU workload are associated with a same instruction.
5. The apparatus of claim 1, the at least one processor being further configured to: receive, from at least one GPU component at runtime, the one or more GPU workloads.
6. The apparatus of claim 1, wherein a first granularity is selected as the one of the plurality of granularities when the one or more GPU workloads are associated with the texture block or the branching block.
7. The apparatus of claim 6, wherein a second granularity is selected as the one of the plurality of granularities when the one or more GPU workloads are unassociated with the texture block or the branching block, the first granularity being smaller than the second granularity.
8. The apparatus of claim 1, the at least one processor being further configured to: copy a context register for the one or more GPU workloads to the first execution slot, wherein the context register for the one or more GPU workloads is associated with a context state of the one or more GPU workloads.
9. The apparatus of claim 1, the at least one processor being further configured to: copy a context register for the one or more GPU workloads to the one or more wave slots.
10. The apparatus of claim 1, wherein each GPU workload of the one or more GPU workloads includes a uniform number of fibers.
11. The apparatus of claim 1, wherein each wave slot of the plurality of wave slots is associated with one set of general purpose registers (GPRs) of a plurality of GPRs.
12. The apparatus of claim 1, wherein the apparatus is a wireless communication device.
13. A method of graphics processing, comprising: allocating one or more graphics processing unit (GPU) workloads to one or more wave slots of a plurality of wave slots; selecting one of a plurality of granularities based on whether the one or more GPU workloads are associated with or are unassociated with a texture block or a branching block; selecting a first execution slot of a plurality of execution slots for executing the one or more GPU workloads, the selection of the first execution slot being based on the one of the plurality of granularities; and executing, at the selected first execution slot, the one or more GPU workloads at the one of the plurality of granularities.
14. The method of claim 13, wherein the plurality of wave slots comprise a plurality of first wave slots and a plurality of second wave slots, the one or more GPU workloads are allocated based on the plurality of first wave slots and the plurality of second wave slots, and a first GPU workload of the one or more GPU workloads is allocated to a first wave slot of the plurality of first wave slots and a second GPU workload of the one or more GPU workloads is allocated to a second wave slot of the plurality of second wave slots.
15. The method of claim 14, wherein the first GPU workload and the second GPU workload are consecutive in sequence.
16. The method of claim 14, wherein the first GPU workload and the second GPU workload are associated with a same instruction.
17. The method of claim 13, further comprising: receiving, from at least one GPU component, the one or more GPU workloads.
18. The method of claim 13, wherein a first granularity is selected as the one of the plurality of granularities when the one or more GPU workloads are associated with the texture block or the branching block.
19. The method of claim 18, wherein a second granularity is selected as the one of the plurality of granularities when the one or more GPU workloads are unassociated with the texture block or the branching block, the first granularity being smaller than the second granularity.
20. The method of claim 13, further comprising: copying a context register for the one or more GPU workloads to the first execution slot, wherein the context register for the one or more GPU workloads is associated with a context state of the one or more GPU workloads.
21. The method of claim 13, further comprising: copying a context register for the one or more GPU workloads to the one or more wave slots.
22. The method of claim 13, wherein each GPU workload of the one or more GPU workloads includes a uniform number of fibers.
23. The method of claim 13, wherein each wave slot of the plurality of wave slots is associated with one set of general purpose registers (GPRs) of a plurality of GPRs.
24. A non-transitory computer-readable medium storing computer executable code, the code when executed by at least one processor, causes the at least one processor to: allocate one or more graphics processing unit (GPU) workloads to one or more wave slots of a plurality of wave slots; select one of a plurality of granularities based on whether the one or more GPU workloads are associated with or are unassociated with a texture block or a branching block; select a first execution slot of a plurality of execution slots for executing the one or more GPU workloads, the selection of the first execution slot being based on the one of the plurality of granularities; and execute, at the selected first execution slot, the one or more GPU workloads at the one of the plurality of granularities.
25. The non-transitory computer-readable medium of claim 24, wherein the plurality of wave slots comprise a plurality of first wave slots and a plurality of second wave slots, the one or more GPU workloads are allocated based on the plurality of first wave slots and the plurality of second wave slots, and a first GPU workload of the one or more GPU workloads is allocated to a first wave slot of the plurality of first wave slots and a second GPU workload of the one or more GPU workloads is allocated to a second wave slot of the plurality of second wave slots.
26. The non-transitory computer-readable medium of claim 25, wherein the first GPU workload and the second GPU workload are consecutive in sequence.
27. The
General purpose rendering architectures · CPC title
involving image processing hardware · CPC title
Register windows · CPC title
Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution · CPC title
considering the load · CPC title