Sparse convolutional neural network accelerator
US-10891538-B2 · Jan 12, 2021 · US
US12229867B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12229867-B2 |
| Application number | US-202318310015-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 1, 2023 |
| Priority date | Aug 10, 2018 |
| Publication date | Feb 18, 2025 |
| Grant date | Feb 18, 2025 |
Official abstract text for this publication.
One embodiment provides a graphics processor comprising a block of execution resources, a cache memory, a cache memory prefetcher, and circuitry including a programmable neural network unit, the programmable neural network unit comprising a network hardware block including circuitry to perform neural network operations and activation operations for a layer of a neural network, the programmable neural network unit addressable by cores within the block of graphics cores and the neural network hardware block configured to perform operations associated with a neural network configured to determine a prefetch pattern for the cache memory prefetcher.
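The abstract's idea of a neural network that determines a prefetch pattern can be illustrated with a small software sketch. Everything below is an assumption made for illustration and is not taken from the patent: the stride-histogram features, the single linear scoring layer standing in for the neural network, the hard-coded "learned" weights, and the names `Prefetcher`, `recognize_workload`, and `configure_prefetcher`. It shows only the general flow of recognizing a workload from its memory access pattern and switching the prefetcher to one of several learned patterns.

```python
from dataclasses import dataclass


@dataclass
class Prefetcher:
    """Hypothetical prefetcher; its pattern is a tuple of address offsets."""
    pattern: tuple = (1,)  # assumed default: fetch the next line only

    def prefetch(self, addr):
        # Return the addresses that would be prefetched after an access to addr.
        return [addr + off for off in self.pattern]


def stride_histogram(addresses, max_stride=4):
    """Assumed feature vector: share of strides 1..max_stride between accesses."""
    counts = [0] * max_stride
    for prev, cur in zip(addresses, addresses[1:]):
        stride = cur - prev
        if 1 <= stride <= max_stride:
            counts[stride - 1] += 1
    total = max(1, len(addresses) - 1)
    return [c / total for c in counts]


# Hypothetical "learned" weights: one row per known workload.
# A real system would obtain these by training; here they are hard-coded.
WEIGHTS = {
    "streaming": [1.0, 0.0, 0.0, 0.0],
    "strided4": [0.0, 0.0, 0.0, 1.0],
}

# Hypothetical learned prefetch patterns, one per workload.
LEARNED_PATTERNS = {
    "streaming": (1, 2),
    "strided4": (4, 8),
}


def recognize_workload(addresses):
    """Score each known workload with one linear layer and pick the best."""
    features = stride_histogram(addresses)
    scores = {
        name: sum(w * f for w, f in zip(ws, features))
        for name, ws in WEIGHTS.items()
    }
    return max(scores, key=scores.get)


def configure_prefetcher(prefetcher, addresses):
    """Set the prefetcher to the learned pattern of the recognized workload."""
    workload = recognize_workload(addresses)
    prefetcher.pattern = LEARNED_PATTERNS[workload]
    return workload


if __name__ == "__main__":
    pf = Prefetcher()
    print(configure_prefetcher(pf, [0, 4, 8, 12, 16]), pf.prefetch(16))
```

In this sketch the recognition step and the configuration step are kept as separate functions, mirroring the two actions the claims attribute to the neural network hardware block.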
Opening claim text (preview).
What is claimed is:

1. A graphics processor comprising: a block of execution resources; a cache memory; a cache memory prefetcher having an adjustable prefetch pattern that is adjustable to a learned prefetch pattern, the learned prefetch pattern learned by a neural network; and circuitry including a programmable neural network unit, the programmable neural network unit comprising a network hardware block including circuitry to perform neural network operations and activation operations for a layer of the neural network, the programmable neural network unit addressable by cores within the block of graphics cores and the neural network hardware block configured to perform operations associated with a neural network configured to determine a prefetch pattern for the cache memory prefetcher, wherein the prefetch pattern determined for the cache memory prefetcher is the learned prefetch pattern, the cache memory prefetcher is to prefetch data according to the learned prefetch pattern, and the learned prefetch pattern is based at least in part on a memory access pattern associated with a workload executed via the block of execution resources, wherein the neural network hardware block is configured, via the neural network, to: recognize the workload executed via the block of execution resources as the workload associated with the learned prefetch pattern based at least in part on the memory access pattern associated with the workload; and configure the prefetch pattern for the cache memory prefetcher to the learned prefetch pattern for use with the workload, the learned prefetch pattern one of a plurality of learned prefetch patterns associated with a plurality of workloads.

2. The graphics processor of claim 1, wherein the cache memory includes multiple levels of cache memory and the cache memory prefetcher is configured to prefetch data into the multiple levels of cache memory.

3. The graphics processor of claim 2, wherein the cache memory is associated with a unified memory architecture including a local memory of the graphics processor and a system memory coupled with a host processor.

4. The graphics processor of claim 3, the memory access pattern associated with the workload is to the local memory of the graphics processor.

5. The graphics processor of claim 3, the memory access pattern associated with the workload is to the system memory coupled with the host processor.

6. The graphics processor as in claim 1, wherein the neural network hardware block includes a source data buffer, a neural network operations and activation operations block, and an output data buffer.

7. The graphics processor as in claim 6, wherein the neural network operations and activation operations block is programmably configurable.

8. The graphics processor as in claim 7, wherein the programmable neural network unit includes a block programming unit to configure layer state information for the neural network hardware block, the layer state information associated with one or more layers of a neural network to be processed by the programmable neural network unit.

9. The graphics processor as in claim 7, wherein the programmable neural network unit includes a weight cache to cache weights associated with one or more layers of a neural network to be processed by the programmable neural network unit.

10. The graphics processor as in claim 9, wherein the programmable neural network unit includes multiple neural network hardware blocks, each of the multiple neural network hardware blocks associated with respective layers of the neural network.

11. A graphics processing system comprising: a memory device; and a graphics processor coupled with the memory device, the graphics processor including: a block of execution resources; a cache memory; a cache memory prefetcher having an adjustable prefetch pattern that is adjustable to a learned prefetch pattern, the learned prefetch pattern learned by a neural network; and circuitry including a programmable neural network unit, the programmable neural network unit comprising a network hardware block including circuitry to perform neural network operations and activation operations for a layer of the neural network, the programmable neural network unit addressable by cores within the block of graphics cores and the neural network hardware block configured to perform operations associated with a neural network configured to determine a prefetch pattern for the cache memory prefetcher, wherein the prefetch pattern determined for the cache memory prefetcher is the learned prefetch pattern, the cache memory prefetcher is to prefetch data according to the learned prefetch pattern, and the learned prefetch pattern is based at least in part on a memory access pattern associated with a workload executed via the block of execution resources, wherein the neural network hardware block is configured, via the neural network, to: recognize the workload executed via the block of execution resources as the workload associated with the learned prefetch pattern based at least in part on the memory access pattern associated with the workload; and configure the prefetch pattern for the cache memory prefetcher to the learned prefetch pattern for use with the workload, the learned prefetch pattern one of a plurality of learned prefetch patterns associated with a plurality of workloads.

12. The graphics processing system of claim 11, wherein the cache memory includes multiple levels of cache memory and the cache memory prefetcher is configured to prefetch data into the multiple levels of cache memory.

13. The graphics processing system of claim 12, wherein the cache memory is associated with a unified memory architecture including a local memory of the graphics processor and a system memory coupled with a host processor.

14. The graphics processing system of claim 13, the memory access pattern associated with the workload is to the local memory of the graphics processor.

15. The graphics processing system of claim 13, the memory access pattern associated with the workload is to the system memory coupled with the host processor.

16. The graphics processing system as in claim 11, wherein the neural network hardware block includes a source data buffer, a neural network operations and activation operations block, and an output data buffer.

17. The graphics processing system as in claim 16, wherein the neural network operations and activation operations block is programmably configurable.

18. The graphics processing system as in claim 17, wherein the programmable neural network unit includes a block programming unit to configure layer state information for the neural network hardware block, the layer state information associated with one or more layers of a neural network to be processed by the programmable neural network unit.

19. The graphics processing system as in claim 17, wherein the programmable neural network unit includes a weight cache to cache weights associated with one or more layers of a neural network to be processed by the programmable neural network unit.

20. The graphics processing system as in claim 19, wherein the programmable neural network unit includes multiple neural network hardware blocks, each of the multiple neural network hardware blocks associated with respective layers of the neural network.
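The dependent claims describe a hardware block with a source data buffer, a configurable operations and activation stage, an output data buffer, a weight cache, and one block per layer. The following is a rough software model of that data flow, offered only as a reading aid. The class names `WeightCache` and `HardwareBlock`, the dense-layer arithmetic, the ReLU activation, and the function `run_network` are all hypothetical choices; the claims do not specify the operations, data types, or interfaces.

```python
def relu(x):
    """Assumed activation; the claims only say the stage is configurable."""
    return max(0.0, x)


class WeightCache:
    """Hypothetical weight cache keyed by layer index."""

    def __init__(self):
        self._entries = {}

    def store(self, layer, weights, bias):
        self._entries[layer] = (weights, bias)

    def load(self, layer):
        return self._entries[layer]


class HardwareBlock:
    """Model of one block: source buffer -> ops + activation -> output buffer."""

    def __init__(self, layer, weight_cache, activation=relu):
        # The layer index and activation play the role of "layer state"
        # that a block programming unit would configure (an assumption).
        self.layer = layer
        self.weight_cache = weight_cache
        self.activation = activation
        self.source_buffer = []
        self.output_buffer = []

    def run(self):
        weights, bias = self.weight_cache.load(self.layer)
        # Dense layer: one weighted sum plus bias per output, then activation.
        self.output_buffer = [
            self.activation(
                sum(w * x for w, x in zip(row, self.source_buffer)) + b
            )
            for row, b in zip(weights, bias)
        ]
        return self.output_buffer


def run_network(blocks, inputs):
    """Chain blocks so each output buffer feeds the next source buffer."""
    data = list(inputs)
    for block in blocks:
        block.source_buffer = data
        data = block.run()
    return data


if __name__ == "__main__":
    cache = WeightCache()
    cache.store(0, [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])
    cache.store(1, [[1.0, 1.0]], [-1.0])
    blocks = [HardwareBlock(0, cache), HardwareBlock(1, cache)]
    print(run_network(blocks, [2.0, 1.0]))
```

Passing the activation as a constructor argument is one simple way to model a stage that is "programmably configurable"; real hardware would more likely select among fixed-function modes.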
Adversarial learning · CPC title
Supervised learning · CPC title
Generative networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Distributed learning, e.g. federated learning · CPC title