Sparse convolutional neural network accelerator
US-10891538-B2 · Jan 12, 2021 · US
US12353334B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12353334-B2 |
| Application number | US-202418407816-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 9, 2024 |
| Priority date | Apr 24, 2017 |
| Publication date | Jul 8, 2025 |
| Grant date | Jul 8, 2025 |
Official abstract text for this publication.
In an example, an apparatus comprises a plurality of compute engines; and logic, at least partially including hardware logic, to detect a cache line conflict in a last-level cache (LLC) communicatively coupled to the plurality of compute engines; and implement context-based eviction policy to determine a cache way in the cache to evict in order to resolve the cache line conflict. Other embodiments are also disclosed and claimed.
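The abstract does not say how the context-based eviction policy picks a way. The following is a minimal sketch of one possible reading, not the patented method. It assumes each cache way records the context ID that filled it, that a hypothetical `priority` table ranks contexts, and that ties are broken by least-recently-used order. The names `Way`, `CacheSet`, and `priority` are illustrative and do not appear in the publication.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Way:
    tag: Optional[int] = None   # None means the way is empty
    context_id: int = -1        # context that filled this way
    last_used: int = 0          # logical timestamp for LRU tie-breaking


class CacheSet:
    """One set of a set-associative cache with context-aware eviction."""

    def __init__(self, num_ways: int, priority: Dict[int, int]):
        self.ways = [Way() for _ in range(num_ways)]
        # Assumption: a higher value means the context's lines are kept longer.
        self.priority = priority
        self.clock = 0

    def access(self, tag: int, context_id: int) -> Optional[int]:
        """Access a line; return the evicted tag on a conflict, else None."""
        self.clock += 1
        # Hit: refresh the LRU timestamp.
        for way in self.ways:
            if way.tag == tag:
                way.last_used = self.clock
                return None
        # Miss with a free way: fill it, no conflict.
        for way in self.ways:
            if way.tag is None:
                way.tag, way.context_id, way.last_used = tag, context_id, self.clock
                return None
        # Conflict: evict from the lowest-priority context, oldest line first.
        victim = min(
            self.ways,
            key=lambda w: (self.priority.get(w.context_id, 0), w.last_used),
        )
        evicted = victim.tag
        victim.tag, victim.context_id, victim.last_used = tag, context_id, self.clock
        return evicted
```

With two ways and context 1 ranked above context 2, a conflicting fill evicts the line owned by context 2 even when it was used more recently. Once both ways belong to the same context, the policy falls back to plain LRU.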
Opening claim text (preview).
The invention claimed is:

1. A system comprising: a last level cache (LLC) dynamically divided into private caches each corresponding to compute engines performing concurrent compute operations on different deep learning (DL) layers of a DL neural network; DL hardware circuitry communicatively coupled to the LLC by an interconnect, wherein the DL hardware circuitry comprising the compute engines to execute the concurrent compute operations on the DL layers using the LLC, wherein each compute engine corresponds to a different one of the DL layers; and a system cache controller communicably coupled to the DL hardware circuitry and the LLC, the system cache controller to: receive, from the DL hardware circuitry, a cache access request from a first compute engine of the compute engines performing the concurrent compute operations for a first DL layer of the DL neural network; and direct the cache access request to a first private cache of the private caches of the LLC, the first private cache corresponding to the first compute engine.

2. The system of claim 1, wherein the cache access request comprises a context identifier (ID) corresponding to the DL hardware circuitry that originated the cache access request and metadata that indicates a total size of data to be accessed in one or more subsequent data access transactions; and wherein the system cache controller is further to: assign one or more of the DL hardware circuitry as clients of the LLC; and assign the context ID to the clients of the LLC.

3. The system of claim 2, wherein the system cache controller is further to implement a cache eviction policy that is a function of the context identifier.

4. The system of claim 2, wherein the LLC may be reconfigured dynamically into the private caches.

5. The system of claim 4, wherein the LLC may be reconfigured with a variable cache size.

6. The system of claim 2, wherein each of the clients of the LLC is assigned one of the private caches of the LLC for dedicated access.

7. The system of claim 2, wherein at least one of the clients of the LLC can exchange data with each of other clients of the LLC and can access each of the private caches of the LLC.

8. A method comprising: dividing a last level cache (LLC) into private caches each corresponding to compute engines performing concurrent compute operations on different deep learning (DL) layers of a DL neural network; executing, by DL hardware circuitry communicatively coupled to the LLC via an interconnect, the concurrent compute operations on the DL layers using the LLC, wherein the DL hardware circuitry comprises the compute engines to perform the concurrent compute operations, and wherein each compute engine corresponds to a different one of the DL layers; receiving, by a system cache controller from the DL hardware circuitry, a cache access request from a first compute engine of the compute engines performing the concurrent compute operations for a first DL layer of the DL neural network; and directing, by the system cache controller, the cache access request to a first private cache of the private caches of the LLC, the first private cache corresponding to the first compute engine.

9. The method of claim 8, wherein the cache access request comprises a context identifier (ID) corresponding to the DL hardware circuitry that originated the cache access request and metadata that indicates a total size of data to be accessed in one or more subsequent data access transactions; and wherein the system cache controller is further to assign one or more of the DL hardware circuitry as clients of the LLC and assign the context ID to the clients of the LLC.

10. The method of claim 9, further comprising implementing, by the system cache controller, a cache eviction policy that is a function of the context identifier.

11. The method of claim 9, wherein the LLC may be reconfigured dynamically into the private caches.

12. The method of claim 11, wherein the LLC may be reconfigured with a variable cache size.

13. The method of claim 9, wherein each of the clients of the LLC is assigned one of the private caches of the LLC for dedicated access.

14. The method of claim 9, wherein at least one of the clients of the LLC can exchange data with each of other clients of the LLC and can access each of the private caches of the LLC.

15. A non-transitory computer-readable medium comprising one or more instructions that when executed on at least one processor configure the at least one processor to perform one or more operations to: dividing a last level cache (LLC) into private caches each corresponding to compute engines performing concurrent compute operations on different deep learning (DL) layers of a DL neural network; executing, by DL hardware circuitry communicatively coupled to the LLC via an interconnect, the concurrent compute operations on the DL layers using the LLC, wherein the DL hardware circuitry comprises the compute engines to perform the concurrent compute operations, and wherein each compute engine corresponds to a different one of the DL layers; receiving, by a system cache controller from the DL hardware circuitry, a cache access request from a first compute engine of the compute engines performing the concurrent compute operations for a first DL layer of the DL neural network; and directing, by the system cache controller, the cache access request to a first private cache of the private caches of the LLC, the first private cache corresponding to the first compute engine.

16. The non-transitory computer-readable medium of claim 15, wherein the cache access request comprises a context identifier (ID) corresponding to the DL hardware circuitry that originated the cache access request and metadata that indicates a total size of data to be accessed in one or more subsequent data access transactions; and wherein the system cache controller is further to assign one or more of the DL hardware circuitry as clients of the LLC and assign the context ID to the clients of the LLC.

17. The non-transitory computer-readable medium of claim 16, further comprising implementing, by the system cache controller, a cache eviction policy that is a function of the context identifier.

18. The non-transitory computer-readable medium of claim 16, wherein the LLC may be reconfigured dynamically into the private caches.

19. The non-transitory computer-readable medium of claim 16, wherein each of the clients of the LLC is assigned one of the private caches of the LLC for dedicated access.

20. The non-transitory computer-readable medium of claim 16, wherein at least one of the clients of the LLC can exchange data with each of other clients of the LLC and can access each of the private caches of the LLC.
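The routing step recited in the independent claims (divide the LLC into private caches, then direct each compute engine's request to its own private cache) can be illustrated with a small software model. This is a sketch under stated assumptions, not the claimed hardware. It assumes way-based partitioning, which the claims do not specify, and every name here (`SystemCacheController`, `PrivateCache`, `CacheAccessRequest`, `partition`, `route`) is hypothetical. The `context_id` and `total_bytes` fields mirror the context identifier and size metadata described in the dependent claims.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CacheAccessRequest:
    engine_id: int    # compute engine (one per DL layer) issuing the request
    context_id: int   # context ID of the originating DL hardware circuitry
    address: int
    total_bytes: int  # metadata: total size of the upcoming transactions


@dataclass
class PrivateCache:
    base_way: int     # first LLC way owned by this partition (assumed scheme)
    num_ways: int     # partition size, which may change on reconfiguration


class SystemCacheController:
    """Toy model: divides LLC ways among engines and routes requests."""

    def __init__(self, total_ways: int):
        self.total_ways = total_ways
        self.partitions: Dict[int, PrivateCache] = {}

    def partition(self, ways_per_engine: Dict[int, int]) -> None:
        """(Re)divide the LLC; calling again models dynamic reconfiguration."""
        if sum(ways_per_engine.values()) > self.total_ways:
            raise ValueError("requested ways exceed LLC capacity")
        self.partitions = {}
        next_way = 0
        for engine_id, num_ways in ways_per_engine.items():
            self.partitions[engine_id] = PrivateCache(next_way, num_ways)
            next_way += num_ways

    def route(self, request: CacheAccessRequest) -> PrivateCache:
        """Direct a request to the private cache of the engine that sent it."""
        return self.partitions[request.engine_id]
```

For example, with a 16-way LLC split 8/4/4 among engines 0, 1, and 2, a request from engine 1 is routed to the partition starting at way 8. Calling `partition` again with different sizes models the variable-size reconfiguration mentioned in the dependent claims.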
Supervised learning · CPC title
Distributed learning, e.g. federated learning · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Backpropagation, e.g. using gradient descent · CPC title