System cache optimizations for deep learning compute engines
US-11003592-B2 · May 11, 2021 · US
US11586558B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11586558-B2 |
| Application number | US-202117307299-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 4, 2021 |
| Priority date | Apr 24, 2017 |
| Publication date | Feb 21, 2023 |
| Grant date | Feb 21, 2023 |
Official abstract text for this publication.
In an example, an apparatus comprises a plurality of compute engines; and logic, at least partially including hardware logic, to detect a cache line conflict in a last-level cache (LLC) communicatively coupled to the plurality of compute engines; and implement context-based eviction policy to determine a cache way in the cache to evict in order to resolve the cache line conflict. Other embodiments are also disclosed and claimed.
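The abstract does not say how the context-based eviction policy picks a way, so the following is only a minimal sketch under stated assumptions. It models one set of a set-associative cache in which each line records the context ID of the client that filled it. On a conflict (a miss with no free way) the victim is the line whose context has the lowest priority, with least-recently-used order as the tie-breaker. The names `Line`, `CacheSet`, and the `priority` mapping are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Line:
    tag: int
    context_id: int  # hypothetical: ID the controller assigned to the LLC client
    last_used: int   # logical timestamp, used only as an LRU tie-breaker


class CacheSet:
    """One set of a set-associative cache with a context-aware victim choice."""

    def __init__(self, num_ways: int, priority: Dict[int, int]):
        self.ways: List[Optional[Line]] = [None] * num_ways
        # Assumed policy input: context ID -> priority (higher is kept longer).
        self.priority = priority
        self.clock = 0

    def _victim_key(self, way: int):
        line = self.ways[way]
        # Lowest context priority first, then oldest access.
        return (self.priority.get(line.context_id, 0), line.last_used)

    def access(self, tag: int, context_id: int) -> Optional[int]:
        """Access a line; return the evicted tag if a conflict forced an eviction."""
        self.clock += 1
        # Hit: refresh the timestamp and return.
        for line in self.ways:
            if line is not None and line.tag == tag:
                line.last_used = self.clock
                return None
        new_line = Line(tag, context_id, self.clock)
        # Miss with a free way: fill it, no conflict.
        for i, line in enumerate(self.ways):
            if line is None:
                self.ways[i] = new_line
                return None
        # Conflict: every way is occupied, so choose a victim by context.
        victim = min(range(len(self.ways)), key=self._victim_key)
        evicted = self.ways[victim].tag
        self.ways[victim] = new_line
        return evicted


if __name__ == "__main__":
    s = CacheSet(num_ways=2, priority={1: 10, 2: 1})
    s.access(0xA, context_id=1)
    s.access(0xB, context_id=2)
    print(hex(s.access(0xC, context_id=1)))
```

In this example plain LRU would evict `0xA`, the oldest line, whereas the context-aware key evicts `0xB` because it belongs to the lower-priority context. A real controller could use any function of the context ID; the priority table is just one simple choice.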
Opening claim text (preview).
The invention claimed is:

1. An apparatus comprising: deep learning (DL) hardware circuitry communicatively coupled to a last-level cache (LLC) by an interconnect, wherein the DL hardware circuitry to execute one or more layers of a DL network using the LLC and further comprise a prefetch scheduler to: determine target data addresses of a next DL processing cycle; send prefetch commands to the LLC based on the target data addresses, wherein the prefetch commands to cause data blobs in main memory to be traversed using a traversal schedule that tiles data in the LLC; and a controller to communicably coupled to the DL hardware circuitry and the LLC, the controller to: receive the prefetch commands and allocate the target data addresses from main memory; detect a cache line conflict in the LLC; implement context-based eviction policy to determine a cache way in the LLC to evict in order to resolve the cache line conflict; and assign one or more of the DL hardware circuitry as clients of the LLC.
2. The apparatus of claim 1, wherein the controller is further to: assign a context identifier (ID) to the clients of the LLC.
3. The apparatus of claim 2, wherein: the context-based eviction policy is a function of the context identifier.
4. The apparatus of claim 2, wherein: the LLC can be reconfigured dynamically into a plurality of individually addressable caches.
5. The apparatus of claim 4, wherein: the LLC can be reconfigured with a variable cache size.
6. An electronic device, comprising: a processor having a plurality of compute processing resources; a deep learning (DL) hardware circuitry communicatively coupled to the compute processing resources and a last-level cache (LLC) by an interconnect, wherein the DL hardware circuitry is to execute one or more layers of a DL network using the LLC and further comprises a prefetch scheduler to: determine target data addresses of a next DL processing cycle; and send prefetch commands to the LLC based on the target data addresses, wherein the prefetch commands to cause data blobs in main memory to be traversed using a traversal schedule that tiles data in the LLC; and a controller to communicably coupled to the DL hardware circuitry and the LLC, the controller to: receive the prefetch commands and allocate the target data addresses from the main memory; detect a cache line conflict in the LLC; implement context-based eviction policy to determine a cache way in the LLC to evict in order to resolve the cache line conflict; and assign one or more of the DL hardware circuitry as clients of the LLC.
7. The electronic device of claim 6, wherein the controller is further to: assign a context identifier (ID) to the clients of the LLC.
8. The electronic device of claim 7, wherein: the context-based eviction policy is a function of the context identifier.
9. The electronic device of claim 7, wherein: the LLC can be reconfigured dynamically into a plurality of individually addressable caches.
10. The electronic device of claim 9, wherein: the LLC can be reconfigured with a variable cache size.
11. A method comprising: executing, by deep learning (DL) hardware circuitry communicatively coupled to a last-level cache (LLC) by an interconnect, one or more layers of DL network using the LLC; determining, by a prefetch scheduler of the DL hardware circuitry, target data addresses of a next DL processing cycle; sending, by the prefetch scheduler, prefetch commands to the LLC based on the target data addresses, wherein the prefetch commands to cause data blobs in main memory to be traversed using a traversal schedule that tiles data in the LLC; receiving, by a controller communicably coupled to the DL hardware circuitry and the LLC, the prefetch commands; allocating, by the controller, the target data addresses from the main memory; detecting, by the controller, a cache line conflict in the LLC; implementing, by the controller, context-based eviction policy to determine a cache way in the LLC to evict in order to resolve the cache line conflict; and assigning, by the controller, one or more of the DL hardware circuitry as clients of the LLC.
12. The method of claim 11, further comprising: assigning a context identifier (ID) to the clients of the LLC.
13. The method of claim 12, wherein: the context-based eviction policy is a function of the context identifier.
14. The method of claim 12, wherein the LLC can be reconfigured dynamically into a plurality of individually addressable caches.
15. The method of claim 14, wherein the LLC can be reconfigured with a variable cache size.
16. A non-transitory computer-readable medium comprising one or more instructions that when executed on at least one processor configure the at least one processor to perform one or more operations to: execute, by deep learning (DL) hardware circuitry communicatively coupled to a last-level cache (LLC) by an interconnect, one or more layers of DL network using the LLC; determine, by a prefetch scheduler of the DL hardware circuitry, target data addresses of a next DL processing cycle; send, by the prefetch scheduler, prefetch commands to the LLC based on the target data addresses, wherein the prefetch commands to cause data blobs in main memory to be traversed using a traversal schedule that tiles data in the LLC; receive, by a controller communicably coupled to the DL hardware circuitry and the LLC, the prefetch commands; allocate, by the controller, the target data addresses from the main memory; detect, by the controller, a cache line conflict in the LLC; implement, by the controller, context-based eviction policy to determine a cache way in the LLC to evict in order to resolve the cache line conflict; and assign, by the controller, one or more of the DL hardware circuitry as clients of the LLC.
17. The non-transitory computer-readable medium of claim 16, wherein the at least one processor to perform one or more operations further to: assign a context identifier (ID) to the clients of the LLC.
18. The non-transitory computer-readable medium of claim 17, wherein: the context-based eviction policy is a function of the context identifier.
19. The non-transitory computer-readable medium of claim 17, wherein the LLC can be reconfigured dynamically into a plurality of individually addressable caches.
20. The non-transitory computer-readable medium of claim 19, wherein the LLC can be reconfigured with a variable cache size.
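The independent claims recite a prefetch scheduler that determines the target addresses of the next processing cycle and traverses data blobs with a schedule that tiles data in the LLC, but the claim text does not fix a tile shape or address layout. As an illustration only, the sketch below assumes a row-major 2D blob, a 64-byte cache line, and row-major tile order, and lists the cache-line addresses a scheduler might request for each tile. All function names and parameters are hypothetical.

```python
from typing import Iterator, List, Tuple

LINE_BYTES = 64  # assumed cache line size; not specified in the claims


def tile_schedule(rows: int, cols: int,
                  tile_rows: int, tile_cols: int) -> Iterator[Tuple[int, int]]:
    """Yield the (row, col) origin of each tile in row-major tile order."""
    for r in range(0, rows, tile_rows):
        for c in range(0, cols, tile_cols):
            yield r, c


def tile_line_addresses(base: int, rows: int, cols: int, elem_bytes: int,
                        origin: Tuple[int, int],
                        tile_rows: int, tile_cols: int) -> List[int]:
    """Return sorted, de-duplicated cache-line addresses covering one tile."""
    r0, c0 = origin
    lines = set()
    for r in range(r0, min(r0 + tile_rows, rows)):
        start = base + (r * cols + c0) * elem_bytes
        end = base + (r * cols + min(c0 + tile_cols, cols)) * elem_bytes  # exclusive
        # Every line touched by the byte range [start, end).
        for line in range(start // LINE_BYTES, (end - 1) // LINE_BYTES + 1):
            lines.add(line * LINE_BYTES)
    return sorted(lines)


def prefetch_commands(base: int, rows: int, cols: int, elem_bytes: int,
                      tile_rows: int, tile_cols: int):
    """Yield (tile_origin, addresses): one batch of prefetch targets per tile."""
    for origin in tile_schedule(rows, cols, tile_rows, tile_cols):
        yield origin, tile_line_addresses(base, rows, cols, elem_bytes,
                                          origin, tile_rows, tile_cols)


if __name__ == "__main__":
    # 4 x 32 blob of 4-byte elements, traversed in 2 x 16 tiles.
    for origin, addrs in prefetch_commands(0, 4, 32, 4, 2, 16):
        print(origin, addrs)
```

In a pipelined design, the batch for tile *k+1* would be issued while tile *k* is being processed, so that the data is already resident in the LLC when the next cycle starts. The tile size would normally be chosen so that the working set of one tile fits in the cache capacity available to that client.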
Supervised learning · CPC title
Distributed learning, e.g. federated learning · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
with a shared cache · CPC title