Sparse convolutional neural network accelerator
US-10891538-B2 · Jan 12, 2021 · US
US11868264B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11868264-B2 |
| Application number | US-202318168157-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 13, 2023 |
| Priority date | Apr 1, 2017 |
| Publication date | Jan 9, 2024 |
| Grant date | Jan 9, 2024 |
Abstract
One embodiment provides circuitry coupled with cache memory and a memory interface, the circuitry to compress compute data at multiple cache line granularity, and a processing resource coupled with the memory interface and the cache memory. The processing resource is configured to perform a general-purpose compute operation on compute data associated with multiple cache lines of the cache memory. The circuitry is configured to compress the compute data before a write of the compute data via the memory interface to the memory bus; to decompress the compute data in association with a read of the compute data associated with the multiple cache lines via the memory interface; and to provide the decompressed compute data to the processing resource.
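As a rough software model of the data path the abstract describes, the sketch below compresses several cache lines as one unit on the write path, then decompresses them on the read path before handing them to a compute operation. It illustrates the idea under stated assumptions and is not the patented circuitry. The 64-byte line size, Python's `zlib` as a stand-in for the unspecified hardware compressor, the dictionary standing in for memory, and the names `write_back`, `read_in`, and `compute_op` are all hypothetical.

```python
import zlib

LINE_SIZE = 64  # assumed cache-line size in bytes; the abstract does not give one


def write_back(lines: list[bytes], memory: dict[int, bytes], addr: int) -> int:
    """Compress several cache lines together, then store them at `addr`.

    zlib is only a stand-in for the hardware compressor (assumption).
    Returns the number of bytes that would cross the memory interface.
    """
    if any(len(line) != LINE_SIZE for line in lines):
        raise ValueError("every cache line must be exactly LINE_SIZE bytes")
    payload = zlib.compress(b"".join(lines))
    memory[addr] = payload
    return len(payload)


def read_in(memory: dict[int, bytes], addr: int) -> list[bytes]:
    """Read the compressed unit, decompress it, and split it back into lines."""
    raw = zlib.decompress(memory[addr])
    return [raw[i:i + LINE_SIZE] for i in range(0, len(raw), LINE_SIZE)]


def compute_op(lines: list[bytes]) -> int:
    """Hypothetical general-purpose compute operation: sum every byte."""
    return sum(sum(line) for line in lines)
```

With three all-zero lines and one line holding the bytes 0 to 63, the write sends far fewer than 256 bytes, the read restores the original four lines, and `compute_op` sees only decompressed data.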
Claims
What is claimed is:

1. A general-purpose graphics processor comprising: a memory interface; a cache memory communicatively coupled with the memory interface; a processing resource communicatively coupled with the memory interface and the cache memory, the processing resource to perform a general-purpose compute operation; and circuitry communicatively coupled with the cache memory and the memory interface, the circuitry to: compress compute data at cache sector granularity, the cache sector granularity is a sub-block granularity, said compress compute data including compress multiple cache lines of the compute data associated with a sector before a write of the compressed compute data associated with the sector via the memory interface, in association with a read of the compressed compute data associated with the multiple cache lines via the memory interface, decompress the compressed compute data to generate decompressed compute data, and provide the decompressed compute data to the processing resource for performance of the general-purpose compute operation.

2. The general-purpose graphics processor as in claim 1, wherein the circuitry is further configured to access un-compressed memory regions at the cache sector granularity.

3. The general-purpose graphics processor as in claim 1, wherein the circuitry is additionally configured to decompress compute data associated with the cache memory at multiple cache line granularity.

4. The general-purpose graphics processor as in claim 3, wherein the processing resource is configured to perform the general-purpose compute operation on the decompressed compute data.

5. The general-purpose graphics processor as in claim 3, wherein the compute data associated with the multiple cache lines is associated with a tile of compute data in a memory accessible via the memory interface.

6. The general-purpose graphics processor as in claim 1, wherein the circuitry is configured to decompress a partial cache line in association with a read of compute data associated with the partial cache line.

7. The general-purpose graphics processor as in claim 1, wherein the processing resource is configured to update compute data associated with the multiple cache lines via the general-purpose compute operation and the circuitry.

8. A method comprising: on a general-purpose graphics processor having a cache memory communicatively coupled with a memory interface: performing a general-purpose compute operation on compute data associated with multiple cache lines of the cache memory via a processing resource communicatively coupled with the memory interface and the cache memory; compressing the compute data at cache sector granularity via circuitry communicatively coupled with the cache memory and the memory interface, wherein the cache sector granularity is a sub-block granularity and the circuitry is configured to compress multiple cache lines of the compute data associated with the sector before a write of the compressed compute data via the memory interface; in association with a read of the compressed compute data associated with the multiple cache lines via the memory interface, decompressing the compressed compute data via the circuitry to generate decompressed compute data; and providing the decompressed compute data to the processing resource for performance of the general-purpose compute operation.

9. The method as in claim 8, further comprising accessing un-compressed memory regions at the cache sector granularity.

10. The method as in claim 8, further comprising decompressing the compute data at multiple cache line granularity.

11. The method as in claim 10, further comprising: performing the general-purpose compute operation on the decompressed compute data.

12. The method as in claim 8, further comprising: decompressing a partial cache line in association with a read of compute data associated with the partial cache line.

13. The method as in claim 8, further comprising: updating compute data associated with the multiple cache lines via the general-purpose compute operation.

14. A data processing system comprising: a memory device; and a general-purpose graphics processor including: a memory interface communicatively coupled with the memory device; a cache memory communicatively coupled with the memory interface; a processing resource communicatively coupled with the memory interface and the cache memory, the processing resource to perform a general-purpose compute operation; and circuitry communicatively coupled with the cache memory and the memory interface, the circuitry to: compress compute data at cache sector granularity, the cache sector granularity is a sub-block granularity, said compress compute data including compress multiple cache lines of the compute data associated with a sector before a write of the compressed compute data associated with the sector via the memory interface, in association with a read of the compressed compute data associated with the multiple cache lines via the memory interface, decompress the compressed compute data to generate decompressed compute data, and provide the decompressed compute data to the processing resource for performance of the general-purpose compute operation.

15. The data processing system as in claim 14, wherein the circuitry is further configured to access un-compressed memory regions at the cache sector granularity.

16. The data processing system as in claim 14, wherein the circuitry is configured to decompress compute data associated with the cache memory at multiple cache line granularity.

17. The data processing system as in claim 16, wherein the processing resource is configured to perform the general-purpose compute operation on the decompressed compute data.

18. The data processing system as in claim 16, wherein the compute data associated with the multiple cache lines is associated with a tile of compute data in a memory accessible via the memory device.

19. The data processing system as in claim 14, wherein the circuitry is additionally configured to decompress a partial cache line in association with a read of compute data associated with the partial cache line.

20. The data processing system as in claim 14, wherein the processing resource is configured to update compute data associated with the multiple cache lines via the general-purpose compute operation and the circuitry.
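Claims 1, 2, and 6 add three details: the unit of compression is a cache sector (a sub-block holding multiple cache lines), un-compressed regions are still accessed a sector at a time, and a partial cache line can be read back. The toy model below illustrates those behaviors under assumptions the claims do not state: four 64-byte lines per sector, `zlib` as the compressor, a per-sector flag recording whether the sector is stored compressed, and the hypothetical class name `SectorCompressor`. For a partial read it simply decompresses the whole sector and slices out the requested bytes; the claims do not say how partial decompression is carried out in hardware.

```python
import zlib
from dataclasses import dataclass, field

LINE_SIZE = 64          # assumed cache-line size in bytes
LINES_PER_SECTOR = 4    # assumed: one sector groups four cache lines
SECTOR_SIZE = LINE_SIZE * LINES_PER_SECTOR


@dataclass
class SectorCompressor:
    """Toy model of compression at cache-sector granularity (hypothetical)."""
    memory: dict[int, bytes] = field(default_factory=dict)     # sector index -> stored bytes
    compressed: dict[int, bool] = field(default_factory=dict)  # sector index -> stored compressed?

    def write_sector(self, sector: int, lines: list[bytes], compress: bool = True) -> None:
        """Write every cache line of one sector, compressing them together if requested."""
        if len(lines) != LINES_PER_SECTOR or any(len(ln) != LINE_SIZE for ln in lines):
            raise ValueError("a sector write needs exactly LINES_PER_SECTOR full lines")
        data = b"".join(lines)
        self.memory[sector] = zlib.compress(data) if compress else data
        self.compressed[sector] = compress

    def _load_sector(self, sector: int) -> bytes:
        """Fetch a whole sector; un-compressed sectors skip decompression."""
        stored = self.memory[sector]
        return zlib.decompress(stored) if self.compressed[sector] else stored

    def read_line(self, sector: int, line: int) -> bytes:
        """Return one full cache line from the sector."""
        start = line * LINE_SIZE
        return self._load_sector(sector)[start:start + LINE_SIZE]

    def read_partial(self, sector: int, line: int, offset: int, length: int) -> bytes:
        """Return part of one cache line (the whole sector is decompressed first)."""
        if offset < 0 or length < 0 or offset + length > LINE_SIZE:
            raise ValueError("a partial read must stay within one cache line")
        return self.read_line(sector, line)[offset:offset + length]
```

A read-modify-write in the spirit of claim 7 would call `read_line`, change the returned bytes, and pass the full set of modified lines back to `write_sector`.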
CPC classification titles:
- Cache access modes
- Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- Multiuser, multiprocessor or multiprocessing cache systems
- Partitioned cache, e.g. separate instruction and operand caches
- Overlapped cache accessing, e.g. pipeline (G06F12/0846 takes precedence)