Caching techniques for deep learning accelerator

US12094531B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12094531-B2
Application number  US-202117146314-A
Country             US
Kind code           B2
Filing date         Jan 11, 2021
Priority date       Jan 11, 2021
Publication date    Sep 17, 2024
Grant date          Sep 17, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, devices, and methods related to a Deep Learning Accelerator and memory are described. For example, the accelerator can have processing units to perform at least matrix computations of an artificial neural network via execution of instructions. The processing units have a local memory store operands of the instructions. The accelerator can access a random access memory via a system buffer, or without going through the system buffer. A fetch instruction can request an item, available at a memory address in the random access memory, to be loaded into the local memory at a local address. The fetch instruction can include a hint for the caching of the item in the system buffer. During execution of the instruction, the hint can be used to determine whether to load the item through the system buffer or to bypass the system buffer in loading the item.
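
The fetch path described in the abstract can be sketched as a toy model. This is only an illustration under stated assumptions: the names `FetchInstruction`, `Accelerator`, and `cache_hint` are hypothetical, the hint is simplified to a boolean, and serving an already-cached item from the buffer is an assumed behavior. The patent describes hardware and does not define a software API.

```python
from dataclasses import dataclass, field


@dataclass
class FetchInstruction:
    """Hypothetical fetch instruction: RAM address, local address, caching hint."""
    memory_address: int
    local_address: int
    cache_hint: bool  # assumption: True means "cache the item in the system buffer"


@dataclass
class Accelerator:
    """Toy model of the three memories named in the abstract."""
    ram: dict
    system_buffer: dict = field(default_factory=dict)
    local_memory: dict = field(default_factory=dict)

    def fetch(self, instr: FetchInstruction) -> str:
        """Load an item into local memory and return the path taken."""
        addr = instr.memory_address
        if addr in self.system_buffer:
            # Assumed behavior: an item already in the buffer is served from it.
            item, path = self.system_buffer[addr], "buffer-hit"
        elif instr.cache_hint:
            # Load through the system buffer, leaving a cached copy behind.
            item = self.ram[addr]
            self.system_buffer[addr] = item
            path = "through-buffer"
        else:
            # Bypass the system buffer entirely.
            item, path = self.ram[addr], "bypass"
        self.local_memory[instr.local_address] = item
        return path
```

In this sketch, an item fetched with the hint set stays in `system_buffer` for later fetches, while an item fetched without it reaches `local_memory` and leaves the buffer contents untouched.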

First claim

Opening claim text (preview).

What is claimed is:

1. A device, comprising: a plurality of processing units configured to execute instructions and perform at least matrix computations of an artificial neural network via execution of the instructions; a local memory coupled to the processing units and configured to store at least operands of the instructions during operations of the processing units in execution of the instructions; a memory configured as a buffer; a random access memory; and a logic circuit coupled to the buffer, the local memory, and the random access memory; wherein the instructions include a first instruction to fetch an item from the random access memory to the local memory; the first instruction includes a field related to caching the item in the buffer; and during execution of the first instruction the logic circuit is configured to determine whether to load the item through the buffer based at least in part on the field specified in the first instruction.

2. The device of claim 1, wherein whether to load the item through the buffer is further based on a data type of the item.

3. The device of claim 2, wherein a second determination of whether to load the item through the buffer is in response to a first determination that: the item is not already cached in the buffer; the item is not a set of instructions to be executed by the processing units; and second instructions cached in the local memory do not request the data item.

4. The device of claim 3, wherein when the field has a first value, the item is loaded from the random access memory to the local memory: without going through the buffer, in response to a determination that the item has a first type; or through the buffer, in response to a determination that the item has a second type.

5. The device of claim 4, wherein the item of the first type containing data representative of weights of artificial neurons in the artificial neural network; and the item of the second type containing data representative of inputs to the artificial neurons in the artificial neural network.

6. The device of claim 4, wherein the item of the second type containing data representative of weights of artificial neurons in the artificial neural network; and the item of the first type containing data representative of inputs to the artificial neurons in the artificial neural network.

7. The device of claim 4, wherein when the field has a second value, the item is loaded from the random access memory to the local memory: without going through the buffer, in response to a determination that the item has the first type and the buffer has insufficient capacity to cache the item without evicting data currently cached in the buffer; through the buffer, in response to a determination that the item has the second type; or through the buffer, in response to a determination that the item has the first type and the buffer has sufficient capacity to cache the item without evicting data currently cached in the buffer.

8. The device of claim 7, wherein the item of the second type containing data representative of weights of artificial neurons in the artificial neural network; and the item of the first type containing data representative of inputs to the artificial neurons in the artificial neural network.

9. The device of claim 8, wherein when the field has a third value, the item is loaded from the random access memory to the local memory: without going through the buffer, in response to a determination that the buffer has insufficient capacity to cache the item without evicting data currently cached in the buffer; or through the buffer, in response to a determination that the buffer has sufficient capacity to cache the item without evicting data currently cached in the buffer.

10. The device of claim 9, wherein when the field has a fourth value, the item is loaded from the random access memory to the local memory: without going through the buffer, in response to a determination that the item has the second type, or through the buffer, in response to a determination that the item has the first type.

11. An apparatus, comprising: a Field-Programmable Gate Array (FPGA) or Application Specific Integrated circuit (ASIC), including: a memory interface; at least one processing unit configured to operate on two matrix operands of an instruction executed in the FPGA or ASIC; and a local memory configured to store operands of the instruction during execution of the instruction; a system buffer connected to the memory interface; and a random access memory connected to the memory interface; wherein in response to a first instruction specifying a memory address, a local address, and a hint, the memory interface is configured to determine, based on the hint, whether to fetch an item, available at the memory address in the random access memory, to the local address in the local memory, through the system buffer.

12. The apparatus of claim 11, wherein whether to load the item through the system buffer is further based on a data type of the item.

13. The apparatus of claim 12, wherein whether to load the item through the system buffer is further based on availability of free space in the system buffer to cache the item without evicting data currently cached in the system buffer.

14. The apparatus of claim 12, wherein the data type is one of: representative of weights of artificial neurons in an artificial neural network; and representative of inputs to the artificial neurons in the artificial neural network.

15. A method, comprising: executing, by a plurality of processing units of a device, instructions to perform at least matrix computations of an artificial neural network; storing, in a local memory coupled to the processing units in the device, at least operands of the instructions during operations of the processing units in execution of the instructions; receiving a first instruction having a memory address and a local address to request an item at the memory address in a random access memory of the device to be fetched into the local memory at the local address, the first instruction having a field identifying a hint for caching the item in a system buffer of the device; and determining, during execution of the first instruction, whether to load the item through the system buffer based at least in part on the hint specified in the first instruction and a data type of the item.

16. The method of claim 15, wherein when the hint has a first value, the item is loaded from the random access memory to the local memory: without going through the system buffer, in response to a determination that the item has a first type, or through the system buffer, in response to a determination that the item has a second type.

17. The method of claim 16, wherein when the hint has a second value, the item is loaded from the random access memory to the local memory: without going through the system buffer, in response to a determination that the item has the first type and the system buffer has insufficient capacity to cache the item without evicting data currently cached in the system buffer; through the system buffer, in response to a determination that the item has the second type; or through the system buffer, in response to a determination that the item has the first type and the system buffer has sufficient capacity to cache the item without evicting data currently cached in the system buffer.

18. The method of claim 17, wherein the item of the second type containing data representative of weights of arti
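
The routing rules recited in claims 3, 4, 7, 9, and 10 can be read as a small decision table. The sketch below is one possible reading, not the patented circuit: the names `Hint`, `ItemType`, `needs_routing_decision`, and `load_through_buffer` are hypothetical, and the item types are kept abstract because the claims assign weights and inputs to the first and second type differently (compare claims 5 and 6).

```python
from enum import Enum


class Hint(Enum):
    """Hypothetical labels for the four field values named in the claims."""
    FIRST = 1
    SECOND = 2
    THIRD = 3
    FOURTH = 4


class ItemType(Enum):
    """Abstract item types; which one holds weights or inputs varies by claim."""
    FIRST = "first"
    SECOND = "second"


def needs_routing_decision(already_cached: bool,
                           is_instructions: bool,
                           requested_by_cached_instructions: bool) -> bool:
    """Claim 3: the hint-based decision applies only when all three are false."""
    return not (already_cached or is_instructions
                or requested_by_cached_instructions)


def load_through_buffer(hint: Hint, item_type: ItemType,
                        has_free_space: bool) -> bool:
    """Return True to load through the buffer, False to bypass it.

    has_free_space means the buffer can cache the item without evicting
    data that is currently cached.
    """
    if hint is Hint.FIRST:    # claim 4: decided by type alone
        return item_type is ItemType.SECOND
    if hint is Hint.SECOND:   # claim 7: first type is cached only if space is free
        return item_type is ItemType.SECOND or has_free_space
    if hint is Hint.THIRD:    # claim 9: decided by free space alone
        return has_free_space
    if hint is Hint.FOURTH:   # claim 10: the reverse of the first value
        return item_type is ItemType.FIRST
    raise ValueError(f"unknown hint: {hint!r}")
```

For example, `load_through_buffer(Hint.SECOND, ItemType.FIRST, False)` returns `False`: under this reading of claim 7, a first-type item bypasses the buffer when caching it would force an eviction.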

Assignees

Micron Technology Inc

Inventors

Classifications

Primary CPC classification: G11C11/54

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12094531B2 cover?
Systems, devices, and methods related to a Deep Learning Accelerator and memory are described. For example, the accelerator can have processing units to perform at least matrix computations of an artificial neural network via execution of instructions. The processing units have a local memory store operands of the instructions. The accelerator can access a random access memory via a system buff…
Who is the assignee on this patent?
Micron Technology Inc
What technology area does this patent fall under?
Primary CPC classification G11C11/54. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 17, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).