Integrated Circuit Device with Deep Learning Accelerator and Random Access Memory
US-2021319821-A1 · Oct 14, 2021 · US
US12094531B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12094531-B2 |
| Application number | US-202117146314-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 11, 2021 |
| Priority date | Jan 11, 2021 |
| Publication date | Sep 17, 2024 |
| Grant date | Sep 17, 2024 |
Official abstract text for this publication.
Systems, devices, and methods related to a Deep Learning Accelerator and memory are described. For example, the accelerator can have processing units to perform at least matrix computations of an artificial neural network via execution of instructions. The processing units have a local memory to store operands of the instructions. The accelerator can access a random access memory via a system buffer, or without going through the system buffer. A fetch instruction can request an item, available at a memory address in the random access memory, to be loaded into the local memory at a local address. The fetch instruction can include a hint for the caching of the item in the system buffer. During execution of the instruction, the hint can be used to determine whether to load the item through the system buffer or to bypass the system buffer in loading the item.
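The two load paths described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the names `FetchInstruction`, `Accelerator`, and `cache_hint` are hypothetical, the hint is reduced to a single boolean, and serving an already-cached item from the buffer is an assumed behavior that the abstract does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class FetchInstruction:
    memory_address: int  # source address in the random access memory
    local_address: int   # destination address in the local memory
    cache_hint: bool     # hypothetical one-bit hint: True = load through the system buffer


@dataclass
class Accelerator:
    ram: dict                                          # memory address -> item
    system_buffer: dict = field(default_factory=dict)  # memory address -> cached item
    local_memory: dict = field(default_factory=dict)   # local address -> item

    def fetch(self, instr: FetchInstruction) -> str:
        """Load one item into local memory and report which path was taken."""
        if instr.memory_address in self.system_buffer:
            # Assumption: an item already cached in the buffer is served from it.
            item = self.system_buffer[instr.memory_address]
            path = "buffer-hit"
        elif instr.cache_hint:
            # Hint asks for caching: copy RAM -> system buffer -> local memory.
            item = self.ram[instr.memory_address]
            self.system_buffer[instr.memory_address] = item
            path = "through-buffer"
        else:
            # Hint asks to bypass: copy RAM -> local memory directly.
            item = self.ram[instr.memory_address]
            path = "bypass"
        self.local_memory[instr.local_address] = item
        return path


acc = Accelerator(ram={0x100: "weights", 0x200: "inputs"})
print(acc.fetch(FetchInstruction(0x100, 0, cache_hint=True)))   # through-buffer
print(acc.fetch(FetchInstruction(0x200, 1, cache_hint=False)))  # bypass
print(acc.fetch(FetchInstruction(0x100, 2, cache_hint=False)))  # buffer-hit
```

In this model the bypass path leaves the system buffer untouched, which is the point of the hint: items that are unlikely to be reused do not displace items that are.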
Opening claim text (preview).
What is claimed is:
1. A device, comprising: a plurality of processing units configured to execute instructions and perform at least matrix computations of an artificial neural network via execution of the instructions; a local memory coupled to the processing units and configured to store at least operands of the instructions during operations of the processing units in execution of the instructions; a memory configured as a buffer; a random access memory; and a logic circuit coupled to the buffer, the local memory, and the random access memory; wherein the instructions include a first instruction to fetch an item from the random access memory to the local memory; the first instruction includes a field related to caching the item in the buffer; and during execution of the first instruction the logic circuit is configured to determine whether to load the item through the buffer based at least in part on the field specified in the first instruction.
2. The device of claim 1, wherein whether to load the item through the buffer is further based on a data type of the item.
3. The device of claim 2, wherein a second determination of whether to load the item through the buffer is in response to a first determination that: the item is not already cached in the buffer; the item is not a set of instructions to be executed by the processing units; and second instructions cached in the local memory do not request the data item.
4. The device of claim 3, wherein when the field has a first value, the item is loaded from the random access memory to the local memory: without going through the buffer, in response to a determination that the item has a first type; or through the buffer, in response to a determination that the item has a second type.
5. The device of claim 4, wherein the item of the first type containing data representative of weights of artificial neurons in the artificial neural network; and the item of the second type containing data representative of inputs to the artificial neurons in the artificial neural network.
6. The device of claim 4, wherein the item of the second type containing data representative of weights of artificial neurons in the artificial neural network; and the item of the first type containing data representative of inputs to the artificial neurons in the artificial neural network.
7. The device of claim 4, wherein when the field has a second value, the item is loaded from the random access memory to the local memory: without going through the buffer, in response to a determination that the item has the first type and the buffer has insufficient capacity to cache the item without evicting data currently cached in the buffer; through the buffer, in response to a determination that the item has the second type; or through the buffer, in response to a determination that the item has the first type and the buffer has sufficient capacity to cache the item without evicting data currently cached in the buffer.
8. The device of claim 7, wherein the item of the second type containing data representative of weights of artificial neurons in the artificial neural network; and the item of the first type containing data representative of inputs to the artificial neurons in the artificial neural network.
9. The device of claim 8, wherein when the field has a third value, the item is loaded from the random access memory to the local memory: without going through the buffer, in response to a determination that the buffer has insufficient capacity to cache the item without evicting data currently cached in the buffer; or through the buffer, in response to a determination that the buffer has sufficient capacity to cache the item without evicting data currently cached in the buffer.
10. The device of claim 9, wherein when the field has a fourth value, the item is loaded from the random access memory to the local memory: without going through the buffer, in response to a determination that the item has the second type, or through the buffer, in response to a determination that the item has the first type.
11. An apparatus, comprising: a Field-Programmable Gate Array (FPGA) or Application Specific Integrated circuit (ASIC), including: a memory interface; at least one processing unit configured to operate on two matrix operands of an instruction executed in the FPGA or ASIC; and a local memory configured to store operands of the instruction during execution of the instruction; a system buffer connected to the memory interface; and a random access memory connected to the memory interface; wherein in response to a first instruction specifying a memory address, a local address, and a hint, the memory interface is configured to determine, based on the hint, whether to fetch an item, available at the memory address in the random access memory, to the local address in the local memory, through the system buffer.
12. The apparatus of claim 11, wherein whether to load the item through the system buffer is further based on a data type of the item.
13. The apparatus of claim 12, wherein whether to load the item through the system buffer is further based on availability of free space in the system buffer to cache the item without evicting data currently cached in the system buffer.
14. The apparatus of claim 12, wherein the data type is one of: representative of weights of artificial neurons in an artificial neural network; and representative of inputs to the artificial neurons in the artificial neural network.
15. A method, comprising: executing, by a plurality of processing units of a device, instructions to perform at least matrix computations of an artificial neural network; storing, in a local memory coupled to the processing units in the device, at least operands of the instructions during operations of the processing units in execution of the instructions; receiving a first instruction having a memory address and a local address to request an item at the memory address in a random access memory of the device to be fetched into the local memory at the local address, the first instruction having a field identifying a hint for caching the item in a system buffer of the device; and determining, during execution of the first instruction, whether to load the item through the system buffer based at least in part on the hint specified in the first instruction and a data type of the item.
16. The method of claim 15, wherein when the hint has a first value, the item is loaded from the random access memory to the local memory: without going through the system buffer, in response to a determination that the item has a first type, or through the system buffer, in response to a determination that the item has a second type.
17. The method of claim 16, wherein when the hint has a second value, the item is loaded from the random access memory to the local memory: without going through the system buffer, in response to a determination that the item has the first type and the system buffer has insufficient capacity to cache the item without evicting data currently cached in the system buffer; through the system buffer, in response to a determination that the item has the second type; or through the system buffer, in response to a determination that the item has the first type and the system buffer has sufficient capacity to cache the item without evicting data currently cached in the system buffer.
18. The method of claim 17, wherein the item of the second type containing data representative of weights of arti
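The four hint values recited in claims 4, 7, 9, and 10 amount to a small decision table over the item's type and the free capacity of the buffer. The function below is a hedged reading of that table, not an implementation taken from the patent: the enum names, their numeric encodings, and the `buffer_has_room` flag are hypothetical, and the preconditions of claim 3 (item not already cached, not an instruction set, not requested by instructions cached in local memory) are assumed to have been checked by the caller.

```python
from enum import Enum


class ItemType(Enum):
    FIRST = 1   # claims 5, 6, and 8 map the types to weights or inputs in either order
    SECOND = 2


class Hint(Enum):
    FIRST = 1   # hypothetical encodings; the claims only say "first value", etc.
    SECOND = 2
    THIRD = 3
    FOURTH = 4


def load_through_buffer(hint: Hint, item_type: ItemType, buffer_has_room: bool) -> bool:
    """Return True to load through the buffer, False to bypass it.

    buffer_has_room means the item can be cached without evicting
    data currently cached in the buffer.
    """
    if hint is Hint.FIRST:
        # Claim 4: first type bypasses, second type goes through the buffer.
        return item_type is ItemType.SECOND
    if hint is Hint.SECOND:
        # Claim 7: second type always goes through; first type only if there is room.
        return item_type is ItemType.SECOND or buffer_has_room
    if hint is Hint.THIRD:
        # Claim 9: the decision depends only on free capacity.
        return buffer_has_room
    if hint is Hint.FOURTH:
        # Claim 10: the mirror of claim 4.
        return item_type is ItemType.FIRST
    raise ValueError(f"unknown hint: {hint!r}")


# Second hint value, first-type item, full buffer: the item bypasses the buffer.
print(load_through_buffer(Hint.SECOND, ItemType.FIRST, buffer_has_room=False))  # False
```

Read this way, the first and fourth values give one data type strict priority for buffer space, while the second and third values relax that priority whenever caching would not evict anything.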
Reinforcement learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
Learning methods · CPC title
with prefetch · CPC title