What technology area does this patent fall under?

Primary CPC classification G11C11/54. Mapped technology areas include Physics.

When was this patent published?

Publication date Sep 17, 2024 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Caching techniques for deep learning accelerator

US12094531B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12094531-B2
Application number	US-202117146314-A
Country	US
Kind code	B2
Filing date	Jan 11, 2021
Priority date	Jan 11, 2021
Publication date	Sep 17, 2024
Grant date	Sep 17, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, devices, and methods related to a Deep Learning Accelerator and memory are described. For example, the accelerator can have processing units to perform at least matrix computations of an artificial neural network via execution of instructions. The processing units have a local memory store operands of the instructions. The accelerator can access a random access memory via a system buffer, or without going through the system buffer. A fetch instruction can request an item, available at a memory address in the random access memory, to be loaded into the local memory at a local address. The fetch instruction can include a hint for the caching of the item in the system buffer. During execution of the instruction, the hint can be used to determine whether to load the item through the system buffer or to bypass the system buffer in loading the item.
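The data path in the abstract can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the patented implementation: the class `ToyAccelerator`, the dictionary-based memories, and the reduction of the hint to a single boolean `cache_hint` are all hypothetical simplifications introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAccelerator:
    """Hypothetical model: three dictionaries stand in for the memories."""
    ram: dict                                           # memory address -> item
    system_buffer: dict = field(default_factory=dict)   # cached items
    local_memory: dict = field(default_factory=dict)    # local address -> item

    def fetch(self, memory_address, local_address, cache_hint):
        """Load the item at memory_address into local_memory[local_address].

        cache_hint is an assumed boolean simplification of the hint field.
        Returns a label for the path the item took.
        """
        if memory_address in self.system_buffer:
            # Item is already cached: serve it from the system buffer.
            item = self.system_buffer[memory_address]
            path = "buffer-hit"
        elif cache_hint:
            # Load through the system buffer, leaving a cached copy behind.
            item = self.ram[memory_address]
            self.system_buffer[memory_address] = item
            path = "through-buffer"
        else:
            # Bypass the system buffer entirely.
            item = self.ram[memory_address]
            path = "bypass"
        self.local_memory[local_address] = item
        return path


acc = ToyAccelerator(ram={0x10: "weights", 0x20: "inputs"})
first = acc.fetch(0x10, 0, cache_hint=True)
second = acc.fetch(0x10, 1, cache_hint=False)
third = acc.fetch(0x20, 2, cache_hint=False)
```

In this sketch the second fetch of address `0x10` is served from the buffer regardless of its hint, while the item at `0x20` never enters the buffer.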

First claim

Opening claim text (preview).

What is claimed is:
1. A device, comprising: a plurality of processing units configured to execute instructions and perform at least matrix computations of an artificial neural network via execution of the instructions; a local memory coupled to the processing units and configured to store at least operands of the instructions during operations of the processing units in execution of the instructions; a memory configured as a buffer; a random access memory; and a logic circuit coupled to the buffer, the local memory, and the random access memory; wherein the instructions include a first instruction to fetch an item from the random access memory to the local memory; the first instruction includes a field related to caching the item in the buffer; and during execution of the first instruction the logic circuit is configured to determine whether to load the item through the buffer based at least in part on the field specified in the first instruction.
2. The device of claim 1, wherein whether to load the item through the buffer is further based on a data type of the item.
3. The device of claim 2, wherein a second determination of whether to load the item through the buffer is in response to a first determination that: the item is not already cached in the buffer; the item is not a set of instructions to be executed by the processing units; and second instructions cached in the local memory do not request the data item.
4. The device of claim 3, wherein when the field has a first value, the item is loaded from the random access memory to the local memory: without going through the buffer, in response to a determination that the item has a first type; or through the buffer, in response to a determination that the item has a second type.
5. The device of claim 4, wherein the item of the first type containing data representative of weights of artificial neurons in the artificial neural network; and the item of the second type containing data representative of inputs to the artificial neurons in the artificial neural network.
6. The device of claim 4, wherein the item of the second type containing data representative of weights of artificial neurons in the artificial neural network; and the item of the first type containing data representative of inputs to the artificial neurons in the artificial neural network.
7. The device of claim 4, wherein when the field has a second value, the item is loaded from the random access memory to the local memory: without going through the buffer, in response to a determination that the item has the first type and the buffer has insufficient capacity to cache the item without evicting data currently cached in the buffer; through the buffer, in response to a determination that the item has the second type; or through the buffer, in response to a determination that the item has the first type and the buffer has sufficient capacity to cache the item without evicting data currently cached in the buffer.
8. The device of claim 7, wherein the item of the second type containing data representative of weights of artificial neurons in the artificial neural network; and the item of the first type containing data representative of inputs to the artificial neurons in the artificial neural network.
9. The device of claim 8, wherein when the field has a third value, the item is loaded from the random access memory to the local memory: without going through the buffer, in response to a determination that the buffer has insufficient capacity to cache the item without evicting data currently cached in the buffer; or through the buffer, in response to a determination that the buffer has sufficient capacity to cache the item without evicting data currently cached in the buffer.
10. The device of claim 9, wherein when the field has a fourth value, the item is loaded from the random access memory to the local memory: without going through the buffer, in response to a determination that the item has the second type, or through the buffer, in response to a determination that the item has the first type.
11. An apparatus, comprising: a Field-Programmable Gate Array (FPGA) or Application Specific Integrated circuit (ASIC), including: a memory interface; at least one processing unit configured to operate on two matrix operands of an instruction executed in the FPGA or ASIC; and a local memory configured to store operands of the instruction during execution of the instruction; a system buffer connected to the memory interface; and a random access memory connected to the memory interface; wherein in response to a first instruction specifying a memory address, a local address, and a hint, the memory interface is configured to determine, based on the hint, whether to fetch an item, available at the memory address in the random access memory, to the local address in the local memory, through the system buffer.
12. The apparatus of claim 11, wherein whether to load the item through the system buffer is further based on a data type of the item.
13. The apparatus of claim 12, wherein whether to load the item through the system buffer is further based on availability of free space in the system buffer to cache the item without evicting data currently cached in the system buffer.
14. The apparatus of claim 12, wherein the data type is one of: representative of weights of artificial neurons in an artificial neural network; and representative of inputs to the artificial neurons in the artificial neural network.
15. A method, comprising: executing, by a plurality of processing units of a device, instructions to perform at least matrix computations of an artificial neural network; storing, in a local memory coupled to the processing units in the device, at least operands of the instructions during operations of the processing units in execution of the instructions; receiving a first instruction having a memory address and a local address to request an item at the memory address in a random access memory of the device to be fetched into the local memory at the local address, the first instruction having a field identifying a hint for caching the item in a system buffer of the device; and determining, during execution of the first instruction, whether to load the item through the system buffer based at least in part on the hint specified in the first instruction and a data type of the item.
16. The method of claim 15, wherein when the hint has a first value, the item is loaded from the random access memory to the local memory: without going through the system buffer, in response to a determination that the item has a first type, or through the system buffer, in response to a determination that the item has a second type.
17. The method of claim 16, wherein when the hint has a second value, the item is loaded from the random access memory to the local memory: without going through the system buffer, in response to a determination that the item has the first type and the system buffer has insufficient capacity to cache the item without evicting data currently cached in the system buffer; through the system buffer, in response to a determination that the item has the second type; or through the system buffer, in response to a determination that the item has the first type and the system buffer has sufficient capacity to cache the item without evicting data currently cached in the system buffer.
18. The method of claim 17, wherein the item of the second type containing data representative of weights of arti
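Claims 4, 7, 9, and 10 (partly mirrored by claims 16 and 17) describe how the field value, the item type, and the buffer's free capacity combine into a single through-or-bypass decision. The sketch below restates that decision table as one function. It is an illustration only: the enum names, their numeric values, and the function `load_through_buffer` are hypothetical, since the claim text shown here does not give any encoding, and it models only the second determination (it assumes the item is not already cached, per claim 3).

```python
from enum import Enum


class Hint(Enum):
    """Hypothetical names for the four field values in claims 4, 7, 9, 10."""
    FIRST = 1
    SECOND = 2
    THIRD = 3
    FOURTH = 4


class ItemType(Enum):
    """The claims call these only 'first type' and 'second type'."""
    FIRST = 1
    SECOND = 2


def load_through_buffer(hint, item_type, buffer_has_room):
    """Return True to load through the buffer, False to bypass it.

    buffer_has_room means the buffer can cache the item without
    evicting data currently cached in it.
    """
    if hint is Hint.FIRST:
        # Claim 4: first type bypasses, second type goes through.
        return item_type is ItemType.SECOND
    if hint is Hint.SECOND:
        # Claim 7: second type always goes through; first type only if room.
        return item_type is ItemType.SECOND or buffer_has_room
    if hint is Hint.THIRD:
        # Claim 9: decided by capacity alone, regardless of type.
        return buffer_has_room
    if hint is Hint.FOURTH:
        # Claim 10: first type goes through, second type bypasses.
        return item_type is ItemType.FIRST
    raise ValueError(f"unknown hint: {hint!r}")


# Enumerate the full decision table for inspection.
table = {
    (h, t, room): load_through_buffer(h, t, room)
    for h in Hint
    for t in ItemType
    for room in (False, True)
}
```

Which of the two types holds weights and which holds inputs differs between dependent claims (compare claims 5 and 6), so the sketch keeps the types abstract rather than naming them.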

Assignees

Micron Technology Inc

Classifications

G06N3/092 · Reinforcement learning
G06N3/0464 · Convolutional networks [CNN, ConvNet]
G06N3/09 · Supervised learning
G06N3/08 · Learning methods
G06F12/0862 · with prefetch

Patent family

Related publications grouped by family.

View patent family 82322020

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12094531B2 cover?: Systems, devices, and methods related to a Deep Learning Accelerator and memory are described. For example, the accelerator can have processing units to perform at least matrix computations of an artificial neural network via execution of instructions. The processing units have a local memory store operands of the instructions. The accelerator can access a random access memory via a system buff…
Who is the assignee on this patent?: Micron Technology Inc