Machine learning accelerator mechanism

US12039435B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12039435-B2
Application number: US-202217845794-A
Country: US
Kind code: B2
Filing date: Jun 21, 2022
Priority date: Dec 30, 2017
Publication date: Jul 16, 2024
Grant date: Jul 16, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic to perform communicatively coupled to the processor to perform compute operations for the neural network.

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: at least one processor to perform operations to implement a neural network and to perform forward propagation compute and backward propagation compute for the neural network; and accelerator circuitry communicatively coupled to the at least one processor, the accelerator circuitry to: receive, from the at least one processor, input matrix data for each layer of the neural network used during performance of the forward propagation compute, the input matrix data received during performance of the forward propagation compute by the at least one processor; store the input matrix data in a memory of the accelerator circuitry; compute a transpose for the input matrix data; receive a weight matrix from the at least one processor; compute, in parallel with the backward propagation compute performed by the at least one processor that is separate from the accelerator circuitry, weight gradients by multiplying the transpose of the input matrix data with the weight matrix; and compute, in parallel with normalization operations of the neural network performed by the at least one processor, mean and variance calculations for the normalization operations.

2. The apparatus of claim 1, wherein the at least one processor comprises a graphics processing unit (GPU) communicably coupled to a central processing unit (CPU), and wherein the accelerator circuitry comprises a Differentiable Neural Computer (DNC).

3. The apparatus of claim 2, wherein the DNC comprising an external memory coupled to the CPU and the GPU to store knowledge data for the neural network.

4. The apparatus of claim 3, wherein the CPU performs transformation compute operations for the neural network.

5. The apparatus of claim 4, wherein the CPU performs a zero copy operation to facilitate a transfer of data between the CPU and the GPU.

6. The apparatus of claim 2, wherein the accelerator circuitry comprises a scheduler to analyze CPU and GPU resources and a compute graph of the neural network, assign nodes of the compute graph to the CPU and the GPU resources and schedule the compute graph for processing at the CPU and the GPU resources.

7. The apparatus of claim 6, wherein analyzing the CPU and GPU resources and the compute graph comprises determining a computation cost of operators to be performed at the CPU and the GPU.

8. The apparatus of claim 7, wherein assigning the nodes of the compute graph to the CPU and the GPU resources comprises determining a shortest path for each of the operators based on the computation cost.

9. The apparatus of claim 7, wherein the computation cost is determined based on individually processing the operators at the CPU and the GPU.

10. The apparatus of claim 7, wherein the computation cost is determined based on simultaneously processing the operators at the CPU and the GPU.

11. A method comprising: receiving, by accelerator circuitry from at least one processor that implements a neural network, input matrix data for each layer of the neural network used during performance of a forward propagation compute for the neural network performed at the at least one processor, the input matrix data received during performance of the forward propagation compute by the at least one processor; storing the input matrix data in a memory of the accelerator circuitry; computing, by the accelerator circuitry, a transpose for the input matrix data; receiving, by the accelerator circuitry, a weight matrix from the at least one processor; computing, by the accelerator circuitry in parallel with backward propagation compute performed by the at least one processor that is separate from the accelerator circuitry, weight gradients by multiplying the transpose of the input matrix data with the weight matrix; and computing, by the accelerator circuitry in parallel with normalization operations of the neural network performed the at least one processor, mean and variance calculations for the normalization operations.

12. The method of claim 11, wherein the at least one processor comprises a graphics processing unit (GPU) communicably coupled to a central processing unit (CPU), and wherein the accelerator circuitry comprises Differentiable Neural Computer (DNC), wherein the accelerator circuitry comprises a scheduler to analyze CPU and GPU resources and a compute graph of the neural network, assign nodes of the compute graph to the CPU and the GPU resources and schedule the compute graph for processing at the CPU and the GPU resources, and wherein analyzing the CPU and the GPU resources and the compute graph comprises determining a computation cost of operators to be performed at the CPU and the GPU.

13. The method of claim 12, wherein the computation cost is determined based on individually processing the operators at the CPU and the GPU.

14. The method of claim 12, wherein the computation cost is determined based on simultaneously processing the operators at the CPU and the GPU.

15. The method of claim 12, wherein assigning the nodes of the compute graph to the CPU and the GPU resources comprises determining a shortest path for each of the operators based on the computation cost.

16. A system comprising: a memory; at least one processor communicably coupled to the memory, the at least one processor to perform operations to implement a neural network and to perform forward propagation compute and backward propagation compute for the neural network; and accelerator circuitry communicatively coupled to the memory and the at least one processor, the accelerator circuitry to: receive, from the at least one processor, input matrix data for each layer of the neural network used during performance of the forward propagation compute, the input matrix data received during the performance of the forward propagation compute by the at least one processor; store the input matrix data in accelerator circuitry memory; compute a transpose for the input matrix data; receive a weight matrix from the at least one processor; compute, in parallel with the backward propagation compute performed by the at least one processor that is separate from the accelerator circuitry, weight gradients by multiplying the transpose of the input matrix data with the weight matrix; and compute, in parallel with normalization operations of the neural network performed by the at least one processor, mean and variance calculations for the normalization operations.

17. The system of claim 16, wherein the at least one processor comprises a graphics processing unit (GPU) communicably coupled to a central processing unit (CPU), and wherein the accelerator circuitry comprises a Differentiable Neural Computer (DNC).

18. The system of claim 17, wherein the DNC comprising an external memory coupled to the CPU and the GPU to store knowledge data for the neural network.

19. The system of claim 18, wherein the CPU performs transformation compute operations for the neural network.

20. The system of claim 19, wherein the CPU performs a zero copy operation to facilitate a transfer of data between the CPU and the GPU.

21. A non-transitory computer-readable medium having stored thereon executable computer program instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: receiving, by accelerator circuitry from at least one processor of the one or more processors that implements a neural network, input matrix data for each layer of the neural netwo
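The arithmetic recited in claim 1 (a transpose of the input matrix, a matrix product using that transpose, and mean and variance calculations for normalization) can be illustrated with a minimal Python sketch. This is an illustration only, not the patented implementation: the function names, the sample data, and the choice of population variance are all assumptions, and the code runs sequentially rather than in parallel with a separate processor. The claim names the second operand as "the weight matrix"; in the common dense-layer formulation Y = XW, the weight gradient is X^T multiplied by the gradient of the loss with respect to Y, so the sketch simply accepts whatever matrix the processor supplies.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Return the transpose of a rectangular matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply a (n x k) by b (k x m) with plain nested loops."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a]


def transposed_product(inputs: Matrix, other: Matrix) -> Matrix:
    """Compute inputs^T @ other (hypothetical helper name)."""
    return matmul(transpose(inputs), other)


def column_mean_and_variance(inputs: Matrix) -> Tuple[List[float], List[float]]:
    """Per-feature mean and population variance over the batch rows.

    Population variance (divide by n) is an assumption of this sketch.
    """
    n = len(inputs)
    cols = list(zip(*inputs))
    means = [sum(col) / n for col in cols]
    variances = [sum((v - mu) ** 2 for v in col) / n
                 for col, mu in zip(cols, means)]
    return means, variances


# Hypothetical input: a batch of 2 samples with 3 features each.
x = [[1.0, 2.0, 3.0],
     [3.0, 4.0, 5.0]]
# Hypothetical second operand with one row per sample (2 x 2).
m = [[1.0, 0.0],
     [0.0, 1.0]]

grad = transposed_product(x, m)              # shape 3 x 2
means, variances = column_mean_and_variance(x)
print(grad, means, variances)
```

In the claimed arrangement these computations run on accelerator circuitry while the processor carries out backward propagation and normalization; the sketch shows only the values being computed, not that overlap.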

Assignees

Intel Corp

Inventors

Classifications

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Neural networks · CPC title

  • Arrangements for program control, e.g. control units (program control for peripheral devices G06F13/10) · CPC title

  • Machine learning · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12039435B2 cover?
An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic to perform communicatively coupled to the processor to perform compute operations for the neural network.
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 16, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).