Programmable coarse grained and sparse matrix compute hardware with advanced scheduling

US11210760B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11210760-B2
Application number	US-202016928353-A
Country	US
Kind code	B2
Filing date	Jul 14, 2020
Priority date	Apr 28, 2017
Publication date	Dec 28, 2021
Grant date	Dec 28, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.

First claim

Opening claim text (preview).

The invention claimed is:
1. A compute apparatus comprising: a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex compute operation including multiple pipeline commands; a memory controller including a near-data compute unit; first circuitry to schedule the multiple pipeline commands to one or more of multiple types of compute units, wherein the multiple types of compute units include a general-purpose graphics compute unit and a near-data compute unit; and second circuitry to determine operations to perform for the single instruction, the second circuitry coupled with the memory controller, wherein the operations include to offload a compute kernel to the near-data compute unit and to offload the compute kernel to the near-data compute unit includes to determine an address range for a near-data compute operation within the compute kernel and offload the compute kernel to the near-data compute unit in response to a determination that the memory controller is associated with the address range of the near-data compute operation.
2. The compute apparatus as in claim 1, additionally including third circuitry including a fetch unit to fetch the single instruction and store the single instruction to a cache memory.
3. The compute apparatus as in claim 2, the third circuitry additionally including the decode unit.
4. The compute apparatus as in claim 3, wherein the complex compute operation is to perform a convolution for a layer of a convolutional neural network, wherein the convolution includes multiple matrix operations.
5. The compute apparatus as in claim 4, wherein the multiple types of compute units include a sparse compute unit, the sparse compute unit is configured to accelerate primitives associated with the multiple matrix operations.
6. The compute apparatus as in claim 5, wherein the multiple matrix operations are performed on one or more sparse matrices.
7. The compute apparatus as in claim 6, wherein the compute kernel offloaded to the near-data compute unit is to perform a gather operation to read elements of the one or more sparse matrices from memory.
8. The compute apparatus as in claim 6, wherein the compute kernel offloaded to the near-data compute unit is to perform a scatter operation to write elements of a sparse output matrix to memory, the sparse output matrix generated by at least one of the multiple matrix operations.
9. The compute apparatus as in claim 6, additionally including a machine learning accelerator to determine a set of operations to perform to execute the decoded instruction, wherein the set of operations includes to offload the compute kernel to the near-data compute unit and determine the multiple pipeline commands to perform for the complex compute operation.
10. The compute apparatus as in claim 9, additionally including a micro-controller to provide the machine learning accelerator.
11. A method of performing machine learning operations, the method comprising: decoding a single instruction into a decoded instruction, the decoded instruction associated with a set of multiple machine learning operations to be performed via a compute pipeline of a general-purpose graphics processing unit; determining a set of pipeline commands to perform the set of multiple machine learning operations, wherein the set of pipeline commands offload a near-data compute operation to a near-data compute unit; and scheduling the set of pipeline commands to the compute pipeline of the general-purpose graphics processing unit.
12. The method as in claim 11, wherein determining the set of pipeline commands to perform the set of multiple machine learning operations includes analyzing parameters associated with the decoded instruction.
13. The method as in claim 12, wherein analyzing parameters associated with the decoded instruction includes selecting the near-data compute unit from a set of multiple near-data compute units.
14. The method as in claim 13, wherein the set of multiple near-data compute units are associated with a set of multiple memory controllers, each memory controller in the set of multiple memory controllers having an associated address range.
15. The method as in claim 14, wherein selecting the near-data compute unit from a set of multiple near-data compute units includes selecting the near-data compute unit within a memory controller of the set of multiple memory controllers for a memory address associated with the near-data compute operation.
16. A data processing system comprising: a general-purpose graphics processing unit including a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the data processing system to execute multiple pipeline commands to perform a complex machine learning compute operation; a memory coupled to the general-purpose graphics processing unit; and a memory controller coupled with the general-purpose graphics processing unit and the memory, the memory controller including a near-data compute unit, wherein the multiple pipeline commands include a command to offload an operation of a compute kernel to the near-data compute unit.
17. The data processing system as in claim 16, the general-purpose graphics processing unit additionally including a sparse compute unit, wherein the multiple pipeline commands include a command to perform a matrix operation via the sparse compute unit.
18. The data processing system as in claim 17, wherein the operation offloaded to the near-data compute unit is a gather operation to read elements of a sparse matrix associated with the matrix operation.
19. The data processing system as in claim 17, wherein the operation offloaded to the near-data compute unit is a scatter operation to write elements of a sparse matrix associated with the matrix operation.
20. The data processing system as in claim 17, wherein the sparse compute unit is configured to accelerate a primitive associated with the matrix operation.
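
Illustrative sketch (not part of the patent text). Claims 1, 7, and 13 to 15 describe choosing a near-data compute unit by checking which memory controller is associated with the address range of the operation, then offloading work such as a gather to it. The Python below is a minimal software model of that idea under stated assumptions: the names `MemoryController`, `select_near_data_unit`, and `dispatch_gather` are hypothetical, addresses are modeled as indices into a flat list, and the fallback to a general-purpose unit when no single controller covers the range is an assumption of this sketch, not something the claims specify.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class MemoryController:
    """Hypothetical model of a memory controller with a near-data compute unit."""
    name: str
    base: int   # first address served (inclusive)
    limit: int  # one past the last address served (exclusive)

    def owns(self, start: int, length: int) -> bool:
        # True only if the whole range [start, start + length) is served here.
        return self.base <= start and start + length <= self.limit


def select_near_data_unit(
    controllers: Sequence[MemoryController], start: int, length: int
) -> Optional[MemoryController]:
    """Return the controller associated with the address range, if any."""
    for mc in controllers:
        if mc.owns(start, length):
            return mc
    return None


def dispatch_gather(
    controllers: Sequence[MemoryController],
    memory: Sequence[float],
    base_addr: int,
    indices: Sequence[int],
) -> Tuple[str, List[float]]:
    """Gather memory[base_addr + i] for each index and report where it ran.

    Assumption: if no single controller covers the full range, the gather
    stays on a general-purpose unit, labeled "gpgpu" here.
    """
    if not indices:
        return "gpgpu", []
    lo = base_addr + min(indices)
    hi = base_addr + max(indices) + 1
    mc = select_near_data_unit(controllers, lo, hi - lo)
    target = mc.name if mc is not None else "gpgpu"
    values = [memory[base_addr + i] for i in indices]
    return target, values


# Two controllers, each serving half of a 16-element memory.
controllers = [MemoryController("mc0", 0, 8), MemoryController("mc1", 8, 16)]
memory = [float(i) for i in range(16)]

near = dispatch_gather(controllers, memory, 8, [0, 3, 5])
spanning = dispatch_gather(controllers, memory, 6, [0, 4])
print(near)      # range 8..13 lies entirely within mc1
print(spanning)  # range 6..10 crosses the mc0/mc1 boundary
```

In this model the first gather is routed to `mc1` because every touched address falls inside that controller's range, while the second spans two controllers and is kept on the general-purpose path. Real hardware would make this decision in circuitry, and could split a spanning request rather than fall back.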

Assignees

Intel Corp

Inventors

Classifications

G06N3/044	Recurrent networks, e.g. Hopfield networks
G06N3/045	Combinations of networks
G06N3/0495	Quantised networks; Sparse networks; Compressed networks
G06N3/0464	Convolutional networks [CNN, ConvNet]
G06N3/098	Distributed learning, e.g. federated learning

Patent family

Related publications grouped by family.

View patent family 61691810

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11210760B2 cover?: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.
Who is the assignee on this patent?: Intel Corp
What technology area does this patent fall under?: Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?: Publication date Dec 28, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).