Machine learning sparse computation mechanism for arbitrary neural networks, arithmetic compute microarchitecture, and sparsity for training mechanism

US12014265B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12014265-B2
Application number	US-202318302889-A
Country	US
Kind code	B2
Filing date	Apr 19, 2023
Priority date	Dec 29, 2017
Publication date	Jun 18, 2024
Grant date	Jun 18, 2024
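
The publication and application numbers in the table above pack a country code, a serial number, and a kind code into one string. A minimal sketch of splitting them apart, assuming the simple "two letters, digits, kind code" shape seen on this page (identifiers from other patent offices may not fit it). The names `PublicationNumber` and `parse_publication_number` are hypothetical helpers, not part of any patent-office API.

```python
import re
from typing import NamedTuple


class PublicationNumber(NamedTuple):
    country: str  # e.g. "US"
    number: str   # serial digits, kept as a string to preserve leading zeros
    kind: str     # kind code, e.g. "B2" or "A"


# Assumed shape: 2-letter country, digits, a letter with an optional digit.
# Hyphens between the parts are optional, so both "US-12014265-B2" and
# "US12014265B2" are accepted.
_PATTERN = re.compile(r"^([A-Z]{2})-?(\d+)-?([A-Z]\d?)$")


def parse_publication_number(text: str) -> PublicationNumber:
    """Split an identifier such as 'US-12014265-B2' into its three parts."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized publication number: {text!r}")
    return PublicationNumber(*match.groups())


if __name__ == "__main__":
    print(parse_publication_number("US-12014265-B2"))
    print(parse_publication_number("US-202318302889-A"))
```

Keeping the serial number as a string avoids silently altering identifiers if they are later re-joined for lookup.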

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus to facilitate processing of a sparse matrix for arbitrary graph data is disclosed. The apparatus includes a graphics processing unit having a data management unit (DMU) that includes a scheduler for scheduling matrix operations, an active logic for tracking active input operands, and a skip logic for tracking unimportant input operands to be skipped by the scheduler. Processing circuitry is coupled to the DMU. The processing circuitry comprises a plurality of processing elements including logic to read operands and a multiplication unit to multiply two or more operands for the arbitrary graph data and customizable circuitry to provide custom functions.
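
To make the abstract's idea of skipping unimportant operands concrete, here is a minimal software sketch of a sparse matrix-vector product that only visits "active" inputs. It is an illustration under stated assumptions, not the patented hardware: the function name `sparse_matvec`, the dict-per-row matrix layout, and the magnitude `threshold` used to decide what counts as unimportant are all hypothetical choices made for this example.

```python
def sparse_matvec(rows, x, threshold=0.0):
    """Multiply a sparse matrix by a vector, skipping unimportant inputs.

    rows: list of dicts, one per output row, mapping column index -> weight
          (only non-zero weights are stored).
    x:    dense input vector as a list of floats.
    threshold: inputs with |value| <= threshold are treated as unimportant
               (an assumption for this sketch).
    Returns (output vector, number of skipped input operands).
    """
    # "Active" tracking: remember which input operands are worth processing.
    active = [j for j, value in enumerate(x) if abs(value) > threshold]
    # "Skip" tracking: everything else is never scheduled for a multiply.
    skipped = len(x) - len(active)

    y = [0.0] * len(rows)
    for i, row in enumerate(rows):
        acc = 0.0
        for j in active:
            weight = row.get(j)
            if weight is not None:      # a zero weight is simply absent
                acc += weight * x[j]    # multiply, then accumulate
        y[i] = acc
    return y, skipped


if __name__ == "__main__":
    matrix = [{0: 2.0, 2: 1.0}, {1: 3.0}]
    vector = [1.0, 0.0, 4.0]
    print(sparse_matvec(matrix, vector))
```

In this sketch the second input is zero, so no row ever multiplies against it; the saving grows with the fraction of inputs that fall under the threshold.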

First claim

Opening claim text (preview).

What is claimed is:
1. A hardware accelerator, comprising: a data management unit (DMU) including a scheduler to schedule matrix operations and a buffer to store active input operands; and a plurality of processing elements coupled to the DMU, each processing element includes an input buffer for edge data and message data, and customizable circuitry to support a vertex program for an arbitrary neural network, wherein the customizable circuitry is dynamically synthesized based on input including the vertex program for the arbitrary neural network.
2. The hardware accelerator of claim 1, wherein the vertex program to specify types of data associated with edges and vertices in a graph that defines the arbitrary neural network and messages to be sent across vertices in the graph.
3. The hardware accelerator of claim 2, the plurality of processing elements configured to execute the vertex program via the customizable circuitry.
4. The hardware accelerator of claim 3, the customizable circuitry configured to support customized functions to be used to execute the vertex program.
5. The hardware accelerator of claim 4, the customized functions including a multiply, accumulate, activate, and send message function.
6. The hardware accelerator of claim 1, the hardware accelerator including a plurality of tiles, each tile including an instance of the DMU and the plurality of processing elements.
7. The hardware accelerator of claim 6, each tile of the plurality of tiles including memory coupled with the plurality of processing elements of the tile.
8. The hardware accelerator of claim 7, including circuitry configured to load vertex data to be processed by the vertex program into the memory of a tile of the plurality of tiles.
9. The hardware accelerator of claim 8, wherein a processing element of a tile of the plurality of tiles is configured to: stream edge data from the memory into the input buffer for the edge data; and perform a function provided by the customizable circuitry on the edge data.
10. The hardware accelerator of claim 9, the DMU configured to write output from the processing element to a memory external to the tile.
11. A graphics processor comprising: a host interface; and a data management unit (DMU) coupled with the host interface, the DMU including a scheduler to schedule matrix operations and a buffer to store active input operands; and a plurality of processing elements coupled to the DMU, each processing element includes an input buffer for edge data and message data, and customizable circuitry to: support a vertex program for an arbitrary neural network, and wherein the customizable circuitry is dynamically synthesized based on input including the vertex program for the arbitrary neural network.
12. The graphics processor of claim 11, wherein the vertex program to specify types of data associated with edges and vertices in a graph that defines the arbitrary neural network and messages to be sent across vertices in the graph.
13. The graphics processor of claim 12, the plurality of processing elements configured to execute the vertex program via the customizable circuitry.
14. The graphics processor of claim 13, the customizable circuitry configured to support customized functions to be used to execute the vertex program.
15. The graphics processor of claim 14, the customized functions including a multiply, accumulate, activate, and send message function.
16. The graphics processor of claim 11, including a plurality of tiles, each tile including an instance of the DMU and the plurality of processing elements.
17. The graphics processor of claim 16, each tile of the plurality of tiles including memory coupled with the plurality of processing elements of the tile.
18. The graphics processor of claim 17, including circuitry configured to load vertex data to be processed by the vertex program into the memory of a tile of the plurality of tiles.
19. The graphics processor of claim 18, wherein a processing element of a tile of the plurality of tiles is configured to: stream edge data from the memory into the input buffer for the edge data; and perform a function provided by the customizable circuitry on the edge data.
20. The graphics processor of claim 19, the DMU configured to write output from the processing element to a memory external to the tile.
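
The claims describe a vertex program whose customized functions include multiply, accumulate, activate, and send message. The sketch below shows one software reading of that four-step pattern on a small weighted graph. It is an assumption-laden illustration, not the claimed circuitry: the edge tuple layout, the function name `run_vertex_program`, and the choice of ReLU as the activation are hypothetical.

```python
from collections import defaultdict


def relu(value):
    """Example activation; the claims do not name a specific one."""
    return value if value > 0.0 else 0.0


def run_vertex_program(edges, messages, activate=relu):
    """Run one step of a multiply / accumulate / activate / send pattern.

    edges:    list of (source, destination, weight) tuples (edge data).
    messages: dict mapping source vertex -> incoming value (message data).
    Returns a dict mapping destination vertex -> outgoing message.
    """
    accumulators = defaultdict(float)
    for source, destination, weight in edges:
        if source not in messages:
            continue  # no message arrived on this edge, nothing to process
        product = weight * messages[source]    # multiply
        accumulators[destination] += product   # accumulate
    # Activate each accumulated sum; the result is the message that the
    # destination vertex would send onward in the next step.
    return {vertex: activate(total) for vertex, total in accumulators.items()}


if __name__ == "__main__":
    graph_edges = [("a", "c", 0.5), ("b", "c", -2.0), ("a", "d", 1.0)]
    incoming = {"a": 2.0, "b": 1.0}
    print(run_vertex_program(graph_edges, incoming))
```

Passing the activation as a parameter loosely mirrors the idea of customizable functions: the traversal stays fixed while the per-vertex function is swapped.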

Assignees

Intel Corp

Inventors

Classifications

G06N3/04
Architecture, e.g. interconnection topology · CPC title
G06N3/063 · Primary
using electronic means · CPC title
G06N3/08 · Primary
Learning methods · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/098
Distributed learning, e.g. federated learning · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 64559621

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12014265B2 cover? An apparatus to facilitate processing of a sparse matrix for arbitrary graph data is disclosed. The apparatus includes a graphics processing unit having a data management unit (DMU) that includes a scheduler for scheduling matrix operations, an active logic for tracking active input operands, and a skip logic for tracking unimportant input operands to be skipped by the scheduler. Processing cir…
Who is the assignee on this patent? Intel Corp
What technology area does this patent fall under? Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published? Published Jun 18, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).