Programmable coarse grained and sparse matrix compute hardware with advanced scheduling

US10186011B2 · US · B2

Patent metadata
Field · Value
Publication number · US-10186011-B2
Application number · US-201715581182-A
Country · US
Kind code · B2
Filing date · Apr 28, 2017
Priority date · Apr 28, 2017
Publication date · Jan 22, 2019
Grant date · Jan 22, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.

First claim

Opening claim text (preview).

The invention claimed is:

1. A compute apparatus to perform machine learning operations, the compute apparatus comprising: a processor comprising: a fetch unit to fetch a single instruction; a decode unit to decode the single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation; and a parameter analyzer to determine a type of machine learning operations to perform for the single instruction.

2. The compute apparatus as in claim 1, the fetch unit to store the single instruction to a cache memory.

3. The compute apparatus as in claim 2, the parameter analyzer to determine the type of machine learning operations to perform for the single instruction via analysis of parameters associated with the decoded instruction.

4. The compute apparatus as in claim 3, additionally including machine learning accelerator to determine a set of operations to perform to execute the decoded instruction.

5. The compute apparatus as in claim 4, additionally including a micro-controller to execute firmware instructions, the firmware instructions to enable the parameter analyzer and the machine learning accelerator.

6. The compute apparatus as in claim 1, wherein the complex machine learning compute operation is to perform a convolution for a convolutional neural network.

7. The compute apparatus as in claim 6, wherein the convolution includes multiple matrix operations.

8. The compute apparatus as in claim 7, additionally including a scheduler controller to schedule the multiple matrix operations to one or more of multiple types of compute units.

9. The compute apparatus as in claim 8, wherein the multiple types of compute units include a general-purpose graphics compute unit and a sparse compute unit.

10. The compute apparatus as in claim 8, wherein the multiple types of compute units include a general-purpose graphics compute unit and a near-data compute unit.

11. A method of performing machine learning operations, the method comprising: fetching and decoding a single instruction into a decoded instruction, the decoded instruction associated with a set of multiple machine learning operations to be performed via a compute pipeline of a general-purpose graphics processing unit; determining a set of pipeline commands to perform the set of multiple machine learning operations; and scheduling the set of pipeline commands to the compute pipeline of the general-purpose graphics processing unit.

12. The method as in claim 11, wherein determining a set of pipeline commands to perform the set of multiple machine learning operations includes analyzing parameters associated with the decoded instruction.

13. The method as in claim 11, additionally comprising retiring the decoded instruction in response to completion of the set of pipeline commands.

14. The method as in claim 11, wherein the single instruction is to cause the general-purpose graphics processing unit to perform a convolution for a layer of a convolutional neural network.

15. The method as in claim 11, wherein scheduling the set of pipeline commands to the compute pipeline of the general-purpose graphics processing unit includes scheduling the set of pipeline commands to multiple compute pipelines, the multiple compute pipelines including a general-purpose compute pipeline and at least one compute pipeline selected from a sparse compute pipeline or a near-data compute pipeline.

16. A data processing system comprising: a general-purpose graphics processing unit including a fetch unit to fetch a single instruction and a decode unit to decode the single instruction into a decoded instruction, the decoded instruction to cause the data processing system to execute multiple pipeline commands to perform a complex machine learning compute operation; and a memory coupled to the general-purpose graphics processing unit.

17. The data processing system as in claim 16, the general-purpose graphics processing unit including a parameter analyzer to determine a type of machine learning operations to perform for the single instruction and machine learning accelerator to determine the multiple pipeline commands to execute to perform the complex machine learning compute operation.

18. The data processing system as in claim 17, the general-purpose graphics processing unit including a micro-controller to execute firmware instructions, the firmware instructions to enable the parameter analyzer and the machine learning accelerator.

19. The data processing system as in claim 16, additionally including a scheduler controller to schedule multiple matrix operations to one or more of multiple types of compute units.

20. The data processing system as in claim 19, wherein the multiple types of compute units include a general-purpose graphics compute unit and one of a sparse compute unit or a near data compute unit.
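
To make the flow in the method claims easier to follow, here is a minimal software sketch of the sequence they describe: decode a single instruction, analyze its parameters to determine the operation type, determine a set of pipeline commands, schedule those commands to compute units, and retire the instruction. It is an illustration only, not the patented hardware. Every name in it (`decode`, `analyze_parameters`, `expand`, `schedule`, the `CONV` opcode, the `tiles` and `sparsity` parameters) is hypothetical, and the rule of routing commands by a sparsity threshold is an assumption; the claims do not state how the scheduler chooses between unit types.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class UnitType(Enum):
    GENERAL_PURPOSE = "general-purpose"
    SPARSE = "sparse"


@dataclass
class DecodedInstruction:
    opcode: str
    params: Dict[str, float]


@dataclass
class PipelineCommand:
    name: str
    sparsity: float                  # hypothetical hint: fraction of zero operands
    unit: Optional[UnitType] = None  # filled in by the scheduler


def decode(raw: str) -> DecodedInstruction:
    """Split 'OPCODE key=value ...' into an opcode and numeric parameters."""
    opcode, *fields = raw.split()
    params = {}
    for item in fields:
        key, value = item.split("=")
        params[key] = float(value)
    return DecodedInstruction(opcode, params)


def analyze_parameters(instr: DecodedInstruction) -> str:
    """Stand-in for the parameter analyzer: determine the operation type."""
    if instr.opcode == "CONV":
        return "convolution"
    raise ValueError(f"unsupported opcode: {instr.opcode}")


def expand(instr: DecodedInstruction, op_type: str) -> List[PipelineCommand]:
    """Turn one decoded instruction into several matrix-operation commands."""
    if op_type != "convolution":
        raise ValueError(f"no expansion defined for: {op_type}")
    tiles = int(instr.params.get("tiles", 1))
    sparsity = instr.params.get("sparsity", 0.0)
    return [PipelineCommand(f"matmul_tile_{i}", sparsity) for i in range(tiles)]


def schedule(commands: List[PipelineCommand],
             sparse_threshold: float = 0.5) -> List[PipelineCommand]:
    """Assumed policy: sparse-enough commands go to the sparse unit."""
    for cmd in commands:
        if cmd.sparsity >= sparse_threshold:
            cmd.unit = UnitType.SPARSE
        else:
            cmd.unit = UnitType.GENERAL_PURPOSE
    return commands


def execute_single_instruction(raw: str) -> Tuple[List[PipelineCommand], bool]:
    instr = decode(raw)
    op_type = analyze_parameters(instr)
    commands = schedule(expand(instr, op_type))
    # Nothing is actually executed here; the sketch treats "assigned to a
    # unit" as completion and marks the instruction retired on that basis.
    retired = all(cmd.unit is not None for cmd in commands)
    return commands, retired


if __name__ == "__main__":
    cmds, retired = execute_single_instruction("CONV tiles=3 sparsity=0.8")
    for cmd in cmds:
        print(cmd.name, cmd.unit.value)
    print("retired:", retired)
```

The point of the sketch is the one-to-many expansion: a single `CONV` instruction becomes several matrix-operation commands, each of which can be routed to a different type of compute unit. A near-data unit could be added as a third `UnitType` with its own routing rule.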

Assignees

Intel Corp

Inventors

Classifications

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Combinations of networks · CPC title

  • Learning methods · CPC title

  • Architecture, e.g. interconnection topology · CPC title

  • G06T1/20 · Primary

    Processor architectures; Processor configuration, e.g. pipelining · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10186011B2 cover?
One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 22, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).