Memory Reduction Method For Fixed Point Matrix Multiply
US-2017270073-A1 · Sep 21, 2017 · US
US10186011B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10186011-B2 |
| Application number | US-201715581182-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 28, 2017 |
| Priority date | Apr 28, 2017 |
| Publication date | Jan 22, 2019 |
| Grant date | Jan 22, 2019 |
Official abstract text for this publication.
One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.
Opening claim text (preview).
The invention claimed is:
1. A compute apparatus to perform machine learning operations, the compute apparatus comprising: a processor comprising: a fetch unit to fetch a single instruction; a decode unit to decode the single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation; and a parameter analyzer to determine a type of machine learning operations to perform for the single instruction.
2. The compute apparatus as in claim 1, the fetch unit to store the single instruction to a cache memory.
3. The compute apparatus as in claim 2, the parameter analyzer to determine the type of machine learning operations to perform for the single instruction via analysis of parameters associated with the decoded instruction.
4. The compute apparatus as in claim 3, additionally including machine learning accelerator to determine a set of operations to perform to execute the decoded instruction.
5. The compute apparatus as in claim 4, additionally including a micro-controller to execute firmware instructions, the firmware instructions to enable the parameter analyzer and the machine learning accelerator.
6. The compute apparatus as in claim 1, wherein the complex machine learning compute operation is to perform a convolution for a convolutional neural network.
7. The compute apparatus as in claim 6, wherein the convolution includes multiple matrix operations.
8. The compute apparatus as in claim 7, additionally including a scheduler controller to schedule the multiple matrix operations to one or more of multiple types of compute units.
9. The compute apparatus as in claim 8, wherein the multiple types of compute units include a general-purpose graphics compute unit and a sparse compute unit.
10. The compute apparatus as in claim 8, wherein the multiple types of compute units include a general-purpose graphics compute unit and a near-data compute unit.
11. A method of performing machine learning operations, the method comprising: fetching and decoding a single instruction into a decoded instruction, the decoded instruction associated with a set of multiple machine learning operations to be performed via a compute pipeline of a general-purpose graphics processing unit; determining a set of pipeline commands to perform the set of multiple machine learning operations; and scheduling the set of pipeline commands to the compute pipeline of the general-purpose graphics processing unit.
12. The method as in claim 11, wherein determining a set of pipeline commands to perform the set of multiple machine learning operations includes analyzing parameters associated with the decoded instruction.
13. The method as in claim 11, additionally comprising retiring the decoded instruction in response to completion of the set of pipeline commands.
14. The method as in claim 11, wherein the single instruction is to cause the general-purpose graphics processing unit to perform a convolution for a layer of a convolutional neural network.
15. The method as in claim 11, wherein scheduling the set of pipeline commands to the compute pipeline of the general-purpose graphics processing unit includes scheduling the set of pipeline commands to multiple compute pipelines, the multiple compute pipelines including a general-purpose compute pipeline and at least one compute pipeline selected from a sparse compute pipeline or a near-data compute pipeline.
16. A data processing system comprising: a general-purpose graphics processing unit including a fetch unit to fetch a single instruction and a decode unit to decode the single instruction into a decoded instruction, the decoded instruction to cause the data processing system to execute multiple pipeline commands to perform a complex machine learning compute operation; and a memory coupled to the general-purpose graphics processing unit.
17. The data processing system as in claim 16, the general-purpose graphics processing unit including a parameter analyzer to determine a type of machine learning operations to perform for the single instruction and machine learning accelerator to determine the multiple pipeline commands to execute to perform the complex machine learning compute operation.
18. The data processing system as in claim 17, the general-purpose graphics processing unit including a micro-controller to execute firmware instructions, the firmware instructions to enable the parameter analyzer and the machine learning accelerator.
19. The data processing system as in claim 16, additionally including a scheduler controller to schedule multiple matrix operations to one or more of multiple types of compute units.
20. The data processing system as in claim 19, wherein the multiple types of compute units include a general-purpose graphics compute unit and one of a sparse compute unit or a near data compute unit.
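The flow recited in claims 11 to 13 (decode a single instruction, analyze its parameters, determine a set of pipeline commands, schedule them, then retire the instruction) can be illustrated with a small software model. This is only a sketch for readers, not the claimed hardware: the textual instruction format, the `CONV` mnemonic, the `tiles` and `sparsity` parameters, the 0.5 sparsity threshold, and every function and class name below are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass
from enum import Enum, auto


class UnitType(Enum):
    """Compute unit types named in the claims."""
    GENERAL_PURPOSE = auto()
    SPARSE = auto()
    NEAR_DATA = auto()


@dataclass(frozen=True)
class DecodedInstruction:
    opcode: str   # hypothetical mnemonic, e.g. "CONV"
    params: dict  # hypothetical parameters carried by the instruction


@dataclass(frozen=True)
class PipelineCommand:
    name: str
    unit: UnitType


def decode(raw: str) -> DecodedInstruction:
    """Decode a toy textual instruction such as 'CONV tiles=3 sparsity=0.8'."""
    opcode, *fields = raw.split()
    params = {}
    for field in fields:
        key, value = field.split("=")
        params[key] = float(value)
    return DecodedInstruction(opcode, params)


def analyze_parameters(inst: DecodedInstruction) -> str:
    """Stand-in for the parameter analyzer: pick the operation type."""
    if inst.opcode == "CONV":
        return "convolution"
    raise ValueError(f"unsupported opcode: {inst.opcode}")


def determine_commands(inst: DecodedInstruction) -> list:
    """Expand one instruction into several matrix-operation commands."""
    op_type = analyze_parameters(inst)
    tiles = int(inst.params.get("tiles", 1))
    # Assumption: inputs above an arbitrary sparsity threshold go to the
    # sparse unit; the patent claims do not specify a selection rule.
    is_sparse = inst.params.get("sparsity", 0.0) > 0.5
    unit = UnitType.SPARSE if is_sparse else UnitType.GENERAL_PURPOSE
    return [PipelineCommand(f"{op_type}_matmul_{i}", unit) for i in range(tiles)]


def schedule(commands: list) -> dict:
    """Place each command on the queue of its target compute unit type."""
    queues = {unit: [] for unit in UnitType}
    for cmd in commands:
        queues[cmd.unit].append(cmd.name)
    return queues


def execute(raw: str):
    """Run the whole flow and report whether the instruction can retire."""
    inst = decode(raw)
    commands = determine_commands(inst)
    queues = schedule(commands)
    # In this model every queued command counts as completed immediately,
    # so the instruction retires once all commands have been scheduled.
    completed = sum(len(queue) for queue in queues.values())
    retired = completed == len(commands)
    return queues, retired
```

Calling `execute("CONV tiles=3 sparsity=0.8")` in this model expands the single instruction into three matrix-operation commands, places all of them on the sparse queue, and marks the instruction as retired.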