Sparse convolutional neural network accelerator
US-10891538-B2 · Jan 12, 2021 · US
US11550600B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11550600-B2 |
| Application number | US-202017090295-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 5, 2020 |
| Priority date | Nov 7, 2019 |
| Publication date | Jan 10, 2023 |
| Grant date | Jan 10, 2023 |
Official abstract text for this publication.
Embodiments are generally directed to a system and method for adapting executable object to a processing unit. An embodiment of a method to adapt an executable object from a first processing unit to a second processing unit, comprises: adapting the executable object optimized for the first processing unit of a first architecture, to the second processing unit of a second architecture, wherein the second architecture is different from the first architecture, wherein the executable object is adapted to perform on the second processing unit based on a plurality of performance metrics collected while the executable object is performed on the first processing unit and the second processing unit.
Opening claim text (preview).
What is claimed is:
1. A method comprising: adapting an executable object optimized for a first processing unit of a first architecture, to a second processing unit of a second architecture, wherein the second architecture is different from the first architecture, wherein the executable object is adapted to perform on the second processing unit based on a plurality of performance metrics collected while the executable object is performed on the first processing unit and the second processing unit; identifying a performance aspect of the executable object; and determining whether an identified performance aspect is present in a database that defines a correspondence between the performance aspect and an adaptation operation.
2. The method of claim 1, further comprising: wherein identifying the performance aspect of the executable object is based on a first plurality of performance metrics of the executable object while the executable object is performed on the first processing unit and a second plurality of performance metrics of the executable object while the executable object is performed on the second processing unit, wherein the plurality of performance metrics include the first and second plurality of performance metrics; and applying an adaptation operation in the database that corresponds to the identified performance aspect to the executable object in response to a determination that the identified performance aspect is present in the database.
3. The method of claim 2, wherein the database further includes architectural changes corresponding to the performance aspect, and said applying the adaptation operation to the executable object comprises: determining architectural changes of the second architecture with respect to the first architecture based on the identified performance aspect; and applying the adaptation operation to the executable object that corresponds to the determined architectural changes, wherein the identified performance aspect includes an instruction cache utilization, a constant cache utilization, a data cache utilization, and a data processing efficiency.
4. The method of claim 3, wherein the instruction cache utilization includes an instruction cache latency, and the adaptation operation corresponding to the instruction cache latency includes disabling loop unrolling, wherein the constant cache utilization includes a constant cache latency coverage, and the adaptation operation corresponding to the constant cache latency coverage includes constant folding, wherein the data cache utilization includes a data cache miss ratio, and the adaptation operation corresponding to the data cache miss ratio includes decreasing a working set or changing a data access pattern, and wherein the data processing efficiency includes a calculation throughput, and the adaptation operation corresponding to the calculation throughput includes reducing an instruction count, wherein the performance aspect is identified using a machine learning based algorithm or a decision tree flow.
5. The method of claim 3, further comprising presenting the determined architectural changes in response to a determination that the identified performance aspect is not present in the database, wherein the first processing unit is a graphics processing unit supporting SIMD architecture and the second processing unit is a graphics processing unit supporting SIMT architecture.
6. The method of claim 2, further comprising presenting the identified performance aspect in response to a determination that the identified performance aspect is not present in the database.
7. An apparatus comprising: a processor to: adapt an executable object optimized for a first processing unit of a first architecture, to a second processing unit of a second architecture, wherein the second architecture is different from the first architecture, wherein the executable object is adapted to perform on the second processing unit based on a plurality of performance metrics collected while the executable object is performed on the first processing unit and the second processing unit; identify a performance aspect of the executable object; and determine whether an identified performance aspect is present in a database that defines a correspondence between the performance aspect and an adaptation operation.
8. The apparatus of claim 7, wherein to identify the performance aspect of the executable object is based on a first plurality of performance metrics of the executable object while the executable object is performed on the first processing unit and a second plurality of performance metrics of the executable object while the executable object is performed on the second processing unit, wherein the plurality of performance metrics include the first and second plurality of performance metrics, wherein the processor is further to; apply an adaptation operation in the database that corresponds to the identified performance aspect to the executable object in response to a determination that the identified performance aspect is present in the database.
9. The apparatus of claim 8, wherein the database further includes architectural changes corresponding to the performance aspect, and when applying the adaptation operation to the executable object, the processor is further to: determine architectural changes of the second architecture with respect to the first architecture based on the identified performance aspect; and apply the adaptation operation to the executable object that corresponds to the determined architectural changes, wherein the identified performance aspect includes an instruction cache utilization, a constant cache utilization, a data cache utilization, and a data processing efficiency.
10. The apparatus of claim 9, wherein the instruction cache utilization includes an instruction cache latency, and the adaptation operation corresponding to the instruction cache latency includes disabling loop unrolling, wherein the constant cache utilization includes a constant cache latency coverage, and the adaptation operation corresponding to the constant cache latency coverage includes constant folding, wherein the data cache utilization includes a data cache miss ratio, and the adaptation operation corresponding to the data cache miss ratio includes decreasing a working set or changing a data access pattern, and wherein the data processing efficiency includes a calculation throughput, and the adaptation operation corresponding to the calculation throughput includes reducing an instruction count, wherein the performance aspect is identified using a machine learning based algorithm or a decision tree flow, wherein the performance aspect is identified using a machine learning based algorithm or a decision tree flow.
11. The apparatus of claim 10, wherein the processor is further to present the identified performance aspect in response to a determination that the identified performance aspect is not present in the database.
12. The apparatus of claim 10, wherein the processor is further to present the determined architectural changes in response to a determination that the identified performance aspect is not present in the database, wherein the first processing unit is a graphics processing unit supporting SIMD architecture and the second processing unit is a graphics processing unit supporting SIMT architecture.
13. At least one non-transitory computer-readable medium comprising a plurality of instructions which, when executed, cause a computing device to perform operations comprising: adapting an executable object optimized for a first processing unit of a first architecture, to a
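The flow recited in the method claims (identify a performance aspect from metrics gathered on both processing units, look it up in a database, then either apply the matching adaptation operation or present the aspect) can be illustrated with a minimal Python sketch. This is not the patented implementation. The aspect-to-operation pairs are taken from the wording of claim 4, but every identifier here (`ADAPTATION_DB`, `HIGHER_IS_BETTER`, `identify_aspect`, `adapt`, the metric keys, and `register_spills`) is hypothetical. The "largest relative regression" rule is an assumed toy stand-in for the machine learning based algorithm or decision tree flow named in the claims.

```python
# Hypothetical database: performance aspect -> adaptation operation.
# The pairs mirror the wording of claim 4; the key names are illustrative.
ADAPTATION_DB = {
    "instruction_cache_latency": "disable loop unrolling",
    "constant_cache_latency_coverage": "constant folding",
    "data_cache_miss_ratio": "decrease working set or change data access pattern",
    "calculation_throughput": "reduce instruction count",
}

# Assumption: for these metrics a larger value is better, so a drop is a regression.
HIGHER_IS_BETTER = {"calculation_throughput", "constant_cache_latency_coverage"}


def identify_aspect(first_metrics, second_metrics):
    """Return the metric that regressed most (relatively) on the second unit.

    Toy heuristic only; the claims name an ML algorithm or decision tree flow.
    Returns None when no shared metric got worse.
    """
    worst_name, worst_regression = None, 0.0
    for name in sorted(first_metrics.keys() & second_metrics.keys()):
        baseline = first_metrics[name]
        if baseline == 0:
            continue  # relative change is undefined for a zero baseline
        change = (second_metrics[name] - baseline) / abs(baseline)
        regression = -change if name in HIGHER_IS_BETTER else change
        if regression > worst_regression:
            worst_name, worst_regression = name, regression
    return worst_name


def adapt(first_metrics, second_metrics, db=ADAPTATION_DB):
    """Return ("apply", operation), ("present", aspect), or ("none", None)."""
    aspect = identify_aspect(first_metrics, second_metrics)
    if aspect is None:
        return ("none", None)
    if aspect in db:
        return ("apply", db[aspect])  # aspect found: apply the mapped operation
    return ("present", aspect)  # aspect not in the database: surface it instead


if __name__ == "__main__":
    first = {"data_cache_miss_ratio": 0.05, "calculation_throughput": 100.0}
    second = {"data_cache_miss_ratio": 0.20, "calculation_throughput": 90.0}
    print(adapt(first, second))
```

Returning a tagged tuple keeps the two claimed outcomes distinct: an aspect found in the database yields an operation to apply, while an unknown aspect is only presented, as in claim 6.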
Training; Learning · CPC title
Backpropagation, e.g. using gradient descent · CPC title
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
Activation functions · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title