System and method for adapting executable object to a processing unit
US-11550600-B2 · Jan 10, 2023 · US
US12530204B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12530204-B2 |
| Application number | US-202418582311-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 20, 2024 |
| Priority date | Nov 7, 2019 |
| Publication date | Jan 20, 2026 |
| Grant date | Jan 20, 2026 |
Official abstract text for this publication.
Embodiments are generally directed to a system and method for adapting executable object to a processing unit. An embodiment of a method to adapt an executable object from a first processing unit to a second processing unit, comprises: adapting the executable object optimized for the first processing unit of a first architecture, to the second processing unit of a second architecture, wherein the second architecture is different from the first architecture, wherein the executable object is adapted to perform on the second processing unit based on a plurality of performance metrics collected while the executable object is performed on the first processing unit and the second processing unit.
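The abstract's idea of comparing performance metrics collected on both processing units can be illustrated with a minimal sketch. It is not the patented implementation: the metric names, the `Metrics` class, and the "largest relative regression" rule are assumptions made only for illustration.

```python
from dataclasses import dataclass, fields
from typing import Dict


@dataclass(frozen=True)
class Metrics:
    """Hypothetical metric set; field names are illustrative, not from the patent."""
    icache_latency_cycles: float
    data_cache_miss_ratio: float
    instructions_executed: float


def relative_change(first: Metrics, second: Metrics) -> Dict[str, float]:
    """Return (second - first) / first for every metric (higher means worse here)."""
    changes: Dict[str, float] = {}
    for f in fields(first):
        base = getattr(first, f.name)
        new = getattr(second, f.name)
        if base == 0:
            # Avoid division by zero: any growth from zero counts as unbounded.
            changes[f.name] = float("inf") if new > 0 else 0.0
        else:
            changes[f.name] = (new - base) / base
    return changes


def worst_regression(changes: Dict[str, float]) -> str:
    """Pick the metric that degraded most on the second processing unit."""
    return max(changes, key=changes.get)


# Made-up numbers for the same executable object on two units.
on_first_unit = Metrics(10.0, 0.05, 1000.0)
on_second_unit = Metrics(30.0, 0.06, 1100.0)
changes = relative_change(on_first_unit, on_second_unit)
candidate_aspect = worst_regression(changes)
```

In this sketch the metric with the largest relative increase becomes the candidate performance aspect. The claims only state that a machine learning based algorithm or a decision tree flow may be used, so the selection rule here is a stand-in.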
Opening claim text (preview).
What is claimed is:
1. A method comprising: identifying, by processing circuitry of a computing device, a performance aspect in a database that defines a correspondence between the performance aspect and an adaptation operation; and applying the adaptation operation in the database.
2. The method of claim 1, wherein the adaption operation corresponds to the identified performance aspect present in the database, the method further comprising: adapting the executable object optimized for a first processing unit of a first architecture, to a second processing unit of a second architecture, wherein the second architecture is different from the first architecture, wherein the executable object is adapted to perform on the second processing unit based on a plurality of performance metrics collected while the executable object is performed on the first processing unit and the second processing unit; identifying the performance aspect of the executable object based on a first plurality of performance metrics of the executable object while the executable object is performed on the first processing unit and a second plurality of performance metrics of the executable object while the executable object is performed on the second processing unit, wherein the plurality of performance metrics include the first and second plurality of performance metrics, wherein the database further includes architectural changes corresponding to the performance aspect, and wherein applying the adaptation operation to the executable object comprises: determining architectural changes of the second architecture with respect to the first architecture based on the identified performance aspect; and applying the adaptation operation to the executable object that corresponds to the determined architectural changes, wherein the identified performance aspect includes an instruction cache utilization, a constant cache utilization, a data cache utilization, and a data processing efficiency.
3. The method of claim 2, wherein the instruction cache utilization includes an instruction cache latency, and the adaptation operation corresponding to the instruction cache latency includes disabling loop unrolling, wherein the constant cache utilization includes a constant cache latency coverage, and the adaptation operation corresponding to the constant cache latency coverage includes constant folding, wherein the data cache utilization includes a data cache miss ratio, and the adaptation operation corresponding to the data cache miss ratio includes decreasing a working set or changing a data access pattern, and wherein the data processing efficiency includes a calculation throughput, and the adaptation operation corresponding to the calculation throughput includes reducing an instruction count, wherein the performance aspect is identified using a machine learning based algorithm or a decision tree flow.
4. The method of claim 2, further comprising presenting the determined architectural changes in response to a determination that the identified performance aspect is not present in the database, wherein the first processing unit comprises a graphics processing unit supporting a single instruction, multiple data (SIMD) architecture and the second processing unit comprises a graphics processing unit supporting a single instruction, multiple threads (SIMT) architecture, wherein the processing circuitry is coupled to a memory, the processing circuitry comprising one or more of graphics processing circuitry or application processing circuitry.
5. The method of claim 1, further comprising presenting the identified performance aspect in response to a determination that the identified performance aspect is not present in the database.
6. An apparatus comprising: processing circuitry coupled to a memory, the processing circuitry to: identify a performance aspect in a database that defines a correspondence between the performance aspect and an adaptation operation; and apply the adaptation operation in the database.
7. The apparatus of claim 6, wherein the adaption operation corresponds to the identified performance aspect present in the database, wherein the processing circuitry is further to: adapt the executable object optimized for a first processing unit of a first architecture, to a second processing unit of a second architecture, wherein the second architecture is different from the first architecture, wherein the executable object is adapted to perform on the second processing unit based on a plurality of performance metrics collected while the executable object is performed on the first processing unit and the second processing unit; identify the performance aspect of the executable object based on a first plurality of performance metrics of the executable object while the executable object is performed on the first processing unit and a second plurality of performance metrics of the executable object while the executable object is performed on the second processing unit, wherein the plurality of performance metrics include the first and second plurality of performance metrics, wherein the database further includes architectural changes corresponding to the performance aspect, and wherein applying the adaptation operation to the executable object comprises: determining architectural changes of the second architecture with respect to the first architecture based on the identified performance aspect; and applying the adaptation operation to the executable object that corresponds to the determined architectural changes, wherein the identified performance aspect includes an instruction cache utilization, a constant cache utilization, a data cache utilization, and a data processing efficiency.
8. The apparatus of claim 7, wherein the instruction cache utilization includes an instruction cache latency, and the adaptation operation corresponding to the instruction cache latency includes disabling loop unrolling, wherein the constant cache utilization includes a constant cache latency coverage, and the adaptation operation corresponding to the constant cache latency coverage includes constant folding, wherein the data cache utilization includes a data cache miss ratio, and the adaptation operation corresponding to the data cache miss ratio includes decreasing a working set or changing a data access pattern, and wherein the data processing efficiency includes a calculation throughput, and the adaptation operation corresponding to the calculation throughput includes reducing an instruction count, wherein the performance aspect is identified using a machine learning based algorithm or a decision tree flow.
9. The apparatus of claim 8, wherein the processor circuitry is further to present the determined architectural changes in response to a determination that the identified performance aspect is not present in the database, wherein the first processing unit comprises a graphics processing unit supporting a single instruction, multiple data (SIMD) architecture and the second processing unit comprises a graphics processing unit supporting a single instruction, multiple threads (SIMT) architecture, wherein the processing circuitry comprises one or more of graphics processing circuitry or application processing circuitry.
10. The apparatus of claim 7, wherein the processor circuitry is further to present the identified performance aspect in response to a determination that the identified performance aspect is not present in the database.
11. At least one non-transitory computer-readable medium having stored thereon instructions which, when executed, cause a computing device to perform operations comprisin
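The correspondence between performance aspects and adaptation operations recited in claims 3 and 8, together with the fallback in claims 5 and 10, can be sketched as a simple lookup. The aspect and operation pairs come from the claim text. The dictionary layout, the function names `lookup_adaptation` and `handle_aspect`, and the message strings are assumptions for illustration only.

```python
from typing import Dict, List, Optional

# Aspect -> adaptation operation(s), as paired in claims 3 and 8.
# Storing them in a plain dict is an assumption; the patent only says "database".
ADAPTATION_DB: Dict[str, List[str]] = {
    "instruction cache latency": ["disable loop unrolling"],
    "constant cache latency coverage": ["constant folding"],
    "data cache miss ratio": ["decrease working set", "change data access pattern"],
    "calculation throughput": ["reduce instruction count"],
}


def lookup_adaptation(aspect: str) -> Optional[List[str]]:
    """Return the adaptation operations for an aspect, or None if it is not in the database."""
    return ADAPTATION_DB.get(aspect)


def handle_aspect(aspect: str) -> str:
    """Describe the action taken for an identified performance aspect."""
    operations = lookup_adaptation(aspect)
    if operations is None:
        # Claims 5 and 10: present the aspect when the database has no entry for it.
        return f"No entry for '{aspect}'; presenting it for review."
    return f"Apply: {' or '.join(operations)}"


print(handle_aspect("data cache miss ratio"))
print(handle_aspect("register pressure"))  # hypothetical aspect absent from the database
```

A real system would apply the selected operation to the executable object, for example by recompiling with different options. This sketch stops at selecting the operation because the page does not describe that step.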
Processor architectures; Processor configuration, e.g. pipelining · CPC title
Memory management · CPC title
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title