Managing artificial intelligence model partitions for edge computing environment
US-2022012607-A1 · Jan 13, 2022 · US
US12493785B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12493785-B2 |
| Application number | US-202017129222-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 21, 2020 |
| Priority date | Nov 27, 2020 |
| Publication date | Dec 9, 2025 |
| Grant date | Dec 9, 2025 |
Official abstract text for this publication.
Embodiments of the present disclosure relate to a method, an electronic device, and a computer program product for deploying a machine learning model. The method includes: acquiring a machine learning model in accordance with an open neural network exchange format; converting the machine learning model to an intermediate representation using a multi-level intermediate representation method; and deploying a computation associated with the machine learning model to at least one computing device using the intermediate representation.
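The three steps named in the abstract (acquire a model, convert it to a multi-level intermediate representation, deploy per device) can be pictured with a small toy pipeline. This is only a sketch under stated assumptions: it does not call the real ONNX or MLIR libraries, and every name in it (`ExchangeModel`, `MultiLevelIR`, `convert_to_ir`, `deploy`, the level names `high` and `low`) is hypothetical and not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExchangeModel:
    """Hypothetical stand-in for a model acquired in an exchange format."""
    name: str
    ops: List[str]


@dataclass
class MultiLevelIR:
    """Toy multi-level IR: each level holds its own representation of the ops."""
    levels: Dict[str, List[str]] = field(default_factory=dict)


def convert_to_ir(model: ExchangeModel) -> MultiLevelIR:
    """Lower the model into two illustrative levels (assumed names)."""
    high = [f"graph.{op}" for op in model.ops]  # graph-level form
    low = [f"loop.{op}" for op in model.ops]    # loop-level form
    return MultiLevelIR(levels={"high": high, "low": low})


def deploy(ir: MultiLevelIR, device_levels: Dict[str, str]) -> Dict[str, List[str]]:
    """Produce per-device 'code' from the IR level assigned to each device."""
    return {
        device: [f"{device}:{inst}" for inst in ir.levels[level]]
        for device, level in device_levels.items()
    }


model = ExchangeModel(name="demo", ops=["conv", "relu"])
ir = convert_to_ir(model)
plan = deploy(ir, {"cpu": "low", "gpu": "high"})
print(plan)
```

The point of the sketch is the shape of the flow: one model, one IR with several levels, and device-specific output derived from different levels of that IR.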
Opening claim text (preview).
What is claimed is:

1. A method comprising:
acquiring, in a front end of a compiler executing on at least one processing unit coupled to at least one memory, a machine learning model in accordance with an open neural network exchange format;
converting, in a back end of the compiler, the machine learning model from the open neural network exchange format to an intermediate representation using a multi-level intermediate representation method, the intermediate representation having a plurality of distinct levels including at least a first level comprising a first representation format having associated therewith a first compiler utility of the back end of the compiler, and a second level comprising a second representation format different than the first representation format and having associated therewith a second compiler utility of the back end of the compiler, different than the first compiler utility;
generating, in the back end of the compiler, first hardware-specific code from the first level of the intermediate representation for a first device type, the first hardware-specific code being generated utilizing the first compiler utility and being configured in accordance with a first parallelism algorithm implemented by a first scheduler for parallel performance of computations of the first hardware-specific code, the first parallelism algorithm providing one of data parallelism, model parallelism and pipelined parallelism;
generating, in the back end of the compiler, second hardware-specific code from the second level of the intermediate representation for a second device type different than the first device type, the second hardware-specific code being generated utilizing the second compiler utility and being configured in accordance with a second parallelism algorithm, different than the first parallelism algorithm, implemented by a second scheduler, different than the first scheduler, for parallel performance of computations of the second hardware-specific code, the second parallelism algorithm providing a different one of the data parallelism, model parallelism and pipelined parallelism than that provided by the first parallelism algorithm;
deploying a computation associated with the machine learning model to at least one computing device using the intermediate representation;
wherein deploying the computation comprises:
linking the first hardware-specific code to a first application programming interface associated with a first neural network architecture;
linking the second hardware-specific code to a second application programming interface associated with a second neural network architecture different than the first neural network architecture;
executing the first hardware-specific code generated for the first device type on a first computing device having the first device type, via the first application programming interface; and
executing the second hardware-specific code generated for the second device type on a second computing device having the second device type, via the second application programming interface;
automatically detecting a change in a hardware configuration of at least one of the first computing device having the first device type and the second computing device having the second device type; and
regenerating, in the back end of the compiler, at least one of the first hardware-specific code and the second hardware-specific code, responsive to the automatically detected change, for execution utilizing the changed hardware configuration.

2. The method according to claim 1, further including:
acquiring a computation graph associated with the machine learning model, wherein the computation graph represents dependencies between multiple parts of the computation associated with the machine learning model; and
executing in parallel the multiple parts of the computation based on the computation graph and the intermediate representation.

3. The method according to claim 2, further including:
determining parameters associated with the multiple parts of the computation based on the computation graph and the intermediate representation; and
storing, if it is determined that at least one of the parameters is associated with at least two of the multiple parts, data associated with the at least one parameter for use by the at least two parts of the computation.

4. The method according to claim 2, wherein executing in parallel the multiple parts includes:
executing in parallel the multiple parts in response to receiving a user instruction for parallel execution of the multiple parts; or
executing in parallel the multiple parts based on a pre-configuration regarding parallel execution.

5. The method according to claim 1, further including:
executing in parallel the computation and a computation associated with another machine learning model, wherein the other machine learning model is acquired in accordance with the open neural network exchange format and has been converted to another intermediate representation using the multi-level intermediate representation method, and the computation associated with the other machine learning model has been deployed to the at least one computing device using the other intermediate representation.

6. The method according to claim 1, wherein the at least one computing device includes multiple computing devices, and deploying the computation associated with the machine learning model to the at least one computing device includes:
determining device types corresponding to multiple parts of the computation associated with the machine learning model; and
deploying the multiple parts of the computation to one or more of the multiple computing devices based on types of the multiple computing devices and the determined device types.

7. The method according to claim 1, wherein the at least one computing device includes at least one of the following:
a central processing unit; and
a dedicated processing unit.

8. The method according to claim 1, wherein the at least one computing device includes multiple computing devices, and the method further includes:
redeploying, if a configuration of one of the multiple computing devices is changed, the computation to the multiple computing devices based on the changed configuration.

9. An electronic device, including:
at least one processing unit; and
at least one memory which is coupled to the at least one processing unit and stores instructions for execution by the at least one processing unit, wherein the instructions, when executed by the at least one processing unit, cause the electronic device to perform actions comprising:
acquiring, in a front end of a compiler executing on the at least one processing unit coupled to the at least one memory, a machine learning model in accordance with an open neural network exchange format;
converting, in a back end of the compiler, the machine learning model from the open neural network exchange format to an intermediate representation using a multi-level intermediate representation method, the intermediate representation having a plurality of distinct levels including at least a first level comprising a first representation format having associated therewith a first compiler utility of the back end of the compiler, and a second level comprising a second representation format different than the first representation format and having associated therewith a second compiler utility of the back end of the compiler, different than the first compiler utility;
generating, in the back end of the compiler, first hardware-specific code from the first level of the intermediate representation for a first device type, the first hardware-specific code being gen
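As an illustration of claims 2 and 3 only, the following sketch runs the parts of a computation in dependency order, executes independent parts in parallel, and stores the data for any parameter used by two or more parts so that it is loaded once. The graph, the part names, the parameter names, and the helper functions are all hypothetical; the patent text shown here does not specify a scheduler, a thread pool, or a caching scheme, so this is one possible reading and not the claimed implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical computation graph: each part maps to the parts it depends on.
graph = {"a": set(), "b": set(), "c": {"a", "b"}}
# Hypothetical mapping from each part to the parameter it reads.
params_used = {"a": "w_shared", "b": "w_shared", "c": "w_out"}

load_count = {}  # how many times each parameter was loaded
cache = {}       # stored data for parameters shared by two or more parts


def load_param(name):
    """Placeholder loader; the 'data' is just the length of the name."""
    load_count[name] = load_count.get(name, 0) + 1
    return len(name)


# Determine which parameters are shared, and store their data once.
usage = {}
for part, param in params_used.items():
    usage.setdefault(param, []).append(part)
for param, parts in usage.items():
    if len(parts) >= 2:
        cache[param] = load_param(param)


def run_part(part, inputs):
    """Toy computation: parameter value plus the sum of upstream results."""
    param = params_used[part]
    value = cache[param] if param in cache else load_param(param)
    return value + sum(inputs)


def execute(dep_graph):
    """Run every batch of ready parts in parallel, respecting dependencies."""
    results = {}
    sorter = TopologicalSorter(dep_graph)
    sorter.prepare()
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = list(sorter.get_ready())
            futures = {
                p: pool.submit(run_part, p, [results[d] for d in dep_graph[p]])
                for p in ready
            }
            for p, fut in futures.items():
                results[p] = fut.result()
                sorter.done(p)
    return results


results = execute(graph)
print(results, load_count)
```

Here parts `a` and `b` have no dependencies and are submitted together, while `c` waits for both. The shared parameter `w_shared` is loaded a single time even though two parts read it.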