Method, electronic device, and computer program product for deploying machine learning model

US12493785B2 · US · B2

Patent metadata
Publication number: US-12493785-B2
Application number: US-202017129222-A
Country: US
Kind code: B2
Filing date: Dec 21, 2020
Priority date: Nov 27, 2020
Publication date: Dec 9, 2025
Grant date: Dec 9, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present disclosure relate to a method, an electronic device, and a computer program product for deploying a machine learning model. The method includes: acquiring a machine learning model in accordance with an open neural network exchange format; converting the machine learning model to an intermediate representation using a multi-level intermediate representation method; and deploying a computation associated with the machine learning model to at least one computing device using the intermediate representation.
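
The three steps in the abstract (acquire, convert, deploy) can be pictured with a minimal sketch. Everything here is a hypothetical stand-in: `ToyModel`, `MultiLevelIR`, `acquire`, `convert`, and `deploy` are illustrative names, the "model" is just a list of operation names rather than a real exchange-format file, and the round-robin placement is an assumed policy, not the one the patent describes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    """Hypothetical stand-in for an exchange-format model: an ordered list of op names."""
    ops: list


@dataclass
class MultiLevelIR:
    """Hypothetical IR that keeps the same program at more than one abstraction level."""
    levels: dict = field(default_factory=dict)


def acquire(ops):
    """Step 1: obtain the model (here, simply wrap a list of op names)."""
    return ToyModel(ops=list(ops))


def convert(model):
    """Step 2: lower the model into two illustrative IR levels."""
    ir = MultiLevelIR()
    ir.levels["graph"] = list(model.ops)                      # higher level: named ops
    ir.levels["loops"] = [f"loop({op})" for op in model.ops]  # lower level: loop form
    return ir


def deploy(ir, devices):
    """Step 3: place each lowered op on a device, round-robin (assumed policy)."""
    lowered = ir.levels["loops"]
    return [(op, devices[i % len(devices)]) for i, op in enumerate(lowered)]


plan = deploy(convert(acquire(["matmul", "relu", "softmax"])), ["cpu0", "gpu0"])
```

With the inputs shown, `plan` pairs each lowered operation with one of the two device names in turn. A real system would replace each stage with an actual model loader, compiler lowering passes, and a runtime.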

First claim

Opening claim text (preview).

What is claimed is:

  1. A method comprising:

    acquiring, in a front end of a compiler executing on at least one processing unit coupled to at least one memory, a machine learning model in accordance with an open neural network exchange format;

    converting, in a back end of the compiler, the machine learning model from the open neural network exchange format to an intermediate representation using a multi-level intermediate representation method, the intermediate representation having a plurality of distinct levels including at least a first level comprising a first representation format having associated therewith a first compiler utility of the back end of the compiler, and a second level comprising a second representation format different than the first representation format and having associated therewith a second compiler utility of the back end of the compiler, different than the first compiler utility;

    generating, in the back end of the compiler, first hardware-specific code from the first level of the intermediate representation for a first device type, the first hardware-specific code being generated utilizing the first compiler utility and being configured in accordance with a first parallelism algorithm implemented by a first scheduler for parallel performance of computations of the first hardware-specific code, the first parallelism algorithm providing one of data parallelism, model parallelism and pipelined parallelism;

    generating, in the back end of the compiler, second hardware-specific code from the second level of the intermediate representation for a second device type different than the first device type, the second hardware-specific code being generated utilizing the second compiler utility and being configured in accordance with a second parallelism algorithm, different than the first parallelism algorithm, implemented by a second scheduler, different than the first scheduler, for parallel performance of computations of the second hardware-specific code, the second parallelism algorithm providing a different one of the data parallelism, model parallelism and pipelined parallelism than that provided by the first parallelism algorithm;

    deploying a computation associated with the machine learning model to at least one computing device using the intermediate representation;

    wherein deploying the computation comprises:

    linking the first hardware-specific code to a first application programming interface associated with a first neural network architecture;

    linking the second hardware-specific code to a second application programming interface associated with a second neural network architecture different than the first neural network architecture;

    executing the first hardware-specific code generated for the first device type on a first computing device having the first device type, via the first application programming interface; and

    executing the second hardware-specific code generated for the second device type on a second computing device having the second device type, via the second application programming interface;

    automatically detecting a change in a hardware configuration of at least one of the first computing device having the first device type and the second computing device having the second device type; and

    regenerating, in the back end of the compiler, at least one of the first hardware-specific code and the second hardware-specific code, responsive to the automatically detected change, for execution utilizing the changed hardware configuration.

  2. The method according to claim 1, further including:

    acquiring a computation graph associated with the machine learning model, wherein the computation graph represents dependencies between multiple parts of the computation associated with the machine learning model; and

    executing in parallel the multiple parts of the computation based on the computation graph and the intermediate representation.

  3. The method according to claim 2, further including:

    determining parameters associated with the multiple parts of the computation based on the computation graph and the intermediate representation; and

    storing, if it is determined that at least one of the parameters is associated with at least two of the multiple parts, data associated with the at least one parameter for use by the at least two parts of the computation.

  4. The method according to claim 2, wherein executing in parallel the multiple parts includes:

    executing in parallel the multiple parts in response to receiving a user instruction for parallel execution of the multiple parts; or

    executing in parallel the multiple parts based on a pre-configuration regarding parallel execution.

  5. The method according to claim 1, further including:

    executing in parallel the computation and a computation associated with another machine learning model, wherein the other machine learning model is acquired in accordance with the open neural network exchange format and has been converted to another intermediate representation using the multi-level intermediate representation method, and the computation associated with the other machine learning model has been deployed to the at least one computing device using the other intermediate representation.

  6. The method according to claim 1, wherein the at least one computing device includes multiple computing devices, and deploying the computation associated with the machine learning model to the at least one computing device includes:

    determining device types corresponding to multiple parts of the computation associated with the machine learning model; and

    deploying the multiple parts of the computation to one or more of the multiple computing devices based on types of the multiple computing devices and the determined device types.

  7. The method according to claim 1, wherein the at least one computing device includes at least one of the following:

    a central processing unit; and

    a dedicated processing unit.

  8. The method according to claim 1, wherein the at least one computing device includes multiple computing devices, and the method further includes:

    redeploying, if a configuration of one of the multiple computing devices is changed, the computation to the multiple computing devices based on the changed configuration.

  9. An electronic device, including:

    at least one processing unit; and

    at least one memory which is coupled to the at least one processing unit and stores instructions for execution by the at least one processing unit, wherein the instructions, when executed by the at least one processing unit, cause the electronic device to perform actions comprising:

    acquiring, in a front end of a compiler executing on the at least one processing unit coupled to the at least one memory, a machine learning model in accordance with an open neural network exchange format;

    converting, in a back end of the compiler, the machine learning model from the open neural network exchange format to an intermediate representation using a multi-level intermediate representation method, the intermediate representation having a plurality of distinct levels including at least a first level comprising a first representation format having associated therewith a first compiler utility of the back end of the compiler, and a second level comprising a second representation format different than the first representation format and having associated therewith a second compiler utility of the back end of the compiler, different than the first compiler utility;

    generating, in the back end of the compiler, first hardware-specific code from the first level of the intermediate representation for a first device type, the first hardware-specific code being gen
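
Claim 2 above describes a computation graph that records dependencies between parts of the computation, with independent parts executed in parallel. A minimal sketch of that idea follows, assuming Python 3.9+ and using the standard-library `graphlib.TopologicalSorter` and a thread pool. The function `run_graph`, the `deps` and `compute` dictionaries, and the wave-by-wave scheduling are illustrative assumptions, not the scheduler the patent claims.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def run_graph(deps, compute, max_workers=4):
    """Run every part once all of the parts it depends on have finished.

    deps:    maps each part name to the set of part names it depends on.
    compute: maps each part name to a function taking a dict of earlier results.
    Parts that become ready together are submitted to the pool as one wave.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorter.get_ready()      # parts whose dependencies are all done
            inputs = dict(results)          # snapshot so workers never see a dict being mutated
            futures = {name: pool.submit(compute[name], inputs) for name in ready}
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)           # unlocks parts that depend on this one
    return results


# Hypothetical three-part computation: "a" and "b" are independent, "c" needs both.
deps = {"a": set(), "b": set(), "c": {"a", "b"}}
compute = {
    "a": lambda r: 2,
    "b": lambda r: 3,
    "c": lambda r: r["a"] * r["b"],
}
outputs = run_graph(deps, compute)
```

Here `a` and `b` are submitted in the same wave and may run concurrently, while `c` waits for both. Waiting for a whole wave before starting the next is the simplest policy; a finer-grained scheduler could start each part as soon as its own dependencies finish.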

Assignees

EMC IP Holding Co LLC

Inventors

Classifications

  • Architecture, e.g. interconnection topology · CPC title

  • using electronic means · CPC title

  • G06N3/08 · Primary

    Learning methods · CPC title

  • G06N3/105 · Primary

    Shells for specifying net layout · CPC title

  • G06N20/00 · Primary

    Machine learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12493785B2 cover?
Embodiments of the present disclosure relate to a method, an electronic device, and a computer program product for deploying a machine learning model. The method includes: acquiring a machine learning model in accordance with an open neural network exchange format; converting the machine learning model to an intermediate representation using a multi-level intermediate representation method; and…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 9, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).