Deploying parallelizable deep learning models by adapting to the computing devices

US12547874B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12547874-B2
Application number  US-202117245541-A
Country             US
Kind code           B2
Filing date         Apr 30, 2021
Priority date       Apr 30, 2021
Publication date    Feb 10, 2026
Grant date          Feb 10, 2026

Abstract

Official abstract text for this publication.

In an approach to deploying parallelizable deep learning models by adapting to the computing devices, a deep learning model is split into a plurality of slices, where each slice can exchange data with related slices. Virtual models are created from the plurality of slices, where the virtual models are based on capabilities of a plurality of devices on which the one or more virtual models are to be deployed, and further where each virtual model contains each slice of the plurality of slices. The one or more virtual models are stored in a cache. Responsive to determining that the deep learning model is to be deployed on one or more devices, a candidate model is selected from the virtual models in the cache, where the selection is based on information from a device monitor about the devices.
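As a rough illustration of the flow the abstract describes, the following Python sketch builds one virtual model per device group from the same set of slices, stores the results in a cache, and picks a candidate using a snapshot of which devices are available. All names here (`Slice`, `VirtualModel`, `build_virtual_model`, `select_candidate`), the single scalar capacity figure, and the greedy placement rule are assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slice:
    name: str
    required_capacity: float  # hypothetical scalar cost of running this slice


@dataclass
class VirtualModel:
    # Maps each slice name to the device it would run on.
    placement: dict


def build_virtual_model(slices, capacities):
    """Greedily place every slice on the device with the most remaining capacity.

    Returns None if some slice does not fit, so every virtual model that is
    returned contains each slice, as the abstract requires.
    """
    remaining = dict(capacities)
    placement = {}
    for s in sorted(slices, key=lambda s: s.required_capacity, reverse=True):
        device = max(remaining, key=remaining.get)
        if remaining[device] < s.required_capacity:
            return None
        remaining[device] -= s.required_capacity
        placement[s.name] = device
    return VirtualModel(placement)


def select_candidate(cache, available_devices):
    """Return the first cached virtual model whose devices are all available."""
    for key, vm in cache.items():
        if vm is not None and set(vm.placement.values()) <= set(available_devices):
            return key, vm
    return None


# Made-up slices and device groups, purely for demonstration.
slices = [Slice("embed", 2.0), Slice("rnn", 4.0), Slice("head", 1.0)]
cache = {
    "edge-pair": build_virtual_model(slices, {"dev-a": 4.0, "dev-b": 4.0}),
    "single-gpu": build_virtual_model(slices, {"dev-c": 8.0}),
}
choice = select_candidate(cache, available_devices={"dev-c"})
```

In this toy setup only `dev-c` is reported as available, so the cached "single-gpu" virtual model is the one selected. A real device monitor would presumably report richer information than a set of names.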

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: splitting, by one or more computer processors, a deep learning model comprising a recurrent neural network (RNN) into a plurality of slices, wherein each of the slices comprises at least one of a plurality of layers comprising the deep learning model, wherein each of the slices requires a computing capacity, and wherein each of the slices records its back or forward slice to remember a last state of an input and to build up a double linked layer network; collecting, by a device monitor, a plurality of device resources of a plurality of devices; monitoring a risk factor of the plurality of devices based on the device resources; selecting a plurality of top n devices from the devices, wherein the top n devices have a lowest risk factor; distributing, by a virtual model cache, the slices to the top n devices based on the computing capacity required by the slices and the device resources of the top n devices; and running the deep learning model on the top n devices.

2. The computer-implemented method of claim 1, wherein each slice of the plurality of slices comprises a different network layer of a plurality of different network layers of the deep learning model.

3. The computer-implemented method of claim 1, wherein the splitting further comprises: responsive to determining that the deep learning model cannot be split into the plurality of slices by layers, splitting, by the one or more computer processors, the deep learning model into the plurality of slices based on a set of predetermined rules, wherein the set of predetermined rules split the deep learning model into a plurality of smallest parallelizable layers.

4. The computer-implemented method of claim 1, further comprising: encoding, by the one or more computer processors, features of the plurality of devices per time slice, wherein the features include at least one of, but are not limited to, central processing unit (CPU) capacity, graphical processing unit (GPU) capacity; and disk capacity, and further wherein the time slice is a predetermined period of time; predicting, by the one or more computer processors, the risk factor for each device of the plurality of devices, wherein the risk factor is predicted using a long short-term memory (LSTM) model; responsive to determining that the risk factor for any device of the plurality of devices exceeds a predetermined threshold, selecting, by the one or more computer processors, the top n devices of the plurality of devices, wherein n is a predetermined number, and further wherein the top n devices have a lowest risk factor; responsive to selecting the top n devices of the plurality of devices, creating, by the one or more computer processors, one or more new virtual models from the plurality of slices, wherein the one or more new virtual models are based on the device resources of the top n devices of the plurality of devices; and updating, by the one or more computer processors, the cache with the one or more new virtual models.

5. The computer-implemented method of claim 1, wherein the device monitor collects the device capabilities from the plurality of devices on a time schedule.

6. The computer-implemented method of claim 1, further comprising: monitoring, by the one or more computer processors, a health of each device of the plurality of devices; responsive to determining that the health of any device of the plurality of devices is below a predetermined threshold, marking, by the one or more computer devices, the any device as a failed device; removing, by the one or more computer processors, the failed device from the plurality of devices; creating, by the one or more computer processors, a new virtual model, wherein the new virtual model does not include the failed device; deploying, by the one or more computer processors, the new virtual model to the one or more devices of the plurality of devices; and updating, by the one or more computer processors, the cache with the new virtual model.

7. The computer-implemented method of claim 1, further comprising: creating, by the one or more computer processors, one or more virtual models from the plurality of slices, wherein the one or more virtual models are based on the device resources of the plurality of devices on which the one or more virtual models are to be deployed, and further wherein each virtual model of the plurality of virtual models contains each slice of the plurality of slices; and confirming, by the one or more computer processors, that the one or more virtual models match the deep learning model, wherein confirming that the one or more virtual models match the deep learning model is determined by one or more predetermined validation rules.

8. A computer program product comprising one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions including instructions to: split, by one or more computer processors, a deep learning model comprising a recurrent neural network (RNN) into a plurality of slices, wherein each of the slices comprises at least one of a plurality of layers comprising the deep learning model, wherein each of the slices requires a computing capacity, and wherein each of the slices records its back or forward slice to remember a last state of an input and to build up a double linked layer network; collect, by a device monitor, a plurality of device resources of a plurality of devices; monitor a risk factor of the plurality of devices based on the device resources; select a plurality of top n devices from the devices, wherein the top n devices have a lowest risk factor; distribute, by a virtual model cache, the slices to the top n devices based on the computing capacity required by the slices and the device resources of the top n devices; and run the deep learning model on the top n devices.

9. The computer program product of claim 8, wherein each slice of the plurality of slices comprises a different network layer of a plurality of different network layers of the deep learning model.

10. The computer program product of claim 8, wherein the instructions to split further comprise: responsive to determining that the deep learning model cannot be split into the plurality of slices by layers, splitting, by the one or more computer processors, the deep learning model into the plurality of slices based on a set of predetermined rules, wherein the set of predetermined rules split the deep learning model into a plurality of smallest parallelizable layers.

11. The computer program product of claim 8, further comprising instructions to: encode features of the plurality of devices per time slice, wherein the features include at least one of, but are not limited to, central processing unit (CPU) capacity, graphical processing unit (GPU) capacity; and disk capacity, and further wherein the time slice is a predetermined period of time; predict the risk factor for each device of the plurality of devices, wherein the risk factor is predicted using a long short-term memory (LSTM) model; responsive to determining that the risk factor for any device of the plurality of devices exceeds a predetermined threshold, select the top n devices of the plurality of devices, wherein n is a predetermined number, and further wherein the top n devices have a lowest risk factor; responsive to selecting the top n devices of the plurality of devices, create one or more new virtual models from the plurality of slices, wherein the one or more new virtual models are based on the device resources of the top n devices of th
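
The slice-linking, device-selection, and distribution steps recited in claim 1 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the claimed implementation: risk factors are supplied as plain numbers (claim 4 predicts them with an LSTM, which is not modeled here), and `SliceNode`, `link_slices`, `top_n_devices`, `distribute`, and the first-fit placement rule are hypothetical names and choices.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SliceNode:
    name: str
    required_capacity: float
    back: Optional["SliceNode"] = None     # previous slice in the chain
    forward: Optional["SliceNode"] = None  # next slice in the chain


def link_slices(nodes):
    """Record each slice's back and forward neighbour (a doubly linked chain)."""
    for prev, nxt in zip(nodes, nodes[1:]):
        prev.forward = nxt
        nxt.back = prev
    return nodes


def top_n_devices(risk, n):
    """Return the n device names with the lowest risk factor."""
    return sorted(risk, key=risk.get)[:n]


def distribute(nodes, resources, devices):
    """First-fit: assign each slice to the first selected device with enough room."""
    remaining = {d: resources[d] for d in devices}
    assignment = {}
    for node in nodes:
        for d in devices:
            if remaining[d] >= node.required_capacity:
                remaining[d] -= node.required_capacity
                assignment[node.name] = d
                break
        else:
            raise ValueError(f"no selected device can host slice {node.name!r}")
    return assignment


nodes = link_slices([
    SliceNode("layer0", 2.0),
    SliceNode("layer1", 3.0),
    SliceNode("layer2", 1.0),
])
risk = {"dev-a": 0.7, "dev-b": 0.1, "dev-c": 0.3}       # made-up risk factors
resources = {"dev-a": 8.0, "dev-b": 3.0, "dev-c": 4.0}  # made-up capacities
chosen = top_n_devices(risk, n=2)
assignment = distribute(nodes, resources, chosen)
```

With these made-up numbers, `dev-a` is excluded despite having the most capacity because its risk factor is the highest, which reflects the ordering in the claim: devices are filtered by risk first, and only then matched against the capacity each slice requires.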

Assignees

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Distributed learning, e.g. federated learning · CPC title

  • G06N3/044 · Primary

    Recurrent networks, e.g. Hopfield networks · CPC title

  • G06N3/0442 · Primary

    characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N3/044. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 10, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).