What technology area does this patent fall under?

Primary CPC classification G06N3/08. Mapped technology areas include Physics.

When was this patent published?

Publication date: May 20, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Partitioning and placement of models of models at a network edge

US12307294B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12307294-B2
Application number	US-202217578872-A
Country	US
Kind code	B2
Filing date	Jan 19, 2022
Priority date	Aug 18, 2021
Publication date	May 20, 2025
Grant date	May 20, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques and mechanisms are described for enabling a user to run heavy deep learning workloads on standard edge networks without off-loading computation to a cloud, leveraging the available edge computing resources, and efficiently partitioning and distributing a Deep Neural Network (DNN) over a network. The techniques enable the user to split a workload into multiple parts and process the workload on a set of smaller, less capable compute nodes in a distributed manner, without compromising on performance, and while meeting a Service Level Objective (SLO).
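
As a rough illustration of the partitioning idea in the abstract, the sketch below picks a split point in a chain of layers so that two edge nodes together finish within a latency target. It is a minimal sketch under stated assumptions, not the patented method: the names `Layer`, `Node`, `best_split`, and `meets_slo`, the cost model (compute time plus one activation transfer), and all units are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Layer:
    name: str
    flops: float         # compute cost of the layer (hypothetical units)
    output_bytes: float  # size of the activation handed to the next layer


@dataclass(frozen=True)
class Node:
    name: str
    flops_per_s: float      # compute available on the node (assumed known)
    bandwidth_bytes_s: float  # inbound bandwidth available to the node


def best_split(layers: Sequence[Layer], first: Node,
               second: Node) -> Optional[Tuple[int, float]]:
    """Return (split_index, latency_s) with the lowest estimated latency.

    Layers [0, split) run on `first`; layers [split, n) run on `second`.
    Latency is modelled as head compute + activation transfer + tail compute.
    Returns None when the model has fewer than two layers.
    """
    best = None
    for split in range(1, len(layers)):
        head = sum(l.flops for l in layers[:split]) / first.flops_per_s
        tail = sum(l.flops for l in layers[split:]) / second.flops_per_s
        transfer = layers[split - 1].output_bytes / second.bandwidth_bytes_s
        latency = head + transfer + tail
        if best is None or latency < best[1]:
            best = (split, latency)
    return best


def meets_slo(latency_s: float, slo_s: float) -> bool:
    """True when the estimated latency is within the latency objective."""
    return latency_s <= slo_s


# Illustrative numbers only.
demo_layers = [
    Layer("conv1", flops=4e9, output_bytes=8e6),
    Layer("conv2", flops=4e9, output_bytes=1e6),
    Layer("fc", flops=2e9, output_bytes=4e3),
]
node_a = Node("edge-a", flops_per_s=2e9, bandwidth_bytes_s=1e6)
node_b = Node("edge-b", flops_per_s=2e9, bandwidth_bytes_s=1e6)
demo_split, demo_latency = best_split(demo_layers, node_a, node_b)
```

With these made-up numbers the search prefers splitting after `conv2`, because the smaller activation is cheaper to send between nodes than the one produced by `conv1`.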

First claim

Opening claim text (preview).

What is claimed is:
1. A method of optimized placement of workloads at an edge of a network, the method comprising: identifying, by an orchestration system of the network, a model configured to process data generated by a computing device in the network; determining, by the orchestration system of the network, one or more locations in the model at which to split the model the one or more locations being associated with optimized execution of the model; identifying, by the orchestration system of the network, a first computing device at the edge of the network that is optimized to execute a first workload comprising a first portion of the model; identifying, by the orchestration system of the network, a second computing device at the edge of the network that is optimized to execute a second workload comprising a second portion of the model; generating, by the orchestration system and based on splitting the model at a location of the one or more locations, the first workload and the second workload; deploying, based on packaging the first workload, the first workload to the first computing device to enable the first computing device to execute the first portion of the model; and deploying, based on packaging the second workload, the second workload to the second computing device, to enable the second computing device to execute the second portion of the model.
2. The method of claim 1, further comprising: determining, based at least partly on monitoring the first computing device, that an event occurs that results in a deteriorated performance of the first computing device; identifying a third computing device at which to run the first workload associated with the first portion of the model; and deploying the first workload to the third computing device.
3. The method of claim 2, wherein the event comprises one of a CPU overload or a disconnect from the network.
4. The method of claim 1, wherein generating the first workload comprises packaging the first workload in a container configured to execute on a local area network via an execution model.
5. The method of claim 1, wherein the model comprises a deep learning neural network.
6. The method of claim 1, wherein determining the one or more locations includes generating an application graph of the model that identifies one or more potential split locations between one or more layers of the model based on a topology of the model.
7. The method of claim 1, wherein identifying the first computing device includes at least one of: determining that an amount of central processing unit (CPU) available on the first computing device is sufficient to support the first workload; or determining that an amount of bandwidth available to the first computing device is sufficient to receive data over the network to support the first workload.
8. The method of claim 1, wherein identifying the first computing device is based at least in part on determining that a processor type or device type associated with the first computing device is optimized for running the first workload.
9. A system comprising: one or more processors; and one or more non-transitory computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: receiving, from a first computing device in a network, a model configured to process data generated by the first computing device in the network; determining, a location in the model at which to split the model to optimize throughput of the network; determining, based on the location and the model, a second computing device at an edge of in the network optimized to execute a first workload associated with a first portion of the model; determining, based on the location and the model, a third computing device at the edge of the network optimized to execute a second workload associated with a second portion of the model; generating, based on splitting the model at the location, the first workload and the second workload; deploying the first workload to the first computing device at the edge; and deploying the second workload to the second computing device at the edge.
10. The system of claim 9, the operations further comprising: determining, based at least in part on monitoring the second computing device, that an event occurs that results in a deteriorated performance of the second computing device; identifying a fourth computing device at which to run the first workload associated with the first portion of the model; and deploying the first workload to the fourth computing device.
11. The system of claim 10, wherein the event comprises one of a CPU overload or a disconnect from the network.
12. The system of claim 10, wherein generating the first workload comprises packaging the first workload in a container configured to execute on a local area network via an execution model.
13. The system of claim 9, wherein the model comprises a deep learning neural network.
14. The system of claim 9, wherein determining the location includes generating an application graph of the model that identifies a split location between one or more layers of the model and is based on a topology of the model, the split location being associated with optimizing the throughput of the network.
15. The system of claim 9, wherein determining the second computing device or the third computing device is based at least in part on: determining that an amount of central processing unit (CPU) available on the second computing device is sufficient to support the first workload; or determining that an amount of bandwidth available to the second computing device is sufficient to receive data over the network to support the first workload.
16. The system of claim 9, wherein determining the second computing device is based at least in part on determining that a processor type or device type associated with the second computing device is optimized for running the first workload.
17. One or more non-transitory computer-readable media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: identifying, based on monitoring a network, a model configured to process data generated by a computing device in the network; determining, a location in the model at which to split the model to optimize throughput of the network; determining, based on the location and the model, a first computing device at an edge of the network optimized to execute a first workload associated with a first portion of the model; determining, based on the location and the model, a second computing device at the edge of the network optimized to execute a second workload associated with a second portion of the model; generating, based on splitting the model at the location, the first workload and the second workload; deploying the first workload to the first computing device at the edge; and deploying the second workload to the second computing device at the edge.
18. The one or more non-transitory computer-readable media of claim 17, the operations further comprising: determining, based at least in part on monitoring the first computing device, that an event occurs that results in a deteriorated performance of the first computing device; identifying a third computing device at which to run the first workload associated with the first portion of the model; and deploying the first workload to the third computing de
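
The monitoring and redeployment steps in claims 2 and 3 (a CPU overload or network disconnect triggers moving the first workload to a third device) can be pictured with the short sketch below. It is illustrative only: `EdgeDevice`, `Workload`, `pick_device`, the 0.9 overload threshold, and the rule that a replacement must satisfy both the CPU and the bandwidth check are assumptions made for this example, not details taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

CPU_OVERLOAD_THRESHOLD = 0.9  # hypothetical cut-off, not from the patent


@dataclass
class EdgeDevice:
    name: str
    cpu_load: float        # fraction of CPU in use, 0.0 to 1.0
    cpu_free: float        # spare CPU capacity (arbitrary units)
    bandwidth_free: float  # spare inbound bandwidth (arbitrary units)
    connected: bool = True


@dataclass(frozen=True)
class Workload:
    name: str
    cpu_needed: float
    bandwidth_needed: float


def is_degraded(device: EdgeDevice) -> bool:
    """A device counts as degraded on CPU overload or network disconnect."""
    return (not device.connected) or device.cpu_load >= CPU_OVERLOAD_THRESHOLD


def can_host(device: EdgeDevice, workload: Workload) -> bool:
    """Assumed rule: healthy, with enough spare CPU and bandwidth."""
    return (not is_degraded(device)
            and device.cpu_free >= workload.cpu_needed
            and device.bandwidth_free >= workload.bandwidth_needed)


def pick_device(current: EdgeDevice, workload: Workload,
                candidates: Sequence[EdgeDevice]) -> Optional[EdgeDevice]:
    """Keep the current device if healthy; otherwise return the first
    candidate able to host the workload, or None if there is none."""
    if not is_degraded(current):
        return current
    for device in candidates:
        if device is not current and can_host(device, workload):
            return device
    return None
```

A caller would run `pick_device` each time monitoring data arrives and redeploy the packaged workload whenever the returned device differs from the current one.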

Assignees

Cisco Tech Inc

Inventors

Classifications

G06F2209/501
Performance criteria · CPC title
G06F2209/5017
Task decomposition · CPC title
G06F2209/503
Resource availability · CPC title
G06N3/08 · Primary
Learning methods · CPC title
G06F9/505
considering the load · CPC title

Patent family

Related publications grouped by family.

View patent family 85229262

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12307294B2 cover?: Techniques and mechanisms are described for enabling a user to run heavy deep learning workloads on standard edge networks without off-loading computation to a cloud, leveraging the available edge computing resources, and efficiently partitioning and distributing a Deep Neural Network (DNN) over a network. The techniques enable the user to split a workload into multiple parts and process the wo…
Who is the assignee on this patent?: Cisco Tech Inc