US12307294B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12307294-B2 |
| Application number | US-202217578872-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 19, 2022 |
| Priority date | Aug 18, 2021 |
| Publication date | May 20, 2025 |
| Grant date | May 20, 2025 |
Official abstract text for this publication.
Techniques and mechanisms are described for enabling a user to run heavy deep learning workloads on standard edge networks without off-loading computation to a cloud, leveraging the available edge computing resources, and efficiently partitioning and distributing a Deep Neural Network (DNN) over a network. The techniques enable the user to split a workload into multiple parts and process the workload on a set of smaller, less capable compute nodes in a distributed manner, without compromising on performance, and while meeting a Service Level Objective (SLO).
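The abstract does not say how a split point is chosen. As a minimal sketch only, assuming a purely sequential model, two edge nodes, and a pipeline whose throughput is limited by its slowest stage, the choice could be illustrated as follows. The `Layer` fields, the `best_split` function, the cost numbers, and the bottleneck cost model are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    compute_ms: float  # hypothetical per-input cost on a reference CPU
    output_kb: float   # hypothetical size of the activation sent to the next layer


def best_split(layers, speed_a, speed_b, link_kb_per_ms):
    """Return (index, bottleneck_ms); layers[:index] run on node A, the rest on node B.

    Assumed cost model: in a two-stage pipeline the slowest of
    (stage A compute, activation transfer, stage B compute) bounds throughput.
    Returns None when there are fewer than two layers to split.
    """
    best = None
    for i in range(1, len(layers)):
        stage_a = sum(layer.compute_ms for layer in layers[:i]) / speed_a
        stage_b = sum(layer.compute_ms for layer in layers[i:]) / speed_b
        transfer = layers[i - 1].output_kb / link_kb_per_ms
        bottleneck = max(stage_a, transfer, stage_b)
        if best is None or bottleneck < best[1]:
            best = (i, bottleneck)
    return best


# Illustrative numbers only.
LAYERS = [
    Layer("conv1", 8.0, 400.0),
    Layer("conv2", 12.0, 200.0),
    Layer("conv3", 10.0, 50.0),
    Layer("fc", 6.0, 1.0),
]

split_index, bottleneck_ms = best_split(LAYERS, speed_a=1.0, speed_b=1.0, link_kb_per_ms=10.0)
print(split_index, bottleneck_ms)
```

With these made-up costs the sketch splits after the second layer: an earlier split is dominated by transferring a large activation, and a later one overloads the first node.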
Opening claim text (preview).
What is claimed is:

1. A method of optimized placement of workloads at an edge of a network, the method comprising: identifying, by an orchestration system of the network, a model configured to process data generated by a computing device in the network; determining, by the orchestration system of the network, one or more locations in the model at which to split the model the one or more locations being associated with optimized execution of the model; identifying, by the orchestration system of the network, a first computing device at the edge of the network that is optimized to execute a first workload comprising a first portion of the model; identifying, by the orchestration system of the network, a second computing device at the edge of the network that is optimized to execute a second workload comprising a second portion of the model; generating, by the orchestration system and based on splitting the model at a location of the one or more locations, the first workload and the second workload; deploying, based on packaging the first workload, the first workload to the first computing device to enable the first computing device to execute the first portion of the model; and deploying, based on packaging the second workload, the second workload to the second computing device, to enable the second computing device to execute the second portion of the model.

2. The method of claim 1, further comprising: determining, based at least partly on monitoring the first computing device, that an event occurs that results in a deteriorated performance of the first computing device; identifying a third computing device at which to run the first workload associated with the first portion of the model; and deploying the first workload to the third computing device.

3. The method of claim 2, wherein the event comprises one of a CPU overload or a disconnect from the network.

4. The method of claim 1, wherein generating the first workload comprises packaging the first workload in a container configured to execute on a local area network via an execution model.

5. The method of claim 1, wherein the model comprises a deep learning neural network.

6. The method of claim 1, wherein determining the one or more locations includes generating an application graph of the model that identifies one or more potential split locations between one or more layers of the model based on a topology of the model.

7. The method of claim 1, wherein identifying the first computing device includes at least one of: determining that an amount of central processing unit (CPU) available on the first computing device is sufficient to support the first workload; or determining that an amount of bandwidth available to the first computing device is sufficient to receive data over the network to support the first workload.

8. The method of claim 1, wherein identifying the first computing device is based at least in part on determining that a processor type or device type associated with the first computing device is optimized for running the first workload.

9. A system comprising: one or more processors; and one or more non-transitory computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: receiving, from a first computing device in a network, a model configured to process data generated by the first computing device in the network; determining, a location in the model at which to split the model to optimize throughput of the network; determining, based on the location and the model, a second computing device at an edge of in the network optimized to execute a first workload associated with a first portion of the model; determining, based on the location and the model, a third computing device at the edge of the network optimized to execute a second workload associated with a second portion of the model; generating, based on splitting the model at the location, the first workload and the second workload; deploying the first workload to the first computing device at the edge; and deploying the second workload to the second computing device at the edge.

10. The system of claim 9, the operations further comprising: determining, based at least in part on monitoring the second computing device, that an event occurs that results in a deteriorated performance of the second computing device; identifying a fourth computing device at which to run the first workload associated with the first portion of the model; and deploying the first workload to the fourth computing device.

11. The system of claim 10, wherein the event comprises one of a CPU overload or a disconnect from the network.

12. The system of claim 10, wherein generating the first workload comprises packaging the first workload in a container configured to execute on a local area network via an execution model.

13. The system of claim 9, wherein the model comprises a deep learning neural network.

14. The system of claim 9, wherein determining the location includes generating an application graph of the model that identifies a split location between one or more layers of the model and is based on a topology of the model, the split location being associated with optimizing the throughput of the network.

15. The system of claim 9, wherein determining the second computing device or the third computing device is based at least in part on: determining that an amount of central processing unit (CPU) available on the second computing device is sufficient to support the first workload; or determining that an amount of bandwidth available to the second computing device is sufficient to receive data over the network to support the first workload.

16. The system of claim 9, wherein determining the second computing device is based at least in part on determining that a processor type or device type associated with the second computing device is optimized for running the first workload.

17. One or more non-transitory computer-readable media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: identifying, based on monitoring a network, a model configured to process data generated by a computing device in the network; determining, a location in the model at which to split the model to optimize throughput of the network; determining, based on the location and the model, a first computing device at an edge of the network optimized to execute a first workload associated with a first portion of the model; determining, based on the location and the model, a second computing device at the edge of the network optimized to execute a second workload associated with a second portion of the model; generating, based on splitting the model at the location, the first workload and the second workload; deploying the first workload to the first computing device at the edge; and deploying the second workload to the second computing device at the edge.

18. The one or more non-transitory computer-readable media of claim 17, the operations further comprising: determining, based at least in part on monitoring the first computing device, that an event occurs that results in a deteriorated performance of the first computing device; identifying a third computing device at which to run the first workload associated with the first portion of the model; and deploying the first workload to the third computing de
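
The device-selection and redeployment steps recited in the claims (sufficient CPU, sufficient bandwidth, and moving a workload after a CPU overload or network disconnect) can be pictured with a small sketch. Everything here is an assumption made for illustration: the `Node` and `Workload` types, the `place` and `redeploy_on_event` functions, and the "most free CPU" tie-break do not come from the patent. The sketch also requires both the CPU and the bandwidth checks to pass, which is stricter than the "at least one of" wording in the claims.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu: float        # hypothetical: free CPU cores on the edge device
    bandwidth_mbps: float  # hypothetical: bandwidth available to the device
    healthy: bool = True


@dataclass(frozen=True)
class Workload:
    name: str
    cpu_needed: float
    bandwidth_needed_mbps: float


def eligible(node, workload):
    """Assumed rule: the node is healthy and has enough CPU and bandwidth."""
    return (
        node.healthy
        and node.free_cpu >= workload.cpu_needed
        and node.bandwidth_mbps >= workload.bandwidth_needed_mbps
    )


def place(workload, nodes, exclude=()):
    """Reserve CPU on the eligible node with the most free CPU; None if no node fits."""
    candidates = [n for n in nodes if n.name not in exclude and eligible(n, workload)]
    if not candidates:
        return None
    chosen = max(candidates, key=lambda n: n.free_cpu)
    chosen.free_cpu -= workload.cpu_needed
    return chosen


def redeploy_on_event(workload, failed, nodes):
    """After an event such as CPU overload or disconnect, pick a different node."""
    failed.healthy = False
    return place(workload, nodes, exclude={failed.name})


# Illustrative devices and workload only.
nodes = [
    Node("edge-a", free_cpu=4.0, bandwidth_mbps=100.0),
    Node("edge-b", free_cpu=2.0, bandwidth_mbps=100.0),
    Node("edge-c", free_cpu=3.0, bandwidth_mbps=20.0),
]
part_1 = Workload("part-1", cpu_needed=2.0, bandwidth_needed_mbps=50.0)

first = place(part_1, nodes)
replacement = redeploy_on_event(part_1, first, nodes)
print(first.name, replacement.name)
```

In this example the first portion lands on `edge-a`; once that device is marked unhealthy, `edge-c` is skipped for lack of bandwidth and the workload moves to `edge-b`.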
Performance criteria · CPC title
Task decomposition · CPC title
Resource availability · CPC title
Learning methods · CPC title
considering the load · CPC title