Deep neural network workload scheduling

US2019266015A1 · US · A1

Patent metadata
Publication number: US-2019266015-A1
Application number: US-201815906963-A
Country: US
Kind code: A1
Filing date: Feb 27, 2018
Priority date: Feb 27, 2018
Publication date: Aug 29, 2019
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and computer-executable instructions for scheduling neural network workloads on an edge device. A performance model for each neural network model is received. Parameters for each neural network workload are determined based on an associated performance model. Processing core assignments are determined from a plurality of processing cores for each neural network workload based on the corresponding performance model and processing core utilization. Image streams are received and associated with a neural network workload. Each neural network workload is scheduled to run on the processing cores based on the processing core assignments.
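
The core-assignment step in the abstract can be illustrated with a minimal sketch. The abstract does not say how the performance model and core utilization are combined, so the greedy rule below is an assumption chosen for illustration, not the patented method. The names `Workload`, `PerfModel`, `assign_cores`, and `linear_model`, along with the toy latency formula and the `load` field, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical performance model: predicts latency (ms) from a batch size
# and the current utilization (0.0-1.0) of the core it would run on.
PerfModel = Callable[[int, float], float]


@dataclass
class Workload:
    name: str
    perf_model: PerfModel
    batch_size: int
    load: float  # assumed share of one core the workload consumes (0.0-1.0)


def assign_cores(workloads: List[Workload], num_cores: int) -> Dict[str, int]:
    """Greedily place each workload on the core with the lowest predicted latency."""
    utilization = [0.0] * num_cores
    assignments: Dict[str, int] = {}
    for w in workloads:
        # Ask the workload's performance model how it would run on each core.
        best = min(
            range(num_cores),
            key=lambda c: w.perf_model(w.batch_size, utilization[c]),
        )
        assignments[w.name] = best
        # Account for the new workload so later placements see the contention.
        utilization[best] = min(1.0, utilization[best] + w.load)
    return assignments


def linear_model(base_ms: float) -> PerfModel:
    # Toy model (assumption): latency grows with batch size and core contention.
    return lambda batch, util: base_ms * batch * (1.0 + util)


workloads = [
    Workload("detector", linear_model(10.0), batch_size=4, load=0.6),
    Workload("classifier", linear_model(5.0), batch_size=8, load=0.5),
    Workload("tracker", linear_model(2.0), batch_size=1, load=0.2),
]
assignments = assign_cores(workloads, num_cores=2)
print(assignments)
```

In this toy setup the order of placement matters: each assignment raises the utilization that later workloads see, which is one simple way to make the decision depend on both the model and the current core load.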

First claim

Opening claim text (preview).

1. A method for scheduling neural network workloads on an edge device, the method comprising operations performed using an electronic processor, the operations comprising: receiving a performance model for each of a plurality of neural network models; determining parameters for each of a plurality of neural network workloads, wherein each of the neural network workloads is associated with one of the plurality of neural network models, and the parameters are determined based on a performance model associated with the neural network model associated with the neural network workload; determining processing core assignments from a plurality of processing cores for each of the neural network workloads based on the corresponding performance model and processing core utilization; receiving, from a plurality of cameras, a plurality of image streams, wherein each image stream is associated with one of the neural network workloads; and scheduling each of the neural network workloads to run on the plurality of processing cores based on the processing core assignments to enable each of the neural network workloads to execute with the determined parameters on the determined parameters.

2. The method of claim 1, further comprising executing each of the neural network workloads with the determined parameters on the determined processing core based on the scheduling and the assigned image stream.

3. The method of claim 1, further comprising modeling the performance model for each of the plurality neural network models.

4. The method of claim 3, wherein the modeling comprises: running each of the plurality of neural network models a plurality of times with varying parameters and varying processing core utilizations to generate training data; and training the performance model based on the generated training data.

5. The method of claim 1, wherein the parameters comprise a batch size, a sampling rate, and precision.

6. The method of claim 1, wherein the scheduling comprises: determining a first neural network workload has an increased core processing need for a first initial period of time based on a configuration of a first neural network model associated with the first neural network workload; determining a second neural network workload has an increased core processing need for a second initial period of time based on a configuration of a second neural network model associated with the second neural network workload; and scheduling the second neural network workload to start execution after the first initial period of time.

7. The method of claim 6, wherein the first and second neural networks are convolutional neural networks.

8. The method of claim 1, wherein a first neural network workload is dependent on a second neural network workload, wherein determining the parameters comprises: detecting, using the second neural network workload, a match in the assigned image stream; and increasing a memory usage parameter of the first neural network workload based on the second neural network detecting the match.

9. The method of claim 8, wherein the first neural network uses a first neural network framework, and wherein the second neural network uses a second, different neural network framework.

10. The method of claim 1, wherein the determining processing core assignments further comprises: determining a plurality of unassigned processing cores, wherein a number of neural network workloads is less than a number of available processing cores; assigning additional processing cores to a neural network workload based on a runtime value from the corresponding performance model.

11. A system for scheduling neural network workloads on an edge device, the system comprising: an allocator configured to: receive a performance model for each of a plurality of neural network models; determine parameters for each of a plurality of neural network workloads, wherein each of the neural network workloads is associated with one of the plurality of neural network models, and the parameters are determined based on a performance model associated with the neural network model associated with the neural network workload, determine processing core assignments from a plurality of processing cores for each of the neural network workloads based on the corresponding performance model and processing core utilization; and receive, from a plurality of cameras, a plurality of image streams, wherein each image stream is associated with one of the neural network workloads; and a scheduler configured to schedule each of the neural network workloads to run on the plurality of processing cores based on the processing core assignments to enable each of the neural network workloads to execute with the determined parameters on the determined parameters; and a plurality of processing cores configured to execute each of the neural networks with the determined parameters on the determined processing core based on the scheduling and the assigned image stream.

12. The system of claim 11, further comprising a profiler configured to model the performance model for each of the plurality neural network models.

13. The system of claim 12, wherein to model the performance model the profiler is further configured to: run each of the plurality of neural network models a plurality of times with varying parameters and varying processing core utilizations to generate training data; and train the performance model based on the generated training data.

14. The system of claim 11, wherein the parameters comprise a batch size, a sampling rate, and precision.

15. The system of claim 11, wherein to schedule the scheduler is further configured to: determine a first neural network workload has an increased core processing need for a first initial period of time based on a configuration of a first neural network model associated with the first neural network workload; determine a second neural network workload has an increased core processing need for a second initial period of time based on a second neural network model associated with the second neural network workload; and schedule the second neural network workload to start execution after the first initial period of time.

16. The system of claim 15, wherein the first and second neural networks are convolutional neural networks.

17. A computer-readable storage media storing computer-executable instructions for scheduling neural network workloads on an edge device, the stored instructions comprising: instructions receive a performance model for each of a plurality of neural network models; instructions to determine parameters for each of a plurality of neural network workloads, wherein each of the neural network workloads is associated with one of the plurality of neural network models, and the parameters are determined based on a performance model associated with the neural network model associated with the neural network workload, instructions to determine processing core assignments from a plurality of processing cores for each of the neural network workloads based on the corresponding performance model and processing core utilization; and instructions to receive, from a plurality of cameras, a plurality of image streams, wherein each image stream is associated with one of the neural network workloads; and instructions to schedule each of the neural network workloads to run on the plurality of processing cores based on the processing core assignments to enable each of the neural network workloads to execute with the de
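
The staggered start described in the scheduling claims (a second workload begins only after the first workload's initial period of increased core need) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `stagger_starts`, the workload names, and the warm-up durations are hypothetical, and the claims do not state how the initial periods are measured or how more than two workloads are ordered.

```python
from typing import Dict, List, Tuple


def stagger_starts(warmups: List[Tuple[str, float]]) -> Dict[str, float]:
    """Return a start time (seconds) per workload so warm-up windows do not overlap.

    `warmups` holds (workload name, warm-up duration in seconds) pairs. The
    durations are assumed to be derived from each model's configuration.
    """
    start_times: Dict[str, float] = {}
    clock = 0.0
    for name, warmup_s in warmups:
        start_times[name] = clock
        # The next workload starts only once this warm-up window has ended.
        clock += warmup_s
    return start_times


starts = stagger_starts([("cnn_a", 2.0), ("cnn_b", 1.5), ("cnn_c", 0.5)])
print(starts)
```

Chaining the warm-up windows back to back is the simplest policy consistent with the claim language. A fuller scheduler would likely stagger only workloads that share cores.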

Assignees

Microsoft Technology Licensing LLC

Classifications

  • G06N3/063 (primary): using electronic means
  • Combinations of networks
  • considering hardware capabilities
  • the resource being the memory
  • G06F9/4881 (primary): Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2019266015A1 cover?
Systems, methods, and computer-executable instructions for scheduling neural network workloads on an edge device. A performance model for each neural network model is received. Parameters for each neural network workload are determined based on an associated performance model. Processing core assignments are determined from a plurality of processing cores for each neural network workload based o…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 29, 2019 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).