Deep Learning Job Scheduling Method and System and Related Device
US-2021011762-A1 · Jan 14, 2021 · US
US11954521B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11954521-B2 |
| Application number | US-202017038720-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 30, 2020 |
| Priority date | Mar 30, 2018 |
| Publication date | Apr 9, 2024 |
| Grant date | Apr 9, 2024 |
Official abstract text for this publication.
A deep learning job scheduling method includes obtaining a job request of a deep learning job, determining a target job description file template from a plurality of pre-stored job description file templates based on the job request, determining an identifier of a target job basic image from identifiers of a plurality of pre-stored job basic images based on the job request, generating a target job description file based on the target job description file template and the identifier of the target job basic image, sending the target job description file to a container scheduler, selecting the target job basic image from the pre-stored job basic images based on the target job description file, and creating at least one container for executing the job request.
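The flow in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the template text, the image identifiers under `registry.example`, the `JobRequest` fields, and the functions `build_description_file` and `schedule_containers` are all hypothetical names chosen for this example. The lookup by (deep learning library type, job type) follows the wording of the claims.

```python
from dataclasses import dataclass
from string import Template

# Hypothetical pre-stored job description file templates,
# keyed by (deep learning library type, job type).
TEMPLATES = {
    ("tensorflow", "training"): Template(
        "name: $job_name\nimage: $image_id\ncommand: python $boot_file\n"
    ),
    ("mxnet", "inference"): Template(
        "name: $job_name\nimage: $image_id\ncommand: python $boot_file --serve\n"
    ),
}

# Hypothetical identifiers of pre-stored job basic images, keyed the same way.
IMAGE_IDS = {
    ("tensorflow", "training"): "registry.example/tf-train:1",
    ("mxnet", "inference"): "registry.example/mxnet-infer:1",
}


@dataclass
class JobRequest:
    library_type: str
    job_type: str
    job_name: str
    boot_file: str


def build_description_file(request: JobRequest) -> str:
    """Select the template and image identifier, then fill the template."""
    key = (request.library_type, request.job_type)
    template = TEMPLATES[key]   # target job description file template
    image_id = IMAGE_IDS[key]   # identifier of the target job basic image
    return template.substitute(
        job_name=request.job_name,
        image_id=image_id,
        boot_file=request.boot_file,
    )


def schedule_containers(description_file: str, count: int = 1) -> list:
    """Stand-in for a container scheduler: read the image id, 'create' containers."""
    fields = dict(line.split(": ", 1) for line in description_file.splitlines())
    image_id = fields["image"]
    if image_id not in IMAGE_IDS.values():
        raise LookupError(f"unknown job basic image: {image_id}")
    # A real scheduler would start containers; here each one is a plain record.
    return [{"image": image_id, "command": fields["command"]} for _ in range(count)]


if __name__ == "__main__":
    request = JobRequest("tensorflow", "training", "mnist", "train.py")
    description = build_description_file(request)
    print(description)
    print(schedule_containers(description, count=2))
```

Keeping templates and image identifiers in tables with the same key means a new library or job type is supported by adding one entry to each table, without changing the scheduling code.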
Opening claim text (preview).
What is claimed is:
1. A deep learning job scheduling method, comprising: obtaining a job request of a deep learning job comprising a deep learning library type and a job type; determining a target job description file template from a plurality of pre-stored job description file templates based on the deep learning library type and the job type; determining an identifier of a target job basic image from identifiers of a plurality of pre-stored job basic images based on the deep learning library type and the job type, wherein the pre-stored job basic images comprise an image of a deep learning library, an image of a dependency library, and an image of a deep learning program; generating a target job description file based on the target job description file template and the identifier; sending the target job description file to a container scheduler; selecting, by the container scheduler, the target job basic image from the pre-stored job basic images based on the target job description file; and creating a container for executing the job request.
2. The deep learning job scheduling method of claim 1, wherein the deep learning job comprises a task, wherein the job request further comprises one of the following two options: (1) at least one of a job name, a deep learning program storage location, an application boot file, a dataset storage location, a type of the task, a quantity of the task, a job command line parameter, or a resource requirement of the task, or (2) at least one of the job name, the deep learning program, the application boot file, the dataset storage location, the type, the quantity, the job command line parameter, or the resource requirement, and wherein the deep learning scheduling method further comprises generating the target job description file based on the job request, the target job description file template, and the identifier.
3. The deep learning job scheduling method of claim 2, further comprising filling the target job description file template with information comprised in the job request and the identifier to obtain the target job description file.
4. The deep learning job scheduling method of claim 1, wherein the dependency library is used when the deep learning job is executed, and wherein an instantiation of the deep learning program is the deep learning job.
5. The deep learning job scheduling method of claim 1, wherein the pre-stored job description file templates is based on deep learning library types and job types, wherein each of the pre-stored job description file templates corresponds to one deep learning library type and one job type, wherein the pre-stored job basic images are based on the deep learning library types and the job types, and wherein each of the pre-stored job basic images corresponds to one deep learning library type and one job type.
6. The deep learning job scheduling method of claim 1, wherein after sending the target job description file to the container scheduler, the deep learning job scheduling method further comprises storing a job identifier indicating the job request in a queue when the container scheduler fails in scheduling, wherein the job identifier comprises at least one of the job request, information comprised in the job request, the target job description file, a pointer, or a data structure, wherein the pointer points to at least one of the job request, the information comprised in the job request, and the target job description file, and wherein the data structure points to at least one of the job request, the information carried in the job request, and the target job description file.
7. The deep learning job scheduling method of claim 6, wherein after storing the job identifier, the deep learning job scheduling method further comprises: determining that the container scheduler has a condition for resubmitting a job request; extracting the job identifier from the queue; and resubmitting the job request to the container scheduler based on the job identifier.
8. A deep learning job scheduling system, comprising: one or more processors; a memory coupled to the one or more processors and configured to store instructions that, when executed by the one or more processors, cause a job scheduler to: obtain a job request of a deep learning job comprising a deep learning library type and a job type; determine a target job description file template from a plurality of pre-stored job description file templates based on the deep learning library type and the job type; determine an identifier of a target job basic image from identifiers of a plurality of pre-stored job basic images based on the deep learning library type and the job type, wherein the pre-stored job basic images comprise an image of a deep learning library, an image of a dependency library, and an image of a deep learning program; generate a target job description file based on the target job description file template and the identifier; and send the target job description file; and a container scheduler coupled to the job scheduler and configured to: receive the target job description file from the job scheduler; select the target job basic image from the pre-stored job basic images based on the target job description file; and create at least one container for executing the job request.
9. The deep learning job scheduling system of claim 8, wherein the deep learning job comprises a task, wherein the job request further comprises one of the following two options: (1) at least one of a job name, a deep learning program storage location, an application boot file, a dataset storage location, a type of the task, a quantity of the task, a job command line parameter, and a resource requirement of the task, or (2) at least one of the job name, the deep learning program, the application boot file, the dataset storage location, the type of the task, the quantity of the task, the job command line parameter, and the resource requirement, and wherein the job scheduler is further configured to generate the target job description file based on the job request, the target job description file template, and the identifier.
10. The deep learning job scheduling system of claim 9, wherein the one or more processors are further configured to execute the instructions to cause the job scheduler is to fill the target job description file template with information comprised in the job request and the identifier to obtain the target job description file.
11. The deep learning job scheduling system of claim 8, wherein the dependency library is used when the deep learning job is executed, and wherein an instantiation of the deep learning program is the deep learning job.
12. The deep learning job scheduling system of claim 8, wherein the plurality of pre-stored job description file templates is based on deep learning library types and job types, wherein each of the pre-stored job description file templates corresponds to one deep learning library type and one job type, wherein the pre-stored job basic images are based on the deep learning library types and the job types, and wherein each of the pre-stored job basic images corresponds to one deep learning library type and one job type.
13. The deep learning job scheduling system of claim 8, wherein the container scheduler is further configured to store a job identifier indicating the job request in a queue in response to the container scheduler fails in scheduling, wherein the job identifier comprises at least one of the job request, information comprised in the job request, the target job description file, a pointer, and a data structure, wherein the pointer points to at least
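Claims 6 and 7 describe queuing a job identifier when the container scheduler fails to schedule a job, then resubmitting once the scheduler is able to accept it. The following is a rough sketch of that behavior, assuming a first-in, first-out queue and a slot-counting stand-in for the container scheduler. The class names `FakeContainerScheduler` and `JobScheduler`, the `can_accept` check used as the resubmission condition, and the choice to use the description file itself as the job identifier are all assumptions made for illustration.

```python
from collections import deque


class FakeContainerScheduler:
    """Hypothetical stand-in: accepts a job only while free slots remain."""

    def __init__(self, free_slots: int):
        self.free_slots = free_slots
        self.running = []

    def can_accept(self) -> bool:
        # Assumed resubmission condition: at least one free slot.
        return self.free_slots > 0

    def schedule(self, description_file: str) -> bool:
        if not self.can_accept():
            return False  # scheduling failed
        self.free_slots -= 1
        self.running.append(description_file)
        return True


class JobScheduler:
    """Queues jobs that fail to schedule and resubmits them later."""

    def __init__(self, container_scheduler: FakeContainerScheduler):
        self.container_scheduler = container_scheduler
        # Job identifiers; in this sketch the description file itself.
        self.pending = deque()

    def submit(self, description_file: str) -> bool:
        if self.container_scheduler.schedule(description_file):
            return True
        self.pending.append(description_file)  # store identifier on failure
        return False

    def resubmit_pending(self) -> int:
        """Resubmit queued jobs in FIFO order while the scheduler can accept them."""
        resubmitted = 0
        while self.pending and self.container_scheduler.can_accept():
            job = self.pending.popleft()  # extract the job identifier
            if self.container_scheduler.schedule(job):
                resubmitted += 1
            else:
                self.pending.appendleft(job)  # keep original order, try later
                break
        return resubmitted


if __name__ == "__main__":
    containers = FakeContainerScheduler(free_slots=1)
    jobs = JobScheduler(containers)
    print(jobs.submit("job-a"), jobs.submit("job-b"))
    containers.free_slots += 1  # capacity becomes available again
    print(jobs.resubmit_pending(), containers.running)
```

A FIFO queue keeps resubmission order equal to arrival order; the claims list several possible forms for the job identifier, so a pointer or a small record could replace the description file here.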
Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title
Techniques for rebalancing the load in a distributed system · CPC title
Learning methods · CPC title
Machine learning · CPC title
the resource being a machine, e.g. CPUs, Servers, Terminals · CPC title