Task parallel processing method, apparatus and system, storage medium and computer device

US11221877B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11221877-B2
Application numberUS-201916575344-A
CountryUS
Kind codeB2
Filing dateSep 18, 2019
Priority dateNov 20, 2017
Publication dateJan 11, 2022
Grant dateJan 11, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure provides a task parallel processing method, a device, a system, a storage medium and computer equipment, which are capable of distributing and regulating tasks to be executed according to a task directed acyclic graph, and may thereby realize task parallelism of a multi-core processor and improve the efficiency of data processing.

First claim

Opening claim text (preview).

The invention claimed is: 1. A method, implemented by at least one processor, for processing a plurality of tasks in parallel, the method comprising: generating a directed acyclic graph based on dependencies among the plurality of tasks that are to be executed by the at least one processor; distributing the plurality of tasks to a plurality of work queues of the at least one processor based on the directed acyclic graph; and regulating, in respective work queues, one or more distributed tasks to be executed in parallel based on the dependencies among the one or more distributed tasks indicated by the directed acyclic graph; wherein distributing the plurality of tasks to the plurality of work queues of the at least one processor comprises: topologically sorting the directed acyclic graph to obtain topologically sorted sequences of the plurality of tasks; sorting the topologically sorted sequences based on preset amounts of execution time of the respective tasks; obtaining a longest sequence from the topologically sorted sequences; and distributing the plurality of tasks to the respective work queues based on the longest sequence and the dependencies among the plurality of tasks. 2. The method of claim 1 , further comprising: prior to generating the directed acyclic graph, partitioning a program based on at least one of operation nodes or data nodes of the program; and obtaining the plurality of tasks based on the partition. 3. The method of claim 2 , wherein: partitioning the program comprises partitioning the program based on the operation nodes; the program comprises an operation request, the operation request comprising a model; and partitioning the program based on the operation nodes comprises: partitioning at least one of the model or input data of the model. 4. The method of claim 3 , wherein: partitioning the program comprises partitioning the model; and partitioning the model comprises: setting weights corresponding to the obtained plurality of tasks, respectively; and setting correspondences between input data and output data of the obtained plurality of tasks based on the weights. 5. The method of claim 3 , wherein: partitioning the program comprises partitioning the model; and partitioning the model comprises: partitioning, based on a preset rule, the model in at least one of a height-width direction or a channel direction of the model. 6. The method of claim 3 , wherein: partitioning the program comprises partitioning the input data of the model; and partitioning the input data of the model comprises: partitioning, based on a preset rule, the input data of the model in a height-width direction of the input data. 7. The method of claim 2 , wherein: partitioning the program comprises partitioning the program based on the operation nodes; the program comprises an operation request, wherein the operation request does not comprise a model; and partitioning the program based on the operation nodes comprises: partitioning at least one of input data or output data of the operation request. 8. The method of claim 7 , wherein partitioning at least one of the input data or output data of the operation request comprises: partitioning, based on a preset rule, at least one of the input data or output data in a height-width direction of the corresponding input data or output data. 9. The method of claim 1 , wherein generating the directed acyclic graph comprises: determining parallel nodes and sequential nodes of the directed acyclic graph based on the dependencies among the plurality of tasks; and generating the directed acyclic graph based on the parallel nodes and the sequential nodes. 10. The method of claim 1 , wherein regulating, in the respective work queues, the one or more distributed tasks to be executed in parallel comprises: setting reference counts for the one or more distributed tasks to be executed in parallel based on the directed acyclic graph; when a first task to be executed is already executed, modifying the reference count of a second task to be executed, wherein the second task depends on the first task; and when the reference count of a task to be executed reaches a preset value, controlling, in the respective work queues, tasks to be executed whose reference counts reach the preset value to start operating. 11. A non-transitory computer readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform operations for processing a plurality of tasks in parallel, the operations comprising: generating a directed acyclic graph based on dependencies among the plurality of tasks that are to be executed by the at least one processor; distributing the plurality of tasks to a plurality of work queues of the at least one processor based on the directed acyclic graph; and regulating, in respective work queues, one or more distributed tasks to be executed in parallel based on the dependencies among the one or more distributed tasks indicated by the directed acyclic graph; wherein distributing the plurality of tasks to the plurality of work queues of the at least one processor comprises: topologically sorting the directed acyclic graph to obtain topologically sorted sequences of the plurality of tasks; sorting the topologically sorted sequences based on preset amounts of execution time of the respective tasks; obtaining a longest sequence from the topologically sorted sequences; and distributing the plurality of tasks to the respective work queues based on the longest sequence and the dependencies among the plurality of tasks. 12. A system for processing a plurality of tasks in parallel, comprising: a memory storing computer-readable instructions; and at least one processor coupled to the memory, wherein the computer-readable instructions, when executed by the at least one processor, cause the at least one processor to perform operations comprising: generating a directed acyclic graph based on dependencies among the plurality of tasks that are to be executed by the at least one processor; distributing the plurality of tasks to a plurality of work queues of the at least one processor based on the directed acyclic graph; and regulating, in respective work queues, one or more distributed tasks to be executed in parallel based on the dependencies among the one or more distributed tasks indicated by the directed acyclic graph; wherein distributing the plurality of tasks to the plurality of work queues of the at least one processor comprises: topologically sorting the directed acyclic graph to obtain topologically sorted sequences of the plurality of tasks; sorting the topologically sorted sequences based on preset amounts of execution time of the respective tasks; obtaining a longest sequence from the topologically sorted sequences; and distributing the plurality of tasks to the respective work queues based on the longest sequence and the dependencies among the plurality of tasks. 13. The system of claim 12 , wherein the operations comprise: prior to generating the directed acyclic graph, partitioning a program based on at least one of operation nodes or data nodes of the program; and obtaining the plurality of tasks based on the partition. 14. The system of claim 13 , wherein the at least one processor includes a multiple-core processor configured to execute the operations for partitioning the program. 15. The system of claim 13 , wherein: the at least one processor includes first and second processors; the first processor is configured to execut

Assignees

Inventors

Classifications

  • Combinations of networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Concurrent instruction execution, e.g. pipeline or look ahead · CPC title

  • Interfaces, programming languages or software development kits, e.g. for simulating neural networks · CPC title

  • Remote procedure calls [RPC]; Web services · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11221877B2 cover?
The present disclosure provides a task parallel processing method, a device, a system, a storage medium and computer equipment, which are capable of distributing and regulating tasks to be executed according to a task directed acyclic graph, and may thereby realize task parallelism of a multi-core processor and improve the efficiency of data processing.
Who is the assignee on this patent?
Shanghai Cambricon Inf Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F9/4881. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jan 11 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).