Pipelining and parallelizing graph execution method for neural network model computation and apparatus thereof

US12468921B2 · US · B2

Patent metadata
Field | Value
Publication number | US-12468921-B2
Application number | US-202217838342-A
Country | US
Kind code | B2
Filing date | Jun 13, 2022
Priority date | Apr 27, 2022
Publication date | Nov 11, 2025
Grant date | Nov 11, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure provides a pipelining and parallelizing graph execution method for neural network model computation and apparatus, and provides a pipelining and parallelizing graph execution method for neural network model computation and apparatus in a deep learning training system. The method includes the graph execution flow in a neural network model computation process and a process of cooperative work of all functional modules. The pipelining and parallelizing graph execution method for neural network model computation includes creating a graph executive on a native machine according to a physical computation graph compiled and generated by a deep learning framework.

First claim

Opening claim text (preview).

What is claimed is:

1. A pipelining and parallelizing graph execution method for neural network model computation, wherein several executives are provided in a neural network model; a total of 2*N executives are provided, and N is a positive integer; several memory blocks are provided in the executive; the method specifically comprises the following steps:

    S1, dividing training data into several batches of subdata;

    S2, inputting the several batches of subdata into the neural network model in sequence; executing, by an nth executive, self-kernel function computation on an ith batch of subdata after the ith batch of subdata is input, and writing an execution result into an idle memory block of the nth executive; then inputting an (i+1)th batch of subdata, wherein i and n are both positive integers;

    S3, executing, by the nth executive, the operation in S2 on the (i+1)th batch of subdata, and sending an address of the memory block where the ith batch is located to an (n+1)th executive after the (i+1)th batch of subdata is input; parsing, by the (n+1)th executive, the memory block where the ith batch is located to obtain an execution result of the nth executive on the ith batch of subdata, executing the self-kernel function computation by taking the execution result of the nth executive as input data of the (n+1)th executive, and writing the execution result into an idle memory block of the (n+1)th executive; then inputting an (i+2)th batch of subdata;

    S4, executing, by the nth executive, the operation in S2 on the (i+2)th batch of subdata, and executing, by the nth executive and the (n+1)th executive, the operation in S3 on the (i+1)th batch of subdata after the (i+2)th batch of subdata is input; at the same time, sending, by the (n+1)th executive, the address of the memory block where the ith batch is located to an (n+2)th executive; parsing, by the (n+2)th executive, the memory block where the ith batch is located to obtain an execution result of the (n+1)th executive on the ith batch of subdata, executing the self-kernel function computation by taking the execution result of the (n+1)th executive as input data of the (n+2)th executive, and writing the execution result into an idle memory block of the (n+2)th executive;

    S5, reclaiming, by the nth executive, the memory block sent to the (n+1)th executive; and

    S6, executing, by the last executive, the self-kernel function computation; writing the execution result to a memory block of the last executive; and reclaiming the memory block on its own immediately at the end of the execution.

2. The pipelining and parallelizing graph execution method for neural network model computation according to claim 1, wherein before executing the self-kernel function computation, an executive may check whether there is an idle memory block in the executive, execute the self-kernel function computation on the ith batch of subdata under the condition that there is an idle memory block, and otherwise, instruct the ith batch to wait for an idle memory block.

3. The pipelining and parallelizing graph execution method for neural network model computation according to claim 2, wherein for an (N*n+1)th batch of subdata, before executing the self-kernel function computation, the executive may check whether the executive where an (N*n−1)th batch of subdata is located completes execution, wherein n is a positive integer.

4. The pipelining and parallelizing graph execution method for neural network model computation according to claim 1, wherein the step S5 specifically comprises the following operations:

    S51, informing, by the (n+1)th executive, the nth executive that the memory block sent to the (n+1)th executive has been consumed;

    S52, reclaiming, by the nth executive, the memory block sent to the (n+1)th executive, and marking the memory block as being free.

5. The pipelining and parallelizing graph execution method for neural network model computation according to claim 1, further comprising constructing an executive, wherein the constructing an executive specifically comprises the following substeps:

    S01, creating an operator kernel function task queue: adding a current operator kernel function computation task into a current kernel function task queue in sequence;

    S02, creating a thread of an executive: acquiring, by the thread of the executive, a current task to be processed in sequence from the kernel function task queue, and submitting the current task to be processed to a thread pool;

    S03, creating an executive of a kernel function: creating an executive used for operator kernel function computation according to a current kernel function task and context information of a current thread, and using the executive to run the kernel function task in the task queue;

    S04, creating an event recall queue: adding tasks that have been processed by a task executive into an event recall queue;

    S05, creating a thread of the event recall queue: taking out and returning, by the thread of the event recall queue, the tasks that have been processed in the event recall queue.

6. A neural network model computation-oriented graph execution apparatus, comprising an executive construction module and an executive pipelining and parallelizing working module, wherein the executive construction module is configured to construct an executive; and the executive pipelining and parallelizing working module is configured to implement the pipelining and parallelizing graph execution method for neural network model computation according to claim 1.

7. A neural network model computation-oriented graph execution apparatus, comprising a memory and one or more processors, wherein the memory stores an executable code; and the one or more processors, when executing the executable code, implement the pipelining and parallelizing graph execution method for neural network model computation according to claim 1.

8. A computer-readable storage medium on which a program is stored, wherein the program, when executed by a processor, implements the pipelining and parallelizing graph execution method for neural network model computation according to claim 1.
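
Illustrative sketch (not part of the patent text). The memory-block handoff in claims 1, 2 and 4 may be easier to follow as a small simulation. The Python below is a minimal sketch under stated assumptions: each executive is modeled as one thread, a memory block "address" is modeled as a list index, and the names `Executive`, `run_pipeline`, `inbox` and `reclaim` are hypothetical, chosen only for this example. It does not model claim 3 or the executive construction steps of claim 5, and it is not the patented implementation.

```python
import queue
import threading


class Executive:
    """One pipeline stage owning a small pool of reusable memory blocks."""

    def __init__(self, kernel, num_blocks=2):
        self.kernel = kernel               # this stage's kernel function
        self.blocks = [None] * num_blocks  # memory blocks that hold results
        self.free = queue.Queue()          # indices of idle memory blocks
        for idx in range(num_blocks):
            self.free.put(idx)
        self.inbox = queue.Queue()         # messages from the upstream stage
        self.downstream = None             # the (n+1)th executive, if any
        self.results = []                  # filled only by the last executive

    def reclaim(self, idx):
        """S52: mark a block as free once its consumer is done with it."""
        self.blocks[idx] = None
        self.free.put(idx)

    def run(self):
        while True:
            msg = self.inbox.get()
            if msg is None:                # end of stream: forward and stop
                if self.downstream is not None:
                    self.downstream.inbox.put(None)
                return
            batch_id, producer, payload = msg
            out_idx = self.free.get()      # claim 2: wait for an idle block
            # Read the input: raw data for the first stage, otherwise the
            # upstream block whose index (the "address") was sent to us.
            data = payload if producer is None else producer.blocks[payload]
            self.blocks[out_idx] = self.kernel(data)
            if producer is not None:
                producer.reclaim(payload)  # S51: report the block as consumed
            if self.downstream is not None:
                # S3/S4: send only the block address to the next executive.
                self.downstream.inbox.put((batch_id, self, out_idx))
            else:
                # S6: the last executive reclaims its own block immediately.
                self.results.append(self.blocks[out_idx])
                self.reclaim(out_idx)


def run_pipeline(kernels, batches, num_blocks=2):
    """Feed batches in sequence (S1/S2) through one executive per kernel."""
    execs = [Executive(k, num_blocks) for k in kernels]
    for up, down in zip(execs, execs[1:]):
        up.downstream = down
    threads = [threading.Thread(target=e.run) for e in execs]
    for t in threads:
        t.start()
    for i, batch in enumerate(batches):
        execs[0].inbox.put((i, None, batch))
    execs[0].inbox.put(None)
    for t in threads:
        t.join()
    return execs[-1].results, execs


if __name__ == "__main__":
    # Four executives (2*N with N = 2), each with a toy element-wise kernel.
    kernels = [
        lambda xs: [x + 1 for x in xs],
        lambda xs: [x * 2 for x in xs],
        lambda xs: [x - 3 for x in xs],
        lambda xs: [x * x for x in xs],
    ]
    results, _ = run_pipeline(kernels, [[1, 2], [3, 4], [5, 6]])
    print(results)
```

Because every executive blocks on `free.get()` until one of its own blocks is idle, an upstream stage can run at most `num_blocks` batches ahead of its consumer. That back-pressure is one plausible reading of the "wait for an idle memory block" condition in claim 2.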

Assignees

Zhejiang Lab

Inventors

Classifications

  • using a plurality of independent parallel functional units · CPC title

  • Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title

  • Task decomposition · CPC title

  • by program, e.g. task dispatcher, supervisor, operating system · CPC title

  • Mechanisms to release resources · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12468921B2 cover?
The present disclosure provides a pipelining and parallelizing graph execution method for neural network model computation and apparatus, and provides a pipelining and parallelizing graph execution method for neural network model computation and apparatus in a deep learning training system. The method includes the graph execution flow in a neural network model computation process and a process …
Who is the assignee on this patent?
Zhejiang Lab
What technology area does this patent fall under?
Primary CPC classification G06N3/04. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 11, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).