Parallel computing scheme generation for neural networks

US12468924B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-12468924-B2
Application numberUS-202217953991-A
CountryUS
Kind codeB2
Filing dateSep 27, 2022
Priority dateMar 27, 2020
Publication dateNov 11, 2025
Grant dateNov 11, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device receives a computation graph and transforms the computation graph into a dataflow graph comprising recursive subgraphs. Each recursive subgraph comprises a tuple of another recursive subgraph and an operator node, or an empty graph. The device determines a number of partitioning recursions based on a number of parallel computing devices. For each partitioning recursion, the device determines costs corresponding to operator nodes, determines a processing order of the recursive subgraphs, and processes the recursive subgraphs. To process a recursive subgraph, the device selects a partitioning axis for tensors associated with an operator node of the recursive subgraph. The device outputs a partitioning scheme comprising partitioning axes for each tensor associated with the operator nodes.

First claim

Opening claim text (preview).

What is claimed is: 1 . A device for determining a parallel computation scheme for a neural network, the device comprising at least one processor configured to: receive a computation graph for the neural network; transform the computation graph into a recursive dataflow graph comprising a plurality of recursive subgraphs, wherein each of the recursive subgraphs is respectively a tuple of another of the recursive subgraphs and an operator node; determine a number of partitioning recursions based on a number of parallel computing devices; for each of the partitioning recursions: determine a plurality of costs corresponding to a plurality of operator nodes associated with the recursive dataflow graph, determine a processing order of the plurality of recursive subgraphs based on a descending order of the plurality of costs, process the plurality of recursive subgraphs in the determined processing order, wherein processing a recursive subgraph, of the plurality of recursive subgraphs, comprises selecting a partitioning axis for tensors associated with an operator node of the recursive subgraph; output a partitioning scheme comprising partitioning axes for each of the tensors associated with the plurality of operator nodes; and wherein to select the partitioning axis for the tensors associated with the operator node based on an inter-operator communication cost comprising an amount of data to be communicated between the parallel computing devices for executing a neighboring operator node based on a shared tensor between the operator node and the neighboring operator node or for executing the operator node based on an output of the neighboring operator node. 2 . The device according to claim 1 , wherein the at least one processor is further configured to: determine the number of partitioning recursions such that 2 N is equal to the number of parallel computing devices, wherein N is the number of partitioning recursions. 3 . The device according to claim 1 , wherein the at least one processor is further configured to: determine the plurality of costs corresponding to the plurality of operator nodes based on an amount of data to be communicated between the parallel computing devices for each operator node. 4 . The device according to claim 1 , wherein the device at least one processor is further configured to: select the partitioning axis for the tensors associated with the operator node based on an intra-operator communication cost comprising an amount of data to be communicated between the parallel computing devices for the operator node. 5 . The device according to claim 4 , wherein the at least one processor is further configured to: select the partitioning axis for the tensors associated with the operator node based on the intra-operator communication cost, and based on determining that no partitioning axis has been determined for a neighboring operator node at a current partitioning recursion. 6 . The device according to claim 1 , wherein at least one processor is further configured to: select the partitioning axis for the tensors associated with the operator node based on an intra-operator communication cost comprising an amount of data to be communicated between the parallel computing devices for the operator node; and select the partitioning axis for the tensors associated with the operator node based on the intra-operator communication cost and the inter-operator communication cost, and based on determining that at least one partitioning axis has been determined for the neighboring operator node or another neighboring operator node at the current partitioning recursion. 7 . The device according to claim 1 , wherein the at least one processor is further configured to: determine whether the partitioning scheme complies with at least one memory requirement associated with the parallel computing devices; and output the partitioning scheme based on determining that the partitioning scheme complies with the at least one memory requirement associated with the parallel computing devices. 8 . A method for determining a parallel computation scheme for a neural network, the method comprising: receiving a computation graph for the neural network; transforming the computation graph into a recursive dataflow graph comprising a plurality of recursive subgraphs, wherein each of the recursive subgraphs respectively is a tuple of another one of recursive subgraphs and an operator node; determining a number of partitioning recursions based on a number of parallel computing devices; for each of the partitioning recursions: determining a plurality of costs corresponding to a plurality of operator nodes associated with the recursive dataflow graph, determining a processing order of the plurality of recursive subgraphs based on a descending order of the plurality of costs, and processing the plurality of recursive subgraphs in the determined processing order, wherein processing a recursive subgraph, of the recursive subgraphs, comprises selecting a partitioning axis for tensors associated with an operator node of the recursive subgraph; outputting a partitioning scheme comprising partitioning axes for each of the tensors associated with the plurality of operator nodes; and selecting the partitioning axis for the tensors associated with the operator node based on an inter-operator communication cost comprising an amount of data to be communicated between the parallel computing devices for executing a neighboring operator node based on a shared tensor between the operator node and the neighboring operator node or for executing the operator node based on an output of the neighboring operator node. 9 . The method according to claim 8 , the method further comprising: determining the number of partitioning recursions such that 2 N is equal to the number of parallel computing devices, wherein N is the number of partitioning recursions. 10 . The method according to claim 8 , the method further comprising: determining the plurality of costs corresponding to the plurality of operator nodes based on an amount of data to be communicated between the parallel computing devices for each operator node. 11 . The method according to claim 8 , the method further comprising: selecting the partitioning axis for the tensors associated with the operator node based on an intra-operator communication cost comprising an amount of data to be communicated between the parallel computing devices for the operator node. 12 . The method according to claim 11 , the method further comprising: selecting the partitioning axis for the tensors associated with the operator node based on the intra-operator communication cost, and based on determining that no partitioning axis has been determined for a neighboring operator node at a current partitioning recursion. 13 . The method according to claim 8 , the method further comprising: selecting the partitioning axis for the tensors associated with the operator node based on the intra-operator communication cost, and based on determining that no partitioning axis has been determined for a neighboring operator node at a current partitioning recursion, and selecting the partitioning axis for the tensors associated with the operator node based on the intra-operator communication cost and the inter-operator communication cost, and based on determining that at least one partitioning axis has been determined for the neighboring operator node or another neighboring operator node at a current partitioning recursion. 14 . The method according to claim 8 , the method further comprising: determ

Assignees

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Distributed learning, e.g. federated learning · CPC title

  • Combinations of networks · CPC title

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration (scheduling strategies G06F9/4881 and subgroups) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12468924B2 cover?
A device receives a computation graph and transforms the computation graph into a dataflow graph comprising recursive subgraphs. Each recursive subgraph comprises a tuple of another recursive subgraph and an operator node, or an empty graph. The device determines a number of partitioning recursions based on a number of parallel computing devices. For each partitioning recursion, the device dete…
Who is the assignee on this patent?
Huawei Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/06. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Nov 11 2025 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).