What technology area does this patent fall under?

Primary CPC classification G06N3/06. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue Nov 11 2025 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Parallel computing scheme generation for neural networks

US12468924B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12468924-B2
Application number	US-202217953991-A
Country	US
Kind code	B2
Filing date	Sep 27, 2022
Priority date	Mar 27, 2020
Publication date	Nov 11, 2025
Grant date	Nov 11, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device receives a computation graph and transforms the computation graph into a dataflow graph comprising recursive subgraphs. Each recursive subgraph comprises a tuple of another recursive subgraph and an operator node, or an empty graph. The device determines a number of partitioning recursions based on a number of parallel computing devices. For each partitioning recursion, the device determines costs corresponding to operator nodes, determines a processing order of the recursive subgraphs, and processes the recursive subgraphs. To process a recursive subgraph, the device selects a partitioning axis for tensors associated with an operator node of the recursive subgraph. The device outputs a partitioning scheme comprising partitioning axes for each tensor associated with the operator nodes.

First claim

Opening claim text (preview).

What is claimed is: 1 . A device for determining a parallel computation scheme for a neural network, the device comprising at least one processor configured to: receive a computation graph for the neural network; transform the computation graph into a recursive dataflow graph comprising a plurality of recursive subgraphs, wherein each of the recursive subgraphs is respectively a tuple of another of the recursive subgraphs and an operator node; determine a number of partitioning recursions based on a number of parallel computing devices; for each of the partitioning recursions: determine a plurality of costs corresponding to a plurality of operator nodes associated with the recursive dataflow graph, determine a processing order of the plurality of recursive subgraphs based on a descending order of the plurality of costs, process the plurality of recursive subgraphs in the determined processing order, wherein processing a recursive subgraph, of the plurality of recursive subgraphs, comprises selecting a partitioning axis for tensors associated with an operator node of the recursive subgraph; output a partitioning scheme comprising partitioning axes for each of the tensors associated with the plurality of operator nodes; and wherein to select the partitioning axis for the tensors associated with the operator node based on an inter-operator communication cost comprising an amount of data to be communicated between the parallel computing devices for executing a neighboring operator node based on a shared tensor between the operator node and the neighboring operator node or for executing the operator node based on an output of the neighboring operator node. 2 . The device according to claim 1 , wherein the at least one processor is further configured to: determine the number of partitioning recursions such that 2 N is equal to the number of parallel computing devices, wherein N is the number of partitioning recursions. 3 . The device according to claim 1 , wherein the at least one processor is further configured to: determine the plurality of costs corresponding to the plurality of operator nodes based on an amount of data to be communicated between the parallel computing devices for each operator node. 4 . The device according to claim 1 , wherein the device at least one processor is further configured to: select the partitioning axis for the tensors associated with the operator node based on an intra-operator communication cost comprising an amount of data to be communicated between the parallel computing devices for the operator node. 5 . The device according to claim 4 , wherein the at least one processor is further configured to: select the partitioning axis for the tensors associated with the operator node based on the intra-operator communication cost, and based on determining that no partitioning axis has been determined for a neighboring operator node at a current partitioning recursion. 6 . The device according to claim 1 , wherein at least one processor is further configured to: select the partitioning axis for the tensors associated with the operator node based on an intra-operator communication cost comprising an amount of data to be communicated between the parallel computing devices for the operator node; and select the partitioning axis for the tensors associated with the operator node based on the intra-operator communication cost and the inter-operator communication cost, and based on determining that at least one partitioning axis has been determined for the neighboring operator node or another neighboring operator node at the current partitioning recursion. 7 . The device according to claim 1 , wherein the at least one processor is further configured to: determine whether the partitioning scheme complies with at least one memory requirement associated with the parallel computing devices; and output the partitioning scheme based on determining that the partitioning scheme complies with the at least one memory requirement associated with the parallel computing devices. 8 . A method for determining a parallel computation scheme for a neural network, the method comprising: receiving a computation graph for the neural network; transforming the computation graph into a recursive dataflow graph comprising a plurality of recursive subgraphs, wherein each of the recursive subgraphs respectively is a tuple of another one of recursive subgraphs and an operator node; determining a number of partitioning recursions based on a number of parallel computing devices; for each of the partitioning recursions: determining a plurality of costs corresponding to a plurality of operator nodes associated with the recursive dataflow graph, determining a processing order of the plurality of recursive subgraphs based on a descending order of the plurality of costs, and processing the plurality of recursive subgraphs in the determined processing order, wherein processing a recursive subgraph, of the recursive subgraphs, comprises selecting a partitioning axis for tensors associated with an operator node of the recursive subgraph; outputting a partitioning scheme comprising partitioning axes for each of the tensors associated with the plurality of operator nodes; and selecting the partitioning axis for the tensors associated with the operator node based on an inter-operator communication cost comprising an amount of data to be communicated between the parallel computing devices for executing a neighboring operator node based on a shared tensor between the operator node and the neighboring operator node or for executing the operator node based on an output of the neighboring operator node. 9 . The method according to claim 8 , the method further comprising: determining the number of partitioning recursions such that 2 N is equal to the number of parallel computing devices, wherein N is the number of partitioning recursions. 10 . The method according to claim 8 , the method further comprising: determining the plurality of costs corresponding to the plurality of operator nodes based on an amount of data to be communicated between the parallel computing devices for each operator node. 11 . The method according to claim 8 , the method further comprising: selecting the partitioning axis for the tensors associated with the operator node based on an intra-operator communication cost comprising an amount of data to be communicated between the parallel computing devices for the operator node. 12 . The method according to claim 11 , the method further comprising: selecting the partitioning axis for the tensors associated with the operator node based on the intra-operator communication cost, and based on determining that no partitioning axis has been determined for a neighboring operator node at a current partitioning recursion. 13 . The method according to claim 8 , the method further comprising: selecting the partitioning axis for the tensors associated with the operator node based on the intra-operator communication cost, and based on determining that no partitioning axis has been determined for a neighboring operator node at a current partitioning recursion, and selecting the partitioning axis for the tensors associated with the operator node based on the intra-operator communication cost and the inter-operator communication cost, and based on determining that at least one partitioning axis has been determined for the neighboring operator node or another neighboring operator node at a current partitioning recursion. 14 . The method according to claim 8 , the method further comprising: determ

Assignees

Huawei Tech Co Ltd

Inventors

Classifications

G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/098
Distributed learning, e.g. federated learning · CPC title
G06N3/045
Combinations of networks · CPC title
G06N5/01
Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title
G06F9/5038
considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration (scheduling strategies G06F9/4881 and subgroups) · CPC title

Patent family

Related publications grouped by family.

View patent family 70057127

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12468924B2 cover?: A device receives a computation graph and transforms the computation graph into a dataflow graph comprising recursive subgraphs. Each recursive subgraph comprises a tuple of another recursive subgraph and an operator node, or an empty graph. The device determines a number of partitioning recursions based on a number of parallel computing devices. For each partitioning recursion, the device dete…
Who is the assignee on this patent?: Huawei Tech Co Ltd
What technology area does this patent fall under?: Primary CPC classification G06N3/06. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue Nov 11 2025 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

How to read this patent

Abstract

First claim

Assignees

Inventors

Classifications

Patent family

External sources

Related patents

Reducing computation in neural networks using self-modifying code

Learning to search deep network architectures

Behavioral modeling of a data center utilizing human knowledge to enhance a machine learning algorithm

Building automation system with a parallel relationship computation engine

Hardware accelerator template and design framework for implementing recurrent neural networks

Hybrid parallelization strategies for machine learning programs on top of mapreduce

Frequently asked questions