What technology area does this patent fall under?

Primary CPC classification G06F7/5443. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue Dec 09 2025 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

SGCNAX: a scalable graph convolutional neural network accelerator with workload balancing

US12493779B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12493779-B2
Application number	US-202117551876-A
Country	US
Kind code	B2
Filing date	Dec 15, 2021
Priority date	Dec 15, 2020
Publication date	Dec 9, 2025
Grant date	Dec 9, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

We introduce SGCNAX, a scalable GCN accelerator architecture for the high-performance and energy-efficient acceleration of GCNs. Unlike prior GCN accelerators that either employ limited loop optimization techniques, or determine the design variables based on random sampling, we systematically explore the loop optimization techniques for GCN acceleration and provide a flexible GCN dataflow that adapts to different GCN configurations to achieve optimal efficiency. We further provide two hardware-based techniques to address the workload imbalance problem caused by the unbalanced distribution of zeros in GCNs. Specifically, SGCNAX exploits an outer-product-based computation architecture that mitigates the intra-PE (Processing Elements) workload imbalance, and employs a group-and-shuffle approach to mitigate the inter-PE workload imbalance.

First claim

Opening claim text (preview).

The invention claimed is: 1 . A scalable graph convolutional neural network (GCN) accelerator, comprising: an accelerator circuit including: a processing element (PE) array that includes a plurality of PEs, wherein the plurality of PEs each include; a multiply-and-accumulate (MAC) array configured to perform multiplication and accumulation, a sparse matrix buffer configured to store data of a sparse matrix, an input dense matrix buffer configured to store data of a first dense matrix, an output dense matrix buffer configured to store data of a second dense matrix; and a PE control unit configured to: control data reading from the sparse matrix buffer, the input dense matrix buffer, and the output dense matrix buffer to the MAC array, and control data writing from the MAC array to the output dense matrix buffer. 2 . The accelerator of claim 1 , wherein the multiply-and-accumulate (MAC) array is configured to: receive a non-zero scalar and a first position associated with the non-zero scalar from the sparse matrix buffer; receive a first row vector and a second position associated with a first element in the first row vector from the input dense matrix buffer; and multiply each of element values in the first row vector with the non-zero scalar to produce a second row vector. 3 . The accelerator of claim 2 , wherein the MAC array is further configured to: receive a third row vector from the output dense matrix buffer according to a third position; add the second row vector to the third row vector to generate a fourth row vector; and store the fourth row vector to the output dense matrix buffer according to the third position. 4 . The accelerator of claim 2 , wherein: the data of the sparse matrix includes a block of an input feature matrix, and the non-zero scalar is a scalar of the block of the input feature matrix; the data of the first dense matrix includes a block of an weight matrix, and the first row vector is a row vector of the block of the weight matrix; and the MAC array is configured to multiply each of element values in the first row vector of the weight matrix with the non-zero scalar of the input feature matrix to produce a second row vector. 5 . The accelerator of claim 4 , wherein the MAC array is further configured to: receive a third row vector from the output dense matrix buffer according to a third position, wherein the second row vector and the third row vector each are a partial product for an intermediate matrix; add the second row vector to the third row vector to generate a fourth row vector for the intermediate matrix; and store the fourth row vector to the output dense matrix buffer according to the third position. 6 . The accelerator of claim 2 , wherein: the data of the sparse matrix includes a block of an adjacency matrix and, the non-zero scalar is a scalar of the block of the adjacency matrix; the data of the first dense matrix includes a block of an intermediate matrix, and the first row vector is a row vector of the block of the intermediate matrix; and the MAC array is configured to multiply each of element values in the first row vector of the intermediate matrix with the non-zero scalar of the adjacency matrix to produce a second row vector. 7 . The accelerator of claim 6 , wherein the MAC array is further configured to: receive a third row vector from the output dense matrix buffer according to a third position, wherein the second row vector and the third row vector each are a partial product for an output matrix; add the second row vector to the third row vector to generate a fourth row vector for the output matrix; and store the fourth row vector to the output dense matrix buffer according to the third position. 8 . The accelerator of claim 1 , further comprising: a processing unit configured to: receive a graph input to be processed using a graph convolutional neural network having a plurality of layers, determine a partitioning of each of the plurality of layers of the graph convolutional neural network into a sequence of sublayers, and determine a processing order of the sublayers in the sequence; wherein: a PE of the plurality of PEs is configured to process the graph input by processing the sublayers in the sequence according to the determined processing order. 9 . The accelerator of claim 1 , wherein the accelerator circuit further includes: a data dispatcher coupled to the PE array and configured to distribute input data to the PE array; and a permutation network coupled to the PE array and configured to collect output data from the PE array. 10 . The accelerator of claim 9 , wherein: the data dispatcher is configured to group a plurality of rows of a matrix into a plurality of groups according to a density-sorted rank order; the data dispatcher is further configured to map the plurality of groups to the plurality of PEs of the PE array, respectively; and the permutation network is configured to unshuffle outputs from the plurality of PEs of the PE array. 11 . The accelerator of claim 9 , wherein: the accelerator chip further includes an accumulator buffer; and the permutation network is configured to send output data from the PE array to the accumulator buffer. 12 . The accelerator of claim 1 , further comprising: an off-chip memory external to the accelerator circuit; wherein the accelerator circuit further includes: a first buffer that is inside the accelerator chip; and a circuit control unit configured to control the reading of data from the off-chip memory to the first buffer.

Assignees

Univ George Washington

Inventors

Classifications

G06F17/16
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title
G06F7/5443Primary
Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/0895
Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title

Patent family

Related publications grouped by family.

View patent family 81942565

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12493779B2 cover?: We introduce SGCNAX, a scalable GCN accelerator architecture for the high-performance and energy-efficient acceleration of GCNs. Unlike prior GCN accelerators that either employ limited loop optimization techniques, or determine the design variables based on random sampling, we systematically explore the loop optimization techniques for GCN acceleration and provide a flexible GCN dataflow that …
Who is the assignee on this patent?: Univ George Washington
What technology area does this patent fall under?: Primary CPC classification G06F7/5443. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue Dec 09 2025 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

How to read this patent

Abstract

First claim

Assignees

Inventors

Classifications

Patent family

External sources

Related patents

Time Domain Unrolling Sparse Matrix Multiplication System and Method

Accelerator for sparse-dense matrix multiplication

Hardware accelerator architecture for processing very-sparse and hyper-sparse matrix data

Sparse convolutional neural network accelerator

Frequently asked questions