What technology area does this patent fall under?

Primary CPC classification G06N3/063. Mapped technology areas include Physics.

When was this patent published?

Publication date Dec 19, 2023 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Parallel computational architecture with reconfigurable core-level and vector-level parallelism

US11847553B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11847553-B2
Application number	US-201816008949-A
Country	US
Kind code	B2
Filing date	Jun 14, 2018
Priority date	Jun 14, 2018
Publication date	Dec 19, 2023
Grant date	Dec 19, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Neural network processing hardware using parallel computational architectures with reconfigurable core-level and vector-level parallelism is provided. In various embodiments, a neural network model memory is adapted to store a neural network model comprising a plurality of layers. Each layer has at least one dimension and comprises a plurality of synaptic weights. A plurality of neural cores is provided. Each neural core includes a computation unit and an activation memory. The computation unit is adapted to apply a plurality of synaptic weights to a plurality of input activations to produce a plurality of output activations. The computation unit has a plurality of vector units. The activation memory is adapted to store the input activations and the output activations. The system is adapted to partition the plurality of cores into a plurality of partitions based on dimensions of the layer and the vector units.
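The partitioning rule in the abstract can be illustrated with a small sketch. This is not the patented implementation: the function name `plan_partitions`, the `PartitionPlan` record, and the floor/ceiling arithmetic are all assumptions made for illustration. The only thing taken from the publication is the idea of comparing a layer's feature dimension with the number of vector units per core, and subdividing a core when the layer is narrower than the core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionPlan:
    """Hypothetical record describing how cores are arranged for one layer."""
    slices_per_core: int      # logical sub-cores carved out of one physical core
    cores_per_partition: int  # physical cores cooperating on one set of outputs


def plan_partitions(feature_dim: int, vector_units: int, num_cores: int) -> PartitionPlan:
    """Compare a layer's feature dimension with the vector units per core.

    Assumed policy (not stated in the publication):
      * narrow layer  -> split each core into floor(V / F) slices,
                         leftover vector units stay idle;
      * wide layer    -> group ceil(F / V) cores into one partition,
                         capped at the number of cores available.
    """
    if feature_dim <= 0 or vector_units <= 0 or num_cores <= 0:
        raise ValueError("all sizes must be positive")

    if feature_dim < vector_units:
        # The layer is narrower than one core: subdivide the core.
        return PartitionPlan(slices_per_core=vector_units // feature_dim,
                             cores_per_partition=1)

    # The layer is at least as wide as one core: spread it over several cores.
    needed = -(-feature_dim // vector_units)  # ceiling division
    return PartitionPlan(slices_per_core=1,
                         cores_per_partition=min(needed, num_cores))


if __name__ == "__main__":
    print(plan_partitions(feature_dim=8, vector_units=32, num_cores=16))
    print(plan_partitions(feature_dim=100, vector_units=32, num_cores=16))
```

In this sketch a layer with 8 features on 32-wide cores yields four slices per core, while a layer with 100 features spans four cores per partition. A real design would also weigh spatial dimensions, as the dependent claims describe.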

First claim

Opening claim text (preview).

What is claimed is:
1. A system comprising: a neural network model memory adapted to store a neural network model comprising a plurality of layers, each layer having at least one dimension and comprising a plurality of synaptic weights; a plurality of neural cores, each neural core comprising a computation unit, the computation unit adapted to apply a plurality of synaptic weights to a plurality of input activations to produce a plurality of output activations, the computation unit having a plurality of vector units, and an activation memory adapted to store the input activations and the output activations; wherein the system is adapted to partition the plurality of cores into a plurality of partitions based on a comparison of at least a portion of dimensions of a layer and a quantity of the vector units, wherein the comparison includes a comparison of a size of the feature dimensions for the layer with the quantity of the vector units, and wherein partitioning the plurality of cores comprises at least one of the plurality of cores being subdivided when the size of the feature dimensions of the layer is less than the quantity of the vector units.
2. The system of claim 1, further comprising: at least one controller operatively coupled to the neural network model memory and to the plurality of cores, the at least one controller being adapted to, for each layer of the neural network model configure the plurality of cores to implement the layer, and provide input activations for the layer to the plurality of cores.
3. The system of claim 2, further comprising a network on a chip (NoC) coupled to the plurality of cores.
4. The system of claim 3, wherein input activations are provided to the plurality of cores via the NoC.
5. The system of claim 3, wherein configuring the plurality of cores comprises distributing parameters to the plurality of cores via the NoC.
6. The system of claim 5, wherein configuring the plurality of cores further comprises distributing instructions to the plurality of cores via the NoC.
7. The system of claim 1, wherein the plurality of partitions for each layer is further determined based on spatial dimensions of the input activations for that layer.
8. The system of claim 1, wherein the plurality of partitions for each layer is further determined based on spatial dimensions and a size of the feature dimensions of the input activations for that layer.
9. The system of claim 1, wherein the plurality of partitions for each layer is further determined based on spatial dimensions of the output activations for that layer.
10. The system of claim 1, wherein the plurality of partitions for each layer is further determined based on spatial dimensions and feature dimensions of the output activations for that layer.
11. The system of claim 1, wherein the plurality of partitions for each layer is further determined based on one or more of spatial dimensions of the input activations, feature dimensions of the input activations, spatial dimensions of the output activations, or feature dimensions of the output activations for that layer.
12. The system of claim 11, wherein the plurality of partitions for each layer is further determined by a dimension of the plurality of cores.
13. The system of claim 1, wherein the cores within each of the plurality of partitions are configured to compute partial sums.
14. The system of claim 13, wherein the partial sums are aggregated to compute a result for an associated layer.
15. The system of claim 14, wherein the partial sums are transmitted via a network on a chip (NoC) for aggregation.
16. The system of claim 2, wherein the at least one controller is further adapted to, upon computation of output activations of a layer, redistribute the output activations among the plurality of cores.
17. The system of claim 16, wherein the redistribution is via a network.
18. The system of claim 16, wherein the redistribution is determined based on one or more of spatial dimensions of the input activations, feature dimensions of the input activations, spatial dimensions of the output activations, or feature dimensions of the output activations for that layer.
19. A method comprising: reading a neural network model comprising a plurality of layers, each layer having at least one dimension and comprising a plurality of synaptic weights; for each layer of the neural network model comparing at least a portion of dimensions of a layer and a quantity of vector units, wherein the comparison includes a comparison of a size of the feature dimensions with the quantity of the vector units; partitioning a plurality of cores into a plurality of partitions based on the comparison, wherein partitioning the plurality of cores comprises at least one of the plurality of cores being subdivided when the size of the feature dimensions of the layer is less than the quantity of the vector units, configuring the plurality of cores to implement the layer, providing to the plurality of cores input activations for the layer, and applying the synaptic weights associated with the layer to the input activations to produce a plurality of output activations.
20. The method of claim 19, further comprising: computing partial sums within each partition; transmitting the partial sums among cores within each partition; aggregating the partial sums to compute the output activations.
21. The method of claim 19, wherein configuring the plurality of cores comprises distributing parameters to the plurality of cores via a network.
22. The method of claim 19, wherein configuring the plurality of cores comprises distributing instructions to the plurality of cores via a network.
23. The method of claim 19, wherein the plurality of partitions for each layer is further determined based on one or more of spatial dimensions of the input activations, feature dimensions of the input activations, spatial dimensions of the output activations, or feature dimensions of the output activations for that layer.
24. The system of claim 23, wherein the plurality of partitions for each layer is further determined by a dimension of the plurality of cores.
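Claims 13 to 15 and claim 20 describe cores within a partition computing partial sums that are then aggregated into the layer's output. The following is a minimal sketch of that idea under stated assumptions: the helper names `partial_sums` and `aggregate` are hypothetical, the input features are split into contiguous chunks (the claims do not say how the work is divided), and a plain Python list stands in for the network on a chip that would carry the partial sums.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]  # one row of synaptic weights per output activation


def partial_sums(weights: Matrix, activations: Vector, num_cores: int) -> List[Vector]:
    """Split the input features across cores; each core returns one partial sum vector.

    Assumption: core c handles a contiguous chunk of input features.
    """
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")
    n_in = len(activations)
    chunk = -(-n_in // num_cores)  # ceiling division: features per core
    partials = []
    for core in range(num_cores):
        lo = core * chunk
        hi = min(lo + chunk, n_in)
        # Each core applies only its slice of the weights to its slice of inputs.
        partials.append([
            sum(row[i] * activations[i] for i in range(lo, hi))
            for row in weights
        ])
    return partials


def aggregate(partials: List[Vector]) -> Vector:
    """Sum the per-core partial results element-wise to form the layer output."""
    return [sum(values) for values in zip(*partials)]


if __name__ == "__main__":
    w = [[1, 2, 3, 4],
         [0, 1, 0, 1]]
    x = [1, 1, 2, 2]
    parts = partial_sums(w, x, num_cores=2)
    print(parts)             # per-core partial sums
    print(aggregate(parts))  # matches the full matrix-vector product
```

The aggregated result equals the full weight-by-activation product, which is what makes it safe to split a layer's work across the cores of a partition.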

Assignees

IBM

Inventors

Classifications

G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/063 · Primary
using electronic means · CPC title
G06F15/8092
Array of vector units · CPC title
G06N3/04
Architecture, e.g. interconnection topology · CPC title
G06F9/5027 · Primary
the resource being a machine, e.g. CPUs, Servers, Terminals · CPC title

Patent family

Related publications grouped by family.

View patent family 68840593

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11847553B2 cover?

Neural network processing hardware using parallel computational architectures with reconfigurable core-level and vector-level parallelism is provided. In various embodiments, a neural network model memory is adapted to store a neural network model comprising a plurality of layers. Each layer has at least one dimension and comprises a plurality of synaptic weights. A plurality of neural cores is…

Who is the assignee on this patent?

IBM

Related patents

Multi-memory on-chip computational network

Configurable accelerator framework

Sparse convolutional neural network accelerator

Superpixel methods for convolutional neural networks

Systems and methods for deep learning processor

Systems and methods for a multi-core optimized recurrent neural network