Parallel computational architecture with reconfigurable core-level and vector-level parallelism

US11847553B2 · US · B2

Patent metadata

Publication number: US-11847553-B2
Application number: US-201816008949-A
Country: US
Kind code: B2
Filing date: Jun 14, 2018
Priority date: Jun 14, 2018
Publication date: Dec 19, 2023
Grant date: Dec 19, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Neural network processing hardware using parallel computational architectures with reconfigurable core-level and vector-level parallelism is provided. In various embodiments, a neural network model memory is adapted to store a neural network model comprising a plurality of layers. Each layer has at least one dimension and comprises a plurality of synaptic weights. A plurality of neural cores is provided. Each neural core includes a computation unit and an activation memory. The computation unit is adapted to apply a plurality of synaptic weights to a plurality of input activations to produce a plurality of output activations. The computation unit has a plurality of vector units. The activation memory is adapted to store the input activations and the output activations. The system is adapted to partition the plurality of cores into a plurality of partitions based on dimensions of the layer and the vector units.
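To make the abstract's components concrete, here is a minimal Python sketch of a single neural core: a computation unit with a fixed number of vector units plus an activation memory that holds input and output activations. The class name `NeuralCore`, its methods, and the rule that each vector unit produces one output feature per pass are illustrative assumptions, not details taken from the patent. The sketch only shows why a layer whose feature size is smaller than the number of vector units leaves lanes idle.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NeuralCore:
    """Hypothetical model of one neural core: a computation unit with
    `num_vector_units` lanes plus an activation memory."""
    num_vector_units: int
    activation_memory: Dict[str, List[float]] = field(default_factory=dict)

    def apply_layer(self, weights: List[List[float]], inputs: List[float]) -> List[float]:
        """Apply synaptic weights (out_features x in_features) to input activations.

        Assumption: each vector unit computes one output feature per pass,
        so a layer needs ceil(out_features / num_vector_units) passes.
        """
        self.activation_memory["input"] = list(inputs)
        outputs: List[float] = []
        for start in range(0, len(weights), self.num_vector_units):
            # One pass: up to num_vector_units output features in parallel.
            for row in weights[start:start + self.num_vector_units]:
                outputs.append(sum(w * x for w, x in zip(row, inputs)))
        self.activation_memory["output"] = outputs
        return outputs

    def utilization(self, out_features: int) -> float:
        """Average fraction of vector units doing useful work across passes."""
        if out_features <= 0:
            raise ValueError("out_features must be positive")
        passes = -(-out_features // self.num_vector_units)  # ceiling division
        return out_features / (passes * self.num_vector_units)


core = NeuralCore(num_vector_units=16)
print(core.apply_layer([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0]))  # [3.0, 7.0]
print(core.utilization(4))   # 0.25: twelve of sixteen lanes idle
print(core.utilization(32))  # 1.0
```

With 16 vector units and only 4 output features, three quarters of the lanes sit idle in this model. Claim 1 targets that case by subdividing at least one core when the layer's feature size is smaller than the quantity of vector units.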

Claims

Full claim text (claims 1 to 24). Claims 1 (system) and 19 (method) are independent; the remaining claims depend on them.

What is claimed is:

1. A system comprising: a neural network model memory adapted to store a neural network model comprising a plurality of layers, each layer having at least one dimension and comprising a plurality of synaptic weights; a plurality of neural cores, each neural core comprising a computation unit, the computation unit adapted to apply a plurality of synaptic weights to a plurality of input activations to produce a plurality of output activations, the computation unit having a plurality of vector units, and an activation memory adapted to store the input activations and the output activations; wherein the system is adapted to partition the plurality of cores into a plurality of partitions based on a comparison of at least a portion of dimensions of a layer and a quantity of the vector units, wherein the comparison includes a comparison of a size of the feature dimensions for the layer with the quantity of the vector units, and wherein partitioning the plurality of cores comprises at least one of the plurality of cores being subdivided when the size of the feature dimensions of the layer is less than the quantity of the vector units.

2. The system of claim 1, further comprising: at least one controller operatively coupled to the neural network model memory and to the plurality of cores, the at least one controller being adapted to, for each layer of the neural network model configure the plurality of cores to implement the layer, and provide input activations for the layer to the plurality of cores.

3. The system of claim 2, further comprising a network on a chip (NoC) coupled to the plurality of cores.

4. The system of claim 3, wherein input activations are provided to the plurality of cores via the NoC.

5. The system of claim 3, wherein configuring the plurality of cores comprises distributing parameters to the plurality of cores via the NoC.

6. The system of claim 5, wherein configuring the plurality of cores further comprises distributing instructions to the plurality of cores via the NoC.

7. The system of claim 1, wherein the plurality of partitions for each layer is further determined based on spatial dimensions of the input activations for that layer.

8. The system of claim 1, wherein the plurality of partitions for each layer is further determined based on spatial dimensions and a size of the feature dimensions of the input activations for that layer.

9. The system of claim 1, wherein the plurality of partitions for each layer is further determined based on spatial dimensions of the output activations for that layer.

10. The system of claim 1, wherein the plurality of partitions for each layer is further determined based on spatial dimensions and feature dimensions of the output activations for that layer.

11. The system of claim 1, wherein the plurality of partitions for each layer is further determined based on one or more of spatial dimensions of the input activations, feature dimensions of the input activations, spatial dimensions of the output activations, or feature dimensions of the output activations for that layer.

12. The system of claim 11, wherein the plurality of partitions for each layer is further determined by a dimension of the plurality of cores.

13. The system of claim 1, wherein the cores within each of the plurality of partitions are configured to compute partial sums.

14. The system of claim 13, wherein the partial sums are aggregated to compute a result for an associated layer.

15. The system of claim 14, wherein the partial sums are transmitted via a network on a chip (NoC) for aggregation.

16. The system of claim 2, wherein the at least one controller is further adapted to, upon computation of output activations of a layer, redistribute the output activations among the plurality of cores.

17. The system of claim 16, wherein the redistribution is via a network.

18. The system of claim 16, wherein the redistribution is determined based on one or more of spatial dimensions of the input activations, feature dimensions of the input activations, spatial dimensions of the output activations, or feature dimensions of the output activations for that layer.

19. A method comprising: reading a neural network model comprising a plurality of layers, each layer having at least one dimension and comprising a plurality of synaptic weights; for each layer of the neural network model comparing at least a portion of dimensions of a layer and a quantity of vector units, wherein the comparison includes a comparison of a size of the feature dimensions with the quantity of the vector units; partitioning a plurality of cores into a plurality of partitions based on the comparison, wherein partitioning the plurality of cores comprises at least one of the plurality of cores being subdivided when the size of the feature dimensions of the layer is less than the quantity of the vector units, configuring the plurality of cores to implement the layer, providing to the plurality of cores input activations for the layer, and applying the synaptic weights associated with the layer to the input activations to produce a plurality of output activations.

20. The method of claim 19, further comprising: computing partial sums within each partition; transmitting the partial sums among cores within each partition; aggregating the partial sums to compute the output activations.

21. The method of claim 19, wherein configuring the plurality of cores comprises distributing parameters to the plurality of cores via a network.

22. The method of claim 19, wherein configuring the plurality of cores comprises distributing instructions to the plurality of cores via a network.

23. The method of claim 19, wherein the plurality of partitions for each layer is further determined based on one or more of spatial dimensions of the input activations, feature dimensions of the input activations, spatial dimensions of the output activations, or feature dimensions of the output activations for that layer.

24. The system of claim 23, wherein the plurality of partitions for each layer is further determined by a dimension of the plurality of cores.
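The partitioning comparison in claims 1 and 19, together with the partial-sum aggregation in claims 13 to 15 and 20, can be illustrated with a small Python sketch. Every detail here is an assumption for illustration, not something the claims specify: the function names `plan_partitions` and `layer_with_partial_sums` are hypothetical, the layer's feature size is reduced to a single integer, "subdividing" a core is modeled as splitting its lanes into `vector_units // feature_size` sub-cores, and the grouping branch splits the input-feature dimension across cores so each core contributes partial sums.

```python
import math
from typing import Dict, List


def plan_partitions(num_cores: int, vector_units: int, feature_size: int) -> Dict[str, int]:
    """Hypothetical planner for the comparison in claims 1 and 19.

    feature_size < vector_units: subdivide each core so its lanes serve
    several independent slices of feature_size lanes each.
    Otherwise: group cores so one partition spans the feature dimension,
    one vector-unit-wide chunk per core (assumed interpretation).
    """
    if num_cores <= 0 or vector_units <= 0 or feature_size <= 0:
        raise ValueError("all sizes must be positive")
    if feature_size < vector_units:
        subcores = vector_units // feature_size  # slices per core
        return {"subcores_per_core": subcores, "cores_per_partition": 1,
                "partitions": num_cores * subcores}
    per_partition = math.ceil(feature_size / vector_units)
    if per_partition > num_cores:
        raise ValueError("not enough cores to span the feature dimension")
    return {"subcores_per_core": 1, "cores_per_partition": per_partition,
            "partitions": num_cores // per_partition}


def layer_with_partial_sums(weights: List[List[float]], inputs: List[float],
                            cores_per_partition: int) -> List[float]:
    """Split input features across the cores of one partition; each core
    computes partial sums, which are then aggregated (cf. claims 13, 14, 20)."""
    n_in = len(inputs)
    chunk = math.ceil(n_in / cores_per_partition)
    partials = []  # one partial-sum vector per core
    for c in range(cores_per_partition):
        lo, hi = c * chunk, min((c + 1) * chunk, n_in)
        partials.append([sum(row[i] * inputs[i] for i in range(lo, hi)) for row in weights])
    # Aggregation step; on the claimed hardware this exchange would use the NoC.
    return [sum(p[j] for p in partials) for j in range(len(weights))]


print(plan_partitions(num_cores=4, vector_units=16, feature_size=4))
# {'subcores_per_core': 4, 'cores_per_partition': 1, 'partitions': 16}
print(plan_partitions(num_cores=8, vector_units=16, feature_size=64))
# {'subcores_per_core': 1, 'cores_per_partition': 4, 'partitions': 2}
w = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]
print(layer_with_partial_sums(w, [1.0, 1.0, 2.0, 2.0], cores_per_partition=2))  # [17.0, 3.0]
```

The grouping branch leaves `num_cores % cores_per_partition` cores unassigned in this sketch. A fuller planner would also weigh the spatial dimensions and the core-array dimension named in claims 7 to 12, which the sketch ignores.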

Assignees

IBM

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] (CPC title)

  • G06N3/063 (primary): using electronic means

  • Array of vector units (CPC title)

  • Architecture, e.g. interconnection topology (CPC title)

  • G06F9/5027 (primary): the resource being a machine, e.g. CPUs, Servers, Terminals

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11847553B2 cover?
Neural network processing hardware using parallel computational architectures with reconfigurable core-level and vector-level parallelism is provided. In various embodiments, a neural network model memory is adapted to store a neural network model comprising a plurality of layers. Each layer has at least one dimension and comprises a plurality of synaptic weights. A plurality of neural cores is…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 19, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).