US12412086B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12412086-B2 |
| Application number | US-202117177632-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 17, 2021 |
| Priority date | Apr 24, 2017 |
| Publication date | Sep 9, 2025 |
| Grant date | Sep 9, 2025 |
Official abstract text for this publication.
An apparatus to facilitate optimization of a neural network (NN) is disclosed. The apparatus includes optimization logic to define a NN topology having one or more macro layers, adjust the one or more macro layers to adapt to input and output components of the NN and train the NN based on the one or more macro layers.
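The abstract's flow, combined with the selection step recited in claim 1 (build sub-topologies from different combinations of layers, measure each one's error, keep the lowest), can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: `MacroLayer`, `candidate_sub_topologies`, `select_target`, and `toy_error` are hypothetical names, the layer names and feature scales are invented, and `toy_error` is a fixed scoring rule standing in for real training on a GPU.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class MacroLayer:
    """Hypothetical record for a macro layer: several NN layers treated as one unit."""
    layers: Tuple[str, ...]  # topology definition (names of the wrapped NN layers)
    in_scale: int            # scale of input features
    out_scale: int           # scale of output features


def candidate_sub_topologies(layers: Sequence[str], in_scale: int,
                             out_scale: int) -> List[MacroLayer]:
    """Enumerate macro layers built from every non-empty combination of `layers`."""
    return [
        MacroLayer(tuple(combo), in_scale, out_scale)
        for size in range(1, len(layers) + 1)
        for combo in combinations(layers, size)
    ]


def select_target(candidates: Sequence[MacroLayer],
                  error_of: Callable[[MacroLayer], float]) -> MacroLayer:
    """Return the candidate whose measured error value is lowest."""
    return min(candidates, key=error_of)


def toy_error(macro: MacroLayer) -> float:
    """Assumed stand-in for a training run; a real system would train and validate."""
    size_penalty = abs(len(macro.layers) - 2)
    pool_penalty = 0.0 if "pool" in macro.layers else 0.5
    wide_penalty = 0.25 if "conv5x5" in macro.layers else 0.0
    return size_penalty + pool_penalty + wide_penalty


# Invented layer names and feature scales, for illustration only.
candidates = candidate_sub_topologies(("conv3x3", "conv5x5", "pool"), 64, 128)
target = select_target(candidates, toy_error)
print(target.layers, toy_error(target))
```

In this sketch the candidates are scored one after another; the claims describe the sub-topologies as trained concurrently, which a real system might do by dispatching each candidate to separate processing resources before comparing error values. The selected `target` corresponds to the sub-topology the claims then use to retrain the NN.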
Opening claim text (preview).
What is claimed is:
1. An apparatus comprising: a graphics processing unit (GPU) comprising hardware circuitry to: define a neural network (NN) topology of an NN, the NN topology implemented by the hardware circuitry as having one or more macro layers, wherein the one or more macro layers comprise a first sub-topology of the NN topology having multiple NN layers, wherein each macro layer comprises a topology definition, a scale of input features, and a scale of output features, and wherein each macro layer comprises a set of user interface components; adjust the one or more macro layers to adapt to input and output components of the NN; wrap the one or more macro layers within a first macro stub layer corresponding to a first topology of the NN, wherein the first macro stub layer comprises additional layers comprising at least one of concatenation layers, elementwise operation layers, or merge layers; perform, using processing resources of the hardware circuitry and a first training data set, a first adjustment of weights of the one or more macro layers of the NN corresponding to the first topology of the NN; identify one or more other sub-topologies from the first topology of the NN, wherein the one or more other sub-topologies comprises additional macro layers having different combinations of the multiple NN layers, wherein the one or more other sub-topologies are trained concurrently during a first training; wrap the additional macro layers of each of the one or more other sub-topologies within additional macro stub layers, wherein the additional macro stub layers comprise the additional layers comprising at least one of the concatenation layers, the elementwise operation layers, or the merge layers; perform, a second adjustment of the weights of the additional macro layers wrapped within the additional macro stub layers of the one or more other sub-topologies, wherein the one or more other sub-topologies are trained concurrently during a second training; determine error values incurred during the second training of each sub-topology of the one or more other sub-topologies; identify a target sub-topology having a lowest error value of the error values; and retrain the NN utilizing the target sub-topology as a second topology of the NN to generate an updated NN for use in an inference phase of the NN.
2. The apparatus of claim 1, wherein the one or more macro layers each comprise the first sub-topology including a plurality of the multiple NN layers.
3. The apparatus of claim 2, wherein the GPU is to replace the first topology of the NN with the one or more macro layers, and provide an input features node to record output from the first topology.
4. The apparatus of claim 2, wherein the one or more macro layers comprise a standard set of components to facilitate training.
5. The apparatus of claim 1, wherein the GPU is to optimize the NN by automatically tuning one or more layers in the NN.
6. The apparatus of claim 5, wherein automatically tuning the one or more layers comprises automatically constructing the NN based on received performance and accuracy constraints.
7. The apparatus of claim 5, wherein the GPU is to provide auto-tuning based on one or more statistical algorithms.
8. The apparatus of claim 1, wherein the GPU is to optimize the NN based on the NN topology of the NN.
9. The apparatus of claim 1, wherein the GPU is to perform clustering of processing units to process information relating to modalities.
10. The apparatus of claim 9, further comprising: a first cluster of two or more processing units; a second cluster of two or more processing units; and one or more routers coupled between the first cluster and the second cluster.
11. A method comprising: defining a neural network (NN) topology of an NN, the NN topology implemented by hardware circuitry of a graphics processing unit (GPU) as having one or more macro layers, wherein the one or more macro layers comprise a first sub-topology of the NN topology having multiple NN layers, wherein each macro layer comprises a topology definition, a scale of input features, and a scale of output features, and wherein each macro layer comprises a set of user interface components; adjusting the one or more macro layers to adapt to input and output components of the NN; wrapping the one or more macro layers within a first macro stub layer corresponding to a first topology of the NN, wherein the first macro stub layer comprises additional layers comprising at least one of concatenation layers, elementwise operation layers, or merge layers; performing, using processing resources of the hardware circuitry and a first training data set, a first adjustment of weights of the one or more macro layers of the NN corresponding to the first topology of the NN; identifying one or more other sub-topologies from the first topology of the NN, wherein the one or more other sub-topologies comprises additional macro layers having different combinations of the multiple NN layers, wherein the one or more other sub-topologies are trained concurrently during a first training; wrapping the additional macro layers of each of the one or more other sub-topologies within additional macro stub layers, wherein the additional macro stub layers comprise additional layers comprising at least one of the concatenation layers, the elementwise operation layers, or the merge layers; performing a second adjustment of the weights of the additional macro layers wrapped within the additional macro stub layers of the one or more other sub-topologies, wherein the one or more other sub-topologies are trained concurrently during a second training; determining error values incurred during the second training of each sub-topology of the one or more other sub-topologies; identifying a target sub-topology having a lowest error value of the error values; and retraining the NN utilizing the target sub-topology as a second topology of the NN to generate an updated NN for use in an inference phase of the NN.
12. The method of claim 11, wherein the one or more macro layers each comprise the first sub-topology including a plurality of the multiple NN layers.
13. The method of claim 12, further comprising: replacing the first topology of the NN with the one or more macro layers; and providing an input features node to record output from the first topology.
14. The method of claim 13, wherein the one or more macro layers comprise a standard set of components to facilitate training.
15. The method of claim 11, further comprising optimizing the NN by automatically tuning one or more layers in the NN.
16. At least one non-transitory computer readable medium having instructions, which when executed by one or more processors, cause the one or more processors to: define a neural network (NN) topology of an NN, the NN topology implemented by hardware circuitry of a graphics processing unit (GPU) comprising the one or more processors as having one or more macro layers, wherein the one or more macro layers comprise a first sub-topology of the NN topology having multiple NN layers, wherein each macro layer comprises a topology definition, a scale of input features, and a scale of output features, and wherein each macro layer comprises a set of user interface components; adjust the one or more macro layers to adapt to input and output components of the NN; wrap the one or more macro layers within a first macro stub layer corresponding to a first topology of the NN, wherein the first macro stub layer comprises additional layers comprising at least one of concatenatio
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title
Distributed learning, e.g. federated learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Supervised learning · CPC title