Sparse convolutional neural network accelerator
US-10891538-B2 · Jan 12, 2021 · US
US11727246B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11727246-B2 |
| Application number | US-201916283021-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 22, 2019 |
| Priority date | Apr 17, 2017 |
| Publication date | Aug 15, 2023 |
| Grant date | Aug 15, 2023 |
Official abstract text for this publication.
Embodiments provide systems and methods which facilitate optimization of a convolutional neural network (CNN). One embodiment provides for a non-transitory machine-readable medium storing instructions that cause one or more processors to perform operations comprising processing a trained convolutional neural network (CNN) to generate a processed CNN, the trained CNN having weights in a floating-point format. Processing the trained CNN includes quantizing the weights in the floating-point format to generate weights in an integer format. Quantizing the weights includes generating a quantization table to enable non-uniform quantization of the weights and quantizing the weights from the floating-point format to the integer format using the quantization table. The operations additionally comprise performing an inference operation utilizing the processed CNN with the integer format weights.
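The abstract names a quantization table that enables non-uniform quantization, but this excerpt does not say how the table is built. The following is a minimal sketch under one assumed strategy: table entries are taken at equal-population quantiles of the trained weights, so levels sit closer together where weights are dense. The function names (`build_quantization_table`, `quantize`, `dequantize`) and the quantile rule are illustrative assumptions, not the patent's method.

```python
import random
from bisect import bisect_left


def build_quantization_table(weights, levels=256):
    """Assumed strategy: pick one representative weight per equal-population bucket."""
    ordered = sorted(weights)
    n = len(ordered)
    picks = []
    for i in range(levels):
        # Sample near the middle of the i-th bucket of the sorted weights.
        pos = int((i + 0.5) * n / levels)
        picks.append(ordered[min(pos, n - 1)])
    # Drop duplicate levels and keep the table in ascending order.
    return sorted(set(picks))


def quantize(weights, table):
    """Map each float weight to the index of its nearest table entry."""
    codes = []
    for w in weights:
        j = bisect_left(table, w)
        if j == 0:
            codes.append(0)
        elif j == len(table):
            codes.append(len(table) - 1)
        else:
            # table[j - 1] < w <= table[j]; keep whichever neighbour is closer.
            codes.append(j if table[j] - w < w - table[j - 1] else j - 1)
    return codes


def dequantize(codes, table):
    """Recover approximate float weights by table lookup."""
    return [table[c] for c in codes]


# Hypothetical "trained" weights: a narrow bell curve around zero.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.05) for _ in range(4096)]

table = build_quantization_table(weights)
codes = bytes(quantize(weights, table))  # one unsigned 8-bit code per weight
restored = dequantize(codes, table)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(len(table), max_err)
```

Because the table has at most 256 entries, every code fits in one byte. A uniform quantizer would spend the same number of levels on the sparse tails as on the dense centre; the quantile rule assumed here trades tail precision for finer resolution near zero.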
Opening claim text (preview).
What is claimed is:
1. One or more non-transitory computer-readable storage mediums having stored thereon executable computer program instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: processing, via a graphics multiprocessor having a single instruction multiple thread (SIMT) architecture, a trained convolutional neural network (CNN) to generate a processed CNN, the trained CNN having weights in a floating-point format, wherein the executable computer program instructions provide a machine learning framework to provide a library of machine learning primitives to accelerate machine-learning operations, processing the trained CNN includes quantizing the weights in the floating-point format to generate weights in an 8-bit integer format having a static precision, wherein quantizing the weights includes: generating a quantization table to enable non-uniform quantization of the weights, wherein generating the quantization table includes executing a quantization primitive provided by the machine learning framework and the machine learning framework provides a plurality of quantization primitives including a plurality of quantization and dequantization operations, and quantizing the weights from the floating-point format to the 8-bit integer format using the quantization table; and performing an inference operation utilizing the processed CNN with the weights in the 8-bit integer format.
2. The one or more storage mediums of claim 1, wherein the quantization table is structured to maintain accuracy of inference by the processed CNN after quantization of the weights of the trained CNN.
3. The one or more storage mediums of claim 2, wherein the quantization of the weights of the trained CNN is performed without retraining.
4. The one or more storage mediums of claim 1, wherein the floating-point format is a 32-bit floating-point format.
5. The one or more storage mediums of claim 1, wherein the floating-point format is a 16-bit floating-point format.
6. A system comprising: one or more processors including one or more graphics multiprocessors having a single instruction multiple thread (SIMT) architecture; and a memory to store data including data relating to one or more convolutional neural networks (CNNs) and instructions associated with a machine learning framework to provide a library of machine learning primitives to accelerate machine-learning operations; wherein the one or more graphics multiprocessors are to: process a trained CNN to generate a processed CNN, the trained CNN having weights in a floating-point format, wherein processing the trained CNN includes for the one or more graphics multiprocessors to quantize the weights in the floating-point format to generate weights in an 8-bit integer format having a static precision, wherein quantizing the weights includes for the one or more graphics multiprocessors to: generate a quantization table to enable non-uniform quantization of the weights, wherein to generate the quantization table includes to accelerate operations associated with a quantization primitive provided by the machine learning framework to cause generation of the quantization table via the one or more graphics multiprocessors and the machine learning framework provides a plurality of quantization primitives including a plurality of quantization and dequantization operations, and quantize the weights from the floating-point format to the 8-bit integer format using the quantization table; and perform an inference operation utilizing the processed CNN with weights in the 8-bit integer format.
7. The system of claim 6, wherein the quantization table is structured to maintain accuracy of inference by the processed CNN after quantization of the weights of the trained CNN.
8. The system of claim 7, wherein the quantization of the weights of the trained CNN is performed without retraining.
9. The system of claim 6, wherein the floating-point format is a 32-bit floating-point format.
10. The system of claim 6, wherein the floating-point format is a 16-bit floating-point format.
11. A graphics multiprocessor having a single instruction multiple thread (SIMT) architecture, the graphics multiprocessor comprising: a plurality of processing cores; and one or more cache memories to cache data for the plurality of processing cores; wherein the graphics multiprocessor is to: process a trained convolutional neural network (CNN) to generate a processed CNN, the trained CNN having weights in a floating-point format, wherein processing the trained CNN includes to quantize, via the graphics multiprocessor, the weights in the floating-point format to generate weights in an 8-bit integer format having a static precision, wherein to quantize the weights includes, via the graphics multiprocessor, to: generate a quantization table to enable non-uniform quantization of the weights, wherein to generate the quantization table includes to accelerate operations associated with a quantization primitive provided by a machine learning framework to cause generation of the quantization table via the one or more graphics multiprocessors and the machine learning framework provides a plurality of quantization primitives including a plurality of quantization and dequantization operations, and quantize the weights from the floating-point format to the 8-bit integer format using the quantization table; and perform an inference operation utilizing the processed CNN with the weights in the 8-bit integer format.
12. The graphics multiprocessor of claim 11, wherein the quantization table is structured to maintain accuracy of inference by the processed CNN after quantization of the weights of the trained CNN.
13. The graphics multiprocessor of claim 12, wherein the quantization of the weights of the trained CNN is performed without retraining.
14. The graphics multiprocessor of claim 11, wherein the floating-point format is a floating point format selected from a set of floating point formats including a 16-bit floating-point format and a 32-bit floating-point format.
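The independent claims end with an inference step that uses the weights in 8-bit integer form, and the dependent claims say the table is structured to maintain accuracy without retraining. As a rough illustration only, the sketch below runs a tiny one-dimensional convolution twice, once with float weights and once with 8-bit codes resolved through a lookup table, and compares the results. The seven-entry table, the helper names, and the choice to dequantize by lookup before multiplying are all assumptions made for the example; the claims do not specify the arithmetic used at inference time.

```python
from array import array

# Hypothetical non-uniform table: levels are packed more tightly near zero.
TABLE = [-0.5, -0.1, -0.02, 0.0, 0.02, 0.1, 0.5]


def encode(kernel, table):
    """Store each float weight as the 8-bit index of its nearest table level."""
    nearest = [min(range(len(table)), key=lambda i: abs(table[i] - w)) for w in kernel]
    return array("B", nearest)  # "B" = unsigned 8-bit integers


def conv1d_valid(signal, kernel):
    """Plain sliding dot product (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def conv1d_quantized(signal, codes, table):
    """Resolve 8-bit codes through the table, then convolve as usual."""
    kernel = [table[c] for c in codes]
    return conv1d_valid(signal, kernel)


float_kernel = [0.48, -0.09, 0.021]  # stand-in for trained float weights
signal = [1.0, 2.0, 3.0, 4.0]

codes = encode(float_kernel, TABLE)
reference = conv1d_valid(signal, float_kernel)
approx = conv1d_quantized(signal, codes, TABLE)
worst = max(abs(a - b) for a, b in zip(reference, approx))
print(list(codes), reference, approx, worst)
```

No weight is updated after encoding, which mirrors the "without retraining" language of claims 3, 8, and 13; whether accuracy is actually preserved depends on how well the table matches the weight distribution.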
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
using electronic means · CPC title
Architecture, e.g. interconnection topology · CPC title