What technology area does this patent fall under?

Primary CPC classification G06N3/082. Mapped technology areas include Physics.

When was this patent published?

Publication date: Aug 15, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Convolutional neural network optimization mechanism

US11727246B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11727246-B2
Application number	US-201916283021-A
Country	US
Kind code	B2
Filing date	Feb 22, 2019
Priority date	Apr 17, 2017
Publication date	Aug 15, 2023
Grant date	Aug 15, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments provide systems and methods which facilitate optimization of a convolutional neural network (CNN). One embodiment provides for a non-transitory machine-readable medium storing instructions that cause one or more processors to perform operations comprising processing a trained convolutional neural network (CNN) to generate a processed CNN, the trained CNN having weights in a floating-point format. Processing the trained CNN includes quantizing the weights in the floating-point format to generate weights in an integer format. Quantizing the weights includes generating a quantization table to enable non-uniform quantization of the weights and quantizing the weights from the floating-point format to the integer format using the quantization table. The operations additionally comprise performing an inference operation utilizing the processed CNN with the integer format weights.
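The abstract describes building a quantization table and using it to map floating-point weights to integers non-uniformly. The following is a minimal Python sketch of that idea, not the patented implementation. The quantile-based placement of table entries and the function names `build_quantization_table`, `quantize`, and `dequantize` are assumptions made for illustration; the abstract does not say how the table is generated.

```python
import bisect
import random


def build_quantization_table(weights, levels=256):
    """Pick up to `levels` representative values at evenly spaced quantiles.

    Assumption: quantile placement is one simple way to obtain non-uniform
    levels (more levels where weights are dense). The abstract does not
    specify the actual table-generation method.
    """
    ordered = sorted(weights)
    n = len(ordered)
    table = []
    for i in range(levels):
        # Take the value at the midpoint of the i-th quantile bin.
        pos = int((i + 0.5) * n / levels)
        table.append(ordered[min(pos, n - 1)])
    return sorted(set(table))  # drop duplicates, keep ascending order


def quantize(weights, table):
    """Map each float weight to the index of its nearest table entry."""
    codes = []
    for w in weights:
        j = bisect.bisect_left(table, w)
        if j == 0:
            codes.append(0)
        elif j == len(table):
            codes.append(len(table) - 1)
        else:
            # Choose the closer of the two neighbouring levels.
            codes.append(j if table[j] - w < w - table[j - 1] else j - 1)
    return codes


def dequantize(codes, table):
    """Recover approximate float weights by table lookup."""
    return [table[c] for c in codes]


random.seed(0)
weights = [random.gauss(0.0, 0.05) for _ in range(10_000)]  # synthetic weights
table = build_quantization_table(weights)
codes = quantize(weights, table)       # each code fits in an unsigned 8-bit value
restored = dequantize(codes, table)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

With at most 256 table entries, every code fits in 8 bits, which matches the integer width named in the claims. A uniform quantizer would instead space its levels evenly across the weight range, regardless of where the weights concentrate.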

First claim

Opening claim text (preview).

What is claimed is:
1. One or more non-transitory computer-readable storage mediums having stored thereon executable computer program instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: processing, via a graphics multiprocessor having a single instruction multiple thread (SIMT) architecture, a trained convolutional neural network (CNN) to generate a processed CNN, the trained CNN having weights in a floating-point format, wherein the executable computer program instructions provide a machine learning framework to provide a library of machine learning primitives to accelerate machine-learning operations, processing the trained CNN includes quantizing the weights in the floating-point format to generate weights in an 8-bit integer format having a static precision, wherein quantizing the weights includes: generating a quantization table to enable non-uniform quantization of the weights, wherein generating the quantization table includes executing a quantization primitive provided by the machine learning framework and the machine learning framework provides a plurality of quantization primitives including a plurality of quantization and dequantization operations, and quantizing the weights from the floating-point format to the 8-bit integer format using the quantization table; and performing an inference operation utilizing the processed CNN with the weights in the 8-bit integer format.
2. The one or more storage mediums of claim 1, wherein the quantization table is structured to maintain accuracy of inference by the processed CNN after quantization of the weights of the trained CNN.
3. The one or more storage mediums of claim 2, wherein the quantization of the weights of the trained CNN is performed without retraining.
4. The one or more storage mediums of claim 1, wherein the floating-point format is a 32-bit floating-point format.
5. The one or more storage mediums of claim 1, wherein the floating-point format is a 16-bit floating-point format.
6. A system comprising: one or more processors including one or more graphics multiprocessors having a single instruction multiple thread (SIMT) architecture; and a memory to store data including data relating to one or more convolutional neural networks (CNNs) and instructions associated with a machine learning framework to provide a library of machine learning primitives to accelerate machine-learning operations; wherein the one or more graphics multiprocessors are to: process a trained CNN to generate a processed CNN, the trained CNN having weights in a floating-point format, wherein processing the trained CNN includes for the one or more graphics multiprocessors to quantize the weights in the floating-point format to generate weights in an 8-bit integer format having a static precision, wherein quantizing the weights includes for the one or more graphics multiprocessors to: generate a quantization table to enable non-uniform quantization of the weights, wherein to generate the quantization table includes to accelerate operations associated with a quantization primitive provided by the machine learning framework to cause generation of the quantization table via the one or more graphics multiprocessors and the machine learning framework provides a plurality of quantization primitives including a plurality of quantization and dequantization operations, and quantize the weights from the floating-point format to the 8-bit integer format using the quantization table; and perform an inference operation utilizing the processed CNN with weights in the 8-bit integer format.
7. The system of claim 6, wherein the quantization table is structured to maintain accuracy of inference by the processed CNN after quantization of the weights of the trained CNN.
8. The system of claim 7, wherein the quantization of the weights of the trained CNN is performed without retraining.
9. The system of claim 6, wherein the floating-point format is a 32-bit floating-point format.
10. The system of claim 6, wherein the floating-point format is a 16-bit floating-point format.
11. A graphics multiprocessor having a single instruction multiple thread (SIMT) architecture, the graphics multiprocessor comprising: a plurality of processing cores; and one or more cache memories to cache data for the plurality of processing cores; wherein the graphics multiprocessor is to: process a trained convolutional neural network (CNN) to generate a processed CNN, the trained CNN having weights in a floating-point format, wherein processing the trained CNN includes to quantize, via the graphics multiprocessor, the weights in the floating-point format to generate weights in an 8-bit integer format having a static precision, wherein to quantize the weights includes, via the graphics multiprocessor, to: generate a quantization table to enable non-uniform quantization of the weights, wherein to generate the quantization table includes to accelerate operations associated with a quantization primitive provided by a machine learning framework to cause generation of the quantization table via the one or more graphics multiprocessors and the machine learning framework provides a plurality of quantization primitives including a plurality of quantization and dequantization operations, and quantize the weights from the floating-point format to the 8-bit integer format using the quantization table; and perform an inference operation utilizing the processed CNN with the weights in the 8-bit integer format.
12. The graphics multiprocessor of claim 11, wherein the quantization table is structured to maintain accuracy of inference by the processed CNN after quantization of the weights of the trained CNN.
13. The graphics multiprocessor of claim 12, wherein the quantization of the weights of the trained CNN is performed without retraining.
14. The graphics multiprocessor of claim 11, wherein the floating-point format is a floating point format selected from a set of floating point formats including a 16-bit floating-point format and a 32-bit floating-point format.
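The claims pair quantization with dequantization operations and end with an inference step that uses the integer-format weights, with no retraining (claims 3, 8, and 13). The sketch below illustrates that flow on a tiny 1-D convolution in plain Python on the CPU. It is an assumption-laden illustration only: the seven-entry `TABLE`, the helper names, and the sample values are hypothetical, and nothing here models the SIMT graphics multiprocessor or the machine learning framework recited in the claims.

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D cross-correlation, the basic operation of a CNN layer."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# Hypothetical non-uniform table: levels are denser near zero, where
# trained weights tend to cluster. A real table would hold up to 256 entries.
TABLE = [-0.5, -0.1, -0.02, 0.0, 0.02, 0.1, 0.5]


def quantize(weights, table=TABLE):
    """Replace each float weight by the index of the nearest table entry."""
    return [
        min(range(len(table)), key=lambda i: abs(table[i] - w))
        for w in weights
    ]


def dequantize(codes, table=TABLE):
    """Look each integer code up in the table to get a float value back."""
    return [table[c] for c in codes]


float_kernel = [0.018, -0.11, 0.47]   # stand-in for trained float weights
codes = quantize(float_kernel)        # small integer codes, no retraining
signal = [1.0, 2.0, 3.0, 4.0]

reference = conv1d(signal, float_kernel)          # float inference
quantized = conv1d(signal, dequantize(codes))     # inference from integer codes
errors = [abs(a - b) for a, b in zip(reference, quantized)]
```

Comparing `reference` with `quantized` shows the accuracy cost of quantization for this toy kernel. Claims 2, 7, and 12 describe a table structured to keep that cost low.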

Assignees

Intel Corp

Classifications

G06N3/082 (Primary)
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title
G06N3/0495
Quantised networks; Sparse networks; Compressed networks · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/063 (Primary)
using electronic means · CPC title
G06N3/04 (Primary)
Architecture, e.g. interconnection topology · CPC title
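Each CPC symbol above encodes a hierarchy: a section letter, a two-digit class, a subclass letter, a main group, and a subgroup after the slash. The sketch below splits a symbol into those parts and looks up the section title, which is how a symbol such as G06N3/082 maps to Physics. The function name `parse_cpc` and the returned dictionary layout are illustrative assumptions, not part of any official CPC tooling.

```python
import re

# Top-level CPC sections and their titles.
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General Tagging of New Technological Developments",
}

# Section letter, 2-digit class, subclass letter, main group, "/", subgroup.
CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,})$")


def parse_cpc(symbol):
    """Split a CPC symbol such as 'G06N3/082' into its hierarchy levels."""
    match = CPC_PATTERN.match(symbol.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a CPC symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup = match.groups()
    return {
        "section": section,
        "section_title": CPC_SECTIONS[section],
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": main_group,
        "subgroup": subgroup,
    }


info = parse_cpc("G06N3/082")
```

For the symbols on this page, every entry shares the subclass G06N and main group 3, so they differ only in the subgroup.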

Patent family

Related publications grouped by family.

Patent family ID: 61952520

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11727246B2 cover?: Embodiments provide systems and methods which facilitate optimization of a convolutional neural network (CNN). One embodiment provides for a non-transitory machine-readable medium storing instructions that cause one or more processors to perform operations comprising processing a trained convolutional neural network (CNN) to generate a processed CNN, the trained CNN having weights in a floating…
Who is the assignee on this patent?: Intel Corp