Convolutional neural network optimization mechanism

US12020135B2 · US · B2

Patent metadata
Publication number: US-12020135-B2
Application number: US-202117446101-A
Country: US
Kind code: B2
Filing date: Aug 26, 2021
Priority date: Apr 17, 2017
Publication date: Jun 25, 2024
Grant date: Jun 25, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A library of machine learning primitives is provided to optimize a machine learning model to improve the efficiency of inference operations. In one embodiment a trained convolutional neural network (CNN) model is processed into a trained CNN model via pruning, convolution window optimization, and quantization.
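
To make the quantization step mentioned in the abstract more concrete, here is a minimal Python sketch of one way a quantization table could support non-uniform quantization of floating-point weights. It is an illustration only, not the method disclosed in the patent: the function names (`build_quantization_table`, `quantize`, `dequantize`) and the quantile-based choice of table entries are assumptions made for this example.

```python
from bisect import bisect_left


def build_quantization_table(weights, levels=256):
    """Build a sorted table of representative values (hypothetical helper).

    Entries are taken at evenly spaced quantiles of the weights, so dense
    regions of the weight distribution get more table entries than sparse
    ones. This is an assumed strategy, chosen only for illustration.
    """
    ordered = sorted(weights)
    n = len(ordered)
    if n == 0:
        raise ValueError("weights must not be empty")
    picked = []
    for i in range(levels):
        # Index of the midpoint quantile of bucket i, clamped to the list.
        idx = min(n - 1, (2 * i + 1) * n // (2 * levels))
        picked.append(ordered[idx])
    # Remove repeated entries; the table may end up shorter than `levels`.
    return sorted(set(picked))


def quantize(weights, table):
    """Map each weight to the index of the nearest table entry."""
    codes = []
    for w in weights:
        pos = bisect_left(table, w)
        if pos == 0:
            codes.append(0)
        elif pos == len(table):
            codes.append(len(table) - 1)
        else:
            lower, upper = table[pos - 1], table[pos]
            codes.append(pos - 1 if (w - lower) <= (upper - w) else pos)
    return codes


def dequantize(codes, table):
    """Recover approximate weights by table lookup."""
    return [table[c] for c in codes]
```

With `levels=256` every code fits in one byte, which lines up with the 8-bit integer format named in claim 12. Because the table spacing follows the weight distribution rather than a fixed step size, the mapping is non-uniform.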

First claim

Opening claim text (preview).

What is claimed is:

1. A non-transitory computer-readable storage medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising: providing a machine learning framework including a library of machine learning primitives to perform operations to optimize a machine learning model; and processing, via the machine learning framework, a trained convolutional neural network (CNN) model having an associated list of instructions to generate a processed CNN model, the trained CNN model having weights in a floating-point format, wherein processing the trained CNN includes: traversing the list of instructions to reduce a count of instructions in the list of instructions, wherein to reduce the count of instructions includes to pruning one or more conditional branches in the list of instructions that descend from a weight having a first threshold value; determining whether pixels of a second convolution window of the trained CNN model differ from a previously stored first convolution window and eliminating the second convolution window from the trained CNN model in response to determination that the second convolution window matches the first convolution window; quantizing one or more weights in the floating-point format via a quantization primitive provided by the machine learning framework to generate quantized weights, including generating a quantization table to enable non-uniform quantization of the weights, wherein the quantization primitive causes a graphics processor of the one or more processors to execute an instruction to quantize the one or more weights into one or more quantized weights; and outputting the processed CNN model, the processed CNN model including the reduced count of instructions and the one or more quantized weights.

2. The non-transitory computer-readable storage medium of claim 1, wherein the graphics processor includes a graphics multiprocessor having a single instruction multiple thread (SIMT) architecture and the instruction to quantize the one or more weights is an instruction provided by an instruction set architecture of the graphics multiprocessor.

3. The non-transitory computer-readable storage medium of claim 1, wherein eliminating the second convolution window from the trained CNN model in response to determination that the second convolution window matches the first convolution window includes generating a checksum signature for the first convolution window and eliminating the second convolution window upon determination that the checksum signature for the second convolution window matches the checksum signature for the first convolution window.

4. The non-transitory computer-readable storage medium of claim 1, wherein processing the trained CNN includes bypassing traversal of conditional branches in the list of instructions for conditional branches that descend from a weight having a second threshold value.

5. The non-transitory computer-readable storage medium of claim 1, wherein processing the trained CNN includes expanding the CNN model into elementary operations.

6. The non-transitory computer-readable storage medium of claim 1, wherein outputting the processed CNN model includes compressing a representation of instructions associated with the processed CNN model and generating an executable application to perform the instructions associated with the processed CNN model.

7. The non-transitory computer-readable storage medium of claim 1, wherein the machine learning framework provides a plurality of quantization primitives to perform a plurality of quantization and dequantization operations.

8. The non-transitory computer-readable storage medium of claim 1, wherein the quantization table is structured to maintain accuracy of inference by the processed CNN after quantization of the weights of the trained CNN.

9. The non-transitory computer-readable storage medium of claim 8, wherein the quantization of the weights of the trained CNN is performed without retraining.

10. The non-transitory computer-readable storage medium of claim 1, wherein the floating-point format is a 32-bit floating-point format.

11. The non-transitory computer-readable storage medium of claim 1, wherein the floating-point format is a 16-bit floating-point format.

12. The non-transitory computer-readable storage medium of claim 1, wherein the quantized weights are in an 8-bit integer format.

13. A data processing system comprising: one or more processors including one or more graphics multiprocessors; and a memory to store data including data relating to one or more convolutional neural networks (CNNs) and instructions to provide a machine learning framework including a library of machine learning primitives to perform operations to optimize a machine learning model, wherein the instructions cause the one or more processors to perform operations comprising: processing, via the machine learning framework, a trained convolutional neural network (CNN) model having an associated list of instructions to generate a processed CNN model, the trained CNN model having weights in a floating-point format, wherein processing the trained CNN includes: traversing the list of instructions to reduce a count of instructions, wherein to reduce the count of instructions includes to pruning one or more conditional branches in the list of instructions that descend from a weight having a first threshold value; determining whether pixels of a second convolution window of the trained CNN model differ from a previously stored first convolution window and eliminating the second convolution window from the trained CNN model in response to determination that the second convolution window matches the first convolution window; quantizing one or more weights in the floating-point format via a quantization primitive provided by the machine learning framework to generate quantized weights, including generating a quantization table to enable non-uniform quantization of the weights, wherein the quantization primitive causes a graphics processor of the one or more processors to execute an instruction to quantize the one or more weights into one or more quantized weights; and outputting the processed CNN model, the processed CNN model including the reduced count of instructions and the one or more quantized weights.

14. The data processing system of claim 13, wherein the graphics multiprocessor includes single instruction multiple thread (SIMT) architecture and the instruction to quantize the one or more weights is an instruction provided by an instruction set architecture of the graphics multiprocessor.

15. The data processing system of claim 13, wherein processing the trained CNN includes bypassing traversal of conditional branches in the list of instructions for conditional branches that descend from a weight having a second threshold value.

16. The data processing system of claim 13, wherein processing the trained CNN includes expanding the CNN model into elementary operations.

17. The data processing system of claim 13, wherein outputting the processed CNN model includes compressing a representation of instructions associated with the processed CNN model and generating an executable application to perform the instructions associated with the processed CNN model.

18. The data processing system of claim 13, wherein the machine learning framework provides a plurality of quantization primitives to perform a plurality of quantization and dequantization operations.

19. The data pro
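
The convolution window elimination recited in claims 1 and 3 can be illustrated with a short Python sketch. This is a hedged example under stated assumptions, not the patented implementation: the helper names (`window_signature`, `extract_windows`, `dedupe_windows`) are hypothetical, CRC32 from the standard `zlib` module stands in for the unspecified "checksum signature", pixels are assumed to be integers in the range 0 to 255, and the extra equality check that guards against checksum collisions is an addition of this example that the claim text does not mention.

```python
import zlib


def window_signature(window):
    """Checksum over the pixel bytes of a window (rows of ints in 0..255)."""
    flat = bytes(pixel for row in window for pixel in row)
    return zlib.crc32(flat)


def extract_windows(image, k):
    """Yield ((row, col), window) for every k x k window at stride 1."""
    height, width = len(image), len(image[0])
    for y in range(height - k + 1):
        for x in range(width - k + 1):
            yield (y, x), [row[x:x + k] for row in image[y:y + k]]


def dedupe_windows(image, k):
    """Split windows into those to compute and those that can be reused.

    Returns (unique, reuse):
      unique maps a position to its window and must be convolved;
      reuse maps the position of a repeated window to the position of the
      previously stored window whose result can be reused.
    """
    seen = {}  # signature -> (position, window) of the first occurrence
    unique, reuse = {}, {}
    for pos, window in extract_windows(image, k):
        sig = window_signature(window)
        match = seen.get(sig)
        # Assumption of this sketch: confirm pixel equality so a checksum
        # collision never causes a distinct window to be dropped.
        if match is not None and match[1] == window:
            reuse[pos] = match[0]
        else:
            seen.setdefault(sig, (pos, window))
            unique[pos] = window
    return unique, reuse
```

For an image with repeated texture, `reuse` records which window positions can skip the convolution and copy an earlier result, which is the saving the claim language describes as eliminating the second window.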

Assignees

Intel Corp

Inventors

Classifications

  • G06N3/082 · Primary

    modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • G06N3/063 · Primary

    using electronic means · CPC title

  • G06T1/20 · Primary

    Processor architectures; Processor configuration, e.g. pipelining · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12020135B2 cover?
A library of machine learning primitives is provided to optimize a machine learning model to improve the efficiency of inference operations. In one embodiment a trained convolutional neural network (CNN) model is processed into a trained CNN model via pruning, convolution window optimization, and quantization.
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06N3/082. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 25, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).