Who is the assignee on this patent?

Intel Corp, Yao Anbang, Zhou Aojun, and 3 more

What technology area does this patent fall under?

Primary CPC classification G06N3/063. Mapped technology areas include Physics.

When was this patent published?

Publication date Oct 8, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Loss-error-aware quantization of a low-bit neural network

US12112256B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12112256-B2
Application number	US-201816982441-A
Country	US
Kind code	B2
Filing date	Jul 26, 2018
Priority date	Jul 26, 2018
Publication date	Oct 8, 2024
Grant date	Oct 8, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, apparatus, systems and articles of manufacture for loss-error-aware quantization of a low-bit neural network are disclosed. An example apparatus includes a network weight partitioner to partition unquantized network weights of a first network model into a first group to be quantized and a second group to be retrained. The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights. In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss. The example apparatus includes a weight updater to update the second group of network weights based on the difference. The example apparatus includes a network model deployer to deploy a low-bit network model including the low-bit second network weights.
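The loop the abstract describes (partition, quantize one group, compare losses, retrain the other group, repeat) can be sketched on a toy linear model. This is an illustrative sketch only, not the patented implementation: the magnitude-based partition rule, the fixed `scale`, the half-scale ternary threshold, the use of the loss difference as a stopping criterion for retraining, and every function name below are assumptions introduced here.

```python
import random


def mse_loss(weights, data):
    """Mean squared error of a linear model y = sum(w_i * x_i)."""
    total = 0.0
    for x, y in data:
        pred = sum(w * xi for w, xi in zip(weights, x))
        total += (pred - y) ** 2
    return total / len(data)


def mse_gradient(weights, data):
    """Gradient of mse_loss with respect to each weight."""
    grads = [0.0] * len(weights)
    for x, y in data:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grads[i] += 2.0 * err * xi / len(data)
    return grads


def ternarize(value, scale):
    """Map a weight to {-scale, 0, +scale}; the scale/2 threshold is an assumption."""
    if abs(value) < scale / 2:
        return 0.0
    return scale if value > 0 else -scale


def incremental_quantize(weights, data, scale, lr=0.05, max_steps=200):
    weights = list(weights)
    frozen = set()  # indices of weights that are already quantized
    while len(frozen) < len(weights):
        free = [i for i in range(len(weights)) if i not in frozen]
        # Partition (assumed rule): the larger-magnitude half is quantized now,
        # the remainder stays full precision and is retrained.
        free.sort(key=lambda i: abs(weights[i]), reverse=True)
        first_group = free[: max(1, len(free) // 2)]
        second_group = free[len(first_group):]

        first_loss = mse_loss(weights, data)  # loss before quantizing
        for i in first_group:
            weights[i] = ternarize(weights[i], scale)
            frozen.add(i)
        second_loss = mse_loss(weights, data)  # loss after quantizing
        difference = second_loss - first_loss

        # Retrain only the second group until the loss gap closes (assumed criterion).
        steps = 0
        while second_group and difference > 1e-6 and steps < max_steps:
            grads = mse_gradient(weights, data)
            for i in second_group:
                weights[i] -= lr * grads[i]
            difference = mse_loss(weights, data) - first_loss
            steps += 1
    return weights


# Hypothetical toy data: targets come from a model that is already ternary.
rng = random.Random(0)
true_w = [0.5, -0.5, 0.0, 0.5]
data = []
for _ in range(32):
    x = [rng.uniform(-1.0, 1.0) for _ in true_w]
    data.append((x, sum(w * xi for w, xi in zip(true_w, x))))

low_bit = incremental_quantize([0.9, -0.4, 0.1, 0.6], data, scale=0.5)
print(low_bit)
```

Each pass freezes at least one weight, so the loop ends with every weight in the low-bit set; the weights that remain free absorb the loss increase caused by the weights quantized so far.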

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus for loss-error-aware quantization of a low-bit neural network, the apparatus comprising: a network weight partitioner to partition unquantized network weights of a first deep neural network model into a first group of network weights to be quantized and a second group of network weights to be retrained; a loss calculator to process network weights of the first deep neural network model to calculate a first loss with respect to a loss function; a weight quantizer to quantize the first group of network weights to generate low-bit second network weights corresponding to the first group of network weights; the loss calculator to calculate a second loss of the low-bit second network weights with respect to the loss function and to determine a difference between the first loss and the second loss; a weight updater to update the second group of network weights based on the difference between the first loss and the second loss, the second group of network weights to be partitioned by the network weight partitioner to continue partitioning, quantizing, and retraining unquantized network weights; and a network model deployer to deploy a low-bit deep neural network model including the low-bit second network weights.

2. The apparatus of claim 1, further including a network initializer to initialize weights, calculate a scaling factor, and set interval bound factors, wherein the network weight partitioner is to partition unquantized network weights using the interval bound factors.

3. The apparatus of claim 2, wherein the network initializer, the network weight partitioner, the loss calculator, the weight quantizer, the loss calculator, and the weight updater are to process each layer of the first deep neural network model to generate the low-bit second network weights for each layer to enable the model deployer to deploy the low-bit deep neural network model including a plurality of layers.

4. The apparatus of claim 3, wherein only convolutional layers and fully connected layers of the first deep neural network model are to be processed to generate the low-bit deep neural network model.

5. The apparatus of claim 1, wherein the weight quantizer is to quantize network weights into at least one of binary or ternary equivalent weights.

6. The apparatus of claim 1, wherein the loss calculator is to determine an approximation error between quantized network weights and the first deep neural network model to generate the difference between the first loss and the second loss.

7. The apparatus of claim 1, wherein the first group of network weights is to be quantized using a center of weight distribution for the first group of network weights.

8. A tangible computer-readable storage medium comprising computer readable instructions which, when executed, cause at least one processor to implement at least: a network weight partitioner to partition unquantized network weights of a first deep neural network model into a first group of network weights to be quantized and a second group of network weights to be retrained; a loss calculator to process network weights of the first deep neural network model to calculate a first loss with respect to a loss function; a weight quantizer to quantize the first group of network weights to generate low-bit second network weights corresponding to the first group of network weights; the loss calculator to calculate a second loss of the low-bit second network weights with respect to the loss function and to determine a difference between the first loss and the second loss; a weight updater to update the second group of network weights based on the difference between the first loss and the second loss, the second group of network weights to be partitioned by the network weight partitioner to continue partitioning, quantizing, and retraining unquantized network weights; and a network model deployer to deploy a low-bit deep neural network model including the low-bit second network weights.

9. The computer-readable storage medium of claim 8, wherein the instructions, when executed, further cause the at least one processor to implement a network initializer to initialize network weights, calculate a scaling factor, and set interval bound factors, wherein the network weight partitioner is to partition unquantized network weights using the interval bound factors.

10. The computer-readable storage medium of claim 9, wherein the network initializer, the network weight partitioner, the loss calculator, the weight quantizer, the loss calculator, and the weight updater are to process each layer of the first deep neural network model to generate second network weights for each layer to enable the model deployer to deploy the low-bit deep neural network model including a plurality of layers.

11. The computer-readable storage medium of claim 10, wherein only convolutional layers and fully connected layers of the first deep neural network model are to be processed to generate the low-bit deep neural network model.

12. The computer-readable storage medium of claim 8, wherein the weight quantizer is to quantize network weights into at least one of binary or ternary equivalent weights.

13. The computer-readable storage medium of claim 8, wherein the loss calculator is to determine an approximation error between quantized network weights and the first deep neural network model to generate the difference between the first loss and the second loss.

14. The computer-readable storage medium of claim 8, wherein the first group of network weights is to be quantized using a center of weight distribution for the first group of network weights.

15. A computer-implemented method comprising: partitioning, using at least one processor, unquantized network weights of a first deep neural network model into a first group of network weights to be quantized and a second group of network weights to be retrained; processing, using the at least one processor, network weights of the first deep neural network model to calculate a first loss with respect to a loss function; quantizing, using the at least one processor, the first group of network weights to generate low-bit second network weights corresponding to the first group of network weights; calculating, using the at least one processor, a second loss of the second network weights with respect to the loss function to determine a difference between the first loss and the second loss; updating, using the at least one processor, the second group of network weights based on the difference between the first loss and the second loss, the second group of network weights to be partitioned using the at least one processor to continue partitioning, quantizing, and retraining unquantized network weights; and deploying, using the at least one processor, a low-bit deep neural network model including the low-bit second network weights.

16. The method of claim 15, wherein the method is to process each layer of the first deep neural network model to generate second network weights for each layer to deploy the low-bit deep neural network model including a plurality of layers.

17. The method of claim 16, wherein only convolutional layers and fully connected layers of the first deep neural network model are to be processed to generate the low-bit deep neural network model.

18. The method of claim 15, wherein quantizing further includes quantizing network weights into at least one of binary or ternary equivalent weights.

19. The method of claim 15, wherein quantizing furthe
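The dependent claims mention a scaling factor, interval bound factors, binary or ternary equivalent weights, and an approximation error, without giving formulas in this preview. A minimal sketch of what such helpers might look like follows. The mean-absolute-value scaling factor, the single `bound` factor used as a ternary threshold, the squared-error metric, and all function names are assumptions chosen for illustration, not definitions taken from the patent.

```python
def scaling_factor(weights):
    """Assumed choice: mean absolute value of the weights."""
    return sum(abs(w) for w in weights) / len(weights)


def quantize_binary(weights, alpha):
    """Binary equivalent weights: every weight becomes +alpha or -alpha."""
    return [alpha if w >= 0 else -alpha for w in weights]


def quantize_ternary(weights, alpha, bound=0.5):
    """Ternary equivalent weights in {-alpha, 0, +alpha}.

    `bound` stands in for an interval bound factor: weights whose magnitude
    falls below bound * alpha are set to zero (assumed rule).
    """
    threshold = bound * alpha
    out = []
    for w in weights:
        if abs(w) < threshold:
            out.append(0.0)
        else:
            out.append(alpha if w > 0 else -alpha)
    return out


def approximation_error(original, quantized):
    """Squared L2 distance between original and quantized weights (assumed metric)."""
    return sum((o - q) ** 2 for o, q in zip(original, quantized))


layer = [0.8, -0.2, 0.1, -0.9]  # hypothetical layer weights
alpha = scaling_factor(layer)
print(quantize_binary(layer, alpha))
print(quantize_ternary(layer, alpha))
print(approximation_error(layer, quantize_ternary(layer, alpha)))
```

Ternary weights keep a zero level, so small weights contribute no multiply at inference time; binary weights need only a sign bit per weight plus one shared scale per layer.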

Assignees

Inventors

Classifications

G06N3/045 · Combinations of networks
G06N3/044 · Recurrent networks, e.g. Hopfield networks
G06N3/047 · Probabilistic or stochastic networks
G06N3/0495 · Quantised networks; Sparse networks; Compressed networks
G06N3/09 · Supervised learning

Patent family

Related publications grouped by family.

View patent family 69180311

External sources
