What technology area does this patent fall under?

Primary CPC classification G06N3/063. Mapped technology areas include Physics.

When was this patent published?

Publication date Jul 8, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 11 related publications on this page (citations in our corpus, or other publications sharing the same primary CPC).

Method and apparatus for generating fixed-point quantized neural network

US12355471B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12355471-B2
Application number	US-202218084948-A
Country	US
Kind code	B2
Filing date	Dec 20, 2022
Priority date	Aug 4, 2017
Publication date	Jul 8, 2025
Grant date	Jul 8, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of generating a fixed-point quantized neural network includes analyzing a statistical distribution for each channel of floating-point parameter values of feature maps and a kernel for each channel from data of a pre-trained floating-point neural network, determining a fixed-point expression of each of the parameters for each channel statistically covering a distribution range of the floating-point parameter values based on the statistical distribution for each channel, determining fractional lengths of a bias and a weight for each channel among the parameters of the fixed-point expression for each channel based on a result of performing a convolution operation, and generating a fixed-point quantized neural network in which the bias and the weight for each channel have the determined fractional lengths.
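The abstract's first two steps (analyze each channel's distribution, then pick a fixed-point format that statistically covers that channel's range) can be illustrated with a minimal sketch. It assumes signed fixed point with one sign bit, values stored as integers q with x ≈ q·2^-FL, and a coverage bound of either the channel's max |x| or |mean| + k·std (the latter loosely following the statistics listed in claim 9). The names `fractional_length`, `quantize`, `bit_width`, and `k` are hypothetical; this is not the patent's exact rule.

```python
import math
import statistics


def fractional_length(values, bit_width=8, k=None):
    """Choose one channel's fractional length (FL) for signed fixed point.

    Hypothetical rule: cover max |x| when k is None, otherwise cover
    |mean| + k * std (a normal-distribution approximation).
    """
    if k is None:
        bound = max(abs(v) for v in values)
    else:
        bound = abs(statistics.fmean(values)) + k * statistics.pstdev(values)
    if bound == 0:
        return bit_width - 1  # all-zero channel: every non-sign bit is fractional
    # Smallest integer-bit count il with 2**il > bound.
    integer_bits = math.floor(math.log2(bound)) + 1
    return (bit_width - 1) - integer_bits  # one bit is reserved for the sign


def quantize(values, frac_len, bit_width=8):
    """Map floats to integers q with x ~= q * 2**-frac_len, saturating."""
    scale = 2.0 ** frac_len
    lo, hi = -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1
    # Python's round() rounds half to even; out-of-range values are clipped.
    return [min(hi, max(lo, round(v * scale))) for v in values]


# Two kernel channels with very different ranges get different FLs.
kernel = {
    "ch0": [0.02, -0.15, 0.31, -0.07],
    "ch1": [1.8, -2.6, 0.9, 3.1],
}
for name, vals in kernel.items():
    fl = fractional_length(vals)
    print(name, fl, quantize(vals, fl))
# ch0 8 [5, -38, 79, -18]
# ch1 5 [58, -83, 29, 99]
```

The per-channel choice is what lets the narrow channel keep more fractional bits than the wide one, matching the claim language that fractional lengths may differ "for at least some channels."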

First claim

Opening claim text (preview).

What is claimed is:

1. A method of generating a fixed-point quantized neural network, the method comprising: analyzing a statistical distribution of floating-point values for each channel of feature maps and a kernel for each channel from data of a pre-trained floating-point neural network; quantizing the floating-point values for each channel to fixed-point values for each channel based on the statistical distribution for each channel; determining fractional lengths of fixed-point expressions of parameters for performing an operation of the quantized fixed-point values, for each channel; and generating a fixed-point quantized neural network in which the parameters for each channel have the determined fractional lengths being different for at least some channels, including performing a channel-wise quantization for each channel included in the pre-trained floating-point neural network, wherein the determining of the fractional lengths comprises determining a fractional length of a bias for each channel based on fractional lengths of input activations and fractional lengths of weights for each channel input to multiply-accumulate (MAC) operations, and the determining of the fractional lengths further comprises determining a fractional length of a weight of the weights for each channel by decreasing the fractional length of the weight by a difference between the fractional length of one of the fixed-point expressions corresponding to a result of one of the MAC operations to which the weight was input and the determined fractional length of the bias.

2. The method of claim 1, wherein the analyzing of the statistical distribution comprises obtaining statistics for each channel of the floating-point values of weights, input activations, and output activations used in each channel during pre-training of the pre-trained floating-point neural network.

3. The method of claim 1, wherein the operation comprises a partial sum operation of a convolution operation between a plurality of channels, the partial sum operation comprises a plurality of multiply-accumulate (MAC) operations and an Add operation, the parameters comprise the bias and the weight for each channel.

4. The method of claim 3, wherein the determining of the fractional length of the bias comprises determining the fractional length of the bias based on a maximum fractional length among fractional lengths of fixed-point expressions corresponding to results of the MAC operations.

5. The method of claim 4, wherein the partial sum operation comprises: a first MAC operation between a first input activation of a first channel of an input feature map of the feature maps and a first weight of a first channel of the kernel; a second MAC operation between a second input activation of a second channel of the input feature map and a second weight of a second channel of the kernel; and an Add operation between a result of the first MAC operation, a result of the second MAC operation, and the bias, and the determining of the fractional length of the bias further comprises: obtaining a first fractional length of a first fixed-point expression corresponding to the result of the first MAC operation; obtaining a second fractional length of a second fixed-point expression corresponding to the result of the second MAC operation; and determining the fractional length of the bias to be a maximum fractional length among the first fractional length and the second fractional length.

6. The method of claim 5, further comprising bit-shifting a fractional length of a fixed-point expression having a smaller fractional length among the first fixed-point expression and the second fixed-point expression based on the determined fractional length of the bias, wherein the fixed-point quantized neural network comprises information about an amount of the bit-shifting.

7. The method of claim 3, wherein the determining of the fractional length of the bias comprises determining the fractional length of the bias to be a minimum fractional length among fractional lengths of fixed-point expressions respectively corresponding to results of the MAC operations.

8. The method of claim 3, wherein the partial sum operation comprises: a first MAC operation between a first input activation of a first channel of an input feature map of the feature maps and a first weight of a first channel of the kernel; a second MAC operation between a second input activation of a second channel of the input feature map and a second weight of a second channel of the kernel; and an Add operation between a result of the first MAC operation, a result of the second MAC operation, and the bias, the determining of the fractional lengths further comprises: obtaining a first fractional length of a first fixed-point expression corresponding to the result of the first MAC operation; and obtaining a second fractional length of a second fixed-point expression corresponding to the result of the second MAC operation, the determining of the fractional length of the bias comprises determining the fractional length of the bias to be a minimum fractional length among the first fractional length and the second fractional length, and the determining of the fractional lengths further comprises tuning a fractional length of the weight input to one of the first MAC operation and the second MAC operation that produces a result having a fixed-point expression having the minimum fractional length by decreasing the fractional length of the weight by a difference between the first fractional length and the second fractional length.

9. The method of claim 1, wherein the statistical distribution for each channel is a distribution approximated by a normal distribution or a Laplace distribution, and the quantizing of the floating-point values comprises determining fixed-point expression of the fixed-point values based on a fractional length for each channel determined based on any one or any combination of any two or more of a mean, a variance, a standard deviation, a maximum value, and a minimum value of the floating-point values for each channel obtained from the statistical distribution for each channel.

10. The method of claim 1, further comprising retraining, after the determining of the fractional lengths is completed, the fixed-point quantized neural network with the determined fractional lengths of the parameters for each channel set as constraints of the fixed-point quantized neural network to fine tune the fixed-point quantized neural network.

11. An apparatus for generating a fixed-point quantized neural network, the apparatus comprising: a memory configured to store at least one program; and a processor configured to execute the at least one program, wherein the processor executing the at least one program configures the processor to: analyze a statistical distribution of floating-point values for each channel of feature maps and a kernel for each channel from data of a pre-trained floating-point neural network, quantize the floating-point values for each channel to fixed-point values for each channel based on the statistical distribution for each channel, determine fractional lengths of fixed-point expressions of parameters for performing an operation of the quantized fixed-point values, for each channel, and generate a fixed-point quantized neural network in which the parameters for each channel have the determined fractional lengths being different for at least some channels, including performing a channel-wise quantization for each channel included in the pre-trained floating-point neural network, wherein the determining of the fractional lengths comprises determining a fracti…
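The bias and weight fractional-length rules in claims 1 and 3 to 8 amount to integer bookkeeping, which the minimal sketch below illustrates. It assumes signed fixed-point values x = q·2^-FL and that a MAC result's fractional length is the sum of its input activation's and weight's fractional lengths (ordinary fixed-point multiplication; the claims do not spell this out). Each MAC is a single product per channel, as in claims 5 and 8. All function names are hypothetical, and this is an illustration of the two claimed variants, not the assignee's implementation.

```python
def mac_fractional_length(act_fl, weight_fl):
    # The integer product q_a * q_w carries FL = act_fl + weight_fl.
    return act_fl + weight_fl


def bias_fl_max_with_shifts(act_fls, weight_fls):
    """Bias FL = max MAC-result FL (cf. claims 4-5).

    Also returns, per channel, the left-shift that aligns each MAC result
    with the bias FL (cf. claim 6, which records the shift amounts).
    """
    mac_fls = [mac_fractional_length(a, w) for a, w in zip(act_fls, weight_fls)]
    bias_fl = max(mac_fls)
    return bias_fl, [bias_fl - fl for fl in mac_fls]


def bias_fl_min_with_weight_tuning(act_fls, weight_fls):
    """Bias FL = min MAC-result FL (cf. claims 7-8).

    Each weight FL is decreased by (MAC-result FL - bias FL), the adjustment
    worded in claim 1, so every MAC result lands on the bias FL.
    """
    mac_fls = [mac_fractional_length(a, w) for a, w in zip(act_fls, weight_fls)]
    bias_fl = min(mac_fls)
    return bias_fl, [w - (m - bias_fl) for w, m in zip(weight_fls, mac_fls)]


def aligned_partial_sum(acts, act_fls, weights, weight_fls, bias_q):
    """Integer partial sum of per-channel MACs plus bias (max-FL variant).

    bias_q must already be quantized at the bias FL this function returns.
    """
    bias_fl, shifts = bias_fl_max_with_shifts(act_fls, weight_fls)
    total = bias_q
    for a, w, s in zip(acts, weights, shifts):
        total += (a * w) << s  # left-shift aligns this MAC result to bias_fl
    return total, bias_fl


# Channel 0: activation 1.5 at FL 4 -> 24, weight 0.25 at FL 6 -> 16 (MAC FL 10)
# Channel 1: activation 2.0 at FL 5 -> 64, weight 0.5 at FL 3 -> 4  (MAC FL 8)
# Bias 0.125 at FL 10 -> 128
print(bias_fl_max_with_shifts([4, 5], [6, 3]))         # (10, [0, 2])
print(bias_fl_min_with_weight_tuning([4, 5], [6, 3]))  # (8, [4, 3])
total, fl = aligned_partial_sum([24, 64], [4, 5], [16, 4], [6, 3], 128)
print(total / 2 ** fl)  # 1.5, i.e. 1.5*0.25 + 2.0*0.5 + 0.125
```

The max variant keeps all precision but needs stored shift amounts; the min variant avoids shifts at the cost of re-quantizing some weights with fewer fractional bits.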

Assignees

Samsung Electronics Co Ltd

Classifications

G06F2207/4824
Neural networks · CPC title
G06F7/5443
Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title
G06F7/49947
Rounding · CPC title
G06F7/483
Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers {(G06F7/4806, G06F7/4824, G06F7/49, G06F7/491, G06F7/544 take precedence)} · CPC title
G06N3/084
Backpropagation, e.g. using gradient descent · CPC title

Patent family

Related publications grouped by family.

Patent family 63407020

