Who is the assignee on this patent?

Electronics & Telecommunications Res Inst

What technology area does this patent fall under?

Primary CPC classification G06N3/048. Mapped technology areas include Physics.

When was this patent published?

Publication date: Sep 27, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Quantization method and device for weights of batch normalization layer

US11455539B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11455539-B2
Application number	US-201916541275-A
Country	US
Kind code	B2
Filing date	Aug 15, 2019
Priority date	Nov 12, 2018
Publication date	Sep 27, 2022
Grant date	Sep 27, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An embodiment of the present invention provides a quantization method for weights of a plurality of batch normalization layers, including: receiving a plurality of previously learned first weights of the plurality of batch normalization layers; obtaining first distribution information of the plurality of first weights; performing a first quantization on the plurality of first weights using the first distribution information to obtain a plurality of second weights; obtaining second distribution information of the plurality of second weights; and performing a second quantization on the plurality of second weights using the second distribution information to obtain a plurality of final weights, and thereby reducing an error that may occur when quantizing the weight of the batch normalization layer.
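The two-pass flow in the abstract (measure the distribution, quantize, measure again, quantize again) can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the distribution information is taken to be mean and variance (as in claim 3), and the uniform quantizer is a stand-in, since this page does not say how the distribution information drives either quantization step.

```python
import numpy as np


def distribution_info(weights):
    """Return (mean, variance) of a weight array."""
    w = np.asarray(weights, dtype=np.float64)
    return float(w.mean()), float(w.var())


def uniform_quantize(weights, mean, variance, bits, num_std=3.0):
    """Stand-in quantizer (an assumption, not the patented one).

    Clips weights to mean +/- num_std * std and snaps them to a grid
    of 2**bits evenly spaced levels inside that range.
    """
    w = np.asarray(weights, dtype=np.float64)
    std = np.sqrt(variance)
    if std == 0.0:
        # All weights are identical, so there is nothing to quantize.
        return np.full_like(w, mean)
    lo = mean - num_std * std
    hi = mean + num_std * std
    step = (hi - lo) / (2 ** bits - 1)
    idx = np.round((np.clip(w, lo, hi) - lo) / step)
    return lo + idx * step


def two_stage_quantize(first_weights, bits=4):
    """Quantize twice, re-measuring the distribution between passes."""
    mean1, var1 = distribution_info(first_weights)
    second_weights = uniform_quantize(first_weights, mean1, var1, bits)
    mean2, var2 = distribution_info(second_weights)
    return uniform_quantize(second_weights, mean2, var2, bits)
```

For example, `two_stage_quantize(w, bits=4)` returns an array of the same shape as `w` holding at most 16 distinct values. The second pass sees the statistics of the already-quantized weights, which is the step the abstract credits with reducing quantization error.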

First claim

Opening claim text (preview).

What is claimed is:

1. A quantization method for performing a quantization, comprising: for a plurality of batch normalization layers implemented in hardware in a neural network, performing operations for quantizing weights of the plurality of batch normalization layers to reduce a bit-width requirement amount and to reduce a memory capacity required to store the weights, the operations including: receiving a plurality of previously-learned first weights of the plurality of batch normalization layers; obtaining first distribution information of the plurality of previously-learned first weights; performing a first quantization on the plurality of previously-learned first weights using the first distribution information to obtain a plurality of second weights; assigning a first bit width, which is a part of all bit widths assigned to the quantization, to the first quantization; obtaining second distribution information of the plurality of second weights; assigning a second bit width, which is a part of all bit widths assigned to the quantization, to a second quantization; performing the second quantization on the plurality of second weights using the second distribution information to obtain a plurality of final weights having the first bit width and the second bit width; wherein the first bit width and the second bit width are a same bit width that is reduced from bit widths of the previously-learned first weights before the quantization.

2. The quantization method of claim 1, wherein the first bit width and the second bit width are 4 bits.

3. The quantization method of claim 1, wherein the first distribution information includes an average value and a variance value of the plurality of previously-learned first weights, and the second distribution information includes an average value and a variance value of the plurality of second weights.

4. The quantization method of claim 1, wherein the first quantization is an integer power-of-two quantization, and the second quantization is a dynamic range floating point quantization.

5. The quantization method of claim 1, further comprising repeating the receiving, the obtaining of the first distribution information, and the first quantizing, for the plurality of previously-learned first weights a predetermined number of times.

6. The quantization method of claim 1, further comprising repeatedly applying a quantization process for a first layer of remaining layers among the plurality of batch normalization layers.

7. A batch normalization layer quantization device for performing a quantization, comprising: an input part that receives a plurality of previously-learned first weights of a plurality of batch normalization layers implemented in hardware in a neural network, and input data of the plurality of batch normalization layers; a processor that, for the plurality of batch normalization layers, performs operations for quantizing weights of the plurality of batch normalization layers to reduce a bit-width requirement amount and to reduce a memory capacity required to store the weights, the operations including: obtaining first distribution information of the plurality of previously-learned first weights; performing a first quantization on the plurality of previously-learned first weights using the first distribution information to obtain a plurality of second weights; assigning a first bit width, which is a part of all bit widths assigned to the quantization, to the first quantization; obtaining second distribution information of the second plurality of weights; assigning a second bit width, which is a part of all bit widths assigned to the quantization, to a second quantization; performing the second quantization on the plurality of second weights using the second distribution information to obtain a plurality of final weights having the first bit width and the second bit width; and performing normalization on the input data using the plurality of final weights; and a memory that stores the plurality of final weights using the first bit width and the second bit width; wherein the first bit width and the second bit width are a same bit width that is reduced from bit widths of the previously-learned first weights before the quantization.

8. The batch normalization layer quantization device of claim 7, wherein the first bit width and the second bit width are 4 bits.

9. The batch normalization layer quantization device of claim 7, wherein the first quantization is an integer power-of-two quantization, and the second quantization is a dynamic range floating point quantization.

10. The batch normalization layer quantization device of claim 7, wherein the first distribution information includes an average value and a variance value of the plurality of previously-learned first weights, and the second distribution information includes an average value and a variance value of the plurality of second weights.

11. The batch normalization layer quantization device of claim 7, wherein the processor repeats the receiving, the obtaining of the first distribution information, and the first quantizing, for the plurality of previously-learned first weights a predetermined number of times.

12. A quantization method for performing a quantization, comprising: for a plurality of batch normalization layers implemented in hardware in a neural network, performing operations for quantizing weights of the plurality of batch normalization layers to reduce a bit-width requirement amount and to reduce a memory capacity required to store the weights, the operations including: receiving a plurality of previously-learned first weights of the plurality of batch normalization layers; obtaining first distribution information of the plurality of previously-learned first weights; performing a first quantization on the plurality of previously-learned first weights using the first distribution information to obtain a plurality of second weights; assigning a first bit width, which is a part of all bit widths assigned to the quantization, to the first quantization; obtaining second distribution information of the plurality of second weights; assigning a second bit width, which is a part of all bit widths assigned to the quantization, to a second quantization; performing the second quantization on the plurality of second weights using the second distribution information to obtain a plurality of final weights having the first bit width and the second bit width; and performing normalization on an input data using the plurality of final weights; wherein the first bit width and the second bit width are a same bit width that is reduced from bit widths of the previously-learned first weights before the quantization.

13. The quantization method of claim 12, wherein the first distribution information includes an average value and a variance value of the plurality of previously-learned first weights, and the second distribution information includes an average value and a variance value of the plurality of second weights.

14. The quantization method of claim 13, further comprising repeatedly applying a quantization process for a first layer of remaining layers among the plurality of batch normalization layers.
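Claim 4 names an integer power-of-two quantization, and claims 2 and 8 fix the bit width at 4 bits. The sketch below illustrates one common reading of power-of-two quantization, in which each weight is replaced by a signed power of two. It is an assumption-laden example, not the patented procedure: the bit split (1 sign bit plus `bits - 1` exponent bits), the choice of exponent range, and the function names are all hypothetical. The `storage_bytes` helper shows the memory arithmetic behind the claimed reduction, assuming for illustration that the original weights are 32 bits wide.

```python
import numpy as np


def power_of_two_quantize(weights, bits=4):
    """Round each weight to a signed power of two.

    Assumed encoding: 1 sign bit and (bits - 1) exponent bits, so there
    are 2**(bits - 1) allowed exponents, ending at the exponent of the
    largest-magnitude weight.
    """
    w = np.asarray(weights, dtype=np.float64)
    max_abs = np.max(np.abs(w))
    if max_abs == 0.0:
        return np.zeros_like(w)
    e_max = int(np.floor(np.log2(max_abs)))
    e_min = e_max - 2 ** (bits - 1) + 1
    # This encoding has no code for zero, so zero weights are mapped to
    # the smallest positive magnitude.
    sign = np.where(w < 0, -1.0, 1.0)
    magnitude = np.where(w == 0, 2.0 ** e_min, np.abs(w))
    exponent = np.clip(np.round(np.log2(magnitude)), e_min, e_max)
    return sign * 2.0 ** exponent


def storage_bytes(num_weights, bits):
    """Bytes needed to store num_weights values of the given bit width."""
    return (num_weights * bits + 7) // 8
```

With these assumptions, `power_of_two_quantize([0.3, -0.7, 1.0, 0.05])` gives `[0.25, -0.5, 1.0, 0.0625]`, and 1024 weights shrink from `storage_bytes(1024, 32)`, which is 4096 bytes, to `storage_bytes(1024, 4)`, which is 512 bytes. Power-of-two values are attractive in hardware because multiplying by them reduces to a bit shift.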

Assignees

Electronics & Telecommunications Res Inst

Inventors

Classifications

G06N3/048 · Primary
Activation functions · CPC title
G06N3/0495 · Primary
Quantised networks; Sparse networks; Compressed networks · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06F17/18
for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title
G06N3/08
Learning methods · CPC title

Patent family

Related publications grouped by family.

View patent family 70550573

External sources

Related patents

Information processing method and terminal device

System and method enabling one-hot neural networks on a machine learning compute platform

Neural network compression

Method for encoding and decoding quantized matrix and apparatus using same

Method and device for encoding/decoding images

Convolution neural network training apparatus and method thereof

Method and apparatus for adaptive bit-allocation in neural systems
