Method and apparatus for quantizing deep neural network

US2022398430A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2022398430-A1
Application number: US-202217659067-A
Country: US
Kind code: A1
Filing date: Apr 13, 2022
Priority date: Jun 9, 2021
Publication date: Dec 15, 2022
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for quantizing a deep neural network is provided, which includes extracting first statistical information on output values of a first normalization layer included in the deep neural network, determining a discretization interval associated with input values of a subsequent layer of the first normalization layer by using the extracted first statistical information, and quantizing the input values of the subsequent layer into discretized values having the determined discretization interval.
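The three steps in the abstract (extract statistics, determine a discretization interval, quantize) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent text: the function names, the symmetric signed grid, the use of the largest per-channel scale as the statistic, and the multiplier of 4.0.

```python
def discretization_interval(clip_value: float, num_bits: int) -> float:
    """Step size of a symmetric signed grid covering [-clip_value, clip_value].

    Assumption: a symmetric grid with 2**(num_bits - 1) - 1 positive levels.
    """
    levels = 2 ** (num_bits - 1) - 1  # e.g. 127 for 8 bits
    return clip_value / levels


def quantize(values, interval: float, num_bits: int):
    """Round each value to the nearest multiple of `interval`, clamped to the grid."""
    levels = 2 ** (num_bits - 1) - 1
    quantized = []
    for v in values:
        q = round(v / interval)           # nearest grid index
        q = max(-levels, min(levels, q))  # clamp to the representable range
        quantized.append(q * interval)
    return quantized


# Hypothetical statistic: per-channel scale factors of a normalization layer.
# The factor 4.0 is an assumed multiplier, not a value given in the patent.
gammas = [0.5, 1.0, 0.25]
clip_value = 4.0 * max(gammas)

step = discretization_interval(clip_value, num_bits=8)
quantized = quantize([0.1, -3.9, 7.2], step, num_bits=8)
```

In this sketch the statistic comes from the normalization layer's own parameters, so no input data is needed to pick the interval; values beyond the clipping range, such as 7.2 above, saturate at the largest grid point.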

First claim

Opening claim text (preview).

1. A method for quantizing a deep neural network used for inference, the method performed by one or more processors and comprising: extracting first statistical information on output values of a first normalization layer included in the deep neural network; determining a discretization interval associated with input values of a subsequent layer of the first normalization layer by using the extracted first statistical information; and quantizing the input values of the subsequent layer into discretized values having the determined discretization interval, wherein the extracting the first statistical information includes extracting a first scale factor for one or more channels associated with the first normalization layer from information indicative of a distribution of the output values of the first normalization layer, the method further comprising extracting, from information indicative of a distribution of output values of a second normalization layer disposed on a shortcut path included in the deep neural network and disposed before the subsequent layer, a second scale factor for one or more channels associated with the second normalization layer, wherein the determining the discretization interval includes calculating a clipping value using the extracted first scale factor and the extracted second scale factor.

2. The method according to claim 1, wherein the deep neural network is a model trained by using training data, and the determining the discretization interval includes: calculating the clipping value using the extracted first scale factor and the extracted second scale factor without requiring use of at least a portion of the training data for the deep neural network.

3. The method according to claim 1, wherein the second normalization layer is indirectly connected to the subsequent layer, while there is no separate normalization layer disposed between the subsequent layer and the second normalization layer.

4. The method according to claim 1, wherein the determining the discretization interval includes determining the discretization interval associated with the input values of the subsequent layer by using the calculated clipping value and a number of bits of data used for inference in the deep neural network.

5. The method according to claim 1, wherein the calculating the clipping value includes: selecting a maximum value from among values calculated based on the first scale factor and the second scale factor for each of the one or more channels associated with the first normalization layer and the one or more channels associated with the second normalization layer; and calculating the clipping value by using the selected maximum value and a preset value corresponding to a performance equal to or greater than a predetermined reference.

6. The method according to claim 1, wherein the output values of the first normalization layer and the output values of the second normalization layer have a normal distribution.

7. The method according to claim 1, wherein a number of bits of data used for training the deep neural network is greater than a number of bits of data used for inference of the deep neural network.

8. A computer program stored in a non-transitory computer-readable recording medium for executing, on a computer, the method according to claim 1.

9. A computing device, comprising: a memory storing one or more instructions; and a processor configured to execute the stored one or more instructions to: extract first statistical information on output values of a first normalization layer included in a deep neural network; determine a discretization interval associated with input values of a subsequent layer of the first normalization layer by using the extracted first statistical information; and quantize the input values of the subsequent layer into discretized values having the determined discretization interval, wherein the processor is further configured to: extract a first scale factor for one or more channels associated with the first normalization layer from information indicative of a distribution of the output values of the first normalization layer; extract, from information indicative of a distribution of output values of a second normalization layer disposed on a shortcut path included in the deep neural network and disposed before the subsequent layer, a second scale factor for one or more channels associated with the second normalization layer; and calculate a clipping value using the extracted first scale factor and the extracted second scale factor.

10. The computing device according to claim 9, wherein the deep neural network is a model trained by using training data, and the processor is further configured to calculate the clipping value using the extracted first scale factor and the extracted second scale factor without requiring use of at least a portion of the training data for the deep neural network.

11. The computing device according to claim 9, wherein the second normalization layer is indirectly connected to the subsequent layer, while there is no separate normalization layer disposed between the subsequent layer and the second normalization layer.

12. The computing device according to claim 9, wherein the processor is further configured to determine the discretization interval associated with the input values of the subsequent layer by using the calculated clipping value and a number of bits of data used for inference in the deep neural network.

13. The computing device according to claim 9, wherein the processor is further configured to: select a maximum value from among values calculated based on the first scale factor and the second scale factor for each of the one or more channels associated with the first normalization layer and the one or more channels associated with the second normalization layer; and calculate the clipping value by using the selected maximum value and a preset value corresponding to a performance equal to or greater than a predetermined reference.

14. The computing device according to claim 9, wherein the output values of the first normalization layer and the output values of the second normalization layer have a normal distribution.

15. The computing device according to claim 9, wherein a number of bits of data used for training the deep neural network is greater than a number of bits of data used for inference of the deep neural network.
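Claims 1, 4, and 5 describe combining per-channel scale factors from two normalization layers (one on the main path, one on a shortcut path) into a single clipping value, then deriving the discretization interval from that value and the inference bit width. The sketch below is one possible reading, not the patent's stated formula. It assumes that the two branches are added channel by channel, that their outputs are independent and normally distributed (so the combined per-channel scale is the root of the summed squares), and that the grid is symmetric and signed. The names `clipping_value`, `interval_from_clip`, and `preset` are hypothetical.

```python
import math


def clipping_value(first_scales, second_scales, preset: float) -> float:
    """Combine per-channel scale factors of two normalization layers.

    Assumption: each channel's combined value is sqrt(g1**2 + g2**2), the
    standard deviation of the sum of two independent normal variables.
    The claims only say the values are "calculated based on" both factors.
    """
    per_channel = [
        math.sqrt(g1 * g1 + g2 * g2)
        for g1, g2 in zip(first_scales, second_scales)
    ]
    # Claim 5: select the maximum over channels, then apply a preset value.
    return preset * max(per_channel)


def interval_from_clip(clip: float, num_bits: int) -> float:
    """Claim 4: interval from the clipping value and the inference bit width.

    Assumption: symmetric signed grid with 2**(num_bits - 1) - 1 positive levels.
    """
    return clip / (2 ** (num_bits - 1) - 1)


# Hypothetical scale factors for a two-channel residual block.
first = [3.0, 1.0]    # main-path normalization layer
second = [4.0, 1.0]   # shortcut-path normalization layer

clip = clipping_value(first, second, preset=2.0)
interval = interval_from_clip(clip, num_bits=4)
```

Because only the stored scale factors are read, this kind of calculation does not need calibration inputs, which is consistent with the "without requiring use of at least a portion of the training data" language in claims 2 and 10.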

Assignees

Rebellions Inc

Inventors

Classifications

  • G06N3/04 · Primary

    Architecture, e.g. interconnection topology · CPC title

  • G06N3/08 · Primary

    Learning methods · CPC title

  • G06N3/0495 · Primary

    Quantised networks; Sparse networks; Compressed networks · CPC title

  • Combinations of networks · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022398430A1 cover?
A method for quantizing a deep neural network is provided, which includes extracting first statistical information on output values of a first normalization layer included in the deep neural network, determining a discretization interval associated with input values of a subsequent layer of the first normalization layer by using the extracted first statistical information, and quantizing the in…
Who is the assignee on this patent?
Rebellions Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/04. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 15, 2022 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).