Dynamic quantization for deep neural network inference system and method

US11580719B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11580719-B2
Application number  US-202017128365-A
Country             US
Kind code           B2
Filing date         Dec 21, 2020
Priority date       Jul 6, 2017
Publication date    Feb 14, 2023
Grant date          Feb 14, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for dynamically quantizing feature maps of a received image. The method includes convolving an image based on a predicted maximum value, a predicted minimum value, trained kernel weights and the image data. The input data is quantized based on the predicted minimum value and predicted maximum value. The output of the convolution is computed into an accumulator and re-quantized. The re-quantized value is output to an external memory. The predicted min value and the predicted max value are computed based on the previous max values and min values with a weighted average or a pre-determined formula. Initial min value and max value are computed based on known quantization methods and utilized for initializing the predicted min value and predicted max value in the quantization process.
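
The range-prediction idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names (`quantize`, `update_prediction`, `run_sequence`), the blending factor `alpha`, the unsigned 8-bit target, and the choice to initialise the range from the first frame are all assumptions made for illustration. The abstract only states that the predicted min and max come from previous values through a weighted average or a pre-determined formula.

```python
def quantize(values, lo, hi, bits=8):
    """Map values in [lo, hi] to unsigned integers of `bits` bits, clamping outliers."""
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [min(levels, max(0, round((v - lo) / scale))) for v in values]


def update_prediction(predicted, observed, alpha=0.9):
    """Weighted average of the previous prediction and the newly observed extreme."""
    return alpha * predicted + (1.0 - alpha) * observed


def run_sequence(frames, alpha=0.9):
    """Quantize each frame using the range predicted from earlier frames."""
    # Assumption: the initial range is taken directly from the first frame,
    # standing in for the "known quantization methods" the abstract mentions.
    pred_min, pred_max = min(frames[0]), max(frames[0])
    outputs = []
    for frame in frames:
        # The current frame is quantized with the range predicted so far ...
        outputs.append(quantize(frame, pred_min, pred_max))
        # ... and its observed extremes then refine the prediction for the next frame.
        pred_min = update_prediction(pred_min, min(frame), alpha)
        pred_max = update_prediction(pred_max, max(frame), alpha)
    return outputs, (pred_min, pred_max)
```

In this sketch a value outside the predicted range is clamped rather than rescaled, so a poor prediction costs accuracy on one frame and is corrected gradually as the weighted average absorbs the new extremes.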

First claim

Opening claim text (preview).

We claim:

1. A method comprising: receiving a set of input values within a first range having a first bit depth; performing a convolution operation on the set of input values to produce a set of intermediate values having a second bit depth that is different from the first bit depth; re-quantizing the set of intermediate values by performing a division operation on the set of intermediate values to produce a set of output values within a second range having the first bit depth; determining a predicted maximum for the second range based on a maximum of the set of input values; and determining a predicted minimum for the second range based on a minimum of the set of input values.

2. The method of claim 1, wherein: the set of input values is a first set of input values; the receiving includes receiving a plurality of sets of input values that includes the first set of input values; the determining of the predicted maximum for the second range includes: determining a respective maximum of each set of the plurality of sets of input values; and averaging the respective maximums; and the determining of the predicted minimum for the second range includes: determining a respective minimum of each set of the plurality of sets of input values; and averaging the respective minimums.

3. The method of claim 2, wherein the plurality of sets of input values are associated with a plurality of images.

4. The method of claim 1, wherein: the set of input values is a first set of input values; the receiving includes receiving a plurality of sets of input values that includes the first set of input values; the performing of the convolution operation performs the convolution operation on each set of the plurality of sets of input values to produce a respective set of intermediate values; the determining of the predicted maximum for the second range includes: determining a respective maximum of each of the respective sets of intermediate values; and determining an initial maximum based on the respective maximums; and the determining of the predicted minimum for the second range includes: determining a respective minimum of each of the respective sets of intermediate values; and determining an initial minimum based on the respective minimums.

5. The method of claim 4, wherein each set of the plurality of sets of input values is associated with a respective layer of a feature map.

6. The method of claim 4, wherein the determining of the initial maximum and the initial minimum are performed based on the first set of input values being associated with a first image in a sequence.

7. The method of claim 1, wherein the convolution operation includes: receiving a set of weightings; applying the set of weightings to the set of input values to produce a weighted set of input values; and applying a finite impulse response filter operation to the weighted set of input values.

8. The method of claim 1, wherein the first bit depth is 8 bits and the second bit depth is 32 bits.

9. An integrated circuit comprising: an input configured to receive a set of input values within a first range and having a first bit depth; a convolution circuit coupled to the input and configured to: perform a convolution operation on the set of input values to produce a set of intermediate values having a second bit depth that is different from the first bit depth; and determine a second range by: determining a predicted maximum for the second range based on a maximum of the set of input values; and determining a predicted minimum for the second range based on a minimum of the set of input values; and a re-quantization circuit coupled to the convolution circuit and configured to re-quantize the set of intermediate values by performing a division operation on the set of intermediate values to produce a set of output values within the second range having the first bit depth.

10. The integrated circuit of claim 9, wherein: the set of input values is a first set of input values; and the convolution circuit is configured to: determine the predicted maximum by: determining a respective maximum of each set of a plurality of sets of input values that includes the first set of input values; and averaging the respective maximums; and determine the predicted minimum by: determining a respective minimum of each set of the plurality of sets of input values; and averaging the respective minimums.

11. The integrated circuit of claim 10, wherein the plurality of sets of input values are associated with a plurality of images.

12. The integrated circuit of claim 9, wherein: the set of input values is a first set of input values; and the convolution circuit is configured to: perform the convolution operation on each set of a plurality of sets of input values that includes the first set of input values to produce a respective set of intermediate values; determine the predicted maximum by: determining a respective maximum of each of the respective sets of intermediate values; and determining an initial maximum based on the respective maximums; and determine the predicted minimum by: determining a respective minimum of each of the respective sets of intermediate values; and determining an initial minimum based on the respective minimums.

13. The integrated circuit of claim 12, wherein each set of the plurality of sets of input values is associated with a respective layer of a feature map.

14. The integrated circuit of claim 12, wherein the convolution circuit is configured to perform the determination of the initial maximum and the initial minimum based on the first set of input values being associated with a first image in a sequence.

15. The integrated circuit of claim 9, wherein the convolution circuit is configured to perform the convolution operation by: applying a set of weightings to the set of input values to produce a weighted set of input values; and applying a finite impulse response filter operation to the weighted set of input values.

16. The integrated circuit of claim 9, wherein the first bit depth is 8 bits and the second bit depth is 32 bits.

17. An integrated circuit comprising: an input configured to receive a set of input values within a first range and having a first bit depth; a convolution circuit coupled to the input and configured to: perform a convolution operation on the set of input values to produce a set of intermediate values having a second bit depth that is different from the first bit depth; and determine a second range by: determining a predicted maximum for the second range based on a maximum of the set of input values; and determining a predicted minimum for the second range based on a minimum of the set of input values; and a re-quantization circuit coupled to the convolution circuit and configured to re-quantize the set of intermediate values by performing a shift on the set of intermediate values to produce a set of output values within the second range having the first bit depth.

18. The integrated circuit of claim 17, wherein the set of input values is associated with a feature map.

19. The integrated circuit of claim 17, wherein the convolution circuit is configured to perform the convolution operation by: applying a set of weightings to the set of input values to produce a weighted set of input values; and applying a finite impulse response filter operation to the weighted set of input values.

20. The integrated circuit of claim
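
The claims above describe two ways of re-quantizing wide intermediate values back to the narrower bit depth: a division (claims 1 and 9) and a shift (claim 17). The sketch below contrasts the two. The function names, the unsigned 8-bit output, the clamping behaviour, and the rule used to pick the shift from a predicted maximum are assumptions for illustration only; the claims do not specify how the divisor or shift amount is derived.

```python
def requantize_by_division(acc, divisor, bits=8):
    """Scale accumulator values down by integer division, then clamp to `bits` bits."""
    levels = (1 << bits) - 1
    return [min(levels, max(0, a // divisor)) for a in acc]


def requantize_by_shift(acc, shift, bits=8):
    """Scale accumulator values down by an arithmetic right shift, then clamp."""
    levels = (1 << bits) - 1
    return [min(levels, max(0, a >> shift)) for a in acc]


def shift_for_range(predicted_max, bits=8):
    """Smallest right shift that brings predicted_max within the target bit depth.

    Hypothetical helper: one plausible way to turn a predicted maximum
    into a shift amount, not a rule taken from the patent text.
    """
    levels = (1 << bits) - 1
    shift = 0
    while (predicted_max >> shift) > levels:
        shift += 1
    return shift


# Example accumulator values, as might come out of a 32-bit convolution sum.
accumulators = [0, 1000, 65535, 100000]
shift = shift_for_range(65535)
shifted = requantize_by_shift(accumulators, shift)
divided = requantize_by_division(accumulators, 1 << shift)
```

A shift is equivalent to dividing by a power of two, so it restricts the available scale factors but avoids a hardware divider. In this example the value 100000 exceeds the predicted maximum and is clamped to the top of the 8-bit range.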

Assignees

Texas Instruments Inc

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Learning methods · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

  • Supervised learning · CPC title

  • Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11580719B2 cover?
A method for dynamically quantizing feature maps of a received image. The method includes convolving an image based on a predicted maximum value, a predicted minimum value, trained kernel weights and the image data. The input data is quantized based on the predicted minimum value and predicted maximum value. The output of the convolution is computed into an accumulator and re-quantized. The re-…
Who is the assignee on this patent?
Texas Instruments Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 14, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).