Dynamic quantization of neural networks

US12282852B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12282852-B2
Application number  US-202318363408-A
Country             US
Kind code           B2
Filing date         Aug 1, 2023
Priority date       Dec 28, 2017
Publication date    Apr 22, 2025
Grant date          Apr 22, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus for applying dynamic quantization of a neural network is described herein. The apparatus includes a scaling unit and a quantizing unit. The scaling unit is to calculate an initial desired scale factors of a plurality of inputs, weights and a bias and apply the input scale factor to a summation node. Also, the scaling unit is to determine a scale factor for a multiplication node based on the desired scale factors of the inputs and select a scale factor for an activation function and an output node. The quantizing unit is to dynamically requantize the neural network by traversing a graph of the neural network.
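The abstract's statement that the multiplication node's scale factor depends on the scale factors of its inputs can be illustrated with ordinary fixed-point arithmetic. The block below is a minimal sketch under stated assumptions, not the procedure disclosed in the patent: the function names `quantize` and `quantized_neuron` and the example scale values are hypothetical. It relies only on the general fact that multiplying a value scaled by `s_in` with a value scaled by `s_w` gives a product scaled by `s_in * s_w`, so a bias added at the summation node has to be quantized at that same combined scale.

```python
def quantize(x, scale):
    """Scale a float and round it to the nearest integer (hypothetical helper)."""
    return round(x * scale)


def quantized_neuron(inputs, weights, bias, s_in, s_w):
    """Integer multiply-accumulate for one neuron.

    Assumption: all inputs share the scale s_in and all weights share s_w.
    Returns the integer accumulator and the scale needed to interpret it.
    """
    s_prod = s_in * s_w            # scale carried by every integer product
    acc = quantize(bias, s_prod)   # bias must match the accumulator's scale
    for x, w in zip(inputs, weights):
        acc += quantize(x, s_in) * quantize(w, s_w)
    return acc, s_prod


acc, s_prod = quantized_neuron(
    inputs=[0.5, -0.25], weights=[0.5, 1.0], bias=0.125, s_in=64, s_w=64
)
result = acc / s_prod  # convert back to a float for comparison
```

Dividing the accumulator by `s_prod` recovers the floating point result, which is why the scale of a downstream activation function or output node has to be chosen with `s_prod` in mind.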

First claim

Opening claim text (preview).

What is claimed is:

1. At least one non-transitory computer readable medium comprising instructions to cause at least one processor circuit to at least: determine a scale factor based on a maximum value over a window of past first values; train a neural network based on second values scaled by the scale factor; and quantize respective ones of the scaled second values.

2. The at least one non-transitory computer readable medium of claim 1, wherein the neural network is a floating point neural network.

3. The at least one non-transitory computer readable medium of claim 1, wherein the neural network is a long short term memory (LSTM) network.

4. The at least one non-transitory computer-readable medium of claim 1, wherein the window of the past first values includes a past maximum value.

5. The at least one non-transitory computer-readable medium of claim 1, wherein the second values correspond to weights of the neural network.

6. The at least one non-transitory computer-readable medium of claim 1, wherein the second values correspond to third values to be operated upon by the neural network.

7. The at least one non-transitory computer readable medium of claim 1, wherein one or more of the at least one processor circuit is to perform the quantizing of the respective ones of the scaled second values from floating point values to integer values.

8. The at least one non-transitory computer readable medium of claim 1, wherein one or more of the at least one processor circuit is to apply the scale factor to the second values in floating point form before the quantization to preserve resolution of the second values in integer form after the quantization.

9. An apparatus comprising: interface circuitry; computer-readable instructions; and programmable circuitry to at least one of instantiate or execute the computer-readable instructions to at least: determine a scale factor based on a maximum value over a window of past first values; train a neural network based on a plurality of second values scaled by the scale factor; and quantize respective ones of the scaled second values.

10. The apparatus of claim 9, wherein the neural network includes a floating point neural network.

11. The apparatus of claim 9, wherein the neural network includes a long short term memory (LSTM) network.

12. The apparatus of claim 9, wherein the window of the past first values includes a past maximum value.

13. The apparatus of claim 9, wherein the second values correspond to weights of the neural network.

14. The apparatus of claim 9, wherein the second values correspond to third values to be operated upon by the neural network.

15. The apparatus of claim 9, wherein the programmable circuitry is to perform the quantizing of the respective ones of the scaled second values from floating point values to integer values.

16. The apparatus of claim 9, wherein the programmable circuitry is to apply the scale factor to the second values in floating point form before the quantization to preserve resolution of the second values in integer form after the quantization.

17. A method comprising: determining a scale factor based on a maximum value over a window of past first values; training a neural network based on a plurality of second values scaled by the scale factor; and quantizing respective ones of the plurality of scaled second values.

18. The method of claim 17, wherein the neural network is at least one of a floating point neural network or a long short term memory (LSTM) network.

19. The method of claim 17, wherein the window of the past first values includes a past maximum value.

20. The method of claim 17, wherein the plurality of second values correspond to at least one of weights of the neural network or third values to be operated upon by the neural network.
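The independent claims recite determining a scale factor from a maximum value over a window of past values and then quantizing the scaled values. The following is a minimal sketch of one way such a scheme could look, not the implementation disclosed in the patent: the class `WindowedMaxScaler`, the function `quantize`, the window size, and the symmetric 8-bit integer range are all assumptions made for illustration, and the training step recited in the claims is omitted.

```python
from collections import deque


class WindowedMaxScaler:
    """Tracks the peak magnitude over the last few batches (hypothetical helper)."""

    def __init__(self, window_size=4, num_bits=8):
        # deque with maxlen drops the oldest peak once the window is full
        self.history = deque(maxlen=window_size)
        self.qmax = 2 ** (num_bits - 1) - 1  # 127 for signed 8-bit

    def observe(self, values):
        """Record the largest absolute value seen in one batch."""
        self.history.append(max(abs(v) for v in values))

    def scale_factor(self):
        """Map the windowed peak onto the largest representable integer."""
        peak = max(self.history)
        return self.qmax / peak if peak > 0 else 1.0


def quantize(values, scale, qmax):
    """Scale floats, round to integers, and clip to [-qmax, qmax]."""
    return [max(-qmax, min(qmax, round(v * scale))) for v in values]


scaler = WindowedMaxScaler(window_size=2)
scaler.observe([0.5, -2.0])   # windowed peak is 2.0
scaler.observe([1.0])         # peak is still 2.0 (both batches in window)
scale = scaler.scale_factor()
codes = quantize([2.0, -0.5, 0.0, 4.0], scale, scaler.qmax)

scaler.observe([1.0])         # the batch with peak 2.0 leaves the window
later_scale = scaler.scale_factor()
```

Because the scale is applied while the values are still in floating point form, a value equal to the windowed peak lands on the largest integer code, while a value outside the window's range (4.0 here) is clipped. Once the older peak leaves the window, the scale factor grows to use the integer range more fully.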

Assignees

Intel Corp

Inventors

Classifications

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • G06N3/0495 (Primary)

    Quantised networks; Sparse networks; Compressed networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Activation functions · CPC title

  • adaptive, e.g. self learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12282852B2 cover?
An apparatus for applying dynamic quantization of a neural network is described herein. The apparatus includes a scaling unit and a quantizing unit. The scaling unit is to calculate an initial desired scale factors of a plurality of inputs, weights and a bias and apply the input scale factor to a summation node. Also, the scaling unit is to determine a scale factor for a multiplication node bas…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06N3/0495. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 22, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).