Dynamic quantization of neural networks
US-11755901-B2 · Sep 12, 2023 · US
US12282852B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12282852-B2 |
| Application number | US-202318363408-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 1, 2023 |
| Priority date | Dec 28, 2017 |
| Publication date | Apr 22, 2025 |
| Grant date | Apr 22, 2025 |
Official abstract text for this publication.
An apparatus for applying dynamic quantization of a neural network is described herein. The apparatus includes a scaling unit and a quantizing unit. The scaling unit is to calculate initial desired scale factors of a plurality of inputs, weights and a bias and apply the input scale factor to a summation node. Also, the scaling unit is to determine a scale factor for a multiplication node based on the desired scale factors of the inputs and select a scale factor for an activation function and an output node. The quantizing unit is to dynamically requantize the neural network by traversing a graph of the neural network.
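The abstract's point that a multiplication node's scale factor depends on the scale factors of its inputs can be illustrated with ordinary fixed-point arithmetic. The sketch below is an assumption-laden illustration, not the patent's implementation: the function names, the choice to quantize the bias at the product scale, and the example scale values are all hypothetical. It relies only on the general fact that multiplying two integers scaled by `input_scale` and `weight_scale` yields a result scaled by their product, so anything added at the summation node has to share that scale.

```python
def quantize(value, scale):
    """Map a floating point value to an integer at the given scale (hypothetical helper)."""
    return round(value * scale)


def affine_node(inputs, weights, bias, input_scale, weight_scale):
    """Integer multiply-accumulate for one node.

    Assumption: the bias is quantized at the product scale so that it can be
    added to the integer products at the summation node without rescaling.
    """
    # The product of two scaled integers carries the product of the two scales.
    product_scale = input_scale * weight_scale

    q_inputs = [quantize(x, input_scale) for x in inputs]
    q_weights = [quantize(w, weight_scale) for w in weights]
    q_bias = quantize(bias, product_scale)

    # Summation node: every term is expressed at product_scale.
    accumulator = q_bias + sum(a * b for a, b in zip(q_inputs, q_weights))
    return accumulator, product_scale


def dequantize(value, scale):
    """Recover an approximate floating point value from a scaled integer."""
    return value / scale
```

Calling `affine_node([0.5, -0.25], [0.5, 1.0], 0.125, 64, 32)` and passing the result through `dequantize` recovers the floating point result of the same multiply-accumulate, because these example values happen to be exactly representable at the chosen scales. In general the integer path only approximates the floating point one.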
Opening claim text (preview).
What is claimed is:
1. At least one non-transitory computer readable medium comprising instructions to cause at least one processor circuit to at least: determine a scale factor based on a maximum value over a window of past first values; train a neural network based on second values scaled by the scale factor; and quantize respective ones of the scaled second values.
2. The at least one non-transitory computer readable medium of claim 1, wherein the neural network is a floating point neural network.
3. The at least one non-transitory computer readable medium of claim 1, wherein the neural network is a long short term memory (LSTM) network.
4. The at least one non-transitory computer-readable medium of claim 1, wherein the window of the past first values includes a past maximum value.
5. The at least one non-transitory computer-readable medium of claim 1, wherein the second values correspond to weights of the neural network.
6. The at least one non-transitory computer-readable medium of claim 1, wherein the second values correspond to third values to be operated upon by the neural network.
7. The at least one non-transitory computer readable medium of claim 1, wherein one or more of the at least one processor circuit is to perform the quantizing of the respective ones of the scaled second values from floating point values to integer values.
8. The at least one non-transitory computer readable medium of claim 1, wherein one or more of the at least one processor circuit is to apply the scale factor to the second values in floating point form before the quantization to preserve resolution of the second values in integer form after the quantization.
9. An apparatus comprising: interface circuitry; computer-readable instructions; and programmable circuitry to at least one of instantiate or execute the computer-readable instructions to at least: determine a scale factor based on a maximum value over a window of past first values; train a neural network based on a plurality of second values scaled by the scale factor; and quantize respective ones of the scaled second values.
10. The apparatus of claim 9, wherein the neural network includes a floating point neural network.
11. The apparatus of claim 9, wherein the neural network includes a long short term memory (LSTM) network.
12. The apparatus of claim 9, wherein the window of the past first values includes a past maximum value.
13. The apparatus of claim 9, wherein the second values correspond to weights of the neural network.
14. The apparatus of claim 9, wherein the second values correspond to third values to be operated upon by the neural network.
15. The apparatus of claim 9, wherein the programmable circuitry is to perform the quantizing of the respective ones of the scaled second values from floating point values to integer values.
16. The apparatus of claim 9, wherein the programmable circuitry is to apply the scale factor to the second values in floating point form before the quantization to preserve resolution of the second values in integer form after the quantization.
17. A method comprising: determining a scale factor based on a maximum value over a window of past first values; training a neural network based on a plurality of second values scaled by the scale factor; and quantizing respective ones of the plurality of scaled second values.
18. The method of claim 17, wherein the neural network is at least one of a floating point neural network or a long short term memory (LSTM) network.
19. The method of claim 17, wherein the window of the past first values includes a past maximum value.
20. The method of claim 17, wherein the plurality of second values correspond to at least one of weights of the neural network or third values to be operated upon by the neural network.
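The independent claims recite determining a scale factor from a maximum over a window of past values, scaling values by it, and quantizing the scaled values. A minimal Python sketch of the scale-factor and quantization steps follows. It omits the training step entirely, and every name in it (`WindowedMaxScaler`, `window_size`, `int_bits`, the symmetric clamping range) is a hypothetical choice made for illustration rather than something taken from the patent.

```python
from collections import deque


class WindowedMaxScaler:
    """Tracks the largest absolute value seen over a sliding window of past batches."""

    def __init__(self, window_size=4, int_bits=16):
        # deque(maxlen=...) silently drops the oldest entry once the window is full.
        self.history = deque(maxlen=window_size)
        # Largest magnitude representable in a signed integer of int_bits bits.
        self.int_max = 2 ** (int_bits - 1) - 1

    def observe(self, values):
        """Record the peak absolute value of one batch of past values."""
        self.history.append(max(abs(v) for v in values))

    def scale_factor(self):
        """Scale that maps the windowed peak onto the top of the integer range."""
        peak = max(self.history)
        return self.int_max / peak if peak > 0 else 1.0


def quantize(values, scale, int_max):
    """Scale floating point values, round to integers, and clamp to the range."""
    quantized = []
    for v in values:
        q = round(v * scale)
        quantized.append(max(-int_max, min(int_max, q)))
    return quantized


def dequantize(values, scale):
    """Map integers back to approximate floating point values."""
    return [q / scale for q in values]
```

Because the scale is derived from past values only, a new value larger than the windowed maximum saturates at the integer limit instead of overflowing. Once that larger value has been observed, the scale factor shrinks to accommodate it. Applying the scale in floating point before rounding corresponds to the idea in claims 8 and 16 of preserving resolution in the integer form.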
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Activation functions · CPC title
adaptive, e.g. self learning · CPC title