US12400112B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12400112-B2 |
| Application number | US-202017115285-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 8, 2020 |
| Priority date | Dec 8, 2020 |
| Publication date | Aug 26, 2025 |
| Grant date | Aug 26, 2025 |
Official abstract text for this publication.
A neural inference chip is provided, including at least one neural inference core. The at least one neural inference core is adapted to apply a plurality of synaptic weights to a plurality of input activations to produce a plurality of intermediate outputs. The at least one neural inference core comprises a plurality of activation units configured to receive the plurality of intermediate outputs and produce a plurality of activations. Each of the plurality of activation units is configured to apply a configurable activation function to its input. The configurable activation function has at least a re-ranging term and a scaling term, the re-ranging term determining the range of the activations and the scaling term determining the scale of the activations. Each of the plurality of activation units is configured to obtain the re-ranging term and the scaling term from one or more lookup tables.
Opening claim text (preview).
What is claimed is:

1. A neural inference chip comprising: at least one neural inference core adapted to apply a plurality of synaptic weights to a plurality of input data tensors to produce a plurality of intermediate output data tensors, the at least one neural inference core comprising a plurality of activation units configured to receive the plurality of intermediate output data tensors and produce a plurality of activations, each of the plurality of activation units being configured to apply a configurable activation function to its input data tensor, the configurable activation function having at least a re-ranging term and a scaling term, the re-ranging term applying a bias that determines a range of the plurality of activations and the scaling term applying a slope that determines a scale of the plurality of activations, wherein the re-ranging term and the scaling term are obtained separately from predetermined sets, each predetermined set being associated with one of the re-ranging term or the scaling term, each of the plurality of activations units obtaining the re-ranging term and the scaling term from one or more look up tables.
2. The neural inference chip of claim 1, wherein the plurality of activations have flexible precision.
3. The neural inference chip of claim 1, wherein the plurality of activations have a floating point value.
4. The neural inference chip of claim 1, wherein the plurality of input data tensors have 16-bit precision.
5. The neural inference chip of claim 1, wherein the plurality of input data tensors have 32-bit precision.
6. The neural inference chip of claim 1, wherein each predetermined set is a lookup table.
7. The neural inference chip of claim 1, wherein each of the re-ranging term and the scaling term are learned.
8. The neural inference chip of claim 1, wherein the configurable activation function is selected from a list consisting of: Boolean, trinary, linear, ReLU, shifted ReLU, ExpReLU, sigmoid, and tanh.
9. An integrated circuit, comprising: at least one neural inference core, adapted to apply a plurality of synaptic weights to a plurality of input data tensors to produce a plurality of intermediate output data tensors, the at least one neural inference core comprising a plurality of activation units configured to receive the plurality of intermediate output data tensors and produce a plurality of activations, each of the plurality of activation units being configured to apply an activation function to its input data tensor, the activation function being Boolean, trinary, linear, ReLU, shifted ReLU, ExpReLU, sigmoid, or tanh, the activation function having at least a re-ranging term and a scaling term, the re-ranging term applying a bias that determines a range of the plurality of activations and the scaling term applying a slope that determines a scale of the plurality of activations, each of the re-ranging term and the scaling term having an associated lookup table, each of the plurality of activations units obtaining the re-ranging term and the scaling term separately from the associated lookup tables.
10. A computer-implemented method, comprising: applying a plurality of synaptic weights to a plurality of input data tensors to produce a plurality of intermediate output data tensors; receiving the plurality of intermediate output data tensors and produce therefrom a plurality of activations, wherein producing the plurality of activations comprises: applying a configurable activation function to the plurality of intermediate output data tensors, the configurable activation function having at least a re-ranging term and a scaling term, the re-ranging term applying a bias that determines a range of the plurality of activations and the scaling term applying a slope that determines a scale of the plurality of activations, each of the re-ranging term and the scaling term having an associated lookup table, and obtaining the re-ranging term and the scaling term separately from the associated lookup tables.
11. The method of claim 10, wherein the plurality of activations have flexible precision.
12. The method of claim 10, wherein the plurality of activations have a floating point value.
13. The method of claim 10, wherein the plurality of input data tensors have 16-bit precision.
14. The method of claim 10, wherein the plurality of input data tensors have 32-bit precision.
15. The method of claim 10, wherein each of the re-ranging term and the scaling term have one associated lookup table.
16. The method of claim 10, wherein each of the re-ranging term and the scaling term are learned.
17. The method of claim 10, wherein the configurable activation function is selected from a list consisting of: Boolean, trinary, linear, ReLU, shifted ReLU, ExpReLU, sigmoid, and tanh.
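
For readers who find the claim language dense, the steps recited in the method claim (apply weights, then apply an activation function whose bias and slope come from separate lookup tables) can be sketched in software roughly as follows. This is an illustrative sketch only, not the patented hardware design: the table contents, the names `RERANGE_LUT`, `SCALE_LUT`, `BASE_FUNCTIONS`, `apply_weights`, `activation_unit`, and `layer`, the textbook definitions of the base functions, and the choice to apply the slope and bias to the input of the base function are all assumptions made for this example. The claims do not fix how the two terms are composed with the function.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical lookup tables, one per term. The values are illustrative
# and are not taken from the patent.
RERANGE_LUT: List[float] = [-1.0, -0.5, 0.0, 0.5]  # candidate biases
SCALE_LUT: List[float] = [0.25, 0.5, 1.0, 2.0]     # candidate slopes

# A few of the listed function families, using common textbook forms
# (assumed here; the claims only name them).
BASE_FUNCTIONS: Dict[str, Callable[[float], float]] = {
    "boolean": lambda z: 1.0 if z >= 0.0 else 0.0,
    "linear": lambda z: z,
    "relu": lambda z: max(0.0, z),
    "sigmoid": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "tanh": math.tanh,
}


def apply_weights(weights: Sequence[Sequence[float]],
                  inputs: Sequence[float]) -> List[float]:
    """Produce one intermediate output (a weighted sum) per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def activation_unit(z: float, kind: str,
                    rerange_idx: int, scale_idx: int) -> float:
    """Fetch bias and slope separately from their tables, then activate."""
    bias = RERANGE_LUT[rerange_idx]   # re-ranging term
    slope = SCALE_LUT[scale_idx]      # scaling term
    # Assumed composition: slope and bias act on the function's input.
    return BASE_FUNCTIONS[kind](slope * z + bias)


def layer(weights: Sequence[Sequence[float]], inputs: Sequence[float],
          kind: str, rerange_idx: int, scale_idx: int) -> List[float]:
    """Weights -> intermediate outputs -> activations."""
    intermediate = apply_weights(weights, inputs)
    return [activation_unit(z, kind, rerange_idx, scale_idx)
            for z in intermediate]


if __name__ == "__main__":
    w = [[1.0, 2.0], [-1.0, 0.5]]
    x = [1.0, 1.0]
    # Intermediate outputs are [3.0, -0.5]; with slope 2.0 and bias -0.5
    # the ReLU inputs are 5.5 and -1.5.
    print(layer(w, x, "relu", rerange_idx=1, scale_idx=3))  # [5.5, 0.0]
```

Keeping the two tables separate, as the claims recite, means a unit can be reconfigured by changing two small indices rather than storing a full function table per unit. Whether the hardware realizes it this way is not stated in the text shown here.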