Variable quantization for neural networks
US-2022284260-A1 · Sep 8, 2022 · US
US12572785B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12572785-B2 |
| Application number | US-202217860439-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 8, 2022 |
| Priority date | Jul 8, 2022 |
| Publication date | Mar 10, 2026 |
| Grant date | Mar 10, 2026 |
Abstract
The present disclosure relates to a method of inter-layer format conversion for a neural network, the neural network comprising at least two computation layers including a first layer to process first data in a first data format and a second layer to process second data in a second data format, the method comprising: extracting data statistics from data output by the first layer, said data statistics being representative of the data output by the first layer; determining one or more conversion parameters based on the extracted data statistics and the second data format; and generating the second data for the second layer by modifying said data output by the first layer using the one or more conversion parameters.
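The three steps of the abstract (extract statistics, determine conversion parameters, modify the data) can be illustrated with a minimal Python sketch. For illustration only, it assumes the first layer emits floating-point values and that the second data format is an unsigned integer of `num_bits` bits with an asymmetric mapping defined by a scale factor and a zero point, two of the parameter types listed in claim 9. All class and function names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DataStatistics:
    minimum: float
    maximum: float
    mean: float
    variance: float


@dataclass
class ConversionParams:
    scale: float     # real-valued step between adjacent integer codes
    zero_point: int  # integer code that represents real 0.0


def extract_statistics(data: Sequence[float]) -> DataStatistics:
    """Step 1: summarise the first layer's output (claim 8 lists these statistics)."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    return DataStatistics(min(data), max(data), mean, variance)


def determine_conversion_params(stats: DataStatistics, num_bits: int) -> ConversionParams:
    """Step 2: derive scale and zero point for an unsigned num_bits integer target.

    Only min/max are used here; mean and variance could instead drive a clipped range.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    lo = min(stats.minimum, 0.0)  # include 0.0 so it maps to an exact integer code
    hi = max(stats.maximum, 0.0)
    if hi == lo:  # only possible when the output is all zeros
        return ConversionParams(scale=1.0, zero_point=0)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    return ConversionParams(scale, zero_point)


def convert(data: Sequence[float], params: ConversionParams, num_bits: int) -> List[int]:
    """Step 3: generate the second layer's input by quantizing the first layer's output."""
    qmax = 2 ** num_bits - 1
    return [max(0, min(qmax, round(x / params.scale) + params.zero_point)) for x in data]


def dequantize(codes: Sequence[int], params: ConversionParams) -> List[float]:
    """Map integer codes back to approximate real values (useful for checking error)."""
    return [(q - params.zero_point) * params.scale for q in codes]


layer1_output = [-1.0, 0.0, 0.5, 2.0]
stats = extract_statistics(layer1_output)
params = determine_conversion_params(stats, num_bits=8)
layer2_input = convert(layer1_output, params, num_bits=8)
```

With these inputs the zero point is 85, so -1.0, 0.0 and 2.0 map to codes 0, 85 and 255, while 0.5 lands on a neighbouring code within half a step of its true value.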
Claims
The invention claimed is:

1. A method of operation of a neural network hardware accelerator, the method comprising: executing a neural network on the hardware accelerator, the neural network comprising at least two computation layers including a first layer to process first data in a first data format and a second layer to process second data in a second data format, the second data format differing from the first data format; extracting, by the hardware accelerator, data statistics from data output by the first layer, said data statistics being representative of the data output by the first layer; determining, by the hardware accelerator, one or more conversion parameters based on the extracted data statistics and the second data format; generating, by the hardware accelerator, the second data for the second layer by modifying said data output by the first layer using the one or more conversion parameters; and providing, by the hardware accelerator, the generated second data to the second layer for processing.

2. The method of claim 1, further comprising obtaining, by the hardware accelerator, one or more format parameters representative of the second data format, wherein determining one or more conversion parameters is performed using the one or more format parameters.

3. The method of claim 2, wherein said one or more format parameters comprises one or more of: a bit size, a precision, fixed point, floating point.

4. The method of claim 1, wherein the first data format is a first number format and the second data format is a second number format.

5. The method of claim 4, wherein the first number format or the second number format comprises an integer format, a floating point format, or a block floating point format.

6. The method of claim 4, wherein modifying said data output by the first layer using the one or more conversion parameters comprises converting said data output by the first layer from a first precision to a second precision.

7. The method of claim 4, wherein modifying said data output by the first layer using the one or more conversion parameters comprises converting a size of said data output by the first layer from a first number of bits to a second number of bits.

8. The method of claim 1, wherein said data statistics comprises one or more of: a mean, a variance, a minimum, a maximum, or a combination thereof.

9. The method of claim 1, wherein said one or more conversion parameters comprise one or more of: an exponent of said data output by the first layer, a scale factor between said first data and said second data, a zero point, an indication of linearity or non-linearity, an indication of exponent bias, or a combination thereof.

10. The method of claim 1, further comprising generating, by the hardware accelerator, subsequent second data for the second layer by modifying subsequent data output by the first layer using at least one of the one or more conversion parameters.

11. A non-transitory computer-readable medium comprising machine-readable code which, when executed by a processor of a neural network hardware accelerator, causes the processor to perform a method comprising: executing a neural network, the neural network comprising at least two computation layers including a first layer to process first data in a first data format and a second layer to process second data in a second data format, the second data format differing from the first data format; extracting data statistics from data output by the first layer, said data statistics being representative of the data output by the first layer; determining one or more conversion parameters based on the extracted data statistics and the second data format; generating the second data for the second layer by modifying said data output by the first layer using the one or more conversion parameters; and providing the generated second data to the second layer for processing.

12. Hardware for executing a neural network, the neural network comprising at least two computation layers including a first layer to process first data in a first data format and a second layer to process second data in a second data format, the second data format differing from the first data format, the hardware further comprising: statistics extraction circuitry configured for extracting data statistics from data output by the first layer, said data statistics being representative of the data output by the first layer; format deduction circuitry configured for determining one or more conversion parameters based on the extracted data statistics and the second data format; and modification circuitry configured for generating the second data for the second layer by modifying said data output by the first layer using the one or more conversion parameters and for providing the generated second data to the second layer for processing.

13. The hardware of claim 12, wherein said format deduction circuitry is further configured to obtain one or more format parameters representative of the second data format, and said format deduction module is configured to determine said one or more conversion parameters using the one or more format parameters.

14. The hardware of claim 13, wherein said one or more format parameters comprises one or more of: a bit size, a precision, fixed point, floating point.

15. The hardware of claim 12, wherein the first data format is a first number format and the second data format is a second number format.

16. The hardware of claim 15, wherein said modification module modifies said data output by the first layer using the one or more conversion parameters by converting said data output by the first layer from a first precision to a second precision.

17. The hardware of claim 12, wherein said data statistics comprises one or more of: a mean, a variance, a minimum, a maximum, or a combination thereof.

18. The hardware of claim 12, wherein said one or more conversion parameters comprise one or more of: an exponent of said data output by the first layer, a scale factor between said first data and said second data, or a combination thereof.

19. A method of quantization-aware training of a neural network executing on a hardware accelerator, the neural network comprising at least two computation layers including a first layer to process first data in a first data format and a second layer to process second data in a second data format, the second data format differing from the first data format, the method comprising: extracting, by the hardware accelerator, data statistics from data output by the first layer, said data statistics being representative of the data output by the first layer; determining, by the hardware accelerator, one or more conversion parameters based on the extracted data statistics and the second data format; generating, by the hardware accelerator, the second data for the second layer by modifying said data output by the first layer using the one or more conversion parameters; and providing, by the hardware accelerator, the generated second data to the second layer for processing.
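Claims 5, 7, 9 and 10 name block floating point as a possible number format, a change in the number of bits, an exponent as a conversion parameter, and reuse of conversion parameters for subsequent data. The Python sketch below ties these together under assumptions that are not taken from the patent: a block shares one power-of-two exponent, mantissas are signed `mantissa_bits`-bit integers, and the exponent is chosen from the maximum-magnitude statistic. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def shared_exponent(data: Sequence[float], mantissa_bits: int) -> int:
    """Pick one power-of-two exponent for the whole block from its max magnitude.

    math.frexp(max_abs) returns (f, k) with 0.5 <= f < 1, so max_abs < 2**k.
    Choosing e = k - (mantissa_bits - 1) keeps every |x| / 2**e below
    2**(mantissa_bits - 1), the limit of a signed mantissa of that width.
    Rounding can still reach that limit, so to_block_fp clamps.
    """
    max_abs = max((abs(x) for x in data), default=0.0)
    if max_abs == 0.0:
        return 0
    _, k = math.frexp(max_abs)
    return k - (mantissa_bits - 1)


def to_block_fp(data: Sequence[float], exponent: int, mantissa_bits: int) -> List[int]:
    """Quantize to signed integer mantissas sharing `exponent`, saturating on overflow."""
    lo = -(2 ** (mantissa_bits - 1))
    hi = 2 ** (mantissa_bits - 1) - 1
    # math.ldexp(x, -exponent) scales by a power of two, exact for normal floats.
    return [max(lo, min(hi, round(math.ldexp(x, -exponent)))) for x in data]


def from_block_fp(mantissas: Sequence[int], exponent: int) -> List[float]:
    """Reconstruct real values as mantissa * 2**exponent."""
    return [math.ldexp(m, exponent) for m in mantissas]


first_batch = [0.75, -1.5, 3.0]
block_exp = shared_exponent(first_batch, mantissa_bits=8)      # -5, since 3.0 = 0.75 * 2**2
codes = to_block_fp(first_batch, block_exp, mantissa_bits=8)   # [24, -48, 96]

# Reuse the stored exponent for a later batch without recomputing statistics
# (compare claim 10); 5.0 exceeds the block's range and saturates to 127.
later = to_block_fp([1.0, 5.0], block_exp, mantissa_bits=8)    # [32, 127]
```

In this sketch, reusing a stored exponent skips a statistics pass for the later batch, at the cost of saturating values that fall outside the range seen when the exponent was chosen, as `5.0` does here.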
CPC classification titles:
Programmable structures, i.e. where the code converter contains apparatus which is operator-changeable to modify the conversion process
Convolutional networks [CNN, ConvNet]
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
Conversion to or from floating-point codes
Quantised networks; Sparse networks; Compressed networks