Hardware node having a matrix vector unit with block-floating point processing
US-10167800-B1 · Jan 1, 2019 · US
US12175349B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12175349-B2 |
| Application number | US-201816180250-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 5, 2018 |
| Priority date | Nov 3, 2017 |
| Publication date | Dec 24, 2024 |
| Grant date | Dec 24, 2024 |
Official abstract text for this publication.
Hierarchical methods for selecting fixed point number formats with reduced mantissa bit lengths for representing values input to, and/or output from, the layers of a DNN. The methods begin with one or more initial fixed point number formats for each layer. The layers are divided into subsets of layers and the mantissa bit lengths of the fixed point number formats are iteratively reduced from the initial fixed point number formats on a per subset basis. If a reduction causes the output error of the DNN to exceed an error threshold, then the reduction is discarded, and no more reductions are made to the layers of the subset. Otherwise, a further reduction is made to the fixed point number formats for the layers in that subset. Once no further reductions can be made to any of the subsets, the method is repeated for continually increasing numbers of subsets until a predetermined number of layers per subset is achieved.
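The procedure in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes that each layer has a single mantissa bit length, that the "next lowest" bit length is one bit fewer, that the number of subsets doubles on each pass, and that a hypothetical callable `evaluate_error` stands in for running the DNN on test data. All function and parameter names below are illustrative.

```python
from typing import Callable, List, Sequence


def split_into_subsets(num_layers: int, num_subsets: int) -> List[range]:
    """Split layer indices into contiguous, disjoint, non-empty subsets."""
    num_subsets = min(num_subsets, num_layers)
    bounds = [i * num_layers // num_subsets for i in range(num_subsets + 1)]
    return [range(bounds[i], bounds[i + 1]) for i in range(num_subsets)]


def select_mantissa_bits(
    initial_bits: Sequence[int],
    evaluate_error: Callable[[Sequence[int]], float],  # hypothetical DNN evaluation
    error_threshold: float,
    min_bits: int = 1,
    min_layers_per_subset: int = 1,
) -> List[int]:
    bits = list(initial_bits)
    if not bits:
        return bits
    num_subsets = 1
    while True:
        subsets = split_into_subsets(len(bits), num_subsets)
        for subset in subsets:
            while True:
                # Try one further reduction for every layer in this subset.
                trial = list(bits)
                changed = False
                for layer in subset:
                    if trial[layer] > min_bits:
                        trial[layer] -= 1  # assumption: next lowest = one bit fewer
                        changed = True
                if not changed or evaluate_error(trial) > error_threshold:
                    break  # discard the reduction and stop reducing this subset
                bits = trial  # keep the reduction and try another
        if max(len(s) for s in subsets) <= min_layers_per_subset:
            return bits
        # Assumption: repeat with twice as many subsets.
        num_subsets = min(len(bits), num_subsets * 2)


# Toy stand-in for the DNN: the error is 1.0 as soon as any layer drops
# below a made-up "required" bit length, otherwise 0.0.
required = [3, 5, 4, 6]


def toy_error(bits: Sequence[int]) -> float:
    return float(any(b < r for b, r in zip(bits, required)))


result = select_mantissa_bits([8, 8, 8, 8], toy_error, error_threshold=0.5)
print(result)  # [3, 5, 4, 6]
```

In this toy run, the single-subset pass stops at 6 bits for every layer, the two-subset pass lowers the first two layers to 5 bits, and the per-layer pass reaches each layer's individual limit. Coarse passes are cheap because they change many layers per evaluation; the finer passes then recover savings that the coarse grouping could not reach.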
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method of selecting a fixed point number format for representing values input to, and/or output from, a plurality of layers of a Deep Neural Network (DNN) for use in configuring a hardware implementation of the DNN, the method comprising: receiving an instantiation of the DNN configured to represent the values of each of the plurality of layers using one or more initial fixed point number formats for that layer, each initial fixed point number format comprising an exponent and a mantissa bit length; forming a plurality of disjoint subsets from the plurality of layers; for each subset of the plurality of subsets, iteratively adjusting the fixed point number formats for the layers in the subset to fixed point number formats with a next lowest mantissa bit length until the output error of the instantiation of the DNN exceeds an error threshold; in response to determining that the subsets comprise greater than a lower threshold number of layers, forming a higher number of disjoint subsets than the plurality of disjoint subsets from the plurality of layers and repeating the iterative adjusting; and in response to determining that the subsets comprise less than or equal to the lower threshold number of layers, outputting the fixed point number formats for the plurality of layers.
2. The method of claim 1, wherein iteratively adjusting the fixed point number formats for the layers in the subset to fixed point number formats with the next lowest mantissa bit length comprises: determining a fixed point number format with the next lowest mantissa bit length for the fixed point number formats for each layer of the subset; adjusting the fixed point number formats used by the instantiation of the DNN for each layer in the subset to the determined fixed point number formats with the next lowest mantissa bit length; determining an output of the adjusted instantiation of the DNN in response to test input data; determining an output error of the adjusted instantiation of the DNN; in response to determining that the output error exceeds the error threshold, reversing the adjustment of the instantiation of the DNN; and in response to determining that the output error does not exceed the error threshold, repeating the determining the fixed point number formats, adjusting the fixed point number formats, determining the output, and determining the output error.
3. The method of claim 1, further comprising identifying a sequence of the plurality of layers wherein each layer is preceded in the sequence by any layer of the plurality of layers on which it depends, and wherein each of the subsets comprises a contiguous set of layers in the sequence.
4. The method of claim 1, wherein the plurality of layers from which the disjoint subsets are formed do not include a first layer of the DNN and/or a last layer of the DNN.
5. The method of claim 1, wherein a first adjustment of the fixed point number formats is made for all of the subsets before a second adjustment of the fixed point number formats is made for any of the subsets.
6. The method of claim 1, wherein all iterative adjustments of the fixed point number formats for the layers in a first subset are completed before a first adjustment of the fixed point number formats for the layers in a second subset.
7. The method of claim 1, wherein there is an initial fixed point number format for input data values of at least one layer of the plurality of layers and there is an initial fixed point number format for weights of at least one layer of the plurality of layers, and iteratively adjusting the fixed point number formats for the layers in the subset to fixed point number formats with the next lowest mantissa bit length until the output error of the instantiation of the DNN exceeds the error threshold comprises: iteratively adjusting the fixed point number formats for the input data values for the layers in the subset to fixed point number formats with the next lowest mantissa bit length until the output error of the instantiation of the DNN exceeds the error threshold; and subsequent to iteratively adjusting the fixed point number formats for the input data values, iteratively adjusting the fixed point number formats for the weights for the layers in the subset to fixed point number formats with the next lowest mantissa bit length until the output error of the instantiation of the DNN exceeds the error threshold.
8. The method of claim 7, wherein there is an initial fixed point number format for output data values of at least one layer of the plurality of layers, and iteratively adjusting the fixed point number formats for the layers in the subset to a fixed point number format with the next lowest mantissa bit length until the output error of the instantiation of the DNN exceeds the error threshold further comprises: subsequent to iteratively adjusting the fixed point number formats for the input data values, iteratively adjusting the fixed point number formats for the output data values for the layers in the subset to fixed point number formats with the next lowest mantissa bit length until the output error of the instantiation of the DNN exceeds the error threshold.
9. The method of claim 1, wherein the DNN is a classification network and the output error is a top-1 classification accuracy or a top-5 classification accuracy of an output of the instantiation of the DNN in response to test input data.
10. The method of claim 1, wherein the DNN is a classification network and the output error is a sum of absolute differences between logits of an output of the instantiation of the DNN in response to test input data and logits of a baseline output or is a sum of absolute differences between SoftMax normalised logits of an output of the instantiation of the DNN in response to test input data and SoftMax normalised logits of a baseline output.
11. The method of claim 10, further comprising generating the baseline output by applying the test input data to an instantiation of the DNN configured to represent values input to and output from each layer of the DNN using a floating point number format.
12. The method of claim 1, wherein the lower threshold number of layers is one.
13. The method of claim 1, wherein the lower threshold number of layers is greater than one.
14. The method of claim 1, wherein forming a higher number of disjoint subsets from the plurality of layers comprises: dividing the layers in each subset into a plurality of disjoint subsets and/or forming twice as many disjoint subsets from the plurality of layers.
15. The method of claim 1, wherein the values input to and/or output from the plurality of layers comprise one or more of input data values, output data values, weights and biases.
16. The method of claim 1, wherein the DNN is a convolutional neural network.
17. The method of claim 1, further comprising configuring a hardware implementation of the DNN to represent values of at least one of the plurality of layers using a fixed point number format output for the at least one layer.
18. A non-transitory computer readable storage medium having stored thereon computer readable instructions that, when executed at a computer system, cause the computer system to perform the method as set forth in claim 1.
19. A hardware implementation of a Deep Neural Network (DNN) comprising: hardware logic configured to: receive input data values, a set of weights or a set of biases for a layer of the DNN; receive information indicating a fixed po
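The output-error measures named in claims 9 and 10 can be sketched as follows. This is an illustrative reading, not text from the patent: the function names are hypothetical, logits are plain Python lists, and the baseline is assumed to come from a floating point run of the same network as in claim 11. Because accuracy rises as the network improves, comparing it against an error threshold would presumably need a conversion such as `1 - accuracy`; that conversion is an assumption here.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """SoftMax normalisation, shifted by the maximum for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sum_abs_diff(output: Sequence[float], baseline: Sequence[float]) -> float:
    """Sum of absolute differences between two equally long vectors."""
    return sum(abs(o - b) for o, b in zip(output, baseline))


def top_k_accuracy(
    batch_logits: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of samples whose true label is among the k largest logits."""
    hits = 0
    for logits, label in zip(batch_logits, labels):
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


# Made-up numbers for illustration only.
baseline_logits = [2.0, 1.0, 0.0]   # e.g. from a floating point instantiation
quantised_logits = [1.5, 1.0, 0.5]  # e.g. from a fixed point instantiation

logit_error = sum_abs_diff(quantised_logits, baseline_logits)
softmax_error = sum_abs_diff(softmax(quantised_logits), softmax(baseline_logits))

batch = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0]]
labels = [0, 2]
top1 = top_k_accuracy(batch, labels, k=1)
top2 = top_k_accuracy(batch, labels, k=2)
```

The logit-based measures need no labelled data, only a baseline output, while the accuracy-based measures need ground-truth labels for the test inputs.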
Architecture, e.g. interconnection topology · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Significance control · CPC title
Activation functions · CPC title