Memory compression in a deep neural network
US-2019164538-A1 · May 30, 2019 · US
US11775831B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11775831-B2 |
| Application number | US-202318154727-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 13, 2023 |
| Priority date | Sep 26, 2016 |
| Publication date | Oct 3, 2023 |
| Grant date | Oct 3, 2023 |
Official abstract text for this publication.
Techniques are described for efficiently reducing the amount of total computation in convolutional neural networks (CNNs) without affecting the output result or classification accuracy. Computation redundancy in CNNs is reduced by exploiting the computing nature of the convolution and subsequent pooling (e.g., sub-sampling) operations. In some implementations, the input features may be divided into a group of precision values and the operation(s) may be cascaded. A maximum may be identified (e.g., by 90% probability) using a small number of bits in the input features, and the full-precision convolution may then be performed on the maximum input. Accordingly, the total number of bits used to perform the convolution is reduced without affecting the output features or the final classification accuracy.
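The cascade described in the abstract can be illustrated with a minimal sketch. It assumes unsigned 8-bit inputs, a non-negative kernel, and max pooling; the helper names (`keep_msbs`, `conv_at`, `cascaded_conv_maxpool`) and the 4-bit first pass are hypothetical choices for illustration, not taken from the patent. The low-precision pass only selects a candidate in each pooling window, and that selection is not guaranteed to match the true maximum for every input.

```python
def keep_msbs(x, total_bits, msbs):
    """Zero all but the top `msbs` bits of an unsigned `total_bits`-bit integer."""
    shift = total_bits - msbs
    return (x >> shift) << shift


def conv_at(image, kernel, r, c, transform=lambda v: v):
    """One output of a 'valid' sliding-window convolution at position (r, c).

    `transform` is applied to each input pixel first (identity = full precision).
    """
    k = len(kernel)
    return sum(kernel[i][j] * transform(image[r + i][c + j])
               for i in range(k) for j in range(k))


def cascaded_conv_maxpool(image, kernel, total_bits=8, msbs=4, pool=2):
    """Convolution followed by max pooling, with a reduced-precision first pass.

    For each pooling window, every candidate is approximated from the top
    `msbs` bits of the inputs; only the winning candidate is then recomputed
    at full precision.
    """
    k = len(kernel)
    out_h = (len(image) - k + 1) // pool
    out_w = (len(image[0]) - k + 1) // pool

    def low(v):
        return keep_msbs(v, total_bits, msbs)

    pooled = []
    for pr in range(out_h):
        row = []
        for pc in range(out_w):
            positions = [(pr * pool + dr, pc * pool + dc)
                         for dr in range(pool) for dc in range(pool)]
            # Low-precision pass over all candidates in this pooling window.
            best = max(positions,
                       key=lambda p: conv_at(image, kernel, p[0], p[1], low))
            # Full-precision pass for the selected candidate only.
            row.append(conv_at(image, kernel, best[0], best[1]))
        pooled.append(row)
    return pooled


image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 200],
]
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(cascaded_conv_maxpool(image, kernel))  # [[1030]]
```

In this example the 4-bit pass already ranks the bottom-right window highest, so only one of the four 3×3 convolutions is evaluated at full precision, and the pooled output equals the exact result.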
Opening claim text (preview).
What is claimed is:

1. One or more non-transitory computer-readable storage media storing instructions which, when executed by at least one processor, cause the at least one processor to perform operations comprising: in one or more layers of a convolutional neural network (CNN), performing a first iteration that includes computing a value based on a first set of most significant bits (MSBs) for each of a plurality of data sets; examining a first set of values computed for the plurality of data sets in the first iteration to determine whether a maximum value is present among the first set of values; responsive to identifying the maximum value, performing a full precision computation of the value for a data set, of the plurality of data sets, that exhibited the maximum value; and propagating the full precision computation of the value to a subsequent layer of the CNN.

2. The one or more non-transitory computer-readable storage media of claim 1, further comprising: responsive to determining that the first set of values are the same, performing, by the at least one processor, a second iteration that includes computing the value based on a second set of MSBs for each of the plurality of data sets, the second set of MSBs being larger than the first set of MSBs.

3. The one or more non-transitory computer-readable storage media of claim 2, further storing instructions, which, when executed by the at least one processor, cause the at least one processor to perform operations comprising: examining a second set of values computed for the plurality of data sets in the second iteration to determine whether the maximum value is present among the second set of values; and responsive to identifying the maximum value among the second set of values, performing, by the at least one processor, the full precision computation of the value for a data set, of the plurality of data sets, that exhibited the maximum value in the second iteration.

4. The one or more non-transitory computer-readable storage media of claim 2, wherein the computing in each of the first iteration and the second iteration employs a convolution and a pooling.

5. The one or more non-transitory computer-readable storage media of claim 4, wherein the convolution is a N×N convolution, where N is any integer.

6. The one or more non-transitory computer-readable storage media of claim 4, wherein the pooling is a N×N pooling, where N is any integer.

7. The one or more non-transitory computer-readable storage media of claim 4, wherein the convolution is a 3×3 convolution, and the pooling is a 2×2 pooling.

8. The one or more non-transitory computer-readable storage media of claim 2, wherein at least one of the first iteration and the second iteration is performed with a precision less than that of the full precision computation.

9. The one or more non-transitory computer-readable storage media of claim 8, wherein the precision is 8-bit precision.

10. The one or more non-transitory computer-readable storage media of claim 1, wherein the CNN is employed to analyze an image.

11. The one or more non-transitory computer-readable storage media of claim 1, wherein: the first iteration computes a value that approximates the full precision computation of the value; and the full precision computation is performed on the data set that includes less data than the plurality of data sets.
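The iteration recited in claims 1 to 3 can be sketched as a loop that widens the MSB window until a single data set stands out. This is one illustrative reading, not the patented implementation: it treats "the values are the same" as "no unique maximum yet", models the "value" as a weighted sum over unsigned 8-bit inputs, and uses hypothetical names (`top_bits`, `approx_value`, `find_max_then_full_precision`) and a hypothetical schedule of 2, 4, then 8 bits.

```python
def top_bits(x, total_bits, msbs):
    """Keep only the `msbs` most significant bits of an unsigned integer."""
    shift = max(total_bits - msbs, 0)
    return (x >> shift) << shift


def approx_value(data_set, weights, total_bits, msbs):
    """Weighted sum of one data set using only the top `msbs` bits per input."""
    return sum(w * top_bits(x, total_bits, msbs)
               for w, x in zip(weights, data_set))


def find_max_then_full_precision(data_sets, weights, total_bits=8,
                                 msb_schedule=(2, 4, 8)):
    """Return (winner index, full-precision value, MSB count that decided it).

    Each iteration evaluates every data set at reduced precision. If exactly
    one data set holds the largest value, it alone is recomputed at full
    precision; otherwise the next, larger MSB count is tried.
    """
    winners = [0]
    for msbs in msb_schedule:
        approx = [approx_value(d, weights, total_bits, msbs) for d in data_sets]
        best = max(approx)
        winners = [i for i, v in enumerate(approx) if v == best]
        if len(winners) == 1:
            break
    # Assumption: if a tie survives the widest window, take the first tied set.
    idx = winners[0]
    full = approx_value(data_sets[idx], weights, total_bits, total_bits)
    return idx, full, msbs


data_sets = [[64, 70], [65, 90], [66, 10]]
weights = [1, 1]
print(find_max_then_full_precision(data_sets, weights))  # (1, 155, 4)
```

With 2 MSBs the first two data sets both evaluate to 128, so a second iteration with 4 MSBs is needed; it separates them (128 versus 144), and only the second data set is computed at full precision (65 + 90 = 155), which is the value that would be propagated to the next layer.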
Convolutional networks [CNN, ConvNet] · CPC title
Learning methods · CPC title
using non-contact-making devices, e.g. tube, solid state device; using unspecified devices · CPC title
Combinations of networks · CPC title