Memory compression in a deep neural network
US-2019164538-A1 · May 30, 2019 · US
US11556779B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11556779-B2 |
| Application number | US-201716335775-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 21, 2017 |
| Priority date | Sep 26, 2016 |
| Publication date | Jan 17, 2023 |
| Grant date | Jan 17, 2023 |
Official abstract text for this publication.
Techniques are described for efficiently reducing the amount of total computation in convolutional neural networks (CNNs) without affecting the output result or classification accuracy. Computation redundancy in CNNs is reduced by exploiting the computing nature of the convolution and subsequent pooling (e.g., sub-sampling) operations. In some implementations, the input features may be divided into a group of precision values and the operation(s) may be cascaded. A maximum may be identified (e.g., by 90% probability) using a small number of bits in the input features, and the full-precision convolution may then be performed on the maximum input. Accordingly, the total number of bits used to perform the convolution is reduced without affecting the output features or the final classification accuracy.
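The low-precision step in the abstract can be illustrated with a small sketch. It is not taken from the patent: the helper name `keep_msbs`, the 8-bit unsigned input format, and the sample values are assumptions chosen for illustration. It shows that ranking values by only their top bits often, but not always, agrees with the full-precision ranking.

```python
import numpy as np

def keep_msbs(x, msb_bits, total_bits=8):
    """Keep the top `msb_bits` of unsigned `total_bits`-bit integers; zero the rest.

    Hypothetical helper: the patent does not prescribe this exact truncation.
    """
    shift = total_bits - msb_bits
    return (np.asarray(x, dtype=np.int64) >> shift) << shift

# Hypothetical 8-bit activations (assumed values, not from the source).
x = np.array([200, 100, 50, 10])
coarse = keep_msbs(x, msb_bits=2)      # only 2 of 8 bits per value are used
print(coarse.tolist())                 # [192, 64, 0, 0]
print(int(np.argmax(coarse)) == int(np.argmax(x)))  # True for this input
```

Here the two smallest values collapse to the same coarse value, which is harmless because neither is the maximum. When the largest values collapse together, more bits are needed to separate them.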
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method performed by at least one processor, the method comprising: in one or more layers of a convolutional neural network (CNN) executed by the at least one processor, performing a first iteration that includes computing a value based on a first set of most significant bits (MSBs) for each of a plurality of data sets; examining, by the at least one processor, a first set of values computed for the plurality of data sets in the first iteration to determine whether a maximum value is present among the first set of values; responsive to identifying the maximum value, performing, by the at least one processor, a full precision computation of the value for a data set, of the plurality of data sets, that exhibited the maximum value; and propagating, by the at least one processor, the full precision computation of the value to a subsequent layer of the CNN.
2. The method of claim 1, further comprising: responsive to determining that the first set of values are the same, performing, by the at least one processor, a second iteration that includes computing the value based on a second set of MSBs for each of the plurality of data sets, the second set of MSBs being larger than the first set of MSBs.
3. The method of claim 2, further comprising: examining, by the at least one processor, a second set of values computed for the plurality of data sets in the second iteration to determine whether the maximum value is present among the second set of values; and responsive to identifying the maximum value among the second set of values, performing, by the at least one processor, the full precision computation of the value for a data set, of the plurality of data sets, that exhibited the maximum value in the second iteration.
4. The method of claim 2, wherein the computing in each of the first iteration and the second iteration employs a convolution and a pooling.
5. The method of claim 4, wherein the convolution is a N×N convolution, where N is any integer.
6. The method of claim 4, wherein the pooling is a N×N pooling, where N is any integer.
7. The method of claim 4, wherein the convolution is a 3×3 convolution, and the pooling is a 2×2 pooling.
8. The method of claim 2, wherein at least one of the first iteration and the second iteration is performed with a precision less than that of the full precision computation.
9. The method of claim 8, wherein the precision is 8-bit precision.
10. The method of claim 1, wherein the CNN is employed to analyze an image.
11. The method of claim 1, wherein: the first iteration computes a value that approximates the full precision computation of the value; and the full precision computation is performed on the data set the includes less data than the plurality of data sets.
12. A system comprising: at least one processor; and memory communicatively coupled to the at least one processor, the memory storing instructions which, when executed by the at least one processor, cause the at least one processor to perform operations comprising: in one or more layers of a convolutional neural network (CNN), performing a first iteration that includes computing a value based on a first set of most significant bits (MSBs) for each of a plurality of data sets; examining a first set of values computed for the plurality of data sets in the first iteration to determine whether a maximum value is present among the first set of values; responsive to identifying the maximum value, performing a full precision computation of the value for a data set, of the plurality of data sets, that exhibited the maximum value; and propagating the full precision computation of the value to a subsequent layer of the CNN.
13. The system of claim 12, the operations further comprising: responsive to determining that the first set of values are the same, performing, by the at least one processor, a second iteration that includes computing the value based on a second set of MSBs for each of the plurality of data sets, the second set of MSBs being larger than the first set of MSBs.
14. The system of claim 13, the operations further comprising: examining, by the at least one processor, a second set of values computed for the plurality of data sets in the second iteration to determine whether the maximum value is present among the second set of values; and responsive to identifying the maximum value among the second set of values, performing, by the at least one processor, the full precision computation of the value for a data set, of the plurality of data sets, that exhibited the maximum value in the second iteration.
15. The system of claim 13, wherein the computing in each of the first iteration and the second iteration employs a convolution and a pooling.
16. The system of claim 15, wherein the convolution is a N×N convolution, where N is any integer.
17. The system of claim 15, wherein the pooling is a N×N pooling, where N is any integer.
18. The system of claim 15, wherein at least one of the first iteration and the second iteration is performed with a precision less than that of the full precision computation.
19. The system of claim 12, wherein: the first iteration computes a value that approximates the full precision computation of the value; and the full precision computation is performed on the data set the includes less data than the plurality of data sets.
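The iteration in claims 1 to 3 can be read as a cascade: score every pooling position with a few MSBs, widen the bit set if no winner emerges, then run the full-precision convolution only for the winner. The following is a minimal sketch under stated assumptions, not the patented implementation. The names `cascaded_conv_maxpool` and `schedule`, the 2/4/8-bit schedule, the unsigned 8-bit inputs, and the reading of "the first set of values are the same" as "no unique maximum" are all illustrative choices.

```python
import numpy as np

def keep_msbs(x, msb_bits, total_bits=8):
    """Keep the top `msb_bits` of unsigned `total_bits`-bit integers (hypothetical helper)."""
    shift = total_bits - msb_bits
    return (np.asarray(x, dtype=np.int64) >> shift) << shift

def cascaded_conv_maxpool(patches, kernel, schedule=(2, 4, 8), total_bits=8):
    """Pick the pooling winner from low-precision passes, then convolve it fully.

    patches: array of shape (P, k, k), one input patch per pooling position.
    kernel:  array of shape (k, k).
    Returns (winner_index, full_precision_value, msb_bits_of_deciding_pass).
    Assumption: a pass "fails" when the maximum is shared by several positions.
    """
    patches = np.asarray(patches, dtype=np.int64)
    kernel = np.asarray(kernel, dtype=np.int64)
    for msb_bits in schedule:
        # Approximate convolution output at each pooling position.
        approx = (keep_msbs(patches, msb_bits, total_bits) * kernel).sum(axis=(1, 2))
        winners = np.flatnonzero(approx == approx.max())
        if len(winners) == 1:
            break  # unique maximum found; no wider pass needed
    # If the last pass is still tied, fall back to the first tied position.
    idx = int(winners[0])
    # Full-precision convolution for the winning position only.
    value = int((patches[idx] * kernel).sum())
    return idx, value, msb_bits

# 3x3 convolution feeding a 2x2 pooling window (four positions), as in claim 7.
kernel = np.ones((3, 3), dtype=np.int64)
patches = np.stack([np.full((3, 3), v) for v in (200, 220, 0, 0)])
print(cascaded_conv_maxpool(patches, kernel))  # (1, 1980, 4)
```

In this example the 2-bit pass ties (200 and 220 both truncate to 192), so the 4-bit pass decides. Because low-precision scores only approximate the full-precision outputs, the selected position is not guaranteed to be the true maximum in general; the abstract itself describes the identification probabilistically.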
Combinations of networks · CPC title
using non-contact-making devices, e.g. tube, solid state device; using unspecified devices · CPC title
Physics · mapped topic
Learning methods · CPC title
Convolutional networks [CNN, ConvNet] · CPC title