System, Method, and Accelerator to Process Convolutional Neural Network Layers
US-2019220734-A1 · Jul 18, 2019 · US
US11748599B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11748599-B2 |
| Application number | US-202016797871-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 21, 2020 |
| Priority date | Feb 21, 2019 |
| Publication date | Sep 5, 2023 |
| Grant date | Sep 5, 2023 |
Official abstract text for this publication.
Techniques including receiving a first set of values for processing by a machine learning (ML) network, storing a first portion of the first set of values in an on-chip memory, processing the first portion of the first set of values in a first layer of the ML network to generate a second portion of a second set of values, overwriting the stored first portion with the generated second portion, processing the second portion in a second layer of the ML network to generate a third portion of a third set of values, storing the third portion, repeating the steps of storing the first portion, processing the first portion, overwriting the stored first portion, processing the second portion, and storing the third portion for a fourth portion of the first set of values until all portions of the first set of values are processed to generate the third set of values.
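The portion-by-portion flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `run_in_portions`, `scale`, and `relu_shift` are hypothetical, the `on_chip` list merely stands in for the on-chip memory, and both layers are assumed to be element-wise so that portions can be processed independently. Layers with overlapping inputs, such as convolutions, need the reuse step described in the claims.

```python
from typing import Callable, List, Sequence

Layer = Callable[[List[float]], List[float]]


def run_in_portions(values: Sequence[float], layer1: Layer, layer2: Layer,
                    portion_size: int) -> List[float]:
    """Run two layers over `values` one portion at a time.

    `on_chip` is a stand-in for the on-chip memory: it first holds a portion
    of the first set of values and is then overwritten in place with the
    matching portion of the second set (the first layer's output).
    """
    if portion_size <= 0:
        raise ValueError("portion_size must be positive")
    third_set: List[float] = []  # stand-in for the memory holding the final output
    for start in range(0, len(values), portion_size):
        on_chip = list(values[start:start + portion_size])  # store a portion of the first set
        on_chip[:] = layer1(on_chip)        # overwrite it with the second-set portion
        third_set.extend(layer2(on_chip))   # store the third-set portion
    return third_set


# Hypothetical element-wise layers, chosen only to keep the sketch small.
def scale(xs: List[float]) -> List[float]:
    return [2.0 * x for x in xs]


def relu_shift(xs: List[float]) -> List[float]:
    return [max(0.0, x - 3.0) for x in xs]


data = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
portioned = run_in_portions(data, scale, relu_shift, portion_size=3)
whole = relu_shift(scale(data))  # the same layers applied to the full set at once
```

Under these assumptions, `portioned` equals `whole`, while `on_chip` never holds more than `portion_size` values at a time.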
Opening claim text (preview).
What is claimed is:
1. A method, comprising: receiving a first set of values for processing by a machine learning network having multiple layers; storing a first portion of the first set of values in an on-chip memory, wherein the first portion is less than all values of the first set of values, the first set of values including multiple portions; processing the first portion of the first set of values in a first layer of the machine learning network to generate a portion of a second set of values; overwriting the stored first portion of the first set of values with the generated first portion of the second set of values, including storing a first part of the first portion of the second set of values; processing the first portion of the second set of values in a second layer of the machine learning network to generate a first portion of a third set of values; storing the first portion of the third set of values to a memory; repeating the steps of storing, processing, overwriting, processing, and storing for a next of the multiple portions of the first set of values until all of the multiple portions of the first set of values have been processed to generate all of multiple portions of the third set of values, wherein, in processing a second portion of the first set of values to generate a second portion of the second set of values, a first part of the second portion of the second set of values is not generated, the first part of the second portion of the second set of values being restored from the stored first part of the first portion of the second set of values; and outputting the third set of values.
2. The method of claim 1, wherein the on-chip memory comprises at least one of a cache memory or static random access memory.
3. The method of claim 1, wherein processing the first portion of the first set of values comprises: dividing the first portion into a set of tiles; and processing each tile of the set of tiles in the first layer of the machine learning network.
4. The method of claim 1, wherein the first layer and second layer are grouped in a layer group.
5. The method of claim 1, wherein the machine learning network comprises a convolutional neural network.
6. The method of claim 1, wherein: the first part of the first portion of the second set of values is expected to be generated based on the second portion of the first set of values.
7. The method of claim 6, further comprising: processing a first part of the first portion of the first set of values to generate a second part of the first portion of the second set of values, wherein the first part of the first portion of the first set of values is less than all values of the first portion of the first set of values and the second part of the first portion of the second set of values is less than all values of the first portion of the second set of values; and storing the second part of the first portion of the second set of values; wherein processing the first portion of the first set of values comprises: generating the first portion of the second set of values without generating the second part of the portion of the second set of values, and restoring the second part of the first portion of the second set of values from the stored second part of the first portion of the second set of values.
8. The method of claim 1, wherein a size for the first portion is predetermined based on the size of the on-chip memory.
9. The method of claim 8, wherein the predetermined size for the first portion is based on a separate analysis of the machine learning network.
10. The method of claim 1, wherein the set of values comprises a tensor and wherein the first portion is generated by removing values from one dimension of the tensor.
11. A device, comprising: an on-chip memory; and one or more processors operatively coupled to the on-chip memory, wherein the one or more processors are configured to execute non-transitory instructions causing the one or more processors to: receive a first set of values for processing by a machine learning network having multiple layers; store a first portion of the first set of values in the cache memory, wherein the first portion is less than all values of the first set of values, the first set of values including multiple portions; process the first portion of the first set of values in a first layer of the machine learning network to generate a first portion of a second set of values; overwrite the stored first portion of the first set of values with the generated first portion of the second set of values, including storing a first part of the first portion of the second set of values; process the first portion of the second set of values in a second layer of the machine learning network to generate a first portion of a third set of values; store the first portion of the third set of values to a memory; repeat the operations of store, process, overwrite, process, and store until all of the multiple portions of the first set of values have been processed to generate all of multiple portions of the third set of values, wherein, in processing a second portion of the first set of values to generate a second portion of the second set of values, a first part of the second portion of the second set of values is not generated, the first part of the second portion of the second set of values being restored from the stored part of the first portion of the second set of values; and output the third set of values.
12. The device of claim 11, wherein the on-chip memory comprises at least one of a cache memory or static random access memory.
13. The device of claim 11, wherein the instructions stored thereon further cause the one or more processors to process the first portion of the first set of values by: dividing the first portion into a set of tiles; and processing each tile of the set of tiles in the first layer of the machine learning network.
14. The device of claim 11, wherein the first layer and second layer are grouped in a layer group.
15. The device of claim 11, wherein the machine learning network comprises a convolutional neural network.
16. The device of claim 11, wherein the first part of the first portion of the second set of values is expected to be generated based on the second portion of the first set of values.
17. The device of claim 16, wherein the instructions stored thereon further cause the one or more processors to: process a first part of the first portion of the first set of values to generate a second part of the first portion of the second set of values, wherein the first part of the first portion of the first set of values is less than all values of the first portion of the first set of values and the second part of the first portion of the second set of values is less than all values of the first portion of the second set of values; and store the second part of the first portion of the second set of values; wherein the instructions for processing the first portion of the first set of values further causes the one or more processors to: generate the first portion of the second set of values without generating the second part of the first portion of the second set of values, and restore the second part of the first portion of the second set of values from the stored second part of the first portion of the second set of values.
18. The device of claim 11, wherein the set of values comprises a tensor and wherein the first portion is generated by removing values from one dimension of the tensor.
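The reuse step in claims 1 and 11, where part of a second-set portion is restored from a stored part of the previous portion rather than generated again, can be sketched in Python under stated assumptions. The sketch assumes both layers are one-dimensional "valid" convolutions, which the claims do not require. All names (`conv1d_valid`, `two_layer_with_reuse`, `saved`, `out_portion`) are hypothetical. The variables `x`, `y`, and `z` play the roles of the first, second, and third sets of values.

```python
from typing import List, Sequence, Tuple


def conv1d_valid(xs: Sequence[int], w: Sequence[int]) -> List[int]:
    """Plain 1-D convolution without padding (assumed layer type)."""
    k = len(w)
    return [sum(w[j] * xs[i + j] for j in range(k)) for i in range(len(xs) - k + 1)]


def two_layer_with_reuse(x: Sequence[int], w1: Sequence[int], w2: Sequence[int],
                         out_portion: int) -> Tuple[List[int], int]:
    """Compute z = conv(conv(x, w1), w2) portion by portion.

    The last len(w2) - 1 second-set values of each portion are kept in
    `saved` and restored for the next portion instead of being regenerated.
    Returns z and the number of second-set values actually generated.
    """
    k1, k2 = len(w1), len(w2)
    n_out = len(x) - (k1 - 1) - (k2 - 1)
    z: List[int] = []
    saved: List[int] = []   # stored part of the previous second-set portion
    generated = 0
    for a in range(0, n_out, out_portion):
        b = min(a + out_portion, n_out)
        y_start = a + len(saved)            # first second-set index not covered by `saved`
        y_end = b + k2 - 1                  # one past the last second-set index needed
        fresh = conv1d_valid(x[y_start:y_end + k1 - 1], w1)  # only the missing values
        generated += len(fresh)
        y_portion = saved + fresh           # restored part + newly generated part
        z.extend(conv1d_valid(y_portion, w2))
        saved = y_portion[len(y_portion) - (k2 - 1):]  # keep the overlap for the next portion
    return z, generated


x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
w1, w2 = [1, 0, -1], [1, 2, 1]
z, generated = two_layer_with_reuse(x, w1, w2, out_portion=2)
reference = conv1d_valid(conv1d_valid(x, w1), w2)  # both layers on the full input
```

In this sketch every second-set value is generated exactly once, so `generated` equals `len(x) - len(w1) + 1`. Without `saved`, each portion after the first would recompute `len(w2) - 1` overlapping values.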
Convolutional networks [CNN, ConvNet] · CPC title
using electronic means · CPC title
with main memory updating (G06F12/0806 takes precedence) · CPC title
Learning methods · CPC title
Memory management · CPC title