Virtual space memory bandwidth reduction
US-2020183833-A1 · Jun 11, 2020 · US
US12165237B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12165237-B2 |
| Application number | US-202217946753-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 16, 2022 |
| Priority date | Sep 16, 2022 |
| Publication date | Dec 10, 2024 |
| Grant date | Dec 10, 2024 |
Official abstract text for this publication.
A processor-implemented method for a memory storage format to accelerate machine learning (ML) on a computing device is described. The method includes receiving an image in a first layer storage format of a neural network. The method also includes assigning addresses to image pixels of each of three channels of the first layer storage format for accessing the image pixels in a blocked ML storage acceleration format. The method further includes storing the image pixels in the blocked ML storage acceleration format according to the assigned addresses of the image pixels. The method also includes accelerating inference video processing of the image according to the assigned addresses for the image pixels corresponding to the blocked ML storage acceleration format.
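The address-assignment step in the abstract can be illustrated with a minimal sketch. The abstract does not give an address formula, so everything below is an assumption made for illustration: the function names, the fixed block width `block_w`, and the choice to place all pixels of one channel before the next inside each block.

```python
# Hypothetical sketch; none of these names or formulas come from the patent.
CHANNELS = 3  # the three channels of the first-layer input, e.g. R, G, B


def blocked_address(c, y, x, width, block_w):
    """Return the flat address of pixel (c, y, x) in an assumed blocked layout.

    Assumption: each image row is cut into blocks of block_w pixels. Inside a
    block, the pixels of channel 0 come first, then channel 1, then channel 2.
    width must be a multiple of block_w in this simplified version.
    """
    blocks_per_row = width // block_w
    block_index = y * blocks_per_row + x // block_w
    block_size = CHANNELS * block_w
    return block_index * block_size + c * block_w + x % block_w


def store_blocked(planar, height, width, block_w):
    """Copy a planar image planar[c][y][x] into a flat buffer, block by block."""
    if width % block_w != 0:
        raise ValueError("width must be a multiple of block_w in this sketch")
    buf = [None] * (CHANNELS * height * width)
    for c in range(CHANNELS):
        for y in range(height):
            for x in range(width):
                buf[blocked_address(c, y, x, width, block_w)] = planar[c][y][x]
    return buf
```

Under these assumptions, the three channel values for any group of `block_w` neighboring pixels sit in one contiguous run of the buffer, which is the kind of locality a blocked format is meant to provide.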
Opening claim text (preview).
What is claimed is:

1. A processor-implemented method for a memory storage format to accelerate machine learning (ML) on a computing device, comprising: receiving an image in a first layer storage format of a neural network; assigning addresses to image pixels of each of three channels of the first layer storage format for accessing the image pixels in a blocked ML storage acceleration format; splitting the image into a plurality of stripes according to an image width and an image height, in which a stripe height of each of the stripes is less than the image height; splitting each of the stripes into memory blocks having a memory block size according to a variable stride size to form the blocked ML storage acceleration format; storing the image pixels in the blocked ML storage acceleration format according to the assigned addresses of the image pixels; and accelerating inference video processing of the image according to the assigned addresses for the image pixels of the image corresponding to the blocked ML storage acceleration format.
2. The method of claim 1, in which the assigning of addresses comprises: computing the assigned addresses to layout the image pixels within the memory blocks, in which each of the image pixels for each channel in the image are assigned to the memory blocks.
3. The method of claim 1, in which storing the image comprises arranging image pixels in the memory blocks according to a spatial axis or a channel axis of the memory blocks.
4. The method of claim 3, further comprising: storing the image pixels in the memory blocks in a spatial domain; and then storing the image pixels in a channel domain.
5. The method of claim 4, further comprising: storing an initial group of the image pixels in an initial memory block of the memory blocks in a first channel of the initial memory block; storing a next group of the image pixels in the initial memory block of the memory blocks in a second channel of the initial memory block; storing a subsequent group of the image pixels in the initial memory block of the memory blocks in a third channel of the initial memory block; and repeating storing for each memory block of the memory blocks and for each consecutive group of the image pixels.
6. The method of claim 3, further comprising: storing the image pixels in the memory blocks in a channel domain of the memory blocks; and then storing the image pixels in a spatial domain of the memory blocks.
7. The method of claim 6, further comprising: storing a selected image pixel in an initial memory block of the memory blocks in a first channel of the initial memory block; storing a next image pixel in the initial memory block of the memory blocks for a second channel of the initial memory block; storing a subsequent image pixel in the initial memory block of the memory blocks in a third channel of the initial memory block; and repeating the storing of the selected image pixel, the storing of the next image pixel, and the storing of the subsequent image pixel for each memory block of the memory blocks and for each consecutive selected, next, and subsequent ones of the image pixels.
8. The method of claim 1, in which accelerating inference video processing comprises simultaneously processing each of the three channels of the first layer storage format in the blocked ML storage acceleration format through matrix units of a neural signal processor (NSP) of the computing device.
9. The method of claim 1, in which a precision of the first layer storage format of the neural network comprises 16-bit floating point (FP16) or quantized eight-bit integer (INT8).
10. A non-transitory computer-readable medium having program code recorded thereon for a memory storage format to accelerate machine learning (ML) on a computing device, the program code being executed by a processor and comprising: program code to receive an image in a first layer storage format of a neural network; program code to assign addresses to image pixels of each of three channels of the first layer storage format for accessing the image pixels in a blocked ML storage acceleration format; program code to split the image into a plurality of stripes according to an image width and an image height, in which a stripe height of each of the stripes is less than the image height; program code to split each of the stripes into memory blocks having a memory block size according to a variable stride size to form the blocked ML storage acceleration format; program code to store the image pixels in the blocked ML storage acceleration format according to the assigned addresses of the image pixels; and program code to accelerate inference video processing of the image according to the assigned addresses for the image pixels of the image corresponding to the blocked ML storage acceleration format.
11. The non-transitory computer-readable medium of claim 10, in which the program code to assign addresses comprises: program code to compute the assigned addresses to layout the image pixels within the memory blocks, in which each of the image pixels for each channel in the image are assigned to the memory blocks.
12. The non-transitory computer-readable medium of claim 10, in which the program code to store the image pixels comprises program code to arrange the image pixels in the memory blocks according to a spatial axis or a channel axis of the memory blocks.
13. The non-transitory computer-readable medium of claim 12, further comprising: program code to store the image pixels in the memory blocks in a spatial domain; and then program code to store the image pixels in a channel domain.
14. The non-transitory computer-readable medium of claim 13, further comprising: program code to store an initial group of the image pixels in an initial memory block of the memory blocks in a first channel of the initial memory block; program code to store a next group of the image pixels in the initial memory block of the memory blocks in a second channel of the initial memory block; program code to store a subsequent group of the image pixels in the initial memory block of the memory blocks in a third channel of the initial memory block; and program code to repeat program code to store for each memory block of the memory blocks and for each consecutive group of the image pixels.
15. The non-transitory computer-readable medium of claim 12, further comprising: program code to store the image pixels in the memory blocks in a channel domain of the memory blocks; and then program code to store the image pixels in a spatial domain of the memory blocks.
16. The non-transitory computer-readable medium of claim 15, further comprising: program code to store a selected image pixel in an initial memory block of the memory blocks in a first channel of the initial memory block; program code to store a next image pixel in the initial memory block of the memory blocks for a second channel of the initial memory block; program code to store a subsequent image pixel in the initial memory block of the memory blocks in a third channel of the initial memory block; and program code to repeat the program code to store the selected image pixel, the program code to store the next image pixel, and the program code to store the subsequent image pixel for each memory block of the memory blocks and for each consecutive selected, next, and subsequent ones of the image pixels.
17. The non-transitory computer-readable medium of claim 10, in which the program code to accelerate inference video processing compr
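The stripe and block splitting of claim 1 and the two storage orders of claims 4-5 and 6-7 can be sketched as follows. This is one possible reading, not the patented implementation: the claims do not say how the variable stride is chosen, so cycling through a caller-supplied list of block widths (`strides`) is an assumption, as are all function and parameter names.

```python
# Hypothetical sketch of the claimed splitting and ordering steps.


def split_into_blocks(width, height, stripe_h, strides):
    """Yield (y0, y1, x0, x1) bounds of each memory block.

    The image is cut into horizontal stripes of at most stripe_h rows, with
    stripe_h less than the image height as claim 1 requires. Each stripe is
    then cut left to right using the widths in `strides`, cycled as needed
    (an assumed interpretation of "variable stride size").
    """
    if not 0 < stripe_h < height:
        raise ValueError("stripe height must be positive and less than image height")
    if not strides or any(s <= 0 for s in strides):
        raise ValueError("strides must be a non-empty list of positive widths")
    for y0 in range(0, height, stripe_h):
        y1 = min(y0 + stripe_h, height)
        x0, i = 0, 0
        while x0 < width:
            x1 = min(x0 + strides[i % len(strides)], width)
            yield (y0, y1, x0, x1)
            x0, i = x1, i + 1


def block_order(bounds, channels=3, channel_first=False):
    """List (c, y, x) triples in storage order for one block.

    channel_first=False: a whole group of pixels for channel 0, then the same
    group for channel 1, then channel 2 (one reading of claims 4-5).
    channel_first=True: the channels of each pixel are stored next to each
    other before moving to the next pixel (one reading of claims 6-7).
    """
    y0, y1, x0, x1 = bounds
    pixels = [(y, x) for y in range(y0, y1) for x in range(x0, x1)]
    if channel_first:
        return [(c, y, x) for (y, x) in pixels for c in range(channels)]
    return [(c, y, x) for c in range(channels) for (y, x) in pixels]
```

For a 5-pixel-wide, 4-row image with `stripe_h=2` and `strides=[2, 3]`, `split_into_blocks` yields four blocks, two per stripe, that together cover every pixel exactly once.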
Allocation control and policies · CPC title
in block erasable memory, e.g. flash memory · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
using electronic means · CPC title
Artificial neural networks [ANN] · CPC title