Super-tiling in neural network processing to enable analytics at lower memory speed

US11748599B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11748599-B2
Application number  US-202016797871-A
Country             US
Kind code           B2
Filing date         Feb 21, 2020
Priority date       Feb 21, 2019
Publication date    Sep 5, 2023
Grant date          Sep 5, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques including receiving a first set of values for processing by a machine learning (ML) network, storing a first portion of the first set of values in an on-chip memory, processing the first portion of the first set of values in a first layer of the ML network to generate a second portion of a second set of values, overwriting the stored first portion with the generated second portion, processing the second portion in a second layer of the ML network to generate a third portion of a third set of values, storing the third portion, repeating the steps of storing the first portion, processing the first portion, overwriting the stored first portion, processing the second portion, and storing the third portion for a fourth portion of the first set of values until all portions of the first set of values are processed to generate the third set of values.
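
The loop described in the abstract can be sketched in a few lines. This is a minimal illustration and not the patented implementation: it assumes element-wise layers (so portions do not overlap), models on-chip memory as a small Python list, and uses hypothetical names (`run_layer_group`, `scale`, `relu_shift`, `portion_size`) that do not appear in the patent.

```python
from typing import Callable, List, Sequence

# A "layer" here is any function mapping a list of values to a list of values.
Layer = Callable[[List[float]], List[float]]


def run_layer_group(values: Sequence[float],
                    layers: Sequence[Layer],
                    portion_size: int) -> List[float]:
    """Push one portion at a time through every layer before loading the next."""
    output: List[float] = []  # stands in for the memory holding the final set of values
    for start in range(0, len(values), portion_size):
        # Store one portion of the first set of values in the (simulated) on-chip buffer.
        on_chip = list(values[start:start + portion_size])
        for layer in layers:
            # The layer's result overwrites the buffer contents in place.
            on_chip[:] = layer(on_chip)
        # Store this portion of the final set of values.
        output.extend(on_chip)
    return output


# Two hypothetical element-wise layers, chosen only for illustration.
def scale(xs: List[float]) -> List[float]:
    return [2.0 * x for x in xs]


def relu_shift(xs: List[float]) -> List[float]:
    return [max(0.0, x - 3.0) for x in xs]


data = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
tiled = run_layer_group(data, [scale, relu_shift], portion_size=3)
whole = relu_shift(scale(data))  # layer-by-layer over the full input, for comparison
```

Both paths produce the same values; the portion-wise path only ever holds `portion_size` values in the simulated buffer. Layers with a receptive field wider than one value, such as convolutions, need extra handling at portion boundaries, which this sketch deliberately leaves out.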

First claim

Opening claim text (preview).

What is claimed is:
1. A method, comprising: receiving a first set of values for processing by a machine learning network having multiple layers; storing a first portion of the first set of values in an on-chip memory, wherein the first portion is less than all values of the first set of values, the first set of values including multiple portions; processing the first portion of the first set of values in a first layer of the machine learning network to generate a portion of a second set of values; overwriting the stored first portion of the first set of values with the generated first portion of the second set of values, including storing a first part of the first portion of the second set of values; processing the first portion of the second set of values in a second layer of the machine learning network to generate a first portion of a third set of values; storing the first portion of the third set of values to a memory; repeating the steps of storing, processing, overwriting, processing, and storing for a next of the multiple portions of the first set of values until all of the multiple portions of the first set of values have been processed to generate all of multiple portions of the third set of values, wherein, in processing a second portion of the first set of values to generate a second portion of the second set of values, a first part of the second portion of the second set of values is not generated, the first part of the second portion of the second set of values being restored from the stored first part of the first portion of the second set of values; and outputting the third set of values.
2. The method of claim 1, wherein the on-chip memory comprises at least one of a cache memory or static random access memory.
3. The method of claim 1, wherein processing the first portion of the first set of values comprises: dividing the first portion into a set of tiles; and processing each tile of the set of tiles in the first layer of the machine learning network.
4. The method of claim 1, wherein the first layer and second layer are grouped in a layer group.
5. The method of claim 1, wherein the machine learning network comprises a convolutional neural network.
6. The method of claim 1, wherein: the first part of the first portion of the second set of values is expected to be generated based on the second portion of the first set of values.
7. The method of claim 6, further comprising: processing a first part of the first portion of the first set of values to generate a second part of the first portion of the second set of values, wherein the first part of the first portion of the first set of values is less than all values of the first portion of the first set of values and the second part of the first portion of the second set of values is less than all values of the first portion of the second set of values; and storing the second part of the first portion of the second set of values; wherein processing the first portion of the first set of values comprises: generating the first portion of the second set of values without generating the second part of the portion of the second set of values, and restoring the second part of the first portion of the second set of values from the stored second part of the first portion of the second set of values.
8. The method of claim 1, wherein a size for the first portion is predetermined based on the size of the on-chip memory.
9. The method of claim 8, wherein the predetermined size for the first portion is based on a separate analysis of the machine learning network.
10. The method of claim 1, wherein the set of values comprises a tensor and wherein the first portion is generated by removing values from one dimension of the tensor.
11. A device, comprising: an on-chip memory; and one or more processors operatively coupled to the on-chip memory, wherein the one or more processors are configured to execute non-transitory instructions causing the one or more processors to: receive a first set of values for processing by a machine learning network having multiple layers; store a first portion of the first set of values in the cache memory, wherein the first portion is less than all values of the first set of values, the first set of values including multiple portions; process the first portion of the first set of values in a first layer of the machine learning network to generate a first portion of a second set of values; overwrite the stored first portion of the first set of values with the generated first portion of the second set of values, including storing a first part of the first portion of the second set of values; process the first portion of the second set of values in a second layer of the machine learning network to generate a first portion of a third set of values; store the first portion of the third set of values to a memory; repeat the operations of store, process, overwrite, process, and store until all of the multiple portions of the first set of values have been processed to generate all of multiple portions of the third set of values, wherein, in processing a second portion of the first set of values to generate a second portion of the second set of values, a first part of the second portion of the second set of values is not generated, the first part of the second portion of the second set of values being restored from the stored part of the first portion of the second set of values; and output the third set of values.
12. The device of claim 11, wherein the on-chip memory comprises at least one of a cache memory or static random access memory.
13. The device of claim 11, wherein the instructions stored thereon further cause the one or more processors to process the first portion of the first set of values by: dividing the first portion into a set of tiles; and processing each tile of the set of tiles in the first layer of the machine learning network.
14. The device of claim 11, wherein the first layer and second layer are grouped in a layer group.
15. The device of claim 11, wherein the machine learning network comprises a convolutional neural network.
16. The device of claim 11, wherein the first part of the first portion of the second set of values is expected to be generated based on the second portion of the first set of values.
17. The device of claim 16, wherein the instructions stored thereon further cause the one or more processors to: process a first part of the first portion of the first set of values to generate a second part of the first portion of the second set of values, wherein the first part of the first portion of the first set of values is less than all values of the first portion of the first set of values and the second part of the first portion of the second set of values is less than all values of the first portion of the second set of values; and store the second part of the first portion of the second set of values; wherein the instructions for processing the first portion of the first set of values further causes the one or more processors to: generate the first portion of the second set of values without generating the second part of the first portion of the second set of values, and restore the second part of the first portion of the second set of values from the stored second part of the first portion of the second set of values.
18. The device of claim 11, wherein the set of values comprises a tensor and wherein the first portion is generated by removing values from one dimension of the tensor.
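
The save-and-restore step in claims 1 and 11 is easier to follow with a concrete case. The sketch below is one possible reading, not the patented implementation: it assumes two stacked one-dimensional "valid" convolutions with a kernel of three values, splits the work along that single dimension, and keeps the last two intermediate values of each portion so the next portion can reuse them instead of regenerating them. The names `conv3`, `super_tile`, `saved`, and `out_portion` are hypothetical.

```python
from typing import List, Sequence, Tuple


def conv3(xs: Sequence[int], w: Sequence[int]) -> List[int]:
    """1-D 'valid' convolution with a 3-tap kernel: output is 2 values shorter."""
    return [w[0] * xs[i] + w[1] * xs[i + 1] + w[2] * xs[i + 2]
            for i in range(len(xs) - 2)]


def super_tile(x: Sequence[int], w1: Sequence[int], w2: Sequence[int],
               out_portion: int) -> Tuple[List[int], int]:
    """Run two conv3 layers portion by portion, reusing boundary intermediates.

    Returns the final values and the number of intermediate (second-set)
    values that were actually generated.
    """
    n_out = len(x) - 4          # two valid 3-tap layers shrink the input by 4
    saved: List[int] = []       # stored boundary part of the second set of values
    out: List[int] = []         # stands in for the memory holding the third set
    generated = 0
    j = 0
    while j < n_out:
        j_end = min(j + out_portion, n_out)
        # Intermediates y[j : j_end + 2] are needed; the first len(saved) of
        # them are restored, so only the remainder is generated from the input.
        first_new = j + len(saved)
        new_y = conv3(x[first_new:j_end + 4], w1)
        generated += len(new_y)
        y_part = saved + new_y                # restored part + newly generated part
        out.extend(conv3(y_part, w2))         # second layer on this portion
        saved = y_part[-2:]                   # keep what the next portion will need
        j = j_end
    return out, generated


x = [i * i for i in range(10)]
w1, w2 = [1, 2, 1], [1, 0, -1]
tiled, generated = super_tile(x, w1, w2, out_portion=2)
reference = conv3(conv3(x, w1), w2)   # whole-input, layer-by-layer result
```

Under these assumptions every intermediate value is generated exactly once (`generated == len(x) - 2`), whereas recomputing the overlap would generate two extra intermediate values for every portion after the first.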

Assignees

Texas Instruments Inc

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • G06N3/063 · Primary

    using electronic means · CPC title

  • with main memory updating (G06F12/0806 takes precedence) · CPC title

  • Learning methods · CPC title

  • Memory management · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11748599B2 cover?
Techniques including receiving a first set of values for processing by a machine learning (ML) network, storing a first portion of the first set of values in an on-chip memory, processing the first portion of the first set of values in a first layer of the ML network to generate a second portion of a second set of values, overwriting the stored first portion with the generated second portion, p…
Who is the assignee on this patent?
Texas Instruments Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 5, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).