Time-based memory allocation for neural network inference
US-11610102-B1 · Mar 21, 2023 · US
US12443447B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12443447-B2 |
| Application number | US-202418675294-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 28, 2024 |
| Priority date | Jul 19, 2021 |
| Publication date | Oct 14, 2025 |
| Grant date | Oct 14, 2025 |
Official abstract text for this publication.
Techniques for executing machine learning (ML) models including receiving an indication to run an ML model on a processing core; receiving a static memory allocation for running the ML model on the processing core; determining that a layer of the ML model uses more memory than the static memory allocated; transmitting, to a shared memory, a memory request for blocks of the shared memory; receiving an allocation of the requested blocks; running the layer of the ML model using the static memory and the range of memory addresses; and outputting results of running the layer of the ML model.
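The flow in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: `SharedMemoryPool`, `run_model`, the fixed block size, and the choice to release blocks after every layer are all hypothetical and do not come from the publication.

```python
from dataclasses import dataclass, field


@dataclass
class SharedMemoryPool:
    """Hypothetical shared memory divided into fixed-size blocks."""
    block_size: int
    num_blocks: int
    owners: dict = field(default_factory=dict)  # block index -> core id

    def request(self, core_id, num_bytes):
        """Grant enough free blocks to cover num_bytes, or raise MemoryError."""
        needed = -(-num_bytes // self.block_size)  # ceiling division
        free = [b for b in range(self.num_blocks) if b not in self.owners]
        if len(free) < needed:
            raise MemoryError("not enough free shared blocks")
        granted = free[:needed]
        for b in granted:
            self.owners[b] = core_id
        return granted

    def release(self, core_id):
        """Return every block held by core_id to the pool."""
        self.owners = {b: c for b, c in self.owners.items() if c != core_id}


def run_model(core_id, static_bytes, layer_bytes, pool):
    """Walk the layers and report how many shared blocks each one borrowed."""
    usage = []
    for need in layer_bytes:
        blocks = []
        if need > static_bytes:
            # The layer does not fit in the static allocation:
            # ask the shared memory for the shortfall only.
            blocks = pool.request(core_id, need - static_bytes)
        # (The layer itself would execute here, using static memory + blocks.)
        usage.append(len(blocks))
        # Assumption: borrowed blocks are handed back as soon as the layer ends.
        pool.release(core_id)
    return usage


pool = SharedMemoryPool(block_size=1024, num_blocks=4)
usage = run_model(core_id=0, static_bytes=2048,
                  layer_bytes=[1000, 3000, 2048, 6144], pool=pool)
print(usage)
```

Only layers whose footprint exceeds the static allocation touch the shared pool, so other cores can reuse the same blocks at other times.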
Opening claim text (preview).
What is claimed is:
1. A device comprising: a first processing core configured to perform a first algorithm; a second processing core; a first memory coupled to the first processing core and configurable to be allocated to the first algorithm; and a second memory coupled to the first processing core and the second processing core, wherein: the second memory is configurable to be shared between the first processing core and the second processing core; and the first processing core is configured to perform the first algorithm using the first memory and the second memory.
2. The device of claim 1, wherein: the second processing core is configured to perform a second algorithm; the device further comprises a third memory coupled to the first processing core and configurable to be allocated to the second algorithm; and the second processing core is configured to perform the second algorithm using the third memory and the second memory.
3. The device of claim 1, wherein the first processing core is configured to: determine whether execution of the first algorithm would exceed the first memory; and based on whether execution of the first algorithm would exceed the first memory, determine whether to request a portion of the second memory to be allocated to the first algorithm.
4. The device of claim 3, wherein the first processing core is configured to receive an indication of a memory size associated with the first algorithm.
5. The device of claim 1 further comprising a direct memory access (DMA) circuit coupled between the first processing core and the first memory and between the first processing core and the second memory.
6. The device of claim 1, wherein the first algorithm is a machine-learning algorithm.
7. The device of claim 1, wherein the second memory is a cache memory.
8. The device of claim 1, wherein: the device is configured to allocate a set of virtual memory addresses to the first memory and to a portion of the second memory; and the set of memory addresses is contiguous.
9. The device of claim 1 further comprising a semiconductor chip that includes the first processing core, the second processing core, the first memory, and the second memory.
10. The device of claim 1, wherein the first processing core is configured to, based on completion of the first algorithm, provide a request to the second memory to release a portion of the second memory allocated to the first algorithm.
11. A device comprising: a set of processing cores that each include: a respective first memory; and a respective memory access circuit; an interconnect coupled to the set of processing cores; and a second memory coupled to each core of the set of processing cores via the interconnect, wherein each core of the set of processing cores is configured to: execute a respective algorithm using the respective first memory; and determine whether to execute the respective algorithm further using a portion of the second memory.
12. The device of claim 11, wherein the respective algorithm is a machine-learning algorithm.
13. The device of claim 11, wherein the second memory is a cache memory.
14. The device of claim 11 further comprising a semiconductor chip that includes the set of processing cores.
15. The device of claim 14, wherein the semiconductor chip includes the second memory.
16. A method comprising: determining that execution of an algorithm by a first processing core would exceed a capacity of a first memory; in response, requesting allocation of a portion of a second memory that is shared between the first processing core and a second processing core; and executing the algorithm by the first processing core using the first memory and the portion of the second memory.
17. The method of claim 16 further comprising receiving an indication of a memory size associated with the algorithm.
18. The method of claim 16 further comprising allocating a range of virtual memory addresses to the first memory and the portion of the second memory.
19. The method of claim 18, wherein the virtual memory addresses of the range of virtual memory addresses are contiguous.
20. The method of claim 16, wherein the algorithm is a machine-learning algorithm.
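Claims 8, 18, and 19 recite a contiguous range of virtual addresses spanning the first memory and a portion of the second memory. One way to picture this is a small segment table that places the core-local region first and the borrowed shared regions directly after it. The sketch below is illustrative only: `build_virtual_map`, `translate`, and every address in it are hypothetical, and the publication does not state that the mapping is built this way.

```python
def build_virtual_map(virt_base, local_base, local_size, shared_regions):
    """Lay out one contiguous virtual range over local and shared memory.

    shared_regions is a list of (physical_start, size) pairs that need not be
    physically adjacent. Returns (virt_start, size, phys_start, kind) tuples.
    """
    segments = [(virt_base, local_size, local_base, "local")]
    cursor = virt_base + local_size
    for phys_start, size in shared_regions:
        # Each shared region starts exactly where the previous segment ended,
        # so the virtual range has no holes.
        segments.append((cursor, size, phys_start, "shared"))
        cursor += size
    return segments


def translate(segments, vaddr):
    """Map a virtual address to (kind, physical address)."""
    for virt_start, size, phys_start, kind in segments:
        if virt_start <= vaddr < virt_start + size:
            return kind, phys_start + (vaddr - virt_start)
    raise ValueError(f"unmapped virtual address {vaddr:#x}")


# Hypothetical layout: 2 KiB of local memory plus two scattered 1 KiB
# blocks of shared memory, all visible as one 4 KiB virtual range.
segments = build_virtual_map(
    virt_base=0x1000,
    local_base=0x8000_0000,
    local_size=0x800,
    shared_regions=[(0x9000_0400, 0x400), (0x9000_2000, 0x400)],
)
print(translate(segments, 0x1000))
print(translate(segments, 0x1800))
print(translate(segments, 0x1C00))
```

With such a table, code running a layer can index one flat buffer without knowing which bytes sit in local memory and which sit in the shared memory.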
User address space allocation, e.g. contiguous or non contiguous base addressing · CPC title
Accessing, addressing or allocating within memory systems or architectures (digital input from, or digital output to record carriers, e.g. to disk storage units, G06F3/06) · CPC title
Memory management, e.g. access or allocation · CPC title
Logical partitioning of resources; Management or configuration of virtualized resources (specific details on emulation or internal functioning of virtual machines G06F9/455) · CPC title
Interfaces, programming languages or software development kits, e.g. for simulating neural networks · CPC title