What technology area does this patent fall under?

Primary CPC classification G06F3/0625. Mapped technology areas include Physics.

When was this patent published?

Publication date: Feb 28, 2023 (kind code B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Using per memory bank load caches for reducing power use in a system on a chip

US11593001B1 · US · B1

Patent metadata
Field	Value
Publication number	US-11593001-B1
Application number	US-202117391861-A
Country	US
Kind code	B1
Filing date	Aug 2, 2021
Priority date	Aug 2, 2021
Publication date	Feb 28, 2023
Grant date	Feb 28, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A VPU and associated components include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators are used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer is included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU executes a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: a plurality of memory banks; a plurality of load caches, each load cache associated with a single memory bank of the plurality of memory banks; wherein each pair of memory bank and load cache comprises processing circuitry to: receive data representative of a memory read address; compare the memory read address to a load cache memory address corresponding to a prior memory read stored in the load cache; based at least in part on the comparison, determine that the memory read address at least partially overlaps with the load cache memory address; and read at least a portion of data corresponding to the memory read address from the load cache.

2. The processor of claim 1, further comprising processing circuitry to initialize an enable bit of the load cache based at least in part on the prior memory read being stored in the load cache.

3. The processor of claim 1, wherein the load cache memory address is stored in a TAG memory of the load cache.

4. The processor of claim 1, wherein the portion of the data is read from the load cache, and the remaining portion of the data is read from the memory bank.

5. The processor of claim 1, wherein, during a first read operation, an enable bit of the load cache is set such that the load cache a memory address corresponding to the first read operation is not accessed.

6. The processor of claim 1, wherein the data corresponding to the memory read address is used for processing in at least one of a computer vision algorithm, a spatial filtering algorithm, a deep learning algorithm, or a convolutional operation.

7. The processor of claim 1, wherein, for execution of at least one algorithm, each of the plurality of load caches are disabled.

8. The processor of claim 1, wherein the plurality of load caches correspond to a first superbank, and another plurality of load caches corresponding to second superbank are disabled.

9. The processor of claim 1, wherein each load cache of the plurality of load caches stores data from two or more prior memory reads.

10. The processor of claim 1, wherein the load cache stores a row address and a column address associated with the prior memory read, and the comparing includes comparing the row address and the column address to a row address and a column address associated with the memory read address.

11. The processor of claim 1, wherein the processor is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing deep learning operations; a system on chip (SoC); a system including a programmable vision accelerator (PVA); a system including a vison processing unit; a system implemented using an edge device; a system implemented using a robot; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.

12. A system comprising: a memory; and a processor comprising: a plurality of memory banks; a plurality of load caches, each load cache associated with a single memory bank of the plurality of memory banks; wherein each pair of memory bank and load cache comprises processing circuitry to: receive data representative of a memory read address in the memory; compare the memory read address to a load cache memory address corresponding to a prior memory read stored in the load cache; based at least in part on the comparison, determine that the memory read address at least partially overlaps with the load cache memory address; and read at least a portion of data corresponding to the memory read address from the load cache.

13. The system of claim 12, wherein the processing circuitry is further to initialize an enable bit of the load cache based at least in part on the prior memory read being stored in the load cache.

14. The system of claim 12, wherein the load cache memory address is stored in a TAG memory of the load cache.

15. The system of claim 12, wherein the portion of the data is read from the load cache, and the remaining portion of the data is read from the memory bank.

16. The system of claim 12, wherein, during a first read operation, an enable bit of the load cache is set such that the load cache a memory address corresponding to the first read operation is not accessed.

17. The system of claim 12, wherein, for execution of at least one algorithm, each of the plurality of load caches are disabled.

18. The system of claim 12, wherein the system is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing deep learning operations; a system on chip (SoC); a system including a programmable vision accelerator (PVA); a system including a vison processing unit; a system implemented using an edge device; a system implemented using a robot; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.

19. A method comprising: receiving data representative of a memory read address of a memory bank; comparing the memory read address to a load cache memory address corresponding to a prior memory read of the memory bank stored in a load cache corresponding to the memory bank; based at least in part on the comparison, determine that the memory read address at least partially overlaps with the load cache memory address; and reading at least a portion of data corresponding to the memory read address from the load cache.

20. The method of claim 19, wherein the portion of the data is read from the load cache, and the remaining portion of the data is read from the memory bank.
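
For readers who find the claim language hard to follow, the read path described in claims 1, 4, and 10 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the names `LoadCache` and `Bank`, the word-granular addressing, the single-entry cache, and the policy of always remembering the latest read are all hypothetical choices made for illustration. The `valid` flag loosely stands in for the claimed enable bit.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LoadCache:
    """Remembers the most recent read of one bank (hypothetical layout)."""
    valid: bool = False   # loosely models the claimed "enable bit"
    row: int = -1         # tag: row address of the prior read
    col: int = -1         # tag: first column address of the prior read
    data: List[int] = field(default_factory=list)


@dataclass
class Bank:
    """One memory bank paired with its own load cache."""
    rows: List[List[int]]
    cache: LoadCache = field(default_factory=LoadCache)
    bank_word_reads: int = 0    # words fetched from the bank itself
    cache_word_reads: int = 0   # words served from the load cache

    def read(self, row: int, col: int, n: int) -> List[int]:
        c = self.cache
        out = []
        for k in range(col, col + n):
            # Compare the requested address with the cached tag (row and column).
            hit = c.valid and c.row == row and c.col <= k < c.col + len(c.data)
            if hit:
                # Overlapping portion comes from the load cache.
                out.append(c.data[k - c.col])
                self.cache_word_reads += 1
            else:
                # Remaining portion comes from the memory bank.
                out.append(self.rows[row][k])
                self.bank_word_reads += 1
        # Assumption: the cache always keeps the latest read for reuse.
        self.cache = LoadCache(valid=True, row=row, col=col, data=list(out))
        return out


# Sliding-window reads, as in a spatial filter: columns 0-3, 1-4, 2-5 of row 0.
bank = Bank(rows=[[r * 100 + c for c in range(8)] for r in range(2)])
for start in (0, 1, 2):
    bank.read(row=0, col=start, n=4)
print(bank.bank_word_reads, bank.cache_word_reads)  # prints: 6 6
```

In this toy run, 12 requested words cost only 6 bank accesses because each window overlaps the previous one by three words. A model of the full processor would hold a list of `Bank` objects, one load cache per bank; how addresses map to banks is not modeled here.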

Assignees

Nvidia Corp

Inventors

Classifications

G06F12/0802
Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches · CPC title
G06F3/0679
Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP] · CPC title
G06F3/0655
Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices · CPC title
G06F3/0625 · Primary
Power saving in storage systems · CPC title
G06F12/0864 · Primary
using pseudo-associative means, e.g. set-associative or hashing · CPC title

Patent family

Related publications grouped by family.

View patent family 85177238

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11593001B1 cover?: A VPU and associated components include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per me…
Who is the assignee on this patent?: Nvidia Corp