What technology area does this patent fall under?

Primary CPC classification G06F3/0625. Mapped technology areas include Physics.

When was this patent published?

Publication date Sep 17, 2024 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Using per memory bank load caches for reducing power use in a system on a chip

US12093539B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12093539-B2
Application number	US-202218069722-A
Country	US
Kind code	B2
Filing date	Dec 21, 2022
Priority date	Aug 2, 2021
Publication date	Sep 17, 2024
Grant date	Sep 17, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
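Per memory bank load caching pays off when successive loads overlap, as in the sliding window read pattern the claims name (claims 7 and 12) for spatial filtering and convolution workloads (claim 16). The short Python sketch below is a hypothetical model, not Nvidia's design. It treats a bank row as a flat run of word addresses and counts how many words of each load were already fetched by the load just before it. The function names `sliding_window_reads` and `reuse_from_previous_load` and the window parameters are illustrative assumptions.

```python
from typing import List


def sliding_window_reads(start: int, width: int, stride: int, steps: int) -> List[range]:
    """Hypothetical model: word addresses touched by each load of a 1-D sliding window."""
    return [range(start + i * stride, start + i * stride + width) for i in range(steps)]


def reuse_from_previous_load(reads: List[range]) -> List[int]:
    """For each load, count the words that the immediately preceding load already fetched."""
    reused = [0]  # the first load has no predecessor to reuse from
    for prev, cur in zip(reads, reads[1:]):
        reused.append(len(set(prev) & set(cur)))
    return reused


if __name__ == "__main__":
    # Assumed example: an 8-word window advancing 2 words per load, 4 loads in total.
    reads = sliding_window_reads(start=0, width=8, stride=2, steps=4)
    reused = reuse_from_previous_load(reads)
    total = sum(len(r) for r in reads)
    print(reused)  # [0, 6, 6, 6]
    print(f"{sum(reused)}/{total} words were already fetched by the prior load")
```

In this assumed configuration, 6 of the 8 words in every load after the first were fetched by the previous load, so 18 of 32 word reads are redundant. That redundancy is what a load cache could serve without accessing the bank again.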

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: one or more memory banks; and one or more data caches associated with the one or more memory banks; wherein processing circuitry is to: determine that one or more memory read addresses correspond to one or more prior memory reads stored in the one or more data caches based at least on determining one or more memory read patterns corresponding to relative memory locations between the one or more memory read addresses and one or more data cache memory addresses, the one or more data cache memory addresses corresponding to the one or more prior memory reads and being different from the one or more memory read addresses; and based at least on the determination, read one or more data portions corresponding to the one or more memory read addresses from the one or more data caches.

2. The system of claim 1, wherein the determining the one or more memory read patterns includes determining the one or more memory read addresses at least partially overlap with the one or more data cache memory addresses corresponding to the one or more prior memory reads.

3. The system of claim 1, wherein the determining the one or more memory read patterns is based at least on an amount of overlap between the one or more memory read addresses and the one or more data cache memory addresses.

4. The system of claim 1, wherein, based at least on the determination, one or more second data portions corresponding to the one or more memory read addresses are read from the one or more memory banks.

5. The system of claim 1, wherein, for execution of at least one algorithm, the one or more data caches are disabled.

6. The system of claim 1, wherein the one or more data caches correspond to one or more first superbanks, and one or more other data caches corresponding to one or more second superbanks are disabled.

7. The system of claim 1, wherein the one or more memory read patterns include a sliding window read pattern.

8. The system of claim 1, wherein the one or more data caches store one or more first row addresses and one or more first column addresses associated with the one or more prior memory reads, and the determining includes comparing the one or more first row addresses and the one or more first column addresses to one or more second row addresses and one or more second column addresses associated with the one or more memory read addresses.

9. The system of claim 1, wherein the system is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing deep learning operations; a system on chip (SoC); a system including a programmable vision accelerator (PVA); a system including a vison processing unit; a system implemented using an edge device; a system implemented using a robot; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.

10. A method comprising: sending first data indicating one or more memory read addresses of one or more memory banks, the one or more memory banks associated with one or more data caches; and based at least on the one or more memory read addresses corresponding at least partly to data stored in the one or more data caches as a result of one or more prior memory reads, receiving second data indicating one or more data portions read from the one or more data caches based at least on the one or more data portions corresponding to the one or more memory read addresses and one or more memory read patterns being determined that corresponds to relative memory locations between the one or more memory read addresses and one or more data cache memory addresses, the one or more data cache memory addresses corresponding to the one or more prior memory reads and being different from the one or more memory read addresses.

11. The method of claim 10, wherein the first data represents one or more instructions of an instruction set architecture.

12. The method of claim 10, wherein the first data corresponds to a sliding window memory access pattern on the one or more memory banks.

13. The method of claim 10, wherein the first data indicates at least one of: one or more offsets used to generate the one or more memory read addresses, one or more line addresses used to generate the one or more memory read addresses, or one or more increments used to generate the one or more memory read addresses.

14. The method of claim 10, further comprising enabling, using application code, the one or more data caches, wherein the receiving the second data is based at least on the enabling of the one or more data caches.

15. The method of claim 10, further comprising: analyzing one or more memory access patterns associated with the one or more memory read addresses; and based at least on the one or more memory access patterns, enabling the one or more data caches, wherein the receiving the second data is based at least on the enabling of the one or more data caches.

16. The method of claim 10, wherein the first data corresponds to at least one of a computer vision algorithm, a spatial filtering algorithm, a deep learning algorithm, or a convolutional operation.

17. A processor comprising: one or more circuits to read one or more data portions corresponding to one or more memory read addresses from one or more data caches based at least on the one or more memory read addresses corresponding to data stored in the one or more data caches during one or more prior memory reads and based at least on determining one or more memory read patterns corresponding to relative memory locations between the one or more memory read addresses and one or more data cache memory addresses, the one or more data cache memory addresses corresponding to the one or more prior memory reads and being different from the one or more memory read addresses.

18. The processor of claim 17, wherein the one or more circuits are further to initialize one or more enable bits of the one or more data caches based at least on the data corresponding to one or more prior memory reads being stored in the one or more data caches.

19. The processor of claim 17, wherein the determining the one or more memory read patterns includes determining that the one or more memory read addresses at least partially overlap with the one or more data cache memory addresses corresponding to the one or more prior memory reads.

20. The processor of claim 17, wherein the processor is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing deep learning operations; a system on chip (SoC); a system including a programmable vision accelerator (PVA); a system including a vison processing unit; a system implemented using an edge device; a system implemented using a robot; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.
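Claims 1 through 4, 8 and 18 outline the core mechanism. A cache remembers the row and column addresses of a prior read, and a new read whose addresses partially overlap that prior read is served from the cache for the overlapping words. The remaining words come from the memory bank. The following Python sketch models that behavior under stated assumptions: one cache entry per bank, each read is a contiguous run of words within a single row that stays inside the row, and `MemoryBank`, `LoadCache`, and `cached_load` are hypothetical names. It illustrates the claimed idea and is not the patented hardware.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MemoryBank:
    """Toy memory bank (hypothetical): rows of words, with a counter of words read."""
    rows: List[List[int]]
    word_reads: int = 0

    def read(self, row: int, col: int, width: int) -> List[int]:
        self.word_reads += width
        return self.rows[row][col:col + width]


@dataclass
class LoadCache:
    """Hypothetical single-entry load cache for one bank, tagged by the prior read's row and column."""
    enabled: bool = True   # claim 5: caching may be disabled for some algorithms
    valid: bool = False    # set once prior-read data is stored (cf. the enable bits of claim 18)
    row: int = -1
    col: int = -1
    data: List[int] = field(default_factory=list)


def cached_load(bank: MemoryBank, cache: LoadCache, row: int, col: int, width: int) -> List[int]:
    """Read `width` words at (row, col), serving any overlap with the prior read from the cache."""
    if not cache.enabled:
        return bank.read(row, col, width)

    out: List[Optional[int]] = [None] * width
    if cache.valid and cache.row == row:
        # Same row: overlap [col, col + width) with the cached [cache.col, cache.col + len(data)).
        lo = max(col, cache.col)
        hi = min(col + width, cache.col + len(cache.data))
        for c in range(lo, hi):
            out[c - col] = cache.data[c - cache.col]

    # Fetch each contiguous run of words the cache could not supply from the bank (cf. claim 4).
    i = 0
    while i < width:
        if out[i] is None:
            j = i
            while j < width and out[j] is None:
                j += 1
            out[i:j] = bank.read(row, col + i, j - i)
            i = j
        else:
            i += 1

    # Remember this read so that the next overlapping read can hit in the cache.
    cache.valid, cache.row, cache.col, cache.data = True, row, col, list(out)
    return out  # type: ignore[return-value]


if __name__ == "__main__":
    bank = MemoryBank(rows=[list(range(100, 116))])
    cache = LoadCache()
    cached_load(bank, cache, row=0, col=0, width=8)           # miss: 8 words from the bank
    window = cached_load(bank, cache, row=0, col=2, width=8)  # 6 words from cache, 2 from bank
    print(window, bank.word_reads)  # [102, 103, 104, 105, 106, 107, 108, 109] 10
```

In this toy run, two overlapping 8-word loads cost 10 bank word reads instead of 16. Comparing row tags first and then column ranges loosely follows claim 8. A real design would also have to handle reads spanning rows, multiple entries, and per-superbank enables (claim 6), none of which this sketch attempts.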

Assignees

Nvidia Corp

Inventors

Classifications

G06F12/0802
Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches · CPC title
G06F3/0655
Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices · CPC title
G06F3/0679
Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP] · CPC title
G06F12/0215
with look ahead addressing means · CPC title
G06F12/0846
Cache with multiple tag or data arrays being simultaneously accessible · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 85177238

