Method and apparatus for efficient access to multidimensional data structures and/or other large data blocks

US12499052B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12499052-B2
Application number: US-202217691422-A
Country: US
Kind code: B2
Filing date: Mar 10, 2022
Priority date: Mar 10, 2022
Publication date: Dec 16, 2025
Grant date: Dec 16, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A parallel processing unit comprises a plurality of processors each being coupled to a memory access hardware circuitry. Each memory access hardware circuitry is configured to receive, from the coupled processor, a memory access request specifying a coordinate of a multidimensional data structure, wherein the memory access hardware circuit is one of a plurality of memory access circuitry each coupled to a respective one of the processors; and, in response to the memory access request, translate the coordinate of the multidimensional data structure into plural memory addresses for the multidimensional data structure and using the plural memory addresses, asynchronously transfer at least a portion of the multidimensional data structure for processing by at least the coupled processor. The memory locations may be in the shared memory of the coupled processor and/or an external memory.
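
The coordinate-to-address translation mentioned in the abstract can be illustrated in software. The following is a minimal sketch, not the patented hardware's method: it assumes a strided, row-major tensor layout and returns one start address per contiguous innermost-dimension run of a tile. The function name `tile_row_addresses` and all of its parameters are hypothetical, and out-of-bounds handling is not modeled.

```python
from itertools import product


def tile_row_addresses(base, strides, tile_origin, tile_shape):
    """Return the start byte address of each contiguous run in a tile.

    Hypothetical helper for illustration only (not from the patent).
    base        -- byte address of element (0, ..., 0)
    strides     -- bytes per one-element step along each dimension,
                   outermost dimension first
    tile_origin -- coordinate of the first element of the tile
    tile_shape  -- number of elements the tile spans per dimension
    """
    if not (len(strides) == len(tile_origin) == len(tile_shape)):
        raise ValueError("strides, tile_origin and tile_shape must have equal rank")

    # Every dimension except the innermost contributes one loop level.
    outer_ranges = [
        range(origin, origin + extent)
        for origin, extent in zip(tile_origin[:-1], tile_shape[:-1])
    ]
    # The innermost dimension is assumed contiguous, so only its starting
    # offset matters for the start address of each run.
    inner_offset = tile_origin[-1] * strides[-1]

    addresses = []
    for coord in product(*outer_ranges):
        offset = sum(c * s for c, s in zip(coord, strides[:-1]))
        addresses.append(base + offset + inner_offset)
    return addresses


# Assumed example: an 8 x 16 tensor of 4-byte elements stored row-major
# at byte address 4096, so the strides are (64, 4) bytes.
rows = tile_row_addresses(
    base=4096,
    strides=(64, 4),
    tile_origin=(2, 4),   # tile starts at row 2, column 4
    tile_shape=(3, 8),    # tile covers 3 rows of 8 elements each
)
print(rows)  # [4240, 4304, 4368]
```

In this sketch a single coordinate request expands into three addresses, each starting a 32-byte run (8 elements of 4 bytes). That is the sense in which one coordinate maps to plural memory addresses.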

First claim

Opening claim text (preview).

What is claimed is:

1. A parallel processor comprising: an interface to an external memory; a plurality of multicore processors, each multicore processor having a respectively different shared memory; and a plurality of memory access hardware circuits, each memory access hardware circuit being coupled to a multicore processor of the plurality of multicore processors and being configured to: receive, from the coupled multicore processor, a memory access request for a block of data; and in response to the memory access request, access the block of data at a source memory location in the respectively different shared memory of the coupled multicore processor or the external memory, and asynchronously transfer the block of data from the source memory location to a destination memory location in the respectively different shared memory of the coupled multicore processor or the external memory; wherein the asynchronous transfer is from a location in the external memory to another location in the external memory, or from a location in the respectively different shared memory of the coupled multicore processor.

2. A method performed in a parallel processing unit comprising a plurality of multiprocessors, the method comprising: receiving by a memory access hardware circuit coupled to a multicore processor of the plurality of multicore processors, from the coupled multicore processor, a memory access request for a block of data, wherein each multicore processor includes a respectively different shared memory, wherein the memory access hardware circuit is one of a plurality of memory access circuitry each coupled to one of the multicore processors; and in response to the memory access request, accessing the block of data at a source memory location in the respectively different shared memory of the coupled multicore processor or the external memory, and asynchronously transferring by the memory access hardware circuit, the block of data from the source memory location to a destination memory location in the respectively different shared memory of the coupled multicore processor and an external memory, wherein the asynchronous transfer is from a location in the external memory to another location in the external memory, or from a location in the respectively different shared memory of the coupled multicore processor to another location in the respectively different shared memory of the coupled multicore processor.

3. A method performed in a parallel processing unit comprising a plurality of multiprocessors, the method comprising: receiving by a memory access hardware circuit coupled to a multicore processor of the plurality of multicore processors, from the coupled multicore processor, a memory access request for a block of data, wherein each multicore processor includes a respectively different non-cached shared memory, wherein the memory access hardware circuit is one of a plurality of memory access circuitry each coupled to one of the multicore processors; and in response to the memory access request, accessing the block of data at a source memory location in the respectively different non-cached shared memory of the coupled multicore processor or the external memory, and asynchronously transferring by the memory access hardware circuit, the block of data from the source memory location to a destination memory location in the respectively different non-cached shared memory of the coupled multicore processor and an external memory, wherein the asynchronous transfer is from a location in the external memory to another location in the external memory, or the asynchronous transfer is from a location in the respectively different shared memory of the coupled multicore processor to another location in the respectively different shared memory of the coupled multicore processor.

4. A memory access hardware circuit comprising: an interface to an external memory; a memory input/output interface to receive memory access requests from a multicore processor; at least one memory interface to a respectively different shared memory at each of one or more other multicore processors and the multicore processor, wherein the respectively different shared memory is a non-cached shared memory; and a processing pipeline configured to: receive, from the multicore processor, a memory access request for a block of data; and in response to the memory access request, access the block of data at a source memory location in the respectively different shared memory of the coupled multicore processor or the external memory, and asynchronously transfer the block of data from the source memory location to a destination memory location in the respectively different shared memory of the coupled multicore processor and the external memory, wherein the asynchronous transfer is from a location in the external memory to another location in the external memory, or the asynchronous transfer is from a location in the respectively different shared memory of the coupled multicore processor to another location in the respectively different shared memory of the coupled multicore processor.

5. A parallel processor comprising: an interface to an external memory; a plurality of multicore processors, each multicore processor having a respectively different shared memory, wherein the respectively different shared memory is a non-cached shared memory; and a plurality of memory access hardware circuits, each memory access hardware circuit being coupled to a multicore processor of the plurality of multicore processors and being configured to: receive, from the coupled multicore processor, a memory access request for a block of data; and in response to the memory access request, access the block of data at a source memory location in the respectively different shared memory of the coupled multicore processor or the external memory, and asynchronously transfer the block of data from the source memory location to a destination memory location in the respectively different shared memory of the coupled multicore processor or the external memory, wherein the asynchronous transfer is from a location in the external memory to another location in the external memory, or the asynchronous transfer is from a location in the respectively different shared memory of the coupled multicore processor to another location in the respectively different shared memory of the coupled multicore processor.

6. The parallel processor according to claim 5, wherein the asynchronous transfer is from a location in the external memory to another location in the external memory.

7. The parallel processor according to claim 5, wherein the asynchronous transfer is from a location in the respectively different shared memory of the coupled multicore processor to another location in the respectively different shared memory of the coupled multicore processor.

8. The parallel processor according to claim 5, wherein the memory access hardware circuit coupled to the multicore processor is configured to read and write to the respectively different shared memory of the multicore processor coupled to the memory access hardware circuit and to the external memory.

9. The parallel processor according to claim 5, wherein the memory access hardware circuit coupled to the multicore processor is configured to copy the data block from the external memory to the respectively different shared memory of the coupled multicore processor.

10. The parallel processor according to claim 5, wherein the asynchronous transfer is from a location in the respectively different shared memory of the coupled multicore processor to another location in the respectively different shared memor

Assignees

Nvidia Corp

Inventors

Classifications

  • Addressing or allocation; Relocation (program address sequencing G06F9/00; arrangements for selecting an address in a digital store G11C8/00)

  • Buffers; Shared memory; Pipes

  • Local memory within processor subsystem

  • Distributed memory

  • Details of cache specific to multiprocessor cache arrangements

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12499052B2 cover?
A parallel processing unit comprises a plurality of processors each being coupled to a memory access hardware circuitry. Each memory access hardware circuitry is configured to receive, from the coupled processor, a memory access request specifying a coordinate of a multidimensional data structure, wherein the memory access hardware circuit is one of a plurality of memory access circuitry each c…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06F12/0875. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 16, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).