US10049061B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10049061-B2 |
| Application number | US-201213674520-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 12, 2012 |
| Priority date | Nov 12, 2012 |
| Publication date | Aug 14, 2018 |
| Grant date | Aug 14, 2018 |
Embodiments relate to loading and storing of data. An aspect includes a method for transferring data in an active memory device that includes memory and a processing element. An instruction is fetched and decoded for execution by the processing element. Based on determining that the instruction is a gather instruction, the processing element determines a plurality of source addresses in the memory from which to gather data elements and a destination address in the memory. One or more gathered data elements are transferred from the source addresses to contiguous locations in the memory starting at the destination address. Based on determining that the instruction is a scatter instruction, a source address in the memory from which to read data elements at contiguous locations and one or more destination addresses in the memory to store the data elements at non-contiguous locations are determined, and the data elements are transferred.
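The two data movements described in the abstract can be pictured with a minimal sketch. It assumes a flat, word-addressed memory modelled as a Python list and uses hypothetical function names (`gather`, `scatter`). It shows only the addressing pattern: many scattered addresses on one side, one contiguous run on the other. It does not show the active memory device's hardware or instruction encoding.

```python
from typing import List, Sequence


def gather(memory: List[int], src_addrs: Sequence[int], dst: int) -> None:
    """Hypothetical gather: copy elements from non-contiguous source
    addresses into contiguous locations starting at dst."""
    values = [memory[a] for a in src_addrs]  # read every source before writing
    for i, value in enumerate(values):
        memory[dst + i] = value


def scatter(memory: List[int], src: int, dst_addrs: Sequence[int]) -> None:
    """Hypothetical scatter: read contiguous elements starting at src and
    store them at non-contiguous destination addresses."""
    values = [memory[src + i] for i in range(len(dst_addrs))]
    for addr, value in zip(dst_addrs, values):
        memory[addr] = value


memory = list(range(100, 116))  # 16-word toy memory: address a holds 100 + a
gather(memory, [1, 5, 9, 13], dst=0)
print(memory[0:4])  # [101, 105, 109, 113]
scatter(memory, src=0, dst_addrs=[15, 12])
print(memory[12], memory[15])  # 105 101
```

Gather and scatter are mirror images: each pairs a list of arbitrary addresses with a single contiguous run. Both functions read all of their inputs before writing anything. That keeps the sketch well defined even if the source and destination ranges overlap.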
What is claimed is:

1. A method for transferring data in an active memory device comprising a three-dimensional memory cube that includes memory divided into three-dimensional blocked regions as memory vaults, one or more memory controllers, and a processing element, the method comprising: fetching and decoding an instruction for execution by the processing element of the active memory device; and based on determining that the instruction is a gather instruction comprising a first source address pointer, a first destination address pointer, a first stride pointer, and a first count, the processing element performing: determining a plurality of source addresses in the memory from which to gather data elements based on the first source address pointer pointing to a list of the source addresses and the first count indicating a number of the source addresses in the list of the source addresses, the plurality of source addresses identifying non-contiguous locations in one or more of the memory vaults; determining a destination address in the memory as pointed to by the first destination address pointer; determining a first stride size vector as pointed to by the first stride pointer, wherein the first stride size vector supports different stride sizes associated with each of the source addresses; accessing the non-contiguous locations in one or more of the memory vaults through the one or more memory controllers in the active memory device as gathered data elements by realizing multiple vector data element accesses simultaneously, wherein the memory vaults each comprise at least one data element from each of a plurality of memory layers; and transferring the gathered data elements from the plurality of source addresses to contiguous locations in the memory starting at the destination address and incrementing the destination address based on the first stride size vector as each of the gathered data elements is transferred; and based on determining that the instruction is a scatter instruction comprising a second source address pointer, a second destination address pointer, a second stride pointer, and a second count: determining a source address in the memory from which to read a plurality of data elements at contiguous locations as pointed to by the second source address pointer; determining a plurality of destination addresses in the memory to store the data elements at non-contiguous locations based on the second destination address pointer pointing to a list of the destination addresses and the second count indicating a number of the destination addresses in the list of the destination addresses; determining a second stride size vector as pointed to by the second stride pointer, wherein the second stride size vector supports different stride sizes associated with the source address; identifying filter criteria associated with the instruction; and transferring one or more of the data elements from the source address to the destination addresses while applying the filter criteria to limit transferring between the source and destination addresses according to the filter criteria based on a data value of the one or more of data elements to be transferred, wherein the filter criteria prevent one or more excluded data values from being stored at the destination addresses while continuing to store one or more included data values at the destination addresses and incrementing the source address based on the second stride size vector regardless of the filter criteria; wherein the processing element provides virtual address computation functionality that supports an execution of the gather instruction or the scatter instruction.

2. The method of claim 1, wherein the instruction, the plurality of source addresses, and the destination address are provided by a main processor in communication with the processing element.

3. The method of claim 2, wherein the plurality of source addresses and the destination address are received from the main processor in an effective address format and are translated by the processing element to a real address format when performing load and store operations to the memory.

4. The method of claim 2, wherein determining the plurality of source addresses in the memory from which to gather data elements further comprises receiving the first source address pointer from the main processor that identifies a location in the memory containing the plurality of source addresses.

5. The method of claim 1, wherein the active memory device further comprises multiple instances of the processing element coupled to an interconnect network, the multiple instances of the processing element operable to access any of the memory vaults across the interconnect network.

6. A processing element of an active memory device comprising a three-dimensional memory cube that includes memory divided into three-dimensional blocked regions as memory vaults, one or more memory controllers, and the processing element, comprising: a load store queue that interfaces with one or more of the memory vaults in the active memory device; an instruction buffer coupled to the load store queue; and a decoder coupled to the instruction buffer, the decoder decodes an instruction received at the instruction buffer and based on determining that the instruction is a gather instruction comprising a first source address pointer, a first destination address pointer, a first stride pointer, and a first count, the processing element performs: determining a plurality of source addresses in the memory from which to gather data elements based on the first source address pointer pointing to a list of the source addresses and the first count indicating a number of the source addresses in the list of the source addresses, the plurality of source addresses identifying non-contiguous locations in one or more of the memory vaults; determining a destination address in the memory as pointed to by the first destination address pointer; determining a first stride size vector as pointed to by the first stride pointer, wherein the first stride size vector supports different stride sizes associated with each of the source addresses; accessing the non-contiguous locations in one or more of the memory vaults through the one or more memory controllers in the active memory device as gathered data elements by realizing multiple vector data element accesses simultaneously, wherein the memory vaults each comprise at least one data element from each of a plurality of memory layers; and transferring the gathered data elements from the plurality of source addresses to contiguous locations in the memory starting at the destination address and incrementing the destination address based on the first stride size vector as each of the gathered data elements is transferred; and based on determining that the instruction is a scatter instruction comprising a second source address pointer, a second destination address pointer, a second stride pointer, and a second count: determining a source address in the memory from which to read a plurality of data elements at contiguous locations as pointed to by the second source address pointer; determining a plurality of destination addresses in the memory to store the data elements at non-contiguous locations based on the second destination address pointer pointing to a list of the destination addresses and the second count indicating a number of the destination addresses in the list of the destination addresses; determining a second stride size vector as pointed to by the second stride pointer, wherein the second stride size vector supports different stride sizes associated with the source address; identifying filter criteria associated with the instruction; and transferring one or more of the data ele…
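Claim 1 adds three things to the basic gather and scatter:

- operands reached through pointers and a count;
- a per-element stride size vector;
- a value-based filter on scatter.

The sketch below is one hedged reading of that claim wording, not the patented implementation. It rests on these assumptions, none of which the claim states:

- Memory is a byte-addressed `bytearray`.
- Each pointer operand is the byte address of a little-endian 4-byte word (`WORD`) or of a list of such words.
- Each stride entry is the size in bytes of the corresponding element. That is why advancing the destination (gather) or the source (scatter) by the stride keeps that side contiguous.
- The filter criteria are modelled as a Python predicate `keep`.

All function names are hypothetical. The claimed hardware aspects are not modelled: memory vaults, simultaneous vector accesses, and effective-to-real address translation.

```python
from typing import Callable, List

WORD = 4  # assumption: address/stride list entries are 4-byte little-endian words


def read_word(mem: bytearray, addr: int) -> int:
    return int.from_bytes(mem[addr:addr + WORD], "little")


def read_words(mem: bytearray, addr: int, count: int) -> List[int]:
    return [read_word(mem, addr + i * WORD) for i in range(count)]


def write_words(mem: bytearray, addr: int, values: List[int]) -> None:
    for i, value in enumerate(values):
        mem[addr + i * WORD:addr + (i + 1) * WORD] = value.to_bytes(WORD, "little")


def move(mem: bytearray, dst: int, src: int, size: int) -> None:
    """Copy size bytes from src to dst, refusing out-of-range accesses."""
    if min(dst, src) < 0 or max(dst, src) + size > len(mem):
        raise IndexError("access outside the modelled memory")
    mem[dst:dst + size] = mem[src:src + size]


def gather(mem: bytearray, src_ptr: int, dst_ptr: int, stride_ptr: int, count: int) -> None:
    srcs = read_words(mem, src_ptr, count)        # list of `count` source addresses
    strides = read_words(mem, stride_ptr, count)  # one stride size per source address
    dst = read_word(mem, dst_ptr)                 # destination start, read via its pointer
    for src, size in zip(srcs, strides):
        move(mem, dst, src, size)
        dst += size  # advance by this element's stride, so the output is contiguous


def scatter(mem: bytearray, src_ptr: int, dst_ptr: int, stride_ptr: int, count: int,
            keep: Callable[[bytes], bool]) -> None:
    src = read_word(mem, src_ptr)                 # start of the contiguous source run
    dsts = read_words(mem, dst_ptr, count)        # list of `count` destination addresses
    strides = read_words(mem, stride_ptr, count)
    for dst, size in zip(dsts, strides):
        if keep(bytes(mem[src:src + size])):      # filter on the element's data value
            move(mem, dst, src, size)
        src += size  # the source advances even when the element is filtered out


mem = bytearray(64)
write_words(mem, 0, [40, 48, 44])  # gather source address list
write_words(mem, 12, [2, 1, 2])    # stride size vector (element sizes in bytes)
write_words(mem, 24, [56])         # word holding the destination address
mem[40:42], mem[48:49], mem[44:46] = b"\x01\x02", b"\x03", b"\x04\x05"
gather(mem, src_ptr=0, dst_ptr=24, stride_ptr=12, count=3)
print(mem[56:61].hex())  # 0102030405

mem = bytearray(64)
write_words(mem, 0, [32])          # word holding the scatter source address
write_words(mem, 4, [48, 52, 56])  # destination address list
write_words(mem, 16, [1, 1, 1])    # stride size vector
mem[32:35] = b"\x07\x00\x09"
mem[52] = 0xFF                     # sentinel: survives because the 0x00 element is filtered
scatter(mem, 0, 4, 16, 3, keep=lambda value: value != b"\x00")
print(mem[48], mem[52], mem[56])   # 7 255 9
```

The scatter source pointer advances unconditionally. This mirrors the claim's wording that the source address is incremented "regardless of the filter criteria". A filtered element leaves its own destination untouched, and the elements after it still land on their matching destination addresses.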
Tightly coupled to memory, e.g. computational memory, smart memory, processor in memory · CPC title
LOAD or STORE instructions; Clear instruction · CPC title
using stride · CPC title
using burst mode transfer, e.g. direct memory access {DMA}, cycle steal (G06F13/32 takes precedence) · CPC title
Cross-Sectional Technologies · mapped topic