Active memory device gather, scatter, and filter

US10049061B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10049061-B2
Application number  US-201213674520-A
Country             US
Kind code           B2
Filing date         Nov 12, 2012
Priority date       Nov 12, 2012
Publication date    Aug 14, 2018
Grant date          Aug 14, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments relate to loading and storing of data. An aspect includes a method for transferring data in an active memory device that includes memory and a processing element. An instruction is fetched and decoded for execution by the processing element. Based on determining that the instruction is a gather instruction, the processing element determines a plurality of source addresses in the memory from which to gather data elements and a destination address in the memory. One or more gathered data elements are transferred from the source addresses to contiguous locations in the memory starting at the destination address. Based on determining that the instruction is a scatter instruction, a source address in the memory from which to read data elements at contiguous locations and one or more destination addresses in the memory to store the data elements at non-contiguous locations are determined, and the data elements are transferred.
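
As an illustration of the two transfer directions summarized in the abstract, the following is a minimal software sketch, not the patented hardware. It assumes memory can be modelled as a flat Python list addressed by element index; the function names `gather` and `scatter` and their parameters are hypothetical and do not come from the patent.

```python
def gather(memory, source_addresses, destination):
    """Copy elements from non-contiguous source addresses into
    contiguous locations starting at `destination`."""
    for offset, src in enumerate(source_addresses):
        memory[destination + offset] = memory[src]


def scatter(memory, source, destination_addresses):
    """Copy elements read from contiguous locations starting at `source`
    to the non-contiguous destination addresses."""
    for offset, dst in enumerate(destination_addresses):
        memory[dst] = memory[source + offset]


# Usage: a toy 12-element memory (an assumption made for this example).
memory = [10, 11, 12, 13, 14, 15, 16, 17, 0, 0, 0, 0]
gather(memory, [1, 4, 6], 8)   # pack three scattered elements at 8..10
scatter(memory, 8, [0, 2, 5])  # spread them back out to other addresses
```

In the sketch, gather packs scattered elements into one contiguous run and scatter does the reverse; the abstract describes both as being performed inside the memory device by the processing element.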

First claim

Opening claim text (preview).

What is claimed is:

1. A method for transferring data in an active memory device comprising a three-dimensional memory cube that includes memory divided into three-dimensional blocked regions as memory vaults, one or more memory controllers, and a processing element, the method comprising:

  fetching and decoding an instruction for execution by the processing element of the active memory device; and

  based on determining that the instruction is a gather instruction comprising a first source address pointer, a first destination address pointer, a first stride pointer, and a first count, the processing element performing:

    determining a plurality of source addresses in the memory from which to gather data elements based on the first source address pointer pointing to a list of the source addresses and the first count indicating a number of the source addresses in the list of the source addresses, the plurality of source addresses identifying non-contiguous locations in one or more of the memory vaults;

    determining a destination address in the memory as pointed to by the first destination address pointer;

    determining a first stride size vector as pointed to by the first stride pointer, wherein the first stride size vector supports different stride sizes associated with each of the source addresses;

    accessing the non-contiguous locations in one or more of the memory vaults through the one or more memory controllers in the active memory device as gathered data elements by realizing multiple vector data element accesses simultaneously, wherein the memory vaults each comprise at least one data element from each of a plurality of memory layers; and

    transferring the gathered data elements from the plurality of source addresses to contiguous locations in the memory starting at the destination address and incrementing the destination address based on the first stride size vector as each of the gathered data elements is transferred; and

  based on determining that the instruction is a scatter instruction comprising a second source address pointer, a second destination address pointer, a second stride pointer, and a second count:

    determining a source address in the memory from which to read a plurality of data elements at contiguous locations as pointed to by the second source address pointer;

    determining a plurality of destination addresses in the memory to store the data elements at non-contiguous locations based on the second destination address pointer pointing to a list of the destination addresses and the second count indicating a number of the destination addresses in the list of the destination addresses;

    determining a second stride size vector as pointed to by the second stride pointer, wherein the second stride size vector supports different stride sizes associated with the source address;

    identifying filter criteria associated with the instruction; and

    transferring one or more of the data elements from the source address to the destination addresses while applying the filter criteria to limit transferring between the source and destination addresses according to the filter criteria based on a data value of the one or more of data elements to be transferred, wherein the filter criteria prevent one or more excluded data values from being stored at the destination addresses while continuing to store one or more included data values at the destination addresses and incrementing the source address based on the second stride size vector regardless of the filter criteria;

  wherein the processing element provides virtual address computation functionality that supports an execution of the gather instruction or the scatter instruction.

2. The method of claim 1, wherein the instruction, the plurality of source addresses, and the destination address are provided by a main processor in communication with the processing element.

3. The method of claim 2, wherein the plurality of source addresses and the destination address are received from the main processor in an effective address format and are translated by the processing element to a real address format when performing load and store operations to the memory.

4. The method of claim 2, wherein determining the plurality of source addresses in the memory from which to gather data elements further comprises receiving the first source address pointer from the main processor that identifies a location in the memory containing the plurality of source addresses.

5. The method of claim 1, wherein the active memory device further comprises multiple instances of the processing element coupled to an interconnect network, the multiple instances of the processing element operable to access any of the memory vaults across the interconnect network.

6. A processing element of an active memory device comprising a three-dimensional memory cube that includes memory divided into three-dimensional blocked regions as memory vaults, one or more memory controllers, and the processing element, comprising:

  a load store queue that interfaces with one or more of the memory vaults in the active memory device;

  an instruction buffer coupled to the load store queue; and

  a decoder coupled to the instruction buffer, the decoder decodes an instruction received at the instruction buffer and based on determining that the instruction is a gather instruction comprising a first source address pointer, a first destination address pointer, a first stride pointer, and a first count, the processing element performs:

    determining a plurality of source addresses in the memory from which to gather data elements based on the first source address pointer pointing to a list of the source addresses and the first count indicating a number of the source addresses in the list of the source addresses, the plurality of source addresses identifying non-contiguous locations in one or more of the memory vaults;

    determining a destination address in the memory as pointed to by the first destination address pointer;

    determining a first stride size vector as pointed to by the first stride pointer, wherein the first stride size vector supports different stride sizes associated with each of the source addresses;

    accessing the non-contiguous locations in one or more of the memory vaults through the one or more memory controllers in the active memory device as gathered data elements by realizing multiple vector data element accesses simultaneously, wherein the memory vaults each comprise at least one data element from each of a plurality of memory layers; and

    transferring the gathered data elements from the plurality of source addresses to contiguous locations in the memory starting at the destination address and incrementing the destination address based on the first stride size vector as each of the gathered data elements is transferred; and

  based on determining that the instruction is a scatter instruction comprising a second source address pointer, a second destination address pointer, a second stride pointer, and a second count:

    determining a source address in the memory from which to read a plurality of data elements at contiguous locations as pointed to by the second source address pointer;

    determining a plurality of destination addresses in the memory to store the data elements at non-contiguous locations based on the second destination address pointer pointing to a list of the destination addresses and the second count indicating a number of the destination addresses in the list of the destination addresses;

    determining a second stride size vector as pointed to by the second stride pointer, wherein the second stride size vector supports different stride sizes associated with the source address;

    identifying filter criteria associated with the instruction; and

    transferring one or more of the data ele
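
To make the operand roles in claim 1 concrete, here is a hedged software sketch of a gather driven by a source-address list, a stride vector, and a count, and of a scatter that applies filter criteria. It is only an illustration under stated assumptions: memory is a flat Python list indexed by element, all pointers are list indices, the filter criteria are encoded as a set of excluded values, and each source element is paired with its own entry in the destination list, so a filtered-out element leaves its destination untouched. The names `gather_strided`, `scatter_filtered`, and `excluded` are hypothetical and not taken from the patent.

```python
def gather_strided(memory, src_list_ptr, dest_ptr, stride_ptr, count):
    """Gather `count` elements whose addresses are listed in memory at
    `src_list_ptr`, writing them from `dest_ptr` onward."""
    sources = memory[src_list_ptr:src_list_ptr + count]
    strides = memory[stride_ptr:stride_ptr + count]
    dest = dest_ptr
    for src, stride in zip(sources, strides):
        memory[dest] = memory[src]
        dest += stride  # destination advances by this element's stride


def scatter_filtered(memory, src_ptr, dest_list_ptr, stride_ptr, count,
                     excluded):
    """Scatter elements read from `src_ptr` onward to the addresses listed
    at `dest_list_ptr`, skipping values in `excluded`.
    Returns the number of elements actually stored."""
    dests = memory[dest_list_ptr:dest_list_ptr + count]
    strides = memory[stride_ptr:stride_ptr + count]
    src = src_ptr
    stored = 0
    for dst, stride in zip(dests, strides):
        value = memory[src]
        if value not in excluded:  # filter on the data value itself
            memory[dst] = value
            stored += 1
        src += stride  # source advances whether or not the value was stored
    return stored


# Usage: drop zero-valued elements while scattering (toy layout, assumed).
memory = [-1] * 32
memory[0:4] = [5, 0, 7, 0]          # contiguous source data
memory[8:12] = [20, 22, 24, 26]     # list of destination addresses
memory[12:16] = [1, 1, 1, 1]        # stride size vector
stored = scatter_filtered(memory, 0, 8, 12, 4, excluded={0})
```

With unit strides the gather writes to contiguous locations, matching the claim language; the per-element stride vector is kept as a parameter because the claim recites it, but how non-unit strides interact with contiguity is not resolved by this sketch.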

Assignees

Inventors

Classifications

  • Tightly coupled to memory, e.g. computational memory, smart memory, processor in memory · CPC title

  • LOAD or STORE instructions; Clear instruction · CPC title

  • using stride · CPC title

  • G06F13/28 · Primary

    using burst mode transfer, e.g. direct memory access {DMA}, cycle steal (G06F13/32 takes precedence) · CPC title

  • Cross-Sectional Technologies · mapped topic

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10049061B2 cover?
Embodiments relate to loading and storing of data. An aspect includes a method for transferring data in an active memory device that includes memory and a processing element. An instruction is fetched and decoded for execution by the processing element. Based on determining that the instruction is a gather instruction, the processing element determines a plurality of source addresses in the mem…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F13/28. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 14, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).