Efficient reduce-scatter via near-memory computation

US2024168639A1 · US · A1

Patent metadata

Publication number: US-2024168639-A1
Application number: US-202217990092-A
Country: US
Kind code: A1
Filing date: Nov 18, 2022
Priority date: Nov 18, 2022
Publication date: May 23, 2024
Grant date: (none listed)

Abstract

An apparatus for performing distributed reduction operations using near-memory computation includes memory and a first near-memory compute node. The first near-memory compute node is coupled to a plurality of near-memory compute nodes. The first near-memory compute node comprises logic to store first data loaded from a second near-memory compute node, perform a reduction operation on the first data and second data to compute a result, and store the result within the first near-memory compute node. In some aspects, the near-memory compute node includes a PIM execution unit and carries out the reduction operation utilizing PIM commands.
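
The three steps in the abstract (store data loaded from a peer node, reduce it with local data, store the result locally) can be modeled in a few lines. This is a minimal Python sketch under stated assumptions, not the disclosed hardware: `NearMemoryNode`, `load_from`, `reduce_and_store`, and `REDUCE_OPS` are hypothetical names, and each node's memory is assumed to be a plain list of integers. The operator table mirrors the add, multiply, MIN, MAX, AND, OR, and XOR operations listed in claim 8.

```python
import operator
from typing import Callable, Dict, List

# Hypothetical operator table mirroring the reductions listed in claim 8.
REDUCE_OPS: Dict[str, Callable[[int, int], int]] = {
    "add": operator.add,
    "mul": operator.mul,
    "min": min,
    "max": max,
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
}


class NearMemoryNode:
    """Hypothetical model of a near-memory compute node with local memory."""

    def __init__(self, node_id: int, memory: List[int]) -> None:
        self.node_id = node_id
        self.memory = list(memory)    # local memory words
        self.staging: List[int] = []  # first data, loaded from a peer node

    def load_from(self, peer: "NearMemoryNode", start: int, length: int) -> None:
        # Step 1: copy the peer's words into this node's staging area.
        self.staging = peer.memory[start:start + length]

    def reduce_and_store(self, start: int, op: str = "add") -> None:
        # Steps 2 and 3: combine the staged (first) data with local (second)
        # data and write each result back into this node's own memory.
        fn = REDUCE_OPS[op]
        for i, remote in enumerate(self.staging):
            self.memory[start + i] = fn(remote, self.memory[start + i])


a = NearMemoryNode(0, [1, 2, 3, 4])
b = NearMemoryNode(1, [10, 20, 30, 40])
a.load_from(b, 0, 4)
a.reduce_and_store(0, "add")
print(a.memory)  # [11, 22, 33, 44]
```

Only node `a` changes; node `b` is read but never written, which matches the abstract's point that the result is stored within the first node.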

Claims

What is claimed is:

1. A system for performing distributed reduction operations using near-memory computation, the system comprising: a first near-memory compute node; and a second near-memory compute node coupled to the first near-memory compute node, wherein: the first near-memory compute node comprises a processor, memory, and a processing-in-memory (PIM) execution unit comprising logic to: store first data loaded from the second near-memory compute node; perform a reduction operation on the first data and second data to compute a result; and store the result within the first near-memory compute node.

2. The system of claim 1, wherein the PIM execution unit further comprises logic to: receive one or more memory access requests; and based on the one or more memory access requests, trigger the operations of storing first data, performing the reduction operation, and storing the result.

3. The system of claim 2, wherein the one or more memory access requests are received from the processor of the first near-memory compute node.

4. The system of claim 3, wherein the processor is configured to send the one or more access requests to the second near-memory compute node.

5. The system of claim 2, wherein one or more of the memory access requests are addressed to a memory address, and wherein the triggering of the operations is responsive to the memory address being within a memory address range.

6. The system of claim 2, wherein the triggering of the operations is responsive to one or more of the memory access requests including an indication of a memory request type.

7. The system of claim 2, wherein the one or more access requests are received from a second processor associated with the second near-memory compute node.

8. The system of claim 1, wherein performing the reduction operation on the first data and the second data includes performing an add, multiply, MIN, MAX, AND, OR, or XOR operation on the first data and the second data to compute the result.

9. The system of claim 1, wherein storing the result within the first near-memory compute node includes executing a PIM store command within the first near-memory compute node.

10. The system of claim 1, wherein the first and second near-memory compute nodes are coupled to a plurality of other near-memory compute nodes in at least one of a ring topology or a tree topology.

11. The system of claim 1, wherein the reduction operation forms part of an all-reduce operation.

12. An apparatus for performing distributed reduction operations using near-memory computation, the apparatus comprising: memory; and a first processing-in-memory (PIM) execution unit comprising logic to execute a combined PIM load and a PIM add command to: load first data from a second PIM execution unit; perform a reduction operation on the first data and second data to compute a first result; and store the first result within the memory of the first PIM execution unit.

13. The apparatus of claim 12, wherein the first PIM execution unit further comprises logic to: receive a memory access request; and trigger execution of the combined PIM load and PIM add command.

14. The apparatus of claim 13, wherein the memory access request is addressed to a memory address, and the execution is triggered in response to the memory address being within a memory address range.

15. The apparatus of claim 13, wherein the execution is triggered in response to the memory access request including an indication of a memory request type.

16. The apparatus of claim 12, wherein the first data is used as a first operand and the second data is used as a second operand of the reduction operation.

17. The apparatus of claim 16, wherein the PIM execution unit is coupled to a plurality of PIM execution units in at least one of a ring topology or a tree topology.

18. A method for performing distributed reduction operations using near-memory computation, the method comprising: receiving, by a first near-memory compute node of a plurality of near-memory compute nodes, one or more memory access requests; and triggering, based upon the one or more memory access requests, operations including: storing, by the first near-memory compute node, first data within the first near-memory compute node, the first data being loaded from a second near-memory compute node; performing, by the first near-memory compute node, a reduction operation on the first data and second data to compute a result; and storing, by the first near-memory compute node, the result within the first near-memory compute node.

19. The method of claim 18, wherein performing the reduction operation on the first data and the second data includes adding, multiplying, minimizing, maximizing, ANDing, or ORing the first data and the second data to compute the first result.

20. The method of claim 18, wherein the reduction operation forms part of an all-reduce operation.
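
Claims 10, 11, and 20 place the load-reduce-store step in a ring or tree of nodes as part of an all-reduce, and the title names reduce-scatter. The sketch below shows one way repeating that step around a ring yields a reduce-scatter, followed by an all-gather that completes an all-reduce. It uses the textbook ring schedule as an illustration; the claims do not specify a schedule. Assumptions: a ring topology, a sum reduction, one Python list per node, and lockstep steps. `ring_reduce_scatter`, `ring_all_gather`, and `chunk_bounds` are hypothetical names.

```python
from typing import List


def chunk_bounds(length: int, n: int, c: int) -> range:
    """Index range of chunk c when a buffer of `length` words is split n ways."""
    return range(c * length // n, (c + 1) * length // n)


def ring_reduce_scatter(buffers: List[List[int]]) -> None:
    """In-place ring reduce-scatter (sum), one buffer per node.

    Each step models a load-reduce-store: node j loads a chunk from its ring
    predecessor, adds it to its own copy, and stores the sum locally.
    Afterwards node j holds the fully reduced chunk (j + 1) % n.
    """
    n, length = len(buffers), len(buffers[0])
    for step in range(n - 1):
        # Load phase: snapshot every transfer first, so all loads in a step
        # see the values left by the previous step (lockstep assumption).
        loads = []
        for j in range(n):
            c = (j - 1 - step) % n
            idx = chunk_bounds(length, n, c)
            loads.append((c, [buffers[(j - 1) % n][i] for i in idx]))
        # Reduce-and-store phase: each node adds the loaded data into its memory.
        for j, (c, data) in enumerate(loads):
            for i, value in zip(chunk_bounds(length, n, c), data):
                buffers[j][i] += value


def ring_all_gather(buffers: List[List[int]]) -> None:
    """Circulate the reduced chunks so every node ends with the full result."""
    n, length = len(buffers), len(buffers[0])
    for step in range(n - 1):
        copies = []
        for j in range(n):
            c = (j - step) % n
            idx = chunk_bounds(length, n, c)
            copies.append((c, [buffers[(j - 1) % n][i] for i in idx]))
        for j, (c, data) in enumerate(copies):
            for i, value in zip(chunk_bounds(length, n, c), data):
                buffers[j][i] = value


bufs = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
ring_reduce_scatter(bufs)
print([bufs[j][(j + 1) % 3] for j in range(3)])  # [222, 333, 111]
ring_all_gather(bufs)
print(bufs)  # [[111, 222, 333], [111, 222, 333], [111, 222, 333]]
```

With n nodes, each of the n - 1 reduce-scatter steps moves only 1/n of a buffer per node, and every reduction lands in the receiving node's own memory, which is where near-memory computation would avoid a round trip to a host processor. Swapping `+=` for another operator from claim 8 changes the reduction without changing the schedule.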

Assignees

Advanced Micro Devices Inc

Classifications (CPC)

  • Improving I/O performance

  • Tightly coupled to memory, e.g. computational memory, smart memory, processor in memory

  • Command handling arrangements, e.g. command buffers, queues, command scheduling

  • G06F3/0679 (primary): Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]

  • G06F3/0613 (primary): in relation to throughput

Frequently asked questions

Who is the assignee on this patent?
Advanced Micro Devices Inc
What technology area does this patent fall under?
Primary CPC classification G06F3/0679. Mapped technology areas include Physics.
When was this patent published?
Published May 23, 2024 (kind code A1). Legal status and post-grant events are not shown on this page.