Using Hardware Gather-Scatter Capabilities to Optimize MPI All-to-All
US-2018052803-A1 · Feb 22, 2018 · US
US12093209B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12093209-B2 |
| Application number | US-202217862227-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 11, 2022 |
| Priority date | May 27, 2022 |
| Publication date | Sep 17, 2024 |
| Grant date | Sep 17, 2024 |
Official abstract text for this publication.
Technologies for batching remote descriptors of serialized objects in streaming pipelines are described. One method of a first computing device generates a streaming batch of remote descriptors. Each remote descriptor uniquely identifies a contiguous block of a serialized object. The first computing device sends at least one of the remote descriptors to a second computing device before the streaming batch is completed. At least some contents of a contiguous block are obtained for storage at a second memory associated with the second computing device before the streaming batch is completed.
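The batching behavior summarized above can be illustrated with a minimal sketch. It assumes a batch with a fixed number of slots and a timing window that starts when the first item is assigned, as the claims below recite. The class name `StreamingBatch`, the `send` callback, and the injectable `clock` are hypothetical and do not come from the patent. No real transport is used; `send` stands in for whatever forwards a descriptor to the second computing device.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class StreamingBatch:
    """Hypothetical streaming batch: fixed slots plus a timing window."""
    num_slots: int
    window_seconds: float
    send: Callable[[object], None]              # forwards one descriptor to the peer
    clock: Callable[[], float] = time.monotonic  # injectable so the sketch is testable
    slots: List[object] = field(default_factory=list)
    window_start: Optional[float] = None
    completed: bool = False

    def poll(self) -> bool:
        """Mark the batch complete once every slot is filled or the window has ended."""
        if not self.completed:
            full = len(self.slots) >= self.num_slots
            expired = (
                self.window_start is not None
                and self.clock() - self.window_start >= self.window_seconds
            )
            self.completed = full or expired
        return self.completed

    def assign(self, descriptor: object) -> bool:
        """Place a descriptor in the next slot and send it immediately.

        Returns False if the batch had already completed.
        """
        if self.poll():
            return False
        if self.window_start is None:
            # Assumption: the window starts when the first item is assigned.
            self.window_start = self.clock()
        self.slots.append(descriptor)
        # The descriptor goes out before the batch completes, so the receiver
        # can begin fetching the block while later slots are still empty.
        self.send(descriptor)
        self.poll()
        return True
```

In this sketch the receiver would see each descriptor as soon as it is assigned, while `completed` only becomes true when the last slot fills or the window expires. A caller would invoke `poll()` periodically to detect window expiry on a partially filled batch.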
Opening claim text (preview).
What is claimed is:
1. A method of operating a first computing device, the method comprising: generating a first streaming batch of a plurality of remote descriptors, each of the plurality of remote descriptors being an object that uniquely identifies a contiguous block of a serialized object stored in a first memory associated with the first computing device; sending at least one of the plurality of remote descriptors to a second computing device before the first streaming batch is completed, wherein at least some contents of at least one of the contiguous blocks are obtained from the first memory for storage at a second memory associated with the second computing device before the first streaming batch is completed; and completing the first streaming batch responsive to i) each of a plurality of slots of the first streaming batch being assigned one of the plurality of remote descriptors or ii) a first timing window ending.
2. The method of claim 1, wherein generating the first streaming batch comprises: assigning a first remote descriptor of the plurality of remote descriptors to a first slot of the first streaming batch during the first timing window, wherein the first remote descriptor is an object that uniquely identifies a first contiguous block of a first serialized object stored in the first memory associated with the first computing device; and assigning a second remote descriptor of the plurality of remote descriptors to a second slot of the first streaming batch during the first timing window, wherein the second remote descriptor is an object that uniquely identifies a second contiguous block of a second serialized object stored in the first memory; sending the at least one of the plurality of remote descriptors to the second computing device comprises: sending the first remote descriptor to the second computing device before the first streaming batch is completed; and sending the second remote descriptor to the second computing device before the first streaming batch is completed, wherein the second computing device is to obtain at least some contents of the first contiguous block from the first memory for storage at the second memory associated with the second computing device before the first streaming batch is completed.
3. The method of claim 2, wherein the first remote descriptor comprises a starting address of the first contiguous block, a size of the first contiguous block, a physical machine identifier corresponding to the first memory, a remote direct memory access (RDMA) access key, and a value of a reference count token representing one or more shares of ownership of the first serialized object, wherein a size of the first remote descriptor is less than the size of the first contiguous block.
4. The method of claim 3, further comprising: receiving a first message from the second computing device to release the first remote descriptor; updating the value of the reference count token responsive to receiving the first message; and releasing the first serialized object from the first memory responsive to the value of the reference count token satisfying a threshold value.
5. The method of claim 2, further comprising: initializing the first streaming batch comprising a specified number of the plurality of slots, each of the plurality of slots corresponding to an individual streaming batch item, wherein the first timing window starts responsive to a first streaming batch item being assigned to the first streaming batch, and wherein the obtained contents of the first contiguous block are processed by the second computing device once the first streaming batch is completed.
6. The method of claim 2, further comprising: receiving a third remote descriptor associated with a second streaming batch during a second timing window, wherein the third remote descriptor is a second object that uniquely identifies a third contiguous block of a third serialized object stored in a third memory associated with a third computing device; and performing, using the third remote descriptor, an RDMA GET operation to obtain at least some contents of the third contiguous block from the third memory for transfer to the first memory.
7. The method of claim 6, wherein receiving the second remote descriptor comprises receiving the second remote descriptor from a fourth computing device.
8. The method of claim 6, further comprising: sending a second message to the third computing device to release the third remote descriptor, wherein: the third remote descriptor comprises a starting address of the third contiguous block, a size of the third contiguous block, a physical machine identifier corresponding to the third memory, an RDMA access key, and a value of a reference count token representing one or more shares of ownership of the third serialized object, wherein a size of the third remote descriptor is less than the size of the third contiguous block; and the value of the reference count token is updated in response to the second message, wherein the second serialized object is released responsive to the value of the reference count token satisfying a threshold value.
9. A computing system comprising: a first computing device; and a first memory coupled to the first computing device, wherein the first computing device is to: generate a first streaming batch of a plurality of remote descriptors, each of the plurality of remote descriptors being an object that uniquely identifies a contiguous block of a serialized object stored in a first memory associated with the first computing device; send at least one of the plurality of remote descriptors to a second computing device before the first streaming batch is completed, wherein at least some contents of at least one of the contiguous blocks are obtained from the first memory for storage at a second memory associated with the second computing device before the first streaming batch is completed; and complete the first streaming batch responsive to i) each of a plurality of slots of the first streaming batch being assigned one of the plurality of remote descriptors or ii) a first timing window ending.
10. The computing system of claim 9, wherein: the first computing device, to generate the first streaming batch, is further to: assign a first remote descriptor of the plurality of remote descriptors to a first slot of the first streaming batch during a first timing window, wherein the first remote descriptor is an object that uniquely identifies a first contiguous block of a first serialized object stored in the first memory associated with the first computing device; and assign a second remote descriptor of the plurality of remote descriptors to a second slot of the first streaming batch during the first timing window, wherein the second remote descriptor is an object that uniquely identifies a second contiguous block of a second serialized object stored in the first memory; the first computing device, to send the at least one of the plurality of remote descriptors to the second computing device, is further to: send the first remote descriptor to the second computing device before the first streaming batch is completed; and send the second remote descriptor to the second computing device before the first streaming batch is completed, wherein the second computing device is to obtain at least some contents of the first contiguous block from the first memory for storage at the second memory associated with the second computing device before the first streaming batch is completed.
11. The computing system of claim 10, wherein the first remote descriptor comprises a starting address of the first contiguous block, a size of t
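The descriptor fields and release flow recited in claims 3 and 4 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the names `RemoteDescriptor`, `ObjectStore`, `publish`, and `on_release` are hypothetical, the threshold is assumed to be zero outstanding tokens, and each release message is assumed to return the tokens carried by its descriptor. No RDMA library is called; the access key is an opaque integer.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RemoteDescriptor:
    """Small handle identifying one contiguous block (fields follow claim 3)."""
    start_address: int   # starting address of the contiguous block
    size: int            # size of the block in bytes
    machine_id: str      # physical machine identifier of the owning memory
    rdma_key: int        # RDMA access key (opaque here)
    ref_tokens: int      # shares of ownership carried by this descriptor


class ObjectStore:
    """Hypothetical owner-side registry of serialized objects."""

    def __init__(self, machine_id: str) -> None:
        self.machine_id = machine_id
        # start_address -> [payload, outstanding reference tokens]
        self._objects: Dict[int, list] = {}

    def publish(self, address: int, payload: bytes, rdma_key: int,
                shares: int = 1) -> List[RemoteDescriptor]:
        """Register a block and hand out one single-token descriptor per share."""
        self._objects[address] = [payload, shares]
        return [
            RemoteDescriptor(address, len(payload), self.machine_id, rdma_key, 1)
            for _ in range(shares)
        ]

    def on_release(self, descriptor: RemoteDescriptor) -> bool:
        """Handle a release message; free the object when no tokens remain."""
        entry = self._objects[descriptor.start_address]
        entry[1] -= descriptor.ref_tokens
        if entry[1] <= 0:  # assumed threshold: zero outstanding tokens
            del self._objects[descriptor.start_address]
            return True
        return False

    def holds(self, address: int) -> bool:
        return address in self._objects
```

With two shares published, the first release message leaves the object in memory and the second frees it. The descriptor itself holds only a handful of scalar fields, which is consistent with the claim language that the descriptor is smaller than the block it identifies.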
for remote control or remote monitoring of applications · CPC title
to perform operations on memory · CPC title
for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS] · CPC title
Distributed shared memory [DSM], e.g. remote direct memory access [RDMA] · CPC title