One-sided reliable remote direct memory operations

US11347678B2 · US · B2

Patent metadata
Publication number: US-11347678-B2
Application number: US-201816055978-A
Country: US
Kind code: B2
Filing date: Aug 6, 2018
Priority date: Aug 6, 2018
Publication date: May 31, 2022
Grant date: May 31, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are provided to allow more sophisticated operations to be performed remotely by machines that are not fully functional. Operations that can be performed reliably by a machine that has experienced a hardware and/or software error are referred to herein as Remote Direct Memory Operations or “RDMOs”. Unlike RDMAs, which typically involve trivially simple operations such as the retrieval of a single value from the memory of a remote machine, RDMOs may be arbitrarily complex. The techniques described herein can help applications run without interruption when there are software faults or glitches on a remote system with which they interact.
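The abstract's contrast between an RDMA and an RDMO can be made concrete with a small simulation. The sketch below is a minimal illustration under stated assumptions, not the patent's implementation. The remote machine's local volatile memory is modeled as a Python dict of hash-chain records, and a round-trip counter stands in for network cost. All names (`RemoteMemory`, `rdma_read`, `rdmo_lookup`, `lookup_via_rdma`) are hypothetical. With RDMA-style access, the requester fetches one record per round trip and walks the chain itself. With an RDMO, the entire multi-step lookup runs on the remote machine in response to a single request.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical record layout: address -> (key, value, address of next record; 0 ends the chain)
Record = Tuple[str, int, int]


@dataclass
class RemoteMemory:
    """Illustrative model of the remote machine's local volatile memory."""
    cells: Dict[int, Record]
    round_trips: int = 0  # simulated network round trips charged to the requester

    def rdma_read(self, addr: int) -> Record:
        # RDMA-style access: one request returns one record.
        self.round_trips += 1
        return self.cells[addr]

    def rdmo_lookup(self, head: int, key: str) -> Optional[int]:
        # RDMO-style access: one request; the whole chain walk runs beside the memory.
        self.round_trips += 1
        addr = head
        while addr != 0:
            k, v, nxt = self.cells[addr]
            if k == key:
                return v
            addr = nxt
        return None


def lookup_via_rdma(mem: RemoteMemory, head: int, key: str) -> Optional[int]:
    """The requester walks the chain itself, paying one round trip per record."""
    addr = head
    while addr != 0:
        k, v, nxt = mem.rdma_read(addr)
        if k == key:
            return v
        addr = nxt
    return None


mem = RemoteMemory({100: ("a", 1, 200), 200: ("b", 2, 300), 300: ("c", 3, 0)})
print(lookup_via_rdma(mem, 100, "c"), mem.round_trips)  # 3 3
mem.round_trips = 0
print(mem.rdmo_lookup(100, "c"), mem.round_trips)  # 3 1
```

With a three-record chain, finding the last key costs three round trips through `lookup_via_rdma` but only one through `rdmo_lookup`. The sketch deliberately omits fault handling. Performing such operations reliably when part of the remote machine has failed is the concern of the claims.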

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: implementing, on a first computing device that contains local volatile memory: a first execution candidate, on the first computing device, capable of performing an operation; and a second execution candidate, on the first computing device, capable of performing the operation; wherein the first execution candidate and the second execution candidate have direct access to the local volatile memory of the first computing device; wherein the operation requires access to data in the local volatile memory; receiving a request to perform the operation, from a requesting entity executing on a second computing device that (a) is remote relative to the first computing device and (b) does not have direct access to the local volatile memory; responsive to the request, attempting to perform the operation using the first execution candidate; determining that the first execution candidate failed to successfully perform the operation; and responsive to determining that the first execution candidate failed to successfully perform the operation, attempting to perform the operation using the second execution candidate.

2. The method of claim 1 wherein attempting to perform the operation using the second execution candidate comprises attempting to perform the operation using the second execution candidate without informing the requesting entity that the first execution candidate failed to perform the operation.

3. The method of claim 1 wherein attempting to perform the operation using the second execution candidate includes: informing the requesting entity that the first execution candidate failed to perform the operation; receiving a second request to perform the operation from the requesting entity; and responsive to the second request, attempting to perform the operation using the second execution candidate.

4. The method of claim 1 wherein the operation involves reading and/or writing data on a persistent storage device that is directly accessible to the first computing device and not directly accessible to the second computing device.

5. The method of claim 1 wherein: one execution candidate of the first execution candidate and the second execution candidate is an application running on one or more processors of the first computing device; and another execution candidate of the first execution candidate of the second execution candidate is implemented in a network interface controller of the first computing device.

6. The method of claim 5 wherein the other execution candidate is implemented in firmware of the network interface controller.

7. The method of claim 5 wherein the other execution candidate is software executing on one or more processors within the network interface controller.

8. The method of claim 5 wherein the other execution candidate is an interpreter that performs the operation by interpreting instructions specified in data provided to the network interface controller.

9. The method of claim 1 wherein one of the first execution candidate and the second execution candidate is implemented within an operating system executing on the first computing device.

10. The method of claim 1 wherein the second execution candidate is associated with a different reliability domain than the first execution candidate and wherein one of the first execution candidate and the second execution candidate is implemented within a privileged domain on the first computing device.

11. The method of claim 1 wherein: the first execution candidate executes on a first set of one or more cores of a processor in the first computing device; the second execution candidate executes on a second set of one or more cores of the processor in the first computing device; and membership of the second set of one or more cores is different than membership of the first set of one or more cores.

12. The method of claim 11 wherein the second set of one or more cores includes at least one core that does not belong to the first set of one or more cores.

13. The method of claim 1 wherein one of the first execution candidate and the second execution candidate includes an interpreter that interprets instructions which, when interpreted, cause performance of the operation.

14. The method of claim 13 wherein the interpreter is implemented on a network interface controller through which the first computing device communicates, over a network, with the second computing device.

15. The method of claim 1, further comprising: determining that the first execution candidate should be first utilized for performing the operation; wherein the first execution candidate is used to attempt to perform the operation, prior to said attempting to perform the operation using the second execution candidate, based on said determining that the first execution candidate should be first utilized for performing the operation.

16. The method of claim 15, wherein said determining that the first execution candidate should be first utilized for performing the operation is based on the request to perform the operation identifying the first execution candidate.

17. One or more non-transitory computer-readable media storing one or more sequences of instructions that, when executed by one or more processors, cause: implementing, on a first computing device that contains local volatile memory: a first execution candidate, on the first computing device, capable of performing an operation; and a second execution candidate, on the first computing device, capable of performing the operation; wherein the first execution candidate and the second execution candidate have direct access to the local volatile memory of the first computing device; wherein the operation requires access to data in the local volatile memory; receiving a request to perform the operation, from a requesting entity executing on a second computing device that (a) is remote relative to the first computing device and (b) does not have direct access to the local volatile memory; responsive to the request, attempting to perform the operation using the first execution candidate; determining that the first execution candidate failed to successfully perform the operation; and responsive to determining that the first execution candidate failed to successfully perform the operation, attempting to perform the operation using the second execution candidate.

18. The one or more non-transitory computer-readable media of claim 17, wherein attempting to perform the operation using the second execution candidate comprises: attempting to perform the operation using the second execution candidate without informing the requesting entity that the first execution candidate failed to perform the operation.

19. The one or more non-transitory computer-readable media of claim 17, wherein the one or more sequences of instructions further comprise instructions that, when executed by one or more processors, cause: determining that the first execution candidate should be first utilized for performing the operation; wherein the first execution candidate is used to attempt to perform the operation, prior to said attempting to perform the operation using the second execution candidate, based on said determining that the first execution candidate should be first utilized for performing the operation.

20. The one or more non-transitory computer-readable media of claim 19, wherein said determining that the first execution candidate should be first utilized for perfor
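The control flow in claims 1 to 3 and 15 to 16 can be sketched in a few lines. The following is a hedged, minimal illustration, not the patented implementation. Execution candidates are modeled as Python callables keyed by name (for example, an application on the host CPU and code in the NIC, echoing claim 5), and a candidate signals failure by raising a hypothetical `CandidateFailed` exception. A candidate named in the request is tried first (cf. claim 16). With `transparent=True`, the server falls over to the next candidate without telling the requester (cf. claim 2). With `transparent=False`, it reports the failure so the requester can send a second request (cf. claim 3). The names `serve`, `CandidateFailed`, `preferred`, and `transparent` are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional


class CandidateFailed(Exception):
    """Hypothetical signal that an execution candidate could not complete the operation."""


# An operation receives the host's local memory (modeled as a dict) and returns a result.
Operation = Callable[[Dict[str, int]], int]


def serve(candidates: Dict[str, Operation], memory: Dict[str, int],
          preferred: Optional[str] = None, transparent: bool = True) -> dict:
    """Try execution candidates in turn until one performs the operation."""
    # Stable sort: the candidate named in the request (if any) moves to the front;
    # the rest keep their configured order.
    order = sorted(candidates, key=lambda name: name != preferred)
    failed: List[str] = []
    for name in order:
        try:
            result = candidates[name](memory)
            return {"ok": True, "result": result, "by": name, "failed": failed}
        except CandidateFailed:
            failed.append(name)
            if not transparent:
                # Report the failure; the requester may retry, naming another candidate.
                break
    return {"ok": False, "result": None, "by": None, "failed": failed}


def app_on_host_cpu(mem: Dict[str, int]) -> int:
    # Simulates a hung or crashed application process on the host.
    raise CandidateFailed("application not responding")


def nic_firmware(mem: Dict[str, int]) -> int:
    # Simulates a candidate in the NIC that can still reach host memory.
    mem["counter"] += 1
    return mem["counter"]


candidates = {"app": app_on_host_cpu, "nic": nic_firmware}

print(serve(candidates, {"counter": 41}))
# {'ok': True, 'result': 42, 'by': 'nic', 'failed': ['app']}

memory = {"counter": 41}
first = serve(candidates, memory, transparent=False)  # reports the failure of "app"
second = serve(candidates, memory, preferred="nic")   # requester's second request
print(first["failed"], second["result"])  # ['app'] 42
```

The sketch assumes that a failed candidate leaves memory unchanged. A candidate that fails partway through a write would need additional handling, which this sketch does not attempt.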

Assignees

Oracle Int Corp

Inventors

Classifications

  • Failover techniques · CPC title

  • Distributed shared memory [DSM], e.g. remote direct memory access [RDMA] · CPC title

  • G06F15/167 (Primary) · using a common memory, e.g. mailbox · CPC title

  • Hypervisors; Virtual machine monitors · CPC title

  • using centralised failover control functionality · CPC title

Patent family

Related publications grouped by family.


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11347678B2 cover?
Techniques are provided to allow more sophisticated operations to be performed remotely by machines that are not fully functional. Operations that can be performed reliably by a machine that has experienced a hardware and/or software error are referred to herein as Remote Direct Memory Operations or “RDMOs”. Unlike RDMAs, which typically involve trivially simple operations such as the retrieval…
Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G06F11/2023. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 31, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).