Methods and systems for efficiently moving data between nodes in a cluster

US10484472B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10484472-B2
Application number: US-201514840512-A
Country: US
Kind code: B2
Filing date: Aug 31, 2015
Priority date: Jul 31, 2015
Publication date: Nov 19, 2019
Grant date: Nov 19, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Exemplary embodiments provide methods, mediums, and systems for efficiently moving data between cluster nodes. Upon receiving a request to read or write data at a first cluster node that is in communication with a client, the first node effects the transfer to or from a second cluster node. The transfer is carried out using a combination of remote data memory access (“RDMA”), or a similar technique that bypasses a part of the network stack, and transport control protocol (“TCP”), or a similar technique that does not bypass a part of the network stack. The data is transferred using RDMA, while certain control messages are sent using TCP. By combining RDMA content transfers and TCP control messages, data transfers can be carried out faster, more efficiently, and with less processing overhead. Other embodiments are described and claimed.
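The split described in the abstract (content over RDMA, control messages over TCP) can be illustrated with a minimal Python sketch. The names `Transport`, `MessageKind`, and `pick_transport` are hypothetical and do not come from the patent; the `rdma_available` fallback mirrors the dependent claims about reverting to TCP and is an assumption about how such a check might be expressed.

```python
from enum import Enum


class Transport(Enum):
    RDMA = "rdma"  # bypasses part of the network stack
    TCP = "tcp"    # goes through the full network stack


class MessageKind(Enum):
    CONTROL = "control"  # requests, headers, metadata
    CONTENT = "content"  # the bulk data payload


def pick_transport(kind: MessageKind, rdma_available: bool = True) -> Transport:
    """Route bulk content over RDMA when possible; everything else uses TCP."""
    if kind is MessageKind.CONTENT and rdma_available:
        return Transport.RDMA
    return Transport.TCP
```

In this sketch, control traffic never uses RDMA, and content falls back to TCP whenever RDMA is unavailable.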

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising: receiving, via a transport control protocol (TCP) by a first node, a read request from a client device to read data, comprising metadata and content, that is stored on a remote volume associated with a second node content; allocating, by the first node, a buffer within memory of the first node for receiving the content using remote direct memory access (RDMA) based upon a size of the content exceeding a size threshold and resource consumption for allocating the buffer being below a threshold; transmitting an address of the buffer to the second node via the TCP to trigger the second node to perform an RDMA write operation to write the content into the buffer using the address; receiving, via the TCP by the first node from the second node, the metadata comprising instructions for reconstructing the data using the content within the buffer, wherein a response header comprises an indication of whether the RDMA write operation was successful; and reconstructing and transmitting the data to the client device using the metadata and the content based upon the instructions.

2. The method of claim 1, comprising: deallocating the buffer from the memory based upon transmitting the data to the client device.

3. The method of claim 1, further comprising: extract the content from the buffer based upon the flag indicating that the RDMA write operation by the second node wrote the content into the buffer within the memory of the first node.

4. The method of claim 1, wherein the reconstructing comprises: combining the metadata received via the TCP and the content received through the buffer via the RDMA to construct the data.

5. The method of claim 1, further comprising: receiving the content via TCP as opposed to the RDMA when the size of the content is less than the size threshold.

6. The method of claim 1, wherein the size threshold is between about 16 kilobytes and about 32 kilobytes.

7. The method of claim 1, further comprising: reverting to data transmission via the TCP when communication via the RDMA is impossible.

8. The method of claim 1, wherein the RDMA write operation is performed by the second node to facilitate execution of the read request by the first node.

9. A non-transitory computer readable medium storing instructions that, when executed, cause circuitry of a computing device to: receive, via a transport control protocol (TCP) by a first node, a read request from a client device to read data, comprising metadata and content, that is stored on a remote volume associated with a second node; allocate, by the first node, a buffer within memory of the first node for receiving the content using remote direct memory access (RDMA) based upon a size of the content exceeding a size threshold and resource consumption for allocating the buffer being below a threshold; transmit an address of the buffer to the second node using the TCP to trigger the second node to perform an RDMA write operation to write the content into the buffer using the address; receive, via the TCP by the first node from the second node, the metadata comprising instructions for reconstructing the data using the content within the buffer, wherein a response header comprises an indication of whether the RDMA write operation was successful; and reconstruct and transmit the data to the client device using the metadata and the content based upon the instructions.

10. The medium of claim 9, wherein the instructions cause the computing device: deallocate the buffer from the memory based upon transmitting the data to the client device.

11. The medium of claim 9, wherein the instructions cause the computing device to: extract the content from the buffer based upon the flag indicating that the RDMA write operation by the second node wrote the content into the buffer within the memory of the first node.

12. The medium of claim 9, wherein the instructions cause the computing device: combine the metadata received via the TCP and the content received through the buffer via the RDMA to construct the data.

13. The medium of claim 9, wherein the instructions cause the computing device: receiving the content via TCP as opposed to the RDMA when the size of the content is less than the size threshold.

14. The medium of claim 9, wherein the instructions cause the computing device: revert to data transmission via the TCP when communication via the RDMA is impossible.

15. The medium of claim 9, wherein the RDMA write operation is performed by the second node to facilitate execution of the read request by the first node.

16. A computing device, comprising: a memory comprising machine executable code; and a processor coupled to the memory, the processor configured to execute the machine executable code to cause the processor to: receive, via a transport control protocol (TCP) by a first node, a read request from a client device to read data, comprising metadata and content, that is stored on a remote volume associated with a second node; allocate, by the first node, a buffer within memory of the first node for receiving the content using remote direct memory access (RDMA) based upon a size of the content exceeding a size threshold and resource consumption for allocating the buffer being below a threshold; transmit an address of the buffer to the second node using the TCP to trigger the second node to perform an RDMA write operation to write the content into the buffer using the address; receive, via the TCP by the first node from the second node, the metadata comprising instructions for reconstructing the data using the content within the buffer, wherein a response header comprises an indication of whether the RDMA write operation was successful; and reconstruct and transmit the data to the client device using the metadata and the content based upon the instructions.

17. The computing device of claim 16, wherein the buffer is deallocated from the memory based upon transmitting the data to the client device.

18. The computing device of claim 16, wherein the content is transmitted via the TCP when the size of the content is less than the size threshold.

19. The computing device of claim 18, wherein the size threshold is between about 16 kilobytes and about 32 kilobytes.

20. The computing device of claim 16, wherein the machine executable code causes the processor to revert to data transmission via the TCP when communication via the RDMA is impossible.
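The read path recited in the independent claims can be sketched as an in-memory simulation. This is not the patented implementation: no real RDMA or TCP calls are made, and `SecondNode`, `ReadResponse`, `use_rdma`, `first_node_read`, the 32 KB `SIZE_THRESHOLD` (one end of the "about 16" to "about 32" kilobyte range in the claims), and the `RESOURCE_LIMIT` value are all hypothetical names and values chosen for illustration. The sketch also assumes the first node already knows the content size when it decides whether to allocate a buffer.

```python
from dataclasses import dataclass
from typing import Optional

SIZE_THRESHOLD = 32 * 1024  # assumed value within the claimed 16-32 KB range
RESOURCE_LIMIT = 0.8        # hypothetical ceiling on buffer-allocation resource use


@dataclass
class ReadResponse:
    """Stand-in for the TCP response: header flag, metadata, optional inline content."""
    rdma_ok: bool
    metadata: dict
    inline_content: bytes = b""


class SecondNode:
    """In-memory stand-in for the node that owns the remote volume."""

    def __init__(self, content: bytes):
        self.content = content

    def handle_read(self, buffer: Optional[bytearray]) -> ReadResponse:
        metadata = {"length": len(self.content)}
        if buffer is not None and len(buffer) >= len(self.content):
            # Simulated RDMA write into the first node's buffer.
            buffer[: len(self.content)] = self.content
            return ReadResponse(True, metadata)
        # No usable buffer: send the content inline, as over TCP.
        return ReadResponse(False, metadata, self.content)


def use_rdma(content_size: int, resource_use: float) -> bool:
    """Both claimed conditions: content is large enough and allocation is affordable."""
    return content_size > SIZE_THRESHOLD and resource_use < RESOURCE_LIMIT


def first_node_read(peer: SecondNode, content_size: int, resource_use: float) -> bytes:
    buffer = bytearray(content_size) if use_rdma(content_size, resource_use) else None
    response = peer.handle_read(buffer)  # buffer address and metadata travel via TCP
    if response.rdma_ok and buffer is not None:
        content = bytes(buffer[: response.metadata["length"]])
    else:
        content = response.inline_content
    buffer = None  # release the buffer once the data is ready for the client
    return content
```

For example, `first_node_read(SecondNode(b"x" * 65536), 65536, 0.1)` takes the simulated RDMA path, while a 3-byte read or a read under high resource use falls back to the inline (TCP-style) path.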

Assignees

Netapp Inc

Inventors

Classifications

  • Implementation details of TCP/IP or UDP/IP stack architecture; Specification of modified or new header fields · CPC title

  • Multiprotocol handlers, e.g. single devices capable of handling multiple protocols · CPC title

  • for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS] · CPC title

  • H04L43/16 · Primary

    Threshold monitoring · CPC title

  • using a common memory, e.g. mailbox · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10484472B2 cover?
Exemplary embodiments provide methods, mediums, and systems for efficiently moving data between cluster nodes. Upon receiving a request to read or write data at a first cluster node that is in communication with a client, the first node effects the transfer to or from a second cluster node. The transfer is carried out using a combination of remote data memory access (“RDMA”), or a similar techn…
Who is the assignee on this patent?
Netapp Inc
What technology area does this patent fall under?
Primary CPC classification H04L67/1097. Mapped technology areas include Electricity.
When was this patent published?
Publication date Nov 19, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).