Distributed data set storage and retrieval

US9703789B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9703789-B2
Application number: US-201615220192-A
Country: US
Kind code: B2
Filing date: Jul 26, 2016
Priority date: Jul 27, 2015
Publication date: Jul 11, 2017
Grant date: Jul 11, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus comprising a processor component to: receive metadata of data organization within a data set; receive indications of which node devices will be storing the data set as multiple data blocks within a data file; and receive, from each node device, a pointer request to a location within the data file for storing a data set portion as a data block. In response to the data set including partitioned data, for each request for a pointer: determine the location within the data file; generate a map data map entry for the data block; generate therein a sub-block count of data sub-blocks within the data block; generate therein a sub-entry for each data sub-block including size and a hashed identifier derived from a partition label; and provide a pointer to the node device. In response to successful storage of all data blocks, store the map data in the data file.
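
The partitioned-data path in the abstract can be pictured with a minimal sketch. It is not the patented implementation: the class names (`MapBuilder`, `MapEntry`, `MapSubEntry`), the use of byte offsets as pointers, and the choice of SHA-256 truncated to 64 bits for the hashed identifier are all assumptions made for illustration, since the abstract names no hash function or data layout. The final step of storing the map data after all blocks are written is omitted.

```python
import hashlib
from dataclasses import dataclass, field


def hashed_identifier(partition_label: str) -> int:
    """Derive an integer identifier from a partition label.

    Assumption: SHA-256 truncated to 64 bits; the source names no hash.
    """
    digest = hashlib.sha256(partition_label.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


@dataclass
class MapSubEntry:
    sub_block_size: int  # size of the data set portion, in bytes
    hashed_id: int       # derived from the portion's partition label


@dataclass
class MapEntry:
    sub_block_count: int  # number of data sub-blocks in this data block
    sub_entries: list = field(default_factory=list)


@dataclass
class MapBuilder:
    """Hypothetical stand-in for the device that hands out pointers."""
    next_offset: int = 0  # next free location in the data file
    entries: list = field(default_factory=list)

    def handle_pointer_request(self, portions):
        """Handle one node's request.

        portions: list of (partition_label, size) pairs that the node will
        store together as a single data block. Returns the pointer
        (assumed here to be a byte offset into the data file).
        """
        pointer = self.next_offset
        subs = [MapSubEntry(size, hashed_identifier(label))
                for label, size in portions]
        self.entries.append(MapEntry(len(subs), subs))
        self.next_offset += sum(size for _, size in portions)
        return pointer


builder = MapBuilder()
first = builder.handle_pointer_request([("region=EU", 100), ("region=US", 50)])
second = builder.handle_pointer_request([("region=APAC", 70)])
print(first, second, len(builder.entries))
```

Each request yields exactly one map entry, and each data set portion within the block yields one sub-entry, which mirrors the one-entry-per-block rule stated for partitioned data.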

First claim

Opening claim text (preview).

The invention claimed is:

1. An apparatus comprising a processor component and a storage to store instructions that, when executed by the processor component, cause the processor component to perform operations comprising: receive, from at least one node device of multiple node devices, at least a portion of metadata indicative of organization of data within a data set; receive, from the multiple node devices, indications of which node devices among the multiple node devices are to be involved in a storage of the data set as multiple data blocks within a data file maintained by one or more storage devices, wherein: the organization of the multiple data blocks within the data file is indicated in map data that comprises multiple map entries; and each map entry of the multiple map entries corresponds to one or more data blocks of the multiple data blocks; receive, from each node device involved in the storage of the data set, a request for a pointer to a location within the data file at which the node device is to store at least one data set portion as a data block; in response to an indication received from the at least one node device that the data set comprises partitioned data, wherein the data within the data set is organized into multiple partitions that are each distributable to a single node device and each map entry corresponds to a single data block, for each request for a pointer received from a node device involved in the storage of the data set: determine the location within the data file at which the node device is to store the data block; generate a map entry within the map data that corresponds to the data block; generate within the map entry a data sub-block count indicative of a quantity of data sub-blocks to be stored by the node device within the data block, wherein each data sub-block comprises a data set portion of the data set that is to be stored by the node device; generate within the map entry a separate map sub-entry for each of the data sub-blocks, wherein each map sub-entry comprises a sub-block size indicative of a size of a corresponding data set portion and a hashed identifier derived from a partition label of the partition to which the corresponding data set portion belongs; and provide a pointer to the node device, the pointer comprising an indication of the location at which the node device is to store the data block in the data file; and in response to successful storage of all data blocks of the data set within the data file by all of the node devices involved in the storage of the data set, store the map data in the data file.

2. The apparatus of claim 1, wherein in response to a lack of indication received from the at least one node device that the data set comprises partitioned data, the processor component is caused to perform operations comprising: for each request for a pointer received from a node device involved in the storage of the data set: determine the location within the data file at which the node device is to store the data block; compare a data block size of the data block to a data block size indicated in the map data for an adjacent data block to be stored by another node device of the multiple node devices at an adjacent location within the data file to detect a match between the two data block sizes; in response to detection of a match between the two data block sizes, increment a data block count of a map entry within the map data that corresponds to the adjacent data block; in response to detection of a lack of a match between the two data block sizes, generate a new map entry within the map data that corresponds to the data block, wherein the new map entry comprises a data block count indicative of correspondence to a single data block and a data block size indicative of the size of the data block; and provide a pointer to the node device, the pointer comprising an indication of the location at which the node device is to store the data block in the data file.

3. The apparatus of claim 1, wherein the at least a portion of the metadata comprises the indication received from the at least one node device that the data set comprises partitioned data.

4. The apparatus of claim 1, wherein: each node device involved in the storage of the data set is required to generate a single request for a pointer for the storage of all data set portions distributed to the node device; and the processor component is caused to determine that all pointers have been generated for the storage of all data set portions of the data set in the data file by all of the node devices involved in the storage of the data set based on reception of a single request for a pointer from each node device involved in the storage of the data set.

5. The apparatus of claim 1, wherein the apparatus comprises one of the node devices involved in the storage of the data set.

6. The apparatus of claim 1, wherein to receive indications of which node devices among the multiple node devices are involved in the storage of the data set within the data file, the processor component is caused to perform operations comprising: recurringly receive indications of status from each node device of the multiple node devices; and recurringly update a stored indication of whether each node device of the multiple node devices is involved in the storage of the data set.

7. The apparatus of claim 1, wherein to store the map data in the data file, the processor component is caused to perform operations comprising: determine whether a size of the map data exceeds a predetermined data size; and in response to a determination that the size of the map data exceeds the predetermined data size: divide the map data into one or more map extensions; store the one or more map extensions within the data file at locations dispersed among the data blocks stored by node devices involved in the storage of the data set; and store, within the data file, a map base comprising one or more pointers to the location of each map extension within the data file.

8. The apparatus of claim 7, wherein a size of each map extension stored within the data file at a location following a first one of the map extensions is twice the size of a preceding map extension.

9. The apparatus of claim 1, wherein the processor component is caused to perform operations comprising provide an indication of a task to perform with the data set to the node devices involved in the storage of the data set to enable at least a first node device of the multiple node devices to perform the task with a first data set portion of the data set and at least a second node device of the multiple node devices to perform the task with a second data set portion of the data set at least partially in parallel.

10. The apparatus of claim 1, wherein each hashed identifier comprises an integer value derived from a hash taken of a partition label that uniquely identifies one of the partitions of the multiple partitions.

11. A computer-program product tangibly embodied in a non-transitory machine-readable storage medium, the computer-program product including instructions operable to cause a processor component to perform operations comprising: receive, from at least one node device of multiple node devices, at least a portion of metadata indicative of organization of data within a data set; receive, from the multiple node devices, indications of which node devices among the multiple node devices are to be involved in a storage of the data set as multiple data blocks within a data file maintained by one or more storage devices, wherein: the organization of the multiple data blocks within the data file is indicated in map data that comp…
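
Claim 2 describes the non-partitioned path, in which adjacent data blocks of equal size share one map entry. This works like run-length encoding of block sizes. The sketch below is one possible reading under stated assumptions, not the claimed implementation: `UnpartitionedMap` and `MapEntry` are hypothetical names, pointers are assumed to be byte offsets, and the "adjacent data block" is assumed to be the block allocated by the immediately preceding request.

```python
from dataclasses import dataclass, field


@dataclass
class MapEntry:
    block_count: int  # number of adjacent data blocks sharing block_size
    block_size: int   # size of each of those blocks, in bytes


@dataclass
class UnpartitionedMap:
    """Hypothetical map builder for data sets without partitions."""
    next_offset: int = 0  # next free location in the data file
    entries: list = field(default_factory=list)

    def handle_pointer_request(self, block_size: int) -> int:
        """Allocate a location for one data block and update the map.

        Assumption: the adjacent block is the one allocated just before
        this request, so only the last map entry needs to be compared.
        """
        pointer = self.next_offset
        last = self.entries[-1] if self.entries else None
        if last is not None and last.block_size == block_size:
            # Sizes match: extend the existing entry's block count.
            last.block_count += 1
        else:
            # No match: start a new entry covering a single block.
            self.entries.append(MapEntry(1, block_size))
        self.next_offset += block_size
        return pointer


block_map = UnpartitionedMap()
pointers = [block_map.handle_pointer_request(s) for s in (64, 64, 64, 32, 64)]
print(pointers)
print([(e.block_count, e.block_size) for e in block_map.entries])
```

With five requests of sizes 64, 64, 64, 32, and 64, the map holds three entries rather than five, which shows why the map stays small when most blocks have the same size.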

Assignees

Sas Inst Inc

Inventors

Classifications

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • Management of space entities, e.g. partitions, extents, pools · CPC title

  • Indexing; Data structures therefor; Storage structures · CPC title

  • Management specifically adapted to NAS (management of storage area networks [SAN] G06F3/067) · CPC title

  • G06F16/137 · Primary

    Hash-based (content-based indexing of textual data G06F16/31) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9703789B2 cover?
An apparatus comprising a processor component to: receive metadata of data organization within a data set; receive indications of which node devices will be storing the data set as multiple data blocks within a data file; and receive, from each node device, a pointer request to a location within the data file for storing a data set portion as a data block. In response to the data set including …
Who is the assignee on this patent?
Sas Inst Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/137. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 11, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).