Dynamic partitioning techniques for data streams
US-2015134796-A1 · May 14, 2015 · US
US9619148B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9619148-B2 |
| Application number | US-201615220034-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 26, 2016 |
| Priority date | Jul 27, 2015 |
| Publication date | Apr 11, 2017 |
| Grant date | Apr 11, 2017 |
Official abstract text for this publication.
An apparatus includes a processor component caused to: retrieve metadata of organization of data within a data set, and map data of organization of data blocks within a data file; receive indications of which node devices are available to perform a processing task with a data set portion; and in response to the data set including partitioned data, compare the quantities of available node devices and of the node devices last involved in storing the data set. In response to a match, for each map entry of the map data: retrieve a hashed identifier for a data sub-block, and a size for each of the data sub-blocks within the corresponding data block; divide the hashed identifier by the quantity of available node devices; compare the modulo value to a designation assigned to each of the available node devices; and provide a pointer to the available node device assigned the matching designation.
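The matched-quantity path described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `SubBlock`, `MapEntry`, `Pointer`, and `assign_blocks` are hypothetical, and the sketch assumes that data blocks are stored contiguously in map-entry order starting at `first_block_offset`, which the abstract does not state.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class SubBlock:
    hashed_id: int  # hash derived from the partition label (assumed to be an int)
    size: int       # size of the sub-block in bytes


@dataclass
class MapEntry:
    sub_blocks: list[SubBlock]  # sub-blocks inside the single data block this entry describes


@dataclass
class Pointer:
    offset: int  # location of the data block within the data file
    size: int    # sum of the sizes of all sub-blocks in the block


def assign_blocks(map_entries, available_nodes, last_node_count, first_block_offset=0):
    """Distribute whole data blocks when the node counts match (hypothetical helper)."""
    n = len(available_nodes)
    if n != last_node_count:
        raise ValueError("node counts differ; block-level assignment does not apply")

    # Designation values run from 0 to n - 1, one per available node.
    designation = dict(enumerate(available_nodes))
    pointers = {node: [] for node in available_nodes}

    offset = first_block_offset  # assumption: blocks are laid out back to back
    for entry in map_entries:
        block_size = sum(sb.size for sb in entry.sub_blocks)
        # Only one hashed identifier per block is consulted in this path.
        modulo = entry.sub_blocks[0].hashed_id % n
        pointers[designation[modulo]].append(Pointer(offset, block_size))
        offset += block_size
    return pointers
```

With three nodes, a block whose first sub-block has hashed identifier 4 goes to the node designated 1, and the pointer carries the block's offset together with the summed sub-block sizes.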
Opening claim text (preview).
The invention claimed is:
1. An apparatus comprising a processor component and a storage to store instructions that, when executed by the processor component, cause the processor component to perform operations comprising: retrieve, from one or more storage devices through a network, metadata indicative of organization of data within a data set, and map data indicative of organization of multiple data blocks within a data file maintained by the one or more storage devices, wherein: the map data comprises multiple map entries; and each map entry of the multiple map entries corresponds to one or more data blocks of the multiple data blocks; receive, from multiple node devices, indications of which node devices among the multiple node devices are available node devices that are each able to perform a processing task with at least one data set portion of the one or more data set portions; and in response to an indication within the metadata or the map data that the data set comprises partitioned data wherein the data within the data set is organized into multiple partitions that are each distributable to a single node device, and each map entry corresponds to a single data block: determine a first quantity of the available node devices based on the indications of which node devices are available node devices; retrieve a second quantity of node devices last involved in storage of the data set within the data file from the metadata or the map data; compare the first and second quantities of node devices to detect a match between the first and second quantities; assign each of the available node devices one of a series of positive integer values as a designation value, wherein the series extends from an integer value of 0 to a positive integer value equal to the first quantity minus the integer value of 1; and in response to detection of a match between the first and second quantities, for each map entry of the map data: retrieve, from the map entry, a hashed identifier for one data sub-block indicated in the map entry as within the corresponding data block, and a data sub-block size for each of the data sub-blocks indicated in the map entry as within the corresponding data block, wherein: the hashed identifier is derived from a partition label of a partition of the multiple partitions; and the data sub-block comprises a data set portion of the one or more data set portions; determine a location of the corresponding data block within the data file; divide the hashed identifier by the first quantity to obtain a modulo value; compare the modulo value to the designation value assigned to each of the available node devices to identify an available node device assigned a designation value that matches the modulo value; and provide a pointer to the available node device assigned the designation value that matches the modulo value, the pointer comprising: an indication of the location of the corresponding data block; and a sum of the data sub-block sizes of all of the data sub-blocks within the corresponding data block.
2. The apparatus of claim 1, wherein in response to the indication that the data set comprises partitioned data and in response to detection of a lack of a match between the first and second quantities, the processor component is caused to perform operations comprising: for each indication within each map entry of a data sub-block within a corresponding data block: retrieve, from the map entry, the data sub-block size and hashed identifier of the data sub-block; determine a location of the data sub-block within the data file; divide the hashed identifier by the first quantity to obtain a modulo value; compare the modulo value to the designation value assigned to each of the available node devices to identify an available node device assigned a designation value that matches the modulo value; and provide a pointer to the available node device assigned the designation value that matches the modulo value, the pointer comprising: an indication of the location of the data sub-block; and the data sub-block size.
3. The apparatus of claim 1, wherein the processor component is caused to perform operations comprising: in response to an indication within the metadata or the map data that the data set does not comprise partitioned data, for each map entry of the map data: retrieve, from the map entry, a data block size and a data block quantity, wherein the data block quantity indicates a quantity of adjacent data blocks in the data file that correspond to the map entry; and for each data block that corresponds to the map entry: determine a location of the corresponding data block within the data file; select one of the available node devices; and provide a pointer to the selected one of the available node devices, the pointer comprising: an indication of the location of the corresponding data block; and the data block size.
4. The apparatus of claim 3, wherein the selection of one of the available node devices comprises a round robin selection of one of the available node devices.
5. The apparatus of claim 1, wherein the apparatus comprises one of the available node devices.
6. The apparatus of claim 5, wherein the processor component performs a processing task with at least one data set portion retrieved from the data file as the one of the available node devices at least partially in parallel with at least one other of the available node devices.
7. The apparatus of claim 1, wherein to retrieve the map data from the one or more storage devices, the processor component is caused to perform operations comprising: retrieve a map base from the data file; analyze the map base to determine whether at least a portion of the map data is stored within one or more map extensions within the data file; and in response to a determination that at least a portion of the map data is stored within one or more map extensions: retrieve the one or more map extensions from the data file; and retrieve at least a subset of the map entries from the one or more map extensions.
8. The apparatus of claim 7, wherein in response to a determination that no portion of the map data is stored within one or more map extensions, the processor is caused to perform operations comprising retrieve all of the map entries from the map base.
9. The apparatus of claim 1, wherein to receive indications of which node devices among the multiple node devices are available, the processor component is caused to perform operations comprising: recurringly receive indications of status from the multiple node devices; and recurringly update a stored indication of the availability of each node device of the multiple node devices.
10. The apparatus of claim 1, wherein the processor component is caused to perform operations comprising provide an indication of a task to perform with the data set to the multiple node devices to enable at least a first node device of the multiple node devices to perform the task with a first data set portion of the data set and at least a second node device of the multiple node devices to perform the task with a second data set portion of the data set at least partially in parallel.
11. A computer-program product tangibly embodied in a non-transitory machine-readable storage medium, the computer-program product including instructions operable to cause a processor component to perform operations comprising: retrieve, from one or more storage devices through a network, metadata indicative of organization of data within a data set, and map data indicative of organization of multiple data blocks within a data file maintained by the one or more storage devic
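The two fallback paths recited in claims 2 through 4 can be illustrated with a short sketch: per-sub-block modulo assignment when the node counts no longer match, and round-robin assignment of blocks when the data set is not partitioned. The function names and the tuple layouts are hypothetical, and the sketch assumes that sub-blocks and blocks are stored back to back starting at `first_offset`; the claims only say that a location is determined.

```python
from itertools import cycle


def assign_sub_blocks(map_entries, available_nodes, first_offset=0):
    """Claim 2 path (sketch): one pointer per sub-block, chosen by hash modulo.

    map_entries is assumed to be a list of entries, each a list of
    (hashed_id, size) tuples describing the sub-blocks of one data block.
    """
    n = len(available_nodes)
    pointers = {node: [] for node in available_nodes}
    offset = first_offset
    for entry in map_entries:
        for hashed_id, size in entry:
            # The list index doubles as the designation value 0 .. n - 1.
            node = available_nodes[hashed_id % n]
            pointers[node].append((offset, size))
            offset += size
    return pointers


def assign_round_robin(map_entries, available_nodes, first_offset=0):
    """Claims 3 and 4 path (sketch): non-partitioned data, round-robin selection.

    map_entries is assumed to be a list of (block_size, block_count) tuples,
    where block_count is the number of adjacent blocks covered by the entry.
    """
    pointers = {node: [] for node in available_nodes}
    next_node = cycle(available_nodes)
    offset = first_offset
    for block_size, block_count in map_entries:
        for _ in range(block_count):
            pointers[next(next_node)].append((offset, block_size))
            offset += block_size
    return pointers
```

In the first function a block may be split across nodes, because each sub-block is hashed against the new node count on its own. The second function ignores hashes entirely and simply rotates through the available nodes.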
by facilitating the process of upgrading existing storage systems, e.g. for improving compatibility between host and storage device · CPC title
Management of space entities, e.g. partitions, extents, pools · CPC title
Management of files · CPC title
Data partitioning, e.g. horizontal or vertical partitioning · CPC title
Simplification · CPC title