What technology area does this patent fall under?

Primary CPC classification G06F17/30466. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue Aug 21 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Data placement control for distributed computing environment

US10055458B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10055458-B2
Application number	US-201514813668-A
Country	US
Kind code	B2
Filing date	Jul 30, 2015
Priority date	Jul 30, 2015
Publication date	Aug 21, 2018
Grant date	Aug 21, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method includes dividing a dataset into partitions by hashing a specified key, selecting a set of distributed file system nodes as a primary node group for storage of the partitions, and causing a primary copy of the partitions to be stored on the primary node group by a distributed storage system file server such that the location of each partition is known by hashing of the specified key.

First claim

Opening claim text (preview).

What is claimed is: 1. A method comprising: dividing a first dataset including a first plurality of elements into partitions by hashing a key for each of the first plurality of elements to generate a hash value for the key of the element, wherein each element of the first plurality of elements is stored in a partition corresponding to the hash value for the key of the element; selecting a set of distributed storage system nodes as a first primary node group for storage of the partitions of the first dataset; causing a primary copy of the partitions of the first dataset to be stored on the first primary node group by a distributed storage system file server based on the respective hash values such that the storage system node on which each element of the first plurality of elements is stored is associated with the hash value for the key of the element; dividing at least one additional dataset into partitions by hashing a key for each element of the at least one additional dataset to generate a hash value for the key of the element, wherein the datasets comprise tables; and causing a primary copy of the partitions of each additional dataset to be stored on corresponding primary node groups by the distributed storage system file server as a function of hash values such that the storage system node of each partition in the corresponding primary node group is known by hashing of the key, wherein a number of partitions that store each of the tables is a power of two, and wherein at least one partition is striped across multiple nodes of the primary node group. 2. The method of claim 1 and further comprising optimizing a query utilizing known locations of partitions. 3. The method of claim 1 and further comprising optimizing a query that includes a join over the tables by performing data shuffles between pairs of the partitions where partition groups of each pair are different or contained striped partitions. 4. The method of claim 1 and further comprising causing a replica copy of the partitions to be stored on a replica node group by the distributed storage system file server wherein each node in the replica node group is not one to one mapped with nodes in the primary node group. 5. The method of claim 1 and further comprising causing a replica copy of the partitions to be stored in accordance with heuristics of the distributed storage system file server. 6. The method of claim 1 wherein the distributed storage system comprises a Hadoop system and wherein causing a primary copy of the partitions to be stored on the primary node group comprises identifying nodes in a partition group associated with a table to a Hadoop file server using a pluggable BlockPlacementPolicy. 7. A system comprising: a memory having instructions stored thereon; and a processor in communication with the memory, wherein the processor executes the instructions to: divide each of multiple datasets including a plurality of elements into partitions by hashing a key for each of the plurality of elements to generate a hash value for the key of the element, wherein each element of the plurality of elements is stored in a partition corresponding to the hash value for the key of the element; select sets of distributed storage system nodes as primary node groups for storage of the partitions; and cause a primary copy of the partitions of each dataset to be stored on corresponding primary node groups by a distributed storage system file server based on the respective hash values such that the storage system node on which each element of the plurality of elements is stored is associated with the hash value for the key of the element, wherein a number of partitions that store each of the multiple datasets is a power of two, and wherein at least one partition is striped across multiple nodes. 8. The system of claim 7 wherein the processor executes the instructions to perform as a query coordinator to communicate with the primary node groups and optimize queries based on known locations of each partition. 9. The system of claim 8 wherein optimizing a query that includes a join over the multiple datasets is performed by specifying data shuffles between pairs of the partitions where partition groups of each pair are different or contained striped partitions. 10. The system of claim 7 wherein the processor executes the instructions to cause a replica copy of the partitions to be stored on a replica node group by the distributed storage system file server wherein each node in the replica node group is not one to one mapped with nodes in the primary node group. 11. The system of claim 7 wherein the processor executes the instructions to cause a replica copy of the partitions to be stored in accordance with heuristics of the distributed storage system file server. 12. The system of claim 7 wherein the distributed storage system comprises a Hadoop system and wherein causing a primary copy of the partitions to be stored on the primary node group comprises identifying nodes in a partition group associated with a file to a Hadoop file server using a pluggable BlockPlacementPolicy.

Assignees

Futurewei Technologies Inc

Inventors

Classifications

G06F17/30486
Physics · mapped topic
G06F17/30466Primary
Physics · mapped topic
G06F16/24554Primary
Unary operations; Data partitioning operations · CPC title
G06F16/24544Primary
Join order optimisation · CPC title

Patent family

Related publications grouped by family.

View patent family 57884023

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10055458B2 cover?: A method includes dividing a dataset into partitions by hashing a specified key, selecting a set of distributed file system nodes as a primary node group for storage of the partitions, and causing a primary copy of the partitions to be stored on the primary node group by a distributed storage system file server such that the location of each partition is known by hashing of the specified key.
Who is the assignee on this patent?: Futurewei Technologies Inc
What technology area does this patent fall under?: Primary CPC classification G06F17/30466. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue Aug 21 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).