Data distribution/retrieval using multi-dimensional index

US9292559B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9292559-B2
Application numberUS-201313743988-A
CountryUS
Kind codeB2
Filing dateJan 17, 2013
Priority dateJan 19, 2012
Publication dateMar 22, 2016
Grant dateMar 22, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A distributed data storage system uses a data distribution and location algorithm based on distance functions and hyper-spheres in a multi-dimensional space. The distributed data storage system uses the algorithm to maintain, over time, a balanced distribution across a number of computers interconnected by a network of a varying set of data items. Each data item includes one or more key fields. The system also includes an efficient partial-match and exact-match search across a whole set of data items using as search criteria the values of any or all of the sought data item's key field(s).

First claim

Opening claim text (preview).

The invention claimed is: 1. A method of data distribution in a distributed data storage system, the distributed data storage system storing a set of data items across a plurality of physically distinct computers, each physically distinct computer managing a subset of the data items, the method comprising: defining at least one metric space associated with the distributed data storage system; defining within the one metric space a number of hyper-spheres, each of the defined hyper-spheres having a center and radius, data points representing corresponding data items, and an association with one or more of the physically distinct computers; calculating a distance from a specified data point, corresponding to a specified data item to be stored in the distributed data storage system, to a center of each of the defined hyper-spheres; and selecting at least one physically distinct computer for storage of the specified data item based on which of the hyper-sphere centers is closest to the specified data point; wherein each data item includes one or more key fields, each key field representing a defining field of the data item within the set of data items; wherein every value of a key field is mapped to a real number; and wherein the set of real numbers to which the key fields of a data item are mapped comprises a data point in the metric space. 2. The method of claim 1 , further comprising: calculating a distance from a specified data point, corresponding to a specified data item to be retrieved from the distributed data storage system, to a center of each of the defined hyper-spheres; and selecting at least one physically distinct computer for data retrieval of the specified data item based on which of the hyper-sphere centers is closest to the specified data point and the association of the hyper-sphere to such physically distinct computer. 3. The method of claim 1 : wherein each subset includes one or more key fields, each key field representing a defining field of one item within the set of data items; the method further comprising retrieving data items when only a subset of the key field values defining the data items are known, by matching the subset of the whole set of data items whose mapped points are located on a hyper-plane that intersects one or more of the defined hyper-spheres, the hyper-plane defined by the coordinates to which the known key field values map. 4. The method of claim 3 , wherein a data retrieval request is addressed simultaneously to each physically distinct computer associated with each of the hyper-spheres intersected by the hyper-plane. 5. The method of claim 1 , wherein for numeric data, the mapping of numeric key fields is based on an identity function f(x)=x. 6. The method of claim 1 , wherein, for character data, the mapping of character strings is based on Levenshtein distance. 7. The method of claim 1 , wherein, for numeric data, the mapping of numeric key fields is based on a compressing function f(x) such that f(x min )>>x min and f(x max )>>x max . 8. The method of claim 1 , further comprising using a shared state containing the hyper-sphere centers and radii, and an identifier of every computer that is associated with every hyper-sphere of the defined at least one metric space and associated hyper- spheres, on reception of a data retrieval request from an external entity. 9. The method of claim 1 , wherein the defined hyper-sphere centers are fixed. 10. The method of claim 1 , wherein the defined hyper-sphere centers are dynamically updated based on historical data. 11. The method of claim 1 , further comprising dynamically splitting or joining defined hyper-spheres based on clustering of data points. 12. The method of claim 11 , wherein splitting defined hyper-spheres comprises reallocating data points within a portion of a first hyper-sphere to a neighboring second hyper-sphere. 13. The method of claim 12 , wherein reallocating data points includes transferring the corresponding data items from a first physically distinct computer associated with the first hyper-sphere to a second physically distinct computer associated with the neighboring second hyper-sphere. 14. The method of claim 11 , wherein joining defined hyper-spheres comprises joining two defined hyper-spheres into a single defined hyper-sphere containing all data points of the two joined hyper-spheres and retaining all corresponding data items in a physically distinct computer associated with the single defined hyper-sphere. 15. The method of claim 1 , further comprising: redistributing the set of data items across the distributed data storage system when one or more physically distinct computers are added to or removed from the plurality of physically distinct computers by creating or destroying respective hyper-spheres associated with each of the added or removed physically distinct computers followed by transferring of corresponding data items. 16. The method of claim 1 , wherein the distributed data storage system comprises a cloud based storage system. 17. The method of claim 1 , wherein the distributed data storage system comprises a telecommunications storage system and the set of data items includes telecommunications-related data. 18. A distributed data storage system for storing a set of data items across a plurality of physically distinct computers interconnected by a network, each physically distinct computer managing a subset of the data items, each computer of the plurality of computers configured as a query processor and/or data storage device, the query processor and/or data storage device comprising a data manager implemented using one or more of the physically distinct computers and configured to: define at least one metric space associated with the distributed data storage system; define within the at least one metric space a number of hyper-spheres, each of the hyper-spheres having a center and radius, data points representing corresponding data items, and an association with one or more of the physically distinct computers; and determine a location to store a specified data item in the distributed data storage system by: calculating a distance of the corresponding specified data point to a center of each of the defined hyper-spheres; and selecting at least one physically distinct computer for storage of the specified data item based on which associated hyper-sphere's center is closest to the corresponding specified data point; wherein each data item includes one or more key fields, each key field representing a defining field of the data item within the set of data items; wherein every value of a key field is mapped to a real number; and wherein the set of real numbers to which the key fields of a data item are mapped comprises a data point in the metric space. 19. The distributed data storage system of claim 18 , wherein the data manager is further configured to determine a location to retrieve a specified data item in the distributed data storage system by: calculating a distance of the corresponding specified data point to a center of each of the defined hyper-spheres; and selecting at least one physically distinct computer for retrieval of the specified data item based on which associated hyper-sphere's center is closest to the corresponding specified data point. 20. The distributed data storage system of claim 18 , wherein the data manager is further configured to redistribute data points within the hyper-spheres and corresponding data items across associated physically

Assignees

Inventors

Classifications

  • Data partitioning, e.g. horizontal or vertical partitioning · CPC title

  • Physics · mapped topic

  • Physics · mapped topic

  • G06F16/27Primary

    Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title

  • Multidimensional index structures · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9292559B2 cover?
A distributed data storage system uses a data distribution and location algorithm based on distance functions and hyper-spheres in a multi-dimensional space. The distributed data storage system uses the algorithm to maintain, over time, a balanced distribution across a number of computers interconnected by a network of a varying set of data items. Each data item includes one or more key fields.…
Who is the assignee on this patent?
Ericsson Telefon Ab L M
What technology area does this patent fall under?
Primary CPC classification G06F17/30333. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Mar 22 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).