Distributed data store for hierarchical data

US9633073B1 · US · B1

Patent metadata
FieldValue
Publication numberUS-9633073-B1
Application numberUS-201414223760-A
CountryUS
Kind codeB1
Filing dateMar 24, 2014
Priority dateMar 24, 2014
Publication dateApr 25, 2017
Grant dateApr 25, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computing resource service provider may store user data in a distributed data storage system. The distributed data storage system may contain one or more storage nodes configured to store hierarchical data in one or more data stores such as a column data store. Data in the data stores may be compressed or otherwise encoded, by a storage optimizer, in order to reduce that redundancy in the hierarchical data stored in the one or more data stores. Responses to user queries may be fulfilled based at least in part on data stored in the one or more data stores. A query processor may scan multiple different data stores across various storage nodes in order to obtain items responsive to the user query.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented method for query processing, comprising: under the control of one or more computer systems configured with executable instructions, receiving user input data; storing the input data in one or more storage nodes by at least: storing a first portion of the user input data in an append data store as a result of having insufficient computing capacity to perform one or more optimization operations; extracting a second portion of the user input data from the append data store and performing the one or more optimization operations on the second portion of the input data to create optimized data as a result of regaining sufficient computing capacity to perform the one or more optimization operations; and storing the optimized data in an optimized data store; receiving a user query and a completion threshold based at least in part on information generated by the user; performing a search for one or more records responsive to the user query, using a filter, on the optimized data store; obtaining sufficient records to satisfy the completion threshold by at least: if the search on the optimized data store obtained sufficient records to satisfy the completion threshold, returning a result of the search in response to the query; and if the search on the optimized data store did not obtain sufficient records to satisfy the completion threshold, performing a second search for the one or more records response to the user query, using the filter, on the append data store; and providing the one or more records. 2. The computer-implemented method of claim 1 , wherein storing the first portion of the input data in the append data store further includes encoding the first portion of the input data to eliminate at least some redundancy in the first portion of the input data. 3. The computer-implemented method of claim 2 , wherein extracting the second portion of user input data and performing the one or more optimization operations on the second portion of user input data further includes extracting the second portion of user input data, wherein at least a portion of the second portion of user input data includes the encoded first portion of the input data, and performing the one or more optimization operations based at least in part on current computing capacity of a storage optimizer, the storage optimizer configured to extract data and perform optimization operations. 4. The computer-implemented method of claim 1 , wherein obtaining sufficient records to satisfy the completion threshold further includes transmitting one or more duplicate user queries to one or more remote storage nodes. 5. A system, comprising: one or more processors; one or more storage nodes comprising, a first storage node, the first storage node comprising a first data store and a second data store, the first data store comprising data items retrieved from and removed from the second data store and processed for storage in the first data store, the second data store comprising preprocessed data items stored in the second data store as a result of the system having insufficient computing capacity at a time of receiving the preprocessed data items to processed the data items; and memory with instructions that, when executed by the one or more processors, cause the system to: generate a filter based at least in part on one or more query terms included in a received user query; perform a first search for one or more data items responsive to the user query, using the filter, on the first data store; obtain sufficient data items to satisfy a threshold by at least: if the first search on the first data store obtained sufficient processed data items to satisfy the threshold, returning a result of the search in response to the query; and if the search on the first data store did not obtain sufficient processed data items to satisfy the threshold, performing a second search for the one or more data items in response to the user query, using the filter, on preprocessed data items contained in the second data store; and provide the one or more data items. 6. The system of claim 5 , wherein the system further comprises a storage optimizer configured to extract data from the first data store, perform one or more packing operations and store packed data in the second data store. 7. The system of claim 5 , wherein the memory further includes instructions that, when executed by the one or more processors, cause the system to: partition the user query into one or more partition queries; and transmit the one or more partition queries to one or more remote storage nodes of the one or more storage nodes. 8. The system of claim 5 , wherein the first data store is encoded based at least in part on a negative look behind algorithm where the encoded data indicates a predecessor value sharing at least one redundant data value. 9. The system of claim 5 , wherein the first data store is encoded based at least in part on a repetition counter where the encoded data includes a repetition value indicating a portion of a previous record to be repeated. 10. The system of claim 5 , wherein the first data store is encoded based at least in part on a reuse counter where the encoded data include a reuse value indicating a number of redundant data values from a previous record. 11. The system of claim 5 , wherein the memory further includes instructions that, when executed by the one or more processors cause the system to perform the second search for the one or more data items based at least in part on a run time threshold associated with the first search. 12. The system of claim 11 , wherein the memory further includes instructions that, when executed by the one or more processors cause the system to receive from a user a work factor configured to reduce the run time threshold associated with the first search. 13. A non-transitory computer-readable storage medium having collectively stored thereon executable instructions that, when executed by one or more processors of a computer system, cause the computer system to at least: receive a query and a value; obtain responses to the query satisfying the value by at least: performing a first search on a first data store, of a storage node, for one or more records responsive to the query, the first data store containing encoded data such that at least a portion of redundancy in data is removed to create the encoded data, the encoded data created as a result of the storage node having sufficient capacity to encode data obtained from a second data store of the storage node; and if the search does not return sufficient responses to satisfy the value, performing a second search on the second data store, of the storage node, for the one or more records responsive to the query the second data store contacting data; and return the one or more records. 14. The non-transitory computer-readable storage medium of claim 13 , wherein the instructions that cause the computer system to obtain responses to the query satisfying the value further include instructions that cause the computer system to, if the second search does not return sufficient responses to satisfy the value, transmit one or more duplicate queries to one or more remote storage nodes. 15. The non-transitory computer-readable storage medium of claim 13 , wherein the instructions further comprise instructions that, when executed by the one or more processors, cause the computer system to: obtain a number records responsive to the query, of the one or more records responsive to the query, that exceeds the value; and terminate, i

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9633073B1 cover?
A computing resource service provider may store user data in a distributed data storage system. The distributed data storage system may contain one or more storage nodes configured to store hierarchical data in one or more data stores such as a column data store. Data in the data stores may be compressed or otherwise encoded, by a storage optimizer, in order to reduce that redundancy in the hie…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/245. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Apr 25 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).