Fetching Query Results Through Cloud Object Stores
US-2024394271-A1 · Nov 28, 2024 · US
US9426219B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9426219-B1 |
| Application number | US-201314098912-A |
| Country | US |
| Kind code | B1 |
| Filing date | Dec 6, 2013 |
| Priority date | Dec 6, 2013 |
| Publication date | Aug 23, 2016 |
| Grant date | Aug 23, 2016 |
Official abstract text for this publication.
Data may be partitioned and uploaded in multiple parts in parallel to a data warehouse cluster in a data warehouse system. Data to be uploaded may be identified, and the partitions for the data may be determined at the storage client. The data may then be partitioned at the storage client. In various embodiments, no local partitions of the data may be maintained in persistent storage at the storage client. The partitioned data may then be sent in parallel to a data warehouse staging area in another network-based service that is implemented as part of a same network-based service implementing the data warehouse system. A request may then be sent to the data warehouse cluster to perform a multi-part upload from the staging area to the data warehouse cluster.
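The flow in the abstract (partition in memory at the client, send the parts in parallel to a staging area, then ask the cluster to load them) can be sketched as follows. This is a minimal illustration, not the patented implementation: `StagingArea`, `WarehouseCluster`, `partition_rows`, `upload`, and the key prefix are hypothetical names. The sketch also assumes the simplest possible distribution scheme, one contiguous partition per compute node.

```python
from concurrent.futures import ThreadPoolExecutor


class StagingArea:
    """Hypothetical in-memory stand-in for the upload staging area."""

    def __init__(self):
        self.objects = {}

    def put(self, key, payload):
        self.objects[key] = payload


class WarehouseCluster:
    """Hypothetical cluster stub that keeps one row list per compute node."""

    def __init__(self, node_count):
        self.nodes = [[] for _ in range(node_count)]

    def multipart_upload(self, staging, keys):
        # Each part is loaded by a separate worker, mimicking the
        # parallel load from the staging area to the compute nodes.
        def load(indexed_key):
            index, key = indexed_key
            node = self.nodes[index % len(self.nodes)]
            node.extend(staging.objects[key].splitlines())

        with ThreadPoolExecutor() as pool:
            list(pool.map(load, enumerate(keys)))


def partition_rows(rows, part_count):
    """Split rows into contiguous partitions held only in memory."""
    size, extra = divmod(len(rows), part_count)
    parts, start = [], 0
    for i in range(part_count):
        end = start + size + (1 if i < extra else 0)
        parts.append("\n".join(rows[start:end]).encode())
        start = end
    return parts


def upload(rows, staging, cluster, prefix="upload-0001"):
    # Assumption: one partition per compute node.
    parts = partition_rows(rows, len(cluster.nodes))
    keys = [f"{prefix}/part-{i:04d}" for i in range(len(parts))]
    # Step 1: send all parts to the staging area in parallel.
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        list(pool.map(staging.put, keys, parts))
    # Step 2: only after staging completes, request the multi-part upload.
    cluster.multipart_upload(staging, keys)
    return keys
```

Calling `upload([f"row{i}" for i in range(10)], StagingArea(), WarehouseCluster(4))` stages four parts and leaves all ten rows spread across the four node lists. No partition is ever written to local disk, which mirrors the "no local partitions in persistent storage" embodiment.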
Opening claim text (preview).
What is claimed is:

1. A system, comprising: a plurality of compute nodes implementing a network-based services platform; at least some compute nodes of the plurality of compute nodes configured to implement a data warehouse cluster as part of a data warehouse service provided by the network-based services platform, wherein the data warehouse cluster provides data storage among the at least some compute nodes according to a data distribution scheme; another one or more compute nodes of the plurality of compute nodes configured to implement an upload staging area for the data warehouse cluster, wherein the upload staging area is accessible to the data warehouse cluster as part of the network-based services platform; at least one other compute node of the plurality of compute nodes configured to provide a dynamic, multi-part upload module from the network-based services platform to a storage client of the data warehouse cluster; the dynamic, multi-part upload module, configured to: determine, at the storage client, a plurality of partitions for data maintained at the storage client to be uploaded to the data warehouse cluster according to the data distribution scheme for the at least some compute nodes in the data warehouse cluster; dynamically partition the data at the storage client according to the determined plurality of partitions; send the dynamically partitioned data from the storage client to the upload staging area for the data warehouse cluster; and subsequent to said sending the partitioned data, send, from the storage client, an upload request to the data warehouse cluster in order to upload the plurality of partitions of the data from the upload staging area to respective ones of the at least some compute nodes in the data warehouse cluster; at least some compute nodes of the plurality of compute nodes of the network-based services platform, in response to receipt of the upload request, upload respective partitions of the plurality of partitions of the data in parallel from the upload staging area to respective ones of the at least some compute nodes in the data warehouse cluster.

2. The system of claim 1, wherein to determine the plurality of partitions for the data maintained at the storage client to be uploaded to the data warehouse cluster according to the data distribution scheme for the at least some compute nodes in the data warehouse cluster, the dynamic, multi-part upload module is configured to: based, at least in part, on the data distribution scheme, identify a number of partitions for the data; and evaluate the data in order to determine partition boundaries corresponding to the number of partitions such that data objects maintained in each of the plurality of partitions remain intact.

3. The system of claim 1, wherein said dynamically partitioning the data at the storage client according to the determined plurality of partitions is performed in system memory of the storage client.

4. The system of claim 1, wherein the at least one other compute node is further configured to: receive a multi-part upload request for the data from the storage client; wherein said providing the dynamic, multi-part upload module to the storage client of the data warehouse cluster is performed in response to receiving the multi-part upload request.

5. The system of claim 1, wherein the dynamic, multi-part upload module is provided to the storage client asynchronously.

6. A method, comprising: performing, by one or more computing devices implementing a storage client: identifying, at the storage client, data to be uploaded to a data warehouse cluster from the storage client, wherein the data warehouse cluster provides data storage among a plurality of compute nodes according to a data distribution scheme; determining, at the storage client, a plurality of partitions for the data according to the data distribution scheme for the plurality of compute nodes in the data warehouse cluster; partitioning, at the storage client, the data according to the determined plurality of partitions; sending, from the storage client, the partitioned data in parallel to an upload staging area that is accessible to the plurality of compute nodes of the data warehouse cluster; and subsequent to said sending the partitioned data, sending, from the storage client, an upload request to the data warehouse cluster for a multi-part upload of the partitioned data from the upload staging area to respective ones of the plurality of compute nodes in the data warehouse cluster.

7. The method of claim 6, wherein said determining the plurality of partitions for the data according to the data distribution scheme for the plurality of compute nodes in the data warehouse cluster comprises evaluating the data to be uploaded in order to determine partition boundaries for the plurality of partitions such that data objects maintained in each of the plurality of partitions remain intact.

8. The method of claim 6, wherein said partitioning the data according to the determined plurality of partitions comprises generating a compressed version of each of the plurality of partitions.

9. The method of claim 6, wherein said partitioning the data according to the determined plurality of partitions comprises generating an encrypted version of each of the plurality of partitions.

10. The method of claim 6, wherein said partitioning the data according to the determined plurality of partitions is performed in system memory at the storage client such that the plurality of partitions of the data are generated without creating local copies of the plurality of partitions in persistent storage at the storage client.

11. The method of claim 6, wherein the data warehouse cluster is one of a plurality of data warehouse clusters that together implement a data warehouse service; wherein the method further comprises: performing, by another one or more computing devices implementing a control interface for the data warehouse service: receiving a multi-part upload request for the data from the storage client, wherein the multi-part upload request specifies the data warehouse cluster out of the plurality of data warehouse clusters; and in response to receiving the multi-part upload request, sending a dynamic, multi-part upload module to the storage client that is configured to perform said identifying the data to be uploaded, said determining the plurality of partitions for the data, said partitioning the data, said sending the partitioned data, and said sending the upload request.

12. The method of claim 6, wherein said sending the partitioned data in parallel to the upload staging area comprises: sending the partitioned data to one or more additional network-based services that do not implement the upload staging area for further processing of the partitioned data, and wherein the partitioned data is further sent from the one or more additional network-based services to the upload staging area accessible to the plurality of compute nodes of the data warehouse cluster.

13. The method of claim 6, wherein one of the plurality of compute nodes is a leader node configured to store data received at the leader node among the plurality of compute nodes in the data warehouse cluster according to the data distribution scheme, and wherein the method further comprises: identifying other data to be uploaded to the data warehouse cluster; and sending the other data to the leader node of the data warehouse cluster for storage among the plurality of compute nodes.

14. A non-transitory, computer-readable storage medium, comprising program instructions that implement a client
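Claims 2 and 7 require partition boundaries that keep data objects intact, claims 3 and 10 keep partitioning in system memory, and claim 8 adds a compressed version of each partition. The sketch below illustrates one way these could combine. It rests on assumptions not stated in the claims: data objects are newline-terminated records, and `find_boundaries` and `partition_compressed` are hypothetical helper names. zlib is used only as an example codec.

```python
import zlib


def find_boundaries(data: bytes, part_count: int, delimiter: bytes = b"\n"):
    """Return part_count + 1 byte offsets whose cuts fall just after a delimiter.

    Each cut starts at the even-split position and moves forward to the
    end of the record it lands in, so no record is split across partitions.
    """
    total = len(data)
    offsets = [0]
    for i in range(1, part_count):
        target = max(total * i // part_count, offsets[-1])
        cut = data.find(delimiter, target)
        cut = total if cut == -1 else cut + len(delimiter)
        offsets.append(cut)
    offsets.append(total)
    return offsets


def partition_compressed(data: bytes, part_count: int):
    """Slice the buffer in memory and compress each partition separately."""
    offsets = find_boundaries(data, part_count)
    view = memoryview(data)  # slicing a memoryview avoids copying the buffer
    return [zlib.compress(view[a:b]) for a, b in zip(offsets, offsets[1:])]
```

For the input `b"alpha,1\nbeta,2\ngamma,3\ndelta,4\n"` and three partitions, the even split points (bytes 10 and 20) both fall mid-record, so the cuts move forward to offsets 15 and 23. Decompressing and concatenating the parts reproduces the original buffer. Skewed record sizes can produce empty trailing partitions, which a real client would need to handle.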
Physics · mapped topic
for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS] · CPC title
Electricity · mapped topic