Distributed heterogeneous system for data warehouse management

US10133797B1 · US · B1

Patent metadata
Field: Value
Publication number: US-10133797-B1
Application number: US-201313968934-A
Country: US
Kind code: B1
Filing date: Aug 16, 2013
Priority date: Aug 16, 2013
Publication date: Nov 20, 2018
Grant date: Nov 20, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and computer-readable storage media for implementing data warehouse management are disclosed. A data warehouse management system includes a job request scheduler configured to generate a workflow for data warehouse operations. The data warehouse management system includes a request manager configured to retrieve job requests for the data warehouse operations from the job request scheduler. The data warehouse management system includes a priority queue service configured to place each of the job requests into a respective priority queue based on their priorities. The data warehouse management system includes a worker service configured to retrieve the job requests from the priority queues in a priority order and to cause execution of the data warehouse operations. The data warehouse management system includes a data warehouse service including one or more database clusters configured to store data relating to the data warehouse operations.
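
The flow the abstract describes (requests placed into per-priority queues, then retrieved by a worker in priority order) can be sketched roughly as below. This is a minimal illustration only, not the patented implementation: the names `JobRequest`, `PriorityQueueService`, `enqueue`, `next_request`, and `run_worker`, the sample job names, and the convention that a lower number means higher priority are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class JobRequest:
    name: str
    priority: int  # assumption: lower number means higher priority


class PriorityQueueService:
    """Holds one FIFO queue per priority level (hypothetical sketch)."""

    def __init__(self, levels):
        self.queues = {level: deque() for level in sorted(levels)}

    def enqueue(self, request):
        # Place the request into the queue that matches its priority level.
        self.queues[request.priority].append(request)

    def next_request(self):
        # Scan queues from highest to lowest priority; a lower-priority
        # queue is read only when every higher-priority queue is empty.
        for level in sorted(self.queues):
            if self.queues[level]:
                return self.queues[level].popleft()
        return None


def run_worker(service):
    """Drain the queues in priority order and return the execution order."""
    executed = []
    while (request := service.next_request()) is not None:
        executed.append(request.name)  # stand-in for running the operation
    return executed


service = PriorityQueueService(levels=[0, 1])
for req in [
    JobRequest("nightly_report", 1),
    JobRequest("load_orders", 0),
    JobRequest("archive_logs", 1),
    JobRequest("load_customers", 0),
]:
    service.enqueue(req)

order = run_worker(service)
print(order)
```

In this sketch both priority-0 requests run before either priority-1 request, regardless of arrival order, and requests at the same level keep their first-in, first-out order.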

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: one or more computing devices configured to implement a data warehouse management system, wherein the data warehouse management system comprises: a job request scheduler configured to generate a workflow for a plurality of data warehouse operations, wherein the job request scheduler is further configured to: determine dependencies between jobs for job requests corresponding to the plurality of data warehouse operations, and schedule job requests for the workflow according to the dependencies such that a dependent job is scheduled to execute after another job on which it is dependent; a request manager configured to: retrieve the scheduled job requests for the plurality of data warehouse operations from the job request scheduler, determine priorities for the retrieved job requests, and select, for each job request of the retrieved job requests, a respective priority queue from among a plurality of priority queues, wherein at least two priority queues of the plurality of priority queues represent different priority levels, and wherein the selection of the respective priority queue is based at least in part on the determined priorities and the different priority levels; the priority queue service configured to: provide the plurality of priority queues, and place each job request from the request manager into the selected respective priority queue of the plurality of priority queues; a worker service configured to: retrieve the job requests for the plurality of data warehouse operations from the plurality of priority queues according to the different priority levels, select, for each job request, a respective database cluster from among a plurality of database clusters in a data warehouse service, wherein the respective database cluster is selected based at least in part on performance data of the plurality of database clusters, and cause execution of individual ones of the plurality of data warehouse operations for each of the job request on the selected respective database cluster of the plurality of database clusters; and the data warehouse service comprising the plurality of database clusters, wherein the plurality of database clusters are configured to store data from the execution of the plurality of data warehouse operations.

2. The system as recited in claim 1, wherein the data warehouse management system further comprises: a metadata repository configured to store metadata relating to the plurality of data warehouse operations.

3. The system as recited in claim 1, wherein the worker service is configured to retrieve the job requests for the plurality of data warehouse operations from the plurality of priority queues in the priority order from highest priority to lowest priority.

4. The system as recited in claim 1, wherein the job request scheduler is further configured to receive one or more results of the plurality of data warehouse operations, and wherein the job request scheduler is configured to schedule one or more additional data warehouse operations in response to the one or more results.

5. A computer-implemented method, comprising: generating a workflow representing requests for data warehouse operations and determining dependencies between the requests, wherein the data warehouse operations include a first data warehouse operation and a second data warehouse operation; schedule the requests according to the dependencies such that a dependent request is scheduled to execute after another request on which it is dependent; determining a respective priority of each of the scheduled requests for data warehouse operations in the workflow; enqueueing each of the scheduled requests for data warehouse operations in a respective request queue of a plurality of request queues according to the respective priority, wherein the plurality of request queues represent different priority levels and comprise a higher priority request queue and a lower priority request queue, wherein the request for the first data warehouse operation is enqueued in the higher priority request queue, wherein the request for the second data warehouse operation is enqueued in the lower priority request queue, wherein the request for the first data warehouse operation and the request for the second data warehouse operation are enqueued according to the respective priorities previously determined and the different priority levels; retrieving the requests from the plurality of request queues according to the respective priorities; select, for each request, a respective database cluster from among a plurality of database clusters based at least in part on their respective performance data; and cause execution of individual ones of the plurality of data warehouse operations for each of the requests on the respective database cluster.

6. The method as recited in claim 5, wherein the retrieving and executing comprise: retrieving the request for the first data warehouse operation from the higher priority request queue; after retrieving the request for the first data warehouse operation from the higher priority request queue, executing the first data warehouse operation; and after retrieving the request for the first data warehouse operation from the higher priority request queue, determining that the higher priority request queue is empty; after determining that the higher priority request queue is empty, retrieving the request for the second data warehouse operation from the lower priority request queue; and after retrieving the request for the second data warehouse operation from the lower priority request queue, executing the second data warehouse operation.

7. The method as recited in claim 6, wherein the first data warehouse operation is executed using a data warehouse accessible to a client over a network, and wherein the second data warehouse operation is executed using a data warehouse hosted by the client.

8. The method as recited in claim 6, wherein the request for the first data warehouse operation is assigned to a worker host accessible to a client over a network, and wherein the request for the second data warehouse operation is assigned to a worker host hosted by the client.

9. The method as recited in claim 5, further comprising: receiving a result of the first data warehouse operation; and scheduling a request for a third data warehouse operation in response to receiving the result of the first data warehouse operation, wherein the third data warehouse operation is dependent on completion of the first data warehouse operation.

10. The method as recited in claim 5, wherein the first data warehouse operation comprises an extraction of data from a data source, a transformation of the extracted data, and a loading of the transformed data into a data warehouse.

11. A non-transitory, computer-readable storage medium storing program instructions computer-executable to implement a workflow service that provides a service interface for a plurality of clients, wherein the workflow service is configured to perform: receiving, via the service interface, a request for a data warehouse operation; receiving, via the service interface, an identification of a data source for the data warehouse operation; receiving, via the service interface, an identification of a worker included within a worker fleet that includes different types of workers for different types of data warehouses, wherein the worker is configured to perform the data warehouse operation; receiving, via the service interface, an identification of a target data warehouse for the data warehouse operation; and configuring a plurality of services for queuing and executio
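
Two steps recited in the independent claims, scheduling requests so that a dependent job runs after the job it depends on, and selecting a database cluster based on performance data, can be illustrated with the hedged sketch below. The claims do not specify an algorithm for either step. The sketch assumes a topological sort (via Python's standard `graphlib`) for the dependency ordering and assumes that "performance data" is a single load figure per cluster, with the least-loaded cluster chosen. The job names, cluster names, and the functions `schedule` and `select_cluster` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
dependencies = {
    "extract_sales": set(),
    "transform_sales": {"extract_sales"},
    "load_sales": {"transform_sales"},
    "refresh_dashboard": {"load_sales"},
    "purge_temp_tables": set(),  # independent of the other jobs
}


def schedule(deps):
    """Return an order in which every job follows the jobs it depends on."""
    return list(TopologicalSorter(deps).static_order())


def select_cluster(load_by_cluster):
    """Pick the cluster with the lowest load.

    Assumption: performance data is reduced to one load fraction per
    cluster; the patent text does not define the metric.
    """
    return min(load_by_cluster, key=load_by_cluster.get)


schedule_order = schedule(dependencies)

# Hypothetical performance snapshot (fraction of capacity in use).
cluster_load = {"cluster_a": 0.82, "cluster_b": 0.35, "cluster_c": 0.60}
chosen = select_cluster(cluster_load)

for job in schedule_order:
    print(f"{job} -> {chosen}")
```

A real system would presumably refresh the performance snapshot between selections; this sketch takes a single snapshot to keep the example short.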

Assignees

Amazon Tech Inc

Inventors

Classifications

  • Physics · mapped topic

  • Databases characterised by their database models, e.g. relational or object models · CPC title

  • G06F16/25 · Primary

    Integrating or interfacing systems involving database management systems · CPC title

  • G06F16/254 · Primary

    Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10133797B1 cover?
Methods, systems, and computer-readable storage media for implementing data warehouse management are disclosed. A data warehouse management system includes a job request scheduler configured to generate a workflow for data warehouse operations. The data warehouse management system includes a request manager configured to retrieve job requests for the data warehouse operations from the job reque…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F17/30563. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 20, 2018 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).