Scalable analysis platform for semi-structured data

US10275475B2 · US · B2

Patent metadata
Publication number: US-10275475-B2
Application number: US-201414210495-A
Country: US
Kind code: B2
Filing date: Mar 14, 2014
Priority date: Mar 15, 2013
Publication date: Apr 30, 2019
Grant date: Apr 30, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of operating a data analysis system includes retrieving objects from a data source. Each of the retrieved objects includes (i) data and (ii) metadata describing the data. The method further includes dynamically creating a cumulative schema by, for each object of the retrieved objects: (i) inferring a schema from the object based on the metadata of the object and inferred data types of elements of the data of the object, (ii) creating a unified schema, wherein the unified schema describes both (a) the object described by the inferred schema and (b) a cumulative set of objects described by the cumulative schema, and (iii) storing the unified schema as the cumulative schema. The method further includes exporting the data of each of the retrieved objects to a data warehouse.
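The per-object loop in the abstract (infer a schema, unify it with the cumulative schema, store the result) can be illustrated with a minimal Python sketch. It assumes the objects are JSON-like dicts and represents a schema as a mapping from a type tag to its detail. The tag names, the functions `infer_schema`, `unify`, and `build_cumulative_schema`, and the merging of integers and floats into a single `number` type are all illustrative assumptions, not the patent's own data structures.

```python
def infer_schema(value):
    """Infer a schema for one JSON value as {type_tag: detail} (hypothetical format)."""
    if isinstance(value, dict):
        # Field names come from the object's own metadata (its keys).
        return {"object": {key: infer_schema(sub) for key, sub in value.items()}}
    if isinstance(value, list):
        # An array's element schema is the union of its elements' schemas.
        element = {}
        for item in value:
            element = unify(element, infer_schema(item))
        return {"array": element}
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return {"boolean": None}
    if isinstance(value, (int, float)):
        return {"number": None}
    if isinstance(value, str):
        return {"string": None}
    if value is None:
        return {"null": None}
    raise TypeError(f"unsupported value: {value!r}")


def unify(a, b):
    """Return a schema describing everything described by a or by b."""
    merged = dict(a)  # shallow copy so neither input is mutated
    for tag, detail in b.items():
        if tag not in merged:
            merged[tag] = detail
        elif tag == "object":
            fields = dict(merged[tag])
            for name, sub in detail.items():
                fields[name] = unify(fields.get(name, {}), sub)
            merged[tag] = fields
        elif tag == "array":
            merged[tag] = unify(merged[tag], detail)
        # Scalar tags carry no detail, so an existing entry already covers them.
    return merged


def build_cumulative_schema(objects):
    """One pass over the objects: infer, unify, store as the cumulative schema."""
    cumulative = {}  # the empty schema describes no objects yet
    for obj in objects:
        inferred = infer_schema(obj)
        cumulative = unify(cumulative, inferred)
    return cumulative
```

For example, feeding it `{"user": "alice", "age": 30}` followed by `{"user": "bob", "age": "unknown", "tags": ["a", "b"]}` yields a cumulative schema in which `age` carries both `number` and `string` and `tags` is an array of strings. Keeping conflicting types side by side, rather than picking one, is a choice made in this sketch so that each unified schema still describes the new object as well as every earlier one.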

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of operating a data analysis system, the method comprising: retrieving objects from a data source, wherein each of the retrieved objects includes (i) data and (ii) metadata describing the data; dynamically updating a cumulative schema, wherein said dynamically updating comprises, for each object of the retrieved objects: (i) inferring a schema from the object based on the metadata of the object and inferred data types of elements of the data of the object, wherein, for at least one object of the objects, a structure of an inferred schema is different from another structure of another inferred schema for another object of the objects, (ii) creating a unified schema based at least on a portion of the inferred schema for the object, wherein the unified schema describes both (a) the object described by the inferred schema and (b) a cumulative set of objects described by the cumulative schema, and (iii) storing the unified schema as the cumulative schema; converting the cumulative schema into a relational schema; and exporting, according to the relational schema, the data of each of the retrieved objects to a data warehouse.

2. The method of claim 1, wherein the dynamically creating is performed during a first pass through the retrieved objects, and wherein the exporting is performed during a second pass through the retrieved objects.

3. The method of claim 1, further comprising: storing the data of each of the retrieved objects into an index storage service, wherein the data of each of the retrieved objects is exported from the index storage service to the data warehouse.

4. The method of claim 3, wherein the exporting includes: creating at least one intermediate file from the index storage service, wherein the at least one intermediate file has a predefined data warehouse format; and bulk loading the at least one intermediate file into the data warehouse.

5. The method of claim 4, wherein the at least one intermediate file is created according to the relational schema.

6. The method of claim 3, further comprising: receiving a query from a user via a graphical user interface; and responding to the query based on at least one of (i) data stored by the index storage service and (ii) results returned from the data warehouse.

7. The method of claim 6, further comprising passing the query to the data warehouse in order to obtain the results.

8. The method of claim 6, further comprising: displaying initial results to the user via the graphical user interface; and iteratively updating results in the graphical user interface as execution of the query continues.

9. The method of claim 1, further comprising: receiving a query from a user via a graphical user interface; and responding to the query based on results returned from the data warehouse.

10. The method of claim 1, further comprising: receiving a query from a user via a graphical user interface; displaying initial results of the query to the user in the graphical user interface; and iteratively updating results of the query in the graphical user interface as execution of the query continues.

11. The method of claim 10, wherein the updating results in the graphical user interface includes updating scaling of at least one axis of at least one data chart.

12. The method of claim 1, further comprising: displaying the cumulative schema to a user via a graphical user interface; updating the cumulative schema as additional data is retrieved from the data source; and selectively updating the graphical user interface to reflect the updated cumulative schema.

13. The method of claim 12 further comprising, in the user interface, visually distinguishing changed items in the updated cumulative schema.

14. The method of claim 1, further comprising repeating the retrieving, the dynamically creating, and the exporting in response to new objects being available from the data source.

15. The method of claim 14, further comprising, prior to repeating the exporting: determining whether the cumulative schema has changed since a previous exporting; and in response to determining that the cumulative schema has changed, sending at least one command to the data warehouse to update a schema of the data warehouse to reflect the changes to the cumulative schema.

16. A non-transitory computer-readable medium storing processor-executable instructions comprising: retrieving objects from a data source, wherein each of the retrieved objects includes (i) data and (ii) metadata describing the data; dynamically updating a cumulative schema, wherein said dynamically updating comprises, for each object of the retrieved objects: (i) inferring a schema from the object based on the metadata of the object and inferred data types of elements of the data of the object, wherein, for at least one object of the objects, a structure of an inferred schema is different from another structure of another inferred schema for another object of the objects, (ii) creating a unified schema based at least on a portion of the inferred schema for the object, wherein the unified schema describes both (a) the object described by the inferred schema and (b) a cumulative set of objects described by the cumulative schema, and (iii) storing the unified schema as the cumulative schema; converting the cumulative schema into a relational schema; and exporting, according to the relational schema, the data of each of the retrieved objects to a data warehouse.

17. The computer-readable medium of claim 16, wherein the instructions further comprise storing the data of each of the retrieved objects into an index storage service, wherein the data of each of the retrieved objects is exported from the index storage service to the data warehouse.

18. The computer-readable medium of claim 16, wherein the instructions further comprise: displaying the cumulative schema to a user via a graphical user interface; updating the cumulative schema as additional data is retrieved from the data source; and selectively updating the graphical user interface to reflect the updated cumulative schema.
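Claims 1, 4, and 15 add three steps after schema inference: converting the cumulative schema into a relational schema, exporting the data through an intermediate file, and updating the warehouse schema when the cumulative schema changes. The Python sketch below illustrates one possible approach under stated assumptions. It takes a cumulative schema shaped as a type-tag mapping (e.g. `{"object": {"age": {"number": None, "string": None}}}`), flattens nested object fields into dotted column names, gives each observed type its own column (such as `age<number>` and `age<string>`), stores arrays as JSON text for brevity, and uses CSV as the intermediate file format. The column-naming scheme, the SQL type mapping, the `events` table name, and all function names are hypothetical.

```python
import csv
import io
import json

# Hypothetical mapping from schema type tags to warehouse column types.
# Arrays are stored as JSON text here purely to keep the sketch short.
SQL_TYPES = {
    "string": "VARCHAR",
    "number": "DOUBLE PRECISION",
    "boolean": "BOOLEAN",
    "array": "VARCHAR",
}


def relational_columns(schema, prefix=""):
    """Flatten a cumulative schema into an ordered list of (column, sql_type)."""
    columns = []
    for tag, detail in sorted(schema.items()):
        if tag == "object":
            for name, sub in sorted(detail.items()):
                path = f"{prefix}.{name}" if prefix else name
                columns.extend(relational_columns(sub, path))
        elif tag in SQL_TYPES:
            # One column per observed type, e.g. "age<number>" and "age<string>".
            columns.append((f"{prefix}<{tag}>", SQL_TYPES[tag]))
        # A "null" tag gets no column; null values just leave cells empty.
    return columns


def type_tag(value):
    """Return the type tag of a single JSON value (bool checked before int)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "null"


def flatten_row(value, prefix=""):
    """Map one object onto {column_name: cell_value} using the same naming."""
    tag = type_tag(value)
    if tag == "object":
        row = {}
        for name, sub in value.items():
            path = f"{prefix}.{name}" if prefix else name
            row.update(flatten_row(sub, path))
        return row
    if tag == "null":
        return {}
    if tag == "array":
        return {f"{prefix}<array>": json.dumps(value)}
    return {f"{prefix}<{tag}>": value}


def export_csv(objects, columns):
    """Write objects to CSV text laid out according to the relational schema."""
    names = [name for name, _ in columns]
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(names)
    for obj in objects:
        row = flatten_row(obj)
        writer.writerow([row.get(name, "") for name in names])
    return buffer.getvalue()


def schema_change_commands(table, old_columns, new_columns):
    """Emit ALTER TABLE statements for columns added since the last export."""
    existing = {name for name, _ in old_columns}
    return [
        f'ALTER TABLE {table} ADD COLUMN "{name}" {sql_type};'
        for name, sql_type in new_columns
        if name not in existing
    ]
```

For instance, if the warehouse table was created when only `user<string>` and `age<number>` existed, `schema_change_commands` returns `ALTER TABLE` statements for just `age<string>` and `tags<array>`. Computing the full column list before writing any rows (the two-pass order that claim 2 describes) is what lets every CSV row share one header. This sketch handles only added columns; type widening and column removal are not covered.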

Assignees

Amiato Inc, Amazon Tech Inc

Inventors

Classifications

  • G06F16/211 (primary)

    Schema design and management · CPC title

  • G06F16/86 (primary)

    Mapping to a database · CPC title

  • Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title

  • Update request formulation · CPC title

  • Indexing; Data structures therefor; Storage structures · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10275475B2 cover?
A method of operating a data analysis system includes retrieving objects from a data source. Each of the retrieved objects includes (i) data and (ii) metadata describing the data. The method further includes dynamically creating a cumulative schema by, for each object of the retrieved objects: (i) inferring a schema from the object based on the metadata of the object and inferred data types of …
Who is the assignee on this patent?
Amiato Inc, Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/211 (schema design and management), which falls under CPC section G (Physics).
When was this patent published?
It was published on Apr 30, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
This page lists 8 related publications, drawn from citations found in our corpus or from other publications that share the same primary CPC classification.