Scalable analysis platform for semi-structured data

US10275475B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10275475-B2
Application number	US-201414210495-A
Country	US
Kind code	B2
Filing date	Mar 14, 2014
Priority date	Mar 15, 2013
Publication date	Apr 30, 2019
Grant date	Apr 30, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of operating a data analysis system includes retrieving objects from a data source. Each of the retrieved objects includes (i) data and (ii) metadata describing the data. The method further includes dynamically creating a cumulative schema by, for each object of the retrieved objects: (i) inferring a schema from the object based on the metadata of the object and inferred data types of elements of the data of the object, (ii) creating a unified schema, wherein the unified schema describes both (a) the object described by the inferred schema and (b) a cumulative set of objects described by the cumulative schema, and (iii) storing the unified schema as the cumulative schema. The method further includes exporting the data of each of the retrieved objects to a data warehouse.
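The schema-building loop in the abstract can be illustrated with a minimal sketch. It assumes the retrieved objects are JSON-like Python values, where the keys play the role of metadata and the value types are inferred. The function names (`infer_schema`, `merge_schemas`, `build_cumulative_schema`) and the schema representation (a dict mapping each observed kind to its detail) are hypothetical choices for illustration, not taken from the patent.

```python
from typing import Any, Dict, Iterable

Schema = Dict[str, Any]  # hypothetical shape: {kind: detail}, e.g. {"number": True}


def merge_schemas(a: Schema, b: Schema) -> Schema:
    """Return a schema that describes everything described by a or by b."""
    merged = dict(a)
    for kind, detail in b.items():
        if kind not in merged:
            merged[kind] = detail
        elif kind == "object":
            # Union the field sets, merging fields that appear on both sides.
            fields = dict(merged["object"])
            for name, sub in detail.items():
                fields[name] = merge_schemas(fields.get(name, {}), sub)
            merged["object"] = fields
        elif kind == "array":
            merged["array"] = merge_schemas(merged["array"], detail)
    return merged


def infer_schema(value: Any) -> Schema:
    """Infer a schema for one object from its keys and inferred value types."""
    if isinstance(value, dict):
        return {"object": {k: infer_schema(v) for k, v in value.items()}}
    if isinstance(value, list):
        elem: Schema = {}
        for item in value:
            elem = merge_schemas(elem, infer_schema(item))
        return {"array": elem}
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return {"boolean": True}
    if isinstance(value, (int, float)):
        return {"number": True}
    if value is None:
        return {"null": True}
    return {"string": True}


def build_cumulative_schema(objects: Iterable[Any]) -> Schema:
    """For each object: infer, unify with the cumulative schema, store the result."""
    cumulative: Schema = {}
    for obj in objects:
        inferred = infer_schema(obj)                    # step (i)
        unified = merge_schemas(cumulative, inferred)   # step (ii)
        cumulative = unified                            # step (iii)
    return cumulative


if __name__ == "__main__":
    sample = [{"id": 1, "name": "a"}, {"id": "x7", "tags": ["t"]}]
    print(build_cumulative_schema(sample))
```

In this sketch a field seen with two different types (here `id` as a number and as a string) keeps both kinds in the cumulative schema, so earlier objects remain described after later ones are merged in.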

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of operating a data analysis system, the method comprising: retrieving objects from a data source, wherein each of the retrieved objects includes (i) data and (ii) metadata describing the data; dynamically updating a cumulative schema, wherein said dynamically updating comprises, for each object of the retrieved objects: (i) inferring a schema from the object based on the metadata of the object and inferred data types of elements of the data of the object, wherein, for at least one object of the objects, a structure of an inferred schema is different from another structure of another inferred schema for another object of the objects, (ii) creating a unified schema based at least on a portion of the inferred schema for the object, wherein the unified schema describes both (a) the object described by the inferred schema and (b) a cumulative set of objects described by the cumulative schema, and (iii) storing the unified schema as the cumulative schema; converting the cumulative schema into a relational schema; and exporting, according to the relational schema, the data of each of the retrieved objects to a data warehouse.

2. The method of claim 1, wherein the dynamically creating is performed during a first pass through the retrieved objects, and wherein the exporting is performed during a second pass through the retrieved objects.

3. The method of claim 1, further comprising: storing the data of each of the retrieved objects into an index storage service, wherein the data of each of the retrieved objects is exported from the index storage service to the data warehouse.

4. The method of claim 3, wherein the exporting includes: creating at least one intermediate file from the index storage service, wherein the at least one intermediate file has a predefined data warehouse format; and bulk loading the at least one intermediate file into the data warehouse.

5. The method of claim 4, wherein the at least one intermediate file is created according to the relational schema.

6. The method of claim 3, further comprising: receiving a query from a user via a graphical user interface; and responding to the query based on at least one of (i) data stored by the index storage service and (ii) results returned from the data warehouse.

7. The method of claim 6, further comprising passing the query to the data warehouse in order to obtain the results.

8. The method of claim 6, further comprising: displaying initial results to the user via the graphical user interface; and iteratively updating results in the graphical user interface as execution of the query continues.

9. The method of claim 1, further comprising: receiving a query from a user via a graphical user interface; and responding to the query based on results returned from the data warehouse.

10. The method of claim 1, further comprising: receiving a query from a user via a graphical user interface; displaying initial results of the query to the user in the graphical user interface; and iteratively updating results of the query in the graphical user interface as execution of the query continues.

11. The method of claim 10, wherein the updating results in the graphical user interface includes updating scaling of at least one axis of at least one data chart.

12. The method of claim 1, further comprising: displaying the cumulative schema to a user via a graphical user interface; updating the cumulative schema as additional data is retrieved from the data source; and selectively updating the graphical user interface to reflect the updated cumulative schema.

13. The method of claim 12 further comprising, in the user interface, visually distinguishing changed items in the updated cumulative schema.

14. The method of claim 1, further comprising repeating the retrieving, the dynamically creating, and the exporting in response to new objects being available from the data source.

15. The method of claim 14, further comprising, prior to repeating the exporting: determining whether the cumulative schema has changed since a previous exporting; and in response to determining that the cumulative schema has changed, sending at least one command to the data warehouse to update a schema of the data warehouse to reflect the changes to the cumulative schema.

16. A non-transitory computer-readable medium storing processor-executable instructions comprising: retrieving objects from a data source, wherein each of the retrieved objects includes (i) data and (ii) metadata describing the data; dynamically updating a cumulative schema, wherein said dynamically updating comprises, for each object of the retrieved objects: (i) inferring a schema from the object based on the metadata of the object and inferred data types of elements of the data of the object, wherein, for at least one object of the objects, a structure of an inferred schema is different from another structure of another inferred schema for another object of the objects, (ii) creating a unified schema based at least on a portion of the inferred schema for the object, wherein the unified schema describes both (a) the object described by the inferred schema and (b) a cumulative set of objects described by the cumulative schema, and (iii) storing the unified schema as the cumulative schema; converting the cumulative schema into a relational schema; and exporting, according to the relational schema, the data of each of the retrieved objects to a data warehouse.

17. The computer-readable medium of claim 16, wherein the instructions further comprise storing the data of each of the retrieved objects into an index storage service, wherein the data of each of the retrieved objects is exported from the index storage service to the data warehouse.

18. The computer-readable medium of claim 16, wherein the instructions further comprise: displaying the cumulative schema to a user via a graphical user interface; updating the cumulative schema as additional data is retrieved from the data source; and selectively updating the graphical user interface to reflect the updated cumulative schema.
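Claims 1 and 15 mention converting the cumulative schema into a relational schema and sending commands to the data warehouse when the schema has changed. The following is a rough sketch of one way this could look, not the patented method. It assumes a nested schema represented as a dict of observed kinds (for example `{"object": {"id": {"number": True}}}`) with an object at the top level. The column naming rule (path plus type suffix), the `SQL_TYPES` mapping, and both function names are illustrative assumptions. Arrays are skipped here for brevity.

```python
from typing import Any, Dict, List

# Hypothetical mapping from inferred kinds to warehouse column types.
SQL_TYPES = {"number": "DOUBLE PRECISION", "string": "VARCHAR", "boolean": "BOOLEAN"}


def to_columns(schema: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Flatten a nested schema into {column_name: sql_type}.

    Each (field path, scalar kind) pair becomes its own column, so a field
    observed with several types maps to several columns.
    """
    columns: Dict[str, str] = {}
    for kind, detail in schema.items():
        if kind == "object":
            for name, sub in detail.items():
                path = f"{prefix}_{name}" if prefix else name
                columns.update(to_columns(sub, path))
        elif kind in SQL_TYPES:
            columns[f"{prefix}_{kind}"] = SQL_TYPES[kind]
        # "array" and "null" kinds are ignored in this simplified sketch.
    return columns


def schema_change_commands(
    table: str, old: Dict[str, str], new: Dict[str, str]
) -> List[str]:
    """Build ALTER TABLE statements for columns present in new but not in old."""
    return [
        f'ALTER TABLE {table} ADD COLUMN "{name}" {new[name]}'
        for name in sorted(new)
        if name not in old
    ]


if __name__ == "__main__":
    previous = to_columns({"object": {"id": {"number": True}}})
    current = to_columns(
        {
            "object": {
                "id": {"number": True, "string": True},
                "user": {"object": {"name": {"string": True}}},
            }
        }
    )
    for command in schema_change_commands("events", previous, current):
        print(command)
```

Because a cumulative schema only grows in this sketch, comparing the previous and current column sets yields only `ADD COLUMN` statements. A real warehouse export would also need to handle arrays, identifier quoting rules, and type changes.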

Assignees

Amiato Inc, Amazon Tech Inc

Inventors

Classifications

G06F16/211 · Primary
Schema design and management · CPC title
G06F16/86 · Primary
Mapping to a database · CPC title
G06F16/254
Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title
G06F16/235
Update request formulation · CPC title
G06F16/22
Indexing; Data structures therefor; Storage structures · CPC title

Patent family

Related publications grouped by family.

View patent family 51532920

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10275475B2 cover?: A method of operating a data analysis system includes retrieving objects from a data source. Each of the retrieved objects includes (i) data and (ii) metadata describing the data. The method further includes dynamically creating a cumulative schema by, for each object of the retrieved objects: (i) inferring a schema from the object based on the metadata of the object and inferred data types of …
Who is the assignee on this patent?: Amiato Inc, Amazon Tech Inc
What technology area does this patent fall under?: Primary CPC classification G06F16/211. Mapped technology areas include Physics.
When was this patent published?: Published Apr 30, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).