Visualizing large data volumes utilizing initial sampling and multi-stage calculations

US10459932B2 · US · B2

Patent metadata
Publication number: US-10459932-B2
Application number: US-201414575633-A
Country: US
Kind code: B2
Filing date: Dec 18, 2014
Priority date: Dec 18, 2014
Publication date: Oct 29, 2019
Grant date: Oct 29, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments visualize large data volumes utilizing initial sampling to reduce size of a dataset. This sampling may be random in nature. The sampled dataset may be refined (wrangled) by binning, grouping, cleansing, and/or other techniques to produce a wrangled sample dataset. A user defines useful end visualization(s) by inputting expected dimension/measures. From these visualizations of sampled data, minimal grouping sets are deduced for application to the full dataset. The user publishes/schedules the wrangled operation and grouping sets definition. Based on this, a wrangled dataset and grouping sets are produced in the big data layer. When the user accesses the visualization(s), minimal grouping sets are retrieved in the in-memory engine of the client and processed by an in-memory database engine according to the common processing plan. This produces result sets and a final set of visualizations of the full dataset, in which the user can recognize valuable data trends and/or relationships.
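The preparation steps in the abstract (random sampling, binning, and deducing minimal grouping sets from the requested visualizations) can be illustrated with a small sketch. This is not the patent's implementation. All function names and the data layout are hypothetical, and "minimal" is read here as one plausible interpretation: a set of dimensions is dropped when it is a strict subset of another requested set, on the assumption that SUM and COUNT results for the coarser grouping can be re-aggregated from the finer one.

```python
import random
from collections import defaultdict


def sample_rows(rows, fraction, seed=0):
    """Uniform random sample without replacement (assumed sampling scheme)."""
    rng = random.Random(seed)
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)


def bin_value(value, width):
    """Map a numeric value to the lower bound of its fixed-width bin."""
    return (value // width) * width


def minimal_grouping_sets(visualizations):
    """Keep only dimension sets that are not a strict subset of another set.

    Assumption: coarser groupings can be derived from finer ones by
    re-aggregating SUM/COUNT, so they need not be fetched separately.
    """
    dim_sets = {frozenset(v["dimensions"]) for v in visualizations}
    return [s for s in dim_sets if not any(s < other for other in dim_sets)]


def aggregate(rows, dimensions, measure):
    """SUM and COUNT of `measure` per combination of `dimensions`."""
    totals = defaultdict(lambda: {"sum": 0, "count": 0})
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key]["sum"] += row[measure]
        totals[key]["count"] += 1
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical "full" dataset standing in for the big data layer.
    full = [
        {"region": "EU" if i % 2 else "US", "age": 20 + i % 50, "sales": i}
        for i in range(1000)
    ]
    sample = sample_rows(full, 0.05)
    # Wrangling step: add a binned dimension to every sampled row.
    wrangled = [{**r, "age_bin": bin_value(r["age"], 10)} for r in sample]
    charts = [
        {"dimensions": ["region"], "measure": "sales"},
        {"dimensions": ["region", "age_bin"], "measure": "sales"},
    ]
    for grouping_set in minimal_grouping_sets(charts):
        dims = sorted(grouping_set)
        print(dims, aggregate(wrangled, dims, "sales"))
```

In this sketch the two requested charts collapse to a single grouping set (region and age_bin), because the region-only totals can be rolled up from it.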

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: a first, in-memory database engine of an interface layer comprising an in-memory database, communicating with a separate layer comprising a large volume of stored data, to receive a first dataset representing a sample of the large volume of stored data, wherein the sample is prepared from a SUM aggregation operation or a COUNT aggregation operation leveraging an existing functionality in the separate layer, wherein communicating the first dataset comprises: the first, in-memory database engine receiving the sample from the separate layer; and the first, in-memory database engine refining the sample to provide the first dataset, wherein the refining comprises binning; the first, in-memory database engine storing the first dataset in the in-memory database; the first, in-memory database engine creating from the first dataset, a multi-stage calculation plan configured to receive a minimal grouping set as input; a second engine executing a SQL operation comprising Rank, on the first dataset according to the calculation plan to produce a first result set; the second engine receiving from the separate layer, a second dataset comprising the minimal grouping set; the second engine performing a SORT SQL operation on the second dataset according to the calculation plan to produce a second result set; and the first, in-memory database engine creating a visualization from the second result set, the visualization including an icon explaining that the second result set simulates the large volume of stored data, and explaining that there is an error margin in the second result set.

2. A method as in claim 1 wherein the second dataset is further prepared from a filter operation performed in the separate layer.

3. A method as in claim 1 wherein the calculation plan is defined by desired dimensions and measures indicating a trend in the large volume of stored data.

4. A method as in claim 1 wherein the sample comprises a random sample.

5. A method as in claim 1 wherein the second dataset is produced by refining performed in the separate layer.

6. A non-transitory computer readable storage medium embodying a computer program for performing a method, said method comprising: a first, in-memory database engine of an interface layer comprising an in-memory database communicating with a separate layer comprising a large volume of stored data, to receive a first dataset representing a sample of the large volume of stored data, wherein the sample is prepared from a SUM aggregation operation or a COUNT aggregation operation leveraging an existing functionality in the separate layer, wherein communicating the first dataset comprises: the first, in-memory database engine receiving the sample from the separate layer; and the first, in-memory database engine refining the sample to provide the first dataset, wherein the refining comprises binning; the first, in-memory database engine storing the first dataset in the in-memory database; the first, in-memory database engine creating from the first dataset, a multi-stage calculation plan configured to receive a minimal grouping set as input; a second engine executing a SQL operation comprising Rank, on the first dataset according to the calculation plan to produce a first result set; the second engine receiving from the separate layer, a second dataset comprising the minimal grouping set; the second engine performing a SORT SQL operation on the second dataset according to the calculation plan to produce a second result set; and the first, in-memory database engine creating a visualization from the second result set, the visualization including an icon explaining that the second result set simulates the large volume of stored data, and explaining that there is an error margin in the second result set.

7. A non-transitory computer readable storage medium as in claim 6 wherein the calculation plan is defined by desired dimensions and measures indicating a trend in the large volume of stored data.

8. A computer system comprising: one or more processors; a software program, executable on said computer system, the software program configured to: cause a first, in-memory database engine of an interface layer comprising an in-memory database communicating with a separate layer comprising a large volume of stored data, to receive a first dataset representing a sample of the large volume of stored data, wherein the sample is prepared from a SUM aggregation operation or a COUNT aggregation operation leveraging an existing functionality in the separate layer, wherein the first dataset is produced by refining comprising binning; cause the first, in-memory database engine to create from the first dataset, a multi-stage calculation plan configured to receive a minimal grouping set as input; cause a second engine to execute a SQL operation comprising Rank, on the first dataset according to the calculation plan to produce a first result set; cause the second engine to receive from the separate layer, a second dataset comprising the minimal grouping set; cause the second engine to perform a SORT SQL operation on the second dataset according to the calculation plan to produce a second result set; and cause the first, in-memory database engine to create a visualization from the second result set, the visualization including an icon explaining that the second result set simulates the large volume of stored data, and explaining that there is an error margin in the second result set.

9. A computer system as in claim 8 wherein the second dataset is further prepared from a filter operation performed in the separate layer.

10. A computer system as in claim 8 wherein the calculation plan is defined by desired dimensions and measures indicating a trend in the large volume of stored data.

11. A computer system as in claim 8 wherein the sample comprises a random sample.
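The two-pass use of the calculation plan in claim 1 (a Rank operation on the sampled first dataset, then a SORT on the second dataset built from the minimal grouping set) can be sketched as follows. This is only an illustration under stated assumptions: the plan is modeled as a plain dictionary of stages, the rank follows standard SQL RANK() tie handling, and every name here (`make_plan`, `visualize`, the notice text) is hypothetical rather than taken from the patent.

```python
def rank_rows(rows, measure):
    """Rank rows by `measure`, descending, with SQL RANK() tie handling:
    tied rows share a rank and the following rank is skipped."""
    ordered = sorted(rows, key=lambda r: r[measure], reverse=True)
    ranked = []
    for position, row in enumerate(ordered, start=1):
        if ranked and ranked[-1][measure] == row[measure]:
            rank = ranked[-1]["rank"]
        else:
            rank = position
        ranked.append({**row, "rank": rank})
    return ranked


def sort_rows(rows, measure):
    """Sort rows by `measure`, descending (stand-in for the SORT stage)."""
    return sorted(rows, key=lambda r: r[measure], reverse=True)


def make_plan(measure):
    """Hypothetical plan: named stages that accept grouped rows as input."""
    return {
        "rank": lambda rows: rank_rows(rows, measure),
        "sort": lambda rows: sort_rows(rows, measure),
    }


def visualize(result_set, notice):
    """Bundle a result set with the explanatory notice shown next to the chart."""
    return {"rows": result_set, "notice": notice}


if __name__ == "__main__":
    plan = make_plan("sales")
    # First dataset: grouped rows computed from the wrangled sample (made-up values).
    first_dataset = [
        {"region": "EU", "sales": 120},
        {"region": "US", "sales": 300},
        {"region": "APAC", "sales": 120},
    ]
    # Second dataset: the minimal grouping set returned by the separate layer.
    second_dataset = [
        {"region": "EU", "sales": 2500},
        {"region": "US", "sales": 6100},
        {"region": "APAC", "sales": 2300},
    ]
    first_result = plan["rank"](first_dataset)
    second_result = plan["sort"](second_dataset)
    chart = visualize(
        second_result,
        notice="Result simulates the full data volume; an error margin applies.",
    )
    print(first_result)
    print(chart)
```

Modeling the plan as reusable stages reflects the claim language that both operations run "according to the calculation plan": the same plan object is created once from the first dataset and then fed the grouping-set data.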

Assignees

Inventors

Classifications

  • Database tuning (G06F16/2282 takes precedence; database performance monitoring G06F11/3409)

  • Presentation of query results

  • Query processing support for facilitating data mining operations in structured databases

  • Browsing; Visualisation therefor (browsing or visualisation for clustering or classification G06F16/358)

  • Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10459932B2 cover?
Embodiments visualize large data volumes utilizing initial sampling to reduce size of a dataset. This sampling may be random in nature. The sampled dataset may be refined (wrangled) by binning, grouping, cleansing, and/or other techniques to produce a wrangled sample dataset. A user defines useful end visualization(s) by inputting expected dimension/measures. From these visualizations of sample…
Who is the assignee on this patent?
Naibo Alexis, Xu Xiaohui, Le Biannic Yann, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F16/2462. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 29, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).