US10459932B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10459932-B2 |
| Application number | US-201414575633-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 18, 2014 |
| Priority date | Dec 18, 2014 |
| Publication date | Oct 29, 2019 |
| Grant date | Oct 29, 2019 |
Official abstract text for this publication.
Embodiments visualize large data volumes utilizing initial sampling to reduce size of a dataset. This sampling may be random in nature. The sampled dataset may be refined (wrangled) by binning, grouping, cleansing, and/or other techniques to produce a wrangled sample dataset. A user defines useful end visualization(s) by inputting expected dimension/measures. From these visualizations of sampled data, minimal grouping sets are deduced for application to the full dataset. The user publishes/schedules the wrangled operation and grouping sets definition. Based on this, a wrangled dataset and grouping sets are produced in the big data layer. When the user accesses the visualization(s), minimal grouping sets are retrieved in the in-memory engine of the client and processed by an in-memory database engine according to the common processing plan. This produces result sets and a final set of visualizations of the full dataset, in which the user can recognize valuable data trends and/or relationships.
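The flow described in the abstract can be sketched in Python roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the `age`, `region`, and `revenue` columns, the fixed-width binning, and the rule used to deduce grouping sets (deduplicating the dimension sets that the requested visualizations need) are all assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_rows(rows, fraction, seed=0):
    """Draw a simple random sample of the rows (assumed sampling scheme)."""
    rng = random.Random(seed)
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)


def bin_value(value, width):
    """Map a numeric value to the lower edge of its fixed-width bin."""
    return (value // width) * width


def wrangle(rows, width):
    """Add a binned 'age_bin' column; a stand-in for the wrangling step."""
    return [{**r, "age_bin": bin_value(r["age"], width)} for r in rows]


def minimal_grouping_sets(visualizations):
    """Collapse the dimension lists of all visualizations into unique sets.

    Hypothetical rule: two visualizations that use the same dimensions,
    in any order, share one grouping set.
    """
    return sorted({tuple(sorted(v["dimensions"])) for v in visualizations})


def aggregate(rows, dimensions, measure):
    """SUM the measure for each combination of dimension values."""
    totals = defaultdict(float)
    for r in rows:
        totals[tuple(r[d] for d in dimensions)] += r[measure]
    return dict(totals)


# Hypothetical "full" dataset standing in for the big data layer.
full = [
    {"region": "EU" if i % 2 == 0 else "US", "age": 20 + i % 40, "revenue": 1.0}
    for i in range(1000)
]

sample = wrangle(sample_rows(full, 0.1), width=10)   # explore on the sample
charts = [
    {"dimensions": ["region", "age_bin"]},
    {"dimensions": ["age_bin", "region"]},
    {"dimensions": ["region"]},
]
grouping_sets = minimal_grouping_sets(charts)

# Apply the same wrangling and grouping sets to the full dataset.
full_wrangled = wrangle(full, width=10)
results = {gs: aggregate(full_wrangled, gs, "revenue") for gs in grouping_sets}
```

Exploring on the sample keeps the interactive step cheap; only the deduplicated grouping sets are then computed over the full data.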
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method comprising: a first, in-memory database engine of an interface layer comprising an in-memory database, communicating with a separate layer comprising a large volume of stored data, to receive a first dataset representing a sample of the large volume of stored data, wherein the sample is prepared from a SUM aggregation operation or a COUNT aggregation operation leveraging an existing functionality in the separate layer, wherein communicating the first dataset comprises: the first, in-memory database engine receiving the sample from the separate layer; and the first, in-memory database engine refining the sample to provide the first dataset, wherein the refining comprises binning; the first, in-memory database engine storing the first dataset in the in-memory database; the first, in-memory database engine creating from the first dataset, a multi-stage calculation plan configured to receive a minimal grouping set as input; a second engine executing a SQL operation comprising Rank, on the first dataset according to the calculation plan to produce a first result set; the second engine receiving from the separate layer, a second dataset comprising the minimal grouping set; the second engine performing a SORT SQL operation on the second dataset according to the calculation plan to produce a second result set; and the first, in-memory database engine creating a visualization from the second result set, the visualization including an icon explaining that the second result set simulates the large volume of stored data, and explaining that there is an error margin in the second result set.

2. A method as in claim 1 wherein the second dataset is further prepared from a filter operation performed in the separate layer.

3. A method as in claim 1 wherein the calculation plan is defined by desired dimensions and measures indicating a trend in the large volume of stored data.

4. A method as in claim 1 wherein the sample comprises a random sample.

5. A method as in claim 1 wherein the second dataset is produced by refining performed in the separate layer.

6. A non-transitory computer readable storage medium embodying a computer program for performing a method, said method comprising: a first, in-memory database engine of an interface layer comprising an in-memory database communicating with a separate layer comprising a large volume of stored data, to receive a first dataset representing a sample of the large volume of stored data, wherein the sample is prepared from a SUM aggregation operation or a COUNT aggregation operation leveraging an existing functionality in the separate layer, wherein communicating the first dataset comprises: the first, in-memory database engine receiving the sample from the separate layer; and the first, in-memory database engine refining the sample to provide the first dataset, wherein the refining comprises binning; the first, in-memory database engine storing the first dataset in the in-memory database; the first, in-memory database engine creating from the first dataset, a multi-stage calculation plan configured to receive a minimal grouping set as input; a second engine executing a SQL operation comprising Rank, on the first dataset according to the calculation plan to produce a first result set; the second engine receiving from the separate layer, a second dataset comprising the minimal grouping set; the second engine performing a SORT SQL operation on the second dataset according to the calculation plan to produce a second result set; and the first, in-memory database engine creating a visualization from the second result set, the visualization including an icon explaining that the second result set simulates the large volume of stored data, and explaining that there is an error margin in the second result set.

7. A non-transitory computer readable storage medium as in claim 6 wherein the calculation plan is defined by desired dimensions and measures indicating a trend in the large volume of stored data.

8. A computer system comprising: one or more processors; a software program, executable on said computer system, the software program configured to: cause a first, in-memory database engine of an interface layer comprising an in-memory database communicating with a separate layer comprising a large volume of stored data, to receive a first dataset representing a sample of the large volume of stored data, wherein the sample is prepared from a SUM aggregation operation or a COUNT aggregation operation leveraging an existing functionality in the separate layer, wherein the first dataset is produced by refining comprising binning; cause the first, in-memory database engine to create from the first dataset, a multi-stage calculation plan configured to receive a minimal grouping set as input; cause a second engine to execute a SQL operation comprising Rank, on the first dataset according to the calculation plan to produce a first result set; cause the second engine to receive from the separate layer, a second dataset comprising the minimal grouping set; cause the second engine to perform a SORT SQL operation on the second dataset according to the calculation plan to produce a second result set; and cause the first, in-memory database engine to create a visualization from the second result set, the visualization including an icon explaining that the second result set simulates the large volume of stored data, and explaining that there is an error margin in the second result set.

9. A computer system as in claim 8 wherein the second dataset is further prepared from a filter operation performed in the separate layer.

10. A computer system as in claim 8 wherein the calculation plan is defined by desired dimensions and measures indicating a trend in the large volume of stored data.

11. A computer system as in claim 8 wherein the sample comprises a random sample.
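As a rough illustration of the staged flow recited in claim 1, the sketch below uses plain Python stand-ins for the SQL Rank and SORT operations. Everything named here (`CalculationPlan`, `rank_stage`, `sort_stage`, `visualize`, the `region` and `revenue` columns, and the wording of the notice) is hypothetical and chosen only for the example; the claims do not prescribe this structure.

```python
from dataclasses import dataclass


@dataclass
class CalculationPlan:
    """Hypothetical plan: one dimension, one measure, one sort direction."""
    dimension: str
    measure: str
    descending: bool = True


def rank_stage(rows, plan):
    """Rank rows by the measure; ties share a rank, as with SQL RANK."""
    ordered = sorted(rows, key=lambda r: r[plan.measure], reverse=plan.descending)
    ranked = []
    for i, row in enumerate(ordered):
        if i > 0 and row[plan.measure] == ordered[i - 1][plan.measure]:
            rank = ranked[-1]["rank"]   # tie: reuse the previous rank
        else:
            rank = i + 1                # otherwise rank is position + 1
        ranked.append({**row, "rank": rank})
    return ranked


def sort_stage(rows, plan):
    """Sort rows by the measure; a stand-in for the SORT SQL operation."""
    return sorted(rows, key=lambda r: r[plan.measure], reverse=plan.descending)


def visualize(result_set, plan):
    """Build chart data plus a notice standing in for the claimed icon."""
    return {
        "bars": [(r[plan.dimension], r[plan.measure]) for r in result_set],
        "notice": "Result simulates the full data volume; "
                  "values carry an error margin.",
    }


plan = CalculationPlan(dimension="region", measure="revenue")

# First dataset: the binned sample held by the in-memory engine.
first_dataset = [
    {"region": "EU", "revenue": 30},
    {"region": "US", "revenue": 50},
    {"region": "APAC", "revenue": 30},
]
first_result = rank_stage(first_dataset, plan)

# Second dataset: the minimal grouping set received from the separate layer.
second_dataset = [
    {"region": "EU", "revenue": 300},
    {"region": "US", "revenue": 500},
]
second_result = sort_stage(second_dataset, plan)
chart = visualize(second_result, plan)
```

The same plan object drives both stages, which mirrors the claim language that each operation runs "according to the calculation plan".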
Database tuning (G06F16/2282 takes precedence; database performance monitoring G06F11/3409) · CPC title
Presentation of query results · CPC title
Query processing support for facilitating data mining operations in structured databases · CPC title
Browsing; Visualisation therefor (browsing or visualisation for clustering or classification G06F16/358) · CPC title
Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries · CPC title