Distributed data warehouse

US9858326B2 · US · B2

Patent metadata
Publication number: US-9858326-B2
Application number: US-201213648812-A
Country: US
Kind code: B2
Filing date: Oct 10, 2012
Priority date: Oct 10, 2012
Publication date: Jan 2, 2018
Grant date: Jan 2, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and data structures are provided for allowing data mining with improved efficiency. During processing of a usage log (or multiple logs) for an activity, such as a usage logfile of network search activity, a common fact table is generated. The common fact table allows a plurality of auxiliary data structures to be formed from the common fact table. These auxiliary data structures are designed to allow users to submit queries against the contents of the data structure in order to investigate the data. The efficiency of access of the common fact table is improved by allowing users to access auxiliary data structures other than the auxiliary data structures that are associated with a user. Optionally, the common fact table and/or the auxiliary data structures can include dimension values that correspond to both pre-identified dimension values as well as dimension values that are identified during processing of the activity logfiles.
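The flow the abstract describes can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the patented implementation. The log fields (`market`, `device`, `clicks`, `queries`), the `Definition` record, the users, and every function name are assumptions. The sketch aggregates per-user definitions into one set of measures and dimensions and builds a single common fact table of integer dimension keys, keeping the dimension tables separate as claim 1 recites. Each auxiliary structure is derived as a group-by-and-sum over its own subset. A query is routed to any auxiliary structure that covers the requested measure and dimension, falling back to structures owned by other users when the requester's own do not cover it.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical usage-log rows: dimension values plus numeric measures.
LOG = [
    {"market": "en-US", "device": "pc", "clicks": 3, "queries": 1},
    {"market": "en-US", "device": "phone", "clicks": 1, "queries": 2},
    {"market": "de-DE", "device": "pc", "clicks": 2, "queries": 1},
]


@dataclass(frozen=True)
class Definition:
    """A requested auxiliary structure: its owner and the measures/dimensions it needs."""
    name: str
    owner: str
    measures: tuple
    dimensions: tuple


DEFS = [
    Definition("ads", "alice", ("clicks",), ("market",)),
    Definition("mobile", "bob", ("queries",), ("device", "market")),
]


def aggregate_definitions(defs):
    """Collect every measure and dimension requested by any definition in one place."""
    measures = sorted({m for d in defs for m in d.measures})
    dims = sorted({x for d in defs for x in d.dimensions})
    return measures, dims


def build_common_fact_table(rows, measures, dims):
    """Store integer dimension keys in the fact table; value->key maps live separately."""
    dim_tables = {d: {} for d in dims}
    facts = []
    for row in rows:
        # setdefault assigns the next free key the first time a value is seen.
        keys = {d: dim_tables[d].setdefault(row[d], len(dim_tables[d])) for d in dims}
        facts.append({**keys, **{m: row[m] for m in measures}})
    return facts, dim_tables


def build_auxiliary(facts, definition):
    """Group the common fact table by the definition's dimensions and sum its measures."""
    agg = defaultdict(lambda: defaultdict(int))
    for fact in facts:
        group = tuple(fact[d] for d in definition.dimensions)
        for m in definition.measures:
            agg[group][m] += fact[m]
    return {group: dict(values) for group, values in agg.items()}


def route_query(defs, user, measure, dimension):
    """Prefer the user's own structures, then any other user's structure that covers the query."""
    own = [d for d in defs if d.owner == user]
    others = [d for d in defs if d.owner != user]
    for d in own + others:
        if measure in d.measures and dimension in d.dimensions:
            return d
    return None


def answer_query(aux, definition, dim_tables, measure, dimension):
    """Roll the auxiliary structure up to the queried dimension and decode its keys."""
    idx = definition.dimensions.index(dimension)
    decode = {key: value for value, key in dim_tables[dimension].items()}
    out = defaultdict(int)
    for group, values in aux.items():
        out[decode[group[idx]]] += values[measure]
    return dict(out)


if __name__ == "__main__":
    measures, dims = aggregate_definitions(DEFS)
    facts, dim_tables = build_common_fact_table(LOG, measures, dims)
    chosen = route_query(DEFS, "alice", "queries", "market")
    aux = build_auxiliary(facts, chosen)
    # prints: mobile {'en-US': 3, 'de-DE': 1}
    print(chosen.name, answer_query(aux, chosen, dim_tables, "queries", "market"))
```

Here Alice's own `ads` structure has no `queries` measure, so the query is served from Bob's `mobile` structure, mirroring the cross-user access the abstract describes. Summing is only one possible aggregation; the abstract does not say how measures are combined.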

First claim

Opening claim text (preview).

What is claimed is:

1. One or more computer-storage media storing computer-useable instructions that, when executed by a computing device, mine data, the instructions causing the computing device to:
identify definitions defining which measures and dimensions are desired for inclusion in a plurality of auxiliary data structures, an auxiliary data structure representing groupings of measures and dimensions of interest from a common fact table;
aggregate the definitions for the auxiliary data structures in a centralized data location, the definition of each dimension being associated with at least one of the measures;
process one or more initial data files to extract measure values of the measures and dimension keys of the dimensions using the aggregated definitions;
construct the common fact table from the extracted measure values and the extracted dimension keys from the processed one or more initial data files;
construct one or more dimension tables corresponding to the one or more dimensions, the one or more dimension tables being stored separately from the common fact table;
create the auxiliary data structures from the common fact table, each auxiliary data structure comprising the identified definitions desired for inclusion in the auxiliary data structure, each auxiliary data structure including a different subset of the measures and the dimensions of the common fact table, a first user being associated with a first subset of the auxiliary data structures;
receive a user data query from the first user, the user data query comprising a combination of a measure and a dimension;
identify an auxiliary data structure from the auxiliary data structures that includes the combination of the measure and the dimension, the identified auxiliary data structure being different from the first subset of auxiliary data structures;
generate a responsive result to the user data query based on the identified auxiliary data structure; and
provide the generated responsive result to the first user.

2. The computer-storage media of claim 1, wherein at least one dimension of a first auxiliary data structure of the auxiliary data structures is different from each dimension of a second auxiliary data structure of the auxiliary data structures.

3. The computer-storage media of claim 1, wherein at least one measure of a first auxiliary data structure of the auxiliary data structures is different from each measure of a second auxiliary data structure of the auxiliary data structures.

4. The computer-storage media of claim 1, wherein each measure of a first auxiliary data structure of the auxiliary data structures is different from each measure of a second auxiliary data structure of the auxiliary data structures.

5. The computer-storage media of claim 1, wherein identifying the auxiliary data structure comprising identifying the auxiliary data structure based on a listing of auxiliary data structures generated from the common fact table.

6. The computer-storage media of claim 1, wherein a plurality of the aggregated definitions comprise user-defined definitions.

7. The computer-storage media of claim 6, wherein the auxiliary data structure formed from the common fact table comprises a measure from a first user-defined definition and a dimension from a second user-defined definition, the dimension from the second user-defined definition being different from the dimensions in the first user-defined definition.

8. The computer-storage media of claim 1, wherein the aggregated definitions comprise one or more definitions that are associated with the first user, at least one of the measure and the dimension in the user data query being different from measures and dimensions in the one or more definitions associated with the first user.

9. The computer-storage media of claim 8, wherein the one or more definitions are associated with the first user based on the first user being a member of a group of users associated with the one or more definitions.

10. The computer-storage media of claim 1, wherein providing a first generated responsive result comprises displaying at least a portion of the first responsive result to the first user.

11. The computer-storage media of claim 1, wherein the aggregated definitions comprise one or more definitions that are associated with the first user, at least one of the measure and the dimension in the user data query being different from measures and dimensions in the one or more definitions associated with the first user.

12. A computer-implemented method for mining data, comprising:
identifying definitions defining which measures and dimensions are desired for inclusion in auxiliary data structures, an auxiliary data structure representing groupings of measures and dimensions of interest from a common fact table;
aggregating the definitions for the auxiliary data structures, the definitions comprising a plurality of managed dimension values for at least one dimension;
processing one or more initial data files to extract values for the measures and the dimensions using the aggregated definitions, the extracted values including one or more unmanaged dimension values for the at least one dimension;
validating the one or more unmanaged dimension values;
constructing the common fact table from the extracted values of the measures and dimension keys of the dimensions from the processed one or more initial data files;
constructing one or more dimension tables corresponding to the plurality of dimensions based on the extracted values, the one or more dimension tables being stored separately from the common fact table;
creating the auxiliary data structures from the common fact table, each auxiliary data structure including a different subset of the measures and the dimensions of the common fact table, the subset of the measures and the dimensions of the common fact table corresponding to the measures and dimensions of interest of the auxiliary data structure, at least one auxiliary dimension table including a dimension having validated unmanaged dimension values;
receiving a user data query, the user data query comprising one or more combinations of measures and dimensions;
generating a responsive result to the user data query based on at least one of the auxiliary data structures; and
providing the generated responsive result.

13. The method of claim 12, wherein the generated responsive result comprises a dimension having validated unmanaged dimension values.

14. The method of claim 12, wherein the one or more unmanaged dimension values are validated based on a number of instances of each unmanaged dimension value being less than a first threshold value.

15. The method of claim 12, wherein the one or more unmanaged dimension values are validated based on the one or more unmanaged dimension values being less than a second threshold value.

16. The method of claim 12, wherein the one or more unmanaged dimension values are validated based on a plurality of combinations of first threshold values for a number of instances of a dimension value and second threshold values for a number of unmanaged dimension values.

17. The method of claim 12, wherein validating the one or more unmanaged dimension values comprises matching a format of an unmanaged dimension value to a format validation rule.

18. The method of claim 12, wherein the user data query is received from a first user, the aggregated definitions comprising one or more definitions that are associated with the first user, and at least one of the measures and the dimensions in the use…
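Claims 12 to 17 distinguish managed dimension values, which are pre-identified in the definitions, from unmanaged values that only appear while the logs are processed and must be validated before they reach an auxiliary dimension table. The sketch below assumes one concrete policy. It applies a regular-expression format rule (cf. claim 17) and accepts the new values only if fewer than a threshold number of distinct values pass (cf. claims 15 and 16). The function names, the regex, and the all-or-nothing behavior when the threshold is reached are hypothetical. The claim text shown here does not say what happens to values that fail validation, and the per-value instance threshold of claim 14 is not modeled.

```python
import re


def validate_unmanaged(observed, managed, pattern, max_unmanaged):
    """Return the unmanaged dimension values accepted as valid (hypothetical policy).

    observed: dimension values extracted from the logs
    managed: values pre-identified in the aggregated definitions
    pattern: format validation rule, as a regular expression
    max_unmanaged: accept only if fewer than this many distinct values pass the rule
    """
    unmanaged = {v for v in observed if v not in managed}
    well_formed = {v for v in unmanaged if re.fullmatch(pattern, v)}
    if len(well_formed) >= max_unmanaged:
        return set()  # too many new values: accept none of them
    return well_formed


def build_dimension_table(managed, validated):
    """Give managed values the first keys, then append the validated unmanaged values."""
    ordered = sorted(managed) + sorted(set(validated) - set(managed))
    return {value: key for key, value in enumerate(ordered)}


if __name__ == "__main__":
    observed = ["US", "DE", "FR", "xx-bad", "BR", "US"]
    managed = {"US", "DE"}
    validated = validate_unmanaged(observed, managed, r"[A-Z]{2}", max_unmanaged=5)
    print(sorted(validated))
    print(build_dimension_table(managed, validated))
```

With these inputs, `FR` and `BR` pass the two-letter format rule while `xx-bad` does not; lowering `max_unmanaged` to 2 would reject both new values and leave the dimension table with only the managed ones.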

Assignees

Microsoft Corp, Microsoft Technology Licensing Llc

Inventors

Classifications

  • G06F16/283 (primary)

    Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP · CPC title

  • Integrating or interfacing systems involving database management systems · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9858326B2 cover?
Methods and data structures are provided for allowing data mining with improved efficiency. During processing of a usage log (or multiple logs) for an activity, such as a usage logfile of network search activity, a common fact table is generated. The common fact table allows a plurality of auxiliary data structures to be formed from the common fact table. These auxiliary data structures are des…
Who is the assignee on this patent?
Microsoft Corp, Microsoft Technology Licensing Llc
What technology area does this patent fall under?
Primary CPC classification G06F16/283. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 2, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).