Automatic insights for multi-dimensional data

US10635667B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10635667-B2
Application number: US-201515739132-A
Country: US
Kind code: B2
Filing date: Jun 29, 2015
Priority date: Jun 29, 2015
Publication date: Apr 28, 2020
Grant date: Apr 28, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Automatically identifying insights from a dataset and presenting the insights graphically and in natural language text ranked by importance is provided. Different data types and structures in the dataset are automatically recognized and matched with a corresponding specific analysis type. The data is analyzed according to the determined corresponding analysis types, and insights from the analysis are automatically identified. The insights within a given insight type and between insight types are ranked and presented in order of importance. Insights include those having multiple pipelined attributes and other insights include multiple insights identified as having some relationship for the included insights.
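The abstract's step of recognizing data types and matching each to an analysis type can be illustrated with a minimal sketch. This is not the patent's method: the function names, the three column kinds, and the mapping table are all assumptions made here for illustration, and the analysis labels only loosely echo the numerical and time-based categories named in claim 2.

```python
from datetime import date

def infer_column_kind(values):
    """Classify a column by the Python types of its non-missing values (hypothetical rule)."""
    present = [v for v in values if v is not None]
    if not present:
        return "unknown"
    if all(isinstance(v, date) for v in present):
        return "temporal"
    # bool is a subclass of int, so it is excluded explicitly
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "numerical"
    return "categorical"

# Assumed mapping from column kind to analysis type; not taken from the patent.
ANALYSIS_BY_KIND = {
    "temporal": "time-based analysis",
    "numerical": "numerical analysis",
    "categorical": "grouping dimension",
}

def match_analyses(table):
    """Return the analysis label chosen for each column of a {name: values} table."""
    return {
        name: ANALYSIS_BY_KIND.get(infer_column_kind(column), "skipped")
        for name, column in table.items()
    }
```

A column with no usable values falls through to "skipped", so later stages only see columns that some analysis type can handle.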

First claim

Opening claim text (preview).

We claim:

1. A method comprising: identifying, at a computing device, one or more subspaces of multi-dimensional data; identifying, from the multi-dimensional data, an attribute of the one or more subspaces; determining a set of insight candidates to be evaluated, wherein an insight candidate is characterized by an insight type; identifying from the set of insight candidates, a plurality of insights for one or more subspaces, wherein the identifying comprises determining that the attribute of a first subspace is different than the attribute of a second subspace by a threshold wherein the set of insight candidates comprises insights of two or more different types; ordering the identified plurality of insights; and identifying for presentation at least a portion of the plurality of insights based at least on the ordering.

2. The method of claim 1, wherein the two or more different types of insight are included in at least one of the following categories: numerical insights including a value; time-based insights including a time component; and compound insights including insights based on two or more subspaces.

3. The method of claim 1, wherein the ordering comprises: scoring at least a portion of the plurality of insights; and ranking at least a portion of the scored plurality of insights, and wherein the at least the portion of the plurality of insights identified for presentation comprises a predetermined number of top-ranked insights of the plurality of insights.

4. The method of claim 3, wherein the scoring comprises: determining an impact value for an individual one of the plurality of insights based at least on a importance rating of the first one of the one or more subspaces, the importance rating is based on a relevance of the subspace associated with the insight to other comparable subspaces, the method further comprising at least one of: pruning one or more of the plurality of insights in response to the corresponding impact value being below a pruning threshold value, or prioritizing further operations associated with the individual one of the plurality of insights based on the determined impact value.

5. The method of claim 3, wherein the scoring comprises: at least one of: determining an impact value for an individual one of the plurality of insights based at least on a importance rating of the first one of the one or more subspaces, the importance rating is based on a relevance of the subspace associated with the insight to other comparable subspaces or determining a significance value of the individual one of the plurality of insights; normalizing at least one of the impact value or significance value to the range [0, 1]; and determining a score for the individual one of the plurality of insights based at least on the impact value and the significance value.

6. The method of claim 5, further comprising: for individual ones of a top k of the plurality of insights, defining a potential function based on the associated score and at least one of a subspace distance model, an attributes distance model or an insight type distance model; and determining a new order of the top k of the plurality of insights in response to maximizing the potential function for each of the ones of the top k of the identified insights.

7. The method of claim 5, wherein the significance value is at least partially based on the extent to which the individual insight is determined to be uncommon.

8. The method of claim 1, wherein one or more of the plurality of insights is associated with more than one subspace.

9. The method of claim 1, wherein the one or more other subspaces comprise at least one of a sibling relationship or a parent relationship to the identified one or more subspaces.

10. The method of claim 1, wherein one or more of the plurality of insights comprises more than one pipelined attribute based on the one or more attributes.

11. The method of claim 1, further comprising: identifying connections between two or more of the plurality of insights; and forming one or more meta insights based at least on the identified connections between at least two of the plurality of insights, the meta insights comprise a plurality of insights.

12. The method of claim 1, wherein the attribute is a piped attribute and wherein identifying the piped attribute comprises: performing a first calculation on the multi-dimensional data to generate a first attribute; and performing a second calculation on the first attribute to generate the piped attribute.

13. A system for providing insights, the system comprising: one or more processors; and a memory communicatively coupled to the one or more processors, the memory storing instructions that, when executed, cause the one or more processors to: identify one or more subspaces of multi-dimensional data; identify, from the multi-dimensional data, one or more an attribute of the one or more subspaces; determine a set of insight candidates to be evaluated, wherein an insight candidate is characterized by an insight type; identify from the set of insight candidates, a plurality of insights for one or more subspaces, wherein the identifying comprises determining that the attribute values of a first subspace form a significant pattern or relationship with the attribute values of a second subspace, wherein the significance is evaluated based on the corresponding insight type and exceeds a threshold amount; wherein the set of insight candidates comprises insights of two or more different types; order the identified plurality of insights; and identify for presentation at least a portion of the plurality of insights based at least on the ordering.

14. The system of claim 13, wherein the ordering comprises: scoring at least a portion of the plurality of insights; and ranking at least a portion of the scored plurality of insights, and wherein the at least the portion of the plurality of insights identified for presentation comprises a predetermined number of top-ranked insights of the plurality of insights.

15. The system of claim 12, wherein the scoring comprises: at least one of: determining an impact value for an individual one of the plurality of insights based at least on a importance rating of the first one of the one or more subspaces, the importance rating is based on a relevance of the subspace associated with the insight to other comparable subspaces or determining a significance value of the individual one of the plurality of insights; normalizing least one of the impact value or significance value to the range [0, 1]; and determining a score for the individual one of the plurality of insights based at least on the impact value and the significance value.

16. The system of claim 14, wherein the scoring comprises: determining an impact value for an individual one of the plurality of insights based at least on a importance rating of a corresponding dataset of the multi-dimensional data, the importance rating is based on a relevance of the subspace associated with the insight to other comparable subspaces, further comprising at least one of: pruning one or more of the plurality of insights in response to the corresponding impact value being below a pruning threshold value, or prioritizing further operations associated with the individual one of the plurality of insights based on the determined impact value.

17. The system of claim 14, wherein the scoring comprises: determining an impact value for an individual one of the plurality of insights based at least
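Several steps recited in the claims can be sketched in a few lines: the piped attribute (two chained calculations), the threshold comparison between subspaces, and the prune, normalize, score, and top-k ordering. The sketch below is only an illustration under stated assumptions. The `Insight` fields, the choice of a sum followed by a consecutive difference, min-max normalization, and the product `impact * significance` as the score are all hypothetical choices made here; the claims require only that the score be "based at least on" both values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insight:
    subspace: str        # hypothetical label such as "region=West"
    insight_type: str    # hypothetical type name such as "outlier"
    impact: float        # raw impact value
    significance: float  # raw significance value

def piped_attribute(rows, measure, order_key):
    """First calculation: sum `measure` per `order_key`.
    Second calculation: difference between consecutive sums."""
    totals = {}
    for row in rows:
        totals[row[order_key]] = totals.get(row[order_key], 0) + row[measure]
    keys = sorted(totals)
    return {k: totals[k] - totals[prev] for prev, k in zip(keys, keys[1:])}

def differs_by_threshold(a, b, threshold):
    """True when two subspace attribute values differ by at least `threshold`."""
    return abs(a - b) >= threshold

def normalize(values):
    """Min-max scale to [0, 1]; a constant input maps to all zeros (assumed convention)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def top_insights(insights, k, prune_below=0.0):
    """Prune on raw impact, normalize both values, score by their product, keep the top k."""
    survivors = [i for i in insights if i.impact >= prune_below]
    if not survivors:
        return []
    impacts = normalize([i.impact for i in survivors])
    significances = normalize([i.significance for i in survivors])
    scored = [(imp * sig, ins) for ins, imp, sig in zip(survivors, impacts, significances)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Pruning before normalization keeps a discarded low-impact insight from stretching the scale for the survivors; that ordering is a design choice of this sketch rather than something the claims specify.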

Assignees

Inventors

Classifications

  • G06Q10/06 · Primary

    Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling · CPC title

  • Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP · CPC title

  • Visual data mining; Browsing structured data · CPC title

  • Query optimisation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06Q10/06. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 28, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).