Systems and methods for generating natural language insights about sets of data

US9378270B2 · US · B2

Patent metadata
Publication number: US-9378270-B2
Application number: US-201414314261-A
Country: US
Kind code: B2
Filing date: Jun 25, 2014
Priority date: Jun 25, 2014
Publication date: Jun 28, 2016
Grant date: Jun 28, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the invention provide systems and methods for generating natural language insights about a set of data. More specifically, embodiments of the present invention are directed to methods and systems that transform data into insights or actionable information. The output generated by embodiments of the present invention would be equivalent to that of an observation made or insights gathered by a qualified data scientist presented with the same data. Embodiments as described herein can include an insight engine that can analyze both structured and unstructured data and generate information in a natural language of the user's choice. Insights provided by embodiments described herein can be supported by an ability to drilldown to graphs/tables and atomic data and provide a good starting point for further analysis.
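As a rough illustration of what the abstract describes, the sketch below turns a small structured dataset into one natural-language observation and keeps the atomic rows behind it so a user could drill down. Everything here is hypothetical: the `Insight` type, the `describe_largest_change` function, the "largest change between two periods" rule, and the sample region/revenue figures are assumptions for illustration. The abstract does not say how the insight engine chooses or words its observations.

```python
from dataclasses import dataclass, field


@dataclass
class Insight:
    """A natural-language observation plus the atomic rows behind it (for drill-down)."""
    text: str
    supporting_rows: list = field(default_factory=list)


def describe_largest_change(rows, dimension, measure, period_key, prev, curr):
    """Hypothetical rule: report the dimension member whose measure changed most
    between two periods, keeping the supporting rows for drill-down."""
    totals = {}
    for row in rows:
        if row[period_key] in (prev, curr):
            member_totals = totals.setdefault(row[dimension], {prev: 0.0, curr: 0.0})
            member_totals[row[period_key]] += row[measure]
    # Pick the member with the largest absolute change between the two periods.
    member, values = max(totals.items(), key=lambda kv: abs(kv[1][curr] - kv[1][prev]))
    delta = values[curr] - values[prev]
    if delta == 0:
        text = f"{measure.capitalize()} in {member} was unchanged from {prev} to {curr}."
    else:
        direction = "rose" if delta > 0 else "fell"
        text = (f"{measure.capitalize()} in {member} {direction} by {abs(delta):,.0f} "
                f"from {prev} to {curr}.")
    # Keep the raw rows that back the statement so a UI could drill down to them.
    support = [r for r in rows if r[dimension] == member and r[period_key] in (prev, curr)]
    return Insight(text, support)


rows = [
    {"region": "East", "quarter": "Q1", "revenue": 100.0},
    {"region": "East", "quarter": "Q2", "revenue": 130.0},
    {"region": "West", "quarter": "Q1", "revenue": 200.0},
    {"region": "West", "quarter": "Q2", "revenue": 150.0},
]
insight = describe_largest_change(rows, "region", "revenue", "quarter", "Q1", "Q2")
print(insight.text)  # Revenue in West fell by 50 from Q1 to Q2.
```

Pairing the sentence with its supporting rows mirrors the abstract's point that each insight should be backed by graphs, tables, or atomic data rather than standing alone.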

First claim

Opening claim text (preview).

What is claimed is:

1. A method for generating natural language insights about a set of data, the method comprising: storing by a server system the set of data in one or more data stores of the server system; defining by the server system an analysis to be performed on the set of data for each question of a plurality of questions; processing information received by the server system from a computing device via a network to identify at least one question of the plurality of questions; performing by the server system the defined analysis for the identified at least one question on at least a subset of the set of data, and generating a natural language answer to the identified at least one question based at least in part on performing the defined analysis for the identified at least one question; transmitting by the server system a transmission to the computing device to cause display of the natural language answer to the identified at least one question with a user interface of the computing device, wherein the natural language answer comprises a first representation of a measure that corresponds to a first value; subsequently identifying by the server system a subsequent value of the measure; selecting by the server system a subsequent natural language answer based at least in part on values corresponding to user votes indicating preferences associated with one or both of the natural language answer and the subsequent natural language answer, wherein the subsequent natural language answer comprises a different representation of the measure that corresponds to the subsequent value, and the selecting the subsequent natural language answer comprises: determining a difference between the first value and the subsequent value; comparing the difference between the first value and the subsequent value to a threshold; and consequent to the difference satisfying the threshold, selecting the subsequent natural language answer; and transmitting by the server system another transmission to the computing device or a second computing device via the network to cause display of the subsequent natural language answer with the user interface of the computing device or a second user interface of the second computing device.

2. The method of claim 1, wherein the defining the analysis to be performed on the set of data for each question of the plurality of questions comprises: creating one or more question templates for the question; identifying one or more measures of the analysis and one or more dimensions of a collection of data upon which the analysis will be performed; defining processes for performing the analysis; identifying one or more observations available on results of performing the defined processes; and creating one or more answer templates for each identified observation.

3. The method of claim 2, wherein the received information comprises a natural language query, and wherein the processing the received information comprises applying at least one question template of the one or more question templates to the received information.

4. The method of claim 2, wherein the performing the defined analysis for the identified at least one question comprises: identifying the analysis to be performed and dimensions of data to be used based at least partially on the identified at least one question; collecting, from the set of data, a subset of data based at least partially on the identified dimensions; executing the defined processes for performing the analysis on the collected subset of data; populating one or more answer templates with analysis results based at least partially on the identified one or more observations for the analysis; and generating the natural language answer based at least partially on the populated one or more answer templates.

5. The method of claim 4, wherein the collecting, from the set of data, the subset of data based at least partially on the identified dimensions comprises using an attribute-weighted data mining algorithm.

6. The method of claim 1, wherein the transmitting the natural language answer comprises providing one or more of an email, a micro-blog message, an instant message, a voice message, and/or a graphical representation, and/or a textual representation on a web page.

7. A system comprising: a server system comprising one or more servers and a memory storing a set of instructions which, when executed by the one or more servers, causes the server system to generate natural language insights about a set of data at least partially by: storing the set of data in one or more data stores of the server system; defining an analysis to be performed on the set of data for each question of a plurality of questions; processing information received from a computing device via a network to identify at least one question of the plurality of questions; performing the defined analysis for the identified at least one question on at least a subset of the set of data, and generating a natural language answer to the identified at least one question based at least in part on performing the defined analysis for the identified at least one question; transmitting a transmission to the computing device to cause display of the natural language answer to the identified at least one question, wherein the natural language answer comprises a first representation of a measure that corresponds to a first value; subsequently identifying a subsequent value of the measure; selecting a subsequent natural language answer based at least in part on values corresponding to user votes indicating preferences associated with the natural language answer one or both of the subsequent natural language answer, wherein the subsequent natural language answer comprises a different representation of the measure that corresponds to the subsequent value, and the selecting the subsequent natural language answer comprises: determining a difference between the first value and the subsequent value; comparing the difference between the first value and the subsequent value to a threshold; and consequent to the difference satisfying the threshold, selecting the subsequent natural language answer; and transmitting another transmission to the computing device or a second computing device via the network to cause display of the subsequent natural language answer with a user interface of the computing device or a second user interface of the second computing device.

8. The system of claim 7, wherein the defining the analysis to be performed on the set of data for each question of the plurality of questions comprises: creating one or more question templates for the question; identifying one or more measures of the analysis and one or more dimensions of a collection of data upon which the analysis will be performed; defining processes for performing the analysis; identifying one or more observations available on results of performing the defined processes; and creating one or more answer templates for each identified observation.

9. The system of claim 8, wherein the received information comprises a natural language query, and wherein the processing the received information comprises applying at least one of the one or more question templates to the received information.

10. The system of claim 8, wherein the performing the defined analysis for the identified at least one question comprises: identifying the analysis to be performed and dimensions of data to be used based at least partially on the identified at least one question; collecting, from the set of data, a subset of data based at least partially on the identified dimensions; executing the defined processes for performing the analysis
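Claims 1 through 4 describe a template-driven flow: a question template identifies the question in a natural language query, the defined analysis runs over a subset of data collected by dimension, an answer template is populated with the result, and a later value of the measure produces a new answer only when its difference from the first value satisfies a threshold, with user votes influencing which answer is selected. The sketch below walks through that flow under stated assumptions. The regex-based question template, the `Question` type, the `answer_for` and `select_subsequent_answer` functions, the use of `sum` as the analysis, the `abs(difference) >= threshold` reading of "satisfying the threshold", and the index-keyed vote tally are all hypothetical choices; the claims do not prescribe any of them. Claim 5's attribute-weighted data mining step is not modeled.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Question:
    """Hypothetical definition of one question and its analysis (cf. claim 2)."""
    pattern: re.Pattern                # question template matched against the query
    measure: str                       # measure computed by the analysis
    dimension: str                     # dimension used to collect the data subset
    analysis: Callable[[list], float]  # process that performs the analysis
    answer_templates: list             # alternative wordings for the same observation


def answer_for(question: Question, query: str, data: list) -> Optional[tuple]:
    """Identify the question, collect a subset, run the analysis, fill a template."""
    match = question.pattern.search(query)
    if match is None:
        return None  # the query does not match this question template
    member = match.group("member")
    subset = [row[question.measure] for row in data if row[question.dimension] == member]
    value = question.analysis(subset)
    text = question.answer_templates[0].format(member=member, value=value)
    return value, text


def select_subsequent_answer(first_value, subsequent_value, threshold,
                             templates, votes, member):
    """Return a new answer only if the measure moved by at least `threshold`.

    Assumptions: "satisfying the threshold" means abs(difference) >= threshold,
    and `votes` maps a template index to its count of user preference votes.
    """
    difference = abs(subsequent_value - first_value)
    if difference < threshold:
        return None  # change too small: keep showing the current answer
    # Prefer the wording users voted for most (the lowest index wins ties).
    best = max(range(len(templates)), key=lambda i: votes.get(i, 0))
    return templates[best].format(member=member, value=subsequent_value)


data = [
    {"region": "East", "revenue": 100.0},
    {"region": "East", "revenue": 20.0},
    {"region": "West", "revenue": 70.0},
]
question = Question(
    pattern=re.compile(r"revenue in (?P<member>\w+)", re.IGNORECASE),
    measure="revenue",
    dimension="region",
    analysis=sum,
    answer_templates=["Revenue in {member} is {value:,.0f}.",
                      "{member} revenue now stands at {value:,.0f}."],
)
first_value, first_answer = answer_for(question, "What is revenue in East?", data)
print(first_answer)  # Revenue in East is 120.
votes = {0: 2, 1: 5}
print(select_subsequent_answer(first_value, 150.0, 10.0,
                               question.answer_templates, votes, "East"))
# East revenue now stands at 150.
print(select_subsequent_answer(first_value, 125.0, 10.0,
                               question.answer_templates, votes, "East"))
# None
```

Gating on the threshold first keeps small fluctuations from replacing an answer the user has already seen; the vote tally then only decides how a sufficiently changed value is worded.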

Assignees

Oracle Int Corp

Inventors

Classifications

  • G06F16/30 (primary)

    of unstructured textual data (document management systems G06F16/93) · CPC title

  • Templates · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9378270B2 cover?
Embodiments of the invention provide systems and methods for generating natural language insights about a set of data. More specifically, embodiments of the present invention are directed to methods and systems that transform data into insights or actionable information. The output generated by embodiments of the present invention would be equivalent to that of an observation made or insights g…
Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G06F16/30. Mapped technology areas include Physics.
When was this patent published?
It was published on Jun 28, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).