Finding data in connected corpuses using examples

US8983954B2 · US · B2

Patent metadata
Field | Value
Publication number | US-8983954-B2
Application number | US-201213443681-A
Country | US
Kind code | B2
Filing date | Apr 10, 2012
Priority date | Apr 10, 2012
Publication date | Mar 17, 2015
Grant date | Mar 17, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment, datasets are stored in a catalog. The datasets are enriched by establishing relationships among the domains in different datasets. A user searches for relevant datasets by providing examples of the domains of interest. The system identifies datasets corresponding to the user-provided examples. The system then identifies connected subsets of the datasets that are directly linked or indirectly linked through other domains. The user provides known relationship examples to filter the connected subsets and to identify the connected subsets that are most relevant to the user's query. The selected connected subsets may be further analyzed by business intelligence/analytics to create pivot tables or to process the data.
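The flow described in the abstract can be illustrated with a short Python sketch. It is not the patented implementation. The catalog contents, the helper names (`infer_domains`, `datasets_for`, `linked`, `connected_groups`), and the rule that two datasets are related whenever they share a domain name are all assumptions made for the example.

```python
# Hypothetical catalog: dataset name -> {domain name: set of known values}.
CATALOG = {
    "stores": {"city": {"Seattle", "Portland"}, "store_id": {"S1", "S2"}},
    "sales": {"store_id": {"S1", "S2"}, "product": {"Widget", "Gadget"}},
    "weather": {"city": {"Seattle", "Portland"}, "condition": {"Rain", "Sun"}},
    "suppliers": {"supplier": {"Acme"}, "country": {"US"}},
}


def infer_domains(value):
    """Return every domain whose known values contain the example value."""
    return {
        domain
        for columns in CATALOG.values()
        for domain, values in columns.items()
        if value in values
    }


def datasets_for(domains):
    """Return the datasets that contain at least one of the given domains."""
    return {name for name, cols in CATALOG.items() if domains & cols.keys()}


def linked(a, b):
    """Assumption: two datasets are related when they share a domain name."""
    return bool(CATALOG[a].keys() & CATALOG[b].keys())


def connected_groups(examples):
    """Group the matching datasets into connected subsets of the catalog.

    The traversal walks the whole catalog, so a group may include an
    intermediate dataset that matches none of the examples itself.
    """
    wanted = set()
    for value in examples:
        wanted |= infer_domains(value)
    groups, seen = [], set()
    for start in sorted(datasets_for(wanted)):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(
                other for other in CATALOG
                if other not in component and linked(node, other)
            )
        seen |= component
        groups.append(component)
    return groups
```

With the examples `["Rain", "Widget"]`, the sketch returns a single group containing `weather`, `stores`, and `sales`: `stores` matches neither example but links the other two through `city` and `store_id`, which is the indirect linkage the abstract mentions.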

First claim

Opening claim text (preview).

What is claimed is:

1. A data processing system, comprising: a processor; and a memory coupled to the processor, the memory configured to store program instructions executable by the processor to cause the data processing system to: receive a collection of values from a user; identify a data type for each of the values; identify distinct datasets that correspond to the data types, each of the distinct datasets data set having one or more of the data types; identify relationships among the distinct datasets, the relationships corresponding to links between similar data types in the distinct datasets; provide, to the user, a list of proposed groups of datasets, wherein the datasets within each proposed group are linked to each other through one or more relationships; receive an example value set from the user, the example value set corresponding to a known relationship between two or more data types, and the example value set including at least two values; re-interpret the one or more relationships based upon the example value set; and provide, to the user, a second list of a second proposed group of datasets based upon the re-interpretation, wherein the datasets within the second proposed group include the example value set.

2. The data processing system of claim 1, further comprising: receive a user selection of one of the proposed dataset groups.

3. The data processing system of claim 2, further comprising: combine the datasets within the selected proposed dataset group into a new dataset.

4. The data processing system of claim 2, further comprising: combine subsets of the datasets within the selected proposed dataset group into a new dataset.

5. The data processing system of claim 1, wherein the list of proposed groups of datasets are linked to each other through the relationships to intermediate datasets that do not have data types corresponding to the values received from the user.

6. The data processing system of claim 1, wherein the list of proposed groups of datasets comprises datasets that have overlapping data types corresponding to the values received from the user.

7. The data processing system of claim 1, further comprising: rank the list of proposed groups of datasets, the ranking based upon weights assigned to one or more of the data types, datasets, and values received from the user.

8. A method, comprising: performing, by a processor in a computer system: identifying a collection of domains corresponding to a collection of values; identifying distinct datasets corresponding to at least one of the domains; identifying relationships among the distinct datasets, the relationships corresponding to links between similar domains in the distinct datasets; identifying groups of datasets among the distinct datasets, wherein the datasets within each proposed group are linked to each other through one or more relationships; receiving an example value set, the example value set corresponding to a known relationship between two or more domains, the example value set including at least two values; re-interpreting the one or more relationships based upon the example value set; and identifying one or more proposed groups of datasets based upon the re-interpretation, wherein the at least two values of the example value set are found within the datasets of the proposed groups.

9. The method of claim 8, wherein all of the values of the example value set are found within the datasets of the proposed groups.

10. The method of claim 8, wherein at least one of the values of the example value set is found within the datasets of the proposed groups.

11. The method of claim 8, further comprising: receiving a user selection of one of the proposed dataset groups.

12. The method of claim 11, further comprising: combining the datasets within the selected dataset group into a new dataset.

13. The method of claim 11, further comprising: combining subsets of the datasets within the selected dataset group into a new dataset.

14. The method of claim 8, wherein the datasets in the proposed groups are linked to each other through the relationships to intermediate datasets that do not include the domains in the collection of domains.

15. The method of claim 8, wherein the datasets in the proposed groups comprise datasets that have overlapping domains corresponding to the collection of values.

16. The method of claim 8, further comprising: ranking the proposed groups of datasets.

17. An article of manufacture having computer-executable instructions stored thereon that, upon execution by at least one processor of a computer system, cause the computer system to: identify a collection of data types corresponding to a collection of values; identify distinct datasets corresponding to at least one of the data types; identify relationships among the distinct datasets, the relationships corresponding to links between similar data types in the distinct datasets; identify groups of datasets among the distinct datasets, wherein the datasets within each proposed group are linked to each other through one or more relationships; receive an example value set from the user, the example value set corresponding to a known relationship between two or more data types, the example value set including at least two values; re-interpret the one or more relationships based upon the example value set; and identify one or more proposed groups of datasets based upon the re-interpretation, wherein the at least two values of the example value set are found within the datasets of the proposed groups.

18. The article of manufacture of claim 17, wherein the computer-executable instructions, upon execution, further cause the computer system to: receive a user selection of one of the proposed dataset groups; and combining subsets of the datasets within the selected dataset group into a new dataset.
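The filtering and ranking steps recited in the claims (an example value set of at least two values narrows the proposed groups, and weights order what remains) can be sketched as follows. This is an illustration under stated assumptions, not the claimed method: the `GROUPS` data, the `WEIGHTS` table, and the function names are hypothetical, and the "re-interpret the relationships" step is reduced here to a simple check that every example value occurs somewhere in the group.

```python
# Hypothetical proposed groups: group name -> {dataset name: list of rows}.
GROUPS = {
    "retail": {
        "stores": [{"city": "Seattle", "store_id": "S1"}],
        "sales": [{"store_id": "S1", "product": "Widget"}],
    },
    "logistics": {
        "shipments": [{"destination": "Seattle", "product": "Widget"}],
    },
    "climate": {
        "weather": [{"city": "Seattle", "condition": "Rain"}],
    },
}

# Hypothetical per-domain weights, used only for ranking.
WEIGHTS = {"city": 1.0, "product": 2.0, "destination": 0.5}


def domains_hit(group, value):
    """Return the domains (column names) in which the value occurs in the group."""
    return {
        column
        for rows in group.values()
        for row in rows
        for column, cell in row.items()
        if cell == value
    }


def filter_and_rank(groups, example_values):
    """Keep groups containing every example value, ordered by summed domain weight."""
    if len(example_values) < 2:
        raise ValueError("an example value set needs at least two values")
    scored = []
    for name, group in groups.items():
        hits = [domains_hit(group, value) for value in example_values]
        if all(hits):  # every example value was found in at least one domain
            score = sum(WEIGHTS.get(domain, 0.0) for domain in set().union(*hits))
            scored.append((score, name))
    # Highest score first; ties broken alphabetically by group name.
    return [name for score, name in sorted(scored, key=lambda p: (-p[0], p[1]))]


print(filter_and_rank(GROUPS, ["Seattle", "Widget"]))  # ['retail', 'logistics']
```

In this sketch `climate` is dropped because it never contains "Widget", and `retail` outranks `logistics` because "Seattle" matches the more heavily weighted `city` domain there rather than `destination`.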

Assignees

Inventors

Classifications

  • using ranking · CPC title

  • Search customisation based on user profiles and personalisation · CPC title

  • Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually · CPC title

  • G06F16/634 · Primary

    Query by example, e.g. query by humming · CPC title

  • Query processing support for facilitating data mining operations in structured databases · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US8983954B2 cover?
In one embodiment, datasets are stored in a catalog. The datasets are enriched by establishing relationships among the domains in different datasets. A user searches for relevant datasets by providing examples of the domains of interest. The system identifies datasets corresponding to the user-provided examples. The system then identifies connected subsets of the datasets that are directly link…
Who is the assignee on this patent?
Platt John C, Chaudhuri Surajit, Novik Lev, and 5 more
What technology area does this patent fall under?
Primary CPC classification G06F16/634. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 17, 2015 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).