Understanding tables for search

US10853344B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10853344-B2
Application number  US-201715661269-A
Country             US
Kind code           B2
Filing date         Jul 27, 2017
Priority date       Jun 30, 2014
Publication date    Dec 1, 2020
Grant date          Dec 1, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present invention extends to methods, systems, and computer program products for understanding tables for search. Aspects of the invention include identifying a subject tuple (e.g., a subject column) for a table, detecting a tuple header (e.g., a column header) using other tables, and detecting a tuple header (e.g., a column header) using a knowledge base. Implementations can be utilized in a structured data search system (SDSS) that indexes structured information, such as, tables in a relational database or html tables extracted from web pages. The SDSS allows users to search over the structured information (tables) using different mechanisms including keyword search and data finding data.
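Claims 1 and 13 below make the first aspect, identifying a subject column, concrete. A column becomes a candidate when its distinctness metric (for example, the ratio of distinct cell values to total cells) exceeds a threshold. The candidate is then classified using a co-occurrence score built from how often its values appear in the subject columns of other tables. The Python sketch below illustrates that flow under stated assumptions. The `Table` model, the function names, both threshold values, and the exact score formula are hypothetical choices, not the patent's specified implementation. The assumed formula takes, for each value, its subject-column appearances divided by its total appearances, then averages over the column. The claims also name the count of the most repeated value as an alternative distinctness measure; the sketch uses only the ratio.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Table:
    """Hypothetical table model: a list of columns, each a list of cell strings."""
    columns: List[List[str]]
    subject_index: Optional[int] = None  # known subject column, if any


def distinct_ratio(column: Sequence[str]) -> float:
    """Number of distinct cell values divided by the total number of cells."""
    return len(set(column)) / len(column) if column else 0.0


def cooccurrence_score(column: Sequence[str], corpus: Sequence[Table]) -> float:
    """Mean, over the column's distinct values, of
    (tables where the value sits in the subject column) /
    (tables where the value appears in any column).
    This formula is an assumption; the claims do not fix one."""
    values = set(column)
    in_subject: Counter = Counter()
    anywhere: Counter = Counter()
    for table in corpus:
        seen = set()
        for col in table.columns:
            seen.update(v for v in col if v in values)
        anywhere.update(seen)  # count each table at most once per value
        if table.subject_index is not None:
            in_subject.update(values & set(table.columns[table.subject_index]))
    # Values that never occur in the other tables carry no evidence and are skipped.
    ratios = [in_subject[v] / anywhere[v] for v in values if anywhere[v]]
    return sum(ratios) / len(ratios) if ratios else 0.0


def classify_column(
    column: Sequence[str],
    corpus: Sequence[Table],
    distinct_threshold: float = 0.9,  # hypothetical value
    score_threshold: float = 0.5,  # hypothetical value
) -> bool:
    """Return True if the column is classified as the table's subject column."""
    # Step 1: only sufficiently distinct columns become candidates.
    if distinct_ratio(column) <= distinct_threshold:
        return False
    # Step 2: classify the candidate by its co-occurrence score.
    return cooccurrence_score(column, corpus) > score_threshold
```

Dividing by total appearances keeps values that occur everywhere from dominating the score, which is one reasonable reading of the claims' "number of occurrences of the values in the plurality of other tables."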

First claim

Opening claim text (preview).

What is claimed:

1. A method for use at a computer system, the method comprising: receiving one or more tables, wherein the one or more tables include tuples that are not expressly defined as one of a subject tuple or a non-subject tuple; calculating a distinctness metric for a tuple from a table, wherein the distinctness metric indicates distinctness of cell values in the tuple; selecting the tuple as a potential subject tuple of the table based at least in part on determining that the distinctness metric for the tuple exceeds a threshold; determining a co-occurrence for values in the tuple by determining how often the values in the tuple are included in subject tuples in a plurality of other tables, the determined co-occurrence indicating a likelihood of the tuple being a subject tuple; calculating a co-occurrence score for the values in the tuple based on the co-occurrence and a number of occurrences of the values in the plurality of other tables; and classifying the tuple as one of a subject tuple of the table or a non-subject tuple of the table based on the determined co-occurrence score.

2. The method of claim 1, wherein selecting the tuple from the table as potentially being a subject tuple of the table comprises selecting a column from the table as potentially being a subject column of the table; wherein determining the co-occurrence for the values in the tuple by determining how often the values in the tuple are included in subject tuples in the plurality of other tables comprises determining a co-occurrence for values in the column by determining how often the values in the column are included in subject columns in the plurality of other tables; and wherein classifying the tuple as one of a subject tuple of the table or a non-subject tuple of the table based on the determined co-occurrence comprises classifying the column as one of a subject column of the table or a non-subject column of the table based on the determined co-occurrence.

3. The method of claim 1, further comprising calculating a score for the tuple based on the determined co-occurrence, the calculated score indicating a likelihood of a candidate subject tuple being a true subject tuple; and wherein classifying the tuple as one of: a subject tuple of the table or a non-subject tuple of the table based on the determined co-occurrence comprises classifying the tuple based on the calculated score for the tuple.

4. The method of claim 1, further comprising annotating the table with the tuple based on classification of the tuple.

5. The method of claim 1, further comprising using the tuple to index the table based on classification of the tuple.

6. A method for use at a computer system, a first one or more tuples and a second one or more tuples forming a table at the computer system, the method comprising: constructing a set of candidate tuple names for the first one or more tuples; for each candidate tuple name in the set of candidate tuple names: searching a plurality of other tables for use of the candidate tuple name; calculating a first metric that indicates how many of the plurality of other tables comprise the candidate tuple name used as a tuple name or a first row of data; and calculating a second metric that indicates how many of the plurality of other tables comprise the candidate tuple name used as something other than a tuple name or a first row of data; and selecting a tuple, from among the second one or more tuples, as a header tuple for the first one or more tuples when a fraction of values in the tuple for which the first metric is greater than the second metric exceeds a defined threshold value.

7. The method of claim 6, further comprising inferring that another tuple, from among the first one or more tuples, is a hypernym of cell values contained in the other tuple based on the cell values contained in the other tuple; and wherein selecting a tuple, from among the second one or more tuples, as a header tuple for the first one or more tuples comprises selecting a tuple, from among the second one or more tuples, as the header tuple based on the inference.

8. The method of claim 7, wherein inferring that another tuple, from among the first one or more tuples, is the hypernym of cell values contained in the other tuple comprises referring to a knowledge base to infer that another tuple, from among the first one or more tuples, is the hypernym of cell values contained in the other tuple.

9. The method of claim 8, wherein referring to the knowledge base comprises extracting one or more concept attributes and one or more instance attributes from the knowledge base.

10. The method of claim 6, further comprising annotating the table with the selected tuple.

11. The method of claim 6, further comprising using the selected tuple to index the table.

12. The method of claim 6, further comprising computing a feature of the table from the selected tuple.

13. A system, comprising: one or more processors; system memory coupled to one or more hardware processors, the system memory storing instructions that are executable by the one or more hardware processors; the one or more processors executing the instructions stored in the system memory to: receive one or more tables, wherein the one or more tables include tuples that are not expressly defined as one of a subject tuple or a non-subject tuple; calculate a distinctness metric for a tuple from a table of the one or more tables, wherein the distinctness metric indicates distinctness of cell values in the tuple and includes one or more of a ratio of a number of distinct cell values to total number of cells and a number of occurrences of a most repeated value; select the tuple as potentially being a subject tuple of the table based at least in part on determining that the distinctness metric for the tuple exceeds a threshold; determine a co-occurrence for values in the tuple by determining how often the values in the tuple are included in subject tuples in a plurality of other tables, the determined co-occurrence indicating a likelihood of the tuple being a subject tuple; calculate a co-occurrence score for the values in the tuple based on the co-occurrence and a number of occurrences of the values in the plurality of other tables; classify the tuple as one of a subject tuple of the table or a non-subject tuple of the table based on the determined co-occurrence score; modify an index based on the classification of the tuple; and provide the index for use by a search system that receives user searches.

14. The system of claim 13, wherein the one or more processors executing the instructions stored in the system memory to select a tuple from the table as potentially being a subject tuple of the table comprises the one or more processors executing the instructions stored in the system memory to select a column from the table as potentially being a subject column of the table; wherein the one or more processors executing the instructions stored in the system memory to determine the co-occurrence for the values in the tuple by determining how often the values in the tuple are included in subject tuples in the plurality of other tables comprises the one or more processors executing the instructions stored in the system memory to determine how often values in the column are included in subject columns in the plurality of other tables; and wherein the one or more processors executing the instructions stored in the system memory to classify the tuple as one of: a subject tuple of the table or a non-subject tuple of the table based on the determined co-occurrence comprises the one or mo…
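Claim 6 detects a header row by consulting other tables. For each candidate name, it counts the other tables that use the name as a header or first row (the first metric) and those that use it anywhere else (the second metric). A row is accepted as the header when the fraction of its values whose first metric exceeds the second passes a threshold. The minimal Python sketch below rests on three assumptions: each table is a plain list of rows, a candidate row's own cell values serve as the candidate names, and the threshold is a hypothetical 0.5. The function names are illustrative only.

```python
from typing import List, Sequence, Tuple

Row = List[str]  # hypothetical: one table row as a list of cell strings


def header_metrics(name: str, corpus: Sequence[List[Row]]) -> Tuple[int, int]:
    """Return (first, second): how many other tables contain `name` in their
    first row (a header or the first row of data), and how many contain it
    in any later row. A table may count toward both metrics."""
    first = second = 0
    for rows in corpus:
        if not rows:
            continue
        if name in rows[0]:
            first += 1
        if any(name in row for row in rows[1:]):
            second += 1
    return first, second


def is_header_row(row: Row, corpus: Sequence[List[Row]], threshold: float = 0.5) -> bool:
    """Accept `row` as a header when the fraction of its values whose first
    metric is greater than the second metric exceeds `threshold`."""
    if not row:
        return False
    votes = 0
    for value in row:
        first, second = header_metrics(value, corpus)
        if first > second:
            votes += 1
    return votes / len(row) > threshold
```

Counting a table's first row under the first metric even when that row holds data (as in a headerless table) follows the claim's wording "a tuple name or a first row of data"; it is the second metric that keeps frequent data values from being mistaken for headers.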

Assignees

Microsoft Technology Licensing, LLC

Classifications (CPC titles)

  • Comparing digital values (G06F7/06, {G06F7/22,} G06F7/38 take precedence)

  • G06F16/951 (primary): Indexing; Web crawling techniques

  • Tablespace storage structures; Management thereof

  • Relational databases

  • G06F16/00 (primary): Information retrieval; Database structures therefor; File system structures therefor

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10853344B2 cover?
The present invention extends to methods, systems, and computer program products for understanding tables for search. Aspects of the invention include identifying a subject tuple (e.g., a subject column) for a table, detecting a tuple header (e.g., a column header) using other tables, and detecting a tuple header (e.g., a column header) using a knowledge base. Implementations can be utilized in…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/951. Mapped technology areas include Physics.
When was this patent published?
Published Dec 1, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).