Who is the assignee on this patent?

Microsoft Technology Licensing, LLC

What technology area does this patent fall under?

Primary CPC classification G06F16/951. Mapped technology areas include Physics.

When was this patent published?

Publication date: Dec 1, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Understanding tables for search

US10853344B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10853344-B2
Application number	US-201715661269-A
Country	US
Kind code	B2
Filing date	Jul 27, 2017
Priority date	Jun 30, 2014
Publication date	Dec 1, 2020
Grant date	Dec 1, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present invention extends to methods, systems, and computer program products for understanding tables for search. Aspects of the invention include identifying a subject tuple (e.g., a subject column) for a table, detecting a tuple header (e.g., a column header) using other tables, and detecting a tuple header (e.g., a column header) using a knowledge base. Implementations can be utilized in a structured data search system (SDSS) that indexes structured information, such as, tables in a relational database or html tables extracted from web pages. The SDSS allows users to search over the structured information (tables) using different mechanisms including keyword search and data finding data.
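To make the subject-column idea concrete, here is a minimal Python sketch of the steps recited in the claims on this page: compute a distinctness metric, keep columns above a threshold as candidates, then score candidates by how often their values appear in subject columns of other tables. The function names, the threshold values, the table representation (a dict of column name to cell list), and the exact score formula (subject-column occurrences divided by all occurrences) are illustrative assumptions; the patent text shown here does not specify them.

```python
from collections import Counter


def distinctness(cells):
    """Ratio of distinct cell values to total number of cells."""
    return len(set(cells)) / len(cells) if cells else 0.0


def most_repeated_count(cells):
    """Occurrences of the most repeated value (an alternative distinctness signal)."""
    return max(Counter(cells).values()) if cells else 0


def cooccurrence_score(cells, other_tables):
    """Assumed formula: share of the values' occurrences in other tables
    that fall inside those tables' known subject columns."""
    in_subject = 0
    occurrences = 0
    for value in set(cells):
        for columns, subject_name in other_tables:
            for name, other_cells in columns.items():
                if value in other_cells:
                    occurrences += 1
                    if name == subject_name:
                        in_subject += 1
    return in_subject / occurrences if occurrences else 0.0


def classify_columns(table, other_tables, distinct_min=0.9, score_min=0.5):
    """Map each column name to True (subject) or False (non-subject).
    Both thresholds are hypothetical defaults."""
    result = {}
    for name, cells in table.items():
        candidate = distinctness(cells) > distinct_min
        result[name] = candidate and cooccurrence_score(cells, other_tables) > score_min
    return result


# Hypothetical data: each other table is (columns, name of its subject column).
table = {
    "country": ["France", "Japan", "Brazil"],
    "continent": ["Europe", "Asia", "South America"],
    "region": ["EU", "Asia", "EU"],
}
other_tables = [
    ({"country": ["France", "Japan"], "capital": ["Paris", "Tokyo"]}, "country"),
    ({"nation": ["Brazil", "France"], "continent": ["South America", "Europe"]}, "nation"),
]

print(classify_columns(table, other_tables))
# {'country': True, 'continent': False, 'region': False}
```

In this toy data, "region" fails the distinctness test, while "continent" passes it but its values never appear in another table's subject column, so only "country" is classified as a subject column.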

First claim

Opening claim text (preview).

What is claimed:

1. A method for use at a computer system, the method comprising: receiving one or more tables, wherein the one or more tables include tuples that are not expressly defined as one of a subject tuple or a non-subject tuple; calculating a distinctness metric for a tuple from a table, wherein the distinctness metric indicates distinctness of cell values in the tuple; selecting the tuple as a potential subject tuple of the table based at least in part on determining that the distinctness metric for the tuple exceeds a threshold; determining a co-occurrence for values in the tuple by determining how often the values in the tuple are included in subject tuples in a plurality of other tables, the determined co-occurrence indicating a likelihood of the tuple being a subject tuple; calculating a co-occurrence score for the values in the tuple based on the co-occurrence and a number of occurrences of the values in the plurality of other tables; and classifying the tuple as one of a subject tuple of the table or a non-subject tuple of the table based on the determined co-occurrence score.

2. The method of claim 1, wherein selecting the tuple from the table as potentially being a subject tuple of the table comprises selecting a column from the table as potentially being a subject column of the table; wherein determining the co-occurrence for the values in the tuple by determining how often the values in the tuple are included in subject tuples in the plurality of other tables comprises determining a co-occurrence for values in the column by determining how often the values in the column are included in subject columns in the plurality of other tables; and wherein classifying the tuple as one of a subject tuple of the table or a non-subject tuple of the table based on the determined co-occurrence comprises classifying the column as one of a subject column of the table or a non-subject column of the table based on the determined co-occurrence.

3. The method of claim 1, further comprising calculating a score for the tuple based on the determined co-occurrence, the calculated score indicating a likelihood of a candidate subject tuple being a true subject tuple; and wherein classifying the tuple as one of: a subject tuple of the table or a non-subject tuple of the table based on the determined co-occurrence comprises classifying the tuple based on the calculated score for the tuple.

4. The method of claim 1, further comprising annotating the table with the tuple based on classification of the tuple.

5. The method of claim 1, further comprising using the tuple to index the table based on classification of the tuple.

6. A method for use at a computer system, a first one or more tuples and a second one or more tuples forming a table at the computer system, the method comprising: constructing a set of candidate tuple names for the first one or more tuples; for each candidate tuple name in the set of candidate tuple names: searching a plurality of other tables for use of the candidate tuple name; calculating a first metric that indicates how many of the plurality of other tables comprise the candidate tuple name used as a tuple name or a first row of data; and calculating a second metric that indicates how many of the plurality of other tables comprise the candidate tuple name used as something other than a tuple name or a first row of data; and selecting a tuple, from among the second one or more tuples, as a header tuple for the first one or more tuples when a fraction of values in the tuple for which the first metric is greater than the second metric exceeds a defined threshold value.

7. The method of claim 6, further comprising inferring that another tuple, from among the first one or more tuples, is a hypernym of cell values contained in the other tuple based on the cell values contained in the other tuple; and wherein selecting a tuple, from among the second one or more tuples, as a header tuple for the first one or more tuples comprises selecting a tuple, from among the second one or more tuples, as the header tuple based on the inference.

8. The method of claim 7, wherein inferring that another tuple, from among the first one or more tuples, is the hypernym of cell values contained in the other tuple comprises referring to a knowledge base to infer that another tuple, from among the first one or more tuples, is the hypernym of cell values contained in the other tuple.

9. The method of claim 8, wherein referring to the knowledge base comprises extracting one or more concept attributes and one or more instance attributes from the knowledge base.

10. The method of claim 6, further comprising annotating the table with the selected tuple.

11. The method of claim 6, further comprising using the selected tuple to index the table.

12. The method of claim 6, further comprising computing a feature of the table from the selected tuple.

13. A system, comprising: one or more processors; system memory coupled to one or more hardware processors, the system memory storing instructions that are executable by the one or more hardware processors; the one or more processors executing the instructions stored in the system memory to: receive one or more tables, wherein the one or more tables include tuples that are not expressly defined as one of a subject tuple or a non-subject tuple; calculate a distinctness metric for a tuple from a table of the one or more tables, wherein the distinctness metric indicates distinctness of cell values in the tuple and includes one or more of a ratio of a number of distinct cell values to total number of cells and a number of occurrences of a most repeated value; select the tuple as potentially being a subject tuple of the table based at least in part on determining that the distinctness metric for the tuple exceeds a threshold; determine a co-occurrence for values in the tuple by determining how often the values in the tuple are included in subject tuples in a plurality of other tables, the determined co-occurrence indicating a likelihood of the tuple being a subject tuple; calculate a co-occurrence score for the values in the tuple based on the co-occurrence and a number of occurrences of the values in the plurality of other tables; classify the tuple as one of a subject tuple of the table or a non-subject tuple of the table based on the determined co-occurrence score; modify an index based on the classification of the tuple; and provide the index for use by a search system that receives user searches.

14. The system of claim 13, wherein the one or more processors executing the instructions stored in the system memory to select a tuple from the table as potentially being a subject tuple of the table comprises the one or more processors executing the instructions stored in the system memory to select a column from the table as potentially being a subject column of the table; wherein the one or more processors executing the instructions stored in the system memory to determine the co-occurrence for the values in the tuple by determining how often the values in the tuple are included in subject tuples in the plurality of other tables comprises the one or more processors executing the instructions stored in the system memory to determine how often values in the column are included in subject columns in the plurality of other tables; and wherein the one or more processors executing the instructions stored in the system memory to classify the tuple as one of: a subject tuple of the table or a non-subject tuple of the table based on the determined co-occurrence comprises the one or mo
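The header-detection test in claim 6 can be sketched in Python as follows. For each value in a candidate header row, the sketch counts how many other tables use that value as a column name or in their first row of data (the first metric) and how many use it anywhere else (the second metric); the row is accepted as a header when the fraction of values whose first metric exceeds the second is above a threshold. The function names, the representation of other tables as (header, rows) pairs, and the 0.5 default threshold are assumptions made for illustration, not details taken from the patent.

```python
def header_metrics(name, other_tables):
    """Return (first_metric, second_metric) for one candidate name.

    other_tables: list of (header, rows); header may be an empty list
    when a table has no declared column names.
    """
    as_header = 0   # tables using the name as a column name or in the first data row
    elsewhere = 0   # tables using the name anywhere else
    for header, rows in other_tables:
        top = set(header) | (set(rows[0]) if rows else set())
        rest = {cell for row in rows[1:] for cell in row}
        if name in top:
            as_header += 1
        if name in rest:
            elsewhere += 1
    return as_header, elsewhere


def is_header_row(row, other_tables, threshold=0.5):
    """True when enough values in the row look like header names elsewhere."""
    if not row:
        return False
    wins = 0
    for name in row:
        first, second = header_metrics(name, other_tables)
        if first > second:
            wins += 1
    return wins / len(row) > threshold


# Hypothetical corpus of other tables.
corpus = [
    (["Name", "Population"], [["Tokyo", "14M"], ["Paris", "2.1M"]]),
    ([], [["Name", "Capital"], ["France", "Paris"]]),
    (["City", "Population"], [["Oslo", "0.7M"]]),
]

print(is_header_row(["Name", "Population"], corpus))  # True
print(is_header_row(["Paris", "2.1M"], corpus))       # False
```

Counting tables rather than individual cell occurrences follows the claim wording ("how many of the plurality of other tables"); a production system would presumably normalize cell text before matching, which this sketch omits.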

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

G06F7/02
Comparing digital values (G06F7/06, {G06F7/22,} G06F7/38 take precedence) · CPC title
G06F16/951 · Primary
Indexing; Web crawling techniques · CPC title
G06F16/2282 · Primary
Tablespace storage structures; Management thereof · CPC title
G06F16/284
Relational databases · CPC title
G06F16/00 · Primary
Information retrieval; Database structures therefor; File system structures therefor · CPC title

Patent family

Related publications grouped by family.

View patent family 54930742

External sources

Related patents

Sponsor answers and user-approved, system-suggested links in a social search engine

Ranking tables for keyword search

Searching for join candidates
