Sponsor answers and user-approved, system-suggested links in a social search engine
US-2017262529-A1 · Sep 14, 2017 · US
US10853344B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10853344-B2 |
| Application number | US-201715661269-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 27, 2017 |
| Priority date | Jun 30, 2014 |
| Publication date | Dec 1, 2020 |
| Grant date | Dec 1, 2020 |
Official abstract text for this publication.
The present invention extends to methods, systems, and computer program products for understanding tables for search. Aspects of the invention include identifying a subject tuple (e.g., a subject column) for a table, detecting a tuple header (e.g., a column header) using other tables, and detecting a tuple header (e.g., a column header) using a knowledge base. Implementations can be utilized in a structured data search system (SDSS) that indexes structured information, such as tables in a relational database or HTML tables extracted from web pages. The SDSS allows users to search over the structured information (tables) using different mechanisms, including keyword search and data finding data.
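Header detection using other tables is spelled out in claim 6: for each candidate name, one metric counts the other tables that use it as a tuple name or in a first row of data, a second metric counts the other tables that use it anywhere else, and a row is selected as the header when the fraction of its values whose first metric exceeds the second is above a threshold. The following is a minimal Python sketch of that idea, not the patented implementation. It assumes each table is a list of rows whose top row stands in for "tuple name or first row of data"; the function names and the 0.5 threshold are hypothetical choices made for illustration.

```python
def header_metrics(name, other_tables):
    """Return (first_metric, second_metric) for one candidate name.

    first_metric: number of other tables whose top row contains the name.
    second_metric: number of other tables that contain the name below the top row.
    Assumption: each table is a list of rows (lists of strings).
    """
    as_header = 0
    as_other = 0
    for rows in other_tables:
        if not rows:
            continue
        if name in rows[0]:
            as_header += 1
        if any(name in row for row in rows[1:]):
            as_other += 1
    return as_header, as_other


def is_header_row(row, other_tables, threshold=0.5):
    """True when the fraction of values that look like headers exceeds the threshold."""
    if not row:
        return False
    votes = 0
    for value in row:
        as_header, as_other = header_metrics(value, other_tables)
        if as_header > as_other:
            votes += 1
    return votes / len(row) > threshold


def detect_header(table_rows, other_tables, threshold=0.5):
    """Return the index of the first row classified as a header, or None."""
    for index, row in enumerate(table_rows):
        if is_header_row(row, other_tables, threshold):
            return index
    return None


# Hypothetical corpus of previously seen tables.
other_tables = [
    [["Country", "Capital"], ["France", "Paris"]],
    [["Country", "Population"], ["Japan", "125"]],
    [["City", "Country"], ["Lima", "Peru"]],
]
table = [["Country", "Capital"], ["Peru", "Lima"]]
header_index = detect_header(table, other_tables)
```

In this toy corpus, "Country" and "Capital" appear only in top rows, so the first row of `table` is selected, while "Peru" and "Lima" appear only as data and would not be.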
Opening claim text (preview).
What is claimed:
1. A method for use at a computer system, the method comprising: receiving one or more tables, wherein the one or more tables include tuples that are not expressly defined as one of a subject tuple or a non-subject tuple; calculating a distinctness metric for a tuple from a table, wherein the distinctness metric indicates distinctness of cell values in the tuple; selecting the tuple as a potential subject tuple of the table based at least in part on determining that the distinctness metric for the tuple exceeds a threshold; determining a co-occurrence for values in the tuple by determining how often the values in the tuple are included in subject tuples in a plurality of other tables, the determined co-occurrence indicating a likelihood of the tuple being a subject tuple; calculating a co-occurrence score for the values in the tuple based on the co-occurrence and a number of occurrences of the values in the plurality of other tables; and classifying the tuple as one of a subject tuple of the table or a non-subject tuple of the table based on the determined co-occurrence score.
2. The method of claim 1, wherein selecting the tuple from the table as potentially being a subject tuple of the table comprises selecting a column from the table as potentially being a subject column of the table; wherein determining the co-occurrence for the values in the tuple by determining how often the values in the tuple are included in subject tuples in the plurality of other tables comprises determining a co-occurrence for values in the column by determining how often the values in the column are included in subject columns in the plurality of other tables; and wherein classifying the tuple as one of a subject tuple of the table or a non-subject tuple of the table based on the determined co-occurrence comprises classifying the column as one of a subject column of the table or a non-subject column of the table based on the determined co-occurrence.
3. The method of claim 1, further comprising calculating a score for the tuple based on the determined co-occurrence, the calculated score indicating a likelihood of a candidate subject tuple being a true subject tuple; and wherein classifying the tuple as one of: a subject tuple of the table or a non-subject tuple of the table based on the determined co-occurrence comprises classifying the tuple based on the calculated score for the tuple.
4. The method of claim 1, further comprising annotating the table with the tuple based on classification of the tuple.
5. The method of claim 1, further comprising using the tuple to index the table based on classification of the tuple.
6. A method for use at a computer system, a first one or more tuples and a second one or more tuples forming a table at the computer system, the method comprising: constructing a set of candidate tuple names for the first one or more tuples; for each candidate tuple name in the set of candidate tuple names: searching a plurality of other tables for use of the candidate tuple name; calculating a first metric that indicates how many of the plurality of other tables comprise the candidate tuple name used as a tuple name or a first row of data; and calculating a second metric that indicates how many of the plurality of other tables comprise the candidate tuple name used as something other than a tuple name or a first row of data; and selecting a tuple, from among the second one or more tuples, as a header tuple for the first one or more tuples when a fraction of values in the tuple for which the first metric is greater than the second metric exceeds a defined threshold value.
7. The method of claim 6, further comprising inferring that another tuple, from among the first one or more tuples, is a hypernym of cell values contained in the other tuple based on the cell values contained in the other tuple; and wherein selecting a tuple, from among the second one or more tuples, as a header tuple for the first one or more tuples comprises selecting a tuple, from among the second one or more tuples, as the header tuple based on the inference.
8. The method of claim 7, wherein inferring that another tuple, from among the first one or more tuples, is the hypernym of cell values contained in the other tuple comprises referring to a knowledge base to infer that another tuple, from among the first one or more tuples, is the hypernym of cell values contained in the other tuple.
9. The method of claim 8, wherein referring to the knowledge base comprises extracting one or more concept attributes and one or more instance attributes from the knowledge base.
10. The method of claim 6, further comprising annotating the table with the selected tuple.
11. The method of claim 6, further comprising using the selected tuple to index the table.
12. The method of claim 6, further comprising computing a feature of the table from the selected tuple.
13. A system, comprising: one or more processors; system memory coupled to one or more hardware processors, the system memory storing instructions that are executable by the one or more hardware processors; the one or more processors executing the instructions stored in the system memory to: receive one or more tables, wherein the one or more tables include tuples that are not expressly defined as one of a subject tuple or a non-subject tuple; calculate a distinctness metric for a tuple from a table of the one or more tables, wherein the distinctness metric indicates distinctness of cell values in the tuple and includes one or more of a ratio of a number of distinct cell values to total number of cells and a number of occurrences of a most repeated value; select the tuple as potentially being a subject tuple of the table based at least in part on determining that the distinctness metric for the tuple exceeds a threshold; determine a co-occurrence for values in the tuple by determining how often the values in the tuple are included in subject tuples in a plurality of other tables, the determined co-occurrence indicating a likelihood of the tuple being a subject tuple; calculate a co-occurrence score for the values in the tuple based on the co-occurrence and a number of occurrences of the values in the plurality of other tables; classify the tuple as one of a subject tuple of the table or a non-subject tuple of the table based on the determined co-occurrence score; modify an index based on the classification of the tuple; and provide the index for use by a search system that receives user searches.
14. The system of claim 13, wherein the one or more processors executing the instructions stored in the system memory to select a tuple from the table as potentially being a subject tuple of the table comprises the one or more processors executing the instructions stored in the system memory to select a column from the table as potentially being a subject column of the table; wherein the one or more processors executing the instructions stored in the system memory to determine the co-occurrence for the values in the tuple by determining how often the values in the tuple are included in subject tuples in the plurality of other tables comprises the one or more processors executing the instructions stored in the system memory to determine how often values in the column are included in subject columns in the plurality of other tables; and wherein the one or more processors executing the instructions stored in the system memory to classify the tuple as one of: a subject tuple of the table or a non-subject tuple of the table based on the determined co-occurrence comprises the one or mo
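The subject-column steps recited in claims 1 and 13 (distinctness metric, threshold-based candidate selection, co-occurrence score, classification) can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not the patented method: the claims do not give a formula for the co-occurrence score, so the sketch assumes it is the average, over a column's values, of how often each value appears in a known subject column divided by how often it appears at all in the other tables. The function names, the corpus layout, and both thresholds are hypothetical.

```python
from collections import Counter


def distinctness(column):
    """Ratio of distinct cell values to total cells (one option named in claim 13)."""
    return len(set(column)) / len(column) if column else 0.0


def cooccurrence_score(column, corpus):
    """Assumed score: mean of (subject-column occurrences / all occurrences) per value.

    corpus: list of (columns, subject_index) pairs, where columns is a list of
    columns and subject_index marks the already-known subject column.
    Values never seen in the corpus are ignored; if none are seen, the score is 0.0.
    """
    in_subject = Counter()
    anywhere = Counter()
    for columns, subject_index in corpus:
        for index, other_column in enumerate(columns):
            for value in set(other_column):
                anywhere[value] += 1
                if index == subject_index:
                    in_subject[value] += 1
    seen = [value for value in set(column) if anywhere[value] > 0]
    if not seen:
        return 0.0
    return sum(in_subject[value] / anywhere[value] for value in seen) / len(seen)


def classify_columns(table, corpus, distinct_threshold=0.8, score_threshold=0.5):
    """Map each column index to True (subject column) or False (non-subject column)."""
    result = {}
    for index, column in enumerate(table):
        is_candidate = distinctness(column) > distinct_threshold
        score = cooccurrence_score(column, corpus) if is_candidate else 0.0
        result[index] = is_candidate and score > score_threshold
    return result


# Hypothetical table stored column-wise: country, continent, population (millions).
table = [
    ["France", "Japan", "Peru"],
    ["Europe", "Asia", "Europe"],
    ["68", "125", "34"],
]
# Hypothetical corpus of other tables with known subject columns (index 0 in both).
corpus = [
    ([["France", "Peru"], ["Paris", "Lima"]], 0),
    ([["Japan", "France"], ["127", "68"]], 0),
]
labels = classify_columns(table, corpus)
```

Here the continent column fails the distinctness threshold because "Europe" repeats, and the population column passes it but scores 0.0 because its only value seen in the corpus ("68") never appears in a subject column. Only the country column is classified as a subject column.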
Comparing digital values (G06F7/06, {G06F7/22,} G06F7/38 take precedence) · CPC title
Indexing; Web crawling techniques · CPC title
Tablespace storage structures; Management thereof · CPC title
Relational databases · CPC title
Information retrieval; Database structures therefor; File system structures therefor · CPC title
Related publications grouped by family.