Ranking tables for keyword search

US9940365B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9940365-B2
Application number: US-201414325378-A
Country: US
Kind code: B2
Filing date: Jul 8, 2014
Priority date: Jul 8, 2014
Publication date: Apr 10, 2018
Grant date: Apr 10, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present invention extends to methods, systems, and computer program products for ranking tables for keyword search. Aspects of the invention include generating lists of candidate tables for inclusion in a search query response, computing table hit matrices, retrieving content from fields of candidate tables having keyword hits, generating ranking features of tables, and computing ranking scores for tables. Aspects of the invention can be used to match keywords against column names, to match keywords against values in subject and non-subject columns, and to match keywords against table descriptions like page titles, table captions, cell values, nearest headings and surrounding text. Which keywords are matched against which fields can depend on the table and/or the query (referred to as “late binding”).
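The first step named in the abstract, generating a list of candidate tables, is spelled out in claims 2 and 4 as an approximate score (a sum of inverse document frequencies over the query keywords found in a table) compared against a score threshold. The following is a minimal sketch of that idea, not the patented implementation. The index layout, the smoothed IDF formula, and the names `build_keyword_index`, `idf`, and `candidate_tables` are assumptions made for illustration.

```python
import math
from collections import defaultdict


def build_keyword_index(tables):
    """Map each token to the ids of tables containing it (hypothetical index layout)."""
    index = defaultdict(set)
    for table_id, text in tables.items():
        for token in text.lower().split():
            index[token].add(table_id)
    return index


def idf(keyword, index, num_tables):
    """Smoothed inverse document frequency; the exact formula is an assumption."""
    df = len(index.get(keyword, ()))
    return math.log((1 + num_tables) / (1 + df)) + 1.0


def candidate_tables(query, index, num_tables, threshold):
    """Sum IDF weights of matched keywords per table, then drop tables below the threshold."""
    scores = defaultdict(float)
    for keyword in set(query.lower().split()):
        weight = idf(keyword, index, num_tables)
        for table_id in index.get(keyword, ()):
            scores[table_id] += weight
    return {t: s for t, s in scores.items() if s >= threshold}
```

Tables that match more keywords, or rarer keywords, accumulate a higher approximate score, so only they reach the more expensive ranking stage.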

First claim

Opening claim text (preview).

What is claimed:

1. A computer system, the computer system comprising: a hardware processor; system memory coupled to the hardware processors, the system memory storing instructions that are executable by the hardware processor; the hardware processor executing the instructions stored in the system memory to rank tables for inclusion in a response to a search query including one or more keywords, including the following: access a list of candidate tables for inclusion in the search query response, the list of candidate tables having been previously selected by having an approximate ranking score meeting a score threshold to be considered a candidate table; for each candidate table: generate a hit matrix for the candidate table, including for any of the one or more keywords contained in the candidate table, determining that one or more fields in one or more parts of the candidate table contain a hit for the keyword and, for each of the one or more fields, computing a location of the keyword hit within the field; access table features for the candidate table from a feature index; generate a hit matrix overlay for the candidate table by overlaying the hit matrix with the accessed table features to distinguish keyword hits in different semantic locations inside the candidate table, overlaying the hit matrix including mapping keyword hit locations into logical regions of the candidate table; compute one or more dynamic features of the candidate table from the hit matrix overlay; and generate a ranking score for the candidate table at least from the one or more dynamic features.

2. The computer system of claim 1, further comprising the hardware processor executing the instructions stored in the system memory to: access one or more keywords of the search query; compile a list of one or more hit tables that have any keyword hit, including for each keyword in the one or more keywords, referring to a table keyword index to identify any tables including the keyword; compute an approximate ranking score for each hit table in the compiled list, the approximate ranking score for each hit table indicative of the sufficiency of the hit table as match to the search query; and formulate the list of candidate tables for further ranking by filtering out any hit tables from the compiled list having an approximate ranking score below the score threshold.

3. The computer system of claim 2, wherein the hardware processor executing the instructions stored in the system memory to compile a list of one or more hit tables that have any keyword hit comprises the hardware processor executing the instructions stored in the system memory to compile a list of one or more web tables.

4. The computer system of claim 2, wherein the hardware processor executing the instructions stored in the system memory to compute an approximate ranking score for each table in the compiled list comprises the hardware processor executing the instructions stored in the system memory to compute an approximate ranking score for each table in the compiled list by summing the inverse document frequencies for any keywords included in the table.

5. The computer system of claim 2, wherein the hardware processor executing the instructions stored in the system memory to compile a list of one or more hit tables that have any keyword hit comprises the hardware processor executing the instructions stored in the system memory to compile a list of one or more hit tables including at least one hit table that has a keyword hit in a hidden field.

6. The computer system of claim 1, further comprising for each candidate table the hardware processor executing the instructions stored in the system memory to: access a set of static features for the candidate table; and derive a set of ranking features for the candidate table from the set of static features and the one or more dynamic features; and wherein the hardware processor executing the instructions stored in the system memory to generate a ranking score for the candidate table comprises the hardware processor executing the instructions stored in the system memory to generate a ranking score for the candidate table from the set of ranking features for the candidate table.

7. The computer system of claim 6, further comprising the hardware processor executing the instructions stored in the system memory to access a set of query features for the search query; and wherein the hardware processor executing the instructions stored in the system memory to derive a set of ranking features for the candidate table comprises the hardware processor executing the instructions stored in the system memory to derive the set of ranking features from the set of static features, the one or more dynamic features, and the set of query features.

8. The computer system of claim 7, wherein the hardware processor executing the instructions stored in the system memory to access a set of query features for the search query comprises the hardware processor executing the instructions stored in the system memory to: for one or more keywords of the search query, access a translation model for the keyword; and access at least one other query feature selected from among: an indication if the keyword is a stop word, an indication if the keyword is numeric, and an indication if the keyword is alphanumeric.

9. The computer system of claim 6, wherein the hardware processor executing the instructions stored in the system memory to access a set of static features comprises the hardware processor executing the instructions stored in the system memory to access one or more of: a static rank of a web page containing the candidate table, a domain rank of the web page containing the candidate table, a click count of the web page containing the candidate table, a subject column index, number of rows in the candidate table, or a data type for each column of the candidate table.

10. The computer system of claim 6, wherein the hardware processor executing the instructions stored in the system memory to derive a set of ranking features for the candidate table comprises the hardware processor executing the instructions stored in the system memory to: determine how many subject column values have keyword hits; and for each subject column value that has a keyword hit, determine how much the subject column value overlaps with the keyword hit.

11. The computer system of claim 6, wherein the hardware processor executing the instructions stored in the system memory to derive a set of ranking features comprises the hardware processor executing the instructions stored in the system memory to: determine how many attribute column values have keyword hits; and for each attribute column value that has a keyword hit, determine an overlap ratio of how much the attribute column value overlaps with the keyword hit.

12. The computer system of claim 6, wherein the hardware processor executing the instructions stored in the system memory to derive a set of ranking features comprises the hardware processor executing the instructions stored in the system memory to: determine which portions of the candidate table description, including one or more of: caption, Uniform Resource Locator (URL), title, page level heading, other headings, or surrounding text, include keyword hits; and compute a base score for the candidate table based on which portions of the candidate table description include keyword hits.

13. The computer system of claim 1, wherein the hardware processor executing the instructions stored in the system memory to compute one or more dynamic features comprises the hardware processor executing the
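The per-table steps of claim 1 (hit matrix, overlay onto logical regions, dynamic features) together with the subject-column overlap of claim 10 can be illustrated with a small sketch. This is an illustration under stated assumptions, not the patented implementation: the table layout (a dict with `caption`, `columns`, `rows`), the region names, whitespace tokenization, and the functions `hit_matrix`, `overlay`, and `dynamic_features` are all hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase whitespace tokenization (an assumption; real systems do more)."""
    return text.lower().split()


def hit_matrix(table, keywords):
    """Return {keyword: [(field_key, token_offset), ...]} for fields containing the keyword."""
    fields = {("caption",): table["caption"]}
    for c, name in enumerate(table["columns"]):
        fields[("header", c)] = name
    for r, row in enumerate(table["rows"]):
        for c, value in enumerate(row):
            fields[("cell", r, c)] = value
    matrix = {k.lower(): [] for k in keywords}
    for key, text in fields.items():
        for offset, token in enumerate(tokenize(text)):
            if token in matrix:
                matrix[token].append((key, offset))
    return matrix


def overlay(matrix, table_features):
    """Map each hit location to a logical region using stored table features."""
    subject = table_features["subject_column"]
    labelled = []
    for keyword, hits in matrix.items():
        for key, offset in hits:
            if key[0] == "caption":
                region = "caption"
            elif key[0] == "header":
                region = "column_name"
            elif key[2] == subject:
                region = "subject_column"
            else:
                region = "attribute_column"
            labelled.append((keyword, region, key, offset))
    return labelled


def dynamic_features(table, labelled):
    """Count subject-column values with hits and measure how fully keywords cover them."""
    hits_by_cell = defaultdict(set)
    for keyword, region, key, _ in labelled:
        if region == "subject_column":
            hits_by_cell[key].add(keyword)
    ratios = []
    for (_, r, c), kws in hits_by_cell.items():
        tokens = tokenize(table["rows"][r][c])
        ratios.append(sum(t in kws for t in tokens) / len(tokens))
    return {
        "subject_hit_values": len(hits_by_cell),
        "max_subject_overlap": max(ratios, default=0.0),
        "caption_hits": sum(region == "caption" for _, region, _, _ in labelled),
    }
```

The overlay step is what lets the same keyword count differently depending on where it lands: a hit in the subject column of a row and a hit in a caption end up as separate features, which a downstream scoring function (not shown) could weight independently.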

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9940365B2 cover?
The present invention extends to methods, systems, and computer program products for ranking tables for keyword search. Aspects of the invention include generating lists of candidate tables for inclusion in a search query response, computing table hit matrices, retrieving content from fields of candidate tables having keyword hits, generating ranking features of tables, and computing ranking sc…
Who is the assignee on this patent?
Microsoft Corp, Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/24578. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 10, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).