Annotating structured data for search

US9959305B2 · US · B2

Patent metadata
Publication number: US-9959305-B2
Application number: US-201414325377-A
Country: US
Kind code: B2
Filing date: Jul 8, 2014
Priority date: Jul 8, 2014
Publication date: May 1, 2018
Grant date: May 1, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present invention extends to methods, systems, and computer program products for annotating structured data for search. Aspects of the invention include associating structured data, such as, for example, tables, with additional content to improve indexing of the structured data for search and/or provide improved search results for structured data. Web pages can include tables as well as other content. The other content in a web page, such as, for example, content outside the <table> and </table> tags of a web table, can be useful in supporting searches for web tables. Content in one web page can also be useful in supporting searches for a table in another web page.
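The abstract separates a web table's own data from the "other content" on the page. As a rough illustration of that split, the Python sketch below uses the standard-library `html.parser` module to collect table cell text separately from the page title, headings, and remaining text outside `<table>` and `</table>`. The class name `TableContextParser`, the helper `extract_table_context`, and the choice of which elements count as context are assumptions for illustration, not the method described in the patent. Nested tables are folded into one table region.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class TableContextParser(HTMLParser):
    """Hypothetical helper: separates table cell text from content outside <table> tags.

    Nested tables are treated as one table region; text inside a table but outside
    <td>/<th> cells (for example a <caption>) is ignored in this sketch.
    """

    def __init__(self):
        super().__init__()
        self.table_depth = 0      # > 0 while inside <table>...</table>
        self.title = ""           # text of <title>
        self.headings = []        # <h1>..<h6> text found outside tables
        self.outside_text = []    # all other text found outside tables
        self.cells = []           # one entry per <td>/<th> cell
        self._in_title = False
        self._in_heading = False
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag in HEADING_TAGS and self.table_depth == 0:
            self._in_heading = True
        elif tag in ("td", "th") and self.table_depth > 0:
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1
            self._in_cell = False  # guard against a missing </td> before </table>
        elif tag == "title":
            self._in_title = False
        elif tag in HEADING_TAGS:
            self._in_heading = False
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title += text
        elif self.table_depth > 0:
            if self._in_cell:
                self.cells[-1] = (self.cells[-1] + " " + text).strip()
        elif self._in_heading:
            self.headings.append(text)
        else:
            self.outside_text.append(text)


def extract_table_context(html):
    """Parse one page and return the populated parser."""
    parser = TableContextParser()
    parser.feed(html)
    parser.close()
    return parser
```

A page-level split like this only gathers candidate context; deciding which portion of it is actually relevant to a given part of the table is a separate step.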

First claim

Opening claim text (preview).

What is claimed:

1. A method for use at a computer system, the computer system including a processor and system memory, the method for indexing data for search, the method comprising the processor: accessing a web page from the system memory, the web page defining a table with table tags, the table storing different portions of data in different parts of the table, the web page also including other content outside the table tags; determining that a portion of the other content is relevant to describing a part of the table; attaching the portion of the other content to the table to annotate the table with the portion of the other content; generating an index for the part of the table by indexing at least over data within the part of the table along with the portion of the other content, the portion of the other content providing additional context for the data within the part of the table, the index specifically tied to the part of the table; storing the index improving the relevance of providing the part of the table in search results; refining the index for the part of the table, comprising: accessing an indication that a link to a previously indexed web page was selected from among a plurality of links returned in query results accessing one or more tokens from the link; selecting a specified additional subset of table content from the table; determining that the one or more tokens are relevant to describing the specified subset of the table content; refining the index by indexing over the one or more tokens and the specified subset of table content in the system memory; and storing the refined index further improving the relevance of providing the part of the table in search results.

2. The method of claim 1, wherein accessing a web page defining a table comprises access a web page defining a web table.

3. The method of claim 1, further comprising determining further content external to the web page is relevant to the table, the further content comprising one or more of: a page title, a heading, or a uniform resource location (URL); and wherein generating an index for the part of the table comprises generating an index over one or more of: the page title, the heading, or the uniform resource location (URL) and the table content.

4. The method of claim 1, further comprising determining that further content external to the web page is relevant to the table, the further content comprising a portion of incoming anchor text; and wherein generating an index for the part of the table comprises generating an index over the portion of incoming anchor text.

5. The method of claim 1, further comprising determining that further content external to the web page is relevant to the table, the further content comprising a portion of content in a click log; and wherein generating an index for the part of the table comprises generating an index over the portion of content in the click log.

6. The method of claim 1, further comprising determining that further content external to the web page is relevant to the table, the further content comprising a portion of content in a knowledge base; and wherein generating an index for the part of the table comprises generating an index over the portion of content in the knowledge base.

7. The method of claim 2, wherein generating an index for part of the table comprises generating an inverted index over the data within part of the web table and a sequence of subheadings contained in the web page.

8. The method of claim 2, wherein generating an index for part of the table comprises generating an inverted index over the data within part of the web table and a caption derived from the web page.

9. The method of claim 2, wherein generating an index for part of the table comprises generating an inverted index for processing Internet search queries to return the part of the table.

10. The method of claim 1, further comprising determining that further content external to the web page is relevant to the table, the further content comprising text surrounding the table; and wherein generating an index for the part of the table comprises generating an index over the surrounding text.

11. The method of claim 1, wherein accessing one or more tokens from the link comprises accessing one or more words in the query.

12. The method of claim 1, wherein refining the index comprises generating an inverted index over anchor text contained in the link.

13. The method of claim 1, wherein refining the index comprises generating the inverted index over content from at least one of: a click log or a knowledge base.

14. A computer system, the computer system comprising: one or more hardware processors; system memory coupled to the one or more hardware processors, the system memory storing instructions that are executable by the one or more hardware processors; the one or more hardware processors executing the instructions stored in the system memory to index data for search, including the following: access a web page from the system memory, the web page defining a table with table tags, the table storing different portions of data in different parts of the table, the web page also including other content outside the table tags; determine that a portion of the other content is relevant to describing a part of the table; attach the portion of the other content to the web table to annotate the table with the portion of the other content; generate an index for the part of the table by indexing at least over data within the part of the table along with the portion of the other content, the portion of the other content providing additional context for the data within the part of the table, the index specifically tied to the part of the table; store the index improving the relevance of providing the part of the table in search results; refine the index for the part of the table, comprising: access an indication that a link to a previously indexed web page was selected from among a plurality of links returned in query results access one or more tokens from the link; select a specified additional subset of table content from the table; determine that the one or more tokens are relevant to describing the specified subset of the table content; refine the index by indexing over the one or more tokens and the specified subset of table content in the system memory; and store the refined index further improving the relevance of providing the part of the table in search results.

15. The system of claim 14, further comprising the one or more hardware processors executing the instructions to determine that further content external to the web page is relevant to the table, the further content comprising a portion of anchor text; and wherein the one or more hardware processors executing the instructions to generate an index for the part of the table comprises the one or more hardware processors executing the instructions to generate an inverted index over data within part of the table, the portion of other content, and the portion of incoming anchor text.

16. The system of claim 14, further comprising the one or more hardware processors executing the instructions to determine that further content external to the web page is relevant to the table, the further content comprising a portion of content in a click log; and wherein the one or more hardware processors executing the instructions to generate an index for the part of the table comprises the one or more hardware processors executing the instructions to generate an inverted index over data within part of
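The indexing and refinement steps of claim 1 can be sketched as a small in-memory inverted index keyed by table part. This is a hypothetical Python illustration, not the patented implementation: `TablePartIndex`, the `(table_id, part_id)` keys, the regex tokenizer, and the shared-token relevance test used during refinement are all assumptions. Tokens from a selected result link (for example, query words, as in claim 11) are indexed together with a chosen subset of table content only if they overlap that subset.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase alphanumeric tokens; a simple stand-in for a real text analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TablePartIndex:
    """Hypothetical inverted index: token -> set of (table_id, part_id) keys.

    A "part" could be a column, a row range, or the whole table; this key
    scheme is an assumption made for the sketch.
    """

    def __init__(self):
        self.postings = defaultdict(set)

    def index_part(self, table_id, part_id, cell_texts, annotations):
        """Index one table part over its own data plus attached context text
        (for example page title, headings, or surrounding text)."""
        key = (table_id, part_id)
        for text in [*cell_texts, *annotations]:
            for tok in tokenize(text):
                self.postings[tok].add(key)

    def refine_from_click(self, table_id, part_id, link_texts, subset_texts):
        """After a result link is selected, index its tokens (for example query
        words or anchor text) together with a chosen subset of table content.

        The relevance test (at least one shared token) is an illustrative
        assumption. Returns True if the index was refined.
        """
        link_tokens = {tok for text in link_texts for tok in tokenize(text)}
        subset_tokens = {tok for text in subset_texts for tok in tokenize(text)}
        if not link_tokens & subset_tokens:
            return False
        key = (table_id, part_id)
        for tok in link_tokens | subset_tokens:
            self.postings[tok].add(key)
        return True

    def search(self, query):
        """Return the parts whose postings contain every query token."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        # .get() avoids creating empty postings entries in the defaultdict.
        result = set(self.postings.get(tokens[0], set()))
        for tok in tokens[1:]:
            result &= self.postings.get(tok, set())
        return result
```

Keying postings by table part rather than by page is what allows a query to return a specific part of a table, matching the claim's "index specifically tied to the part of the table."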

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9959305B2 cover?
It covers methods, systems, and computer program products for annotating structured data, such as tables, for search. Content associated with a table, including content on the same web page outside the <table> and </table> tags and content on other web pages, is used to improve indexing of the table and the search results that return it.
Who is the assignee on this patent?
Microsoft Corp. and Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
The primary CPC classification is G06F17/30339, which falls under CPC Section G (Physics).
When was this patent published?
It was published on May 1, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
This page lists one related publication (either a citation found in our corpus or a publication sharing the same primary CPC classification).