Computing features of structured data

US10127315B2 · US · B2

Patent metadata
Publication number: US-10127315-B2
Application number: US-201414325376-A
Country: US
Kind code: B2
Filing date: Jul 8, 2014
Priority date: Jul 8, 2014
Publication date: Nov 13, 2018
Grant date: Nov 13, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present invention extends to methods, systems, and computer program products for computing features of structured data. Aspects of the invention include computing features of table components (e.g., of rows, columns, cells, etc.). Computed features can be used for ranking the table components. When aggregated, features for different components of a table can be used for ranking the table (e.g., a web table).
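To make the aggregation idea in the abstract concrete, here is a minimal sketch that combines per-component feature scores into one table-level score and ranks tables by it. The abstract does not say how features are combined, so the plain mean, the sample scores, and the names `table_score` and `rank_tables` are all assumptions made for illustration.

```python
from statistics import mean


def table_score(component_scores):
    """Aggregate per-component feature scores into one table-level score.

    Assumption: a plain mean. The abstract does not specify the rule.
    """
    return mean(component_scores) if component_scores else 0.0


def rank_tables(tables):
    """Return table ids ordered from highest to lowest aggregated score."""
    return sorted(tables, key=lambda tid: table_score(tables[tid]), reverse=True)


# Hypothetical per-column feature scores for three web tables.
tables = {
    "t1": [0.2, 0.4],
    "t2": [0.9, 0.7, 0.8],
    "t3": [0.5],
}
ranking = rank_tables(tables)
print(ranking)
```

Any other aggregate (a weighted sum, a maximum, a learned model) could replace the mean without changing the overall shape of the computation.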

First claim

Opening claim text (preview).

What is claimed:

1. A method for use at a computer system, the method for indexing a table for ranking the table, the method comprising a processor: accessing a table, the table including a subject column, a non-subject column, and a plurality of rows, cells of the table being at intersections between the columns and rows, the table annotated with a portion of other content relevant to describing the table; generating an index for the table by indexing at least over data within the table along with the portion of other content; storing the index improving the relevance of providing the table in search results; refining the index for the table, comprising: deriving a semantic attribute for a cell at the intersection of the non-subject column and a row of the table, the semantic attribute derived by combining: (1) a value in the cell, (2) another value in another cell at the intersection of the subject column and the row, and (3) a name value for the non-subject column into the semantic attribute, the semantic attribute providing additional information to distinguish the value in the cell from the value in other cells; calculating a frequency with which the semantic attribute is included in a plurality of other tables; calculating a relevance of the semantic attribute based on inclusion of the semantic attribute in the plurality of other tables and based on features of the plurality of other tables; determining a feature of the table by aggregating the semantic attribute, the calculated frequency of occurrence of the semantic attribute, the calculated relevance of the semantic attribute into the feature of the table, and the portion of other content; refining the index for the table by indexing over the cell and the feature of the table; and storing the refined index further improving the relevance of providing the table in search results; and surfacing results from the table that satisfy a search query based on a ranking for the table determined from the refined index.

2. The method of claim 1, wherein deriving a semantic attribute comprises combining an entity-column name-value triple for the row.

3. The method of claim 2, wherein deriving a semantic attribute comprises: generating an entity-attribute binary table for the row, the entity-attribute binary table having a first column corresponding to the subject column; formulating an entity-column name-value triple from the entity-attribute binary table; and wherein calculating an occurrence rate for the semantic attribute comprises counting the occurrence rate of the entity-column name-value triple across entity-attribute binary tables generated for one or more other tables.

4. The method of claim 3, wherein the table includes a second non-subject column, and further comprising: deriving another semantic attribute for a further cell at the intersection of the second non-subject column and the row, the other semantic attribute derived by combining: (1) a further value in the further cell, (2) the other value in the other cell, and (3) a name value for the second non-subject column into the other semantic attribute, the other semantic attribute providing additional information to distinguish the further value in the further cell from the further value in other cells; calculating another frequency with which the other semantic attribute is included in a second plurality of other tables; calculating another relevance of the other semantic attribute based on inclusion of the other semantic attribute in the second plurality of other tables and based on features of the second plurality of other tables; and wherein determining a feature of the table comprises determining the feature of the table by aggregating the other semantic attribute, the frequency of occurrence of the other semantic attribute, and the relevance of the other semantic attribute into the feature.

5. The method of claim 4, further comprising calculating an additional frequency with which the other semantic attribute was selected from presented search results; and wherein determining the feature of the table comprises aggregating the additional frequency of the other semantic attribute into the feature.

6. The method of claim 1, further comprising: deriving another semantic attribute for a further cell at the intersection of the non-subject column and another row, the other semantic attribute derived by combining: (1) a further value in the further cell, (2) a further other value in a further other cell at the intersection of the subject column and the other row, and (3) the name value for the non-subject column into the other semantic attribute, the other semantic attribute providing additional information to distinguish the further value in the further cell from the further value in other cells; calculating another frequency with which the other semantic attribute is included in a second plurality of other tables; calculating another relevance of the other semantic attribute based on inclusion of the other semantic attribute in the second plurality of other tables and based on features of the second plurality of other tables; and wherein determining a feature of the table comprises determining the feature of the table by aggregating the other semantic attribute, the frequency of occurrence of the other semantic attribute, and the relevance of the other semantic attribute into the feature.

7. The method of claim 1, further comprising calculating an additional frequency with which the semantic attribute was selected from presented search results; and wherein determining the feature of the table comprises aggregating the additional frequency of the semantic attribute into the feature.

8. The method of claim 6, wherein determining a feature of the table comprises determining one or more of: a popularity of the table, or a trustworthiness of the table.

9. The method of claim 1, wherein determining a feature of the table comprises determining one or more of: a trustworthiness of the table or a popularity of the table.

10. A computer program product for use at a computer system, the computer program product for implementing a method for indexing a table for ranking the table, the computer program product comprising one or more hardware storage devices having stored thereon computer-executable instructions that, when executed at a processor, cause the computer system to perform the method, including the following: access a table from system memory, the table including a subject column, a non-subject column, and a plurality of rows, cells of the table being at intersections between columns and rows, the table annotated with a portion of other content relevant to describing the table; generating an index for the table by indexing at least over data within the table along with the portion of other content; storing the index improving the relevance of providing the table in search results; refining the index for the table, comprising: derive a semantic attribute for a cell at the intersection of the non-subject column and a row of the table, the semantic attribute derived by combining: (1) a value in the cell, (2) another value in another cell at the intersection of the subject column and the row, and (3) a name value for the non-subject column into the semantic attribute, the semantic attribute providing additional information to distinguish the value in the cell from the value in other cells; calculate a frequency with which the semantic attribute is included in a plurality of other tables; calculate a relevance of the semantic attribute based on inclusion of the semantic attribute in the plurality of other tables and based on features of the plurality of other tables;
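The refinement steps recited in claims 1 to 3 can be sketched roughly as follows: derive an (entity, column name, value) triple for each non-subject cell, count how many other tables contain the same triple, weight that count by a per-table feature, and aggregate the results into one table-level feature. This is a hedged illustration, not the patented implementation. The dictionary layout, the `weight` field standing in for "features of the other tables", the mean used for aggregation, and the function names `derive_triples` and `score_table` are all assumptions.

```python
from collections import Counter


def derive_triples(table):
    """Return (entity, column name, value) triples for every non-subject cell."""
    subject = table["subject"]
    triples = []
    for row in table["rows"]:
        entity = row[subject]  # value in the subject column of this row
        for column, value in row.items():
            if column != subject:
                triples.append((entity, column, value))
    return triples


def score_table(table, other_tables):
    """Compute per-triple frequency and relevance, then a table-level score."""
    frequency = Counter()  # number of other tables containing each triple
    relevance = Counter()  # sum of hypothetical table weights per triple
    for other in other_tables:
        for triple in set(derive_triples(other)):
            frequency[triple] += 1
            relevance[triple] += other["weight"]  # assumed per-table feature
    triples = derive_triples(table)
    per_triple = {t: (frequency[t], relevance[t]) for t in triples}
    # Assumption: aggregate by averaging relevance over the table's triples.
    score = sum(relevance[t] for t in triples) / len(triples) if triples else 0.0
    return {"triples": per_triple, "score": score}


# Hypothetical data: one target table and two other tables.
target = {
    "subject": "Country",
    "rows": [
        {"Country": "France", "Capital": "Paris"},
        {"Country": "Peru", "Capital": "Lima"},
    ],
}
others = [
    {"subject": "Country", "weight": 1.0,
     "rows": [{"Country": "France", "Capital": "Paris"}]},
    {"subject": "Country", "weight": 0.5,
     "rows": [{"Country": "France", "Capital": "Paris"},
              {"Country": "Peru", "Capital": "Cusco"}]},
]
feature = score_table(target, others)
print(feature["score"])
```

In this toy data the France triple is corroborated by both other tables while the Peru triple is corroborated by neither, so the table-level score reflects partial agreement with the rest of the corpus. The selection frequency of claims 5 and 7 could be folded in as one more per-triple counter.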

Assignees

Inventors

Classifications

  • G06F16/951 · Primary

    Indexing; Web crawling techniques · CPC title

  • Physics · mapped topic

  • Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10127315B2 cover?
The present invention extends to methods, systems, and computer program products for computing features of structured data. Aspects of the invention include computing features of table components (e.g., of rows, columns, cells, etc.). Computed features can be used for ranking the table components. When aggregated, features for different components of a table can be used for ranking the table (e…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/951. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 13, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).