Searching for join candidates

US9116940B1 · US · B1

Patent metadata
Field: Value
Publication number: US-9116940-B1
Application number: US-201313862768-A
Country: US
Kind code: B1
Filing date: Apr 15, 2013
Priority date: Apr 15, 2013
Publication date: Aug 25, 2015
Grant date: Aug 25, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and techniques are provided for receiving an input column and a search keyword and providing one or more suggested columns with which to merge the input column. A coverage score and a refinity score are calculated for potential columns based on the input column as well as a search score based on the search keyword. The one or more suggested columns may be determined based on the coverage score, refinity score, and/or the search score. The input column and/or a potential column may be modified based on a function and the modification may result in a plurality of modified input and/or potential columns. Coverage, refinity, and search scores may be calculated based on the modified columns.
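The two column-level scores named in the abstract can be illustrated with a minimal Python sketch. The exact formulas are assumptions: the publication only says that coverage is based on how many query values also appear in the potential column, and that refinity is based on an average number of occurrences of input values in that column. The function names `coverage_score` and `refinity_score`, the use of distinct query values, and the sample data are all hypothetical.

```python
from collections import Counter


def coverage_score(query_values, candidate_values):
    """Fraction of distinct query values that also appear in the candidate column.

    Assumption: normalizing by the number of distinct query values; the
    publication only says the score is "based on" the matching count.
    """
    queries = set(query_values)
    if not queries:
        return 0.0
    return len(queries & set(candidate_values)) / len(queries)


def refinity_score(query_values, candidate_values):
    """Average number of times each matched query value occurs in the candidate.

    Assumption: the average is taken over matched query values only, so a
    value near 1.0 suggests the candidate behaves like a key column.
    """
    counts = Counter(candidate_values)
    matched = [counts[q] for q in set(query_values) if counts[q] > 0]
    if not matched:
        return 0.0
    return sum(matched) / len(matched)


# Hypothetical data: an input column of country codes and one candidate column.
input_column = ["US", "FR", "DE", "JP"]
candidate = ["US", "US", "FR", "DE", "BR"]

coverage = coverage_score(input_column, candidate)  # 3 of 4 query values found
refinity = refinity_score(input_column, candidate)  # (2 + 1 + 1) / 3 occurrences
```

Under these assumptions, a good join candidate has high coverage (most input values can be matched) and refinity close to 1 (each input value matches roughly one row).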

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising: receiving an input column comprising a plurality of query values; receiving a search keyword; identifying a first potential table column; determining a coverage score for the first potential table column, wherein the coverage score is based on the number of query values in the input column also contained in at least a portion of the first potential table column; determining a refinity score for the first potential table column representing a similarity between the first potential table column and the input column, wherein the refinity score is based on an average number of occurrences of values from the input column within at least a portion of the first potential table column; determining a search keyword score for the first potential table column based on the search keyword; and determining a first total score corresponding to the first potential table column based on the coverage score, the refinity score, and the search keyword score.

2. The method of claim 1, further comprising providing the first potential table column to a user based on the first total score.

3. The method of claim 1, wherein the search keyword is input by a user.

4. The method of claim 1, wherein receiving an input column comprising at least one query value further comprises: receiving an input table comprising the input column; and receiving a column ID corresponding to the input column.

5. The method of claim 1, wherein the search keyword corresponds to a column heading.

6. The method of claim 1, wherein the first potential table column is selected from a corpus of data.

7. The method of claim 6, wherein the corpus of data is uploaded data in a database.

8. The method of claim 6, wherein the corpus of data is gathered from web crawlers.

9. The method of claim 6, wherein the corpus of data is index based on a table ID, column ID, and row count.

10. The method of claim 1, wherein the search keyword score for the first potential column is determined independent of the refinity score.

11. The method of claim 1, wherein the search keyword score for the first potential column is determined independent of the coverage score.

12. The method of claim 1, further comprising providing the first potential column to a user based on the coverage score, the refinity score, and the search keyword score.

13. The method of claim 1, further comprising: ranking the first potential column based on the first total score; ranking a second potential column based on a second total score; and providing the first potential column and the second potential column in an order based on the rankings for the first potential column and the second potential column.

14. The method of claim 1, further comprising: ranking the first potential column based on the first total score; ranking a second potential column based on a second total score; selecting the first potential column based on the rank for the first potential column and the rank for the second potential column; and providing the first potential column instead of the second potential column based on the selection.

15. The method of claim 12, further comprising providing the first potential column to a user if the coverage score is above a threshold.

16. The method of claim 12, further comprising providing the first potential column to a user if the refinity score is below a threshold.

17. The method of claim 1, further comprising: generating a first modified input column by applying a first function to the input column; generating a first modified index by applying a second function to the index; identifying the potential table column, from the first modified index, based on the first modified input column.

18. The method of claim 17, wherein the first function and the second function are the same.

19. The method of claim 1, further comprising: generating a first modified input column by applying a first function to the input column; identifying the potential table column, from the index, based on the modified input column.

20. The method of claim 1, further comprising determining the first total score based at least on dividing the coverage score by the refinity score.

21. A system comprising: a database storing a corpus of data; a processor in connection with said database, said processor configured to: receive an input column comprising a plurality of query values; receive a search keyword; identify a first potential table column; determine a coverage score for the first potential table column, wherein the coverage score is based on the number of query values in the input column also contained in at least a portion of the potential table column; determine a refinity score for the first potential table column representing a similarity between the first potential table column and the input column, wherein the refinity score is based on an average number of occurrences of values from the input column within at least a portion of the first potential table column; determine a search keyword score for the first potential table column based on the search keyword; and determine a first total score corresponding to the first potential table column based on the coverage score, the refinity score, and the search keyword score.

22. The system of claim 21, further configured to provide the first potential table column to a user based on the first total score.

23. The system of claim 21, further configured to: generate a first modified input column by applying a first function to the input column; identify the potential table column, from the index, based on the modified input column.
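The scoring, modification, and ranking steps in the claims can be sketched together in Python. This is an illustration under stated assumptions, not the patented implementation: the `Candidate` class, the `normalize` function (standing in for the "first function" and "second function"), the header-substring keyword score, and the additive combination of the coverage/refinity ratio with the keyword score are all hypothetical. The claims only require that the total be based on the three scores, with one dependent claim mentioning division of coverage by refinity.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    """A potential table column (hypothetical structure)."""
    table_id: str
    column_id: str
    header: str
    values: list


def normalize(value):
    """Hypothetical modification function: trim whitespace and lowercase."""
    return value.strip().lower()


def total_score(candidate, query_values, keyword, fn=normalize):
    """Combine coverage, refinity, and keyword scores into one total.

    Assumptions: the same function modifies both the input column and the
    candidate values; the keyword score is 1.0 when the keyword occurs in
    the column heading; the total adds coverage / refinity to that score.
    """
    queries = {fn(q) for q in query_values}
    counts = Counter(fn(v) for v in candidate.values)
    matched = [counts[q] for q in queries if counts[q] > 0]
    coverage = len(matched) / len(queries) if queries else 0.0
    refinity = sum(matched) / len(matched) if matched else 0.0
    # The keyword score depends only on the heading, not on coverage or refinity.
    keyword_score = 1.0 if fn(keyword) in fn(candidate.header) else 0.0
    ratio = coverage / refinity if refinity else 0.0
    return ratio + keyword_score


def rank(candidates, query_values, keyword):
    """Order candidates from highest to lowest total score."""
    return sorted(
        candidates,
        key=lambda c: total_score(c, query_values, keyword),
        reverse=True,
    )


# Hypothetical data: messy input values and two candidate columns.
query = [" us", "FR", "de "]
a = Candidate("t1", "c0", "Country code", ["US", "FR", "DE"])
b = Candidate("t2", "c3", "Orders", ["US", "US", "US", "FR"])
ranked = rank([b, a], query, "country")
```

In this sketch, candidate `a` matches every normalized input value exactly once and its heading contains the keyword, so it ranks ahead of `b`, which covers fewer values and repeats them. A threshold filter on coverage or refinity could be applied before ranking in the same way.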

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9116940B1 cover?
Systems and techniques are provided for receiving an input column and a search keyword and providing one or more suggested columns with which to merge the input column. A coverage score and a refinity score are calculated for potential columns based on the input column as well as a search score based on the search keyword. The one or more suggested columns may be determined based on the coverag…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2272. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 25, 2015 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).