Name disambiguation using context terms

US9830379B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9830379-B2
Application number: US-95525310-A
Country: US
Kind code: B2
Filing date: Nov 29, 2010
Priority date: Nov 29, 2010
Publication date: Nov 28, 2017
Grant date: Nov 28, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems and apparatus, including computer programs encoded on a computer storage medium, for disambiguating names in a document corpus. In an aspect, a method includes generating context term lists for a person name, each context term list being a list of context terms from a resource for the person name; clustering the context term lists into a plurality of clusters, each of the clusters of context term lists including context term lists that are most similar to the cluster relative to other clusters; for each of the clusters, selecting a representative term for the cluster; receiving the person name as a search query; and generating a plurality of query suggestions from the search query and the representative terms for the clusters, each query suggesting being a combination of the person name and one representative term.
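The last two steps of the abstract (selecting a representative term per cluster, then pairing it with the person name) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the dictionary representation of a cluster, and the sample names and weights are all assumptions made for the example.

```python
def representative_term(cluster):
    """Return the highest-weighted term of a cluster given as {term: weight}.

    Hypothetical helper: terms are sorted first so that ties are broken
    alphabetically and the result is deterministic.
    """
    return max(sorted(cluster), key=cluster.get)


def query_suggestions(person_name, clusters):
    """Combine the person name with one representative term per cluster."""
    return [f"{person_name} {representative_term(c)}" for c in clusters]


# Illustrative data only: two clusters for two different people named "john smith".
example_clusters = [
    {"guitarist": 3.0, "band": 2.0, "tour": 1.0},
    {"professor": 2.5, "physics": 2.0},
]

suggestions = query_suggestions("john smith", example_clusters)
print(suggestions)
```

Each suggestion pairs the name with exactly one term, so a user who typed only the name can pick the sense they meant.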

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method performed by a data processing apparatus, the method comprising:
  generating context term lists for a person name, each context term list including a list of context terms that co-occur with the person name in queries to which a resource for the person name is determined to be responsive, and each of the resources to which the context term lists for the person name correspond being different resources, the generating the context term lists comprising:
    for each of the selected resources:
      selecting first queries for which a relevance of the resource is determined to meet a first relevance threshold relative to other resources, each of the first queries including the person name;
      selecting terms that are not the person name and that co-occur with the person name in the first queries as context terms for the person name and the resource;
      selecting second queries for which a relevance of the resource is determined meet a second relevance threshold, each of the second queries not including the person name;
      selecting terms from the second queries as context terms for the person name and the resource; and
      generating a respective context term list for the resource and for the person name from the context terms selected from the first queries and the second queries;
  clustering the context term lists into a plurality of clusters, each of the clusters of context term lists including context term lists that are most similar to the cluster relative to other clusters, the clustering comprising iteratively determining a measure of similarity between pairs of context term lists based on the context terms in the pairs of lists and merging context term lists based on the measures of similarity;
  selecting from each of the clusters, a respective context term, the respective context term being a representative context term for the cluster;
  receiving the person name as a search query; and
  generating a plurality of query suggestions from the search query and the representative terms for the clusters, each query suggesting being a combination of the person name and one representative term.

2. The method of claim 1, wherein generating context term lists further comprises: for each resource, identifying third terms that are determined to be contextually significant among the resources; and selecting the third terms as context terms for the person name.

3. The method of claim 1, wherein: each context term list comprises a context term vector of weights, each element of each context term vector corresponding to a context term and having a non-zero value if the context term is represented by the context term list, the non-zero value being a relevance score that measures the relevance of the context term to the resource to which the context term list corresponds; determining a measure of similarity between pairs of context term lists based on the context terms in the pairs of lists and merging context term lists based on the measures of similarity comprises: determining similarity scores for the context term vector of weights for the corresponding pairs of context term lists, each similarity score measuring the similarity of the context term vector of weights of the corresponding pairs of context term lists; merging pairs of context terms vector of weights for which a determined similarity score meets a similarity threshold into a single context term vector of weights.

4. The method of claim 3, wherein merging pairs of context terms vector of weights for which a determined similarity score meets a similarity threshold into a single context term vector of weights comprises summing the pair of context terms vector of weights to form a single context terms vector of weights.

5. The method of claim 3, further comprising decreasing the similarity threshold after one or more iterations of determining similarity scores and merging pairs of context terms vector of weights.

6. The method of claim 3, wherein clustering the context term lists into the plurality of clusters comprises, prior to determining similarity scores: determining groups of similar context terms from the context term lists; and for each group of similar context terms, normalizing the context terms in the context term lists to a common context term.

7. The method of claim 4, wherein for each of the clusters, selecting a representative term for the cluster comprises: ranking the context terms in the cluster; and selecting a highest ranking context term in the cluster as the representative term.

8. The method of claim 7, wherein ranking the context terms in the cluster comprises: ranking the context terms based on the sum of the relevance scores associated with the context term in the cluster.

9. The method of claim 8, wherein ranking the context terms in the cluster comprises: decreasing the relevance score of each context term in proportion to an inverse document frequency score of the context term derived from the occurrence of the context term in the resources.

10. The method of claim 9, wherein ranking the context terms in the cluster comprises: increasing the relevance score of each context term in proportion an authority score of each resource in which the context term occurs, the authority score measuring the importance of the resource relative to other resources.

11. The method of claim 9, wherein ranking the context terms in the cluster comprises: increasing the relevance score of each context term that was included in a query with the person name.

12. The method of claim 1, wherein the relevance threshold is one of: a minimum relevance score relative to relevance scores of other resources identified for the respective first query; and a minimum ranking relative to rankings of other resources identified for the respective first query.

13. A non-transitory computer readable storage device storing instructions executable by a data processing apparatus, and upon such execution cause the data process apparatus to perform operations comprising:
  generating context term lists for a person name, each context term list including a list of context terms that co-occur with the person name in queries to which a resource for the person name is determined to be responsive, and each of the resources to which the context term lists for the person name correspond being different resources, the generating the context term lists comprising:
    for each of the selected resources:
      selecting first queries for which a relevance of the resource is determined to meet a first relevance threshold relative to other resources, each of the first queries including the person name;
      selecting terms that are not the person name and that co-occur with the person name in the first queries as context terms for the person name and the resource;
      selecting second queries for which a relevance of the resource is determined meet a second relevance threshold, each of the second queries not including the person name;
      selecting terms from the second queries as context terms for the person name and the resource; and
      generating a respective context term list for the resource and for the person name from the context terms selected from the first queries and the second queries;
  clustering the context term lists into a plurality of clusters, each of the clusters of context term lists including context term lists that are most similar to the cluster relative to other clusters, the clustering comprising iteratively determining a measure of similarity between pairs of context term lists based on the context terms in the pairs of lists and merging context term list
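The clustering described in claims 3 to 5 and the ranking in claims 7 and 8 can be illustrated with a short sketch. Several choices here are assumptions, since the claim text does not fix them: cosine similarity as the similarity score, the threshold schedule (0.8, 0.6, 0.4), merging the best-scoring pair first, and all function names and sample weights. Summing the vectors on merge follows claim 4, which also means the merged weight of a term is the sum of its relevance scores in the cluster, the quantity claim 8 ranks by.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def merge(u, v):
    """Merge two context term vectors by summing weights term by term."""
    merged = dict(u)
    for term, weight in v.items():
        merged[term] = merged.get(term, 0.0) + weight
    return merged


def cluster_vectors(vectors, thresholds=(0.8, 0.6, 0.4)):
    """Iteratively merge the most similar pair that meets the current threshold.

    The threshold is lowered once no remaining pair meets it (assumed schedule).
    """
    clusters = [dict(v) for v in vectors]
    for threshold in thresholds:
        while True:
            best = None
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    score = cosine(clusters[i], clusters[j])
                    if score >= threshold and (best is None or score > best[0]):
                        best = (score, i, j)
            if best is None:
                break  # nothing left to merge at this threshold
            _, i, j = best
            clusters[i] = merge(clusters[i], clusters[j])
            del clusters[j]
    return clusters


def top_term(cluster):
    """Highest summed weight wins; ties are broken alphabetically."""
    return max(sorted(cluster), key=cluster.get)


# Illustrative vectors only: one per resource, weights are made-up relevance scores.
vectors = [
    {"guitar": 2.0, "band": 1.0},
    {"guitar": 1.0, "band": 1.0, "tour": 1.0},
    {"physics": 2.0, "professor": 1.0},
    {"professor": 1.0, "physics": 1.0},
]
result = cluster_vectors(vectors)
print([top_term(c) for c in result])
```

With these sample vectors the two physics lists merge at the first threshold and the two music lists merge only after the threshold drops, which is the behavior claim 5 allows for. The pairwise scan is quadratic per merge and is meant for readability, not scale.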

Classifications

  • using system suggestions (G06F16/3325 takes precedence) · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9830379B2 cover?
Methods, systems and apparatus, including computer programs encoded on a computer storage medium, for disambiguating names in a document corpus. In an aspect, a method includes generating context term lists for a person name, each context term list being a list of context terms from a resource for the person name; clustering the context term lists into a plurality of clusters, each of the clust…
Who is the assignee on this patent?
Gupta Nitin, Das Abhinandan S, Google Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/3322. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 28, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).