Systems and methods for identifying a named entity
US-9009153-B2 · Apr 14, 2015 · US
US9830379B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9830379-B2 |
| Application number | US-95525310-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 29, 2010 |
| Priority date | Nov 29, 2010 |
| Publication date | Nov 28, 2017 |
| Grant date | Nov 28, 2017 |
Official abstract text for this publication.
Methods, systems and apparatus, including computer programs encoded on a computer storage medium, for disambiguating names in a document corpus. In an aspect, a method includes generating context term lists for a person name, each context term list being a list of context terms from a resource for the person name; clustering the context term lists into a plurality of clusters, each of the clusters of context term lists including context term lists that are most similar to the cluster relative to other clusters; for each of the clusters, selecting a representative term for the cluster; receiving the person name as a search query; and generating a plurality of query suggestions from the search query and the representative terms for the clusters, each query suggestion being a combination of the person name and one representative term.
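To make the last step of the abstract concrete, the sketch below combines a person name with one representative term per cluster to form query suggestions. The function name `generate_query_suggestions` and the sample name and terms are hypothetical, and joining the name and term with a space is an assumption; the publication only says each suggestion is a combination of the two.

```python
def generate_query_suggestions(person_name, representative_terms):
    """Return one suggestion per representative term: "<person name> <term>".

    Assumption: suggestions are formed by simple space-joined concatenation,
    and blank or repeated terms are skipped.
    """
    suggestions = []
    seen = set()
    for term in representative_terms:
        term = term.strip()
        key = term.lower()
        if not term or key in seen:
            continue  # skip empty or duplicate representative terms
        seen.add(key)
        suggestions.append(f"{person_name} {term}")
    return suggestions


if __name__ == "__main__":
    # Hypothetical clusters for an ambiguous name, one representative term each.
    for suggestion in generate_query_suggestions(
        "john smith", ["actor", "senator", "guitarist"]
    ):
        print(suggestion)
```

Each suggestion steers the search toward one of the distinct people who share the name, which is the disambiguation effect the abstract describes.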
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method performed by a data processing apparatus, the method comprising:
generating context term lists for a person name, each context term list including a list of context terms that co-occur with the person name in queries to which a resource for the person name is determined to be responsive, and each of the resources to which the context term lists for the person name correspond being different resources, the generating the context term lists comprising: for each of the selected resources:
selecting first queries for which a relevance of the resource is determined to meet a first relevance threshold relative to other resources, each of the first queries including the person name;
selecting terms that are not the person name and that co-occur with the person name in the first queries as context terms for the person name and the resource;
selecting second queries for which a relevance of the resource is determined to meet a second relevance threshold, each of the second queries not including the person name;
selecting terms from the second queries as context terms for the person name and the resource; and
generating a respective context term list for the resource and for the person name from the context terms selected from the first queries and the second queries;
clustering the context term lists into a plurality of clusters, each of the clusters of context term lists including context term lists that are most similar to the cluster relative to other clusters, the clustering comprising iteratively determining a measure of similarity between pairs of context term lists based on the context terms in the pairs of lists and merging context term lists based on the measures of similarity;
selecting from each of the clusters, a respective context term, the respective context term being a representative context term for the cluster;
receiving the person name as a search query; and
generating a plurality of query suggestions from the search query and the representative terms for the clusters, each query suggestion being a combination of the person name and one representative term.

2. The method of claim 1, wherein generating context term lists further comprises: for each resource, identifying third terms that are determined to be contextually significant among the resources; and selecting the third terms as context terms for the person name.

3. The method of claim 1, wherein: each context term list comprises a context term vector of weights, each element of each context term vector corresponding to a context term and having a non-zero value if the context term is represented by the context term list, the non-zero value being a relevance score that measures the relevance of the context term to the resource to which the context term list corresponds; determining a measure of similarity between pairs of context term lists based on the context terms in the pairs of lists and merging context term lists based on the measures of similarity comprises: determining similarity scores for the context term vector of weights for the corresponding pairs of context term lists, each similarity score measuring the similarity of the context term vector of weights of the corresponding pairs of context term lists; merging pairs of context terms vector of weights for which a determined similarity score meets a similarity threshold into a single context term vector of weights.

4. The method of claim 3, wherein merging pairs of context terms vector of weights for which a determined similarity score meets a similarity threshold into a single context term vector of weights comprises summing the pair of context terms vector of weights to form a single context terms vector of weights.

5. The method of claim 3, further comprising decreasing the similarity threshold after one or more iterations of determining similarity scores and merging pairs of context terms vector of weights.

6. The method of claim 3, wherein clustering the context term lists into the plurality of clusters comprises, prior to determining similarity scores: determining groups of similar context terms from the context term lists; and for each group of similar context terms, normalizing the context terms in the context term lists to a common context term.

7. The method of claim 4, wherein for each of the clusters, selecting a representative term for the cluster comprises: ranking the context terms in the cluster; and selecting a highest ranking context term in the cluster as the representative term.

8. The method of claim 7, wherein ranking the context terms in the cluster comprises: ranking the context terms based on the sum of the relevance scores associated with the context term in the cluster.

9. The method of claim 8, wherein ranking the context terms in the cluster comprises: decreasing the relevance score of each context term in proportion to an inverse document frequency score of the context term derived from the occurrence of the context term in the resources.

10. The method of claim 9, wherein ranking the context terms in the cluster comprises: increasing the relevance score of each context term in proportion to an authority score of each resource in which the context term occurs, the authority score measuring the importance of the resource relative to other resources.

11. The method of claim 9, wherein ranking the context terms in the cluster comprises: increasing the relevance score of each context term that was included in a query with the person name.

12. The method of claim 1, wherein the relevance threshold is one of: a minimum relevance score relative to relevance scores of other resources identified for the respective first query; and a minimum ranking relative to rankings of other resources identified for the respective first query.

13. A non-transitory computer readable storage device storing instructions executable by a data processing apparatus, and upon such execution cause the data processing apparatus to perform operations comprising:
generating context term lists for a person name, each context term list including a list of context terms that co-occur with the person name in queries to which a resource for the person name is determined to be responsive, and each of the resources to which the context term lists for the person name correspond being different resources, the generating the context term lists comprising: for each of the selected resources:
selecting first queries for which a relevance of the resource is determined to meet a first relevance threshold relative to other resources, each of the first queries including the person name;
selecting terms that are not the person name and that co-occur with the person name in the first queries as context terms for the person name and the resource;
selecting second queries for which a relevance of the resource is determined to meet a second relevance threshold, each of the second queries not including the person name;
selecting terms from the second queries as context terms for the person name and the resource; and
generating a respective context term list for the resource and for the person name from the context terms selected from the first queries and the second queries;
clustering the context term lists into a plurality of clusters, each of the clusters of context term lists including context term lists that are most similar to the cluster relative to other clusters, the clustering comprising iteratively determining a measure of similarity between pairs of context term lists based on the context terms in the pairs of lists and merging context term list
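The clustering and representative-term steps recited in claims 3 to 5, 7 and 8 can be illustrated with a minimal sketch. It rests on several assumptions: the claims speak only of a "measure of similarity", so cosine similarity is a stand-in choice; the threshold schedule `(0.8, 0.6, 0.4)`, the function names, and the sample weights are all hypothetical. Merging by summing the weight vectors follows claim 4, lowering the threshold between passes follows claim 5, and ranking by summed relevance score follows claims 7 and 8. The adjustments of claims 9 to 11 are left out.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity of two sparse weight vectors (term -> weight).

    Assumption: cosine is used here only as one possible similarity measure.
    """
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_context_vectors(vectors, thresholds=(0.8, 0.6, 0.4)):
    """Iteratively merge the most similar pair whose score meets the threshold.

    Merged vectors are summed (claim 4); the threshold is decreased after
    each pass (claim 5). The schedule itself is a hypothetical choice.
    """
    clusters = [Counter(v) for v in vectors]
    for threshold in thresholds:
        while True:
            best = None  # (score, i, j) of the most similar qualifying pair
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    score = cosine_similarity(clusters[i], clusters[j])
                    if score >= threshold and (best is None or score > best[0]):
                        best = (score, i, j)
            if best is None:
                break  # nothing left to merge at this threshold
            _, i, j = best
            clusters[i].update(clusters[j])  # Counter.update sums the weights
            del clusters[j]
    return clusters


def representative_term(cluster):
    """Highest summed relevance score wins; ties break alphabetically."""
    return min(cluster, key=lambda term: (-cluster[term], term))


if __name__ == "__main__":
    # Hypothetical context term vectors, one per resource for the same name.
    vectors = [
        {"actor": 0.9, "movie": 0.7},
        {"actor": 0.8, "film": 0.5},
        {"senator": 0.9, "election": 0.6},
        {"senator": 0.7, "congress": 0.4},
    ]
    for cluster in cluster_context_vectors(vectors):
        print(representative_term(cluster), dict(cluster))
```

Starting with a strict threshold and relaxing it means the most clearly related resources are merged first, and the summed vectors then carry more evidence into the looser passes.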
using system suggestions (G06F16/3325 takes precedence) · CPC title
Physics · mapped topic