Clustering of search results

US9443008B2 · US · B2

Patent metadata
Field | Value
Publication number | US-9443008-B2
Application number | US-83595410-A
Country | US
Kind code | B2
Filing date | Jul 14, 2010
Priority date | Jul 14, 2010
Publication date | Sep 13, 2016
Grant date | Sep 13, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One particular embodiment clusters a plurality of documents using one or more clustering algorithms to obtain one or more first sets of clusters, wherein: each first set of clusters results from clustering the documents using one of the clustering algorithms; and with respect to each first set of clusters, each of the documents belongs to one of the clusters from the first set of clusters; accesses a search query; identifies a search result in response to the search query, wherein the search result comprises two or more of the documents; and clusters the search result to obtain a second set of clusters, wherein each document of the search result belongs to one of the clusters from the second set of clusters.
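The two stages described in the abstract (clustering the whole collection ahead of time, then clustering only the documents returned for a query) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: the toy corpus `DOCS`, the two toy "clustering algorithms" (`cluster_by_first_word`, `cluster_by_second_word`), the keyword-match `search` function, and the shortcut of reusing one offline assignment at query time are all hypothetical and invented for this example.

```python
from collections import defaultdict

# Hypothetical toy corpus: document id -> text.
DOCS = {
    "d1": "jaguar car review",
    "d2": "jaguar car dealer",
    "d3": "jaguar cat habitat",
    "d4": "python snake habitat",
}


def cluster_by_first_word(docs):
    """Toy clustering algorithm A (assumption): cluster id = first word."""
    return {doc_id: text.split()[0] for doc_id, text in docs.items()}


def cluster_by_second_word(docs):
    """Toy clustering algorithm B (assumption): cluster id = second word."""
    return {doc_id: text.split()[1] for doc_id, text in docs.items()}


def build_first_sets(docs, algorithms):
    """Stage 1 (offline): one 'first set of clusters' per algorithm.

    Each set maps every document to exactly one cluster id.
    """
    return [algorithm(docs) for algorithm in algorithms]


def search(docs, query):
    """Stand-in retrieval (assumption): documents containing any query term."""
    terms = set(query.lower().split())
    return [doc_id for doc_id, text in docs.items() if terms & set(text.split())]


def cluster_result(result, first_set):
    """Stage 2 (query time): group the result into a 'second set of clusters'.

    Simplification (assumption): reuse one offline assignment directly.
    """
    groups = defaultdict(list)
    for doc_id in result:
        groups[first_set[doc_id]].append(doc_id)
    return dict(groups)


first_sets = build_first_sets(DOCS, [cluster_by_first_word, cluster_by_second_word])
result = search(DOCS, "jaguar")
second_set = cluster_result(result, first_sets[1])
print(second_set)
```

In this sketch every document gets exactly one cluster id per offline clustering, and every result document lands in exactly one query-time cluster, which mirrors the membership conditions stated in the abstract.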

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: clustering a plurality of documents to obtain one or more first sets of clusters, wherein a first cluster of the one or more first sets of clusters comprises at least two first individual documents of the plurality of documents; accessing a search query after the clustering the plurality of documents; identifying a search result in response to the search query, wherein the search result comprises the at least two first individual documents of the plurality of documents; and clustering the search result to obtain a second set of clusters, wherein second individual documents of the search result belong to one second cluster of the second set of clusters, the clustering the search result comprising: for a unique pair of the second individual documents, computing a similarity measure for the second individual documents with respect to the search query based, at least in part, on the one or more first sets of clusters, wherein the similarity measure for the second individual documents is computed based, at least in part, on a weighted sum of a clustering similarity between the second individual documents with respect to the one or more first sets of clusters and a query-based similarity between the second individual documents with respect to the search query; and clustering the second individual documents based, at least in part, on the similarity measure; wherein the query-based similarity between the second individual documents is based, at least in part, on a fraction of a sum of: a textual match between the search query and the second individual documents to the textual match between the query, and the intersection of the documents; and wherein the clustering similarity between the second individual documents with respect to the one or more first sets of clusters is based, at least in part, on a weighted combination of agreements between the one or more first sets of clusters and the second individual documents.

2. The method recited in claim 1, wherein for the unique pair of result documents, the computing of the similarity measure for the unique pair of result documents as a weighted sum is further based, at least in part, on a cosine similarity between the two documents.

3. The method recited in claim 1, further comprising: accessing a new document; and determining whether the new document belongs to a cluster from the first set of clusters; in response to determining that the new document belongs to the cluster from the first set of clusters, adding the new document to the cluster from first set of clusters; and in response to determining that the new document does not belong to any cluster from the first set of clusters, creating a new cluster, adding the new document to the new cluster, and adding the new cluster to the first set of clusters.

4. The method recited in claim 1, further comprising grouping clusters from the first set of clusters into a plurality of topic models, wherein individual clusters from the first set of clusters belong to one of the topic models.

5. The method recited in claim 4, further comprising: accessing a new document; and determining one of the topic models corresponding to a first clustering associated with the new document; determining whether the new document belongs to a cluster of the one of the topic models; in response to determining that the new document belongs to the cluster of the one of the topic models, adding the new document to the cluster of the one of the topic models; and in response to determining that the new document does not belong to any clusters of the one of the topic models, creating a new cluster, adding the new document to the new cluster, adding the new cluster to a first set of clusters, and assigning the new cluster to the one of the topic models.

6. The method recited in claim 1, further comprising presenting the second individual documents of the search result according to the second set of clusters.

7. A system, comprising: a memory comprising instructions executable by one or more processors; and one or more processors coupled to the memory, the one or more processors to execute the instructions to: cluster a plurality of documents to obtain one or more first sets of clusters, wherein a first cluster of the one or more first sets of clusters is to comprise at least two first individual documents of the plurality of documents; access a search query after the cluster of the plurality of documents; identify a search result in response to the search query, the search result to comprise the at least two first individual documents of the plurality of documents; cluster the search result to obtain a second set of clusters, second individual documents of the search result to belong to one second cluster of the second set of clusters, the cluster of the search result to comprise: for a unique pair of the second individual documents a similarity measure for the result documents with respect to the search query to be computed to be based, at least in part, on the one or more first sets of clusters, wherein the similarity measure for the second individual documents is to be computed to be based, at least in part, on a weighted sum of a clustering similarity between the second individual documents with respect to the one or more first sets of clusters and a query-based similarity between the second individual documents with respect to the search query; and the second individual documents to be clustered to be based, at least in part, on the similarity measure; wherein the query-based similarity between the second individual documents is to be based, at least in part, on a fraction of a sum of: a textual match between the search query and the second individual documents to the textual match between the query, and the intersection of the documents; and wherein the clustering similarity between the second individual documents with respect to the one or more first sets of clusters is to be based, at least in part, on a weighted combination of agreements between the one or more first sets of clusters and the second individual documents.

8. The system recited in claim 7, wherein for the unique pair of result documents, to compute the similarity measure as a weighted sum is to be further based, at least in part, on a cosine similarity between the two documents.

9. The system recited in claim 7, wherein the instructions are further executable by the one or more processors to: access a new document; and determine whether the new document is to belong to a cluster from the first set of clusters; in response to a determination that the new document is to belong to the cluster from the first set of clusters, to add the new document to the one of the clusters from the first set of clusters; and in response to a determination that the new document does not belong to any cluster from the first set of clusters, to create a new cluster, to add the new document to the new cluster, and to add the new cluster to the first set of clusters.

10. The system recited in claim 7, wherein the instructions are further executable by the one or more processors to group clusters from the first set of clusters into a plurality of topic models, individual clusters from the first set of clusters to belong to one of the topic models.

11. The system recited in claim 10, wherein the instructions are further executable by the one or more processors to: access a new document; and determine one of the topic models corresponding to a first clustering to be associated with the new document; determine whether the new document is to belong to a cluster of the one of the topic models
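The pairwise similarity recited in claims 1 and 7 combines two terms in a weighted sum: a clustering similarity (a weighted combination of agreements across the offline clusterings) and a query-based similarity. A rough sketch of that structure, under several assumptions, is given below. The mixing weight `alpha`, the per-clustering `weights`, the merge `threshold`, and the single-link grouping are all hypothetical choices. The claim wording for the query-based term is hard to parse, so `query_similarity` substitutes a simple stand-in (the share of query terms that appear in both documents) rather than the claimed formula.

```python
from itertools import combinations


def clustering_similarity(a, b, first_sets, weights):
    """Weighted agreement: each offline clustering that places a and b in the
    same cluster contributes its weight; the sum is normalized to [0, 1]."""
    agree = sum(w for fs, w in zip(first_sets, weights) if fs[a] == fs[b])
    return agree / sum(weights)


def query_similarity(a, b, query, docs):
    """Stand-in (assumption): fraction of query terms present in both docs."""
    terms = set(query.lower().split())
    shared = set(docs[a].lower().split()) & set(docs[b].lower().split())
    return len(terms & shared) / len(terms) if terms else 0.0


def pair_similarity(a, b, query, docs, first_sets, weights, alpha=0.5):
    """Weighted sum of the clustering term and the query-based term."""
    return (alpha * clustering_similarity(a, b, first_sets, weights)
            + (1 - alpha) * query_similarity(a, b, query, docs))


def cluster_result(result, query, docs, first_sets, weights,
                   alpha=0.5, threshold=0.75):
    """Single-link grouping via union-find (assumption): merge any pair whose
    similarity reaches the threshold."""
    parent = {d: d for d in result}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(result, 2):
        if pair_similarity(a, b, query, docs, first_sets, weights, alpha) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for d in result:
        clusters.setdefault(find(d), []).append(d)
    return list(clusters.values())


# Hypothetical data: three result documents and two offline clusterings.
docs = {"d1": "jaguar car review", "d2": "jaguar car dealer", "d3": "jaguar cat habitat"}
first_sets = [{"d1": 0, "d2": 0, "d3": 1}, {"d1": "x", "d2": "y", "d3": "y"}]
weights = [2, 1]

clusters = cluster_result(["d1", "d2", "d3"], "jaguar", docs, first_sets, weights)
print(clusters)
```

With these toy inputs all three documents match the query equally, so the offline clusterings decide the grouping: only the pair that agrees under the more heavily weighted clustering clears the threshold. Claims 2 and 8 add a cosine similarity between the two documents as a further term in the weighted sum; it is omitted here to keep the sketch short.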

Classifications

  • Physics · mapped topic

  • Query execution (filtering based on additional data G06F16/335) · CPC title

  • G06F16/35 · Primary

    Clustering; Classification · CPC title

  • Presentation of query results · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9443008B2 cover?
One particular embodiment clusters a plurality of documents using one or more clustering algorithms to obtain one or more first sets of clusters, wherein: each first set of clusters results from clustering the documents using one of the clustering algorithms; and with respect to each first set of clusters, each of the documents belongs to one of the clusters from the first set of clusters; acce…
Who is the assignee on this patent?
Vadrevu Srinivas, Chang Yi, Zheng Zhaohui, and 2 more
What technology area does this patent fall under?
Primary CPC classification G06F16/35 (Clustering; Classification), which also appears in this page's data under the code G06F17/30705. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 13, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).