Flattening a cluster hierarchy tree to filter documents

US9305076B1 · US · B1

Patent metadata
FieldValue
Publication numberUS-9305076-B1
Application numberUS-201313836970-A
CountryUS
Kind codeB1
Filing dateMar 15, 2013
Priority dateJun 28, 2012
Publication dateApr 5, 2016
Grant dateApr 5, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In an automatic electronic discovery search tool, documents can be clustered into a cluster hierarchy according to a first clustering approach. Once a hierarchy tree is created, portions of the tree can be flattened for application of a second superior clustering approach. Clustered portions may be used in a document review tool or further filtered according to specified criteria. Clusters may be filtered or used in a document review tool.

First claim

Opening claim text (preview).

What is claimed is: 1. A method of filtering a set of documents considered to be relevant to a subject, comprising: selecting a set of documents determined to be relevant to a subject; clustering, by a processing device, the set of documents into a hierarchy of clusters based on a partitional clustering approach, the cluster hierarchy comprising a tree structure having a plurality of branches, each branch having at least one of a leaf node or a sub tree, wherein the clustering comprises dividing the set of documents into a number of clusters of the hierarchy; flattening, by a processing device, one or more branches of the cluster hierarchy, wherein the flattening comprises recreating a cluster that was divided by combining two or more clusters that resulted from the cluster's division; and re-clustering, by a processing device, documents in the flattened branches based on an agglomerative clustering approach to form a modified cluster hierarchy, wherein when more than one flattened branch is selected for re-clustering, each flattened branch is re-clustered in parallel with other flattened branches, and wherein the modified cluster hierarchy includes clusters formed by partitional clustering and clusters formed by agglomerative clustering. 2. The method of claim 1 , wherein the step of flattening one or more branches of the cluster hierarchy occurs when the cohesiveness of a cluster in a particular branch falls below a threshold. 3. The method of claim 1 , wherein the set of documents is distributed across a plurality of client devices in a hosted user environment. 4. The method of claim 1 , further comprising receiving a maximum tree depth level; and clustering, by a processing device, documents in the set of documents into a hierarchy according to a partitional clustering approach to the received maximum tree depth level. 5. The method of claim 1 , further comprising receiving a number of levels of the cluster hierarchy to be flattened; and flattening, by a processing device, one or more branches of the cluster hierarchy at the received number of levels. 6. The method of claim 1 , further comprising receiving a maximum number of documents per flattened branch; and flattening, by a processing device, one or more branches of the cluster hierarchy wherein each flattened branch contains fewer than the received maximum number of documents per flattened branch. 7. The method of claim 1 , further comprising filtering one or more re-clustered branches in accordance with received filter criteria. 8. The method of claim 1 , wherein the step of clustering documents in the set of documents into a hierarchy according to a partitional clustering approach is performed based on one or more non-content fields. 9. The method of claim 1 , wherein the step of re-clustering the flattened branches according to an agglomerative clustering approach is performed based on one or more content fields. 10. The method of claim 1 , wherein the flattening comprises combining the one or more clusters of the one or more branches into fewer, less divided clusters. 11. The method of claim 1 , wherein after the re-clustering, the hierarchy includes a first set of clusters generated based on the partitional clustering approach and a second set of clusters generated based on the agglomerative clustering approach, wherein the first set of clusters includes documents that were partitionally clustered but not agglomeratively clustered, and wherein the second set of clusters includes documents that were both partitionally clustered and agglomerative clustered. 12. The method of claim 1 , wherein a first number of documents in the set of documents on which the partitional clustering approach is used is larger than a second number of documents in the flattened clusters on which the agglomerative clustering approach is used. 13. A system for clustering a set of documents considered to be relevant to a subject, comprising: one or more processors; and a storage device having instructions stored thereon that, when executed by the one or more processors, cause the one or more processors to: select a set of documents determined to be relevant to a subject; cluster the set of documents into a hierarchy of clusters based on a partitional clustering approach, the cluster hierarchy comprising a tree structure having a plurality of branches, each branch having at least one of a leaf node or a sub tree, wherein the clustering comprises dividing the set of documents into a number of clusters of the hierarchy; flatten one or more branches of the cluster hierarchy, wherein the flattening comprises recreating a cluster that was divided by combining two or more clusters that resulted from the cluster's division; and re-cluster documents in the flattened branches based on an agglomerative clustering approach to form a modified cluster hierarchy, wherein when more than one flattened branch is selected for re-clustering, each flattened branch is re-clustered in parallel with other flattened branches, and wherein the modified cluster hierarchy includes clusters formed by partitional clustering and clusters formed by agglomerative clustering. 14. The system of claim 13 , the storage device having further instructions stored thereon that, when executed by the one or more processors, cause the one or more processors to flatten one or more branches of the cluster hierarchy when the cohesiveness of a cluster in a particular branch falls below a threshold. 15. The system of claim 13 , wherein the set of documents is distributed across a plurality of client devices in a hosted user environment. 16. The system of claim 13 , the storage devices having further instructions stored thereon that, when executed by the one or more processors, cause the one or more processors to: receive a maximum tree depth level; and cluster documents in the set of documents into a hierarchy according to a partitional cluster approach to the received maximum tree depth level. 17. The system of claim 13 , the storage device having further instructions stored thereon that, when executed by the one or more processors, cause the one or more processors to: receive a number of levels of the cluster hierarchy to be flattened; and flatten one or more branches of the cluster hierarchy at the received number of levels. 18. The system of claim 13 , the storage device having further instructions stored thereon that, when executed by the one or more processors, cause the one or more processors to: receive a maximum number of documents per flattened branch; and flatten one or more branches of the cluster hierarchy, wherein each flattened branch contains fewer than the received maximum number of documents per flattened branch.

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9305076B1 cover?
In an automatic electronic discovery search tool, documents can be clustered into a cluster hierarchy according to a first clustering approach. Once a hierarchy tree is created, portions of the tree can be flattened for application of a second superior clustering approach. Clustered portions may be used in a document review tool or further filtered according to specified criteria. Clusters may …
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/285. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Apr 05 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).