System and method for providing technology assisted data review with optimizing features

US2018113935A1 · US · A1

Patent metadata
Publication number: US-2018113935-A1
Application number: US-201715849375-A
Country: US
Kind code: A1
Filing date: Dec 20, 2017
Priority date: Mar 13, 2013
Publication date: Apr 26, 2018
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The invention provided is a system configured to use a topic-related generative model to build a document map from a plurality of documents in a storage medium and generate a control set from the plurality of documents including at least two stratified document sets. The system then receives a set of control set metrics regarding the control set from a user. The system selects a machine call responsive document from a document map based on a determined predictive responsiveness for that document. The system receives a responsiveness call from a user through the task/queue framework regarding a machine call document. Finally, the system compares the responsiveness of the machine call document to the control set metrics and rebuilds the document map based on the results of the comparison between the machine call document responsiveness and the control set metrics.
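The "at least two stratified document sets" in the abstract can be pictured with a small sketch. It follows the reading given in dependent claims 7 and 8 on this page, where one stratum is drawn at random from documents scoring below the decision boundary and the other from documents scoring above it. The function name `stratified_control_set`, the `per_stratum` and `seed` parameters, and the choice to treat a score exactly on the boundary as "above" are assumptions made for illustration; the publication does not specify them here.

```python
import random
from typing import List, Sequence


def stratified_control_set(
    initial_scores: Sequence[float],
    boundary: float,
    per_stratum: int,
    seed: int = 0,
) -> List[int]:
    """Return document indices for a two-stratum control set.

    Hypothetical helper: draws up to `per_stratum` documents at random
    from each side of the decision boundary.
    """
    rng = random.Random(seed)

    # First stratum: initial score below the boundary (likely non-responsive).
    below = [i for i, s in enumerate(initial_scores) if s < boundary]
    # Second stratum: initial score at or above the boundary (assumption:
    # ties count as "above").
    above = [i for i, s in enumerate(initial_scores) if s >= boundary]

    def pick(pool: List[int]) -> List[int]:
        # Never ask for more documents than the stratum contains.
        return rng.sample(pool, min(per_stratum, len(pool)))

    return sorted(pick(below) + pick(above))


if __name__ == "__main__":
    scores = [0.10, 0.20, 0.30, 0.70, 0.80, 0.90]
    print(stratified_control_set(scores, boundary=0.5, per_stratum=2))
```

Sampling both sides of the boundary gives the reviewer a control set that contains likely responsive and likely non-responsive documents, which a purely random draw from a skewed collection might not.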

First claim

Opening claim text (preview).

1. An electronic document system, comprising:
  a processor;
  a data store including a plurality of documents;
  a non-transitory computer readable medium, comprising instructions for:
    generating a document map for the plurality of documents within the data store using a topic-related generative model for the plurality of documents by clustering the plurality of documents into topics based on the topic-related generative model;
    selecting a control set of documents from the plurality of documents, wherein the control set of documents is selected from a first strata of the plurality of documents and a second strata of the plurality of documents;
    sending the control set of documents to a user;
    receiving a control set metric regarding the control set of documents from the user, wherein the control set metric includes an indicator of responsiveness for each of the documents of the control set of documents;
    the data review system performing the steps of:
      a) determining a responsiveness score for each of the plurality of documents according to a scoring algorithm including determining a document responsiveness probability for the document, determining a weighted topic score for the document for each of a set of topics in the topic-related generative model based on the document responsiveness probability and a topic-document weight between the topic and the document, generating an initial responsiveness score based on the topic-document weights of the document for each topic and the weighted topic score, and normalizing the document responsiveness probability based on the initial responsiveness score to determine the responsiveness score for the document;
      b) determining a set of responsive documents and a set of non-responsive documents of the plurality of documents based on the responsiveness score determined for each of the plurality of documents and a decision boundary score;
      c) determining a confidence score for the data review system using the responsiveness score for each of the documents of the control set and the indicator of responsiveness for each of the control set documents received from the user;
      d) selecting one or more of the plurality of documents based on the responsiveness scores of the plurality of documents, wherein the responsiveness score of each of the one or more selected documents is at or near the decision boundary score;
      e) presenting the one or more selected documents to the user;
      f) receiving an indicator of responsiveness from the user for each of the selected documents;
      g) refining the scoring algorithm based on the indicator of responsiveness for each of the selected document; and
      h) generating a desired confidence score for the document system and presenting the set of responsive documents to the user when the desired confidence score for the document system is achieved, wherein the confidence score for the document system is determined by comparing the responsiveness score for the documents of the control set to the indicator of responsiveness for the documents of the control set received from the user.

2. The system of claim 1, wherein the initial responsiveness score is a sum over all the topics of the topic-generative model of the product of the topic-document weight of the document for each topic and the weighted topic score for the topic.

3. The system of claim 1, wherein the confidence score is based on a recall measurement and a precision measurement of the electronic document system.

4. The system of claim 1, wherein the confidence score is an F1 score.

5. The system of claim 1, wherein the topic-related generative model is a Latent Dirichlet Allocation model.

6. The system of claim 1, wherein the control set of documents are generated based on the initial responsiveness score for each of the documents.

7. The system of claim 1, wherein the first strata of the plurality of documents includes non-responsive documents with the initial responsiveness score below the decision boundary and the second strata of the plurality of documents includes responsive documents with an initial responsiveness score above the decision boundary.

8. The system of claim 7, wherein the first strata is randomly selected from documents with the initial responsiveness score below the decision boundary and the second strata is randomly selected from documents with the initial responsiveness score above the decision boundary.

9. The system of claim 1, wherein the initial responsiveness score was generated based on user interaction with the plurality of documents.

10. The system of claim 9, wherein the user interaction includes a keyword search of the plurality of documents.

11. The system of claim 1, wherein generating a desired confidence score for the data review system comprises: repeating steps a-g until the determined confidence score for the data review system is the desired confidence score for the data review system.

12. A method, comprising:
  building a document map for a plurality of documents within a data store of a document system using a topic-related generative model for the plurality of documents by clustering the plurality of documents into topics based on the topic-related generative model;
  selecting a control set of documents from the plurality of documents, wherein the control set of documents is selected from a first strata of the plurality of documents and a second strata of the plurality of documents;
  sending the control set of documents to a user;
  receiving a control set metric regarding the control set of documents from the user, wherein the control set metric includes an indicator of responsiveness for each of the documents of the control set of documents;
  performing, by the document system, the steps of:
    a) determining a responsiveness score for each of the plurality of documents based on a scoring algorithm including determining a document responsiveness probability for the document, determining a weighted topic score for the document for each of a set of topics in the topic-related generative model based on the document responsiveness probability and a topic-document weight between the topic and the document, generating an initial responsiveness score based on the topic-document weights of the document for each topic and the weighted topic score, and normalizing the document responsiveness probability based on the initial responsiveness score to determine the responsiveness score for the document;
    b) determining a set of responsive documents and a set of non-responsive documents of the plurality of documents based on the responsiveness score determined for each of the plurality of documents and a decision boundary score;
    c) determining a confidence score for the document system using the responsiveness score for each of the documents of the control set and the indicator of responsiveness for each of the control set documents received from the user;
    d) selecting one or more of the plurality of documents based on the responsiveness scores of the plurality of documents, wherein the responsiveness score of each of the one or more selected documents is at or near the decision boundary score;
    e) presenting the one or more selected documents to the user;
    f) receiving an indicator of responsiveness from the user for each of the selected documents;
    g) refining the scoring algorithm based on the indicator of responsiveness for each of the selected document; and
    h) generating a desired confidence score for the document system, and presenting the set of responsive documents to the user when the desired confidence score for the document system is achieved, wherein the confidence score for the document s…
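The scoring and review steps recited in the claims can be made concrete with a short sketch. Only two formulas are actually stated on this page: claim 2 defines the initial responsiveness score as a sum over topics of topic-document weight times weighted topic score, and claim 4 names F1 as one possible confidence score. Everything else below is an assumption chosen for illustration: the weighted topic score is taken to be a weight-averaged document probability, the normalization is simple min-max scaling, a score on the boundary counts as a responsive call, and all function names are hypothetical.

```python
from typing import List, Sequence


def weighted_topic_scores(weights: Sequence[Sequence[float]],
                          probs: Sequence[float]) -> List[float]:
    """Assumed form: per-topic mean of document probabilities,
    weighted by each document's topic-document weight."""
    scores = []
    for t in range(len(weights[0])):
        mass = sum(row[t] for row in weights)
        total = sum(row[t] * p for row, p in zip(weights, probs))
        scores.append(total / mass if mass > 0 else 0.0)
    return scores


def initial_scores(weights: Sequence[Sequence[float]],
                   topic_scores: Sequence[float]) -> List[float]:
    """Claim 2: sum over topics of weight times weighted topic score."""
    return [sum(w * s for w, s in zip(row, topic_scores)) for row in weights]


def normalize(scores: Sequence[float]) -> List[float]:
    """Assumed normalization: min-max scaling into [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def near_boundary(scores: Sequence[float], boundary: float, k: int) -> List[int]:
    """Step d: indices of the k documents scoring closest to the boundary."""
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i] - boundary))
    return order[:k]


def f1_confidence(scores: Sequence[float], labels: Sequence[bool],
                  boundary: float) -> float:
    """Claims 3 and 4: F1 of machine calls against user calls on the control set."""
    calls = [s >= boundary for s in scores]
    tp = sum(1 for c, l in zip(calls, labels) if c and l)
    fp = sum(1 for c, l in zip(calls, labels) if c and not l)
    fn = sum(1 for c, l in zip(calls, labels) if not c and l)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Rows are documents, columns are topics (made-up illustrative weights).
    weights = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
    probs = [1.0, 0.5, 0.5, 0.0]  # made-up document responsiveness probabilities
    scores = normalize(initial_scores(weights, weighted_topic_scores(weights, probs)))
    print(scores, near_boundary(scores, 0.5, 2))
```

In a loop modeled on claim 11, the documents returned by `near_boundary` would go to the reviewer, their calls would update `probs`, and scoring would repeat until `f1_confidence` on the control set reaches the desired level.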

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018113935A1 cover?
The invention provided is a system configured to use a topic-related generative model to build a document map from a plurality of documents in a storage medium and generate a control set from the plurality of documents including at least two stratified document sets. The system then receives a set of control set metrics regarding the control set from a user. The system selects a machine call re…
Who is the assignee on this patent?
Open Text Holdings Inc
What technology area does this patent fall under?
Primary CPC classification G06F17/30719. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 26, 2018 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).