Legal document search based on legal similarity

US2017132730A1 · US · A1

Patent metadata

Publication number: US-2017132730-A1
Application number: US-201514938041-A
Country: US
Kind code: A1
Filing date: Nov 11, 2015
Priority date: Nov 11, 2015
Publication date: May 11, 2017
Grant date: not listed

Abstract

A method and system are provided for performing a legal document search. The method includes finding, by a processor, for each of a plurality of documents, a respective law clause related thereto, to obtain a plurality of related law clauses. The method further includes constructing, by the processor, a graph having nodes defined by the plurality of documents and the plurality of related law clauses and having edges defined by (1) relations between the plurality of documents and the plurality of related law clauses and (2) relations between the plurality of documents. The method further includes identifying, by the processor, from the plurality of documents, one or more candidate documents that are similar to an input query document by mining the graph using similarity criteria.
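The graph-construction and graph-mining steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation. The helper names (`build_graph`, `random_walk_with_restart`, `similar_documents`), the tagged-tuple node keys, the use of document-clause confidence scores as edge weights, and the restart probability of 0.15 are all hypothetical choices. The mining step uses a random walk with restart, the variant recited in claims 11 and 12, and the query document is assumed to already be a node linked to its own related law clauses.

```python
from collections import defaultdict


def build_graph(doc_clause, doc_doc):
    """Build an undirected, weighted adjacency map (hypothetical helper).

    doc_clause: {doc_id: {clause_id: confidence}} gives document-clause edges.
    doc_doc:    iterable of (doc_a, doc_b, weight) gives document-document edges.
    Nodes are ("doc", id) or ("clause", id) so the two id spaces cannot collide.
    """
    adj = defaultdict(dict)
    for doc, clauses in doc_clause.items():
        adj.setdefault(("doc", doc), {})  # keep documents that have no clauses
        for clause, conf in clauses.items():
            adj[("doc", doc)][("clause", clause)] = conf
            adj[("clause", clause)][("doc", doc)] = conf
    for a, b, w in doc_doc:
        adj[("doc", a)][("doc", b)] = w
        adj[("doc", b)][("doc", a)] = w
    return dict(adj)


def random_walk_with_restart(adj, start, restart=0.15, iters=200, tol=1e-12):
    """Power iteration for the random-walk-with-restart distribution from `start`."""
    nodes = list(adj)
    p = {n: 0.0 for n in nodes}
    p[start] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for n, mass in p.items():
            if mass == 0.0:
                continue
            total = sum(adj[n].values())
            if total == 0.0:
                # Dangling node: send its walking mass back to the start node.
                nxt[start] += (1.0 - restart) * mass
                continue
            for m, w in adj[n].items():
                nxt[m] += (1.0 - restart) * mass * w / total
        nxt[start] += restart  # the restart component
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def similar_documents(adj, query_doc, k=3, restart=0.15):
    """Rank non-query documents by their RWR score from the query document."""
    scores = random_walk_with_restart(adj, ("doc", query_doc), restart)
    docs = [(s, node[1]) for node, s in scores.items()
            if node[0] == "doc" and node[1] != query_doc]
    docs.sort(key=lambda t: (-t[0], t[1]))  # highest score first, ties by id
    return [doc for _, doc in docs[:k]]


# Toy data: "q" is the query document; art12 and art7 are law clauses.
doc_clause = {
    "q": {"art12": 0.9, "art7": 0.6},
    "d1": {"art12": 0.8},
    "d2": {"art7": 0.5},
    "d3": {},
}
doc_doc = [("d1", "d3", 0.3)]  # hypothetical relation, e.g. d1 cites d3
graph = build_graph(doc_clause, doc_doc)
print(similar_documents(graph, "q"))  # ['d1', 'd2', 'd3']
```

Passing an empty `doc_doc` list yields a graph with only document-clause edges, roughly the variant of claim 15. Treating citations as the "relations between the plurality of documents" is an assumption made only for this example.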

Claims

What is claimed is:

1. A method for performing a legal document search, the method comprising: finding, by a processor, for each of a plurality of documents, a respective law clause related thereto, to obtain a plurality of related law clauses; constructing, by the processor, a graph having nodes defined by the plurality of documents and the plurality of related law clauses and having edges defined by (1) relations between the plurality of documents and the plurality of related law clauses and (2) relations between the plurality of documents; and identifying, by the processor, from the plurality of documents, one or more candidate documents that are similar to an input query document by mining the graph using similarity criteria.

2. The method of claim 1, wherein said finding step comprises determining a respective confidence score for each pairing of a given one of the plurality of documents and the respective law clause related thereto, the respective confidence score serving as a ranking for the given one of the plurality of documents with respect to the respective law clause related thereto.

3. The method of claim 2, wherein the one or more candidate documents comprise a plurality of candidate documents, and the method further comprises re-ranking the plurality of candidate documents based on a number of the plurality of related law clauses that occur in both the plurality of candidate documents and the input query document.

4. The method of claim 3, wherein said re-ranking step re-ranks the plurality of candidate documents using at least one of a mean reciprocal rank and a mean averaged precision.

5. The method of claim 3, wherein the re-ranking step re-ranks the plurality of documents using an Integer Linear Programming (ILP) solver.

6. The method of claim 1, wherein said finding step comprises: representing law clause and document combinations as respective vectors; and measuring a cosine similarity between the respective vectors to identify the respective law clause for the each of the documents with a respective confidence score.

7. The method of claim 6, wherein the vectors are Term Frequency-Inverse Document Frequency vectors.

8. The method of claim 1, further comprising training a word embedding model using only law clauses and omitting documents, wherein the respective vectors are formed using the word embedding model.

9. The method of claim 1, wherein said identifying step comprises displaying, on a hardware display device, the one or more candidate documents.

10. The method of claim 1, wherein said identifying step comprises transmitting, over one or more networks by a hardware transmission device, the one or more candidate documents to a remote computing device.

11. The method of claim 1, wherein the graph is mined using a random walk path formulation.

12. The method of claim 11, wherein the random walk path formulation includes a restart component.

13. The method of claim 11, wherein the random walk path formulation includes indirect relations between the law clauses and the plurality of documents.

14. The method of claim 1, wherein the graph is mined using an approach that considers only the nodes defined by the plurality of documents while omitting the nodes defined by the plurality of related law clauses.

15. The method of claim 1, wherein the graph is mined using an approach that considers only (1) the relations between the plurality of documents and the plurality of related law clauses while omitting (2) the relations between the plurality of documents.

16. The method of claim 1, wherein the graph is mined using an approach that considers both (1) the relations between the plurality of documents and the plurality of related law clauses and (2) the relations between the plurality of documents.

17. A computer program product for performing a legal document search, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a method comprising: finding, by a processor, for each of a plurality of documents, a respective law clause related thereto, to obtain a plurality of related law clauses; constructing, by the processor, a graph having nodes defined by the plurality of documents and the plurality of related law clauses and having edges defined by relations between (1) the plurality of documents and the plurality of related law clauses and (2) the plurality of documents; and identifying, by the processor, from the plurality of documents, one or more candidate documents that are similar to an input query document by mining the graph using similarity criteria.

18. The system of claim 17, wherein the processor determines a respective confidence score for each pairing of a given one of the plurality of documents and the respective law clause related thereto, the respective confidence score serving as a ranking for the given one of the plurality of documents with respect to the respective law clause related thereto.

19. The system of claim 18, wherein the one or more candidate documents comprise a plurality of candidate documents, and the processor re-ranks the plurality of candidate documents based on a number of the plurality of related law clauses that occur in both the plurality of candidate documents and the input query document.

20. A system for performing a legal document search, the system comprising: a hardware processor and a memory device, configured to: find, for each of a plurality of documents, a respective law clause related thereto, to obtain a plurality of related law clauses; construct a graph having nodes defined by the plurality of documents and the plurality of related law clauses and having edges defined by (1) relations between the plurality of documents and the plurality of related law clauses and (2) relations between the plurality of documents; and identify, from the plurality of documents, one or more candidate documents that are similar to an input query document by mining the graph using similarity criteria; and a transmission server for transmitting, over one or more networks, the one or more candidate documents to a remote computing device.
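Claims 2, 3, 6 and 7 describe scoring document-clause pairs by cosine similarity between TF-IDF vectors, using that score as a confidence score, and then re-ranking candidate documents by how many related law clauses they share with the query document. A minimal, standard-library-only sketch follows. The function names, the whitespace tokenizer, the smoothed IDF formula and the `top_n` cutoff are hypothetical choices; the claims do not specify them.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Sparse TF-IDF vectors ({term: weight}) over one shared vocabulary.

    Assumption: raw term counts times a smoothed IDF, log((1+n)/(1+df)) + 1.
    """
    tokenized = [text.lower().split() for text in texts]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def related_clauses(docs, clauses, top_n=1):
    """Map each document id to {clause_id: confidence}, keeping the top_n
    clauses with a positive cosine score (hypothetical cutoff)."""
    doc_ids, clause_ids = list(docs), list(clauses)
    vecs = tfidf_vectors([docs[d] for d in doc_ids] + [clauses[c] for c in clause_ids])
    doc_vecs, clause_vecs = vecs[:len(doc_ids)], vecs[len(doc_ids):]
    result = {}
    for d, dv in zip(doc_ids, doc_vecs):
        scored = sorted(((cosine(dv, cv), c) for c, cv in zip(clause_ids, clause_vecs)),
                        key=lambda t: (-t[0], t[1]))
        result[d] = {c: s for s, c in scored[:top_n] if s > 0.0}
    return result


def rerank_by_shared_clauses(candidates, query_clauses, doc_clauses):
    """Stable re-rank: candidates sharing more related clauses with the query
    move up; ties keep their incoming (e.g. graph-mining) order."""
    q = set(query_clauses)
    return sorted(candidates, key=lambda d: -len(q & set(doc_clauses.get(d, {}))))


clauses = {
    "art12": "tenant deposit refund landlord",
    "art7": "employer overtime wage payment",
}
docs = {
    "q": "landlord withheld tenant deposit",
    "d1": "tenant sued landlord over deposit",
    "d2": "employer refused overtime wage",
}
rel = related_clauses(docs, clauses)
print(sorted(rel["q"]), sorted(rel["d2"]))  # ['art12'] ['art7']
print(rerank_by_shared_clauses(["d2", "d1"], rel["q"], rel))  # ['d1', 'd2']
```

Because Python's `sorted` is stable, candidates that share the same number of clauses with the query keep the order produced by the earlier graph-mining step. Claim 8's alternative, vectors built from a word embedding model trained only on law clauses, would replace `tfidf_vectors` and is not shown here.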

Assignees

IBM

Classifications

Primary CPC: G06Q50/18

Frequently asked questions

What does patent US2017132730A1 cover?
A processor-implemented legal document search. Each document is linked to its related law clauses; a graph is built whose nodes are the documents and those clauses and whose edges capture document-clause and document-document relations; and documents similar to an input query document are identified by mining that graph with similarity criteria.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06Q50/18. Mapped technology areas include Physics.
When was this patent published?
The A1 application was published on May 11, 2017. Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).