Systems and methods for document analysis to produce, consume and analyze content-by-example logs for documents

US12585707B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12585707-B2
Application number	US-202217851506-A
Country	US
Kind code	B2
Filing date	Jun 28, 2022
Priority date	Jun 28, 2022
Publication date	Mar 24, 2026
Grant date	Mar 24, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Document analysis systems and methods for the generation of a content-by-example log that expresses withheld documents in terms of a set of disclosed documents are disclosed. Additionally, document analysis systems and methods for the analysis of such a content-by-example log to determine withheld documents of interest without access to those withheld documents are disclosed.
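As a rough illustration of what a content-by-example log could look like, the sketch below maps each withheld document identifier to the identifiers of a few disclosed documents that resemble it. The abstract does not say how example documents are chosen; the token-overlap (Jaccard) similarity, the function name `build_log`, the parameter `k`, and the sample documents are all assumptions made for illustration only.

```python
def tokens(text):
    """Lowercase whitespace tokenization; a deliberately simple stand-in."""
    return set(text.lower().split())


def jaccard(a, b):
    """Token-set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_log(withheld, disclosed, k=2):
    """Map each withheld ID to the IDs of its k most similar disclosed documents.

    Hypothetical helper: the similarity measure and k are assumptions,
    not taken from the patent text.
    """
    disclosed_tokens = {doc_id: tokens(text) for doc_id, text in disclosed.items()}
    log = {}
    for w_id, w_text in withheld.items():
        w_tok = tokens(w_text)
        # Sort by descending similarity, breaking ties by document ID.
        ranked = sorted(
            disclosed_tokens,
            key=lambda d: (-jaccard(w_tok, disclosed_tokens[d]), d),
        )
        log[w_id] = ranked[:k]
    return log


# Made-up sample data for the producing party's side.
disclosed = {
    "D1": "merger pricing schedule draft",
    "D2": "holiday party catering menu",
    "D3": "merger pricing negotiation notes",
}
withheld = {"W1": "privileged advice on merger pricing"}

log = build_log(withheld, disclosed)
print(log)
```

Only the resulting mapping of identifiers would be handed to the receiving party in this sketch; the withheld text itself never leaves `build_log`.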

First claim

Opening claim text (preview).

What is claimed is:

1. A system for document analysis, comprising: a processor; a non-transitory computer readable medium, comprising instructions for: receiving, by a receiving party, a content-by-example log, the content-by-example log including an entry for each withheld document of a set of withheld documents, wherein the set of withheld documents is inaccessible to the receiving party, wherein the entry for each withheld document associates an identifier for the corresponding withheld document with example identifiers for a set of example documents, wherein the set of example documents exemplify the corresponding withheld document, wherein the set of example documents are disclosed documents accessible to the receiving party; storing the content-by-example log at a data store; analyzing the content-by-example log to determine identifiers of withheld documents of interest by: transforming the content-by-example log into a feature vector index, wherein the feature vector index comprises: a feature vector associated with each of the identifiers of the withheld documents, wherein the feature vector comprises: a set of features determined from the set of example documents for the corresponding withheld document, wherein creating a respective feature vector for each withheld document comprises: generating a document feature vector for each example document identified in the content-by-example log as being associated with the withheld document, wherein the document feature vector comprises a weighted set of text based features determined from the respective example document, wherein the feature vector associates with the identifier for the withheld document in the feature vector index by using the document feature vectors generated for each example document of the set of example documents; and determining the identifiers of withheld documents of interest based on the feature vector index by: obtaining labels associated with identifiers of withheld documents; obtaining a supervised machine learning model trained at a first time based on obtained labels for documents; further training, at a second time after the first time, the supervised machine learning model using newly obtained labels for withheld documents identified in the content-by-example log, wherein the further training at the second time of the supervised machine learning model utilizes features determined from the example documents and provided by the feature vector index, wherein the features are associated with the withheld documents; ranking identifiers for withheld documents of the content-by-example log based on the feature vector index using the further trained supervised machine learning model; and selecting a number of top ranked identifiers of withheld documents as identifiers of the set of withheld documents of interest; and generating requests for a producing party having the set of withheld documents, wherein the generated requests correspond to at least some of the set of withheld documents of interest by specifying a subset of identifiers of the at least some of the set of withheld documents of interest.

2. The system of claim 1, wherein determining the identifiers of withheld documents of interest comprises: searching the identifiers for the withheld documents using the feature vector index based on a query to rank the identifiers for the withheld documents; and selecting a number of top ranked identifiers of withheld documents as identifiers of the set of withheld documents of interest.

3. The system of claim 2, wherein the query is determined from content associated with the disclosed documents accessible by the receiving party.

4. The system of claim 1, wherein the features of the feature vector are the identifiers of the set of example documents.

5. The system of claim 1, wherein determining the identifiers of withheld documents of interest comprises: generating a set of clusters of identifiers of withheld documents by clustering the identifiers for the withheld documents included in the content-by-example log based on the feature vector index; selecting an identifier from each of the set of clusters of identifiers of withheld documents as identifiers of the set of withheld documents of interest.

6. The system of claim 5, wherein the identifier is selected from a cluster of the set of clusters based on a distance of that identifier from a centroid of that cluster.

7. A method for document analysis, comprising: receiving, by a receiving party, a content-by-example log, the content-by-example log including an entry for each withheld document of a set of withheld documents, wherein the set of withheld documents is inaccessible to the receiving party, wherein the entry for each withheld document associates an identifier for the corresponding withheld document with example identifiers for a set of example documents, wherein the set of example documents exemplify the corresponding withheld document, wherein the set of example documents are disclosed documents accessible to the receiving party; storing the content-by-example log at a data store; analyzing the content-by-example log to determine identifiers of withheld documents of interest by: transforming the content-by-example log into a feature vector index, wherein the feature vector index comprises: a feature vector associated with each of the identifiers of the withheld documents, wherein the feature vector comprises: a set of features determined from the set of example documents for the corresponding withheld document, wherein creating a respective feature vector for each withheld document comprises: generating a document feature vector for each example document identified in the content-by-example log as being associated with the withheld document, wherein the document feature vector comprises a weighted set of text based features determined from the respective example document, wherein the feature vector associates with the identifier for the withheld document in the feature vector index by using the document feature vectors generated for each example document of the set of example documents; and determining the identifiers of withheld documents of interest based on the feature vector index by: obtaining labels associated with identifiers of withheld documents; obtaining a supervised machine learning model trained at a first time based on obtained labels for documents; further training, at a second time after the first time, the supervised machine learning model using newly obtained labels for withheld documents identified in the content-by-example log, wherein the further training at the second time of the supervised machine learning model utilizes features determined from the example documents and provided by the feature vector index, wherein the features are associated with the withheld documents; ranking identifiers for withheld documents of the content-by-example log based on the feature vector index using the further trained supervised machine learning model; and selecting a number of top ranked identifiers of withheld documents as identifiers of the set of withheld documents of interest; and generating requests for a producing party having the set of withheld documents, wherein the generated requests correspond to at least some of the set of withheld documents of interest by specifying a subset of identifiers of the at least some of the set of withheld documents of interest.

8. The method of claim 7, wherein determining the identifiers of withheld documents of interest comprises: searching the identifiers for the withheld documents using the feature vector index based on a query to rank the identifiers for the withheld documents; and select
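The receiving-party steps in claims 1 and 2 (turn the log into a feature vector index, then rank withheld identifiers against a query) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claims only require "a weighted set of text based features", so the TF-IDF weighting, the averaging of example vectors, the cosine scoring, and every function name and sample document below are hypothetical choices.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return {doc_id: {term: weight}} with a smoothed TF-IDF weighting.

    TF-IDF is an assumed choice of "weighted text based features".
    """
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    n = len(docs)
    vectors = {}
    for d, toks in tokenized.items():
        tf = Counter(toks)
        vectors[d] = {
            t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()
        }
    return vectors


def build_index(log, doc_vectors):
    """Average the example-document vectors listed for each withheld ID (assumed combiner)."""
    index = {}
    for w_id, example_ids in log.items():
        combined = Counter()
        for e in example_ids:
            for term, weight in doc_vectors[e].items():
                combined[term] += weight / len(example_ids)
        index[w_id] = dict(combined)
    return index


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(index, query_vector, top_n):
    """Return the top_n withheld IDs by similarity to the query, ties broken by ID."""
    ordered = sorted(index, key=lambda w: (-cosine(index[w], query_vector), w))
    return ordered[:top_n]


# Made-up disclosed documents and a made-up content-by-example log.
disclosed = {
    "D1": "merger pricing schedule",
    "D2": "catering menu",
    "D3": "merger negotiation notes",
    "D4": "catering invoice",
}
log = {"W1": ["D1", "D3"], "W2": ["D2", "D4"]}

index = build_index(log, tfidf_vectors(disclosed))
of_interest = rank(index, {"merger": 1.0, "pricing": 1.0}, top_n=1)
print(of_interest)
```

The receiving party never touches withheld text here; every feature comes from disclosed example documents. The same `index` could also feed the supervised model of claim 1 or the clustering of claims 5 and 6, which are not shown.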

Assignees

Open Text Inc

Inventors

Classifications

G06F16/90335
Query processing · CPC title
G06Q10/10
Office automation; Time management · CPC title
G06Q50/18
Legal services · CPC title
G06F16/3347
using vector based model · CPC title
G06F16/93 (Primary)
Document management systems · CPC title

Patent family

Related publications grouped by family.

View patent family 89322902

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12585707B2 cover?: Document analysis systems and methods for the generation of a content-by-example log that expresses withheld documents in terms of a set of disclosed documents are disclosed. Additionally, document analysis systems and methods for the analysis of such a content-by-example log to determine withheld documents of interest without access to those withheld documents are disclosed.
Who is the assignee on this patent?: Open Text Inc
What technology area does this patent fall under?: Primary CPC classification G06F16/93. Mapped technology areas include Physics.
When was this patent published?: Publication date Mar 24, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).