What technology area does this patent fall under?

Primary CPC classification G06F40/40. Mapped technology areas include Physics.

When was this patent published?

Publication date Aug 14, 2025 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Structured document generation using document-scale embeddings

US2025259013A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2025259013-A1
Application number	US-202418441889-A
Country	US
Kind code	A1
Filing date	Feb 14, 2024
Priority date	Feb 14, 2024
Publication date	Aug 14, 2025
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and related system for generating document embeddings within an embedding space based on a set of structured documents by determining (i) a first vector based on a first segment of a first document and (ii) a second vector based on a second segment of the first document and updating association vectors indicating the second segment based on a distance between the first and second vectors. The method also includes generating a document embedding based on the association vectors, generating a candidate vector based on a candidate document, and determining a result indicating that a second distance between the candidate vector and a first document embedding satisfies a document embedding distance threshold. The method may also include generating a new document by providing, to a text generation model, a portion of the candidate document and a portion of the second segment of the first document.
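
The abstract's pipeline can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicant's implementation. The hashed bag-of-words `encode` function stands in for the encoder network layers. The association weight `1 / (1 + distance)` and the mean pooling that replaces the "second set of encoder network layers" are our own choices. "Satisfies the threshold" is read as "distance ≤ threshold". The text generation model appears only as the prompt that would be sent to it. All names (`encode`, `DocEntry`, `build_entry`, `build_prompt`, `top_k`) are hypothetical. "Leader" and "supporter" segments follow the terminology of claim 1.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional

DIM = 64  # size of the toy embedding space (assumption)


def encode(text: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in for the encoder layers: hashed bag of words, unit length."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[sum(map(ord, token)) % dim] += 1.0  # deterministic bucket per token
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class DocEntry:
    leader: str
    supporters: List[str]
    weights: List[float] = field(default_factory=list)    # one association weight per supporter
    embedding: List[float] = field(default_factory=list)  # document-scale embedding


def build_entry(leader: str, supporters: List[str]) -> DocEntry:
    if not supporters:
        raise ValueError("a document needs at least one supporter segment")
    leader_vec = encode(leader)
    entry = DocEntry(leader, supporters)
    association_vectors = []
    for segment in supporters:
        seg_vec = encode(segment)
        # Supporters closer to the leader (smaller distance) get larger weights
        weight = 1.0 / (1.0 + math.dist(leader_vec, seg_vec))
        entry.weights.append(weight)
        association_vectors.append([weight * x for x in seg_vec])
    # Mean pooling stands in for the "second set of encoder network layers" (assumption)
    pooled = [sum(col) / len(association_vectors) for col in zip(*association_vectors)]
    norm = math.sqrt(sum(v * v for v in pooled)) or 1.0
    entry.embedding = [v / norm for v in pooled]
    return entry


def build_prompt(candidate: str, corpus: List[DocEntry], threshold: float,
                 top_k: int = 2) -> Optional[str]:
    candidate_vec = encode(candidate)
    nearest = min(corpus, key=lambda e: math.dist(candidate_vec, e.embedding))
    if math.dist(candidate_vec, nearest.embedding) > threshold:
        return None  # no stored document is close enough, so nothing is generated
    # Pair the candidate with the supporter segments that have the highest association weights
    ranked = sorted(zip(nearest.weights, nearest.supporters), reverse=True)
    context = "\n".join(segment for _, segment in ranked[:top_k])
    return f"Candidate:\n{candidate}\n\nReference passages:\n{context}"


if __name__ == "__main__":
    corpus = [
        build_entry("loan application policy",
                    ["applicants must provide income proof", "rates depend on credit score"]),
        build_entry("cooking pasta recipe",
                    ["boil water with salt", "cook pasta for ten minutes"]),
    ]
    print(build_prompt("pasta recipe with salt", corpus, threshold=1.5))
```

Weighting each supporter by its distance to the leader is what allows the final step to choose the passages most related to a document's main topic. A real system would learn both encoder stages rather than hashing tokens.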

First claim

Opening claim text (preview).

What is claimed is:

1. A system for generating a document by forming document embeddings indicating relationships between different segments of structured documents, the system comprising one or more non-transitory, machine-readable media storing program instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: generating document embeddings within an embedding space based on a corpora of structured documents by, for each respective document of the structured documents: providing (i) a leader segment of the respective document to a first set of encoder network layers to determine a leader vector and (ii) a supporter segment of the respective document to the first set of encoder network layers to determine a supporter vector; updating a set of association vectors indicating the supporter segment based on a first distance between the leader vector and the supporter vector; and generating a respective embedding of the document embeddings in the embedding space by providing the set of association vectors to a second set of encoder network layers; generating a candidate vector based on a candidate document; determining whether a second distance between the candidate vector and a first document embedding of the document embeddings satisfies a document embedding distance threshold, wherein the first document embedding is associated with a first document; and in response to a determination that the second distance satisfies the document embedding distance threshold, generating a new document by providing, to a text generation model, the candidate document and portions of the supporter segment of the first document indicated by the set of association vectors associated with the first document.

2. A method, the method comprising: generating document embeddings within an embedding space based on a set of structured documents by, for a first document of the set of structured documents: determining (i) a first vector based on a first segment of the first document and (ii) a second vector based on a second segment of the first document; updating a set of association vectors indicating the second segment based on a first distance between the first vector and the second vector; and generating a first document embedding of the document embeddings in the embedding space based on the set of association vectors; generating a candidate vector based on a candidate document; determining a result indicating that a second distance between the candidate vector and the first document embedding satisfies a document embedding distance threshold, wherein the first document embedding is associated with the first document; and in response to the result, generating a new document by providing, to a text generation model, a portion of the candidate document and a portion of the second segment of the first document.

3. The method of claim 2, wherein the result is a first result, and wherein generating the new document comprises: obtaining a density-related distance threshold; determining distances between embeddings in the embedding space; determining a set of embedding clusters based on the distances; determining a second result indicating that the candidate vector is within a first cluster; selecting the first cluster in response to the second result; and determining a third result indicating that the first cluster is associated with a density value within the density-related distance threshold, wherein generating the new document comprises generating the new document based on the third result.

4. The method of claim 2, wherein generating the candidate vector comprises: generating modified input text that comprises a first portion of the candidate document without comprising a second portion of the candidate document; and providing the modified input text to an encoder to generate the candidate vector.

5. The method of claim 2, further comprising: providing a first token of the candidate document to a knowledge graph to retrieve an alternative token; and generating a modified version of the candidate document by replacing the first token with the alternative token, wherein generating the candidate vector comprises generating the candidate vector based on the modified version of the candidate document.

6. The method of claim 2, wherein the candidate vector is a first candidate vector, and wherein the result is a first result, and wherein generating the first candidate vector comprises: generating a plurality of summarizations using a text summarization model based on the candidate document; generating a plurality of candidate vectors by, for each respective summarization of the plurality of summarizations, providing the respective summarization to an encoder to generate a respective candidate vector of the plurality of candidate vectors, wherein the plurality of candidate vectors comprises the first candidate vector; and selecting the first candidate vector based on a second result indicating that the first candidate vector is furthest from any embedding of the document embeddings.

7. The method of claim 2, further comprising generating a plurality of phrases based on the second segment by substituting initial tokens of the second segment with additional tokens mapped to the initial tokens, wherein generating the first document embedding comprises: generating a plurality of intermediate embeddings by providing a first set of encoder layers with the plurality of phrases; and generating a plurality of document embeddings by providing a second set of encoder layers with the plurality of intermediate embeddings, wherein the plurality of document embeddings comprises the first document embedding.

8. The method of claim 2, further comprising: obtaining dates associated with the set of structured documents, wherein each date of the dates is mapped to a document of the set of structured documents; generating a trajectory associated with a label for a subset of embeddings of the document embeddings based on a subset of the dates, wherein each embedding of the subset of embeddings is categorized with the label; predicting a future region in the embedding space based on the trajectory; and storing, in a data store, the future region in association with the label.

9. The method of claim 8, wherein the result is a first result, and wherein generating the candidate vector comprises: determining a second result indicating that the candidate vector is not within the future region; and in response to the second result indicating that the candidate vector is not within the future region, modifying a value of the candidate vector to be within the future region in the embedding space.

10. The method of claim 2, further comprising: dimensionally reducing the document embeddings to a three-dimensional dataset; and generating a visualization based on the three-dimensional dataset.

11. One or more non-transitory, machine-readable media storing program instructions that, when executed by one or more processors, perform operations comprising: generating a set of document embeddings within an embedding space based on a set of structured documents by, for a first document of the set of structured documents: determining (i) a first vector based on a first segment of the first document and (ii) a second vector based on a second segment of the first document; updating a set of association vectors indicating the second segment based on a first distance between the first vector and the second vector; and generating a first document embedding of the set of document embeddings in the embedding space
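
Dependent claim 6 selects, from vectors encoded from several summaries of the candidate document, the one that is "furthest from any embedding of the document embeddings." Below is a minimal sketch. It assumes that phrase means the largest distance to the *nearest* document embedding (a max-min criterion), and that the summarization model and encoder have already produced the vectors. The function name `select_most_novel` is hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def select_most_novel(candidate_vectors: Sequence[Vector],
                      doc_embeddings: Sequence[Vector]) -> int:
    """Return the index of the candidate whose nearest document embedding is farthest away."""
    if not candidate_vectors or not doc_embeddings:
        raise ValueError("need at least one candidate vector and one document embedding")

    def nearest_distance(v: Vector) -> float:
        # Euclidean distance from v to the closest existing document embedding
        return min(math.dist(v, e) for e in doc_embeddings)

    # Max-min criterion: prefer the summary that lands in the emptiest part of the space
    return max(range(len(candidate_vectors)),
               key=lambda i: nearest_distance(candidate_vectors[i]))


if __name__ == "__main__":
    docs = [(0.0, 0.0), (1.0, 0.0)]
    candidates = [(0.5, 0.0), (0.5, 2.0), (3.0, 0.0)]
    print(select_most_novel(candidates, docs))  # prints 1
```

Here the second candidate is about 2.06 from its nearest document embedding, while the others are 0.5 and 2.0 away, so index 1 is selected.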

Assignees

Capital One Services Llc

Inventors

Classifications

G06F40/30
Semantic analysis · CPC title
G06F16/345
Summarisation for human users · CPC title
G06F40/40 (primary)
Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title
G06F40/284 (primary)
Lexical analysis, e.g. tokenisation or collocates · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 96661147


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025259013A1 cover?: A method and related system for generating document embeddings within an embedding space based on a set of structured documents by determining (i) a first vector based on a first segment of a first document and (ii) a second vector based on a second segment of the first document and updating association vectors indicating the second segment based on a distance between the first and second vecto…
Who is the assignee on this patent?: Capital One Services Llc


Related patents

Systems and methods for updating textual item descriptions using an embedding space

Automated analysis of customer interaction text to generate customer intent information and hierarchy of customer issues

Analysis of theme coverage of documents
