Systems and methods for updating textual item descriptions using an embedding space
US2025259013A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025259013-A1 |
| Application number | US-202418441889-A |
| Country | US |
| Kind code | A1 |
| Filing date | Feb 14, 2024 |
| Priority date | Feb 14, 2024 |
| Publication date | Aug 14, 2025 |
| Grant date | — |
Official abstract text for this publication.
A method and related system for generating document embeddings within an embedding space based on a set of structured documents by determining (i) a first vector based on a first segment of a first document and (ii) a second vector based on a second segment of the first document and updating association vectors indicating the second segment based on a distance between the first and second vectors. The method also includes generating a document embedding based on the association vectors, generating a candidate vector based on a candidate document, and determining a result indicating that a second distance between the candidate vector and a first document embedding satisfies a document embedding distance threshold. The method may also include generating a new document by providing, to a text generation model, a portion of the candidate document and a portion of the second segment of the first document.
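The flow in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the publication: `toy_encode` is a hypothetical hashed bag-of-words stand-in for the encoder, associations are modeled as the second-segment parts that lie within a hypothetical `assoc_threshold` of the first-segment vector, a plain mean stands in for whatever produces the document embedding, and "satisfies the threshold" is read as "distance is at most the threshold". The function only assembles the text that would be passed to a text generation model; no model is called.

```python
import math

DIM = 8  # hypothetical embedding size, chosen only for illustration


def toy_encode(text):
    """Hypothetical stand-in for an encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[sum(ord(c) for c in token) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def embed_document(first_segment, second_segment_parts, assoc_threshold):
    """Return (document_embedding, associations) for one structured document.

    Assumption: a part of the second segment is recorded as an association
    when its vector lies within assoc_threshold of the first-segment vector.
    """
    first_vec = toy_encode(first_segment)
    associations = []
    for part in second_segment_parts:
        part_vec = toy_encode(part)
        if math.dist(first_vec, part_vec) <= assoc_threshold:
            associations.append((part, part_vec))
    if not associations:
        return None, associations
    # Assumption: a plain mean replaces the step that turns the
    # association vectors into a document embedding.
    n = len(associations)
    embedding = [sum(vec[i] for _, vec in associations) / n for i in range(DIM)]
    return embedding, associations


def build_prompt(candidate_text, embedding, associations, doc_threshold):
    """Return input text for a text generation model, or None if too far away."""
    if embedding is None:
        return None
    if math.dist(toy_encode(candidate_text), embedding) > doc_threshold:
        return None
    supporting = "\n".join(part for part, _ in associations)
    return f"Candidate:\n{candidate_text}\n\nRelated text:\n{supporting}"
```

A caller would run `embed_document` once per structured document, then call `build_prompt` for each candidate document and forward any non-`None` result to a generation model of its choice.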
Opening claim text (preview).
What is claimed is:

1. A system for generating a document by forming document embeddings indicating relationships between different segments of structured documents, the system comprising one or more non-transitory, machine-readable media storing program instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: generating document embeddings within an embedding space based on a corpora of structured documents by, for each respective document of the structured documents: providing (i) a leader segment of the respective document to a first set of encoder network layers to determine a leader vector and (ii) a supporter segment of the respective document to the first set of encoder network layers to determine a supporter vector; updating a set of association vectors indicating the supporter segment based on a first distance between the leader vector and the supporter vector; and generating a respective embedding of the document embeddings in the embedding space by providing the set of association vectors to a second set of encoder network layers; generating a candidate vector based on a candidate document; determining whether a second distance between the candidate vector and a first document embedding of the document embeddings satisfies a document embedding distance threshold, wherein the first document embedding is associated with a first document; and in response to a determination that the second distance satisfies the document embedding distance threshold, generating a new document by providing, to a text generation model, the candidate document and portions of the supporter segment of the first document indicated by the set of association vectors associated with the first document.

2. A method, the method comprising: generating document embeddings within an embedding space based on a set of structured documents by, for a first document of the set of structured documents: determining (i) a first vector based on a first segment of the first document and (ii) a second vector based on a second segment of the first document; updating a set of association vectors indicating the second segment based on a first distance between the first vector and the second vector; and generating a first document embedding of the document embeddings in the embedding space based on the set of association vectors; generating a candidate vector based on a candidate document; determining a result indicating that a second distance between the candidate vector and the first document embedding satisfies a document embedding distance threshold, wherein the first document embedding is associated with the first document; and in response to the result, generating a new document by providing, to a text generation model, a portion of the candidate document and a portion of the second segment of the first document.

3. The method of claim 2, wherein the result is a first result, and wherein generating the new document comprises: obtaining a density-related distance threshold; determining distances between embeddings in the embedding space; determining a set of embedding clusters based on the distances; determining a second result indicating that the candidate vector is within a first cluster; selecting the first cluster in response to the second result; and determining a third result indicating that the first cluster is associated with a density value within the density-related distance threshold, wherein generating the new document comprises generating the new document based on the third result.

4. The method of claim 2, wherein generating the candidate vector comprises: generating modified input text that comprises a first portion of the candidate document without comprising a second portion of the candidate document; and providing the modified input text to an encoder to generate the candidate vector.

5. The method of claim 2, further comprising: providing a first token of the candidate document to a knowledge graph to retrieve an alternative token; and generating a modified version of the candidate document by replacing the first token with the alternative token, wherein generating the candidate vector comprises generating the candidate vector based on the modified version of the candidate document.

6. The method of claim 2, wherein the candidate vector is a first candidate vector, and wherein the result is a first result, and wherein generating the first candidate vector comprises: generating a plurality of summarizations using a text summarization model based on the candidate document; generating a plurality of candidate vectors by, for each respective summarization of the plurality of summarizations, providing the respective summarization to an encoder to generate a respective candidate vector of the plurality of candidate vectors, wherein the plurality of candidate vectors comprises the first candidate vector; and selecting the first candidate vector based on a second result indicating that the first candidate vector is furthest from any embedding of the document embeddings.

7. The method of claim 2, further comprising generating a plurality of phrases based on the second segment by substituting initial tokens of the second segment with additional tokens mapped to the initial tokens, wherein generating the first document embedding comprises: generating a plurality of intermediate embeddings by providing a first set of encoder layers with the plurality of phrases; and generating a plurality of document embeddings by providing a second set of encoder layers with the plurality of intermediate embeddings, wherein the plurality of document embeddings comprises the first document embedding.

8. The method of claim 2, further comprising: obtaining dates associated with the set of structured documents, wherein each date of the dates is mapped to a document of the set of structured documents; generating a trajectory associated with a label for a subset of embeddings of the document embeddings based on a subset of the dates, wherein each embedding of the subset of embeddings is categorized with the label; predicting a future region in the embedding space based on the trajectory; and storing, in a data store, the future region in association with the label.

9. The method of claim 8, wherein the result is a first result, and wherein generating the candidate vector comprises: determining a second result indicating that the candidate vector is not within the future region; and in response to the second result indicating that the candidate vector is not within the future region, modifying a value of the candidate vector to be within the future region in the embedding space.

10. The method of claim 2, further comprising: dimensionally reducing the document embeddings to a three-dimensional dataset; and generating a visualization based on the three-dimensional dataset.

11. One or more non-transitory, machine-readable media storing program instructions that, when executed by one or more processors, perform operations comprising: generating a set of document embeddings within an embedding space based on a set of structured documents by, for a first document of the set of structured documents: determining (i) a first vector based on a first segment of the first document and (ii) a second vector based on a second segment of the first document; updating a set of association vectors indicating the second segment based on a first distance between the first vector and the second vector; and generating a first document embedding of the set of document embeddings in the embedding space
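
The selection step recited in claim 6 (choosing, among several candidate vectors, the one "furthest from any embedding") can be illustrated with a short sketch. The reading used here is an assumption: each candidate is scored by its Euclidean distance to its nearest document embedding, and the candidate with the largest such score is chosen. The function name `select_furthest_candidate` is hypothetical, and the candidate vectors are taken as given rather than produced by a summarization model and encoder.

```python
import math


def select_furthest_candidate(candidate_vectors, document_embeddings):
    """Return the index of the candidate whose nearest document embedding
    is farthest away (one possible reading of "furthest from any embedding").
    """
    def nearest_distance(vec):
        # Distance from this candidate to the closest existing embedding.
        return min(math.dist(vec, emb) for emb in document_embeddings)

    return max(
        range(len(candidate_vectors)),
        key=lambda i: nearest_distance(candidate_vectors[i]),
    )
```

Under this reading, a candidate that coincides with an existing embedding scores zero and is never preferred over one that sits in an unoccupied part of the embedding space.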
Semantic analysis · CPC title
Summarisation for human users · CPC title
Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title
Lexical analysis, e.g. tokenisation or collocates · CPC title