Structured document generation using document-scale embeddings
US-2025259013-A1 · Aug 14, 2025 · US
US2025225314A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025225314-A1 |
| Application number | US-202418409494-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jan 10, 2024 |
| Priority date | Jan 10, 2024 |
| Publication date | Jul 10, 2025 |
| Grant date | — |
Official abstract text for this publication.
Systems and methods are disclosed herein for generating updated descriptions of items based on analyzing candidate embeddings of semantic representations of item descriptions. The system may obtain a text file describing an item. The system may provide the text file to a generative language model to generate semantic representations of the text file. The system may generate, based on the text file, candidate embeddings in an embedding space. The system may obtain embeddings associated with existing items. The system may determine subsets of the embeddings within a threshold distance. The system may compare the subsets. The system may determine attributes associated with a candidate embedding based on the comparison. The system may generate an updated text file based on the attributes.
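The comparison step in the abstract can be illustrated with a small sketch. It assumes Euclidean distance, toy 2-D embeddings, and an inclusive threshold; the publication text shown here fixes none of these, and the function names `neighbors_within` and `pick_less_crowded` are hypothetical. Following the opening claim, the candidate whose neighborhood holds fewer existing embeddings is the one selected for attribute extraction.

```python
import math

def neighbors_within(candidate, existing, threshold):
    """Indices of existing embeddings within `threshold` of `candidate`.

    Assumption: Euclidean distance with an inclusive bound.
    """
    return [i for i, e in enumerate(existing)
            if math.dist(candidate, e) <= threshold]

def pick_less_crowded(candidates, existing, threshold):
    """Return (index of the candidate with the smallest subset, all subsets)."""
    subsets = [neighbors_within(c, existing, threshold) for c in candidates]
    best = min(range(len(candidates)), key=lambda i: len(subsets[i]))
    return best, subsets

# Toy data (hypothetical): three existing items cluster near the origin,
# one sits apart near (2, 2).
existing = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (2.0, 2.0)]
first_candidate = (1.9, 2.1)     # lands in a sparse region
second_candidate = (0.05, 0.05)  # lands in the crowded cluster

best, subsets = pick_less_crowded(
    [first_candidate, second_candidate], existing, threshold=0.5
)
print(best, subsets)
```

Here the first candidate has one neighbor and the second has three, so the first candidate is selected. A real system would use high-dimensional embeddings from an embedding model and likely an approximate nearest-neighbor index rather than a linear scan.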
Opening claim text (preview).
What is claimed is:

1. A system for updating textual descriptions of items based on existing descriptions within an embedding space, the system comprising: one or more processors; and one or more non-transitory, computer-readable media storing instructions that, when executed by the one or more processors, cause operations comprising: receiving a text file comprising one or more semantic tokens for a textual description of an item; providing the text file to a generative language model to cause the generative language model to generate a first semantic representation of the textual description and a second semantic representation of the textual description different from the first semantic representation, wherein the generative language model has been trained to generate semantic representations based on text files; generating, in an embedding space, a first candidate embedding of the first semantic representation and a second candidate embedding of the second semantic representation; obtaining a plurality of embeddings of semantic representations of text associated with a set of existing items, wherein each embedding of the plurality of embeddings is represented in the embedding space; determining a first subset and a second subset of the plurality of embeddings, the first subset comprising embeddings that are within a threshold distance from the first candidate embedding within the embedding space and the second subset comprising embeddings that are within the threshold distance from the second candidate embedding within the embedding space; in response to determining that the first subset is smaller than the second subset, determining one or more attributes associated with the first candidate embedding; and generating, based on the text file and the one or more attributes, an updated text file, wherein the updated text file includes an updated textual description based on one or more updated semantic tokens that describe the one or more attributes.

2. A method comprising: obtaining a text file comprising a textual description of an item; providing the text file to a generative language model to cause the generative language model to generate a first semantic representation of the textual description and a second semantic representation of the textual description different from the first semantic representation, wherein the generative language model is trained to generate semantic representations based on text files; generating, in an embedding space, a first candidate embedding of the first semantic representation and a second candidate embedding of the second semantic representation; obtaining a plurality of embeddings of semantic representations of text associated with a set of existing items, wherein each embedding of the plurality of embeddings is represented in the embedding space; determining a first subset of the plurality of embeddings that are within a threshold distance from the first candidate embedding within the embedding space and a second subset of the plurality of embeddings that are within the threshold distance from the second candidate embedding within the embedding space; comparing the first subset with the second subset; based on comparing the first subset with the second subset, determining one or more attributes associated with the first candidate embedding; and generating, based on the text file and the one or more attributes, an updated text file.

3. The method of claim 2, further comprising: providing the updated text file to the generative language model; based on providing the updated text file to the generative language model, generating, in the embedding space, a third candidate embedding of a third semantic representation of the updated text file; determining that a third subset of embeddings of the plurality of embeddings is smaller than the first subset and the second subset, wherein the third subset comprises embeddings that are within the threshold distance from the third candidate embedding within the embedding space; generating a set of semantic tokens associated with a set of attributes associated with the third candidate embedding; and providing the set of attributes to the generative language model to cause the generative language model to generate an output comprising an updated description of the item based on the third semantic representation.

4. The method of claim 2, wherein determining the one or more attributes comprises: generating a set of attention weights associated with the first candidate embedding, wherein the set of attention weights comprises a set of values corresponding to a set of semantic tokens associated with the text file; determining a first semantic token associated with a first attention weight of the set of attention weights; and generating the one or more attributes to include the first semantic token.

5. The method of claim 4, wherein determining the first semantic token associated with the first attention weight of the set of attention weights comprises: determining a subset of the set of attention weights and a corresponding subset of semantic tokens of the set of semantic tokens, wherein each attention weight of the subset of the set of attention weights is greater than a threshold weight; generating, for display on a user interface associated with a user, the corresponding subset of semantic tokens; and receiving, via the user interface, a selection of the first semantic token.

6. The method of claim 2, further comprising: obtaining a threshold density, wherein the threshold density indicates a threshold number of embeddings per unit volume of the embedding space; determining a first spherical volume in the embedding space around the first candidate embedding, wherein the first spherical volume is characterized by the threshold density; and determining the threshold distance based on a radius of the first spherical volume in the embedding space.

7. The method of claim 2, wherein obtaining the plurality of embeddings comprises: obtaining, from a text file database, a plurality of text files associated with the set of existing items; and providing the plurality of text files to an embedding model to cause the embedding model to generate the plurality of embeddings, wherein each embedding of the plurality of embeddings corresponds to a corresponding text file of the plurality of text files.

8. The method of claim 7, further comprising: transmitting, to the text file database, a query for an updated plurality of text files; obtaining the updated plurality of text files from the text file database; providing the updated plurality of text files to the embedding model to cause the embedding model to generate an updated plurality of embeddings, wherein each embedding of the updated plurality of embeddings corresponds to a corresponding file of the updated plurality of text files; and updating the first subset and the second subset to include one or more embeddings of the updated plurality of embeddings.

9. The method of claim 2, further comprising: obtaining a plurality of training text files and a plurality of training semantic representations, wherein each training semantic representation of the plurality of training semantic representations is associated with a corresponding training text file of the plurality of training text files; generating a plurality of training semantic token vectors, wherein each training semantic token vector of the plurality of training semantic token vectors represents the corresponding training text file of the plurality of training text files using semantic tokens; and providing a training dataset to the generative language model to train the generativ
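The token shortlisting recited in claims 4 and 5 can be sketched as a simple filter. This is an illustration only: the tokens and attention weights below are made-up values, `tokens_above_threshold` is a hypothetical helper name, and the claims do not say how the weights are computed or how the shortlist is ordered (sorting by weight is an assumption).

```python
def tokens_above_threshold(tokens, weights, threshold_weight):
    """Return (token, weight) pairs whose weight exceeds `threshold_weight`.

    Pairs are sorted from highest to lowest weight (an assumption made
    here so a user interface could show the strongest tokens first).
    """
    if len(tokens) != len(weights):
        raise ValueError("tokens and weights must have the same length")
    selected = [(t, w) for t, w in zip(tokens, weights) if w > threshold_weight]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)

# Hypothetical semantic tokens and attention weights for one description.
tokens = ["waterproof", "the", "hiking", "boot", "with"]
weights = [0.41, 0.03, 0.27, 0.24, 0.05]

shortlist = tokens_above_threshold(tokens, weights, threshold_weight=0.2)
print([token for token, _ in shortlist])
```

In the claimed flow, a shortlist like this would be displayed to a user, whose selection becomes the semantic token included in the attributes.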
using vector based model · CPC title
Lexical analysis, e.g. tokenisation or collocates · CPC title
Named entity recognition · CPC title
Natural language generation · CPC title
Combinations of networks · CPC title