Systems and methods for updating textual item descriptions using an embedding space

US2025225314A1 · US · A1

Patent metadata
Publication number: US-2025225314-A1
Application number: US-202418409494-A
Country: US
Kind code: A1
Filing date: Jan 10, 2024
Priority date: Jan 10, 2024
Publication date: Jul 10, 2025
Grant date: (none listed)

Abstract

Official abstract text for this publication.

Systems and methods are disclosed herein for generating updated descriptions of items based on analyzing candidate embeddings of semantic representations of item descriptions. The system may obtain a text file describing an item. The system may provide the text file to a generative language model to generate semantic representations of the text file. The system may generate, based on the text file, candidate embeddings in an embedding space. The system may obtain embeddings associated with existing items. The system may determine subsets of the embeddings within a threshold distance. The system may compare the subsets. The system may determine attributes associated with a candidate embedding based on the comparison. The system may generate an updated text file based on the attributes.
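
The comparison step in the abstract can be illustrated with a minimal sketch. It counts how many existing embeddings fall within a threshold distance of each candidate embedding and selects the candidate in the sparser neighborhood, which is the one whose attributes the first claim goes on to determine. The Euclidean metric, the function names, and the toy 2-D vectors are assumptions made for illustration; the publication text shown here does not specify a distance metric or an implementation.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neighbors_within(candidate, existing, threshold):
    """Return the existing embeddings within `threshold` of `candidate`."""
    return [e for e in existing if euclidean(candidate, e) <= threshold]

def pick_sparser_candidate(candidates, existing, threshold):
    """Return (index, sizes): the candidate with the smallest neighborhood,
    plus the neighborhood size computed for every candidate."""
    sizes = [len(neighbors_within(c, existing, threshold)) for c in candidates]
    best = min(range(len(candidates)), key=sizes.__getitem__)
    return best, sizes

# Toy data: three existing items cluster near the origin, one sits far away.
existing = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (2.0, 2.0)]
first_candidate = (1.9, 2.1)     # lands in the sparse region
second_candidate = (0.05, 0.05)  # lands in the crowded region

best, sizes = pick_sparser_candidate(
    [first_candidate, second_candidate], existing, threshold=0.5
)
```

Here the first candidate has one neighbor and the second has three, so the first candidate is selected. In a real system the vectors would come from an embedding model and the neighbor search would likely use an index rather than a linear scan.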

First claim

Opening claim text (preview).

What is claimed is:

1. A system for updating textual descriptions of items based on existing descriptions within an embedding space, the system comprising: one or more processors; and one or more non-transitory, computer-readable media storing instructions that, when executed by the one or more processors, cause operations comprising: receiving a text file comprising one or more semantic tokens for a textual description of an item; providing the text file to a generative language model to cause the generative language model to generate a first semantic representation of the textual description and a second semantic representation of the textual description different from the first semantic representation, wherein the generative language model has been trained to generate semantic representations based on text files; generating, in an embedding space, a first candidate embedding of the first semantic representation and a second candidate embedding of the second semantic representation; obtaining a plurality of embeddings of semantic representations of text associated with a set of existing items, wherein each embedding of the plurality of embeddings is represented in the embedding space; determining a first subset and a second subset of the plurality of embeddings, the first subset comprising embeddings that are within a threshold distance from the first candidate embedding within the embedding space and the second subset comprising embeddings that are within the threshold distance from the second candidate embedding within the embedding space; in response to determining that the first subset is smaller than the second subset, determining one or more attributes associated with the first candidate embedding; and generating, based on the text file and the one or more attributes, an updated text file, wherein the updated text file includes an updated textual description based on one or more updated semantic tokens that describe the one or more attributes.

2. A method comprising: obtaining a text file comprising a textual description of an item; providing the text file to a generative language model to cause the generative language model to generate a first semantic representation of the textual description and a second semantic representation of the textual description different from the first semantic representation, wherein the generative language model is trained to generate semantic representations based on text files; generating, in an embedding space, a first candidate embedding of the first semantic representation and a second candidate embedding of the second semantic representation; obtaining a plurality of embeddings of semantic representations of text associated with a set of existing items, wherein each embedding of the plurality of embeddings is represented in the embedding space; determining a first subset of the plurality of embeddings that are within a threshold distance from the first candidate embedding within the embedding space and a second subset of the plurality of embeddings that are within the threshold distance from the second candidate embedding within the embedding space; comparing the first subset with the second subset; based on comparing the first subset with the second subset, determining one or more attributes associated with the first candidate embedding; and generating, based on the text file and the one or more attributes, an updated text file.

3. The method of claim 2, further comprising: providing the updated text file to the generative language model; based on providing the updated text file to the generative language model, generating, in the embedding space, a third candidate embedding of a third semantic representation of the updated text file; determining that a third subset of embeddings of the plurality of embeddings is smaller than the first subset and the second subset, wherein the third subset comprises embeddings that are within the threshold distance from the third candidate embedding within the embedding space; generating a set of semantic tokens associated with a set of attributes associated with the third candidate embedding; and providing the set of attributes to the generative language model to cause the generative language model to generate an output comprising an updated description of the item based on the third semantic representation.

4. The method of claim 2, wherein determining the one or more attributes comprises: generating a set of attention weights associated with the first candidate embedding, wherein the set of attention weights comprises a set of values corresponding to a set of semantic tokens associated with the text file; determining a first semantic token associated with a first attention weight of the set of attention weights; and generating the one or more attributes to include the first semantic token.

5. The method of claim 4, wherein determining the first semantic token associated with the first attention weight of the set of attention weights comprises: determining a subset of the set of attention weights and a corresponding subset of semantic tokens of the set of semantic tokens, wherein each attention weight of the subset of the set of attention weights is greater than a threshold weight; generating, for display on a user interface associated with a user, the corresponding subset of semantic tokens; and receiving, via the user interface, a selection of the first semantic token.

6. The method of claim 2, further comprising: obtaining a threshold density, wherein the threshold density indicates a threshold number of embeddings per unit volume of the embedding space; determining a first spherical volume in the embedding space around the first candidate embedding, wherein the first spherical volume is characterized by the threshold density; and determining the threshold distance based on a radius of the first spherical volume in the embedding space.

7. The method of claim 2, wherein obtaining the plurality of embeddings comprises: obtaining, from a text file database, a plurality of text files associated with the set of existing items; and providing the plurality of text files to an embedding model to cause the embedding model to generate the plurality of embeddings, wherein each embedding of the plurality of embeddings corresponds to a corresponding text file of the plurality of text files.

8. The method of claim 7, further comprising: transmitting, to the text file database, a query for an updated plurality of text files; obtaining the updated plurality of text files from the text file database; providing the updated plurality of text files to the embedding model to cause the embedding model to generate an updated plurality of embeddings, wherein each embedding of the updated plurality of embeddings corresponds to a corresponding file of the updated plurality of text files; and updating the first subset and the second subset to include one or more embeddings of the updated plurality of embeddings.

9. The method of claim 2, further comprising: obtaining a plurality of training text files and a plurality of training semantic representations, wherein each training semantic representation of the plurality of training semantic representations is associated with a corresponding training text file of the plurality of training text files; generating a plurality of training semantic token vectors, wherein each training semantic token vector of the plurality of training semantic token vectors represents the corresponding training text file of the plurality of training text files using semantic tokens; and providing a training dataset to the generative language model to train the generativ
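
Claim 6 derives the threshold distance from the radius of a sphere characterized by a threshold density (embeddings per unit volume). One possible reading, sketched below, is to choose the radius at which a sphere of exactly that density would contain a chosen number of embeddings. The `target_count` parameter and both function names are hypothetical; the claim text does not say how the sphere is sized. The sketch uses the standard volume of a d-dimensional ball, V = pi^(d/2) / Gamma(d/2 + 1) * r^d.

```python
import math

def unit_ball_volume(dim):
    """Volume of the unit ball in `dim` dimensions: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1)

def threshold_distance(threshold_density, target_count, dim):
    """Radius of the sphere that holds `target_count` embeddings at
    `threshold_density` embeddings per unit volume.

    `target_count` is an illustrative assumption, not taken from the claims.
    """
    if threshold_density <= 0 or target_count <= 0 or dim < 1:
        raise ValueError("density and count must be positive, dim at least 1")
    volume = target_count / threshold_density      # required sphere volume
    return (volume / unit_ball_volume(dim)) ** (1 / dim)

# 2-D example: a density of 1/pi per unit area and 4 embeddings need an
# area of 4*pi, i.e. a circle of radius 2 (up to floating-point rounding).
radius = threshold_distance(threshold_density=1 / math.pi, target_count=4, dim=2)
```

The resulting radius could then serve as the threshold distance used when collecting the first and second subsets.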

Assignees

Capital One Services, LLC

Inventors

Classifications

CPC titles:

  • using vector based model

  • Lexical analysis, e.g. tokenisation or collocates

  • Named entity recognition

  • Natural language generation

  • Combinations of networks

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Capital One Services, LLC
What technology area does this patent fall under?
Primary CPC classification G06F40/166. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 10, 2025 (kind code A1). Legal status and post-grant events are not shown on this page.