Holistic embedding generation for entity matching

US12596872B2 · US · B2

Patent metadata
Publication number: US-12596872-B2
Application number: US-202418592408-A
Country: US
Kind code: B2
Filing date: Feb 29, 2024
Priority date: Feb 29, 2024
Publication date: Apr 7, 2026
Grant date: Apr 7, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments may send input including a standardized attribute and an associated attribute value to a task-agnostic generative large language model (LLM). Embodiments may receive, from the task-agnostic generative LLM, a natural language description of the standardized attribute and associated attribute value. Embodiments may send the natural language description of the standardized attribute and associated attribute value to at least one embedding generator. Embodiments may receive, from the at least one embedding generator, at least one embedding of the natural language description.
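The abstract (and claim 1) describes a two-step pipeline: a task-agnostic generative LLM turns a standardized attribute and its value into a natural language description, and one or more embedding generators embed that description. The sketch below illustrates that flow under stated assumptions. `StandardizedAttribute`, `build_prompt`, `neighbor_context`, `create_attribute_embeddings`, the prompt wording, and the taxonomy/graph layouts are all hypothetical names chosen for illustration. The LLM is a stub function, and `toy_embedder` is a hashed bag of words standing in for a real embedding model. The optional neighbor context and task instruction correspond loosely to dependent claims 5 to 7; the patent does not specify how they are formatted.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical interfaces: the patent names no specific model or API.
GenerateFn = Callable[[str], str]       # prompt -> natural language description
EmbedFn = Callable[[str], List[float]]  # text -> embedding vector


@dataclass(frozen=True)
class StandardizedAttribute:
    name: str   # e.g. "skill"
    value: str  # e.g. "Python"


def neighbor_context(attr: StandardizedAttribute,
                     taxonomy: Dict[Tuple[str, str], str],
                     graph: Dict[str, List[str]]) -> List[str]:
    """Sketch of claims 5-6: map the attribute to an identifier via a taxonomy,
    locate its graph node, and return data for the neighboring nodes."""
    node_id = taxonomy.get((attr.name, attr.value))
    if node_id is None:
        return []
    return list(graph.get(node_id, []))


def build_prompt(attr: StandardizedAttribute,
                 neighbors: Sequence[str] = (),
                 task_instruction: Optional[str] = None) -> str:
    """Claim 1 prompt is task-agnostic; claim 7 optionally appends a task instruction."""
    lines = [f"Describe in plain language the attribute '{attr.name}' "
             f"with value '{attr.value}'."]
    if neighbors:
        lines.append("Related concepts: " + ", ".join(neighbors) + ".")
    if task_instruction:
        lines.append(task_instruction)
    return "\n".join(lines)


def create_attribute_embeddings(attr: StandardizedAttribute,
                                llm: GenerateFn,
                                embedders: Sequence[EmbedFn],
                                neighbors: Sequence[str] = ()) -> Tuple[str, List[List[float]]]:
    """Claim 1: send the attribute to the LLM, then embed the returned description."""
    description = llm(build_prompt(attr, neighbors))
    return description, [embed(description) for embed in embedders]


def toy_embedder(text: str, dim: int = 8) -> List[float]:
    """Stand-in embedding generator: hashed bag of words, not a learned model."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


# Usage with hypothetical data and a stubbed LLM.
attr = StandardizedAttribute("skill", "Python")
taxonomy = {("skill", "Python"): "skill:123"}       # hypothetical attribute IDs
graph = {"skill:123": ["Django", "Data analysis"]}  # hypothetical neighbor data
fake_llm = lambda prompt: "Python is a general-purpose programming language skill."
desc, embs = create_attribute_embeddings(
    attr, fake_llm, [toy_embedder], neighbor_context(attr, taxonomy, graph))
```

Because the embedding generators are passed in as a list, the same description can be embedded by several generators at once, which is the setup dependent claim 8 builds on.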

First claim

Opening claim text (preview).

What is claimed is:

1. A method for creating at least one attribute embedding, the method comprising: sending input comprising a standardized attribute and an associated attribute value to a task-agnostic generative large language model (LLM); receiving, from the task-agnostic generative LLM, a natural language description of the standardized attribute and associated attribute value; sending the natural language description of the standardized attribute and associated attribute value to at least one embedding generator; and receiving, from the at least one embedding generator, at least one embedding of the natural language description.

2. The method of claim 1, further comprising: applying an automated attribute extraction process to a digital document associated with an entity to extract, from the digital document, a set of standardized attributes and associated attribute values; retrieving, from at least one data store, a set of attribute embeddings corresponding to the extracted set of standardized attributes and associated attribute values, wherein at least one attribute embedding of the set of attribute embeddings is created based on the natural language description; creating an entity embedding for the entity based on the set of attribute embeddings; and storing the entity embedding.

3. The method of claim 2, wherein creating the entity embedding comprises: sending the set of attribute embeddings to a task-specific weighting model associated with an automated matching task, wherein the task-specific weighting model comprises weights that are trained using machine learning to indicate relationships between attributes and likelihoods of success of the automated matching task; receiving a weighted set of attribute embeddings from the task-specific weighting model; and including the weighted set of attribute embeddings in the entity embedding.

4. The method of claim 3, wherein the task-specific weighting model is trained using a distributed gradient.

5. The method of claim 1, wherein the attribute embedding is further created by: identifying a node of a graph associated with a standardized attribute and associated attribute value; using the graph, identifying at least one neighboring node that shares at least one edge with the node; and including data associated with the at least one neighboring node in input to the generative large language model.

6. The method of claim 5, further comprising: using a taxonomy, mapping the standardized attribute and associated attribute value to an attribute identifier; and using the attribute identifier to identify the node of the graph.

7. The method of claim 1, wherein the attribute embedding is further created by: configuring a task-specific instruction specific to an automated matching task; and including the task-specific instruction in input to the generative large language model.

8. The method of claim 1, further comprising: sending the natural language description of a standardized attribute and associated attribute value to a plurality of embedding generators each trained according to a different training objective; receiving a plurality of embeddings of the natural language description from the plurality of embedding generators; and stacking the plurality of embeddings of the natural language description in a single matrix.

9. The method of claim 2, wherein an automated matching task uses the entity embedding to at least one of: identify at least one job posting that matches the entity, wherein the entity comprises a job seeker; or identify at least one job candidate that matches the entity, wherein the entity comprises a job posting; or identify at least one digital content item that matches the entity, wherein the entity comprises a user of an application system.

10. A system comprising: at least one processor; and at least one memory coupled to the at least one processor, wherein the at least one memory comprises at least one instruction that, when executed by the at least one processor, cause the at least one processor to perform at least one operation comprising: sending input comprising a standardized attribute and an associated attribute value to a task-agnostic generative large language model (LLM); receiving, from the task-agnostic generative LLM, a natural language description of the standardized attribute and associated attribute value; sending the natural language description of the standardized attribute and associated attribute value to at least one embedding generator; and receiving, from the at least one embedding generator, at least one embedding of the natural language description.

11. The system of claim 10, wherein the at least one instruction, when executed by the at least one processor, causes the at least one processor to perform at least one operation comprising further comprising: applying an automated attribute extraction process to a digital document associated with an entity to extract, from the digital document, a set of standardized attributes and associated attribute values; retrieving, from at least one data store, a set of attribute embeddings corresponding to the extracted set of standardized attributes and associated attribute values, wherein at least one attribute embedding of the set of attribute embeddings is created based on the natural language description; creating an entity embedding for the entity based on the set of attribute embeddings; and storing the entity embedding.

12. The system of claim 11, wherein creating the entity embedding comprises: sending the set of attribute embeddings to a task-specific weighting model associated with an automated matching task, wherein the task-specific weighting model comprises weights that are trained using machine learning to indicate relationships between attributes and likelihoods of success of the automated matching task; receiving a weighted set of attribute embeddings from the task-specific weighting model; and including the weighted set of attribute embeddings in the entity embedding.

13. The system of claim 10, wherein the attribute embedding is further created by: identifying a node of a graph associated with a standardized attribute and associated attribute value; using the graph, identifying at least one neighboring node that shares at least one edge with the node; and including data associated with the at least one neighboring node in input to the generative large language model.

14. The system of claim 13, wherein the at least one instruction, when executed by the at least one processor, causes the at least one processor to perform at least one operation further comprising: using a taxonomy, mapping the standardized attribute and associated attribute value to an attribute identifier; and using the attribute identifier to identify the node of the graph.

15. The system of claim 10, wherein the at least one instruction, when executed by the at least one processor, causes the at least one processor to perform at least one operation comprising further comprising: sending the natural language description of a standardized attribute and associated attribute value to a plurality of embedding generators each trained according to a different training objective; receiving a plurality of embeddings of the natural language description from the plurality of embedding generators; and stacking the plurality of embeddings of the natural language description in a single matrix.

16. At least one non-transitory machine-readable storage medium com
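Dependent claims 2, 3, 8, and 9 describe what happens downstream of the attribute embeddings: several generators' embeddings of one description are stacked into a matrix, attribute embeddings are weighted by a task-specific model, the weighted set forms an entity embedding, and a matching task compares entity embeddings. The sketch below is one possible reading under explicit assumptions. The function names are hypothetical; the hand-set weights stand in for the machine-learned weighting model (which the claims say is trained, possibly with a distributed gradient, and is not reproduced here); weighted-mean pooling and cosine ranking are assumed choices, not methods stated in the patent.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def stack_embeddings(embeddings: Sequence[Vector]) -> List[Vector]:
    """Sketch of claim 8: stack one description's embeddings, produced by
    different generators, into a single matrix (one row per generator)."""
    if len({len(e) for e in embeddings}) != 1:
        raise ValueError("generators must return equal-length vectors to form a matrix")
    return [list(e) for e in embeddings]


def weighted_entity_embedding(attribute_embeddings: Dict[str, Vector],
                              weights: Dict[str, float]) -> Vector:
    """Sketch of claims 2-3: scale each attribute embedding by a task-specific
    weight and pool them into one entity embedding (weighted mean, assumed)."""
    dim = len(next(iter(attribute_embeddings.values())))
    pooled = [0.0] * dim
    total = 0.0
    for attr_id, vec in attribute_embeddings.items():
        w = weights.get(attr_id, 0.0)  # attributes without a weight contribute nothing
        total += w
        for i, x in enumerate(vec):
            pooled[i] += w * x
    return [x / total for x in pooled] if total else pooled


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_matches(query: Vector, candidates: Dict[str, Vector]) -> List[str]:
    """Sketch of claim 9: rank candidate entities (e.g. job postings) by similarity."""
    return sorted(candidates, key=lambda cid: cosine(query, candidates[cid]), reverse=True)


# Hypothetical weights standing in for a trained job-matching weighting model.
job_matching_weights = {"skill:python": 3.0, "title:engineer": 1.0}
seeker = weighted_entity_embedding(
    {"skill:python": [1.0, 0.0], "title:engineer": [0.0, 1.0]},
    job_matching_weights,
)
postings = {"job_a": [1.0, 0.0], "job_b": [0.0, 1.0]}
ranking = rank_matches(seeker, postings)
```

Here the job seeker's entity embedding leans toward the heavily weighted skill attribute, so `job_a` ranks first. Swapping in a different weight table changes the ranking without recomputing any attribute embeddings, which is one reason to keep the LLM-generated attribute embeddings task-agnostic and push task specificity into the weighting step.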

Assignees

Microsoft Tech Licensing Llc


Classifications (CPC titles)

  • Generative networks

  • Combinations of networks

  • using statistical methods

  • Knowledge engineering; Knowledge acquisition

  • Semantic analysis


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12596872B2 cover?
It covers creating attribute embeddings by sending a standardized attribute and its value to a task-agnostic generative large language model (LLM), receiving a natural language description of that attribute and value, and passing the description to at least one embedding generator. Dependent claims extend this to entity embeddings built from task-specifically weighted attribute embeddings and used in automated matching tasks, such as matching job seekers to job postings.
Who is the assignee on this patent?
Microsoft Tech Licensing Llc
What technology area does this patent fall under?
Primary CPC classification G06F40/279. Mapped technology areas include Physics.
When was this patent published?
It was published on Apr 7, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).