Machine learning techniques for generating domain-aware sentence embeddings

US12086540B2 · US · B2

Patent metadata
Field                Value
Publication number   US-12086540-B2
Application number   US-202117510875-A
Country              US
Kind code            B2
Filing date          Oct 26, 2021
Priority date        Oct 26, 2021
Publication date     Sep 10, 2024
Grant date           Sep 10, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Various embodiments of the present invention provide methods, apparatus, systems, computing devices, computing entities, and/or the like for performing predictive data analysis operations using natural language input data. For example, certain embodiments of the present invention utilize systems, methods, and computer program products that perform predictive data analysis operations by using sentence embedding machine learning models that are trained in coordination with similarity-based machine learning models.
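
The idea of training a sentence embedding model "in coordination with" a similarity model can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: a bag-of-words linear map stands in for the sentence embedding model, a sigmoid of the dot product stands in for the similarity model, the error measure is the squared deviation between inferred and ground-truth similarity, and the sentence pairs and labels are invented.

```python
import math
import random

def bag_of_words(sentence, vocab):
    """Count vector over a fixed vocabulary (stand-in for real text features)."""
    words = sentence.lower().split()
    return [float(words.count(w)) for w in vocab]

def embed(W, x):
    """Toy embedding model (assumption): linear map W (dim x vocab) applied to x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def similarity(e1, e2):
    """Toy similarity model (assumption): sigmoid of the dot product."""
    dot = sum(a * b for a, b in zip(e1, e2))
    return 1.0 / (1.0 + math.exp(-dot))

def mean_error(W, pairs, vocab):
    """Mean squared deviation between inferred and ground-truth similarity."""
    total = 0.0
    for s1, s2, target in pairs:
        e1 = embed(W, bag_of_words(s1, vocab))
        e2 = embed(W, bag_of_words(s2, vocab))
        total += (similarity(e1, e2) - target) ** 2
    return total / len(pairs)

def train_epoch(W, pairs, vocab, lr):
    """Update the embedding parameters W from the similarity error, pair by pair."""
    for s1, s2, target in pairs:
        x1, x2 = bag_of_words(s1, vocab), bag_of_words(s2, vocab)
        e1, e2 = embed(W, x1), embed(W, x2)
        s = similarity(e1, e2)
        # d(error)/d(dot) for error = (s - target)^2 with s = sigmoid(dot)
        g = 2.0 * (s - target) * s * (1.0 - s)
        for i, row in enumerate(W):
            for j in range(len(row)):
                # d(dot)/dW[i][j] = e2[i] * x1[j] + e1[i] * x2[j]
                row[j] -= lr * g * (e2[i] * x1[j] + e1[i] * x2[j])

# Invented training pairs: (sentence, sentence, ground-truth similarity).
pairs = [
    ("patient reports chest pain", "patient has cardiac discomfort", 1.0),
    ("patient reports chest pain", "claim denied for missing form", 0.0),
    ("claim denied for missing form", "form was missing so claim rejected", 1.0),
    ("patient has cardiac discomfort", "form was missing so claim rejected", 0.0),
]
vocab = sorted({w for s1, s2, _ in pairs for s in (s1, s2) for w in s.split()})

random.seed(0)
W = [[random.gauss(0.0, 0.1) for _ in vocab] for _ in range(8)]

error_before = mean_error(W, pairs, vocab)
for _ in range(500):
    train_epoch(W, pairs, vocab, lr=0.2)
error_after = mean_error(W, pairs, vocab)

print(f"error before: {error_before:.4f}, after: {error_after:.4f}")
```

The point of the sketch is only the direction of the gradient flow: the error is computed on the similarity output, yet the parameters being updated belong to the embedding step. A real system would likely use a pretrained neural encoder and an automatic differentiation library instead of hand-written gradients.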

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method comprising: generating, by one or more processors and using a sentence embedding machine learning model, a plurality of sentence embeddings based at least in part on a plurality of sentences, wherein: (i) the plurality of sentences comprises one or more first sentences of a first natural language document data object and one or more second sentences of a second natural language document data object, and (ii) the sentence embedding machine learning model is generated by updating parameters of an initial sentence embedding machine learning model based at least in part on a similarity determination model error measure that is determined based at least in part on one or more similarity determination model outputs of a sentence similarity determination machine learning model; determining, by the one or more processors and using the sentence similarity determination machine learning model, an inferred similarity measure for a sentence pair comprising a first sentence of the one or more first sentences and a second sentence of the one or more second sentences based at least in part on a first sentence embedding of the plurality of sentence embeddings that corresponds to the first sentence and a second sentence embedding of the plurality of sentence embeddings that corresponds to the second sentence; generating, by the one or more processors, a predictive output based at least in part on the inferred similarity measure; and initiating, by the one or more processors, a performance of one or more prediction-based actions based at least in part on the predictive output.

2. The computer-implemented method of claim 1, wherein the initial sentence embedding machine learning model is a pretrained sentence embedding machine learning model that is configured to enable retraining the pretrained sentence embedding machine learning model.

3. The computer-implemented method of claim 1, wherein generating the predictive output comprises: generating a cross-document relationship graph data object having a plurality of sentence nodes and one or more sentence relationship edges, wherein: (i) each sentence node is associated with a corresponding sentence of the plurality of sentences, and (ii) two sentence nodes are associated with a common sentence relationship edge if a corresponding inferred similarity measure for a corresponding sentence pair that is associated with the two sentence nodes satisfies an inferred similarity measure threshold; generating a cross-document relationship summary data object based at least in part on one or more graph-based inference outputs of performing one or more graph-based inferences on the cross-document relationship graph data object; and generating the predictive output based at least in part on the cross-document relationship summary data object.

4. The computer-implemented method of claim 3, wherein: the one or more graph-based inferences comprise a centrality-based page-rank inference that is configured to generate a centrality-based page-rank score for each sentence node; and the one or more graph-based inference outputs comprise each centrality-based page-rank score.

5. The computer-implemented method of claim 3, wherein the one or more graph-based inference outputs comprise a centrality score for each sentence node.

6. The computer-implemented method of claim 1, wherein: the sentence similarity determination machine learning model is determined based at least in part on the similarity determination model error measure, the similarity determination model error measure is determined based at least in part on a deviation measure of an inferred similarity measure for a training sentence entry pair and a ground-truth similarity measure for the training sentence entry pair, and the ground-truth similarity measure is determined using a knowledge base graph data object.

7. The computer-implemented method of claim 1, wherein: the first natural language document data object is determined based at least in part on a user-provided search query; and the predictive output describes a search result data object for the user-provided search query.

8. The computer-implemented method of claim 7, wherein: the second natural language document data object is selected from a plurality of candidate natural language document data objects, each candidate natural language document data object is associated with a corresponding cross-document similarity measure with respect to the first natural language document data object, and the predictive output describes a ranking of at least a subset of the plurality of candidate natural language document data objects based at least in part on each corresponding cross-document similarity measure.

9. The computer-implemented method of claim 1, wherein the one or more similarity determination model outputs are generated by the sentence similarity determination machine learning model based at least in part on a training pair of sentence embeddings and a domain-specific training task.

10. A computing system comprising one or more processors and memory including program code, the memory and the program code configured to, with the one or more processors, cause the computing system to at least: generate, using a sentence embedding machine learning model, a plurality of sentence embeddings based at least in part on a plurality of sentences, wherein: (i) the plurality of sentences comprises one or more first sentences of a first natural language document data object and one or more second sentences of a second natural language document data object, and (ii) the sentence embedding machine learning model is generated by updating parameters of an initial sentence embedding machine learning model based at least in part on a similarity determination model error measure that is determined based at least in part on one or more similarity determination model outputs of a sentence similarity determination machine learning model; determine, using the sentence similarity determination machine learning model, an inferred similarity measure for a sentence pair comprising a first sentence of the one or more first sentences and a second sentence of the one or more second sentences based at least in part on a first sentence embedding of the plurality of sentence embeddings that corresponds to the first sentence and a second sentence embedding of the plurality of sentence embeddings that corresponds to the second sentence; generate a predictive output based at least in part on the inferred similarity measure; and initiate a performance of one or more prediction-based actions based at least in part on the predictive output.

11. The computing system of claim 10, wherein the initial sentence embedding machine learning model is a pretrained sentence embedding machine learning model that is configured to enable retraining the pretrained sentence embedding machine learning model.

12. The computing system of claim 10, wherein generating the predictive output comprises: generating a cross-document relationship graph data object having a plurality of sentence nodes and one or more sentence relationship edges, wherein: (i) each sentence node is associated with a corresponding sentence of the plurality of sentences, and (ii) two sentence nodes are associated with a common sentence relationship edge if a corresponding inferred similarity measure for a corresponding sentence pair that is associated with the two sentence nodes satisfies an inferred similarity measure threshold; generating a cross-document relationship summary data object based at least in part on one or more graph-based i
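
The graph step recited in claims 3 and 4 (sentence nodes, edges where a similarity measure satisfies a threshold, then a page-rank score per node) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: cosine similarity stands in for the learned similarity model, edges are only drawn between sentences of different documents, the embedding vectors and the 0.8 threshold are invented, and the damping factor 0.85 is a conventional PageRank default rather than a value from the patent.

```python
import math

def cosine(u, v):
    """Cosine similarity (assumed stand-in for the learned similarity model)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_graph(doc_a, doc_b, threshold):
    """Nodes are (document, index); an undirected edge links a cross-document
    sentence pair whose similarity meets the threshold."""
    nodes = [("A", i) for i in range(len(doc_a))]
    nodes += [("B", j) for j in range(len(doc_b))]
    edges = {node: set() for node in nodes}
    for i, u in enumerate(doc_a):
        for j, v in enumerate(doc_b):
            if cosine(u, v) >= threshold:
                edges[("A", i)].add(("B", j))
                edges[("B", j)].add(("A", i))
    return edges

def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected graph given as adjacency sets."""
    n = len(edges)
    rank = {node: 1.0 / n for node in edges}
    for _ in range(iterations):
        # Rank held by nodes without edges is spread uniformly over all nodes.
        dangling = sum(rank[node] for node, nbrs in edges.items() if not nbrs)
        new_rank = {}
        for node in edges:
            incoming = sum(rank[nbr] / len(edges[nbr]) for nbr in edges[node])
            new_rank[node] = (1.0 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank

# Invented sentence embeddings for two documents.
doc_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
doc_b = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.0, 0.0, 1.0]]

graph = build_graph(doc_a, doc_b, threshold=0.8)
ranks = pagerank(graph)
most_central = max(ranks, key=ranks.get)

for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(score, 4))
```

In this toy input, the first sentence of document A is similar to two sentences of document B, so it ends up as the most central node. A summary step could then, for example, keep the highest-scoring sentences, though how the summary data object is actually built is not specified in the text shown on this page.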

Assignees

  • Unitedhealth Group Inc

Inventors

Classifications

  • Machine learning · CPC title

  • G06F40/30 · Primary

    Semantic analysis · CPC title

  • Inference or reasoning models · CPC title

  • Knowledge representation; Symbolic representation · CPC title

  • Knowledge engineering; Knowledge acquisition · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12086540B2 cover?
Various embodiments of the present invention provide methods, apparatus, systems, computing devices, computing entities, and/or the like for performing predictive data analysis operations using natural language input data. For example, certain embodiments of the present invention utilize systems, methods, and computer program products that perform predictive data analysis operations by using sente…
Who is the assignee on this patent?
Unitedhealth Group Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 10, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).