Building a complementary model for aggregating topics from textual content
US-2021209500-A1 · Jul 8, 2021 · US
US12086540B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12086540-B2 |
| Application number | US-202117510875-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 26, 2021 |
| Priority date | Oct 26, 2021 |
| Publication date | Sep 10, 2024 |
| Grant date | Sep 10, 2024 |
Official abstract text for this publication.
Various embodiments of the present invention provide methods, apparatus, systems, computing devices, computing entities, and/or the like for performing predictive data analysis operations using natural language input data. For example, certain embodiments of the present invention utilize systems, methods, and computer program products that perform predictive data analysis operations by using sentence embedding machine learning models that are trained in coordination with similarity-based machine learning models.
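To make "trained in coordination" concrete, the following is a minimal sketch and not the patented implementation. It assumes a toy embedding model (one learnable scale per vocabulary term) and a toy similarity model (a sigmoid over the dot product of two embeddings plus a learnable bias). The vocabulary, the training pairs, and every function name are hypothetical. The point it illustrates is that a single error measure, the squared deviation between inferred and ground-truth similarity, updates the embedding parameters as well as the similarity model's parameter.

```python
import math

# Hypothetical vocabulary and training data, invented for illustration only.
VOCAB = ["chest", "pain", "fever", "cough", "invoice", "payment", "overdue", "notice"]

# (sentence A, sentence B, ground-truth similarity)
TRAINING_PAIRS = [
    ("chest pain and fever", "fever with chest pain", 1.0),
    ("chest pain and fever", "invoice payment overdue", 0.0),
    ("payment overdue notice", "invoice payment overdue", 1.0),
    ("fever and cough", "payment overdue notice", 0.0),
]


def featurize(sentence):
    """Bag-of-words counts over VOCAB (stand-in for real text encoding)."""
    tokens = sentence.lower().split()
    return [float(tokens.count(term)) for term in VOCAB]


def embed(sentence, weights):
    """Toy sentence embedding model: one learnable scale per vocabulary term."""
    return [w * x for w, x in zip(weights, featurize(sentence))]


def infer_similarity(emb_a, emb_b, bias):
    """Toy similarity model: sigmoid of the embedding dot product plus a bias."""
    z = sum(a * b for a, b in zip(emb_a, emb_b)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def mean_error(weights, bias):
    """Mean squared deviation between inferred and ground-truth similarity."""
    errors = [
        (infer_similarity(embed(a, weights), embed(b, weights), bias) - truth) ** 2
        for a, b, truth in TRAINING_PAIRS
    ]
    return sum(errors) / len(errors)


def train(epochs=200, learning_rate=0.5):
    """Update embedding weights and similarity bias from the same error measure."""
    weights = [1.0] * len(VOCAB)
    bias = 0.0
    for _ in range(epochs):
        for sent_a, sent_b, truth in TRAINING_PAIRS:
            xa, xb = featurize(sent_a), featurize(sent_b)
            pred = infer_similarity(embed(sent_a, weights), embed(sent_b, weights), bias)
            # Gradient of (pred - truth)^2 with respect to the pre-sigmoid value z.
            grad_z = 2.0 * (pred - truth) * pred * (1.0 - pred)
            # z = sum_i w_i^2 * xa_i * xb_i + bias, so dz/dw_i = 2 * w_i * xa_i * xb_i.
            weights = [
                w - learning_rate * grad_z * 2.0 * w * a * b
                for w, a, b in zip(weights, xa, xb)
            ]
            bias -= learning_rate * grad_z
    return weights, bias


initial_error = mean_error([1.0] * len(VOCAB), 0.0)
trained_weights, trained_bias = train()
final_error = mean_error(trained_weights, trained_bias)
print(f"mean error before: {initial_error:.4f}, after: {final_error:.4f}")
```

A production system would presumably replace both toy models with neural networks and an automatic-differentiation framework; the sketch only shows the direction of the gradient flow from the similarity error back into the embedding parameters.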
Opening claim text (preview).
The invention claimed is:
1. A computer-implemented method comprising: generating, by one or more processors and using a sentence embedding machine learning model, a plurality of sentence embeddings based at least in part on a plurality of sentences, wherein: (i) the plurality of sentences comprises one or more first sentences of a first natural language document data object and one or more second sentences of a second natural language document data object, and (ii) the sentence embedding machine learning model is generated by updating parameters of an initial sentence embedding machine learning model based at least in part on a similarity determination model error measure that is determined based at least in part on one or more similarity determination model outputs of a sentence similarity determination machine learning model; determining, by the one or more processors and using the sentence similarity determination machine learning model, an inferred similarity measure for a sentence pair comprising a first sentence of the one or more first sentences and a second sentence of the one or more second sentences based at least in part on a first sentence embedding of the plurality of sentence embeddings that corresponds to the first sentence and a second sentence embedding of the plurality of sentence embeddings that corresponds to the second sentence; generating, by the one or more processors, a predictive output based at least in part on the inferred similarity measure; and initiating, by the one or more processors, a performance of one or more prediction-based actions based at least in part on the predictive output.
2. The computer-implemented method of claim 1, wherein the initial sentence embedding machine learning model is a pretrained sentence embedding machine learning model that is configured to enable retraining the pretrained sentence embedding machine learning model.
3. The computer-implemented method of claim 1, wherein generating the predictive output comprises: generating a cross-document relationship graph data object having a plurality of sentence nodes and one or more sentence relationship edges, wherein: (i) each sentence node is associated with a corresponding sentence of the plurality of sentences, and (ii) two sentence nodes are associated with a common sentence relationship edge if a corresponding inferred similarity measure for a corresponding sentence pair that is associated with the two sentence nodes satisfies an inferred similarity measure threshold; generating a cross-document relationship summary data object based at least in part on one or more graph-based inference outputs of performing one or more graph-based inferences on the cross-document relationship graph data object; and generating the predictive output based at least in part on the cross-document relationship summary data object.
4. The computer-implemented method of claim 3, wherein: the one or more graph-based inferences comprise a centrality-based page-rank inference that is configured to generate a centrality-based page-rank score for each sentence node; and the one or more graph-based inference outputs comprise each centrality-based page-rank score.
5. The computer-implemented method of claim 3, wherein the one or more graph-based inference outputs comprise a centrality score for each sentence node.
6. The computer-implemented method of claim 1, wherein: the sentence similarity determination machine learning model is determined based at least in part on the similarity determination model error measure, the similarity determination model error measure is determined based at least in part on a deviation measure of an inferred similarity measure for a training sentence entry pair and a ground-truth similarity measure for the training sentence entry pair, and the ground-truth similarity measure is determined using a knowledge base graph data object.
7. The computer-implemented method of claim 1, wherein: the first natural language document data object is determined based at least in part on a user-provided search query; and the predictive output describes a search result data object for the user-provided search query.
8. The computer-implemented method of claim 7, wherein: the second natural language document data object is selected from a plurality of candidate natural language document data objects, each candidate natural language document data object is associated with a corresponding cross-document similarity measure with respect to the first natural language document data object, and the predictive output describes a ranking of at least a subset of the plurality of candidate natural language document data objects based at least in part on each corresponding cross-document similarity measure.
9. The computer-implemented method of claim 1, wherein the one or more similarity determination model outputs are generated by the sentence similarity determination machine learning model based at least in part on a training pair of sentence embeddings and a domain-specific training task.
10. A computing system comprising one or more processors and memory including program code, the memory and the program code configured to, with the one or more processors, cause the computing system to at least: generate, using a sentence embedding machine learning model, a plurality of sentence embeddings based at least in part on a plurality of sentences, wherein: (i) the plurality of sentences comprises one or more first sentences of a first natural language document data object and one or more second sentences of a second natural language document data object, and (ii) the sentence embedding machine learning model is generated by updating parameters of an initial sentence embedding machine learning model based at least in part on a similarity determination model error measure that is determined based at least in part on one or more similarity determination model outputs of a sentence similarity determination machine learning model; determine, using the sentence similarity determination machine learning model, an inferred similarity measure for a sentence pair comprising a first sentence of the one or more first sentences and a second sentence of the one or more second sentences based at least in part on a first sentence embedding of the plurality of sentence embeddings that corresponds to the first sentence and a second sentence embedding of the plurality of sentence embeddings that corresponds to the second sentence; generate a predictive output based at least in part on the inferred similarity measure; and initiate a performance of one or more prediction-based actions based at least in part on the predictive output.
11. The computing system of claim 10, wherein the initial sentence embedding machine learning model is a pretrained sentence embedding machine learning model that is configured to enable retraining the pretrained sentence embedding machine learning model.
12. The computing system of claim 10, wherein generating the predictive output comprises: generating a cross-document relationship graph data object having a plurality of sentence nodes and one or more sentence relationship edges, wherein: (i) each sentence node is associated with a corresponding sentence of the plurality of sentences, and (ii) two sentence nodes are associated with a common sentence relationship edge if a corresponding inferred similarity measure for a corresponding sentence pair that is associated with the two sentence nodes satisfies an inferred similarity measure threshold; generating a cross-document relationship summary data object based at least in part on one or more graph-based i
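The graph construction and page-rank scoring recited in claims 3 to 5 can be sketched as follows. This is an illustrative reading, not the patented implementation: it assumes that "satisfies a threshold" means greater than or equal to, that edges are undirected, and that the summary is simply the top-scoring sentence nodes. The sentence identifiers, similarity values, threshold, and damping factor are all invented for the example.

```python
def build_relationship_graph(similarities, threshold):
    """Build an undirected sentence graph from pairwise similarity scores.

    similarities maps (node_a, node_b) to an inferred similarity measure.
    An edge is added when the score is >= threshold (assumed interpretation).
    """
    graph = {}
    for (node_a, node_b), score in similarities.items():
        graph.setdefault(node_a, set())
        graph.setdefault(node_b, set())
        if score >= threshold:
            graph[node_a].add(node_b)
            graph[node_b].add(node_a)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over an undirected adjacency mapping."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by isolated nodes is spread evenly across all nodes.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {}
        for node in nodes:
            incoming = sum(rank[nb] / len(graph[nb]) for nb in graph[node])
            new_rank[node] = (1.0 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


# Hypothetical cross-document sentence pairs: A* from one document, B* from another.
similarities = {
    ("A1", "B1"): 0.90,
    ("A1", "B2"): 0.80,
    ("A2", "B1"): 0.75,
    ("A2", "B2"): 0.20,
}

graph = build_relationship_graph(similarities, threshold=0.7)
scores = pagerank(graph)

# A simple stand-in for the summary data object: the two most central sentences.
summary = sorted(scores, key=scores.get, reverse=True)[:2]
print(summary, {node: round(score, 3) for node, score in scores.items()})
```

With these made-up scores the graph is a four-node path (B2, A1, B1, A2), so the two interior sentences A1 and B1 receive the highest centrality and would be the ones surfaced in a summary.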