US11847415B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11847415-B2 |
| Application number | US-202117382946-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 22, 2021 |
| Priority date | Sep 30, 2020 |
| Publication date | Dec 19, 2023 |
| Grant date | Dec 19, 2023 |
Official abstract text for this publication.
An embodiment may involve obtaining a set of pre-defined features and a new document; extracting a subset of the pre-defined features from within the new document; applying a natural language model to the new document, wherein the natural language model was pre-trained using scientific or medical literature and fine-tuned using a corpus of documents; applying a feature-based model to the subset of the pre-defined features extracted from the new document, wherein the feature-based model was trained with the pre-defined features and the respective labels of the documents; and applying an aggregation model to the classifications of the new document produced by the natural language model and the feature-based model, wherein the aggregation model was trained with prior classifications produced by the natural language model and the feature-based model so that the aggregation model produces a further classification of the new document representing its relevance to pharmacovigilance.
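The aggregation step in the abstract can be illustrated with a minimal sketch. It assumes that both upstream models emit a probability of relevance and that the aggregation model is a logistic regression over those two scores, which is one option the claims mention. The function names, the training pairs, and the hyperparameters are hypothetical and are not taken from the patent.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_aggregator(scores, labels, lr=0.5, epochs=2000):
    """Fit (w_nlp, w_feat, bias) by batch gradient descent on log loss.

    scores: list of (p_nlp, p_feat) pairs from the two upstream models.
    labels: list of 0/1 relevance labels for the same documents.
    """
    w_nlp = w_feat = bias = 0.0
    n = len(scores)
    for _ in range(epochs):
        g_nlp = g_feat = g_bias = 0.0
        for (p_nlp, p_feat), y in zip(scores, labels):
            err = sigmoid(w_nlp * p_nlp + w_feat * p_feat + bias) - y
            g_nlp += err * p_nlp
            g_feat += err * p_feat
            g_bias += err
        w_nlp -= lr * g_nlp / n
        w_feat -= lr * g_feat / n
        bias -= lr * g_bias / n
    return w_nlp, w_feat, bias


def aggregate(weights, p_nlp: float, p_feat: float) -> float:
    """Weighted combination of the two classifications, squashed to a probability."""
    w_nlp, w_feat, bias = weights
    return sigmoid(w_nlp * p_nlp + w_feat * p_feat + bias)


# Hypothetical prior classifications (NLP score, feature-model score) with labels.
prior_scores = [(0.9, 0.8), (0.8, 0.6), (0.7, 0.9), (0.2, 0.3), (0.1, 0.4), (0.3, 0.1)]
prior_labels = [1, 1, 1, 0, 0, 0]

weights = train_aggregator(prior_scores, prior_labels)
relevant = aggregate(weights, 0.85, 0.75)    # both models say "of interest"
irrelevant = aggregate(weights, 0.15, 0.20)  # both models say "not of interest"
```

Learning the weights from prior classifications, rather than fixing them by hand, lets the combination lean on whichever upstream model has been more reliable on the labelled corpus.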
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method comprising: obtaining, from persistent storage, a corpus of documents, wherein each of the documents is labelled with its relevance to pharmacovigilance; performing data preparation operations on the documents, wherein the data preparation operations include: de-duplicating the documents, normalizing terminology within the documents, and extracting pre-defined features within the documents, wherein the pre-defined features relate to pharmacovigilance; fine-tuning a natural language model with the documents and their labels, wherein the natural language model was pre-trained using scientific or medical literature, and wherein the fine-tuning involves further training of one or more encoders within the natural language model so that the natural language model seeks to classify new documents in accordance with their relevance to pharmacovigilance; training a feature-based model with the pre-defined features extracted from the documents and the respective labels of the documents so that the feature-based model also seeks to classify the new documents in accordance with their relevance to pharmacovigilance, wherein the feature-based model utilizes a plurality of decision trees with nodes representing the pre-defined features; and training an aggregation model with classifications produced by the natural language model and the feature-based model so that the aggregation model seeks to produce further classifications of the new documents in accordance with their relevance to pharmacovigilance, wherein the further classifications are weighted combinations of classifications produced by the natural language model and the feature-based model for the new documents.
2. The computer-implemented method of claim 1, wherein each respective selection within the documents is labelled with a binary value indicating that the selection is either of interest or not of interest to pharmacovigilance.
3. The computer-implemented method of claim 1, wherein the relevance to pharmacovigilance for each of the documents is expressed with a binary value indicating that each of the documents is either of interest or not of interest to pharmacovigilance.
4. The computer-implemented method of claim 1, wherein the relevance to pharmacovigilance for each of the documents is expressed with a probability that each of the documents is of interest to pharmacovigilance.
5. The computer-implemented method of claim 1, wherein the natural language model is a context-free word embedding model.
6. The computer-implemented method of claim 1, wherein the encoders of the natural language model each contain a transformer and a neural network.
7. The computer-implemented method of claim 1, wherein the pre-defined features include terms related to drugs, statistical characteristics, risk scores, designated medical events, adverse medical events, and terms pre-selected to keep under review.
8. The computer-implemented method of claim 7, wherein the pre-defined features also include indications of combinations of the terms appearing in a common sentence or consecutive sentences.
9. The computer-implemented method of claim 1, wherein the aggregation model applies multivariate logistic regression to produce the further classifications.
10. The computer-implemented method of claim 1, further comprising: storing, in the persistent storage, the natural language model, the feature-based model, and the aggregation model as trained.
11. An article of manufacture including a non-transitory computer-readable medium, having stored thereon program instructions that, upon execution by a computing device, cause the computing device to perform operations comprising: obtaining, from persistent storage, a corpus of documents, wherein each of the documents is labelled with its relevance to pharmacovigilance; performing data preparation operations on the documents, wherein the data preparation operations include: de-duplicating the documents, normalizing terminology within the documents, and extracting pre-defined features within the documents, wherein the pre-defined features relate to pharmacovigilance; fine-tuning a natural language model with the documents and their labels, wherein the natural language model was pre-trained using scientific or medical literature, and wherein the fine-tuning involves further training of one or more encoders within the natural language model so that the natural language model seeks to classify new documents in accordance with their relevance to pharmacovigilance; training a feature-based model with the pre-defined features extracted from the documents and the respective labels of the documents so that the feature-based model also seeks to classify the new documents in accordance with their relevance to pharmacovigilance, wherein the feature-based model utilizes a plurality of decision trees with nodes representing the pre-defined features; and training an aggregation model with classifications produced by the natural language model and the feature-based model so that the aggregation model seeks to produce further classifications of the new documents in accordance with their relevance to pharmacovigilance, wherein the further classifications are weighted combinations of classifications produced by the natural language model and the feature-based model for the new documents.
12. A computer-implemented method comprising: obtaining, from persistent storage, a set of pre-defined features and a new document related to a scientific or medical topic, wherein the pre-defined features relate to pharmacovigilance; normalizing terminology within the new document; extracting a subset of the pre-defined features from within the new document; applying a natural language model to the new document, wherein the natural language model was pre-trained using scientific or medical literature and fine-tuned using a corpus of documents, wherein each of the documents was labelled with its relevance to pharmacovigilance, and wherein the fine-tuning involved further training of one or more encoders within the natural language model so that the natural language model seeks to classify the new document in accordance with its relevance to pharmacovigilance; applying a feature-based model to the subset of the pre-defined features extracted from the new document, wherein the feature-based model was trained with the pre-defined features and the respective labels of the documents so that the feature-based model also seeks to classify the new document in accordance with its relevance to pharmacovigilance, wherein the feature-based model utilizes a plurality of decision trees with nodes representing the pre-defined features; and applying an aggregation model to the classifications of the new document produced by the natural language model and the feature-based model, wherein the aggregation model was trained with prior classifications produced by the natural language model and the feature-based model so that the aggregation model seeks to produce a further classification of the new document in accordance with its relevance to pharmacovigilance, wherein the further classification is a weighted combination of classifications produced by the natural language model and the feature-based model for the new document.
13. The computer-implemented method of claim 12, wherein each respective selection within the documents is labelled with a binary value indicating that the selection is either of interest or not of interest to pharmacovigilance.
14. The computer-implemented method of claim 12, wherein the relevance to pharmacovigilance for each of the documen
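
The data preparation operations recited in the claims (de-duplication, terminology normalization, and extraction of pre-defined features, including co-occurrence of terms in a common sentence or consecutive sentences) could be sketched as follows. This is only an illustration under stated assumptions: the synonym map, the term lists, the sentence splitter, and the choice to normalize before de-duplicating are all hypothetical and are not specified in the text above.

```python
import re

# Hypothetical synonym map and term lists, for illustration only.
SYNONYMS = {"acetaminophen": "paracetamol", "hepatic failure": "liver failure"}
DRUG_TERMS = {"paracetamol"}
EVENT_TERMS = {"liver failure"}


def normalize(text: str) -> str:
    """Lower-case, collapse whitespace, and map variant terms to a canonical form."""
    text = re.sub(r"\s+", " ", text.lower()).strip()
    for variant, canonical in SYNONYMS.items():
        text = re.sub(rf"\b{re.escape(variant)}\b", canonical, text)
    return text


def deduplicate(docs):
    """Drop exact duplicates while preserving first-seen order."""
    seen, unique = set(), []
    for doc in docs:
        if doc not in seen:
            seen.add(doc)
            unique.append(doc)
    return unique


def extract_features(text: str) -> dict:
    """Count term mentions and flag drug/event co-occurrence across sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    has_drug = [any(t in s for t in DRUG_TERMS) for s in sentences]
    has_event = [any(t in s for t in EVENT_TERMS) for s in sentences]
    same = any(d and e for d, e in zip(has_drug, has_event))
    consecutive = any(
        (has_drug[i] and has_event[i + 1]) or (has_event[i] and has_drug[i + 1])
        for i in range(len(sentences) - 1)
    )
    return {
        "drug_mentions": sum(text.count(t) for t in DRUG_TERMS),
        "event_mentions": sum(text.count(t) for t in EVENT_TERMS),
        "same_sentence": same,
        "consecutive_sentences": consecutive,
    }


raw_docs = [
    "Acetaminophen was given daily. Hepatic failure occurred in two patients.",
    "acetaminophen was given  daily. hepatic failure occurred in two patients.",
    "The trial enrolled 40 healthy volunteers.",
]

# Normalizing first lets near-identical documents collapse to one entry.
corpus = deduplicate([normalize(d) for d in raw_docs])
features = [extract_features(d) for d in corpus]
```

Feature dictionaries of this kind are the sort of input a decision-tree ensemble could consume, while the natural language model would read the normalized text directly.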
Transfer learning · CPC title
Supervised learning · CPC title
Recognition of textual entities · CPC title
Semantic analysis · CPC title
Machine learning · CPC title