What technology area does this patent fall under?

Primary CPC classification G06F40/279. Mapped technology areas include Physics.

When was this patent published?

Published December 19, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 11 related publications on this page: citations found in our corpus, or other publications that share the same primary CPC classification.

Automated detection of safety signals for pharmacovigilance

US11847415B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11847415-B2
Application number	US-202117382946-A
Country	US
Kind code	B2
Filing date	Jul 22, 2021
Priority date	Sep 30, 2020
Publication date	Dec 19, 2023
Grant date	Dec 19, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection; read this to see what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An embodiment may involve obtaining a set of pre-defined features and a new document; extracting a subset of the pre-defined features from within new document; applying a natural language model to the new document, wherein the natural language model was pre-trained using scientific or medical literature and fine-tuned using a corpus of documents; applying a feature-based model to the subset of the pre-defined features extracted from the new document, wherein the feature-based model was trained with the pre-defined features and the respective labels of the documents; and applying an aggregation model to the classifications of the new document produced by the natural language model and the feature-based model, wherein the aggregation model was trained with prior classifications produced by the natural language model and the feature-based model so that the aggregation model produces a further classification of the new document representing its relevance to pharmacovigilance.
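The last step in the abstract, an aggregation model trained on the earlier classifications of the two base models, is a form of model stacking, and claim 9 names multivariate logistic regression as one way to perform it. The sketch below is illustrative only and is not the assignee's implementation: it assumes each base model emits a probability of relevance, fits a two-input logistic regression by batch gradient descent using only the Python standard library, and uses hypothetical function names (`train_aggregator`, `aggregate`) and made-up toy scores.

```python
import math
from typing import Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_aggregator(
    nlp_scores: Sequence[float],
    feature_scores: Sequence[float],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[float, float, float]:
    """Fit (w_nlp, w_feat, bias) by batch gradient descent on log-loss.

    Inputs are prior classifications (probabilities) from the two base
    models and the binary relevance labels of the same documents.
    """
    w_nlp, w_feat, bias = 0.0, 0.0, 0.0
    n = len(labels)
    for _ in range(epochs):
        g_nlp = g_feat = g_bias = 0.0
        for x1, x2, y in zip(nlp_scores, feature_scores, labels):
            err = sigmoid(w_nlp * x1 + w_feat * x2 + bias) - y
            g_nlp += err * x1
            g_feat += err * x2
            g_bias += err
        # Average gradient step over the whole batch.
        w_nlp -= lr * g_nlp / n
        w_feat -= lr * g_feat / n
        bias -= lr * g_bias / n
    return w_nlp, w_feat, bias


def aggregate(weights: Tuple[float, float, float],
              nlp_score: float, feature_score: float) -> float:
    """Further classification: logistic function of a weighted combination."""
    w_nlp, w_feat, bias = weights
    return sigmoid(w_nlp * nlp_score + w_feat * feature_score + bias)


if __name__ == "__main__":
    # Hypothetical prior classifications: P(relevant) from each base model.
    nlp = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3]
    feat = [0.6, 0.9, 0.4, 0.2, 0.3, 0.1]
    y = [1, 1, 0, 0, 1, 0]
    weights = train_aggregator(nlp, feat, y)
    print(weights, aggregate(weights, 0.85, 0.75))
```

Because the output is the logistic function of a weighted sum, the learned weights show how far the aggregator trusts each base model, which fits the claims' description of the further classification as a weighted combination of the two models' classifications.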

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: obtaining, from persistent storage, a corpus of documents, wherein each of the documents is labelled with its relevance to pharmacovigilance; performing data preparation operations on the documents, wherein the data preparation operations include: de-duplicating the documents, normalizing terminology within the documents, and extracting pre-defined features within the documents, wherein the pre-defined features relate to pharmacovigilance; fine-tuning a natural language model with the documents and their labels, wherein the natural language model was pre-trained using scientific or medical literature, and wherein the fine-tuning involves further training of one or more encoders within the natural language model so that the natural language model seeks to classify new documents in accordance with their relevance to pharmacovigilance; training a feature-based model with the pre-defined features extracted from the documents and the respective labels of the documents so that the feature-based model also seeks to classify the new documents in accordance with their relevance to pharmacovigilance, wherein the feature-based model utilizes a plurality of decision trees with nodes representing the pre-defined features; and training an aggregation model with classifications produced by the natural language model and the feature-based model so that the aggregation model seeks to produce further classifications of the new documents in accordance with their relevance to pharmacovigilance, wherein the further classifications are weighted combinations of classifications produced by the natural language model and the feature-based model for the new documents.

2. The computer-implemented method of claim 1, wherein each respective selection within the documents is labelled with a binary value indicating that the selection is either of interest or not of interest to pharmacovigilance.

3. The computer-implemented method of claim 1, wherein the relevance to pharmacovigilance for each of the documents is expressed with a binary value indicating that each of the documents is either of interest or not of interest to pharmacovigilance.

4. The computer-implemented method of claim 1, wherein the relevance to pharmacovigilance for each of the documents is expressed with a probability that each of the documents is of interest to pharmacovigilance.

5. The computer-implemented method of claim 1, wherein the natural language model is a context-free word embedding model.

6. The computer-implemented method of claim 1, wherein the encoders of the natural language model each contain a transformer and a neural network.

7. The computer-implemented method of claim 1, wherein the pre-defined features include terms related to drugs, statistical characteristics, risk scores, designated medical events, adverse medical events, and terms pre-selected to keep under review.

8. The computer-implemented method of claim 7, wherein the pre-defined features also include indications of combinations of the terms appearing a common sentence or consecutive sentences.

9. The computer-implemented method of claim 1, wherein the aggregation model applies multivariate logistic regression to produce the further classifications.

10. The computer-implemented method of claim 1, further comprising: storing, in the persistent storage, the natural language model, the feature-based model, and the aggregation model as trained.

11. An article of manufacture including a non-transitory computer-readable medium, having stored thereon program instructions that, upon execution by a computing device, cause the computing device to perform operations comprises: obtaining, from persistent storage, a corpus of documents, wherein each of the documents is labelled with its relevance to pharmacovigilance; performing data preparation operations on the documents, wherein the data preparation operations include: de-duplicating the documents, normalizing terminology within the documents, and extracting pre-defined features within the documents, wherein the pre-defined features relate to pharmacovigilance; fine-tuning a natural language model with the documents and their labels, wherein the natural language model was pre-trained using scientific or medical literature, and wherein the fine-tuning involves further training of one or more encoders within the natural language model so that the natural language model seeks to classify new documents in accordance with their relevance to pharmacovigilance; training a feature-based model with the pre-defined features extracted from the documents and the respective labels of the documents so that the feature-based model also seeks to classify the new documents in accordance with their relevance to pharmacovigilance, wherein the feature-based model utilizes a plurality of decision trees with nodes representing the pre-defined features; and training an aggregation model with classifications produced by the natural language model and the feature-based model so that the aggregation model seeks to produce further classifications of the new documents in accordance with their relevance to pharmacovigilance, wherein the further classifications are weighted combinations of classifications produced by the natural language model and the feature-based model for the new documents.

12. A computer-implemented method comprising: obtaining, from persistent storage, a set of pre-defined features and a new document related to a scientific or medical topic, wherein the pre-defined features relate to pharmacovigilance; normalizing terminology within the new document; extracting a subset of the pre-defined features from within new document; applying a natural language model to the new document, wherein the natural language model was pre-trained using scientific or medical literature and fine-tuned using a corpus of documents, wherein each of the documents was labelled with its relevance to pharmacovigilance, and wherein the fine-tuning involved further training of one or more encoders within the natural language model so that the natural language model seeks to classify the new document in accordance with its relevance to pharmacovigilance; applying a feature-based model to the subset of the pre-defined features extracted from the new document, wherein the feature-based model was trained with the pre-defined features and the respective labels of the documents so that the feature-based model also seeks to classify the new document in accordance with its relevance to pharmacovigilance, wherein the feature-based model utilizes a plurality of decision trees with nodes representing the pre-defined features; and applying an aggregation model to the classifications of the new document produced by the natural language model and the feature-based model, wherein the aggregation model was trained with prior classifications produced by the natural language model and the feature-based model so that the aggregation model seeks to produce a further classification of the new document in accordance with its relevance to pharmacovigilance, wherein the further classification is a weighted combination of classifications produced by the natural language model and the feature-based model for the new document.

13. The computer-implemented method of claim 12, wherein each respective selection within the documents is labelled with a binary value indicating that the selection is either of interest or not of interest to pharmacovigilance.

14. The computer-implemented method of claim 12, wherein the relevance to pharmacovigilance for each of the documen
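Claim 1 lists three data preparation operations (de-duplication, terminology normalization, and extraction of pre-defined features), and claims 7 and 8 add that the features can flag terms appearing together in one sentence or in consecutive sentences. A minimal standard-library sketch of those steps follows. The synonym map, the term lists, exact-match de-duplication on normalized text, substring term matching, and the regex sentence splitter are all simplifying assumptions for illustration, not details taken from the patent.

```python
import re
from typing import Dict, Iterable, List, Set

# Hypothetical normalization map: surface variant -> preferred term.
SYNONYMS: Dict[str, str] = {
    "acetylsalicylic acid": "aspirin",
    "heart attack": "myocardial infarction",
}

# Hypothetical pre-defined feature vocabularies (claim 7 lists drug terms
# and adverse medical events among other categories).
DRUG_TERMS: Set[str] = {"aspirin"}
EVENT_TERMS: Set[str] = {"myocardial infarction", "bleeding"}


def normalize(text: str) -> str:
    """Lower-case, collapse whitespace, and map variants to preferred terms."""
    text = re.sub(r"\s+", " ", text.lower()).strip()
    for variant, preferred in SYNONYMS.items():
        text = text.replace(variant, preferred)
    return text


def deduplicate(docs: Iterable[str]) -> List[str]:
    """Keep the first document for each distinct normalized text."""
    seen: Set[str] = set()
    unique: List[str] = []
    for doc in docs:
        key = normalize(doc)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def extract_features(doc: str) -> Dict[str, int]:
    """Binary term-presence and drug/event co-occurrence features."""
    sentences = split_sentences(normalize(doc))
    features: Dict[str, int] = {}
    for term in DRUG_TERMS | EVENT_TERMS:
        # Substring match, so "aspirin" also matches inside longer tokens.
        features[f"has:{term}"] = int(any(term in s for s in sentences))
    for drug in DRUG_TERMS:
        for event in EVENT_TERMS:
            same = any(drug in s and event in s for s in sentences)
            # Consecutive: drug in one sentence, event in the next (either order).
            consecutive = any(
                (drug in a and event in b) or (event in a and drug in b)
                for a, b in zip(sentences, sentences[1:])
            )
            features[f"same_sentence:{drug}+{event}"] = int(same)
            features[f"consecutive:{drug}+{event}"] = int(consecutive)
    return features
```

Each feature dictionary is a flat mapping of binary indicators, the kind of input the decision-tree-based feature model of claim 1 could consume once vectorized; a production system would more likely use curated medical vocabularies and a proper sentence tokenizer.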

Assignees

AstraZeneca AB

Classifications

G06N3/096
Transfer learning · CPC title
G06N3/09
Supervised learning · CPC title
G06F40/279 (primary)
Recognition of textual entities · CPC title
G06F40/30 (primary)
Semantic analysis · CPC title
G06N20/00
Machine learning · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 80822606

