System and method for predicting subject enrollment

US11494680B2 · US · B2

Patent metadata
Publication number: US-11494680-B2
Application number: US-201815980532-A
Country: US
Kind code: B2
Filing date: May 15, 2018
Priority date: May 15, 2018
Publication date: Nov 8, 2022
Grant date: Nov 8, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

A system for predicting subject enrollment for a study includes a time-to-first-enrollment (TTFE) model and a first-enrollment-to-last-enrollment (FELE) model for each site in the study. The TTFE model includes a Gaussian distribution with a generalized linear mixed effects model solved with maximum likelihood point estimation or with Bayesian regression, and the FELE model includes a negative binomial distribution with a generalized linear mixed effects model solved with maximum likelihood point estimation or with Bayesian regression estimation.
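The abstract names two per-site models. Claim 1 chains them: the predicted time to first enrollment marks where a site's count series begins, and per-period counts are summed across sites into cumulative study enrollment. The abstract calls the FELE distribution negative binomial, while claim 1 calls it gamma-Poisson. These are the same family: a Poisson count whose rate is gamma-distributed has a negative binomial marginal. The Python sketch below illustrates that structure as a Monte Carlo simulation under loudly stated assumptions:

- periods are months;
- each site's parameters are supplied directly instead of being fitted with the generalized linear mixed effects models, maximum likelihood, or Bayesian regression the patent describes;
- `SiteParams`, `simulate_site`, and `cumulative_enrollment` are hypothetical names, not the patentee's implementation.

```python
import math
import random
from dataclasses import dataclass
from itertools import accumulate


@dataclass
class SiteParams:
    """Hypothetical per-site parameters; the patent obtains these from fitted models."""
    ttfe_mean: float   # mean time to first enrollment, in months (Gaussian TTFE)
    ttfe_sd: float     # standard deviation of that Gaussian
    rate_shape: float  # gamma shape of the site's monthly enrollment rate
    rate_scale: float  # gamma scale of the site's monthly enrollment rate


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small monthly rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_site(params: SiteParams, horizon: int, rng: random.Random) -> list[int]:
    """Monthly enrollment counts for one site over `horizon` months."""
    # TTFE step: Gaussian draw, clamped at 0 and rounded up to a whole month.
    start = max(0, math.ceil(rng.gauss(params.ttfe_mean, params.ttfe_sd)))
    # FELE step: one gamma-distributed rate per site, then Poisson counts per month.
    # Marginalizing over the gamma rate makes each month's count negative binomial.
    rate = rng.gammavariate(params.rate_shape, params.rate_scale)
    return [poisson(rate, rng) if month >= start else 0 for month in range(horizon)]


def cumulative_enrollment(sites: list[SiteParams], horizon: int, seed: int = 0) -> list[int]:
    """Sum per-site monthly counts, then take a running total for the whole study."""
    rng = random.Random(seed)
    totals = [0] * horizon
    for params in sites:
        for month, count in enumerate(simulate_site(params, horizon, rng)):
            totals[month] += count
    return list(accumulate(totals))


sites = [SiteParams(2.0, 0.5, 2.0, 1.0), SiteParams(4.0, 1.0, 3.0, 0.5)]
print(cumulative_enrollment(sites, horizon=12))
```

Repeating the simulation over many seeds would give a distribution of cumulative-enrollment curves rather than a single path. Drawing one gamma rate per site, rather than one per month, keeps the site-to-site heterogeneity that the mixed effects models are meant to capture.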

First claim

The invention claimed is:

1. A method for predicting subject enrollment for a clinical study, comprising: generating a database of unique healthcare sites, the database including data regarding site enrollment history for at least some of the sites and having no duplicated sites; splitting the database of unique healthcare sites into a training set and a testing set; determining training data from the training set based on time to first subject enrollment and enrollment count from the time of first subject enrollment to the time of last subject enrollment; training a first statistical model to predict a time to first enrollment for each site in the unique healthcare site database using the training data based on time to first subject enrollment; training a second statistical model to predict enrollment count for periods of time after the time of first subject enrollment for each site in the unique healthcare site database using the training data based on enrollment count from the time of first subject enrollment to the time of last subject enrollment; generating a clinical study model for predicting subject enrollment by: combining the first and second statistical models for each site by using the predicted time to first enrollment as a starting point for generating the predicted enrollment count for the periods of time after the time of first subject enrollment; and aggregating the predicted enrollment count for each period of time for each site to predict cumulative enrollment for the clinical study for each period of time; using the clinical study model to generate an initial prediction of subject enrollment for each site in the unique healthcare database; receiving updated site enrollment history; using the clinical study model to generate a revised prediction of subject enrollment for each site in the unique healthcare database, wherein the revised prediction improves as site enrollment history increases; and using at least one of the initial prediction or the revised prediction for each site to improve the efficiency of the clinical study; wherein: the first statistical model comprises a Gaussian distribution for the time to first subject enrollment in the training data and a generalized linear mixed effects model for each random effect variable; the first statistical model converges using maximum likelihood point estimation; the second statistical model comprises a gamma-Poisson distribution for the enrollment count from the time of first subject enrollment to the time of last subject enrollment in the training data and a generalized linear mixed effects model for each random effect variable; and the second statistical model converges using Bayesian regression estimation.

2. The method of claim 1, wherein generating a database of unique healthcare sites comprises: receiving a database of entities; determining which of the entities is related to healthcare; applying a gradient boosting model to pairs of healthcare-related entities that have a common geographic characteristic; calculating a matching probability for each pair of healthcare-related entities; when the matching probability for a pair of healthcare-related entities at least equals a pre-determined threshold, manually reviewing the pair of healthcare-related entities to determine whether they are a single healthcare site; when the pair of healthcare-related entities is determined to be a single healthcare site, adding the single healthcare site to the database of unique healthcare sites; when the matching probability for the pair of healthcare-related entities is less than the pre-determined threshold, adding the healthcare-related entities to the database of unique healthcare sites; and adding sites from a site master managed database to the database of unique healthcare sites.

3. The method of claim 2, wherein sites from the site master managed database and the database of unique healthcare sites are compared to eliminate duplicate sites and integrate the data about each site.

4. The method of claim 2, wherein the common geographic characteristic is selected from a group consisting of country, state, and zip code.

5. The method of claim 2, wherein the site master managed database is generated by: receiving a database of study sites; preparing the information for the study sites; applying a gradient boosting model to pairs of study sites that have a common geographic characteristic; calculating a matching probability for each pair of study sites; when the matching probability for a pair of study sites at least equals a pre-determined second threshold, manually reviewing the pair of study sites to determine whether they are a single study site; and when the pair of study sites is determined to be a single study site, adding the single study site to the site master managed database.

6. The method of claim 5, wherein when the matching probability for the pair of study sites is less than the pre-determined second threshold, adding the study sites to the database of unique healthcare sites when the names and addresses for the study sites exist and are recognizable.

7. The method of claim 5, wherein after the information for the study sites is prepared, when a first study site is not matched with a second study site having a common geographic characteristic, adding the first study site to the database of unique healthcare sites when the name and address for the first study site exists and is recognizable.

8. The method of claim 5, wherein the common geographic characteristic is selected from a group consisting of country, state, and zip code.
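Claim 2 routes pairs of healthcare-related entities that share a geographic characteristic. Depending on a matching probability, a pair goes either to manual review or straight into the database of unique healthcare sites. The Python sketch below illustrates only that routing step and rests on several assumptions:

- The shared characteristic is the zip code (claim 4 lists country, state, and zip code).
- The matching probability comes from a stand-in scorer. The claim recites a gradient boosting model; this sketch swaps in the name-similarity ratio from `difflib.SequenceMatcher` so it runs without training data.
- An entity that appears in any above-threshold pair is held for review.
- Every other entity, including one with no same-zip partner, passes straight through.
- `Entity`, `match_probability`, `route_pairs`, and the 0.8 threshold are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations


@dataclass(frozen=True)
class Entity:
    name: str
    zip_code: str  # assumed "common geographic characteristic" used for blocking


def match_probability(a: Entity, b: Entity) -> float:
    # Stand-in scorer: the claim recites a gradient boosting model; a plain
    # name-similarity ratio in [0, 1] keeps this sketch self-contained.
    return SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()


def route_pairs(entities: list[Entity], threshold: float = 0.8):
    """Return (entities passed straight through, pairs queued for manual review)."""
    # Blocking: only entities sharing a zip code are ever compared.
    blocks: dict[str, list[Entity]] = defaultdict(list)
    for entity in entities:
        blocks[entity.zip_code].append(entity)

    review_queue: list[tuple[Entity, Entity]] = []
    flagged: set[Entity] = set()
    for block in blocks.values():
        for a, b in combinations(block, 2):
            if match_probability(a, b) >= threshold:
                review_queue.append((a, b))  # possible duplicate: a human decides
                flagged.update((a, b))

    # Assumption: entities not in any high-probability pair are added directly.
    unique_sites = [e for e in entities if e not in flagged]
    return unique_sites, review_queue
```

Blocking by zip code before scoring keeps the number of compared pairs proportional to the block sizes rather than to the square of the whole entity list. This is the practical reason to restrict comparisons to a shared geographic characteristic.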

Assignees

Medidata Solutions Inc

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Machine learning · CPC title

  • G16H10/20 (primary)

    for electronic clinical trials or questionnaires · CPC title

  • G06N7/005 (primary)

    Physics · mapped topic

  • Combinations of networks · CPC title

Frequently asked questions

What does patent US11494680B2 cover?
A system for predicting subject enrollment for a study includes a time-to-first-enrollment (TTFE) model and a first-enrollment-to-last-enrollment (FELE) model for each site in the study. The TTFE model includes a Gaussian distribution with a generalized linear mixed effects model solved with maximum likelihood point estimation or with Bayesian regression, and the FELE model includes a negative …
Who is the assignee on this patent?
Medidata Solutions Inc
What technology area does this patent fall under?
Primary CPC classification G16H10/20. Mapped technology areas include Physics.
When was this patent published?
The B2 grant was published on Nov 8, 2022. Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).