US2025266129A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025266129-A1 |
| Application number | US-202519200097-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 6, 2025 |
| Priority date | May 27, 2020 |
| Publication date | Aug 21, 2025 |
| Grant date | — |
Official abstract text for this publication.
The disclosed embodiments concern methods, apparatus, systems, and computer program products for developing polygenic risk score (PRS) models. In some implementations, a fully automated process is provided that allows for a PRS model to be defined by an initial set of parameters. In some implementations the PRS models are trained to provide a PRS for particular populations.
Opening claim text (preview).
What is claimed is:
1. A computing system comprising: one or more processors; cache memory; and mass storage memory containing computer-readable instructions that, when executed by the one or more processors, cause the computing system to perform operations comprising: based on genetic and condition data of a plurality of individuals, determining a population-specific genetic and condition dataset; determining, for the population-specific genetic and condition dataset: a plurality of population-specific single-nucleotide polymorphism (SNP) training sets that are statistically associated with a predetermined condition, and an SNP validation set; loading, into the cache memory, the plurality of population-specific SNP training sets and the SNP validation set; training, in parallel by accessing the cache memory, a plurality of population-specific machine learning models to predict respective probabilities of individuals exhibiting the predetermined condition based on the genetic and condition data of the individuals, wherein the plurality of population-specific machine learning models are trained using: the plurality of population-specific SNP training sets in the cache memory, correlations between the population-specific SNP training sets and the predetermined condition, and respective sets of parameters, wherein the respective sets of parameters are different for each of the plurality of population-specific machine learning models and include model hyperparameters used in training of the plurality of population-specific machine learning models; based on the SNP validation set in the cache memory, determining performance metrics for each of the population-specific machine learning models; and based on the performance metrics, selecting a particular machine learning model from the plurality of population-specific machine learning models, wherein the particular machine learning model is selected based on having a best performance metric of the plurality of population-specific machine learning models.
2. The computing system of claim 1, wherein the operations further comprise: training a new machine learning model to predict respective probabilities of the individuals exhibiting the predetermined condition based on the genetic and condition data of the individuals, wherein the new machine learning model is trained using: a population-specific SNP training set in the cache memory that was used in the training of the particular machine learning model, the SNP validation set in the cache memory, the correlations between the population-specific SNP training sets and the predetermined condition, and a particular set of the parameters that was used in the training of the particular machine learning model.
3. The computing system of claim 1, wherein the operations further comprise: determining that the genetic and condition data of a particular individual from the plurality of population-specific SNP training sets or the SNP validation set has been stored in the cache memory for more than a threshold period of time; and deleting, from the cache memory, the genetic and condition data of the particular individual.
4. The computing system of claim 1, wherein the operations further comprise: determining that the genetic and condition data of a particular individual from the plurality of population-specific SNP training sets or the SNP validation set is subject to a deletion request; and deleting, from the cache memory, the genetic and condition data of the particular individual.
5. The computing system of claim 1, wherein determining the plurality of population-specific SNP training sets and the SNP validation set comprises: dividing the population-specific genetic and condition dataset into at least the plurality of population-specific SNP training sets and the SNP validation set.
6. The computing system of claim 1, wherein the predetermined condition is obtained from a user of the computing system.
7. The computing system of claim 1, wherein the genetic and condition data of the plurality of individuals includes indications of presence or absence of the predetermined condition.
8. The computing system of claim 1, wherein the condition data of the plurality of individuals includes one or more of: answers to survey questions, family history, medical records, biomarkers, or data from one or more wearable sensors.
9. The computing system of claim 1, wherein the plurality of individuals includes greater than 10,000,000 individuals, and wherein the plurality of population-specific SNP training sets in the cache memory represent genetic data from between 100,000 and 1,000,000 individuals.
10. The computing system of claim 9, wherein the correlations are from a genome wide association study (GWAS) on the genetic data and the predetermined condition.
11. The computing system of claim 1, wherein the plurality of population-specific machine learning models comprise a population-specific machine learning model for one or more ethnicities of: European, African American, Sub-Saharan African, North Africa, LatinX, Central America, East Asian, South Asian, Southeast Asian, West Asian, and Central Asian.
12. The computing system of claim 1, wherein the plurality of population-specific SNP training sets represent individuals of European ethnicity, and wherein the SNP validation set represents individuals of Hispanic ethnicity.
13. A computer-implemented method comprising: based on genetic and condition data of a plurality of individuals, determining a population-specific genetic and condition dataset; determining, for the population-specific genetic and condition dataset: a plurality of population-specific single-nucleotide polymorphism (SNP) training sets that are statistically associated with a predetermined condition, and an SNP validation set; loading, into a cache memory, the plurality of population-specific SNP training sets and the SNP validation set; training, in parallel by accessing the cache memory, a plurality of population-specific machine learning models to predict respective probabilities of individuals exhibiting the predetermined condition based on the genetic and condition data of the individuals, wherein the plurality of population-specific machine learning models are trained using: the plurality of population-specific SNP training sets in the cache memory, correlations between the population-specific SNP training sets and the predetermined condition, and respective sets of parameters, wherein the respective sets of parameters are different for each of the plurality of population-specific machine learning models and include model hyperparameters used in training of the plurality of population-specific machine learning models; based on the SNP validation set in the cache memory, determining performance metrics for each of the population-specific machine learning models; and based on the performance metrics, selecting a particular machine learning model from the plurality of population-specific machine learning models, wherein the particular machine learning model is selected based on having a best performance metric of the plurality of population-specific machine learning models.
14. The computer-implemented method of claim 13, further comprising: training a new machine learning model to predict respective probabilities of the individuals exhibiting the predetermined condition based on the genetic and condition data of the individuals, wherein the new machine learning model is trained using: a population-specific SNP training set in the cache memory that was used in the
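The train-in-parallel, validate, and select steps recited in claims 1 and 13 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the applicant's implementation: the synthetic allele-count data, the logistic-regression model, the choice of AUC as the performance metric, pairing one training set with one parameter set, and the helper names `make_dataset`, `train`, `auc`, and `select_best` are all hypothetical. A thread pool stands in for whatever parallel hardware the claims contemplate; in CPython it gives concurrency, not a true multi-core speed-up.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def make_dataset(n, effects, rng):
    """Synthetic stand-in for SNP data: allele counts 0/1/2 plus a binary condition label."""
    rows = []
    for _ in range(n):
        x = [rng.choice((0, 1, 2)) for _ in effects]
        p = sigmoid(sum(b * g for b, g in zip(effects, x)) - 1.0)
        rows.append((x, 1 if rng.random() < p else 0))
    return rows

def train(train_set, params):
    """Fit an L2-regularised logistic regression by batch gradient descent."""
    n_snps = len(train_set[0][0])
    n = len(train_set)
    w, b = [0.0] * n_snps, 0.0
    for _ in range(params["epochs"]):
        gw, gb = [0.0] * n_snps, 0.0
        for x, y in train_set:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            gb += err
            for j, xj in enumerate(x):
                gw[j] += err * xj
        w = [wi - params["lr"] * (g / n + params["l2"] * wi) for wi, g in zip(w, gw)]
        b -= params["lr"] * gb / n
    return w, b

def auc(model, val_set):
    """Area under the ROC curve by pairwise comparison (ties count half)."""
    w, b = model
    scores = [(sum(wi * xi for wi, xi in zip(w, x)) + b, y) for x, y in val_set]
    pos = [s for s, y in scores if y == 1]
    neg = [s for s, y in scores if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def select_best(training_sets, val_set, param_grid):
    """Train one model per (training set, parameter set) pair concurrently,
    score each on the shared validation set, and return the top scorer."""
    with ThreadPoolExecutor() as pool:
        models = list(pool.map(train, training_sets, param_grid))
    metrics = [auc(m, val_set) for m in models]
    best = max(range(len(models)), key=metrics.__getitem__)
    return best, metrics, models[best]

rng = random.Random(0)
effects = [0.8, -0.5, 0.3, 0.0]  # hypothetical per-SNP effect sizes
training_sets = [make_dataset(300, effects, rng) for _ in range(3)]
val_set = make_dataset(200, effects, rng)
param_grid = [  # a different hyperparameter set for each model
    {"lr": 0.1, "l2": 0.0, "epochs": 200},
    {"lr": 0.05, "l2": 0.01, "epochs": 200},
    {"lr": 0.01, "l2": 0.1, "epochs": 50},
]
best_index, metrics, best_model = select_best(training_sets, val_set, param_grid)
print(best_index, [round(m, 3) for m in metrics])
```

In this sketch the in-memory lists play the role of the claimed cache memory: every worker reads the training sets and the validation set from RAM instead of reloading them from mass storage. A process pool or a GPU-backed trainer could replace the thread pool without changing the selection logic.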
ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding · CPC title
Machine learning · CPC title
for calculating health indices; for individual health risk assessment · CPC title
ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks · CPC title
for mining of medical data, e.g. analysing previous cases of other patients · CPC title