Phenotype trait prediction with threshold polygenic risk score

US2020118647A1 · US · A1

Patent metadata
Publication number: US-2020118647-A1
Application number: US-201916598988-A
Country: US
Kind code: A1
Filing date: Oct 10, 2019
Priority date: Oct 12, 2018
Publication date: Apr 16, 2020
Grant date:

Abstract

Official abstract text for this publication.

A system and method predicts a phenotypic trait for an individual. The system identifies a subset of SNP loci with predictive ability of the phenotypic trait. The system calculates a PRS for the individual based on the individual's genetic dataset at the identified subset of SNP loci. The system compares the PRS to a threshold PRS. The threshold PRS is determined by calculating, for each training individual of a plurality of training individuals including some reported to have and some reported to not have the phenotypic trait, a PRS, sweeping through a domain of PRS while calculating a true positive rate and a false positive rate, and then identifying an optimal threshold PRS as the threshold PRS. The system generates a prediction whether the individual has the phenotypic trait based on the comparison.
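The threshold-selection step in the abstract can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the abstract does not say here how the "optimal" threshold is chosen, so the sketch assumes the candidate that maximizes true positive rate minus false positive rate (Youden's J statistic). The function names and the toy training data are hypothetical.

```python
from typing import List, Sequence, Tuple


def sweep_thresholds(
    scores: Sequence[float], has_trait: Sequence[bool]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, true positive rate, false positive rate) per candidate.

    Candidates are the distinct training PRS values (an assumption; any grid
    over the PRS domain would do). An individual is predicted positive when
    their PRS is greater than or equal to the candidate threshold.
    """
    positives = sum(1 for y in has_trait if y)
    negatives = len(has_trait) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need training individuals both with and without the trait")

    rows = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, has_trait) if y and s >= t)
        fp = sum(1 for s, y in zip(scores, has_trait) if not y and s >= t)
        rows.append((t, tp / positives, fp / negatives))
    return rows


def select_threshold(scores: Sequence[float], has_trait: Sequence[bool]) -> float:
    """Pick the candidate maximizing TPR - FPR (assumed optimality criterion).

    On ties, max() keeps the first, i.e. the lowest, candidate.
    """
    rows = sweep_thresholds(scores, has_trait)
    best = max(rows, key=lambda row: row[1] - row[2])
    return best[0]


# Toy training set: PRS values and self-reported trait status (made up).
training_scores = [0.1, 0.4, 0.7, 1.0, 1.3, 1.6, 1.9, 2.2]
training_labels = [False, False, False, True, True, False, True, True]

threshold = select_threshold(training_scores, training_labels)
print(threshold)
```

Other criteria, such as capping the false positive rate at a fixed value, would fit the same sweep; only the `key` function in `select_threshold` would change.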

First claim

Opening claim text (preview).

What is claimed is:

1. A method for predicting a phenotypic trait for an individual, the method comprising:
    receiving a genetic dataset of the individual, the genetic dataset including a plurality of reads at a plurality of single nucleotide polymorphism (SNP) loci;
    identifying a subset of the plurality of SNP loci with predictive ability of the phenotypic trait;
    calculating a polygenic risk score for the individual based on the genetic dataset of the individual at the identified subset of SNP loci;
    comparing the polygenic risk score to a threshold polygenic risk score, the threshold polygenic risk score determined by:
        obtaining genetic datasets for a plurality of training individuals,
        calculating, for each training individual of the plurality of training individuals, a polygenic risk score for the training individual based on the genetic dataset of the training individual at the identified subset of SNP loci,
        calculating a plurality of true positive rates of predicting the phenotypic trait of the plurality of training individuals over a range of candidate threshold polygenic risk scores, each true positive rate corresponding to one of the candidate threshold polygenic risk scores,
        calculating a plurality of false positive rates of predicting the phenotypic trait of the plurality of training individuals over the range of candidate threshold polygenic risk scores, each false positive rate corresponding to one of the candidate threshold polygenic risk scores, and
        selecting one of the candidate threshold polygenic risk scores as the threshold polygenic risk score based on the true positive rate and the false positive rate corresponding to the selected one of the candidate threshold polygenic risk scores; and
    generating a prediction whether the individual has the phenotypic trait based on the comparison.

2. The method of claim 1, wherein the subset of SNP loci with the predictive ability of the phenotypic trait is identified with a genome-wide association study (GWAS) with the plurality of training individuals over the plurality of SNP loci, the GWAS comprising:
    calculating a p-value score for each SNP locus based on a positive count of training individuals reported to have the phenotypic trait and a negative count of training individuals reported to not have the phenotypic trait; and
    identifying the subset of SNP loci based on the p-value scores for the plurality of SNP loci being below a threshold p-value score;

3. The method of claim 2, wherein calculating the polygenic risk score comprises:
    calculating a weight for each SNP locus of the subset of SNP loci based on the p-value score for the SNP locus; and
    calculating the polygenic risk score by summing over each product of a genotype of the individual at a SNP locus in the subset of SNP loci and a corresponding weight for the SNP locus.

4. The method of claim 1, further comprising:
    determining an ethnicity of the individual based on the genetic dataset;
    wherein the plurality of training individuals is of the same ethnicity.

5. The method of claim 1, wherein the true positive rate for one of the candidate threshold polygenic risk scores is a percentage of the training individuals reported to have the phenotypic trait who are predicted to have the phenotypic trait based on the polygenic risk scores of the training individuals compared to the one of the candidate threshold polygenic risk scores.

6. The method of claim 1, wherein the false positive rate for one of the candidate threshold polygenic risk scores is a percentage of training individuals reported to not have the phenotypic trait who are predicted to have the phenotypic trait based on the polygenic risk scores of the training individuals compared to the one of the candidate threshold polygenic risk scores.

7. The method of claim 1, wherein generating a prediction whether the individual has the phenotypic trait based on the comparison comprises:
    in response to the polygenic risk score of the individual being greater than or equal to the threshold polygenic risk score, determining that the individual likely has the phenotypic trait; and
    in response to the polygenic risk score of the individual being less than the threshold polygenic risk score, determining that the individual likely does not have the phenotypic trait.

8. The method of claim 1, further comprising:
    transmitting the prediction to a client device for displaying the prediction.

9. The method of claim 1, wherein the plurality of training individuals belongs to a genetic community, the genetic community determined based on identity-by-descent (IBD) affinities among the training individuals.

10. A non-transitory computer-readable storage medium storing instructions for predicting a phenotypic trait for an individual, the instructions, when executed by one or more processors, cause the one or more processors to perform operations comprising:
    receiving a genetic dataset of the individual, the genetic dataset including a plurality of reads at a plurality of single nucleotide polymorphism (SNP) loci;
    identifying a subset of the plurality of SNP loci with predictive ability of the phenotypic trait;
    calculating a polygenic risk score for the individual based on the genetic dataset of the individual at the identified subset of SNP loci;
    comparing the polygenic risk score to a threshold polygenic risk score, the threshold polygenic risk score determined by:
        obtaining genetic datasets for a plurality of training individuals,
        calculating, for each training individual of the plurality of training individuals, a polygenic risk score for the training individual based on the genetic dataset of the training individual at the identified subset of SNP loci,
        calculating a plurality of true positive rates of predicting the phenotypic trait of the plurality of training individuals over a range of candidate threshold polygenic risk scores, each true positive rate corresponding to one of the candidate threshold polygenic risk scores,
        calculating a plurality of false positive rates of predicting the phenotypic trait of the plurality of training individuals over the range of candidate threshold polygenic risk scores, each false positive rate corresponding to one of the candidate threshold polygenic risk scores, and
        selecting one of the candidate threshold polygenic risk scores as the threshold polygenic risk score based on the true positive rate and the false positive rate corresponding to the selected one of the candidate threshold polygenic risk scores; and
    generating a prediction whether the individual has the phenotypic trait based on the comparison.

11. The non-transitory computer-readable storage medium of claim 10, wherein the subset of SNP loci with the predictive ability of the phenotypic trait is identified with a genome-wide association study (GWAS) with the plurality of training individuals over the plurality of SNP loci, the GWAS comprising:
    calculating a p-value score for each SNP locus based on a positive count of training individuals reported to have the phenotypic trait and a negative count of training individuals reported to not have the phenotypic trait; and
    identifying the subset of SNP loci based on the p-value scores for the plurality of SNP loci being below a threshold p-value score;

12. The non-transitory computer-readable storage medium of claim 11, wherein calculating the polygenic risk score comprises:
    calculating a weight for each SNP locus of the subset of SNP loci based on the p-value score for the SNP locus; and
    calculating the polygenic risk score by summing over each product of a genotype of the individual at a SNP locu
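Claims 2, 3, and 7 together describe selecting SNP loci by p-value, scoring an individual as a weighted sum of genotypes, and comparing the score to the threshold. A minimal Python sketch of those steps follows. Several details are assumptions made only for illustration: the weight is taken as -log10(p) (the claims say only that the weight is based on the p-value score), genotypes are coded as effect-allele counts of 0, 1, or 2, missing genotypes contribute 0, and all SNP names, p-values, and thresholds are made up.

```python
import math
from typing import Dict


def select_snps(p_values: Dict[str, float], p_cutoff: float) -> Dict[str, float]:
    """Keep SNP loci whose GWAS p-value is below the cutoff (claim 2).

    Returns a weight per retained locus. Using -log10(p) as the weight is an
    assumption for this sketch; the claims do not fix the weighting formula.
    """
    return {snp: -math.log10(p) for snp, p in p_values.items() if p < p_cutoff}


def polygenic_risk_score(genotypes: Dict[str, int], weights: Dict[str, float]) -> float:
    """Sum genotype * weight over the selected loci (claim 3).

    Genotypes are assumed to be effect-allele counts (0, 1, or 2); a locus
    missing from the individual's data is treated as 0 here.
    """
    return sum(genotypes.get(snp, 0) * w for snp, w in weights.items())


def predict_trait(prs: float, threshold: float) -> bool:
    """Predict the trait when the PRS meets or exceeds the threshold (claim 7)."""
    return prs >= threshold


# Hypothetical GWAS output and one individual's genotypes.
gwas_p_values = {"snp_a": 1e-8, "snp_b": 1e-3, "snp_c": 1e-10}
weights = select_snps(gwas_p_values, p_cutoff=1e-5)  # drops snp_b

individual = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
prs = polygenic_risk_score(individual, weights)

print(sorted(weights), prs, predict_trait(prs, threshold=10.0))
```

In practice the threshold passed to `predict_trait` would come from the sweep over training individuals described in claim 1 rather than a hand-picked constant.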

Assignees

Inventors

Classifications

  • G16B20/20 (Primary)

    Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection · CPC title

  • for detection of mutation or polymorphism · CPC title

  • ICT specially adapted for sequence analysis involving nucleotides or amino acids · CPC title

  • Supervised data analysis · CPC title

  • G16B40/00 (Primary)

    ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Ancestry.com DNA, LLC
What technology area does this patent fall under?
Primary CPC classification G16B20/20. Mapped technology areas include Physics.
When was this patent published?
Published Apr 16, 2020 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).