Phased Whole Genome Genetic Risk In A Family Quartet
US-2015370959-A9 · Dec 24, 2015 · US
US9928338B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9928338-B2 |
| Application number | US-201213487064-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 1, 2012 |
| Priority date | Jun 1, 2011 |
| Publication date | Mar 27, 2018 |
| Grant date | Mar 27, 2018 |
Official abstract text for this publication.
The present disclosure describes a unified system for phasing a personal genome for downstream clinical interpretation. In one embodiment, an initial phasing is generated using public datasets, such as haplotypes from the 1000 Genomes Project, together with a phasing toolkit. A local perturbation algorithm is then applied to improve long-range phasing. If family data are available, a Mendelian inheritance pipeline is applied to phase novel and rare variants. The resulting datasets are merged and then corrected using any available experimental data. This allows full clinical interpretation of the role of a group of variants in a gene, whether those variants are inherited or de novo.
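The Mendelian inheritance step can be illustrated with a minimal sketch of phasing by transmission in one parent–child trio at biallelic sites. It assumes genotypes coded as alternate-allele counts (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate) and parents genotyped at the same sites; the function names `phase_by_transmission` and `phase_trio` and the status labels are hypothetical, not taken from the patent. A site is phased when exactly one combination of parental alleles explains the child's genotype, left ambiguous when more than one does (for example, when the child and both parents are heterozygous), and flagged when none does, which is where a genotyping error or a candidate de novo variant would surface.

```python
from typing import List, Optional, Tuple

# Alleles each parental genotype can transmit, with genotypes coded as the
# count of alternate alleles at a biallelic site (0, 1 or 2).
TRANSMITTABLE = {0: (0,), 1: (0, 1), 2: (1,)}


def phase_by_transmission(
    child: int, mother: int, father: int
) -> Tuple[str, Optional[Tuple[int, int]]]:
    """Return (status, (maternal_allele, paternal_allele)) for one site."""
    pairs = {
        (m, p)
        for m in TRANSMITTABLE[mother]
        for p in TRANSMITTABLE[father]
        if m + p == child
    }
    if not pairs:
        # No combination of parental alleles explains the child's genotype:
        # a Mendelian inconsistency (genotyping error or candidate de novo).
        return "mendelian_error", None
    if len(pairs) > 1:
        # Typically child, mother and father are all heterozygous.
        return "ambiguous", None
    return "phased", pairs.pop()


def phase_trio(
    child: List[int], mother: List[int], father: List[int]
) -> Tuple[List[Optional[int]], List[Optional[int]]]:
    """Build the child's maternal and paternal haplotypes; None = unresolved."""
    maternal: List[Optional[int]] = []
    paternal: List[Optional[int]] = []
    for c, m, f in zip(child, mother, father):
        _, alleles = phase_by_transmission(c, m, f)
        maternal.append(alleles[0] if alleles else None)
        paternal.append(alleles[1] if alleles else None)
    return maternal, paternal


if __name__ == "__main__":
    print(phase_trio([1, 1, 1, 2], [0, 2, 1, 0], [2, 0, 1, 0]))
```

Sites this per-site rule leaves as `None` are the ones the statistical phasing steps would still have to resolve.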
Opening claim text (preview).
What is claimed is:
1. A computerized method for inferring a haplotype phase for an individual in a collection of unrelated individuals utilizing a dynamically linked matrix data structure, comprising: receiving unphased individual genotype data describing a human genotype for an individual from a database and storing the individual genotype data on a memory of a computer system; receiving genotype data describing human genotypes for a plurality of individuals from a database and storing the genotype data on a memory of a computer system; imputing an initial haplotype phase for the individual and each individual in the plurality of individuals based on a statistical model and storing the initial haplotype phase for each individual in the plurality of individuals on a computer system comprising a processor and a memory; building a data structure describing a Hidden Markov Model, where the data structure contains: a set of predicted haplotype phases for each individual in the plurality of individuals; and a set of parameters comprising local recombination rates and mutation rates; wherein any change to the set of predicted haplotype phases contained within the data structure automatically results in re-computation of the set of parameters comprising local recombination rates and mutation rates contained within the data structure; repeatedly randomly modifying at least one of the predicted haplotype phases in the set of predicted haplotype phases to automatically re-compute a new set of parameters comprising local recombination rates and mutation rates that are stored within the data structure; automatically replacing a predicted haplotype phase for an individual with a randomly modified haplotype phase within the data structure, when the new set of parameters indicates that the randomly modified haplotype phase is more likely than an existing predicted haplotype phase; and extracting at least one final predicted haplotype phase from the data structure as a phased haplotype for an individual; and storing the phased haplotype for the individual on a memory of a computer system.
2. The method of claim 1, wherein the initial haplotype phase for the individual and each individual in the plurality of individuals is imputed from a public dataset, and wherein the method further comprises applying a Mendelian inheritance pipeline to the haplotype phase for the individual using a computer system to generate a dataset identifying phasing variants of interest and storing the dataset in the memory of a computer system, where the Mendelian inheritance pipeline is a dataset comprising genetic variants associated with Mendelian disease.
3. The method of claim 1, further comprising correcting the haplotype phase for the individual using experimental data.
4. The method of claim 3, where experimental data is paired end sequencing data.
5. The method of claim 1, where the variants of interest are compound heterozygous.
6. The method of claim 1, further comprising optimizing the haplotype phase for the first individual by maximizing a score, where the score is based on identity between the haplotype phase for the individual and an imputed haplotype in the set of imputed haplotypes for the plurality of individuals.
7. The method of claim 6, where the score is $S(A, \rho, \theta)$, where $S(A, \rho, \theta) := \sum_{j=1}^{N} \sum_{s=0}^{1} \log q_j\left(B = A_{2j-s,\cdot};\ \rho, \theta, A_{-j}\right)$, where $j$ is a sample index, where $B$ is an emitted haplotype sequence for the plurality of individuals, where the generated matrix is a matrix $A$ having dimensions $2N \times L$ (where $N$ is a number of genotypes for the plurality of individuals and $L$ is a number of genetic markers) and comprises the initial haplotype phase for each individual in the plurality of individuals stored in rows $A_{2j-1}$ and $A_{2j}$, where a matrix $A_{-j}$ is the matrix $A$ with the initial haplotype phase for the particular individual removed, where $\rho$ is the local recombination rate, and where $\theta$ is the mutation rate.
8. The method of claim 1, further comprising optimizing the haplotype phase for the first individual by selecting a single site move that results in a highest scoring pair for a genotype of the first individual, where the highest scoring pair results in a non-negligible increase in the score over a current configuration score.
9. The method of claim 8, where the non-negligible increase is 0.3.
10. The method of claim 8, further comprising determining a confidence measure for an optimized haplotype phase for the individual with respect to the single site switch by computing $\Delta^{SS}_{jk}$ at $k$ and $j$, where the genotype of the first individual is missing, $\Delta^{SS}_{jk}$ is a maximum of a set of
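The score of claim 7 and the accept-if-better perturbation of claims 1, 8 and 9 can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation. The conditional likelihood $q_j$ is computed here with a basic Li–Stephens-style copying HMM that uses one switch probability `rho` and one mismatch probability `theta` at every marker, rather than local rates that are re-estimated after each change. Rows are 0-based, so individual `j` occupies rows `2j` and `2j+1`. A move flips the two alleles of one individual at a single heterozygous site, so genotypes never change. Moves are proposed at random, as in claim 1, rather than by searching for the best single-site move as claim 8 describes. A move is kept only if it raises the score by more than `min_gain`, which defaults to the 0.3 of claim 9; the claim does not state units, and log-likelihood units are assumed here. All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[int]]  # rows are haplotypes, columns are biallelic markers (0/1)


def log_copying_likelihood(h: List[int], panel: Matrix, rho: float, theta: float) -> float:
    """log P(h | panel) under a simplified copying HMM (assumed model).

    The hidden state is the index of the panel haplotype being copied.
    Requires a non-empty panel, i.e. at least two individuals in A.
    """
    k_states = len(panel)
    fwd = [1.0 / k_states] * k_states  # uniform start over panel haplotypes
    log_p = 0.0
    for site, allele in enumerate(h):
        if site > 0:
            # Stay on the same template with prob 1 - rho, or jump uniformly;
            # fwd is normalized, so its total mass is 1.
            fwd = [(1.0 - rho) * f + rho / k_states for f in fwd]
        fwd = [
            f * ((1.0 - theta) if panel[k][site] == allele else theta)
            for k, f in enumerate(fwd)
        ]
        step = sum(fwd)  # P(h[site] | h[:site])
        log_p += math.log(step)
        fwd = [f / step for f in fwd]  # rescale to avoid underflow
    return log_p


def score(A: Matrix, rho: float, theta: float) -> float:
    """S(A, rho, theta): sum over individuals j and both of their rows of
    log q_j(row; rho, theta, A_{-j}), where A_{-j} omits both rows of j."""
    total = 0.0
    for j in range(len(A) // 2):
        own = (2 * j, 2 * j + 1)
        others = [row for r, row in enumerate(A) if r not in own]
        for r in own:
            total += log_copying_likelihood(A[r], others, rho, theta)
    return total


def perturb_phase(
    A: Matrix,
    rho: float = 0.05,
    theta: float = 0.01,
    iters: int = 200,
    min_gain: float = 0.3,
    seed: int = 0,
) -> Tuple[Matrix, float]:
    """Randomly flip single heterozygous sites; keep a flip only if it raises
    the score by more than min_gain. Returns the new matrix and its score."""
    rng = random.Random(seed)
    A = [row[:] for row in A]  # work on a copy; the caller's matrix is untouched
    current = score(A, rho, theta)
    for _ in range(iters):
        j = rng.randrange(len(A) // 2)
        top, bottom = A[2 * j], A[2 * j + 1]
        het_sites = [k for k in range(len(top)) if top[k] != bottom[k]]
        if not het_sites:
            continue
        k = rng.choice(het_sites)
        top[k], bottom[k] = bottom[k], top[k]  # flip phase at one site
        proposed = score(A, rho, theta)
        if proposed > current + min_gain:
            current = proposed
        else:
            top[k], bottom[k] = bottom[k], top[k]  # undo the flip
    return A, current


if __name__ == "__main__":
    toy = [
        [0, 0, 1, 1], [1, 1, 0, 0],  # individual 0
        [0, 0, 1, 1], [1, 1, 0, 0],  # individual 1
        [0, 1, 1, 0], [1, 0, 0, 1],  # individual 2, possibly mis-phased
    ]
    phased, final_score = perturb_phase(toy)
    print(phased, final_score)
```

Recomputing the full score after every proposal keeps the sketch short, but it costs a pass over every individual each time, so it is only suitable for toy inputs.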