Method and system for phasing individual genomes in the context of clinical interpretation

US9928338B2 · US · B2

Patent metadata
Publication number: US-9928338-B2
Application number: US-201213487064-A
Country: US
Kind code: B2
Filing date: Jun 1, 2012
Priority date: Jun 1, 2011
Publication date: Mar 27, 2018
Grant date: Mar 27, 2018

Abstract

The present disclosure presents a unified system to phase a personal genome for downstream clinical interpretation. In an embodiment, an initial phasing is generated using public datasets, such as haplotypes from the 1000 Genomes Project, and a phasing toolkit. A local perturbation algorithm is applied to improve long range phasing. If available, a Mendelian inheritance pipeline is applied to identify phasing of novel and rare variants. These datasets are merged, followed by correction by any experimental data. This allows for full clinical interpretation of the role of a group of variants in a gene, whether inherited or de novo variants.
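The abstract's Mendelian inheritance step can be illustrated with ordinary parent-offspring (trio) phasing. This is a minimal sketch under stated assumptions, not the patented pipeline: it reads the step as standard trio phasing of biallelic sites, encodes genotypes as alternate-allele counts (0, 1, 2), and ignores missing calls and multiallelic sites. The names `phase_trio_site`, `is_compound_heterozygous`, and `_can_transmit` are hypothetical. A heterozygous child is phased whenever at least one parent is homozygous. A heterozygous child of two homozygous-reference parents is flagged as a Mendelian inconsistency, which is where a candidate de novo variant would surface.

```python
from typing import List, Optional, Tuple

# Genotypes are alternate-allele counts: 0 = hom-ref, 1 = het, 2 = hom-alt.
# A phased site is (maternal allele, paternal allele); None means unresolved.
Phase = Optional[Tuple[int, int]]


def _can_transmit(parent: int, allele: int) -> bool:
    """True if a parent with this genotype can pass on `allele` (0 or 1)."""
    return parent == 1 or parent == 2 * allele


def phase_trio_site(child: int, mother: int, father: int) -> Tuple[Phase, str]:
    """Phase one biallelic child genotype using the parents' genotypes (hypothetical helper)."""
    if child in (0, 2):
        allele = child // 2
        ok = _can_transmit(mother, allele) and _can_transmit(father, allele)
        return (allele, allele), "ok" if ok else "mendelian_error"
    # Heterozygous child: a homozygous parent fixes which allele it transmitted.
    if mother in (0, 2):
        maternal = mother // 2
        paternal = 1 - maternal
    elif father in (0, 2):
        paternal = father // 2
        maternal = 1 - paternal
    else:
        return None, "ambiguous"  # both parents het: inheritance alone cannot phase
    if _can_transmit(mother, maternal) and _can_transmit(father, paternal):
        return (maternal, paternal), "ok"
    # e.g. a het child of two hom-ref parents: candidate de novo variant
    return None, "mendelian_error"


def is_compound_heterozygous(phases: List[Phase]) -> Optional[bool]:
    """True if alt alleles at het sites of one gene lie on both haplotypes (in trans)."""
    resolved = [p for p in phases if p is not None and p[0] != p[1]]
    maternal_hit = any(m == 1 for m, _ in resolved)
    paternal_hit = any(f == 1 for _, f in resolved)
    if maternal_hit and paternal_hit:
        return True
    if any(p is None for p in phases):
        return None  # an unphased het site could still place a variant in trans
    return False
```

Once the variants in a gene are phased, `is_compound_heterozygous` reports whether alternate alleles sit on both parental haplotypes. It returns `None` when an unphased heterozygous site leaves the answer open.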

First claim

Opening claim text (preview).

What is claimed is:

1. A computerized method for inferring a haplotype phase for an individual in a collection of unrelated individuals utilizing a dynamically linked matrix data structure, comprising: receiving unphased individual genotype data describing a human genotype for an individual from a database and storing the individual genotype data on a memory of a computer system; receiving genotype data describing human genotypes for a plurality of individuals from a database and storing the genotype data on a memory of a computer system; imputing an initial haplotype phase for the individual and each individual in the plurality of individuals based on a statistical model and storing the initial haplotype phase for each individual in the plurality of individuals on a computer system comprising a processor a memory; building a data structure describing a Hidden Markov Model, where the data structure contains: a set of predicted haplotype phases for each individual in the plurality of individuals; and a set of parameters comprising local recombination rates and mutation rates; wherein any change to the set of predicted haplotype phases contained within the data structure automatically results in re-computation of the set of parameters comprising local recombination rates and mutation rates contained within the data structure; repeatedly randomly modifying at least one of the predicted haplotype phases in the set of predicted haplotype phases to automatically re-compute a new set of parameters comprising local recombination rates and mutation rates that are stored within the data structure; automatically replacing a predicted haplotype phase for an individual with a randomly modified haplotype phase within the data structure, when the new set of parameters indicate that the randomly modified haplotype phase is more likely than an existing predicted haplotype phase; and extracting at least one final predicted haplotype phases from the data structure as a phased haplotype for an individual; and storing the phased haplotype phase for the individual on a memory of a computer system.

2. The method of claim 1, wherein the initial haplotype phase for the individual and each individual in the plurality of individuals is imputed from a public dataset, and wherein the method further comprises applying a Mendelian inheritance pipeline to the haplotype phase for the individual using a computer system to generate a dataset identifying phasing variants of interest and storing the dataset in the memory of a computer system, where the Mendelian inheritance pipeline is a dataset comprising genetic variants associated with Mendelian disease.

3. The method of claim 1, further comprising correcting the haplotype phase for the individual using experimental data.

4. The method of claim 3, where experimental data is paired end sequencing data.

5. The method of claim 1, where the variants of interest are compound heterozygous.

6. The method of claim 1, further comprising optimizing the haplotype phase for the first individual by maximizing a score, where the score is based on identity between the haplotype phase for the individual and an imputed haplotype in the set of imputed haplotypes for the plurality of individuals.

7. The method of claim 6, where the score is $S(\cdot)$, where $S(A, \rho, \theta) := \sum_{j=1}^{N} \sum_{s=0}^{1} \log q_j\left(B = A_{2j-s,\cdot};\ \rho, \theta, A_{-j}\right)$, where j is a sample index, where B is an emitted haplotype sequence for the plurality of individuals, where the generated matrix is a matrix A, having dimensions 2N by L, where N is a number of genotypes for the plurality of individuals and L is a number of genetic markers, comprises the initial haplotype phase for each individual in the plurality of individuals stored in rows $A_{2j-1}$ and $A_{2j}$, where a matrix $A_{-j}$ is the matrix A with the initial haplotype phase for the particular individual removed, where ρ is the local recombination rate, and where θ is the mutation rate.

8. The method of claim 1, further comprising optimizing the haplotype phase for the first individual by selecting a single site move that results in a highest scoring pair for a genotype of the first individual, where the highest scoring pair results in a non-negligible increase in the score over a current configuration score.

9. The method of claim 8, where the non-negligible increase is 0.3.

10. The method of claim 8, further comprising determining a confidence measure for an optimized haplotype phase for the individual with respect to the single site switch by computing $\Delta^{SS}_{jk}$ at k and j, where the genotype of the first individual is missing, $\Delta^{SS}_{jk}$ is a maximum of a set of …
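Claims 1 and 6 to 9 describe scoring the 2N × L haplotype matrix A with S(A, ρ, θ) and refining the phase by random single-site moves that are kept only when the score rises by more than 0.3. The pure-Python sketch below illustrates that loop under several assumptions that do not come from the patent. Each q_j is taken to be a Li and Stephens style copying HMM evaluated with the forward algorithm. θ is treated as a per-marker mismatch probability, and a switch between copied haplotypes across interval l has probability 1 − exp(−ρ_l / K) for K reference haplotypes. ρ and θ stay fixed, so the claimed automatic re-computation of parameters after each change is omitted. A single-site move swaps one individual's two alleles at one heterozygous marker, and its gain is measured on that individual's pair score. All function names are hypothetical. Rows use 0-based indices, so rows 2j and 2j + 1 hold the claim's $A_{2j-1}$ and $A_{2j}$ for individual j + 1.

```python
import math
import random
from typing import List, Sequence

# 2N x L matrix A of 0/1 alleles; rows 2j and 2j+1 are individual j's haplotypes.
Matrix = List[List[int]]


def log_q(h: Sequence[int], panel: Matrix, rho: Sequence[float], theta: float) -> float:
    """log P(h | panel) under an assumed Li-Stephens copying HMM (scaled forward algorithm).

    Assumed parameterization: per-marker mismatch probability theta, and switch
    probability r = 1 - exp(-rho[l-1] / K) between markers l-1 and l. Needs K >= 1.
    """
    K, L = len(panel), len(h)
    alpha = [(1 - theta if row[0] == h[0] else theta) / K for row in panel]
    log_lik = 0.0
    for l in range(1, L):
        total = sum(alpha)
        log_lik += math.log(total)
        r = 1.0 - math.exp(-rho[l - 1] / K)
        # Rescale to sum to 1, apply the copying transition, then emit marker l.
        alpha = [((1 - r) * a / total + r / K) * (1 - theta if row[l] == h[l] else theta)
                 for a, row in zip(alpha, panel)]
    return log_lik + math.log(sum(alpha))


def pair_score(A: Matrix, j: int, rho: Sequence[float], theta: float) -> float:
    """Sum over both haplotypes of individual j of log q_j(.; rho, theta, A_{-j})."""
    rows = (2 * j, 2 * j + 1)
    panel = [row for i, row in enumerate(A) if i not in rows]  # A_{-j}
    return sum(log_q(A[r], panel, rho, theta) for r in rows)


def score(A: Matrix, rho: Sequence[float], theta: float) -> float:
    """S(A, rho, theta): pair scores summed over all N individuals."""
    return sum(pair_score(A, j, rho, theta) for j in range(len(A) // 2))


def local_perturbation(A: Matrix, rho: Sequence[float], theta: float,
                       n_iter: int = 200, min_gain: float = 0.3, seed: int = 0) -> Matrix:
    """Pick random individuals; apply their best single-site swap if it gains > min_gain."""
    rng = random.Random(seed)
    A = [row[:] for row in A]  # work on a copy; the caller's matrix is left untouched
    for _ in range(n_iter):
        j = rng.randrange(len(A) // 2)
        a, b = A[2 * j], A[2 * j + 1]
        het = [k for k in range(len(a)) if a[k] != b[k]]
        current = pair_score(A, j, rho, theta)
        best_gain, best_k = 0.0, None
        for k in het:
            a[k], b[k] = b[k], a[k]  # trial move: swap the two alleles at marker k
            gain = pair_score(A, j, rho, theta) - current
            a[k], b[k] = b[k], a[k]  # undo the trial move
            if gain > best_gain:
                best_gain, best_k = gain, k
        if best_k is not None and best_gain > min_gain:
            a[best_k], b[best_k] = b[best_k], a[best_k]
    return A
```

On a toy panel where every other individual carries the haplotypes 000000 and 111111, an individual mis-phased at the fourth marker (000100 / 111011) is restored by one swap, because both of its rows then match the panel exactly. Greedy single-site moves can stall when several markers must flip together; this sketch makes no attempt to handle that case.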

Classifications

  • G06F19/22 (primary)

    Physics · mapped topic

  • G16B30/00 (primary)

    ICT specially adapted for sequence analysis involving nucleotides or amino acids · CPC title

Frequently asked questions

What does patent US9928338B2 cover?
The present disclosure presents a unified system to phase a personal genome for downstream clinical interpretation. In an embodiment, an initial phasing is generated using public datasets, such as haplotypes from the 1000 Genomes Project, and a phasing toolkit. A local perturbation algorithm is applied to improve long range phasing. If available, a Mendelian inheritance pipeline is applied to i…
Who is the assignee on this patent?
Tang Hua, Snyder Michael, Than Jennifer Li Pook, and 4 more
What technology area does this patent fall under?
Primary CPC classification G06F19/22. Mapped technology areas include Physics.
When was this patent published?
Published Mar 27, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).