Haplotype resolved genome sequencing

US9670530B2 · US · B2

Patent metadata
Publication number: US-9670530-B2
Application number: US-201414169056-A
Country: US
Kind code: B2
Filing date: Jan 30, 2014
Priority date: Jan 30, 2014
Publication date: Jun 6, 2017
Grant date: Jun 6, 2017

Abstract

Official abstract text for this publication.

Methods of determining a haplotype or partial haplotype of a DNA sample containing high molecular weight segments of genomic DNA are disclosed. Such methods may include sequencing DNA in an enriched DNA sample to produce a plurality of sequence reads, where some of the sequence reads contain a first allele of the first haplotype and other of the sequence reads contain a second allele of the first haplotype. Some methods align the sequence reads to a reference genome to produce aligned reads, where aligned reads from the first high molecular weight segment tend to cluster into islands on the reference genome. Some methods further determine distances separating adjacent aligned reads on the reference genome and select a first group of the aligned reads having separation distances to adjacent aligned reads that are smaller than a cutoff value. Using alleles from the first group of aligned reads, the method may define a first haplotype or first partial haplotype.
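The distance and island steps in the abstract can be illustrated with a minimal Python sketch. It assumes each aligned read is reduced to a single start coordinate on one chromosome and that the cutoff value is already known. The function names, the `min_reads` parameter, and the example coordinates are hypothetical and do not come from the patent.

```python
def separation_distances(positions):
    """Distances between adjacent aligned-read positions after sorting."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def group_into_islands(positions, cutoff, min_reads=2):
    """Group reads whose gap to the previous read is smaller than `cutoff`.

    A gap at or above the cutoff starts a new island. Islands with fewer
    than `min_reads` reads are dropped (an assumption of this sketch, so
    that isolated reads are excluded).
    """
    islands, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] >= cutoff:
            islands.append(current)
            current = []
        current.append(pos)
    if current:
        islands.append(current)
    return [island for island in islands if len(island) >= min_reads]


# Hypothetical read start coordinates on a single chromosome.
reads = [1000, 1200, 1350, 90000, 90100, 90400, 250000]
islands = group_into_islands(reads, cutoff=5000)
```

In this sketch the alleles carried by the reads of one island would be treated as candidates for the same partial haplotype, since they are presumed to come from the same high molecular weight segment.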

First claim

Opening claim text (preview).

What is claimed is:

1. A method of determining a haplotype or partial haplotype of a DNA sample comprising high molecular weight segments of genomic DNA, the method comprising:
    processing the DNA sample to produce an enriched DNA sample enriched for DNA from a first high molecular weight segment having a plurality of alleles from a first haplotype;
    sequencing DNA in the enriched DNA sample to produce a plurality of sequence reads, which are shorter in length than the first high molecular weight segment, wherein some of the sequence reads contain a first allele of the first haplotype and other of the sequence reads contain a second allele of the first haplotype;
    aligning the sequence reads to a reference genome to produce aligned reads, wherein aligned reads from the first high molecular weight segment tend to cluster into islands on the reference genome;
    determining distances separating adjacent ones of the aligned reads on the reference genome, wherein the separation distances between adjacent aligned reads fall into at least two groups distinguishable by the magnitude of their separation distances;
    determining a cutoff value by (i) generating a mixture model from the separation distances between adjacent aligned reads, wherein the mixture model fits two Gaussian distributions to the separation distances; and (ii) determining the cutoff value from a property of at least one of the two Gaussian distributions;
    selecting a first group of the aligned reads having separation distances to adjacent aligned reads that are smaller than the cutoff value, thereby excluding aligned reads having greater separation distances, wherein at least a portion of the first group of the aligned reads belong to the same island on the reference genome; and
    using alleles from the first group of aligned reads to define a first haplotype or first partial haplotype.

2. The method of claim 1, further comprising determining a complete haplotype from the first partial haplotype and other partial haplotypes.

3. The method of claim 1, wherein the high molecular weight segments are at least about 50 kb in length on average.

4. The method of claim 1, wherein the high molecular weight segments are at least about 100 kb in length on average.

5. The method of claim 1, wherein each of the two Gaussian distributions comprises its own central tendency.

6. The method of claim 1, wherein generating a mixture model comprises applying an expectation maximization procedure to the separation distances between adjacent aligned reads.

7. The method of claim 1, wherein determining the cutoff value comprises identifying a fraction of the probability mass of the distribution containing the shorter separation distances.

8. The method of claim 7, wherein the fraction of the probability mass of the distribution containing the shorter separation distances is about 80% or greater.

9. A system for haplotyping genomic DNA samples comprising:
    (a) a sequencer for receiving nucleic acids samples and providing nucleic acid sequence information from the sample;
    (b) a processor; and
    (c) one or more computer-readable storage media having stored thereon instructions for execution on said processor to evaluate sequence reads from the sequencer, the instructions comprising:
    (i) aligning the sequence reads to a reference genome to produce aligned reads, wherein aligned reads from a first high molecular weight segment of a genomic DNA sample tend to cluster into islands on the reference genome;
    (ii) determining distances separating adjacent ones of the aligned reads on the reference genome, wherein the separation distances between adjacent aligned reads fall into at least two groups distinguishable by the magnitude of their separation distances;
    (iii) determining a cutoff value by generating a mixture model from the separation distances between adjacent aligned reads, wherein the mixture model fits two Gaussian distributions to the separation distances; and determining the cutoff value from a property of at least one of the two Gaussian distributions;
    (iv) selecting a first group of the aligned reads having separation distances to adjacent aligned reads that are smaller than the cutoff value, thereby excluding aligned reads having greater separation distances, wherein at least a portion of the first group of the aligned reads belong to the same island on the reference genome; and
    (v) using alleles from the first group of aligned reads to define a first haplotype or first partial haplotype.

10. The system of claim 9, further comprising a component configured to process a DNA sample to produce an enriched DNA sample enriched for DNA from the first high molecular weight segment having a plurality of alleles from a first haplotype.
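The cutoff determination recited in the claims (a two-Gaussian mixture fitted by expectation maximization, with the cutoff taken from a fraction of the probability mass of the shorter-distance distribution) might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the initialisation from the two halves of the sorted data, the fixed iteration count, the reading of the "fraction of the probability mass" as a quantile of the fitted short-distance Gaussian, and all function names and example values are assumptions of this sketch.

```python
import math
from statistics import NormalDist


def fit_two_gaussians(values, iterations=200):
    """Fit a two-component 1-D Gaussian mixture with plain EM.

    Returns [(mean, sd, weight), (mean, sd, weight)] sorted by mean.
    """
    xs = sorted(values)
    half = len(xs) // 2
    # Assumed initialisation: lower and upper halves of the sorted data.
    groups = [xs[:half], xs[half:]]
    means = [sum(g) / len(g) for g in groups]
    sds = [max(math.sqrt(sum((x - m) ** 2 for x in g) / len(g)), 1e-6)
           for g, m in zip(groups, means)]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each distance.
        resp = []
        for x in xs:
            dens = [w * NormalDist(m, s).pdf(x)
                    for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and standard deviation.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)
    return sorted(zip(means, sds, weights))


def cutoff_from_mass(component, mass=0.8):
    """Distance below which `mass` of the given Gaussian component lies."""
    mean, sd, _weight = component
    return NormalDist(mean, sd).inv_cdf(mass)


# Hypothetical gaps: short ones within islands, long ones between islands.
gaps = [100, 150, 200, 250, 300, 120, 180, 220, 260, 140,
        40000, 50000, 60000, 45000, 55000]
short, long_ = fit_two_gaussians(gaps)
cutoff = cutoff_from_mass(short, mass=0.8)
```

Raising `mass` toward 1.0 moves the cutoff further into the tail of the short-distance distribution, so more within-island gaps are accepted at the risk of merging neighbouring islands. Real separation distances are often heavily skewed, so fitting to log-transformed distances may behave better in practice, although that departs from the literal wording of the claims.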

Assignees

Illumina Inc

Inventors

Classifications

  • Physics · mapped topic

  • C12Q1/6869 (Primary)

    Methods for sequencing · CPC title

  • C12Q1/6809 (Primary)

    Methods for determination or identification of nucleic acids involving differential detection · CPC title

  • involving nucleic acid arrays, e.g. sequencing by hybridisation · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9670530B2 cover?
It covers methods of determining a haplotype or partial haplotype of a DNA sample containing high molecular weight segments of genomic DNA. The methods sequence an enriched DNA sample, align the reads to a reference genome, select reads whose separation distances to adjacent reads are smaller than a cutoff value, and use the alleles in those reads to define a haplotype or partial haplotype.
Who is the assignee on this patent?
Illumina Inc
What technology area does this patent fall under?
Primary CPC classification C12Q1/6869. Mapped technology areas include Chemistry & Metallurgy.
When was this patent published?
Publication date Jun 6, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).