Linear genome assembly from three dimensional genome structure

US12315601B2 · US · B2

Patent metadata
Publication number: US-12315601-B2
Application number: US-201716308386-A
Country: US
Kind code: B2
Filing date: Jun 8, 2017
Priority date: Jun 8, 2016
Publication date: May 27, 2025
Grant date: May 27, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments provide a method for sequencing and assembling long DNA genomes comprising generating a 3D contact map of chromatin loop structures in a target genome, the 3D contact map of chromatin loop structures defining spatial proximity relationships between genomic loci in the genome, and deriving a linear genomic nucleic acid sequence from the 3D map of chromatin loop structures.
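To make the notion of a contact map concrete, here is a minimal sketch in Python. It assumes that proximity-ligation read pairs have already been mapped to coordinates on a single sequence, and it bins them at a fixed resolution. The function name `build_contact_map`, the `bin_size` parameter, and the toy coordinates are illustrative assumptions, not part of the patent text.

```python
def build_contact_map(pairs, genome_length, bin_size):
    """Count read-pair contacts between fixed-size genomic bins.

    pairs: iterable of (pos_a, pos_b), 0-based coordinates on one sequence
           (hypothetical input; real pipelines read these from alignments).
    Returns a symmetric matrix (list of lists) of contact counts.
    """
    n_bins = (genome_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            # Mirror off-diagonal contacts so the map stays symmetric.
            matrix[j][i] += 1
    return matrix


# Toy data: five read pairs on a 100 bp sequence, binned at 25 bp.
toy_pairs = [(5, 12), (8, 95), (40, 45), (41, 58), (90, 99)]
contact_map = build_contact_map(toy_pairs, genome_length=100, bin_size=25)
for row in contact_map:
    print(row)
```

Each cell counts how often two bins were ligated together, so a high count between bins that are far apart in the linear sequence (bins 0 and 3 in the toy data) indicates spatial proximity.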

First claim

Opening claim text (preview).

What is claimed is:

1. A method for assembly of one or more long DNA molecules comprising:
    a) performing a DNA proximity ligation assay conducted on one or more samples;
    b) generating a draft assembly of contigs and scaffolds from input sequencing reads obtained, at least in part, from the DNA proximity ligation assay conducted on one or more samples;
    c) assembling larger sequences corresponding to one or more DNA molecules in the one or more samples by iteratively overlapping, ordering, orienting, and merging the contigs and scaffolds in the draft assembly, wherein assembling larger sequences is determined, at least in part, by application of a greedy algorithm, an optimization algorithm, or a manual annotation algorithm;
    d) performing misjoin correction on the scaffolds, wherein the misjoin correction uses contact frequency between sequences in the scaffolds generated from a contact matrix to determine one or more misjoins;
    e) generating one or more megascaffolds from the corrected scaffolds, wherein generating one or more megascaffolds comprises using a density graph to construct hemi-scaffolds from the corrected scaffolds and transforming the density graph into a confidence graph, the confidence graph constructs one or more megascaffolds from the hemi-scaffolds; and
    f) generating a final assembly from the megascaffolds.

2. The method of claim 1, wherein assembling the input sequencing reads, which forms a set, is determined, at least in part, based on a frequency at which all or part of a given sequence forms contact with other sequences and a given sequencing read forms contact with other sequencing reads in the set.

3. The method of claim 2 is wherein a part of a given se.

4. The method of claim 1, wherein assembling the input sequence is determined, at least in part, based on a relative orientation with which a given sequence forms contacts with other input sequences.

5. The method of claim 4, wherein the orientation is inner, outer, left, or right.

6. The method of claim 1, wherein; the input sequencing reads are: contigs, scaffolds, or a combination thereof; generated using short-read sequencing technology, long-read sequencing technology, insert clones, linkage mapping data, physical mapping data, optical mapping data, or a combination thereof; from a single organism or multiple organisms; or from multiple organisms, and the multiple organisms are from a same or different species; or a combination thereof; the one or more DNA molecules are chromosomes, portions of chromosomes, plasmids, or other nucleotide sequences; consecutive sequences in the assembly are merged to increase contiguity of the assembly; the DNA proximity ligation assay is Hi-C; or a combination thereof.

7. The method of claim 1, wherein assembling the input sequences is performed, at least in part, based on analyzing the sequences of the contigs and scaffolds.

8. The method of claim 7, wherein flanking sequences of the sequences of the contigs and scaffolds are analyzed.

9. The method of claim 1, further comprising assembling a draft assembly prior to generating the final assembly.

10. The method of claim 9, further comprising identifying neighboring contigs in the draft assembly, wherein the neighboring contig is a contig and/or scaffold located within a given linear genomic distance according to the draft assembly.

11. The method of claim 1, further comprising identifying different sub-compartments with different distance scaling and long range contact pattern based on massively multiplex single cell DNA-DNA proximity ligation assay.

12. The method of claim 11, wherein: DNA-DNA proximity ligation data from different subsets is used for different assembly related tasks, optionally wherein the assembly related tasks comprise of misassembly detection or contig ordering; DNA-DNA proximity ligation data from different subsets is used to perform tasks at different scales, optionally wherein the scales comprise of kilobase or megabase; or a Hi-C ligation protocol is performed on synchronized populations of cells.

13. The method of claim 1, wherein a Hi-C ligation protocol is performed on one or more cells that have been treated to modify genome folding.

14. The method of claim 13, where the treatment to modify genome folding is gene editing.

15. The method of claim 14, where the gene editing method is CRISPR or TALEN.

16. The method of claim 1, to assemble transcriptomes, thus generating a draft assembly.

17. The method of claim 16, wherein the draft assembly spans sequences of genes associated with RNA transcripts found in a cell.

18. The method of claim 17, wherein the final assembly performs one or more following tasks: assigning genes to chromosomes; determining the order and orientation of the genes; and estimating distances between genes.

19. The method of claim 1, wherein the one or more DNA molecules do not correspond to genes.

20. The method of claim 1, where bisulfite treatment is applied to ligation products derived from a proximity ligation experiment.

21. The method of claim 20, wherein the final assembly is used to: analyze proximity between DNA loci in a sample; determine a frequency of methylation for one or more bases in a sample; or determine whether one or more loci tend to be methylated simultaneously.

22. The method of claim 1, wherein the DNA proximity ligation assay is used to generate a 3D contact map, the 3D contact map defines one or more contact domains.

23. The method of claim 22, wherein the 3D contact map defines: one or more loops; one or more compartments; one or more superloops; one or more compartment interactions; other 3D features; centromere and telomere regions; or a combination thereof.

24. A method for genome assembly comprising:
    a) performing a DNA proximity ligation assay;
    b) generating a draft assembly of contigs and scaffolds from input sequencing reads obtained, at least in part, the DNA proximity ligation assay conducted on one or more samples;
    c) assembling larger sequences corresponding to one or more DNA molecules in the one or more samples by iteratively overlapping, ordering, orienting, and merging the contigs and scaffolds in the draft assembly, wherein a proper orientation of contigs and/or scaffolds is determined, at least in part, by 3D contact features, wherein the features in question are centromere-to-centromere interactions, telomere-to-telomere interactions and centromere-to-telomere interactions or a combination thereof;
    d) performing misjoin correction on the scaffolds, wherein the misjoin correction uses contact frequency between sequences in the scaffolds generated from a contact matrix to determine one or more misjoin; and
    e) generating one or more megascaffolds from the corrected scaffolds, wherein generating one or more megascaffolds comprises using a density graph to construct hemi-scaffolds from the corrected scaffolds and transforming the density graph into a confidence graph, the confidence graph constructs one or more megascaffolds from the hemi-scaffolds, the contact features are determined, at least in part, by data from the DNA proximity ligation assay.

25. The method of claim 24, wherein: contacts associated with the reads correspond to one or more pixels in a contact map; or a combination thereof.
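The misjoin correction step in the claims relies on contact frequency taken from a contact matrix. One simple way to illustrate the idea, offered here as a sketch and not as the procedure the patent specifies, is to score every boundary between adjacent bins by the number of contacts that span it and to flag boundaries whose score falls far below the typical value. The function names, the `window` and `fraction` parameters, and the toy matrix are all assumptions made for illustration.

```python
from statistics import median


def boundary_scores(matrix, window):
    """Score each boundary between bin k-1 and bin k (k = 1..n-1).

    The score is the sum of contacts linking up to `window` bins on the
    left of the boundary with up to `window` bins on its right.
    """
    n = len(matrix)
    scores = []
    for k in range(1, n):
        left = range(max(0, k - window), k)
        right = range(k, min(n, k + window))
        scores.append(sum(matrix[i][j] for i in left for j in right))
    return scores


def candidate_misjoins(matrix, window=2, fraction=0.2):
    """Return boundaries whose spanning-contact score is below
    `fraction` times the median score (an illustrative threshold)."""
    scores = boundary_scores(matrix, window)
    cutoff = fraction * median(scores)
    return [k for k, score in enumerate(scores, start=1) if score < cutoff]


# Toy scaffold of six bins: bins 0-2 and bins 3-5 each show strong
# internal contacts but almost none with each other, as would be
# expected if two unrelated sequences had been joined at bin 3.
toy_matrix = [
    [9, 5, 3, 0, 0, 0],
    [5, 9, 5, 0, 0, 0],
    [3, 5, 9, 1, 0, 0],
    [0, 0, 0, 9, 5, 3],
    [0, 0, 1, 5, 9, 5],
    [0, 0, 0, 3, 5, 9],
]
print(boundary_scores(toy_matrix, window=2))
print(candidate_misjoins(toy_matrix))
```

Correctly joined sequence tends to show many contacts across any nearby boundary because contact frequency decays with linear distance, so a sharp local drop is a reasonable warning sign. Boundaries near the ends of a scaffold see fewer bins on one side, so a practical implementation would likely need to normalize for that.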

Assignees

Broad Inst Inc, Baylor College Medicine

Inventors

Classifications

  • C12Q1/6869 (Primary)

    Methods for sequencing · CPC title

  • Nucleic acid folding · CPC title

  • Sequence assembly · CPC title

  • ICT specially adapted for sequence analysis involving nucleotides or amino acids · CPC title

  • G16B5/10 (Primary)

    Boolean models · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12315601B2 cover?
Embodiments provide a method for sequencing and assembling long DNA genomes comprising generating a 3D contact map of chromatin loop structures in a target genome, the 3D contact map of chromatin loop structures defining spatial proximity relationships between genomic loci in the genome, and deriving a linear genomic nucleic acid sequence from the 3D map of chromatin loop structures.
Who is the assignee on this patent?
Broad Inst Inc, Baylor College Medicine
What technology area does this patent fall under?
Primary CPC classification C12Q1/6869. Mapped technology areas include Chemistry & Metallurgy.
When was this patent published?
Publication date May 27, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).