Systems and Methods for Correcting for Noise and Systemic Variations in Sequencing Data
US-2024404627-A1 · Dec 5, 2024 · US
US12027236B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12027236-B2 |
| Application number | US-201916247502-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 14, 2019 |
| Priority date | Jan 14, 2018 |
| Publication date | Jul 2, 2024 |
| Grant date | Jul 2, 2024 |
Official abstract text for this publication.
Provided herein includes a method for generating an error-corrected genome assembly for an organism comprising: generating a genomic contact map derived from a DNA proximity ligation assay conducted on one or more samples from the organism or a closely related species; superimposing a reference assembled genome derived from whole genome sequencing of one or more samples from the organism on top of the genomic contact map using computer software; correcting errors in the reference assembled genome through a computer user interface to obtain a corrected assembly file, wherein errors in the reference assembled genome are visualized by observing aberrant contacts in the genomic contact map; and applying the corrected assembly file to the reference assembled genome.
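The last step of the abstract, applying a corrected assembly file to the reference assembled genome, can be pictured as reordering and reorienting contigs. The following is a minimal sketch under stated assumptions, not the format or procedure used in the patent: the corrected assembly is assumed to be a list of `(contig name, strand)` pairs, and `apply_corrections`, `reverse_complement`, and the `gap` placeholder are hypothetical names introduced only for illustration.

```python
# Hypothetical sketch: the data layout and function names are assumptions,
# not taken from the patent text.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def apply_corrections(contigs, corrected_order, gap=""):
    """Rebuild a scaffold from contigs using a corrected order.

    contigs: dict mapping contig name -> sequence
    corrected_order: list of (name, strand) pairs, strand is '+' or '-'
    gap: string placed between adjacent contigs (an illustrative assumption)
    """
    pieces = []
    for name, strand in corrected_order:
        seq = contigs[name]
        if strand == "-":
            # A '-' strand means the contig was inverted during correction.
            seq = reverse_complement(seq)
        elif strand != "+":
            raise ValueError(f"unknown strand {strand!r} for contig {name}")
        pieces.append(seq)
    return gap.join(pieces)


contigs = {"ctg1": "AACC", "ctg2": "GGTT", "ctg3": "ACGT"}
# Corrected layout: ctg2 first, then ctg1 inverted, then ctg3.
corrected = [("ctg2", "+"), ("ctg1", "-"), ("ctg3", "+")]
scaffold = apply_corrections(contigs, corrected, gap="NN")
print(scaffold)  # GGTTNNGGTTNNACGT
```

Keeping the corrections as a separate ordered list, rather than editing sequences in place, leaves the original contigs untouched, so the same reference can be re-permuted if the contact map suggests further changes.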
Opening claim text (preview).
What is claimed is:

1. A method for generating an error-corrected genome assembly for an organism, comprising:
   a) performing whole genome sequencing on one or more samples from the organism or closely related species, wherein a reference assembled genome is generated from the sequencing contains gaps in adjacent contigs or a scaffold;
   b) performing a DNA proximity ligation assay on one or more samples from the organism or closely related species and generating a genomic contact map derived from the DNA proximity ligation assay conducted on the one or more samples from the organism or a related species;
   c) superimposing the positions of adjacent contigs or scaffolds from the reference assembled genome derived from the whole genome sequencing of one or more samples from the organism on top of the genomic contact map using computer software, wherein computer software comprises using a density graph;
   d) correcting errors in the reference assembled genome through a computer user interface, wherein correcting errors comprises incorporating sequences from the genomic contact map derived from the DNA proximity ligation assay thereby filling the gaps in the reference assembled genome, to obtain a corrected assembly file, wherein errors in the reference assembled genome are visualized by observing aberrant contacts in the genomic contact map; and
   e) generating an error corrected genome assembly data file, wherein the error corrected genome assembly is the final permutated reference assembled genome.
2. The method of claim 1, wherein the DNA proximity ligation assay is Hi-C.
3. The method of claim 1, wherein the reference assembled genome is generated using short-read sequencing technology, long-read sequencing technology, insert clones, linkage mapping data, physical mapping data, optical mapping date, or a combination thereof.
4. The method of claim 1, wherein observing aberrant contacts in the genomic contact map is based, at least in part, on the frequency of contacts between one part of a contig or scaffold and other parts of the same contig or scaffold, or based on the frequency of contact between one part of a contig or scaffold and other contigs and scaffolds, or a combination thereof.
5. The method of claim 4, wherein the aberrant contacts are misjoins, rearrangements, translocations, inversions, insertion, deletions, repeats, alignment errors, due to features of how the genome folds in three dimensions, cyclic permutations of the chromosomes, or a combination thereof.
6. The method of claim 5, wherein the translocations are balanced translocations, unbalanced translocations, or a combination thereof.
7. The method of claim 5, wherein the repeats are tandem repeats.
8. The method of claim 5, wherein a misjoin comprises a point along the diagonal of the contact map, a translocation comprises an extremely bright arrowhead motif pointing towards the diagonal of the contact map, and an inversion comprises two arrowhead motifs pointing at one another.
9. The method of claim 1, wherein the organism is an animal or a plant.
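Claim 4 ties the observation of aberrant contacts to contact frequency within a contig or scaffold, and claim 8 places a misjoin at a point along the diagonal of the contact map. The claims describe visual inspection through a user interface; the sketch below swaps in a simple automated heuristic (a count of contacts crossing each bin boundary) purely to illustrate why a misjoin stands out on the diagonal. The binned matrix layout, the function names `crossing_scores` and `misjoin_candidates`, and the `window` and `fraction` parameters are all assumptions made for this example.

```python
# Hypothetical sketch: a boundary-crossing contact count, not the claimed method.
from statistics import median


def crossing_scores(matrix, window):
    """Sum contacts that span each bin boundary.

    matrix: square, symmetric list of lists of contact counts per bin pair
    window: how many bins on each side of the boundary to include
    Boundary i lies between bin i - 1 and bin i.
    """
    n = len(matrix)
    scores = {}
    for i in range(window, n - window + 1):
        scores[i] = sum(
            matrix[r][c]
            for r in range(i - window, i)
            for c in range(i, i + window)
        )
    return scores


def misjoin_candidates(matrix, window=2, fraction=0.2):
    """Return boundaries whose crossing score is far below the median."""
    scores = crossing_scores(matrix, window)
    if not scores:
        return []
    threshold = fraction * median(scores.values())
    return [i for i, s in scores.items() if s < threshold]


# Toy contact map: bins 0-3 and bins 4-7 contact only within their own block,
# as if two unrelated sequences had been joined at bin 4.
n = 8
toy = [[10 if (r < 4) == (c < 4) else 0 for c in range(n)] for r in range(n)]
print(misjoin_candidates(toy))  # [4]
```

In real Hi-C data contact frequency decays with genomic distance and varies with coverage, so a fixed fraction of the median is only a starting point; any flagged boundary would still need to be confirmed against the contact map itself.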
ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding · CPC title
ICT programming tools or database systems specially adapted for bioinformatics · CPC title
ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks · CPC title
Sequence assembly · CPC title
Sequence alignment; Homology search · CPC title