US10839940B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10839940-B2 |
| Application number | US-200913139809-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 23, 2009 |
| Priority date | Dec 24, 2008 |
| Publication date | Nov 17, 2020 |
| Grant date | Nov 17, 2020 |
Official abstract text for this publication.
Exemplary embodiments of the present disclosure relate generally to methods, computer-accessible medium and systems for assembling haplotype and/or genotype sequences of at least one genome, which can be based upon, e.g., consistent layouts of short sequence reads and long-range genome related data. For example, a processing arrangement can be configured to perform a procedure including, e.g., obtaining randomly located short sequence reads, using at least one score function in combination with constraints based on, e.g., the long range data, generating a layout of randomly located short sequence reads such that the layout is globally optimal with respect to the score function, obtained through searching coupled with score and constraint dependent pruning to determine the globally optimal layout substantially satisfying the constraints, generating a whole and/or a part of a genome wide haplotype sequence and/or genotype sequence, and converting a globally optimal layout into one or more consensus sequences.
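The search described in the abstract (score candidate extensions, penalize layouts that conflict with long-range data, prune low-ranking paths, then convert the best layout into a consensus) can be illustrated with a small beam search. This is only a minimal sketch under stated assumptions, not the patented procedure: overlaps are exact suffix/prefix matches, long-range data is reduced to hypothetical mate-pair tuples `(i, j, dist, tol)` giving the expected distance between the start positions of reads `i` and `j`, the path extends only to the right of the seed read, and the names `find_overlaps`, `extend`, `assemble`, `consensus`, `beam_width`, and `penalty` are all illustrative.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Overlap:
    """A suffix of read `a` matches a prefix of read `b` over `length` bases."""
    a: int
    b: int
    length: int


def find_overlaps(reads, min_len=3):
    """Naive all-pairs search for the longest exact suffix/prefix overlap."""
    overlaps = {i: [] for i in range(len(reads))}
    for i, r in enumerate(reads):
        for j, s in enumerate(reads):
            if i == j:
                continue
            for k in range(min(len(r), len(s)) - 1, min_len - 1, -1):
                if r.endswith(s[:k]):
                    overlaps[i].append(Overlap(i, j, k))
                    break
    return overlaps


def extend(path, ov, reads, mates, penalty=10.0):
    """Extend a (score, order, positions) path by one overlap and rescore it."""
    score, order, pos = path
    start = pos[ov.a] + len(reads[ov.a]) - ov.length
    new_pos = {**pos, ov.b: start}
    new_score = score + ov.length  # assumed score: total overlap length
    for i, j, dist, tol in mates:  # assumed long-range constraint format
        if ov.b in (i, j) and i in new_pos and j in new_pos:
            if abs((new_pos[j] - new_pos[i]) - dist) > tol:
                new_score -= penalty
    return (new_score, order + [ov.b], new_pos)


def assemble(reads, mates, beam_width=4, first=None, seed=0):
    """Beam search over read paths; keeps at most `beam_width` candidates."""
    rng = random.Random(seed)
    overlaps = find_overlaps(reads)
    if first is None:
        first = rng.randrange(len(reads))  # randomly selected seed read
    beam = [(0.0, [first], {first: 0})]
    best = beam[0]
    while beam:
        candidates = []
        for path in beam:
            last = path[1][-1]
            for ov in overlaps[last]:
                if ov.b not in path[2]:  # never place a read twice
                    candidates.append(extend(path, ov, reads, mates))
        candidates.sort(key=lambda p: p[0], reverse=True)
        beam = candidates[:beam_width]  # prune to a fixed queue size
        if beam and beam[0][0] > best[0]:
            best = beam[0]
    return best


def consensus(path, reads):
    """Merge the reads of a path into one sequence using their offsets."""
    _, order, pos = path
    origin = pos[order[0]]
    seq = reads[order[0]]
    for r in order[1:]:
        seq = seq[:pos[r] - origin] + reads[r]
    return seq


reads = ["ATGGCGTA", "CGTACCTT", "CCTTGAAG", "TTGAAGCT"]
mates = [(0, 3, 10, 1)]  # reads 0 and 3 are expected to start 10 bases apart
best = assemble(reads, mates, first=0)
print(best[1], consensus(best, reads))
```

The fixed `beam_width` here plays the role of a maximum number of candidate paths kept in the queue; a score-relative cutoff could be substituted by filtering `candidates` against the top score instead of slicing.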
Opening claim text (preview).
What is claimed is:

1. A non-transitory computer-accessible medium having stored thereon computer executable instructions for assembling at least one part of at least one of at least one haplotype sequence or at least one genotype sequence of at least one genome, wherein, when the executable instructions are executed by a computer processing arrangement, the processing arrangement is configured to perform at least one procedure comprising: (a) obtaining (i) a plurality of randomly located short sequence reads, and (ii) overlap information about overlaps between the randomly located short sequence reads; (b) obtaining long range information for the randomly located short sequence reads, wherein the long range information includes optical map data and mate-pair data; (c) automatically randomly selecting a first read from the randomly located short sequence reads; (d) automatically identifying one or more overlapping second reads of the randomly located short sequence reads that overlap with the first read; (e) automatically generating one or more scores for the one or more overlapping second reads using the overlap information and the long range information; (f) selecting a particular read of the one or more second overlapping reads based on the one or more scores; (g) automatically generating a path through the plurality of randomly located short sequence reads by repeating procedures (e) and (f); and (h) automatically assembling the at least one part of the at least one of the at least one haplotype sequence or the at least one genotype sequence of the genome based on the path.

2. The computer-accessible medium of claim 1, wherein the processing arrangement is further configured to generate the one or more scores based on at least one of a containment or an overhang among a single pair of the randomly located short sequence reads.

3. The computer-accessible medium of claim 2, wherein the processing arrangement is further configured to evaluate the at least one of the containment or the overhang using at least one of (i) an orientation of the randomly located short sequence reads, (ii) a location of the randomly located short sequence reads, or (iii) a haplotypic identity of the randomly located short sequence reads.

4. The computer-accessible medium of claim 1, wherein the processing arrangement is further configured to generate the one or more scores using a weighted transitivity score.

5. The computer-accessible medium of claim 1, wherein the processing arrangement is further configured to generate the one or more scores using a Bayesian likelihood.

6. The computer-accessible medium of claim 5, wherein the Bayesian likelihood is based on at least one penalty function.

7. The computer-accessible medium of claim 1, wherein the processing arrangement is further configured to generate the one or more scores based on a plurality of homologous reference sequences.

8. The computer-accessible medium of claim 1, wherein the processing arrangement is further configured to generate the one or more scores based on short range information.

9. The computer-accessible medium of claim 1, wherein the processing arrangement is further configured to prune at least one of the paths.

10. The computer-accessible medium of claim 9, wherein the processing arrangement is configured to prune the at least one of the paths based on the one or more scores.

11. The computer-accessible medium of claim 9, wherein the processing arrangement is configured to prune the at least one of the paths based on the overlap information.

12. The computer-accessible medium of claim 9, wherein the processing arrangement is configured to prune the at least one of the paths based on a maximum number of candidate paths allowed in a queue.

13. The computer-accessible medium of claim 12, wherein the maximum number of candidate paths allowed in the queue is fixed.

14. The computer-accessible medium of claim 9, wherein the processing arrangement is configured to prune the at least one of the paths based on a percentage of top ranking paths compared to an optimum score.

15. The computer-accessible medium of claim 14, wherein the percentage of top ranking paths compared to an optimum score dynamically changes over time.

16. The computer-accessible medium of claim 1, wherein the processing arrangement is further configured to obtain the randomly located short sequence reads using at least one of (i) Sanger chemistry, (ii) sequencing-by-synthesis, (iii) sequencing-by-hybridization, or (iv) sequencing-by-ligation.

17. The computer-accessible medium of claim 1, wherein the processing arrangement is further configured to obtain the randomly located short sequence reads using at least one method having at least one error, wherein the at least one error is at least one of: (i) incorrect base-calls, (ii) missing bases, (iii) inserted bases, or (iv) homopolymeric compression.

18. The computer-accessible medium of claim 1, wherein the long-range information further includes a physical map that is at least one of (i) an ordered restriction map, (ii) a probe map, or (iii) a base-distribution map.

19. The computer-accessible medium of claim 1, wherein the processing arrangement is further configured to evaluate the scoring procedure based on a consistency of the one or more scores with respect to the long-range information by determining a local alignment with an alignment score.

20. The computer-accessible medium of claim 1, wherein the randomly located short sequence reads are generated using at least one procedure having at least one error, and wherein the at least one error is at least one of: (i) incorrect base-calls, (ii) missing bases, (iii) inserted bases, (iv) homopolymeric compression or (v) expansion.

21. The computer-accessible medium of claim 1, wherein the long-range comprises approximately 10 Kb-200 mb of information associated with the at least one genome.

22. A method for assembling at least one part of at least one of at least one haplotype sequence or at least one genotype sequence of at least one genome, comprising: (a) obtaining (i) a plurality of randomly located short sequence reads, and (ii) overlap information about overlaps between the randomly located short sequence reads; (b) obtaining long range information for the randomly located short sequence reads, wherein the long range information includes optical map data and mate-pair data; (c) automatically randomly selecting a first read from the randomly located short sequence reads; (d) automatically identifying one or more overlapping second reads of the randomly located short sequence reads that overlap with the first read; (e) automatically generating one or more scores regarding the one or more overlapping second reads using the overlap information and the long range information; (f) selecting a particular read of the one or more second overlapping reads based on the one or more scores; (g) automatically generating a path through the plurality of randomly located short sequence reads by repeating procedures (e) and (f); and (h) using a computer hardware arrangement, automatically assembling the at least one part of the at least one of the at least one haplotype sequence or the at least one genotype sequence of the genome based on the path.

23. The method of claim 22, further comprising generating the one or more scores based on at least one of a containment or an overhang among a single pair of the randomly located