US11830581B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11830581-B2 |
| Application number | US-201916295836-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 7, 2019 |
| Priority date | Mar 7, 2019 |
| Publication date | Nov 28, 2023 |
| Grant date | Nov 28, 2023 |
Official abstract text for this publication.
An iterative process for optimizing one or more parameters used by a k-mer based de novo genome assembler program to assemble a set of sequenced nucleic acids is described. The method utilizes quality metrics whose desired values are initially specified. Computed values of the quality metrics are calculated during the assembly process and compared to the desired values. The assembly process stops when the computed values are not desired values. After modification of one or more of the parameters (e.g., k-mer value), the assembly process re-initiates using the modified parameter set. This process repeats until the computed values of the quality metrics meet the desired values. The final parameter set is then used to generate or complete one or more final assembled genomes.
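The loop described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: `run_assembly`, `targets_met`, `modify`, and the `max_rounds` safety limit are hypothetical names introduced here, and the toy assembler and its numbers are invented purely so the example runs. No real assembler is invoked.

```python
from typing import Callable, Dict

Params = Dict[str, int]
Metrics = Dict[str, float]


def optimize_parameters(
    run_assembly: Callable[[Params], Metrics],
    targets_met: Callable[[Metrics], bool],
    modify: Callable[[Params, Metrics], Params],
    initial: Params,
    max_rounds: int = 50,  # assumption: a safety limit not stated in the source
) -> Params:
    """Re-run assembly with modified parameters until the metrics meet their targets."""
    params = dict(initial)
    for _ in range(max_rounds):
        metrics = run_assembly(params)    # compute quality metrics for this parameter set
        if targets_met(metrics):          # computed values meet desired values: stop
            return params
        params = modify(params, metrics)  # otherwise change a parameter (e.g. k) and restart
    raise RuntimeError("targets not met within max_rounds")


# Hypothetical stand-in for an assembler: N50 peaks when k is 55 (illustrative only).
def toy_assembly(params: Params) -> Metrics:
    return {"n50": 100_000 - 1_000 * abs(params["k"] - 55)}


final = optimize_parameters(
    run_assembly=toy_assembly,
    targets_met=lambda m: m["n50"] >= 98_000,
    modify=lambda p, m: {**p, "k": p["k"] + 2},
    initial={"k": 35},
)
print(final)
```

Passing the assembler, the target check, and the modification rule as callables keeps the loop independent of any particular assembler or metric set. The returned parameter set corresponds to the "final parameter set" the abstract says is used for the final assembly.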
Opening claim text (preview).
What is claimed is:
1. A method, comprising: selecting a k-mer-based de novo genome assembler program, designated assembler; providing a set of parameters for assembly of a set of sequenced nucleic acids, the parameters having respective values and respective priority scores, the parameters including a k-mer parameter having an initial length k in number of nucleotides, k being a positive integer equal to at least 35; providing a set of quality metrics for assembly of a set of sequenced nucleic acids, the quality metrics having respective weights indicating importance, respective target values, and respective computed values calculated during assembly, a given quality metric depending on one or more of the parameters; initiating assembly of the set of sequenced nucleic acids by the assembler, the assembler utilizing the set of parameters and the set of quality metrics, thereby forming intermediate assembled sequences; performing a procedure iteratively until the respective computed values of the quality metrics equal the respective target values, the procedure comprising the steps of: i) stopping the assembler when the respective computed values of the quality metrics do not equal the respective target values, ii) modifying at least one of the parameters and/or at least one of the quality metrics, iii) deleting any intermediate assembled sequences, iv) initiating assembly of the set of sequenced nucleic acids by the assembler, and v) calculating values of the quality metrics including any modified quality metrics, the procedure terminating while utilizing a set of final parameters and a set of final quality metrics, the final parameters including a final k-mer parameter of length k′; and completing assembly of the set of sequenced nucleic acids using the assembler, the set of final parameters, and the set of final quality metrics, thereby forming one or more final assembled genomes; wherein the intermediate assembled sequences have sizes, a mean size, a median size, and a standard deviation of the sizes of the intermediate assembled sequences calculated during the assembly; wherein a given taxonomic rank contains reference genomes having sizes, a mean size, a median size, and a standard deviation of the sizes of the reference genomes; and wherein the assembly includes a quality metric for the difference between the mean size of the intermediate assembled sequences and the mean size of the reference genomes.
2. The method of claim 1, wherein the method comprises storing the final parameters and the final quality metrics with the one or more final assembled genomes.
3. The method of claim 1, wherein the method is performed by a computer system without human intervention.
4. The method of claim 1, wherein the intermediate assembled sequences are partially assembled genomes.
5. The method of claim 1, wherein the intermediate assembled sequences are wholly assembled genomes.
6. The method of claim 1, wherein the quality metrics include a member selected from the group consisting of N50, NA50, NG50, NGA50, L50, LA50, LG50, LGA50, and combinations thereof.
7. The method of claim 1, wherein the quality metrics include a member selected from the group consisting of number of contigs, number of contigs above a given size, number of contig edges, number of connections within contigs, and combinations thereof.
8. The method of claim 1, wherein the assembly includes a quality metric for the difference between the median size of the intermediate assembled sequences and the median size of the reference genomes.
9. The method of claim 5, wherein the assembly includes a quality metric for the difference between the standard deviation of the sizes of the intermediate assembled sequences and the standard deviation of the sizes of the reference genomes.
10. The method of claim 1, wherein the quality metrics include a member selected from the group consisting of coverages per nucleotide base, coverages per contig, coverages per assembly, number of misassembles, relative abundances of nucleotides, repetitive content of nucleotides, and combinations thereof.
11. The method of claim 1, wherein k′ is optimal for one of the final assembled genomes.
12. The method of claim 1, wherein k′ is an average of values of the k-mer parameter used in the procedure.
13. The method of claim 1, wherein the k-mer parameter has a lower priority score and/or lower ranking compared to another parameter.
14. The method of claim 13, wherein said another parameter is a member selected from the group consisting of coverage cutoff value, number of mismatches allowed, expected genome size, insert size, and sequencing error rate of the intermediate assembled sequences.
15. The method of claim 1, wherein k′ is optimal for a subset of the final assembled genomes, the subset comprising more than one final assembled genome.
16. The method of claim 15, wherein the subset is defined by taxonomic rank.
17. The method of claim 15, wherein the subset is defined by sequencing method.
18. The method of claim 17, wherein the sequencing method is a member selected from the group consisting of Sanger, Illumina, PacBio, 454, Ion Torrent, and SOLid.
19. A system comprising one or more computer processor circuits configured and arranged to: select a k-mer-based de novo genome assembler program, designated assembler; provide a set of parameters for assembly of a set of sequenced nucleic acids, the parameters having respective values and respective priority scores, the parameters including a k-mer parameter having an initial length k in number of nucleotides, k being a positive integer equal to at least 35; provide a set of quality metrics for assembly of a set of sequenced nucleic acids, the quality metrics having respective weights indicating importance, respective target values, and respective computed values calculated during assembly, a given quality metric depending on one or more of the parameters; initiate assembly of the set of sequenced nucleic acids by the assembler, the assembler utilizing the set of parameters and the set of quality metrics, thereby forming intermediate assembled sequences, wherein the intermediate assembled sequences are wholly assembled genomes; perform a procedure iteratively until the respective computed values of the quality metrics equal the respective target values, the procedure comprising the steps of: i) stopping the assembler when the respective computed values of the quality metrics do not equal the respective target values, ii) modifying at least one of the parameters and/or at least one of the quality metrics, iii) deleting any intermediate assembled sequences, iv) initiating assembly of the set of sequenced nucleic acids by the assembler, and v) calculating values of the quality metrics including any modified quality metrics, the procedure terminating while utilizing a set of final parameters and a set of final quality metrics, the final parameters including a final k-mer parameter of length k′; and complete assembly of the set of sequenced nucleic acids using the assembler, the set of final parameters, and the set of final quality metrics, thereby forming one or more final assembled genomes; wherein the assembly includes a quality metric for the difference between the standard deviation of the sizes of the intermediate assembled sequences and the standard deviation of the sizes of the reference genomes.
20. A computer program product, comprising a computer readable hardware storage device having a computer-readable pro
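Two of the quality metrics named in the claims can be illustrated with a short sketch. It uses the commonly used definitions of N50 and L50 and assumes that the "difference" between mean sizes is an absolute difference, which the claim text does not specify. The function names and the sample lengths are hypothetical and chosen only for illustration.

```python
from statistics import mean
from typing import List, Tuple


def n50_l50(contig_lengths: List[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a list of contig lengths.

    N50 is the length of the shortest contig in the smallest set of longest
    contigs that covers at least half the total assembly length; L50 is the
    number of contigs in that set.
    """
    if not contig_lengths:
        raise ValueError("no contigs")
    ordered = sorted(contig_lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable for non-empty input")


def mean_size_difference(assembled_sizes: List[int], reference_sizes: List[int]) -> float:
    """Absolute difference between mean assembled size and mean reference size.

    Assumption: the claim says only "difference"; the absolute value is used here.
    """
    return abs(mean(assembled_sizes) - mean(reference_sizes))


# Illustrative values only, not data from the patent.
n50, l50 = n50_l50([80, 70, 50, 40, 30, 20])
diff = mean_size_difference([100, 200], [120, 220])
print(n50, l50, diff)
```

In an iterative scheme like the one claimed, values such as these would be compared against their target values after each assembly attempt.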