US10068053B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10068053-B2 |
| Application number | US-201414571022-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 15, 2014 |
| Priority date | Dec 16, 2013 |
| Publication date | Sep 4, 2018 |
| Grant date | Sep 4, 2018 |
Official abstract text for this publication.
Methods, systems, and apparatuses are provided for creating and using a machine-learning model to call a base at a position of a nucleic acid based on intensity values measured during a production sequencing run. The model can be trained using training data from training sequencing runs performed earlier. The model is trained using intensity values and assumed sequences that are determined as the correct output. The training data can be filtered to improve accuracy. The training data can be selected in a specific manner to be representative of the type of organism to be sequenced. The model can be trained to use intensity signals from multiple cycles and from neighboring nucleic acids to improve accuracy in the base calls.
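As an illustration of what "intensity signals from multiple cycles" could look like as model input, the following is a minimal sketch only, not the patent's implementation. The function name `window_features`, the four-channel layout, the `half_width` parameter, and the zero-padding at read edges are all assumptions introduced here.

```python
from typing import List, Sequence


def window_features(
    intensities: Sequence[Sequence[float]],
    position: int,
    half_width: int = 1,
) -> List[float]:
    """Flatten per-channel intensities for the cycles around `position`.

    `intensities[c]` holds the channel intensities measured at cycle `c`.
    Cycles outside the read are zero-padded; this edge handling is an
    assumption for illustration, not something the abstract specifies.
    """
    n_channels = len(intensities[0])
    features: List[float] = []
    for cycle in range(position - half_width, position + half_width + 1):
        if 0 <= cycle < len(intensities):
            features.extend(intensities[cycle])
        else:
            features.extend([0.0] * n_channels)
    return features


# Hypothetical three-cycle read with four channels per cycle.
read = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 0.0],
]
middle = window_features(read, position=1, half_width=1)
first = window_features(read, position=0, half_width=1)
```

Intensities from neighboring nucleic acids could be appended to the same vector in a similar way; how the described model actually arranges its inputs is not stated in the abstract.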
Opening claim text (preview).
What is claimed is:
1. A method of calling one or more bases for a nucleic acid of an organism, the method comprising: receiving, at a computer system, a basecalling model, the basecalling model configured to: receive inputs of intensity values for bases at one or more positions on a nucleic acid, and output a base call for each of the one or more positions, wherein the basecalling model is trained using a statistically significant number of assumed sequences of training nucleic acids and corresponding intensity values for bases at the positions of the assumed sequences, the corresponding intensity values being obtained from one or more first sequencing processes of training nucleic acids; receiving, at the computer system, sequencing data of test nucleic acids from a second sequencing process that is different from any of the one or more first sequencing processes, the sequencing data including intensity values for bases at a plurality of positions of a first test nucleic acid; for each of N positions of the first test nucleic acid: identifying intensity values corresponding to the position; determining, by the computer system, a first base call at a first position of the N positions using the basecalling model based on inputs of the intensity values for the N positions, where N is an integer greater than 1, wherein the basecalling model provides scores for each of a plurality of bases, and wherein determining the first base call includes: calculating, by the computer system, scores for each of the plurality of bases at the first position of the N positions using the basecalling model based on inputs of the intensity values for the N positions; and calling, by the computer system, the base corresponding to a highest score for the first position when the highest score satisfies one or more criteria; and calling a base at M positions based on the scores at the N positions, where M is less than or equal to N and greater than one.
2. The method of claim 1, wherein an intensity value corresponds to a plurality of positions, and each score corresponds to the plurality of positions or to a particular base at one of the plurality of positions.
3. The method of claim 1, wherein the basecalling model includes a neural network.
4. The method of claim 3, wherein the neural network outputs raw scores, and wherein the basecalling model includes a post-processing function that modifies the raw scores.
5. The method of claim 3, wherein the basecalling model includes a plurality of neural networks, the method further comprising: for each of the plurality of bases: determining a respective score using each of the plurality of neural networks; calculating a combined score from the respective scores; and using the combined score as the score for the base at the first position.
6. The method of claim 1, wherein each intensity value corresponds to one base, and wherein multiple intensity values corresponds to one base.
7. The method of claim 1, further comprising: performing the second sequencing process on the test nucleic acids.
8. The method of claim 1, wherein the N positions are not sequential.
9. The method of claim 1, wherein the basecalling model includes a plurality of intermediate models, the method further comprising: for each of the intermediate models: making a respective base call; determining a consensus base call from the respective base calls; and using the consensus base call for the first position.
10. The method of claim 1, wherein the basecalling model is further configured to receive inputs of intensity values for one or more neighboring nucleic acids that neighbor the first test nucleic acid.
11. The method of claim 10, wherein the intensity values for one or more neighboring nucleic acids are for a same cycle as the first position of the first test nucleic acid.
12. The method of claim 10, wherein the one or more neighboring nucleic acids are within a specified distance.
13. The method of claim 12, wherein the first nucleic acid and the one or more neighboring nucleic acids are on an ordered lattice, and wherein the specified distance is a number of lattice points separating the first test nucleic acid and the one or more neighboring nucleic acids.
14. The method of claim 12, wherein the first nucleic acid and the one or more neighboring nucleic acids are not ordered, and wherein the specified distance is a length.
15. A computer product comprising a computer readable medium storing a plurality of instructions for controlling a processor to perform the method of claim 1.
16. The method of claim 1, further comprising creating the basecalling model by: receiving sequencing data of training nucleic acids from the one or more first sequencing processes, the sequencing data including intensity values for bases at positions of the training nucleic acids, the training nucleic acids being from one or more training samples; for each of a set of the training nucleic acids: performing an initial base call at positions of the training nucleic acid to obtain an initial sequence based at least on the intensity values at the positions of the training nucleic acid; and determining an assumed sequence corresponding to the initial sequence, wherein the assumed sequence is assumed to be a correct sequence for the positions of the training nucleic acid; and generating the basecalling model using the assumed sequences and the intensity values corresponding to the assumed sequences.
17. A method of calling one or more bases for a nucleic acid of an organism, the method comprising: receiving, at a computer system, a basecalling model, the basecalling model configured to: receive inputs of intensity values for bases at one or more positions on a nucleic acid, and output a base call for each of the one or more positions, wherein the basecalling model is trained using a statistically significant number of assumed sequences of training nucleic acids and corresponding intensity values for bases at the positions of the assumed sequences, the corresponding intensity values being obtained from one or more first sequencing processes of training nucleic acids; receiving, at the computer system, sequencing data of test nucleic acids from a second sequencing process that is different from any of the one or more first sequencing processes, the sequencing data including intensity values for bases at a plurality of positions of a first test nucleic acid; for each of N positions of the first test nucleic acid: identifying intensity values corresponding to the position; determining, by the computer system, a first base call at a first position of the N positions using the basecalling model based on inputs of the intensity values for the N positions, where N is an integer equal to or greater than 1, wherein the basecalling model provides scores for each of a plurality of bases, and wherein determining the first base call includes: calculating, by the computer system, scores for each of the plurality of bases at the first position of the N positions using the basecalling model based on inputs of the intensity values for the N positions; and calling, by the computer system, the base corresponding to a highest score for the first position when the highest score satisfies one or more criteria, and wherein the one or more criteria include at least one of: the highest score being greater than a first threshold, and a difference between the highest score and a next highest score being greater than a second threshold.
18. The meth
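The score-based calling criteria in claim 17 and the combined score in claim 5 can be sketched as follows. This is a hedged illustration, not the patented implementation: the names `combine_scores` and `call_base`, the use of a plain average as the combined score, and the `"N"` no-call symbol are assumptions made here. The claims only require that a combined score be calculated and that the highest score satisfy one or more criteria.

```python
from typing import Dict, List, Optional


def combine_scores(per_model_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each base's score across several models.

    Averaging is an assumption; claim 5 only says a combined score is
    calculated from the respective scores.
    """
    n_models = len(per_model_scores)
    bases = per_model_scores[0].keys()
    return {b: sum(s[b] for s in per_model_scores) / n_models for b in bases}


def call_base(
    scores: Dict[str, float],
    min_score: Optional[float] = None,
    min_margin: Optional[float] = None,
    no_call: str = "N",
) -> str:
    """Call the highest-scoring base if it passes every supplied criterion.

    `min_score` plays the role of the first threshold and `min_margin` the
    second threshold (highest minus next-highest score). Returning `no_call`
    on failure is an assumption for illustration.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best_base, best), (_, second) = ranked[0], ranked[1]
    if min_score is not None and not best > min_score:
        return no_call
    if min_margin is not None and not (best - second) > min_margin:
        return no_call
    return best_base


# Hypothetical scores for one position.
confident = call_base(
    {"A": 0.70, "C": 0.20, "G": 0.05, "T": 0.05}, min_score=0.5, min_margin=0.3
)
ambiguous = call_base(
    {"A": 0.45, "C": 0.40, "G": 0.10, "T": 0.05}, min_margin=0.1
)
combined = combine_scores([{"A": 0.6, "C": 0.4}, {"A": 0.8, "C": 0.2}])
```

Leaving either threshold as `None` skips that check, which mirrors the "at least one of" wording: a caller may apply the absolute threshold, the margin threshold, or both.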
ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding · CPC title
ICT specially adapted for sequence analysis involving nucleotides or amino acids · CPC title
Physics · mapped topic
Supervised data analysis · CPC title