Selection device for candidate sequence information for similarity determination, selection method, and use for such device and method
US-2015379197-A1 · Dec 31, 2015 · US
US2016378916A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016378916-A1 |
| Application number | US-201615195741-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jun 28, 2016 |
| Priority date | Jun 15, 2009 |
| Publication date | Dec 29, 2016 |
| Grant date | — |
Official abstract text for this publication.
The present invention is directed to logic for analysis of nucleic acid sequence data that employs algorithms that lead to a substantial improvement in sequence accuracy and that can be used to phase sequence variations, e.g., in connection with the use of the long fragment read (LFR) process.
Opening claim text (preview).
What is claimed is:
1. A method of determining a sequence of at least a portion of a genome of an organism from a sample comprising genomic DNA of the organism, the method comprising: aliquoting the sample to produce a plurality of aliquots, each aliquot comprising less than a genome equivalent of genomic DNA fragments of the organism, the sample including genomic DNA that is not in a cell at the time of aliquoting; tagging the DNA fragments in each aliquot with an aliquot-specific tag sequence to produce tagged fragments; for each aliquot, sequencing the tagged fragments from the aliquot to obtain signals for bases at positions of the tagged fragments; analyzing, by a computer system, the signals to produce a plurality of reads, the analysis including a basecalling process that determines base calls at positions of the tagged fragments, each read comprising an aliquot-specific tag sequence; counting, by the computer system, aliquots that include a particular base call on a read at a particular position in the genome using the aliquot-specific tag sequences, wherein one or more reads from a first number of the aliquots comprise a first base call at a first position in the genome and reads from a second number of aliquots comprise a different second base call at the first position in the genome; identifying, by the computer system, the first base call as a false base call when the first number of the aliquots in which the first base call appears at the first position is less than a first threshold number of aliquots, the first threshold number being two or greater than two; and assembling, by the computer system, the plurality of reads to produce an assembled sequence, wherein the assembled sequence excludes the first base call at the first position when the first base call is identified as a false base call, the assembled sequence corresponding to at least a portion of the genome of the organism.
2. The method of claim 1 wherein the genome is a mammalian genome and the assembled sequence has a genome call rate of 70 percent or greater and an exome call rate of 70 percent or greater, wherein the assembled sequence comprises no more than one false single nucleotide variant per megabase.
3. The method of claim 1 wherein the genome comprises at least one gigabase.
4. The method of claim 1 wherein the genome is double stranded, the method comprising separating single strands of the double stranded genomic DNA before aliquoting.
5. The method of claim 1 comprising amplifying the DNA fragments in each aliquot.
6. The method of claim 5, wherein the amplification uses adapters or random primers.
7. The method of claim 5 comprising amplifying the DNA fragments in each aliquot by multiple displacement amplification.
8. The method of claim 5 comprising amplifying the DNA fragments in each aliquot at least 1000-fold.
9. The method of claim 5 wherein the sample comprises 1 to 20 cells of the organism.
10. The method of claim 9 wherein the sample comprises cellular contaminants, the method comprising amplifying the DNA fragments in each aliquot in the presence of the cellular contaminants.
11. The method of claim 9 wherein the cells are circulating non-blood cells from blood of the higher organism.
12. The method of claim 1 wherein the assembled sequence has a call rate of at least 70 percent of the genome.
13. The method of claim 1 wherein the sample comprises from 1 pg to 10 ng of the genome.
14. The method of claim 13 wherein the assembled sequence has fewer than one false single nucleotide variant per megabase.
15. The method of claim 1 comprising: receiving a plurality of intact cells of the organism; and disrupting the intact cells to release the genomic DNA, thereby producing the sample comprising genomic DNA of the organism.
16. The method of claim 1 wherein the sample is aliquoted into wells of a multiwell plate.
17. The method of claim 1 wherein the sample is aliquoted into droplets.
18. The method of claim 1 comprising: identifying, by the computer system, the first base call as a false base call at the first position in the genome when the first base call appears in at least a second threshold amount of aliquots that also include the second base call at the first position, where the second number of aliquots is greater than the first number of aliquots.
19. The method of claim 18, wherein the second threshold amount is a percentage of aliquots that include the false base call.
20. The method of claim 1, wherein the fragments are 50-2000 nucleotides in length.
21. The method of claim 1, further comprising: determining, by the computer system, the first number of the aliquots by: identifying a first set of reads that align to the first position of the genome and have the first base call; and counting unique aliquot-specific tag sequences in the first set.
22. The method of claim 21 comprising: determining, by the computer system, reads that align to the first position by aligning reads to each other.
23. The method of claim 21 comprising: determining, by the computer system, reads that align to the first position by aligning reads to a reference genome.
24. The method of claim 1 comprising: determining, by the computer system, a third number of aliquots including a particular base call at a second position in the genome; and using, by the computer system, the third number of aliquots to determine whether the particular base call is accepted at the second position.
25. The method of claim 24 comprising: determining, by the computer system, a score for the particular base call being at the second position in the genome, the score based on the third number of aliquots including the particular base call; and comparing, by the computer system, the score to a first threshold; and identifying, by the computer system, whether the particular base call is accepted or an error based on whether the score is greater than or less than the threshold.
26. The method of claim 25 comprising: determining, by the computer system, one or more other scores for other base calls at the second position, wherein the second position is determined to be a no call when all of the scores are below a threshold.
27. The method of claim 25, wherein the score is a percentage of expected aliquots.
28. The method of claim 27 comprising: determining, by the computer system, that the second position is heterozygous in the genome when two scores are above a second threshold.
29. The method of claim 28 wherein a third score for a third other base call is below a third threshold.
30. The method of claim 1, wherein the aliquot-specific tag sequence includes an aliquot-specific set of tags.
31. The method of claim 1, wherein the signals obtained from the sequencing correspond to intensities of color dyes.
32. The method of claim 1, wherein the organism is a human.
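The aliquot-counting logic recited in claims 1, 21, and 25 to 28 can be illustrated with a minimal Python sketch. It is not the patented implementation. The read format (tuples of aliquot tag, genome position, base call), the function names, the sample data, and the labels "single-call" and "ambiguous" are all assumptions introduced here for illustration. The sketch counts unique aliquot tags per base call at a position, flags calls seen in fewer than a threshold number of aliquots, and scores each call as a fraction of an expected aliquot count.

```python
from collections import defaultdict


def count_aliquots_per_call(reads):
    """Count unique aliquot tags supporting each (position, base) pair.

    `reads` is a hypothetical iterable of (aliquot_tag, position, base_call)
    tuples; several reads from one aliquot count only once (claim 21).
    """
    tags = defaultdict(set)
    for tag, position, base in reads:
        tags[(position, base)].add(tag)
    return {key: len(tag_set) for key, tag_set in tags.items()}


def flag_false_calls(counts, min_aliquots=2):
    """Return (position, base) pairs seen in fewer than `min_aliquots` aliquots.

    Claim 1 requires the threshold to be two or greater; 2 is used as a default.
    """
    return {key for key, n in counts.items() if n < min_aliquots}


def classify_position(counts, position, expected_aliquots, accept_fraction):
    """Score each base call at `position` as a fraction of expected aliquots.

    A position with no accepted call is a no-call (claim 26); two accepted
    calls are treated as heterozygous (claim 28). The other labels are
    assumptions of this sketch, not terms from the claims.
    """
    scores = {
        base: n / expected_aliquots
        for (pos, base), n in counts.items()
        if pos == position
    }
    accepted = sorted(b for b, s in scores.items() if s >= accept_fraction)
    if not accepted:
        return "no-call", accepted
    if len(accepted) == 1:
        return "single-call", accepted
    if len(accepted) == 2:
        return "heterozygous", accepted
    return "ambiguous", accepted


# Made-up example data: base C at position 100 appears in only one aliquot.
reads = [
    ("T01", 100, "A"), ("T01", 100, "A"), ("T02", 100, "A"), ("T03", 100, "A"),
    ("T04", 100, "G"), ("T05", 100, "G"), ("T06", 100, "G"),
    ("T07", 100, "C"),
]
counts = count_aliquots_per_call(reads)
false_calls = flag_false_calls(counts, min_aliquots=2)
status = classify_position(counts, 100, expected_aliquots=6, accept_fraction=0.4)
```

In this example the two reads from aliquot T01 count once toward base A, the single-aliquot C call is flagged as false, and A and G both exceed the assumed acceptance fraction, so the position is reported as heterozygous. The claims leave the threshold values and the expected aliquot count open, so the numbers used here are placeholders only.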
Physics · mapped topic
Methods for sequencing · CPC title
ICT specially adapted for sequence analysis involving nucleotides or amino acids · CPC title
Sequence assembly · CPC title
Sequence alignment; Homology search · CPC title