US2016232291A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016232291-A1 |
| Application number | US-201615019928-A |
| Country | US |
| Kind code | A1 |
| Filing date | Feb 9, 2016 |
| Priority date | Feb 9, 2015 |
| Publication date | Aug 11, 2016 |
| Grant date | — |
Official abstract text for this publication.
Systems and methods for determining structural variation and phasing using variant call data obtained from nucleic acid of a biological sample are provided. Sequence reads are obtained, each comprising a portion corresponding to a subset of the test nucleic acid and a portion encoding a barcode independent of the sequencing data. Bin information is obtained. Each bin represents a different portion of the sample nucleic acid. Each bin corresponds to a set of sequence reads in a plurality of sets of sequence reads formed from the sequence reads such that each sequence read in a respective set of sequence reads corresponds to a subset of the nucleic acid represented by the bin corresponding to the respective set. Binomial tests identify bin pairs having more sequence reads with the same barcode in common than expected by chance. Probabilistic models determine structural variation likelihood from the sequence reads of these bin pairs.
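The abstract describes a binomial test that flags bin pairs sharing more barcodes than chance would predict. Below is a minimal sketch of that step. It assumes reads arrive as hypothetical `(chromosome, start position, barcode)` tuples, are assigned to fixed-width bins by start position, and that `P_Binom` in claim 3 is the CDF of a binomial with $n_2$ trials and success probability $n_1/B$, so its mean is $n_1 n_2 / B$. That reading of `P_Binom` is an assumption. The function names are illustrative only and are not taken from the patent.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical read layout (an assumption): (chromosome, start position, barcode).
Read = Tuple[str, int, str]


def bin_barcodes(reads: Iterable[Read], bin_size: int) -> Dict[Tuple[str, int], Set[str]]:
    """Assign each read to the fixed-width bin containing its start position and
    collect the unique barcodes observed in each bin."""
    bins: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for chrom, pos, barcode in reads:
        bins[(chrom, pos // bin_size)].add(barcode)
    return bins


def binom_upper_tail(n: int, trials: int, prob: float) -> float:
    """Return P(X > n) = 1 - CDF(n) for X ~ Binomial(trials, prob).
    The tail is summed term by term in log space, so very small p-values
    (e.g. around 1e-14) are not lost to cancellation as in 1 - CDF."""
    if n >= trials or prob <= 0.0:
        return 0.0
    if prob >= 1.0:
        return 1.0  # X == trials > n with certainty
    log_p, log_q = math.log(prob), math.log1p(-prob)
    log_norm = math.lgamma(trials + 1)
    total = 0.0
    for k in range(max(n + 1, 0), trials + 1):
        log_pmf = (log_norm - math.lgamma(k + 1) - math.lgamma(trials - k + 1)
                   + k * log_p + (trials - k) * log_q)
        total += math.exp(log_pmf)
    return total


def shared_barcode_pvalue(bc1: Set[str], bc2: Set[str], total_barcodes: int) -> float:
    """p = 1 - P_Binom(n; n1*n2/B), with P_Binom assumed to be the CDF of a
    binomial with n2 trials and success probability n1/B (mean n1*n2/B)."""
    n = len(bc1 & bc2)  # barcodes seen in both bins
    return binom_upper_tail(n, len(bc2), len(bc1) / total_barcodes)
```

In this sketch, a bin pair would be passed on to fragment calling only when `shared_barcode_pvalue` falls at or below the chosen cut-off, such as the $10^{-14}$ named in claim 4. Summing the upper tail directly, instead of subtracting the CDF from 1, is a design choice to keep cut-offs that small numerically meaningful.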
Opening claim text (preview).
What is claimed is:

1. A method of determining a likelihood of a structural variation occurring in a test nucleic acid obtained from a single biological sample, the method comprising: at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors:

(A) obtaining a plurality of sequence reads from a plurality of sequencing reactions in which the test nucleic acid is fragmented, wherein each respective sequence read in the plurality of sequence reads comprises a first portion that corresponds to a subset of the test nucleic acid and a second portion that encodes a respective barcode for the respective sequence read in a plurality of barcodes, and each respective barcode is independent of the sequencing data of the test nucleic acid, and the plurality of sequence reads collectively include the plurality of barcodes;

(B) obtaining bin information for a plurality of bins, wherein each respective bin in the plurality of bins represents a different portion of the test nucleic acid, the bin information identifies, for each respective bin in the plurality of bins, a set of sequence reads in a plurality of sets of sequence reads that are in the plurality of sequence reads, and the respective first portion of each respective sequence read in each respective set of sequence reads in the plurality of sets of sequence reads corresponds to a subset of the test nucleic acid that at least partially overlaps the different portion of the test nucleic acid that is represented by the bin corresponding to the respective set of sequence reads;

(C) identifying, from among the plurality of bins, a first bin and a second bin that correspond to portions of the test nucleic acid that are nonoverlapping, wherein the first bin is represented by a first set of sequence reads in the plurality of sequence reads and the second bin is represented by a second set of sequence reads in the plurality of sequence reads;

(D) determining a first value that represents a numeric probability or likelihood that the number of barcodes common to the first set and the second set is attributable to chance;

(E) responsive to a determination that the first value satisfies a predetermined cut-off value, for each barcode that is common to the first bin and the second bin, obtaining a fragment pair thereby obtaining one or more fragment pairs, each fragment pair in the one or more fragment pairs (i) corresponding to a different barcode that is common to the first bin and the second bin and (ii) consisting of a different first calculated fragment and a different second calculated fragment, wherein, for each respective fragment pair in the one or more fragment pairs: the different first calculated fragment consists of a respective first subset of sequence reads in the plurality of sequence reads having the barcode corresponding to the respective fragment pair, wherein each sequence read in the respective first subset of sequence reads is within a predefined genetic distance of another sequence read in the respective first subset of sequence reads, the different first calculated fragment of the respective fragment pair originates with a first sequence read having the barcode corresponding to the respective fragment pair in the first bin, and each sequence read in the respective first subset of sequence reads is from the first bin, and the different second calculated fragment consists of a respective second subset of sequence reads in the plurality of sequence reads having the barcode corresponding to the respective fragment pair, wherein each sequence read in the respective second subset of sequence reads is within a predefined genetic distance of another sequence read in the respective second subset of sequence reads, the different second calculated fragment of the respective fragment pair originates with a second sequence read having the barcode corresponding to the respective fragment pair in the second bin, and each sequence read in the respective second subset of sequence reads is from the second bin; and

(F) computing a respective likelihood based upon a probability of occurrence of a first model and a probability of occurrence of a second model regarding the one or more fragment pairs to thereby provide a likelihood of a structural variation in the test nucleic acid, wherein (i) the first model specifies that the respective first calculated fragments and the respective second calculated fragments of the one or more fragment pairs are observed given no structural variation in the target nucleic acid sequence and are part of a common molecule, and (ii) the second model specifies that the respective first calculated fragments and the respective second calculated fragments of the one or more fragment pairs are observed given structural variation in the target nucleic acid sequence.

2. The method of claim 1, wherein the first bin and the second bin are at least 50 kilobases apart on the test nucleic acid.

3. The method of claim 1, wherein the determining (D) uses a binomial test to compute the first value of the form:

$$p = 1 - P_{\mathrm{Binom}}\left(n;\ \frac{n_1 n_2}{B}\right)$$

wherein $p$ is the first value, expressed as a p-value, $n$ is the number of unique barcodes found in both the first and the second set of sequence reads, $n_1$ is the number of unique barcodes in the first set of sequence reads, $n_2$ is the number of unique barcodes in the second set of sequence reads, and $B$ is the total number of unique barcodes across the plurality of bins.

4. The method of claim 1, wherein the single biological sample is human, the test nucleic acid is the genome of the biological sample, and the first value satisfies the predetermined cut-off value when the first value is $10^{-14}$ or less or when the first value is $10^{-15}$ or less.

5. The method of claim 1, wherein each bin in the plurality of bins represents at least 20 kilobases of the test nucleic acid, at least 50 kilobases of the test nucleic acid, at least 100 kilobases of the test nucleic acid, at least 250 kilobases of the test nucleic acid, or at least 500 kilobases of the test nucleic acid.

6. The method of claim 1, wherein each respective sequence read in each respective set of sequence reads in the plurality of sequence reads has a respective first portion that corresponds to a subset of the test nucleic acid that fully overlaps the different portion of the test nucleic acid that is represented by the bin corresponding to the respective set of sequence reads.

7. The method of claim 1, wherein the barcode in the second portion of each respective sequence read in the plurality of sequence reads encodes a unique predetermined value selected from the set {1, …, 1024}, selected from the set {1, …, 4096}, selected from the set {1, …, 16384}, selected from the set {1, …, 65536}, selected from the set {1, …, 262144}, selected from the set {1, …, 1048576}, selected from the set {1, …, 4194304}, selected from the set {1, …, 16777216}, selected from the set {1, …, 67108864}, or selected from the set {1, …, $1 \times 10^{12}$}.

8. The method of claim 1, wherein the structural variation is deemed to have occurred, the method further comprising treating a subject that originated the biological sample with a treatment regimen responsive to the structural variation.

9. The method of claim 1, wherein an identity of the first and second bin is determined by the identifying (C) using sparse matrix multiplication of the form: $V = A_1^{\mathsf{T}} A_2$, wherein $A_1$ is a first $B \times N_1$ matrix of barcodes that includes the first bin, $A_2$ is a second $B \times N_2$ matrix of barcodes that inc
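The claims pair the shared-barcode search (claim 9, $V = A_1^{\mathsf{T}} A_2$) with a bin-separation requirement (claim 2) and fragment calling (claim 1, step E). Below is a minimal sketch of those three pieces. Because the claim 9 text is truncated, several details are assumptions. $A_1$ and $A_2$ are taken to be binary barcode-by-bin incidence matrices, stored sparsely as barcode → bins. Bins are integer indices of fixed width `bin_size` on one chromosome. The 50 kb separation is measured between bin starts. A fragment is built by single linkage: a read joins the current fragment when it lies within a hypothetical `max_gap` of the previous read. All names are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def barcode_rows(bin_barcodes: Dict[int, Set[str]]) -> Dict[str, Set[int]]:
    """Store a binary B x N barcode-by-bin matrix sparsely: barcode -> bins containing it."""
    rows: Dict[str, Set[int]] = defaultdict(set)
    for bin_id, barcodes in bin_barcodes.items():
        for barcode in barcodes:
            rows[barcode].add(bin_id)
    return rows


def shared_barcode_counts(a1: Dict[str, Set[int]], a2: Dict[str, Set[int]]) -> Counter:
    """Nonzero entries of V = A1^T A2: V[(i, j)] counts barcodes present in bin i and bin j."""
    v: Counter = Counter()
    for barcode, bins1 in a1.items():
        bins2 = a2.get(barcode)
        if not bins2:
            continue  # a barcode absent from A2 contributes nothing to the product
        for i in bins1:
            for j in bins2:
                v[(i, j)] += 1
    return v


def candidate_pairs(v: Counter, bin_size: int, min_gap: int = 50_000) -> Dict[Tuple[int, int], int]:
    """For a self-comparison (A1 == A2), keep each bin pair once (i < j),
    requiring the bin starts to be at least min_gap bases apart."""
    return {
        (i, j): n
        for (i, j), n in v.items()
        if j > i and (j - i) * bin_size >= min_gap
    }


def call_fragments(positions: List[int], max_gap: int) -> List[List[int]]:
    """Group positions of reads sharing one barcode in one bin: a read joins the
    current fragment if it lies within max_gap of the previous read (single linkage)."""
    fragments: List[List[int]] = []
    for pos in sorted(positions):
        if fragments and pos - fragments[-1][-1] <= max_gap:
            fragments[-1].append(pos)
        else:
            fragments.append([pos])
    return fragments


bins = {0: {"A", "B"}, 3: {"A", "C"}, 5: {"B", "C", "D"}}
a = barcode_rows(bins)
pairs = candidate_pairs(shared_barcode_counts(a, a), bin_size=20_000)
# pairs == {(0, 3): 1, (0, 5): 1}; bins 3 and 5 are only 40 kb apart
```

Iterating over barcode rows means the work scales with the number of (barcode, bin, bin) co-occurrences, not with $N_1 \times N_2$. That is the usual reason to prefer a sparse product when most barcodes touch only a few bins.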
Physics · mapped topic
ICT specially adapted for sequence analysis involving nucleotides or amino acids · CPC title
ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations · CPC title
Drugs for disorders of the metabolism (of the blood or the extracellular fluid A61P7/00) · CPC title
using probe arrays or probe chips (C12Q1/6874 takes precedence) · CPC title