Varietal counting of nucleic acids for obtaining genomic copy number information
US-10947589-B2 · Mar 16, 2021 · US
US11929145B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11929145-B2 |
| Application number | US-201816477931-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 22, 2018 |
| Priority date | Jan 20, 2017 |
| Publication date | Mar 12, 2024 |
| Grant date | Mar 12, 2024 |
Official abstract text for this publication.
Technology provided herein relates in part to methods, processes, machines and apparatuses for non-invasive assessment of genetic alterations. In particular, a method is provided that includes obtaining a set of sequence reads. The sequence reads each include a single molecule barcode (SMB) sequence that is a non-random oligonucleotide sequence. The method further includes assigning the sequence reads to read groups according to a read group signature. The read group signature comprises an SMB sequence and a start and end position of a nucleic acid fragment from the circulating cell free sample nucleic acid. The sequence reads comprising start and end positions and an SMB sequence similar to the read group signature are assigned to a read group. The method further includes generating a consensus for each read group, and determining the presence or absence of a genetic alteration based on the consensus for each read group.
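The read group assignment described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Read` record, the `assign_read_groups` function, and the greedy first-match strategy are hypothetical choices made for illustration. "Similar" is interpreted here as an identical SMB with start and end positions within a tolerance of 5 bases (inclusive), borrowing the figure from claim 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical record for one aligned, on-target sequence read."""
    name: str
    smb: str    # single molecule barcode sequence
    start: int  # aligned start position on the reference
    end: int    # aligned end position on the reference


def assign_read_groups(reads, tolerance=5):
    """Greedily assign reads to read groups by (SMB, start, end) signature.

    Assumption: a read joins the first existing group whose signature has an
    identical SMB and start/end positions within `tolerance` bases; otherwise
    the read founds a new group and its own values become the signature.
    Returns a dict mapping read name -> numerical read group identifier.
    """
    signatures = []  # index in this list is the read group identifier
    assignment = {}
    for read in reads:
        for group_id, (smb, start, end) in enumerate(signatures):
            if (read.smb == smb
                    and abs(read.start - start) <= tolerance
                    and abs(read.end - end) <= tolerance):
                assignment[read.name] = group_id
                break
        else:
            # No matching signature found: start a new read group.
            signatures.append((read.smb, read.start, read.end))
            assignment[read.name] = len(signatures) - 1
    return assignment


reads = [
    Read("r1", "ACGT", 100, 260),
    Read("r2", "ACGT", 102, 259),  # same SMB, positions within 5 bases of r1
    Read("r3", "TTAG", 100, 260),  # same positions as r1, different SMB
    Read("r4", "ACGT", 100, 300),  # same SMB as r1, end position too far away
]
groups = assign_read_groups(reads)
```

In this example `r1` and `r2` share a read group, while `r3` and `r4` each found their own. A greedy pass depends on read order; a production pipeline would more likely sort reads by position or cluster signatures, which the source does not specify.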
Opening claim text (preview).
What is claimed is:
1. A method for determining a presence or absence of a genetic alteration for a test subject, comprising:
obtaining circulating cell free nucleic acid from a sample from the test subject;
ligating nucleic acid molecules of the circulating cell free nucleic acid with adapters to generate a plurality of sequence constructs, wherein: each sequence construct comprises: an adapter ligated to an end of a nucleic acid molecule, each adapter is a single-stranded non-random oligonucleotide or a double-stranded non-random oligonucleotide, and each single-stranded non-random oligonucleotide or double-stranded non-random oligonucleotide comprises at least one single molecule barcode (SMB) having a predetermined non-randomly generated molecular barcode sequence of nucleotides;
generating, using a first polymerase chain reaction, library constructs for each sequence construct, wherein each library construct for a given sequence construct comprises a same sequence of nucleotides for at least one SMB and a nucleic acid molecule;
capturing a subset of the library constructs using probe oligonucleotides under hybridization conditions to enrich for one or more genomic regions of interest, wherein the probe oligonucleotides span the one or more genomic regions of interest;
generating, using a second polymerase chain reaction, enriched library constructs for each library construct of the subset of the library constructs, wherein each enriched library construct for a given library construct of the subset of the library constructs comprises a same sequence of nucleotides for at least one SMB and a nucleic acid molecule;
sequencing the enriched library constructs to obtain sequence reads;
generating an alignment computer file comprising on-target sequence reads and associated genomic positioning data, wherein: the generating the alignment computer file comprises aligning the sequence reads to a reference genome to identify the on-target sequence reads and obtain the genomic positioning data, the genomic positioning data is informative of a start position and an end position of each on-target sequence read aligned to the reference genome, and the at least one SMB and the genomic positioning data provide a unique identity to the nucleic acid molecule represented in each of the on-target sequence reads;
generating, by running programming language scripts on the alignment computer file, a duplicate marked alignment computer file comprising an entry for each on-target sequence read, wherein the generating the duplicate marked alignment computer file comprises: assigning the on-target sequence reads to read groups according to read group signatures, wherein: each of the read group signatures comprises at least one SMB sequence and genomic positioning data informative of a start position and an end position of a nucleic acid molecule, and an on-target sequence read is assigned to a read group when the at least one SMB of the on-target sequence read and the associated genomic positioning data are similar to the at least one SMB sequence and the genomic positioning data of a read group signature associated with the read group; and identifying each of the on-target sequence reads assigned to a same read group as duplicate reads in the duplicate marked alignment computer file by adjusting a flag in the alignment computer file and associating a unique read group numerical identifier with the entry of each of the on-target sequence reads;
generating, using the duplicate marked alignment computer file, a final alignment computer file comprising a consensus sequence for each of the read groups, wherein the consensus sequence for each of the read groups is generated by collapsing the on-target sequence reads assigned to each read group into a consensus sequence based on the flag and the unique read group numerical identifier for each of the on-target sequence reads;
determining the presence or absence of the genetic alteration based on the consensus sequence for each of the read groups in the final alignment computer file; and
outputting a report concerning the presence or absence of the genetic alteration for the test subject, wherein the report comprises (i) the one or more genomic regions of interest, and (ii) a status of the genetic alteration corresponding to the one or more genomic regions of interest.
2. The method of claim 1, wherein the at least one SMB of the on-target sequence read and the associated genomic positioning data are determined to be similar to the at least one SMB sequence and the genomic positioning data of the read group signature associated with the read group when: i) the at least one SMB of the on-target sequence read is identical to the at least one SMB sequence of the read group signature, and the associated genomic positioning data of the on-target sequence read are identical to the genomic positioning data of the read group signature; or ii) the at least one SMB of the on-target sequence read is identical to the at least one SMB sequence of the read group signature, and the associated genomic positioning data of the on-target sequence read are within 5 bases from the genomic positioning data of the read group signature.
3. The method of claim 1, further comprising generating, by a computing system, a multiplicity table that includes a number of the sequence reads assigned to each of the read groups and/or a number of the read groups comprising a predetermined number of reads, wherein the computing system uses the multiplicity table for generating the consensus sequence for each read group.
4. The method of claim 1, wherein thousands to millions of sequence reads are obtained in the sequencing, and wherein the generating the alignment computer file further comprising filtering out a sequence read with an SMB that is different from any predetermined non-randomly generated molecular barcode sequence.
5. The method of claim 1, wherein the generating the final alignment computer file comprising the consensus sequence for each of the read groups comprises sequence error correction.
6. The method of claim 5, wherein the sequence error correction comprises determining a total number and identity of nucleotide at each position covered by the on-target sequence reads, and wherein a position in a consensus sequence is assigned a nucleotide identity when about 90% or more of the nucleotides from the on-target sequence reads agree at the position.
7. The method of claim 5, wherein the sequence error correction comprises determining: (i) a total number and identity of nucleotide at each position covered by the on-target sequence reads, and (ii) an overall base quality for the nucleotide at each position covered by the on-target sequence reads, wherein a position in a consensus sequence is assigned a nucleotide identity when about 90% or more of the nucleotides from the on-target sequence reads agree at the position, and wherein a position in a consensus sequence is assigned an overall quality for the nucleotide identity when about 90% or more of the nucleotide identities agree for the nucleotide from the on-target sequence reads.
8. The method of claim 7, wherein the overall base quality is a mean base quality, a median base quality, or a maximal base quality.
9. The method of claim 5, wherein the sequence error correction comprises read group correction, which comprises designating a nucleotide as an unreadable or low quality base (“N”) in a read assigned to a read group that does not match a nucleotide at that position for other reads in the read group.
10. A system for determining a presence or absence of a genetic alteration for a test subject, comprising: one or more processors; and m
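The consensus and multiplicity steps recited in claims 3, 6, and 7 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the function names `call_consensus` and `multiplicity_table` are hypothetical, reads in a group are assumed to be equal-length and already aligned column by column, the overall quality is taken as the mean over agreeing reads (one of the options listed in claim 8), and positions below the agreement threshold are written as "N" with quality 0, which the claims do not specify.

```python
from collections import Counter
from statistics import mean


def call_consensus(seqs, quals, min_agreement=0.90):
    """Collapse one read group into a consensus sequence and quality list.

    seqs:  equal-length base strings, one per read in the group (assumed
           to be aligned so that index i is the same position in each read).
    quals: per-read lists of integer base qualities, parallel to `seqs`.
    A position receives the majority base only when at least `min_agreement`
    of the reads agree; otherwise it is written as "N" (an assumption).
    """
    consensus_bases, consensus_quals = [], []
    for pos in range(len(seqs[0])):
        column = [seq[pos] for seq in seqs]
        base, count = Counter(column).most_common(1)[0]
        if count / len(column) >= min_agreement:
            agreeing = [q[pos] for seq, q in zip(seqs, quals) if seq[pos] == base]
            consensus_bases.append(base)
            consensus_quals.append(round(mean(agreeing)))
        else:
            consensus_bases.append("N")
            consensus_quals.append(0)
    return "".join(consensus_bases), consensus_quals


def multiplicity_table(assignment):
    """Map read group size -> number of read groups with that many reads.

    `assignment` maps read name -> read group identifier.
    """
    reads_per_group = Counter(assignment.values())
    return dict(Counter(reads_per_group.values()))


# Nine of ten reads agree at the last position: exactly 90%, so "T" is called.
seqs = ["ACGT"] * 9 + ["ACGA"]
quals = [[30, 30, 30, 30] for _ in seqs]
consensus_seq, consensus_qual = call_consensus(seqs, quals)

# Two of three reads agree at the last position: below 90%, so "N" is written.
low_seq, low_qual = call_consensus(["ACGT", "ACGT", "ACGA"], [[30] * 4] * 3)

table = multiplicity_table({"r1": 0, "r2": 0, "r3": 1, "r4": 2})
```

Here `table` records one read group of size 2 and two read groups of size 1. Such a table could be used, for example, to skip consensus calling for groups with too few reads, though the source does not state how the multiplicity table is applied.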
Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection · CPC title
characterised by the detection means (C12Q1/6804 takes precedence) · CPC title
Ploidy or copy number detection · CPC title
ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression · CPC title
Sequence alignment; Homology search · CPC title