Systems and methods to detect rare mutations and copy number variation
US-2016040229-A1 · Feb 11, 2016 · US
US11211147B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11211147-B2 |
| Application number | US-202117179267-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 18, 2021 |
| Priority date | Feb 18, 2020 |
| Publication date | Dec 28, 2021 |
| Grant date | Dec 28, 2021 |
Official abstract text for this publication.
Methods, systems, and software are provided for estimating a circulating tumor fraction for a test subject. Sequence reads are obtained from a panel-enriched sequencing reaction, including sequences for a first plurality of cfDNA fragments corresponding to probe sequences and a second plurality of cfDNA fragments not corresponding to probe sequences. Bin-level coverage ratios are determined from the sequences. Segments are formed by grouping adjacent bins based on similar coverage ratios and segment-level coverage ratios are determined based on bin-level coverage ratios for bins in the segment. For each simulated circulating tumor fraction in a plurality of circulating tumor fractions, segments are fitted to an integer copy state by identifying the integer copy state that best matches the segment-level coverage ratio. The circulating tumor fraction for the test subject is determined using error optimization between segment-level coverage ratios and integer copy states across the simulated circulated tumor fractions.
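The pipeline in the abstract (bin-level ratios, segments, integer copy-state fitting, then error optimization over simulated tumor fractions) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation. The linear tumor/normal mixture model with a diploid background, the greedy `segment_bins` stand-in for a real segmentation algorithm, the `tolerance` value, the absolute-error measure, and all function names are assumptions introduced here.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

COPY_STATES = (1, 2, 3, 4)  # assumed candidate integer copy states


def bin_coverage_ratios(sample_counts: Sequence[int],
                        reference_counts: Sequence[int]) -> List[float]:
    """Library-size-normalized sample/reference read-count ratio per bin.
    Assumes every reference bin has a nonzero count."""
    sample_total = sum(sample_counts)
    reference_total = sum(reference_counts)
    return [(s / sample_total) / (r / reference_total)
            for s, r in zip(sample_counts, reference_counts)]


def segment_bins(bin_ratios: Sequence[float],
                 tolerance: float = 0.05) -> List[List[float]]:
    """Greedy stand-in for a real segmentation method: extend the current
    segment while the next bin stays within `tolerance` of the segment mean.
    Assumes at least one bin."""
    segments = [[bin_ratios[0]]]
    for ratio in bin_ratios[1:]:
        if abs(ratio - mean(segments[-1])) <= tolerance:
            segments[-1].append(ratio)
        else:
            segments.append([ratio])
    return segments


def expected_ratio(copy_state: int, tumor_fraction: float) -> float:
    """Expected coverage ratio under an ASSUMED linear mixture of tumor DNA
    (at `copy_state` copies) and diploid normal DNA."""
    return (tumor_fraction * copy_state + (1.0 - tumor_fraction) * 2) / 2


def fit_segments(segment_ratios: Sequence[float],
                 tumor_fraction: float) -> Tuple[List[int], float]:
    """Assign each segment its best-matching copy state; return the states
    and the summed absolute error for this simulated tumor fraction."""
    states, total_error = [], 0.0
    for observed in segment_ratios:
        best = min(COPY_STATES,
                   key=lambda c: abs(observed - expected_ratio(c, tumor_fraction)))
        states.append(best)
        total_error += abs(observed - expected_ratio(best, tumor_fraction))
    return states, total_error


def estimate_tumor_fraction(segment_ratios: Sequence[float],
                            simulated_fractions: Sequence[float]
                            ) -> Tuple[float, Dict[float, float]]:
    """Return the simulated fraction with the lowest total error, plus the
    full error profile over all simulated fractions."""
    errors = {tf: fit_segments(segment_ratios, tf)[1] for tf in simulated_fractions}
    return min(errors, key=errors.get), errors


# Toy data: eight bins, uniform reference coverage (hypothetical counts).
ratios = bin_coverage_ratios([90, 90, 100, 100, 100, 100, 110, 110], [100] * 8)
segment_ratios = [mean(seg) for seg in segment_bins(ratios)]
grid = [i / 100 for i in range(1, 101)]  # 100 simulated tumor fractions
best_tf, error_profile = estimate_tumor_fraction(segment_ratios, grid)
print(best_tf, fit_segments(segment_ratios, best_tf)[0])
```

The error profile is kept alongside the best value because several tumor fractions can explain the same ratios almost equally well, which is why a plain global minimum is not always sufficient.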
Opening claim text (preview).
What is claimed is:

1. A method of estimating a circulating tumor fraction for a test subject from panel-enriched sequencing data for a plurality of sequences, the method comprising: at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors:

A) obtaining, from a first panel-enriched sequencing reaction, a first plurality of at least 100,000 sequence reads, wherein the first plurality of at least 100,000 sequence reads comprises: (i) a corresponding sequence for each cell-free DNA fragment in a first plurality of cell-free DNA fragments obtained from a liquid biopsy sample from the test subject, wherein each respective cell-free DNA fragment in the first plurality of cell-free DNA fragments corresponds to a respective probe sequence in a plurality of probe sequences used to enrich cell-free DNA fragments in the liquid biopsy sample in the first panel-enriched sequencing reaction; and (ii) a corresponding sequence for each cell-free DNA fragment in a second plurality of cell-free DNA fragments obtained from the liquid biopsy sample, wherein each respective cell-free DNA fragment in the second plurality of cell-free DNA fragments does not correspond to any probe sequence in the plurality of probe sequences;

B) determining a plurality of at least 1000 bin-level coverage ratios using the plurality of at least 100,000 sequence reads, each respective bin-level coverage ratio in the plurality of bin-level coverage ratios corresponding to a respective bin in a plurality of at least 1000 bins, wherein: each respective bin in the plurality of bins represents a corresponding region of the genome for the species of the test subject, the plurality of bins collectively covers at least 50 Mb of the genome for the species of the test subject, and each respective bin-level coverage ratio in the plurality of bin-level coverage ratios is determined from a comparison of (i) a number of sequence reads in the plurality of sequence reads that map to the corresponding bin and (ii) a number of sequence reads from one or more reference samples that map to the corresponding bin;

C) determining a plurality of segment-level coverage ratios by: forming, using the at least 1000 bin-level coverage ratios, a plurality of segments by grouping respective subsets of adjacent bins in the plurality of bins based on a similarity between the respective coverage ratios of the subset of adjacent bins, and determining, for each respective segment in the plurality of segments, a segment-level coverage ratio based on the corresponding bin-level coverage ratios for each bin in the respective segment;

D) fitting, for each respective simulated circulating tumor fraction in a plurality of at least 10 simulated circulating tumor fractions, each respective segment in the plurality of segments to a respective integer copy state in a plurality of at least 4 integer copy states, by identifying the respective integer copy state in the plurality of integer copy states that best matches the segment-level coverage ratio, thereby generating, for each respective simulated circulating tumor fraction in the plurality of simulated tumor fractions, a respective set of integer copy states for the plurality of segments; and

E) estimating the circulating tumor fraction for the test subject based on a measure of fit between corresponding segment-level coverage ratios and integer copy states across the plurality of simulated circulated tumor fractions.

2. The method of claim 1, wherein estimating the circulating tumor fraction comprises minimization of an error between corresponding segment-level coverage ratios and integer copy states across the plurality of simulated circulated tumor fractions.

3. The method of claim 1, wherein estimating the circulating tumor fraction comprises: identifying a plurality of local minima for the error between corresponding segment-level coverage ratios and integer copy states across the plurality of simulated circulated tumor fractions, and selecting the local minima that is closest to a second estimate of circulating tumor fraction determined by a different methodology.

4. The method of claim 3, wherein the second estimate of circulating tumor fraction is generated by: (i) detecting a plurality of germline variants in the liquid biopsy sample based on the first plurality of sequence reads; (ii) determining, for each respective germline variant in the plurality of germline variants, a corresponding germline variant allele frequency for the liquid biopsy sample, thereby determining a plurality of germline variant allele frequencies for the liquid biopsy sample; (iii) determining, for each respective germline variant in the plurality of germline variants, an absolute value of the difference between the corresponding germline variant allele frequency for the liquid biopsy sample and a germline variant allele frequency for the respective germline variant allele in a non-cancerous tissue of the subject, thereby determining a plurality of germline variant allele deltas for the liquid biopsy sample; and (iv) estimating the circulating tumor fraction for the liquid biopsy sample as twice the value of the maximum germline variant allele delta in the plurality of germline variant allele deltas.

5. The method of claim 4, wherein, for each respective germline variant in the plurality of germline variants, the corresponding germline variant allele frequency for the respective germline variant allele in a non-cancerous tissue of the subject is defined as 0.5.

6. The method of claim 4, wherein, for each respective germline variant in the plurality of germline variants, the corresponding germline variant allele frequency for the respective germline variant allele in a non-cancerous tissue of the subject is determined based on a second sequencing reaction of nucleic acids from a non-cancerous sample of the subject.

7. The method of claim 3, wherein the second estimate of circulating tumor fraction is generated by: (i) detecting a plurality of somatic variants in the liquid biopsy sample based on the first plurality of sequence reads; (ii) determining, for each respective somatic variant in the plurality of somatic variants, a corresponding somatic variant allele frequency for the liquid biopsy sample, thereby determining a plurality of somatic variant allele frequencies for the liquid biopsy sample; and (iii) estimating the circulating tumor fraction for the liquid biopsy sample as twice the value of the largest somatic variant allele frequency in the plurality of somatic variant allele frequencies.

8. The method of claim 3, wherein the second estimate of circulating tumor fraction is generated by: (i) detecting a plurality of somatic variants in the liquid biopsy sample based on the first plurality of sequence reads; (ii) determining, for each respective somatic variant in the plurality of somatic variants, a corresponding somatic variant allele frequency for the liquid biopsy sample, thereby determining a plurality of somatic variant allele frequencies for the liquid biopsy sample; and (iii) estimating the circulating tumor fraction for the liquid biopsy sample as the value of the largest somatic variant allele frequency in the plurality of somatic variant allele frequencies.

9. The method of claim 1, wherein the plurality of probe sequences used to enrich cell-free DNA fragments in the liquid biopsy sample in the first panel-enriched sequencing reaction collectively map to at least 25 different genes in human reference genome.

10. The method of claim 1, wherein plurality of integer copy states comprises a 1-copy state, a 2-copy state, a 3-copy state, and a
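
Claims 3 through 8 describe resolving an ambiguous error profile by picking the local minimum nearest to an independent, variant-based tumor fraction estimate. A minimal sketch of that selection, assuming the error profile has already been computed on an ordered grid of simulated fractions, might look like the following. The function names, the strict-inequality definition of a local minimum, and the toy numbers are illustrative assumptions, not taken from the patent.

```python
from typing import List, Optional, Sequence


def germline_delta_estimate(plasma_vafs: Sequence[float],
                            normal_vafs: Optional[Sequence[float]] = None) -> float:
    """Claims 4-6: twice the largest |plasma VAF - normal VAF| over germline
    variants. Without matched-normal VAFs, 0.5 is used per variant (claim 5)."""
    if normal_vafs is None:
        normal_vafs = [0.5] * len(plasma_vafs)
    deltas = [abs(p - n) for p, n in zip(plasma_vafs, normal_vafs)]
    return 2.0 * max(deltas)


def somatic_vaf_estimate(somatic_vafs: Sequence[float], doubled: bool = True) -> float:
    """Claim 7 (doubled=True): twice the largest somatic VAF.
    Claim 8 (doubled=False): the largest somatic VAF itself."""
    largest = max(somatic_vafs)
    return 2.0 * largest if doubled else largest


def local_minima(errors: Sequence[float]) -> List[int]:
    """Indices whose error is strictly lower than each existing neighbour
    (an assumed definition; plateaus are not reported)."""
    found = []
    for i, err in enumerate(errors):
        left_ok = i == 0 or err < errors[i - 1]
        right_ok = i == len(errors) - 1 or err < errors[i + 1]
        if left_ok and right_ok:
            found.append(i)
    return found


def select_tumor_fraction(fractions: Sequence[float],
                          errors: Sequence[float],
                          second_estimate: float) -> float:
    """Claim 3: among local minima of the error profile, return the simulated
    fraction closest to the second estimate. Raises ValueError if the profile
    has no local minimum under the definition above."""
    candidates = [fractions[i] for i in local_minima(errors)]
    return min(candidates, key=lambda tf: abs(tf - second_estimate))


# Hypothetical error profile with three local minima (0.10, 0.25, 0.35).
fractions = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40]
errors = [0.30, 0.10, 0.20, 0.25, 0.05, 0.15, 0.12, 0.40]

germline_tf = germline_delta_estimate([0.52, 0.45, 0.50])  # roughly 0.10
somatic_tf = somatic_vaf_estimate([0.03, 0.12])            # roughly 0.24
print(select_tumor_fraction(fractions, errors, germline_tf))
print(select_tumor_fraction(fractions, errors, somatic_tf))
```

In this toy profile the global minimum sits at 0.25, yet the germline-based estimate steers the selection to 0.10, which shows how the second estimate can override the lowest-error candidate.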