Methods of analyzing massively parallel sequencing data

US11475980B2 · US · B2

Patent metadata
Publication number: US-11475980-B2
Application number: US-201414489198-A
Country: US
Kind code: B2
Filing date: Sep 17, 2014
Priority date: Jul 29, 2013
Publication date: Oct 18, 2022
Grant date: Oct 18, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In at least one illustrative embodiment, a method may comprise selecting a first plurality of text strings that each represent a nucleotide sequence that was read by a massively parallel sequencing instrument, where the nucleotide sequences represented by the selected first plurality of text strings each correspond to a first target locus, comparing the selected first plurality of text strings to one another to determine an abundance count for each unique text string included in the selected first plurality of text strings, identifying a first number of unique text strings included in the selected first plurality of text strings as representing noise responses, and determining a method detection limit as a function of the abundance counts for the first number of unique text strings identified as representing noise responses.
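The first steps of the abstract (select the reads for one target locus, count each unique text string, then separate noise from true alleles) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation. The read list is made up. A read is assigned to the locus by a hypothetical prefix match (`locus_prefix`), although a real pipeline might use alignment or flanking-sequence matching instead. Noise is taken to be every unique string not labeled a true allele, which is the rule recited in claim 2. All helper names are hypothetical.

```python
from collections import Counter

def select_locus_reads(reads, locus_prefix):
    """Keep reads assumed to belong to one target locus.

    Hypothetical rule: a read belongs to the locus if it starts with a
    locus-specific prefix (for example, a primer sequence).
    """
    return [r for r in reads if r.startswith(locus_prefix)]

def abundance_counts(locus_reads):
    """Compare the strings to one another: count occurrences of each unique string."""
    return Counter(locus_reads)

def noise_counts(counts, true_alleles):
    """Treat every unique string not labeled a true allele as a noise response."""
    return {seq: n for seq, n in counts.items() if seq not in true_alleles}

# Made-up reads; "TTTTAAA" does not match the locus prefix and is dropped.
reads = ["ACGTAAA", "ACGTAAA", "ACGTAAA", "ACGTAAC", "ACGTCCC", "ACGTCCC", "TTTTAAA"]
locus_reads = select_locus_reads(reads, "ACGT")
counts = abundance_counts(locus_reads)
noise = noise_counts(counts, true_alleles={"ACGTAAA"})
print(dict(counts))
print(noise)
```

The values in `noise` are the noise-response abundance counts from which a method detection limit would then be computed. The claims recite several candidate formulas for that final step.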

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising: amplifying nucleotide sequences in a sample using a PCR amplification process to produce an amplified sample; using a massively parallel sequencing (MPS) instrument to read the nucleotide sequences of the amplified sample and generate a first plurality of text strings based on the amplified sample, wherein the first plurality of text strings comprises at least ten thousand text strings; selecting, with a processor, a second plurality of text strings from the first plurality of text strings generated by the MPS instrument, wherein each of the selected second plurality of text strings represents a nucleotide sequence that corresponds to a first target locus in the amplified sample; comparing, with the processor, the selected second plurality of text strings to one another to determine an abundance count for each unique text string included in the selected second plurality of text strings; identifying, with the processor, a first number of unique text strings included in the selected second plurality of text strings as representing noise responses; and determining, with the processor, a method detection limit (MDL) as a function of the abundance counts for the first number of unique text strings identified as representing noise responses.

2. The method of claim 1, further comprising identifying, with the processor, one or more unique text strings included in the selected second plurality of text strings as representing one or more true alleles for the first target locus; wherein the first number of unique text strings identified as representing noise responses includes all of the unique text strings included in the selected second plurality of text strings that were not identified as representing the one or more true alleles for the first target locus.

3. The method of claim 2, wherein identifying one or more unique text strings as representing the one or more true alleles for the first target locus comprises identifying a unique text string that represents either a short tandem repeat (STR) or a single nucleotide polymorphism (SNP).

4. The method of claim 1, further comprising: identifying, with the processor, one or more unique text strings included in the selected second plurality of text strings as representing one or more true alleles for the first target locus; identifying, with the processor, one or more unique text strings included in the selected second plurality of text strings as each representing an artifact; wherein the first number of unique text strings identified as representing noise responses includes all of the unique text strings included in the selected second plurality of text strings that were not identified as representing either the one or more true alleles for the first target locus or an artifact.

5. The method of claim 4, wherein: identifying one or more unique text strings as representing the one or more true alleles for the first target locus comprises identifying a unique text string that represents a short tandem repeat (STR); and identifying one or more unique text strings as representing an artifact comprises identifying a unique text string that represents a stutter artifact of the STR.

6. The method of claim 1, wherein determining the MDL comprises doubling a difference between (i) a largest abundance count for the first number of unique text strings identified as representing noise responses and (ii) a smallest abundance count for the first number of unique text strings identified as representing noise responses.

7. The method of claim 1, wherein determining the MDL comprises calculating a product of (i) a constant and (ii) a standard deviation of the abundance counts for the first number of unique text strings identified as representing noise responses.

8. The method of claim 7, wherein determining the MDL further comprises adding a mean of the abundance counts for the first number of unique text strings identified as representing noise responses to the product.

9. The method of claim 1, wherein determining the MDL comprises calculating a ratio between (i) a mean of the abundance counts for the first number of unique text strings identified as representing noise responses and (ii) a standard deviation of the abundance counts for the first number of unique text strings identified as representing noise responses.

10. The method of claim 1, further comprising: selecting, with the processor, a third plurality of text strings from the first plurality of text strings generated by the MPS instrument, wherein each of the selected third plurality of text strings represents a nucleotide sequence that corresponds to a second target locus in the amplified sample; comparing, with the processor, the selected third plurality of text strings to one another to determine an abundance count for each unique text string included in the selected third plurality of text strings; and identifying, with the processor, a second number of unique text strings included in the selected third plurality of text strings as representing noise responses; wherein, with the processor, determining the MDL comprises determining the MDL as a function of both the abundance counts for the first number of unique text strings identified as representing noise responses and the abundance counts for the second number of unique text strings identified as representing noise responses.

11. The method of claim 10, wherein the second plurality of text strings comprises at least ten thousand text strings.

12. The method of claim 1, wherein the second plurality of text strings comprises at least ten thousand text strings.

13. A method comprising: amplifying nucleotide sequences in a sample using a PCR amplification process to produce an amplified sample; using a massively parallel sequencing (MPS) instrument to read the nucleotide sequences of the amplified sample and generate a first plurality of text strings based on the amplified sample, wherein the first plurality of text strings comprises at least ten thousand text strings; selecting, with a processor, a second plurality of text strings from the first plurality of text strings generated by the MPS instrument, wherein each of the selected first plurality of text strings represents a nucleotide sequence that corresponds to a first target locus in the amplified sample; comparing, with the processor, the selected second plurality of text strings to one another to determine an abundance count for each unique text string included in the selected second plurality of text strings; and outputting a graphical display, wherein the graphical display comprises a first plurality of graphical elements that each correspond to one of the unique text strings included in the selected second plurality of text strings and that each represent the abundance count determined for the corresponding unique text string.

14. The method of claim 13, wherein the graphical display is a bar graph comprising a plurality of bars that each correspond to one of the unique text strings included in the selected second plurality of text strings, each of the plurality of bars having a height indicative of the abundance count determined for the corresponding unique text string.

15. The method of claim 14, wherein the bar graph further comprises a label associated with each of the plurality of bars, each label designating the nucleotide sequence represented by the corresponding unique text string.

16. The method of claim 14, wherein the plurality of bars comprise one or more bars that each represent a true allele for the fi
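Claims 6 through 10 recite alternative ways to turn the noise-response abundance counts into an MDL. The Python sketch below writes each one as a small function. It rests on several assumptions that the claims do not settle. The constant `k` defaults to 3 purely for illustration. "Standard deviation" is taken as the sample standard deviation. The claim 9 ratio is taken as the mean divided by the standard deviation. The noise counts are made up. The function names are hypothetical.

```python
import statistics

def mdl_range(noise):
    """Claim 6: double the difference between the largest and smallest noise counts."""
    return 2 * (max(noise) - min(noise))

def mdl_scaled_sd(noise, k=3.0):
    """Claim 7: a constant k times the standard deviation of the noise counts.

    k=3.0 is an illustrative assumption; the claim does not give the constant.
    """
    return k * statistics.stdev(noise)  # sample SD; needs at least two counts

def mdl_mean_plus_scaled_sd(noise, k=3.0):
    """Claim 8: the claim 7 product plus the mean of the noise counts."""
    return statistics.mean(noise) + mdl_scaled_sd(noise, k)

def mean_to_sd_ratio(noise):
    """Claim 9: ratio of mean to standard deviation (mean / SD is assumed)."""
    return statistics.mean(noise) / statistics.stdev(noise)

# Claim 10: noise counts from two target loci feed a single MDL.
# Pooling the raw counts is one assumed way to combine them.
locus1_noise = [1, 2, 2, 3]
locus2_noise = [1, 1, 4]
pooled = locus1_noise + locus2_noise

for name, fn in [("range", mdl_range), ("k*SD", mdl_scaled_sd),
                 ("mean+k*SD", mdl_mean_plus_scaled_sd), ("mean/SD", mean_to_sd_ratio)]:
    print(f"{name}: {fn(pooled):.2f}")
```

Claim 10 only requires that the MDL be a function of the noise counts from both loci. Pooling the counts before applying a formula is the simplest reading. Computing a per-locus value first and then combining the results would be another.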

Assignees

Battelle Memorial Institute

Classifications

  • G16B30/00 (primary)

    ICT specially adapted for sequence analysis involving nucleotides or amino acids (CPC title)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11475980B2 cover?
In at least one illustrative embodiment, a method may comprise selecting a first plurality of text strings that each represent a nucleotide sequence that was read by a massively parallel sequencing instrument, where the nucleotide sequences represented by the selected first plurality of text strings each correspond to a first target locus, comparing the selected first plurality of text strings …
Who is the assignee on this patent?
Battelle Memorial Institute
What technology area does this patent fall under?
Primary CPC classification G16B30/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 18, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).