Methods and compositions for identifying repeating sequences in nucleic acids
US-9708653-B2 · Jul 18, 2017 · US
US10718016B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10718016-B2 |
| Application number | US-201715625326-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 16, 2017 |
| Priority date | Feb 15, 2012 |
| Publication date | Jul 21, 2020 |
| Grant date | Jul 21, 2020 |
Official abstract text for this publication.
Short tandem repeats (STRs) are currently used by law enforcement and others, for example, to identify individuals by DNA matching. A method is described herein that uses wavelet packet decomposition (WPD) to classify and identify repeating sequences in nucleotide sequences from the position and frequency information contained within those sequences. This decomposition allows for the quick classification of nucleotide sequences (i.e., sequencer reads) into two different classes: for example, one class of reads that contain a repeat motif with non-repeat sequence on either flank, and another class of reads that do not contain any repeat sequence.
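To make the "position and frequency information" in the abstract concrete, the sketch below digitizes a read into numeric signals and shows that a tandem repeat appears as a periodic stretch. The encoding (one 0/1 indicator sequence per base), the helper names `indicator_signals` and `has_period`, and the example read are all assumptions for illustration; this excerpt does not specify how the patent digitizes sequence data.

```python
def indicator_signals(read):
    """Return one 0/1 indicator list per base (A, C, G, T) for a read.

    Assumed encoding for illustration only; the excerpt does not say
    which digitization the patent uses.
    """
    read = read.upper()
    return {base: [1 if ch == base else 0 for ch in read] for base in "ACGT"}


def has_period(signal, period):
    """True if every sample equals the sample `period` positions later."""
    return all(signal[i] == signal[i + period]
               for i in range(len(signal) - period))


# Hypothetical read: non-repeat flank, four copies of GATA, non-repeat flank.
read = "CCT" + "GATA" * 4 + "TGC"
signals = indicator_signals(read)

# The G indicator inside the repeat region (positions 3..18) is periodic
# with period 4, while the full read, flanks included, is not.
repeat_region = signals["G"][3:19]
print(has_period(repeat_region, 4))
print(has_period(signals["G"], 4))
```

Once a read is expressed this way, a repeat motif becomes a periodic component of a numeric signal, which is the kind of structure a frequency-domain decomposition can isolate.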
Opening claim text (preview).
The invention claimed is:
1. A method for identifying repeating sequences in a target nucleic acid comprising repeating sequences and non-repeating sequences, the method comprising: sequencing the target nucleic acid to obtain sequence data; digitizing, with one or more processors, the sequence data; applying, with the one or more processors, wavelet packet decomposition (WPD) to decompose the digitized sequence data into non-periodic signal data and periodic signal data comprising coefficients; and classifying, with the one or more processors, the non-periodic signal data into a non-repeat bin and the periodic signal data into a repeat bin based upon the coefficients.
2. The method of claim 1, further comprising identifying the repeating sequences in the target nucleic acid by matching, with the one or more processors, the coefficients from the periodic signal data in the repeat bin to reference coefficients generated from WPD of a reference sequence.
3. The method of claim 1, wherein applying WPD comprises recursively applying, with the one or more processors, low-pass and high-pass quadrature mirror filters to the digitized sequence data.
4. The method of claim 1, wherein classifying the non-periodic signal data into the non-repeat bin and the periodic signal data into the repeat bin based upon the coefficients comprises determining whether particular data is non-periodic signal data or periodic signal data by comparing, with the one or more processors, a maximum coefficient from among the coefficients to a threshold value.
5. The method of claim 1, wherein the repeating sequences are tandem repeats.
6. The method of claim 5, wherein the tandem repeats are variable number tandem repeats.
7. The method of claim 6, wherein the variable number tandem repeats are selected from the group consisting of microsatellites, minisatellites, and combinations thereof.
8. The method of claim 6, wherein the variable number tandem repeats are microsatellites, and wherein the microsatellites are short tandem repeats (STRs).
9. The method of claim 8, wherein the STRs have repeats of from 2 to 10 nucleotides.
10. The method of claim 8, wherein the STRs have repeats of from 2 to 8 nucleotides.
11. The method of claim 8, wherein the STRs have repeats of from 2 to 6 nucleotides.
12. The method of claim 8, wherein the STRs have repeats of from 3 to 5 nucleotides.
13. The method of claim 8, wherein the STRs have repeats of 4 nucleotides.
14. The method of claim 6, wherein the variable number tandem repeats are minisatellites.
15. The method of claim 14, wherein the minisatellites have repeats of from 9 to 80 nucleotides.
16. One or more non-transitory machine-readable media storing a plurality of instructions that, when executed by one or more processors, cause the one or more processors to: digitize sequence data obtained from sequencing a target nucleic acid comprising repeating sequences and non-repeating sequences; apply wavelet packet decomposition (WPD) to decompose the digitized sequence data into non-periodic signal data and periodic signal data comprising coefficients; and classify the non-periodic signal data into a non-repeat bin and the periodic signal data into a repeat bin based upon the coefficients.
17. The one or more non-transitory machine-readable media of claim 16, wherein the plurality of instructions, when executed by the one or more processors, further cause the one or more processors to identify the repeating sequences in the target nucleic acid by matching the coefficients from the periodic signal data in the repeat bin to reference coefficients generated from WPD of a reference sequence.
18. The one or more non-transitory machine-readable media of claim 16, wherein the plurality of instructions, when executed by the one or more processors, cause the one or more processors to apply WPD by recursively applying low-pass and high-pass quadrature mirror filters to the digitized sequence data.
19. The one or more non-transitory machine-readable media of claim 16, wherein the plurality of instructions, when executed by the one or more processors, cause the one or more processors to determine whether particular data is non-periodic signal data or periodic signal data by comparing a maximum coefficient from among the coefficients to a threshold value.
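The filter recursion and threshold test recited in the claims (recursive low-pass and high-pass quadrature mirror filters, then a maximum coefficient compared to a threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the Haar filter pair is a stand-in because the excerpt names no specific wavelet, and the mean removal, the full-depth decomposition, the power-of-two length requirement, the 0.5 threshold, and all function names are hypothetical choices.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_split(signal):
    """Apply the Haar low-pass/high-pass mirror pair, downsampling by 2."""
    low = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return low, high


def wavelet_packet_leaves(signal, levels):
    """Split every node (low-pass and high-pass branches alike) `levels` times."""
    nodes = [list(signal)]
    for _ in range(levels):
        nodes = [child for node in nodes for child in haar_split(node)]
    return nodes


def max_coefficient(signal):
    """Largest absolute full-depth WPD coefficient of the mean-removed signal."""
    n = len(signal)
    levels = n.bit_length() - 1
    if n < 2 or 2 ** levels != n:
        raise ValueError("signal length must be a power of two (>= 2)")
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # assumption: drop the constant offset
    leaves = wavelet_packet_leaves(centered, levels)
    return max(abs(c) for leaf in leaves for c in leaf)


def classify(signal, threshold=0.5):
    """Assign a signal to the repeat or non-repeat bin (threshold is arbitrary)."""
    return "repeat" if max_coefficient(signal) > threshold else "non-repeat"


# Hypothetical inputs: the G indicator of GATAGATAGATAGATA, and a single
# isolated G in a 16-base read.
periodic = [1 if ch == "G" else 0 for ch in "GATA" * 4]
impulse = [0] * 5 + [1] + [0] * 10

print(classify(periodic), max_coefficient(periodic))
print(classify(impulse), max_coefficient(impulse))
```

In this sketch the periodic signal concentrates its energy in a few large coefficients, while the isolated occurrence spreads its energy evenly across many small ones, which is why a single maximum-versus-threshold comparison can separate the two. A usable threshold would depend on read length and on the digitization, neither of which is fixed by this excerpt.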
Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection · CPC title
Sequence alignment; Homology search · CPC title
ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations · CPC title
ICT specially adapted for sequence analysis involving nucleotides or amino acids · CPC title
for the determination of target sites, i.e. of active nucleic acids · CPC title