Methods and systems for aligning repetitive DNA elements

US2016110498A1 · US · A1

Patent metadata

Publication number: US-2016110498-A1
Application number: US-201314775252-A
Country: US
Kind code: A1
Filing date: Mar 13, 2013
Priority date: Mar 13, 2013
Publication date: Apr 21, 2016
Grant date: none listed

Abstract

Presented are methods and systems for aligning repetitive DNA elements. The methods and systems use the conserved flanks of repetitive polymorphic loci to effectively determine the length and sequence of the repetitive DNA element.
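
As a minimal illustration of the flank-anchored idea in the abstract, the Python sketch below locates the two conserved flanks in a read and reports whatever lies between them as the repeat region. It assumes exact string matches for both flanks and a known repeat motif; the function name measure_repeat and the example sequences are hypothetical, not taken from the patent. The allele label follows the common forensic "full.partial" convention, where 9.3 means nine complete motif copies plus three extra bases.

```python
from typing import Optional, Tuple


def measure_repeat(
    read: str, left_flank: str, right_flank: str, motif: str
) -> Optional[Tuple[str, int, str]]:
    """Return (repeat sequence, length in bases, allele label), or None.

    Hypothetical helper: flanks are found by exact matching only, so a
    sequencing error inside a flank makes the read unusable here.
    """
    left_pos = read.find(left_flank)
    if left_pos < 0:
        return None
    repeat_start = left_pos + len(left_flank)
    # Look for the right flank only downstream of the left flank.
    right_pos = read.find(right_flank, repeat_start)
    if right_pos < 0:
        return None
    repeat = read[repeat_start:right_pos]
    # Complete motif copies plus leftover bases, e.g. 39 bp of a 4-mer -> "9.3".
    full, partial = divmod(len(repeat), len(motif))
    label = f"{full}.{partial}" if partial else str(full)
    return repeat, len(repeat), label


read = "ACGTTGCA" + "GATA" * 5 + "CCTAGGTA"
print(measure_repeat(read, "ACGTTGCA", "CCTAGGTA", "GATA"))
# ('GATAGATAGATAGATAGATA', 20, '5')
```

Because the length is read off the distance between the two flanks rather than from an alignment against a fixed-length reference, a read carrying more or fewer repeat units than the reference still reports its own repeat length.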

Claims

1. A method for determining the length of a polymorphic repetitive DNA element having a repeat region situated between a first conserved flanking region and a second conserved flanking region, the method comprising:
(a) providing a data set comprising at least one sequence read of the polymorphic repetitive DNA element;
(b) providing a reference sequence comprising the first conserved flanking region and the second conserved flanking region;
(c) aligning a portion of the first flanking region of the reference sequence to the sequence read;
(d) aligning a portion of the second flanking region of the reference sequence to the sequence read; and
(e) determining the length and/or sequence of the repeat region;
wherein at least steps (c), (d) and (e) are performed using a suitably programmed computer.

2. The method of claim 1, wherein the aligning a portion of the flanking region in one or both of steps (c) and (d) comprises:
(i) determining a location of a conserved flanking region on the read by using exact k-mer matching of a seeding region which overlaps or is adjacent to the repeat region; and
(ii) aligning the flanking region to the sequence read.

3. The method of claim 2, further comprising aligning both the flanking sequence and a short adjacent region comprising a portion of the repeat region.

4. The method of claim 2, wherein the seeding region comprises a high-complexity region of the conserved flanking region.

5. The method of claim 4, the high-complexity region comprising sequence that is sufficiently distinct from the repeat region so as to avoid mis-alignment.

6. The method of claim 4, wherein the high-complexity region comprises a sequence having a diverse mixture of bases.

7. The method of claim 2, wherein the seeding region avoids low-complexity regions of the conserved flanking region.

8. The method of claim 7, the low-complexity region comprising sequence that substantially resembles that of the repeat sequence.

9. The method of claim 7, the low-complexity region comprising sequence having a mixture of bases with low diversity.

10. The method of claim 2, wherein the seeding region is directly adjacent to the repeat region.

11. The method of claim 2, wherein the seeding region comprises a portion of the repeat region.

12. The method of claim 2, wherein the seeding region is offset from the repeat region.

13. The method of claim 1, wherein the dataset of sequence reads comprises sequence data from a PCR amplicon having a forward and reverse primer sequence.

14. The method of claim 1, wherein the at least one sequence read in the data set comprises a consensus sequence derived from multiple sequence reads.

15. The method of claim 2, wherein providing a reference sequence comprises identifying a locus of interest based upon the primer sequence of the PCR amplicon.

16. The method of claim 1, wherein the at least one sequencing read comprises sequence from a sequencing-by-synthesis (SBS) reaction.

17. The method of claim 1, wherein the at least one sequencing read comprises sequence from a sequencing-by-ligation reaction.

18. The method of claim 1, wherein the data set is received from a memory.

19. The method of claim 1, wherein the length or sequence of the repeat region is output via a physical or virtual connection, a display or a printer.

20. The method of claim 1, wherein the repeat region is a short tandem repeat (STR).

21. The method of claim 20, wherein the STR is selected from the CODIS autosomal STR loci.

22. The method of claim 20, wherein the STR is selected from the CODIS Y-STR loci.

23. The method of claim 20, wherein the STR is selected from the EU autosomal STR loci.

24. The method of claim 20, wherein the STR is a selected from the EU Y-STR loci.

25. A system for determining the length of a polymorphic repetitive DNA element having a repeat region situated between a first conserved flanking region and a second conserved flanking region, the system comprising: a processor; and a program for determining the length of a polymorphic repetitive DNA element, the program comprising instructions for:
(a) providing a data set comprising at least one sequence read of the polymorphic repetitive DNA element;
(b) providing a reference sequence comprising the first conserved flanking region and the second conserved flanking region;
(c) aligning a portion of the first flanking region of the reference sequence to the sequence read;
(d) aligning a portion of the second flanking region of the reference sequence to the sequence read; and
(e) determining the length and/or sequence of the repeat region.

26.-48. (canceled)
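
Claims 2 to 12 describe locating each flank through an exact k-mer match on a seeding region taken from a high-complexity part of the flank, then aligning the full flank to the read. The Python sketch below is one possible reading of those steps, not the applicant's implementation. Several choices are assumptions made for illustration only: Shannon entropy of the base composition stands in for "complexity", the seed is the qualifying window nearest the repeat, k = 6 and the 1.5-bit threshold are arbitrary, and the flank alignment is reduced to an ungapped mismatch count instead of a gapped aligner such as Smith-Waterman. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Optional


def shannon_entropy(seq: str) -> float:
    """Base-composition entropy in bits: 0 for a homopolymer, at most 2 for DNA."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def pick_seed(flank: str, k: int, min_entropy: float, near_end: bool) -> int:
    """Offset of the high-complexity k-mer seed closest to the repeat.

    near_end=True for a left flank (the repeat follows its 3' end); False for
    a right flank (the repeat precedes its 5' end). Low-entropy windows, such
    as stretches resembling the repeat, are skipped.
    """
    last = len(flank) - k
    offsets = range(last, -1, -1) if near_end else range(last + 1)
    for offset in offsets:
        if shannon_entropy(flank[offset:offset + k]) >= min_entropy:
            return offset
    raise ValueError("no high-complexity seed window in flank")


def locate_flank(read: str, flank: str, near_end: bool, search_from: int = 0,
                 k: int = 6, min_entropy: float = 1.5,
                 max_mismatches: int = 1) -> Optional[int]:
    """Return the start of the flank in the read, or None if not found."""
    offset = pick_seed(flank, k, min_entropy, near_end)
    seed = flank[offset:offset + k]
    hit = read.find(seed, search_from)  # exact k-mer seed match
    while hit >= 0:
        start = hit - offset
        if start >= search_from and start + len(flank) <= len(read):
            # Ungapped verification of the whole flank around the seed hit.
            window = read[start:start + len(flank)]
            if sum(a != b for a, b in zip(window, flank)) <= max_mismatches:
                return start
        hit = read.find(seed, hit + 1)
    return None


def repeat_between_flanks(read: str, left_flank: str, right_flank: str,
                          **params) -> Optional[str]:
    """Seed and verify both flanks, then return the sequence between them."""
    left_start = locate_flank(read, left_flank, near_end=True, **params)
    if left_start is None:
        return None
    repeat_start = left_start + len(left_flank)
    right_start = locate_flank(read, right_flank, near_end=False,
                               search_from=repeat_start, **params)
    if right_start is None:
        return None
    return read[repeat_start:right_start]


LEFT, RIGHT = "CAGTCCTAGGAAAA", "TTTTCGACTGGA"
read = "GG" + LEFT + "GATA" * 5 + RIGHT + "CC"
print(repeat_between_flanks(read, LEFT, RIGHT))  # GATAGATAGATAGATAGATA
```

In the example, the left flank ends in AAAA, a low-diversity stretch in the sense of claim 9, so the seed window slides back to CTAGGA before matching; the right flank likewise skips its leading TTTT run. A production tool would also need to handle indels in the flanks, reverse-complement reads and reads that do not span the whole repeat.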

Assignees

Illumina Inc

Inventors

Classifications

  • Repeat or repeated sequences, e.g. VNTR, microsatellite, concatemer (CPC title; code not shown)

  • Methods for sequencing (CPC title; code not shown)

  • G16B30/00 (primary): ICT specially adapted for sequence analysis involving nucleotides or amino acids

  • G06F19/22 (primary): mapped topic Physics

  • G16B30/10 (primary): Sequence alignment; Homology search

Frequently asked questions

Who is the assignee on this patent?
Illumina Inc
What technology area does this patent fall under?
Primary CPC classification G16B30/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 21, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).