Methods and systems for genotyping genetic samples

US10078724B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10078724-B2
Application number: US-201414517406-A
Country: US
Kind code: B2
Filing date: Oct 17, 2014
Priority date: Oct 18, 2013
Publication date: Sep 18, 2018
Grant date: Sep 18, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The invention provides methods and systems for making specific base calls at specific loci using a reference sequence construct, e.g., a directed acyclic graph (DAG) that represents known variants at each locus of the genome. Because the sequence reads are aligned to the DAG during alignment, the subsequent step of comparing a mutation, vis-à-vis the reference genome, to a table of known mutations can be eliminated. The disclosed methods and systems are notably efficient in dealing with structural variations within a genome or mutations that are within a structural variation.
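The reference construct described in the abstract can be pictured with a small graph. The following is a minimal sketch, not the patent's implementation: the class `ReferenceDAG`, the helper `build_example`, the node names, and the sequences are all hypothetical and chosen only to show one locus with a reference allele and a deletion allele as alternate paths.

```python
from collections import defaultdict


class ReferenceDAG:
    """Toy reference graph: nodes hold sequence fragments, edges list allowed successors."""

    def __init__(self):
        self.seq = {}                    # node id -> sequence fragment
        self.edges = defaultdict(list)   # node id -> successor node ids

    def add_node(self, node_id, fragment):
        self.seq[node_id] = fragment

    def add_edge(self, src, dst):
        self.edges[src].append(dst)

    def paths(self, start, end):
        """Yield (node_ids, sequence) for every path from start to end."""
        stack = [(start, [start])]
        while stack:
            node, trail = stack.pop()
            if node == end:
                yield trail, "".join(self.seq[n] for n in trail)
                continue
            for nxt in self.edges[node]:
                stack.append((nxt, trail + [nxt]))


def build_example():
    """Build a hypothetical locus with two alleles between shared flanks."""
    dag = ReferenceDAG()
    dag.add_node("left", "ACGT")     # shared upstream flank (made-up bases)
    dag.add_node("ref", "TTGACC")    # hypothetical reference allele
    dag.add_node("alt", "")          # hypothetical deletion allele (no bases)
    dag.add_node("right", "GGA")     # shared downstream flank
    for src, dst in [("left", "ref"), ("left", "alt"),
                     ("ref", "right"), ("alt", "right")]:
        dag.add_edge(src, dst)
    return dag


if __name__ == "__main__":
    for nodes, sequence in build_example().paths("left", "right"):
        print(" -> ".join(nodes), sequence)
```

Because each known variant is already a path in the graph, a read that matches the `alt` path identifies the variant at alignment time, which is the reason the abstract gives for dropping the later lookup against a table of known mutations.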

First claim

Opening claim text (preview).

The invention claimed is:

1. A system for genotyping a genetic sample, the system comprising: a processor; and a tangible, non-transitory memory storing a plurality of sequence reads corresponding to the genetic sample, and a reference directed acyclic graph (DAG) representing a reference sequence and genetic variation of the reference sequence, wherein the reference DAG comprises a first path corresponding to a first allele and a second path corresponding to a second allele at a first position, wherein the first allele comprises a genetic structural variation; wherein the memory further comprises instructions that, when executed, cause the processor to: align the plurality of sequence reads to the reference (DAG), wherein the aligning comprises: comparing a string of symbols corresponding to a sequence read to the first path and the second path; scoring overlaps between the string of symbols and each of the first path and the second path, wherein a higher score corresponds to a greater amount of overlap; and identifying an overlap corresponding to the highest score for the sequence read, thereby aligning the sequence read to the reference DAG; and determine a genotype for the genetic sample based upon the number of sequence reads aligned to the first path and the second path, wherein the determined genotype comprises the genetic structural variation.

2. The system of claim 1, further comprising writing a file to memory corresponding to the determined genotype.

3. The system of claim 1, wherein the system comprises a plurality of processors, and wherein each processor is configured to align a portion of the plurality of sequence reads to the reference sequence construct.

4. A method of genotyping a genetic sample, the method comprising: using at least one computer hardware processor to perform: obtaining a plurality of sequence reads corresponding to a genetic sample; aligning the plurality of sequence reads to a reference directed acyclic graph (DAG) stored in a tangible, non-transitory memory connected to the at least one computer hardware processor, wherein the reference DAG comprises a first path corresponding to a first allele and a second path corresponding to a second allele at a first position, wherein the first allele comprises a genetic structural variation, wherein the aligning comprises: comparing a string of symbols corresponding to a sequence read to the first path and the second path; scoring overlaps between the string of symbols and each of the first path and the second path, wherein a higher score corresponds to a greater amount of overlap; and identifying an overlap corresponding to the highest score for the sequence read, thereby aligning the sequence read to the reference DAG; and determining a genotype for the genetic sample based upon the number of sequence reads aligned to the first path and the second path, wherein the determined genotype comprises the genetic structural variation.

5. The method of claim 4, wherein the method does not include comparing the sequence reads to a variant call format (VCF) file or a single nucleotide polymorphism database (dbSNP) file.

6. The method of claim 4, wherein at least a portion of a sequence read includes a genetic structural variation.

7. The method of claim 4, wherein the reference DAG further comprises a third path and a fourth path at a second position in the reference DAG, wherein the third path comprises a second genetic structural variation.

8. The method of claim 4, wherein the alignment of two or more related reads is used to genotype the sample.

9. The method of claim 4, further comprising determining an allele for the sample based upon the alignment.

10. The method of claim 9, further comprising determining an allele frequency for the sample based upon the number of sequence reads that align to the first path and the second path.

11. The method of claim 4, further comprising determining a confidence value for the determined genotype based upon the number of overlapping base pairs between a sequence read and one of the first path and the second path, wherein more overlap correlates with greater confidence.

12. The method of claim 4, wherein the second path corresponds to a base deletion, a base insertion, or a polymorphism.

13. The method of claim 4, wherein the sequence reads are obtained by sequencing a genetic sample of a subject with a method selected from Sanger sequencing, pyrosequencing, ion semiconductor sequencing, sequencing by synthesis, sequencing by ligation, and single-molecule real-time sequencing.

14. The method of claim 4, wherein the reference DAG comprises a genome of an organism.

15. The method of claim 4, wherein the reference DAG further comprises a third path corresponding to a third allele at the first position.

16. The method of claim 4, further comprising identifying a rare variant in the genetic sample close to the genetic structural variation.

17. The method of claim 4, wherein the genetic structural variation is between 1 Kb to 3 Mb in size.

18. The method of claim 4, wherein the reference DAG comprises nodes, and wherein the genetic variation of the reference sequence is represented as alternate nodes.

19. The method of claim 4, wherein the reference DAG represents a species.
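The steps recited in claims 1, 4, and 10 (score each read against both paths, keep the highest score, then call a genotype from the per-path read counts) can be illustrated with a small sketch. This is an assumption-laden toy, not the claimed method: the ungapped match count in `overlap_score` stands in for whatever scoring the patent actually uses, and the function names, `EXAMPLE_PATHS`, the reads, and the `het_low`/`het_high` cutoffs are all hypothetical.

```python
from collections import Counter

# Hypothetical path sequences for one locus: a deletion allele and a reference allele.
EXAMPLE_PATHS = {"deletion": "ACGTGGA", "reference": "ACGTTTGACCGGA"}


def overlap_score(read, path_seq):
    """Best count of matching bases over every ungapped placement of read on path_seq."""
    best = 0
    for offset in range(len(path_seq) - len(read) + 1):
        window = path_seq[offset:offset + len(read)]
        matches = sum(a == b for a, b in zip(read, window))
        best = max(best, matches)
    return best


def align_read(read, paths):
    """Return (name of the highest-scoring path, or None on a tie, and all scores)."""
    scores = {name: overlap_score(read, seq) for name, seq in paths.items()}
    top = max(scores.values())
    winners = [name for name, score in scores.items() if score == top]
    return (winners[0] if len(winners) == 1 else None), scores


def genotype(reads, paths, het_low=0.2, het_high=0.8):
    """Call a diploid genotype for the first two paths from per-path read counts.

    het_low and het_high are illustrative cutoffs, not values from the patent.
    """
    counts = Counter()
    for read in reads:
        best, _ = align_read(read, paths)
        if best is not None:          # ambiguous reads are ignored in this sketch
            counts[best] += 1
    total = sum(counts.values())
    if total == 0:
        return None, counts, {}
    freqs = {name: counts[name] / total for name in paths}
    first, second = list(paths)[:2]
    if freqs[first] >= het_high:
        call = (first, first)
    elif freqs[first] <= het_low:
        call = (second, second)
    else:
        call = (first, second)
    return call, counts, freqs


if __name__ == "__main__":
    reads = ["CGTGGA", "ACGTGG", "GTTTGA", "TGACCG"]
    print(genotype(reads, EXAMPLE_PATHS))
```

The per-path fractions returned as `freqs` correspond loosely to the allele frequency of claim 10. A real aligner would also need gapped scoring and a principled statistical model in place of the fixed cutoffs.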

Assignees

  • Seven Bridges Genomics Inc

Inventors

Classifications

  • Massive parallel sequencing · CPC title

  • Methods for sequencing · CPC title

  • G06F19/22 · Primary

    Physics · mapped topic

  • ICT specially adapted for sequence analysis involving nucleotides or amino acids · CPC title

  • Sequence assembly · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10078724B2 cover?
The invention provides methods and systems for making specific base calls at specific loci using a reference sequence construct, e.g., a directed acyclic graph (DAG) that represents known variants at each locus of the genome. Because the sequence reads are aligned to the DAG during alignment, the subsequent step of comparing a mutation, vis-à-vis the reference genome, to a table of known mutatio…
Who is the assignee on this patent?
Seven Bridges Genomics Inc
What technology area does this patent fall under?
Primary CPC classification G06F19/22. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 18, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).