Systems and methods for analyzing sequence data

US9817944B2 · US · B2

Patent metadata
Field · Value
Publication number · US-9817944-B2
Application number · US-201414177958-A
Country · US
Kind code · B2
Filing date · Feb 11, 2014
Priority date · Feb 11, 2014
Publication date · Nov 14, 2017
Grant date · Nov 14, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The invention provides methods for comparing one set of genetic sequences to another without discarding any information within either set. A set of genetic sequences is represented using a directed acyclic graph (DAG) avoiding any unwarranted reduction to a linear data structure. The invention provides a way to align one sequence DAG to another to produce an alignment that can itself be stored as a DAG. DAG-to-DAG alignment is a natural choice wherever a set of genomic information consisting of more than one string needs to be compared to any non-linear reference. For example, a subpopulation DAG could be compared to a population DAG in order to compare the genetic features of that subpopulation to those of the population.
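As a rough illustration of the representation the abstract describes, the sketch below stores a small set of sequences as a DAG whose nodes carry nucleotide strings and whose edges are pairs of node labels, loosely following the node/edge description in claim 11. The class name `SequenceDag`, its methods, and the example labels and bases are hypothetical and are not taken from the patent. The sketch also assumes the edges really are acyclic and does not check for cycles.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SequenceDag:
    """Hypothetical container: labelled nodes holding nucleotide strings."""
    nodes: Dict[str, str] = field(default_factory=dict)         # label -> string
    edges: List[Tuple[str, str]] = field(default_factory=list)  # (from, to) labels

    def add_node(self, label: str, bases: str) -> None:
        self.nodes[label] = bases

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.append((src, dst))

    def sequences(self) -> List[str]:
        """Spell out the sequence of every source-to-sink path."""
        successors = defaultdict(list)
        has_incoming = set()
        for src, dst in self.edges:
            successors[src].append(dst)
            has_incoming.add(dst)

        def walk(label: str, prefix: str) -> List[str]:
            spelled = prefix + self.nodes[label]
            if not successors[label]:          # sink node: path is complete
                return [spelled]
            return [s for nxt in successors[label] for s in walk(nxt, spelled)]

        sources = [label for label in self.nodes if label not in has_incoming]
        return [s for src in sources for s in walk(src, "")]


# Two alternative bases (A or T) at one position, kept side by side
# instead of being collapsed into a single linear string.
dag = SequenceDag()
dag.add_node("n1", "ATCG")
dag.add_node("n2", "A")
dag.add_node("n3", "T")
dag.add_node("n4", "GGC")
for edge in [("n1", "n2"), ("n1", "n3"), ("n2", "n4"), ("n3", "n4")]:
    dag.add_edge(*edge)
print(dag.sequences())
```

Each path through the graph spells one member of the sequence set, so no variant has to be discarded to fit a single linear reference.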

First claim

Opening claim text (preview).

What is claimed is:

1. A method for genomic analysis, the method comprising: representing a plurality of nucleic acids from a population of individuals as a reference directed acyclic graph (DAG) stored in a non-transitory memory, wherein the reference DAG includes nodes connected by edges in which at least one node includes a string of a plurality of nucleotide characters corresponding to a nucleotide sequence found within the plurality of nucleic acids; obtaining a second DAG representing a second plurality of nucleic acids, the second plurality of nucleic acids comprising nucleic acids from one or more individuals; determining, using a processor coupled to the non-transitory memory, an alignment between the second DAG and the reference DAG; and creating, from the alignment, an aligned DAG comprising an aligned combination of the reference DAG and the second DAG.
2. The method of claim 1, wherein each DAG comprises at least two alternative sequences per position at multiple positions in that DAG.
3. The method of claim 2, wherein determining the alignment comprises: scoring sequence overlaps between the reference DAG and the second DAG, wherein greater overlap results in a higher score; and aligning portions of the second DAG to locations in the reference DAG such that the scores for the sequence overlaps are maximized.
4. The method of claim 1, wherein said alignment is an optimal alignment.
5. The method of claim 4, wherein said optimal alignment is a best-scoring DAG matrix alignment produced from a combination of said reference DAG and said second DAG.
6. The method of claim 5, wherein said best-scoring DAG alignment is determined by a mathematical construct representing the optimal path through a matrix of similarity scores in said combination.
7. The method of claim 1, wherein the second DAG is obtained from sequence reads from a sample from a subject.
8. The method of claim 7, wherein the reference DAG comprises a plurality of alleles associated with a disease.
9. The method of claim 7, wherein homozygous loci in the sample are represented using a single node in the second DAG and at least one heterozygous loci in the sample is represented using a plurality of different nodes in the second DAG.
10. The method of claim 1, wherein the steps are performed using a computer system comprising the processor coupled to the non-transitory memory having the reference DAG stored therein and further wherein the alignment is stored as a final DAG in the non-transitory memory.
11. The method of claim 1, wherein a DAG is stored as a computer file comprising: nodes, each node comprising a character string and a label, and edges, each edge comprising a pair of labels.
12. The method of claim 1, wherein a DAG is stored as a computer file comprising: nodes, each node comprising one or more characters representing nucleotides, and edges, each edge representing a connection between a pair of the nodes.
13. The method of claim 1, wherein at least one path through the reference DAG represents a sequence of a human chromosome.
14. The method of claim 13, wherein at least one path through the second DAG represents an alternative sequence of the human chromosome.
15. The method of claim 1, wherein the second DAG represents a transcriptome from an organism and the reference DAG represents one or more genomes from organisms of a same species as the organism.
16. The method of claim 1, wherein finding an optimally-scoring alignment between the second DAG and the reference DAG comprises: calculating each of a plurality of values for entries in a matrix of similarities between the reference DAG and the second DAG based on a highest-valued neighboring entry and associating each calculated value with the highest-valued neighboring entry upon which the calculation of that calculated value was based; and identifying a path through the matrix that originates at the entry with the highest calculated value and traces sequentially through each associated neighboring entry until a zero entry is met, wherein the identified path indicates the optimally-scoring alignment.
17. The method of claim 1, wherein: the reference DAG comprises a plurality of binary alignment map (BAM) entries that have been mapped to a first genomic reference; and the second DAG comprises a second plurality of BAM entries that have been mapped to a second genomic reference.
18. The method of claim 1, further comprising comparing genetic features between the population and the one or more individuals using the aligned DAG.
19. The method of claim 1, further comprising aligning a set of sequence reads to the aligned DAG.
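The matrix fill and traceback recited in claim 16 can be sketched, under strong simplifying assumptions, as a Smith-Waterman-style local alignment over pairs of nodes. This is an illustrative sketch and not the patented implementation: every node holds a single character, nodes are listed in topological order, and the scoring constants, the `Dag` type alias, and the function name `align_dags` are hypothetical choices made for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical scoring parameters (not taken from the patent).
MATCH, MISMATCH, GAP = 2, -1, -2

# Assumed layout: (one character per node, in topological order;
#                  node index -> list of predecessor node indices)
Dag = Tuple[List[str], Dict[int, List[int]]]


def align_dags(ref: Dag, other: Dag):
    ref_chars, ref_preds = ref
    oth_chars, oth_preds = other
    n, m = len(ref_chars), len(oth_chars)
    # Row/column 0 is a virtual start; node k lives at matrix index k + 1.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    back = {}  # cell -> the neighboring cell its value was computed from
    best, best_cell = 0, None
    for i in range(1, n + 1):
        pi = [p + 1 for p in ref_preds.get(i - 1, [])] or [0]
        for j in range(1, m + 1):
            pj = [q + 1 for q in oth_preds.get(j - 1, [])] or [0]
            s = MATCH if ref_chars[i - 1] == oth_chars[j - 1] else MISMATCH
            # Neighbors are defined by graph predecessors, not by i-1 / j-1.
            candidates = [(score[p][q] + s, (p, q)) for p in pi for q in pj]
            candidates += [(score[p][j] + GAP, (p, j)) for p in pi]
            candidates += [(score[i][q] + GAP, (i, q)) for q in pj]
            value, source = max(candidates)
            if value > 0:
                score[i][j] = value
                back[(i, j)] = source
                if value > best:
                    best, best_cell = value, (i, j)
    # Trace back from the highest entry until a zero entry is met.
    pairs: List[Tuple[Optional[int], Optional[int]]] = []
    cell = best_cell
    while cell is not None and score[cell[0]][cell[1]] > 0:
        prev = back[cell]
        if prev[0] != cell[0] and prev[1] != cell[1]:
            pairs.append((cell[0] - 1, cell[1] - 1))   # nodes aligned
        elif prev[1] == cell[1]:
            pairs.append((cell[0] - 1, None))          # gap in the second DAG
        else:
            pairs.append((None, cell[1] - 1))          # gap in the reference
        cell = prev
    return best, pairs[::-1]


# Reference spells ACGA or ACTA (branch at the third base); second DAG is CTA.
reference = (["A", "C", "G", "T", "A"], {1: [0], 2: [1], 3: [1], 4: [2, 3]})
second = (["C", "T", "A"], {1: [0], 2: [1]})
best_score, aligned_pairs = align_dags(reference, second)
print(best_score, aligned_pairs)
```

The returned pairs give (reference node, second-DAG node) indices along the best path; in the example the second DAG follows the T branch of the reference rather than the G branch. Multi-character nodes, as recited in claim 1, would need the matrix to be indexed by character positions within nodes, which this sketch leaves out.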

Assignees

Seven Bridges Genomics Inc

Inventors

Classifications

  • ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks · CPC title

  • Endonuclease · CPC title

  • Methods for sequencing · CPC title

  • for diseases caused by alterations of genetic material · CPC title

  • G06F19/22 · Primary

    Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9817944B2 cover?
The invention provides methods for comparing one set of genetic sequences to another without discarding any information within either set. A set of genetic sequences is represented using a directed acyclic graph (DAG) avoiding any unwarranted reduction to a linear data structure. The invention provides a way to align one sequence DAG to another to produce an alignment that can itself be stored …
Who is the assignee on this patent?
Seven Bridges Genomics Inc
What technology area does this patent fall under?
Primary CPC classification G06F19/22. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 14, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).