Systems and methods for transcriptome analysis
US-9063914-B2 · Jun 23, 2015 · US
US9817944B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9817944-B2 |
| Application number | US-201414177958-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 11, 2014 |
| Priority date | Feb 11, 2014 |
| Publication date | Nov 14, 2017 |
| Grant date | Nov 14, 2017 |
Official abstract text for this publication.
The invention provides methods for comparing one set of genetic sequences to another without discarding any information within either set. A set of genetic sequences is represented using a directed acyclic graph (DAG) avoiding any unwarranted reduction to a linear data structure. The invention provides a way to align one sequence DAG to another to produce an alignment that can itself be stored as a DAG. DAG-to-DAG alignment is a natural choice wherever a set of genomic information consisting of more than one string needs to be compared to any non-linear reference. For example, a subpopulation DAG could be compared to a population DAG in order to compare the genetic features of that subpopulation to those of the population.
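As an illustration of the representation described in the abstract, the following is a minimal sketch of a sequence DAG in Python. It borrows the layout recited in claim 11 (nodes carrying a label and a character string, edges given as pairs of labels). The function names `build_dag` and `spell_paths`, the node labels, and the toy sequences are hypothetical and are not taken from the patent. The sketch assumes the input is acyclic and does not check for cycles.

```python
from collections import defaultdict


def build_dag(nodes, edges):
    """Index a sequence DAG.

    nodes: dict mapping a label to a nucleotide string (hypothetical layout).
    edges: iterable of (from_label, to_label) pairs.
    Returns (successors, sources), where sources are nodes with no incoming edge.
    """
    successors = defaultdict(list)
    indegree = {label: 0 for label in nodes}
    for src, dst in edges:
        if src not in nodes or dst not in nodes:
            raise ValueError(f"edge ({src}, {dst}) refers to an unknown node")
        successors[src].append(dst)
        indegree[dst] += 1
    sources = [label for label, deg in indegree.items() if deg == 0]
    return successors, sources


def spell_paths(nodes, edges):
    """Return every sequence spelled by a source-to-sink path, sorted."""
    successors, sources = build_dag(nodes, edges)
    sequences = []

    def walk(label, prefix):
        seq = prefix + nodes[label]
        nexts = successors.get(label, [])
        if not nexts:  # sink node: one complete sequence
            sequences.append(seq)
            return
        for nxt in nexts:
            walk(nxt, seq)

    for label in sources:
        walk(label, "")
    return sorted(sequences)


# Toy example: one variable position (T or C) between two shared segments.
nodes = {"n1": "ACG", "n2": "T", "n3": "C", "n4": "GGA"}
edges = [("n1", "n2"), ("n1", "n3"), ("n2", "n4"), ("n3", "n4")]
print(spell_paths(nodes, edges))
```

Both alternatives at the variable position stay in the structure as separate nodes, so neither sequence has to be discarded to fit a single linear string.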
Opening claim text (preview).
What is claimed is:

1. A method for genomic analysis, the method comprising: representing a plurality of nucleic acids from a population of individuals as a reference directed acyclic graph (DAG) stored in a non-transitory memory, wherein the reference DAG includes nodes connected by edges in which at least one node includes a string of a plurality of nucleotide characters corresponding to a nucleotide sequence found within the plurality of nucleic acids; obtaining a second DAG representing a second plurality of nucleic acids, the second plurality of nucleic acids comprising nucleic acids from one or more individuals; determining, using a processor coupled to the non-transitory memory, an alignment between the second DAG and the reference DAG; and creating, from the alignment, an aligned DAG comprising an aligned combination of the reference DAG and the second DAG.
2. The method of claim 1, wherein each DAG comprises at least two alternative sequences per position at multiple positions in that DAG.
3. The method of claim 2, wherein determining the alignment comprises: scoring sequence overlaps between the reference DAG and the second DAG, wherein greater overlap results in a higher score; and aligning portions of the second DAG to locations in the reference DAG such that the scores for the sequence overlaps are maximized.
4. The method of claim 1, wherein said alignment is an optimal alignment.
5. The method of claim 4, wherein said optimal alignment is a best-scoring DAG matrix alignment produced from a combination of said reference DAG and said second DAG.
6. The method of claim 5, wherein said best-scoring DAG alignment is determined by a mathematical construct representing the optimal path through a matrix of similarity scores in said combination.
7. The method of claim 1, wherein the second DAG is obtained from sequence reads from a sample from a subject.
8. The method of claim 7, wherein the reference DAG comprises a plurality of alleles associated with a disease.
9. The method of claim 7, wherein homozygous loci in the sample are represented using a single node in the second DAG and at least one heterozygous loci in the sample is represented using a plurality of different nodes in the second DAG.
10. The method of claim 1, wherein the steps are performed using a computer system comprising the processor coupled to the non-transitory memory having the reference DAG stored therein and further wherein the alignment is stored as a final DAG in the non-transitory memory.
11. The method of claim 1, wherein a DAG is stored as a computer file comprising: nodes, each node comprising a character string and a label, and edges, each edge comprising a pair of labels.
12. The method of claim 1, wherein a DAG is stored as a computer file comprising: nodes, each node comprising one or more characters representing nucleotides, and edges, each edge representing a connection between a pair of the nodes.
13. The method of claim 1, wherein at least one path through the reference DAG represents a sequence of a human chromosome.
14. The method of claim 13, wherein at least one path through the second DAG represents an alternative sequence of the human chromosome.
15. The method of claim 1, wherein the second DAG represents a transcriptome from an organism and the reference DAG represents one or more genomes from organisms of a same species as the organism.
16. The method of claim 1, wherein finding an optimally-scoring alignment between the second DAG and the reference DAG comprises: calculating each of a plurality of values for entries in a matrix of similarities between the reference DAG and the second DAG based on a highest-valued neighboring entry and associating each calculated value with the highest-valued neighboring entry upon which the calculation of that calculated value was based; and identifying a path through the matrix that originates at the entry with the highest calculated value and traces sequentially through each associated neighboring entry until a zero entry is met, wherein the identified path indicates the optimally-scoring alignment.
17. The method of claim 1, wherein: the reference DAG comprises a plurality of binary alignment map (BAM) entries that have been mapped to a first genomic reference; and the second DAG comprises a second plurality of BAM entries that have been mapped to a second genomic reference.
18. The method of claim 1, further comprising comparing genetic features between the population and the one or more individuals using the aligned DAG.
19. The method of claim 1, further comprising aligning a set of sequence reads to the aligned DAG.
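The matrix procedure recited in claim 16 resembles a Smith-Waterman style local alignment in which the neighbors of a matrix entry are defined by DAG predecessors rather than by the previous row and column. The following is a rough sketch of that idea under several assumptions that do not come from the patent: every node holds a single character, each DAG is given as a list of `(character, predecessor_indices)` tuples in topological order, and the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are arbitrary. The function name `align_dags` is hypothetical. In place of tracing back to a zero entry, the sketch stops when it reaches an entry with no recorded neighbor, which plays the same role.

```python
def align_dags(dag_a, dag_b, match=2, mismatch=-1, gap=-2):
    """Local alignment of two single-character-node DAGs (illustrative only).

    dag_a, dag_b: lists of (char, [predecessor indices]) in topological order.
    Returns (best_score, aligned_pairs), where aligned_pairs lists the
    (index_in_a, index_in_b) node pairs on the best-scoring path.
    """
    score = [[0] * len(dag_b) for _ in dag_a]
    back = [[None] * len(dag_b) for _ in dag_a]
    best, best_cell = 0, None

    for i, (char_a, preds_a) in enumerate(dag_a):
        for j, (char_b, preds_b) in enumerate(dag_b):
            sub = match if char_a == char_b else mismatch
            # (value, neighbor) candidates. (0, None) is the empty alignment;
            # (sub, None) starts a new alignment at this pair of nodes.
            candidates = [(0, None), (sub, None)]
            for pi in preds_a:
                # node i of A aligned against a gap
                candidates.append((score[pi][j] + gap, (pi, j)))
                for pj in preds_b:
                    # node i aligned with node j, extending (pi, pj)
                    candidates.append((score[pi][pj] + sub, (pi, pj)))
            for pj in preds_b:
                # node j of B aligned against a gap
                candidates.append((score[i][pj] + gap, (i, pj)))
            value, neighbor = max(candidates, key=lambda c: c[0])
            score[i][j] = value
            back[i][j] = neighbor if value > 0 else None
            if value > best:
                best, best_cell = value, (i, j)

    # Trace back from the highest entry through each recorded neighbor.
    aligned = []
    cell = best_cell
    while cell is not None:
        i, j = cell
        prev = back[i][j]
        # Both indices change (or the path starts here): i and j are paired.
        if prev is None or (prev[0] != i and prev[1] != j):
            aligned.append((i, j))
        cell = prev
    aligned.reverse()
    return best, aligned


# Reference A-C-(G|T)-T-A with one branch point; query is the linear C-T-T-A.
reference = [("A", []), ("C", [0]), ("G", [1]), ("T", [1]), ("T", [2, 3]), ("A", [4])]
query = [("C", []), ("T", [0]), ("T", [1]), ("A", [2])]
print(align_dags(reference, query))
```

In this toy case the best path goes through the `T` branch of the reference. The run time grows with the number of predecessor pairs examined per entry, and nodes that hold multi-character strings, as in claim 1, would need a per-node inner alignment that this sketch omits.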
ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks · CPC title
Endonuclease · CPC title
Methods for sequencing · CPC title
for diseases caused by alterations of genetic material · CPC title
Physics · mapped topic