Systems and methods for using paired-end data in directed acyclic structure
US-9092402-B2 · Jul 28, 2015 · US
US10319465B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10319465-B2 |
| Application number | US-201615353105-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 16, 2016 |
| Priority date | Nov 16, 2016 |
| Publication date | Jun 11, 2019 |
| Grant date | Jun 11, 2019 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Various embodiments of the disclosure relate to systems and methods for aligning a sequence read to a graph reference. In one embodiment, the method comprises selecting a first node from a graph reference, the graph reference comprising a plurality of nodes connected by a plurality of directed edges, at least one node of the plurality of nodes having a nucleotide sequence. The method further comprises traversing the graph reference according to a depth-first search, and comparing a sequence read to nucleotide sequences generated from the traversal of the graph reference. The traversal of the graph is then modified in response to a determination that each and every node associated with a given nucleotide sequence was previously evaluated.
Opening claim text (preview).
What is claimed is: 1. A system for aligning a sequence read to a graph reference, the system comprising: at least one computer hardware processor; and at least one non-transitory computer-readable storage medium storing: a graph reference, the graph reference comprising a plurality of nodes connected by a plurality of edges, at least one node of the plurality of nodes having an associated nucleotide sequence; a plurality of sequence reads; and processor-executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform: selecting a first node of the plurality of nodes; identifying a first path by traversing the graph reference, the first path starting from the first node and comprising at least one child node of the first node; comparing at least one first nucleotide sequence generated from the first path with the sequence read, wherein the at least one first nucleotide sequence is generated at least in part by concatenating associated nucleotide sequences from nodes in the first path into the at least one first nucleotide sequence; identifying a second path by traversing the graph reference, the second path starting from the first node and comprising at least one node not considered by the first path; comparing at least one second nucleotide sequence generated from the second path with the sequence read, the comparing comprising determining whether the at least one second nucleotide sequence generated from the second path was previously generated by the first path, and removing one or more nodes from the identified second path based on the determination; determining a best-fit position of the sequence read on the graph reference; and reporting the best-fit position of the sequence read as the aligned position of the sequence read on the graph reference. 2. The system of claim 1 , wherein traversing the graph reference comprises performing a depth-first search. 3. The system of claim 1 , wherein a path comprises one or more nodes. 4. The system of claim 1 , wherein identifying a first path by traversing the graph reference comprises following outgoing edges of successive nodes until reaching a last node having no outgoing edges. 5. The system of claim 4 , wherein identifying a second path by traversing the graph reference comprises backtracking from the last node having no outgoing edges. 6. The system of claim 5 , wherein backtracking from the last node having no outgoing edges further comprises identifying a node having a previously unfollowed outgoing edge, following the previously unfollowed outgoing edge, and following outgoing edges of successive nodes until reaching a last node having no outgoing edges. 7. The system of claim 1 , wherein each node further comprises a followed indicator that indicates which of its outgoing directed edges have been traversed, and traversing the graph reference further comprises updating the followed indicator when following an outgoing edge from that node. 8. The system of claim 7 , wherein each node further comprises a visited indicator that indicates whether all of its outgoing directed edges have been traversed, wherein traversing the graph reference further comprises: determining whether a node's followed indicator indicates that all of its outgoing directed edges have been traversed; updating the node's visited indicator to indicate that all of the outgoing directed edges have been traversed; resetting the node's followed indicator to indicate that all of the outgoing directed edges have not been traversed; and backtracking from the node. 9. The system of claim 8 , wherein comparing at least one second nucleotide sequence generated from the second path with the sequence read further comprises considering whether the at least one second nucleotide sequence is generated entirely from nodes having visited indicators that indicate all of the outgoing directed edges of those nodes have been traversed. 10. The system of claim 1 , wherein comparing at least one first nucleotide sequence generated from the first path with the sequence read further comprises: identifying a first set of substrings of the at least one first nucleotide sequence; and comparing the first set of substrings to the sequence read. 11. The system of claim 10 , wherein comparing at least one second nucleotide sequence generated from the second path with the sequence read comprises: concatenating nucleotide sequences associated with the nodes in the second path into a second concatenated nucleotide sequence; identifying a second set of substrings of the second concatenated nucleotide sequence; comparing each substring of the second set of substrings to the sequence read if and only if a substring is not within the first set of substrings. 12. The system of claim 11 , wherein comparing each substring of the second set of substrings to the sequence read if and only if a substring is not within the first set of substrings comprises considering whether a substring is generated from at least one node that was not present in the first path. 13. The system of claim 12 , wherein each node further comprises a visited indicator that indicates whether all of that node's outgoing directed edges have been traversed, and comparing each substring of the second set of substrings further comprises: determining whether each of the nodes associated with a substring have visited indicators that indicate all of the outgoing directed edges of those nodes have been traversed; if each of the nodes associated with the substring have visited indicators that indicate all of the outgoing directed edges of those nodes have been traversed, removing the most recently added node from the second path and considering whether any remaining nodes in the second path have an unfollowed outgoing edge; and if any of the nodes associated with the substring have visited indicators that indicate all of the outgoing directed edges of those nodes have not been traversed, comparing the substring with the sequence read. 14. The system of claim 13 , wherein comparing the substring with the sequence read further comprises generating the substring from those nodes. 15. The system of claim 1 , wherein identifying a second path by traversing the graph reference comprises: identifying a quantity of nodes within the graph reference; traversing a portion of the graph reference; and modifying traversal of the graph in response to a determination that a region of the graph has been previously considered by the first path. 16. The system of claim 1 , wherein identifying a first path by traversing the graph reference comprises: considering whether the first node has an unfollowed child node; if the first node has an unfollowed child node, adding the unfollowed child node to the first path. 17. The system of claim 1 , wherein the sequence read comprises a k-mer of the sequence read, wherein comparing nucleotide sequences generated from a path with the sequence read comprises comparing a k-mer of the sequence read with the nucleotide sequences. 18. The system of claim 17 , wherein the k-mer begins from the first base of the sequence read. 19. The system of claim 17 , wherein the length of the k-mer is between 5 and 30 base pairs. 20. The system of claim 17 , further comprising determining one or more best-fit positions of the k-mer of the sequence read. 21. The system of claim 20 , further comprising determining a best-fit positio
Sequence assembly · CPC title
ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks · CPC title
Physics · mapped topic
Physics · mapped topic
Physics · mapped topic
Related publications grouped by family.
Answers are generated from the same data shown on this page.