Systems and methods for using paired-end data in directed acyclic structure
US-9092402-B2 · Jul 28, 2015 · US
US11062793B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11062793-B2 |
| Application number | US-201916436040-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 10, 2019 |
| Priority date | Nov 16, 2016 |
| Publication date | Jul 13, 2021 |
| Grant date | Jul 13, 2021 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Various embodiments of the disclosure relate to systems and methods for aligning a sequence read to a graph reference. In one embodiment, the method comprises selecting a first node from a graph reference, the graph reference comprising a plurality of nodes connected by a plurality of directed edges, at least one node of the plurality of nodes having a nucleotide sequence. The method further comprises traversing the graph reference according to a depth-first search, and comparing a sequence read to nucleotide sequences generated from the traversal of the graph reference. The traversal of the graph is then modified in response to a determination that each and every node associated with a given nucleotide sequence was previously evaluated.
Opening claim text (preview).
What is claimed is: 1. A system for aligning a sequence read of a plurality of sequence reads to a graph reference, the system comprising: at least one computer hardware processor; and at least one non-transitory computer-readable storage medium storing: the graph reference, the graph reference comprising a plurality of nodes connected by a plurality of edges, at least one node of the plurality of nodes having an associated nucleotide sequence; and processor-executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform: accessing the graph reference; accessing the plurality of sequence reads; aligning the sequence read of the plurality of sequence reads to the graph reference, the aligning comprising: selecting a first node from the plurality of nodes; identifying a first path in the graph reference, the first path starting from the first node and comprising at least one child node of the first node; generating a first nucleotide sequence comprising a first plurality of nucleotide subsequences at least in part by concatenating nucleotide sequences associated with at least some nodes in the first path; comparing at least one first nucleotide subsequence of the first plurality of nucleotide subsequences with the sequence read; identifying a second path in the graph reference, the second path starting from the first node and comprising at least one node not in the first path; generating a second nucleotide sequence comprising a second plurality of nucleotide subsequences at least in part by concatenating nucleotide sequences associated with at least some nodes in the second path; comparing at least one second nucleotide subsequence of the second plurality of nucleotide subsequences with the sequence read, the comparing comprising determining whether the at least one second nucleotide subsequence has been previously generated using the first path, and removing one or more nodes from the second path in response to determining that the at least one second nucleotide subsequence has been previously generated using the first path; determining an aligned position of the sequence read on the graph reference; and outputting the aligned position of the sequence read. 2. The system of claim 1 , wherein identifying the first path in the graph reference comprises following outgoing edges of successive nodes until reaching a last node having no outgoing edges. 3. The system of claim 2 , wherein identifying the second path in the graph reference comprises backtracking from the last node having no outgoing edges. 4. The system of claim 3 , wherein backtracking from the last node having no outgoing edges further comprises identifying a node having a previously unfollowed outgoing edge, following the previously unfollowed outgoing edge, and following outgoing edges of successive nodes until reaching the last node having no outgoing edges. 5. The system of claim 1 , wherein each node further comprises a followed indicator that indicates which of its outgoing directed edges have been followed, and identifying the first path in the graph reference further comprises updating the followed indicator when following an outgoing edge from that node. 6. The system of claim 5 , wherein each node further comprises a visited indicator that indicates whether all of its outgoing directed edges have been followed, wherein identifying the first path in the graph reference further comprises: determining whether a node's followed indicator indicates that all of its outgoing directed edges have been followed; updating the node's visited indicator to indicate that all of the outgoing directed edges have been followed; resetting the node's followed indicator to indicate that all of the outgoing directed edges have not been followed; and backtracking from the node. 7. The system of claim 6 , wherein comparing the at least one second nucleotide subsequence of the second plurality of nucleotide subsequences with the sequence read further comprises determining whether the at least one second nucleotide subsequence is generated entirely from nodes having visited indicators that indicate all of the outgoing directed edges of those nodes have been followed. 8. The system of claim 1 , wherein each node further comprises a visited indicator that indicates whether all of that node's outgoing directed edges have been followed, and comparing the at least one second nucleotide subsequence of the second plurality of nucleotide subsequences further comprises: determining whether each of the nodes associated with the at least one second nucleotide subsequence have visited indicators that indicate all of the outgoing directed edges of those nodes have been followed; if each of the nodes associated with the at least one second nucleotide subsequence have visited indicators that indicate all of the outgoing directed edges of those nodes have been followed, removing a most recently added node from the second path and considering whether any remaining nodes in the second path have an unfollowed outgoing edge; and if any of the nodes associated with the at least one second nucleotide subsequence have visited indicators that indicate all of the outgoing directed edges of those nodes have not been followed, comparing the at least one second nucleotide subsequence with the sequence read. 9. The system of claim 1 , wherein identifying the second path in the graph reference comprises: identifying a quantity of nodes within the graph reference; traversing a portion of the graph reference; and modifying traversal of the graph reference in response to a determination that a region of the graph reference has been previously in the first path. 10. The system of claim 1 , wherein the sequence read comprises a k-mer of the sequence read, wherein comparing nucleotide subsequences with the sequence read comprises comparing the k-mer of the sequence read with the nucleotide subsequences. 11. The system of claim 10 , wherein the k-mer begins from a first base of the sequence read. 12. A method for aligning a sequence read of a plurality of sequence reads to a graph reference, the method comprising: using at least one computer hardware processor to perform: accessing the graph reference, the graph reference being stored in at least one non-transitory computer-readable storage medium and comprising a plurality of nodes connected by a plurality of edges, at least one node of the plurality of nodes having an associated nucleotide sequence; accessing the plurality of sequence reads; aligning the sequence read of the plurality of sequence reads to the graph reference, the aligning comprising: selecting a first node from the plurality of nodes; identifying a first path in the graph reference, the first path starting from the first node and comprising at least one child node of the first node; generating a first nucleotide sequence comprising a first plurality of nucleotide subsequences at least in part by concatenating nucleotide sequences associated with at least some nodes in the first path; comparing at least one first nucleotide subsequence of the first plurality of nucleotide subsequences with the sequence read; identifying a second path in the graph reference, the second path starting from the first node and comprising at least one node not in the first path; generating a second nucleotide sequence comprising a second plurality of nucleotide subsequences at least in part by concatenating nucleotide sequences associated with at least some nodes in the second path; comparing at least one second nucleotide subsequence of the sec
Acquisition · CPC title
Graph matching (graphical image representation G06V30/18181) · CPC title
Recognition of patterns in DNA microarrays · CPC title
Matching; Classification · CPC title
Sequence assembly · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.