Systems and methods for aligning sequences to graph references

US11062793B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11062793-B2
Application numberUS-201916436040-A
CountryUS
Kind codeB2
Filing dateJun 10, 2019
Priority dateNov 16, 2016
Publication dateJul 13, 2021
Grant dateJul 13, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Various embodiments of the disclosure relate to systems and methods for aligning a sequence read to a graph reference. In one embodiment, the method comprises selecting a first node from a graph reference, the graph reference comprising a plurality of nodes connected by a plurality of directed edges, at least one node of the plurality of nodes having a nucleotide sequence. The method further comprises traversing the graph reference according to a depth-first search, and comparing a sequence read to nucleotide sequences generated from the traversal of the graph reference. The traversal of the graph is then modified in response to a determination that each and every node associated with a given nucleotide sequence was previously evaluated.

First claim

Opening claim text (preview).

What is claimed is: 1. A system for aligning a sequence read of a plurality of sequence reads to a graph reference, the system comprising: at least one computer hardware processor; and at least one non-transitory computer-readable storage medium storing: the graph reference, the graph reference comprising a plurality of nodes connected by a plurality of edges, at least one node of the plurality of nodes having an associated nucleotide sequence; and processor-executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform: accessing the graph reference; accessing the plurality of sequence reads; aligning the sequence read of the plurality of sequence reads to the graph reference, the aligning comprising: selecting a first node from the plurality of nodes; identifying a first path in the graph reference, the first path starting from the first node and comprising at least one child node of the first node; generating a first nucleotide sequence comprising a first plurality of nucleotide subsequences at least in part by concatenating nucleotide sequences associated with at least some nodes in the first path; comparing at least one first nucleotide subsequence of the first plurality of nucleotide subsequences with the sequence read; identifying a second path in the graph reference, the second path starting from the first node and comprising at least one node not in the first path; generating a second nucleotide sequence comprising a second plurality of nucleotide subsequences at least in part by concatenating nucleotide sequences associated with at least some nodes in the second path; comparing at least one second nucleotide subsequence of the second plurality of nucleotide subsequences with the sequence read, the comparing comprising determining whether the at least one second nucleotide subsequence has been previously generated using the first path, and removing one or more nodes from the second path in response to determining that the at least one second nucleotide subsequence has been previously generated using the first path; determining an aligned position of the sequence read on the graph reference; and outputting the aligned position of the sequence read. 2. The system of claim 1 , wherein identifying the first path in the graph reference comprises following outgoing edges of successive nodes until reaching a last node having no outgoing edges. 3. The system of claim 2 , wherein identifying the second path in the graph reference comprises backtracking from the last node having no outgoing edges. 4. The system of claim 3 , wherein backtracking from the last node having no outgoing edges further comprises identifying a node having a previously unfollowed outgoing edge, following the previously unfollowed outgoing edge, and following outgoing edges of successive nodes until reaching the last node having no outgoing edges. 5. The system of claim 1 , wherein each node further comprises a followed indicator that indicates which of its outgoing directed edges have been followed, and identifying the first path in the graph reference further comprises updating the followed indicator when following an outgoing edge from that node. 6. The system of claim 5 , wherein each node further comprises a visited indicator that indicates whether all of its outgoing directed edges have been followed, wherein identifying the first path in the graph reference further comprises: determining whether a node's followed indicator indicates that all of its outgoing directed edges have been followed; updating the node's visited indicator to indicate that all of the outgoing directed edges have been followed; resetting the node's followed indicator to indicate that all of the outgoing directed edges have not been followed; and backtracking from the node. 7. The system of claim 6 , wherein comparing the at least one second nucleotide subsequence of the second plurality of nucleotide subsequences with the sequence read further comprises determining whether the at least one second nucleotide subsequence is generated entirely from nodes having visited indicators that indicate all of the outgoing directed edges of those nodes have been followed. 8. The system of claim 1 , wherein each node further comprises a visited indicator that indicates whether all of that node's outgoing directed edges have been followed, and comparing the at least one second nucleotide subsequence of the second plurality of nucleotide subsequences further comprises: determining whether each of the nodes associated with the at least one second nucleotide subsequence have visited indicators that indicate all of the outgoing directed edges of those nodes have been followed; if each of the nodes associated with the at least one second nucleotide subsequence have visited indicators that indicate all of the outgoing directed edges of those nodes have been followed, removing a most recently added node from the second path and considering whether any remaining nodes in the second path have an unfollowed outgoing edge; and if any of the nodes associated with the at least one second nucleotide subsequence have visited indicators that indicate all of the outgoing directed edges of those nodes have not been followed, comparing the at least one second nucleotide subsequence with the sequence read. 9. The system of claim 1 , wherein identifying the second path in the graph reference comprises: identifying a quantity of nodes within the graph reference; traversing a portion of the graph reference; and modifying traversal of the graph reference in response to a determination that a region of the graph reference has been previously in the first path. 10. The system of claim 1 , wherein the sequence read comprises a k-mer of the sequence read, wherein comparing nucleotide subsequences with the sequence read comprises comparing the k-mer of the sequence read with the nucleotide subsequences. 11. The system of claim 10 , wherein the k-mer begins from a first base of the sequence read. 12. A method for aligning a sequence read of a plurality of sequence reads to a graph reference, the method comprising: using at least one computer hardware processor to perform: accessing the graph reference, the graph reference being stored in at least one non-transitory computer-readable storage medium and comprising a plurality of nodes connected by a plurality of edges, at least one node of the plurality of nodes having an associated nucleotide sequence; accessing the plurality of sequence reads; aligning the sequence read of the plurality of sequence reads to the graph reference, the aligning comprising: selecting a first node from the plurality of nodes; identifying a first path in the graph reference, the first path starting from the first node and comprising at least one child node of the first node; generating a first nucleotide sequence comprising a first plurality of nucleotide subsequences at least in part by concatenating nucleotide sequences associated with at least some nodes in the first path; comparing at least one first nucleotide subsequence of the first plurality of nucleotide subsequences with the sequence read; identifying a second path in the graph reference, the second path starting from the first node and comprising at least one node not in the first path; generating a second nucleotide sequence comprising a second plurality of nucleotide subsequences at least in part by concatenating nucleotide sequences associated with at least some nodes in the second path; comparing at least one second nucleotide subsequence of the sec

Assignees

Inventors

Classifications

  • G06V20/693Primary

    Acquisition · CPC title

  • Graph matching (graphical image representation G06V30/18181) · CPC title

  • Recognition of patterns in DNA microarrays · CPC title

  • Matching; Classification · CPC title

  • Sequence assembly · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11062793B2 cover?
Various embodiments of the disclosure relate to systems and methods for aligning a sequence read to a graph reference. In one embodiment, the method comprises selecting a first node from a graph reference, the graph reference comprising a plurality of nodes connected by a plurality of directed edges, at least one node of the plurality of nodes having a nucleotide sequence. The method further co…
Who is the assignee on this patent?
Seven Bridges Genomics Inc
What technology area does this patent fall under?
Primary CPC classification G06V20/693. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jul 13 2021 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).