What technology area does this patent fall under?

Primary CPC classification G16B45/00. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue Jun 11 2019 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Systems and methods for aligning sequences to graph references

US10319465B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10319465-B2
Application number	US-201615353105-A
Country	US
Kind code	B2
Filing date	Nov 16, 2016
Priority date	Nov 16, 2016
Publication date	Jun 11, 2019
Grant date	Jun 11, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Various embodiments of the disclosure relate to systems and methods for aligning a sequence read to a graph reference. In one embodiment, the method comprises selecting a first node from a graph reference, the graph reference comprising a plurality of nodes connected by a plurality of directed edges, at least one node of the plurality of nodes having a nucleotide sequence. The method further comprises traversing the graph reference according to a depth-first search, and comparing a sequence read to nucleotide sequences generated from the traversal of the graph reference. The traversal of the graph is then modified in response to a determination that each and every node associated with a given nucleotide sequence was previously evaluated.

First claim

Opening claim text (preview).

What is claimed is: 1. A system for aligning a sequence read to a graph reference, the system comprising: at least one computer hardware processor; and at least one non-transitory computer-readable storage medium storing: a graph reference, the graph reference comprising a plurality of nodes connected by a plurality of edges, at least one node of the plurality of nodes having an associated nucleotide sequence; a plurality of sequence reads; and processor-executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform: selecting a first node of the plurality of nodes; identifying a first path by traversing the graph reference, the first path starting from the first node and comprising at least one child node of the first node; comparing at least one first nucleotide sequence generated from the first path with the sequence read, wherein the at least one first nucleotide sequence is generated at least in part by concatenating associated nucleotide sequences from nodes in the first path into the at least one first nucleotide sequence; identifying a second path by traversing the graph reference, the second path starting from the first node and comprising at least one node not considered by the first path; comparing at least one second nucleotide sequence generated from the second path with the sequence read, the comparing comprising determining whether the at least one second nucleotide sequence generated from the second path was previously generated by the first path, and removing one or more nodes from the identified second path based on the determination; determining a best-fit position of the sequence read on the graph reference; and reporting the best-fit position of the sequence read as the aligned position of the sequence read on the graph reference. 2. The system of claim 1 , wherein traversing the graph reference comprises performing a depth-first search. 3. The system of claim 1 , wherein a path comprises one or more nodes. 4. The system of claim 1 , wherein identifying a first path by traversing the graph reference comprises following outgoing edges of successive nodes until reaching a last node having no outgoing edges. 5. The system of claim 4 , wherein identifying a second path by traversing the graph reference comprises backtracking from the last node having no outgoing edges. 6. The system of claim 5 , wherein backtracking from the last node having no outgoing edges further comprises identifying a node having a previously unfollowed outgoing edge, following the previously unfollowed outgoing edge, and following outgoing edges of successive nodes until reaching a last node having no outgoing edges. 7. The system of claim 1 , wherein each node further comprises a followed indicator that indicates which of its outgoing directed edges have been traversed, and traversing the graph reference further comprises updating the followed indicator when following an outgoing edge from that node. 8. The system of claim 7 , wherein each node further comprises a visited indicator that indicates whether all of its outgoing directed edges have been traversed, wherein traversing the graph reference further comprises: determining whether a node's followed indicator indicates that all of its outgoing directed edges have been traversed; updating the node's visited indicator to indicate that all of the outgoing directed edges have been traversed; resetting the node's followed indicator to indicate that all of the outgoing directed edges have not been traversed; and backtracking from the node. 9. The system of claim 8 , wherein comparing at least one second nucleotide sequence generated from the second path with the sequence read further comprises considering whether the at least one second nucleotide sequence is generated entirely from nodes having visited indicators that indicate all of the outgoing directed edges of those nodes have been traversed. 10. The system of claim 1 , wherein comparing at least one first nucleotide sequence generated from the first path with the sequence read further comprises: identifying a first set of substrings of the at least one first nucleotide sequence; and comparing the first set of substrings to the sequence read. 11. The system of claim 10 , wherein comparing at least one second nucleotide sequence generated from the second path with the sequence read comprises: concatenating nucleotide sequences associated with the nodes in the second path into a second concatenated nucleotide sequence; identifying a second set of substrings of the second concatenated nucleotide sequence; comparing each substring of the second set of substrings to the sequence read if and only if a substring is not within the first set of substrings. 12. The system of claim 11 , wherein comparing each substring of the second set of substrings to the sequence read if and only if a substring is not within the first set of substrings comprises considering whether a substring is generated from at least one node that was not present in the first path. 13. The system of claim 12 , wherein each node further comprises a visited indicator that indicates whether all of that node's outgoing directed edges have been traversed, and comparing each substring of the second set of substrings further comprises: determining whether each of the nodes associated with a substring have visited indicators that indicate all of the outgoing directed edges of those nodes have been traversed; if each of the nodes associated with the substring have visited indicators that indicate all of the outgoing directed edges of those nodes have been traversed, removing the most recently added node from the second path and considering whether any remaining nodes in the second path have an unfollowed outgoing edge; and if any of the nodes associated with the substring have visited indicators that indicate all of the outgoing directed edges of those nodes have not been traversed, comparing the substring with the sequence read. 14. The system of claim 13 , wherein comparing the substring with the sequence read further comprises generating the substring from those nodes. 15. The system of claim 1 , wherein identifying a second path by traversing the graph reference comprises: identifying a quantity of nodes within the graph reference; traversing a portion of the graph reference; and modifying traversal of the graph in response to a determination that a region of the graph has been previously considered by the first path. 16. The system of claim 1 , wherein identifying a first path by traversing the graph reference comprises: considering whether the first node has an unfollowed child node; if the first node has an unfollowed child node, adding the unfollowed child node to the first path. 17. The system of claim 1 , wherein the sequence read comprises a k-mer of the sequence read, wherein comparing nucleotide sequences generated from a path with the sequence read comprises comparing a k-mer of the sequence read with the nucleotide sequences. 18. The system of claim 17 , wherein the k-mer begins from the first base of the sequence read. 19. The system of claim 17 , wherein the length of the k-mer is between 5 and 30 base pairs. 20. The system of claim 17 , further comprising determining one or more best-fit positions of the k-mer of the sequence read. 21. The system of claim 20 , further comprising determining a best-fit positio

Assignees

Seven Bridges Genomics Inc

Inventors

Classifications

G16B30/20
Sequence assembly · CPC title
G16B45/00Primary
ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks · CPC title
G06K9/00134
Physics · mapped topic
G06K9/00147
Physics · mapped topic
G06K9/6892
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 62108266

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10319465B2 cover?: Various embodiments of the disclosure relate to systems and methods for aligning a sequence read to a graph reference. In one embodiment, the method comprises selecting a first node from a graph reference, the graph reference comprising a plurality of nodes connected by a plurality of directed edges, at least one node of the plurality of nodes having a nucleotide sequence. The method further co…
Who is the assignee on this patent?: Seven Bridges Genomics Inc
What technology area does this patent fall under?: Primary CPC classification G16B45/00. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue Jun 11 2019 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).