Systems and methods for adaptive local alignment for graph genomes

US11810648B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11810648-B2
Application numberUS-201916663243-A
CountryUS
Kind codeB2
Filing dateOct 24, 2019
Priority dateJan 7, 2016
Publication dateNov 7, 2023
Grant dateNov 7, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for analyzing genomic information can include obtaining a sequence read including genetic information; identifying, within a graph representing a reference genome, a plurality of candidate mapping positions that relate to the genetic information, the graph comprising nodes representing genetic sequences and edges connecting pairs of nodes; determining, by means of a computer system, whether an alignment with the graph surrounding each of the plurality of candidate mapping positions is advanced or basic; and performing for each candidate mapping position, by means of the computer system, a local alignment based on whether the local alignment is advanced or basic. The advanced local alignment can include a first-local-alignment algorithm, and the basic local alignment includes a second-local-alignment algorithm. Based on the local alignments, the mapped position of the sequence read can be identified within the genome.

First claim

Opening claim text (preview).

What is claimed is: 1. A method, comprising: using at least one hardware processor to perform: accessing data indicative of: a sequence read including genetic information; and a graph representing a reference genome and variation in the reference genome, the graph comprising nodes representing genetic sequences and edges connecting at least some of the nodes; identifying a first portion of the graph including a first candidate mapping position; selecting, from among multiple alignment algorithms and based on a measure of complexity of the first portion of the graph, an alignment algorithm to use for aligning the sequence read to the first portion of the graph, the multiple alignment algorithms including a first alignment algorithm and a second alignment algorithm different from the first alignment algorithm, the selecting comprising: determining a number of paths through the first portion of the graph or a number of nodes in the first portion of the graph, selecting the first alignment algorithm as the alignment algorithm to use for aligning the sequence read to the first portion of the graph, when either the number of paths through the first portion of the graph is greater than a threshold number of paths or when the number of nodes is greater than a threshold number of nodes, and selecting the second alignment algorithm as the alignment algorithm to use for aligning the sequence read to the first portion of the graph, when either the number of paths through the first portion of the graph is less than or equal to the threshold number of paths or when the number of nodes is less than or equal to the threshold number of nodes; and aligning the sequence read to the first portion of the graph using the selected alignment algorithm. 2. The method of claim 1 , further comprising using the at least one hardware processor to perform: identifying a second portion of the graph including a second candidate mapping position; selecting, from among the multiple alignment algorithms and based on a measure of complexity of the second portion of the graph, a second alignment algorithm to use for aligning the sequence read to the second portion of the graph; and aligning the sequence read to the second portion of the graph using the selected second alignment algorithm for aligning the sequence read to the second portion of the graph. 3. The method of claim 2 , further comprising using the at least one hardware processor to perform: determining, based on the alignment of the sequence read to the first portion of the graph and the alignment of the sequence read to the second portion of the graph, whether to align the sequence read to the graph using the first portion of the graph or the second portion of the graph. 4. The method of claim 3 , further comprising using the at least one hardware processor to perform: determining a first measure of quality for the alignment of the sequence read to the first portion of the graph; determining a second measure of quality for the alignment of the sequence read to the second portion of the graph; and ranking the alignment of the sequence read to the first portion of the graph and the alignment of the sequence read to the second portion of the graph based on the first measure of quality and the second measure of quality. 5. The method of claim 1 , further comprising using the at least one hardware processor to perform: determining the number of paths through the first portion of the graph. 6. The method of claim 5 , wherein selecting the alignment algorithm from among the multiple alignment algorithms based on the measure of complexity comprises: determining whether the number of paths through the first portion of the graph is less than or equal to the threshold number of paths; when it is determined that the number of paths is greater than the threshold number of paths, selecting the first alignment algorithm of the multiple alignment algorithms; and when it is determined that the number of paths is less than or equal to the threshold number of paths, selecting the second alignment algorithm of the multiple alignment algorithms. 7. The method of claim 6 , wherein the threshold number of paths is selected from the group consisting of two paths to twenty paths. 8. The method of claim 6 , wherein the first alignment algorithm is a graph-based alignment algorithm. 9. The method of claim 6 , wherein the second alignment algorithm is a linear alignment algorithm or a pattern matching algorithm. 10. The method of claim 1 , further comprising: determining the number of nodes in the first portion of the graph. 11. The method of claim 10 , wherein selecting the alignment algorithm from among the multiple alignment algorithms based on the measure of complexity comprises: determining whether the number of nodes is less than the threshold number of nodes; when it is determined that the number of nodes is greater than the threshold number of nodes, selecting the first alignment algorithm of the multiple alignment algorithms; and when it is determined that the number of nodes is less than or equal to the threshold number of nodes, selecting the second alignment algorithm of the multiple alignment algorithms. 12. The method of claim 11 , wherein the first alignment algorithm is a graph-based alignment algorithm. 13. The method of claim 11 , wherein the second alignment algorithm is a linear alignment algorithm or a pattern matching algorithm. 14. The method of claim 11 , wherein the threshold number of nodes is less than or equal to ten nodes. 15. The method of claim 1 , wherein aligning the sequence read to the first portion of the graph using the selected alignment algorithm comprises: generating a plurality of linear sequences based on the first portion of the graph, wherein each linear sequence of the plurality of linear sequences represents a respective path through the first portion of the graph; and aligning the sequence read against each of the plurality of linear sequences using the selected alignment algorithm. 16. The method of claim 15 , further comprising using the at least one hardware processor to perform a depth first search of the first portion of the graph to generate the plurality of the linear sequences. 17. A system comprising: at least one processor; and at least one non-transitory machine-readable memory device storing processor-executable instructions that, when executed by the at least one processor, cause the at least one processor to perform: accessing data indicative of: a sequence read including genetic information; and a graph representing a reference genome and variation in the reference genome, the graph comprising nodes representing genetic sequences and edges connecting at least some of the nodes; identifying a first portion of the graph including a first candidate mapping position; selecting, from among multiple alignment algorithms and based on a measure of complexity of the first portion of the graph, an alignment algorithm to use for aligning the sequence read to the first portion of the graph, the multiple alignment algorithms including a first alignment algorithm and a second alignment algorithm different from the first alignment algorithm, the selecting comprising: determining a number of paths through the first portion of the graph or a number of nodes in the first portion of the graph, selecting the first alignment algorithm as the alignment algorithm to use for aligning the sequence read to the first portion of the graph, when either the number of paths through the first portio

Assignees

Inventors

Classifications

  • G16B30/10Primary

    Sequence alignment; Homology search · CPC title

  • G16B30/00Primary

    ICT specially adapted for sequence analysis involving nucleotides or amino acids · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11810648B2 cover?
Systems and methods for analyzing genomic information can include obtaining a sequence read including genetic information; identifying, within a graph representing a reference genome, a plurality of candidate mapping positions that relate to the genetic information, the graph comprising nodes representing genetic sequences and edges connecting pairs of nodes; determining, by means of a computer…
Who is the assignee on this patent?
Seven Bridges Genomics Inc
What technology area does this patent fall under?
Primary CPC classification G16B30/10. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Nov 07 2023 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).