Methods and systems for aligning sequences in the presence of repeating elements
US-2015199474-A1 · Jul 16, 2015 · US
US11810648B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11810648-B2 |
| Application number | US-201916663243-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 24, 2019 |
| Priority date | Jan 7, 2016 |
| Publication date | Nov 7, 2023 |
| Grant date | Nov 7, 2023 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Systems and methods for analyzing genomic information can include obtaining a sequence read including genetic information; identifying, within a graph representing a reference genome, a plurality of candidate mapping positions that relate to the genetic information, the graph comprising nodes representing genetic sequences and edges connecting pairs of nodes; determining, by means of a computer system, whether an alignment with the graph surrounding each of the plurality of candidate mapping positions is advanced or basic; and performing for each candidate mapping position, by means of the computer system, a local alignment based on whether the local alignment is advanced or basic. The advanced local alignment can include a first-local-alignment algorithm, and the basic local alignment includes a second-local-alignment algorithm. Based on the local alignments, the mapped position of the sequence read can be identified within the genome.
Opening claim text (preview).
What is claimed is: 1. A method, comprising: using at least one hardware processor to perform: accessing data indicative of: a sequence read including genetic information; and a graph representing a reference genome and variation in the reference genome, the graph comprising nodes representing genetic sequences and edges connecting at least some of the nodes; identifying a first portion of the graph including a first candidate mapping position; selecting, from among multiple alignment algorithms and based on a measure of complexity of the first portion of the graph, an alignment algorithm to use for aligning the sequence read to the first portion of the graph, the multiple alignment algorithms including a first alignment algorithm and a second alignment algorithm different from the first alignment algorithm, the selecting comprising: determining a number of paths through the first portion of the graph or a number of nodes in the first portion of the graph, selecting the first alignment algorithm as the alignment algorithm to use for aligning the sequence read to the first portion of the graph, when either the number of paths through the first portion of the graph is greater than a threshold number of paths or when the number of nodes is greater than a threshold number of nodes, and selecting the second alignment algorithm as the alignment algorithm to use for aligning the sequence read to the first portion of the graph, when either the number of paths through the first portion of the graph is less than or equal to the threshold number of paths or when the number of nodes is less than or equal to the threshold number of nodes; and aligning the sequence read to the first portion of the graph using the selected alignment algorithm. 2. The method of claim 1 , further comprising using the at least one hardware processor to perform: identifying a second portion of the graph including a second candidate mapping position; selecting, from among the multiple alignment algorithms and based on a measure of complexity of the second portion of the graph, a second alignment algorithm to use for aligning the sequence read to the second portion of the graph; and aligning the sequence read to the second portion of the graph using the selected second alignment algorithm for aligning the sequence read to the second portion of the graph. 3. The method of claim 2 , further comprising using the at least one hardware processor to perform: determining, based on the alignment of the sequence read to the first portion of the graph and the alignment of the sequence read to the second portion of the graph, whether to align the sequence read to the graph using the first portion of the graph or the second portion of the graph. 4. The method of claim 3 , further comprising using the at least one hardware processor to perform: determining a first measure of quality for the alignment of the sequence read to the first portion of the graph; determining a second measure of quality for the alignment of the sequence read to the second portion of the graph; and ranking the alignment of the sequence read to the first portion of the graph and the alignment of the sequence read to the second portion of the graph based on the first measure of quality and the second measure of quality. 5. The method of claim 1 , further comprising using the at least one hardware processor to perform: determining the number of paths through the first portion of the graph. 6. The method of claim 5 , wherein selecting the alignment algorithm from among the multiple alignment algorithms based on the measure of complexity comprises: determining whether the number of paths through the first portion of the graph is less than or equal to the threshold number of paths; when it is determined that the number of paths is greater than the threshold number of paths, selecting the first alignment algorithm of the multiple alignment algorithms; and when it is determined that the number of paths is less than or equal to the threshold number of paths, selecting the second alignment algorithm of the multiple alignment algorithms. 7. The method of claim 6 , wherein the threshold number of paths is selected from the group consisting of two paths to twenty paths. 8. The method of claim 6 , wherein the first alignment algorithm is a graph-based alignment algorithm. 9. The method of claim 6 , wherein the second alignment algorithm is a linear alignment algorithm or a pattern matching algorithm. 10. The method of claim 1 , further comprising: determining the number of nodes in the first portion of the graph. 11. The method of claim 10 , wherein selecting the alignment algorithm from among the multiple alignment algorithms based on the measure of complexity comprises: determining whether the number of nodes is less than the threshold number of nodes; when it is determined that the number of nodes is greater than the threshold number of nodes, selecting the first alignment algorithm of the multiple alignment algorithms; and when it is determined that the number of nodes is less than or equal to the threshold number of nodes, selecting the second alignment algorithm of the multiple alignment algorithms. 12. The method of claim 11 , wherein the first alignment algorithm is a graph-based alignment algorithm. 13. The method of claim 11 , wherein the second alignment algorithm is a linear alignment algorithm or a pattern matching algorithm. 14. The method of claim 11 , wherein the threshold number of nodes is less than or equal to ten nodes. 15. The method of claim 1 , wherein aligning the sequence read to the first portion of the graph using the selected alignment algorithm comprises: generating a plurality of linear sequences based on the first portion of the graph, wherein each linear sequence of the plurality of linear sequences represents a respective path through the first portion of the graph; and aligning the sequence read against each of the plurality of linear sequences using the selected alignment algorithm. 16. The method of claim 15 , further comprising using the at least one hardware processor to perform a depth first search of the first portion of the graph to generate the plurality of the linear sequences. 17. A system comprising: at least one processor; and at least one non-transitory machine-readable memory device storing processor-executable instructions that, when executed by the at least one processor, cause the at least one processor to perform: accessing data indicative of: a sequence read including genetic information; and a graph representing a reference genome and variation in the reference genome, the graph comprising nodes representing genetic sequences and edges connecting at least some of the nodes; identifying a first portion of the graph including a first candidate mapping position; selecting, from among multiple alignment algorithms and based on a measure of complexity of the first portion of the graph, an alignment algorithm to use for aligning the sequence read to the first portion of the graph, the multiple alignment algorithms including a first alignment algorithm and a second alignment algorithm different from the first alignment algorithm, the selecting comprising: determining a number of paths through the first portion of the graph or a number of nodes in the first portion of the graph, selecting the first alignment algorithm as the alignment algorithm to use for aligning the sequence read to the first portion of the graph, when either the number of paths through the first portio
Related publications grouped by family.
Answers are generated from the same data shown on this page.