Bioinformatics systems, apparatuses, and methods executed on an integrated circuit processing platform
US-11842796-B2 · Dec 12, 2023 · US
US12087403B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12087403-B2 |
| Application number | US-202318114065-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 24, 2023 |
| Priority date | Oct 21, 2015 |
| Publication date | Sep 10, 2024 |
| Grant date | Sep 10, 2024 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
System and method for constructing a hierarchical index table usable for matching a search sequence to reference data. The index table may be constructed to contain entries associated with an exhaustive list of all subsequences of a given length, wherein each entry contains the number and locations of matches of each subsequence in the reference data. The hierarchical index table may be constructed in an iterative manner, wherein entries for each lengthened subsequence are selectively and iteratively constructed based on the number of matches being greater than each of a set of respective thresholds. The hierarchical index table may be used to search for matches between a search sequence and reference data, and to perform misfit identification and characterization upon each respective candidate match.
Opening claim text (preview).
What is claimed is: 1. A multiprocessor array (MPA) comprising interspersed processors and communication elements coupled to at least one non-transitory computer-readable memory medium, wherein the at least one non-transitory computer-readable memory medium comprises program instructions executable by the MPA to: determine a length and a type of an indel in a search sequence using an anchor position in the search sequence, wherein the search sequence is matched to reference data at a match location; determine a starting position of the indel, wherein, in determining the starting position of the indel, the program instructions are executable to cause the MPA to: compute a first running error sum between the search sequence and the reference data matched at the match location; and compute a second running error sum between the search sequence and the reference data matched at an offset from the match location, wherein the offset is based on the type and length of the indel; determine the starting position of the indel to correspond to a minimum of the second running error sum subtracted from the first running error sum; and store the length, size, and starting position of the indel in the non-transitory computer-readable memory medium. 2. The MPA of claim 1 , wherein the program instructions are further executable to cause the MPA to: select the anchor position in the search sequence, wherein, in selecting the anchor position in the search sequence, the program instructions are executable to cause the MPA to: determine a first number of mismatches at a low end of the search sequence and a second number of mismatches at a high end of the search sequence; and select the anchor position to be the low end or the high end based on a comparison of the first and second numbers. 3. The MPA of claim 1 , wherein, in determining the length and the type of the indel using the anchor position, the program instructions are executable to cause the MPA to: for each of a plurality of offsets from the match location: determine a respective number of mismatches between the search sequence and the reference data within a predetermined number of locations from an opposite end of the search sequence from the anchor position; determine the length of the indel to correspond to a magnitude of a first offset of the plurality of offsets corresponding to a smallest number of mismatches; and determine the type of the indel to be an insertion when the first offset is negative and a deletion when the first offset is positive. 4. The MPA of claim 1 , wherein the first and second running error sums are computed starting from the anchor position. 5. The MPA of claim 1 , wherein the program instructions are further executable to cause the MPA to: match the search sequence to the reference data using a hierarchical index table. 6. The MPA of claim 1 , wherein the reference data comprises a reference genome, and wherein the search sequence comprises a short read of a sample genome. 7. The MPA of claim 1 , wherein the program instructions are further executable to cause the MPA to: determine a running average of a number of mismatches of the search sequence to the reference data from a first end of the search sequence for a predetermined number of locations; and determine whether the running average is within a predetermined threshold of a first ratio, wherein determining the length, type and starting position of the indel is performed responsive to a determination that the running average is within the predetermined threshold of the first ratio. 8. The MPA of claim 7 , wherein the reference data comprises a reference genome, wherein the search sequence comprises a short read of a sample genome, and wherein the first ratio is ¾. 9. A non-transitory computer-readable memory medium comprising program instructions for characterizing an indel in a search sequence, wherein the program instructions are executable to: determine a length and a type of the indel using an anchor position in the search sequence, wherein the search sequence is matched to reference data at a match location; determine a starting position of the indel, wherein, in determining the starting position of the indel, the program instructions are executable to: compute a first running error sum between the search sequence and the reference data matched at the match location; and compute a second running error sum between the search sequence and the reference data matched at an offset from the match location, wherein the offset is based on the type and length of the indel; determine the starting position of the indel to correspond to a minimum of the second running error sum subtracted from the first running error sum; and store the length, size, and starting position of the indel in the non-transitory computer-readable memory medium. 10. The non-transitory computer-readable memory medium of claim 9 , wherein the program instructions are further executable to: select the anchor position in the search sequence, wherein, in selecting the anchor position in the search sequence, the program instructions are executable to: determine a first number of mismatches at a low end of the search sequence and a second number of mismatches at a high end of the search sequence; and select the anchor position to be the low end or the high end based on a comparison of the first and second numbers. 11. The non-transitory computer-readable memory medium of claim 9 , wherein, in determining the length and the type of the indel using the anchor position, the program instructions are executable to: for each of a plurality of offsets from the match location: determine a respective number of mismatches between the search sequence and the reference data within a predetermined number of locations from an opposite end of the search sequence from the anchor position; determine the length of the indel to correspond to a magnitude of a first offset of the plurality of offsets corresponding to a smallest number of mismatches; and determine the type of the indel to be an insertion when the first offset is negative and a deletion when the first offset is positive. 12. The non-transitory computer-readable memory medium of claim 9 , wherein the first and second running error sums are computed starting from the anchor position. 13. The non-transitory computer-readable memory medium of claim 9 , wherein the reference data comprises a reference genome, and wherein the search sequence comprises a short read of a sample genome. 14. A method for characterizing an indel in a search sequence, wherein the search sequence is matched to reference data at a match location, the method comprising: determining a length and a type of the indel using an anchor position in the search sequence; determining a starting position of the indel, comprising: computing a first running error sum between the search sequence and the reference data matched at the match location; and computing a second running error sum between the search sequence and the reference data matched at an offset from the match location, wherein the offset is based on the type and length of the indel; determining the starting position of the indel to correspond to a minimum of the second running error sum subtracted from the first running error sum; and storing the length, size, and starting position of the indel in a non-transitory computer-readable memory medium. 15. The method of claim 14 , further comprising: selecting the anchor position in the search sequence, comprising: determining a first num
Data warehousing; Computing architectures · CPC title
Inverted lists · CPC title
ICT programming tools or database systems specially adapted for bioinformatics · CPC title
ICT specially adapted for sequence analysis involving nucleotides or amino acids · CPC title
Indexing structures · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.