What technology area does this patent fall under?

Primary CPC classification G06F16/319. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue Sep 10 2024 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

DNA alignment using a hierarchical inverted index table

US12087403B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12087403-B2
Application number	US-202318114065-A
Country	US
Kind code	B2
Filing date	Feb 24, 2023
Priority date	Oct 21, 2015
Publication date	Sep 10, 2024
Grant date	Sep 10, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

System and method for constructing a hierarchical index table usable for matching a search sequence to reference data. The index table may be constructed to contain entries associated with an exhaustive list of all subsequences of a given length, wherein each entry contains the number and locations of matches of each subsequence in the reference data. The hierarchical index table may be constructed in an iterative manner, wherein entries for each lengthened subsequence are selectively and iteratively constructed based on the number of matches being greater than each of a set of respective thresholds. The hierarchical index table may be used to search for matches between a search sequence and reference data, and to perform misfit identification and characterization upon each respective candidate match.

First claim

Opening claim text (preview).

What is claimed is: 1. A multiprocessor array (MPA) comprising interspersed processors and communication elements coupled to at least one non-transitory computer-readable memory medium, wherein the at least one non-transitory computer-readable memory medium comprises program instructions executable by the MPA to: determine a length and a type of an indel in a search sequence using an anchor position in the search sequence, wherein the search sequence is matched to reference data at a match location; determine a starting position of the indel, wherein, in determining the starting position of the indel, the program instructions are executable to cause the MPA to: compute a first running error sum between the search sequence and the reference data matched at the match location; and compute a second running error sum between the search sequence and the reference data matched at an offset from the match location, wherein the offset is based on the type and length of the indel; determine the starting position of the indel to correspond to a minimum of the second running error sum subtracted from the first running error sum; and store the length, size, and starting position of the indel in the non-transitory computer-readable memory medium. 2. The MPA of claim 1 , wherein the program instructions are further executable to cause the MPA to: select the anchor position in the search sequence, wherein, in selecting the anchor position in the search sequence, the program instructions are executable to cause the MPA to: determine a first number of mismatches at a low end of the search sequence and a second number of mismatches at a high end of the search sequence; and select the anchor position to be the low end or the high end based on a comparison of the first and second numbers. 3. The MPA of claim 1 , wherein, in determining the length and the type of the indel using the anchor position, the program instructions are executable to cause the MPA to: for each of a plurality of offsets from the match location: determine a respective number of mismatches between the search sequence and the reference data within a predetermined number of locations from an opposite end of the search sequence from the anchor position; determine the length of the indel to correspond to a magnitude of a first offset of the plurality of offsets corresponding to a smallest number of mismatches; and determine the type of the indel to be an insertion when the first offset is negative and a deletion when the first offset is positive. 4. The MPA of claim 1 , wherein the first and second running error sums are computed starting from the anchor position. 5. The MPA of claim 1 , wherein the program instructions are further executable to cause the MPA to: match the search sequence to the reference data using a hierarchical index table. 6. The MPA of claim 1 , wherein the reference data comprises a reference genome, and wherein the search sequence comprises a short read of a sample genome. 7. The MPA of claim 1 , wherein the program instructions are further executable to cause the MPA to: determine a running average of a number of mismatches of the search sequence to the reference data from a first end of the search sequence for a predetermined number of locations; and determine whether the running average is within a predetermined threshold of a first ratio, wherein determining the length, type and starting position of the indel is performed responsive to a determination that the running average is within the predetermined threshold of the first ratio. 8. The MPA of claim 7 , wherein the reference data comprises a reference genome, wherein the search sequence comprises a short read of a sample genome, and wherein the first ratio is ¾. 9. A non-transitory computer-readable memory medium comprising program instructions for characterizing an indel in a search sequence, wherein the program instructions are executable to: determine a length and a type of the indel using an anchor position in the search sequence, wherein the search sequence is matched to reference data at a match location; determine a starting position of the indel, wherein, in determining the starting position of the indel, the program instructions are executable to: compute a first running error sum between the search sequence and the reference data matched at the match location; and compute a second running error sum between the search sequence and the reference data matched at an offset from the match location, wherein the offset is based on the type and length of the indel; determine the starting position of the indel to correspond to a minimum of the second running error sum subtracted from the first running error sum; and store the length, size, and starting position of the indel in the non-transitory computer-readable memory medium. 10. The non-transitory computer-readable memory medium of claim 9 , wherein the program instructions are further executable to: select the anchor position in the search sequence, wherein, in selecting the anchor position in the search sequence, the program instructions are executable to: determine a first number of mismatches at a low end of the search sequence and a second number of mismatches at a high end of the search sequence; and select the anchor position to be the low end or the high end based on a comparison of the first and second numbers. 11. The non-transitory computer-readable memory medium of claim 9 , wherein, in determining the length and the type of the indel using the anchor position, the program instructions are executable to: for each of a plurality of offsets from the match location: determine a respective number of mismatches between the search sequence and the reference data within a predetermined number of locations from an opposite end of the search sequence from the anchor position; determine the length of the indel to correspond to a magnitude of a first offset of the plurality of offsets corresponding to a smallest number of mismatches; and determine the type of the indel to be an insertion when the first offset is negative and a deletion when the first offset is positive. 12. The non-transitory computer-readable memory medium of claim 9 , wherein the first and second running error sums are computed starting from the anchor position. 13. The non-transitory computer-readable memory medium of claim 9 , wherein the reference data comprises a reference genome, and wherein the search sequence comprises a short read of a sample genome. 14. A method for characterizing an indel in a search sequence, wherein the search sequence is matched to reference data at a match location, the method comprising: determining a length and a type of the indel using an anchor position in the search sequence; determining a starting position of the indel, comprising: computing a first running error sum between the search sequence and the reference data matched at the match location; and computing a second running error sum between the search sequence and the reference data matched at an offset from the match location, wherein the offset is based on the type and length of the indel; determining the starting position of the indel to correspond to a minimum of the second running error sum subtracted from the first running error sum; and storing the length, size, and starting position of the indel in a non-transitory computer-readable memory medium. 15. The method of claim 14 , further comprising: selecting the anchor position in the search sequence, comprising: determining a first num

Assignees

Coherent Logix Inc

Inventors

Classifications

G16B50/30
Data warehousing; Computing architectures · CPC title
G06F16/319Primary
Inverted lists · CPC title
G16B50/00
ICT programming tools or database systems specially adapted for bioinformatics · CPC title
G16B30/00Primary
ICT specially adapted for sequence analysis involving nucleotides or amino acids · CPC title
G06F16/2228
Indexing structures · CPC title

Patent family

Related publications grouped by family.

View patent family 58557902

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12087403B2 cover?: System and method for constructing a hierarchical index table usable for matching a search sequence to reference data. The index table may be constructed to contain entries associated with an exhaustive list of all subsequences of a given length, wherein each entry contains the number and locations of matches of each subsequence in the reference data. The hierarchical index table may be constru…
Who is the assignee on this patent?: Coherent Logix Inc
What technology area does this patent fall under?: Primary CPC classification G06F16/319. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue Sep 10 2024 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).