Reduced memory nucleotide sequence comparison

US10241970B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10241970-B2
Application numberUS-201615351372-A
CountryUS
Kind codeB2
Filing dateNov 14, 2016
Priority dateNov 14, 2016
Publication dateMar 26, 2019
Grant dateMar 26, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Comparisons between two nucleotide sequences can be performed by customized integrated circuitry that can implement a Smith Waterman analysis in a reduced memory footprint, storing and referencing only individual portions, or subsections, of a two-dimensional matrix that is representative of the comparison between the two nucleotide sequences. As the backtracking proceeds, backtracking metadata corresponding to a cell from a subsection that is not currently retained in memory can be required. Such a subsection can be regenerated from previously generated scores associated with checkpoint cells of the two-dimensional matrix that comprise two edges of the subsection being regenerated. Moreover, to further reduce memory consumption, the backtracking metadata stored for each cell can comprise four binary digits: two indicative of a directional assignment, one indicative of whether the corresponding cell is part of a deletion stretching across multiple contiguous cells, and one analogously indicative of insertions stretching across multiple contiguous cells.

First claim

Opening claim text (preview).

We claim: 1. A customized integrated circuit comprising: a calculation engine comprising circuitry that, during operation of the customized integrated circuit, enables the customized integrated circuit to: generate, for individual cells of a two-dimensional matrix, scores that are based on generated scores of prior cells and pairs of nucleotides that correspond to the individual cells, wherein the pairs of nucleotides each comprise one nucleotide from each of a first nucleotide sequence and a second nucleotide sequence that are being compared to one another; and generate backtrack metadata for the individual cells based at least in part on the generating of the scores; and a backtrack metadata unit comprising circuitry that, during operation of the customized integrated circuit, enables the customized integrated circuit to: store, in a cache, backtrack metadata of cells of only a first subsection of the two-dimensional matrix; determine that a first cell of the two-dimensional matrix, for which the backtrack metadata unit is to output a first backtrack metadata, is not in the first subsection; request, in response to the determining, the calculation engine to generate again backtrack metadata of cells of only a second subsection of the two-dimensional matrix, the calculation engine regenerating the backtrack metadata of the cells of only the second subsection based on scores of cells of the two-dimensional matrix that comprise two edges of the second subsection, the second subsection being mutually exclusive of the first subsection; and obtain, from the regenerated backtrack metadata of the cells of only the second subsection, stored in the cache, the first backtrack metadata of the first cell; wherein the customized integrated circuit is utilized in the comparing of the first and second nucleotide sequences, the comparing comprising generating a textual string comprising indicators of similarities and differences of the first nucleotide sequence as compared with the second nucleotide sequence. 2. The customized integrated circuit of claim 1 , wherein the calculation engine generates scores for the cells in the two-dimensional matrix in series, one cell at a time. 3. The customized integrated circuit of claim 1 , wherein the calculation engine generates scores for the cells in the two-dimensional matrix in parallel, generating scores for multiple cells at a same time. 4. The customized integrated circuit of claim 1 , further comprising a comparison string generator comprising circuitry that, during operation of the customized integrated circuit, enables the customized integrated circuit to: select the first cell; request the first backtrack metadata of the first cell from the backtrack metadata unit; and modify the textual string in accordance with the first backtrack metadata. 5. The customized integrated circuit of claim 4 , further comprising a score aggregation unit comprising circuitry that, during operation of the customized integrated circuit, enables the customized integrated circuit to: receive scores of cells of the two-dimensional matrix as generated by the calculation engine; retain a coordinates, within the two-dimensional matrix, of a highest score cell having a score that is greater in value than scores of other cells of the two-dimensional matrix. 6. The customized integrated circuit of claim 5 , wherein the selecting is informed by the retained coordinates. 7. The customized integrated circuit of claim 4 , wherein the comparison string generator comprises further circuitry that, during operation of the customized integrated circuit, enables the customized integrated circuit to: select subsequent cells based on backtrack metadata of previously selected cells; and repeat the requesting and the modifying for the subsequent cells. 8. The customized integrated circuit of claim 1 , further comprising a checkpoint store unit comprising circuitry that, during operation of the customized integrated circuit, enables the customized integrated circuit to either: receive scores, as generated by the calculation engine, of cells of the two-dimensional matrix and then discard those of cells other than checkpoint cells; or request scores, as generated by the calculation engine, only of checkpoint cells of the two-dimensional matrix; wherein checkpoint cells are cells of the two-dimensional matrix that comprise edges of mutually exclusive subsections of the two-dimensional matrix, the mutually exclusive subsections comprising both the first and second subsections. 9. The customized integrated circuit of claim 8 , wherein the mutually exclusive subsections of the two-dimensional matrix are each a same size, except for subsections at two edges of the two-dimensional matrix that are remnants left over after a rest of the two-dimensional matrix was subdivided into the mutually exclusive subsections of the same size. 10. The customized integrated circuit of claim 9 , wherein the remnant subsections are at an opposite side of the two-dimensional matrix from the first subsection initially stored by the backtrack metadata unit. 11. The customized integrated circuit of claim 1 , wherein the first backtrack metadata comprises a first two-digit binary value indicative of a directional assignment associated with the first cell, a first one-digit binary value indicative of whether the first cell is part of a deletion stretching across multiple contiguous cells, and a second one-digit binary value indicative of whether the first cell is part of an insertion stretching across multiple contiguous cells. 12. The customized integrated circuit of claim 1 , wherein the backtrack metadata unit comprises further circuitry that, during operation of the customized integrated circuit, enables the customized integrated circuit to either: receive backtrack metadata, as generated by the calculation engine, of cells of the two-dimensional matrix and then discard those of cells other than cells of the first subsection; or request scores, as generated by the calculation engine, only of cells of the first subsection. 13. The customized integrated circuit of claim 1 , further comprising the cache, the cache being an on-chip cache, the customized integrated circuit being packaged as a chip. 14. One or more computing devices comprising one or more processing units; and one or more computer-readable storage media comprising computer-executable instructions, which, when executed by the one or more processing units, cause the one or more computing devices to: generate, for individual cells of a two-dimensional matrix, scores that are based on generated scores of prior cells and pairs of nucleotides that correspond to the individual cells, wherein the pairs of nucleotides each comprise one nucleotide from each of a first nucleotide sequence and a second nucleotide sequence that are being compared to one another; generate backtrack metadata for the individual cells based at least in part on the generating of the scores; store, in a cache, backtrack metadata of cells of only a first subsection of the two-dimensional matrix; generate again backtrack metadata of cells of only a second subsection of the two-dimensional matrix, based on scores of cells of the two-dimensional matrix that comprise two edges of the second subsection, if a first cell, for which a first backtrack metadata is to be determined, is in the second subsection, the second subsection being mutually exclusive of the first subsection; obtain, from the regenerated backtrack metadata of the cells of only the second subsection, stored in the cache, the first backtrack metadata of the fi

Assignees

Inventors

Classifications

  • G06F17/16Primary

    Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

  • ICT specially adapted for sequence analysis involving nucleotides or amino acids · CPC title

  • Physics · mapped topic

  • G16B30/10Primary

    Sequence alignment; Homology search · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10241970B2 cover?
Comparisons between two nucleotide sequences can be performed by customized integrated circuitry that can implement a Smith Waterman analysis in a reduced memory footprint, storing and referencing only individual portions, or subsections, of a two-dimensional matrix that is representative of the comparison between the two nucleotide sequences. As the backtracking proceeds, backtracking metadata…
Who is the assignee on this patent?
Microsoft Technology Licensing Llc
What technology area does this patent fall under?
Primary CPC classification G06F17/16. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Mar 26 2019 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).