Managing gene sequences

US10586609B2 · US · B2

Patent metadata
Publication number: US-10586609-B2
Application number: US-201514926051-A
Country: US
Kind code: B2
Filing date: Oct 29, 2015
Priority date: Oct 30, 2014
Publication date: Mar 10, 2020
Grant date: Mar 10, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and apparatus for determining similarity among gene sequences, for compressing a gene sequence, and for decompressing a gene sequence. The method for determining similarity between a first gene sequence and a second gene sequence includes: moving a sliding window of a predefined length on the first gene sequence and the second gene sequence respectively; extracting a first part $\mathrm{String1}_i$ of the first gene sequence within the sliding window, and a second part $\mathrm{String2}_i$ of the second gene sequence within the sliding window, during the $i$-th movement of the sliding window; and determining similarity between the first gene sequence and the second gene sequence based on the first part $\mathrm{String1}_i$ and the second part $\mathrm{String2}_i$. Also provided is an apparatus for the above method.
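The windowed comparison described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the helper names (`window_pairs`, `edit_distance`, `sequence_similarity`) are hypothetical, the window is assumed to move over both sequences in lockstep and to stop at the end of the shorter one, and the local score is assumed to be $\mathrm{similarity}_i = 1 - d_i / L$ for window length $L$ and Levenshtein distance $d_i$. The claims tie the local similarity to an edit distance and sum it over all $N$ window movements, but they do not state a normalization.

```python
from typing import Iterator, Tuple


def window_pairs(seq1: str, seq2: str, length: int, step: int) -> Iterator[Tuple[str, str]]:
    """Yield (String1_i, String2_i) for each movement i of a shared sliding window.

    Assumption: the window moves over both sequences in lockstep and stops
    once it would run past the end of the shorter sequence.
    """
    if length <= 0 or not 0 < step <= length:
        raise ValueError("require length > 0 and 0 < step <= length")
    limit = min(len(seq1), len(seq2)) - length
    for start in range(0, limit + 1, step):
        yield seq1[start:start + length], seq2[start:start + length]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def sequence_similarity(seq1: str, seq2: str, length: int, step: int) -> float:
    """similarity = sum of similarity_i, with similarity_i = 1 - d_i / length (assumed)."""
    return sum(1 - edit_distance(s1, s2) / length
               for s1, s2 in window_pairs(seq1, seq2, length, step))
```

For example, `sequence_similarity("ACGTACGT", "ACGTACGA", 4, 4)` compares two window pairs, one identical and one differing by a single substitution, and returns 1.75. Because the score is a plain sum, longer sequences with more windows score higher, so any threshold for reference selection would have to account for $N$ (for instance, by dividing by it).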

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for determining similarity between a first gene sequence of a first sample and a second gene sequence of a second sample, wherein the first sample is taken from a first organism and the second sample is taken from a second organism, the method comprising: moving a sliding window having a predefined length on the first gene sequence; moving the sliding window on the second gene sequence, simultaneously with moving the sliding window on the first gene sequence; extracting a first part $\mathrm{String1}_i$ of the first gene sequence that is present within the sliding window during an $i$-th movement of the sliding window; extracting a second part $\mathrm{String2}_i$ of the second gene sequence that is present within the sliding window during the $i$-th movement of the sliding window; determining the similarity between the first gene sequence and the second gene sequence based on a similarity between the first part $\mathrm{String1}_i$ and the second part $\mathrm{String2}_i$, thereby identifying the similarity of the first gene sequence and second gene sequence; selecting the first gene sequence as a reference gene sequence for the second gene sequence, based on the similarity between the first gene sequence and the second gene sequence meeting a predefined threshold; storing the reference gene sequence in a memory device; and storing the second gene sequence in the memory device as an identifier for the reference gene sequence plus a difference between the reference gene sequence and the second gene sequence, thereby minimizing an amount of data from the second gene sequence that is stored in the memory device.

2. The computer-implemented method according to claim 1, wherein moving the sliding window on the first gene sequence and moving the sliding window on the second gene sequence further comprises: moving the sliding window at a stepsize that is less than or equal to the predefined length.

3. The computer-implemented method according to claim 1, wherein determining the similarity between the first gene sequence and the second gene sequence based on the similarity between the first part $\mathrm{String1}_i$ and the second part $\mathrm{String2}_i$ during the $i$-th movement of the sliding window, comprises: calculating local similarity, $\mathrm{similarity}_i$, between the first part $\mathrm{String1}_i$ and the second part $\mathrm{String2}_i$; and determining the similarity between the first gene sequence and the second gene sequence based on the local similarity, $\mathrm{similarity}_i$.

4. The computer-implemented method according to claim 3, wherein calculating local similarity, $\mathrm{similarity}_i$, between the first part $\mathrm{String1}_i$ and the second part $\mathrm{String2}_i$ comprises: calculating the local similarity, $\mathrm{similarity}_i$, based on an edit distance, $d_i$, between the first part $\mathrm{String1}_i$ and the second part $\mathrm{String2}_i$.

5. The computer-implemented method according to claim 3, wherein determining similarity between the first gene sequence and the second gene sequence based on the local similarity, $\mathrm{similarity}_i$, comprises: calculating the similarity between the first gene sequence and the second gene sequence based on a formula $\mathrm{similarity} = \sum_{i=1}^{N} \mathrm{similarity}_i$, wherein $N$ is the number of movements of the sliding window.

6. An apparatus for determining similarity between a first gene sequence of a first sample and a second gene sequence of a second sample, wherein the first sample is taken from a first organism and the second sample is taken from a second organism, comprising: a processor device; and a memory communicatively coupled to the processor device, the memory storing a program product that, when executed by the processor device, causes the processor device to carry out steps for determining a similarity between a first gene sequence and a second gene sequence, the steps comprising: moving a sliding window of a predefined length on the first gene sequence; moving the sliding window on the second gene sequence, simultaneously with moving the sliding window on the first gene sequence; extracting a first part $\mathrm{String1}_i$ of the first gene sequence that is present within the sliding window during an $i$-th movement of the sliding window; extracting a second part $\mathrm{String2}_i$ of the second gene sequence that is present within the sliding window during the $i$-th movement of the sliding window; determining the similarity between the first gene sequence and the second gene sequence based on a similarity between the first part $\mathrm{String1}_i$ and the second part $\mathrm{String2}_i$; selecting the first gene sequence as a reference gene sequence for the second gene sequence, based on the similarity between the first gene sequence and the second gene sequence meeting a predefined threshold; storing the reference gene sequence in the memory; and storing the second gene sequence in the memory as an identifier for the reference gene sequence plus a difference between the reference gene sequence and the second gene sequence, thereby minimizing an amount of data from the second gene sequence that is stored in the memory.

7. The apparatus according to claim 6, wherein the moving the sliding window on the first gene sequence and the moving the sliding window on the second gene sequence further comprises: moving the sliding window at a stepsize that is less than or equal to the predefined length.

8. The apparatus according to claim 6, wherein the determining step of the computer-implemented method further comprises: calculating local similarity, $\mathrm{similarity}_i$, between the first part $\mathrm{String1}_i$ and the second part $\mathrm{String2}_i$ during the $i$-th movement of the sliding window; and determining the similarity between the first gene sequence and the second gene sequence based on the local similarity, $\mathrm{similarity}_i$.

9. The apparatus according to claim 8, wherein the calculating step of the computer-implemented method further comprises: calculating the local similarity, $\mathrm{similarity}_i$, based on an edit distance, $d_i$, between the first part $\mathrm{String1}_i$ and the second part $\mathrm{String2}_i$.

10. The apparatus according to claim 8, wherein the calculating step of the computer-implemented method further comprises: calculating the similarity between the first gene sequence and the second gene sequence based on a formula $\mathrm{similarity} = \sum_{i=1}^{N} \mathrm{similarity}_i$, wherein $N$ is the number of movements of the sliding window.

11. The apparatus according to claim 7, wherein the $i$-th movement of the sliding window is one of a plurality of movements of the sliding window, and wherein each movement of the plurality of movements is made at the same stepsize.

12. The apparatus according to claim 11, wherein the stepsize comprises a number of characters of the first gene sequence and of the second gene sequence by which the sliding window is shifted.

13. The apparatus according to claim 6, wherein the computer-implemented method further comprises: extracting a third part of the first gene sequence that is present within the sliding window during an $(i+1)$-th movement of the sliding window, wherein the third part of the first gene sequence contains a different character string from the first part of the first gene sequence; extracting a fourth part of the second gene sequence that is present within the sliding window during the $(i+1)$-th movement of the sliding window, wherein the fourth part of the second gene sequence contains a different character string from the second part of the second gene sequence, wherein the determining the similarity between the first gene sequence and the second gene sequence is further based on a similarity between the third part of the firs…
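Claims 1 and 6 end with a storage step: once the similarity meets a predefined threshold, the second sequence is kept only as an identifier for the reference sequence plus the difference between the two. The claims do not fix a difference format, so the sketch below assumes one: the non-matching opcodes from Python's standard `difflib.SequenceMatcher`, stored as `(tag, ref_start, ref_end, replacement)` tuples. The function names and tuple layout are hypothetical, and the similarity score and threshold are passed in rather than computed here.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Hypothetical difference format: (tag, ref_start, ref_end, replacement text).
Edit = Tuple[str, int, int, str]


def encode_against_reference(reference: str, target: str) -> List[Edit]:
    """Describe target as edits to reference (assumed format built on difflib opcodes)."""
    matcher = SequenceMatcher(None, reference, target, autojunk=False)
    return [(tag, i1, i2, target[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]  # equal stretches are recovered from the reference


def decode_from_reference(reference: str, edits: List[Edit]) -> str:
    """Rebuild the target by applying edits to the reference, left to right."""
    pieces, pos = [], 0
    for _tag, i1, i2, replacement in edits:
        pieces.append(reference[pos:i1])  # unchanged stretch before this edit
        pieces.append(replacement)        # replaced or inserted text ('' for a delete)
        pos = i2
    pieces.append(reference[pos:])        # unchanged tail after the last edit
    return "".join(pieces)


def compress_if_similar(ref_id: str, reference: str, target: str,
                        similarity: float, threshold: float) -> Optional[Tuple[str, List[Edit]]]:
    """Return (ref_id, edits) when similarity meets the threshold; None means store target in full."""
    if similarity < threshold:
        return None
    return ref_id, encode_against_reference(reference, target)
```

Decompression needs only the stored reference and the edit list, which is what lets a highly similar second sequence occupy far less memory than its full text. `difflib` is used here only because it ships with Python; a genomics pipeline would more likely use a dedicated alignment or variant encoding.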

Assignees

IBM

Classifications

  • G16B30/00 (primary)

    ICT specially adapted for sequence analysis involving nucleotides or amino acids (CPC title)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10586609B2 cover?
A method and apparatus for determining similarity among gene sequences, for compressing a gene sequence, and for decompressing a gene sequence. The method for determining similarity between a first gene sequence and a second gene sequence includes: moving a sliding window of a predefined length on the first gene sequence and the second gene sequence respectively; extracting a first part String …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G16B30/00. Mapped technology areas include Physics.
When was this patent published?
Published Mar 10, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).