Methods and computer program products for compression of sequencing data

US9864846B2 · US · B2

Patent metadata
Publication number: US-9864846-B2
Application number: US-201314018069-A
Country: US
Kind code: B2
Filing date: Sep 4, 2013
Priority date: Jan 31, 2012
Publication date: Jan 9, 2018
Grant date: Jan 9, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A compression method includes: measuring a waveform associated with a chemical event occurring on a sensor array, wherein the waveform comprises a plurality of measured values and the chemical event is indicative of a number of nucleotide incorporations in a genetic sequencing reaction; applying a first compression process to the waveform, the first compression process including a truncating of data corresponding to a portion of the waveform that is not related to nucleotide incorporations in the genetic sequencing reaction; and applying a second compression process to the waveform, the second compression process including a data substitution process that replaces at least a portion of the waveform with a plurality of coefficients representative of the portion of the waveform.
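The two stages named in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the fixed keep-window `[start, stop)`, the synthetic waveforms, the choice of 5 components, and all function names are assumptions made for illustration. Principal components (which the claims name as one form of the substitution coefficients) are computed here with a plain SVD.

```python
import numpy as np

def truncate(waveforms, start, stop):
    # First stage (assumed form): keep only samples in [start, stop);
    # samples outside the window are treated as unrelated to incorporation.
    return waveforms[:, start:stop]

def pca_compress(waveforms, k):
    # Second stage: replace each waveform with k coefficients of a linear
    # combination of principal component vectors.
    mean = waveforms.mean(axis=0)
    centered = waveforms - mean
    # Rows of vt are the principal component vectors, ordered by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]
    coeffs = centered @ components.T
    return coeffs, components, mean

def pca_decompress(coeffs, components, mean):
    # Approximate reconstruction from the stored coefficients (lossy).
    return coeffs @ components + mean

# Synthetic stand-in for sensor data: 200 sensors, 100 samples each.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
amplitudes = rng.uniform(0.5, 2.0, size=(200, 1))
rates = rng.uniform(5.0, 15.0, size=(200, 1))
raw = amplitudes * (1.0 - np.exp(-rates * t))
raw = raw + 0.01 * rng.standard_normal(raw.shape)

kept = truncate(raw, 20, 80)                      # 100 -> 60 samples
coeffs, components, mean = pca_compress(kept, k=5)
restored = pca_decompress(coeffs, components, mean)
rel_error = np.linalg.norm(restored - kept) / np.linalg.norm(kept)
print(kept.shape, coeffs.shape, rel_error)
```

In this sketch each sensor's 100 samples shrink to 5 coefficients, plus one shared mean vector and 5 shared component vectors for the whole group of sensors.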

First claim

Opening claim text (preview).

The invention claimed is:

1. A compression method, comprising: measuring a waveform associated with a chemical event occurring on a sensor array, wherein the waveform comprises a plurality of measured values and the chemical event is indicative of a number of nucleotide incorporations in a genetic sequencing reaction; and applying a first compression process to the waveform using a processor, the first compression process including a truncating of data corresponding to a portion of the waveform that is not related to nucleotide incorporations in the genetic sequencing reaction thereby forming a compressed data structure and storing the compressed data structure in a memory.

2. The method of claim 1, wherein the truncating of data comprises determining, for each of a plurality of sensors in the sensor array, a cut-off time point for the waveform for that sensor defining a data range to be truncated.

3. The method of claim 2, wherein each cut-off time point is determined by mining a plurality of past analysis runs for a given sensor array geometry.

4. The method of claim 2, wherein each cut-off time point is determined prior to every run or during a calibration procedure.

5. The method of claim 2, wherein each cut-off time point is factory pre-determined for a given sensor array geometry.

6. The method of claim 2, wherein the sensors are arranged in a plurality of regions in the sensor array, and wherein each cut-off time point for sensors in a given region is determined to have a common cut-off time point determined for sensors for that region.

7. The method of claim 6, wherein each common cut-off time point is determined by finding a best fit to a linear hinge model on a median trace for a region.

8. The method of claim 6, wherein each common cut-off time point is determined empirically and depends on a position of that region relative to other regions along a fluidic flow of nucleotides onto the sensor array.

9. The method of claim 8, wherein the cut-off time point of sensors in a region substantially near a fluidic inlet is different from the cut-off time point of sensors in a region substantially near a fluidic outlet.

10. The method of claim 1, further comprising: applying a second compression process to the waveform using the processor, the second compression process including a data substitution process that replaces at least a second portion of the waveform with a plurality of coefficients representative of the second portion of the waveform, wherein the second portion is related to nucleotide incorporations in the genetic sequencing reaction.

11. The method of claim 10, wherein the second compression process includes replacing the second portion of the waveform with a plurality of coefficients of a linear combination of one or more principal component vectors representative of the second portion of the waveform.

12. The method of claim 11, wherein the second compression process includes storing the plurality of coefficients compactly in the memory by dynamically truncating the coefficients to a lower precision and encoding the truncated coefficients using a Huffman code.

13. The method of claim 11, wherein the second compression process includes replacing the second portion of the waveform with a plurality of coefficients of a linear combination of between about 5 and about 10 principal component vectors representative of the second portion of the waveform.

14. The method of claim 11, wherein the second compression process includes replacing the second portion of the waveform with a plurality of coefficients of a linear combination of 5 or 6 principal component vectors representative of the second portion of the waveform.

15. The method of claim 1, wherein the measuring the waveform comprises measuring the waveform of a dynamic response of an ion-sensitive field effect transistor (ISFET) array to a change in ionic strength of an analyte solution in fluid contact with the ISFET array, wherein the measuring the waveform of the dynamic response of the ISFET array comprises associating a portion of the waveform to a stepwise increase in ion concentration in the analyte solution and associating another portion of the waveform to at least one portion of the dynamic response outside of the stepwise increase in ion concentration.

16. A computer program product comprising a non-transitory computer-usable medium having computer program logic recorded thereon that, when executed by one or more processors, samples and compresses data from a sensor array, the computer program logic comprising: first computer readable program code that enables a processor to measure a waveform associated with a chemical event occurring on a sensor array, wherein the waveform comprises a plurality of measured values and the chemical event is indicative of a number of nucleotide incorporations in a genetic sequencing reaction; and second computer readable program code that enables a processor to apply a first compression process to the waveform, the first compression process including a truncating of data corresponding to a portion of the waveform that is not related to nucleotide incorporations in the genetic sequencing reaction thereby forming a compressed data structure and storing the compressed data structure in a memory.

17. The computer program product of claim 16, further comprising third computer readable program code that enables a processor to apply a second compression process to the waveform, the second compression process including replacing at least a second portion of the waveform with a plurality of coefficients representative of the second portion of the waveform, wherein the second portion of the waveform is related to nucleotide incorporations in the genetic sequencing reaction.

18. A method for compressing nucleic acid sequencing data, comprising: obtaining raw data from a semiconductor-based sequencing sensor array comprising a plurality of sensors during a data acquisition time period, the raw data comprising at least a non-informative portion corresponding to a subinterval of the data acquisition time period having a location within the data acquisition time period that varies for different sensors according to a position of the sensor in the sensor array; and transforming the raw data into compressed data using a lossy compression process including a data truncation process, the data truncation process being related for each sensor to the position of the sensor in the sensor array and configured to discard the non-informative portion of the raw data thereby forming a compressed data structure and storing the compressed data structure in a memory.

19. The method of claim 18, wherein the lossy compression process further comprises a data substitution process adapted to replace at least a portion of the raw data for each sensor with a plurality of coefficients of a linear combination of one or more principal component vectors representative of the portion of the raw data for each sensor.

20. The method of claim 19, wherein the data substitution process comprises storing the plurality of coefficients compactly in the memory by dynamically truncating the coefficients to a lower precision and encoding the truncated coefficients using a Huffman code.
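Claims 6 and 7 describe choosing one cut-off time point per region by fitting a linear hinge model to the region's median trace. This page does not define the model, so the sketch below assumes one plausible form: a flat segment followed by a straight line starting at the breakpoint, with the breakpoint found by brute-force least squares. The function names and the synthetic region are hypothetical.

```python
import numpy as np

def fit_hinge(trace):
    """Return (breakpoint_index, sse) for the best flat-then-linear fit."""
    n = len(trace)
    t = np.arange(n, dtype=float)
    best_idx, best_sse = None, np.inf
    # Try every interior sample as the breakpoint and keep the best fit.
    for idx in range(1, n - 1):
        # Columns: constant baseline, and a ramp that is 0 before idx.
        design = np.column_stack([np.ones(n), np.maximum(0.0, t - idx)])
        params, *_ = np.linalg.lstsq(design, trace, rcond=None)
        residual = trace - design @ params
        sse = float(residual @ residual)
        if sse < best_sse:
            best_idx, best_sse = idx, sse
    return best_idx, best_sse

def region_cutoff(region_traces):
    # One common cut-off per region, taken from the median trace
    # across all sensors in that region.
    median_trace = np.median(region_traces, axis=0)
    idx, _ = fit_hinge(median_trace)
    return idx

# Synthetic region: 50 sensors, 60 samples, signal starts rising at sample 25.
rng = np.random.default_rng(1)
t = np.arange(60)
clean = 2.0 * np.maximum(0, t - 25)
region = clean + 0.05 * rng.standard_normal((50, 60))
cutoff = region_cutoff(region)
print(cutoff)
```

Taking the median before fitting keeps a few noisy or dead sensors from shifting the region's shared cut-off, which is one reason a per-region median trace is a reasonable input for the fit.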

Assignees

Life Technologies Corp

Inventors

Classifications

  • H03M7/30 (Primary)

    Compression (speech analysis-synthesis for redundancy reduction G10L19/00; for image communication H04N); Expansion; Suppression of unnecessary data, e.g. redundancy reduction · CPC title

  • G06F19/702 (Primary)

    Physics · mapped topic

  • Physics · mapped topic

  • G16C20/10 (Primary)

    Analysis or design of chemical reactions, syntheses or processes · CPC title

  • Programming languages; Computing architectures; Database systems; Data warehousing · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9864846B2 cover?
A compression method includes: measuring a waveform associated with a chemical event occurring on a sensor array, wherein the waveform comprises a plurality of measured values and the chemical event is indicative of a number of nucleotide incorporations in a genetic sequencing reaction; applying a first compression process to the waveform, the first compression process including a truncating of…
Who is the assignee on this patent?
Life Technologies Corp
What technology area does this patent fall under?
Primary CPC classification H03M7/30. Mapped technology areas include Electricity.
When was this patent published?
Publication date Jan 9, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).