Bioinformatics systems, apparatuses, and methods executed on an integrated circuit processing platform

US9483610B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9483610-B2
Application number  US-201414158758-A
Country             US
Kind code           B2
Filing date         Jan 17, 2014
Priority date       Jan 17, 2013
Publication date    Nov 1, 2016
Grant date          Nov 1, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system, method and apparatus for executing a sequence analysis pipeline on genetic sequence data includes an integrated circuit formed of a set of hardwired digital logic circuits that are interconnected by physical electrical interconnects. One of the physical electrical interconnects forms an input to the integrated circuit connected with an electronic data source for receiving reads of genomic data. The hardwired digital logic circuits are arranged as a set of processing engines, each processing engine being formed of a subset of the hardwired digital logic circuits to perform one or more steps in the sequence analysis pipeline on the reads of genomic data. Each subset of the hardwired digital logic circuits is formed in a wired configuration to perform the one or more steps in the sequence analysis pipeline.
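As a rough software analogy for the architecture the abstract describes, the sketch below models each processing engine as a function whose behavior is fixed when the pipeline is built, with every read passing through the engines in a set order. The three stage names follow the mapping, alignment, and variant calling modules listed in claim 1. The function bodies, the `build_pipeline` helper, and the record layout are hypothetical placeholders for illustration; they do not represent the patented circuits.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical stand-in for one processing engine: a function with fixed behavior.
Engine = Callable[[Dict], Dict]


def mapping_engine(record: Dict) -> Dict:
    # Placeholder: a real mapper would consult an index of the reference.
    return {**record, "stages": record["stages"] + ["mapped"]}


def alignment_engine(record: Dict) -> Dict:
    # Placeholder: a real aligner would compare the read against the reference.
    return {**record, "stages": record["stages"] + ["aligned"]}


def variant_calling_engine(record: Dict) -> Dict:
    # Placeholder: a real caller would report differences from the reference.
    return {**record, "stages": record["stages"] + ["variant_called"]}


def build_pipeline(engines: Tuple[Engine, ...]) -> Callable[[Iterable[str]], List[Dict]]:
    """Freeze the engine order at build time, loosely mirroring a wired configuration."""

    def run(reads: Iterable[str]) -> List[Dict]:
        results = []
        for read in reads:
            record = {"read": read, "stages": []}
            for engine in engines:  # the order cannot change after construction
                record = engine(record)
            results.append(record)
        return results

    return run


pipeline = build_pipeline((mapping_engine, alignment_engine, variant_calling_engine))
results = pipeline(["ACGT", "TTGA"])
```

The tuple of engines is captured once and never reassigned. That is the only property this analogy is meant to convey: the steps and their order are set by construction and are not chosen at run time.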

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus for executing a sequence analysis pipeline on a plurality of reads of genomic data, one or more genetic reference sequences, and an index of the one or more genetic reference sequences, each read of genomic data and each genetic reference sequence comprising a sequence of nucleotides, the system comprising: an integrated circuit formed of a set of pre-configured hardwired digital logic circuits that are interconnected by a plurality of physical electrical interconnects, one or more of the plurality of physical electrical interconnects comprising an input to the integrated circuit connected with an electronic data source for receiving the plurality of reads of genomic data, one or more of the plurality of physical electrical interconnects further comprising a memory interface for the integrated circuit to access a memory storing the plurality of reads of genomic data, the one or more genetic reference sequences, and the index of the one or more genetic reference sequences, the hardwired digital logic circuits being arranged as a set of processing engines, each processing engine being formed of a subset of the hardwired digital logic circuits to perform at least one step in the sequence analysis pipeline on the plurality of reads of genomic data, the set of processing engines comprising: a mapping module in a first pre-configured hardwired configuration to access from the memory, according to at least some of the sequence of nucleotides in a selected read of the plurality of reads, the index of the one or more genetic reference sequences to map the selected read to one or more segments of the one or more genetic reference sequences based on the index; an alignment module in a second pre-configured hardwired configuration to access from the memory the one or more genetic reference sequences to align the selected read to one or more positions in the one or more segments of the one or more genetic reference sequences from the mapping module to produce one or more aligned reads; and a variant calling module in a third pre-configured hardwired configuration to access from the memory the one or more aligned reads and the one or more genetic reference sequences, compare the nucleotides in the aligned reads to the nucleotides of the one or more genetic reference sequences to determine one or more differences between the sequences of nucleotides in the one or more aligned reads and the sequence of nucleotides in the one or more genetic reference sequences, and generate one or more variant calls representing the one or more differences; and one or more of the plurality of physical electrical interconnects comprising an output from the integrated circuit for communicating result data from the mapping module and/or the alignment module and/or variant calling module.

2. The apparatus in accordance with claim 1, wherein the index of the one or more genetic reference sequences further comprises a hash table, and wherein the mapping module applies a hash function to the at least some of the sequence of nucleotides to access the hash table of the index.

3. The apparatus in accordance with claim 2, wherein the integrated circuit and the memory are housed on an expansion card.

4. The apparatus in accordance with claim 3, wherein the expansion card is a peripheral component interconnect (PCI) card.

5. The apparatus in accordance with claim 4, wherein the system further comprises a sequencer, the sequencer having the electronic data source that provides digital signals representing the plurality of reads of genomic data.

6. The apparatus in accordance with claim 5, wherein the expansion card is physically integrated with the sequencer.

7. The apparatus in accordance with claim 1, further comprising a cloud computing cluster having one or more servers, wherein the integrated circuit is housed in at least one of the one or more servers.

8. The apparatus in accordance with claim 7, wherein the cloud computing cluster further comprises the electronic data source providing digital signals representing the plurality of reads of genomic data to the integrated circuit.

9. An apparatus for executing a sequence analysis pipeline on genetic sequence data, the genetic sequence data comprising one or more genetic reference sequences having one or more segments and one or more reads of genomic data, each read of genomic data and each genetic reference sequence comprising a sequence of nucleotides, the apparatus comprising: a memory storing the one or more reads of genomic data, the one or more genetic reference sequences, and an index of the one or more genetic reference sequences; and an integrated circuit comprising a set of pre-configured hardwired digital logic circuits that are interconnected by a plurality of physical electrical interconnects, at least one of the plurality of physical electrical interconnects comprising an input for receiving the one or more reads of genomic data, at least one of the plurality of physical electrical interconnects comprising a memory interface for the integrated circuit to access the memory, and at least one of the plurality of physical electrical interconnects comprising an output for providing result data, the set of pre-configured hardwired digital logic circuits of the integrated circuit to: access, from the memory via the memory interface, by a first hardwired digital logic circuit in a first hardwired configuration and according to at least some of the sequence of nucleotides in at least one read of the one or more reads of genomic data and the index of the one or more genetic reference sequences; map, by the first hardwired digital logic circuit, the at least some of the sequence of nucleotides in the at least one read of the one or more reads of genomic data to one or more segments of the one or more genetic reference sequences based on the index to produce at least one mapped read; access, from the memory via the memory interface, by a second hardwired digital logic circuit in a second hardwired configuration the one or more genetic reference sequences and the at least one mapped read; align, by the second hardwired digital logic circuit, the at least one mapped read to one or more positions in the one or more segments of the one or more genetic reference sequences to produce at least one aligned read; access, from the memory via the memory interface, by a third hardwired digital logic circuit in a third hardwired configuration the one or more genetic reference sequences and the at least one aligned read; and compare the nucleotides in the at least one aligned read to the nucleotides of the genetic reference sequence to determine one or more differences between the sequences of nucleotides in the at least one aligned read and the sequence of nucleotides in the genetic reference sequence, and generate one or more variant calls representing the one or more differences.

10. The apparatus in accordance with claim 9, wherein the index of the one or more genetic reference sequences further comprises a hash table, and wherein the first hardwired digital logic circuit maps the at least some of the sequence of nucleotides in the at least one read of the one or more reads of genomic data to the one or more segments of the one or more genetic reference sequences by applying a hash function to the at least some of the sequence of nucleotides to access the hash table of the index.

11. The apparatus in accordance with claim 9, wherein the integrated circuit comprises a field programmable gate array (FPGA) of the hardwired digital logic circuits.

12. The apparatus in accordance with claim 9, wherein the memory and the integrated circuit a
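The data flow recited in claims 1, 2, 9, and 10 (hash a portion of a read to look up candidate reference segments, align the read at a candidate position, then compare bases to produce variant calls) can be illustrated with a small software sketch. Everything below is an assumption made for illustration: the seed length, the 2-bit packing used as the hash function, the ungapped mismatch count used in place of a full alignment algorithm, and all function names. The claims specify none of these details, and the patented apparatus performs these steps in hardwired logic, not in software.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

SEED_LEN = 4  # assumed seed length; the claims do not specify one
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumes reads contain only A/C/G/T


def seed_hash(seed: str) -> int:
    """Toy hash function: pack each base into 2 bits."""
    value = 0
    for base in seed:
        value = value * 4 + BASE_CODE[base]
    return value


def build_index(reference: str, k: int = SEED_LEN) -> Dict[int, List[int]]:
    """Hash table from seed hash to every reference position where that seed occurs."""
    index: Dict[int, List[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[seed_hash(reference[pos:pos + k])].append(pos)
    return index


def map_read(read: str, index: Dict[int, List[int]], k: int = SEED_LEN) -> List[int]:
    """Mapping step: hash the first k bases of the read and look up candidate positions."""
    return list(index.get(seed_hash(read[:k]), []))


def align_read(read: str, reference: str, candidates: List[int]) -> Optional[Tuple[int, int]]:
    """Alignment step (simplified): pick the candidate with the fewest ungapped mismatches."""
    best: Optional[Tuple[int, int]] = None
    for pos in candidates:
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run past the end of the reference
        mismatches = sum(1 for r, w in zip(read, window) if r != w)
        if best is None or mismatches < best[1]:
            best = (pos, mismatches)
    return best


def call_variants(read: str, reference: str, pos: int) -> List[Tuple[int, str, str]]:
    """Variant calling step (simplified): report (reference position, ref base, read base)."""
    return [
        (pos + i, reference[pos + i], base)
        for i, base in enumerate(read)
        if reference[pos + i] != base
    ]


reference = "ACGTACGTTAGCCGATAGGCT"
read = "TAGCCTAT"  # matches reference[8:16] except for one substituted base

index = build_index(reference)
candidates = map_read(read, index)
aligned = align_read(read, reference, candidates)
variants = call_variants(read, reference, aligned[0]) if aligned else []
```

Seeding from only the first bases of a read keeps the sketch short. It also means a mismatch inside the seed would leave the read unmapped, which is one reason practical mappers typically draw seeds from several positions in each read.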

Assignees

Edico Genome Corp

Inventors

Classifications

  • G06F19/22 · Primary

    Physics · mapped topic

  • G16B30/10 · Primary

    Sequence alignment; Homology search · CPC title

  • Data warehousing; Computing architectures · CPC title

  • ICT programming tools or database systems specially adapted for bioinformatics · CPC title

  • ICT specially adapted for sequence analysis involving nucleotides or amino acids · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9483610B2 cover?
A system, method and apparatus for executing a sequence analysis pipeline on genetic sequence data includes an integrated circuit formed of a set of hardwired digital logic circuits that are interconnected by physical electrical interconnects. One of the physical electrical interconnects forms an input to the integrated circuit connected with an electronic data source for receiving reads of gen…
Who is the assignee on this patent?
Edico Genome Corp
What technology area does this patent fall under?
Primary CPC classification G06F19/22. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 1, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).