Bioinformatics systems, apparatuses, and methods executed on an integrated circuit processing platform

US10216898B2 · US · B2

Patent metadata

Publication number: US-10216898-B2
Application number: US-201715436435-A
Country: US
Kind code: B2
Filing date: Feb 17, 2017
Priority date: Jan 17, 2013
Publication date: Feb 26, 2019
Grant date: Feb 26, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system, method and apparatus for executing a sequence analysis pipeline on genetic sequence data includes a structured ASIC formed of a set of hardwired digital logic circuits that are interconnected by physical electrical interconnects. One of the physical electrical interconnects forms an input to the structured ASIC connected with an electronic data source for receiving reads of genomic data. The hardwired digital logic circuits are arranged as a set of processing engines, each processing engine being formed of a subset of the hardwired digital logic circuits to perform one or more steps in the sequence analysis pipeline on the reads of genomic data. Each subset of the hardwired digital logic circuits is formed in a wired configuration to perform the one or more steps in the sequence analysis pipeline.
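The abstract describes the pipeline as a set of processing engines, each built from a subset of the hardwired logic and each performing one or more pipeline steps on the reads. As a rough software analogy only, and not the patented hardware, the sketch below models each engine as a Python generator stage and chains the stages in order. The toy exact-match mapper and the sorter are hypothetical stand-ins for the mapping and sorting engines named in the claims; the function names are illustrative.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical model: an "engine" consumes a stream of items and yields results,
# loosely mirroring data flowing between hardwired stages.
Engine = Callable[[Iterable], Iterator]


def run_pipeline(reads: Iterable[str], engines: List[Engine]) -> list:
    """Feed the reads through each engine in order and collect the final output."""
    stream: Iterable = reads
    for engine in engines:
        stream = engine(stream)
    return list(stream)


def make_exact_mapper(reference: str, k: int) -> Engine:
    """Toy mapping engine: find each read by exact match, seeded on its first k bases."""
    index: dict = {}
    for pos in range(len(reference) - k + 1):
        index.setdefault(reference[pos:pos + k], []).append(pos)

    def mapper(reads: Iterable[str]) -> Iterator[Tuple[str, int]]:
        for read in reads:
            for pos in index.get(read[:k], []):
                # Keep the candidate only if the whole read matches the reference there.
                if reference[pos:pos + len(read)] == read:
                    yield read, pos

    return mapper


def sorter(mapped: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Toy sorting engine: order mapped reads by reference position."""
    yield from sorted(mapped, key=lambda item: item[1])


if __name__ == "__main__":
    reference = "ACGTACGGTCAGT"
    engines = [make_exact_mapper(reference, k=3), sorter]
    print(run_pipeline(["GTCAG", "ACGTA", "CGGTC"], engines))
```

Run as a script, this prints `[('ACGTA', 0), ('CGGTC', 5), ('GTCAG', 7)]`. Using generators means each stage pulls items from the previous one, which is a loose software parallel to streaming reads between engines; the sorter necessarily buffers everything, just as a sort stage must see all reads before emitting them in order.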

First claim

Opening claim text (preview).

We claim as our invention the following:

1. A system for executing a sequence analysis on a plurality of reads of genomic data using an index of genetic reference data stored in a memory, each read of genomic data representing a sequence of nucleotides, the genetic reference data representing one or more genetic reference sequences, the system comprising: a cloud-based server; and an integrated circuit connected with the cloud-based server, the integrated circuit being formed of a set of pre-configured hardwired digital logic circuits that are interconnected by a plurality of physical electrical interconnects, one or more of the plurality of physical electrical interconnects comprising a memory interface for the integrated circuit to access the memory, the hardwired digital logic circuits being arranged as a set of processing engines, each processing engine being formed of a subset of the hardwired digital logic circuits to perform one or more steps in the sequence analysis on the plurality of reads of genomic data, the set of processing engines comprising a mapping module in a first hardwired configuration to: receive a read of genomic data via one or more of the plurality of physical electrical interconnects; extract a portion of the read to generate a seed, the seed representing a subset of the sequence of nucleotides represented by the read; calculate a first address within the index based on the seed; access the address in the index in the memory; receive a record from the address, the record representing position information in the genetic reference sequence; determine, based on the record, one or more matching positions from the read to the genetic reference sequence; and output, to the cloud-based server, at least one of the matching positions to the memory via the memory interface.

2. The system according to claim 1, wherein the integrated circuit is a field programmable gate array (FPGA).

3. The system according to claim 1, wherein the integrated circuit is an application specific integrated circuit (ASIC).

4. The system according to claim 1, wherein the mapping module is further configured to: calculate a second address within the index based on both of the record and of a second subset of the sequence of nucleotides that is not contained in the first subset of the sequence of nucleotides; access the second address in the index in the memory; receive a second record from the second address, the second record or a subsequent record comprising position information in the genetic reference sequence; further determine, based on the position information, the one or more matching positions from the read to the genetic reference sequence.

5. The system according to claim 1, wherein the set of processing engines of the integrated circuit further comprises a an alignment module in a second pre-configured hardwired configuration to access the genetic reference data from the memory via the memory interface to align the received read to one or more positions in the genetic reference sequence from the mapping module.

6. The system according to claim 5, wherein the set of processing engines of the integrated circuit further comprises a sorting module in a third pre-configured hardwired configuration to sort each aligned read according to the one or more positions in the genetic reference sequence.

7. The system according to claim 1, wherein the index of genetic reference data further comprises a hash table, and wherein the mapping module applies a hash function to the at least some of the sequence of nucleotides to access the hash table of the index.

8. A system for mapping a plurality of reads of genomic data to genetic reference sequence, using an index of genetic reference data stored in a memory, each read of genomic data representing a sequence of nucleotides, the genetic reference data representing at least a portion of the genetic reference sequence, the system comprising: a cloud-based server; and a mapping module connected with the cloud-based server, and being formed of a set of a first set of pre-configured hardwired digital logic circuits that are interconnected by a first plurality of physical electrical interconnects, one or more of the first plurality of physical electrical interconnects comprising a memory interface for the mapping module to access the memory, the hardwired digital logic circuits being in a first pre-configured wired configuration to: receive a read of genomic data via the memory interface; extract a portion of the read to generate a seed, the seed representing a first subset of the sequence of nucleotides represented by the read; calculate an address within the index based on the seed; access the address in the index in the memory; receive a record from the address; determine, based on the record, one or more matching positions from the read to the genetic reference sequence; and output, to the cloud-based server, at least one of the matching positions to the memory via the memory interface.

9. The system according to claim 8, wherein the pre-configured hardwired digital logic circuits are present on a field programmable gate array (FPGA).

10. The system according to claim 9, wherein the pre-configured hardwired digital logic circuits are present on an application specific integrated circuit (ASIC).

11. The system according to claim 8, wherein the mapping module is further configured to: calculate a second address within the index based on both of the record and of a second subset of the sequence of nucleotides that is not contained in the first subset of the sequence of nucleotides; access the second address in the index in the memory; receive a second record from the second address, the second record or a subsequent record comprising position information in the genetic reference sequence; further determine, based on the position information, the one or more matching positions from the read to the genetic reference sequence.

12. The system according to claim 8, further comprising an alignment module comprising a second set of pre-configured hardwired digital logic circuits that are interconnected by a second plurality of physical electrical interconnects, one or more of the second plurality of physical electrical interconnects comprising a memory interface for the alignment module to access the memory, the hardwired digital logic circuits being in a second pre-configured wired configuration to access the genetic reference data from the memory via the memory interface to align the received read to one or more positions in the genetic reference sequence from the mapping module.

13. The system according to claim 12, further comprising a sorting module comprising a third set of pre-configured hardwired digital logic circuits that are interconnected by a third plurality of physical electrical interconnects, one or more of the third plurality of physical electrical interconnects comprising a memory interface for the sorting module to access the memory, the hardwired digital logic circuits being in a third pre-configured wired configuration to sort each aligned read according to the one or more positions in the genetic reference sequence.

14. The system according to claim 8, wherein the index of genetic reference data further comprises a hash table, and wherein the mapping module applies a hash function to the at least some of the sequence of nucleotides to access the hash table of the index.

15. A system for executing a sequence analysis on a plurality of reads of genomic data using genetic reference data stored in a memory, each read of genomic data rep
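Claims 1, 4, and 7 describe the mapping module's lookup: extract a seed from the read, hash it to an address in an index, read the record stored at that address, and, when needed, compute a second address from the record together with further read bases that are not in the seed. The sketch below is a minimal software illustration of that flow under stated assumptions: a dict-of-buckets hash table, CRC32 as the hash function, and a hypothetical record layout of either `("HIT", positions)` or `("EXTEND", n)`. None of these details come from the patent, and `SeedIndex`, `address_of`, and `map_read` are illustrative names.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

TABLE_BITS = 16  # hypothetical table size: 2**16 bucket addresses

Record = Tuple[str, object]  # ("HIT", positions) or ("EXTEND", n): assumed layout


def address_of(key: str) -> int:
    """Hash a seed (or extended seed) to a bucket address; CRC32 is an assumed hash."""
    return zlib.crc32(key.encode()) & ((1 << TABLE_BITS) - 1)


class SeedIndex:
    """Hypothetical hash-table index of reference seeds.

    Seeds seen at most max_hits times store their positions directly. More frequent
    seeds store ("EXTEND", ext), and their positions are filed under a second key
    built from the seed plus the next `ext` reference bases.
    """

    def __init__(self, reference: str, k: int, ext: int, max_hits: int):
        self.k = k
        self.table: Dict[int, List[Tuple[str, Record]]] = defaultdict(list)
        seeds: Dict[str, List[int]] = defaultdict(list)
        for pos in range(len(reference) - k + 1):
            seeds[reference[pos:pos + k]].append(pos)
        for seed, positions in seeds.items():
            if len(positions) <= max_hits:
                self._store(seed, ("HIT", tuple(positions)))
                continue
            self._store(seed, ("EXTEND", ext))
            longer: Dict[str, List[int]] = defaultdict(list)
            for pos in positions:
                tail = reference[pos + k:pos + k + ext]
                if len(tail) == ext:  # skip occurrences too close to the end
                    longer[seed + "|" + tail].append(pos)
            for key, hits in longer.items():
                self._store(key, ("HIT", tuple(hits)))

    def _store(self, key: str, record: Record) -> None:
        self.table[address_of(key)].append((key, record))

    def lookup(self, key: str) -> Optional[Record]:
        # Several keys may share an address, so compare stored keys to resolve collisions.
        for stored_key, record in self.table.get(address_of(key), []):
            if stored_key == key:
                return record
        return None


def map_read(index: SeedIndex, read: str, seed_start: int = 0) -> List[int]:
    """Return candidate reference start positions for `read` (empty if unmapped)."""
    seed = read[seed_start:seed_start + index.k]
    record = index.lookup(seed)  # first address, computed from the seed
    if record is None:
        return []
    if record[0] == "EXTEND":
        n = record[1]
        # Read bases that follow the seed (not part of the first subset).
        tail = read[seed_start + index.k:seed_start + index.k + n]
        record = index.lookup(seed + "|" + tail)  # second address: record + extra bases
        if record is None:
            return []
    # Shift seed hit positions back to where the read itself would start.
    return sorted(pos - seed_start for pos in record[1])
```

For example, with `idx = SeedIndex("ACGTACGTACGTTTGCA", k=4, ext=2, max_hits=2)`, the seed `ACGT` occurs three times, so its record is `("EXTEND", 2)`. `map_read(idx, "ACGTTTGC")` then looks up `ACGT` plus the next two read bases, `TT`, reaches the second record, and returns `[8]`. The returned positions are only candidates; per claim 5, a separate alignment step would confirm or score them.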

Assignees

Edico Genome Corp

Classifications (CPC group titles)

  • G16B50/00 (primary): ICT programming tools or database systems specially adapted for bioinformatics

  • ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

  • for access to memory bus (G06F13/28 takes precedence)

  • ICT specially adapted for sequence analysis involving nucleotides or amino acids

  • on a point to point bus (G06F13/4247, G06F13/4282 take precedence)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10216898B2 cover?
A system, method and apparatus for executing a sequence analysis pipeline on genetic sequence data includes a structured ASIC formed of a set of hardwired digital logic circuits that are interconnected by physical electrical interconnects. One of the physical electrical interconnects forms an input to the structured ASIC connected with an electronic data source for receiving reads of genomic da…
Who is the assignee on this patent?
Edico Genome Corp
What technology area does this patent fall under?
Primary CPC classification G16B50/00. Mapped technology areas include Physics.
When was this patent published?
Published on February 26, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).