Methods and systems for detecting genetic variants
US-2016046986-A1 · Feb 18, 2016 · US
US9483610B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9483610-B2 |
| Application number | US-201414158758-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 17, 2014 |
| Priority date | Jan 17, 2013 |
| Publication date | Nov 1, 2016 |
| Grant date | Nov 1, 2016 |
Abstract
A system, method and apparatus for executing a sequence analysis pipeline on genetic sequence data includes an integrated circuit formed of a set of hardwired digital logic circuits that are interconnected by physical electrical interconnects. One of the physical electrical interconnects forms an input to the integrated circuit connected with an electronic data source for receiving reads of genomic data. The hardwired digital logic circuits are arranged as a set of processing engines, each processing engine being formed of a subset of the hardwired digital logic circuits to perform one or more steps in the sequence analysis pipeline on the reads of genomic data. Each subset of the hardwired digital logic circuits is formed in a wired configuration to perform the one or more steps in the sequence analysis pipeline.
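The abstract describes the pipeline as a chain of processing engines, each handling one or more steps on the incoming reads. The Python sketch below is only a software analogy for that dataflow; the patent describes hardwired logic circuits, not software. The names `Read`, `make_engines`, and `run_pipeline`, the exact-match k-mer seeding, and the ungapped, mismatch-only alignment are illustrative assumptions, not details taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

K = 4  # hypothetical seed (k-mer) length for this toy example


@dataclass
class Read:
    seq: str
    candidates: List[int] = field(default_factory=list)  # set by the mapping engine
    position: Optional[int] = None  # set by the alignment engine
    variants: List[Tuple[int, str, str]] = field(default_factory=list)  # (ref_pos, ref_base, read_base)


Engine = Callable[[Read], Read]


def build_index(ref: str, k: int = K) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to the positions where it starts."""
    index: Dict[str, List[int]] = {}
    for i in range(len(ref) - k + 1):
        index.setdefault(ref[i:i + k], []).append(i)
    return index


def make_engines(ref: str, k: int = K) -> List[Engine]:
    """Return the three stages in pipeline order: map, align, call."""
    index = build_index(ref, k)

    def map_engine(read: Read) -> Read:
        # Every seed that hits the index votes for a read start position.
        starts = set()
        for off in range(len(read.seq) - k + 1):
            for pos in index.get(read.seq[off:off + k], []):
                start = pos - off
                if 0 <= start <= len(ref) - len(read.seq):  # drop starts that overhang the reference
                    starts.add(start)
        read.candidates = sorted(starts)
        return read

    def align_engine(read: Read) -> Read:
        # Ungapped alignment: keep the candidate start with the fewest mismatches.
        def mismatches(start: int) -> int:
            return sum(a != b for a, b in zip(read.seq, ref[start:start + len(read.seq)]))

        if read.candidates:
            read.position = min(read.candidates, key=mismatches)
        return read

    def call_engine(read: Read) -> Read:
        # Report each base where the aligned read differs from the reference.
        if read.position is not None:
            for i, base in enumerate(read.seq):
                ref_base = ref[read.position + i]
                if base != ref_base:
                    read.variants.append((read.position + i, ref_base, base))
        return read

    return [map_engine, align_engine, call_engine]


def run_pipeline(reads: Iterable[Read], engines: List[Engine]) -> Iterator[Read]:
    for read in reads:
        for engine in engines:  # each read passes through the engines in order
            read = engine(read)
        yield read


if __name__ == "__main__":
    reference = "GATTACAGGCTTACGA"
    for r in run_pipeline([Read("ACAGGATT")], make_engines(reference)):
        print(r.position, r.variants)  # 4 [(9, 'C', 'A')]
```

Each engine only consumes the fields filled in by the stage before it, so replacing the toy aligner with a gapped one would change a single function and leave the mapping and calling stages untouched.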
Claims (excerpt)
What is claimed is:

1. An apparatus for executing a sequence analysis pipeline on a plurality of reads of genomic data, one or more genetic reference sequences, and an index of the one or more genetic reference sequences, each read of genomic data and each genetic reference sequence comprising a sequence of nucleotides, the system comprising: an integrated circuit formed of a set of pre-configured hardwired digital logic circuits that are interconnected by a plurality of physical electrical interconnects, one or more of the plurality of physical electrical interconnects comprising an input to the integrated circuit connected with an electronic data source for receiving the plurality of reads of genomic data, one or more of the plurality of physical electrical interconnects further comprising a memory interface for the integrated circuit to access a memory storing the plurality of reads of genomic data, the one or more genetic reference sequences, and the index of the one or more genetic reference sequences, the hardwired digital logic circuits being arranged as a set of processing engines, each processing engine being formed of a subset of the hardwired digital logic circuits to perform at least one step in the sequence analysis pipeline on the plurality of reads of genomic data, the set of processing engines comprising: a mapping module in a first pre-configured hardwired configuration to access from the memory, according to at least some of the sequence of nucleotides in a selected read of the plurality of reads, the index of the one or more genetic reference sequences to map the selected read to one or more segments of the one or more genetic reference sequences based on the index; an alignment module in a second pre-configured hardwired configuration to access from the memory the one or more genetic reference sequences to align the selected read to one or more positions in the one or more segments of the one or more genetic reference sequences from the mapping module to produce one or more aligned reads; and a variant calling module in a third pre-configured hardwired configuration to access from the memory the one or more aligned reads and the one or more genetic reference sequences, compare the nucleotides in the aligned reads to the nucleotides of the one or more genetic reference sequences to determine one or more differences between the sequences of nucleotides in the one or more aligned reads and the sequence of nucleotides in the one or more genetic reference sequences, and generate one or more variant calls representing the one or more differences; and one or more of the plurality of physical electrical interconnects comprising an output from the integrated circuit for communicating result data from the mapping module and/or the alignment module and/or variant calling module.

2. The apparatus in accordance with claim 1, wherein the index of the one or more genetic reference sequences further comprises a hash table, and wherein the mapping module applies a hash function to the at least some of the sequence of nucleotides to access the hash table of the index.

3. The apparatus in accordance with claim 2, wherein the integrated circuit and the memory are housed on an expansion card.

4. The apparatus in accordance with claim 3, wherein the expansion card is a peripheral component interconnect (PCI) card.

5. The apparatus in accordance with claim 4, wherein the system further comprises a sequencer, the sequencer having the electronic data source that provides digital signals representing the plurality of reads of genomic data.

6. The apparatus in accordance with claim 5, wherein the expansion card is physically integrated with the sequencer.

7. The apparatus in accordance with claim 1, further comprising a cloud computing cluster having one or more servers, wherein the integrated circuit is housed in at least one of the one or more servers.

8. The apparatus in accordance with claim 7, wherein the cloud computing cluster further comprises the electronic data source providing digital signals representing the plurality of reads of genomic data to the integrated circuit.

9. An apparatus for executing a sequence analysis pipeline on genetic sequence data, the genetic sequence data comprising one or more genetic reference sequences having one or more segments and one or more reads of genomic data, each read of genomic data and each genetic reference sequence comprising a sequence of nucleotides, the apparatus comprising: a memory storing the one or more reads of genomic data, the one or more genetic reference sequences, and an index of the one or more genetic reference sequences; and an integrated circuit comprising a set of pre-configured hardwired digital logic circuits that are interconnected by a plurality of physical electrical interconnects, at least one of the plurality of physical electrical interconnects comprising an input for receiving the one or more reads of genomic data, at least one of the plurality of physical electrical interconnects comprising a memory interface for the integrated circuit to access the memory, and at least one of the plurality of physical electrical interconnects comprising an output for providing result data, the set of pre-configured hardwired digital logic circuits of the integrated circuit to: access, from the memory via the memory interface, by a first hardwired digital logic circuit in a first hardwired configuration and according to at least some of the sequence of nucleotides in at least one read of the one or more reads of genomic data and the index of the one or more genetic reference sequences; map, by the first hardwired digital logic circuit, the at least some of the sequence of nucleotides in the at least one read of the one or more reads of genomic data to one or more segments of the one or more genetic reference sequences based on the index to produce at least one mapped read; access, from the memory via the memory interface, by a second hardwired digital logic circuit in a second hardwired configuration the one or more genetic reference sequences and the at least one mapped read; align, by the second hardwired digital logic circuit, the at least one mapped read to one or more positions in the one or more segments of the one or more genetic reference sequences to produce at least one aligned read; access, from the memory via the memory interface, by a third hardwired digital logic circuit in a third hardwired configuration the one or more genetic reference sequences and the at least one aligned read; and compare the nucleotides in the at least one aligned read to the nucleotides of the genetic reference sequence to determine one or more differences between the sequences of nucleotides in the at least one aligned read and the sequence of nucleotides in the genetic reference sequence, and generate one or more variant calls representing the one or more differences.

10. The apparatus in accordance with claim 9, wherein the index of the one or more genetic reference sequences further comprises a hash table, and wherein the first hardwired digital logic circuit maps the at least some of the sequence of nucleotides in the at least one read of the one or more reads of genomic data to the one or more segments of the one or more genetic reference sequences by applying a hash function to the at least some of the sequence of nucleotides to access the hash table of the index.

11. The apparatus in accordance with claim 9, wherein the integrated circuit comprises a field programmable gate array (FPGA) of the hardwired digital logic circuits.

12. The apparatus in accordance with claim 9, wherein the memory and the integrated circuit a
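Claims 2 and 10 specify that the index is a hash table and that the mapping step applies a hash function to part of a read's nucleotide sequence to access it. The claims do not say which hash function or table layout is used, so the sketch below assumes 2-bit packing of k-mers, Knuth's multiplicative hash, and chained buckets; `HashIndex`, `pack`, and `hash_kmer` are hypothetical names chosen for illustration.

```python
from typing import List, Tuple

# 2-bit code per nucleotide; this toy version only accepts A, C, G, T.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
GOLDEN = 2654435761  # Knuth's multiplicative-hash constant (about 2**32 / golden ratio)


def pack(kmer: str) -> int:
    """Pack a k-mer into an integer, two bits per base."""
    code = 0
    for base in kmer:
        code = (code << 2) | ENCODE[base]
    return code


def hash_kmer(code: int, table_bits: int) -> int:
    """Multiplicative hash: top `table_bits` bits of the 32-bit product."""
    return ((code * GOLDEN) & 0xFFFFFFFF) >> (32 - table_bits)


class HashIndex:
    """Hypothetical hash-table index of a reference, keyed by hashed k-mers."""

    def __init__(self, reference: str, k: int, table_bits: int = 10):
        self.k = k
        self.table_bits = table_bits
        # Each bucket chains (packed_kmer, reference_position) pairs so collisions are resolved exactly.
        self.buckets: List[List[Tuple[int, int]]] = [[] for _ in range(1 << table_bits)]
        for pos in range(len(reference) - k + 1):
            code = pack(reference[pos:pos + k])
            self.buckets[hash_kmer(code, table_bits)].append((code, pos))

    def lookup(self, seed: str) -> List[int]:
        """Hash the seed, then keep only exact k-mer matches from its bucket."""
        code = pack(seed)
        bucket = self.buckets[hash_kmer(code, self.table_bits)]
        return [pos for stored, pos in bucket if stored == code]

    def map_read(self, read: str) -> List[int]:
        """Candidate start positions of the read, voted for by each of its seeds."""
        starts = set()
        for off in range(len(read) - self.k + 1):
            for pos in self.lookup(read[off:off + self.k]):
                starts.add(pos - off)
        return sorted(s for s in starts if s >= 0)
```

With the toy reference `GATTACAGGCTTACGA` and `k=4`, `lookup("TTAC")` returns both occurrences, `[2, 10]`, and `map_read("ACAGGATT")` returns `[4]` even though the read carries a mismatch, because its first two seeds still hit the index. Storing the packed k-mer next to each position lets a lookup discard hash collisions; the chained, variable-length buckets are a software convenience, since the quoted claims do not describe how the table is laid out in memory.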
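Claims 1 and 9 then align each mapped read to positions in the reference segment and compare nucleotides to produce variant calls, but the quoted claims do not name an alignment algorithm. As an assumed stand-in, the sketch below uses classic Smith-Waterman local alignment with a linear gap penalty and reads the differences straight off the traceback; the scoring values and the `(op, ref_pos, ref_base, read_base)` tuple format are illustrative choices, not the patented circuit.

```python
from typing import List, Tuple

Op = Tuple[str, int, str, str]  # (op, ref_pos, ref_base, read_base)


def smith_waterman(read: str, ref: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, int, List[Op]]:
    """Local alignment with a linear gap penalty.

    Returns (best_score, ref_start, ops). Each op is "M" (match), "X" (mismatch),
    "I" (read base absent from the reference) or "D" (reference base absent from the read).
    """
    rows, cols = len(read) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best local score ending at read[i-1], ref[j-1]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    # Trace back from the best-scoring cell until the score drops to zero.
    i, j, ops = best_i, best_j, []
    while i > 0 and j > 0 and H[i][j] > 0:
        same = read[i - 1] == ref[j - 1]
        if H[i][j] == H[i - 1][j - 1] + (match if same else mismatch):
            ops.append(("M" if same else "X", j - 1, ref[j - 1], read[i - 1]))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            ops.append(("I", j, "-", read[i - 1]))  # inserted before reference position j
            i -= 1
        else:
            ops.append(("D", j - 1, ref[j - 1], "-"))  # reference base skipped by the read
            j -= 1
    ops.reverse()
    return best, j, ops


def call_variants(ops: List[Op]) -> List[Op]:
    """Every aligned column where the read differs from the reference."""
    return [op for op in ops if op[0] != "M"]
```

Aligning `ACAGGATT` against `GATTACAGGCTTACGA` scores 13 from reference position 4 and yields one mismatch, `("X", 9, "C", "A")`; aligning `ACAGGTTA` scores 14 and yields one deletion, `("D", 9, "C", "-")`. A per-read call like this ignores coverage and base quality, which a practical variant caller weighs across many overlapping reads.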
Mapped topic: Physics

CPC titles:
- Sequence alignment; Homology search
- Data warehousing; Computing architectures
- ICT programming tools or database systems specially adapted for bioinformatics
- ICT specially adapted for sequence analysis involving nucleotides or amino acids