Secure and scalable mapping of human sequencing reads on hybrid clouds

US10192029B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10192029-B2
Application number	US-201514984109-A
Country	US
Kind code	B2
Filing date	Dec 30, 2015
Priority date	May 13, 2011
Publication date	Jan 29, 2019
Grant date	Jan 29, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

System and methods are provided for performing privacy-preserving, high-performance, and scalable DNA read mapping on hybrid clouds including a public cloud and a private cloud. The systems and methods offer strong privacy protection and have the capacity to process millions of reads and allocate most of the workload to the public cloud at a small overall cost. The systems and methods perform seeding on the public cloud using keyed hash values of individual sequencing reads' seeds and then extend matched seeds on the private cloud. The systems and methods are designed to move the workload of read mapping from the extension stage to the seeding stage, thereby ensuring that the dominant portion of the overhead is shouldered by the public cloud.
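The division of labor described in the abstract can be illustrated with a minimal sketch. It assumes HMAC-SHA256 as the keyed hash (the page does not name a specific algorithm) and a toy seed length; the function names `keyed_hash`, `index_reference`, `hash_read_seeds`, and `public_cloud_seed` are hypothetical and do not come from the patent.

```python
import hashlib
import hmac


def keyed_hash(key: bytes, seed: str) -> str:
    # Assumption: HMAC-SHA256 stands in for the patent's keyed hash.
    return hmac.new(key, seed.encode("ascii"), hashlib.sha256).hexdigest()


def index_reference(key: bytes, reference: str, seed_len: int) -> dict:
    """Private cloud: hash every seed-length substring of the reference.

    Returns a map from keyed hash to the reference positions it occurs at.
    """
    index = {}
    for pos in range(len(reference) - seed_len + 1):
        h = keyed_hash(key, reference[pos:pos + seed_len])
        index.setdefault(h, []).append(pos)
    return index


def hash_read_seeds(key: bytes, read: str, seed_len: int) -> list:
    """Private cloud: split a read into consecutive seeds and hash each one."""
    return [
        (off, keyed_hash(key, read[off:off + seed_len]))
        for off in range(0, len(read) - seed_len + 1, seed_len)
    ]


def public_cloud_seed(index: dict, hashed_seeds: list) -> list:
    """Public cloud: match hashes only; it never sees a plaintext base."""
    return [(off, pos) for off, h in hashed_seeds for pos in index.get(h, [])]


# Toy data, hypothetical key and sequences.
key = b"private-cloud-secret"
index = index_reference(key, "ACGTTGCAACGT", seed_len=4)
matches = public_cloud_seed(index, hash_read_seeds(key, "TGCAACGT", seed_len=4))
print(matches)  # (seed offset in read, matching position in reference)
```

In this sketch only the hash table and the hashed seeds would leave the private cloud; the secret key and the plaintext reads stay behind, and the returned positions feed the private-cloud extension step.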

First claim

Opening claim text (preview).

What is claimed is:

1. A method of mapping a plurality of DNA sequence reads to a reference genome, the method comprising: partitioning each of the plurality of DNA sequence reads into a plurality of seeds using computing resources of a private cloud; combining at least two seeds of the plurality of seeds to generate a combined seed using the private cloud computing resources; encrypting, by the private cloud computing resources, the combined seed using a keyed encryption algorithm to produce a keyed-hash value of the combined seed; transmitting the keyed hash value representing the combined seed from the private cloud computing resources to computing resources of a public cloud, wherein the keyed hash value is usable to search against a plurality of keyed hash values derived from a reference genome; receiving, by the private cloud computing resources, from the public cloud computing resources, data indicating positions where the reference genome matches the at least two seeds of the combined seed; and extending, using the private cloud computing resources, each of the at least two seeds at each of the positions where the reference genome matches the at least two seeds of the combined seed to determine whether the DNA sequence read corresponding to each of the at least two seeds aligns with the reference genome at that position.

2. The method of claim 1, further comprising: dividing the reference genome into a plurality of substrings using the private cloud computing resources, each of the plurality of substrings and each of the plurality of seeds being of equal length; encrypting, by the private cloud computing resources, each unique substring of the plurality of substrings using a keyed encryption algorithm to produce a corresponding keyed-hash value for each unique substring of the plurality of substrings; and transmitting each of the corresponding keyed-hash values representing each of the unique substrings from the private cloud computing resources to the public cloud computing resources.

3. The method of claim 2, further comprising: comparing the encrypted data representing the combined seed to groups of the encrypted data representing the unique substrings using the public cloud computing resources; and transmitting data indicating which group of the substrings matches the at least two seeds of the combined seed from the public cloud computing resources to the private cloud computing resources.

4. The method of claim 1, wherein extending each of the at least two seeds at each of the positions where the reference genome matches the at least two seeds of the combined seed comprises determining whether the DNA sequence read corresponding to each of the at least two seeds matches the reference genome at the corresponding position with an edit distance less than or equal to an integer d.

5. The method of claim 4, wherein partitioning each of the plurality of DNA sequence reads comprises partitioning each of the plurality of DNA sequence reads into (d+1) seeds.

6. The method of claim 5, wherein each of the plurality of seeds is twenty or more base pairs in length.

7. The method of claim 4, wherein partitioning each of the plurality of DNA sequence reads comprises partitioning each of the plurality of DNA sequence reads into (d+2) seeds.

8. The method of claim 7, wherein each of the plurality of seeds is between ten and twenty base pairs in length.

9. One or more non-transitory, computer-readable media comprising a first plurality of instructions that, when executed by a first plurality of processors of a private cloud, causes the processors of the private cloud to: partition each of a plurality of DNA sequence reads into a plurality of seeds; combine at least two seeds of the plurality of seeds to generate a combined seed; encrypt the combined seed using a keyed encryption algorithm to produce a keyed-hash value of the combined seed, wherein the keyed hash value is usable to search against a plurality of keyed hash values derived from a reference genome; transmit the keyed hash value representing the combined seed to computing resources of a public cloud; receive, from the computing resources of the public cloud, data indicating positions where a reference genome matches the at least two seeds of the combined seed; and extend each of the at least two seeds at each of the positions where the reference genome matches the at least two seeds of the combined seed to determine whether the DNA sequence read corresponding to each of the at least two seeds aligns with the reference genome at that position.

10. The one or more non-transitory, computer-readable media of claim 9, wherein the first plurality of instructions, when executed by the first plurality of processors, further causes the processors of the private cloud to: divide the reference genome into a plurality of substrings using the processors of the private cloud computing resources, each of the plurality of substrings and each of the plurality of seeds being of equal length; encrypt, by the processors of the private cloud computing, each unique substring of the plurality of substrings using a keyed encryption algorithm to produce a corresponding keyed-hash value for each unique substring of the plurality of substrings; and transmit each of the corresponding keyed-hash values representing each of the unique substrings from the processors of the private cloud computing to the public cloud computing resources.

11. The one or more non-transitory, computer-readable media of claim 10, further comprising a second plurality of instructions that, when executed by a second plurality of processors of the public cloud, causes the processors of the public cloud to: compare the encrypted data representing the combined seed to groups of the encrypted data representing the unique substrings using the public cloud computing resources; and transmit data indicating which group of the substrings matches the at least two seeds of the combined seed from the public cloud computing resources to the processors of the private cloud computing.

12. The one or more non-transitory, computer-readable media of claim 9, wherein the first plurality of instructions, when executed by the first plurality of processors, causes the processors of the private cloud to determine whether the DNA sequence read corresponding to each of the at least two seeds matches the reference genome at the corresponding position with an edit distance less than or equal to an integer d.

13. The one or more non-transitory, computer-readable media of claim 12, wherein the first plurality of instructions, when executed by the first plurality of processors, causes the processors of the private cloud to partition each of the plurality of DNA sequence reads into (d+1) seeds.

14. The one or more non-transitory, computer-readable media of claim 13, wherein the first plurality of instructions, when executed by the first plurality of processors, causes the processors of the private cloud to partition each of the plurality of DNA sequence reads into (d+1) seeds that are each twenty or more base pairs in length.

15. One or more non-transitory, computer-readable media comprising a plurality of instructions that, when executed by a plurality of processors of a private cloud, causes the processors of the private cloud to: partition a DNA sequence read into (d+2) seeds, where d is an integer; combine at least two seeds of the (d+2) seeds to generate a combined seed; encrypt the combined seed using a keyed encryption algorithm to produce a keyed-hash value of the combined seed, wherein the keyed hash v
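The (d+1)-seed partitioning and the edit-distance test in claims 4 and 5 rest on a pigeonhole argument: if a read aligns with at most d edits and is cut into d+1 disjoint seeds, at least one seed must match the reference exactly. The sketch below illustrates that idea only. The function names are hypothetical, the extension step uses a textbook approximate-matching dynamic program rather than anything specified in the patent, and the combined seeds of the (d+2) variant are not modeled.

```python
def partition_into_seeds(read: str, d: int) -> list:
    """Cut a read into d+1 equal-length, non-overlapping seeds.

    Trailing bases that do not fill a whole seed are left unseeded; the
    pigeonhole argument still holds for the d+1 seeds that remain.
    """
    seed_len = len(read) // (d + 1)
    return [(i * seed_len, read[i * seed_len:(i + 1) * seed_len])
            for i in range(d + 1)]


def best_edit_distance(read: str, window: str) -> int:
    """Smallest edit distance between the read and any substring of window."""
    prev = [0] * (len(window) + 1)  # the alignment may start anywhere
    for i, rc in enumerate(read, 1):
        cur = [i]
        for j, wc in enumerate(window, 1):
            cur.append(min(prev[j] + 1,                # read base unmatched
                           cur[j - 1] + 1,             # window base skipped
                           prev[j - 1] + (rc != wc)))  # match or substitution
        prev = cur
    return min(prev)  # the alignment may end anywhere


def extend(read: str, reference: str, seed_offset: int, ref_pos: int, d: int) -> bool:
    """Private cloud: check a seed hit by aligning the whole read near it."""
    start = ref_pos - seed_offset
    lo = max(0, start - d)
    hi = min(len(reference), start + len(read) + d)
    return best_edit_distance(read, reference[lo:hi]) <= d


# Toy example: the read differs from the reference by one substitution.
reference = "GGGACGTTGCAACGTCCC"
read = "ACGTAGCAACGT"
print(partition_into_seeds(read, d=1))
print(extend(read, reference, seed_offset=6, ref_pos=9, d=1))
```

Here the second seed matches the reference exactly at position 9, so extension around that hit confirms the full read within one edit.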

Assignees

Univ Indiana Res & Tech Corp, Indiana Univ Research & Technology Corp

Inventors

Classifications

G06F19/22 · Primary
Physics · mapped topic
H04L63/0428
wherein the data content is protected, e.g. by encrypting or encapsulating the payload · CPC title
H04L9/0637
Modes of operation, e.g. cipher block chaining [CBC], electronic codebook [ECB] or Galois/counter mode [GCM] · CPC title
H04L9/3239
involving non-keyed hash functions, e.g. modification detection codes [MDCs], MD5, SHA or RIPEMD · CPC title
G06F19/28
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 47177298

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10192029B2 cover?
System and methods are provided for performing privacy-preserving, high-performance, and scalable DNA read mapping on hybrid clouds including a public cloud and a private cloud. The systems and methods offer strong privacy protection and have the capacity to process millions of reads and allocate most of the workload to the public cloud at a small overall cost. The systems and methods perform s…

Who is the assignee on this patent?
Univ Indiana Res & Tech Corp, Indiana Univ Research & Technology Corp

What technology area does this patent fall under?
Primary CPC classification G06F19/22. Mapped technology areas include Physics.

When was this patent published?
Published Jan 29, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).