Methods and Compositions for Assessing Lung Grafts
US-2015377904-A1 · Dec 31, 2015 · US
US9845552B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9845552-B2 |
| Application number | US-201214354528-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 18, 2012 |
| Priority date | Oct 27, 2011 |
| Publication date | Dec 19, 2017 |
| Grant date | Dec 19, 2017 |
Official abstract text for this publication.
Disclosed are methods and tools for rapidly aligning reads to a reference sequence. These methods and tools employ Bloom filters or similar set membership testers to perform the alignment. The reads may be short sequences of nucleic acids or other biological molecules and the reference sequences may be sequences of genomes, chromosomes, etc. The Bloom filters include a collection of hash functions, a bit array, and associated logic for applying reads to the filter. Each filter, and there may be multiple of these used in a particular application, is used to determine whether an applied read is present in a reference sequence. Each Bloom filter is associated with a single reference sequence such as the sequence of a particular chromosome. In one example, chromosomal abundance is determined by aligning reads from a sequencer to multiple chromosomes, each having an associated Bloom filter or other set membership tester.
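The membership test described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the class `BloomFilter`, the helpers `build_filter` and `count_reads`, the SHA-256-based double hashing, and the toy bit-array size are all hypothetical choices made for readability.

```python
import hashlib


class BloomFilter:
    """Bit array plus k hash positions per item (illustrative only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Assumption: derive k positions from one SHA-256 digest using
        # double hashing; the patent does not prescribe this scheme.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # Member only if every associated bit is set; false positives
        # are possible, false negatives are not.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def build_filter(reference, read_len, num_bits=1 << 16, num_hashes=9):
    """Insert every read-sized substring of one reference sequence."""
    bf = BloomFilter(num_bits, num_hashes)
    for i in range(len(reference) - read_len + 1):
        bf.add(reference[i:i + read_len])
    return bf


def count_reads(reads, filters):
    """Count, per reference, how many reads test as members."""
    counts = {name: 0 for name in filters}
    for read in reads:
        for name, bf in filters.items():
            if read in bf:
                counts[name] += 1
    return counts


# Toy usage: one filter per reference sequence, as in the abstract.
filters = {
    "chrA": build_filter("ACGTACGTTAGC", 4),
    "chrB": build_filter("GGGGCCCCAAAA", 4),
}
print(count_reads(["ACGT", "GTTA", "GGCC"], filters))
```

Each reference sequence gets its own filter, so the per-reference counts stand in for the chromosomal abundance mentioned above. A production-scale filter would need a far larger bit array and faster hash functions than this sketch uses.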
Opening claim text (preview).
What is claimed is:

1. A method, implemented on a computer system comprising one or more processors and system memory, for detecting copy number variations, the method comprising:
(a) receiving, by the computer system, a plurality of reads obtained from a sample;
(b) providing, on the computer system, a plurality of One Read Bloom filters corresponding to a plurality of regions of a genome and a plurality of Multiple Read Bloom filters corresponding to the plurality of regions of the genome, wherein each Bloom filter comprises a bit array, one or more hash functions, and logic for applying reads to the Bloom filter, each One Read Bloom filter was constructed using approximately read-sized sequences in its corresponding region of the genome, and each Multiple Read Bloom filter was constructed using approximately read-sized sequences found more than once in its corresponding region of the genome;
(c) applying, by the one or more processors, each read of the plurality of reads to each One Read Bloom filter to determine a membership of each read in each One Read Bloom filter, wherein applying a read to a Bloom filter comprises: providing the read as an input to each hash function of the one or more hash functions of the Bloom filter, obtaining an output value from each hash function and the read, wherein each output value is associated with a bit position in the bit array of the Bloom filter, and determining that the read is a member of the Bloom filter based on bit values of the bit array at bit positions associated with output values obtained from the one or more hash functions and the read;
(d) applying, by the one or more processors, each read of the plurality of reads to each Multiple Read Bloom filter to determine a membership of each read in each Multiple Read Bloom filter;
(e) determining, based on the memberships determined in (c) and (d) and by the one or more processors, which one or more regions of the plurality of regions the reads are aligned to;
(f) determining, by the one or more processors, from a number of reads aligned to each of the plurality of regions of the genome, read abundance values of the plurality of regions of the genome;
(g) comparing, on a region-to-region basis and by the one or more processors, a read abundance value of each region of the plurality of regions of the genome to a threshold number to produce one or more statistical values indicating aberrations of read abundance in one or more regions of the plurality of regions; and
(h) making, based on the one or more statistical values, one or more detection calls of copy number variation in one or more of the plurality of regions of the genome.

2. The method of claim 1, wherein determining the read abundance values of the plurality of regions comprises excluding a read from any of the plurality of regions when the read is a member of two or more filters of the plurality of One Read Bloom filters.

3. The method of claim 1, wherein the plurality of regions of the genome corresponds to a plurality of chromosomes of an organism, and the copy number variation comprises a chromosomal aneuploidy.

4. The method of claim 1, wherein the sample comprises a mixture of genomes.

5. The method of claim 4, wherein the sample comprises cells taken from a pregnant individual.

6. The method of claim 1, wherein at least one of the Bloom filters comprises 9 or 10 hash functions.

7. The method of claim 6, wherein the hash functions require at most about 5 machine instructions to hash a character.

8. The method of claim 1, wherein at least one of the Bloom filters comprises a bit array having between about 1.5×10¹⁰ to 8.5×10¹¹ bit positions.

9. The method of claim 1, wherein at least one of the Bloom filters has a false positive probability of at most about 0.00001.

10. The method of claim 1, wherein the plurality of regions of a genome are portions of chromosomes, and the copy number variation comprises a partial chromosomal aneuploidy.

11. The method of claim 1, further comprising applying the plurality of reads to an exclusion region Bloom filter to determine whether any reads should be excluded from alignment to any regions.

12. The method of claim 1, wherein at least one Multiple Read Bloom filter of the plurality of Multiple Read Bloom filters was constructed using repeated sequences.

13. The method of claim 12, wherein the repeated sequences are located in the at least one Multiple Read Bloom filter's corresponding region of the genome.

14. The method of claim 1, wherein at least one filter of the plurality of One Read Bloom filters was constructed using approximately read-sized sequences in its corresponding region of the genome but not in one or more exclusion regions in its corresponding region of the genome.

15. The method of claim 14, wherein at least one Multiple Read Bloom filter of the plurality of Multiple Read Bloom filters was constructed using approximately read-sized sequences in the one or more exclusion regions, as well as approximately read-sized sequences found more than once in its corresponding region of the genome.

16. The method of claim 1, wherein (e) comprises determining a read is aligned to a region when the read is a member of a One Read Bloom filter of the region and is not a member of any of the plurality of Multiple Read Bloom filters.

17. The method of claim 1, wherein (e) comprises determining a read is aligned to a region when the read is a member of a One Read Bloom filter of the region and is not a member of any of the plurality of Multiple Read Bloom filters or any other One Read Bloom filters.

18. The method of claim 1, wherein the plurality of regions of the genome corresponds to a plurality chromosome strands, sub-strand regions, or custom regions.

19. The method of claim 1, wherein the approximately read-sized sequences fit into one or more read sizes of the plurality of reads.

20. A computer program product for detecting copy number variations, the computer program product comprising a non-transitory computer readable medium on which is provided program instructions comprising:
(a) code for receiving a plurality of reads obtained from a sample;
(b) code for providing a plurality of One Read Bloom filters corresponding to a plurality of regions of a genome and a plurality of Multiple Read Bloom filters corresponding to the plurality of regions of the genome, wherein each Bloom filter comprises a bit array, one or more hash functions, and logic for applying reads to the Bloom filter, each One Read Bloom filter was constructed using approximately read-sized sequences in its corresponding region of the genome, and each Multiple Read Bloom filter was constructed using approximately read-sized sequences found more than once in its corresponding region of the genome;
(c) code for applying each read of the plurality of reads to each One Read Bloom filter to determine a membership of each read in each One Read Bloom filter, wherein applying a read to a Bloom filter comprises: providing the read as an input to each hash function of the one or more hash functions of the Bloom filter, obtaining an output value from each hash function and the read, wherein each output value is associated with a bit position in the bit array of the Bloom filter, and determining that the read is a member of the Bloom filter based on bit values of the bit array at bit positions associated with output values obtained from the one or more hash functions and the read;
(d) code for applying each read of t
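
The alignment and calling steps recited in claims 1 and 17 can be sketched as follows. This is a simplified illustration under stated assumptions: plain Python sets stand in for the Bloom filters (an exact set membership tester, so there are no false positives), the function names `build_filters`, `align_reads`, and `flag_regions` are hypothetical, and the relative-deviation check is only a placeholder for the statistical comparison in step (g), which the claim text does not specify.

```python
from collections import Counter


def build_filters(region_seq, read_len):
    """Return (one_read, multiple_read) membership sets for one region.

    one_read: every read-sized sequence in the region.
    multiple_read: read-sized sequences found more than once in it.
    """
    kmers = Counter(
        region_seq[i:i + read_len]
        for i in range(len(region_seq) - read_len + 1)
    )
    one_read = set(kmers)
    multiple_read = {k for k, n in kmers.items() if n > 1}
    return one_read, multiple_read


def align_reads(reads, regions, read_len):
    """Count reads per region using the rule worded in claim 17."""
    one, multi = {}, {}
    for name, seq in regions.items():
        one[name], multi[name] = build_filters(seq, read_len)

    abundance = {name: 0 for name in regions}
    for read in reads:
        # Skip reads that hit any Multiple Read filter.
        if any(read in m for m in multi.values()):
            continue
        hits = [name for name, f in one.items() if read in f]
        # Align only when exactly one One Read filter matches.
        if len(hits) == 1:
            abundance[hits[0]] += 1
    return abundance


def flag_regions(abundance, expected, tolerance):
    """Placeholder for step (g): flag regions whose observed count
    deviates from an assumed expected count by more than `tolerance`."""
    return [
        name for name, n in abundance.items()
        if abs(n / expected[name] - 1.0) > tolerance
    ]


# Toy usage with two short hypothetical regions.
regions = {"r1": "ACGTACGTTT", "r2": "GGCATTGCAA"}
reads = ["ACGT", "CGTA", "GTTT", "GGCA", "TTGC", "AAAA"]
abundance = align_reads(reads, regions, 4)
print(abundance)
print(flag_regions(abundance, {"r1": 2, "r2": 4}, 0.25))
```

In the toy data, `ACGT` occurs twice in `r1`, so it lands in that region's Multiple Read set and is skipped, while `AAAA` matches no region and is never counted. Swapping the sets for real Bloom filters would trade exactness for memory, at the cost of occasional false-positive memberships.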