US2016239602A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016239602-A1 |
| Application number | US-201415024990-A |
| Country | US |
| Kind code | A1 |
| Filing date | Sep 27, 2014 |
| Priority date | Sep 27, 2013 |
| Publication date | Aug 18, 2016 |
| Grant date | — |
Abstract
Computational methods used for large scale scaffolding of a genome assembly are provided. Such methods may include a step of applying a location clustering model to a test set of contigs to form two or more location cluster groups, each location cluster group comprising one or more location-clustered contigs; a step of applying an ordering model to each of the two or more location cluster groups to form an ordered set of one or more location-clustered contigs within each cluster group; and a step of applying an orienting model to each ordered set of one or more location-clustered contigs to assign a relative orientation to each of the location-clustered contigs within each location cluster group. In some aspects, the test set of contigs are generated from aligning a set of reads generated by a chromosome conformation analysis technique (e.g., Hi-C) with a draft assembly, a reference assembly, or both.
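The location clustering step in the abstract can be illustrated with a short Python sketch. It is not the applicants' implementation. It assumes the Hi-C reads have already been aligned and reduced to (contig, contig) pairs, treats "link density" as the inter-contig pair count divided by the product of the two contig lengths (a hypothetical normalisation), and stops merging at a caller-chosen number of groups, for example an expected chromosome count. Average linkage, the metric named in claim 2, is computed naively as the mean pairwise density between two groups. The function names `link_density` and `average_linkage_clusters` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def link_density(pairs, lengths):
    """Count read pairs joining two different contigs and divide by the product
    of the two contig lengths (hypothetical normalisation)."""
    counts = defaultdict(int)
    for a, b in pairs:
        if a != b:  # pairs within a single contig are ignored in this sketch
            counts[frozenset((a, b))] += 1
    density = {}
    for key, n in counts.items():
        a, b = tuple(key)
        density[key] = n / (lengths[a] * lengths[b])
    return density


def average_linkage_clusters(contigs, density, n_groups):
    """Agglomerative clustering: repeatedly merge the two groups with the highest
    mean pairwise link density (average linkage) until n_groups remain."""
    groups = [[c] for c in contigs]

    def avg_link(g1, g2):
        total = sum(density.get(frozenset((a, b)), 0.0) for a in g1 for b in g2)
        return total / (len(g1) * len(g2))

    while len(groups) > n_groups:
        i, j = max(combinations(range(len(groups)), 2),
                   key=lambda ij: avg_link(groups[ij[0]], groups[ij[1]]))
        groups[i] = groups[i] + groups[j]
        del groups[j]  # combinations yields i < j, so index i stays valid
    return groups


# Toy input (hypothetical): contig lengths in kb and aligned read pairs
lengths = {"c1": 50, "c2": 50, "c3": 50, "c4": 50}
pairs = [("c1", "c2")] * 5 + [("c3", "c4")] * 4 + [("c2", "c3"), ("c1", "c1")]
density = link_density(pairs, lengths)
groups = average_linkage_clusters(list(lengths), density, n_groups=2)
print(groups)  # [['c1', 'c2'], ['c3', 'c4']]
```

Recomputing every group-to-group average on each merge keeps the sketch short but scales poorly with the number of contigs; a production tool would more likely update linkage values incrementally, or pass a distance matrix derived from the densities to an existing routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"`.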
Claims (preview)
1. A method performed by a computing system for large scale scaffolding of a genome assembly comprising: applying a location clustering model to a test set of contigs to form two or more location cluster groups, each location cluster group comprising one or more location-clustered contigs; applying an ordering model to each of the two or more location cluster groups to form an ordered set of one or more location-clustered contigs within each cluster group; and applying an orienting model to each ordered set of one or more location-clustered contigs to assign a relative orientation to each of the location-clustered contigs within each location cluster group; wherein the test set of contigs are generated from aligning a set of reads generated by a chromosome conformation analysis technique with a draft assembly, a reference assembly, or both.

2. The method of claim 1, wherein the location clustering model comprises building a graph and applying a hierarchical agglomerative clustering algorithm with an average-linkage metric to calculate a link density between each of the contigs of the test set.

3. The method of claim 1, wherein the two or more location cluster groups are two or more chromosome groups, each chromosome group comprising one or more contigs derived from the same chromosome.

4. The method of claim 1, wherein the ordering model comprises building a graph and calculating a minimum spanning tree.

5. The method of claim 1, wherein the orienting model comprises building a graph and calculating an orientation quality score for each location-clustered contig, and wherein the graph is optionally a weighted directed acyclic graph (WDAG).

6. (canceled)

7. The method of claim 1, wherein the chromosome conformation analysis technique is Chromatin Conformation Capture (3C), Circularized Chromatin Conformation Capture (4C), Carbon Copy Chromosome Conformation Capture (5C), Chromatin Immunoprecipitation (ChIP), ChIP-Loop, Hi-C, combined 3C-ChIP-cloning (6C), or Capture-C.

8. The method of claim 1, further comprising, prior to applying a location clustering model, applying a species clustering model to a heterogeneous set of contigs to form two or more species cluster groups, each species cluster group comprising one or more species-clustered contigs from a single species; wherein the heterogeneous set of contigs are generated from aligning a set of reads generated by a chromosome conformation analysis technique with a metagenome assembly, and wherein the one or more species-clustered contigs are used as the test set of contigs.

9. A system for performing large scale scaffolding of a genome assembly comprising: a computer readable storage medium which stores computer-executable instructions comprising instructions for applying a location clustering model to a test set of contigs to form two or more location cluster groups, each location cluster group comprising one or more location-clustered contigs; instructions for applying an ordering model to each of the two or more location cluster groups to form an ordered set of one or more location-clustered contigs within each cluster group; and instructions for applying an orienting model to each ordered set of one or more location-clustered contigs to assign a relative orientation to each of the location-clustered contigs within each location cluster group; wherein the test set of contigs are generated from aligning a set of reads generated by a chromosome conformation analysis technique with a draft assembly, a reference assembly, or both. a processor which is configured to perform steps comprising receiving a set of input files which comprise a file comprising the set of reads generated by a chromosome conformation analysis technique; and the draft assembly, reference assembly, or both; executing the computer-executable instructions stored in the computer-readable storage medium.

10. The system of claim 9, wherein the location clustering model comprises building a graph and applying a hierarchical agglomerative clustering algorithm with an average-linkage metric to calculate a link density between each of the contigs of the test set.

11. The system of claim 9, wherein the two or more location cluster groups are two or more chromosome groups, each chromosome group comprising one or more contigs derived from the same chromosome.

12. The system of claim 9, wherein the ordering model comprises building a graph and calculating a minimum spanning tree.

13. The system of claim 9, wherein the orienting model comprises building a graph and calculating an orientation quality score for each location-clustered contig, and wherein the graph is optionally a weighted directed acyclic graph (WDAG).

14. (canceled)

15. The system of claim 9, wherein the chromosome conformation analysis technique is Chromatin Conformation Capture (3C), Circularized Chromatin Conformation Capture (4C), Carbon Copy Chromosome Conformation Capture (5C), Chromatin Immunoprecipitation (ChIP), ChIP-Loop, Hi-C, combined 3C-ChIP-cloning (6C), or Capture-C.

16. The system of claim 9, wherein the computer-executable instructions further comprises instructions for applying a species clustering model to a heterogeneous set of contigs to form two or more species cluster groups, each species cluster group comprising one or more species-clustered contigs from a single species; wherein the heterogeneous set of contigs are generated from aligning a set of reads generated by a chromosome conformation analysis technique with a metagenome assembly, and wherein the one or more species-clustered contigs are used as the test set of contigs in the instructions for applying a location clustering model.

17. A computer readable storage medium which stores computer-executable instructions comprising: instructions for applying a location clustering model to a test set of contigs to form two or more location cluster groups, each location cluster group comprising one or more location-clustered contigs; instructions for applying an ordering model to each of the two or more location cluster groups to form an ordered set of one or more location-clustered contigs within each cluster group; instructions for applying an orienting model to each ordered set of one or more location-clustered contigs to assign a relative orientation to each of the location-clustered contigs within each location cluster group; and instructions for applying a species clustering model to a heterogeneous set of contigs to form two or more species cluster groups, each species cluster group comprising one or more species-clustered contigs from a single species; wherein the test set of contigs are generated from aligning a set of reads generated by a chromosome conformation analysis technique with a draft assembly, a reference assembly, or both; wherein the heterogeneous set of contigs are generated from aligning a set of reads generated by a chromosome conformation analysis technique with a metagenome assembly, and wherein the one or more species-clustered contigs are used as the test set of contigs in the instructions for applying a location clustering model.

18. The computer readable storage medium of claim 17, wherein the location clustering model comprises building a graph and applying a hierarchical agglomerative clustering algorithm with an average-linkage metric to calculate a link density between each of the contigs of the test set.

19. The computer readable storage medium of claim 17, wherein the two or mo
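Claims 4 and 5 name a minimum spanning tree for ordering and a graph, optionally a weighted directed acyclic graph (WDAG), for orientation. The Python sketch below shows one plausible reading of those steps for a single cluster. Every specific in it is an assumption rather than something stated in the claims: edge weights are 1 / link density; the order is read off the longest path (the "trunk") of the spanning tree, so contigs off the trunk are simply dropped; the WDAG has one node per (contig, orientation) with edges that reward read pairs landing near the joined contig ends; and the "orientation quality score" is taken to be how much the best path score falls when a contig is forced into its other orientation. All function names and the toy data are hypothetical.

```python
import heapq
from collections import deque


def mst_edges(contigs, density):
    """Prim's algorithm on the complete contig graph; edge weight = 1 / link density
    (hypothetical weighting, so strongly linked contigs are 'close')."""
    def weight(a, b):
        d = density.get(frozenset((a, b)), 0.0)
        return 1.0 / d if d > 0 else float("inf")

    in_tree = {contigs[0]}
    heap = [(weight(contigs[0], c), contigs[0], c) for c in contigs[1:]]
    heapq.heapify(heap)
    edges = []
    while heap and len(in_tree) < len(contigs):
        _, a, b = heapq.heappop(heap)
        if b in in_tree:
            continue  # stale entry: b was already reached by a cheaper edge
        in_tree.add(b)
        edges.append((a, b))
        for c in contigs:
            if c not in in_tree:
                heapq.heappush(heap, (weight(b, c), b, c))
    return edges


def trunk_order(contigs, edges):
    """Order contigs along the longest path (by hop count) in the spanning tree."""
    adj = {c: [] for c in contigs}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    def farthest(src):  # BFS; the last node dequeued is farthest from src
        parent, queue, last = {src: None}, deque([src]), src
        while queue:
            last = queue.popleft()
            for nxt in adj[last]:
                if nxt not in parent:
                    parent[nxt] = last
                    queue.append(nxt)
        return last, parent

    end_a, _ = farthest(contigs[0])
    end_b, parent = farthest(end_a)  # two BFS passes find a tree diameter
    path = [end_b]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def orient(order, lengths, links):
    """Max-weight path through a WDAG with one node per (contig, orientation).
    Returns (contig, orientation, quality) triples."""
    states = ("+", "-")

    def gap_score(a, oa, b, ob):  # reward pairs near the ends joined when a precedes b
        pairs = links.get((a, b)) or [(pb, pa) for pa, pb in links.get((b, a), [])]
        total = 0.0
        for pa, pb in pairs:
            da = lengths[a] - pa if oa == "+" else pa
            db = pb if ob == "+" else lengths[b] - pb
            total += 1.0 / (1.0 + da + db)
        return total

    n = len(order)
    fwd = [{o: 0.0 for o in states} for _ in range(n)]  # best score ending at (i, o)
    bwd = [{o: 0.0 for o in states} for _ in range(n)]  # best score starting at (i, o)
    for i in range(1, n):
        for o in states:
            fwd[i][o] = max(fwd[i - 1][p] + gap_score(order[i - 1], p, order[i], o) for p in states)
    for i in range(n - 2, -1, -1):
        for o in states:
            bwd[i][o] = max(gap_score(order[i], o, order[i + 1], q) + bwd[i + 1][q] for q in states)

    best = max(bwd[0].values())
    path = [max(states, key=lambda o: bwd[0][o])]
    for i in range(1, n):
        prev = path[-1]
        path.append(max(states, key=lambda o: gap_score(order[i - 1], prev, order[i], o) + bwd[i][o]))

    result = []
    for i, o in enumerate(path):
        flipped = "-" if o == "+" else "+"
        # Quality: drop in the best total score if this contig is forced to flip.
        result.append((order[i], o, best - (fwd[i][flipped] + bwd[i][flipped])))
    return result


# Toy cluster (hypothetical data; lengths and positions in kb)
cluster = ["c1", "c2", "c3"]
density = {frozenset(("c1", "c2")): 5.0, frozenset(("c2", "c3")): 4.0,
           frozenset(("c1", "c3")): 1.0}
lengths = {"c1": 100, "c2": 100, "c3": 100}
links = {("c1", "c2"): [(90, 10), (95, 5)], ("c2", "c3"): [(85, 90)]}
order = trunk_order(cluster, mst_edges(cluster, density))
scaffold = orient(order, lengths, links)
print(order)                        # ['c1', 'c2', 'c3']
print([o for _, o, _ in scaffold])  # ['+', '+', '-']
```

The forward and backward passes let every contig's quality score come from a single sweep each way, instead of re-solving the path once per flipped contig.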
ICT specially adapted for sequence analysis involving nucleotides or amino acids · CPC title
Physics · mapped topic
Sequence assembly · CPC title
ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations · CPC title