1
Bioinformatics of mammalian gene expression (BoMGE) 07 June 2005 Gene Regulation Informatics
2
Deliver what? System... History/timelines Competitive position
3
Deliver what? ‘Comprehensive catalog’ of mammalian regulatory elements ‘Validated’, known accuracy Clustered into similar groups - ‘TF models’ Annotated as known/novel Modules identified, ‘specific to...’ Predictions extrapolated to remote regions
4
Predictive system: mostly Java, some Perl/bash; 270 CPUs/OSCAR; TRANSFAC 9.1; manual TFBS; EnsEMBL-based; Generalize...; OPTICS; accuracy metrics
5
Coexpression resource: How best to use it? Motif discovery? Motif co-occurrence?
6
Multi-source orthologue resource: Compara, HomoloGene, Inparanoid, KEGG, …
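The slide names the orthologue sources but not how they are combined. As a minimal sketch, assuming each source simply reports (human gene, orthologue) pairs, the calls can be pooled while keeping track of which sources support each pair. The function name and the gene identifiers below are hypothetical, used only for illustration.

```python
from collections import defaultdict


def merge_orthologue_sources(calls_by_source):
    """Pool orthologue pairs from several sources.

    calls_by_source maps a source name to an iterable of
    (human_gene, other_gene) pairs. The result maps each pair to the
    sorted list of sources that report it.
    """
    support = defaultdict(set)
    for source, pairs in calls_by_source.items():
        for pair in pairs:
            support[tuple(pair)].add(source)
    return {pair: sorted(sources) for pair, sources in support.items()}


# Hypothetical toy data: these identifiers are made up, not real calls.
calls = {
    "Compara": [("geneA", "mouseA"), ("geneB", "mouseB")],
    "HomoloGene": [("geneA", "mouseA")],
    "Inparanoid": [("geneA", "mouseA"), ("geneB", "mouseB2")],
}
merged = merge_orthologue_sources(calls)

# Pairs reported by only one source are candidates for manual review.
single_source = [pair for pair, srcs in merged.items() if len(srcs) == 1]
```

Keeping the per-pair source list, rather than a plain union, makes disagreements between sources visible, which is the kind of case the next slide assesses by alignment.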
7
Visual comparative genomics: Assessing ortholog annotations. Orthologues of a human gene. LAGAN alignment detects misannotated chicken gene. Assess sequence conservation for a coding exon (MLAGAN).
8
Motif discovery with multiple methods/params
Methods: (W)CONSENSUS, MEME, MotifSampler, Gibbs Sampler, Bioprospector, MDmodule, …, Weeder, CisModule, NestedMICA, Sombrero,...
‘Multiple’ means: methods, motif occurrence models, other parameters
9
Motif scores, p-values
1 Discover with target and random sequences.
2 Apply method-independent score.
3 Use random distribution to assign p-value to a score.
[Figure: cumulative motif score distributions, Target vs. Random; shown with p-val = 0.02 and with no p-val threshold; label: Random 1500b region]
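Step 3 can be sketched as an empirical p-value: the fraction of scores from random sequences that are at least as high as the observed score. The slide does not say how the p-value is computed, so the add-one correction, the function names, and the toy scores below are assumptions; the 0.02 default only echoes the value shown on the slide.

```python
from bisect import bisect_left


def empirical_p_value(score, sorted_random_scores):
    """Estimate P(random score >= score) from a sorted background sample.

    The add-one correction (an assumption here) keeps the estimate
    above zero when no random score reaches the observed one.
    """
    n = len(sorted_random_scores)
    n_at_least = n - bisect_left(sorted_random_scores, score)
    return (n_at_least + 1) / (n + 1)


def filter_motifs(target_scores, random_scores, threshold=0.02):
    """Keep motifs whose empirical p-value is at or below the threshold."""
    background = sorted(random_scores)
    kept = {}
    for name, score in target_scores.items():
        p = empirical_p_value(score, background)
        if p <= threshold:
            kept[name] = p
    return kept


# Hypothetical toy data: 99 background scores and three target motifs.
random_scores = list(range(1, 100))
target_scores = {"motif_1": 100, "motif_2": 98, "motif_3": 50}
significant = filter_motifs(target_scores, random_scores)
```

Because the score is method-independent (step 2), one background distribution of this kind could in principle serve motifs from every discovery method, which is what makes a single threshold meaningful.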
11
Motif clustering, co-occurrence: TRANSFAC 9.1, manual TFBS, OPTICS, accuracy metrics
12
Clustering with OPTICS
1 Pairwise motif similarity measure.
2 Scalable hierarchical clustering method with automatic stopping. [32 CPUs, 96 GB RAM, 64-bit OS]
JASPAR scan test: 50 PWMs, 100 target sequence sets
[Figure: reachability plot with labeled cluster contents]
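To make the reachability plot concrete, here is a minimal sketch of the OPTICS ordering over a precomputed distance matrix, such as one derived from a pairwise motif similarity measure. It is not the implementation used for the slide: the toy 1-D points stand in for motifs, `min_pts` counts the point itself, and the fixed-threshold cut is a simplification of whatever automatic stopping rule was actually applied.

```python
import heapq
import math


def optics_order(dist, min_pts=2, eps=math.inf):
    """Return (order, reachability) for a square distance matrix."""
    if min_pts < 2:
        raise ValueError("min_pts must be at least 2")
    n = len(dist)
    reach = [math.inf] * n
    processed = [False] * n
    order = []

    def core_distance(p):
        # Distance to the (min_pts - 1)-th nearest other point within eps.
        near = sorted(dist[p][q] for q in range(n)
                      if q != p and dist[p][q] <= eps)
        return near[min_pts - 2] if len(near) >= min_pts - 1 else math.inf

    for start in range(n):
        if processed[start]:
            continue
        heap = [(math.inf, start)]
        while heap:
            _, p = heapq.heappop(heap)
            if processed[p]:
                continue  # stale heap entry; a better one was used already
            processed[p] = True
            order.append(p)
            cd = core_distance(p)
            if cd == math.inf:
                continue  # not a core point, so it expands nothing
            for q in range(n):
                if processed[q] or dist[p][q] > eps:
                    continue
                new_reach = max(cd, dist[p][q])
                if new_reach < reach[q]:
                    reach[q] = new_reach
                    heapq.heappush(heap, (new_reach, q))
    return order, [reach[p] for p in order]


def cut_reachability(order, reach_in_order, threshold):
    """Split the ordering wherever reachability exceeds the threshold."""
    clusters = []
    for p, r in zip(order, reach_in_order):
        if r > threshold or not clusters:
            clusters.append([p])
        else:
            clusters[-1].append(p)
    return clusters


# Hypothetical toy data: six points on a line forming two tight groups.
points = [0.0, 1.0, 2.0, 50.0, 51.0, 52.0]
dist = [[abs(a - b) for b in points] for a in points]
order, reachability = optics_order(dist, min_pts=2)
clusters = cut_reachability(order, reachability, threshold=10.0)
```

Plotting `reachability` in `order` gives the reachability plot: valleys are clusters and the peaks between them mark cluster boundaries. This sketch is quadratic in the number of motifs per expansion step, so it illustrates the idea rather than the scalable setup the slide refers to.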
13
www.cisred.org v1.1: human, mouse. Human: 6K genes, 120K motifs. Web database design and construction.
14
Main competitors Zhang - Cold Spring Harbor Lab Lander/Kellis - MIT Bolouri - Institute for Systems Biology Hardison/Haussler - Penn State/UCSC... High throughput... low throughput
15
Large scale’s here. Now what? Production / R&D Hi/lo throughput. Collaborators Accuracy / complexity / data integration ChIP-xxxx, expression specificity, chromatin state, 3’UTRs, LREs... ENCODE Regulatory networks and cascades
16
Competitive opportunities: Monica - C. elegans, briggsae, unannotated; Erin - Drosophila,..., unannotated; Han Hao / Jim Kronstad (UBC) - fungi; Generalize; SNPs - Stephen Montgomery; Repetitive regions - Dixie Mager
17
Competitive opportunities Many target genes, many orthologues Low-coverage/unannotated genomes Accuracy - resources, methods, protocols,... Coexpression and orthology Discovery input vs. co-occurrence/modules Motif similarity, clustering - a superset? cisRED annotations in EnsEMBL ‘Contextual’ motif/module resource...
18
‘Context’ in cisRED Discovered motifs Motif similarity measures Clustering methods ‘Known’ motif resources Annotate motifs as known/novel Motif groups (specific to...) Other result types ‘Accuracy’ Motif classification system
19
Competitive opportunities Validated predictions Myers/Stanford Collaborators Be ‘on the short list’ Collaborators, publications GC3 - ChIP-SAGE, networks...
20
Acknowledgements Misha Bilenky, Chris Fjell, Obi Griffith, Han Hao, Ann He, Bernard Li, Keven Lin, Stephen Montgomery, Mehrdad Oveisi, Erin Pleasance, Neil Robertson, Wenjia Pan, Monica Sleumer, Kevin Teague, Richard Varhol, Maggie Zhang, Asim Siddiqui, Steven Jones Jianjun Zhou, Jörg Sander Dept. Computing Science, University of Alberta Tamara Astakhova, Maik Hassel, James Kennedy, Eddy Tsang, Tony Fu,... Funding Genome Canada, BC Cancer Foundation, Michael Smith Foundation for Health Research
21
TF classification / known motifs