State of CBIL: Current and Future Directions
Computational Biology and Informatics Laboratory
October 2001
CBIL Research
- Gene Discovery: EST analysis; genomic sequence analysis
- Gene Regulation: microarray analysis; promoter/regulatory region analysis
- Biological Data Representation: data integration; ontology
CBIL: Gene Discovery
- Gene annotation (Kolchanov)
- Gene coding potential
- Gene function prediction
- AllGenes
- EPConDB (Kaestner, Permutt, Melton)
- StemCellDB/StroCDB (Lemischka, Moore)
- Mouse chromosome 5 (Bucan)
- PlasmoDB (Roos, Kissinger)
- ParaDB (Roos)
- Posters
CBIL: Gene Regulation
Tools: PaGE, PROM_REC, TESS
- Pancreatic development (Kaestner)
- TGF-β signaling (Bottinger)
- Fetal globin expression in adults (Fortina)
- Brain disease and injury (Eberwine, Meaney)
- Endothelial cell function (Davies)
CBIL: Biological Data Representation
- Genomics Unified Schema (GUS)
- RNA Abundance Database (RAD)
- Connecting to a brain atlas (Nissanov, Davidson)
- Microarray ontology (MGED)
CBIL Project Architecture
- GUS: sequence and annotation; gene index (ESTs and mRNAs)
- RAD: microarray expression data; experimental annotation
- Platform: relational DB (Oracle) with a Perl object layer
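The "relational DB with an object layer" design can be illustrated in miniature. This sketch uses Python and the standard-library sqlite3 module as stand-ins for Perl and Oracle. The `assembly` table and the `Assembly` class are hypothetical and do not reflect GUS's actual schema or object API. Each table row is wrapped in an object, and the object layer keeps the SQL out of application code.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Assembly:
    """Object view of one row in a hypothetical 'assembly' table."""
    assembly_id: int
    gene_id: int
    sequence: str

    @classmethod
    def fetch_by_gene(cls, conn, gene_id):
        # The object layer owns the SQL; callers only ever see Assembly objects.
        rows = conn.execute(
            "SELECT assembly_id, gene_id, sequence FROM assembly "
            "WHERE gene_id = ? ORDER BY assembly_id",
            (gene_id,),
        ).fetchall()
        return [cls(*row) for row in rows]

    def save(self, conn):
        # Insert the row, or overwrite it if the primary key already exists.
        conn.execute(
            "INSERT OR REPLACE INTO assembly (assembly_id, gene_id, sequence) "
            "VALUES (?, ?, ?)",
            (self.assembly_id, self.gene_id, self.sequence),
        )


# In-memory database standing in for the production relational store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assembly (assembly_id INTEGER PRIMARY KEY, gene_id INTEGER, sequence TEXT)"
)
Assembly(1, 10, "ATGGCC").save(conn)
Assembly(2, 10, "ATGGCA").save(conn)
Assembly(3, 11, "ATGAAA").save(conn)
for a in Assembly.fetch_by_gene(conn, 10):
    print(a.assembly_id, len(a.sequence))
```

With this design, the schema can change underneath without touching code that only calls `fetch_by_gene` and `save`. That separation is presumably the point of putting an object layer over the database.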
GUS: Genomics Unified Schema
- Controlled vocabularies and free text: GO, species, tissue, developmental stage
- Genomic sequence: genes, gene models; STSs, repeats, etc.; cross-species analysis
- DOTS (transcribed sequence): characterize transcripts; RH mapping; library analysis; cross-species analysis
- RAD (RNA Abundance DB): transcript expression (arrays, SAGE, conditions)
- Protein sequence: domains, function, structure; cross-species analysis
- Pathways/networks: representation, reconstruction
- Special features: ownership, protection, algorithm, evidence, similarity, versioning
- Some components under development
Clusters vs. Contig Assemblies
UniGene vs. Transcribed Sequences (DOTS)
- BLAST: clusters of ESTs and mRNAs
- CAP4: consensus sequences
  - alternative splicing
  - paralogs
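The clustering step can be sketched as single-linkage grouping with union-find. This assumes pairwise BLAST results have already been reduced to (query, subject) pairs that pass a similarity cutoff. The sequence IDs and hit pairs are hypothetical, this is not the actual DOTS pipeline, and the CAP4 assembly step is not shown.

```python
from collections import defaultdict


def single_linkage_clusters(seq_ids, hits):
    """Group sequences so that any two joined by a chain of hits share a cluster."""
    parent = {s: s for s in seq_ids}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in hits:
        union(a, b)

    groups = defaultdict(list)
    for s in seq_ids:
        groups[find(s)].append(s)
    # Sort members and clusters so the output is deterministic.
    return sorted(sorted(g) for g in groups.values())


# Hypothetical EST/mRNA IDs and BLAST hit pairs that passed a cutoff.
ids = ["est1", "est2", "est3", "mrna1", "est4"]
pairs = [("est1", "est2"), ("est2", "mrna1"), ("est3", "est4")]
print(single_linkage_clusters(ids, pairs))
# [['est1', 'est2', 'mrna1'], ['est3', 'est4']]
```

Single linkage is simple, but one spurious hit (for example through a shared repeat or a paralogous region) can chain distinct transcripts into one cluster. That is one motivation for a separate assembly step that builds consensus sequences within each cluster.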
Assembled Transcripts
Human:
- About 3 million EST and mRNA sequences used
- Combined into 797,028 assemblies
- Clustered into 150,006 “genes”
- A protein can be identified for 76,771 genes
- A function can be predicted for 24,127 genes
Mouse:
- About 2 million EST and mRNA sequences used
- Combined into 355,770 assemblies
- Clustered into 74,024 “genes”
- A protein can be identified for 34,008 genes
- A function can be predicted for 15,403 genes
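The counts above can be turned into coverage fractions. This is plain arithmetic on the slide's own numbers and adds no new data.

```python
# Counts exactly as reported on the slide.
counts = {
    "human": {"assemblies": 797_028, "genes": 150_006, "with_protein": 76_771, "with_function": 24_127},
    "mouse": {"assemblies": 355_770, "genes": 74_024, "with_protein": 34_008, "with_function": 15_403},
}


def coverage(c):
    """Return (assemblies per gene, % genes with a protein, % genes with a predicted function)."""
    return (
        c["assemblies"] / c["genes"],
        100 * c["with_protein"] / c["genes"],
        100 * c["with_function"] / c["genes"],
    )


for species, c in counts.items():
    per_gene, prot, func = coverage(c)
    print(f"{species}: {per_gene:.2f} assemblies/gene, "
          f"{prot:.1f}% with protein, {func:.1f}% with function")
```

Human results: about 5.31 assemblies per gene cluster, 51.2% with an identified protein, and 16.1% with a predicted function. Mouse results: about 4.81, 45.9%, and 20.8%.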
Bridging Fingerprint Contigs and RH Maps on Mouse Chromosome 5
Crabtree et al., Genome Research 2001
Figure: chromosome 5 fingerprint map and RH map
Predicting Gene Ontology Functions
AllGenes
AllGenes Enhancements: Annotated Entries
AllGenes Enhancements: Genomic Data
New site
Contig View
Features shown: OM restriction sites, microsatellites, Self-BLAST, NRDB-BLAST, SAGE tags, EST/GSS, FullPHAT, GeneFinder, GlimmerM, annotation (chr2-TIGR)
RAD: RNA Abundance Database
- Components: experiment, platform, raw data, processed data, algorithm, metadata
- Compliant with the MGED standards
Microarray Gene Expression Database group (MGED)
An international effort on microarray data standards that aims to:
- develop standards for storing and communicating microarray-based gene expression data, defining the minimal information required to ensure reproducibility and verifiability of results and to facilitate data exchange (MIAME, MAGEML-MAGEDOM);
- collect (and, where needed, create) controlled vocabularies and ontologies;
- develop standards for data comparison and normalization.
The RAD schema is compliant with the minimum annotations recommended by MGED.
MIAME: Minimum Information About a Microarray Experiment, the common set of concepts that must be captured in a database to describe gene expression experiments adequately for interpretation, reproduction, or critical assessment.
MAML: MicroArray Mark-up Language, XML Document Type Definitions of these concepts.
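This hypothetical completeness check shows what "minimum information" means in practice for one experiment record. The six section names follow the MIAME outline: experimental design, array design, samples, hybridizations, measurements, and normalization controls. The dictionary layout, keys, and function name are assumptions made for this sketch and are not part of MAGE-ML, MAML, or RAD.

```python
# Hypothetical checklist: one key per MIAME section.
MIAME_SECTIONS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridizations",
    "measurements",
    "normalization_controls",
)


def missing_miame_sections(record):
    """Return the MIAME sections that are absent or empty in an experiment record (a dict)."""
    return [s for s in MIAME_SECTIONS if not record.get(s)]


# Hypothetical record for illustration only.
experiment = {
    "experimental_design": "time course, 3 biological replicates",
    "array_design": "spotted cDNA array",
    "samples": ["day 0", "day 2", "day 4"],
    "hybridizations": ["hyb-01", "hyb-02", "hyb-03"],
    "measurements": {"raw": "scanner output", "processed": "normalized ratios"},
    # normalization_controls intentionally left out
}
print(missing_miame_sections(experiment))  # ['normalization_controls']
```

A real MIAME-compliant schema records far more detail within each section. The sketch only shows the idea of a required minimum that can be checked mechanically.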
EPConDB Pathway query
Microarray Analysis: PaGE
Identify shared TF binding sites
- RAD and GUS
- EST clustering and assembly
- Genomic alignment and comparative sequence analysis
- TESS (Transcription Element Search Software)
- PROM_REC (promoter recognition)
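To make the binding-site step concrete, this sketch builds a position weight matrix from a few aligned sites and scans DNA for high-scoring windows. Scores are base-2 log-odds against a uniform background. A site is treated as "shared" if every promoter in a co-regulated set has at least one window above threshold. This is a generic PWM scan chosen for illustration, not the scoring used by TESS or PROM_REC. The sites, sequences, pseudocount, and threshold are all hypothetical, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5):
    """Log-odds PWM (base 2) from equal-length aligned sites, uniform 0.25 background."""
    pwm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({b: math.log2(((column.count(b) + pseudocount) / total) / 0.25)
                    for b in BASES})
    return pwm


def scan(seq, pwm, threshold):
    """Return (position, score) for each forward-strand window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[b] for col, b in zip(pwm, window))
        if score >= threshold:
            hits.append((start, round(score, 2)))
    return hits


def shared_site(promoters, pwm, threshold):
    """True if every promoter has at least one window scoring above threshold."""
    return all(scan(p, pwm, threshold) for p in promoters)


# Hypothetical aligned binding sites and promoter fragments.
sites = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTAA"]
pwm = build_pwm(sites)
promoters = ["GGCTGACTCAGGA", "ATTGAGTCATTC", "CCCCCCCCCCCC"]
for p in promoters:
    print(p, scan(p, pwm, threshold=6.0))
print(shared_site(promoters[:2], pwm, threshold=6.0))
```

In this toy run, the first two promoters each contain one above-threshold window and the third contains none. A practical search would also scan the reverse complement and calibrate the threshold against background sequence.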
Promoter Analysis: PROM_REC
Acknowledgements
- CBIL: Chris Stoeckert, Vladimir Babenko, Brian Brunk, Jonathan Crabtree, Sharon Diskin, Greg Grant, Yuri Kondrakhin, Georgi Kostov, Phil Le, Li Li, Junmin Liu, Elisabetta Manduchi, Joan Mazzarelli, Shannon McWeeney, Debbie Pinney, Angel Pizarro, Jonathan Schug
- PlasmoDB collaborators: David Roos, Martin Fraunholz, Jesse Kissinger, Jules Milgram, Ross Koppel (Monash U.), Malarial Genome Sequencing Consortium (Sanger Centre, Stanford U., TIGR/NMRC)
- EPConDB collaborators: Klaus Kaestner, Marie Scearce, Doug Melton (Harvard), Alan Permutt (Wash. U.)
- Comparative sequence analysis collaborators: Maja Bucan, Shaying Zhao, Whitehead/MIT Center for Genome Research
- CAP4 provided by Paracel
CBIL: Future Directions
- Sequence/sequence annotation
- Gene expression experiment
- Proteomics, metabolomics
- Pathways/networks