Current and Future Directions

State of CBIL: Current and Future Directions

Computational Biology and Informatics Laboratory October, 2001

CBIL Research
Gene Discovery: EST analysis, genomic sequence analysis
Gene Regulation: microarray analysis, promoter/regulatory region analysis
Biological Data Representation: data integration, ontology

CBIL: Gene Discovery
Gene annotation (Kolchanov); gene coding potential; gene function prediction; AllGenes; EPConDB (Kaestner, Permutt, Melton); StemCellDB/StroCDB (Lemischka, Moore); mouse chromosome 5 (Bucan); PlasmoDB (Roos, Kissinger); ParaDB (Roos); posters

CBIL: Gene Regulation
PaGE; PROM_REC; TESS; pancreatic development (Kaestner); TGF-B signaling (Bottinger); fetal globin expression in adults (Fortina); brain disease and injury (Eberwine, Meaney); endothelial cell function (Davies)

CBIL: Biological Data Representation
Genomic Unified Schema; RNA Abundance Database; connecting to a brain atlas (Nissanov, Davidson); microarray ontology (MGED)

CBIL Project Architecture Sequence & annotation Gene index (ESTs and mRNAs) Microarray expression data experimental annotation Relational DB (Oracle) with Perl object layer GUS RAD

GUS: Genomics Unified Schema free text Controlled vocabs. GO Species Tissue Dev. Stage Genes, gene models STSs, repeats, etc Cross-species analysis Genomic Sequence RAD RNA Abundance DB Characterize transcripts RH mapping Library analysis Cross-species analysis DOTS Transcribed Sequence Special Features Transcript Expression Arrays SAGE Conditions Ownership Protection Algorithm Evidence Similarity Versioning under development Domains Function Structure Cross-species analysis Protein Sequence Pathways Networks Representation Reconstruction

Clusters vs. Contig Assemblies UniGene Transcribed Sequences (DOTS) BLAST: Clusters of ESTs & mRNAs CAP4: Consensus Sequences -Alternative splicing -Paralogs
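The clustering half of this comparison can be illustrated with a minimal sketch. It assumes pairwise similarity hits (for example, from BLAST) have already been computed and filtered, and it groups sequences by single linkage using a union-find structure. The function name, the sequence identifiers, and the toy hits are all hypothetical. This is not CBIL's pipeline, and the consensus-building step performed by an assembler such as CAP4 is not reproduced here.

```python
from collections import defaultdict


def cluster_by_hits(seq_ids, hits):
    """Group sequences into clusters by single linkage over pairwise hits.

    seq_ids: iterable of sequence identifiers (hypothetical names).
    hits: iterable of (id_a, id_b) pairs judged similar by some upstream
          search; thresholds and filtering are assumed to have happened.
    """
    parent = {s: s for s in seq_ids}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(list)
    for s in seq_ids:
        clusters[find(s)].append(s)
    return sorted(sorted(members) for members in clusters.values())


# Toy input: five sequences and three pairwise hits (illustrative only).
seq_ids = ["est1", "est2", "est3", "mrna1", "est4"]
hits = [("est1", "est2"), ("est2", "mrna1"), ("est3", "est4")]
clusters = cluster_by_hits(seq_ids, hits)
print(clusters)
```

A cluster built this way only records membership. A contig assembly additionally aligns the members into consensus sequences, which is what lets alternative splice forms and paralogs fall into separate assemblies.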

Assembled Transcripts
Human: about 3 million EST and mRNA sequences used; combined into 797,028 assemblies; clustered into 150,006 "genes"; a protein can be identified for 76,771 genes and a function predicted for 24,127 genes.
Mouse: about 2 million EST and mRNA sequences used; combined into 355,770 assemblies; clustered into 74,024 "genes"; a protein can be identified for 34,008 genes and a function predicted for 15,403 genes.
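The counts on this slide are easier to compare across species as ratios. The sketch below derives assemblies per "gene" and the fractions of "genes" with an identified protein or a predicted function. The dictionary layout and key names are illustrative; only the counts come from the slide.

```python
# Counts taken from the slide; key names are illustrative.
counts = {
    "human": {"assemblies": 797_028, "genes": 150_006,
              "with_protein": 76_771, "with_function": 24_127},
    "mouse": {"assemblies": 355_770, "genes": 74_024,
              "with_protein": 34_008, "with_function": 15_403},
}


def summarize(c):
    """Return per-gene ratios for one species."""
    return {
        "assemblies_per_gene": c["assemblies"] / c["genes"],
        "protein_fraction": c["with_protein"] / c["genes"],
        "function_fraction": c["with_function"] / c["genes"],
    }


summary = {species: summarize(c) for species, c in counts.items()}

for species, s in summary.items():
    print(
        f"{species}: {s['assemblies_per_gene']:.2f} assemblies per gene, "
        f"{s['protein_fraction']:.1%} with a protein, "
        f"{s['function_fraction']:.1%} with a predicted function"
    )
```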

Bridging Fingerprint Contigs and RH Maps on Mouse Chromosome 5 (Crabtree et al., Genome Research 2001)
[Figure: fingerprint map alongside the Chr. 5 RH map]

Predicting Gene Ontology Functions

AllGenes

AllGenes Enhancements: Annotated Entries

AllGenes Enhancements: Genomic Data

http://plasmodb.org New site

Contig View OM Restriction Sites Microsatellites Self-BLAST NRDB-BLAST SAGE Tags EST/GSS FullPHAT GeneFinder GlimmerM Annotation (chr2-TIGR)

RAD: RNA Abundance Database Experiment Platform Raw Data Processed Data Algorithm Metadata Compliant with the MGED standards
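One way to picture how the components named on this slide relate is as linked records, where each processed data set keeps a reference to the raw data and the algorithm that produced it. The sketch below is purely illustrative: every class, field, and function name is hypothetical, and it does not reflect the actual RAD schema, which the slide does not describe.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Platform:
    name: str  # e.g. an array type; hypothetical field


@dataclass
class Algorithm:
    name: str
    parameters: dict = field(default_factory=dict)


@dataclass
class RawData:
    values: List[float]


@dataclass
class ProcessedData:
    values: List[float]
    source: RawData        # provenance: which raw data this came from
    algorithm: Algorithm   # provenance: how it was derived


@dataclass
class Experiment:
    description: str
    platform: Platform
    raw: RawData
    processed: List[ProcessedData] = field(default_factory=list)


def apply_algorithm(raw: RawData, algorithm: Algorithm,
                    fn: Callable[[float], float]) -> ProcessedData:
    """Derive processed values while recording where they came from."""
    return ProcessedData([fn(v) for v in raw.values], raw, algorithm)


# Toy usage with made-up intensities and a log2 transform.
exp = Experiment("toy experiment", Platform("toy array"),
                 RawData([1.0, 2.0, 8.0]))
exp.processed.append(apply_algorithm(exp.raw, Algorithm("log2"), math.log2))
print(exp.processed[0].values)
```

Keeping the algorithm alongside the processed values is the point of the sketch: a later reader can tell which transformation produced a given number without consulting anything outside the record.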

Microarray Gene Expression Database group (MGED)
International effort on microarray data standards. Develop standards for storing and communicating microarray-based gene expression data by:
- defining the minimal information required to ensure reproducibility and verifiability of results and to facilitate data exchange (MIAME, MAGEML-MAGEDOM);
- collecting (and where needed creating) controlled vocabularies/ontologies;
- developing standards for data comparison and normalization.
The schema is compliant with the minimum annotations recommended by MGED.
MIAME: Minimum Information About a Microarray Experiment (a common set of concepts that need to be captured in a database to describe gene expression experiments adequately for interpretation, reproduction, or critical assessment).
MAML: MicroArray Mark-up Language (XML Document Type Definitions of the concepts).
http://www.mged.org

EPConDB Pathway query

Microarray Analysis: PaGE

RAD GUS EST clustering and assembly Identify shared TF binding sites TESS (Transcription Element Search Software) PROM-REC (Promoter recognition) Genomic alignment and comparative Sequence analysis Identify shared TF binding sites

Promoter Analysis: PROM_REC

http://www.cbil.upenn.edu

Acknowledgements
CBIL: Chris Stoeckert, Vladimir Babenko, Brian Brunk, Jonathan Crabtree, Sharon Diskin, Greg Grant, Yuri Kondrakhin, Georgi Kostov, Phil Le, Li Li, Junmin Liu, Elisabetta Manduchi, Joan Mazzarelli, Shannon McWeeney, Debbie Pinney, Angel Pizarro, Jonathan Schug
PlasmoDB collaborators: David Roos, Martin Fraunholz, Jesse Kissinger, Jules Milgram, Ross Koppel (Monash U.), Malarial Genome Sequencing Consortium (Sanger Centre, Stanford U., TIGR/NMRC)
EPConDB collaborators: Klaus Kaestner, Marie Scearce, Doug Melton (Harvard), Alan Permutt (Wash. U.)
Comparative Sequence Analysis collaborators: Maja Bucan, Shaying Zhao, Whitehead/MIT Center for Genome Research
CAP4 provided by Paracel

CBIL: Future Directions Sequence/ Sequence annotation Gene expression experiment Proteomics, Metabolomics Pathways/ Networks