Leveraging EST Sequencing, Microarray Experiments and Database Integration for Gene Expression Analyses. The Computational Biology and Informatics Laboratory.

Presentation transcript:

Leveraging EST Sequencing, Microarray Experiments and Database Integration for Gene Expression Analyses
The Computational Biology and Informatics Laboratory
http://www.cbil.upenn.edu

- RAD
- GUS
- EST clustering and assembly
- Identify shared TF binding sites
- TESS (Transcription Element Search Software)
- PROM-REC (Promoter recognition)
- Genomic alignment and comparative sequence analysis

GUS system
- External data sources
- Data integration
- Computational annotation
- Validation
- Lightweight Perl object layer
- Data warehouse (~230 tables/views)
- Annotators' interface
- Browser & bioWidgets
- Java Servlet (views)

GUS: Genomics Unified Schema
- Controlled vocabs: free text, GO, Species, Tissue, Dev. Stage
- Genes / Sequence: genes, gene models; STSs, repeats, etc.; cross-species analysis
- RNAs / Sequence (DoTS): characterize transcripts; RH mapping; library analysis; cross-species analysis
- Transcript Expression (RAD, RNA Abundance DB): arrays; SAGE; conditions
- Special Features: ownership; protection; algorithm; evidence; similarity; versioning
- under development
- Proteins / Sequence: domains; function; structure; cross-species analysis
- Pathways: networks; representation; reconstruction

Clusters vs. Contig Assemblies
UniGene; Transcribed Sequences (DoTS)
BLAST: Clusters of ESTs & mRNAs
CAP4: Consensus Sequences
- Alternative splicing
- Paralogs

Incremental Updates of DoTS Sequences
1. Incoming sequences (EST/mRNA).
2. Make Quality (remove vector, polyA, NNNs), giving “Quality” sequences (AssemblySequence).
3. Block with RepeatMasker, giving blocked sequences.
4. Assign to DoTS consensus sequences (blastn at 40 bp length, 92% identity) and cluster incoming sequences, giving “Unassembled” clusters (consensus sequences and new).
5. Assemble DoTS consensus sequences and incoming sequences with CAP4, giving CAP4 assemblies (consensus sequences and new).
6. Calculate new DoTS consensus sequence using weighted consensus sequence(s) and new CAP4 assembly, giving new consensus sequences.
7. Update GUS database.
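The assignment and clustering step can be illustrated with a minimal Python sketch. It assumes that "40 bp length, 92% identity" are minimum thresholds on each blastn hit and that clustering is single-linkage over the hits that pass; the slide states neither explicitly. The hit tuples, the sequence identifiers and the function names are hypothetical and are not part of GUS or DoTS.

```python
from collections import defaultdict

MIN_ALIGN_LEN = 40    # bp, from the slide; treated here as a minimum (assumption)
MIN_IDENTITY = 92.0   # percent, from the slide; treated here as a minimum (assumption)


def passes_threshold(hit):
    """hit = (query_id, subject_id, percent_identity, alignment_length)."""
    _, _, identity, length = hit
    return length >= MIN_ALIGN_LEN and identity >= MIN_IDENTITY


def cluster(sequence_ids, hits):
    """Single-linkage clustering (union-find) over hits that pass the thresholds."""
    parent = {s: s for s in sequence_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for hit in hits:
        query, subject = hit[0], hit[1]
        # Tolerate ids that appear only in the hit list.
        parent.setdefault(query, query)
        parent.setdefault(subject, subject)
        if passes_threshold(hit):
            root_q, root_s = find(query), find(subject)
            if root_q != root_s:
                parent[root_q] = root_s

    groups = defaultdict(set)
    for s in parent:
        groups[find(s)].add(s)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical data: one existing consensus sequence and three incoming ESTs.
ids = ["DT.1", "est_a", "est_b", "est_c"]
hits = [
    ("est_a", "DT.1", 97.5, 310),   # passes: joins the existing consensus
    ("est_b", "DT.1", 95.0, 35),    # fails: alignment shorter than 40 bp
    ("est_b", "est_c", 93.0, 120),  # passes: forms a new cluster
]
clusters = cluster(ids, hits)
```

In this toy input, est_a is grouped with the existing consensus while est_b and est_c form a new cluster; each resulting group would then be handed to the assembler.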

Assembly Validation
- Alignment to genomic sequence via BLAST/sim4 (preliminary data look good)
- Assembly consistency (assemblies provide potential SNPs)
- Add BLAST sim4 figure
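One way an assembly can point to potential SNPs is by flagging alignment columns where a second base is supported by more than one read. The sketch below is an illustration only: the support threshold, the gapped-read representation and the function name are assumptions, not the method used for DoTS.

```python
from collections import Counter


def candidate_snp_columns(aligned_reads, min_minor_support=2):
    """Return (column, base_counts) for columns with a well-supported minor base.

    aligned_reads: equal-length strings over ACGT; any other character
    (for example '-') is treated as a gap or missing coverage.
    """
    if not aligned_reads:
        return []
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    candidates = []
    for col in range(width):
        counts = Counter(read[col] for read in aligned_reads if read[col] in "ACGT")
        ranked = counts.most_common(2)
        # Require a second base seen in at least `min_minor_support` reads,
        # so that isolated sequencing errors are not reported.
        if len(ranked) == 2 and ranked[1][1] >= min_minor_support:
            candidates.append((col, dict(counts)))
    return candidates


# Hypothetical assembly of five reads.
reads = [
    "ACGTAC",
    "ACGTAC",
    "ACCTAC",
    "ACCTAC",
    "ACGTAT",
]
snps = candidate_snp_columns(reads)
```

Here column 2 is reported because G and C are each seen in at least two reads, while the single T in the last column is ignored as a likely sequencing error.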

Current DoTS content (www.allgenes.org)
                                Human        Mouse
Build beginning date            7/20/2001    6/1/2001
Input sequences                 3,169,487    1,939,246
Non-singleton assemblies        175,153      79,746
“Gene” clusters                 140,369      74,050
With nrdb similarities          -            34,033 (46%)
With prodom/CDD similarities                 27,602 (37%)
With GO function assignment                  12,777 (17%)
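The percentages on this slide appear to be fractions of the 74,050 mouse “Gene” clusters. That reading is an inference from the numbers, not something the slide states. A short Python check under that assumption:

```python
# Counts copied from the slide; the choice of denominator is an assumption.
mouse_gene_clusters = 74_050
annotated = {
    "nrdb similarities": 34_033,
    "prodom/CDD similarities": 27_602,
    "GO function assignment": 12_777,
}


def percent_of(count, total):
    """Percentage of `total`, rounded to the nearest whole number."""
    return round(100 * count / total)


percentages = {
    name: percent_of(count, mouse_gene_clusters)
    for name, count in annotated.items()
}
```

Under this assumption the computed values are 46, 37 and 17, matching the figures shown on the slide.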

RAD
- Multiple labs
- Multiple biological systems
- Multiple platforms
- Expressed genes?
- Differentially-expressed genes?
- Co-regulated genes?
- Gene pathways?

RAD: RNA Abundance Database
- Experiment
- Platform
- Raw Data
- Processed Data
- Algorithm
- Metadata
Compliant with the MGED standards

Different Views of GUS/RAD
Focused annotation of specific organisms and biological systems:
- Organisms: Human, Mouse, Plasmodium falciparum
- Biological systems: Endocrine pancreas, CNS, Hematopoiesis
*not drawn to scale*

WWW.CBIL.UPENN.EDU/EPCONDB

EpConDB Pathway query

WWW.PLASMODB.ORG New site

PlasmoDB query integrating gene expression, genomic sequence and GO Function prediction

- RAD
- GUS
- EST clustering and assembly
- Identify shared TF binding sites
- TESS (Transcription Element Search Software)
- PROM-REC (Promoter recognition)
- Genomic alignment and comparative sequence analysis

Acknowledgements
- CBIL: Chris Overton, Chris Stoeckert, Vladimir Babenko, Brian Brunk, Jonathan Crabtree, Sharon Diskin, Greg Grant, Yuri Kondrakhin, Georgi Kostov, Phil Le, Elisabetta Manduchi, Joan Mazzarelli, Shannon McWeeney, Debbie Pinney, Angel Pizarro, Jonathan Schug
- PlasmoDB collaborators: David Roos, Martin Fraunholz, Jesse Kissinger, Jules Milgram, Ross Koppel (Monash U.), Malarial Genome Sequencing Consortium (Sanger Centre, Stanford U., TIGR/NMRC)
- Allgenes.org collaborators: Ed Uberbacher (ORNL), Doug Hyatt (ORNL)
- EPConDB collaborators: Klaus Kaestner, Marie Scearce, Doug Melton (Harvard), Alan Permutt (Wash. U)
- Comparative Sequence Analysis collaborators: Maja Bucan, Shaying Zhao, Whitehead/MIT Center for Genome Research

GUS Object View
- Gene: Genomic Sequence; Gene Instance; Gene Feature
- NA
- RNA: RNA Sequence; RNA Instance; RNA Feature
- Protein: Protein Sequence; Protein Instance; Protein Feature
- AA Sequence; AA Feature

Query RAD by Sample or by Experiment
- Access by Experiment groups
- Sample info ontologies
- Image info

Predicting Gene Ontology Functions

Experiment Tables
- Experiment; ExperimentSample; RelExperiments
- ExpGroups; Groups
- Exp.ControlGenes; ControlGenes
- Hybridization Conditions
- Label; Sample; Treatment; Disease; Devel. Stage; Anatomy; Taxon

High Level Flow Diagram of GUS Annotation
- Genomic Sequence; mRNA/EST Sequence
- BLAST/SIM4; ORNL gene predictions (GRAIL/GenScan); Clustering and Assembly
- Predicted Genes; DoTS consensus sequences
- Merge Genes; Gene/RNA cluster assignment
- Gene Index; Gene families, Orthologs; Assign Gene Name, Manual Annotation
- Predicted RNAs; Predicted Proteins; Grail/Genscan, framefinder; BLASTX
- PFAM, SignalP, TMPred, ProDom, etc.; BLASTP; Algorithms for functional predictions
- BLAST Similarities; Protein Features/Motifs; GO Functions