Leveraging EST Sequencing, Micro Array Experiments and Database Integration for Gene Expression Analyses The Computational Biology and Informatics Laboratory.


1 Leveraging EST Sequencing, Micro Array Experiments and Database Integration for Gene Expression Analyses The Computational Biology and Informatics Laboratory

2 RAD
GUS
EST clustering and assembly
Genomic alignment and comparative sequence analysis
Identify shared TF binding sites
TESS (Transcription Element Search Software)
PROM-REC (Promoter recognition)

3 GUS system
External Datasources
Data Integration
Computational Annotation
Validation
Data Warehouse (~230 Tables/Views)
Lightweight Perl object layer
Annotators interface
Browser & bioWidgets
Java Servlet (views)

4 GUS: Genomics Unified Schema
Controlled Vocabs: free text, GO, Species, Tissue, Dev. Stage
Genes / Sequence: Genes, gene models; STSs, repeats, etc; Cross-species analysis
RNAs / Sequence (DOTS): Characterize transcripts; RH mapping; Library analysis; Cross-species analysis
Transcript Expression (RAD, RNA Abundance DB): Arrays; SAGE; Conditions
Special Features: Ownership; Protection; Algorithm; Evidence; Similarity; Versioning
under development
Proteins / Sequence: Domains; Function; Structure; Cross-species analysis
Pathways, Networks: Representation; Reconstruction

5 Clusters vs. Contig Assemblies
UniGene: BLAST: Clusters of ESTs & mRNAs
Transcribed Sequences (DoTS): CAP4: Consensus Sequences
- Alternative splicing
- Paralogs

6 Incremental Updates of DoTS Sequences
Incoming Sequences (EST/mRNA)
Make Quality (remove vector, polyA, NNNs) -> “Quality” sequences (AssemblySequence)
Block with RepeatMasker -> Blocked sequences
Assign to DOTS consensus sequences (blastn at 40 bp length, 92% identity). Cluster incoming sequences. -> DOTS Consensus Sequences and “Unassembled” clusters
Assemble DOTS consensus sequences and incoming sequences with CAP4. -> CAP4 assemblies (consensus sequences and new)
Calculate new DOTS consensus sequence using weighted consensus sequence(s) and new CAP4 assembly. -> New Consensus sequences
Update GUS database
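The assignment and clustering step can be pictured with a minimal sketch. It assumes that "40 bp length, 92% identity" are minimum cutoffs on a blastn alignment and that sequences joined by any passing hit end up in the same cluster (single linkage); the slide states neither. The `Hit` record, the function names and the sample identifiers are hypothetical and are not part of GUS or DoTS.

```python
from collections import defaultdict
from dataclasses import dataclass

MIN_ALIGN_LEN = 40        # bp, figure taken from the slide
MIN_PCT_IDENTITY = 92.0   # percent, figure taken from the slide


@dataclass(frozen=True)
class Hit:
    """One pairwise alignment (hypothetical record, not a GUS table)."""
    query: str
    subject: str
    align_len: int
    pct_identity: float


def passes(hit: Hit) -> bool:
    # Assumption: both figures on the slide are minimum cutoffs.
    return hit.align_len >= MIN_ALIGN_LEN and hit.pct_identity >= MIN_PCT_IDENTITY


def cluster(hits):
    """Single-linkage clustering: any passing hit links its two sequences."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for h in hits:
        # Look up both ends first so rejected hits still yield singletons.
        root_q, root_s = find(h.query), find(h.subject)
        if passes(h) and root_q != root_s:
            parent[root_q] = root_s

    groups = defaultdict(set)
    for name in list(parent):
        groups[find(name)].add(name)
    return sorted(sorted(members) for members in groups.values())


example_hits = [
    Hit("est1", "dots1", 120, 98.0),  # passes: est1 joins consensus dots1
    Hit("est2", "dots1", 35, 99.0),   # too short
    Hit("est3", "est1", 60, 93.5),    # passes: est3 joins via est1
    Hit("est4", "est2", 50, 90.0),    # identity too low
]
print(cluster(example_hits))
# [['dots1', 'est1', 'est3'], ['est2'], ['est4']]
```

In this toy run, est1 and est3 are assigned to an existing consensus sequence, while est2 and est4 remain on their own, analogous to the "unassembled" clusters that are handed to CAP4 together with the consensus sequences.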

7 Assembly Validation
Alignment to genomic sequence via BLAST/sim4: preliminary data look good
Assembly consistency (assemblies provide potential SNPs)
Add BLAST sim4 figure
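The remark that assemblies provide potential SNPs can be illustrated with a small sketch. It is not the method used for DoTS, which the slides do not describe. It assumes the reads in an assembly are already aligned to equal length and flags any column where a second base occurs in at least `min_minor_count` reads. The function name, the cutoff and the toy reads are hypothetical.

```python
from collections import Counter


def potential_snps(aligned_reads, min_minor_count=2):
    """Return (column, base counts) for columns with a recurring minor base.

    Assumes all reads have the same length; gap or N characters are ignored.
    """
    flagged = []
    for col, bases in enumerate(zip(*aligned_reads)):
        counts = Counter(b for b in bases if b in "ACGT")
        if len(counts) < 2:
            continue  # column is fully consistent
        second_most_common = counts.most_common()[1][1]
        if second_most_common >= min_minor_count:
            flagged.append((col, dict(counts)))
    return flagged


reads = [
    "ACGTAC",
    "ACGTAC",
    "ACTTAC",
    "ACTTAC",
    "ACGTAA",  # lone mismatch in the last column, more likely a sequencing error
]
print(potential_snps(reads))
# [(2, {'G': 3, 'T': 2})]
```

Requiring the minor base in more than one read is a crude way to separate recurring differences from isolated sequencing errors; a real pipeline would also weigh base quality and read provenance.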

8 Current DoTS content (www.allgenes.org)
                                Human        Mouse
Build Beginning Date            7/20/2001    6/1/2001
Input Sequences                 3,169,487    1,939,246
Non-singleton Assemblies        175,153      79,746
“Gene” clusters                 140,369      74,050
With nrdb similarities          -            34,033 (46%)
With prodom/CDD similarities                 27,602 (37%)
With GO function assignment                  12,777 (17%)
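The three percentages on this slide are consistent with being fractions of the 74,050 mouse "gene" clusters. That reading is inferred from the arithmetic rather than stated on the slide, and the short check below only reproduces it; the variable and function names are illustrative.

```python
MOUSE_GENE_CLUSTERS = 74_050  # "Gene" clusters, mouse build

annotated = {
    "nrdb similarities": 34_033,
    "prodom/CDD similarities": 27_602,
    "GO function assignment": 12_777,
}


def percent_of_clusters(count, total=MOUSE_GENE_CLUSTERS):
    """Share of gene clusters, rounded to a whole percent."""
    return round(100 * count / total)


for label, count in annotated.items():
    print(f"{label}: {count:,} / {MOUSE_GENE_CLUSTERS:,} = {percent_of_clusters(count)}%")
```

The rounded values come out as 46%, 37% and 17%, matching the figures on the slide.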

9 RAD
Multiple labs, multiple biological systems, multiple platforms
Expressed genes? Differentially-expressed genes? Co-regulated genes? Gene pathways?

10 RAD: RNA Abundance Database
Experiment, Platform, Raw Data, Processed Data, Algorithm, Metadata
Compliant with the MGED standards

11 Different Views of GUS/RAD
Focused annotation of specific organisms and biological systems:
Organisms: Human, Mouse, Plasmodium falciparum
Biological systems: Endocrine pancreas, CNS, Hematopoiesis
*not drawn to scale*

13 EPConDB Pathway query

14 New site

15 PlasmoDB query integrating gene expression, genomic sequence and GO Function prediction

17 RAD
GUS
EST clustering and assembly
Genomic alignment and comparative sequence analysis
Identify shared TF binding sites
TESS (Transcription Element Search Software)
PROM-REC (Promoter recognition)

18 Acknowledgements
CBIL: Chris Overton; Chris Stoeckert; Vladimir Babenko; Brian Brunk; Jonathan Crabtree; Sharon Diskin; Greg Grant; Yuri Kondrakhin; Georgi Kostov; Phil Le; Elisabetta Manduchi; Joan Mazzarelli; Shannon McWeeney; Debbie Pinney; Angel Pizarro; Jonathan Schug
PlasmoDB collaborators: David Roos; Martin Fraunholz; Jesse Kissinger; Jules Milgram; Ross Koppel, Monash U.; Malarial Genome Sequencing Consortium (Sanger Centre, Stanford U., TIGR/NMRC)
Allgenes.org collaborators: Ed Uberbacher, ORNL; Doug Hyatt, ORNL
EPConDB collaborators: Klaus Kaestner; Marie Scearce; Doug Melton, Harvard; Alan Permutt, Wash. U
Comparative Sequence Analysis collaborators: Maja Bucan; Shaying Zhao; Whitehead/MIT Center for Genome Research

20 GUS Object View
Gene, Genomic Sequence, Gene Instance, Gene Feature
NA
RNA, RNA Sequence, RNA Instance, RNA Feature
Protein, Protein Sequence, Protein Instance, Protein Feature
AA Sequence, AA Feature

21 Query RAD by Sample or by Experiment
Access by Experiment groups Sample info ontologies Image info

24 Predicting Gene Ontology Functions

25 Experiment Tables
Hybridization Conditions, Label, Sample, Treatment, Disease, Devel. Stage, ExperimentSample, Anatomy, Taxon, RelExperiments, Exp.ControlGenes, ControlGenes, Experiment, ExpGroups, Groups

26 High Level Flow Diagram of GUS Annotation
Genomic Sequence; mRNA/EST Sequence
BLAST/SIM4
ORNL Gene predictions; GRAIL/GenScan
Clustering and Assembly
Predicted Genes; DOTS consensus Sequences
Merge Genes; Gene/RNA cluster assignment
Gene Index
Gene families, Orthologs
Assign Gene Name, Manual Annotation..
Predicted RNAs; Predicted Proteins
Grail/Genscan, framefinder
BLASTX
PFAM, SignalP, TMPred, ProDom, etc
BLASTP
Algorithms for functional predictions
BLAST Similarities; Protein Features/Motifs; GO Functions

