The European Molecular Biology Laboratory (EMBL) is supported by sixteen countries. It consists of the main Laboratory in Heidelberg (Germany), Outstations in Hamburg (Germany), Grenoble (France) and Hinxton (U.K.), and an external Research Programme in Monterotondo (Italy). from 1996
The EBI Mission: To provide Bioinformatics Facilities for the Scientific Community; To become a flagship laboratory for research in bioinformatics; To provide bioinformatics training; To help disseminate standards & technologies
Role of Bioinformatics: To Support Experimental Biology; To Collect and Archive Data; To provide Framework and Integration; To give Easy Access to Data; To make New Discoveries through Data Analysis; To predict through modelling; To facilitate application and exploitation of academic research in Medicine, Agriculture, Health and Environment
Dramatic Changes in Biology over last 5 years: Data Explosion & New Types of Data; Move towards High-Throughput Biology; Move towards Systems Biology; Much larger community, often naïve users; Growth of Applied Biology (molecular medicine, agriculture, food, environmental sciences)
[Figure labels: Genomes; Hypotheses and in silico models; Bioinformatics; Expression profiling; Comparative genomics; Mutant/RNAi data; Metabolic data; Literature; Proteome data; Biochemistry]
Molecules to Cells to Organisms [figure labels: E. coli; Genome; Protein; Genomes]
Systems Biology [figure labels: Output; Input; CheZ; CheW; CheB; ATP; ADP; Pi; Methyl; CheR; Adaptor; Flim C; CheY; CheA]
Molecular Basis of Disease: p53 tumour suppressor core domain (cancers of many types); Cu-Zn Superoxide Dismutase (autosomal dominant amyotrophic lateral sclerosis)
From Structure to Functional Annotation
PQS: biological assemblies; MSDchem: ligand data; Electron Density Visualisation: AstexViewer; MSDPro, MSDlite; SSM: fold matching; Surface Matching; MSDsite: Active sites; Linking to Domain data, eFamily; Sequence Mapping, SIFTS
From Structure To Biochemical Function: Gene, Protein, 3D Structure, Function. Given a protein structure: Where is the functional site? What is the multimeric state of the protein? Which ligands bind to the protein? What is the biochemical function?
High throughput: A new sequence every 4 seconds; web requests a day; users; 5-10 core databases; cross-references; About 160 other databases
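As a rough sense of scale, the rate of one new sequence every 4 seconds can be converted into daily and yearly totals. The sketch below assumes the rate is constant around the clock and uses a 365-day year; neither assumption is stated on the slide, and the function name is purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sequences_per_day(seconds_per_sequence: float) -> float:
    """Number of new sequences per day at a constant arrival rate."""
    return SECONDS_PER_DAY / seconds_per_sequence


# Rate taken from the slide: one new sequence every 4 seconds.
per_day = sequences_per_day(4)
# Assumption: a 365-day year with no change in rate.
per_year = per_day * 365

print(f"{per_day:,.0f} sequences per day")    # 21,600 sequences per day
print(f"{per_year:,.0f} sequences per year")  # 7,884,000 sequences per year
```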
Data Growth
Web requests per day (excluding Ensembl)
FTP [chart: year; million files; Terabytes]
Web Servers [chart: Requests (millions)]
Distinct hosts served [chart: Number of users (millions)]
Dynamic pages: domains (2005)
1. .uk (United Kingdom) 21.14%
2. .com (Commercial) 17.16%
3. [unknown domain] 13.37%
4. [unresolved numerical addresses] 11.05%
5. .edu (USA Higher Education) 5.29%
6. .net (Networks) 5.27%
7. .fr (France) 4.76%
8. .it (Italy) 4.68%
9. .de (Germany) 2.81%
10. .nl (Netherlands) 2.00%
The Services of the EBI: Nucleotide sequences; Genes; Transcription information; Protein sequences; Protein families; Macromolecular structures; Molecular interactions; Pathways; Metabolic information; Scientific Literature
Structure of EBI: Services
[Organisation chart labels: Apweiler, Stoesser; Brazma; Birney; Henrick; Database Integration and External Services; Lopez; Stoehr, Zhu]
Structure of EBI: Research
Text Mining; Computational Genomics; Structural Proteomics; Neuroinformatics; Phylogeny & Evolution
EBI DATABASES
GKB: Pathways; EnsEMBL: Human Genome Gene Annotation; EMBL-Bank: DNA sequences; SWISS-PROT + TrEMBL: Protein Sequences; Array-Express: Microarray Expression Data; EMSD: Macromolecular Structure Data; IntAct: Protein Interactions
Integration
Integrative science demands integrative resources: EBI databases have a backbone of integrative links; cross-references support trans-database navigation. Is this good enough? Sparse and coarse-grained; not straightforward to use
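To illustrate what navigating between databases over cross-references amounts to, here is a minimal sketch. The database names are taken from the slides, but the accessions, the link table and the `follow_xrefs` helper are all hypothetical, invented for illustration; they are not EBI data or an EBI interface.

```python
from collections import deque

# Hypothetical link table: (database, accession) -> cross-referenced records.
# Database names come from the slides; every accession here is made up.
XREFS = {
    ("EMBL-Bank", "DNA0001"): [("SWISS-PROT", "PROT0001")],
    ("SWISS-PROT", "PROT0001"): [("EMSD", "STRUCT0001"), ("IntAct", "INT0001")],
    ("EMSD", "STRUCT0001"): [],
    ("IntAct", "INT0001"): [],
}


def follow_xrefs(start, xrefs):
    """Breadth-first walk over cross-references; returns records in visit order."""
    seen = [start]
    queue = deque([start])
    while queue:
        record = queue.popleft()
        for target in xrefs.get(record, []):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


# Starting from a DNA record, list everything reachable through links.
reachable = follow_xrefs(("EMBL-Bank", "DNA0001"), XREFS)
for database, accession in reachable:
    print(database, accession)
```

A walk like this only reaches records that have been explicitly linked, so a sparse link table leaves related entries undiscovered, which is one way to read the "sparse and coarse-grain" concern.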
Integrative science demands integrative resources: Major efforts involved in integration. InterPro: database of protein families, domains and functional sites. Integr8: data integration project co-ordinated by the EBI, to provide an integrated layer for the exploitation of genomic and proteomic data. GRID technologies
European Patent Office: Support the inclusion of sequence data in the public databases; Development of tools to capture sequence data; Run their searches at the EBI (similar arrangements in USA and Japan ensure exchange); Analogous systems being developed for structure information
Industry Support
Current successful Industry programme for Pharma: Quarterly meetings; R&D Training - workshops; Industry Forum; Funded by subscriptions. New SME programme under development
New Data: Expression Data; Proteomic Data; Metabolome Data; Chip-on-Chip; Atlases; Electron tomographs; Human Variation; Disease Links; ??
The Magic Search Box