Bioinformatics tools for the EBI: An overview


Bioinformatics
- The science of storing, retrieving and analyzing large amounts of biological information
- An interdisciplinary science, involving biologists, computer scientists and mathematicians
- At the heart of modern biology

"Large-scale" focus
- Data explosion and new types of data
- High-throughput biology
- Emphasis on systems, not reductionism
- Large community of users with no training in bioinformatics
- Growth of applied biology: molecular medicine, agriculture, food, environmental sciences…

What is EMBL-EBI?
- Based on the Wellcome Trust Genome Campus near Cambridge, UK
- Part of the European Molecular Biology Laboratory
- Non-profit organization

The EBI's mission
- Services: to provide freely available data and bioinformatics services to all facets of the scientific community in ways that promote scientific progress
- Research: to contribute to the advancement of biology through basic investigator-driven research in bioinformatics
- Training: to provide advanced bioinformatics training to scientists at all levels, from PhD students to independent investigators
- Industry: to help disseminate cutting-edge technologies to industry

Databases and tools

New types of data
- Genomes
- DNA & RNA sequence
- Gene expression
- Protein sequence
- Protein families, motifs and domains
- Protein structure
- Protein interactions
- Chemical entities
- Pathways
- Systems
- Literature and ontologies

Databases: molecules to systems
- Genomes: Ensembl, Ensembl Genomes, EGA
- Nucleotide sequence: EMBL-Bank
- Microarray & gene expression data: ArrayExpress
- Proteomes: UniProt, PRIDE
- Protein families, motifs and domains: InterPro
- Protein structure: PDBe
- Protein interactions: IntAct
- Chemical entities: ChEBI
- Pathways: Reactome
- Systems: BioModels
- Literature and ontologies: CiteXplore, GO

Database collaborations

Standards development: international collaborations
- Genome annotation
- Microarray and Gene Expression Data (MGED)
- Protein sequence
- HUPO Proteomics Standards Initiative (PSI)
- Protein structure
- Cheminformatics
- Pathways
- Systems modeling standards
- Metabolomics Standards Initiative (MSI)
- Genomic Standards Consortium (GSC)
- Nucleotide sequence

EBI website
- Databases
- Tools

EBI search engine: EB-eye
- Search all main databases in one go

Nucleotides: European Nucleotide Archive (ENA)
- ENA provides a comprehensive, accessible and publicly available repository for nucleotide sequence data
- Collaboration with GenBank and DDBJ for data sharing
- It consolidates information from EMBL-Bank, the European Trace Archive (containing raw data from electrophoresis-based sequencing machines) and the Sequence Read Archive (containing raw data from next-generation sequencing platforms)
- Provides access to the whole scale of sequencing information: from raw data, through assembly and mapping information, through to high-level functional annotation (see figure)

Nucleotides: ENA
- Download data
- Navigate to view related data, e.g. taxon-specific data
- Other types of data include SRA experiments

Genomes: Ensembl & Ensembl Genomes
Genome browser providing free access to the complete sequences of higher and model organisms.
With Ensembl you can:
- Retrieve all or part of a genome sequence
- Perform sequence alignment using BLAST or BLAT
- Link to genome annotation from microarray results
- View expressed mRNA, protein, etc. in a chromosomal region
- View variations such as SNPs across strains or populations
- View all alternative splicing for a gene
- Explore homologues and phylogenetic trees across > 30 species
- View conserved regions across species
Ensembl Genomes extends this to non-vertebrate genomes.
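Retrieving part of a genome sequence, the first capability listed above, can also be done programmatically. The sketch below only builds a request URL. It assumes the public Ensembl REST service at rest.ensembl.org and its `sequence/region` endpoint, neither of which is mentioned in the slides, and the helper name `region_sequence_url` is made up for illustration.

```python
from urllib.parse import quote

# Assumption: the public Ensembl REST server (not named in the slides).
ENSEMBL_REST = "https://rest.ensembl.org"


def region_sequence_url(species, chromosome, start, end, strand=1):
    """Build a URL requesting the sequence of one genomic region as plain text."""
    if start > end:
        raise ValueError("start must not exceed end")
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    # Region syntax assumed here: chromosome:start..end:strand
    region = f"{chromosome}:{start}..{end}:{strand}"
    path = f"/sequence/region/{quote(species)}/{quote(region, safe=':')}"
    return f"{ENSEMBL_REST}{path}?content-type=text/plain"


if __name__ == "__main__":
    print(region_sequence_url("human", "X", 1000000, 1000100))
```

Passing the resulting URL to `urllib.request.urlopen` should return the region as plain text, network access permitting.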

Genomes: Ensembl
- Across species
- Within species
- Synteny
- Pick a genome
- Orthology
- Genomic alignments
- Gene families
- SNPs
- Genes
- Chromosomes

Genomes: Ensembl Genomes
- Ensembl-like genome browser for non-vertebrate species
- Divisions shown: Ensembl Metazoa, Ensembl Bacteria
- Across species: select Orthologue view to see putative orthologues
- View options: you can choose to view only the current gene or the entire expanded gene tree

Retrieving data with BioMart
- BioMart is a search engine that can be used to download data in a table format
- Many EBI databases are powered by BioMart
- For example, you can use Ensembl BioMart to retrieve:
  - All the genes for one species
  - Or only genes in one specific region of a chromosome
  - Or genes in one region of a chromosome associated with an InterPro domain
  - Or… etc.

BioMart: how it works
- First step: choose a dataset
- Second step: add filters to define a gene set
- Third step: add attributes to determine column output
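The three steps above map directly onto the XML query document that BioMart servers accept. The following is a minimal sketch, not the interface shown in the slides: the dataset name `hsapiens_gene_ensembl`, the filter names `chromosome_name`, `start` and `end`, and the attribute names are assumed from Ensembl BioMart and may differ between releases, and `build_biomart_query` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Return a BioMart XML query: one dataset, some filters, some attributes."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated table output
        "header": "1",        # include a header row
        "uniqueRows": "1",    # drop duplicate rows
        "datasetConfigVersion": "0.6",
    })
    # Step 1: choose a dataset.
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Step 2: filters narrow down the gene set.
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    # Step 3: attributes become the output columns, in order.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    # Assumed example: genes in one region of human chromosome 17.
    print(build_biomart_query(
        "hsapiens_gene_ensembl",
        {"chromosome_name": "17", "start": 43000000, "end": 44000000},
        ["ensembl_gene_id", "external_gene_name"],
    ))
```

Keeping filters and attributes as separate arguments mirrors the web interface: filters decide which rows come back, attributes decide which columns.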

BioMart results


ArrayExpress & Atlas of Gene Expression
- ArrayExpress Archive is a public repository of functional genomics experiments, including gene expression, supporting scientific publications
  - You can query it to retrieve experimental information and download functional genomics data
- Atlas of Gene Expression contains a subset of curated and re-annotated Archive data
  - Can be queried for individual gene expression under different biological conditions across experiments

Transcriptomes: ArrayExpress
ArrayExpress Archive: browse experiments
- Search by keyword
- Expand results
- Spreadsheets describing the experiment, sample properties or array design

Transcriptomes: Atlas of Gene Expression
- Atlas interface
- Search by gene name or biological condition
- Gene summary page
- Experiment page

Protein sequence: UniProt
- Provides the scientific community with a comprehensive, richly curated, high-quality and freely accessible resource of protein sequence and functional information
- Users can perform simple and complex text-based queries, run sequence-based searches, perform multiple sequence alignments, etc.
- Consists of:
  - UniProtKB/Swiss-Prot: manually annotated records
  - UniProtKB/TrEMBL: computationally analyzed records
  - UniRef: sequences clustered by sequence identity
  - UniParc: the most comprehensive publicly available non-redundant protein sequence database, unannotated
  - UniMES: protein sequences from metagenomic and environmental data

UniProt text search for Brca1

Protein families, motifs & domains: InterPro
- Integrated documentation resource for protein families, domains and functional sites
- Protein signatures from different member databases describing the same biological protein family or domain are united into a single InterPro entry containing information about the signature(s) and links to the protein in UniProt
- Links to Gene Ontology indicate the biological function and process that the proteins are involved in

Protein families, motifs and domains: InterPro
- View architectures of proteins containing a signature
- Compare methods of protein signature prediction
- Visualize the taxonomic range for a protein signature

Molecular interaction database: IntAct
- IntAct provides a freely available, open source database system and analysis tools for protein interaction data
- All interactions are derived from literature curation or direct user submissions
- With IntAct you can:
  - Find molecules that interact with your protein of interest
  - Display interaction networks
  - Analyze interaction networks using GO terms, molecule type, role, etc.
  - Download data
  - Install the IntAct system locally
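Finding the molecules that interact with a protein of interest amounts to a neighbour lookup in an undirected graph of binary interactions. The sketch below illustrates the idea on downloaded data; the pair list and the identifiers `P1` to `P4` are placeholders invented for the example, not IntAct records, and the function names are hypothetical.

```python
from collections import defaultdict


def build_network(pairs):
    """Turn (molecule_a, molecule_b) pairs into an undirected adjacency map."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return network


def interactors(network, molecule):
    """Return the direct interaction partners of one molecule, sorted by name."""
    return sorted(network.get(molecule, set()))


if __name__ == "__main__":
    # Placeholder interaction pairs, for illustration only.
    pairs = [("P1", "P2"), ("P1", "P3"), ("P3", "P4")]
    net = build_network(pairs)
    print(interactors(net, "P1"))
```

Storing each pair in both directions keeps the lookup symmetric, which matches how binary interactions are usually read: if A is reported with B, then B is a partner of A.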

The Protein Data Bank in Europe (PDBe)
PDBe is a resource for the collection, organization and dissemination of data about biological macromolecular structures.
A suite of web-based services is available:
- PDBeView and PDBeLite provide a flexible and user-friendly query interface to the PDBe database
- PDBeAnalysis provides searches and statistical analyses of macromolecular structure and residue information
- PDBeFold performs pairwise or multiple comparisons as well as 3D alignments of structures
- PDBeChem allows searching for and visualizing any molecule in the PDB's ligand dictionary
- PDBePisa is an interactive tool for exploring macromolecular interfaces and surfaces, predicting probable quaternary structures (assemblies) and searching the PDB for structurally similar interfaces and assemblies
- PDBeMotif allows complex searches of the PDB based on small 3D motifs, sequence motifs in conjunction with ligand environment, and secondary structure patterns
- Many more tools available

Structures: PDBe
- Ligands
- Sequence mapping
- Linking to domain data
- Assemblies
- Surface matching
- Fold matching
- Active sites
- Electron density visualization

PRoteomics IDEntifications database (PRIDE)
- PRIDE is a centralized, standards-compliant, public data repository for proteomics data
- Provides the proteomics community with a public repository for protein and peptide identifications together with the evidence supporting these identifications
- PRIDE is also able to capture details of post-translational modifications coordinated relative to the peptides in which they have been found
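Because modification positions are stored relative to the peptide, placing a modification on the full protein requires adding the peptide's offset within the protein. A minimal sketch of that arithmetic follows; it assumes 1-based positions and a peptide that occurs in the protein, the sequence is invented for the example, and `ptm_protein_position` is a hypothetical helper rather than part of any PRIDE tool.

```python
def ptm_protein_position(protein_seq, peptide, ptm_offset):
    """Map a 1-based position within a peptide to a 1-based protein position.

    Uses the first occurrence of the peptide in the protein sequence.
    """
    if not 1 <= ptm_offset <= len(peptide):
        raise ValueError("ptm_offset must lie within the peptide")
    start = protein_seq.find(peptide)  # 0-based index of the peptide's first residue
    if start < 0:
        raise ValueError("peptide not found in protein sequence")
    # 0-based peptide start plus 1-based offset gives a 1-based protein position.
    return start + ptm_offset


if __name__ == "__main__":
    protein = "MKTAYIAKQR"   # invented sequence, illustration only
    peptide = "AYIAK"
    position = ptm_protein_position(protein, peptide, 2)
    print(position, protein[position - 1])
```

A peptide that matches more than one place in the protein makes the mapping ambiguous; this sketch simply takes the first match.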

Enzymes: IntEnz
- IntEnz (Integrated relational Enzyme database) is a freely available resource focused on enzyme nomenclature
- IntEnz contains the recommendations of the Nomenclature Committee of the IUBMB on the nomenclature and classification of enzyme-catalysed reactions

Chemical entities: ChEBI
- ChEBI is a freely available, manually annotated database of small molecular entities
- A molecular entity is any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer, etc., identifiable as a separately distinguishable entity, not directly encoded by the genome
- With ChEBI you can:
  - Find the correct chemical terminology using name, formula or registry number
  - Visualize chemical structures
  - Perform similarity searches
  - View the relationship between molecules using the ChEBI ontology
  - Bridge the gap between small molecules and the macromolecules they interact with (cross-links to UniProt and Reactome)
  - Download chemical structures
  - Submit new structures
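Structure similarity searches are commonly scored with the Tanimoto coefficient over structural fingerprints. The slides do not say which measure ChEBI uses, so the following is only a sketch of that common technique; the fingerprints are arbitrary sets of bit positions invented for the example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions.

    Defined as |A and B| / |A or B|. Two empty fingerprints score 0.0 here,
    which is a convention chosen for this sketch.
    """
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Invented fingerprints: each number is the index of a set bit.
    query = {1, 2, 3}
    candidate = {2, 3, 4}
    print(tanimoto(query, candidate))
```

A score of 1.0 means the two fingerprints set exactly the same bits, and 0.0 means they share none, so a search typically ranks candidates by descending score and applies a cutoff.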

Chemical entities: ChEBI
- View structure, nomenclature, formula and more
- View relationships in the ChEBI Ontology
- Link to other databases: view mappings to other databases such as Reactome and UniProt
- Download flat files, database dumps and the ChEBI Ontology for local installation

Chemogenomics: ChEMBL
- ChEMBL is a publicly available database of drugs, drug-like small molecules and their targets
- The data includes information about how small molecules bind to their targets, how these compounds affect cells and whole organisms, and information on the molecules' absorption, distribution, metabolism, excretion and toxicity
- ChEMBL holds two-dimensional structures, calculated molecular properties (e.g. logP, molecular weight, Lipinski "Rule of Five" parameters) and bioactivity data (such as binding constants and pharmacology)
- The bioactivity data is tagged to show links between molecular targets and published assays, with a set of varying confidence levels
- Additional data on the clinical progress of compounds is being integrated into ChEMBL
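The Lipinski "Rule of Five" parameters mentioned above can be checked with a few comparisons once the calculated properties are available. The sketch below counts violations using the usual thresholds (molecular weight over 500, logP over 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors). The `Properties` class and the example values are hypothetical and do not describe a real ChEMBL record.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Calculated properties of one compound (hypothetical container)."""
    molecular_weight: float  # in daltons
    logp: float
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(p):
    """Count how many of the four Lipinski criteria a compound violates."""
    checks = [
        p.molecular_weight > 500,
        p.logp > 5,
        p.h_bond_donors > 5,
        p.h_bond_acceptors > 10,
    ]
    return sum(checks)


if __name__ == "__main__":
    # Invented values, for illustration only.
    compound = Properties(molecular_weight=350.4, logp=2.1,
                          h_bond_donors=2, h_bond_acceptors=6)
    print(rule_of_five_violations(compound))
```

Returning a count rather than a yes/no answer leaves the cutoff to the caller; a common reading treats compounds with at most one violation as acceptable.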

ChEMBL

Pathways: Reactome
- A free, online, open-source curated database of pathways and reactions in human biology
- Information in the database is authored by expert biologist researchers and maintained by Reactome editorial staff
- Used to infer orthologous events in 22 non-human species including mouse, rat, chicken, puffer fish, worm, fly and yeast
- Extensively cross-referenced to other resources, e.g. NCBI, Ensembl, UCSC Genome Browser, UniProt, PubMed, KEGG, ChEBI and GO

Pathways: Reactome
- Select a pathway
- View reactions and events in detail
- Compare events in different species
- Export pathway

Pathways: Reactome
- Display expression data
- Link to source databases

Biological ontologies: Gene Ontology (GO)
- The GO project is a collaborative effort to address the need for consistent descriptions of gene products in different databases
- GO develops ontologies that describe biological processes, cellular components and molecular functions in a species-independent manner
- GO terms are also used to annotate several of the EBI's databases
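GO terms are arranged in a graph in which a term can have more than one parent, and an annotation to a specific term is read as implying all of its more general ancestors. The sketch below shows that propagation on a toy graph; the term names and the gene product `geneA` are invented placeholders, not real GO identifiers, and the function names are hypothetical.

```python
def ancestors(term, parents):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(annotations, parents):
    """Expand each gene product's direct annotations with all ancestor terms."""
    expanded = {}
    for product, terms in annotations.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents)
        expanded[product] = full
    return expanded


if __name__ == "__main__":
    # Toy ontology: child term -> list of parent terms (placeholders).
    parents = {
        "T:kinase_activity": ["T:catalytic_activity"],
        "T:catalytic_activity": ["T:molecular_function"],
    }
    annotations = {"geneA": {"T:kinase_activity"}}
    print(sorted(propagate(annotations, parents)["geneA"]))
```

Propagating annotations this way is what lets a query for a general term also return gene products annotated only to its more specific descendants.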

User support
- 2Can bioinformatics user support
- Online help pages
- Support


Research

Key facts about research
- The EBI provides a unique environment for bioinformatics research
- Seven dedicated research groups aim to understand biology through new approaches to interpreting biological data
- Services teams also carry out R&D to enhance existing services and develop new ones
- Research program complements services and the two are mutually supportive

Research
- Mammalian stem cell differentiation and development: Bertone
- Vertebrate genome annotation: Flicek
- Genome analysis using evolutionary tools: Goldman
- Transcriptome analysis on a genomic scale: Brazma
- Functional genomics and small RNA analysis: Enright
- Literature analysis and semantic data integration in life science research: Rebholz-Schuhmann
- Protein sequence analysis and functional annotation: Apweiler
- Cheminformatics and metabolism: Steinbeck
- Chemogenomics and drug discovery: Overington
- Neurobiology networks and systems: Le Novère
- Genome-scale analysis of regulatory systems: Luscombe
- Analysis of protein structure, function and evolution: Thornton
- Algorithmic methods for genome analysis: Birney
- Analysis and validation of protein structures; protein–ligand interactions: Kleywegt
- Systems biomedicine: Saez-Rodriguez
- Evolutionary biology: Marioni

Training

A tripartite user-training programme
- Hands-on user training on all our core data resources for researchers
- Training comes to you
- Training any time, anywhere, at any pace

Hands-on training for all levels of experience
- Interactive training in our purpose-built IT training suite at EMBL-EBI, Hinxton, Cambridge
- Learn from the EBI's experts through a combination of talks and practical exercises
- Take a tour of all our core data resources, or focus in on specific data types
- Full programme at

eLearning project: pilot phase
- Do you want to learn at your own pace at a time that suits you?
- We are developing a new eLearning platform and need our users to help us test it
- If you would like to get involved, contact: