Bioinformatics Summer School June 2011


EBI-Wellcome Trust Bioinformatics Summer School, 20-24 June 2011

Concepts, historical milestones & the central place of bioinformatics in modern biology: a European perspective
4/27/2017 Teresa K.Attwood, University of Manchester

Concepts, historical milestones & the central place of bioinformatics in modern biology: a personal perspective from a European

Overview
Where the term bioinformatics originated
Where the ‘modern’ concept originated
Some key milestones & people
Its place in ‘the new biology’

Disclaimer
Bear in mind that this is a personal view.
It’s hard to step out of a situation, look back in & remain objective, & to separate the European & American histories.
Observers from different perspectives will see & tell the story differently!
So this is just my perspective… & it’s bound up with sequences & dbs.

Origin of bioinformatics
The origin of the term ‘bioinformatics’ has been attributed to Paulien Hogeweg, a Dutch theoretical biologist.
With Ben Hesper, she coined the term in the early ’70s, defining its meaning as “the study of informatic processes in biotic systems”.
Hogeweg, P. (2011) The roots of bioinformatics in theoretical biology. PLoS Computational Biology, 7(3), e1002021.
The term failed to gain traction for ~20 years.

Origin of bioinformatics
The origins of the ‘modern’ concept of bioinformatics are rooted in sequence analysis, driven by the desire to collect, annotate & analyse sequences systematically (i.e., using computers)!
This concept of ‘bioinformatics’ was barely known pre-1990…

Key milestones (timeline figure, 1950-2010): insulin (GIVEQCCASVCSLYQLENYCN, FVNQHLCGSHLVEALYLVCGERGFFYTPKA), ribonuclease, Dayhoff Atlas, CSD.

Margaret Dayhoff (1925-1983)
Pioneer of computer methods to compare protein sequences & to derive evolutionary histories from alignments.
Particular interest in deducing evolutionary connections from sequence evidence.

Margaret Dayhoff
Collected all the known protein sequences & made them available to the scientific community.
In 1965, she compiled a book, the Atlas of Protein Sequence and Structure.

Key milestones (timeline figure, 1950-2010): insulin, ribonuclease, Dayhoff Atlas, DNA sequencing, auto protein sequencers, auto DNA sequencing, CSD, PDB, ARPAnet, email. Counts shown on the figure: 65, 7.

Data overload in the USA

Data overload in Europe
While the US debated where to locate a new centralised resource, EMBL acted…
The 1st nucleotide sequence database was thus based in Heidelberg: this was the EMBL data library, which preceded the 1st release of GenBank by ~6 months.

Data overload in Europe
Copies of the EMBL data library & GenBank were being maintained in Cambridge, together with their search tools, etc.
Researchers were given access to the dbs via an integrated system: “this system is presently being used by over 30 researchers in 8 departments in the University & in local research institutes. These users can keep in touch with each other via the MAIL command”!

Key milestones (timeline figure, 1950-2010): insulin, ribonuclease, Dayhoff Atlas, DNA sequencing, auto protein sequencers, auto DNA sequencing, CSD, PDB, EMBL, GenBank, PIR, ARPAnet, email, Internet. Counts shown on the figure: 65, 7, 568, 859.

Enter Amos Bairoch
A crazy postgrad student in Switzerland, interested in space exploration & the search for ET life.
His project was to develop software to analyse protein & nucleotide sequences: PC/Gene.

Amos Bairoch
He published his 1st paper in 1982, a letter to the BJ.
It suggested the use of checksums to “facilitate the detection of typographical & keyboard errors”.
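
The checksum idea can be illustrated with a minimal sketch. It assumes CRC-32 (from Python's standard zlib module) as the checksum, which is a stand-in chosen for convenience and not necessarily the scheme proposed in the letter; the function names and the retyped example sequences are hypothetical.

```python
import zlib


def sequence_checksum(sequence: str) -> str:
    """Return a CRC-32 checksum (8 hex digits) for a sequence.

    Whitespace is dropped and letters are upper-cased first, so that
    layout differences do not change the checksum.
    CRC-32 is an assumption made for this illustration.
    """
    normalised = "".join(sequence.split()).upper()
    return format(zlib.crc32(normalised.encode("ascii")), "08x")


def matches_checksum(sequence: str, expected: str) -> bool:
    """Return True if a retyped sequence still matches the published checksum."""
    return sequence_checksum(sequence) == expected.lower()


# Sequence as printed, and the checksum that would be printed alongside it.
published = "GIVEQCCASVCSLYQLENYCN"
checksum = sequence_checksum(published)

retyped_ok = "giveqccasv cslyqlenycn"   # same residues, different layout
retyped_bad = "GIVEQCCASVCSLYQLEMYCN"   # one residue mistyped (N typed as M)

print(checksum, matches_checksum(retyped_ok, checksum))
print(checksum, matches_checksum(retyped_bad, checksum))
```

Anyone retyping a sequence from print could recompute the checksum and compare it with the published value; a mismatch signals a keying error somewhere in the sequence, although it does not say where.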

Amos Bairoch
Why did he do this? In developing PC/Gene, he also needed to supply a db.
The Atlas wasn’t available electronically, so he typed in >1,000 protein sequences: some from the literature, most from the Atlas.
By 1981, this was a large book, plus several supplements, listing 1,660 proteins.

Amos Bairoch
In 1983, he acquired a computer tape of the EMBL databank: this was version 2, with 811 sequences.
In 1984, he received the 1st available computer tape copy of the Atlas (which quickly became the PIR-PSD), but he was deeply unhappy with the PIR format.

Amos Bairoch
So he decided to convert the PIR database into the semi-structured format of EMBL, part manually & part automatically. The result was PIR+, which was distributed as part of PC/Gene (now commercial).
In summer 1986, he decided to release the database independently of PC/Gene, so that it would be available to all, free of charge.

Amos Bairoch
The new database was called Swiss-Prot.
The 1st release was made on 21 July 1986; the exact number of entries is unknown, as he can’t find the original floppy disks!

Key milestones (timeline figure, 1950-2010): insulin, ribonuclease, Dayhoff Atlas, DNA sequencing, auto protein sequencers, auto DNA sequencing, CSD, PDB, EMBL, GenBank, PIR, Swiss-Prot, PROSITE, PRINTS, ARPAnet, email, Internet. Counts shown on the figure: 65, 7, 568, 859, 3,900.

Global data overload
The number of sequences was growing.
The number of structures was growing.
So was the number of protein family signatures.
Two extraordinary developments had yet to take place: what were they?

Key milestones (timeline figure, 1950-2010): insulin, ribonuclease, Dayhoff Atlas, DNA sequencing, auto protein sequencers, auto DNA sequencing, HT DNA sequencing, H. influenzae genome, S. cerevisiae genome, C. elegans genome, D. melanogaster genome, H. sapiens genome, CSD, PDB, EMBL, GenBank, PIR, Swiss-Prot, PROSITE, PRINTS, FlyBase, TrEMBL, Pfam, InterPro, ARPAnet, email, Internet, www. Counts shown on the figure: 65, 7, 568, 859, 3,900, 105,000, 2,423.

InterPro
Databases shown on the slide: PROSITE, PRINTS, ProDom, Profiles, PIRSF, HAMAP, Gene3D, TIGRFAM, SUPERFAMILY, PANTHER, Pfam, SMART.
InterPro is an integrated documentation resource for protein families, domains & sites. By uniting databases that use different methodologies & a varying degree of biological information, InterPro capitalises on their individual strengths, producing a powerful integrated database & diagnostic tool.
Naïvely, we wanted to make life easier! We aimed to simplify & rationalise protein family analysis; to centralise & streamline the annotation process; to reduce manual annotation burdens; &, in the wake of all the genome projects, to facilitate automatic functional annotation of uncharacterised proteins.
In fact (& now with 11 partners), we made life a lot harder! But that’s another story.

Key milestones (timeline figure, 1950-2010): insulin, ribonuclease, Dayhoff Atlas, DNA sequencing, auto protein sequencers, auto DNA sequencing, HT DNA sequencing, H. influenzae genome, S. cerevisiae genome, C. elegans genome, D. melanogaster genome, H. sapiens genome, CSD, PDB, EMBL, GenBank, PIR, Swiss-Prot, PROSITE, PRINTS, FlyBase, TrEMBL, Pfam, InterPro, UniProt, ENA, EMBnet, NCBI, EBI, SIB, ELIXIR, ARPAnet, email, Internet, www. Counts shown on the figure: 65, 7, 568, 859, 3,900, 105,000, 2,423, 15.4M, 500B.

Key milestones (timeline figure, 1950-2010), final view: the same databases, organisations, networks, sequencing technologies & genomes as before, now annotated “hundreds more”, “thousands more” & “billions more”. Counts shown on the figure: 65, 7, 568, 859, 3,900, 105,000, 2,423, 15.4M, 500B.

Scary monsters!
Red line: growth of EMBL since its inception.
Green line: growth of manually annotated Swiss-Prot.
Blue line: growth of PDB.
By 2020, NGS will be producing data at a million times the current rate.

The central place of bioinformatics in modern biology
Hopefully, this potted history speaks for itself.
In the last 30 years, bioinformatics has given us the first ‘complete’ catalogues of DNA & protein sequences, including genomes & proteomes of organisms across the Tree of Life; software to analyse biological data on an unprecedented scale; & hence tools to help understand more about evolutionary processes in general, our place on the Tree of Life in particular &, ultimately, more about health & disease.
It isn’t a panacea, but its contribution has been huge.

Recommended reading
A.B.Richon. A short history of bioinformatics (http://www.netsci.org/Science/Bioinform/feature06.html).
A.Bairoch (2000) Serendipity in bioinformatics, the tribulations of a Swiss bioinformatician through exciting times. Bioinformatics, 16(1), 48-64.
M.Ashburner (2006) Won for all – How the Drosophila genome was sequenced. Cold Spring Harbor Laboratory Press.
B.J.Strasser (2008) GenBank – Natural history in the 21st century? Science, 322, 537-538.
What makes us the same but different from each other