Published by Phillip Shepherd. Modified over 9 years ago.
1
9/10/2015 Teresa K. Attwood, University of Manchester
2
Where the term bioinformatics originated
Where the ‘modern’ concept originated
Some key events & folk
Its place in ‘the new biology’
3
The origin of the term ‘bioinformatics’ has been attributed to Paulien Hogeweg
– Dutch theoretical biologist
She & colleague Ben Hesper coined the term in the early ’70s, defining it as
– “the study of informatic processes in biotic systems”
Hogeweg, P. (2011) The roots of bioinformatics in theoretical biology. PLoS Computational Biology, 7(3), e1002021
The term failed to gain traction for ~20 years
4
The origins of the ‘modern’ concept of bioinformatics are rooted in sequence analysis
Driven by the desire to
– collect
– annotate
– & analyse sequence data
  systematically (i.e., using computers)!
This concept of ‘bioinformatics’ was barely known pre-1990…
5
Timeline, 1950 to 2010: insulin, ribonuclease, Dayhoff Atlas, CSD
Insulin sequences shown: GIVEQCCASVCSLYQLENYCN, FVNQHLCGSHLVEALYLVCGERGFFYTPKA
6
Pioneer of computer methods to compare proteins
– & to derive evolutionary histories from alignments
Particular interest in deducing evolutionary connections from sequence evidence
7
Collected all the known protein sequences
– made them available to the scientific community
In 1965, she compiled a book
– Atlas of Protein Sequence & Structure
8
“There is a tremendous amount of information regarding the evolutionary history and biochemical function implicit in each sequence and the number of known sequences is growing explosively. We feel it is important to collect this significant information, correlate it into a unified whole and interpret it”
M.O. Dayhoff to C. Berkley, February 27, 1967
Strasser, B. (2008) “GenBank – Natural history in the 21st century?” Science, 322, 537-538
9
Timeline, 1950 to 2010: insulin, ribonuclease, Dayhoff Atlas, CSD, ARPAnet, PDB, auto protein sequencers, DNA sequencing, auto DNA sequencing (counts shown: 65, 7)
Exam 1: What pernicious, life-changing development occurred in 1971?
10
“the rate limiting step in the process of nucleic acid sequencing is now shifting from data acquisition towards the organization and analysis of that data”
Gingeras, T.R. & Roberts, R.J. (1980) “Steps toward Computer Analysis of Nucleotide Sequences,” Science, 209, 1322-1328
11
“a centralized data bank [is] essential for the efficient use of nucleic acid sequence information”
C. Anderson, Minutes, 1980
12
While the US debated where to locate a new centralised resource, EMBL acted…
The 1st internationally funded, public ‘central’ nucleotide sequence database was thus European
– the EMBL data library, Heidelberg
  preceded the 1st release of GenBank by ~6 months
Attwood, T.K. et al. (2011) Concepts, Historical Milestones & the Central Place of Bioinformatics in Modern Biology: A European Perspective. In Bioinformatics - Trends & Methodologies, Intech Online Publishers
13
Copies of the EMBL data library & GenBank were being maintained in Cambridge
– together with their search tools, etc.
An integrated system gave access to the dbs & tools
– “this system is presently being used by over 30 researchers in 8 departments in the University & in local research institutes. These users can keep in touch with each other via the MAIL command”!
14
Timeline, 1950 to 2010: insulin, ribonuclease, Dayhoff Atlas, CSD, ARPAnet, email, PDB, auto protein sequencers, DNA sequencing, auto DNA sequencing, Internet, EMBL, GenBank, PIR (counts shown: 65, 7, 568, 859)
15
A crazy postgrad student in Switzerland
– interested in space exploration & the search for ET life
His project was to develop s/w to analyse protein & nucleotide sequences
– PC/Gene
16
Published his 1st paper in 1982
– a letter to the BJ
Suggested use of checksums
– “to facilitate detection of typographical & keyboard errors”
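The checksum idea can be illustrated with a short sketch. This is not the scheme proposed in the 1982 letter, which these slides do not describe; it assumes CRC-32 from Python's standard `zlib` module, and the function name `sequence_checksum` is hypothetical. The example sequence is one of the insulin sequences shown on the earlier timeline slide, with a single deliberate typing error in the second copy.

```python
import zlib


def sequence_checksum(sequence: str) -> str:
    """Return an 8-digit hexadecimal CRC-32 of a sequence.

    Whitespace and letter case are ignored, so the checksum depends
    only on the residues themselves. CRC-32 is an assumption made for
    illustration, not the method from the original letter.
    """
    cleaned = "".join(sequence.split()).upper()
    return format(zlib.crc32(cleaned.encode("ascii")), "08X")


# Sequence as published, and the same sequence retyped with one wrong letter.
original = "GIVEQCCASVCSLYQLENYCN"
mistyped = "GIVEQCCASVCSLYQLENYCM"

if sequence_checksum(original) != sequence_checksum(mistyped):
    print("Checksum mismatch: the retyped sequence contains an error")
```

If a checksum is published alongside each sequence, anyone who retypes the sequence can recompute the value and compare; a mismatch signals a keying error without the need to proofread every residue.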
17
Why?
Alongside PC/Gene, he needed to supply a db
The Atlas wasn’t available electronically
– typed in >1,000 protein sequences
– some from the literature
– most from the Atlas
  by 1981, this was a large book, plus several supplements, listing 1,660 proteins
18
In 1983, he acquired a computer tape of the EMBL Data Library
– version 2, with 811 sequences
In 1984, he received the 1st available computer tape copy of the Atlas
– (which became known as the PIR-PSD)
– but… he disliked the PIR format
19
So he converted the PIR database into the semi-structured format of EMBL
– part manually & part automatically
The result was PIR+
– & was distributed as part of PC/Gene (now commercial)
In summer 1986, he finally released the database independently of PC/Gene
– to make it available to all, free of charge
20
This new database was called Swiss-Prot
1st released on 21 July 1986
– the exact number of entries is unknown, as he lost the original floppy disks!
21
As part of his work on PC/Gene, he created another key database
– diagnostic tool for characterising protein families
1st released March 1989, with 58 entries
– this was PROSITE
Philosophy of his approach
– coupling high quality data analysis with manual annotation
22
PRINTS
[IVM]-[AS]-L-W-S-L-V2-L-A-[IV]-E-R-Y-[IV]3-C-K-P-M
PROSITE
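A pattern written this way can be turned into a regular expression and scanned against sequences. The sketch below rests on assumptions: that the `2` and `3` following `V` and `[IV]` on the slide are repeat counts (which would be written `V(2)` and `[IV](3)` with parentheses), that `x` stands for any residue, and that `{...}` lists excluded residues. The function name `pattern_to_regex` and the test sequence are hypothetical, made up for illustration.

```python
import re

# One pattern element: a residue set, an excluded set, a single residue or x,
# optionally followed by a repeat count written as (n) or as a bare n.
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)\)|(\d+))?")


def pattern_to_regex(pattern: str) -> str:
    """Translate a dash-separated residue pattern into a Python regex."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"cannot parse pattern element: {element!r}")
        residues, count_parens, count_bare = match.groups()
        count = count_parens or count_bare
        if residues == "x":
            core = "."                           # any residue
        elif residues.startswith("{"):
            core = "[^" + residues[1:-1] + "]"   # any residue except these
        else:
            core = residues                      # literal residue or [set]
        parts.append(core + ("{" + count + "}" if count else ""))
    return "".join(parts)


slide_pattern = "[IVM]-[AS]-L-W-S-L-V2-L-A-[IV]-E-R-Y-[IV]3-C-K-P-M"
regex = pattern_to_regex(slide_pattern)

# Made-up test sequence containing one stretch that fits the pattern.
hit = re.search(regex, "GEIALWSLVVLAIERYVVVCKPMSN")
print(regex)
print(hit.group(0) if hit else "no match")
```

A single pattern of this kind either matches or it does not, which is why a match is only as reliable as the residue choices encoded at each position.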
23
Database annotation…
Figure labels: Database maintenance, Database annotation, Nirvana
24
“It is quite depressive to think that we are spending millions in grants for people to perform experiments, produce new knowledge, hide this knowledge in often badly written text and then spend some more millions trying to second guess what the authors really did and found”
Bairoch, A. (2009) The future of annotation/biocuration. Nature Precedings
25
Timeline, 1950 to 2010: insulin, ribonuclease, Dayhoff Atlas, CSD, ARPAnet, email, PDB, auto protein sequencers, DNA sequencing, auto DNA sequencing, Internet, EMBL, GenBank, PIR, Swiss-Prot, PROSITE, PRINTS (counts shown: 65, 7, 568, 859, 3,900)
26
The number of sequences was growing
The number of structures was growing
The number of protein family signatures was growing
Exam 2: Two extraordinary developments had yet to take place. What were they?
27
Timeline, 1950 to 2010: insulin, ribonuclease, Dayhoff Atlas, CSD, ARPAnet, email, PDB, auto protein sequencers, DNA sequencing, auto DNA sequencing, Internet, EMBL, GenBank, PIR, Swiss-Prot, PROSITE, PRINTS, HT DNA sequencing, WWW, H. influenzae genome, S. cerevisiae genome, D. melanogaster genome, H. sapiens genome, C. elegans genome, FlyBase, TrEMBL, Pfam, InterPro (counts shown: 65, 7, 568, 859, 3,900, 105,000, 2,423)
28
InterPro
Diagram labels: ProDom, PRINTS, PROSITE, PANTHER, SMART, HAMAP, PIRSF, TIGRFAM, SUPERFAMILY, Gene3D, Pfam, Profiles
29
Timeline, 1950 to 2010: insulin, ribonuclease, Dayhoff Atlas, CSD, ARPAnet, email, PDB, auto protein sequencers, DNA sequencing, auto DNA sequencing, Internet, EMBL, GenBank, PIR, Swiss-Prot, PROSITE, PRINTS, HT DNA sequencing, WWW, H. influenzae genome, S. cerevisiae genome, D. melanogaster genome, H. sapiens genome, C. elegans genome, FlyBase, TrEMBL, Pfam, InterPro, ENA, UniProt, ELIXIR, SIB, EBI, EMBnet, NCBI (counts shown: 65, 7, 568, 859, 3,900, 105,000, 2,423, >500B, 36.0M)
30
Timeline, 1950 to 2010: the same timeline as on the previous slide, with the added labels: thousands more, billions more, hundreds more
31
Red line: growth of EMBL since its inception
Green line: growth of manually annotated Swiss-Prot
Blue line: growth of PDB
Values shown: 282 M, 540 K, 35 M, 84 K
By 2020, NGS & 3Gen technologies will be producing data a million times faster than the current rate
32
Hopefully, this potted history speaks for itself
In the last 30 years, bioinformatics has given us
– the first ‘complete’ catalogues of DNA & protein sequences
  including genomes & proteomes of organisms across the Tree of Life
– software to analyse biological data on an unprecedented scale
– & hence tools to help understand
  more about evolutionary processes in general
  our place on the Tree of Life in particular
  &, ultimately, more about health & disease
It isn’t a panacea, but its contribution has been huge
33
Recommended reading
Richon, A.B. A short history of bioinformatics (http://www.netsci.org/Science/Bioinform/feature06.html)
Bairoch, A. (2000) Serendipity in bioinformatics, the tribulations of a Swiss bioinformatician through exciting times. Bioinformatics, 16(1), 48-64.
Ashburner, M. (2006) Won for all – How the Drosophila genome was sequenced. Cold Spring Harbor Lab. Press
Strasser, B.J. (2008) GenBank – Natural history in the 21st century? Science, 322, 537-538.
Attwood, T.K., Gisel, A., Eriksson, N-E. & Bongcam-Rudloff, E. (2011) Concepts, Historical Milestones and the Central Place of Bioinformatics in Modern Biology: A European Perspective