1Introduction 1.0
2Introduction 1.0 Welcome to the Canadian Bioinformatics Workshops Bioinformatics, 9th Edition Toronto, ON, Feb 13 – 25, 2006 Francis Ouellette, Director, CGDN Bioinformatics Core Facility Director, UBC Bioinformatics Centre
3Introduction 1.0 Outline Instructors, schedule, other things... Why does bioinformatics exist? (data, data, data …) Will it exist forever? (some experts say “no”!) What is bioinformatics? (get 100 scientists in a room, get 100 answers … ) What are the big challenges in bioinformatics? (Research & discipline differences between life sciences and computer sciences) Resources available to bioinformaticians The importance of Open Access and Open Source
4Introduction 1.0 David Boris Francis Fiona Chris Stephen Francois “CBW core”
5Introduction 1.0 David Boris Francis Fiona John Gabor Chris CBW – Bioinformatics – Toronto 2006 Mike Shum Quaid
6Introduction 1.0 Joanne Jenn Will CBW – Bioinformatics – Toronto 2006 Stephen Gerald Jennifer Philip Tong Bhooma David Philip
7Introduction 1.0 Acknowledgement A great part of this talk is adapted from what Fiona Brinkman developed two years ago (2004). Whenever you see “ “, this means that the slide is from Fiona’s presentation. Other slides are simply acknowledged with names. Fiona
8Introduction 1.0 Today Introduction Break Biology 11:15 UNIX Lunch (on your own) Biological Databases 15:30 Break 15:45 Entrez lab 17:15 Break 17:30 Public Lecture – Tony Pawson
9Introduction 1.0 François Major Closing Keynote presentation on Saturday Feb 25th Title Time Etc
10Introduction 1.0 Administrative stuff Accounts on Linux machines –login: guest –password: cbw2005 Security, badges, fire exits, food, code name, marks
11Introduction 1.0 Canadian Bioinformatics Workshops Bioinformatics Genomics Proteomics Developing the Tools You are here bioinformatics.ca Vancouver May 1-6 Calgary June Montreal July March 17 March 24
12Introduction 1.0 CBW Sponsors UOttawa
13Introduction 1.0 Questions?
14Introduction 1.0 Introduction - Objectives Why does bioinformatics exist What is bioinformatics What are the big challenges in bioinformatics –Research –Discipline differences between Bio and CS
15Introduction 1.0 Why is there Bioinformatics? Lots of new sequences being added Automated sequencers Genome Projects EST sequencing Microarray studies Proteomics Metagenomics (“Metagenomics” describes the functional and sequence-based analysis of the collective microbial genomes contained in an environmental sample) Patterns in datasets that can only be analyzed using computers Sequencing technology!
16Introduction 1.0 Need for informatics in biology: origins Early databases: Dayhoff, 1972; Erdmann, 1978: The Atlas of Protein Sequences was available on Digital Tape, and in 1980, by modem (300 Baud). Early programs: restriction enzyme sites, pattern finding, promoters, etc… circa – 1993: Nucleic Acids Research published supplemental information 1982: DDBJ/EMBL/GenBank are created as a public repository of genetic sequence information. 1983: NIH funds the PIR (Protein Information Resource) database. 1988: Pearson and Lipman create FASTA
17Introduction 1.0 Genomes Number of base pairs ___________________________________________________________ 1971 First published DNA sequence PhiX174 5, Lambda 48, Yeast Chromosome III 316, Haemophilus influenzae 1,830, Saccharomyces 12,068, C. elegans 97,000, D. melanogaster 120,000, H. sapiens (draft) 2,600,000, H. sapiens 2,850,000,000
18Introduction 1.0 History of DNA Sequencing (timeline figure, adapted from Messing & Llaca, PNAS (1998); from Eric Green). Milestones: Miescher: Discovers DNA; Avery: Proposes DNA as ‘Genetic Material’; Watson & Crick: Double Helix Structure of DNA; Holley: Sequences Yeast tRNA Ala; Wu: Sequences Cohesive End DNA; Sanger: Dideoxy Chain Termination; Gilbert: Chemical Degradation; Messing: M13 Cloning; Hood et al.: Partial Automation (1986); Cycle Sequencing; Improved Sequencing Enzymes; Improved Fluorescent Detection Schemes. Axis: Efficiency (bp/person/year), rising to >100,000,000.
19Introduction 1.0 GenBank doubles every 14 months (from the National Center for Biotechnology Information) That doubling time is shorter than Moore’s law (computer power doubling every 20 months!)
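To make the gap between the two doubling times concrete, here is a minimal arithmetic sketch. It takes the 14-month and 20-month figures from the slide at face value and assumes simple exponential growth; the function and constant names are illustrative only.

```python
def growth_factor(doubling_months, elapsed_months):
    """Multiplicative growth after elapsed_months for a quantity
    that doubles once every doubling_months."""
    return 2.0 ** (elapsed_months / doubling_months)

GENBANK_DOUBLING = 14   # months, figure quoted on the slide
COMPUTE_DOUBLING = 20   # months, figure quoted on the slide

def compare(years):
    """Return (data growth, compute growth) over the given number of years."""
    months = 12 * years
    return (growth_factor(GENBANK_DOUBLING, months),
            growth_factor(COMPUTE_DOUBLING, months))

if __name__ == "__main__":
    for years in (1, 5, 10):
        data, compute = compare(years)
        print(f"{years:>2} yr: data x{data:,.1f}, "
              f"compute x{compute:,.1f}, ratio {data / compute:.1f}")
```

Under these assumptions the ratio between data growth and compute growth itself keeps growing, which is why smarter algorithms, and not only faster machines, are needed.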
20Introduction 1.0 The next step is obviously to locate all of the genes and regulatory regions, describe their functions, and identify how they differ between different groups (e.g. “disease” vs “healthy”)… …bioinformatics plays a critical role Fiona
23Introduction 1.0 Bioinformatics will help with……. Similarity Searching Sequence Databases What is similar to my sequence? Searching gets harder as the databases get bigger - and quality changes Tools: BLAST and FASTA = time saving heuristics (approximate methods) Statistics + informed judgement of the biologist Fiona
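The idea of a time-saving heuristic can be sketched with a toy word-matching search: index every short word (k-mer) in the database once, then rank database sequences by how many words they share with the query. This is only an illustration of the seeding idea. BLAST and FASTA go much further (extending hits, scoring alignments, computing statistics), and the sequences, names and the choice of k=4 below are made up for the example.

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield every overlapping word of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_index(database, k):
    """Map each k-mer to the set of database sequence names containing it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for word in kmers(seq, k):
            index[word].add(name)
    return index

def rank_hits(query, index, k):
    """Count distinct query k-mers shared with each database sequence.
    Returns (name, shared_count) pairs, best first."""
    counts = defaultdict(int)
    for word in set(kmers(query, k)):
        for name in index.get(word, ()):
            counts[name] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Toy, made-up sequences (not real database entries).
DATABASE = {
    "seqA": "ATGGCGTACGTTAGC",
    "seqB": "TTTTTTTTGGGGGGG",
    "seqC": "CCATGGCGTACGAAA",
}

if __name__ == "__main__":
    index = build_index(DATABASE, k=4)
    print(rank_hits("GGCGTACGTT", index, k=4))
```

The index is built once and each query only touches the words it contains, so the cost of a search grows far more slowly than comparing the query against every database sequence in full. The price is that weak similarities without any exact shared word are missed, which is where the biologist's informed judgement comes in.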
24Introduction 1.0 Bioinformatics will help with……. Structure-Function Relationships Can we predict the function of protein molecules from their sequence? sequence > structure > function Prediction of some simple 3-D structures (α-helix, β-sheet, membrane spanning, etc.) Fiona
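As one hedged illustration of predicting a simple structural feature from sequence alone, the sketch below averages Kyte-Doolittle hydropathy values over a sliding window and flags strongly hydrophobic windows as candidate membrane-spanning segments. The window of 19 residues and the cutoff of 1.6 are conventional choices assumed here, the peptide is invented, and real predictors are considerably more sophisticated.

```python
# Kyte-Doolittle hydropathy scale (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein, window=19):
    """Mean hydropathy of each full-length window along the protein."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

def candidate_membrane_windows(protein, window=19, cutoff=1.6):
    """0-based start positions of windows whose mean exceeds the cutoff."""
    return [i for i, mean in enumerate(hydropathy_profile(protein, window))
            if mean > cutoff]

if __name__ == "__main__":
    # Made-up peptide: charged flanks around a hydrophobic stretch.
    toy = "KKDEE" + "LIVALLFIVAGLLIVALIA" + "RRDEK"
    print(candidate_membrane_windows(toy))
```

A flagged window is only a hypothesis about the protein: it says the stretch is hydrophobic enough to sit in a membrane, not that it does.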
25Introduction 1.0 Top 10 Future Challenges for Bioinformatics (1) Precise, predictive model of transcription initiation and termination: ability to predict where and when transcription will occur in a genome. (2) Precise, predictive model of RNA splicing/alternative splicing: ability to predict the splicing pattern of any primary transcript in any tissue. (3) Precise, quantitative models of signal transduction pathways: ability to predict cellular responses to external stimuli. (4) Determining effective protein:DNA, protein:RNA and protein:protein recognition codes. (5) Accurate ab initio protein structure prediction. (6) Rational design of small molecule inhibitors of proteins. (7) Mechanistic understanding of protein evolution: understanding exactly how new protein functions evolve. (8) Mechanistic understanding of speciation: molecular details of how speciation occurs. (9) Continued development of effective gene ontologies: systematic ways to describe the functions of any gene or protein. (10) Education: development of appropriate bioinformatics curricula for secondary, undergraduate and graduate education. Chris Burge, Ewan Birney, Jim Fickett. Genome Technology, issue No. 17, January, 2002
26Introduction 1.0 What is Bioinformatics? Think – Pair – Share! Fiona
27Introduction 1.0 Bioinformatics is about understanding how life works. It is a hypothesis-driven science.
28Introduction 1.0 Bioinformatics is about integrating biological themes together with the help of computer tools and biological databases, and gaining new knowledge from this.
29Introduction 1.0 BLAST Result Basic Local Alignment Search Tool
30Introduction 1.0 Genetic Analysis of Cancer in Families The Genetic Predisposition to Cancer PubMed Text Neighboring Common terms could indicate similar subject matter Statistical method Weights based on term frequencies within document and within the database as a whole Some terms are better than others From Mark Boguski
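The weighting idea on this slide can be sketched with a generic TF-IDF scheme: a term counts for more the more often it appears in a document and the rarer it is across the whole collection, and two documents are compared by the cosine of their weighted term vectors. This is a textbook sketch and not PubMed's actual neighboring algorithm. The first two titles are the ones on the slide; the other two are invented to give the collection some background.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w]

def tfidf_vectors(docs):
    """One {term: weight} dict per document, where
    weight = count in document * log(N / documents containing the term)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    return [{t: c * math.log(n_docs / doc_freq[t])
             for t, c in Counter(tokens).items()}
            for tokens in tokenized]

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

TITLES = [
    "Genetic Analysis of Cancer in Families",   # from the slide
    "The Genetic Predisposition to Cancer",     # from the slide
    "Sequencing of a Yeast Chromosome",         # made up
    "The Structure of a Bacterial Ribosome",    # made up
]

if __name__ == "__main__":
    vectors = tfidf_vectors(TITLES)
    for i in range(1, len(TITLES)):
        print(f"{cosine(vectors[0], vectors[i]):.3f}  {TITLES[i]}")
```

In this toy collection "of" appears in three of the four titles and so gets a much lower weight than "families", which is the sense in which some terms are better than others.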
31Introduction 1.0 Micro-array analysis: Figure 4 Figure 1 Science Jan : The Transcriptional Program in the Response of Human Fibroblasts to Serum Vishwanath R. Iyer, Michael B. Eisen, Douglas T. Ross, Greg Schuler, Troy Moore, Jeffrey C. F. Lee, Jeffrey M. Trent, Louis M. Staudt, James Hudson Jr., Mark S. Boguski, Deval Lashkari, Dari Shalon, David Botstein, Patrick O. Brown
32Introduction 1.0 VAST Result Ferredoxin Halobacterium marismortui Chlorella fusca Vector Alignment Search Tool
33Introduction 1.0 Computational Biology Analysis Q (Gln): NH2-C(=O)-CH2-CH2- R (Arg): NH2-C(=NH2+)-NH-CH2-CH2-CH2-
34Introduction 1.0 UBiC: Links Directory Curated list of links to bioinformatics tools available worldwide SCIENCE 16 April 2004
35Introduction 1.0 UBiC: Links Directory
36Introduction 1.0 bioinformatics.ubc.ca/resources/links_directory
38Introduction 1.0 Open Source and Open Access Making It Work with Open Source and Open Access What does it mean? –Will it cause the economy to go bust? –Is it too good to be true? –What if I get scooped?
39Introduction 1.0 Open Source: what does it mean? Open source - Any software whose code is available for users to look at and modify freely. Linux is the best-known example; others include Apache, the dominant software for servers that provide web pages worldwide.
40Introduction 1.0 Open Source in the life sciences Present in all areas of bioinformatics Some very well known examples of tools used in industry and academic circles include: –BLAST –EMBOSS –EnsEMBL –MLagan –GenScan –Bioconductor
41Introduction 1.0 Open Access Unrestricted access to data Allows all to work and make discoveries Discoveries are not necessarily open access Open access is applicable to any kind of data you want to apply it to: –Sequence data (DNA, RNA or protein) –Gene expression data –Protein-protein interaction data –Publication
42Introduction 1.0 An Open Access Publication is one that meets the following two conditions: –The author(s) and copyright holder(s) grant(s) to all users a free, irrevocable, worldwide, perpetual right of access to, and a license to copy, use, distribute, transmit and display the work publicly and to make and distribute derivative works, in any digital medium for any responsible purpose, subject to proper attribution of authorship, as well as the right to make small numbers of printed copies for their personal use. –A complete version of the work and all supplemental materials, including a copy of the permission as stated above, in a suitable standard electronic format is deposited immediately upon initial publication in at least one online repository that is supported by an academic institution, scholarly society, government agency, or other well-established organization that seeks to enable open access, unrestricted distribution, interoperability, and long-term archiving (for the biomedical sciences, PubMed Central is such a repository). From:
43Introduction 1.0 Open access critical to progress in Science Without GenBank and other public sequence databases –There would be no BLAST –There would be no diagnostic DNA testing –There would be no understanding of the human genome (there probably would not have been a human genome to work on in the first place).
44Introduction 1.0 Open Access of Publications We are way overdue to break down the ivory towers that surround a few journals that are allowed to hide data from everybody who does not pay them! Who did the work? Taxpayers’ dollars! There are enough good (read “well reviewed”) journals out there now that we need not publish in closed journals. We will need to get rid of the “old guard” that only wants to publish in Science, Nature and Cell. I think these journals will change – Physicists have been doing this for decades – biologists will figure it out soon. We need the reagents to make discoveries from the text data: we have diseases to understand, cures to find!
49Introduction 1.0 So what are we doing here? We will figure out how to think like bioinformaticians, and how to plan, execute and interpret bioinformatics experiments. For this we will need to know and fully understand the available (and understandable) tools and databases: how they work, and how to interpret their output. LET US DO IT!!!
50Introduction 1.0 One other tool to help in this process: The wiki. Every year that we have taught this workshop we have added and changed things. This year, amongst many things, we are adding the wiki …
51Introduction 1.0 Questions? Open access? Open source? Where the washrooms are? CBW – Bioinformatics? CBW – Proteomics? Genomics? Tools? Graduate program in Bioinformatics at UBC? When do we start?