What is Bioinformatics?

Conceptualizing biology in terms of molecules and then applying “informatics” techniques from math, computer science, and statistics to understand and organize the information associated with these molecules on a large scale

[Figure: Focus]

Profile of a bioinformatician
- (General) knowledge of biology and genome sciences
- Translating between biology and informatics
- Knowledge of Unix-based operating systems
- Programming skills (Java, Python, shell/Perl scripting, R)
- (Parallel) computing environments
- Data storage and database technology
- Statistics
- Mathematics
Freely adapted from Richter et al. (2009), PLoS Computational Biology

How do we use bioinformatics?
- Store and retrieve biological information (databases)
- Retrieve and compare gene sequences
- Predict the function of unknown genes/proteins
- Search for previously known functions of a gene
- Compare data with other researchers
- Compile and distribute data for other researchers

Other bioinformatics organisations
- European Bioinformatics Institute (EBI)
- National Center for Biotechnology Information (NCBI)
- EMBnet
- International Society for Computational Biology (ISCB)

History of bioinformatics
- 1965 Margaret Dayhoff's Atlas of Protein Sequence and Structure
- 1967 Scientific director of NBIC was born
- 1970 Needleman-Wunsch algorithm (global alignment)
- 1977 DNA sequencing and software to analyze it (Staden)
- 1981 Smith-Waterman algorithm (local sequence alignment)
- 1981 The concept of a sequence motif (Doolittle)
- 1982 GenBank made public
- 1983 Sequence database searching algorithm (Wilbur-Lipman)
- 1987 Perl (Practical Extraction and Report Language) released by Larry Wall
- 1988 National Center for Biotechnology Information (NCBI) created at NIH/NLM
- 1988 EMBnet network for database distribution
- 1990 BLAST: fast sequence similarity searching
- 1990 First web server and HTML documents at CERN (the HTTP/1.0 specification followed in 1996)
- 1990s Grid computing: the metaphor of making computing power as easy to access as an electric power grid
- 1990s EMBL European Bioinformatics Institute (EBI), Hinxton, UK
- 1995 Microsoft Internet Explorer 1.0; Sun announces Java (JDK 1.0 followed in January 1996); Apache 1.0
- 1997 PSI-BLAST
- 1997 International Society for Computational Biology (ISCB) founded
- 1998 First multicellular genome (the worm C. elegans) completely sequenced
- 1999 The term e-Science was coined by John Taylor, Director General of the United Kingdom's Office of Science and Technology
- 2000 Gene Ontology (GO)
- 2001 The human genome (about 3 billion base pairs) is published
- 2001 Minimum Information About a Microarray Experiment (MIAME; Brazma)
- 2001 Genetical genomics (Ritsert Jansen, Jan Peter Nap)
- 2002 BioMoby web-service repository
- 2003 myGrid: personalised bioinformatics on the information grid (e.g. Taverna)
- 2003 Bioconductor: open software development for computational biology and bioinformatics
- 2005 Reactome: knowledge base of biological pathways

Global alignment (toy example)
CATGATGA
CTGAGAT
Can you “align” these two sequences, that is, introduce “gaps” into them so that the number of matching nucleotides is maximized?

Global alignment (toy example)
CATGATGA-
C-TGA-GAT
This alignment has six matching positions. Aligning ‘new’ DNA against sequences whose function is already known helps us understand what the new DNA does. Dynamic programming gives the optimal solution but is slow, so heuristic methods (BLAST, BLAT) are often used instead.
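The dynamic-programming idea can be sketched in Python. This is a minimal Needleman-Wunsch sketch, not code from the original slides; the function name `needleman_wunsch` and the default scores (match = 1, mismatch = 0, gap = 0) are illustrative assumptions, chosen so that the score simply counts matching nucleotides, as in the toy example. Filling the table takes time proportional to the product of the two sequence lengths, which is why database searches usually rely on heuristics such as BLAST.

```python
def needleman_wunsch(a, b, match=1, mismatch=0, gap=0):
    """Global alignment of sequences a and b by dynamic programming.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    With the default (assumed) scores, the optimal score equals the maximum
    number of columns in which both sequences carry the same nucleotide.
    """
    n, m = len(a), len(b)

    def pair_score(x, y):
        return match if x == y else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned entirely against gaps

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # align the two characters
                score[i - 1][j] + gap,                                 # gap in b
                score[i][j - 1] + gap,                                 # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    best, row1, row2 = needleman_wunsch("CATGATGA", "CTGAGAT")
    print(best)  # 6
    print(row1)
    print(row2)
```

For the toy sequences the optimal score is 6, the same number of matches as the hand-made alignment; the traceback may return a different but equally good placement of gaps. Realistic scoring schemes penalize mismatches and gaps (for example with negative values), which only requires changing the three parameters.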

Paulien Hogeweg (born 1943) is a Dutch theoretical biologist and complex-systems researcher who studies biological systems as dynamic information-processing systems at many interconnected levels. Together with Ben Hesper she coined the term bioinformatics, defined as “the study of informatic processes in biotic systems”; the term appears in their 1978 publications:
- Hogeweg, P. (1978). Simulating the growth of cellular forms. Simulation 31, 90-96.
- Hogeweg, P. and Hesper, B. (1978). Interactive instruction on population interactions. Comput Biol Med 8: …

Bioinformatics in the Netherlands
- 1976 Paulien Hogeweg (theoretical biology)
- 1979 Gert Vriend (proteins)
- 1985 Computer Assisted Organic Synthesis/Computer Assisted Molecular Modelling Centre (CAOS/CAMM) founded (Nijmegen, Jan Noordik)
- 1989 Jack Leunissen (first Dutch researcher with a PhD in bioinformatics)
- 1990s Driving forces: Herman Berendsen, Charles Buys, Jacob de Vlieg
- 1999 CAOS/CAMM reorganized; Gert Vriend becomes director of CMBI; a KNAW committee (chaired by Berendsen) writes the report ‘Bioexact’, recommending strong stimulation of bioinformatics; KNCV working group bioinformatics
- 2000 NWO-BMI (Biomolecular Informatics); program committee chaired by De Vlieg
- 2001 NWO/KNAW workshop ‘The future of bioinformatics in the Netherlands’
- 2002 Position paper ‘De toekomst van de bioinformatica in Nederland’ (‘The future of bioinformatics in the Netherlands’), representing the vision of NWO/KNAW
- 2003 NBIC (Netherlands Bioinformatics Centre) founded
- 2003 First BioRange proposal (Vriend, Berendsen, Hertzberger, Tellegen)
- 2005 Start of BioRange (NBIC-I)
- 2008 …

[Figure: Publication history]

Bioinformatics tools and databases
- Many different bioinformatics tools are freely available: BLAST, EMBOSS, Ensembl, GENSCAN, Bioconductor, …
- Many different biological databases are freely available: GenBank, UniProtKB, KEGG, …
- Many publications appear in open-access journals: BMC Bioinformatics, PLoS Computational Biology
- Many commercial software packages are also available: Spotfire, Rosetta Resolver, Gene Logic, …
- Bioinformaticians write their own tools for specialized tasks

National Center for Biotechnology Information (NCBI)
- GenBank and other genome databases
- Sequence retrieval
- Protein structure: 3D modeling programs (RasMol, Protein Explorer)
- Sequence comparison programs: BLAST, GCG, MacVector

Similarity search: BLAST
- A tool for searching gene or protein sequence databases for genes related to a query sequence
- Such comparisons can help infer the structure, function, and evolution of a gene
- Alignments between the query sequence and each database sequence, allowing for mismatches and gaps, indicate their degree of similarity
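The first step of a BLAST-style search, finding short exact word matches (“seeds”) shared by the query and a database sequence, can be sketched in Python as below. This is a simplified illustration, not the BLAST algorithm itself: the function names `build_word_index` and `find_seeds` and the word length `k = 3` are hypothetical choices, and real BLAST also scores similar (not only identical) words for proteins and then extends each seed into a scored alignment with mismatches and gaps.

```python
from collections import defaultdict


def build_word_index(db_seq, k):
    """Map every length-k word of the database sequence to its start positions."""
    index = defaultdict(list)
    for pos in range(len(db_seq) - k + 1):
        index[db_seq[pos:pos + k]].append(pos)
    return index


def find_seeds(query, db_seq, k=3):
    """Return (query_pos, db_pos) pairs where query and database share an exact length-k word."""
    index = build_word_index(db_seq, k)
    hits = []
    for qpos in range(len(query) - k + 1):
        # .get avoids inserting empty entries for words absent from the database
        for dpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, dpos))
    return hits


if __name__ == "__main__":
    seeds = find_seeds("GATTACA", "TTGATTACAGG", k=3)
    print(seeds)  # [(0, 2), (1, 3), (2, 4), (3, 5), (4, 6)]
    # Group hits by diagonal (db_pos - query_pos); dense diagonals are extension candidates
    diagonals = defaultdict(list)
    for qpos, dpos in seeds:
        diagonals[dpos - qpos].append(qpos)
    print(dict(diagonals))  # {2: [0, 1, 2, 3, 4]}
```

All five hits lie on the same diagonal (database position minus query position = 2); clusters of hits like this are the regions a BLAST-style tool would go on to extend and score. Indexing the database once makes each query word lookup a dictionary access instead of a scan.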

% identity
MRCKTETGAR
MRCGTETGAR   90% (9 of 10 positions identical)

CATTATGATA
GTTTATGATT   70% (7 of 10 positions identical)
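Percent identity for two already aligned, gap-free sequences of equal length can be computed as in the Python sketch below. The function name `percent_identity` is hypothetical, and this is only one convention: tools differ in how they treat gaps and which length they divide by.

```python
def percent_identity(aligned1, aligned2):
    """Share of positions (in %) at which two equal-length, gap-free sequences are identical."""
    if len(aligned1) != len(aligned2):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned1:
        raise ValueError("sequences must not be empty")
    identical = sum(1 for x, y in zip(aligned1, aligned2) if x == y)
    return 100.0 * identical / len(aligned1)


if __name__ == "__main__":
    print(percent_identity("MRCKTETGAR", "MRCGTETGAR"))  # 90.0
    print(percent_identity("CATTATGATA", "GTTTATGATT"))  # 70.0
```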

Strengths:
- Accessibility
- Growing rapidly
- User-friendly
Weaknesses:
- Sometimes not up to date
- Limited possibilities
- Limited comparisons and information
- Not always accurate

Need for improved bioinformatics
- Genomics: Human Genome Project, gene array technology, comparative genomics, functional genomics
- Proteomics: global view of protein function/interactions, protein motifs, structural databases

Data mining
- Handling enormous amounts of data
- Sorting what is important from what is not
- Manipulating and analyzing data to find patterns and variations that correlate with biological function

[Figure: bioinformatics; students, educators, researchers, institutions]