1-Month Practical Master Course Genome Analysis (Integrative Bioinformatics & Genomics) Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands


1-Month Practical Master Course Genome Analysis (Integrative Bioinformatics & Genomics)
Jaap Heringa
Centre for Integrative Bioinformatics VU (IBIVU)
Vrije Universiteit Amsterdam, The Netherlands

Bioinformatics draws on: Mathematics, Statistics, Computer Science, Informatics, Biology, Molecular biology, Medicine, Chemistry, Physics

Biological Sequence Analysis
- Pair-wise sequence alignment
- Residue exchange matrices
- Multiple sequence alignment
- Phylogeny

.....acctc ctgtgcaaga acatgaaaca nctgtggttc tcccagatgg gtcctgtccc aggtgcacct gcaggagtcg ggcccaggac tggggaagcc tccagagctc aaaaccccac ttggtgacac aactcacaca tgcccacggt gcccagagcc caaatcttgt gacacacctc ccccgtgccc acggtgccca gagcccaaat cttgtgacac acctccccca tgcccacggt gcccagagcc caaatcttgt gacacacctc ccccgtgccc ccggtgccca gcacctgaac tcttgggagg accgtcagtc ttcctcttcc ccccaaaacc caaggatacc cttatgattt cccggacccc tgaggtcacg tgcgtggtgg tggacgtgag ccacgaagac ccnnnngtcc agttcaagtg gtacgtggac ggcgtggagg tgcataatgc caagacaaag ctgcgggagg agcagtacaa cagcacgttc cgtgtggtca gcgtcctcac cgtcctgcac caggactggc tgaacggcaa ggagtacaag tgcaaggtct ccaacaaagc aaccaagtca gcctgacctg cctggtcaaa ggcttctacc ccagcgacat cgccgtggag tgggagagca atgggcagcc ggagaacaac tacaacacca cgcctcccat gctggactcc gacggctcct tcttcctcta cagcaagctc accgtggaca agagcaggtg gcagcagggg aacatcttct catgctccgt gatgcatgag gctctgcaca accgctacac gcagaagagc ctctc..... DNA sequence

Genome size

Organism                   Number of base pairs
φX-174 virus               5,386
Epstein-Barr virus         172,282
Mycoplasma genitalium      580,000
Haemophilus influenzae     1.8 × 10^6
Yeast (S. cerevisiae)      12.1 × 10^6
Human                      3.2 × 10^9
Wheat                      16 × 10^9
Lilium longiflorum         90 × 10^9
Salamander                 100 × 10^9
Amoeba dubia               670 × 10^9

Three main principles
- DNA makes RNA makes Protein
- Structure is more conserved than sequence
- Sequence → Structure → Function

[Figure: Functional Genomics. Labels: Genome, Expressome, Proteome, Metabolome; tertiary structure (fold); regulation, signalling cascades, chaperonins, compartmentalisation]

How to go from DNA to protein sequence
A piece of double-stranded DNA:
5’ attcgttggcaaatcgcccctatccggc 3’
3’ taagcaaccgtttagcggggataggccg 5’
DNA direction is from 5’ to 3’

How to go from DNA to protein sequence
6-frame translation using the codon table (last lecture):
5’ attcgttggcaaatcgcccctatccggc 3’
3’ taagcaaccgtttagcggggataggccg 5’
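A 6-frame translation can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: it assumes the standard genetic code (the codon table itself is not reproduced in this transcript), and the function names are made up for the example. The three forward frames are read from the top strand and the three reverse frames from its reverse complement, so every frame is read 5’ to 3’.

```python
# Standard genetic code (assumption: the "codon table" of the previous
# lecture is the standard one), written compactly in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the opposite strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons give 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return the translations of frames +1, +2, +3, -1, -2, -3."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


top_strand = "attcgttggcaaatcgcccctatccggc"
for name, protein in six_frame_translation(top_strand).items():
    print(name, protein)
```

Stop codons are printed as `*`; a real gene finder would additionally look for open reading frames between a start and a stop codon, which this sketch does not attempt.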

Dean, A. M. and G. B. Golding: Pacific Symposium on Biocomputing 2000
Evolution and three-dimensional protein structure information
Isocitrate dehydrogenase: the distance from the active site (in yellow) determines the rate of evolution (red = fast evolution, blue = slow evolution)

Protein Sequence-Structure-Function
Sequence → Structure → Function
Methods linking the three levels:
- Threading
- Homology searching (BLAST)
- Ab initio prediction and folding
- Function prediction from structure

Widely used tool for homology detection: PSI-BLAST
- Heuristic tool that cuts down the computation required for database searching (~1M sequences in the database)
- Sensitivity is gained by iteratively finding hits (local alignments), summarising them in a position-specific scoring matrix (PSSM), and repeating the search
[Diagram labels: Q, DB, T, hits, PSSM]
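To give a feel for what a PSSM is, the sketch below turns a handful of aligned hits into per-column log-odds scores. It is a toy illustration only and not PSI-BLAST's actual procedure: the hits are hypothetical and ungapped, the background distribution is assumed uniform, and a fixed pseudocount stands in for the sequence weighting and data-dependent pseudocounts a real implementation would use.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_hits, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column."""
    length = len(aligned_hits[0])
    if any(len(seq) != length for seq in aligned_hits):
        raise ValueError("all aligned hits must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for col in range(length):
        counts = Counter(
            seq[col] for seq in aligned_hits if seq[col] in AMINO_ACIDS
        )
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score_sequence(pssm, seq):
    """Sum the column scores of an ungapped sequence of matching length."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


hits = ["LIKE", "LIKE", "LVKE", "MIKE"]  # hypothetical ungapped hits
pssm = build_pssm(hits)
print(score_sequence(pssm, "LIKE") > score_sequence(pssm, "AAAA"))
```

Residues seen often in a column score above zero and unseen residues below zero, which is what lets the next search round reward family-specific conservation rather than generic similarity.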

Threading
The query sequence is matched against a template sequence together with its template structure, giving a compatibility score.


Fold recognition by threading
The query sequence is threaded onto each fold in a library (Fold 1, Fold 2, Fold 3, ..., Fold N), giving one compatibility score per fold.

“Nothing in Biology makes sense except in the light of evolution” (Theodosius Dobzhansky)
“Nothing in bioinformatics makes sense except in the light of Biology”


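Divergent evolution
Ancestral sequence: ABCD
Descendant 1: ACCD (B → C, mutation)
Descendant 2: ABD (C → ø, deletion)
Pairwise alignment, two candidates:
ACCD      ACCD
AB-D  or  A-BD
The first is the true alignment: it pairs the mutated position (C with B) and places the gap at the deleted C.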

Mutations under divergent evolution
An ancestral sequence diverges into Sequence 1 and Sequence 2:
1: ACCTGTAATC
2: ACGTGCGATC
     *  **
D = 3/10 (fraction of different sites (nucleotides))
Not every substitution is visible when the two sequences are compared:
(a) One substitution: one visible
(b) Two substitutions: one visible
(c) Two substitutions: none visible
(d) Back mutation: not visible
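The observed distance D on this slide is simply the fraction of sites that differ. A minimal sketch, assuming two ungapped sequences of equal length (the function name is chosen for this example):

```python
def p_distance(seq1, seq2):
    """Fraction of sites at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for x, y in zip(seq1, seq2) if x != y)
    return differences / len(seq1)


d = p_distance("ACCTGTAATC", "ACGTGCGATC")
print(d)  # 0.3
```

Because cases (b) to (d) hide substitutions, D counts only the visible differences and so underestimates the number of substitutions that actually occurred.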

Convergent evolution
- Often seen with shorter motifs (e.g. active sites)
- The motif (function) has evolved more than once independently, e.g. starting from two very different sequences that adopt different folds
- The sequences and associated structures remain different, but the (functional) motif can become identical
- Classical example: the serine proteinases subtilisin and chymotrypsin

Serine proteinase (subtilisin) and chymotrypsin
- Different evolutionary origins, no sequence similarity
- Similarities in the reaction mechanisms: chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common. Serine acts as a nucleophile, histidine as a general base, and aspartate orients and polarises the histidine.
- The geometric orientations of the catalytic residues are similar between families, despite different protein folds.
- The linear arrangements of the catalytic residues reflect the different family relationships: the catalytic triad is ordered HDS in the chymotrypsin clan (SA), DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC).

A protein sequence alignment
MSTGAVLIY--TSILIKECHAMPAGNE
   GGILLFHRTHELIKESHAMANDEGGSNNS
   *  *    *  **** ***
A DNA sequence alignment
attcgttggcaaatcgcccctatccggccttaa
att---tggcggatcg-cctctacgggcc----
***   ****  **** **    * ****
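The asterisks mark columns in which both sequences carry the same residue. A short sketch of how such a match line and a percent identity can be computed from two aligned rows; the function names are illustrative, and counting identity over ungapped columns only is one convention among several, chosen here as an assumption:

```python
def match_line(row1, row2, gap="-"):
    """Return '*' under identical non-gap columns and ' ' elsewhere."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        "*" if x == y and x != gap else " " for x, y in zip(row1, row2)
    )


def percent_identity(row1, row2, gap="-"):
    """Identical columns as a percentage of columns without a gap."""
    aligned = [(x, y) for x, y in zip(row1, row2) if x != gap and y != gap]
    if not aligned:
        return 0.0
    matches = sum(1 for x, y in aligned if x == y)
    return 100.0 * matches / len(aligned)


dna1 = "attcgttggcaaatcgcccctatccggccttaa"
dna2 = "att---tggcggatcg-cctctacgggcc----"
print(dna1)
print(dna2)
print(match_line(dna1, dna2))
print(percent_identity(dna1, dna2))
```

For the DNA alignment above this gives 18 identical columns out of 25 ungapped columns.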

What can sequence tell us about structure (HSSP) Sander & Schneider, 1991

Searching for similarities
What is the function of the new gene?
The “lazy” investigation (i.e., no biological experiments, just bioinformatics techniques):
- Find a set of protein sequences similar to the unknown sequence
- Identify similarities and differences
- For long proteins: identify domains first

Evolutionary and functional relationships
Reconstruct evolutionary relation:
- Based on sequence
  - Identity (simplest method)
  - Similarity
- Homology (common ancestry: the ultimate goal)
- Other (e.g., 3D structure)
Functional relation: Sequence → Structure → Function

Searching for similarities
Common ancestry is more interesting: it makes it more likely that genes share the same function.
Homology: sharing a common ancestor
- a binary property (yes/no)
- a very useful property: when an unknown gene X is homologous to a known gene G, we gain a lot of information on X, because what we know about G can be transferred to X as a good suggestion

Biological definitions for related sequences
- Homologues are similar sequences, in the same or in different organisms, that have been derived from a common ancestor sequence. Homologues can be described as either orthologues or paralogues.
- Orthologues are similar sequences in two different organisms that have arisen due to a speciation event. Orthologues typically retain identical or similar functionality throughout evolution.
- Paralogues are similar sequences within a single organism that have arisen due to a gene duplication event.
- Xenologues are similar sequences that have arisen not through vertical descent but through horizontal transfer events between organisms (symbiosis, viruses, etc.).

How to evolve
Important distinction:
- Orthologues: homologous proteins in different species (all deriving from the same ancestor)
- Paralogues: homologous proteins in the same species (internal gene duplication)
In practice, to recognise orthology, the bi-directional best hit is used in conjunction with a database search program (this is called an operational definition).
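The bi-directional best hit criterion can be sketched as follows. The gene names and scores are hypothetical; in practice the scores would come from a database search program run in both directions, and the function names are chosen for this example only.

```python
def best_hits(scores):
    """Map each query to its best-scoring subject.

    scores has the shape {query: {subject: score}}.
    """
    return {
        query: max(subjects, key=subjects.get)
        for query, subjects in scores.items()
        if subjects
    }


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return the (a, b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical search scores: species A genes against species B and back.
a_vs_b = {
    "geneA1": {"geneB1": 250.0, "geneB2": 90.0},
    "geneA2": {"geneB2": 120.0, "geneB1": 60.0},
    "geneA3": {"geneB1": 200.0},
}
b_vs_a = {
    "geneB1": {"geneA1": 245.0, "geneA3": 198.0},
    "geneB2": {"geneA2": 118.0},
}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Here geneA3 finds geneB1 as its best hit, but geneB1 prefers geneA1, so geneA3 is not called an orthologue of geneB1 under this operational definition.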

Source: So this means …

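Pairwise sequence alignment needs sense of evolution
Global dynamic programming takes as input:
- the two sequences, MDAGSTVILCFVG and MDAASTILCGS
- an amino acid exchange matrix
- gap penalties (open, extension)
The exchange matrix and the gap penalties together form the evolutionary model. The search matrix is filled in and traced back to give the alignment:
MDAGSTVILCFVG-
MDAAST-ILC--GS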
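Global dynamic programming can be sketched as below. This is a deliberately simplified version under stated assumptions: a match/mismatch score replaces the amino acid exchange matrix, and a single linear gap penalty replaces the separate open and extension penalties named on the slide. With these toy parameters the resulting alignment need not coincide with the one shown above.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # Fill the search matrix; row 0 and column 0 hold leading-gap costs.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # pair
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimum.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1])):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, row1, row2 = needleman_wunsch("MDAGSTVILCFVG", "MDAASTILCGS")
print(total)
print(row1)
print(row2)
```

Swapping `sub` for a lookup in a residue exchange matrix and adding affine gap penalties would bring the sketch closer to the evolutionary model described on the slide.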

How to determine similarity
Frequent evolutionary events at the DNA level:
1. Substitution
2. Insertion, deletion
3. Duplication
4. Inversion
We will restrict ourselves to these events

A DNA sequence alignment (nucleotide one-letter code)
attcgttggcaaatcgcccctatccggccttaa
att---tggcggatcg-cctctacgggcc----
***   ****  **** **    * ****
A protein sequence alignment (amino acid one-letter code)
MSTGAVLIY--TSILIKECHAMPAGNE
   GGILLFHRTHELIKESHAMANDEGGSNNS
   *  *    *  **** ***