Wellcome Trust graduate course. - Computational Methods series. --- Sequence-based bioinformatics. Dr. Hyunji Kim Department of Biochemistry, University.


Wellcome Trust graduate course, Computational Methods series: Sequence-based bioinformatics
Dr. Hyunji Kim, Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 3QU, UK

Basic Tools

1) BLAST/WUBLAST
A search engine for finding sequences of interest. A search can be refined by changing the substitution matrix or the filtering options, and is run against a specified database.

2) ClustalW/T-Coffee/Muscle
Help us make sense of a set of unaligned sequences by generating pairwise or multiple sequence alignments. Use a progressive-alignment method.

3) HMMer/PSI-BLAST
Builds a profile Hidden Markov Model (pHMM) from a set of aligned sequences. Aligns sequences to a pHMM, searches a sequence database with it, and can assign functions to a given sequence.

4) Phylip/TreeDyn
Calculates a distance matrix from a set of sequences. Derives phylogenetic trees, taking such a matrix as input, based on the principles of minimum evolution, parsimony and more.
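To make the idea of "a distance matrix from a set of sequences" concrete, here is a minimal sketch in Python. It uses the simple p-distance (fraction of differing residues over ungapped columns), which is an assumption chosen for illustration and not necessarily the distance model PHYLIP applies. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Symmetric pairwise p-distance matrix keyed by (name, name)."""
    names = list(aligned)
    matrix = {(n, n): 0.0 for n in names}
    for m, n in combinations(names, 2):
        d = p_distance(aligned[m], aligned[n])
        matrix[(m, n)] = matrix[(n, m)] = d
    return matrix


# Toy aligned fragments (illustrative only, not taken from a real database search).
toy = {"seqA": "TTVGYGD", "seqB": "TTVGYGE", "seqC": "TTLGFGD"}
dm = distance_matrix(toy)
for name in toy:
    print(name, " ".join(f"{dm[(name, other)]:.3f}" for other in toy))
```

A tree-building method such as neighbour-joining would then take a matrix like `dm` as its input.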

5) Databases
Nucleotide databases: EMBL, GenBank & DDBJ
Protein databases: fully annotated, e.g. Swiss-Prot v52.3, as of 17th of Apr. (264,492 entries); computer-annotated, e.g. TrEMBL v35.3
Genomics databases: Ensembl & Eukaryota, Bacteria and Archaea genomes: 20+14 (v44), 51, 445, 40, as of 20th of Apr.

6) Major Bioinformatics Centres, around the globe

Searching for sequences by homology - BLAST


Reference: Gish, W. ( )

Query= KcsA (160 letters)

>Filtered+0
MPPMXXXXXXXXXXXXXGRHGSALHWRXXXXXXXXXXXXXXXGSYLAVLAERGAPGAQLI
TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE
RRGHFVRHSEKXXXXXXXXXXXXLHERFDRLERMLDDNRR

Database: swissprot
223,100 sequences; 81,965,973 total letters.
Searching % done

                                                        Smallest Sum
                                                  High  Probability
Sequences producing High-scoring Segment Pairs:  Score  P(N)      N

SW:KCSA_STRCO P0A333 Voltage-gated potassium channel    e-60      1
SW:KCSA_STRLI P0A334 Voltage-gated potassium channel    e-60      1

>SW:KCSA_STRCO P0A333 Voltage-gated potassium channel.
Length = 160

Score = 615 (221.5 bits), Expect = 3.0e-60, P = 3.0e-60, Group = 1
Identities = 120/160 (75%), Positives = 120/160 (75%)

Query:   1 MPPMXXXXXXXXXXXXXGRHGSALHWRXXXXXXXXXXXXXXXGSYLAVLAERGAPGAQLI 60
           MPPM             GRHGSALHWR               GSYLAVLAERGAPGAQLI
Sbjct:   1 MPPMLSGLLARLVKLLLGRHGSALHWRAAGAATVLLVIVLLAGSYLAVLAERGAPGAQLI 60

Query:  61 TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE 120
           TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE
Sbjct:  61 TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE 120
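The output above reports fewer than 100% identities even though KcsA is hitting its own database entry. A likely reason is that residues masked as X by the filter cannot count as matches. The sketch below, in Python, counts identities for the first alignment row only; the function name `count_identities` is hypothetical, and treating X as "never identical" is an assumption about how the score line is derived.

```python
def count_identities(query: str, subject: str, mask: str = "X") -> tuple[int, int]:
    """Return (identical positions, alignment length), ignoring masked or gapped query positions."""
    if len(query) != len(subject):
        raise ValueError("alignment rows must have equal length")
    identical = sum(
        1
        for q, s in zip(query, subject)
        if q == s and q != mask and q != "-"
    )
    return identical, len(query)


# First alignment row (positions 1-60) from the BLAST output above.
# The two masked runs are written as repeats so their lengths are explicit.
query_row = "MPPM" + "X" * 13 + "GRHGSALHWR" + "X" * 15 + "GSYLAVLAERGAPGAQLI"
sbjct_row = "MPPMLSGLLARLVKLLLGRHGSALHWRAAGAATVLLVIVLLAGSYLAVLAERGAPGAQLI"

identical, length = count_identities(query_row, sbjct_row)
print(f"Identities = {identical}/{length} ({100 * identical / length:.0f}%)")
```

Running the same count over all three rows of the alignment would give the figure for the whole 160-residue query.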

Multiple sequence alignment – ClustalW

*****************************************************
 CLUSTAL W (1.83) Multiple Sequence Alignments
*****************************************************

 1. Sequence Input From Disc
 2. Multiple Alignments
 3. Profile / Structure Alignments
 4. Phylogenetic trees

 S. Execute a system command
 H. HELP
 X. EXIT (leave program)

Your choice: 2

****** MULTIPLE ALIGNMENT MENU ******

 1. Do complete multiple alignment now (Slow/Accurate)
 2. Produce guide tree file only
 3. Do alignment using old guide tree file
 4. Toggle Slow/Fast pairwise alignments = SLOW
 5. Pairwise alignment parameters
 6. Multiple alignment parameters
 7. Reset gaps before alignment? = OFF
 8. Toggle screen display = ON
 9. Output format options

 S. Execute a system command
 H. HELP
 or press [RETURN] to go back to main menu

Your choice:

CLUSTAL W (1.82) multiple sequence alignment

KVAP_AERPE  FDALW-WAVVTATTVGYGDVVP-ATPIGKVIGIAVMLTGISALTLLIGTVSNMF
MVP_METJA   FDAFY-FTTISITTVGYGDITP-KTDAGKLI---IIFS---VLFFISGLITS
O28600      FDSLY-MTVITITTTGYGEVKP-MGPGGRVISMLLMFVGVGTF
Q8TXQ4      LTCLY-FTAATITTVGYGDVVP-TTEAGRLLSVIVMFSGIGVASYAL
Q6L2S2      FTSLW-WTMQTITTVGYGDTPV-YGFYGRINGMLIMVFGIGTIGYVTASLAT
Q979Z2      FTAIW-FTMETVTTVGYGDVVP-VSNLGRVVAMLIMVSGIGLLGTLTATISAYLF----Q 80
O26605      EDSLW-YVLQTITTVGYGDIVP-VTSLGRFTGMVIMFSAIASTSLITASATSTLLERGEQ 114
Q9HIA8      GNAFY-YTGEVITTLGFGDILP-VTMDAKIFTISLAFLGVAIFFSSITALILPSVERRLG 94
Q97CK5      GTALY-YTGETVTTLGFGDILP-VDLESRLFTISLAFLGVAIFFSAMTALITPTIERRVG 84

Colour key:
Residues   Colour    Property
AVFPMILW   Red       Small (small + hydrophobic (incl. aromatic -Y))
DE         Blue      Acidic
RHK        Magenta   Basic
STYHCNGQ   Green     Hydroxyl, Amine
Others     Gray
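Conserved blocks in an alignment like this can be located by checking each column for a single shared residue, which is roughly what the "*" line under a ClustalW alignment marks. The following Python sketch does that for the first 23 columns of the nine sequences above; the function name `conservation_line` is hypothetical, and only full identity is marked (the ":" and "." classes for similar residues are left out).

```python
def conservation_line(rows: list[str]) -> str:
    """Return '*' for columns where every sequence has the same residue, ' ' otherwise."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    marks = []
    for column in zip(*rows):
        residues = set(column)
        # A column of gaps is not counted as conserved.
        conserved = len(residues) == 1 and "-" not in residues
        marks.append("*" if conserved else " ")
    return "".join(marks)


# First 23 alignment columns of each sequence shown above.
block = [
    "FDALW-WAVVTATTVGYGDVVP-",  # KVAP_AERPE
    "FDAFY-FTTISITTVGYGDITP-",  # MVP_METJA
    "FDSLY-MTVITITTTGYGEVKP-",  # O28600
    "LTCLY-FTAATITTVGYGDVVP-",  # Q8TXQ4
    "FTSLW-WTMQTITTVGYGDTPV-",  # Q6L2S2
    "FTAIW-FTMETVTTVGYGDVVP-",  # Q979Z2
    "EDSLW-YVLQTITTVGYGDIVP-",  # O26605
    "GNAFY-YTGEVITTLGFGDILP-",  # Q9HIA8
    "GTALY-YTGETVTTLGFGDILP-",  # Q97CK5
]

for row in block:
    print(row)
print(conservation_line(block))
```

The marked columns fall inside the TTxGYGD-like stretch that all nine sequences share.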

Profile alignment & Pattern recognition: HMMer

More sensitive homology-search: PSI-BLAST & HMMer

DNA sequence Amino acid sequence

PSI-BLAST

Phylogeny: Phylip & Treedyn

Saitou N and Nei M, The neighbour-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol, 4(4): , 1987

TreeDyn

Protein secondary structure prediction: two consensus methods

Example Output

Sequence   MFAKGYGKNNEPLRGYILTFLIALGFILIAELNVIAPIISNFFLASYALINFSVFHASLAKSPGWRPAFK

ALOM2      *****************
DAS        ****************************************
HMMTOP2    ****************** *************************
MEMSAT1.5  *************************
PHD        *************************
SPLIT4     **************** ***************************
TMAP       *****************************
TMFINDER   ****************************************
TMHMM2     *********************** ******************
TMPRED     *************************
TOPPRED2   ********************* *********************
Consensus  ???hhhhHHHHHHHHHHHHHHHHHhHHhhhhhhhhh???????????

The Transmembrane Prediction Server was developed by Dr. Jonathan Cuthbertson.
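A consensus line of this kind can be thought of as a per-residue vote across the individual prediction methods. The Python sketch below illustrates one way to do that; the thresholds (75% for "H", 50% for "h", any support for "?") and the "." symbol for no support are assumptions made for illustration, not the server's documented rules, and the method names and toy tracks are hypothetical.

```python
def consensus(predictions: dict[str, str], strong: float = 0.75, weak: float = 0.5) -> str:
    """Per-residue vote over prediction tracks where '*' marks a predicted TM residue."""
    tracks = list(predictions.values())
    if len({len(t) for t in tracks}) != 1:
        raise ValueError("all prediction tracks must have the same length")
    n = len(tracks)
    out = []
    for column in zip(*tracks):
        support = sum(1 for c in column if c == "*") / n
        if support >= strong:
            out.append("H")  # most methods agree
        elif support >= weak:
            out.append("h")  # at least half agree
        elif support > 0:
            out.append("?")  # only a minority predicts a TM residue here
        else:
            out.append(".")  # no method predicts a TM residue
    return "".join(out)


# Toy tracks for a 10-residue sequence (illustrative only).
toy_tracks = {
    "method_1": "..*****...",
    "method_2": ".******...",
    "method_3": "...****.*.",
    "method_4": "..******..",
}
print(consensus(toy_tracks))
```

Requiring agreement between several methods trades some sensitivity at helix ends for fewer spurious helices.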

Pongo

Example Output by Pongo

Background for practical sessions

Ion channels; Potassium channels; Voltage-gated potassium channels

Ion channels are a diverse class of transmembrane proteins that are responsible for the diffusion of ions across cell membranes. There are several major families of ion channels, for instance K+, Na+, Ca2+ and Cl- channels, as well as ligand-gated ion channels (LGICs). Many human neurological and muscular disorders have been traced to defects in voltage-gated and ligand-gated ion channels.

Fig 2. A. Long et al., Science, Vol. 309, p897, 2005 (figure labels: TM, T1)

Introduction to your input sequence

K+ channels, blastp

Homologues are visualised in BLIXEM.

Your expected blastp output

Fig 4. Shealy et al., Biophysical Journal, Vol 84, p2929, 2003
Figure labels: Kv, BK, SK, Erg, Kir, CNG, AKT, Kv1.x, Shab, Kv2.x, Shal, Kv4.x, Kv, Shaw, Kv3.x, Kir2.x, Kir6.2, Kir3.x, Kir4.x, Kir1.1, Kir6.1, Kir2.3

Alignment you are about to build, not necessarily as big.

Example of pHMM-related output

hmmsearch - search a sequence database with a profile HMM
HMM file:           Kv.hmm [Kv_homologues]
Sequence database:  infile_comb
Query HMM:          Kv_homologues
  [HMM has been calibrated; E-values are empirical estimates]

Scores for complete sequences (score includes all domains):
Sequence          Description  Score  E-value  N
CIKS_DROME                            e-71     1
Q9VX00_DROME                          e-69     1
CIKB_DROME                            e-46     1
O62350_Celegans                       e-46     1
Q9VLC6_DROME                          e-46     1
CIKW_DROME                            e-45     1
Q8SYL2_DROME                          e-45     1
Q22012_Celegans                       e-45     1
Filtered_5DROME                       e-41     1
Filtered_6DROME                       e-41     1
Q9XXD1_Celegans                       e-36     1

Raw tree-files produced by PHYLIP
Tree labels: Kir, Kv, BK, SK, AKT, CNG/HErg, KcsA, MthK, Kv1.2, KvAP

Phylogenetic trees modified in TreeDyn