Wellcome Trust graduate course, Computational Methods series
Sequence-based bioinformatics
Dr. Hyunji Kim
Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 3QU, UK
Basic Tools
1) BLAST/WU-BLAST: A search engine for finding sequences of interest. A search against a specified database can be refined by varying the substitution matrix and the filtering options.
2) ClustalW/T-Coffee/Muscle: Help us make sense of a set of unaligned sequences by generating pairwise or multiple sequence alignments. They use a progressive-alignment method.
3) HMMer/PSI-BLAST: Builds a profile hidden Markov model (pHMM) from a set of aligned sequences. Aligns sequences to a pHMM, searches a sequence database with it, and can help assign a function to a given sequence.
4) Phylip/TreeDyn: Calculates a distance matrix from a set of sequences. Derives phylogenetic trees, taking such a matrix as input, based on theories of minimum evolution, parsimony and more.
5) Databases
Nucleotide databases: EMBL, GenBank & DDBJ
Protein databases: fully annotated, e.g. Swiss-Prot v52.3, as of 17th of Apr. (264,492 entries); computer-annotated, e.g. TrEMBL v35.3
Genomics databases: Ensembl & Eukaryota, Bacteria and Archaea genomes 20+14 (v44), 51, 445, 40, as of 20th of Apr.
Major Bioinformatics Centres, around the globe
Searching for sequences by homology - BLAST
Reference: Gish, W. ( )

Query= KcsA (160 letters)
>Filtered+0
MPPMXXXXXXXXXXXXXGRHGSALHWRXXXXXXXXXXXXXXXGSYLAVLAERGAPGAQLI
TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE
RRGHFVRHSEKXXXXXXXXXXXXLHERFDRLERMLDDNRR

Database: swissprot 223,100 sequences; 81,965,973 total letters.
Searching % done

Sequences producing High-scoring Segment Pairs (columns: High Score, Smallest Sum Probability P(N), N):
SW:KCSA_STRCO P0A333 Voltage-gated potassium channel e-60 1
SW:KCSA_STRLI P0A334 Voltage-gated potassium channel e-60 1

>SW:KCSA_STRCO P0A333 Voltage-gated potassium channel. Length = 160
Score = 615 (221.5 bits), Expect = 3.0e-60, P = 3.0e-60, Group = 1
Identities = 120/160 (75%), Positives = 120/160 (75%)

Query:   1 MPPMXXXXXXXXXXXXXGRHGSALHWRXXXXXXXXXXXXXXXGSYLAVLAERGAPGAQLI 60
           MPPM             GRHGSALHWR               GSYLAVLAERGAPGAQLI
Sbjct:   1 MPPMLSGLLARLVKLLLGRHGSALHWRAAGAATVLLVIVLLAGSYLAVLAERGAPGAQLI 60

Query:  61 TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE 120
           TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE
Sbjct:  61 TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE 120
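The "Identities" line in a report like this can be reproduced by hand. The sketch below is a minimal illustration, not BLAST's own code: it counts matching positions between two aligned strings and skips positions masked with X by the low-complexity filter, which is presumably why a query that is otherwise identical to its own database entry scores 120/160 rather than 160/160. The function names and the short example strings are hypothetical.

```python
def count_identities(query: str, subject: str, mask: str = "X") -> tuple[int, int]:
    """Return (identical positions, alignment length) for two aligned strings.

    Positions where the query is masked (mask character) or gapped ('-')
    are never counted as identical.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(
        1
        for q, s in zip(query, subject)
        if q == s and q != mask and q != "-"
    )
    return identical, len(query)


def percent_identity(query: str, subject: str) -> float:
    """Identical positions as a percentage of the alignment length."""
    identical, length = count_identities(query, subject)
    return 100.0 * identical / length


# Toy example (hypothetical strings): three masked query positions
# out of ten cannot match, so 7 of 10 positions are identical.
toy_query = "MPPMXXXGRH"
toy_sbjct = "MPPMLSGGRH"
print(count_identities(toy_query, toy_sbjct))
print(percent_identity(toy_query, toy_sbjct))
```

Applied to the full KcsA alignment, the same counting would treat every X in the filtered query as a non-match.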
Multiple sequence alignment – ClustalW
*****************************************************
CLUSTAL W (1.83) Multiple Sequence Alignments
*****************************************************

1. Sequence Input From Disc
2. Multiple Alignments
3. Profile / Structure Alignments
4. Phylogenetic trees
S. Execute a system command
H. HELP
X. EXIT (leave program)

Your choice: 2

****** MULTIPLE ALIGNMENT MENU ******

1. Do complete multiple alignment now (Slow/Accurate)
2. Produce guide tree file only
3. Do alignment using old guide tree file
4. Toggle Slow/Fast pairwise alignments = SLOW
5. Pairwise alignment parameters
6. Multiple alignment parameters
7. Reset gaps before alignment? = OFF
8. Toggle screen display = ON
9. Output format options
S. Execute a system command
H. HELP
or press [RETURN] to go back to main menu

Your choice:
CLUSTAL W (1.82) multiple sequence alignment

KVAP_AERPE  FDALW-WAVVTATTVGYGDVVP-ATPIGKVIGIAVMLTGISALTLLIGTVSNMF
MVP_METJA   FDAFY-FTTISITTVGYGDITP-KTDAGKLI---IIFS---VLFFISGLITS
O28600      FDSLY-MTVITITTTGYGEVKP-MGPGGRVISMLLMFVGVGTF
Q8TXQ4      LTCLY-FTAATITTVGYGDVVP-TTEAGRLLSVIVMFSGIGVASYAL
Q6L2S2      FTSLW-WTMQTITTVGYGDTPV-YGFYGRINGMLIMVFGIGTIGYVTASLAT
Q979Z2      FTAIW-FTMETVTTVGYGDVVP-VSNLGRVVAMLIMVSGIGLLGTLTATISAYLF----Q 80
O26605      EDSLW-YVLQTITTVGYGDIVP-VTSLGRFTGMVIMFSAIASTSLITASATSTLLERGEQ 114
Q9HIA8      GNAFY-YTGEVITTLGFGDILP-VTMDAKIFTISLAFLGVAIFFSSITALILPSVERRLG 94
Q97CK5      GTALY-YTGETVTTLGFGDILP-VDLESRLFTISLAFLGVAIFFSAMTALITPTIERRVG 84

Colour key (property, colour, residues):
Others                                            Gray
Hydroxyl, Amine                                   Green    STYHCNGQ
Basic                                             Magenta  RHK
Acidic                                            Blue     DE
Small (small + hydrophobic (incl. aromatic - Y))  Red      AVFPMILW
Profile alignment & Pattern recognition: HMMer
More sensitive homology-search: PSI-BLAST & HMMer
DNA sequence Amino acid sequence
PSI-BLAST
Phylogeny: Phylip & TreeDyn
Saitou N and Nei M, The neighbour-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol, 4(4), 1987
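Distance-based tree methods such as neighbour-joining start from a matrix of pairwise distances between aligned sequences. As a rough sketch of that first step, the code below computes the simple p-distance (fraction of differing positions, ignoring gapped columns). This is a stand-in chosen for brevity: PHYLIP's own distance programs use model-based corrections rather than raw p-distances. The function names and the three toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Symmetric pairwise p-distance matrix keyed by (name, name)."""
    matrix = {(name, name): 0.0 for name in aligned}
    for a, b in combinations(aligned, 2):
        d = p_distance(aligned[a], aligned[b])
        matrix[(a, b)] = d
        matrix[(b, a)] = d
    return matrix


# Toy alignment (hypothetical sequences, not taken from the slides).
toy = {"seqA": "TTVGYGD", "seqB": "TTVGFGD", "seqC": "TT-GYGE"}
for pair, dist in sorted(distance_matrix(toy).items()):
    print(pair, round(dist, 3))
```

A matrix like this is what a neighbour-joining program takes as input before it starts joining the closest pairs.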
TreeDyn
Protein secondary structure prediction: two consensus methods
Example Output
Dr. Jonathan Cuthbertson developed the Transmembrane Prediction Server.

| | | | | | |
MFAKGYGKNNEPLRGYILTFLIALGFILIAELNVIAPIISNFFLASYALINFSVFHASLAKSPGWRPAFK
ALOM2      *****************
DAS        ****************************************
HMMTOP2    ****************** *************************
MEMSAT1.5  *************************
PHD        *************************
SPLIT4     **************** ***************************
TMAP       *****************************
TMFINDER   ****************************************
TMHMM2     *********************** ******************
TMPRED     *************************
TOPPRED2   ********************* *********************
Consensus  ???hhhhHHHHHHHHHHHHHHHHHhHHhhhhhhhhh???????????
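A consensus line of this kind can be thought of as a per-residue vote across the individual predictors. The sketch below is a minimal illustration under stated assumptions, not the Transmembrane Prediction Server's actual algorithm: each predictor is a string of the same length as the sequence with `*` marking predicted transmembrane residues, and the two thresholds (`strong`, `weak`) and the output symbols are hypothetical choices.

```python
def consensus(predictions: list[str], strong: float = 0.75, weak: float = 0.5) -> str:
    """Per-residue vote over equal-length prediction strings.

    'H' = at least `strong` fraction of predictors mark the residue,
    'h' = at least `weak` fraction, '-' = fewer than that.
    """
    lengths = {len(p) for p in predictions}
    if len(lengths) != 1:
        raise ValueError("need at least one prediction, all of equal length")
    n = len(predictions)
    out = []
    for column in zip(*predictions):
        fraction = sum(c == "*" for c in column) / n
        if fraction >= strong:
            out.append("H")
        elif fraction >= weak:
            out.append("h")
        else:
            out.append("-")
    return "".join(out)


# Toy input: four hypothetical predictors over a five-residue sequence.
toy_predictions = ["-***-", "--**-", "-***-", "--*--"]
print(consensus(toy_predictions))
```

Using two thresholds lets the consensus separate residues that nearly all methods agree on from those supported by only a bare majority.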
Pongo
Example Output by Pongo
Background for practical sessions
Introduction to your input sequence
Ion channels; Potassium channels; Voltage-gated potassium channels
Ion channels are a diverse class of transmembrane proteins that are responsible for the diffusion of ions across cell membranes. There are several major families of ion channels, for instance K+, Na+, Ca2+ and Cl- channels, as well as ligand-gated ion channels (LGICs). Many human neurological and muscular disorders have been traced to defects in voltage-gated and ligand-gated ion channels.
Fig 2. A. Long et al., Science, Vol. 309, p897, 2005 (figure labels: TM, T1)
Your expected blastp output
K+ channels, blastp. Homologues are visualised in BLIXEM.
Alignment you are about to build, not necessarily as big.
Figure labels: Kv, BK, SK, Erg, Kir, CNG, AKT; Kv1.x, Shab Kv2.x, Shal Kv4.x, Kv, Shaw Kv3.x; Kir2.x, Kir6.2, Kir3.x, Kir4.x, Kir1.1, Kir6.1, Kir2.3
Fig 4. Shealy et al., Biophysical Journal, Vol 84, p2929, 2003
Example of pHMM-related output

hmmsearch - search a sequence database with a profile HMM
HMM file: Kv.hmm [Kv_homologues]
Sequence database: infile_comb
Query HMM: Kv_homologues
[HMM has been calibrated; E-values are empirical estimates]

Scores for complete sequences (score includes all domains):
Sequence Description Score E-value N
CIKS_DROME e-71 1
Q9VX00_DROME e-69 1
CIKB_DROME e-46 1
O62350_Celegans e-46 1
Q9VLC6_DROME e-46 1
CIKW_DROME e-45 1
Q8SYL2_DROME e-45 1
Q22012_Celegans e-45 1
Filtered_5DROME e-41 1
Filtered_6DROME e-41 1
Q9XXD1_Celegans e-36 1
Raw tree-files produced by PHYLIP
Tree labels: Kir, Kv, BK, SK, AKT, CNG/HErg, KcsA, MthK, Kv1.2, KvAP
Phylogenetic trees modified in TreeDyn