Basic bioinformatics tools for studying proteins
Dong Xu
Computer Science Department, C. S. Bond Life Sciences Center
University of Missouri, Columbia
Introduction
• Broaden knowledge for undergraduate education
• Many opportunities in biomedical and agriculture-related jobs
• Practice basic protein tools:
  - Useful for biological studies
  - Intellectually stimulating
• Dong’s picks for beginners:
  - Not necessarily the most accurate tools
  - Easy to use and understand
  - Very popular
Proteins – Some Basics
• What is a protein?
  - A linear sequence of amino acids...
• What is an amino acid?
20 Amino Acids
Glycine (G), Glutamic acid (E), Aspartic acid (D), Methionine (M), Threonine (T), Serine (S), Glutamine (Q), Asparagine (N), Tryptophan (W), Phenylalanine (F), Cysteine (C), Proline (P), Leucine (L), Isoleucine (I), Valine (V), Alanine (A), Histidine (H), Lysine (K), Tyrosine (Y), Arginine (R)
Color key: White: Hydrophobic, Green: Hydrophilic, Red: Acidic, Blue: Basic
• Amino acids connect via peptide bonds
[Figure: peptide bond linking amino acids (AA); example chain F N G G S T S D K]
An Overview
• A protein folds into a unique 3D structure under physiological conditions
Lysozyme sequence (129 amino acids):
KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS
TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS
DGNGMNAWVA WRNRCKGTDV QAWIRGCRL
[Figure labels: protein backbone, side chain]
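Because a protein is a linear sequence of amino acids, the lysozyme sequence above can be handled as a plain string. The following is a minimal Python sketch that checks the stated length and tallies the residue composition; the names `LYSOZYME` and `composition` are illustrative and do not come from the slides.

```python
from collections import Counter

# Lysozyme sequence copied from the slide, in blocks of ten residues.
LYSOZYME = (
    "KVFGRCELAA" "AMKRHGLDNY" "RGYSLGNWVC" "AAKFESNFNT" "QATNRNTDGS"
    "TDYGILQINS" "RWWCNDGRTP" "GSRNLCNIPC" "SALLSSDITA" "SVNCAKKIVS"
    "DGNGMNAWVA" "WRNRCKGTDV" "QAWIRGCRL"
)


def composition(sequence):
    """Count how many times each one-letter amino acid code occurs."""
    return Counter(sequence)


counts = composition(LYSOZYME)
print(len(LYSOZYME))          # 129, matching the length given on the slide
print(counts["C"])            # 8 cysteines
print(counts.most_common(3))  # the three most frequent residues
```

The same two calls work on any sequence pasted from a database record, which makes this a convenient first sanity check before submitting a sequence to a web tool.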
Primary, Secondary and Tertiary Structures of Proteins
Protein Structure Representations
Lysozyme structure shown as: ball & stick, strand, surface
Structure Visualization
• RasMol
• MDL Chime (plug-in)
• Protein Explorer
• Jmol
• PyMOL
• VMD
Sequence Homology Software
• NCBI-BLAST
• Comparing 2 (pairwise) or more (multiple) sequences.
• Searching for a series of identical or similar characters in the sequences.

VLSPADKTNVKAAWAKVGAHAAGHG
||| |    |   |||| |  ||||
VLSEAEWQLVLHVWAKVEADVAGHG
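The idea of "searching for identical characters" can be made concrete with the two aligned sequences above. The sketch below only counts identical columns in an alignment that is already given. It does not reproduce what BLAST does (BLAST also finds the alignment and scores similar residues with a substitution matrix). The function names are illustrative.

```python
def percent_identity(seq_a, seq_b):
    """Return (identical columns, percent identity) for two pre-aligned,
    equal-length sequences. Columns holding a gap never count as a match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return matches, 100.0 * matches / len(seq_a)


def match_line(seq_a, seq_b):
    """Build the middle line of the alignment: '|' under identical residues."""
    return "".join(
        "|" if a == b and a != "-" else " " for a, b in zip(seq_a, seq_b)
    )


top = "VLSPADKTNVKAAWAKVGAHAAGHG"
bottom = "VLSEAEWQLVLHVWAKVEADVAGHG"

print(top)
print(match_line(top, bottom))
print(bottom)

matches, identity = percent_identity(top, bottom)
print(matches, identity)  # 14 56.0
```

Fourteen of the 25 aligned positions are identical, which is the same number of bars shown in the alignment on the slide.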
Typical BLAST Output
InterPro Scan
InterPro Scan PCNA
MyHits Local Motifs Search
MyHits Local Motifs Summary
MyHits Local Motif Hits
Multiple Alignment

VTISCTGSESNIGAG-NHVKWYQQLPG
VTISCTGTESNIGS--ITVNWYQQLPG
LRLSCSSSDFIFSS--YAMYWVRQAPG
LSLTCTVSETSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKEFYPSD--IAVEWWSNG--
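One thing a multiple alignment makes visible is which columns are identical in every sequence. The following is a minimal sketch that scans the eight aligned rows above for such columns; `conserved_columns` is an illustrative helper, not part of ClustalW or any other tool named in these slides.

```python
# The eight aligned rows from the slide ('-' marks a gap).
ALIGNMENT = [
    "VTISCTGSESNIGAG-NHVKWYQQLPG",
    "VTISCTGTESNIGS--ITVNWYQQLPG",
    "LRLSCSSSDFIFSS--YAMYWVRQAPG",
    "LSLTCTVSETSFDD--YYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG--",
    "ATLVCLISDFYPGA--VTVAWKADS--",
    "AALGCLVKDYFPEP--VTVSWNSG---",
    "VSLTCLVKEFYPSD--IAVEWWSNG--",
]


def conserved_columns(rows):
    """Return 1-based column numbers where every row has the same residue.
    Columns containing a gap are skipped."""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")
    conserved = []
    for col in range(width):
        residues = {row[col] for row in rows}
        if len(residues) == 1 and "-" not in residues:
            conserved.append(col + 1)
    return conserved


columns = conserved_columns(ALIGNMENT)
print(columns)                                   # [5, 21]
print([ALIGNMENT[0][c - 1] for c in columns])    # ['C', 'W']
```

Only the cysteine in column 5 and the tryptophan in column 21 are invariant across all eight rows. Real conservation scoring is usually softer than strict identity, so this is a lower bound on what a tool would report.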
Phylogeny Tree
Multiple protein sequence alignment → conserved sites, and hence possibly functional sites
Multiple protein sequence alignment → phylogenetic tree
MSA with ClustalW
Cell localization
Typical Sorting Signals

Signal function            Example
Import into nucleus        -P-P-K-K-K-R-K-V-
Export from nucleus        -L-A-L-K-L-A-G-L-D-I-
Import into mitochondria   <-MLSLRQSIRFFKPATRTLCSSRYLL-
Import into plastid        <-MVAMAMASLQSSMSSLSLSSNSFLGQPLSPITLSPFLQG-
Import into peroxisomes    -S-K-L->
Import into ER             <-MMSFVSLLLVGILFWATEAEQLTKCEVFN-
Return to ER               -K-D-E-L->
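The short signals in the table can be searched for directly in a sequence. The sketch below rests on two assumptions: that the trailing "->" in the table marks the C-terminal end of the protein, and that the example strings can be matched literally. Real sorting signals are more variable, which is why dedicated predictors exist. The test sequences are made up for illustration.

```python
import re

# Literal patterns taken from the example column of the table.
# '$' anchors a pattern to the C-terminal end of the sequence (assumed
# meaning of the trailing '->' in the table).
SIGNAL_PATTERNS = {
    "Import into nucleus": re.compile(r"PPKKKRKV"),
    "Import into peroxisomes": re.compile(r"SKL$"),
    "Return to ER": re.compile(r"KDEL$"),
}


def find_sorting_signals(sequence):
    """Return the names of all table signals found in the sequence."""
    return [
        name
        for name, pattern in SIGNAL_PATTERNS.items()
        if pattern.search(sequence)
    ]


# Hypothetical toy sequences, not real proteins.
print(find_sorting_signals("MAAAPPKKKRKVGG"))  # ['Import into nucleus']
print(find_sorting_signals("MGGSAAKDEL"))      # ['Return to ER']
print(find_sorting_signals("MTTTSKL"))         # ['Import into peroxisomes']
print(find_sorting_signals("MAAAGG"))          # []
```

The longer N-terminal signals (mitochondria, plastid, ER) are left out on purpose: they are recognized by overall composition rather than by a fixed string, so a literal match would be misleading.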
Localizations
Cell localization
• PSORT
• TargetP
Signal peptide
• SignalP
SignalP result
Membrane Bilayer with Proteins
Helix Bundle TM Proteins
PDB = 1QHJ, PDB = 1RRC
Single helix or helical bundles (> 90% of TM proteins)
Examples:
• Human growth hormone receptor, insulin receptor
• ATP binding cassette family - CFTR
• Multidrug resistance proteins
• 7TM receptors - G protein-linked receptors
Beta Barrel TM Proteins
Transmembrane Prediction (alpha) (beta)
Secondary Structure Prediction
• SSpro 4.1
• PSIPRED
• SAM
• PHD
Coiled coil prediction
bin/npsa_automat.pl?page=/NPSA/npsa_lupas.html
Special motif prediction
• Helix-turn-helix motif prediction: bin/npsa_automat.pl?page=/NPSA/npsa_hth.html
• Kinase related motifs
• Leucine zippers
Protein disorder prediction
• PreDisorder
• A collection of disorder predictors
2D: Contact Map Prediction
[Figure: a 3D structure and its 2D contact map, an n × n matrix indexed by residues i and j (1, 2, ..., n); distance threshold = 8 Å]
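A contact map is the 3D structure reduced to a yes/no matrix: entry (i, j) is 1 when residues i and j are closer than the distance threshold. The following is a minimal sketch using the 8 Å threshold from the slide. The coordinates are hypothetical, using one point per residue (for example the C-alpha atom) is an assumption, and so is counting a distance of exactly 8 Å as a contact.

```python
import math


def contact_map(coords, threshold=8.0):
    """Return an n x n matrix of 0/1 flags. Entry [i][j] is 1 when the
    points for residues i and j are within `threshold` angstroms."""
    n = len(coords)
    return [
        [1 if math.dist(coords[i], coords[j]) <= threshold else 0
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical coordinates in angstroms: four residues on a straight line,
# spaced 3.8 apart (not taken from any real structure).
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (11.4, 0.0, 0.0),
]

for row in contact_map(toy_coords):
    print(row)
# [1, 1, 1, 0]
# [1, 1, 1, 1]
# [1, 1, 1, 1]
# [0, 1, 1, 1]
```

The matrix is symmetric, and only the first and last residues (11.4 Å apart) are not in contact. Contact predictors try to fill in this matrix from sequence alone, often ignoring the trivial contacts between neighbors along the chain.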
Contact Prediction
• SVMcon
• NNcon
• SCRATCH
• SAM: http://compbio.soe.ucsc.edu/HMM-apps/HMM-applications.html
Structure Comparison
Visualize structure alignment using VAST. Two ferredoxins, 1DOI and 1AWD, are aligned structurally, showing an insertion in 1DOI that contains potassium-ion binding sites. This may be the result of adaptations to the high-salt environment of the Dead Sea.
Structure Alignment Tools
• CE
• DALI
• TM-align
Structure-Based Search
Comparing a query protein structure against all the structures in the PDB.
The DALI server: when new structures are solved, researchers often submit them to the DALI server to find structural neighbors and their alignments.
Swiss Model: Comparative Modeling Server
Protein Structure Homology Modeling: Modeller
Analysis software
• PROCHECK
• WHATCHECK
• Suite Biotech
• PROSA
Entrez Databases
Design Programs
• DEZYMER (Hellinga)
  - Given a ligand and a protein with known structure, suggests residues to mutate so that the resulting protein binds the ligand.
• ORBIT (Mayo)
  - Given a backbone structure, designs a sequence that folds to that backbone.
• Rosetta (Baker)
  - One program to treat diverse problems
  - Prediction and design
DEZYMER
1. Define the expected binding geometry.
2. Find backbone positions where, if appropriate side chains are added, the predefined geometry is satisfied.
3. Place the side chains and ligand, and optimize their positions.
4. Repack residues in positions other than the binding residues. If necessary, change the residue type.
Reference: Hellinga and Richards, JMB, "Construction of new ligand binding sites in proteins of known structure"
ORBIT
[Figure: comparison between the designed backbone (averaged NMR structure, blue) and the target backbone (red); solution structure of the designed protein, stereoview showing the best-fit superposition]
1. Divide the target structure into three parts: core, surface, and boundary.
2. Allowed residues at each kind of position:
   Core: Ala, Val, Leu, Ile, Phe, Tyr, Trp
   Surface: Ala, Ser, Thr, His, Asp, Asn, Glu, Gln, Lys, and Arg
   Boundary: union of the above two
3. On the order of 10^27 possible sequences.
4. Select the best sequence efficiently, using dead-end elimination (DEE).
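The size of the search space follows from the allowed residue sets: each position contributes a factor equal to the number of residue types permitted there. The sketch below shows the arithmetic. The position counts (1 core, 18 surface, 7 boundary) are assumptions chosen for illustration and are not stated on the slide; with them, the result lands on the order of 10^27.

```python
# Allowed residue types per position class, as listed on the slide.
CORE = {"Ala", "Val", "Leu", "Ile", "Phe", "Tyr", "Trp"}
SURFACE = {"Ala", "Ser", "Thr", "His", "Asp", "Asn", "Glu", "Gln", "Lys", "Arg"}
BOUNDARY = CORE | SURFACE  # union; Ala is in both sets, so 16 types


def sequence_space_size(n_core, n_surface, n_boundary):
    """Number of distinct sequences when every position may independently
    take any residue type allowed for its class."""
    return (
        len(CORE) ** n_core
        * len(SURFACE) ** n_surface
        * len(BOUNDARY) ** n_boundary
    )


# Hypothetical position counts, for illustration only.
size = sequence_space_size(n_core=1, n_surface=18, n_boundary=7)
print(len(BOUNDARY))   # 16
print(f"{size:.2e}")   # 1.88e+27
```

A space this large cannot be enumerated sequence by sequence, which is why the last step relies on dead-end elimination to discard choices that cannot be part of the best solution.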
Calciomics
• Calciomics is a specialized area of biochemistry that studies calcium-binding biological macromolecules and proteins, aiming to understand the factors that contribute to calcium-binding affinity, the selectivity of proteins, and calcium-dependent conformational change.
[Flowchart: protein analysis toolkit, in three stages]
Sequence analysis and processing:
• Original sequence
• SOSUI: remove transmembrane regions
• SignalP: remove signal region
• Coiled coils
• Remove disorder regions
• ProDom
• Set of domain sequences
• Modified sequences
Structure prediction and evaluation:
• SSP: secondary structure prediction
• PSI-BLAST: iterations; analysis of E-value; set of profile sequences; STOP if homolog found in PDB
• PROSPECT
• WHATIF / PROCHECK
• Evaluate & adjust alignments
• MODELLER / Jackle
• 3D model
Function inference:
• SWISS-PROT annotation
• PFAM: family classification
• Motif: active sites
• PSORT: subcellular location
• Enzyme structure DB
• Medline: literature search
• Function annotation
Summary
• Practice 10 selected tools
• Help answer the question: what does this protein do?
• Collaborate with experimentalists
Acknowledgments
This file is for educational purposes only. Some materials (including pictures and text) were taken from public-domain sources on the Internet.