Bioinformatics master course DNA/Protein structure-function analysis and prediction Lecture 1: Protein Structure Basics (1) Centre for Integrative Bioinformatics.

Slides:



Advertisements
Similar presentations
PROTEOMICS 3D Structure Prediction. Contents Protein 3D structure. –Basics –PDB –Prediction approaches Protein classification.
Advertisements

PROTEINS Proteins are the most complex and most diverse group of biological compounds. If you weigh about 70 kg: About 50 of your 70 kg is water. Many.
Protein Structure Prediction
©CMBI 2001 The amino acids in their natural habitat.
The amino acids in their natural habitat. Topics: Hydrogen bonds Secondary Structure Alpha helix Beta strands & beta sheets Turns Loop Tertiary & Quarternary.
Bioinformatics master course DNA/Protein structure-function analysis and prediction Lecture 1: Protein Structure Basics (1) Centre for Integrative Bioinformatics.
Structure Prediction (I): Secondary structure Structure Prediction (I): Secondary structure DNA/Protein structure-function analysis and prediction Lecture.
Secondary Structure Prediction Victor A. Simossis Bioinformatics Center IBIVU.
1 September, 2004 Chapter 5 Macromolecular Structure.
1-month Practical Course Genome Analysis Protein Structure-Function Relationships Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam.
1 Levels of Protein Structure Primary to Quaternary Structure.
Sequence analysis course Lecture 8 Sequence databank searching 1.
Bioinformatics master course DNA/Protein structure-function analysis and prediction Lecture 5: Protein Fold Families Jaap Heringa Integrative Bioinformatics.
Proteins Dr Una Fairbrother. Dipeptides u Two amino acids are combined as in the diagram, to form a dipeptide. u Water is the other product.
Protein Functions: catalyze reactions (enzymes) receptors (eg. pain receptors) transport (ions across membranes, oxygen in blood) molecular motors recognition.
Protein Secondary Structure : Kendrew Solves the Structure of Myoglobin “Perhaps the most remarkable features of the molecule are its complexity.
C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E Master Course Sequence Alignment Lecture 10 Database searching Issues (1)
Protein Modules An Introduction to Bioinformatics.
C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E Master Course DNA/Protein Structure- function Analysis and Prediction Lecture 7.
Computing for Bioinformatics Lecture 8: protein folding.
Protein Basics Protein function Protein structure –Primary Amino acids Linkage Protein conformation framework –Dihedral angles –Ramachandran plots Sequence.
SnapDRAGON: protein 3D prediction-based DOMAINATION: based on PSI-BLAST Two methods to predict domain boundary sequence positions from sequence information.
Chapter 3 Macromolecules.
(Foundation Block) Dr. Ahmed Mujamammi Dr. Sumbul Fatma
Proteins Dr. Sumbul Fatma Clinical Chemistry Unit
Protein Tertiary Structure Prediction
1 Chapter 3: Protein ZHOU Yong Department of Biology Xinjiang Medical University.
Supersecondary structures. Supersecondary structures motifs motifs or folds, are particularly stable arrangements of several elements of the secondary.
Lecture 10: Protein structure
STRUCTURAL ORGANIZATION
Protein “folding” occurs due to the intrinsic chemical/physical properties of the 1° structure “Unstructured” “Disordered” “Denatured” “Unfolded” “Structured”
High-throughput Biological Data The data deluge and bioinformatics algorithms Introduction to bioinformatics 2005 Lecture 3.
Lecture 15 Secondary Structure Prediction Bioinformatics Center IBIVU.
Amino acids and proteins … for AS Biology. Amino acids Proteins are macromolecules consisting of long unbranched chains of amino acids. All amino acids.
Protein Structure 101 Alexey Onufriev, Virginia Tech
CS790 – BioinformaticsProtein Structure and Function1 Review of fundamental concepts  Know how electron orbitals and subshells are filled Know why atoms.
Part I : Introduction to Protein Structure A/P Shoba Ranganathan Kong Lesheng National University of Singapore.
Protein Structure 1 Primary and Secondary Structure.
©CMBI 2001 Step 5: The amino acids in their natural habitat.
Protein Structure & Modeling Biology 224 Instructor: Tom Peavy Nov 18 & 23, 2009
Protein Structure (Foundation Block) What are proteins? Four levels of structure (primary, secondary, tertiary, quaternary) Protein folding and stability.
Protein Structure (Foundation Block) What are proteins? Four levels of structure (primary, secondary, tertiary, quaternary) Protein folding and stability.
1 Protein Structure Prediction (Lecture for CS397-CXZ Algorithms in Bioinformatics) April 23, 2004 ChengXiang Zhai Department of Computer Science University.
Introduction to Protein Structure Prediction BMI/CS 576 Colin Dewey Fall 2008.
Protein Structure and Bioinformatics. Chapter 2 What is protein structure? What are proteins made of? What forces determines protein structure? What is.
Introduction to Bioinformatics Lecture 13 Protein Secondary Structure Elements and their Prediction IBIVU Centre C E N T R F O R I N T E G R A T I V E.
Levels of Protein Structure. Why is the structure of proteins (and the other organic nutrients) important to learn?
Protein Tertiary Structure Prediction Structural Bioinformatics.
Proteins Structure Predictions Structural Bioinformatics.
Protein Structure Prediction. Protein Sequence Analysis Molecular properties (pH, mol. wt. isoelectric point, hydrophobicity) Secondary Structure Super-secondary.
Structure and Function
The Biologist’s Wishlist A complete and accurate set of all genes and their genomic positions A set of all the transcripts produced by each gene The location.
Structural organization of proteins
Mir Ishruna Muniyat. Primary structure (Amino acid sequence) ↓ Secondary structure ( α -helix, β -sheet ) ↓ Tertiary structure ( Three-dimensional.
CHM 708: MEDICINAL CHEMISTRY
The heroic times of crystallography
The (Greek) Key to Structures of Neural Adhesion Molecules
Conformationally changed Stability
There are four levels of structure in proteins
Lecture 14 Secondary Structure Prediction
Volume 124, Issue 1, Pages (January 2006)
Conformationally changed Stability
SnapDRAGON: protein 3D prediction-based
Structure of CheA, a Signal-Transducing Histidine Kinase
Volume 124, Issue 5, Pages (March 2006)
Volume 14, Issue 8, Pages (August 2006)
Lecture 10 Secondary Structure Prediction
Volume 12, Issue 11, Pages (November 2004)
Structure of an IκBα/NF-κB Complex
The Three-Dimensional Structure of Proteins
Presentation transcript:

Bioinformatics master course DNA/Protein structure-function analysis and prediction Lecture 1: Protein Structure Basics (1) Centre for Integrative Bioinformatics VU (IBIVU) Faculty of Exact Sciences / Faculty of Earth and Life Sciences C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E

DNA/Protein structure-function analysis and prediction SCHEDULE Centre for Integrative Bioinformatics VU (IBIVU) Faculty of Exact Sciences / Faculty of Earth and Life Sciences C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E

The first protein structure in 1960: Myoglobin (Sir John Kendrew)

Protein Data Bank Primary repository of protein teriary structures

Dickerson’s formula: equivalent to Moore’s law On 27 March 2001 there were 12,123 3D protein structures in the PDB: Dickerson’s formula predicts 12,066 (within 0.5%)! n = e 0.19(y-1960) where y is the year.

Protein primary structure 20 amino acid types A generic residue Peptide bond 1 MKYNNHDKIR DFIIIEAYMF RFKKKVKPEV 1 MKYNNHDKIR DFIIIEAYMF RFKKKVKPEV 31 DMTIKEFILL TYLFHQQENT LPFKKIVSDL 31 DMTIKEFILL TYLFHQQENT LPFKKIVSDL 61 CYKQSDLVQH IKVLVKHSYI SKVRSKIDER 61 CYKQSDLVQH IKVLVKHSYI SKVRSKIDER 91 NTYISISEEQ REKIAERVTL FDQIIKQFNL 91 NTYISISEEQ REKIAERVTL FDQIIKQFNL 121 ADQSESQMIP KDSKEFLNLM MYTMYFKNII 151 KKHLTLSFVE FTILAIITSQ NKNIVLLKDL 181 IETIHHKYPQ TVRALNNLKK QGYLIKERST 211 EDERKILIHM DDAQQDHAEQ LLAQVNQLLA 241 DKDHLHLVFE SARS Protein From Staphylococcus Aureus

Protein secondary structure Alpha-helix Beta strands/sheet 1 MKYNNHDKIR DFIIIEAYMF RFKKKVKPEV DMTIKEFILL TYLFHQQENT 1 MKYNNHDKIR DFIIIEAYMF RFKKKVKPEV DMTIKEFILL TYLFHQQENT SHHH HHHHHHHHHH HHHHHHTTT SS HHHHHHH HHHHS S SE SHHH HHHHHHHHHH HHHHHHTTT SS HHHHHHH HHHHS S SE 51 LPFKKIVSDL CYKQSDLVQH IKVLVKHSYI SKVRSKIDER NTYISISEEQ 51 LPFKKIVSDL CYKQSDLVQH IKVLVKHSYI SKVRSKIDER NTYISISEEQ EEHHHHHHHS SS GGGTHHH HHHHHHTTS EEEE SSSTT EEEE HHH 101 REKIAERVTL FDQIIKQFNL ADQSESQMIP KDSKEFLNLM MYTMYFKNII EEHHHHHHHS SS GGGTHHH HHHHHHTTS EEEE SSSTT EEEE HHH 101 REKIAERVTL FDQIIKQFNL ADQSESQMIP KDSKEFLNLM MYTMYFKNII HHHHHHHHHH HHHHHHHHHH HTT SS S SHHHHHHHH HHHHHHHHHH 151 KKHLTLSFVE FTILAIITSQ NKNIVLLKDL IETIHHKYPQ TVRALNNLKK HHHHHHHHHH HHHHHHHHHH HTT SS S SHHHHHHHH HHHHHHHHHH 151 KKHLTLSFVE FTILAIITSQ NKNIVLLKDL IETIHHKYPQ TVRALNNLKK HHH SS HHH HHHHHHHHTT TT EEHHHH HHHSSS HHH HHHHHHHHHH 201 QGYLIKERST EDERKILIHM DDAQQDHAEQ LLAQVNQLLA DKDHLHLVFE HHH SS HHH HHHHHHHHTT TT EEHHHH HHHSSS HHH HHHHHHHHHH 201 QGYLIKERST EDERKILIHM DDAQQDHAEQ LLAQVNQLLA DKDHLHLVFE HTSSEEEE S SSTT EEEE HHHHHHHHH HHHHHHHHTS SS TT SS HTSSEEEE S SSTT EEEE HHHHHHHHH HHHHHHHHTS SS TT SS SARS Protein From Staphylococcus Aureus

Protein structure hierarchical levels VHLTPEEKSAVTALWGKVNVDE VGGEALGRLLVVYPWTQRFFE SFGDLSTPDAVMGNPKVKAHG KKVLGAFSDGLAHLDNLKGTFA TLSELHCDKLHVDPENFRLLGN VLVCVLAHHFGKEFTPPVQAAY QKVVAGVANALAHKYH PRIMARY STRUCTURE (amino acid sequence) QUATERNARY STRUCTURE (oligomers) SECONDARY STRUCTURE (helices, strands) TERTIARY STRUCTURE (fold)

Protein folding problem VHLTPEEKSAVTALWGKVNVDE VGGEALGRLLVVYPWTQRFFE SFGDLSTPDAVMGNPKVKAHG KKVLGAFSDGLAHLDNLKGTFA TLSELHCDKLHVDPENFRLLGN VLVCVLAHHFGKEFTPPVQAAY QKVVAGVANALAHKYH PRIMARY STRUCTURE (amino acid sequence) SECONDARY STRUCTURE (helices, strands) TERTIARY STRUCTURE (fold) Each protein sequence “knows” how to fold into its tertiary structure. We still do not understand how and why 1-step process 2-step process The 1-step process is based on a hydrophobic collapse; the 2-step process, more common in forming larger proteins, is called the framework model of folding

Globin fold  protein myoglobin PDB: 1MBN

 sandwich  protein immunoglobulin PDB: 7FAB

TIM barrel  /  protein Triose phosphate IsoMerase PDB: 1TIM

A fold in  +  protein ribonuclease A PDB: 7RSA The red balls represent waters that are ‘bound’ to the protein based on polar contacts

(a) ideal right-handed  helix. C: green; O: red; N: blue; H: not shown; hydrogen bond: dashed line. (b) The right-handed  helix without showing atoms. (c) the left-handed  helix (rarely observed). An  helix has the following features: every 3.6 residues make one turn, the distance between two turns is 0.54 nm, the C=O (or N-H) of one turn is hydrogen bonded to N-H (or C=O) of the neighboring turn -- the H-bonded N atom is 4 residues up in the chain   helix

  sheet A  sheet consists of two or more hydrogen bonded  strands. The two neighboring  strands may be parallel if they are aligned in the same direction from one terminus (N or C) to the other, or anti-parallel if they are aligned in the opposite direction. The  sheet structure found in RNase A

% But remember there are homologous relationships at very low identity levels (<10%)! Homology-derived Secondary Structure of Proteins (HSSP) Sander & Schneider, 1991

RMSD of backbone atoms (Ǻ) % identical residues in protein core Chotia & Lesk, 1986

C5 anaphylatoxin -- human (PDB code 1kjs) and pig (1c5a)) proteins are superposed RMSD: Two superposed protein structures (with two well-superposed helices) Red: well superposed Blue: low match quality Root mean square deviation (RMSD) is typically calculated between equivalent C  atoms

Burried and Edge strands Parallel  -sheet Anti-parallel  -sheet

ALPHA-HELIX: Hydrophobic-hydrophilic 2-2 residue periodicity patterns BETA-STRAND: Edge strands, hydrophobic- hydrophilic 1-1 residue periodicity patterns; burried strands often have consecutive hydrophobic residues OTHER: Loop regions contain a high proportion of small polar residues like alanine, glycine, serine and threonine. The abundance of glycine is due to its flexibility and proline for entropic reasons relating to the observed rigidity in its kinking the main-chain. As proline residues kink the main-chain in an incompatible way for helices and strands, they are normally not observed in these two structures (breakers), although they can occur in the N- terminal two positions of  -helices. Edge Buried Secondary structure hydrophobity patterns

5(  ) fold Flavodoxin fold

Flavodoxin family - TOPS diagrams (Flores et al., 1994) To date, all  structures deposited in the PDB start with a  - strand!

Protein structure evolution Insertion/deletion of secondary structural elements can ‘easily’ be done at loop sites

Protein structure evolution Insertion/deletion of structural domains can ‘easily’ be done at loop sites N C

A domain is a: Compact, semi-independent unit (Richardson, 1981). Stable unit of a protein structure that can fold autonomously (Wetlaufer, 1973). Recurring functional and evolutionary module (Bork, 1992). “Nature is a tinkerer and not an inventor” (Jacob, 1977).

Identification of domains is essential for: High resolution structures (e.g. Pfuhl & Pastore, 1995). Sequence analysis (Russell & Ponting, 1998) Multiple alignment methods Sequence database searches Prediction algorithms Fold recognition Structural/functional genomics

Domain connectivity

Domain size The size of individual structural domains varies widely from 36 residues in E-selectin to 692 residues in lipoxygenase-1 (Jones et al., 1998), the majority (90%) having less than 200 residues (Siddiqui and Barton, 1995) with an average of about 100 residues (Islam et al., 1995). Small domains (less than 40 residues) are often stabilised by metal ions or disulphide bonds. Large domains (greater than 300 residues) are likely to consist of multiple hydrophobic cores (Garel, 1992).

Domain characteristics Domains are genetically mobile units, and multidomain families are found in all three kingdoms (Archaea, Bacteria and Eukarya) The majority of proteins, 75% in unicellular organisms and >80% in metazoa, are multidomain proteins created as a result of gene duplication events (Apic et al., 2001). Domains in multidomain structures are likely to have once existed as independent proteins, and many domains in eukaryotic multidomain proteins can be found as independent proteins in prokaryotes (Davidson et al., 1993).

Genetic mechanisms influencing the layout of multidomain proteins include gross rearrangements such as inversions, translocations, deletions and duplications, homologous recombination, and slippage of DNA polymerase during replication (Bork et al., 1992). Although genetically conceivable, the transition from two single domain proteins to a multidomain protein requires that both domains fold correctly and that they accomplish to bury a fraction of the previously solvent-exposed surface area in a newly generated inter-domain surface. Domain fusion

Vertebrates have a multi-enzyme protein (GARs- AIRs-GARt) comprising the enzymes GAR synthetase (GARs), AIR synthetase (AIRs), and GAR transformylase (GARt) 1. In insects, the polypeptide appears as GARs- (AIRs)2-GARt. However, GARs-AIRs is encoded separately from GARt in yeast, and in bacteria each domain is encoded separately (Henikoff et al., 1997). 1GAR: glycinamide ribonucleotide synthetase AIR: aminoimidazole ribonucleotide synthetase Domain fusion example

Domain fusion – Rosetta Stone method David Eisenberg, Edward M. Marcotte, Ioannis Xenarios & Todd O. Yeates Inferring functional relationships If you find a genome with a fused multidomain protein, and another genome featuring these domains as separate proteins, then these separate domains can be predicted to be functionally linked (“guilt by association”)

Phylogenetic profiling David Eisenberg, Edward M. Marcotte, Ioannis Xenarios & Todd O. Yeates Inferring functional relationships If in some genomes, two (or more) proteins co-occur, and in some other genomes they cannot be found, then this joint presence/absence can be taken as evidence for a functional link between these proteins

Fraction exposed residues against chain length

If protein structure would be spherical: volume is 4/3*  r 3 surface area is 4  r 2 The surface/volume ratio therefore is 3/r If a single domain protein growths in size (increasing r), the ratio goes down linearly, indicating that the volume increases faster than the surface area. So, if proteins would just grow by forming larger and larger single domains, then one would expect an increasing fraction of hydrophobic residues (protein core is mostly hydrophobic, surface tends to be hydrophilic). The plots on the preceding slides show, however, that the fraction of surface (=exposed) residues becomes constant at larger protein sizes (larger numbers of residues), indicating a multi-domain situation

Analysis of chain hydrophobicity in multidomain proteins

Pyruvate kinase (Phosphotransferase )  barrel regulatory domain  barrel catalytic substrate binding domain  nucleotide binding domain 1 continuous + 2 discontinuous domains Protein domain organisation and chain connectivity Located in red blood cells Generate energy when insufficient oxygen is present in blood

The DEATH Domain Present in a variety of Eukaryotic proteins involved with cell death. Six helices enclose a tightly packed hydrophobic core. Some DEATH domains form homotypic and heterotypic dimers.

RGS proteins comprise a family of proteins named for their ability to negatively regulate heterotrimeric G protein signaling. RGS Protein Superfamily Founding members of the RGS protein superfamily were discovered in 1996 in a wide spectrum of species

Oligomerisation -- Domain swapping 3D domain swapping definitions. A: Closed monomers are comprised of tertiary or secondary structural domains (represented by a circle and square) linked by polypeptide linkers (hinge loops). The interface between domains in the closed monomer is referred to as the C- (closed) interface. Closed monomers may be opened by mildly denaturing conditions or by mutations that destabilize the closed monomer. Open monomers may dimerize by domain swapping. The domain-swapped dimer has two C-interfaces identical to those in the closed monomer, however, each is formed between a domain from one subunit (black) and a domain from the other subunit (gray). The only residues whose conformations significantly differ between the closed and open monomers are in the hinge loop. Domain- swapped dimers that are only metastable (e.g., DT, CD2, RNase A) may convert to monomers, as indicated by the backward arrow. B: Over time, amino acid substitutions may stabilize an interface that does not exist in the closed monomers. This interface formed between open monomers is referred to as the 0- (open) interface. The 0-interface can involve domains within a single subunit ( I ) and/or between subunits (II).

Functional Genomics Protein Sequence-Structure-Function Sequence Structure Function Threading Homology searching (BLAST) Ab initio prediction and folding Ab initio Function prediction from structure We are not good yet at forward inference (red arrows) based on first principles. That is why many widely used methods and techniques search for related entities in databases and perform backward inference (green arrows) Note: backward inference is based on evolutionary relationships!

TERTIARY STRUCTURE (fold) Genome Expressome Proteome Metabolome Functional Genomics From gene to function

Functional genomics The preceding slide shows a simplistic representation of sequence-structure-function relationships: From DNA (Genome) via RNA (Expressome) to Protein (Proteome, i.e. the complete protein repertoire for a given organism). The cellular proteins play a very important part in controlling the cellular networks (metabolic, regulatory, and signalling networks)

Protein structure – the chloroplast skyline Photosynthesis -- Making oxygen and storing energy in the plant

Protein Function: Metabolic networks controlled by enzymes Glycolysis and Gluconeogenesis Proteins are indicated in rectangular boxes using Enzyme Commission (EC) numbers (format: a.b.c.d)

Tropomyosin Coiled-coil domains This long protein is involved In muscle contraction