1-month Practical Course Genome Analysis Protein Structure-Function Relationships Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands
Protein function Levels: Genome/DNA, Transcriptome/mRNA, Proteome, Metabolome, Physiome Protein examples: transcription factors, ribosomal proteins, chaperonins, enzymes
Not all proteins are enzymes: crystallin is an eye lens protein that needs to stay stable and transparent for a lifetime (very little turnover in the eye lens)
Protein function groups Catalysis (enzymes) Binding – transport (active/passive) – Protein-DNA/RNA binding (e.g. histones, transcription factors) – Protein-protein interactions (e.g. antibody-lysozyme) – Protein-fatty acid binding (e.g. apolipoproteins) – Protein-small molecule binding (drug interaction, structure decoding) Structural component (e.g. crystallin) Regulation Transcription regulation Signalling Immune system Motor proteins (actin/myosin)
What can happen to protein function through evolution Proteins can have multiple functions (and sometimes many -- Ig). Enzyme function is defined by specificity and activity Through evolution: Function and specificity can stay the same Function stays same but specificity changes Change to some similar function (e.g. somewhere else in metabolic system) Change to completely new function
How to arrive at a given function Divergent evolution – homologous proteins – proteins have same structure and “same-ish” function Convergent evolution – analogous proteins – different structure but same function Question: can homologous proteins change structure (and function)?
Protein function evolution Chymotrypsin ‘Modern’ 2-barrel structure Putative ancestral barrel structure Active site (combination of ancestral active site residues) Activity ,000 times enhanced
How to evolve Important distinction: Orthologues: homologous proteins in different species (all deriving from the same ancestor). Paralogues: homologous proteins in the same species (internal gene duplication). In practice, orthology is recognised with the bi-directional best hit criterion in conjunction with a database search program (this is called an operational definition): two proteins from different genomes are taken as orthologues if each is the other's best hit.
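The bi-directional best hit criterion can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular database search program: the score tables, the protein names (`A1`, `B1`, ...) and the helper names `best_hits` and `bidirectional_best_hits` are all hypothetical, and the scores stand in for whatever similarity score the search program reports.

```python
def best_hits(scores):
    """Map each query protein to its top-scoring hit in the other genome."""
    return {query: max(hits, key=hits.get)
            for query, hits in scores.items() if hits}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b AND b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical search scores: genome A queried against genome B, and back.
a_vs_b = {"A1": {"B1": 250.0, "B2": 90.0},
          "A2": {"B1": 120.0, "B2": 80.0}}
b_vs_a = {"B1": {"A1": 245.0, "A2": 110.0},
          "B2": {"A1": 85.0, "A2": 95.0}}

orthologue_pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(orthologue_pairs)
```

In this toy data A2's best hit is B1, but B1's best hit is A1, so only the A1/B1 pair satisfies the criterion in both directions.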
How to evolve By addition of domains (at either end of protein sequence or at loop sites [see next slides]) Often through gene duplication followed by divergence Multi-domain proteins are a result of gene fusion (multiple genes ending up in a single ORF). Repetitions of the same domain in a single protein occur frequently (gene duplication followed by gene fusion)
Protein structure evolution Insertion/deletion of secondary structural elements can ‘easily’ be done at loop sites These sites are normally at the surface of a protein
Example -- Flavodoxin fold 5(βα) fold
Flavodoxin family - TOPS diagrams (Flores et al., 1994) A TOPS diagram is a schematic representation of a protein fold (circle = alpha-helix, triangle = beta-strand) These are four variations of the same basic topology (bottom) Do you see what is inserted as compared to the basic topology?
Protein structure evolution Insertion/deletion of structural domains can ‘easily’ be done at loop sites
The basic functional unit of a protein is the domain A domain is a: Compact, semi-independent unit (Richardson, 1981). Stable unit of a protein structure that can fold autonomously (Wetlaufer, 1973). Recurring functional and evolutionary module (Bork, 1992). “Nature is a ‘tinkerer’ and not an inventor” (Jacob, 1977).
Delineating domains is essential for: Obtaining high resolution structures (x-ray, NMR) Sequence analysis Multiple sequence alignment methods Prediction algorithms (SS, Class, secondary/tertiary structure) Fold recognition and threading Elucidating the evolution, structure and function of a protein family (e.g. ‘Rosetta Stone’ method – next lecture) Structural/functional genomics Cross genome comparative analysis
Structural domain organisation can be nasty… Pyruvate kinase (phosphotransferase): barrel regulatory domain; barrel catalytic substrate-binding domain; nucleotide-binding domain. 1 continuous + 2 discontinuous domains
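The distinction between continuous and discontinuous domains can be made concrete by describing each domain as a list of residue segments: a continuous domain is a single stretch of the chain, while a discontinuous domain is built from two or more separated stretches. The sketch below assumes this representation; the domain names and residue ranges are invented for illustration and are not the real pyruvate kinase boundaries.

```python
def merge_segments(segments):
    """Merge overlapping or directly adjacent (start, end) residue ranges."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def is_continuous(segments):
    """A domain is continuous if it covers one uninterrupted sequence stretch."""
    return len(merge_segments(segments)) == 1


# Hypothetical assignment: domain B is inserted into C, which is inserted into A.
domains = {
    "A": [(1, 40), (201, 300)],
    "B": [(81, 150)],
    "C": [(41, 80), (151, 200)],
}

summary = {name: is_continuous(segs) for name, segs in domains.items()}
print(summary)
```

With these made-up boundaries the nested insertions give one continuous and two discontinuous domains, the same kind of arrangement the slide describes.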
Complex protein functions are a result of multiple domains An example is the so-called swivelling domain in pyruvate phosphate dikinase (Herzberg et al., 1996), which brings an intermediate enzymatic product over about 45 Å from the active site of one domain to that of another. This enhances the enzymatic activity: delivery of intermediate product not by a diffusion process but by active transport
The DEATH Domain Present in a variety of Eukaryotic proteins involved with cell death. Six helices enclose a tightly packed hydrophobic core. Some DEATH domains form homotypic and heterotypic dimers.
Globin fold α protein myoglobin PDB: 1MBN
β sandwich β protein immunoglobulin PDB: 7FAB
TIM barrel α/β protein Triose phosphate IsoMerase PDB: 1TIM
A fold in α+β protein ribonuclease A PDB: 7RSA The red balls represent waters that are ‘bound’ to the protein based on polar contacts
434 Cro protein complex (phage) PDB: 3CRO
Zinc finger DNA recognition (Drosophila) PDB: 2DRP ..YRCKVCSRVY THISNFCRHY VTSH...
Zinc-finger DNA binding protein family Characteristics of the family: Function: The DNA-binding motif is found as part of transcription regulatory proteins. Structure: One of the most abundant DNA-binding motifs. Proteins may contain more than one finger in a single chain. For example, Transcription Factor IIIA (TFIIIA), the first zinc-finger protein discovered, contains 9 C2H2 zinc-finger motifs (tandem repeats). Each motif consists of 2 antiparallel beta-strands followed by an alpha-helix. A single zinc ion is tetrahedrally coordinated by conserved histidine and cysteine residues, stabilising the motif.
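The conserved cysteine and histidine residues of a C2H2 finger can be located with a simple sequence pattern. The sketch below uses a simplified spacing rule, C-x(2,4)-C-x(12)-H-x(3,5)-H, which is an assumption made for illustration rather than a curated database pattern, and applies it to the 2DRP fragment shown on the zinc finger slide. The positions it reports are those the pattern picks out within the fragment; they have not been checked against the structure.

```python
import re

# Simplified C2H2 spacing rule (an assumption for illustration):
# Cys, 2-4 residues, Cys, 12 residues, His, 3-5 residues, His.
C2H2_PATTERN = re.compile(r"(C).{2,4}(C).{12}(H).{3,5}(H)")

# Fragment from the 2DRP slide, with the spaces between blocks removed.
fragment = "YRCKVCSRVY THISNFCRHY VTSH".replace(" ", "")

match = C2H2_PATTERN.search(fragment)

# 1-based positions of the four candidate zinc ligands within the fragment.
ligand_positions = [match.start(i) + 1 for i in range(1, 5)] if match else []
ligand_residues = "".join(fragment[p - 1] for p in ligand_positions)

print(ligand_positions, ligand_residues)
```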
Zinc-finger DNA binding protein family Characteristics of the family: Binding: Fingers bind to 3 base-pair subsites and specific contacts are mediated by amino acids in positions -1, 2, 3 and 6 relative to the start of the alpha-helix. Contacts mainly involve one strand of the DNA. Where proteins contain multiple fingers, each finger binds to adjacent subsites within a larger DNA recognition site, thus allowing a relatively simple motif to specifically bind to a wide range of DNA sequences. This means that the number and the type of zinc fingers dictate the specificity of binding to DNA.
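The arithmetic behind this modular recognition is simple: with one finger per 3 base-pair subsite, n fingers read a site of 3n base pairs, and the number of distinct sites of that length grows as 4^(3n). The sketch below splits a recognition site into adjacent subsites; the function name `finger_subsites` and the 9 base-pair example site are chosen for illustration only, and the reading direction of the fingers along the DNA is ignored.

```python
def finger_subsites(site, subsite_len=3):
    """Split a DNA recognition site into adjacent, non-overlapping subsites."""
    if len(site) % subsite_len != 0:
        raise ValueError("site length must be a multiple of the subsite length")
    return [site[i:i + subsite_len] for i in range(0, len(site), subsite_len)]


site = "GCGTGGGCG"  # illustrative 9 base-pair recognition site
subsites = finger_subsites(site)
n_fingers = len(subsites)

# Number of distinct DNA sequences of the length read by n fingers.
n_possible_sites = 4 ** (3 * n_fingers)

print(subsites, n_fingers, n_possible_sites)
```

Three fingers already distinguish between 262,144 possible 9 base-pair sites, which is why the number and type of fingers set the binding specificity.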
Leucine zipper (yeast) PDB: 1YSA ..RA RKLQRMKQLE DKVEELLSKN YHLENEVARL...
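A leucine zipper is characterised by leucines recurring at every seventh position (a heptad repeat). As a small check on the fragment above, the sketch below counts leucines in each of the seven possible registers and reports the register with the most. The helper name `leucines_per_register` is hypothetical, and the register numbering is relative to the start of this fragment only, not to the full protein.

```python
# Fragment from the 1YSA slide, with the spaces between blocks removed.
fragment = "RA RKLQRMKQLE DKVEE LLSKN YHLENEVARL".replace(" ", "")


def leucines_per_register(seq, period=7):
    """Count leucines at every `period`-th position for each starting offset."""
    return [seq[offset::period].count("L") for offset in range(period)]


counts = leucines_per_register(fragment)

# Register (0-based offset within the fragment) with the most leucines.
best_register = max(range(7), key=lambda r: counts[r])
heptad_residues = fragment[best_register::7]

print(counts, best_register, heptad_residues)
```

In this fragment one register carries four consecutive leucines spaced seven residues apart, which is the signature of the zipper.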