Protein-DNA interactions: amino acid conservation and the effects of mutations on binding specificity
Nicholas M. Luscombe and Janet M. Thornton
JMB (2002) 320,
Outline
- Background
- Methods and Tools
- Results
- Discussions
Background
- DNA-binding proteins have a central role in all aspects of genetic activity within an organism: transcription, packaging, rearrangement, replication and repair.
- It is therefore of great importance to understand the nature of the interactions between proteins and DNA.
Previous Methods
- Individual structure studies
- Surveys in search of common principles of binding that apply across most, or all, protein-DNA complexes:
  - atomic contacts between amino acid residues and bases
  - secondary structural elements and small structural motifs
  - whole-protein structure interactions
Existing Conclusions
- There is no simple code relating an amino acid sequence to the DNA sequence it binds.
- Detailed rules for DNA-sequence recognition are best understood within the context of individual protein families.
- There are nonetheless strong underlying trends, e.g. the arginine–guanine interaction.
This Paper
- The first global analysis of the conservation of amino acid sequences in DNA-binding proteins.
- Aims:
  - to see whether amino acid residues that interact with DNA are better conserved than those that do not
  - to assess the effect that amino acid mutations have on binding specificity
Methods and Tools
1. Select 240 protein-DNA complexes (resolution 3.0 Å or better) from the PDB.
2. Classify them into structural families by pairwise SSAP comparison (54 families).
3. Build a structural multiple alignment of the family members with the CORA program suite.
4. Identify distinct DNA-binding domains.
5. Use the HMMER suite to train an HMM sequence template for each structural “template”.
6. Use the trained HMMs to search SWISS-PROT.
Methods and Tools
7. Discard non-DNA-binding proteins and collapse sets with greater than 95% sequence identity.
8. Build multiple alignments of the selected SWISS-PROT entries with HMMER.
9. Score amino acid conservation with the PET91 matrix on a scale of [0, 100], from unconserved to conserved.
10. Identify surface residues with NACCESS.
11. Identify DNA-binding positions with HBPLUS.
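Steps 7 and 9 can be illustrated with a minimal Python sketch. This is not the authors' implementation: the residue grouping, the 0.5 partial score and all function names are assumptions made for illustration, and a real analysis would take pairwise scores from the PET91 matrix instead of the toy `similarity` function.

```python
from itertools import combinations

# Hypothetical residue groups standing in for a real substitution matrix
# such as PET91; the grouping and the 0.5 partial score are assumptions.
GROUPS = [set("AVLIMC"), set("FWY"), set("KRH"), set("DE"), set("STNQ"), set("GP")]


def similarity(a, b):
    """Return 1.0 for identical residues, 0.5 for the same group, else 0.0."""
    if a == b:
        return 1.0
    if any(a in g and b in g for g in GROUPS):
        return 0.5
    return 0.0


def percent_identity(s1, s2):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def collapse_redundant(seqs, cutoff=95.0):
    """Greedily drop any sequence more than `cutoff` % identical to a kept one."""
    kept = []
    for s in seqs:
        if all(percent_identity(s, k) <= cutoff for k in kept):
            kept.append(s)
    return kept


def column_conservation(column):
    """Mean pairwise similarity of the non-gap residues, scaled to 0-100."""
    residues = [r for r in column if r != "-"]
    if len(residues) < 2:
        return 0.0
    scores = [similarity(a, b) for a, b in combinations(residues, 2)]
    return 100.0 * sum(scores) / len(scores)


def alignment_conservation(seqs):
    """Conservation score for every column of an aligned set of sequences."""
    return [column_conservation(col) for col in zip(*seqs)]


# Made-up aligned sequences, for illustration only.
toy_alignment = collapse_redundant(["RKDA", "RKDA", "RRDG", "RKEA"])
toy_scores = alignment_conservation(toy_alignment)
```

In this toy run the duplicate sequence is collapsed, and the invariant first column scores 100 while the mixed columns score lower.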
Results
Main conclusion: protein families fall into 3 binding classes (non-specific, highly specific and multi-specific).
Result Statistics
Results (Aligned Positions)
Results Summary
- The average length of a multiple alignment is 138 amino acid residue positions, including gaps.
- Many more protein residues interact with the DNA backbone than with bases.
- The backbone-to-base ratios are lower for multi-specific and highly specific families, reflecting a greater emphasis on interactions with bases.
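The backbone-versus-base comparison can be made concrete with a small Python sketch. The input format (pairs of alignment position and contacted DNA part) and the function name `contact_ratio` are assumptions for illustration, not the actual HBPLUS output format, and the contact list is made up.

```python
def contact_ratio(contacts):
    """Count backbone- and base-contacting positions and return their ratio.

    `contacts` is an iterable of (alignment_position, dna_part) pairs,
    where dna_part is "backbone" or "base" (a hypothetical format).
    """
    # A position counts once per DNA part, however many contacts it makes.
    by_part = {"backbone": set(), "base": set()}
    for position, part in contacts:
        by_part[part].add(position)
    n_backbone = len(by_part["backbone"])
    n_base = len(by_part["base"])
    ratio = n_backbone / n_base if n_base else float("inf")
    return n_backbone, n_base, ratio


# Made-up contacts for one family, for illustration only.
example_contacts = [
    (12, "backbone"), (15, "backbone"), (15, "base"),
    (20, "backbone"), (31, "base"), (12, "backbone"),
]
n_backbone, n_base, ratio = contact_ratio(example_contacts)
```

A lower ratio for a family would indicate relatively more positions devoted to base contacts.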
Results (Conservation)
Analysis
- Amino acids that interact with the DNA are better conserved than those that do not.
- Sequence-specific families place greater emphasis on interactions with DNA bases than non-specific families.
- DNA backbone-contacting positions are well conserved in all families.
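The first comparison above amounts to contrasting average conservation scores for contacting and non-contacting positions. A minimal Python sketch follows, assuming per-position scores on the 0-100 scale and a set of DNA-contacting position indices; the function name and the numbers are hypothetical, not data from the paper.

```python
from statistics import mean


def compare_conservation(scores, contact_positions):
    """Return (mean score of contacting, mean score of non-contacting) positions.

    A group with no positions yields None instead of raising an error.
    """
    contacting = [s for i, s in enumerate(scores) if i in contact_positions]
    others = [s for i, s in enumerate(scores) if i not in contact_positions]
    mean_contacting = mean(contacting) if contacting else None
    mean_others = mean(others) if others else None
    return mean_contacting, mean_others


# Made-up per-position conservation scores, for illustration only.
example_scores = [90.0, 40.0, 85.0, 30.0, 50.0]
example_contact_positions = {0, 2}
contacting_mean, other_mean = compare_conservation(
    example_scores, example_contact_positions
)
```

In a fuller analysis the non-contacting group would presumably be restricted to surface residues, which is the role NACCESS plays in the method steps.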
About Mutations
- Conservation of base-contacting positions depends on the binding class of the family.
- For non-specific families, base contacts are made invariably in the minor groove.
- For highly specific families, target-contacting positions are very well conserved.
- Fuzzy recognition allows single proteins to recognize different, but related, target sequences.
- Members of multi-specific families recognize different DNA sequences by mutating amino acids at base-contacting positions.
“Universal” Code (Preferences)
Discussions
- First comprehensive assessment of the level of conservation in DNA-binding proteins.
- Confirms many expectations about the nature of DNA-protein complexes.
- Offers interesting insight into the evolution of divergent binding specificities.
Personal Comments
- “Old” (2002)
- No silver bullet (various families)
- No DNA-side analysis yet
- Later work:
  - Ahmad et al. (2004). Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information. Bioinformatics.
  - Ahmad et al. (2008). Protein–DNA interactions: structural, thermodynamic and clustering patterns of conserved residues in DNA-binding proteins. NAR.

Thanks