Protein Structure
Why study protein structure? Studying a structural model gives a better understanding of the structure-function relationship and is an important starting point for many kinds of research.
Structure determination - Crystallography: A solution of protein molecules is assembled into a periodic lattice. The crystal is bombarded with X-ray beams. The collision of the beams with the atoms' electrons creates a diffraction pattern. The diffraction pattern is transformed into an electron density map of the protein, from which the 3D locations of the atoms can be deduced.
Structure determination - Nuclear magnetic resonance (NMR): A solution of the protein is placed in a magnetic field. Spins align parallel or anti-parallel to the field. RF pulses of electromagnetic energy shift spins from their alignment. When the radiation stops, the spins re-align while emitting the energy they absorbed. The emission spectrum contains information about the identity of the nuclei and their immediate environment. The result is an ensemble of models rather than a single one.
PDB: Protein Data Bank
PDB model: Defines the 3D coordinates (x, y, z) of each of the atoms in one or more molecules (i.e., a complex). There are models of proteins, protein complexes, proteins with DNA, protein segments, etc. The models also include the positions of ligand molecules, solvent molecules, metal ions, etc. PDB ID: one digit followed by three digits or letters (e.g., 1a14).
The PDB file format
ATOM records: usually protein or DNA. HETATM records: usually ligand, ion, or water. Fields in each record: atom number, atom identity, residue identity, chain, residue number, atom coordinates (X, Y, Z), occupancy, temperature factor.
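The fields listed above sit in fixed columns of each ATOM/HETATM line, so a record can be read by slicing. The following is a minimal sketch, assuming the standard fixed-width PDB column layout; the function name `parse_atom_record` and the example line (with made-up coordinate values) are illustrative only and not taken from the slides.

```python
def parse_atom_record(line):
    """Parse one ATOM or HETATM line by its fixed column positions.

    Returns a dict of fields, or None for any other record type.
    """
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        return None
    return {
        "record": record,
        "atom_number": int(line[6:11]),            # columns 7-11
        "atom_name": line[12:16].strip(),          # columns 13-16
        "residue_name": line[17:20].strip(),       # columns 18-20
        "chain": line[21],                         # column 22
        "residue_number": int(line[22:26]),        # columns 23-26
        "x": float(line[30:38]),                   # columns 31-38
        "y": float(line[38:46]),                   # columns 39-46
        "z": float(line[46:54]),                   # columns 47-54
        "occupancy": float(line[54:60]),           # columns 55-60
        "temperature_factor": float(line[60:66]),  # columns 61-66
    }


# Hypothetical example line; the numeric values are made up for illustration.
example = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67"
atom = parse_atom_record(example)
print(atom["atom_name"], atom["residue_name"], atom["chain"], atom["residue_number"])
```

Slicing by column rather than splitting on whitespace is the safer choice here, because adjacent fields in a PDB line can touch without a space between them.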
Background and motivation: DNA methylation at CpG sites has a central role in imprinting (plants and mammals), but it is not clear how the imprinting machinery recognizes its target genes. The Dnmt3 protein family (a, b, L) are de novo methyltransferases. Dnmt3a and Dnmt3L KO mice show altered sex-specific de novo methylation in germ cells, indicating that both proteins are required for the methylation of most imprinted loci in germ cells. Goal: conduct a structural and biochemical study of a homogeneous complex of Dnmt3L and Dnmt3a.
Methods: Dnmt3a2, the shorter isoform of Dnmt3a and the predominant form in embryonic stem cells, was selected. For crystallographic reasons, the study focused on a stable complex of the C-terminal domains of both proteins (Dnmt3a2-C and Dnmt3L-C) that retains substantial methyltransferase activity.
Results: The complex is a tetramer: Dnmt3L–Dnmt3a–Dnmt3a–Dnmt3L. Mutagenesis at positions in both interfaces (a-a and a-L) indicates that these interfaces are essential for catalysis. Dnmt3a-Dnmt3a dimerization brings two active sites together. Dimeric Dnmt3a could methylate two CpGs separated by one helical turn in one binding event. Figure labels: Dnmt3a, Dnmt3L.
“We observed a highly significant correlation of methylation status at distances of eight to ten base pairs between two CpG sites”. Figure: distribution of CpG sites among 12 known maternally imprinted genes, indicated to be Dnmt3a-Dnmt3L targets.
Protein visualization: Visualization tools (working on PC): RasMol / RasTop, SwissPDBviewer (sPDBv), Protein Explorer (via the web), and many more…
RasTop / RasMol
RasTop - Main menu: Open file, Close file
RasTop - Display: Wireframe (lines between atoms)
Sphere VDW: inflates each atom according to its VDW (van der Waals) radius
Command editor
More on RasTop: RasMol manual, Using RasTop, Commands
Structure alignment: Essential for: protein classification; detection of conserved protein folding cores; detection of similarities between domains; detection of similarities in functional binding sites; evolutionary conservation; construction of nonredundant databases.
Pairwise structure alignment: Outline: given two protein structures, find the transformation that produces the best superimposition of one protein onto the other.
Computationally: find the rotations and translations of one point set (the atoms of protein A) that produce "large" superimpositions on the other point set (the atoms of protein B).
RMSD (Root Mean Square Deviation): the square root of the mean squared distance between the matched superimposed atoms, usually the backbone Cα atoms.
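As a minimal sketch of the RMSD calculation, assuming the two structures are already superimposed and their Cα atoms are already matched one-to-one: the function name `rmsd` and the toy coordinates are illustrative and not part of any tool mentioned in these slides.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between one matched pair of atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy data: structure B is structure A shifted by 2 units along z,
# so every matched pair is exactly 2 apart.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
structure_b = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 2.0)]
print(rmsd(structure_a, structure_b))  # 2.0
```

Because the distances are squared before averaging, a few badly matched atoms raise the RMSD more than they would raise a plain average distance.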
Matches C-alpha atoms. Rigid pairwise alignment. Sequence-order independent. Input: two PDB files or PDB IDs with specific chains. Output: a set of high-scoring conformations. The superimposed structures may be viewed in a PDB viewer.
Bobwhite quail lysozyme / Hen egg-white lysozyme
Results: Ranking criteria: 1. Match size 2. RMSD
C-alpha correspondence
Aligned PDB file
Flexible structural alignment: The first structure is assumed to be rigid, while in the second structure potential flexible regions (hinges) are automatically detected. Input: two PDB IDs (specific chain). Output: a list of alignments ranked according to the number of hinges.
Results
Result with 0 hinges:
Result with one hinge:
Multiple structural alignment of protein structures: Finds the common geometrical cores between the input molecules. Does not require that all the input molecules participate in the alignment; it efficiently detects high-scoring partial multiple alignments for every possible number of input molecules. The final structural alignment can either preserve the sequence order (like sequence alignment) or be sequence-order independent.
Results
DALI - Distance matrix ALIgnment. Concept: “Similar 3D structures have similar inter-residue distances”
DALI Algorithm: Generates an inter-residue distance matrix for each protein. The distance matrix contains all pairwise distances (symmetrical): Dij = distance between C-alpha i and C-alpha j in the same protein. Compares the two distance matrices for a pair of proteins to be aligned.
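The first step, building the inter-residue distance matrix, can be sketched as follows. This is only an illustration of the matrix construction under the definition of Dij given above; it is not DALI's implementation and it does not cover the matrix comparison step. The function name `distance_matrix` and the toy coordinates are assumptions.

```python
import math


def distance_matrix(ca_coords):
    """Return the symmetric matrix D where D[i][j] is the distance
    between C-alpha i and C-alpha j of the same protein."""
    n = len(ca_coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(ca_coords[i], ca_coords[j])
            # Fill both halves: the matrix is symmetrical.
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


# Toy C-alpha coordinates for a three-residue fragment (made-up values).
ca = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
for row in distance_matrix(ca):
    print(["%.2f" % value for value in row])
```

A useful property of this representation is that the matrix does not change when the protein is rotated or translated, so two proteins can be compared without superimposing them first.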
DALI Services: DALI server: used by crystallographers to compare a newly solved structure against structures in the PDB. DALI database: contains all-vs.-all PDB 3D structure comparisons and thus makes it possible to find structural neighbors of structures that are already in the PDB. Pairwise server: pairwise comparison of two structures. DaliLite: a standalone version of DALI.
DALI Database
Non-redundant chains: no two chains are more than 90% sequence identical. Column labels: number of structurally aligned residues; number of residues in the protein; sequence identity of aligned positions.
Supplementary
RasTop - Display: secondary structure elements; a line connecting the C-alpha atoms
RasTop - Display: emphasizes the secondary structure
Remove rendering Clears the view from previous actions performed
RasTop - Color: Each amino acid has its own color. A display in which each atom has a specific color. How fixed the atom's position is in the structure: red = high freedom, blue = very fixed. Secondary structure: yellow = beta sheet, blue = loops and turns, red = alpha helix, white = random coil.
RasTop - Labels: Labels the whole molecule or selected segments with the name and number of the amino acid.
The Select Command: Select defines the region of the molecule to act on: the set of atoms that will be modified by the commands that follow. The parameter of the select command is an "atom expression". The atom expression uniquely defines an arbitrary group of atoms within a molecule. It can be: primitive expressions, predefined sets, within expressions, or a logical combination of all of the above. To display only what was selected, use the command: Edit => restrict
The primitive expressions allow selecting by: Atom number: select atomno=102. Residue: select Val52 (select resno=52 or select 52). Chain id: select :a. List of residue numbers: select 14,92,46. Range of atom numbers: select atomno>=35. A wildcard can be used to specify a whole field. * matches any number of characters, e.g. atom or residue type: select *.sg (this will select all sulphur atoms in cysteine side chains). ? is a single-character wildcard: select ser.c? will select all carbons in all serine residues.
The predefined sets are groups of atoms with fixed names: select helix; select hoh (water molecules); select protein. The full list of predefined sets is in the RasMol reference card (searchable online).
Boolean Expressions: And: what is common to both conditions. Or: the union of two conditions (both this one and that one). Not: what to exclude. Examples: select tyr and :a → all Tyr in chain 'a'; select tyr or :a → all Tyr in the molecule plus all of chain 'a'; select not (tyr,:a) → the whole molecule except Tyr and chain 'a'.
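The three operators behave like set operations on the residues of the molecule. The sketch below mirrors the three examples with Python sets; it is only an analogy and not how RasMol evaluates expressions internally, and the toy molecule and variable names are made up for illustration.

```python
# Hypothetical toy molecule: (residue name, chain id, residue number).
residues = {
    ("TYR", "A", 10),
    ("TYR", "B", 11),
    ("VAL", "A", 12),
    ("SER", "B", 13),
}

tyr = {r for r in residues if r[0] == "TYR"}      # like: select tyr
chain_a = {r for r in residues if r[1] == "A"}    # like: select :a

tyr_and_a = tyr & chain_a                 # select tyr and :a  (intersection)
tyr_or_a = tyr | chain_a                  # select tyr or :a   (union)
not_tyr_or_a = residues - (tyr | chain_a) # select not (tyr,:a) (complement)

print(sorted(tyr_and_a))
print(sorted(tyr_or_a))
print(sorted(not_tyr_or_a))
```

Note that the comma inside `(tyr,:a)` acts as an "or", so the `not` excludes everything that is either Tyr or in chain 'a'.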
Using Select Expression
…or via the command (edit command)
1. Spacefill 2. Color picker 3. Make sure you’re on atoms!