Folds and motifs Using SCOP, CATH, CE to find structural homologs. What class/architecture, etc does your molecule fall in? Use CE to identify structural homologs for your protein of interest. Motifs: classic turn types. Extended turn types. Use Rasmol to find the glycines in your structure. That are the phi angles? What type of turn is it?
The SCOP database Contains information about classification of protein structures and within that classification, their sequences http://scop.berkeley.edu
SCOP classification heirarchy global characteristics (no evolutionary relation) (1) class (2) fold (3) superfamily (4) family (5) protein (6) species Similar “topology” . Distant evolutionary cousins? Clear structural homology Clear sequence homology functionally identical unique sequences
protein classes number of sub-categories 1. all 2. all 5. multidomain (21) 6. membrane (21) 7. small (10) 8. coiled coil (4) 9. low-resolution (4) 10. peptides (61) 11. designed proteins (17) number of sub-categories possibly not complete, or erroneous
Mainly parallel beta sheets (beta-alpha-beta units) class: proteins Mainly parallel beta sheets (beta-alpha-beta units) Folds: TIM-barrel (22) swivelling beta/beta/alpha domain (5) spoIIaa-like (2) flavodoxin-like (10) restriction endonuclease-like (2) ribokinase-like (2) chelatase-like (2) Many folds have historical names. “TIM” barrel was first seen in TIM. These classifications are done by eye, mostly.
fold: flavodoxin-like 3 layers, ; parallel beta-sheet of 5 strand, order 21345 Superfamilies: 1.Catalase, C-terminal domain (1) 2.CheY-like (1) 3.Succinyl-CoA synthetase domains (1) 4.Flavoproteins (3) 5.Cobalamin (vitamin B12)-binding domain (1) 6.Ornithine decarboxylase N-terminal "wing" domain (1) 7.Cutinase-like (1) 8.Esterase/acetylhydrolase (2) 9.Formate/glycerate dehydrogenase catalytic domain-like (3) 10.Type II 3-dehydroquinate dehydratase (1) Note the term: “layers” These are not domains. No implication of structural independence. Note how beta sheets are described: number of strands, order (N->C)
fold-level similarity common topological features catalase flavodoxin At the fold level, a common core of secondary structure is conserved. Outer secondary structure units may not be conserved.
Superfamily:Flavoproteins Flavodoxin-related (7) NADPH-cytochrome p450 reductase, N-terminal domain Quinone reductase These molecules do not superimpose well, but side-by-side you can easily see the similar topology. Sec struct’s align 1-to-1, mostly.
Family: quinone reductases binds FAD Proteins: NADPH quinone reductase quinone reductase type 2 Different members of the same family superimpose well. At this level, a structure may be used as a molecular replacement model.
CATH Class Architecture Topology Homology http://www.biochem.ucl.ac.uk/bsm/cath_new/index.html
Remote homologs are more likely than close ones likelihood percent identity for structural homologs
Structural conservation is stronger than sequence conservation The existence of large numbers of remote homologs shows us that true structural similarity is hard to see in the amino acid sequence.
Structural homology is sometimes difficult to see beta-catenin versus clathrin (from FSSP database)
Rasmol practice Using your protein of interest: Find all glycines. select gly Label each alpha-carbon by number label %r Measure the phi-angle by eye (align N and CA. R-handed is positive) Write down the residue number and phi angle.
Rasmol exercises (1) Color the ribbon drawing of 1qajA.pdb (it’s in your home directory) from the N-terminus (blue) to C-terminus (red) (2) Divide 1qajA.pdb into domains of different color. (3) In 1qajA.pdb, display only residues 66-87. What are the first and last residues with helical backbone angles? (4) Find all of the atoms that H-bond to the sidechain of Ser92.
Secondary structure Alpha helix Right-handed Overall dipole N+->C- 3.6 residues/turn i->i+4 H-bonds
3 types of Alpha helix non-polar amphipathic polar a Two ways to display position of sidechain on a helix. d e For amphipathic and non-polar, sidechains line up in a favorable way. b g f c
Helix caps Capping motifs stabilize helices. Please go to http://isites.bio.rpi.edu/ Click on: Glycine alpha (Schellman) C-cap Type 1 Glycine alpha (Schellman) C-cap Type 2 Proline alpha C-cap Frayed alpha helix Serine alpha N-cap (capping box) Exercise: Find a helix cap in 1QAJ. What type is it?
Proline helix C-cap structural variability note: hydrophobic cluster structural variability Sequence pattern= ...nppnnpp[HNYF]P[DE]n Pro blocks helix D or E stabilizes tight turn Locations of non-polar (magenta) and polar (green) sidechains “n”=non-polar “p”=polar [...]=alternative aa’s
Two Helix motifs a-a corner (helix-turn-helix) EF-hand (binds Ca2+)
motif pattern (summary): nnpp[nH]Gn[PDS][Px]pn a-a corner motif backbone angles: green=psi red=phi red=favorable blue = unfavorable green=neutral I-sites motif (Also described by A. Efimov) “n”=non-polar “p”=polar [...]=alternative aa’s motif pattern (summary): nnpp[nH]Gn[PDS][Px]pn Exercise: Find a a-a corner motif in myoglobin.
Find a a-a corner in 1DWT Download 1dwt from www.rcsb.org Open using Rasmol Rasmol commands: Display->Cartoons Color->structure select alpha label %m color labels yellow Find the a-a corner pattern (sequence and structure).
Can you see the a-a motif in the sequence? 1 GLSDGEWQQV LNVWGKVEAD IAGHGQEVLI RLFTGHPETL EKFDKFKHLK HHHHHHH HHHHHHHHHT HHHHHHHHHH HHHHHTHHHH HTTTTTTT 51 TEAEMKASED LKKHGTVVLT ALGGILKKKG HHEAELKPLA QSHATKHKIP SHHHHHTTHH HHHHHHHHHH HHHHHHHTTT HHHHHHHH HHHHHTS 101 IKYLEFISDA IIHVLHSKHP GDFGADAQGA MTKALELFRN DIAAKYKELG HHHHHHHHHH HHHHHHHHST TSS HHHHHH HHHHHHHHHH HHHHHHHHHT 151 FQG
EF-hand Exercise: Rasmol 1CLL. Follow instructions on next page
1CLL exercise. How does the EF-hand bind calcium? restrict hetero and CA (comments in blue...) spacefill (this displays only calciums) click to get residue # (x = residue number) restrict within (10., x) center selected (so you can rotate it easily) Display->ball-and-stick (from menu item. you see atoms and bonds) select protein and within (10., x) Display->sticks (now protein part has think bonds) select hoh and within (10., x) ( selects nearby waters) spacefill color cyan (or your favorite water color...) select within (3., x) ( atoms within bonding distance) label ( these are the atoms bonded to calcium x) color labels yellow (..so you can see them)
Calcium causes a conformational change in the EF hand. red=1CLL green=1cfd
Create a Rasmol script Write a file using jot or vi (for example: vi script.ras) Put keyboard commands in the script. Start Rasmol Invoke the script using the ‘script’ command (for example: script script.ras) Useful keyboard commands for scripts: load xxx.pdb (loads a new PDB file) zap (closes PDB file, blanks the display) pause (stops the script until you hit a key)
Create a Rasmol script to survey many structures List the pdb files in the xtal directory (ls xtal/*.pdb) Create a script file (myscript.ras) as follows: load xtal/file1.pdb (use one of the PDB files listed) backbone 0.6 color blue select hoh spacefill 1.0 color red set unitcell on pause zap load xtal/file2.pdb (repeat as before for each file listed) Run the script: script myscript.ras Next, change the script and run it again (i.e. change the colors to green and magenta)
Nested scripts Tired of having to edit the script 10 times over?? Write a “nested” script. Create a file called “mynestedscript.ras” Put in all the commands except the ‘load’ line. Just one copy. The last line is ‘zap’. Create a file called “mymainscript.ras” containing: load file1.pdb script mynestedscript.ras load file2.pdb script mynestedscript.ras etc. etc. etc
Further exercises: just a few examples! Select atoms by temperature factor: select temperature > 5000 (Rasmol uses B*100, so this selects all atoms with B > 50.00) Find crystal contacts: select :B & within (8., :A) (This gets all atoms of chain B that are close to chain A. Your chain IDs may vary.) Can you determine the crystal form (triclinic, orthorhombic, etc) just by looking at the unit cell? (set unitcell on) Survey the B-factors of waters in several structures using a nested script. (Use: color temperature) Use logic operators! Find alpha carbons within 10.A or residue x but not within 6.A. (select within (10.,x) and not within (6., x) ) Practice stereo viewing. (stereo -7 for walleyed viewing)
Proline helix C-cap structural variability note: hydrophobic cluster structural variability Sequence pattern= ...nppnnpp[HNYF]P[DE]n Pro blocks helix D or E stabilizes tight turn Locations of non-polar (magenta) and polar (green) sidechains “n”=non-polar “p”=polar [...]=alternative aa’s
Two Helix motifs a-a corner (helix-turn-helix) EF-hand (binds Ca2+)
beta-strand middle strand edge strand Antiparallel: Parallel:
Sequence motifs for beta strands Hydrophobic Amphipathic Note preference for beta-branched aa’s: I,V,T Found at the edges of a sheet, or when one side of the sheet is exposed to solvent (i.e. 2-layer proteins). Found in the buried middle strands of sheets in 3-layer proteins.
beta hairpins Two adjacent antiparallel beta strands = a beta hairpin Shown are “tight turns”, 2 residues in the loop region (shaded). Hairpins can have as many as 20 residues in the loop region.
hairpin motifs “Serine beta-hairpin” (my naming). A specific pattern (DPESG) forms an incomplete alpha-helical turn 4-residues long. “Extended Type-1 hairpin”. A type-1 “tight turn” has only 2 residues in the turn. This motif, more common than the tight turn, has an additional Pro or polar sidechain. Pattern: PDG.
diverging turn Exercise: Find the diverging turn in 1CSK. “Diverging turns” have a Type-2 beta turn and two strands that do not pair. The consensus sequence pattern is PDG. The residue before G can be anything polar, but not a D or an N.
Greek key motif One of the most common arrangements of four strands. Two permutations. 2341 and 3214 2 3 4 1 beta meander Exercise: Download 2PLT. Find the Greek key motifs.
bab motif When a helix occurs between two strands, they are often paired in a parallel sheet. The cross-over from one strand to the next is almost always right-handed. It is not known why this is true. (NOTE: the cross-over is always right-handed even when the two strands are not paired!!)
TOPS diagrams beta strand pointing up beta strand pointing down alpha helix bab motif C N Note R-handed crossover!
babab motif 1 2 3 Only 3 are possible. (with R-handed crossovers) 2 1
Efimov’s theory A. Efimov showed that almost all protein structures can be classified as being one of 7 trees, each starting with a motif and “growing” by one secondary structure unit at time. Does this recapitulate evolution? the folding pathway?
Domains Proteins fold heirarchically. secondary structure --> motifs --> subdomains-->domains --> multidomain proteins --> complexes Most domains fold independently of the rest of the chain.
All alpha folds 3-helix bundle 1CUK 156-203 4-helix bundle 1NFN Globin 1DWT ...many many other
alpha/beta folds “TIM barrel” 8TIM “Rossman fold” 1HET 176-318 many others... Exercise: in 1HET, find the “crevice” between two babab motifs where NAD binding commonly occurs. Exercise: Draw TOPS diagram for 1HET 176-318.
all beta folds “Jelly roll” 1JMU 371-500:D, 2PLT barrels 1EMC propellers beta-helix Exercise: Draw TOPS diagrams for the two domains above.
Term projects Look at the description online. Choose a molecule to study. Check with me before you start. Look at “Molecule of the Month” at RCSB.org for inspiration. But don’t pick one of these molecules.