A network-based representation of protein fold space. Spencer Bliven, Qualifying Examination, 6/6/2011.


1 A network-based representation of protein fold space. Spencer Bliven, Qualifying Examination, 6/6/2011

2 Overview 1. Background & Motivation 2. Preliminary Research 3. Proposed Future Research

3 Fold Space What protein folds are possible? Discrete or Continuous? Both? Neither? What portion of fold space is utilized by nature? Long debated questions. Why? Understanding of structure-function relationship Protein design/engineering Protein evolution Classification

4 Previous Work Orengo, Flores, Taylor, Thornton. Protein Eng (1993) vol. 6 (5) pp. 485-500; Holm and Sander. J Mol Biol (1993) vol. 233 (1) pp. 123-38; Holm and Sander. Science (1996) vol. 273 (5275) pp. 595-603; Shindyalov and Bourne. Proteins (2000) vol. 38 (3) pp. 247-60; Hou, Sims, Zhang, Kim. PNAS (2003) vol. 100 (5) pp. 2386-90; Taylor. Curr Opin Struct Biol (2007) vol. 17 (3) pp. 354-61; Sadreyev et al. Curr Opin Struct Biol (2009) vol. 19 (3) pp. 321-8. [Figure labels: α, α+β, β, α/β]

5 Why can we do better? More structures. Sampling of globular folds “saturated”: few novel folds being discovered; geometric arguments for saturation of small protein folds. Recent all-vs-all computation: cluster sequences to 40% identity; 17,852 representatives (updated weekly); 189 million FATCAT rigid-body alignments. [Figure: 73,503 structures, from the content growth chart at http://www.rcsb.org/pdb/statistics/contentGrowthChart.do?content=total&seqid=100, accessed 5/31/2011]

6 Structural Similarity Graph Nodes: PDB chains, non-redundant to 40% sequence identity. Edges: FATCAT-rigid alignments. “Significant” edges: p < 0.001, length > 25, coverage > 50. Hierarchically cluster to reduce complexity in visualization. [Figure legend: a, b, a/b, a+b, Multi, Membrane, Small]
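
The graph construction on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the talk: the `Alignment` record, its field names, and the reading of "coverage > 50" as a percentage are all assumptions, and the chain identifiers in any input are placeholders.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """One pairwise FATCAT-rigid result (hypothetical record layout)."""
    chain_a: str
    chain_b: str
    p_value: float
    length: int       # number of aligned residues
    coverage: float   # assumed to be a percentage (0-100)


def is_significant(aln, max_p=0.001, min_length=25, min_coverage=50.0):
    """Apply the three thresholds listed on the slide."""
    return (aln.p_value < max_p
            and aln.length > min_length
            and aln.coverage > min_coverage)


def build_graph(alignments):
    """Return an undirected adjacency map {chain: set(neighbour chains)}.

    Every chain seen becomes a node, even if none of its alignments
    passes the significance filter.
    """
    graph = {}
    for aln in alignments:
        graph.setdefault(aln.chain_a, set())
        graph.setdefault(aln.chain_b, set())
        if is_significant(aln):
            graph[aln.chain_a].add(aln.chain_b)
            graph[aln.chain_b].add(aln.chain_a)
    return graph
```

Keeping non-significant chains as isolated nodes makes it easy to count how much of the representative set the network actually connects.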

7 Agreement with SCOP Class: p < 10^-6. Fold: p < 10^-7. Superfamily: p < 10^-10.

8 Continuity Grishin. J Struct Biol (2001) vol. 134 (2-3) pp. 167-85 Skolnick claims ≤ 7 intermediates between any proteins We observe network diameter=15 Can find interesting paths
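
Network diameter and paths between two proteins can both be obtained with breadth-first search. The sketch below assumes the network is stored as an adjacency map `{node: set(neighbours)}`; that layout and the function names are illustrative choices, not taken from the talk. A shortest path of k edges passes through k - 1 intermediate nodes, which is the quantity to compare against a claim about the number of intermediates.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def diameter(graph):
    """Longest shortest path over all connected pairs.

    Pairs in different connected components are ignored.
    """
    return max((max(bfs_distances(graph, n).values()) for n in graph),
               default=0)


def shortest_path(graph, source, target):
    """Return one shortest path as a list of nodes, or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None
```

Running one BFS per node is quadratic in the worst case, which is workable for tens of thousands of nodes in a sparse graph but would need sampling or approximation for much larger networks.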

9 Symmetry Beta Propellers [Figure panels: C4, C5, C6, C7]

10 Symmetry Functionally important Protein evolution (e.g. beta-trefoil) DNA binding Allosteric regulation Cooperativity Widespread (~20% of proteins) Focus of algorithmic work [Figure examples: FGF-1 (Lee & Blaber. PNAS 2011); TATA Binding Protein 1TGH; Hemoglobin 4HHB]

11 Cross-class example 3GP6.A: PagP, modifies lipid A; f.4.1 (transmembrane beta-barrel). 1KT6.A: Retinol-binding protein; b.60.1 (Lipocalins).

12 Summary of Preliminary Research Calculated all-vs-all alignment Prlić A, Bliven S, Rose PW, Bluhm WF, Bizon C, Godzik A, Bourne PE. Pre-calculated protein structure alignments at the RCSB PDB website. Bioinformatics (2010) vol. 26 (23) pp. 2983-2985 Built network of significant alignments Approximately matches SCOP classifications Improved structural alignment algorithms Identify symmetry, circular permutations, topology independent alignments Discussed more in report

13 Future Research Improve the network 1. Improve all-vs-all comparison algorithm 2. Tune parameters during graph generation Annotate the network & draw biological inferences 3. Annotate nodes with functional information 4. Compare with other networks Create new networks 5. Enhance structural comparison algorithms

14 1. Improve all-vs-all comparison algorithm Need domain decomposition Use Combinatorial Extension (CE)

15 2. Tune parameters during graph generation Don’t use p-values: shouldn’t compare p-values, statistically*; not normalized by secondary structure; not accurate due to multiple testing problem. Use TM-score (RMSD, normalized to the alignment length). Determine optimal thresholds for determining “significance”, for instance by training an SVM. * Technically ok here, since one-to-one with the FATCAT score
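
For reference, the TM-score of a fixed superposition can be computed as below. This is a sketch under stated assumptions: it uses the standard published definition, in which the sum over aligned residue pairs is divided by the length of the reference (target) protein and the distance scale is d0 = 1.24 (L - 15)^(1/3) - 1.8. The full TM-score also maximizes over superpositions, which is omitted here, and the floor of 0.5 on d0 for very short chains is an assumption borrowed from common implementations.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0, in angstroms.

    The 0.5 floor for short chains is an assumption, not part of the
    formula itself.
    """
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length):
    """TM-score for one superposition.

    distances: distance in angstroms between each aligned residue pair.
    target_length: number of residues in the reference protein.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length
```

Because each aligned pair contributes at most 1 and the sum is divided by the full chain length, the score lies between 0 and 1 and penalizes low coverage as well as large deviations.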

16 FATCAT p-value by Class Performs poorly on all-alpha in “twilight zone”. Terrible on membrane proteins. Probably reflects non-structural considerations in SCOP assignment.

17 (Dis)agreement with SCOP by Class
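
One simple way to quantify agreement per class, which is not necessarily the measure plotted on this slide, is the fraction of edges touching a class whose two endpoints share that class. The function name and input layout below are hypothetical.

```python
from collections import Counter


def class_agreement(edges, scop_class):
    """Per-class fraction of edges that stay within the class.

    edges: iterable of (node_a, node_b) pairs.
    scop_class: dict mapping every node to its SCOP class label.
    An edge between two classes counts as a disagreement for both.
    """
    total = Counter()
    agree = Counter()
    for a, b in edges:
        class_a, class_b = scop_class[a], scop_class[b]
        for label in {class_a, class_b}:
            total[label] += 1
            if class_a == class_b:
                agree[label] += 1
    return {label: agree[label] / total[label] for label in total}
```

A value near 1 means the significant alignments for that class rarely cross class boundaries; low values flag classes where structure-based edges and the SCOP assignment diverge.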

18 3. Annotate nodes with functional information SCOP/CATH classifications GO terms Metal binding Ligand binding Symmetry [Figure legend: a, b, a/b, a+b, Multi, Membrane, Small]

19 4. Compare with other networks Define other types of network over the set of protein representatives: protein-protein interactions, co-expression. Correlate to the structural similarities. [Figure panels: Structural similarity; Protein-protein interaction]
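
A first-pass comparison between two networks over the same set of representatives is the Jaccard overlap of their edge sets. This is only one possible correlation measure, chosen here for illustration; the adjacency-map layout and function names are assumptions.

```python
def edge_set(graph):
    """Undirected edges of an adjacency map as a set of frozensets."""
    return {frozenset((a, b))
            for a, neighbours in graph.items()
            for b in neighbours
            if a != b}


def jaccard_overlap(graph_a, graph_b):
    """Shared edges divided by all edges present in either network."""
    edges_a, edges_b = edge_set(graph_a), edge_set(graph_b)
    union = edges_a | edges_b
    if not union:
        return 0.0
    return len(edges_a & edges_b) / len(union)
```

To judge whether an observed overlap is meaningful, it would need to be compared against a null model, for example the overlap after shuffling node labels in one of the networks.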

20 5. Enhance structural comparison algorithms Improve automated pseudo-symmetry detection Find topology-independent relationships [Figure label: C3]

21 Summary Fold space as network Improve network creation Annotate network with functional information Improve structural similarity detection

22 Remaining Challenges Short Term: Hierarchical clustering amplifies errors. Bias towards short, helical alignments. Better metric of clustering accuracy. Correct p-value calculation (remove secondary structure bias), or use TM-score as threshold. Long Term: Include more functional characteristics (metal ions, GO terms, HDX profiles). Use other types of similarity to construct graph.

23 Acknowledgments Bourne Lab Philip Bourne Andreas Prlić Lab & PDB members Qualifying Exam Committee Ruben Abagyan Patricia Jennings Andy McCammon Collaborators Philippe Youkharibache Jean-Pierre Changeux Rotation Advisors Pavel Pevzner Philip Bourne José Onuchic & Pat Jennings Mike MacCoss Virgil Woods

