11/04/05 D Dobbs ISU - BCB 444/544X: Protein Structure & Function
Announcements
  Exam 2
    - Has been graded
    - Will be returned at end of class today
  Grade statistics – 444 Average = 81/ Average = 100/118
  Questions?
Announcements
  BCB 544 Projects - Important Dates:
    Nov 2 Wed noon - Project proposals due to David/Drena
    Nov 4 Fri PM - Approvals/responses & tentative presentation schedule to students
    Dec 2 Fri noon - Written project reports due
    Dec 5,7,8,9 class/lab - Oral Presentations (20')
    (Dec 15 Thurs = Final Exam)
Bioinformatics Seminars
  Nov 4 Fri 12:10 PM BCB Faculty Seminar in E164 Lago
    How to do sequence alignments on parallel computers
    Srinivas Aluru, ECpE & Chair, BCB Program
  Next week: Nov 10 Thurs 3:40 PM ComS Seminar in 223 Atanasoff
    Computational Epidemiology
    Armin R. Mikler, Univ. North Texas
Bioinformatics Seminars
  CORRECTION: Week after next - Baker Center/BCB Seminars:
  (seminar abstracts available at above link)
    Nov 14 Mon 1:10 PM Doug Brutlag, Stanford
      Discovering transcription factor binding sites
    Nov 15 Tues 1:10 PM Ilya Vakser, Univ Kansas
      Modeling protein-protein interactions
  Both seminars will be in Howe Hall Auditorium
RNA Structure & Function/Prediction; Protein Structure & Function
  Mon   Review - promoter prediction
        RNA structure & function
  Wed   RNA structure prediction
        2° & 3° (secondary & tertiary) structure prediction
        miRNA & target prediction - Lab 10
  Fri   A few more words re: Algorithms
        Protein structure & function
Reading Assignment (for Fri/Mon)
  Mount Bioinformatics Chp 10: Protein classification & structure prediction
    pp
    Ck Errata:
  Other? That should be plenty…
Review last lecture: RNA Structure Prediction
miRNA and RNAi pathways
  [Figure: two pathways side by side.
   microRNA pathway: microRNA primary transcript → Drosha → precursor miRNA → Dicer → RISC → "translational repression" and/or mRNA degradation of target mRNA.
   RNAi pathway: exogenous dsRNA, transposon, etc. → Dicer → siRNAs → RISC → mRNA cleavage, degradation.]
  C Burge 2005
miRNA Challenges for Computational Biology
  Find the genes encoding microRNAs
  Predict their regulatory targets
  Integrate miRNAs into gene regulatory pathways & networks
  (Computational Prediction of MicroRNA Genes & Targets, C Burge 2005)
  Need to modify the traditional paradigm of "transcriptional control" primarily by protein-DNA interactions to include miRNA regulatory mechanisms!
RNA structure prediction strategies
  Secondary structure prediction
    1) Energy minimization (thermodynamics)
    2) Comparative sequence analysis (co-variation)
    3) Combined experimental & computational
Secondary structure prediction strategies
  1) Energy minimization (thermodynamics)
    Algorithm: Dynamic programming to find high probability pairs (also, some genetic algorithms)
    Software:
      Mfold - Zuker
      Vienna RNA Package - Hofacker
      RNAstructure - Mathews
      Sfold - Ding & Lawrence
  R Knight 2005
Secondary structure prediction strategies
  2) Comparative sequence analysis (co-variation)
    Algorithms:
      Mutual information
      Stochastic context-free grammars
    Software:
      ConStruct
      Alifold
      Pfold
      FOLDALIGN
      Dynalign
  R Knight 2005
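A minimal sketch of how mutual information can flag co-varying alignment columns. The function name and the toy columns are hypothetical, made up for illustration; the tools listed above use more elaborate scoring. Two columns that change together (always keeping a Watson-Crick partner) score high, while two perfectly conserved columns score zero, which is why co-variation methods need sequence diversity.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    Each column is a string with one character per aligned sequence.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must contain the same number of sequences")
    n = len(col_i)
    freq_i = Counter(col_i)                 # single-column base counts
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))    # joint counts of (base_i, base_j)
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy columns taken from a 4-sequence alignment.
covarying = mutual_information("GCAU", "CGUA")   # partner always complementary
conserved = mutual_information("GGGG", "CCCC")   # no variation at all

print(covarying)  # 2.0
print(conserved)  # 0.0
```

Note that the conserved G-C column pair scores 0 bits even though it may well be a real base pair: mutual information only detects pairs that vary.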
Secondary structure prediction strategies
  3) Combined experimental & computational
    Experiment: Map single-stranded vs double-stranded regions in folded RNA
    How?
      Enzymes: S1 nuclease, T1 RNase
      Chemicals: kethoxal, DMS
  R Knight 2005
Experimental RNA structure determination?
  X-ray crystallography
  NMR spectroscopy
  Enzymatic/chemical mapping
  Molecular genetic analyses
1) Energy minimization method
  What are the assumptions?
    Native tertiary structure or "fold" of an RNA molecule is (one of) its "lowest" free energy configuration(s)
    Gibbs free energy = ΔG in kcal/mol at 37 °C
      = equilibrium stability of structure
      lower values (negative) are more favorable
  Is this assumption valid?
    in vivo? - this may not hold, but we don't really know
Free energy minimization
  What are the rules?
  [Figure: basepair A=U, ΔG = -1.2 kcal/mole; basepair A=U U=A, ΔG = -1.6 kcal/mole]
  What gives here?
  C Staben 2005
Energy minimization calculations:
  Base-stacking is critical - Tinoco et al.
  C Staben 2005
Nearest-neighbor parameters
  Most methods for free energy minimization use nearest-neighbor parameters (derived from experiment) for predicting stability of an RNA secondary structure (in terms of ΔG at 37 °C)
  & most available software packages use the same set of parameters: Mathews, Sabina, Zuker & Turner, 1999
Energy minimization - calculations:
  Total free energy of a specific conformation for a specific RNA molecule = sum of incremental energy terms for:
    helical stacking (sequence dependent)
    loop initiation
    unpaired stacking
  (favorable "increments" are < 0)
  Fig 6.3 Baxevanis & Ouellette 2005
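The additive rule above can be sketched in a few lines. The increment values below are hypothetical placeholders chosen only to show the bookkeeping; they are not the published nearest-neighbor parameters, and the function name is made up for this example.

```python
def total_free_energy(increments):
    """Sum the incremental free-energy terms (kcal/mol) of one conformation."""
    return sum(value for _label, value in increments)


# Hypothetical increments for a small hairpin (NOT published parameters).
# Favorable terms are negative; loop initiation is an unfavorable penalty.
hairpin = [
    ("helical stack 1", -2.1),
    ("helical stack 2", -3.3),
    ("helical stack 3", -2.4),
    ("hairpin loop initiation", +5.6),
    ("unpaired (dangling-end) stacking", -0.8),
]

dG = total_free_energy(hairpin)
print(f"Total dG = {dG:.1f} kcal/mol")  # Total dG = -3.0 kcal/mol
```

In this toy case the three stacking terms outweigh the loop penalty, so the hairpin comes out net favorable (negative total).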
But how many possible conformations for a single RNA molecule?
  Huge number: Zuker estimates (1.8)^N possible secondary structures for a sequence of N nucleotides
    for 100 nts (small RNA…) = 3 × 10^25 structures!
  Solution? Not exhaustive enumeration…
    Dynamic programming
      O(N^3) in time, O(N^2) in space/storage iff pseudoknots excluded
      otherwise: O(N^6) in time, O(N^4) in space
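To make the complexity figures concrete, here is a simplified sketch. It is not the Mfold energy-minimization recursion: it swaps in the simpler Nussinov-style objective of maximizing the number of nested (pseudoknot-free) base pairs, which has the same table shape. The three nested loops give O(N^3) time and the N x N table gives O(N^2) space. The function name, the minimum loop size of 3, and the allowed pairs (Watson-Crick plus G-U wobble) are assumptions made for this example.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov-style DP).

    Pseudoknots are excluded by construction. min_loop is the minimum
    number of unpaired bases enclosed by any pair.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = max pairs in seq[i..j]; O(N^2) storage.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):          # fill by increasing length
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]               # case 1: j is unpaired
            for k in range(i, j - min_loop):     # case 2: j pairs with k
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


# Size of the search space that enumeration would face (Zuker's estimate).
print(f"{1.8 ** 100:.1e}")                # about 3 x 10^25 for N = 100

print(nussinov_max_pairs("GGGAAAUCC"))    # 3
print(nussinov_max_pairs("AAAA"))         # 0
```

An energy-based method replaces "+1 per pair" with stacking and loop increments, but fills the table in the same order.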
Algorithms based on energy minimization
  For an outline of the algorithm used in Mfold, including a description of the dynamic programming recursion, please visit Michael Zuker's lecture:
  From this site, you may also download his lecture as either a PDF or PS file.
  Hmmm, something based on this might make an interesting "Final Exam" question: how could one apply the dynamic programming approaches learned in the first half of the course to the RNA structure prediction problem?
2) Comparative sequence analysis (co-variation)
  Two basic approaches:
    Algorithms constrained by initial alignment
      Much faster, but not as robust as unconstrained
      Base-pairing probabilities determined by a partition function
    Algorithms not constrained by initial alignment
      Genetic algorithms often used for finding an alignment & set of structures
RNA Secondary structure prediction: Performance?
  How to evaluate?
    Not many experimentally determined structures; currently, ~50% are rRNA structures
    so "Gold Standard" (in absence of tertiary structure): compare predicted RNA secondary structure with that determined by comparative sequence analysis (!!??) using Benchmark Datasets
  NOTE: Base-pairs predicted by comparative sequence analysis for large & small subunit rRNAs are 97% accurate when compared with high resolution crystal structures! - Gutell, Pace
RNA Secondary structure prediction: Performance?
  1) Energy minimization (via dynamic programming)
    73% avg. prediction accuracy - single sequence
  2) Comparative sequence analysis
    97% avg. prediction accuracy - multiple sequences (e.g., highly conserved rRNAs)
    much lower if sequence conservation is lower &/or fewer sequences are available for alignment
  3) Combined - recent developments:
    combine thermodynamics & co-variation & experimental constraints? IMPROVED RESULTS
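A sketch of how a predicted structure might be scored against a reference structure. The slides do not define "prediction accuracy"; this example assumes it means the fraction of reference base pairs that the prediction recovers (sensitivity), and also reports the fraction of predicted pairs that are correct (PPV). The function name and the toy base-pair lists are hypothetical.

```python
def pair_accuracy(predicted, reference):
    """Compare two structures given as collections of (i, j) base pairs.

    Returns (sensitivity, ppv):
      sensitivity = correctly predicted pairs / pairs in the reference
      ppv         = correctly predicted pairs / pairs in the prediction
    Pairs are assumed to be written with i < j in both inputs.
    """
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)
    sensitivity = correct / len(reference) if reference else 0.0
    ppv = correct / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Hypothetical toy structures: 2 of 3 reference pairs are recovered.
reference_pairs = [(0, 8), (1, 7), (2, 6)]
predicted_pairs = [(0, 8), (1, 7), (3, 6)]

sens, ppv = pair_accuracy(predicted_pairs, reference_pairs)
print(f"sensitivity = {sens:.2f}, PPV = {ppv:.2f}")
```

Averaging such per-molecule scores over a benchmark dataset would yield summary figures of the kind quoted above.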
RNA structure prediction strategies
  Tertiary structure prediction
    Requires "craft" & significant user input & insight
    1) Extensive comparative sequence analysis to predict tertiary contacts (co-variation)
       e.g., MANIP - Westhof
    2) Use experimental data to constrain model building
       e.g., MC-SYM - Major
    3) Homology modeling using sequence alignment & reference tertiary structure (not many of these!)
    4) Low resolution molecular mechanics
       e.g., yammp - Harvey
New Today: Protein Structure & Function
Protein Structure & Function
  Protein structure - primarily determined by sequence
  Protein function - primarily determined by structure
  Globular proteins: compact hydrophobic core & hydrophilic surface
  Membrane proteins: special hydrophobic surfaces
  Folded proteins are only marginally stable
  Some proteins do not assume a stable "fold" until they bind to something = Intrinsically disordered
  Predicting protein structure and function can be very hard -- & fun!
4 Basic Levels of Protein Structure
Primary & Secondary Structure
  Primary
    Linear sequence of amino acids
    Description of covalent bonds linking aa's
  Secondary
    Local spatial arrangement of amino acids
    Description of short-range non-covalent interactions
    Periodic structural patterns: α-helix, β-sheet
Tertiary & Quaternary Structure
  Tertiary
    Overall 3-D "fold" of a single polypeptide chain
    Spatial arrangement of 2° (secondary) structural elements; packing of these into compact "domains"
    Description of long-range non-covalent interactions (plus disulfide bonds)
  Quaternary
    In proteins with > 1 polypeptide chain, spatial arrangement of subunits
"Additional" Structural Levels
  Super-secondary elements
  Motifs
  Domains
  Foldons