Challenges and accomplishments in molecular prediction Yanay Ofran
Accumulation of data, not knowledge: >70 million (as of ), off chart since 1997
Central dogma – it’s all in the sequence: DNA → RNA → Protein (Function, Structure)
- Structure - Function - Interaction
Annotation transfer: structure
Rost (1999) Protein Engineering 12:
Figure legend: PDB; similar structure; dissimilar structure
Structure prediction by homology

Score = 83.2 bits (205), Expect = 9e-17
Identities = 18/101 (X%), Positives = 36/101 (35%), Gaps = 2/101 (1%)

Query: 111 AAGGIAAKYLARKNSSVFGFIGCGTQAYFQLEALRRVFDIGEVKAYDVREKAAKKF 170
           AA +A + L + +G G ++L + V + + A +
Sbjct: 153 AAVELAERELGSLHDKTVLVVGAGEMGKTVAKSLVD-RGVRAVLVANRTYERAVEL 211

Query: 171 EDRGISASVQPAEEASRCDVLVTTTPSRKPVVKAEWVEEGT
           R DV+V+ T + PV+ + V E
Sbjct: 212 GGEAVRFDE-LVDHLARSDVVVSATAAPHPVIHVDDVREAL

% 150aa

>P1
MEDLVSVGITHKEAEVEELEKARFESDEAVRDIVESFGLSGS
VLLQTSNRVEVYASGARDRAEELGDLIHDDAWVKRGSEAVRH
LFRVASGLESMMVGEQEILRQVKKAYDRAARLGTLDEALKIV
FRRAINLGKRAREETRISEGAVSI
>P2
METLILTQEEVESLISMDEAMNAVEEAFRLYALGKAQMPPKV
YLEFEKGDLRAMPAHLMGYAGLKWVNSHPGNPDKGLPTVMAL
MILNSPETGFPLAVMDATYTTSLRTGAAGGIAAKYL

P1 P2
Structure prediction by homology

Score = 33.9 bits (77), Expect =
Identities = 14/58 (y%), Positives = 28/58 (48%), Gaps = 2/58 (3%)

Query: 178 SVQPAEEASRCDVLVTTTPSRKPVVKAEWVEEGTHINAIGADGPGKQELD-VEILKKA
           EE ++ D+LV T + +VK EW++ G + G + ++ E ++A
Sbjct: 198 TAHLDEEVNKGDILVVATGQPE-MVKGEWIKPGAIVIDCGINYKVVGDVAYDEAKERA 254

>P1
MEDLVSVGITHKEAEVEELEKARFESDEAVRDIVESFGLSGS
VLLQTSNRVEVYASGARDRAEELGDLIHDDAWVKRGSEAVRH
LFRVASGLESMMVGEQEILRQVKKAYDRAARLGTLDEALKIV
FRRAINLGKRAREETRISEGAVSI
>P2
MLELLPTAVEGVSQAQITGRPEWIWLALGTALMGLGTLYFLV
KGMGVSDPDAKKFYAITTLVPAIAFTMYLSMLLGYGLTMVPF
GGEQNPIYWARYADWLFTTPLLLLDLALLVDADQGTILALVG
ADGIMIGTGLVGALTK

P1 40% 50aa P2
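The percentages in the alignment summaries above follow directly from the count/length pairs. A minimal sketch of that arithmetic, assuming the `Label = count/length` layout shown on the slides; the function name `blast_percentages` is hypothetical and is not part of any BLAST tool. The percentages printed on the slides are consistent with truncation rather than rounding (36/101 is shown as 35%, 2/101 as 1%), so the sketch truncates as well.

```python
import re

def blast_percentages(stats_line):
    """Parse 'Label = count/length' fields from a BLAST-style summary line.

    Returns {label: (count, length, percent)}; percent is truncated to an
    integer, which matches the values printed on the slides.
    """
    result = {}
    for label, count, length in re.findall(r"(\w+) = (\d+)/(\d+)", stats_line):
        count, length = int(count), int(length)
        result[label] = (count, length, 100 * count // length)
    return result

# Fields copied from the first alignment summary.
stats = blast_percentages("Positives = 36/101 (35%), Gaps = 2/101 (1%)")
print(stats)
```

The same function applied to the second summary line gives the 48% and 3% shown there.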
Structure prediction from sequence Liu & Rost (2002) Bioinformatics 18:
Annotation transfer: Function
Rost et al. (2003) CMLS 60:
Annotation Transfer (% seq. id., HSSP val, E val)
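The "HSSP val" measure can be made concrete with a length-dependent identity threshold. The sketch below uses the curve as it is commonly quoted from Rost (1999); the coefficients (480, 0.32, 19.5 and the length cut-offs 11 and 450) are reproduced from memory and should be checked against the paper before use. The function names are hypothetical.

```python
import math

def hssp_curve(length):
    """Length-dependent % identity threshold (curve as quoted from Rost 1999).

    Assumption: coefficients reproduced from memory, verify against the paper.
    """
    if length <= 11:
        return 100.0
    if length <= 450:
        return 480.0 * length ** (-0.32 * (1.0 + math.exp(-length / 1000.0)))
    return 19.5

def hssp_distance(percent_identity, length):
    """Distance of an alignment from the curve; positive means above it."""
    return percent_identity - hssp_curve(length)

# Illustrative only: a hypothetical 100-residue alignment at 35% identity.
print(round(hssp_curve(100), 1), round(hssp_distance(35.0, 100), 1))
```

Shorter alignments need a higher percentage identity to reach the same distance, which is why percentage identity alone is a weaker criterion for transfer than the length-corrected value.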
Annotation transfer: interaction Mika et al. (2006) PLoS CB Protein A and protein B bind each other. Do A’ and B’, their respective homologues, interact as well?
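The question on this slide can be phrased as a simple procedure: for every known interacting pair, propose every pair of their homologues as a candidate interaction. The sketch below is only that naive transfer step, with hypothetical names (`transfer_interactions`, the protein labels); whether the proposed pairs really interact is exactly what the slide asks, and nothing here answers it.

```python
def transfer_interactions(known_pairs, homologs):
    """Propose candidate interactions by homology transfer.

    known_pairs: iterable of (a, b) tuples known to bind each other.
    homologs: dict mapping a protein to a list of its homologues.
    Returns a set of unordered candidate pairs (frozensets).
    """
    candidates = set()
    for a, b in known_pairs:
        for a_prime in homologs.get(a, ()):
            for b_prime in homologs.get(b, ()):
                candidates.add(frozenset((a_prime, b_prime)))
    return candidates

# Hypothetical toy data mirroring the slide's A, B, A', B'.
known = [("A", "B")]
homologs = {"A": ["A'"], "B": ["B'"]}
print(transfer_interactions(known, homologs))
```

In practice each candidate would also carry the sequence similarity of both homologue pairs, so that a cut-off can be applied before the transfer is trusted.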
Interaction sites by homology
Limit of annotation transfer
Seq ID: 100% to 0%
Structure, function, interaction
Blind annotation transfer; ab initio
Template and model
> 5; (2; 5); (0.5; 2); (-0.5; 0.5); (-2; -0.5); (-5; -2); < -5
Combine (I-TASSER, by Zhang)
Limit of annotation transfer
Some methods can do it
Local vs. non local interaction
Levinthal “Paradox”: a protein with 100 amino acids has ~10^48 possible conformations => calculation unfeasible.
Let’s assume (generously): a protein can sample structures per second. It would take this protein about seconds ~ years to try out all the possible conformations. (Time since the big bang ~10^10 years.)
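The Levinthal estimate is plain arithmetic and can be reproduced for any assumed sampling rate. In the sketch below both inputs are assumptions made for illustration, not values taken from the slide: three states per residue (which gives about 10^48 conformations for 100 residues) and a sampling rate of 10^13 structures per second.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def exhaustive_search_years(n_conformations, samples_per_second):
    """Years needed to visit every conformation once at a fixed sampling rate."""
    return n_conformations / samples_per_second / SECONDS_PER_YEAR

# Assumption: 3 states per residue, 100 residues (about 5 x 10^47 conformations).
conformations = 3 ** 100
# Assumption: hypothetical sampling rate, chosen only for illustration.
rate = 1e13

years = exhaustive_search_years(conformations, rate)
age_of_universe_years = 1e10  # order of magnitude quoted on the slide
print(f"{years:.1e} years, {years / age_of_universe_years:.1e} x time since the big bang")
```

Under these assumptions the search takes on the order of 10^27 years; changing the rate by a few orders of magnitude does not bring it anywhere near 10^10 years.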
Low RMSD: Wistow & Piatigorsky (1999) Science
High RMSD: Subtilisin (5sic), Chymotrypsin (5cha)
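RMSD here is the root-mean-square deviation between equivalent atoms of two structures. A minimal sketch of the formula, assuming the two coordinate lists are already superimposed and paired atom by atom; it does not perform the superposition itself, and the function name `rmsd` and the toy coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    Assumes the structures are already superimposed and atoms are paired
    by position in the lists.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy coordinates in angstroms, for illustration only.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
template = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(model, template))
```

A real comparison would first find the optimal rigid-body superposition and a residue equivalence between the two chains; the value is only meaningful once both are fixed.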
Challenges for next CASP
i. Modeling the structure of single-residue mutants.
ii. Modeling structure changes associated with specificity changes within protein families.
iii. Devising scoring functions that will reliably pick the most accurate models from a set of candidate structures produced by current new-fold methods.
Olympic games of predictions
Structure – CASP
Interaction – CAPRI
Function – AFP, CASP
Combine interaction + seq. analysis to predict function Li et al (2005) Nature biotech
- Structure - Function - Interaction
Predicting DNA binding sites Ofran et al (2007) in press
c-Myb + C/EBPβ bound to DNA
Identifying novel DNA binding proteins accuracy