Marcin Pacholczyk, Silesian University of Technology
Physics-based: laws of physics (electrostatics, van der Waals interactions, molecular flexibility, hydrogen-bond geometry). Computationally intensive; some effects are difficult to model (desolvation).
Knowledge-based: relatively simple, based on observation. Requires a training set!
Poisson-Boltzmann equation + Lennard-Jones potential
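To make the second term concrete, here is a minimal sketch of the standard 12-6 Lennard-Jones pair energy, V(r) = 4ε[(σ/r)^12 − (σ/r)^6], summed over atom pairs under an assumed pairwise additivity. The function names and the use of caller-chosen consistent units are illustrative assumptions. The Poisson-Boltzmann electrostatics term requires a numerical solver and is not sketched here.

```python
def lennard_jones(r: float, epsilon: float, sigma: float) -> float:
    """12-6 Lennard-Jones energy of one atom pair at distance r (hypothetical helper).

    epsilon: well depth; sigma: distance at which V(r) = 0.
    Units are whatever the caller uses consistently.
    """
    if r <= 0:
        raise ValueError("distance must be positive")
    sr6 = (sigma / r) ** 6
    # Repulsive r^-12 term minus attractive r^-6 term
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def total_lj(pairs) -> float:
    """Sum pair energies over (r, epsilon, sigma) tuples, assuming pairwise additivity."""
    return sum(lennard_jones(r, eps, sig) for r, eps, sig in pairs)
```

The energy is zero at r = σ and reaches its minimum, −ε, at r = 2^(1/6)σ.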
Gibbs energy is related to the probability of "correctness" of a complex (Robertson and Varani 2007)
P(C): probability of an individual atomic contact. The Bayesian prior, the probability of observing a native-like protein-DNA complex, is set to 1 (Robertson and Varani 2007).
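The Bayesian step can be sketched as P(native | C) = P(C | native) · P(native) / P(C) with the prior P(native) fixed at 1. The probability is then turned into an energy by the usual Boltzmann inversion, E = −RT ln P(native | C). The inversion form, RT = 1 (dimensionless units), the independence of contacts, and the function names are assumptions for illustration, not necessarily the exact expressions of Robertson and Varani 2007. Because the prior is fixed at 1, the result is effectively a likelihood ratio and can exceed 1.

```python
import math


def posterior_native(p_c_given_native: float, p_c: float, prior_native: float = 1.0) -> float:
    """Bayes' rule: P(native | C) = P(C | native) * P(native) / P(C)."""
    return p_c_given_native * prior_native / p_c


def contact_energy(p_c_given_native: float, p_c: float, rt: float = 1.0) -> float:
    """Assumed Boltzmann inversion: E = -RT * ln P(native | C)."""
    return -rt * math.log(posterior_native(p_c_given_native, p_c))


def complex_energy(contacts, rt: float = 1.0) -> float:
    """Sum energies over (P(C | native), P(C)) pairs, assuming independent contacts."""
    return sum(contact_energy(pn, pc, rt) for pn, pc in contacts)
```

Contacts that are more frequent in native complexes than overall (ratio > 1) receive negative, favourable energies.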
Probability function: the continuous distance $d_{ij}$ is mapped to a set of discrete distance bins $b_0, b_1, \ldots, b_n$ with distance cutoffs $d_{b_0}, d_{b_1}, \ldots, d_{b_n}$. A count is assigned to $b_i$ if $d_{b_{i-1}} \le d_{ij} < d_{b_i}$. Cutoffs: 3 Å, 4 Å, 5 Å, 6 Å, 7 Å, 8 Å, 9 Å, 10 Å (Robertson and Varani 2007).
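A minimal sketch of the binning rule, using the eight cutoffs listed on the slide. Two details are assumptions: bin $b_0$ is taken to collect all distances below the first cutoff (3 Å), and distances at or beyond the last cutoff (10 Å) are not counted. `distance_bin` and `bin_counts` are hypothetical helper names.

```python
from bisect import bisect_right
from collections import Counter

# Cutoffs d_b0 ... d_bn from the slide, in Å
CUTOFFS = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]


def distance_bin(d_ij: float, cutoffs=CUTOFFS):
    """Return i with cutoffs[i-1] <= d_ij < cutoffs[i]; bin 0 holds d_ij < cutoffs[0].

    Returns None for d_ij >= the last cutoff (contact not counted).
    """
    i = bisect_right(cutoffs, d_ij)  # number of cutoffs <= d_ij
    return i if i < len(cutoffs) else None


def bin_counts(distances, cutoffs=CUTOFFS) -> Counter:
    """Histogram of contact distances over the discrete bins."""
    counts = Counter()
    for d in distances:
        b = distance_bin(d, cutoffs)
        if b is not None:
            counts[b] += 1
    return counts
```

`bisect_right` gives the half-open intervals directly: a distance of exactly 4.0 Å lands in the bin that starts at 4 Å, not the one that ends there.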
Marginal distribution: $N_C$ is the total number of observed contacts between interface atoms of all types, at all distances, in the training set (Robertson and Varani 2007). Training set: Nucleic Acid Database (ndbserver.rutgers.edu).
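From training-set counts, the marginal probability of a distance bin can be estimated as $N(b)/N_C$. A distance-dependent contact score can then be formed as a log-odds against that marginal. The sketch below assumes a generic form, $-\ln[P(b \mid \text{pair}) / P(b)]$, which is not necessarily the exact Robertson and Varani formula. The atom-pair labels and `EXAMPLE_COUNTS` are toy values made up for illustration, not data from the Nucleic Acid Database.

```python
import math
from collections import defaultdict

# Hypothetical toy counts: (atom_pair_type, bin_index) -> observed contacts
EXAMPLE_COUNTS = {("N-O", 1): 30, ("N-O", 2): 10, ("C-C", 1): 10, ("C-C", 2): 50}


def marginal_distribution(counts):
    """Return N_C (all contacts, all types, all distances) and P(bin) = N(bin) / N_C."""
    n_c = sum(counts.values())
    per_bin = defaultdict(int)
    for (_pair, b), n in counts.items():
        per_bin[b] += n
    return n_c, {b: n / n_c for b, n in per_bin.items()}


def log_odds_score(counts, pair, b):
    """Assumed form: -ln[P(bin | pair) / P(bin)]; +inf if the contact was never observed."""
    _n_c, marginal = marginal_distribution(counts)
    n_pair = sum(n for (p, _b), n in counts.items() if p == pair)
    observed = counts.get((pair, b), 0)
    if observed == 0:
        return math.inf
    return -math.log((observed / n_pair) / marginal[b])
```

With the toy counts, N–O contacts in bin 1 are over-represented (0.75 versus a marginal of 0.4), so they get a negative, favourable score.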
Almanova et al. 2010: three members of the NF-κB family of transcription factors: p50p50 homodimer (1NFK), p50RelB heterodimer (2V2T) and p50p65 heterodimer (1VKX), in complexes with DNA fragments (PDB).
[Figure panels: p50p50, p50p65, p50RelB]
DNA chains were mutated one base pair at a time with MMTSB (Multiscale Modeling Tools for Structural Biology), keeping the backbone fixed.
All weights $w(i, u)$ in the PWM are predicted by solving the linear equation $A^{\mathsf{T}} X = b$, with one equation for each of the 4N + R scored sequences:
$X$ is the 4N-dimensional vector of estimated weights (N positions × 4 bases);
$A$ is a binary matrix of dimensions (4N, 4N + R) encoding all DNA sequences, including the R random ones, whose free binding energy was computed;
$b$ is the free binding energy vector of 4N + R values obtained with the protein-DNA scoring procedure.
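A minimal sketch of the fitting step, assuming the overdetermined system (4N + R equations, 4N unknowns) is solved in the least-squares sense with NumPy. The slide does not say which solver was used. `single_mutants`, `encode` and `fit_pwm` are hypothetical helpers. In practice the energies would come from the protein-DNA scoring procedure; the test feeds synthetic ones. In this one-hot encoding, adding offsets to the four weights at each position leaves every sequence energy unchanged as long as the offsets sum to zero. The weights are therefore not unique, and `lstsq` returns the minimum-norm solution. Predicted energies, not individual weights, are the meaningful check.

```python
import numpy as np

BASES = "ACGT"


def single_mutants(wild_type: str):
    """Sequences with one position set to each of the 4 bases in turn (4N sequences)."""
    return [wild_type[:i] + u + wild_type[i + 1:]
            for i in range(len(wild_type)) for u in BASES]


def encode(seqs) -> np.ndarray:
    """Binary matrix A of shape (4N, S): A[4*i + u, s] = 1 if sequence s has base u at position i."""
    n = len(seqs[0])
    a = np.zeros((4 * n, len(seqs)))
    for s, seq in enumerate(seqs):
        for i, base in enumerate(seq):
            a[4 * i + BASES.index(base), s] = 1.0
    return a


def fit_pwm(seqs, energies) -> np.ndarray:
    """Least-squares X for A^T X = b, reshaped to an (N, 4) matrix of weights w(i, u)."""
    a = encode(seqs)
    x, *_ = np.linalg.lstsq(a.T, np.asarray(energies, dtype=float), rcond=None)
    return x.reshape(-1, 4)
```

The single mutants of one wild-type sequence already span every additive energy model, so the fitted matrix also reproduces the energies of sequences outside the training set when the energies are exactly additive.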
Almanova et al. 2010
[Figure panels: p50p50, p50RelB, p50p65; TRANSFAC V$NFKAPPAB_01]
[Table: relative entropy; columns Almanova, DDNA2, TRANSFAC; rows p50p…, p50RelB (2.84, –), p50p…] Almanova et al. 2010
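Relative entropy (Kullback-Leibler divergence) between position probability matrices can be sketched as $D(P \parallel Q) = \sum_i \sum_u p_{iu} \log_2 (p_{iu}/q_{iu})$, in bits. The slide does not make clear whether it compares a PWM with a uniform background or a predicted PWM with a reference matrix, so the sketch accepts any second matrix $Q$. Converting energy-based weights into probabilities (for example with a Boltzmann transform) would be an extra assumption and is not shown. `relative_entropy` is a hypothetical name.

```python
import numpy as np


def relative_entropy(p_matrix, q_matrix) -> float:
    """D(P || Q) in bits, summed over positions; both shape (L, 4), rows summing to 1."""
    p = np.asarray(p_matrix, dtype=float)
    q = np.asarray(q_matrix, dtype=float)
    if p.shape != q.shape:
        raise ValueError("matrices must have the same shape")
    if np.any((p > 0) & (q == 0)):
        return float("inf")  # P puts mass where Q has none
    mask = p > 0  # terms with p = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))
```

Against a uniform background, a fully conserved position contributes 2 bits and a uniform position contributes 0.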
Binding site discovery (Almanova et al. 2010)
69 human genes regulated by NF-κB, with 124 promoter sequences (TRANSPRO). 31 of the 124 promoters, belonging to 25 genes, are experimentally confirmed.
Matrix scan with Match on 58 confirmed binding sites:
Complex   Almanova   TRANSFAC   TRANSFAC matrix
p50p50    30 (5)     25         V$P50P50_Q3
p50p65    25 (5)     26 (6)     V$P50RELAP65_Q5_01
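Match scores candidate sites with its own core and matrix similarity scores. The sketch below is a simplified stand-in, not the Match algorithm. It slides a weight matrix along the forward strand and reports windows whose relative score $(S - S_{min})/(S_{max} - S_{min})$ reaches a threshold. The matrix orientation (higher = better binding), the 0.9 default threshold and the function names are assumptions. An energy-based PWM, where lower means better, would need its sign flipped first.

```python
import numpy as np

BASES = "ACGT"


def relative_scores(pwm: np.ndarray, seq: str):
    """Relative score in [0, 1] for every window of seq (forward strand only).

    pwm has shape (L, 4), columns in A, C, G, T order, higher = better binding.
    """
    length = pwm.shape[0]
    s_min = pwm.min(axis=1).sum()  # worst possible window score
    s_max = pwm.max(axis=1).sum()  # best possible window score
    if s_max == s_min:
        raise ValueError("matrix cannot discriminate between sequences")
    scores = []
    for start in range(len(seq) - length + 1):
        s = sum(pwm[i, BASES.index(seq[start + i])] for i in range(length))
        scores.append(float((s - s_min) / (s_max - s_min)))
    return scores


def scan(pwm: np.ndarray, seq: str, threshold: float = 0.9):
    """(position, score) pairs for windows scoring at or above threshold."""
    return [(pos, sc) for pos, sc in enumerate(relative_scores(pwm, seq)) if sc >= threshold]
```

For example, `np.eye(4)[[2, 2, 0, 0]]` is a toy matrix that rewards exactly "GGAA", and scanning "TTGGAATT" with it reports a single hit at position 2.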
[Table: AUC; columns Almanova, DDNA2, TRANSFAC; rows p50p…, p50p…] Almanova et al. 2010
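The AUC (area under the ROC curve) equals the probability that a randomly chosen positive scores higher than a randomly chosen negative, counting ties as one half. This is the Mann-Whitney formulation. Treating known binding sites as positives and background windows as negatives is an assumption about how the comparison was set up. `roc_auc` is a hypothetical name.

```python
def roc_auc(positive_scores, negative_scores) -> float:
    """ROC AUC as P(random positive outscores random negative); ties count 1/2.

    Compares every positive/negative pair (O(P * N)), which is fine for small sets.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of sites from background.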
Discovery of novel NF-κB binding sites.
Investigation of posttranslational modifications such as RelA Ser276 phosphorylation (Nowak et al. 2008).
It is possible to compute PWMs that perform comparably to those derived from experimental data (TRANSFAC).
Thermodynamics-based models of transcriptional regulation, including synergistic activation, cooperative binding and short-range repression (He et al. 2010).