Force Fields Summary

Force Fields 2 What is a Force Field?
A force field is a set of equations and parameters which, when evaluated for a (molecular) system, yields an energy.
There are many different types of force fields:
Quantum chemistry -> molecular dynamics (the MoaFF) (Seminar 6)
Electrostatic calculations; self-consistent field; finite differences
Statistics; Chou and Fasman type FF
Other

Force Fields 3 Back to proteins and MD/EM
We have seen that the few forces that we (think that we) understand are mainly of the form $Q = k(x - x_0)^2$. In this equation $x_0$ is known with great precision, while $k$ can easily be wrong by a factor of two or more. Can we use the precision of $x_0$?
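The point about the precision of x0 can be illustrated with a few lines of Python. This is only a sketch: it assumes the harmonic form Q = k*(x - x0)^2, and the ideal value and force constants are hypothetical numbers chosen for illustration. It shows that doubling k changes the height of the penalty but not the position of the minimum.

```python
def harmonic_energy(x, x0, k):
    """Harmonic penalty Q = k * (x - x0)**2 for a deviation from the ideal value x0."""
    return k * (x - x0) ** 2


def argmin_on_grid(x0, k, lo, hi, steps=2001):
    """Return the grid point with the lowest energy (brute-force scan)."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(xs, key=lambda x: harmonic_energy(x, x0, k))


X0 = 1.53  # hypothetical ideal bond length in Angstrom

# The same scan with a force constant that is "wrong by a factor of two".
best_with_k = argmin_on_grid(X0, k=300.0, lo=1.0, hi=2.0)
best_with_2k = argmin_on_grid(X0, k=600.0, lo=1.0, hi=2.0)

print(best_with_k, best_with_2k)
```

Both scans end at the same x, so an error in k affects the size of the energy, not where the optimum lies.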

Force Fields 4 Electrostatic calculations
Often physics looks like Chinese typed backwards by a drunken sailor, but when you spend a bit of time on it, you will see that things actually are easy. Take the Poisson-Boltzmann equation that is used for electrostatic calculations:
[equation not preserved]
which can be converted into:
[equation not preserved]
This looks clearly impossible, but after a few days of struggling it becomes rather trivial (next slide):
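The equations shown on this slide did not survive in the transcript. For orientation only, a common textbook form of the Poisson-Boltzmann equation is given below. It is an assumption that the slide showed something similar, and the symbols are not taken from the slide: $\varepsilon$ is the position-dependent permittivity, $\varphi$ the electrostatic potential, $\rho_f$ the fixed charge density, $c_i^{\infty}$ and $z_i$ the bulk concentration and valence of ion species $i$, $q$ the elementary charge, and $\lambda$ the ion-accessibility function.

```latex
\nabla \cdot \bigl[ \varepsilon(\mathbf{r}) \, \nabla \varphi(\mathbf{r}) \bigr]
  = -\rho_f(\mathbf{r})
    - \sum_i c_i^{\infty} z_i q \, \lambda(\mathbf{r})
      \exp\!\left( -\frac{z_i q \, \varphi(\mathbf{r})}{k_B T} \right)

% Linearized form for small potentials in an electroneutral bulk solution:
\nabla \cdot \bigl[ \varepsilon(\mathbf{r}) \, \nabla \varphi(\mathbf{r}) \bigr]
  - \lambda(\mathbf{r}) \, \bar{\kappa}^2 \, \varphi(\mathbf{r})
  = -\rho_f(\mathbf{r}),
\qquad
\bar{\kappa}^2 = \sum_i \frac{c_i^{\infty} z_i^2 q^2}{k_B T}
```

The second line follows from the first by expanding the exponential to first order and using electroneutrality of the bulk solution.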

Force Fields 5 Electrostatic calculations
The Poisson-Boltzmann equation is worked out digitally, i.e., make a grid, and give every voxel (grid box) a charge and a dielectric constant. Now make sure neighbouring grid points have the correct relations. If a voxel has 'too much charge' it should give some charge to its neighbours. This is done iteratively until the solution is self-consistent.
The same technology is used to design nuclear bombs, predict the weather (including the future path of tornadoes), design the hood of luxury cars, predict how water will flow in the Waal, optimize catalysts in mufflers, optimize the horsepower of a car given a certain amount of gasoline per second (turbochargers), etc. And the function is very simple!
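The grid procedure can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the method used by any particular program: it assumes a uniform dielectric constant, leaves out the Boltzmann ion term (so it solves the plain Poisson equation), works in two dimensions, and holds the potential at zero on the border. The function name, grid size and charge are hypothetical.

```python
def relax_potential(charge, eps=1.0, h=1.0, tol=1e-8, max_iter=10000):
    """Gauss-Seidel relaxation of the discrete Poisson equation on a 2D grid.

    charge[i][j] is the charge density in voxel (i, j); the potential is
    held at zero on the border of the grid.
    """
    rows, cols = len(charge), len(charge[0])
    phi = [[0.0] * cols for _ in range(rows)]
    for sweep in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                # Set each voxel to the value that its four neighbours
                # and its own charge require.
                new = 0.25 * (phi[i - 1][j] + phi[i + 1][j]
                              + phi[i][j - 1] + phi[i][j + 1]
                              + h * h * charge[i][j] / eps)
                max_change = max(max_change, abs(new - phi[i][j]))
                phi[i][j] = new
        if max_change < tol:  # self-consistent: nothing changes any more
            return phi, sweep
    return phi, max_iter


N = 11
charge = [[0.0] * N for _ in range(N)]
charge[5][5] = 1.0  # a single hypothetical point charge in the middle

phi, sweeps = relax_potential(charge)
print(sweeps, phi[5][5], phi[5][4])
```

Each sweep only compares a voxel with its direct neighbours, which is why the update rule is so simple; the cost lies in the number of sweeps needed before the grid stops changing.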

Force Fields 6 Other force fields
Force fields do not need to be based on concepts of physics. You can also base a FF on statistics. The idea is that if you see it often, it must have a high probability. So, a variant on the sequence rule: if it is important, you see it often. And now we will do an experiment counting sheep.

Force Fields 7 Other force fields
Force fields do not need to be based on atoms. A very different concept would be a secondary structure evaluation force field:
Take many different proteins and determine their secondary structure.
Determine how many residues in total are H (helix), S (sheet), or R (rest), and do the same for each residue type.
Determine all frequencies: P(aa,HSR)=P(aa)*P(HSR)
Calibrate the method.
Use it by looping over the amino acids in the protein to be tested and multiplying all chances P(aa,HSR).
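The counting steps above can be sketched as follows. The training sequences and their H/S/R strings are made-up toy data, and the function name is hypothetical; the sketch only shows how P(aa), P(HSR) and the product P(aa)*P(HSR) can be obtained by counting, next to the frequency of the pair that is actually observed.

```python
from collections import Counter


def count_frequencies(sequences, states):
    """Count residues, states and (residue, state) pairs; return them as fractions."""
    aa_counts, state_counts, pair_counts = Counter(), Counter(), Counter()
    for seq, sts in zip(sequences, states):
        if len(seq) != len(sts):
            raise ValueError("sequence and state string must have equal length")
        for aa, st in zip(seq, sts):
            aa_counts[aa] += 1
            state_counts[st] += 1
            pair_counts[(aa, st)] += 1
    total = sum(aa_counts.values())
    p_aa = {aa: n / total for aa, n in aa_counts.items()}
    p_state = {st: n / total for st, n in state_counts.items()}
    p_pair = {pair: n / total for pair, n in pair_counts.items()}
    return p_aa, p_state, p_pair


# Toy, made-up training data: one state letter (H, S or R) per residue.
SEQS = ["AAKLG", "GVAVG"]
STATES = ["HHHHR", "RSSSR"]

p_aa, p_state, p_pair = count_frequencies(SEQS, STATES)

# Product of the separate frequencies versus the frequency actually counted.
product_ala_helix = p_aa["A"] * p_state["H"]
observed_ala_helix = p_pair[("A", "H")]
print(product_ala_helix, observed_ala_helix)
```

With real data the two numbers are computed over many proteins; a difference between them is what makes a prediction possible.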

Force Fields 8 One 'serious' example: Chou and Fasman
Example of Chou and Fasman: we count all amino acids in a dataset of 400 proteins with known structure (they had many fewer proteins available in 1974, but anyway...). These 400 proteins in total have [number missing] amino acids.

Residue  Count   Fraction     Structure  Fraction
Ala      7,123   7.0%         Helix      34.3%
Cys      1,232   1.2%         Sheet      26.5%
Asp      5,993   etc.         Rest       39.3%
Glu      6,086
Phe      4,822
Gly      7,339
His        989
Ile      6,550
Lys      8,127

Force Fields 9 What is the null-model?
The null-model is the model that assumes that there is no signal in the input data. In the case of our Chou-and-Fasman example, the null-model assumes that there is no relation between the amino acid type and the secondary structure. So, if 7% (0.07) of all amino acids are of type Ala, and ~34% (0.34) of all amino acids are in a helix, then 7% of 34% (0.07*0.34), i.e. 2.4% (0.024) of all amino acids, should be alanines observed in a helix. And since that isn't true, we can make a model that differs from the null-model, and thus we can make predictions.

Force Fields 10 Chou and Fasman; null-model
These 400 proteins in total have [number missing] amino acids.

Residue  Count   Fraction     Structure  Fraction
Ala      7,123   7.0%         Helix      34.3%
Cys      1,232   1.2%         Sheet      26.5%
Asp      5,993   etc.         Rest       39.3%

(Ala,Helix) predicted = 0.07 * 34.3% = 2.4%, or 2505 Ala-in-helix predicted in the data set of 400 proteins. This is the null-model.
But we count 3457 Ala-in-helix; that is 1.38 times 'too many'. So chances are 'better than random' to find an alanine in a helix: there is an above-average preference for Ala to be in a helix. How do we quantify this?
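The arithmetic on this slide can be checked with a short Python snippet. All numbers are taken from the slide; only the variable names are invented here.

```python
P_ALA = 0.070    # fraction of all residues that are Ala (slide value)
P_HELIX = 0.343  # fraction of all residues that are in a helix (slide value)

# Null-model: residue type and secondary structure are independent.
null_fraction = P_ALA * P_HELIX

PREDICTED_ALA_IN_HELIX = 2505  # count expected under the null-model (slide value)
OBSERVED_ALA_IN_HELIX = 3457   # count actually found (slide value)

ratio = OBSERVED_ALA_IN_HELIX / PREDICTED_ALA_IN_HELIX

print(f"null-model fraction: {null_fraction:.3f}")
print(f"observed / predicted: {ratio:.2f}")
```

A ratio above 1 means the combination occurs more often than chance alone would give; a ratio below 1 would mean it is avoided.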

Force Fields 11 Come to the rescue, one long dead physicist
This is at the basis of:
ΔG = ΔH - TΔS
ΔG = -RTln(K)
And of Vriend's rule of 10...

Force Fields 12 One 'serious' example: Chou and Fasman
These 400 proteins in total have [number missing] amino acids.

Residue  Count   Fraction     Structure  Fraction
Ala      7,123   7.0%         Helix      34.3%
Cys      1,232   1.2%         Sheet      26.5%
Asp      5,993   etc.         Rest       39.3%

(Ala,Helix) predicted = 0.07 * 34.3% = 2.4%, or 2505 Ala-in-helix predicted in the data set of 400 proteins. This is the null-model.
But we count 3457 Ala-in-helix; that is 1.38 times 'too many'.
So the 'score' for (Ala,helix) = Pref(A,H) = ln(observed/predicted) = ln(3457/2505) = ln(1.38) = 0.32.
The preference parameter Pref(A,H) is positive. So, here positive is good (unlike ΔG or AIDS tests).
And how do we now predict the secondary structure of a protein?
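One possible answer to the closing question is sketched below; it is an illustration, not necessarily the procedure the lecture goes on to describe, and the published Chou and Fasman procedure is more elaborate. Summing log-preferences over a stretch of residues is the same as multiplying the observed/predicted ratios. Only Pref(A,H) comes from the slide; every other value in the table, and both function names, are hypothetical.

```python
import math


def preference(observed, predicted):
    """Pref = ln(observed / predicted); positive means 'seen more often than random'."""
    return math.log(observed / predicted)


PREF = {
    ("A", "H"): preference(3457, 2505),  # from the slide, about 0.32
    # All values below are hypothetical and serve only to make the sketch runnable.
    ("A", "S"): -0.20, ("A", "R"): -0.10,
    ("G", "H"): -0.50, ("G", "S"): -0.30, ("G", "R"): 0.40,
    ("V", "H"): -0.10, ("V", "S"): 0.50, ("V", "R"): -0.30,
}


def best_state(segment, prefs, states="HSR"):
    """Sum the log-preferences per state over a segment and return the best state."""
    totals = {st: sum(prefs[(aa, st)] for aa in segment) for st in states}
    return max(totals, key=totals.get), totals


state, totals = best_state("AAAA", PREF)
print(state, totals)
```

In practice the segment would be a window that slides along the sequence, and the table would hold a counted value for every residue type and state.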

Force Fields 13 And the other way around
ΔG = -RTln(K)
The magnitude of ΔG is just over 1 kcal/mol (about 1.4 kcal/mol at room temperature) when K=10, and K is the ratio between two 'somethings' (can be anything).
Swimming into a gradient of a factor 10 costs 1 kcal/mol.
A pH unit difference must be 'worth' a kcal/mol.
A nice exam question would be to think of an example of this 'law of 10' that hasn't been discussed in the course yet...
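The 'law of 10' is easy to verify numerically. The snippet below evaluates ΔG = -RT ln(K); the temperature of 298.15 K is an assumption (the slide does not state one), and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g(k_ratio, temperature=298.15):
    """Free energy difference in kcal/mol for a ratio K at the given temperature (K)."""
    return -R_KCAL * temperature * math.log(k_ratio)


dg10 = delta_g(10.0)
print(f"K = 10  -> dG = {dg10:.2f} kcal/mol")
print(f"K = 100 -> dG = {delta_g(100.0):.2f} kcal/mol")
```

Every further factor of 10 in K adds the same amount again, which is why one pH unit, a tenfold concentration difference, is 'worth' roughly one kcal/mol.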