Computers in Chemistry Dr John Mitchell University of St Andrews.


1. Why? Working with experiment to test our theories. The computer uses theory to calculate an answer that can be compared with experiment. If prediction and experiment don’t agree, something has to give.

Atoms in molecules are not spherical

To Test Our Theories The theory that lies beneath chemistry is ultimately quantum physics. To turn this into a prediction of the rate of a chemical reaction or the frequency of a transition in an IR spectrum requires a lot of computation.

To Test Our Theories Computation’s ability to make accurate predictions of experimental measurements is a good test of the validity of a theory. We only understand if we can predict.

Crystal Structure Prediction Given the structural diagram of an organic molecule, predict the 3D crystal structure. Slide after SL Price, Int. Sch. Crystallography, Erice, 2004

To Access Data that Experiment can’t Computational chemistry also provides a way of obtaining information that would be very difficult, expensive or time-consuming to get experimentally. Behaviour at very high temperature or pressure. Details of structure of liquids at atomic scale. Dynamics of proteins.

Phase Changes of Iron in the Earth’s Core et al.,

Structure of Liquid Water and Water Clusters Computer simulations are an important source of evidence, since atomic scale details of an irregular structure are hard to obtain by experiment.

Dynamic Motions of Proteins X-ray crystallography gives a single static structure

Dynamic Motions of Proteins Simulation can show how the protein flexes

2. The Power to Compute

Development of Computer Power University of Manchester SSEM, 1948

Development of Computer Power IBM Roadrunner, 2008

Computer Power: Moore’s Law Computer power doubles every two years: exponential growth

Computer Power: Moore’s Law Logarithmic scale

Computer Power: Moore’s Law This growth will, eventually, slow down as components reach atomic scale … we think!
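As a rough illustration of the doubling rule on these slides, the sketch below computes the growth factor implied by one doubling every two years. The function name and the default doubling time are illustrative assumptions, not something taken from the slides.

```python
def growth_factor(years, doubling_time=2.0):
    """Factor by which computer power grows over `years`, assuming
    one doubling every `doubling_time` years (the slide's rule of thumb)."""
    return 2.0 ** (years / doubling_time)


# Exponential growth: equal time intervals multiply power by equal factors,
# which is why the trend appears as a straight line on a logarithmic scale.
for years in (2, 10, 20, 40):
    print(f"{years:2d} years -> x{growth_factor(years):,.0f}")
```

Under this assumption, twenty years corresponds to ten doublings, roughly a thousandfold increase.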

The Size of the Problem

Scaling Nonetheless, theoretical chemistry is expensive. Often the cost scales as the fourth power of molecule size.

Typical scaling is ~N^4. For the foreseeable future, there will be chemical problems at the limit of our computing power.
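To see why fourth-power scaling keeps chemistry at the limit of computing power, the following sketch combines it with the two-year doubling rule quoted earlier. Both function names are hypothetical, and the clean N^4 law and steady doubling are idealisations.

```python
import math


def cost_ratio(size_ratio, exponent=4):
    """How much more a calculation costs when the molecule is
    `size_ratio` times larger, assuming cost ~ N**exponent."""
    return size_ratio ** exponent


def years_to_afford(size_ratio, exponent=4, doubling_time=2.0):
    """Years of hardware growth needed before the larger calculation costs
    the same as the smaller one does today (assumes steady doubling)."""
    return doubling_time * math.log2(cost_ratio(size_ratio, exponent))


print(cost_ratio(2))        # cost multiplier for a molecule twice as large
print(years_to_afford(2))   # years of doubling needed to absorb that cost
```

Doubling the molecule size multiplies the cost by 16, which takes four doublings of computer power, or about eight years at this idealised rate.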

3. Philosophies of Computational Chemistry

The Two Faces of Computational Chemistry Theoretical Chemistry Informatics

“The problem is difficult, but by making suitable approximations we can solve it at reasonable cost based on our understanding of physics and chemistry.” Philosophy of Theoretical Chemistry

Theoretical Chemistry Calculations and simulations based on real physics. Calculations are either quantum mechanical or use numbers derived from quantum mechanics. Attempt to model or simulate reality. Usually Low Throughput.

What Kinds of Theoretical Chemistry can be Done? Prof. Eitan Geva (1) Quantum Chemistry

What Kinds of Theoretical Chemistry can be Done? (1) Quantum Chemistry Using quantum mechanics to solve the structures and energetics of molecules; everything depends on the distribution of electrons.

What Kinds of Theoretical Chemistry can be Done? (1) Quantum Chemistry Although quantum chemistry involves solving Schrödinger’s equation, it is not fully exact. There are some approximations involved.

What Kinds of Theoretical Chemistry can be Done? (1) Quantum Chemistry Wavefunction → Distribution of electrons within the molecule

What Kinds of Theoretical Chemistry can be Done? (1) Quantum Chemistry Distribution of electrons → Physical and chemical behaviour of the molecule

What Kinds of Theoretical Chemistry can be Done? (1) Quantum Chemistry There are two main kinds of quantum chemistry: ab initio and Density Functional Theory.

What Kinds of Theoretical Chemistry can be Done? (1) Quantum Chemistry Ab initio “from first principles”. Solve Schrödinger equation to get wavefunction. In principle rigorous – we know what we calculate. But the standard “Hartree-Fock” method contains significant approximations. Expensive to adjust for these and get more accuracy.

What Kinds of Theoretical Chemistry can be Done? (1) Quantum Chemistry Density Functional Theory Makes use of the theorem that all properties of interest can be determined directly from the electron density. True in principle, but the correct “functional” is unknown. Less rigorous than ab initio, but usually more accurate for an equivalent cost (or cheaper for similar accuracy).

What Kinds of Theoretical Chemistry can be Done? (2) Molecular Simulation

What Kinds of Theoretical Chemistry can be Done? (2) Molecular Simulation There are various techniques for simulating molecules; the most significant is probably Molecular Dynamics. Molecular Dynamics makes a “balls-and-springs” model of the molecule in the computer, and follows its behaviour over time.
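A minimal sketch of the balls-and-springs idea: two equal masses joined by a harmonic spring, moved forward in time with the velocity-Verlet integrator. Everything here is an assumption made for illustration (one dimension, reduced units, the parameter values, the function name); production molecular dynamics codes use far richer force fields.

```python
def simulate_diatomic(n_steps=1000, dt=0.01, k=1.0, r0=1.0, m=1.0):
    """Velocity-Verlet dynamics of two balls joined by one spring (1D).

    Returns (bond_lengths, initial_energy, final_energy). All quantities
    are in arbitrary reduced units chosen only for this example.
    """
    x = [0.0, 1.2]   # positions: the bond starts stretched beyond r0
    v = [0.0, 0.0]   # velocities: both balls start at rest

    def forces(pos):
        stretch = (pos[1] - pos[0]) - r0
        f = k * stretch          # a stretched spring pulls ball 0 towards ball 1
        return [f, -f]           # and ball 1 back towards ball 0

    def energy(pos, vel):
        stretch = (pos[1] - pos[0]) - r0
        potential = 0.5 * k * stretch ** 2
        kinetic = 0.5 * m * (vel[0] ** 2 + vel[1] ** 2)
        return potential + kinetic

    f = forces(x)
    e_start = energy(x, v)
    bond_lengths = []
    for _ in range(n_steps):
        v = [vi + 0.5 * dt * fi / m for vi, fi in zip(v, f)]   # half kick
        x = [xi + dt * vi for xi, vi in zip(x, v)]             # drift
        f = forces(x)                                          # new forces
        v = [vi + 0.5 * dt * fi / m for vi, fi in zip(v, f)]   # half kick
        bond_lengths.append(x[1] - x[0])
    return bond_lengths, e_start, energy(x, v)


bond, e0, e1 = simulate_diatomic()
print(f"bond length range: {min(bond):.3f} to {max(bond):.3f}")
print(f"relative energy drift: {abs(e1 - e0) / e0:.2e}")
```

The bond simply oscillates about its rest length, and the near-constant total energy is a quick check that the time step is short enough for this spring.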

What Kinds of Theoretical Chemistry can be Done? (2) Molecular Simulation Light-harvesting protein subunit.

What Kinds of Theoretical Chemistry can be Done? (2) Molecular Simulation Time steps need to be very, very short (~10^-15 seconds), so it takes a million steps to simulate one nanosecond of real time and a billion steps to simulate a microsecond. This makes it hard to directly simulate relatively slow or rare events, such as protein folding.
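The step counts quoted on this slide follow from simple division. The sketch below assumes a time step of about one femtosecond (10^-15 s), which is the value implied by a million steps per nanosecond; the function name is hypothetical.

```python
def steps_needed(simulated_time_s, timestep_s=1e-15):
    """Number of MD steps required to cover `simulated_time_s` seconds,
    assuming a fixed time step (default: one femtosecond)."""
    return round(simulated_time_s / timestep_s)


print(steps_needed(1e-9))   # steps for one nanosecond
print(steps_needed(1e-6))   # steps for one microsecond
```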

What Kinds of Theoretical Chemistry can be Done? (2) Molecular Simulation Also, a balls-and-springs model lacks the quantum mechanics needed to simulate a chemical reaction. Nonetheless, molecular dynamics is very important for understanding shape changes, interactions and energetics of large molecules.

The Two Faces of Computational Chemistry Theoretical Chemistry Informatics

Philosophy of Informatics “The problem is too difficult to solve at reasonable cost based on real physics and chemistry, so instead we will build a purely empirical model to predict the required molecular properties from chemical structure, using the available data.”

Informatics In general, informatics methods represent phenomena mathematically, but not in a physics-based way. Inputs and outputs are related by an empirically parameterised equation or a more elaborate mathematical model. They do not attempt to simulate reality. Usually High Throughput.

What is Cheminformatics? Calculating or predicting molecular properties without using a physics-based approach. Rather than modelling how the molecular world really works, cheminformatics is an empirical discipline, using available data to find correlations between chemical structure and properties. Cheminformatics techniques are often used in drug discovery and pharmaceutical research, and the requirements of the pharmaceutical industry have dominated the development of the subject.

Modelling in Chemistry [Diagram: computational methods arranged along three axes: physics-based to empirical, atomistic to non-atomistic, and low throughput to high throughput. Methods shown: ab initio, Density Functional Theory, Car-Parrinello, AM1, PM3 etc., Molecular Dynamics, Monte Carlo, Docking, DPD, Fluid Dynamics, CoMFA, 2-D QSAR/QSPR, Machine Learning.]

4. How Best to Compute Solubility?


Which would you Prefer... or ? Solubility in water (and other biological fluids) is highly desirable for pharmaceuticals!

Solubility is an important issue in drug discovery and a major cause of failure of drug development projects. This is expensive for the industry. A good computational model for predicting the solubility of druglike molecules would be very valuable.

Drug Disc. Today, 10 (4), 289 (2005)

Our Methods … (A) Thermodynamic Cycle (Theoretical chemistry)

We want to construct a theoretical model that will predict solubility for druglike molecules … We expect our model to use real physics and chemistry and to give some insight … We don’t expect it to be fast by informatics standards, but it should be reasonably accurate … Our Thermodynamic Cycle method …

Can we use theoretical chemistry to calculate solubility via a thermodynamic cycle?

ΔG_sub comes from lattice energy minimisation based on the experimental crystal structure.

Calculate Energy of Infinite Periodic Lattice Unit cell

Calculate Energy of Infinite Periodic Lattice
● Take one molecule
● Solve its Schrödinger equation
● Calculate its interactions
● Allow unit cell to change
● Find best size, shape, packing
● Find energy of infinite lattice
This is the same methodology as used in crystal structure prediction.
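The "allow unit cell to change, find best size" step can be illustrated with a deliberately tiny toy: an infinite one-dimensional chain of identical atoms interacting through a Lennard-Jones pair potential, in reduced units. The potential, the 1D geometry, the neighbour cutoff and all function names are assumptions made for brevity; real lattice energy minimisation of molecular crystals works with 3D unit cells and much more detailed interaction models.

```python
def lattice_energy_per_atom(a, epsilon=1.0, sigma=1.0, n_neighbours=200):
    """Energy per atom of an infinite 1D chain with spacing `a`.

    Each atom has two neighbours at distance n*a and each pair energy is
    shared between two atoms, so the per-atom energy is the sum of u(n*a)
    over n = 1, 2, ... (truncated after `n_neighbours` terms).
    """
    total = 0.0
    for n in range(1, n_neighbours + 1):
        sr6 = (sigma / (n * a)) ** 6
        total += 4.0 * epsilon * (sr6 ** 2 - sr6)   # Lennard-Jones pair energy
    return total


def minimise(f, lo, hi, tol=1e-8):
    """Golden-section search for the minimum of a unimodal f on [lo, hi]."""
    inv_phi = (5 ** 0.5 - 1) / 2
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) < f(d):
            hi = d      # minimum lies in [lo, d]
        else:
            lo = c      # minimum lies in [c, hi]
    return (lo + hi) / 2


best_a = minimise(lattice_energy_per_atom, 1.0, 1.3)
print(f"optimal spacing: {best_a:.4f} sigma")
print(f"lattice energy per atom: {lattice_energy_per_atom(best_a):.4f} epsilon")
```

In this toy the optimal spacing comes out slightly shorter than the isolated-pair optimum of 2^(1/6) sigma, because attraction from more distant neighbours compresses the lattice a little.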


ΔG_solv comes from a computational solvation model, RISM

Model of Solvent-Solute Interaction Calculate energy of interaction between solute and solvent. The model is called RISM (Reference Interaction Site Model).

ΔG_solv comes from a model of solvent-solute interaction
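The arithmetic of the cycle can be sketched as follows: the free energy of solution is the sum of the sublimation step (crystal to gas) and the solvation step (gas to solution), and a solubility follows from it. The slides do not give the formula, so the conversion used here (log10 S = -ΔG_sol / (RT ln 10), with matching 1 mol/L standard states and ideal dilute behaviour) is an assumption, as are the function name and the input values, which are invented purely for illustration.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3   # gas constant in kJ/(mol K)


def log10_solubility(dg_sub, dg_solv, temperature=298.15):
    """Estimate log10 of solubility (mol/L) from a thermodynamic cycle.

    dg_sub  : free energy of sublimation, crystal -> gas (kJ/mol)
    dg_solv : free energy of solvation, gas -> solution (kJ/mol)
    Assumes both steps use the same 1 mol/L standard state.
    """
    dg_solution = dg_sub + dg_solv                      # close the cycle
    return -dg_solution / (R_KJ_PER_MOL_K * temperature * math.log(10))


# Hypothetical free energies, NOT results from the presentation.
print(f"log10 S = {log10_solubility(dg_sub=30.0, dg_solv=-15.0):.2f}")
```

Because RT ln 10 is only about 5.7 kJ/mol at room temperature, an error of that size in either free energy shifts the predicted solubility by a factor of ten, which is one reason this route is demanding.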

Theoretical Chemistry: Solubility Results

These results are OK, but we would hope to do better

Our Methods … (B) Random Forest (informatics)

We want to construct a model that will predict solubility for druglike molecules … We don’t expect our model either to use real physics and chemistry or to be easily interpretable … We do expect it to be fast and reasonably accurate … Our Random Forest Model …

Random Forest: A Machine Learning Method This is a decision tree. We use lots of them to make a forest!

Random Forest Generate more trees randomly. (1) By randomly sampling with replacement to make different “bootstrap samples” of the data for each tree.
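Sampling with replacement can be sketched in a few lines. The helper below is hypothetical and uses only the standard library; it also returns the items that were never drawn, which are usually called the "out-of-bag" items for that tree.

```python
import random


def bootstrap_sample(n_items, rng):
    """Draw `n_items` indices with replacement from range(n_items).

    Returns (in_bag, out_of_bag): the drawn indices (duplicates allowed)
    and the indices that were never drawn.
    """
    in_bag = [rng.randrange(n_items) for _ in range(n_items)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_items) if i not in chosen]
    return in_bag, out_of_bag


rng = random.Random(0)   # fixed seed so the example is repeatable
in_bag, out_of_bag = bootstrap_sample(1000, rng)
print(f"distinct items in bag: {len(set(in_bag))}")
print(f"out-of-bag fraction:   {len(out_of_bag) / 1000:.2f}")
```

For large datasets roughly 1/e, about 37%, of the items are left out of any one bootstrap sample, so each tree sees a different subset of the data.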

Random Forest Generate more trees randomly. (2) By randomly choosing the pool of questions to ask of the data for each node (junction) of each tree.

Random Forest
● Machine Learning method introduced by Breiman and Cutler (2001)
● Development of Decision Trees (Recursive Partitioning):
  ● Dataset is partitioned into consecutively smaller subsets
  ● Each partition is based upon the value of one descriptor
  ● The descriptor used at each split is selected so as to optimise splitting
● Bootstrap sample of N objects chosen from the N available objects with replacement
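One way to "optimise splitting" in a regression tree is to pick the descriptor and threshold that minimise the summed squared error of the two resulting subsets. The sketch below assumes that criterion; the slides do not say which one was used, and the function name and the tiny dataset are invented for illustration. A Random Forest would run this search over a random subset of descriptors at each node, whereas this sketch searches all of them.

```python
def best_split(rows, targets):
    """Find the single split that best separates the target values.

    rows    : list of descriptor lists, one per molecule
    targets : list of property values, one per molecule
    Returns (descriptor_index, threshold, summed_squared_error).
    """
    def squared_error(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    for j in range(len(rows[0])):
        distinct = sorted(set(row[j] for row in rows))
        # Candidate thresholds lie midway between neighbouring values.
        for lo, hi in zip(distinct, distinct[1:]):
            threshold = (lo + hi) / 2
            left = [t for row, t in zip(rows, targets) if row[j] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[j] > threshold]
            score = squared_error(left) + squared_error(right)
            if best is None or score < best[2]:
                best = (j, threshold, score)
    return best


# Hypothetical data: two descriptors per molecule, one property value each.
rows = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
targets = [-1.0, -1.1, -3.0, -3.2]
print(best_split(rows, targets))
```

On this toy data the first descriptor separates the two groups of values cleanly, so it is chosen for the split.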


Random Forest Generate more trees randomly. We use lots of them to make a forest!

Random Forest for Solubility Prediction A Forest of Regression Trees Each leaf contains a group of molecules with similar solubility.

Random Forest The molecules whose solubility is to be predicted are run through every tree (~ flow chart) in the forest. Each tree predicts a solubility for each molecule. We average the predictions over hundreds of different trees.
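The averaging step can be sketched with a deliberately tiny forest. Each "tree" here is only a one-split stump on a single descriptor, trained on its own bootstrap sample; the data, the function names and the stump simplification are all assumptions for illustration and not the model used in the presentation.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on one descriptor; return a predictor."""
    best = None
    distinct = sorted(set(xs))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        m_left, m_right = mean(left), mean(right)
        err = (sum((y - m_left) ** 2 for y in left)
               + sum((y - m_right) ** 2 for y in right))
        if best is None or err < best[0]:
            best = (err, t, m_left, m_right)
    if best is None:                 # sample had a single distinct x value
        overall = mean(ys)
        return lambda x: overall
    _, t, m_left, m_right = best
    return lambda x: m_left if x <= t else m_right


def fit_forest(xs, ys, n_trees, rng):
    """Train each stump on its own bootstrap sample of the data."""
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def forest_predict(forest, x):
    """Run x through every tree and average the individual predictions."""
    return mean(tree(x) for tree in forest)


# Hypothetical data: the property falls steadily as the descriptor rises.
xs = [float(i) for i in range(10)]
ys = [-0.5 * x for x in xs]
forest = fit_forest(xs, ys, n_trees=200, rng=random.Random(42))
print(forest_predict(forest, 1.0), forest_predict(forest, 8.0))
```

Each stump on its own gives a crude two-level answer; averaging over many stumps trained on different bootstrap samples gives a smoother prediction.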


Random Forest: Solubility Results
Test set (te): RMSE = 0.69, r^2 = 0.89, Bias = -0.04
Training set (tr): RMSE = 0.27, r^2 = 0.98, Bias = 0.005
Out-of-bag (oob): RMSE = 0.68, r^2 = 0.90, Bias = 0.01
DS Palmer et al., J. Chem. Inf. Model., 47, (2007)
These results are competitive with the best solubility prediction methods.
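For readers unfamiliar with the three statistics quoted above, the sketch below shows one common way of computing them. The exact definitions used in the cited paper are not given on the slides, so two choices here are assumptions: r^2 is taken as the squared Pearson correlation, and bias as the mean of predicted minus observed. The example values are invented.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between predictions and observations."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def bias(predicted, observed):
    """Mean signed error (assumed convention: predicted minus observed)."""
    return sum(p - o for p, o in zip(predicted, observed)) / len(observed)


def r_squared(predicted, observed):
    """Squared Pearson correlation between predictions and observations."""
    n = len(observed)
    mp, mo = sum(predicted) / n, sum(observed) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_o = sum((o - mo) ** 2 for o in observed)
    return cov ** 2 / (var_p * var_o)


# Hypothetical values: every prediction is 0.5 too high.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
print(rmse(predicted, observed), bias(predicted, observed),
      r_squared(predicted, observed))
```

The toy case shows why all three are reported together: a constant offset leaves the correlation perfect while still producing a non-zero RMSE and bias.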

What Have we Learned? For this particular problem, informatics does a bit better than pure theoretical chemistry.

How to Utilise Informatics Fast informatics models can be integrated into drug discovery to compute solubilities for molecules before deciding whether to synthesise them, saving much time and money on making useless compounds.

Fits into drug discovery pipeline here

Why Pursue Theory? Theory promises to give a greater understanding of why some molecules are more soluble than others. Advances in theory can be transferable to other contexts. Theoretical models can be systematically improved.