Computational Chemistry Robots, ACS Sep 2005

J. A. Townsend, P. Murray-Rust, S. M. Tyrrell, Y. Zhang

Can high-throughput computation provide a reliable "experimental" resource for molecular properties?
Can protocols be automated?
Can we believe the results?

Aspects of complete automation
Humans must validate protocols rather than individual data.
Even low error rates must be addressed.
Users should know the error rates and the degree of conformance.

Approaches to conformance
Explore the limits of job behaviour (run times, convergence, etc.).
Analyse reproducibility.
Vary parameters and algorithms and analyse their effects.
Compare the output with other "measurements" of the same quantity.

The overall view
molecules → computation → dissemination

The overall view
molecules → computation → dissemination
Check results

Components of the system
Workflow for management of jobs (Taverna)
Parsing of outputs based on natural language processing (JUMBOMarker)
Pairwise comparison of data sets (R)
Analysis of mean and variance
Detection and analysis of outliers

Computing the NCI database with MOPAC PM5(a)
(a) MOPAC PM5: collaboration with J.J.P. Stewart

Protocol
[Figure: flowchart. The protocol produces log files, which are parsed. Outcomes: system crashes; program crashes (inform developer); unsuitable data; science errors; pathological behaviour. Successful runs pass to analysis (statistics, other science) and then to dissemination of results.]
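The flowchart implies that every finished job is routed to exactly one branch. The sketch below shows one way a robot might do that routing. It is not the authors' code. It assumes a hypothetical `JobRecord` whose fields are already known from the job scheduler and the parser, and the mapping from fields to branches is itself an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobRecord:
    """Hypothetical summary of one finished job; field names are illustrative."""
    name: str
    exit_code: Optional[int]   # None if the job never returned (e.g. node failure)
    normal_termination: bool   # program reported a normal end
    parsed: bool               # log file converted to CML successfully
    converged: bool            # geometry optimisation converged


def triage(job: JobRecord) -> str:
    """Route a job to one branch of the protocol (assumed mapping)."""
    if job.exit_code is None:
        return "system crash"      # infrastructure problem: resubmit the job
    if job.exit_code != 0 or not job.normal_termination:
        return "program crash"     # candidate report to the program's developer
    if not job.parsed:
        return "unsuitable data"   # the log file could not be interpreted
    if not job.converged:
        return "science error"     # ran to completion but the result is unusable
    return "analysis"              # passes on to statistics and dissemination
```

Grouping a batch by `triage(job)` then gives per-branch counts, which are the error rates the earlier slide says users should know.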

Taverna
Workflow programs allow a series of small tasks to be linked together into more complex tasks.
Open source; myGRID, e-Science; European Bioinformatics Institute, University of Manchester
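Taverna is a graphical workflow system, and the sketch below is not its API. It is a plain-Python analogy of the idea on the slide: small single-purpose steps are chained so that each step's output feeds the next. The steps `read_lines`, `deduplicate` and `count` are hypothetical stand-ins for workflow processors.

```python
from functools import reduce
from typing import Any, Callable


def chain(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Link small steps into one larger task, applied left to right."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


# Hypothetical steps standing in for workflow processors.
def read_lines(text: str) -> list:
    return [line.strip() for line in text.splitlines() if line.strip()]


def deduplicate(items: list) -> list:
    return list(dict.fromkeys(items))  # keeps first-seen order


def count(items: list) -> int:
    return len(items)


pipeline = chain(read_lines, deduplicate, count)
```

Each step can be tested on its own and reused in other chains. That reusability is the main attraction of composing workflows from small tasks.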

An example Taverna workflow
[Figure: example Taverna workflow]

Parsing log files to CML
Computational chemistry log files → CML: coordinates, molecular formula, calculation type, point group, dipole, total energy
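The sketch below assumes a parser has already extracted coordinates and a few scalar properties from a log file. It shows how those values might be written as CML with Python's standard `xml.etree.ElementTree`. The element and attribute names follow CML: `molecule`, `formula`/`concise`, `atomArray`, `atom`, `elementType`, `x3`/`y3`/`z3`, `propertyList`, `property` and `scalar`. The `comp:` dictRef prefix, the property names and the unit labels are hypothetical. This is not how JUMBOMarker works internally.

```python
import xml.etree.ElementTree as ET
from collections import Counter

CML_NS = "http://www.xml-cml.org/schema"
ET.register_namespace("", CML_NS)  # serialise CML as the default namespace


def _q(tag):
    """Qualify a tag name with the CML namespace."""
    return f"{{{CML_NS}}}{tag}"


def hill_formula(elements):
    """CML 'concise' formula in Hill order: C, then H, then the rest alphabetically."""
    counts = Counter(elements)
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return " ".join(f"{e} {counts[e]}" for e in order)


def to_cml(atoms, properties):
    """atoms: list of (element, x, y, z) in Å.
    properties: dict of name -> (value, units or None) taken from a parsed log file."""
    root = ET.Element(_q("cml"))
    mol = ET.SubElement(root, _q("molecule"), id="m1")
    ET.SubElement(mol, _q("formula"), concise=hill_formula(a[0] for a in atoms))
    atom_array = ET.SubElement(mol, _q("atomArray"))
    for i, (el, x, y, z) in enumerate(atoms, start=1):
        ET.SubElement(atom_array, _q("atom"), id=f"a{i}", elementType=el,
                      x3=f"{x:.4f}", y3=f"{y:.4f}", z3=f"{z:.4f}")
    plist = ET.SubElement(mol, _q("propertyList"))
    for name, (value, units) in properties.items():
        # "comp:" is a hypothetical dictionary prefix used only for illustration.
        prop = ET.SubElement(plist, _q("property"), dictRef=f"comp:{name}")
        scalar = ET.SubElement(prop, _q("scalar"), {"units": units} if units else {})
        scalar.text = str(value)
    return ET.tostring(root, encoding="unicode")
```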

[Figure: general parsers convert CompChem output (coordinates, energy levels, vibrations, input/job control) into a CML file built from CMLCore, CMLComp and CMLSpect]

Dissemination of results
[Figure: log file → CML file (via JUMBOMarker, an NLP-based log file parser) → human display; WWMM* server and DSpace → outside world]
* World Wide Molecular Matrix

InChI: IUPAC International Chemical Identifier
A non-proprietary unique identifier for the representation of chemical structures.
A normalised, canonicalised and serialised form of a chemical connection table.
InChI FAQ:

Proteus molecules*
[Figure: a molecule passed through the calculation emerges as JUNK; annotated "Cured by MOPAC"]
* Proteus was a shape-changing sea deity.

Proteus molecules
[Figure: calculation input → JUNK]
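One way a robot might notice a "Proteus" molecule is to check whether the bonding pattern of the final structure differs from that of the input. The heuristic below is an assumption, not the method used in the talk. It infers a bond whenever two atoms are closer than the sum of their covalent radii plus a tolerance. Both the approximate radii and the 0.4 Å tolerance are assumed values.

```python
import math
from itertools import combinations

# Approximate covalent radii in Å (assumed values; extend for other elements).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}
TOLERANCE = 0.4  # Å added to the radius sum; an assumed cut-off


def bonds(atoms):
    """Set of bonded atom-index pairs inferred from interatomic distances.
    atoms: list of (element, x, y, z) in Å."""
    found = set()
    for (i, (ei, *pi)), (j, (ej, *pj)) in combinations(enumerate(atoms), 2):
        if math.dist(pi, pj) <= COVALENT_RADII[ei] + COVALENT_RADII[ej] + TOLERANCE:
            found.add((i, j))
    return found


def changed_connectivity(before, after):
    """True if the optimised structure is bonded differently from the input."""
    return bonds(before) != bonds(after)
```

Molecules flagged by `changed_connectivity` would go to a human for inspection rather than straight into the statistics.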

How do we know our results are valid?
[Figure: computational method 1, computational method 2 and experiment, each compared with the others]

J.J.P. Stewart's example
[Plot: calculated ΔHf − experimental ΔHf]

GAMESS
[Figure: MOPAC results → GAMESS(a), B3LYP/6-31G* → log files]
(a) Project with Kim Baldridge and Wibke Sudholt


Repeat runs, different methods
Multiple runs from the same input give the same final structure.
Changing the memory allocation makes no difference.
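Checking that repeat runs "give the same final structure" needs a comparison that ignores where the molecule sits and how it is oriented. The sketch below uses one simple option: compare the sorted lists of all interatomic distances, which are unchanged by rotation and translation. The 1e-3 Å tolerance is an assumed value. Matching distance lists are a necessary rather than a sufficient test of identity, and element labels are ignored here.

```python
import math
from itertools import combinations


def distance_fingerprint(coords):
    """Sorted list of all interatomic distances (Å); invariant to rotation and translation."""
    return sorted(math.dist(a, b) for a, b in combinations(coords, 2))


def same_structure(coords_a, coords_b, tol=1e-3):
    """Compare the final geometries of two runs to within tol Å (assumed threshold)."""
    fa, fb = distance_fingerprint(coords_a), distance_fingerprint(coords_b)
    return len(fa) == len(fb) and all(abs(x - y) <= tol for x, y in zip(fa, fb))
```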

Pathological behaviour: early detection
[Figure: B3LYP/6-31G* jobs on divinyl ether and trans-crotonaldehyde (Z-matrix input), with times marked at 15, 100 and 200 min]
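Early detection can be as simple as comparing a running job's elapsed time with the run times of comparable finished jobs (same method and basis set). The rule below flags a job once it has run longer than a fixed multiple of the median. It is a hypothetical policy, and the factor of 5 is an assumption to be tuned against real timing data.

```python
from statistics import median


def flag_slow_jobs(running, completed_minutes, factor=5.0):
    """running: dict of job id -> minutes elapsed so far.
    completed_minutes: run times of comparable finished jobs.
    Returns ids of jobs exceeding factor x the median run time (assumed rule)."""
    limit = factor * median(completed_minutes)
    return sorted(job for job, elapsed in running.items() if elapsed > limit)
```

Using the median rather than the mean keeps the limit from being dragged upward by the very slow jobs the rule is meant to catch.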

Times to run jobs

Analysis of different computational methods
Mean: the overall difference.
Normality: the distribution of values.
Outliers: unusual molecules?
Variance (standard deviation): the spread of the data, which depends on both distributions.
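The quantities on this slide can be computed directly from paired differences, such as r(MOPAC) − r(GAMESS) for the same bond. The sketch below uses Python's `statistics` module. For outliers it substitutes one common rule, the Iglewicz–Hoaglin modified z-score with a 3.5 cut-off. The talk does not say which outlier criterion the authors used.

```python
from statistics import mean, median, stdev


def summarise(deltas, z_cut=3.5):
    """deltas: paired differences in Å (e.g. MOPAC minus GAMESS for the same bond)."""
    n = len(deltas)
    m, s = mean(deltas), stdev(deltas)
    med = median(deltas)
    mad = median(abs(d - med) for d in deltas)  # median absolute deviation
    # Modified z-score: 0.6745 rescales the MAD to match a normal standard deviation.
    outliers = [i for i, d in enumerate(deltas)
                if mad > 0 and abs(0.6745 * (d - med) / mad) > z_cut]
    return {"n": n, "mean": m, "sd": s, "sem": s / n ** 0.5, "outliers": outliers}
```

The median and MAD are used for the outlier test because a few extreme values would inflate the ordinary standard deviation and hide themselves.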

Probability plot (normal Q-Q plot)

Probability plot (normal Q-Q plot)
Annotations: mean of the distribution (approx., Å); range over which the sample distribution is approximately normal; outliers; S.D. (Å)
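A normal Q-Q plot pairs each ordered sample value with the standard-normal quantile it would have if the data were normal. The range where the points follow a straight line is the "approximately normal" range, and points that peel away at the ends are the outliers. In R, `qqnorm()` draws this plot directly. The Python sketch below computes the points with `statistics.NormalDist`. The (i − 0.5)/n plotting positions are one convention among several.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """(theoretical standard-normal quantile, ordered sample value) pairs for a Q-Q plot.
    Plotting positions (i - 0.5) / n are one common convention among several."""
    xs = sorted(sample)
    n = len(xs)
    z = NormalDist()  # standard normal: mu = 0, sigma = 1
    return [(z.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]


def reference_line(sample):
    """Line y = mean + sd * q; points close to it suggest approximate normality."""
    m, s = mean(sample), stdev(sample)
    return lambda q: m + s * q
```

On such a plot, the intercept of the reference line estimates the mean and its slope estimates the S.D., which matches the two values annotated on the slide.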

All bonds*: Δr (MOPAC − GAMESS) / Å
* Excludes bonds to hydrogen

All bonds*: Δr (MOPAC − GAMESS) / Å
Annotations: good agreement; nearly normal; outliers; S.D. (Å)
* Excludes bonds to hydrogen

Bad molecules and data usually cause outliers
[Figure: example structure containing Na, P, O and H atoms]

[Table: mean Δr (M − G) / Å and standard error of the mean / Å, by element pair (C, N, O, F, S, Cl against C, N, O)]
All values given to 3 significant figures.
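A table like this can be produced by grouping the paired differences by element pair and computing the mean and the standard error of the mean (SD/√n) for each group. In the sketch below, the `(element_a, element_b, delta_r)` record layout and the "C-N" key format are assumptions. Pairs with a single bond are skipped because the SEM needs at least two values.

```python
from collections import defaultdict
from statistics import mean, stdev


def by_bond_type(records):
    """records: iterable of (element_a, element_b, delta_r) with delta_r in Å.
    Returns {bond type: (mean, standard error of the mean)} to 3 significant figures."""
    groups = defaultdict(list)
    for a, b, d in records:
        groups["-".join(sorted((a, b)))].append(d)  # C-N and N-C are the same type
    table = {}
    for bond, ds in sorted(groups.items()):
        if len(ds) < 2:
            continue  # the SEM needs at least two values
        sem = stdev(ds) / len(ds) ** 0.5
        table[bond] = (float(f"{mean(ds):.3g}"), float(f"{sem:.3g}"))
    return table
```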

Δr C–C bonds (M − G) / Å

Δr C–C bonds (M − G) / Å
Annotations: good agreement; nearly normal; outliers; S.D. (Å); JUNK

Selection of molecules with C–C Δr (M − G) > 0.05 Å
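The selection on this slide is a simple filter over per-bond differences. The sketch below assumes a hypothetical record layout of `(molecule_id, bond_type, delta_r)`. Like the slide, it uses the signed difference Δr > 0.05 Å rather than its absolute value.

```python
def select_molecules(bond_deltas, bond_type="C-C", threshold=0.05):
    """bond_deltas: iterable of (molecule_id, bond_type, delta_r), delta_r = r(M) - r(G) in Å.
    Returns sorted ids of molecules with at least one bond of bond_type above threshold."""
    return sorted({mol for mol, kind, d in bond_deltas
                   if kind == bond_type and d > threshold})
```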

[Plot with a Y = X reference line] Non-aromatic C–C bonds adjacent to CFn

Δr N–N bonds (M − G) / Å

Δr N–N bonds (M − G) / Å
Annotations: good agreement; nearly normal; kink; S.D. (Å)

Density plot of Δr N–N bonds (M − G) / Å

Density plot of Δr N–N bonds (M − G) / Å, divided into LEFT and RIGHT sets
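The LEFT/RIGHT split can be made reproducible by estimating the density and cutting at its lowest point between the two highest peaks. In R, `density()` provides the kernel estimate. The Python sketch below writes a Gaussian kernel density estimate out by hand. The bandwidth must be chosen by the user (an assumption here), and the grid search for peaks is a deliberately simple approach.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return a function estimating the probability density of sample with Gaussian kernels."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)

    return density


def split_at_valley(sample, bandwidth, steps=200):
    """Split a bimodal sample at the lowest density between its two highest peaks."""
    f = gaussian_kde(sample, bandwidth)
    lo, hi = min(sample), max(sample)
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    dens = [f(x) for x in grid]
    peaks = [k for k in range(1, steps) if dens[k - 1] < dens[k] >= dens[k + 1]]
    if len(peaks) < 2:
        return list(sample), []  # unimodal at this bandwidth: nothing to split
    p1, p2 = sorted(sorted(peaks, key=dens.__getitem__)[-2:])
    cut = grid[min(range(p1, p2 + 1), key=dens.__getitem__)]
    left = [s for s in sample if s <= cut]
    right = [s for s in sample if s > cut]
    return left, right
```

The result depends on the bandwidth: too wide and the two modes merge, too narrow and noise creates spurious peaks.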

Most common fragments found in the left set but not the right set
[Figure: fragments involving C(sp3), S(sp2), N(ar) and C(sp2) atoms, shown as alternatives ("or")]
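Once each molecule has been reduced to a set of fragment labels, finding fragments that occur in the left set but never in the right set is a counting exercise. The sketch below uses `collections.Counter`. The fragment labels in the usage test are hypothetical strings. How fragments are perceived from structures is outside this sketch.

```python
from collections import Counter


def fragments_enriched_in(left, right, top=3):
    """left, right: lists of fragment-label sets, one set per molecule.
    Returns the most common fragments seen in the left group but never in the right."""
    left_counts = Counter(f for frags in left for f in frags)
    right_seen = set().union(*right)
    only_left = Counter({f: n for f, n in left_counts.items() if f not in right_seen})
    return only_left.most_common(top)
```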

Comparison of theory and experiment
[Figure: CIF* files converted by CIF2CML and compared with GAMESS log files]
* CIF: Crystallographic Information File
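A CIF-to-CML converter has to handle CIF's way of writing numbers. A standard uncertainty in the last digits follows the value in parentheses, so `1.5432(3)` means 1.5432 ± 0.0003. The sketch below is not the CIF2CML tool. It parses plain decimal values of that form and collects the `_cell_length_*` and `_cell_angle_*` items from simple one-line "tag value" entries. It deliberately ignores exponent notation, loops, quoted strings and the `?`/`.` placeholders.

```python
import re

# Plain decimal CIF value with an optional standard uncertainty, e.g. 1.5432(3).
# Exponent notation is deliberately not handled in this sketch.
_NUM = re.compile(r"^([+-]?)(\d*)(?:\.(\d*))?(?:\((\d+)\))?$")


def parse_cif_number(text):
    """Return (value, standard_uncertainty or None) for a plain CIF number."""
    m = _NUM.match(text.strip())
    if not m or not (m.group(2) or m.group(3)):
        raise ValueError(f"not a plain CIF number: {text!r}")
    sign, whole, frac, su = m.groups()
    value = float(f"{sign}{whole or '0'}.{frac or '0'}")
    if su is None:
        return value, None
    # The uncertainty applies to the last quoted decimal place.
    return value, int(su) / 10 ** len(frac or "")


def parse_cell(lines):
    """Collect _cell_length_* and _cell_angle_* items from simple 'tag value' lines."""
    cell = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith(("_cell_length_", "_cell_angle_")):
            cell[parts[0]] = parse_cif_number(parts[1])
    return cell
```

Keeping the uncertainty alongside each value matters when comparing crystal and computed geometries, since some Δr values may be no larger than the experimental error.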

Reading Acta Crystallographica Section E

All bonds*: Δr (Cryst. − GAMESS) / Å; single molecules, no disorder
* Excludes bonds to hydrogen

All bonds*: Δr (Cryst. − GAMESS) / Å; single molecules, no disorder
Annotations: mean Δr (Å); nearly normal; outliers; S.D. (Å)
* Excludes bonds to hydrogen

Δr C–C bonds (C − G) / Å

Δr C–C bonds (C − G) / Å
Annotations: mean Δr (Å); nearly normal; S.D. (Å)

Δr C–O bonds (C − G) / Å

Δr C–O bonds (C − G) / Å
Annotations: good agreement; nearly normal; outliers?; S.D. (Å)

Chemistry can cause outliers
[Figure: an outlier caused by H movement, labelled with its Δr / Å]

Conclusions
Protocols can be automated.
Machines can highlight unusual behaviour, geometries and distributions of results for humans to consider.
Computational programs can provide high-quality "experimental" molecular properties.

Thanks
J.J.P. Stewart, Kim Baldridge, Wibke Sudholt, Simon Tyrrell, Yong Zhang, Peter Murray-Rust, Unilever

Questions
Homepage:
InChI FAQ:
R:
Taverna:
MOPAC 2002:
GAMESS: