Quality of Protein Crystal Structures in the PDB. Eric N. Brown, Lokesh Gakhar and S. Ramaswamy.

Between objectivity and subjectivity. Carl-Ivar Brändén & T. Alwyn Jones, Department of Molecular Biology, Uppsala Biomedical Center, PO Box 590, S Uppsala, Sweden. Protein crystallography is an exacting trade, and the results may contain errors that are difficult to identify. It is the crystallographer's responsibility to make sure that incorrect protein structures do not reach the literature. Nature 343, (22 February 1990).

Amplitudes and Phases - Bias. Animal stories - by Kevin Cowtan

Amplitudes and Phases - Bias. More animal stories.

Stolen from Bernhard Rupp website without permission

How much of what we think? Stolen from James Holton, Berkeley, without permission.

STRUCTURE VALIDATION. Validation based on geometry: WHATIF, PROCHECK, MOLPROBITY, Ramachandran plot. Validation based on fit to data: R-factor/R-free, real-space fit, etc. Problem: the data-to-parameter ratio. Add geometric restraints, i.e. chemical knowledge. Composite validation: ASTRAL, SPACI.

WHY MORE? DON'T WE HAVE ENOUGH VALIDATION TOOLS? WHAT IS COMMON BETWEEN ALL EXISTING VALIDATION TECHNIQUES? THEY ASSUME THERE IS AN ABSOLUTE CORRECT ANSWER. WE KNOW THERE IS NO CORRECT ANSWER.

THINK DIFFERENTLY. All crystallographers want to deposit the correct structure. There is subjectivity and bias, all of which are random. AVERAGE IS BEST!

QUALITY & AVERAGE. How different you are from the average is a measure of quality. HOW DO YOU DESCRIBE THE AVERAGE?

Quality of Model
Independent variables: date submitted to PDB, maximum resolution, X-ray source, number of atoms, similarity index, cross terms.
Dependent variables: R-factor, R-free, real-space R-value, real-space CC, outliers, Ramachandran violations.

Predictive Models. Example: how to determine the weight of a 5'7" male. Make up an equation, choose a group of males, fit the equation to their weights, evaluate the equation.
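The four-step recipe above is ordinary least-squares regression. A minimal sketch in Python, assuming a linear equation and using made-up, noise-free heights (inches) and weights (pounds) purely for illustration; `fit_line` and `rms_error` are hypothetical helper names:

```python
def fit_line(xs, ys):
    """Least-squares fit of ys = a * xs + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def rms_error(xs, ys, a, b):
    """Evaluate the fitted equation: root-mean-square residual over the group."""
    return (sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


# Step 1: make up an equation (assumed linear): weight = a * height + b.
# Step 2: choose a group of males (synthetic heights in inches, weights in pounds).
heights = [64, 66, 68, 70, 72]
weights = [136, 144, 152, 160, 168]

# Step 3: fit the equation to the group.
a, b = fit_line(heights, weights)

# Step 4: evaluate the equation, then use it for a 5'7" (67 in) male.
error = rms_error(heights, weights, a, b)
predicted = a * 67 + b
```

Because the made-up data lie exactly on a line, the fit recovers a = 4 and b = -120 and predicts 148 lb at 67 in with zero RMS error; real data would leave a non-zero error, which is what step 4 measures.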

Open problems. Which independent variables? Quality = f(resolution), or Quality = f(resolution, date, X-ray source)? Which equation? $\text{Quality} = a \cdot \text{res} + b \cdot \text{date} + c$, or $\text{Quality} = a \cdot \text{res} + b \log_2(\text{date}) + c$? How to fit it to observations? Least squares vs. maximum likelihood; outliers.
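On the last open problem: least squares is itself maximum likelihood under Gaussian errors, and changing the assumed error distribution changes how strongly outliers pull the fit. A minimal sketch for the simplest possible model, a single constant (the "average" value of one metric): Gaussian errors give the mean, Laplace (double-exponential) errors give the median. The Laplace alternative is an illustrative assumption, not something the slides specify, and the R-free values are made up.

```python
import statistics


def ml_constant(values, errors="gaussian"):
    """Maximum-likelihood estimate of a constant model y = c.

    Gaussian errors: maximising the likelihood minimises the sum of squared
    residuals, whose minimiser is the mean (least squares).
    Laplace errors: maximising the likelihood minimises the sum of absolute
    residuals, whose minimiser is the median.
    """
    if errors == "gaussian":
        return statistics.mean(values)
    if errors == "laplace":
        return statistics.median(values)
    raise ValueError(f"unknown error model: {errors}")


# Hypothetical R-free values; the last one is an outlier.
r_free = [0.20, 0.21, 0.19, 0.22, 0.60]
least_squares = ml_constant(r_free, "gaussian")  # pulled up to about 0.284
robust = ml_constant(r_free, "laplace")          # stays at 0.21
```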

Choose the model based on log-likelihood (LL). Pick all available metrics (R-factor, R-free, etc.) and, for each metric: start with Metric = a × resolution + c; add or remove terms iteratively to improve the LL; use BIC to decide whether a new parameter contributes a significant improvement in LL or not. RESULT: an equation that predicts a given metric. Data: all structures in the PDB that have all independent and dependent variables (16,609).
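The selection loop can be sketched as forward selection under BIC = k ln(n) - 2 LL, where k counts fitted parameters and LL is the Gaussian log-likelihood of each least-squares fit. This is a minimal pure-Python sketch, assuming Gaussian residuals and only adding terms (the slides also remove them); the candidate names `year` and `junk` and the synthetic data are hypothetical, not the authors' variables.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def bic(columns, y):
    """Least-squares fit of y on an intercept plus `columns`; return its BIC."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    beta = solve(xtx, xty)  # normal equations
    rss = sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    sigma2 = rss / n                                       # ML error variance
    ll = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)   # Gaussian LL at the MLE
    k = p + 1                                              # coefficients + variance
    return k * math.log(n) - 2 * ll


def forward_select(candidates, y, start):
    """Greedily add the term that lowers BIC most; stop when none does."""
    chosen = list(start)
    best = bic([candidates[c] for c in chosen], y)
    while True:
        trials = {name: bic([candidates[c] for c in chosen + [name]], y)
                  for name in candidates if name not in chosen}
        if not trials:
            break
        name, score = min(trials.items(), key=lambda kv: kv[1])
        if score >= best:
            break
        chosen.append(name)
        best = score
    return chosen, best


# Hypothetical data: the metric depends on resolution and year, not on "junk".
rng = random.Random(0)
n = 200
candidates = {
    "resolution": [rng.uniform(1.0, 3.5) for _ in range(n)],
    "year": [rng.uniform(0.0, 20.0) for _ in range(n)],
    "junk": [rng.gauss(0.0, 1.0) for _ in range(n)],
}
metric = [0.15 + 0.06 * r - 0.004 * t + rng.gauss(0.0, 0.01)
          for r, t in zip(candidates["resolution"], candidates["year"])]
chosen, score = forward_select(candidates, metric, start=["resolution"])
```

The ln(n) penalty is what makes BIC reject a term whose LL gain is no larger than chance would produce; a pure-noise term like `junk` is usually, though not always, left out.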

EQUATIONS FOR METRICS!

INFORMATION INHERENT IN THE MODEL. The model can tell us immediately which independent variables affect which metrics (dependent variables), and by how much. Examples: R-factor vs. time; R-factor vs. source and resolution.

UNEXPLORED QUESTIONS IN THE MODEL? Unexplored independent variables: R-sym and redundancy; space group and unit-cell volume; refinement protocol; solvent modeling and B-factor modeling; temperature of data collection; complexity, as a function of the number of macromolecular chains.

Nine metrics to ONE. Principal component analysis: we took the nine metrics and combined them into one metric, accounting for correlations and redundancy. We call this single metric the Quality value (Q-value). By construction, the Q-value of the average is zero and its standard deviation is one. Negative numbers mean better than average; positive numbers mean worse than average.
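The construction can be sketched as: standardize each metric, take the first principal component of their correlation matrix, and rescale the component scores to mean 0 and standard deviation 1. A minimal pure-Python sketch using power iteration; it assumes every metric has already been oriented so that larger means worse (a real-space CC, for instance, would be negated first), and it does not reproduce the authors' exact procedure, such as whether the PCA is applied to raw metrics or to residuals from the fitted equations. `q_values` and the numbers below are hypothetical.

```python
import math
import statistics


def standardize(xs):
    """Rescale a list to mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mean) / sd for x in xs]


def q_values(metrics, iterations=500):
    """Combine metric columns into one score via the first principal component.

    metrics: list of columns (one per metric), each oriented so larger = worse.
    Returns (q, loadings): q has mean 0 and SD 1; negative = better than average.
    """
    z = [standardize(col) for col in metrics]
    p, n = len(z), len(z[0])
    # Correlation matrix of the standardized metrics.
    corr = [[sum(z[a][i] * z[b][i] for i in range(n)) / n for b in range(p)]
            for a in range(p)]
    # Power iteration converges to the leading eigenvector (first principal axis).
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # An eigenvector's sign is arbitrary; orient it so worse metrics give higher Q.
    if sum(v) < 0:
        v = [-x for x in v]
    scores = [sum(v[a] * z[a][i] for a in range(p)) for i in range(n)]
    return standardize(scores), v


# Hypothetical metrics for four structures: R-factor, R-free, Ramachandran outliers.
r_factor = [0.18, 0.20, 0.22, 0.25]
r_free = [0.22, 0.25, 0.27, 0.31]
rama_outliers = [1, 2, 2, 4]
q, loadings = q_values([r_factor, r_free, rama_outliers])
```

Here the fourth structure, worst on every metric, gets the highest (worst) Q and the first gets the lowest.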

USE OF THE MODEL. COMPARE STRUCTURES WITH THE AVERAGE, INDIVIDUALLY AND AS A GROUP. The Q-value is now independent of all the independent variables used to build the model (resolution, number of atoms, date of data collection, novelty of structure, etc.). It is a better indicator of quality than any single dependent variable.
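Because the Q-value is scaled to mean 0 and standard deviation 1, comparing a group of structures with the average reduces to a simple test on the group's mean Q. A minimal sketch, assuming the group's structures are independent and share the overall standard deviation of 1; `group_z` is a hypothetical helper and the Q-values are made up.

```python
import math
import statistics


def group_z(q_group):
    """z-statistic for a group's mean Q against the average (mean 0, SD 1).

    If structures are independent, the standard error of the mean is
    1 / sqrt(n), so z = mean(Q) * sqrt(n). Negative z: better than average.
    """
    n = len(q_group)
    return statistics.mean(q_group) * math.sqrt(n)


# Hypothetical Q-values for the structures deposited by one group.
z = group_z([-0.4, -0.6, -0.5, -0.5])
```

Here the group averages half a standard deviation better than the average (z = -1.0), but with only four structures that is just one standard error from zero. Structures from the same group may well not be independent, which this sketch ignores.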

STRUCTURAL GENOMICS (updated - Jan 2008)

MCSG over Time!

MORE-SG groups!

Quality Vs. Journals

WHAT CAN WE DO? Beam lines. Best practices. Protocols and methodologies. Countries. Institutions. Funding mechanisms. Investigators.

Is this the best we can do?

WE CAN DO BETTER. We can improve the quality of structures through better design of experiments and refinement protocols, if we know which independent variables affect which dependent variables, and how. BEFORE WE DO THIS, FIX THE PROBLEMS WE FOUND: too much dependence on external databases; problems with unknown atoms. Develop methods for missing-data correction.

OTHER DATABASES: NMR. Some thoughts on independent variables: spectrometers; samples (size, tags, buffers, etc.); completeness of assignments (percentage of backbone assigned, etc.); actual data used in structure calculations (NOE distance restraints, hydrogen-bond distance restraints [experimental vs. inferred], torsion-angle restraints, dipolar coupling restraints, paramagnetic restraints); structural statistics; date of structure determination; relaxation measurements?

OTHER DATABASES: NMR. DEPENDENT VARIABLES: RMS deviation of the ensemble; packing (MolProbity score?); Ramachandran violations; recall, precision, F-measure (Huang, Powers and Montelione); agreement with high-resolution X-ray structures; other?
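For the recall/precision/F-measure entry: the F-measure is conventionally the harmonic mean of recall and precision. A minimal sketch of that generic formula only; how Huang, Powers and Montelione define recall and precision for an NMR structure against its experimental data is not reproduced here, and `f_measure` is a hypothetical helper.

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision (F1); 0 when both are 0."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# Example: recall 0.8 and precision 0.6 give an F-measure of about 0.686.
score = f_measure(0.8, 0.6)
```

The harmonic mean is dominated by the smaller of the two values, so a model cannot score well by maximizing recall at the expense of precision, or vice versa.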

AFTER TODAY'S LECTURES: HOW ABOUT THE MODEL DATABASE? I am sure our modeling experts can think of the dependent and independent variables….

THANK YOU. ACKNOWLEDGEMENTS: X-ray work: Eric N. Brown and Lokesh Gakhar. The R statistical package! NMR work: Liping Yu and Andrew Fowler. Thanks to Brian Fox for inviting me, though I am not a member of any SG initiative.

Questions and Accusations.