Automatic Structure Determination: given a data set, solve the structure quickly and better, by using a parallel workflow engine to automatically and systematically search algorithm/program and parameter space

Presentation transcript:

Automatic Structure Determination: given a data set, solve the structure quickly and better, by using a parallel workflow engine to automatically and systematically search algorithm/program and parameter space. Zheng-Qing (Albert) Fu. SER-CAT, APS, Argonne National Laboratory. Biochem. & Mol. Biology, Univ. of Georgia, Athens, Georgia. 2007 ACA Summer School

What we learnt from Structural Genomics. Stages shown: All Targets, Cloned (7%), Crystals (33%), Structures. Overall success rate (from clone to structure): 2.45%

Key to Success. From gene to final structure, crystallographic analysis of protein structures is a complicated multi-step, multi-discipline, costly, and systematic engineering project. Steps shown: Gene, Protein Prep, Crystallization, Data Collection, Data Processing, Phasing, Map Tracing, Refinement, Structure. Bottleneck: data collection, data processing and the structure-solving process (intensive computing), which are tedious and time consuming. Fu (2002): Diffraction Methods in Structural Biology, Gordon Research Conferences. New London, CT, USA.

Why Automation? Reason #1: Automation may optimize the steps of the whole process, and thus improve the success rate and the accuracy of the final structure.

Why Automation? Reason #2: Structural biology in the post-genomics era challenges X-ray crystallography to provide better hardware, better software and better full services. >> Every structural biologist was also an excellent crystallographer. >> Most of the new-generation structural biologists know, if anything at all, only some basic concepts of crystallography. They depend on other people's recipes, and at most learn how to run a bunch of computer programs. Do they want to, or have the ability to, solve new problems related to crystallography?

Why Automation? Reason #3: Even experienced crystallographers may make careless mistakes. Example: blood coagulation inhibitor, a small protein containing 12 Cys. Source: venom of habu (rattlesnake). A good target for S phasing. Native data were collected at both the home source and the SER-CAT synchrotron beamline. Synchrotron source (1.74 Å); home Cr source (2.29 Å). Automation may help avoid unrecoverable mistakes that can happen at any step of the complicated process.

Automation of Part of the Whole Process, from Data Collection to Structure Solving: Feasibility and Current Implementation. Parts covered: Data Acquisition & Processing; Structure-Solving Process

During data collection, any problem with the diffraction system, such as with the X-ray source, shutter, goniometer and stage, detector, crystal mounting, or other mechanical, optical or electronic defects, can ruin the data quality, leading to failure of the whole process. 1). How to detect and avoid these problems before it is too late?

In addition to the unexpected problems, there are many other issues during data collection: 2). Is the diffraction quality acceptable? 3). Is the data quality still improving? 4). Is the data collected enough to solve the structure? 5). Should we continue collecting more frames, or is it better to mount another fresh crystal? All these questions can be answered if and only if we know how to monitor the signal/noise ratio during data collection.

A New Statistical Index, Ras, to More Objectively and Accurately Evaluate Signal/Noise Ratio 1)

$$ R_{as} = \frac{\alpha_a}{\alpha_c}, \qquad \alpha_a = \left\langle \frac{|\Delta I|}{\sigma_I} \right\rangle_a, \qquad \alpha_c = \left\langle \frac{|\Delta I|}{\sigma_I} \right\rangle_c $$

Here $\alpha_a$ is the ratio of the Bijvoet difference to the standard error in intensity, calculated using acentric reflections. $\alpha_c$ is statistically evaluated in the same way as $\alpha_a$, but using centric reflections. Theoretically, it should be zero. $\alpha_c$ is the counterpart of $\alpha_a$, and thus can serve as the indicator of noise level. Ras, thus defined, can serve as a signal/noise ratio in terms of anomalous scattering. The higher the better. Tests show that it is more objective and reliable than other indices currently used for measuring anomalous signal. 1). Fu et al. (2004). Acta Cryst D60:
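As a rough illustration of the index described above, the following Python sketch computes Ras from lists of Bijvoet differences and intensity standard errors. It assumes each term is the mean of |ΔI|/σ(I) over the reflections in its class; the exact averaging used in the cited paper may differ, and the function names and numbers are made up for this example.

```python
from statistics import mean

def mean_ratio(bijvoet_diffs, sigmas):
    """Average |delta I| / sigma(I) over one class of reflections (assumed form)."""
    return mean(abs(d) / s for d, s in zip(bijvoet_diffs, sigmas))

def ras(acentric, centric):
    """Signal/noise index: acentric term divided by centric term.

    Each argument is a (bijvoet_diffs, sigmas) pair of equal-length lists.
    """
    signal = mean_ratio(*acentric)   # anomalous signal plus noise
    noise = mean_ratio(*centric)     # ideally zero, so it reflects noise only
    return signal / noise

# Made-up numbers, for illustration only.
acentric = ([4.0, -6.0, 5.0], [2.0, 2.0, 2.0])
centric = ([1.0, -1.0, 2.0], [2.0, 2.0, 2.0])
value = ras(acentric, centric)
```

A value well above 1 would suggest that the acentric Bijvoet differences exceed what noise alone produces in the centric reflections.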

Signal-based Data Collection. With Ras as a reliable indicator, diffraction data can be acquired more appropriately for a given crystal by monitoring the signal/noise ratio throughout the data collection.
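One way to picture signal-based data collection is a small decision rule applied after each batch of frames. The Python sketch below is hypothetical: the function name, the `target` and `min_gain` thresholds, and the returned messages are assumptions for illustration, not values or logic taken from the slides.

```python
def advise(ras_history, target=2.0, min_gain=0.05):
    """Suggest the next action from Ras values computed after each batch of frames.

    target and min_gain are hypothetical thresholds chosen for this sketch.
    """
    if not ras_history:
        return "continue"                      # nothing measured yet
    if ras_history[-1] >= target:
        return "stop: signal is sufficient"
    if len(ras_history) >= 2 and ras_history[-1] - ras_history[-2] < min_gain:
        return "stop: signal no longer improving, consider a fresh crystal"
    return "continue"

# Made-up Ras trajectories.
improving = advise([1.0, 1.2, 1.5])
plateau = advise([1.5, 1.51])
enough = advise([1.6, 2.1])
```

The three outcomes correspond to the questions raised during data collection: whether quality is still improving, whether enough data have been collected, and whether to mount another crystal.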

Structure-Solving Process

After the data are processed, we face a different set of issues in the structure-solving process. 1). There are numerous programs (or algorithms) to choose from. A program may outperform others in some cases and vice versa. Which programs to use? 2). Each program has multiple parameters. Which parameters to adjust? What combination of the parameters gives the best result? 3). If phasing produced a traceable map, is it the best map to work on for fitting and refining to complete the structure?

For a given data set, combinations of different programs or parameter settings can produce totally different results. Some may succeed in giving a solution, but many others will fail 1). Figure: test results on solving the structure of a hydrolase protein (864 AAs, 30 Se). The 2.8 Å data was provided by Dr. Turner. Green dots are the percentages of residues automatically traced from maps generated by phasing with different programs (SHELXD, ISAS, SOLVE, RESOLVE) and parameter settings. Pink represents the resolution cutoff for heavy-atom site searching: solid squares indicate SHELXD, open ones SOLVE. Blue represents the resolution cutoff for phasing and density modification: solid diamonds indicate SOLVE/RESOLVE, open ones ISAS. The current common trial-and-error practice in solving a structure is time-consuming and tedious. It may not give the best solution, and may even fail to find any solution at all for data of marginal quality. 1). Fu, Rose, Wang (2005): Acta Cryst D61:
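To give a sense of how quickly the program and parameter space grows, the Python sketch below enumerates every combination of site-search program, phasing program and resolution cutoff. The program names come from the slide; the cutoff values, the dictionary keys and the `traced_fraction` scoring field are assumptions made for illustration.

```python
from itertools import product

site_search_programs = ["SHELXD", "SOLVE"]
phasing_programs = ["SOLVE/RESOLVE", "ISAS"]
site_cutoffs = [3.0, 3.5, 4.0]   # hypothetical site-search cutoffs, in angstroms
phasing_cutoffs = [2.8, 3.0]     # hypothetical phasing cutoffs, in angstroms

def enumerate_trials():
    """Yield one trial description per program/parameter combination."""
    for site_prog, site_cut, phase_prog, phase_cut in product(
        site_search_programs, site_cutoffs, phasing_programs, phasing_cutoffs
    ):
        yield {
            "site_search": site_prog,
            "site_cutoff": site_cut,
            "phasing": phase_prog,
            "phasing_cutoff": phase_cut,
        }

def best_trial(results):
    """Pick the result with the largest fraction of residues traced."""
    return max(results, key=lambda r: r["traced_fraction"])

trials = list(enumerate_trials())
```

Even this tiny grid yields 24 trials, which is why running them by hand one at a time is tedious and why a systematic, automated search is attractive.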

Parallel Workflow Engine to systematically search program and parameter spaces to find the best solution for given data. Figure 1. The dark blocks represent parallel tasks dynamically generated from various crystallographic computing programs with different parameter settings. The tasks are distributed by the workflow engine to the computing facility and run in parallel. Upon completion, the workflow engine harvests and analyzes the results, and dynamically creates and starts another group of tasks for the next step, and so on until the whole process finishes. Fu (2003). Proceedings of the 5th Int. Conference on Mol. Struct. Biology. Vienna, Austria, Sept Fu et al. (2005). Acta Cryst D61:
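The run, harvest and regenerate loop described in the figure caption can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the engine from the cited work: tasks are plain Python callables rather than crystallographic programs, a thread pool stands in for the computing facility, and the stage functions, program labels and scores are invented.

```python
from concurrent.futures import ThreadPoolExecutor

def run_stage(tasks, max_workers=4):
    """Run one group of tasks in parallel and harvest their results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda task: task(), tasks))

def run_workflow(stage_builders, initial=None):
    """For each step: build tasks from the best result so far, run, analyze."""
    best = initial
    for build_tasks in stage_builders:
        results = run_stage(build_tasks(best))
        best = max(results, key=lambda r: r["score"])
    return best

# Toy stages with invented scores (stand-ins for real programs).
def site_search_stage(_previous):
    return [lambda c=c: {"cutoff": c, "score": 1.0 / c} for c in (3.0, 3.5, 4.0)]

def phasing_stage(previous):
    options = (("program_A", 0.4), ("program_B", 0.7))
    return [
        lambda p=p, s=s: {"sites_cutoff": previous["cutoff"], "program": p, "score": s}
        for p, s in options
    ]

final = run_workflow([site_search_stage, phasing_stage])
```

Each stage only sees the best result of the previous one here; a fuller design might carry several promising candidates forward instead of a single winner.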

Algorithm and Design

Where are we?

Acknowledgment. George Wu and many Ph.D. students including Dongsheng Che, Jizhen Zhao, Feng Sun, Haijin Yan, Dept. of Computer Sciences, UGA; B.C. Wang, John Rose, SER-CAT, SECSG, UGA; John Chrzas, Zhongmin Jin, Jim Fait, SER-CAT, APS; Andy Howard, Illinois Institute of Technology; Robert Sparks, Bruker (formerly Siemens) AXS Inc.; Xuong Nguyen-Huu, UC San Diego; George Sheldrick, University of Göttingen, Germany; Randy Read, Cambridge University, England; Tom Terwilliger, Los Alamos National Lab; Peter Briggs (CCP4, England); and the authors of all the programs plugged into SGXPro. Work is supported in part with funds from the National Institutes of Health (GM62407) and SER-CAT, APS