Reusing phenix.refine for powder data? Ralf W. Grosse-Kunstleve Computational Crystallography Initiative Lawrence Berkeley National Laboratory Workshop.

Presentation transcript:

Reusing phenix.refine for powder data?
Ralf W. Grosse-Kunstleve
Computational Crystallography Initiative
Lawrence Berkeley National Laboratory
Workshop on developments and directions of powder diffraction on proteins, June 22/23, 2007

My two lives
Life 1 (PhD project):
  – Zeolite structure determination from powder data using extracted intensities
Life 2:
  – Contributions to Xplor/CNS
      Single-crystal protein crystallography
      About 80% of all PDB entries refined with Xplor/CNS
  – Phenix project
      Fresh start after losing a legal battle

Phenix Collaboration
Funding: NIH Program Project (NIGMS, PSI), Director: Paul Adams
Computational Crystallography Initiative (LBNL): CCI APPS
  – Paul Adams, Ralf Grosse-Kunstleve, Pavel Afonine
  – Nigel Moriarty, Nicholas Sauter, Peter Zwart
Los Alamos National Lab (LANL): SOLVE / RESOLVE
  – Tom Terwilliger, Li-Wei Hung
Cambridge University: PHASER
  – Randy Read, Airlie McCoy
Texas A&M University: TEXTAL
  – Tom Ioerger, Jim Sacchettini, Erik McKee
Duke University: MolProbity / REDUCE
  – Jane Richardson, David Richardson, Ian Davis

Spectrum of phenix components
Automated analysis of data quality: phenix.xtriage
Rapid substructure determination: phenix.hyss
Phasing: Maximum likelihood – SOLVE, PHASER for SAD
Density modification: Statistical density modification (RESOLVE)
Automated model building:
  – Pattern matching methods (RESOLVE or TEXTAL)
Structure refinement: phenix.refine (likelihood, annealing, TLS)
Advanced automation: AutoSol – hkl to map
Ligand building and fitting: eLBOW, AutoLigand
Validation and Hydrogens: MolProbity + Reduce

phenix.refine
- Group ADP refinement
- Rigid body refinement
- Restrained refinement (xyz, iso/aniso ADP)
- Automatic water picking
- Bond density
- Unrestrained refinement
- FFT or direct summation
- Hydrogens
- Automatic NCS restraints
- Simulated Annealing
- Occupancies (individual, group)
- TLS refinement
- Twinned data
- X-ray, Neutron, joint X-ray + Neutron refinement

Refinement flowchart
Input: PDB model, any data format (CNS, Shelx, MTZ, …)
Input data and model processing
Refinement strategy selection
Repeated several times:
  – Bulk-solvent, anisotropic scaling, twinning parameters refinement
  – Ordered solvent (add / remove)
  – Target weights calculation
  – Coordinate refinement (rigid body, individual) (minimization or Simulated Annealing)
  – ADP refinement (TLS, group, individual iso / aniso)
  – Occupancy refinement (individual, group)
Output: Refined model, various maps, structure factors, complete statistics
Files for COOT, O, PyMol

Designed to be very easy to use
Refinement of individual coordinates and B-factors:
    % phenix.refine model.pdb data.hkl
Same as above plus water picking:
    % phenix.refine model.pdb data.hkl ordered_solvent=true
Run with parameter file:
    % phenix.refine model.pdb data.hkl parameter_file
Parameter file:
    refinement.main {
      high_resolution = 2.0
      simulated_annealing = True
      ordered_solvent = True
      number_of_macro_cycles = 5
    }
    refinement.refine.adp {
      tls = chain A
      tls = chain B
    }

How to best make ends meet?
GSAS & proteins
  – Extending a small-molecule powder program to deal with proteins
  – Advantage: program designed for the field
      Community used to inputs, outputs, idiosyncrasies
  – Disadvantage: some approaches suitable for small molecules don’t scale
      Direct-summation structure factor calculation
      Neighborhood calculations (nonbonded interactions, a.k.a. anti-bumping restraints)
phenix.refine
  – Extending a single-crystal protein program to deal with powders
  – Advantage: program designed to deal with large structures
      Protein, RNA/DNA restraint libraries, optimized algorithms
  – Disadvantage: new data formats, differences in terminology
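To illustrate why direct summation does not scale, here is a minimal sketch of the textbook structure factor sum in Python. It is not code from GSAS, phenix.refine or cctbx. It assumes space group P1, fractional coordinates, constant scattering factors and no displacement parameters, and the function names are hypothetical. The point is only that the work grows as (number of reflections) × (number of atoms), which is what FFT-based methods avoid for large structures.

```python
import cmath
import math


def structure_factor_direct(hkl, sites, scattering_factors):
    """Direct summation: F(hkl) = sum_j f_j * exp(2*pi*i*(h*x_j + k*y_j + l*z_j)).

    Simplifying assumptions: P1, fractional coordinates, constant f_j,
    no displacement parameters.
    """
    h, k, l = hkl
    total = 0j
    for (x, y, z), f in zip(sites, scattering_factors):
        phase = 2.0 * math.pi * (h * x + k * y + l * z)
        total += f * cmath.exp(1j * phase)
    return total


def all_structure_factors(miller_indices, sites, scattering_factors):
    """One inner loop over all atoms for every reflection: O(N_refl * N_atoms)."""
    return [
        structure_factor_direct(hkl, sites, scattering_factors)
        for hkl in miller_indices
    ]


# Toy example: two identical atoms in a body-centred arrangement.
sites = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
factors = [1.0, 1.0]
f_110, f_100 = all_structure_factors([(1, 1, 0), (1, 0, 0)], sites, factors)
print(abs(f_110), abs(f_100))
```

In the toy example the two atoms scatter in phase for (1,1,0) and cancel for (1,0,0). For a protein with thousands of atoms and tens of thousands of reflections, the nested loop becomes the bottleneck.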

Two main challenges
Challenge 1:
  – Input/output of powder-specific format
      Fundamentally trivial but potentially tedious
  New command?
  – No interference with existing, non-trivial algorithms for automatic recognition, processing, and consolidation of already very heterogeneous inputs
  Extend the existing input algorithms?
  – Nicer, but requires higher degree of collaboration
Challenge 2:
  – Development of a powder-specific target function
      Based on extracted intensities or primary pattern + pre-fitted profile parameters?
      Maximum likelihood with or without cross-validation?
      Will probably require some refactoring of the refinement engine
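As one possible reading of "a target based on extracted intensities", the sketch below shows a weighted least-squares target in which each observation is the summed intensity of a group of overlapping reflections. This is an assumption made for illustration, not the target the talk proposes and not an existing phenix or cctbx function. The names (`powder_ls_target`, `groups`, `multiplicities`) are hypothetical, and a maximum-likelihood target would look different.

```python
def powder_ls_target(i_obs, sigmas, groups, f_calc_sq, multiplicities):
    """Weighted least-squares residual on extracted powder intensities.

    i_obs[g]          observed (extracted) intensity of overlap group g
    sigmas[g]         estimated standard deviation of i_obs[g]
    groups[g]         indices of the reflections that overlap in group g
    f_calc_sq[j]      |F_calc|^2 of reflection j from the current model
    multiplicities[j] powder multiplicity of reflection j

    Returns (target, scale), where scale is the least-squares optimal
    overall scale factor between observed and calculated intensities.
    """
    # Calculated intensity of each group: sum over its overlapping reflections.
    i_calc = [
        sum(multiplicities[j] * f_calc_sq[j] for j in members)
        for members in groups
    ]
    weights = [1.0 / (s * s) for s in sigmas]

    # Analytical overall scale k minimizing sum w * (i_obs - k * i_calc)^2.
    numerator = sum(w * io * ic for w, io, ic in zip(weights, i_obs, i_calc))
    denominator = sum(w * ic * ic for w, ic in zip(weights, i_calc))
    scale = numerator / denominator if denominator > 0 else 0.0

    residual = sum(
        w * (io - scale * ic) ** 2 for w, io, ic in zip(weights, i_obs, i_calc)
    )
    normalization = sum(w * io * io for w, io in zip(weights, i_obs))
    target = residual / normalization if normalization > 0 else 0.0
    return target, scale


# Toy data: reflection 0 is resolved, reflections 1 and 2 overlap completely.
f_calc_sq = [4.0, 1.0, 9.0]
multiplicities = [2, 6, 1]
groups = [[0], [1, 2]]
target, scale = powder_ls_target(
    [24.0, 45.0], [1.0, 2.0], groups, f_calc_sq, multiplicities
)
print(target, scale)
```

Treating an overlap group as a single observation avoids having to partition intensity among its members. A refinement engine would also need gradients of this target with respect to the model parameters, which the sketch leaves out.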

Modular design
Application level
  – phenix wizards (data in, structure out)
  – phenix.refine
  – phenix.hyss (hybrid substructure search)
  – Visible source
Library level
  – cctbx project, organized in modules
      libtbx, scitbx, cctbx, iotbx, mmtbx
  – cctbx is intended to cover small-molecule work
      But nothing yet specific to powders
  – Unrestricted open source

Existing target functions
Least-squares (variety)
Maximum likelihood on amplitudes
Maximum likelihood with experimental phases
Least-squares twin target
SAD-specific maximum likelihood target implemented in Phaser
  – Reusing target from external application!
Dirty laundry
  – Severe code duplication in implementation of twin target
      Needs to be consolidated
  – Some friction integrating the Phaser ML-SAD target
      Phaser target relatively slow: we need better bookkeeping to avoid repeated calculations with exactly the same input
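The "better bookkeeping" remark can be pictured as memoization: remember the result of an expensive target evaluation and return it when the same input is seen again. The sketch below is a generic illustration of that idea, not the actual phenix or Phaser mechanism. `CachedTarget` and `slow_target` are hypothetical names, and the stand-in target is just a sum of squares.

```python
class CachedTarget:
    """Wrap an expensive target function and reuse results for repeated inputs."""

    def __init__(self, target_function):
        self._target_function = target_function
        self._cache = {}
        self.n_evaluations = 0  # how often the wrapped function actually ran

    def __call__(self, parameters):
        # A tuple of the parameter values serves as the lookup key; this
        # only catches inputs that are exactly identical.
        key = tuple(parameters)
        if key not in self._cache:
            self._cache[key] = self._target_function(parameters)
            self.n_evaluations += 1
        return self._cache[key]


def slow_target(parameters):
    """Stand-in for an expensive target evaluation (hypothetical)."""
    return sum(p * p for p in parameters)


cached = CachedTarget(slow_target)
first = cached([1.0, 2.0, 3.0])
second = cached([1.0, 2.0, 3.0])  # served from the cache, no recomputation
third = cached([1.0, 2.0, 3.5])   # new input, evaluated again
print(first, second, third, cached.n_evaluations)
```

A real implementation would need to bound the cache size and decide what counts as "the same input" for large coordinate arrays, for example by hashing or by tracking a modification counter.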

Precedent for reusing cctbx?
cctbx used heavily by all phenix collaborators
Phaser uses cctbx -> cctbx supported by CCP4 6.0 and up
smtbx: small-molecule toolbox
  – Group at Durham University, U.K., collaborating with David Watkin at Oxford University, U.K.
  – Long-term goal: highly integrated single-crystal structure determination (direct methods), automatic model building and refinement
  – Initial focus: iterative model building and refinement
  – Initial approach: reuse + adjust cctbx core libraries directly, combined with copying sub-modules to smtbx where they are modified
  – Long term: consolidate duplications as much as possible
      half the code = half the bugs, reuse of optimizations

Summary of ideas
Implement powder-specific target function(s) that plug into the refinement engine in the open source cctbx libraries
  – Can be done stand-alone using ad-hoc input/output methods
  – Collaborate in making the necessary adjustments to the existing libraries
Figure out the best way to handle input/output at the application level
  – Learn and re-evaluate as we go
If the powder field joins in there will be the potential for direct cross-fertilization between three specializations in crystallography
  – Single-crystal protein
  – Single-crystal small-molecule
  – Powder diffraction protein
  – More? (powder diffraction small-molecule)
      cctbx libraries are very general
Ever increasing integration is the secret behind the stunning successes in the development of computing technology
  – Can we make this idea work in crystallography?

Availability Phenix incl. Graphical User Interface – –Freely available to academic (non-profit) groups Core libraries (cctbx) – –Freely available to all

Acknowledgments
Phenix developers
  – P.D. Adams
  – P. Afonine
  – T.R. Ioerger
  – A.J. McCoy
  – E.W. McKee
  – N.W. Moriarty
  – R.J. Read
  – N.K. Sauter
  – J.N. Smith
  – L.C. Storoni
  – T.C. Terwilliger
  – P.H. Zwart
Funding:
  – LBNL (DE-AC03-76SF00098)
  – NIH/NIGMS (1P01GM063210)
  – PHENIX Industrial Consortium