From crystals to PDB: building a high-throughput crystallography pipeline for structural genomics


Chiu HJ¹, Wolf G¹, West W², van den Bedem H¹, Miller MD¹, Zhang Z¹, Morse A², Wang X², Xu Q¹, Levin I¹, von Delft F³, Elsliger MA³, Godzik A², Grzechnik SK² and Deacon AM¹
¹ Stanford Synchrotron Radiation Laboratory, 2575 Sand Hill Road, Menlo Park, CA
² University of California, San Diego, 9500 Gilman Drive, La Jolla, CA
³ The Scripps Research Institute, N. Torrey Pines Rd., La Jolla, CA

The Structure Determination Core (SDC) of the Joint Center for Structural Genomics (JCSG) develops technologies that streamline every step of the structure determination process, from crystals to PDB-ready atomic coordinates. Over the last year, JCSG production capacity has increased dramatically. SDC has screened more than 7000 crystals from 192 protein targets, collected 232 datasets from 106 targets, and solved 90 structures. To handle the rapidly growing flow of experimental data, we have developed a set of crystallographic and database tools that both track and streamline our workflow. Crystal cassettes are shipped to SDC from the Crystallomics Core. All relevant crystal information is captured in the central JCSG database and can be downloaded as a “Beamline Report”. Crystals are screened automatically using the Stanford Auto-Mounter and the Blu-Ice software, and the visual and diffraction properties of each crystal are recorded. A computer program, DISTIL, is under development to analyze diffraction images automatically and provide an objective screening evaluation for each crystal. The best crystals for each target are flagged for data collection. Another program, Xsolve, performs automatic crystallographic data processing and structure solution. A model-building tool that provides crystallographers with the best possible initial model for refinement is also under development.
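The abstract does not say how DISTIL scores a crystal or how the best crystal for each target is chosen. The following is only a hypothetical sketch. It assumes that each screening record carries an estimated diffraction limit in ångströms plus two 1 to 5 grades for spot quality and diffraction strength. The grade scales, the weights in `score`, the 3.5 Å cutoff and the target IDs are all assumptions. Under those assumptions, the best-scoring crystal per target is flagged for data collection:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScreeningResult:
    crystal_id: str
    target_id: str
    resolution_a: float   # estimated diffraction limit in angstroms (lower is better)
    spot_quality: int     # assumed grade: 1 (poor) to 5 (excellent)
    strength: int         # assumed grade: 1 (weak) to 5 (strong)


def score(r: ScreeningResult) -> float:
    """Hypothetical objective score: resolution dominates, grades break near-ties."""
    return -r.resolution_a + 0.2 * r.spot_quality + 0.1 * r.strength


def flag_best_per_target(results: List[ScreeningResult],
                         max_resolution_a: float = 3.5) -> Dict[str, str]:
    """Return {target_id: crystal_id} for the best-scoring usable crystal per target."""
    best: Dict[str, ScreeningResult] = {}
    for r in results:
        if r.resolution_a > max_resolution_a:
            continue  # assumed cutoff: not worth collecting a full dataset
        current = best.get(r.target_id)
        if current is None or score(r) > score(current):
            best[r.target_id] = r
    return {target: r.crystal_id for target, r in best.items()}


if __name__ == "__main__":
    screened = [
        ScreeningResult("c1", "TM0001", 2.8, 3, 3),  # hypothetical records
        ScreeningResult("c2", "TM0001", 2.1, 4, 2),
        ScreeningResult("c3", "TM0002", 4.0, 5, 5),
        ScreeningResult("c4", "TM0002", 3.2, 2, 2),
    ]
    print(flag_best_per_target(screened))
```

With these weights, resolution dominates and the two grades only separate crystals of similar resolution. A production ranking would need weights chosen against which screened crystals actually went on to yield structures.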
The results of the Xsolve analysis are uploaded to a Structure Solution Tracking System. A Refinement Tracking System requests weekly updates and collects all the data needed for a peer-review quality-control step before the coordinates are deposited in the Protein Data Bank.

The Joint Center for Structural Genomics
Mission: to establish a robust and scalable protein structure determination pipeline that will form the foundation for a large-scale, cost-effective production center for structural genomics.

Structural Genomics of Thermotoga maritima
The T. maritima genome serves as a system to test the pipeline. It is a small bacterial genome with 1877 gene products. Its proteins should express well in E. coli, and proteins from a thermophile may be more stable. The plan is to process the entire genome and to establish trends in the process, for example in crystallization.
Table (counts and percentages not recoverable): functional classification of T. maritima gene products, with columns Category, Number and %. The categories listed are nucleic acid binding, DNA binding, DNA repair, DNA replication factor, transcription factor, RNA binding, structural, ribosomal protein, translation factor, motor, enzyme, peptidase, protein kinase, protein phosphatase, signal transducer, cell adhesion, structural protein, transporter, ion channel, ligand binding or carrier, electron transporter, unknown or unclassified, and total.

Figure: HT pipeline modules and their development status. Labels include target selection, HT expression, HT purification, HT crystallization, HT imaging, HT data collection, HT structure determination, structure validation and deposition, autosubmission of electronic publication, and PDB. Status tags range from "1st generation prototype" to "6th generation software". Data flow parallels the experimental pipeline, harvesting ~300 parameters from 19 stages.
Figure: HT pipeline processes, bottlenecks and leaks. Stages shown include target selection, cloning, expression, purification, crystallization, imaging, harvesting, beamline crystal mounting, crystal screening, data collection, phasing, tracing, structure refinement, structure validation, annotation and publication.

Beamline Report
All relevant crystal information is captured in the central JCSG database in the form of a Beamline Report. Its fields include target ID, beamline, crystallization condition, visual properties, and diffraction properties (resolution, spot quality, diffraction strength).

Robust and automated crystal screening
The screening system has been taken from initial design to production. It provides large-scale capacity for shipping, storage and screening, and it has been used by JCSG since June 2002. It is implemented on all SSRL beamlines, and cassette kits have been distributed to PX user groups. Integration with Blu-Ice provides automated sample mounting, automated sample alignment and automated collection of diffraction images.

Increased screening capacity during SSRL shutdown
To leverage existing infrastructure, a MicroMax-002 X-ray generator was installed in June 2003 and used with the SSRL automated screening system. More than 4200 crystals were screened in 9 months, and all data were uploaded to the JCSG DB.

Screening, collection and structure solution
SDC works closely with the BIC (Bioinformatics Core) on implementation and debugging. Still more features are needed to handle expanding production. Current tools include structure solution tracking, a local SDC “dataset” database and an active crystal report.

Xsolve: automation of structure determination
2004 developments:
Improve the success rate: better autoindexing; determine the optimal resolution for scaling sweeps.
Make it more general: handle crystallographic details such as re-indexing, screw axes and merging sweeps.
Make operation more robust: catch timeouts, core dumps, infinite loops, etc.
Implement parallelization: develop tools to monitor and control processing on a Linux cluster.
Support new programs: HKL2000, SHARP, SHELXD (not completely tested).
Figure: Xsolve workflow. Autoindex (Mosflm) → Integrate (Mosflm) → Scale (Scala) → Solve (SOLVE) → Trace (RESOLVE). The Solve step branches into separate jobs, one per space-group and molecule-count hypothesis (for example "Solve P422, 1 mol" and "Solve P422, 2 mols"). The other space-group symbols are not recoverable.
Main goals: handle the majority of cases; organize data and workflow; ease information flow to the JCSG DB; allow integration of new programs.
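Xsolve sends the Solve step to several jobs at once. It also aims to catch timeouts and crashes and to run jobs in parallel. That fan-out pattern can be sketched as follows. This is not Xsolve's code. `solve` stands in for a hypothetical wrapper around one phasing run that returns a single numeric score (higher is better). The hypothesis list, timeout and worker count are illustrative assumptions, and the sequential Autoindex, Integrate and Scale steps that come before this step are not shown:

```python
import concurrent.futures as cf
from typing import Callable, Dict, List, Optional, Tuple

Hypothesis = Tuple[str, int]  # (space group, molecules per asymmetric unit)


def run_solve_hypotheses(
    solve: Callable[[str, int], float],
    hypotheses: List[Hypothesis],
    timeout_s: float = 3600.0,
    max_workers: int = 4,
) -> Optional[Tuple[Hypothesis, float]]:
    """Run one phasing job per hypothesis in parallel and keep the best score.

    Jobs that raise (e.g. a wrapper reporting a crash) or that have not
    finished within `timeout_s` are skipped instead of stopping the pipeline.
    Returns None if no job produced a score.
    """
    scores: Dict[Hypothesis, float] = {}
    pool = cf.ThreadPoolExecutor(max_workers=max_workers)
    try:
        futures = {pool.submit(solve, sg, nmol): (sg, nmol) for sg, nmol in hypotheses}
        done, _not_done = cf.wait(futures, timeout=timeout_s)
        for fut in done:
            if fut.exception() is None:  # skip crashed jobs
                scores[futures[fut]] = fut.result()
    finally:
        # Do not block on unfinished jobs; drop any that never started (Python 3.9+).
        pool.shutdown(wait=False, cancel_futures=True)
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    def fake_solve(sg: str, nmol: int) -> float:
        # Hypothetical stand-in for a phasing run.
        if nmol == 3:
            raise RuntimeError("simulated core dump")
        return {("P422", 1): 10.0, ("P422", 2): 25.0, ("P4122", 1): 5.0}[(sg, nmol)]

    print(run_solve_hypotheses(fake_solve,
                               [("P422", 1), ("P422", 2), ("P4122", 1), ("P4122", 3)]))
```

Threads keep the sketch self-contained, but a hung thread cannot be killed. A real pipeline would more likely launch each crystallographic program as a separate process, for example with `subprocess.run(..., timeout=...)`, so that a timed-out job is actually terminated.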
A further main goal of Xsolve is parallel execution of jobs.

Figure: Refinement Tracking System.

Automation of protein model completion: an inverse kinematics approach
Manually finalizing a model is labor-intensive and time-consuming, and existing aids are highly interactive. Our approach builds backbone fragments automatically. Candidate closing conformations are built using inverse kinematics (IK) techniques from robotics. They are ranked according to electron-density fit and conformational likelihood. The top-ranking candidates are then subjected to real-space, torsion-angle simulated-annealing (SA) refinement. Results: missing fragments of up to 12 residues were closed to within 0.6 Å all-atom RMSD in a 2.8 Å model (Lotan et al., submitted; van den Bedem et al., in preparation).

JCSG production statistics (August 10, 2004)
TM = T. maritima targets.
Total crystals screened at SDC: 10778, representing 356 unique targets (299 TM / 57 non-TM).
Datasets collected: 394 (288 TM, 106 non-TM), representing 194 unique targets (146 TM / 48 non-TM).
Structures solved: 155 (94 MAD, 51 MR, 3 SAD, 7 NMR; 125 TM, 30 non-TM).
Figure: JCSG database view, searchable by shipment ID, dewar, target ID and cassette/puck.
Figure: installation of a Microsource X-ray generator at 9-2.
More to come: 22 targets with data collected but not yet solved, and 92 targets diffracting better than 3.5 Å but not yet solved.
Growing reliance on the JCSG DB: 500 crystals and 8 structures per month; an inventory of 20 cassettes (2000 crystals); structures in refinement (count not recoverable); 2.0 TB of diffraction images; 0.5 TB of processing files; more than 100,000 diffraction images.
Average resolution of structures in the PDB: 2.0 Å. Average protein chain length: 260 aa. Average number of residues in the asymmetric unit: 480.

TSRI Administrative Core: Ian Wilson, Peter Kuhn, Marc Elsliger, Frank von Delft, Tina Montgomery, Gye Won Han, Rong Chen, Angela Walker
UCSD Bioinformatics Core: John Wooley, Adam Godzik, Susan Taylor, Slawomir Grzechnik, Bill West, Andrew Morse, Jie Quyang, Xianhong Wang, Jaume Canaves, Lukasz Jaroszewski, Robert Schwarzenbacher, Marc Robinson Rechavi, Chris Edwards, Olga Kirillova, Ray Bean, Josie Alaoen
Stanford/SSRL Structure Determination Core:
Keith Hodgson, Ashley Deacon, Britt Hedman, Guenter Wolf, Mitch Miller, Henry van den Bedem, Qingping Xu, Herbert Axelrod, Christopher Rife, Inna Levin, R. Paul Phizackerley, Amanda Prado, John Kovarik, Ross Floyd, Irimpan Mathews, Michael Solits, Aina Cohen, Paul Ellis
GNF & TSRI Crystallomics Core: Ray Stevens, Scott Lesley, Rebbeca Page, Carina Grittini, Glen Spraggon, Andreas Kreusch, Michael DiDonato, Daniel McMullan, Heath Klock, Polat Abdubek, Eileen Ambing, Tanya Biorac, Joanna C. Hale, Justin Haugen, Mike Hornsby, Eric Koesema, Edward Nigoghossian, Kevin Quijano, Megan Wemmer, Aprilfawn White, Juli Vincent, Jeff Velasquez, Kin Moy, Vandana Sridhar, Bernard Collins, Thomas Clayton
Scientific Advisory Board: Carl-Ivar Brändén, Karolinska Inst., Stockholm (retired 2003); Elbert Branscomb, DOE Joint Genome Inst., Walnut Creek; Stephen Cusack, EMBL Outstation Grenoble; Leroy Hood, Inst. for Systems Biology, Seattle; John Kuriyan, U.C. Berkeley; Erkki Ruoslahti, The Burnham Institute; James Wells, Sunesis Pharmaceuticals, Inc.; Charles Cantor, Sequenom, Inc.; Todd Yeates, UCLA-DOE Inst. for Genomics and Proteomics; James Paulson, Consortium for Functional Glycomics, The Scripps Research Institute
Exploratory Projects: Kurt Wüthrich (NMR), Linda Columbus, Touraj Etezady-Esfarjani, Wolfgang Peti; Virgil Woods (DXMS)
Acknowledgements: NIH Protein Structure Initiative Grant P50 GM62411
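The model-completion panel describes building candidate closing conformations with inverse kinematics but does not give an algorithm. The following sketch illustrates the general idea only. It applies cyclic coordinate descent (CCD), a standard IK method from robotics and not necessarily the one used in that work, to a planar chain of rigid links. Each joint in turn is rotated so that the chain's free end swings toward a fixed anchor point. This is a 2D analogue of closing a gap between two fixed residues. Real backbone closure is done in 3D, must match both position and orientation, and is followed by ranking against the electron density. None of that is modeled here, and the link lengths and target are hypothetical:

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def forward_kinematics(angles: List[float], lengths: List[float]) -> List[Point]:
    """Joint positions of a planar chain rooted at the origin.

    angles[i] is the rotation of link i relative to link i-1, in radians.
    """
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for theta, length in zip(angles, lengths):
        heading += theta
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points


def ccd_close(angles: List[float], lengths: List[float], target: Point,
              tol: float = 1e-6, max_sweeps: int = 1000) -> List[float]:
    """Cyclic coordinate descent: rotate one joint at a time, last to first,
    so that the end of the chain moves onto `target`."""
    angles = list(angles)
    for _ in range(max_sweeps):
        for i in reversed(range(len(angles))):
            points = forward_kinematics(angles, lengths)
            jx, jy = points[i]   # pivot of joint i
            ex, ey = points[-1]  # current chain end
            # Rotating joint i turns every downstream link about the pivot, so
            # align the pivot->end direction with the pivot->target direction.
            angles[i] += (math.atan2(target[1] - jy, target[0] - jx)
                          - math.atan2(ey - jy, ex - jx))
        ex, ey = forward_kinematics(angles, lengths)[-1]
        if math.hypot(ex - target[0], ey - target[1]) < tol:
            break
    return angles


if __name__ == "__main__":
    links = [1.0] * 6  # hypothetical link lengths
    closed = ccd_close([0.0] * 6, links, (3.0, 2.0))
    print(forward_kinematics(closed, links)[-1])
```

Each single-joint rotation can only bring the end closer to the target, because it places the end at the closest point on its circle around the pivot. This is why CCD is simple and stable. It also produces only one closing conformation per starting pose, so a candidate set like the one described above would need several different starting conformations.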