Instruct Image Processing Centre (I2PC) JM Carazo
CryoEM and Cloud: From “first principles” José María Carazo (carazo@cnb.csic.es) Spanish National Center for Biotechnology Instruct Image Processing Center
THE CELL: The basic block of life Pericentriolar material
An integrated view of Life Optical microscopy Electron microscopy M-Cell Salk Institute, Cornell University X-ray diffraction Nuclear Magnetic Resonance Computational Modelling
Life is based on macromolecular machines DNA replication Protein synthesis Dynein motion
Resolution and Thickness range Resolution range
An electron microscope
DnaB·DnaC in vitreous ice
The cryo-EM SPA pledge In 3D Electron Microscopy individual macromolecules are visualized down to atomic resolution. Trapped in ice, these molecules are free to expose their internal flexibility/plasticity.
The value of a «radiography»
Compared to full 3D CT
Tomography principles
In a first approximation, 2D projections lack information: this limits the comprehension of complex objects.
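The loss of information in a projection can be illustrated with a toy example. This is a minimal sketch, assuming an idealized projection that simply sums density along one direction; the two 2 x 2 "objects" and the helper `project` are hypothetical and only serve the illustration.

```python
import numpy as np

# Two different toy objects (hypothetical densities on a 2 x 2 grid).
object_a = np.array([[1.0, 0.0],
                     [0.0, 1.0]])
object_b = np.array([[0.0, 1.0],
                     [1.0, 0.0]])


def project(obj, axis):
    """Idealized projection: sum the density along one grid axis."""
    return obj.sum(axis=axis)


# Seen along either grid axis, the two objects give identical projections.
same_vertical = np.array_equal(project(object_a, 0), project(object_b, 0))
same_horizontal = np.array_equal(project(object_a, 1), project(object_b, 1))

# A ray sum along the main diagonal (an extra viewing direction) separates them.
diagonal_a = np.trace(object_a)
diagonal_b = np.trace(object_b)

print(same_vertical, same_horizontal, diagonal_a, diagonal_b)
```

Two views are not enough to tell the objects apart here, which is why many viewing directions are combined before reconstructing.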
Tomography Principle: acquisition of tilted image series; correction of microscope defects (mechanical drift, CTF...); reconstruction
Cryo 3D-EM Conceptual bases Experimental situation in cryo 3D EM
Krios (MRC, Cambridge): 1 day = 2.4 Tb of data. Acquire Data, Reconstruct, Understand. In this example, data are acquired with an electron microscope like this one. The data look like this: each dark area is a virus. Merging the information contained in all the images, we can get a 3D reconstruction like this one, which may be used to better understand the viral life cycle.
Tomography Principle: acquisition of tilted image series; correction of microscope defects (mechanical drift, CTF...); reconstruction
The “a priori unknown” geometry in SPA
Parameter space JUST for Geometry characterization For each particle we need to determine 3 angles and 2 shifts: FIVE parameters. If we have 100,000 particle images, we then have a space of 500,000 parameters!
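The count above is simple arithmetic, sketched here so the numbers can be changed. The function name `geometry_parameter_count` is hypothetical and not taken from any EM package.

```python
def geometry_parameter_count(n_particles, n_angles=3, n_shifts=2):
    """Number of geometry unknowns: (angles + shifts) per particle image."""
    return n_particles * (n_angles + n_shifts)


# The example from the slide: 100,000 particle images.
total = geometry_parameter_count(100_000)
print(total)
```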
Reconstruction as a linear set of equations
Parameter space of cryo EM SPA Target (the X's): a volume of (for example) 100 x 100 x 100 voxels = 10^6 variables (plus 500,000 = 5 x 10^5 geometry variables) (plus 100,000 x k class variables). Measurements (the Y's): 100,000 particle images of 100 x 100 pixels = 10^9. But we have noise: 2 + 2 = 5 (or 3, or 6 ...)
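The balance between unknowns and noisy measurements can be sketched numerically. This is a toy illustration, assuming a small random linear system y = A x + noise solved by ordinary least squares; the matrix `A`, the vector `x_true`, and the sizes are made up for the example and are not a real reconstruction problem.

```python
import numpy as np

# Counts quoted on the slide (class variables left out).
n_particles = 100_000
n_unknowns = 100**3 + 5 * n_particles      # voxels + geometry parameters
n_measurements = n_particles * 100 * 100   # pixels over all particle images

# Toy noisy linear system y = A x + noise (sizes are illustrative only).
rng = np.random.default_rng(0)
x_true = np.array([2.0, 2.0, 1.0])
A = rng.normal(size=(2000, 3))
y = A @ x_true + rng.normal(scale=1.0, size=2000)

# Least squares finds the x that best explains all noisy measurements at once.
x_est, *_ = np.linalg.lstsq(A, y, rcond=None)
max_error = np.max(np.abs(x_est - x_true))

print(n_unknowns, n_measurements, max_error)
```

No single equation is trustworthy when each measurement is noisy, but having many more measurements than unknowns lets the errors average out.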
The 3D flexibility challenge
Molecular machines: Dutch windmill, 15 m; molecular machine, 15 x 10^-9 m
The 3D flexibility challenge NOW in RELION
Everything is mixed! Alignment and classification are strongly intertwined, and noise is a serious problem.
Parameter space of cryo EM SPA [Plot: f(x) versus x, with local minima and the global minimum marked]
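Why local minima matter can be shown with a one-dimensional toy function. This is a minimal sketch, assuming plain gradient descent as the optimizer; the polynomial `f` is hypothetical and stands in for the much larger cryo-EM parameter space.

```python
def f(x):
    """Toy objective with one local and one global minimum."""
    return x**4 - 3 * x**2 + x


def grad_f(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, step=0.01, n_iter=2000):
    """Follow the negative gradient from the starting point x0."""
    x = x0
    for _ in range(n_iter):
        x -= step * grad_f(x)
    return x


# Same function, same algorithm, different starting points.
x_from_right = gradient_descent(2.0)    # ends in the local minimum
x_from_left = gradient_descent(-2.0)    # ends in the global minimum

print(x_from_right, f(x_from_right))
print(x_from_left, f(x_from_left))
```

The result depends entirely on the starting point, which is the one-dimensional analogue of a refinement that depends on its initial estimate.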
Parameter space of cryo EM SPA [Plot: f(x) and f_s(x) versus x]
Workflows: How do we do it in practice? Using different EM software packages is now like the Tower of Babel. Why are we working on Scipion? Simply put, the EM field needs software integration. Currently, processing with different software packages is like the Tower of Babel: users need to deal with file conversion between packages, which wastes time and is error-prone.
Task 2: Virtualize the execution hosts Internet Scipion client Cloud computing Big data transfers Data storage Scipion server By virtualizing the execution hosts, we will be able to use computing resources more efficiently. First, computing nodes will be allocated only when needed, which avoids keeping them running when there are no computing jobs. Second, the number of nodes can be adapted to each job's requirements (e.g. some jobs may require more RAM while others make use of more CPU power).
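The idea of matching nodes to job requirements can be sketched as follows. Everything here is an assumption made for illustration: the node catalog, its prices, the job names, and the functions `pick_node` and `allocate` are hypothetical and do not describe any cloud provider or Scipion itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    name: str
    cpus: int
    ram_gb: int
    cost_per_hour: float


@dataclass(frozen=True)
class Job:
    name: str
    min_cpus: int
    min_ram_gb: int


# Hypothetical catalog of virtual machine flavours.
CATALOG = [
    NodeType("small", cpus=4, ram_gb=16, cost_per_hour=0.2),
    NodeType("high-cpu", cpus=32, ram_gb=64, cost_per_hour=1.0),
    NodeType("high-mem", cpus=8, ram_gb=256, cost_per_hour=1.5),
]


def pick_node(job, catalog=CATALOG):
    """Return the cheapest node type that satisfies the job's requirements."""
    candidates = [n for n in catalog
                  if n.cpus >= job.min_cpus and n.ram_gb >= job.min_ram_gb]
    if not candidates:
        raise ValueError(f"no node type can run job {job.name!r}")
    return min(candidates, key=lambda n: n.cost_per_hour)


def allocate(jobs):
    """Allocate one node per pending job; no jobs means no nodes."""
    return {job.name: pick_node(job).name for job in jobs}


plan = allocate([Job("classification", min_cpus=16, min_ram_gb=32),
                 Job("reconstruction", min_cpus=4, min_ram_gb=128)])
print(plan)
```

With an empty job list `allocate` returns an empty plan, which mirrors the point that nodes exist only while there is work for them.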
Task 2: Virtualize server and storage Execution hosts Internet Scipion client Cloud computing Big data transfers Data storage Scipion server
Scipion has specific goals: integrate EM software packages to be used in the same project; full project traceability, improving reproducibility; execute complete workflows in an automated manner; easy to install and use; easy to extend with new protocols. In this document we describe the main concepts and new features of Xmipp 3.0 for users.
Goal 1: Integrate EM software packages to be used in the same project. Our main goal, and the reason why we started working on the Scipion project, is the need for software integration in the field, as JMC mentioned.
We bridge across package differences by modeling our domain 3D Reconstruction Set of Images Initial Model 3D Volume Protocols Data To address the integration problem, we started by creating a model of the EM domain. This model is composed of abstract entities (or objects) that reflect the concepts rather than the specificities of each software package. In this model we have two types of objects: Data and Protocols (or operations). Data objects serve as inputs and outputs of the protocols. Protocols are like “big steps” that wrap, at a higher level, the logic of low-level operations.
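The split between Data and Protocol objects can be sketched in a few lines. This is only an illustration of the concept: the class names `SetOfImages`, `Volume`, `Protocol`, and `Reconstruction3D` echo the labels on the slide, but their attributes and methods are assumptions and not Scipion's actual API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class SetOfImages:
    """Data object: a collection of particle images."""
    filenames: list = field(default_factory=list)


@dataclass
class Volume:
    """Data object: a 3D map with a cubic box of `size` voxels per side."""
    filename: str
    size: int


class Protocol(ABC):
    """A "big step": takes Data objects in and produces Data objects."""

    @abstractmethod
    def run(self, **inputs):
        ...


class Reconstruction3D(Protocol):
    def run(self, images, initial_model):
        # Placeholder: a real protocol would call a package-specific program here.
        return Volume(filename="reconstruction.vol", size=initial_model.size)


images = SetOfImages(["particle_0001.mrc", "particle_0002.mrc"])
initial = Volume("initial_model.vol", size=100)
result = Reconstruction3D().run(images=images, initial_model=initial)
print(result)
```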
We bridge across package differences by modeling our domain We can't modify all existing software packages to adopt this model. What we can do is implement conversion routines that know how to map from our “objects” to the package-specific files and operations. We need conversions in both directions in order to execute the packages' programs and communicate with other protocols. With this approach, we can build tools in the upper layer, and they can be reused for all existing packages and even for future ones.
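Two-way conversion can be sketched with two made-up file formats. The formats of "package A" (one filename per line) and "package B" (comma-separated filenames) are hypothetical, as are all function names; real packages use their own metadata formats.

```python
def to_package_a(image_names):
    """Model -> package A: one filename per line (hypothetical format)."""
    return "\n".join(image_names) + "\n"


def from_package_a(text):
    """Package A -> model: read back the non-empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def to_package_b(image_names):
    """Model -> package B: comma-separated filenames (hypothetical format)."""
    return ",".join(image_names)


def from_package_b(text):
    """Package B -> model: split on commas, ignoring empty entries."""
    return [name for name in text.split(",") if name]


# Output of a package A step becomes input of a package B step via the model.
model_images = from_package_a("img1.mrc\nimg2.mrc\n")
package_b_input = to_package_b(model_images)
print(package_b_input)
```

Because every package converts only to and from the common model, adding a new package needs one pair of converters rather than one pair per existing package.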
Goal 2: Full project traceability, improving reproducibility. Having a well-defined model of the problem also makes it easier to achieve full traceability.
We bet on a simple storage mechanism Data Objects Protocol Objects Mapper Layer
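A mapper layer between objects and storage can be sketched with the Python standard library. This is an assumption-laden illustration: the class `SqliteMapper`, its table layout, and the choice of SQLite with JSON-encoded attributes are hypothetical and are not taken from the slides.

```python
import json
import sqlite3


class SqliteMapper:
    """Hypothetical mapper: stores each object as (class name, JSON attributes)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS objects ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "class_name TEXT NOT NULL, "
            "attributes TEXT NOT NULL)"
        )

    def store(self, class_name, attributes):
        """Persist one object and return its identifier."""
        cursor = self.conn.execute(
            "INSERT INTO objects (class_name, attributes) VALUES (?, ?)",
            (class_name, json.dumps(attributes)),
        )
        self.conn.commit()
        return cursor.lastrowid

    def load(self, object_id):
        """Return (class name, attributes) for a stored object."""
        row = self.conn.execute(
            "SELECT class_name, attributes FROM objects WHERE id = ?",
            (object_id,),
        ).fetchone()
        if row is None:
            raise KeyError(object_id)
        return row[0], json.loads(row[1])


mapper = SqliteMapper()
volume_id = mapper.store("Volume", {"filename": "reconstruction.vol", "size": 100})
protocol_id = mapper.store("Reconstruction3D", {"output": volume_id})
print(mapper.load(protocol_id))
```

Storing the protocol together with a reference to its output is one simple way to keep the link between a result and the step that produced it.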
Results should be reproducible, no more “black boxes”
Goal 3: Execute complete workflows in an automated manner.
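Automated execution of a workflow amounts to running each step once its inputs are ready. This is a minimal sketch using the standard-library `graphlib` module (Python 3.9 or later); the step names and the function `run_workflow` are hypothetical and do not describe how Scipion schedules protocols.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on (hypothetical workflow).
workflow = {
    "import_images": set(),
    "align": {"import_images"},
    "classify": {"align"},
    "reconstruct": {"classify"},
}


def run_workflow(workflow, actions):
    """Run every step after all of its dependencies; return the order used."""
    executed = []
    for step in TopologicalSorter(workflow).static_order():
        actions[step]()
        executed.append(step)
    return executed


log = []
actions = {name: (lambda name=name: log.append(f"ran {name}")) for name in workflow}
order = run_workflow(workflow, actions)
print(order)
```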
Designed to perform distributed execution Worker Host 2 Worker Host 1 Scipion client Big data transfers Distributed data storage Bookkeeping Scipion Server Relatives: there are a number of very good existing “Workflow Engines”, such as Taverna (Manchester) or Pegasus (San Diego)... Scipion is NOT a Workflow Engine, but it can use any WE in the future.
The Initial Volume Problem in SPA [Plot: f(x) versus x, with local minima and the global minimum marked]
The Initial Volume Problem (in the Web)
Breaking News... FEI, the leading supplier of high-end electron microscopes, has just expressed its wish for the I2PC to be its worldwide reference centre for image processing solutions and services for Pharma. Scipion and Cloud are important parts of this strategy.
Instruct Open call for Access
www.structuralbiology.eu A distributed infrastructure for integrated structural biology