Condor in Cryo-EM Image Processing
Weimin Wu, Wen Jiang
Department of Biological Sciences, Purdue University
04/30/2008

Cryo-EM: low-temperature electron microscopy.
Image processing: obtaining a 3D reconstruction from 2D images.

Introduction: Viral infections have been, and remain, one of the major threats to human health. Viruses are large assemblies of proteins and nucleic acids that rely on infecting hosts to complete their life cycle and sustain their propagation. High-resolution 3-D structures of virus particles provide important insights into these processes and support the development of effective prevention and treatment strategies. Recently, in collaboration with researchers at Baylor College of Medicine and MIT, we demonstrated the 3-D reconstruction of the infectious bacterial virus Epsilon15 (ε15) at 4.5 Å resolution, which allowed tracing of the polypeptide backbone of its major capsid protein gp7 (Jiang et al., Nature 451(7182):1130-4, 2008).

In many tailed dsDNA viruses, for example the bacterial viruses T7, T3 and ε15, one of the 12 icosahedral 5-fold vertices is occupied by a unique 12-fold portal protein complex. This portal vertex is responsible for packaging the dsDNA genome into the protein shell during assembly and for ejecting the genome out of the virus and into the host cell during infection. However, high-resolution structures of these virus particles, especially of the non-icosahedrally organized components such as the portal complex, the tail and the encapsulated dsDNA genome, are lacking. I am working on such projects, reconstructing the virus without imposing any symmetry. We have now obtained a sub-nanometre resolution map that allows us to visualize the secondary structure of the portal, tail hub and tail spikes.

(A) Schematic diagram of the T7/T3 phage particle assembly and dsDNA genome packaging pathway. Adapted from (Serwer, 2004). (B) A cryo-EM micrograph of T3 phage showing particles representing each of the major stages during assembly and genome packaging.

Figure labels: tail hub, terminus, spike, portal, core, DNA rings.
Image processing is a critical step in generating the 3D structure of a macromolecule from the 2D images taken with the cryo-EM technique. This step includes 2D alignment and 3D reconstruction, both of which require intensive computing power. High-performance computing (HPC) resources supported by RCAC enable us to work on huge datasets, obtain high-resolution results, and therefore learn more details of biological systems.

Scientific needs: Two major steps are involved in cryo-EM image processing. The first is 2D alignment, which finds the orientation and center of each sample particle by matching the images (2D projections of the sample particles) against reference projections. The second is 3D reconstruction, which generates the 3D map by collecting the orientation and center information of all particles and averaging them.
Scale of the 2D alignment: 55,000 raw 2D images vs. 1,400 reference projections; comparing 1 raw image with 1 projection takes about 1 second; in total about 22K CPU hours.
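
The 2D alignment step and its cost can be illustrated with a minimal brute-force projection-matching sketch. It assumes tiny images stored as lists of rows, scores each candidate with normalized cross-correlation, and searches only integer cyclic shifts; the function names (ncc, shift, align, brute_force_cpu_hours) are hypothetical, and this is not the group's actual alignment code. Reading the slide's numbers as 55,000 images times 1,400 projections at about 1 second per comparison, brute_force_cpu_hours gives roughly 21,400 CPU hours, consistent with the quoted figure of about 22K.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length pixel lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def flatten(img):
    """Turn a 2D image (list of rows) into a flat pixel list."""
    return [v for row in img for v in row]


def shift(img, dy, dx):
    """Cyclically shift a 2D image by dy rows and dx columns."""
    h, w = len(img), len(img[0])
    return [[img[(r - dy) % h][(c - dx) % w] for c in range(w)] for r in range(h)]


def align(particle, projections, max_shift=1):
    """Brute-force projection matching for one particle image.

    Tries every reference projection at every integer shift within
    +/- max_shift pixels and returns (score, projection_index, (dy, dx)).
    The projection index stands in for the particle's orientation and the
    shift for its center offset.
    """
    target = flatten(particle)
    best = (-2.0, None, None)  # NCC is always >= -1, so any real score wins
    for k, proj in enumerate(projections):
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                score = ncc(flatten(shift(proj, dy, dx)), target)
                if score > best[0]:
                    best = (score, k, (dy, dx))
    return best


def brute_force_cpu_hours(n_images, n_projections, seconds_per_comparison):
    """CPU hours to compare every image with every projection once."""
    return n_images * n_projections * seconds_per_comparison / 3600


if __name__ == "__main__":
    line = [[0, 1, 0, 0] for _ in range(4)]  # hypothetical reference projection 0
    ell = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # projection 1
    particle = shift(ell, 1, 0)  # projection 1, off-center by one row
    print(align(particle, [line, ell]))
    print(brute_force_cpu_hours(55_000, 1_400, 1.0))
```

The cost grows with images times projections times shifts, which is why each alignment pass needs thousands of CPU hours and why it splits naturally into independent Condor jobs. Real cryo-EM packages also search in-plane rotations and typically compute correlations in Fourier space rather than pixel by pixel.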

GroEL is used as an example to show 3D reconstruction and the many iterations needed to reach high resolution. For our ε15 project, even though we started from an intermediate-resolution map (7 Å), more than 10 further iterations were needed to achieve 4.5 Å.
Figure: features as a function of resolution, showing how to evaluate the resolution of a density map qualitatively.

Condor performance: We feel lucky to have access at Purdue to so many resources supported by RCAC; otherwise our research would take forever. Below are the Condor jobs we submitted and the CPU hours we used.

Period                     Account         Condor jobs   CPU hours
10/01/2006 to 10/01/2007   Group-jiang12   9,465,944     4,817,429 *
08/17/2007 to 04/22/2008   wu49            4,046,914     3,884,568 **

* Each job took about half an hour.
** Each job took about one hour, due to a different algorithm and other reasons.
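
The usage figures can be sanity-checked against the footnotes by dividing CPU hours by job count. This sketch assumes the run-together digits of the extracted table read as Group-jiang12 with 9,465,944 jobs and 4,817,429 CPU hours, and wu49 with 4,046,914 jobs and 3,884,568 CPU hours; that split is our reading of the flattened text, not something the slide states separately.

```python
# Our reading of the flattened usage table (an assumption, see above):
# account and period -> (Condor jobs, CPU hours)
usage = {
    "Group-jiang12, 10/01/2006 to 10/01/2007": (9_465_944, 4_817_429),
    "wu49, 08/17/2007 to 04/22/2008": (4_046_914, 3_884_568),
}


def hours_per_job(jobs, cpu_hours):
    """Average CPU hours consumed by one Condor job."""
    return cpu_hours / jobs


for account, (jobs, cpu_hours) in usage.items():
    print(f"{account}: {hours_per_job(jobs, cpu_hours):.2f} CPU hours per job")
```

Run as is, it prints about 0.51 and 0.96 CPU hours per job, which agrees with the half-hour and one-hour footnotes and so supports this reading of the digits.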

Figure: running jobs versus time. This is a long run of about 64 hours. There are clearly three major peaks, all overnight. During the daytime, the number of running jobs drops considerably because the machine owners are using their machines. The peaks get progressively smaller, which means that our user priority is getting lower. Now it is the summer holiday, and I can get more than 3,000 nodes for my Condor jobs.

We tried to use all the platforms to run our Condor jobs. How does performance differ across platforms?

Platform       Avg. running time   Avg. remote CPU time   Avg. remote wall-clock time   Avg. queue wait time
Linux 32-bit   (missing)           2331.1 s               2957.0 s                      255.5 s
Linux 64-bit   (missing)           1630.0 s               2457.0 s                      329.1 s
Windows        2929.2 s            2302.8 s               2881.6 s                      356.5 s

The Linux 64-bit machines are not as fast as we expected. Why?
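
One way to probe the "Why?" is to compare remote CPU time with remote wall-clock time for each platform: the ratio shows how much of each job's wall-clock time was spent computing. This minimal sketch uses only the averages from the platform comparison; the names platforms and cpu_efficiency are illustrative, and the metric itself is our choice rather than one used on the slide.

```python
# Average per-job times in seconds, from the platform comparison table.
platforms = {
    "Linux 32-bit": {"cpu": 2331.1, "wall": 2957.0},
    "Linux 64-bit": {"cpu": 1630.0, "wall": 2457.0},
    "Windows": {"cpu": 2302.8, "wall": 2881.6},
}


def cpu_efficiency(cpu_s, wall_s):
    """Fraction of remote wall-clock time that the job spent on the CPU."""
    return cpu_s / wall_s


for name, t in platforms.items():
    eff = cpu_efficiency(t["cpu"], t["wall"])
    off_cpu = t["wall"] - t["cpu"]  # seconds not accounted for by CPU time
    print(f"{name}: {eff:.0%} CPU efficiency, {off_cpu:.0f} s off-CPU per job")
```

On these figures the 64-bit Linux hosts spend about 66% of wall-clock time on the CPU, versus roughly 79 to 80% for the other platforms, leaving about 827 s per job not spent computing. That gap could come from file transfer or other I/O on remote hosts, but the table alone does not show the cause.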

We checked the remote hosts to which the Condor jobs were submitted in this test: 90% of the Linux 64-bit machines came from ccl00.cse.nd.edu.

Remote host    Jobs              Average running time
*.nd.edu       5,… (truncated)   (missing)
*.purdue.edu   (missing)         (missing)

Condor jobs could run on nodes off campus with only slightly worse performance. This made us more confident about seriously considering TeraGrid; we have tried TeraGrid, but we still use the resources on campus. However, transferring large files (for example, more than 700 MB) remains a problem.

With icosahedral symmetry imposed, the map is of high quality: alpha-helices, beta-sheets and side chains are visible, which enabled us to do the modeling and obtain the backbone structure.

Our problems/concerns about Condor:
1. Operation: ideally, we would submit Condor jobs from our desktops and let Condor find resources by itself, but currently we need to specify where the jobs should go if we use TeraGrid.
2. File transfer: for large files, the network becomes the bottleneck, which can easily overload the head node and crash it, especially when the files go off campus. This is due to the large number of reads from the single copy of a large dataset. This might be circumvented by adding a P2P client to Condor. In our 2D alignment step, each image is compared with all the reference projections, and those projections may already have been sent to neighboring computers running other Condor jobs; a given job could therefore fetch the file from neighboring nodes. The number of reads from the original copy would then drop sharply, in theory to just a few, and the file transfer speed would also increase dramatically.
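
The claim that P2P distribution could cut reads of the original copy to "just a few" can be checked with an idealized model. This sketch assumes a doubling (binomial-tree) broadcast: in each round, every machine that already holds the dataset, including the origin, sends it to one machine that lacks it. It is a hypothetical model, not an existing Condor feature, and it ignores bandwidth differences, network topology and machines joining or leaving.

```python
def origin_reads_central(n_nodes):
    """Current situation: every node reads the dataset from the one original copy."""
    return n_nodes


def origin_reads_peer(n_nodes):
    """Idealized P2P model (hypothetical, not a Condor feature).

    In each round, every machine that already holds a copy, including the
    origin, sends it to one machine that does not. The number of holders
    therefore doubles per round, while the origin sends only one copy per
    round. Returns the number of reads from the original copy needed until
    all n_nodes worker nodes hold the data.
    """
    holders, rounds = 1, 0  # initially only the origin holds the dataset
    while holders < n_nodes + 1:  # origin plus n_nodes workers
        holders *= 2
        rounds += 1
    return rounds


for n in (10, 100, 3000):
    print(n, origin_reads_central(n), origin_reads_peer(n))
```

With the 3,000 nodes mentioned earlier, this model needs 12 reads of the original copy instead of 3,000, because the origin's load grows with log2 of the node count rather than linearly.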

Acknowledgments: Preston Smith, David Braun, Steve Wilson, Pia Mikeal, Bruce L. Fuller.

References:
1. Jiang et al., Nature 439, 2 February 2006.
2. Jiang et al., Nature 451, 28 February 2008 (06665).