1
Porting BOSS to CAS@Home
Wenjing Wu (IHEP-CC), wuwj@ihep.ac.cn, 2011-5-10
2
Outline
Volunteer Computing and CAS@Home
How to port BOSS: a big picture, related technology, gluing it together
3
Volunteer Computing and BOINC
Volunteer Computing (VC) is an established technology that lets users contribute to important challenges in fundamental science and medicine by providing idle time on their PCs, and even by taking part in data analysis via the Internet.
Berkeley Open Infrastructure for Network Computing (BOINC) is middleware that exploits the computing resources provided by volunteers. The system currently has more than 300K registered users and 500K registered hosts.
Volunteer computing middleware: BOINC, XtremWeb, Xgrid, Grid MP
4
Basic Model for VC Based on BOINC
The scientist deploys and manages the BOINC server and develops the application.
The BOINC client is distributed to volunteer PCs, laptops or clusters.
The BOINC client gets the application and input files from the server, runs the application and sends the results back to the BOINC server.
[Diagram labels: BOINC Server, BOINC Client, Application; APP, Input, Job; Output/CPU time; Control Message]
5
Workflow
1. The user submits a job (or a batch of jobs) to the BOINC server.
2. The BOINC server generates work units from the submitted jobs.
3. An active client requests a job from the BOINC server.
4. The BOINC server sends the job, input files and application to the requesting client.
5. The job is executed on the client side; it is suspended while the hosting machine is "busy" and continues afterwards.
6. When the job has finished, the output data is sent back to the BOINC server.
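The six steps can be illustrated with a toy pull-model simulation. This is only a sketch: `ToyServer`, `WorkUnit` and `run_client` are hypothetical names invented for the illustration and are not part of the BOINC API, and the "application" simply upper-cases its input.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkUnit:
    name: str
    payload: str
    output: Optional[str] = None


class ToyServer:
    """Hypothetical stand-in for a BOINC server (not the real BOINC API)."""

    def __init__(self):
        self.pending = deque()
        self.finished = []

    def submit(self, batch_name, payloads):
        # Steps 1-2: a submitted batch is split into work units.
        for i, payload in enumerate(payloads):
            self.pending.append(WorkUnit(f"{batch_name}_{i}", payload))

    def request_work(self):
        # Steps 3-4: the client pulls work; the server never pushes it.
        return self.pending.popleft() if self.pending else None

    def report(self, unit):
        # Step 6: the output comes back to the server.
        self.finished.append(unit)


def run_client(server, host_is_busy):
    """Step 5: execute work units, waiting while the host is busy."""
    suspended_ticks = 0
    while (unit := server.request_work()) is not None:
        while host_is_busy():
            suspended_ticks += 1  # the job stays suspended for this tick
        unit.output = unit.payload.upper()  # placeholder "application"
        server.report(unit)
    return suspended_ticks
```

The client drives the loop: it asks for work, runs it when the host is idle, and reports back, which is why this model works behind firewalls that block incoming connections.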
6
CAS@Home
CAS@Home is a BOINC-based volunteer computing project hosted at IHEP-CC.
Goal: collect a large amount of free computing resources from volunteers to support scientific computing at CAS and other institutes.
Current resources: about 11000 registered hosts, 8000 registered computers; roughly 60% of them are active.
Designed for multiple applications.
Current application: Scthread
Applications being ported: BOSS, LAMMPS
7
Porting BOSS
Challenges:
BOSS is heavily platform dependent, but most volunteer computers are Windows machines.
The code base is big: several GB, which takes a long time to download.
Solution: a virtual machine is used to provide the BOSS running environment.
8
[Architecture diagram. Components: BOSS job management system; BOINC server; volunteer computers 1 to N, each with a BOINC client and a CernVM virtual machine that executes BOSS jobs; Squid; CVMFS server. Client-side actions: create/start/resume the VM; migrate BOSS job files to the VM; execute the BOSS job from the host; query job status; pause/stop the VM when the job is finished.]
9
How it works
BOSS jobs are submitted to the BOINC server via the current BOSS job management system.
BOINC clients running on desktops request jobs from the BOINC server and receive them.
Each job then does the following:
Create/start/resume a virtual machine from the CernVM image.
Copy the BOSS job files to the virtual machine's shared folder and run the job on CernVM.
Get the job status (a file is created in the shared folder to indicate the job status).
When the job finishes, pause/power off/remove the VM.
CernVM downloads and caches the BOSS software (this happens only once) and runs the BOSS job.
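The shared-folder handshake described above can be sketched as follows. This is a minimal illustration under stated assumptions: the marker file name `job.status`, its contents (`done` or `failed`) and the function names are all hypothetical; the slides only say that a file in the shared folder indicates the job status.

```python
import time
from pathlib import Path

STATUS_FILE = "job.status"  # hypothetical marker file name


def stage_job(job_files, shared_dir):
    """Copy job files into the folder that is shared with the VM."""
    shared_dir.mkdir(parents=True, exist_ok=True)
    for src in job_files:
        (shared_dir / src.name).write_bytes(src.read_bytes())


def read_status(shared_dir):
    """Return the status written by the guest, or 'running' if none yet."""
    marker = shared_dir / STATUS_FILE
    if not marker.exists():
        return "running"
    return marker.read_text().strip() or "running"


def wait_for_job(shared_dir, poll_seconds=1.0, max_polls=10):
    """Poll the shared folder until the guest reports a final status."""
    for _ in range(max_polls):
        status = read_status(shared_dir)
        if status in ("done", "failed"):
            return status
        time.sleep(poll_seconds)
    return "timeout"
```

Polling a file keeps the host and guest decoupled: the host side needs no network connection into the VM, only read access to the shared folder.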
10
Related Technologies
BOINC
CernVM
VirtualBox
11
BOINC
Client features:
Sticky files: files remain on the client after the job is done (e.g. VM images).
Report on RPC: the client reports the presence of files to the BOINC server.
Locality scheduling: jobs are scheduled to where the files are located.
BOINC wrapper:
The BOINC wrapper controls the start/suspension/resumption/finish of the application and reports CPU time.
An application can be rewritten with the BOINC API to do this itself (the wrapper is not needed in that case).
For a virtual machine, the wrapper has to create/start/pause/resume/power off the VM through the hypervisor (VirtualBox).
The BOINC developers have finished a generic VM wrapper.
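How sticky files, report on RPC and locality scheduling fit together can be shown with a toy scheduler. This is a sketch of the idea only; `LocalityScheduler` and its methods are hypothetical names and do not reflect the actual BOINC scheduler code or its policies.

```python
from collections import defaultdict


class LocalityScheduler:
    """Toy model of locality scheduling (not BOINC's scheduler code)."""

    def __init__(self):
        self.files_on_host = defaultdict(set)
        self.queue = []  # list of (job_name, required_file)

    def report_files(self, host, files):
        # "Report on RPC": a host tells the server which sticky files it holds.
        self.files_on_host[host] = set(files)

    def add_job(self, job_name, required_file):
        self.queue.append((job_name, required_file))

    def request_job(self, host):
        # Prefer a job whose required file is already present on the host.
        for i, (_job, needed) in enumerate(self.queue):
            if needed in self.files_on_host[host]:
                return self.queue.pop(i)[0]
        # Otherwise hand out the oldest job; the host must download the file.
        return self.queue.pop(0)[0] if self.queue else None
```

For large sticky files such as VM images, matching jobs to hosts that already hold the file avoids repeating a download of several hundred megabytes.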
12
History of the BOINC wrapper
A specific wrapper has been developed for LHC@home; it is based on VirtualBox and CernVM.
LHC@home uses CoPilot to schedule jobs, which differs from our case. With CoPilot, the LHC@home wrapper does not have to support host/guest file sharing or guest control (i.e., executing commands on the guest machine from the host machine).
The LHC@home wrapper supports only one virtual machine instance running on a host machine.
The BOINC developers are working on a generic VM wrapper, which is still VirtualBox based but supports guest control and multiple VM instances on a host machine.
13
CernVM
CernVM is a thin virtual machine image dedicated to LHC experiment users. Its basic image is about 250 MB and runs on most hypervisors (VirtualBox, VMware, Xen, KVM, Hyper-V Server).
LHC and other HEP applications can easily be distributed to and run on different platforms (Linux/Windows/Mac) via hypervisor + CernVM image.
The current version is 2.2. It provides three types of SLC5-based Linux system image: Desktop, Basic and BOINC. Earlier versions support SLC4.
14
Software distribution and CVMFS
CernVM comes with a network file system, CVMFS, which delivers software to CernVM users.
CVMFS is mounted so that it appears as a local file system to CernVM users. Files are downloaded and cached on demand.
CVMFS acts like a software repository. It currently supports the four LHC experiments (ALICE, ATLAS, CMS and LHCb) as well as other experiments and projects (LCD, NA61 and H1). BOSS has also been deployed.
CVMFS is HTTP based, so there are no firewall/proxy concerns.
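The "downloaded and cached on demand" behaviour amounts to a read-through cache. The sketch below illustrates that idea only; `OnDemandCache` and the `fetch` callback are hypothetical and say nothing about how CVMFS is actually implemented.

```python
class OnDemandCache:
    """Read-through cache: fetch a file on first access, reuse it afterwards."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable that downloads one file by path
        self._cache = {}
        self.downloads = 0

    def read(self, path):
        if path not in self._cache:
            # First access: download and keep a local copy.
            self._cache[path] = self._fetch(path)
            self.downloads += 1
        return self._cache[path]
```

Only the files a job actually touches are ever transferred, which matters for a code base of several GB.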
15
CoPilot
CernVM comes with a job scheduling system, CoPilot (pull model), based on the XMPP protocol.
CoPilot Client: runs on any machine and submits jobs to the CoPilot server.
CoPilot Server: central service that schedules and delivers jobs and input files to the CoPilot Agent and receives output files.
CoPilot Agent: runs on each CernVM machine and executes jobs.
16
VirtualBox
A hypervisor: free, open source, with versions for Linux/Windows/Mac.
Rich command-line tools to create a virtual machine and control its status:
Create VM / modify VM / start VM
Save VM state / pause VM
Resume VM
Power off VM / release VM / remove VM
List running VMs / existing VM instances
Life cycle: create -> start -> [pause | resume | save_state] -> poweroff -> [remove | release]
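As a rough illustration, the life-cycle actions can be mapped to `VBoxManage` command lines. The helper `vbox_command` and the action names are hypothetical; the subcommands shown (`createvm`, `startvm`, `controlvm`, `unregistervm`, `list`) exist in VirtualBox, but exact options can differ between versions, and "release" is left out because the slides do not say which command it corresponds to. The function only builds argument lists and executes nothing.

```python
# Life-cycle actions in the order given on the slide.
LIFECYCLE = ["create", "start", "pause", "resume", "savestate", "poweroff", "remove"]


def vbox_command(action, vm_name=None):
    """Return the argv list for one action (nothing is executed here)."""
    control = {"pause", "resume", "savestate", "poweroff"}
    if action == "create":
        return ["VBoxManage", "createvm", "--name", vm_name, "--register"]
    if action == "start":
        # Headless start: no GUI window on the volunteer's desktop.
        return ["VBoxManage", "startvm", vm_name, "--type", "headless"]
    if action in control:
        return ["VBoxManage", "controlvm", vm_name, action]
    if action == "remove":
        return ["VBoxManage", "unregistervm", vm_name, "--delete"]
    if action == "list_running":
        return ["VBoxManage", "list", "runningvms"]
    if action == "list_all":
        return ["VBoxManage", "list", "vms"]
    raise ValueError(f"unknown action: {action}")
```

On a host with VirtualBox installed, a returned list could be passed to `subprocess.run(..., check=True)` to actually run the command.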
17
Glue all pieces together
A BOINC VM wrapper is needed to build the communication between the VM and the client.
The VM wrapper uses guest control (a VirtualBox feature) to execute commands on the guest machine from the host machine.
The VM wrapper receives signals from the client and decides whether to pause or resume VirtualBox.
The VM wrapper copies the BOSS-related files to a shared folder so that they appear in the virtual machine for execution.
With multiple VMs, the wrapper needs to keep track of the status of each VM.
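The bookkeeping the wrapper needs for multiple VMs can be sketched as a small state tracker. Everything here is an assumption made for illustration: the state names, the `VmTracker` class and the `suspend`/`resume` signal names are hypothetical and are not taken from the generic BOINC VM wrapper.

```python
# Allowed transitions: (current state, action) -> next state.
ALLOWED = {
    ("created", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "resume"): "running",
    ("running", "poweroff"): "off",
    ("paused", "poweroff"): "off",
}


class VmTracker:
    """Track the state of every VM instance started on this host."""

    def __init__(self):
        self.state = {}

    def create(self, name):
        self.state[name] = "created"

    def apply(self, name, action):
        key = (self.state[name], action)
        if key not in ALLOWED:
            raise ValueError(f"cannot {action} a VM that is {self.state[name]}")
        self.state[name] = ALLOWED[key]

    def on_client_signal(self, signal):
        # A client 'suspend' pauses every running VM; 'resume' wakes paused ones.
        action, source = {
            "suspend": ("pause", "running"),
            "resume": ("resume", "paused"),
        }[signal]
        for name, current in list(self.state.items()):
            if current == source:
                self.apply(name, action)
```

Keeping the transition table explicit makes it easy to reject invalid requests, such as resuming a VM that has already been powered off.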
18
Glue all pieces together
Only a basic CernVM image is downloaded by the client and put in the project directory. To run multiple instances on a host, each instance needs its own clone of the image.
Input files include the image (CernVM, original size 800 MB), the BOSS scripts and their associated input files.
The BOSS software is distributed via CVMFS; a local Squid is used to cache the repository.
19
Thanks!
For more information: http://twiki.ihep.ac.cn/twiki/bin/view/CASAtHome/BOSSonBOINC