1
CAS@home
Wenjing Wu (wuwj@ihep.ac.cn)
Computer Center, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing
2015-10-17 BOINC workshop 2013 @Grenoble
2
Outline
– CAS@home project
– Applications:
  – Lammps: molecular dynamics simulation
  – treeThreader: protein structure prediction
– Remote job submission
3
CAS@home
– The first and only volunteer computing project in mainland China
– Launched in June 2010, hosted by the Computer Center of IHEP, CAS
– Supports scientific computing for the Chinese Academy of Sciences and other research institutes
– Hosts multiple applications from various research fields, including nanotechnology, bioinformatics and physics
4
CAS@home status
Since its launch in June 2010:
– 10K active users, 1/3 of them Chinese
– 23K active hosts
– 7M CPU hours since Nov 2012
– Hosting 3 applications: Lammps, treeThreader, Aevol
– Other ongoing applications: BOSS (VBoxwrapper-based)
– 1.3 TFLOPS (real-time computing power)
– Peak: 1M validated CPU hours per month
5
Some project Statistics
6
Application 1: Lammps
– Software for molecular dynamics simulation, widely used by scientists from various research fields
– Restartable; developed in C++ by an international group; can be compiled on both Windows and Linux with some effort
– Input/output: 3 mandatory input files (<10 MB) / 1 compressed output file (hundreds of MB)
– Running time: 0.5 to 800 hours (it depends on a random number that decides the number of simulation steps)
7
Problems
– Results are numerical and differ between hosts, for two reasons:
  – floating-point calculations differ across platforms
  – checkpoints also introduce discrepancies, because precision is lost when values are printed to a text file
– Solutions: Homogeneous Redundancy, or Homogeneous Application Version
– Running problems: some long jobs (~hundreds of hours) crash midway without getting any credit
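The checkpoint issue is easy to reproduce. This is a minimal Python sketch, assuming the checkpoint writes values with C-style `%g` formatting (6 significant digits); the real Lammps restart format may differ, and `checkpoint_roundtrip` is a hypothetical helper. Writing 17 significant digits is enough to round-trip an IEEE 754 double exactly.

```python
def checkpoint_roundtrip(x: float, fmt: str) -> float:
    """Write x to a text 'checkpoint' using fmt, then read it back."""
    text = fmt % x        # what ends up in the checkpoint file
    return float(text)    # what the restarted job continues with


def demo():
    x = 1.0 / 3.0
    lossy = checkpoint_roundtrip(x, "%g")     # assumption: 6 significant digits
    exact = checkpoint_roundtrip(x, "%.17g")  # 17 digits round-trip a double
    return x, lossy, exact


if __name__ == "__main__":
    x, lossy, exact = demo()
    print(lossy == x, exact == x)  # prints: False True
```

A restarted job continues from `lossy` rather than `x`, so its trajectory drifts away from an uninterrupted run; two replicas checkpointed at different moments then fail validation even on identical hardware.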
8
Application 2: treeThreader
– For protein structure prediction
– Written in C by local scientists; compiles easily on both Windows and Linux; restartable
– Computing task: compare a protein sequence file against all existing protein templates
– Input files: configuration files, the protein sequence file, ~50K protein templates (about 4 GB)
– Output files: one text file per template file
– Comparing one sequence file against all templates needs about 42 GFLOPS/hour
9
Running on a single host
Computing task: one protein sequence is compared against Protein Template 1, Protein Template 2, Protein Template 3, …, Protein Template 50,000. Each comparison takes 6 s, so the whole task takes about 84 hours on a single core.
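The single-core figure follows directly from the per-comparison cost (taking the template count as exactly 50,000, which the slides give only as approximate):

```latex
T_{\text{single}} = 50{,}000 \times 6\,\text{s} = 300{,}000\,\text{s} \approx 83.3\,\text{h}
```

which the slide rounds to about 84 hours.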
10
Running it on BOINC
Each comparison takes 6 s, and each sub package takes 9000 s on a host.
– The templates are split into 32 sub packages (sticky files): Sub Package 1 holds Protein Templates 1 to 1500 (Host A1), Sub Package 2 holds Templates 1501 to 3000 (Host A2), …, Sub Package 32 holds Templates 46501 to 48000 (Host Am)
– One host can keep several sub packages (e.g. Host An holds Sub Packages 14, 15 and 16)
– Locality scheduling: the job goes to where the data is
– A protein sequence then takes 9000 s (2.5 hours) to finish
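The split and the "job goes to where the data is" rule can be sketched as follows. This is a simplified Python illustration of the locality idea, not BOINC's actual locality scheduler; `Host`, `sub_packages` and `pick_host` are hypothetical names, and the fallback of picking the host with the fewest cached packages is an assumption.

```python
from dataclasses import dataclass, field

SECONDS_PER_COMPARISON = 6
TEMPLATES_PER_PACKAGE = 1500
NUM_PACKAGES = 32


def sub_packages(num_packages=NUM_PACKAGES, size=TEMPLATES_PER_PACKAGE):
    """Return 1-based, inclusive template ranges, one per sub package."""
    return [(i * size + 1, (i + 1) * size) for i in range(num_packages)]


@dataclass
class Host:
    name: str
    sticky: set = field(default_factory=set)  # sub package ids cached on disk


def pick_host(package_id, hosts):
    """Prefer a host that already holds the sticky file (no download needed).

    Otherwise send the job to the host with the fewest cached packages,
    which downloads the file once and keeps it for later jobs.
    """
    for host in hosts:
        if package_id in host.sticky:
            return host, False
    host = min(hosts, key=lambda h: len(h.sticky))
    host.sticky.add(package_id)
    return host, True


if __name__ == "__main__":
    pkgs = sub_packages()
    print(pkgs[0], pkgs[1], pkgs[-1])  # (1, 1500) (1501, 3000) (46501, 48000)
    print(TEMPLATES_PER_PACKAGE * SECONDS_PER_COMPARISON)  # 9000 (seconds per job)
    hosts = [Host("A1", {1}), Host("A2", {2}), Host("An", {14, 15, 16})]
    host, downloaded = pick_host(15, hosts)
    print(host.name, downloaded)  # An False
```

Because the 4 GB of templates stays on volunteer disks, only the small sequence file travels with each new job.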
11
Problems
Long-tail batches:
– A front-end server submits batches and does the pre-processing and post-processing of each sequence, so it can only maintain/watch a limited number of active batches (batches in progress) in parallel: at most 300
– A whole batch is delayed by its slowest job
– No new batches are submitted to the BOINC server while some batches are still "in progress" (waiting for their slowest jobs)
– As a result, many hosts end up starving
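A toy model shows how the 300-batch cap and stragglers combine. This Python sketch is a deliberate simplification (every job starts at t = 0 on its own host), and the job durations in the example are hypothetical, not CAS@home measurements.

```python
MAX_ACTIVE_BATCHES = 300  # front-end limit quoted on the slide


def batch_done_at(job_hours):
    """A batch completes only when its slowest job completes."""
    return max(job_hours)


def front_end_state(batches, t, limit=MAX_ACTIVE_BATCHES):
    """Return (active batches, jobs still running, may submit) at time t.

    Simplification: every job starts at t = 0 on its own host.
    """
    active = [b for b in batches if batch_done_at(b) > t]
    running = sum(1 for b in active for hours in b if hours > t)
    return len(active), running, len(active) < limit


if __name__ == "__main__":
    # Hypothetical: 300 batches of 32 jobs; 31 jobs take 2.5 h, one takes 50 h.
    batches = [[2.5] * 31 + [50.0] for _ in range(300)]
    print(front_end_state(batches, t=3.0))  # (300, 300, False)
```

At t = 3 h only 300 of the 9,600 jobs are still running, yet the front end is at its limit and cannot submit anything new, so the remaining hosts have no work to fetch until the stragglers finish.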
12
Remote Job Submission
– CAS@home hosts multiple applications, and each application has multiple users
– Application users have no privileges to submit jobs on the CAS@home server directly
– Remote job submission therefore lets authorized and authenticated users submit jobs from remote machines
– Basic remote job submission functions: batch submit / check status / retire / abort / download results
– BOINC provides a rich set of APIs for remote batch operations (a batch is a set of jobs based on the same input files), but each application still needs its own server-side CGI code and client-side code for remote job submission:
  – some operations (batch retire/abort/status check) are generic and can use the BOINC API directly
  – other operations, such as batch submission and result downloading, are application-specific and need to be customized
  – extra functions such as "test running" and "estimate running time" can be added
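The generic/application-specific split maps naturally onto an abstract base class. This Python sketch is hypothetical: `RemoteSubmitter`, `EchoApp` and their methods are illustrative names, not BOINC's remote job API, and the in-memory dictionary stands in for the server's batch table.

```python
from abc import ABC, abstractmethod


class RemoteSubmitter(ABC):
    """Hypothetical split of remote job submission into generic and
    application-specific operations (not BOINC's actual API)."""

    def __init__(self):
        self.batches = {}  # batch id -> state; stands in for the batch table
        self._next_id = 1

    # Generic operations: identical for every application.
    def status(self, batch_id):
        return self.batches[batch_id]

    def abort(self, batch_id):
        self.batches[batch_id] = "aborted"

    def retire(self, batch_id):
        self.batches[batch_id] = "retired"

    def submit(self, inputs):
        """Generic bookkeeping around an application-specific job builder."""
        jobs = self.create_jobs(inputs)
        batch_id = self._next_id
        self._next_id += 1
        self.batches[batch_id] = "in progress"
        return batch_id, len(jobs)

    # Application-specific operations: each application overrides these.
    @abstractmethod
    def create_jobs(self, inputs):
        """Turn uploaded inputs into a list of job descriptions."""

    @abstractmethod
    def collect_results(self, batch_id):
        """Package a batch's output for download."""


class EchoApp(RemoteSubmitter):
    """Toy application: one job per input file."""

    def create_jobs(self, inputs):
        return [{"input": name} for name in inputs]

    def collect_results(self, batch_id):
        return f"batch_{batch_id}.zip"
```

Adding a new application then means writing only `create_jobs` and `collect_results`; status, abort and retire are inherited unchanged.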
13
Lammps Job Submission
– Jobs are created in batches: a batch = 1 set of input files + different parameter-value pairs
– A batch comprises hundreds to thousands of jobs
– Remote job submission:
  – batches are submitted through a web portal by authenticated and authorized users
  – these users can also operate on their batches through the web portal (retire, abort, check status, download results)
Example: Batch A (input file 1, input file 2)
  Job 1: Ka1=Va1 Kb1=Vb1
  Job 2: Ka2=Va2 Kb2=Vb2
  …
  Job N: KaN=VaN KbN=VbN
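A batch of this shape (shared input files, one parameter set per job) can be represented as below. The slides do not say how users produce the parameter-value pairs; expanding a Cartesian grid is an assumption made for illustration, and `Job` and `make_batch` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Job:
    input_files: tuple  # shared by every job in the batch
    params: dict        # this job's parameter-value pairs


def make_batch(input_files, grid):
    """Expand a parameter grid into one job per combination (hypothetical)."""
    keys = sorted(grid)
    return [
        Job(tuple(input_files), dict(zip(keys, values)))
        for values in product(*(grid[k] for k in keys))
    ]


if __name__ == "__main__":
    jobs = make_batch(["input_file1", "input_file2"],
                      {"temp": [300, 350], "seed": [1, 2, 3]})
    print(len(jobs), jobs[0].params)  # 6 {'seed': 1, 'temp': 300}
```

Since every job shares the same input files, they only need to be uploaded once per batch, which is what makes the batch a natural unit for submission.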
14
LAMMPS CAS user interface (diagram)
– User interface functions: File Sandbox, Test a Job, Submit a Batch, Check Batch Status, Get Output
– Server side on CAS@home: the LAMMPS CGI and the File Sandbox service
– A submitted batch is a list of jobs: Job1: Para List, Value List1; Job2: Para List, Value List2; Job3: Para List, Value List3; …; JobN: Para List, Value ListN
15
Job submission workflow (diagram)
– The user talks to the web portal over HTTP; the portal forwards requests to the LAMMPS CGI on the CAS@home server
– Sandbox: the user's uploaded input files (File1, File2, …)
– Job Tester: the user tests a job with chosen input files; the tester performs a syntax check and estimates GFLOPS and output size
– Batch Creator: once the job passes the test, the user submits a batch, whose jobs are sent to volunteer hosts
– Batch Monitor / Job Monitor: track batches and their jobs
– Batch operations: abort/retire a batch, download results (results are zipped)
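The Job Tester's role can be sketched in two pieces: a syntax check and an extrapolation from a short test run. Both are assumptions for illustration: the "must contain `run N`" rule is a toy stand-in for a real syntax check, linear scaling of cost and output with step count is assumed, and `syntax_ok`, `extrapolate` and `Estimate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    gflop: float
    output_mb: float


def syntax_ok(script: str) -> bool:
    """Toy check: the input script must contain a 'run N' command."""
    for line in script.splitlines():
        parts = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(parts) == 2 and parts[0] == "run" and parts[1].isdigit():
            return True
    return False


def extrapolate(test_steps, test_gflop, test_output_mb, total_steps):
    """Scale a short test run to the full job.

    Assumes cost and output grow linearly with the number of steps.
    """
    factor = total_steps / test_steps
    return Estimate(test_gflop * factor, test_output_mb * factor)


if __name__ == "__main__":
    print(syntax_ok("units lj\nrun 1000\n"))  # True
    print(extrapolate(100, 2.0, 0.5, 10000))  # Estimate(gflop=200.0, output_mb=50.0)
```

An estimate like this lets the server set a job's FLOPs bound and disk limits before the batch is created, instead of discovering an oversized job on a volunteer host.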
16
BOINC Sandbox
– A file cannot be uploaded twice
– Files used by a running batch cannot be deleted
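The two sandbox rules can be enforced with a small bookkeeping class. This Python sketch is hypothetical: detecting duplicates by MD5 of the content (as well as by name) is an assumption about what "cannot be uploaded twice" means, and `Sandbox` is not BOINC's sandbox code.

```python
import hashlib


class Sandbox:
    """Hypothetical per-user file sandbox enforcing the two slide rules."""

    def __init__(self):
        self.files = {}   # file name -> MD5 of its content
        self.in_use = {}  # file name -> ids of running batches using it

    def upload(self, name, data: bytes):
        digest = hashlib.md5(data).hexdigest()
        if name in self.files or digest in self.files.values():
            raise ValueError(f"{name}: file already uploaded")
        self.files[name] = digest

    def use(self, name, batch_id):
        self.in_use.setdefault(name, set()).add(batch_id)

    def batch_finished(self, batch_id):
        for batch_ids in self.in_use.values():
            batch_ids.discard(batch_id)

    def delete(self, name):
        if self.in_use.get(name):
            raise PermissionError(f"{name} is used by a running batch")
        del self.files[name]
        self.in_use.pop(name, None)
```

Refusing deletion while a batch is running matters because volunteer hosts may still be downloading that file for jobs that have not started yet.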
17
Lammps Job Testing
– Test the job on the server first, then submit the batch
– This step is Lammps-specific!
18
Batch Monitoring
– Admins can see the status of all batches
– Batch status: In process, Completed, Aborted, Retired
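The four batch states can be modelled as a small state machine. The slides only list the states; the allowed transitions below are an assumption (abort applies to an unfinished batch, retire to a finished or aborted one), and `BatchState`, `TRANSITIONS` and `transition` are hypothetical names.

```python
from enum import Enum


class BatchState(Enum):
    IN_PROCESS = "In process"
    COMPLETED = "Completed"
    ABORTED = "Aborted"
    RETIRED = "Retired"


# Assumed transitions; the slides list the states but not the rules.
TRANSITIONS = {
    BatchState.IN_PROCESS: {BatchState.COMPLETED, BatchState.ABORTED},
    BatchState.COMPLETED: {BatchState.RETIRED},
    BatchState.ABORTED: {BatchState.RETIRED},
    BatchState.RETIRED: set(),
}


def transition(state, new_state):
    """Return new_state if the move is allowed, else raise ValueError."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"cannot go from {state.value} to {new_state.value}")
    return new_state
```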
19
Admin view: all batches
20
Job Status
– Shows the input files associated with this job
– Each result can be downloaded separately
21
Batch Operations
– Download the results of this batch
– Retire a batch
– Download the results of a work unit
– Abort an unfinished batch
22
TreeThreader job submission
– Jobs are created in batches: 1 protein sequence corresponds to 1 batch (32 jobs)
– Remote job submission:
  – Client side: a set of PHP APIs lets authenticated and authorized users submit batches and operate on them remotely (check status, retire, abort, get output)
  – Server side: generic operations such as batch abort/retire/status check are already included in the BOINC code; application-specific operations such as batch submission and result downloading are implemented in a CGI program on the server side
23
TreeThreader Job Submission CGI
– Batch submission:
  – takes the sequence and configuration files uploaded by the client
  – creates a batch of jobs based on these input files and all the template files already stored on the server
  – returns a batch ID
– Batch result downloading:
  – uncompresses all output files of the batch
  – puts the uncompressed output files into one directory and compresses it
  – returns the download URL of the batch result file
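The three result-downloading steps can be sketched in memory with the standard library. Assumptions: each job output is a single gzip-compressed file, the merged archive is a zip, and the base URL is a placeholder; `merge_batch_results` is a hypothetical name, and a real CGI would write the archive to a download directory instead of returning its bytes.

```python
import gzip
import io
import zipfile


def merge_batch_results(outputs, batch_id, base_url="https://example.org/results"):
    """Merge one batch's job outputs into a single archive (hypothetical).

    outputs: mapping of output file name -> gzip-compressed bytes.
    Returns (download URL, merged zip archive bytes).
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, blob in sorted(outputs.items()):
            text = gzip.decompress(blob)  # 1. uncompress each job output
            plain = name[:-3] if name.endswith(".gz") else name
            # 2. put every output into the same directory of one archive
            zf.writestr(f"batch_{batch_id}/{plain}", text)
    url = f"{base_url}/batch_{batch_id}.zip"  # 3. hand back a download URL
    return url, buf.getvalue()
```

Merging on the server means the client makes one download per sequence instead of 32.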
24
TreeThreader Job Submission (diagram)
– ICT Web Services call the API to submit a sequence, check status and get output
– The TreeThreader CGI on CAS@home runs the sequence against template packages P1, P2, P3, P4, …, P32 and returns the merged results
25
Thoughts on a more generic job submission interface
– The server side still requires application-specific functions to create batches, merge results, test jobs and estimate running time
– On the client side, job submission and result downloading can be generalized
– Use an XML file on the client side to describe the input files and their types
26
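Input file types in the XML description (the XML tags were lost in extraction):
– Type 0 = upload: the file needs to be uploaded to the BOINC server
– Type 1 = online: the file is already stored on the BOINC server
– Example: MySEQ.tar.gz has type 0; Templates has type 1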
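A generic client could read such a description and upload only the type 0 files. The original XML tag names did not survive extraction, so `<batch>`, `<input_file>`, `<type>` and `<name>` below are hypothetical; only the type codes and the two file names come from the slide.

```python
import xml.etree.ElementTree as ET

UPLOAD, ONLINE = 0, 1  # type codes from the slide

# Hypothetical tag names; the original XML layout is not recoverable.
SPEC = """
<batch>
  <input_file><type>0</type><name>MySEQ.tar.gz</name></input_file>
  <input_file><type>1</type><name>Templates</name></input_file>
</batch>
"""


def files_by_type(spec_xml, wanted):
    """Return the names of input files whose type code equals wanted."""
    root = ET.fromstring(spec_xml.strip())
    return [f.findtext("name") for f in root.iter("input_file")
            if int(f.findtext("type")) == wanted]


if __name__ == "__main__":
    print(files_by_type(SPEC, UPLOAD))  # ['MySEQ.tar.gz']
    print(files_by_type(SPEC, ONLINE))  # ['Templates']
```

With this split, the same client code can serve any application: it uploads the type 0 files, and passes only the names of type 1 files to the server-side CGI.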
27
The End!