TORQUE
Kerry Chang
CCLS
December 13, 2010
OUTLINE
Torque: How does it work? Architecture
MADA
Demo
Results
Problems
Future Improvements
TORQUE: WHAT IS IT?
An open-source project by Cluster Resources Inc.
A cluster resource manager that:
manages batch jobs (a series of programs executed without manual intervention);
manages distributed compute nodes (the distributed servers on which batch jobs execute).
TORQUE ARCHITECTURE
TORQUE SCHEDULER
Currently using the standard built-in scheduler (FIFO).
MOAB: a more advanced scheduler.
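To make the FIFO policy concrete, here is a toy Python model of first-in, first-out dispatch. It is a sketch of the policy only, not TORQUE's actual scheduler code: jobs are hypothetical `(name, nodes, runtime)` tuples, all assumed to be queued at time 0, and a job starts only once every job ahead of it has started.

```python
"""Toy model of first-in, first-out (FIFO) job dispatch.

Illustrates the FIFO policy only; this is not TORQUE's scheduler code.
A job that does not fit on the currently free nodes blocks every job
queued behind it, even jobs that would fit.
"""
import heapq
from collections import deque


def fifo_schedule(jobs, total_nodes):
    """Return (job_name, start_time) pairs for jobs given as (name, nodes, runtime).

    All jobs are assumed to be in the queue at time 0.
    """
    queue = deque(jobs)
    running = []  # min-heap of (finish_time, nodes) for executing jobs
    free, now, starts = total_nodes, 0, []
    while queue:
        name, nodes, runtime = queue[0]
        if nodes > total_nodes:
            raise ValueError(f"job {name} needs more nodes than the cluster has")
        if nodes <= free:
            # Head of the queue fits: start it now.
            queue.popleft()
            starts.append((name, now))
            heapq.heappush(running, (now + runtime, nodes))
            free -= nodes
        else:
            # Head does not fit: advance the clock to the next job completion.
            finish, released = heapq.heappop(running)
            now = finish
            free += released
    return starts
```

For example, `fifo_schedule([("a", 2, 5), ("b", 3, 2), ("c", 1, 1)], total_nodes=3)` returns `[("a", 0), ("b", 5), ("c", 7)]`. Job `c` needs only one node and could have started at time 0, but FIFO holds it behind `b`; backfilling, one of the features offered by more advanced schedulers such as MOAB, would let it start early.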
WHAT HAVE I DONE?
Used MADA as an application of TORQUE.
Treated the application as a black box and parallelized it by splitting the input text.
Created a series of scripts for text manipulation and for job submission to the Torque queue.
Observed a linear improvement in processing time by using Torque.
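One way to state the "linear improvement" claim precisely, assuming it means that wall-clock time shrinks in proportion to the number of jobs, is through the usual speedup ratio. Here $T(N)$ is a symbol introduced for illustration: the wall-clock time to process the whole input when it is split across $N$ jobs, ignoring splitting, queue-wait, and concatenation overhead.

```latex
S(N) = \frac{T(1)}{T(N)}, \qquad
T(N) \approx \frac{T(1)}{N} \;\Longrightarrow\; S(N) \approx N
```

In practice the split, submission, and concatenation steps add a roughly fixed cost per run, so $S(N)$ would be expected to fall below $N$ for small inputs.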
MADA
A system for Morphological Analysis and Disambiguation for Arabic.
The input file is separated line by line, so it can be split across jobs at line boundaries.
MADA ARCHITECTURE
HOW DO THE SCRIPTS WORK?
1) Split the text file evenly across the specified number of jobs.
2) Create a script for each newly split text file. For example, to run 5 jobs, split the text into 5 files and create a script to run each of the 5 files.
3) Submit each script to Torque.
4) Concatenate the output of each script.
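The four steps can be sketched in Python as below. This is a minimal sketch, not the original scripts: `MADA_COMMAND` is a hypothetical placeholder for however MADA is actually invoked, and the file names (`chunk0.txt`, `job0.sh`) are made up for illustration. `qsub` is Torque's submission command; the `#PBS -N` directive names the job and `$PBS_O_WORKDIR` is the directory `qsub` was run from.

```python
"""Sketch of the split / script / submit / concatenate workflow.

Assumptions (hypothetical, not from the original scripts):
- MADA_COMMAND is a placeholder for the real MADA invocation.
- Chunk, script, and output file names are illustrative only.
"""
import subprocess
from pathlib import Path

# Hypothetical MADA invocation; {input} is replaced with the chunk path.
MADA_COMMAND = "run_mada {input}"


def split_lines(lines, num_jobs):
    """Step 1: split lines into num_jobs contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(lines), num_jobs)
    chunks, start = [], 0
    for i in range(num_jobs):
        size = base + (1 if i < extra else 0)
        chunks.append(lines[start:start + size])
        start += size
    return chunks


def write_chunks_and_scripts(input_path, num_jobs, work_dir):
    """Steps 1 and 2: write each chunk plus a Torque job script that processes it."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    lines = Path(input_path).read_text(encoding="utf-8").splitlines(keepends=True)
    scripts = []
    for i, chunk in enumerate(split_lines(lines, num_jobs)):
        chunk_path = (work_dir / f"chunk{i}.txt").resolve()
        chunk_path.write_text("".join(chunk), encoding="utf-8")
        script_path = work_dir / f"job{i}.sh"
        script_path.write_text(
            "#!/bin/sh\n"
            f"#PBS -N mada_chunk{i}\n"
            'cd "$PBS_O_WORKDIR"\n'
            + MADA_COMMAND.format(input=chunk_path) + "\n",
            encoding="utf-8",
        )
        scripts.append(script_path)
    return scripts


def submit(script_path):
    """Step 3: submit one script with qsub and return the job ID it prints."""
    result = subprocess.run(
        ["qsub", str(script_path)], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def concatenate(output_paths, final_path):
    """Step 4: join the per-chunk outputs in chunk order."""
    with open(final_path, "w", encoding="utf-8") as out:
        for path in output_paths:
            out.write(Path(path).read_text(encoding="utf-8"))
```

Splitting into contiguous chunks (rather than round-robin) keeps each chunk's lines in their original order, so step 4 can rebuild the full output by simple concatenation in chunk order.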
DEMO
Demonstration of Torque and MADA
Three output files: file.bw, file.bw.mada, file.bw.mada.tok
RESULTS: 30 Lines
RESULTS: 300 Lines
RESULTS: 3,000 Lines
RESULTS: 30,000 Lines
RESULTS
Network vs. Local Temp comparison (seconds)
Columns: Network | Local Temp | Improvement
,4771, ,54413, ,105131,64913,456
PROBLEMS
How do we know when MADA has finished, so that we can concatenate the results?
Where should MADA run, and where should its results be written?
Submission to a compute node hangs. Possible fixes: use a smarter scheduler; supply machines dedicated to running Torque jobs.
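One possible answer to the first question is a completion marker: assume each job script ends with a hypothetical line such as `run_mada chunk0.txt && touch chunk0.done`, so the marker appears only after MADA exits successfully, and have the driver wait for every marker before concatenating. The sketch below polls for those markers; the marker naming scheme and the default poll interval are assumptions. Torque's job dependencies (`qsub -W depend=afterok:<jobid>`) are another option worth checking in the `qsub` documentation.

```python
"""Wait for per-job completion markers before concatenating results.

Assumption (hypothetical): each job script creates '<chunk>.done' only
after MADA exits successfully, so a marker means the output is complete.
"""
import time
from pathlib import Path


def wait_for_markers(marker_paths, poll_seconds=30.0, timeout_seconds=None):
    """Block until every marker file exists.

    Returns True once all markers are present, or False if timeout_seconds
    elapses first (for example because a job failed and never wrote one).
    """
    deadline = None if timeout_seconds is None else time.monotonic() + timeout_seconds
    pending = [Path(p) for p in marker_paths]
    while True:
        # Keep only the markers that have not appeared yet.
        pending = [p for p in pending if not p.exists()]
        if not pending:
            return True
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

Using `&&` before `touch` means a failed MADA run never produces a marker, so the timeout is what surfaces failures; without it, a failed job would look finished.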
FUTURE IMPROVEMENTS
Pipeline many jobs to Torque.
Work from local temp folders instead of over the network.
Split and rebuild certain output files by looking at the provided testing.madaconfig file.
MADA TOKAN Preprocessor
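Working from local temp folders could look like the job script generated below: copy the chunk into node-local scratch space, run MADA there, and copy the results back to shared storage once at the end. This is a sketch under assumptions: `MADA_COMMAND` is a hypothetical placeholder, `$TMPDIR` is used only if the node sets it (falling back to `/tmp`), and all files MADA writes are assumed to share the input's name as a prefix (as with `file.bw.mada` and `file.bw.mada.tok`).

```python
"""Generate a Torque job script that runs MADA in node-local scratch space.

Assumptions (hypothetical): MADA_COMMAND stands in for the real MADA
invocation, and MADA's outputs are named '<input name>.<suffix>'.
"""
import shlex
from pathlib import PurePosixPath

# Hypothetical MADA invocation; {input} is replaced with the chunk file name.
MADA_COMMAND = "run_mada {input}"


def local_temp_script(chunk_path, dest_dir, job_name):
    """Return the text of a job script that stages the chunk to local disk."""
    chunk = PurePosixPath(chunk_path)
    q = shlex.quote
    return "\n".join([
        "#!/bin/sh",
        f"#PBS -N {job_name}",
        "set -e",
        "# Create a private scratch directory on the compute node.",
        'WORK=$(mktemp -d "${TMPDIR:-/tmp}/mada.XXXXXX")',
        f'cp {q(str(chunk))} "$WORK/"',
        'cd "$WORK"',
        MADA_COMMAND.format(input=q(chunk.name)),
        "# Copy every file derived from the input back to shared storage.",
        f"cp {q(chunk.name)}.* {q(str(dest_dir))}/",
        "cd /",
        'rm -rf "$WORK"',
        "",
    ])
```

With `set -e`, the copy-back and cleanup run only if MADA succeeds, so the network is touched twice per job (one copy in, one copy out) instead of on every read and write.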
QUESTIONS