Blackbird: Accelerated Course Archives Using Condor with Blackboard
Sam Hoover, IT Systems Architect
Matt Garrett, System Administrator
End-of-semester archives of all online courses in Blackboard since implementation
GB Oracle DB tied to a 1.3 TB content system with over 13 million files
Spring 2010: 4,610 active Blackboard courses, 31,372 total courses in Blackboard
Full system backups once a week, nightly incremental backups of the entire system
The Archive Problem
Blackboard is a mission-critical system
Why is 85.5 hours for archives a problem?
Start of new semester vs. normal operations
Time between semesters is short and getting shorter
Faculty have to wait to set up next semester’s courses
End-of-semester processes
Why do we need course archives?
Student add/drop at the start of the semester
Loss of course content or an entire course
CRLT archive uses
Grade disputes
Blackboard end-of-semester (EoS) archives
The Archive Problem
Blackboard provides a script that runs batch archives from a list of courses given as input. The weekly archive process at Clemson began in Fall 2006 after many courses were accidentally deleted. At first, the course list was split into four equal chunks, one per server. All four servers usually finished within 2 hours of each other, and the whole batch took less than 24 hours. By Fall 2008, archiving the active courses took 85.5 hours, and the servers finished at widely varying times.
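Why a fixed split degrades as course sizes diverge can be shown with a toy model. The first function mimics the original approach (contiguous, equal-count chunks, one per server); the second mimics a scheduler that hands the next course to whichever worker frees up first, which is roughly what a queue-based system such as Condor provides when it matches jobs to free slots. The durations are invented for illustration, not Clemson measurements.

```python
import heapq

# Illustrative durations in hours (made-up numbers, not Clemson measurements).
# A few very large courses sit next to each other at the front of the list.
DURATIONS = [10, 9, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1]


def static_split_makespan(durations, servers):
    """Old approach: cut the list into contiguous chunks of (nearly) equal count,
    one chunk per server, and return when the slowest server finishes."""
    size = -(-len(durations) // servers)  # ceiling division
    chunks = [durations[i:i + size] for i in range(0, len(durations), size)]
    return max(sum(chunk) for chunk in chunks)


def dynamic_makespan(durations, workers):
    """Scheduler approach: give each next course to whichever worker frees up
    first, and return when the last worker finishes."""
    free_at = [0] * workers  # min-heap of times at which each worker is free
    for d in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + d)
    return max(free_at)


if __name__ == "__main__":
    print(static_split_makespan(DURATIONS, 4))  # 27: one server got all big courses
    print(dynamic_makespan(DURATIONS, 4))       # 10: bounded by the largest course
```

With the same four workers, dynamic dispatch cuts the finish time from 27 to 10 hours in this toy case; using every core per server, as Blackbird does, adds more workers on top of that.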
The Archive Problem
Who wants to work weekends?
Blackboard archive script: /usr/local/blackboard/apps/content-exchange/bin/batch_ImportExport.sh
Archive/Restore: The Archive Course function creates a record of the Course including User interactions. It is most useful for recalling Student performance or interactions at a later time. The archive package is saved as a .ZIP file that can be restored to the Blackboard system at another time. In effect, Archive/Restore acts as a backup tool at the individual course level.
The Archive Problem
Potential Solutions?
Throw money at the problem?
Add more servers?
The Archive Problem
Potential solutions
Write our own job scheduler?
Could we take advantage of the other 3 CPU cores in each server?
How do we monitor performance so the end-user (Blackboard) experience isn’t impacted?
Use a DB to store and manage the queue?
What about security?
Has anyone else out there already done this?
Project Blackbird
Condor to the rescue?
Job scheduler? Check
Multi-core capable? Check
Manage the queue? Check
Performance monitoring? Check
Security? Check
Has anyone done this before? No
Steps in the weekly archive process
Determine what to archive (active courses, orgs)
Build a course list
Create Blackbird submit files
Submit the DAGMan job to Condor
Monitor the Condor queue
Receive notification when all courses have been archived
Look for errors and verify archive integrity
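The first two steps (determine what to archive, build a course list) amount to a query plus a sorted, de-duplicated list. A minimal sketch, assuming sqlite3 as a stand-in for the production Oracle DB and a hypothetical `courses(course_id, kind, active)` table rather than Blackboard's real schema:

```python
import sqlite3


def build_course_list(conn, kinds=("course", "org")):
    """Return sorted, de-duplicated IDs of active courses and organizations.

    The `courses` table and its `kind`/`active` columns are hypothetical.
    """
    placeholders = ",".join("?" for _ in kinds)
    rows = conn.execute(
        "SELECT DISTINCT course_id FROM courses "
        f"WHERE active = 1 AND kind IN ({placeholders}) ORDER BY course_id",
        tuple(kinds),
    )
    return [course_id for (course_id,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the production database
    conn.execute("CREATE TABLE courses (course_id TEXT, kind TEXT, active INTEGER)")
    conn.executemany(
        "INSERT INTO courses VALUES (?, ?, ?)",
        [("ENGL-101-001", "course", 1), ("CHESS-CLUB", "org", 1),
         ("HIST-200-002", "course", 0)],
    )
    # One ID per line is the list format assumed for the later steps.
    print("\n".join(build_course_list(conn)))
```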
Custom Condor Configuration
DAGMAN_MAX_JOBS_IDLE = 25
DAGMAN_MAX_JOBS_SUBMITTED = 50
SLOTS_CONNECTED_TO_CONSOLE = 0
SLOTS_CONNECTED_TO_KEYBOARD = 0
## Force Condor to use Blackboard Private Network
NETWORK_INTERFACE = Private Blackboard Net
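The two DAGMAN_* settings keep a DAG with thousands of course nodes from flooding the queue. Per the Condor manual, DAGMan pauses submission once DAGMAN_MAX_JOBS_SUBMITTED node jobs are in the queue, or once DAGMAN_MAX_JOBS_IDLE of them are sitting idle. The sketch below is a simplified model of that rule for reasoning about the chosen values, not DAGMan's implementation.

```python
# Values copied from the configuration above.
DAGMAN_MAX_JOBS_IDLE = 25
DAGMAN_MAX_JOBS_SUBMITTED = 50


def may_submit_another(submitted, idle,
                       max_submitted=DAGMAN_MAX_JOBS_SUBMITTED,
                       max_idle=DAGMAN_MAX_JOBS_IDLE):
    """Simplified model (not DAGMan's code): keep submitting node jobs only while
    fewer than max_submitted are queued and fewer than max_idle are idle."""
    return submitted < max_submitted and idle < max_idle


if __name__ == "__main__":
    print(may_submit_another(submitted=40, idle=10))  # True: room on both limits
    print(may_submit_another(submitted=50, idle=0))   # False: queue cap reached
    print(may_submit_another(submitted=30, idle=25))  # False: too many waiting
```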
DAGMan example
JOB UniqueCourseID /path/to/condor/submit/job/file/UniqueCourseID.bbCondor
JOB UniqueCourseID2 /path/to/condor/submit/job/file/UniqueCourseID2.bbCondor
JOB UniqueCourseID3 /path/to/condor/submit/job/file/UniqueCourseID3.bbCondor
JOB UniqueCourseID4 /path/to/condor/submit/job/file/UniqueCourseID4.bbCondor
JOB UniqueCourseID5 /path/to/condor/submit/job/file/UniqueCourseID5.bbCondor
JOB UniqueCourseID6 /path/to/condor/submit/job/file/UniqueCourseID6.bbCondor
SCRIPT POST UniqueCourseID6 /usr/local/CMSIntegration/bin/weeklyArchiveChecker.pl
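The DAG file is one JOB line per course, so it is a natural thing to generate. A minimal sketch, assuming submit files are named `<course_id>.bbCondor` in the placeholder directory from the example. One caveat: with no PARENT/CHILD lines the nodes are independent, so a POST script attached to the last node runs when that node finishes, which is not guaranteed to be after every other node. The optional `check_node` variant instead adds a hypothetical checker node that is a CHILD of every archive node; DAGMan will not run it if any parent fails (it writes a rescue DAG instead), so failures still need the log check.

```python
import os

SUBMIT_DIR = "/path/to/condor/submit/job/file"  # placeholder path, as in the example
CHECKER = "/usr/local/CMSIntegration/bin/weeklyArchiveChecker.pl"


def render_dag(course_ids, submit_dir=SUBMIT_DIR, post_script=CHECKER,
               check_node=None):
    """Return DAGMan input text with one independent JOB node per course.

    Course IDs double as node names, so they must not contain whitespace.
    """
    if not course_ids:
        raise ValueError("course list is empty")
    lines = [f"JOB {cid} {os.path.join(submit_dir, cid + '.bbCondor')}"
             for cid in course_ids]
    if check_node is None:
        # Mirrors the example: the checker runs when the last listed node finishes.
        lines.append(f"SCRIPT POST {course_ids[-1]} {post_script}")
    else:
        # Hypothetical extra node (name, submit file) that DAGMan will not start
        # until every archive node has completed successfully.
        name, submit_file = check_node
        lines.append(f"JOB {name} {submit_file}")
        lines.append(f"SCRIPT POST {name} {post_script}")
        lines.append(f"PARENT {' '.join(course_ids)} CHILD {name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dag(["UniqueCourseID", "UniqueCourseID2"]), end="")
```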
Condor Submit example
universe = vanilla
requirements = (OpSys=="LINUX") && ((Arch=="INTEL") || (Arch=="X86_64"))
executable = /usr/local/bin/condorSubmitArchive.pl
arguments = shoover-S0000BKBRD_401001,/san/weeklyArchives/ /
getenv = True
log = /usr/local/logs/bbCondorLogs/archive log
notification = Error
notify_user =
transfer_executable = False
when_to_transfer_output = ON_EXIT
queue 1
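Per-course submit files like the one above are also boilerplate and can be rendered from a template. A sketch, assuming the whole first argument (`shoover-S0000BKBRD_401001` in the example) is the course identifier the wrapper script expects; `LOG_FILE` is a hypothetical name because the slide's log path is garbled, and `notify_user` must be supplied because the example's address is missing. The optional `request_memory` line (a standard submit command, in MB) anticipates the ClassAd memory requirements listed under future work.

```python
ARCHIVE_DIR = "/san/weeklyArchives/"                   # from the example
WRAPPER = "/usr/local/bin/condorSubmitArchive.pl"      # from the example
LOG_FILE = "/usr/local/logs/bbCondorLogs/archive.log"  # hypothetical file name


def render_submit(course_id, notify_user, archive_dir=ARCHIVE_DIR,
                  log_file=LOG_FILE, request_memory_mb=None):
    """Return the text of a submit description that archives one course."""
    lines = [
        "universe = vanilla",
        'requirements = (OpSys=="LINUX") && ((Arch=="INTEL") || (Arch=="X86_64"))',
        f"executable = {WRAPPER}",
        # Assumed argument format: "<course id>,<output directory>".
        f"arguments = {course_id},{archive_dir}",
        "getenv = True",
        f"log = {log_file}",
        "notification = Error",
        f"notify_user = {notify_user}",
        "transfer_executable = False",
        "when_to_transfer_output = ON_EXIT",
    ]
    if request_memory_mb is not None:
        # Optional: only match slots with at least this much memory (MB).
        lines.append(f"request_memory = {request_memory_mb}")
    lines.append("queue 1")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_submit("shoover-S0000BKBRD_401001", "archive-admin@example.org"))
```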
Blackbird archive solution
Blackbird Benefits
Reduced total archive time from more than 85 hours to less than 24 hours
Job scheduling: servers finish at about the same time
Zero impact on Blackboard performance
Automatic suspension/resumption of archives if the load reaches a threshold on any core
Notification upon completion of all archives
Load balancing: archive jobs are distributed as cores become available
Takes advantage of all available CPU cores instead of just one core per server
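In Condor, automatic suspend/resume is configured through the SUSPEND and CONTINUE policy expressions. The model below only illustrates the shape of such a policy in Python: suspend when load from non-Condor work (Blackboard itself) rises above one threshold, and resume only after it falls below a lower one, so jobs do not flap on and off. Both threshold values are hypothetical.

```python
# Hypothetical thresholds (load on one core), not Clemson's settings.
SUSPEND_ABOVE = 0.9  # pause archives when non-Condor load on a core exceeds this
RESUME_BELOW = 0.5   # resume only once that load has dropped below this


def next_state(state, non_condor_load,
               suspend_above=SUSPEND_ABOVE, resume_below=RESUME_BELOW):
    """Return "running" or "suspended" for an archive job after one load sample.

    The gap between the two thresholds (hysteresis) keeps a job from being
    suspended and resumed on every small fluctuation.
    """
    if state == "running" and non_condor_load > suspend_above:
        return "suspended"
    if state == "suspended" and non_condor_load < resume_below:
        return "running"
    return state


def replay(samples, state="running"):
    """Apply next_state to a sequence of load samples and record each state."""
    history = []
    for load in samples:
        state = next_state(state, load)
        history.append(state)
    return history


if __name__ == "__main__":
    print(replay([0.2, 0.95, 0.7, 0.4, 0.6]))
```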
What did it take to implement?
Have one or more multi-core machines
A large amount of shared storage for archives
Choose one machine as your Central Manager
Install and configure Condor on each machine
Automate course list creation (query the DB or directory)
Automate creation of the Condor submit files and the DAGMan file
Automate the whole thing with cron
Check log files for errors upon archive completion
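For the last step, jobs that terminated with a non-zero exit code or a signal can be pulled from the Condor job event log. A sketch, assuming the classic text format of the user log, where events are separated by `...` lines and event code 005 marks a terminated job followed by a `Normal termination (return value N)` or `Abnormal termination (signal N)` line; mapping job IDs back to course IDs is left out.

```python
import re

# Classic user-log event header, e.g. "005 (123.000.000) 06/04 10:20:22 Job terminated."
EVENT_HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\)")
NORMAL = re.compile(r"Normal termination \(return value (-?\d+)\)")
ABNORMAL = re.compile(r"Abnormal termination \(signal (\d+)\)")


def iter_events(log_text):
    """Yield each event as a list of lines; events end with a line of '...'."""
    event = []
    for line in log_text.splitlines():
        if line.strip() == "...":
            if event:
                yield event
            event = []
        else:
            event.append(line)
    if event:
        yield event


def failed_jobs(log_text):
    """Return {"cluster.proc": reason} for terminated jobs that did not exit 0."""
    failures = {}
    for event in iter_events(log_text):
        header = EVENT_HEADER.match(event[0])
        if not header or header.group(1) != "005":  # 005 = job terminated
            continue
        job_id = f"{int(header.group(2))}.{int(header.group(3))}"
        body = "\n".join(event[1:])
        normal = NORMAL.search(body)
        abnormal = ABNORMAL.search(body)
        if normal and int(normal.group(1)) == 0:
            continue  # clean exit
        if normal:
            failures[job_id] = f"exit {normal.group(1)}"
        elif abnormal:
            failures[job_id] = f"signal {abnormal.group(1)}"
        else:
            failures[job_id] = "unrecognized termination"
    return failures
```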
Where else could I use this?
Any system that does batch processing that can be broken up into many jobs
Recently implemented on our MySQL server to export all of the MySQL databases
Reduced the export time from 10 hours to 3.5 hours on a single quad-core machine
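On a single multi-core machine, the same fan-out can be approximated without Condor using a bounded worker pool. This swaps in Python's `concurrent.futures` for Condor's scheduler, so it has no load-based suspension or persistent queue. A sketch, assuming one `mysqldump` per database with credentials read from an option file such as `~/.my.cnf`; the database names and output directory are placeholders.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def dump_commands(databases, out_dir):
    """Build one independent mysqldump command (argv list) per database."""
    return [
        ["mysqldump", "--single-transaction",
         f"--result-file={os.path.join(out_dir, db + '.sql')}",
         "--databases", db]
        for db in databases
    ]


def run_all(commands, workers=None):
    """Run commands with at most `workers` at a time (default: one per CPU core).

    Threads are enough here because each one just waits on a child process.
    Returns the exit codes in the same order as `commands`.
    """
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))


# Usage (placeholder names):
#   codes = run_all(dump_commands(["app_db", "wiki_db"], "/backups/mysql"))
```

Any non-zero entry in the returned list marks a database whose export should be rerun.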
Recent updates
64-bit Red Hat 5.4 OS and JVM 1.6
Maximum (affordable) RAM per machine: 32 GB
Web page to view Blackbird Condor pool status
Duplicate archives
Error-checking logs
Redo any courses that had errors or did not complete
Major Blackboard upgrade from 7.3 to 9.1 at the end of June
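Building the redo list comes down to comparing the course list against what actually landed in the archive directory, plus anything the error check flagged. A sketch, assuming a hypothetical `ArchiveFile_<course_id>_<14-digit timestamp>.zip` naming convention (adjust the pattern to whatever the archive script really writes). DAGMan's rescue DAG covers failed nodes on its own, but it cannot see an archive whose job exited cleanly yet left a bad or missing file.

```python
import re

# Hypothetical naming convention for finished archives.
ARCHIVE_NAME = re.compile(r"^ArchiveFile_(?P<course_id>.+)_\d{14}\.zip$")


def courses_to_redo(course_list, archive_filenames, errored=()):
    """Return courses, in original order, with no archive file or a flagged error."""
    archived = set()
    for name in archive_filenames:
        match = ARCHIVE_NAME.match(name)
        if match:
            archived.add(match.group("course_id"))
    errored = set(errored)
    return [c for c in course_list if c not in archived or c in errored]


if __name__ == "__main__":
    files = ["ArchiveFile_S0000BKBRD_401001_20100604101522.zip", "notes.txt"]
    print(courses_to_redo(["S0000BKBRD_401001", "HIST-200-002"], files))
```

The returned list can go back through the same submit-file and DAG steps as a smaller batch.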
What’s next?
New machines have 2 x quad-core CPUs with HyperThreading, so Condor sees 16 cores
Add out-of-warranty machines to the Blackboard Condor pool (keep users off of them)
Monitoring of the queue (web page)
Use ClassAds to specify architecture and memory requirements for large archive jobs
Write code to query the DB to find which courses have changed, and back up every changed course daily
Automate installation and configuration
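The planned daily run could narrow the course list to courses that changed since the previous run. A sketch, assuming sqlite3 as a stand-in database and a hypothetical `course_activity(course_id, last_changed)` table with `YYYY-MM-DD HH:MM:SS` text timestamps; which Blackboard tables actually record course changes is not covered here.

```python
import sqlite3
from datetime import datetime, timedelta


def changed_since(conn, cutoff):
    """IDs of courses whose last_changed timestamp is later than `cutoff`.

    Assumes a hypothetical course_activity(course_id, last_changed) table whose
    timestamps are "YYYY-MM-DD HH:MM:SS" text, so string comparison orders them.
    """
    rows = conn.execute(
        "SELECT course_id FROM course_activity WHERE last_changed > ? "
        "ORDER BY course_id",
        (cutoff.isoformat(sep=" ", timespec="seconds"),),
    )
    return [course_id for (course_id,) in rows]


def changed_in_last_day(conn, now=None):
    """Course list for a daily run: everything changed in the previous 24 hours."""
    now = now or datetime.now()
    return changed_since(conn, now - timedelta(days=1))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the production database
    conn.execute("CREATE TABLE course_activity (course_id TEXT, last_changed TEXT)")
    conn.executemany(
        "INSERT INTO course_activity VALUES (?, ?)",
        [("ENGL-101-001", "2010-06-03 08:00:00"),
         ("HIST-200-002", "2010-06-01 12:00:00")],
    )
    print(changed_in_last_day(conn, now=datetime(2010, 6, 3, 20, 0, 0)))
```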
Please provide feedback for this session by email. The subject of the email should be the title of this session: [Blackbird: Accelerated Course Archives Using Condor with Blackboard]