CERN IT Department, CH-1211 Genève 23, Switzerland
Multithreading in CASTOR: Experiences from a real-life application
Giuseppe Lo Presti (IT/DM)
CERN IT Department, Internet Services
Outline
  Castor: the CERN mass storage system
    – Facts & figures
    – Architectural overview
  A C++ framework for multithreading
    – Requirements and implementation
    – Some user code samples
    – The internals
  The framework in action
  Conclusions
Castor Overview
  Hierarchical Storage Manager
    – Disk cache + tape archive
  Some facts & figures
    – In production for the Tier0 and a few Tier1s
    – CERN deployment: 650+ disk servers, 2.5+ PB of managed disk cache, 12+ PB on tape, several GB/s of sustained I/O
  [Figure: disk cache network traffic during the May CCRC]
Castor Architecture Overview
  Database centric
    – Stateless, redundant software components
    – State stored in a central database for scalability and fault resiliency
  Technology choices
    – A number of multithreaded daemons perform all the tasks needed to serve user requests
    – Each operation is reflected in the database => tasks are inherently I/O bound, or rather “latency bound”
      Dominated by db/network latency
    – Concurrency issues are resolved in the database by using locks
High Level Requirements
  Multithreading to achieve better overall throughput in terms of #requests/sec
    – The system scales superlinearly (many more threads than cores pay off) because the tasks are I/O bound
  Need for supporting thread pools
    – Each one dedicated to a different task
  Lightweight multithreading infrastructure
    – Limit the memory footprint of the daemons
  Seamless integration with C++
  Very limited issues with synchronization and data sharing across threads
    – Context data is always in the db
    – Each thread deals with a different request: a typical case of embarrassing parallelism
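Why many more threads than cores pay off for latency-bound work can be seen in a minimal sketch. Here each request is a hypothetical database call simulated by a sleep. The names simulatedDbCall, WorkQueue and serveRequests are illustrative assumptions, not Castor code. Workers claim requests from a mutex-protected counter.

```cpp
#include <pthread.h>
#include <time.h>
#include <chrono>

// Hypothetical stand-in for a database round trip: the thread just sleeps,
// so it waits without using any CPU.
static void simulatedDbCall(long latencyMs) {
  struct timespec ts;
  ts.tv_sec = latencyMs / 1000;
  ts.tv_nsec = (latencyMs % 1000) * 1000000L;
  nanosleep(&ts, nullptr);
}

// Shared queue of pending requests; the mutex protects 'remaining'.
struct WorkQueue {
  pthread_mutex_t lock;
  int remaining;
  long latencyMs;
};

static void* worker(void* arg) {
  WorkQueue* q = static_cast<WorkQueue*>(arg);
  for (;;) {
    pthread_mutex_lock(&q->lock);
    bool haveWork = q->remaining > 0;
    if (haveWork) --q->remaining;   // claim one request
    pthread_mutex_unlock(&q->lock);
    if (!haveWork) return nullptr;
    simulatedDbCall(q->latencyMs);  // latency-bound "processing"
  }
}

// Serves nRequests with nThreads workers; returns the wall-clock time in ms.
long serveRequests(int nThreads, int nRequests, long latencyMs) {
  WorkQueue q;
  pthread_mutex_init(&q.lock, nullptr);
  q.remaining = nRequests;
  q.latencyMs = latencyMs;
  if (nThreads > 64) nThreads = 64;
  pthread_t tids[64];
  int started = 0;
  auto begin = std::chrono::steady_clock::now();
  for (int i = 0; i < nThreads; ++i)
    if (pthread_create(&tids[started], nullptr, worker, &q) == 0) ++started;
  for (int i = 0; i < started; ++i) pthread_join(tids[i], nullptr);
  auto end = std::chrono::steady_clock::now();
  pthread_mutex_destroy(&q.lock);
  return static_cast<long>(
      std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count());
}
```

Serving 16 requests of 20 ms each takes at least 320 ms with one thread. With eight threads it takes roughly 40 ms, even on a single core, because a waiting thread does not consume CPU.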
A Framework for Multithreading
  Choices
    – Use of Linux POSIX threads
    – A C++ package to hide the pthreads complexity and provide a Java-like interface
      IThread abstract class (cf. the Java Runnable interface)
      Specialized thread pools implementing different functionalities (e.g. request handling)
      Very high reusability across all software components
    – Ability to have thread-safe (one copy per thread) and thread-shared variables
    – Daemon mode with embedded signal handling
      Support for graceful stop and restart of daemons
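The Java-like style can be sketched as follows. This assumes an IThread interface with run(void*) and stop() methods. SimplePool and CountingThread are hypothetical, and far simpler than the real thread pools. All pthreads of the pool execute run() on the same IThread instance.

```cpp
#include <pthread.h>
#include <atomic>
#include <vector>

// Java-like interface (cf. Runnable); the real Castor signatures may differ.
class IThread {
public:
  virtual ~IThread() {}
  virtual void run(void* param) = 0;
  virtual void stop() = 0;
};

// Hypothetical minimal pool: N pthreads all call run() on ONE user object.
class SimplePool {
public:
  SimplePool(IThread* user, int nbThreads) : m_user(user), m_nbThreads(nbThreads) {}
  void start() {
    for (int i = 0; i < m_nbThreads; ++i) {
      pthread_t t;
      if (pthread_create(&t, nullptr, &SimplePool::entry, this) == 0)
        m_threads.push_back(t);
    }
  }
  void join() {
    for (pthread_t t : m_threads) pthread_join(t, nullptr);
    m_threads.clear();
  }
private:
  // pthread entry point: forwards to the shared user object.
  static void* entry(void* arg) {
    SimplePool* self = static_cast<SimplePool*>(arg);
    self->m_user->run(nullptr);
    return nullptr;
  }
  IThread* m_user;
  int m_nbThreads;
  std::vector<pthread_t> m_threads;
};

// Example user thread: 'calls' is shared by every thread of the pool.
class CountingThread : public IThread {
public:
  std::atomic<int> calls{0};
  void run(void*) override { ++calls; }
  void stop() override {}
};
```

Because every thread shares the instance, its data members are shared as well, which is why the counter is atomic.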
Framework Implementation
  Use of an existing OS abstraction layer: the Cthread API
    – Replicates the whole pthread API, and additionally provides thread-safe global variables
    – One of the most mature (read: old…) parts of the Castor codebase, shared by different projects in IT
  C++ code
    – Clean interface for the user: generic methods to compose daemons out of user classes
    – Cthread / pthread / system calls are kept hidden from user code, but are still accessible for special cases
      E.g. mutexes
  Typical use cases
    – Listening on a port and accepting requests
    – Polling the database for the next operation to perform
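One way to keep pthread calls out of user code while leaving the raw mutex reachable is a small RAII wrapper. ScopedLock and runCounter are hypothetical names for this sketch. They are not part of the Cthread API.

```cpp
#include <pthread.h>

// Hypothetical RAII wrapper: locks in the constructor, unlocks in the destructor.
class ScopedLock {
public:
  explicit ScopedLock(pthread_mutex_t& m) : m_mutex(m) { pthread_mutex_lock(&m_mutex); }
  ~ScopedLock() { pthread_mutex_unlock(&m_mutex); }
  ScopedLock(const ScopedLock&) = delete;
  ScopedLock& operator=(const ScopedLock&) = delete;
private:
  pthread_mutex_t& m_mutex;  // the raw mutex stays accessible to special cases
};

static pthread_mutex_t g_counterLock = PTHREAD_MUTEX_INITIALIZER;
static long g_counter = 0;

static void* addMany(void*) {
  for (int i = 0; i < 100000; ++i) {
    ScopedLock guard(g_counterLock);  // released automatically at end of scope
    ++g_counter;
  }
  return nullptr;
}

// Runs nThreads workers that each add 100000 to the shared counter.
long runCounter(int nThreads) {
  g_counter = 0;
  if (nThreads > 16) nThreads = 16;
  pthread_t tids[16];
  int started = 0;
  for (int i = 0; i < nThreads; ++i)
    if (pthread_create(&tids[started], nullptr, addMany, nullptr) == 0) ++started;
  for (int i = 0; i < started; ++i) pthread_join(tids[i], nullptr);
  return g_counter;
}
```

Since the lock is released in the destructor, an early return or an exception inside the critical section cannot leave the mutex locked. Since C++11, std::lock_guard<std::mutex> offers the same idiom in the standard library.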
Package Class Diagram
  [Figure: class diagram of the programmer's interface]
Main Classes
  Thread pools
    – ListenerThreadPool: generic socket connection dispatcher à la Apache
      Specialized classes for TCP, UDP, … sockets
    – SignalThreadPool: pool manager for backend activities that need to run periodically or upon external signalling
      The signalling mechanism is based on condition variables
    – ForkedProcessPool: pool manager based on fork(), not on pthreads
  Classes for servers
    – BaseServer: basic generic server providing daemon mode (detach from shell) and logging initialization
    – BaseDaemon: more sophisticated base class for daemons, supporting system signal handling and any combination of the implemented thread pools
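ForkedProcessPool is described here only as being based on fork(). The following hypothetical sketch shows the underlying pattern: fork worker processes, then reap them with waitpid(). The name runForkedPool is invented for illustration.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical fork()-based pool (not the Castor ForkedProcessPool): every
// worker is a separate process. Each child reports a small result (0..255)
// through its exit status.
int runForkedPool(int nbChildren) {
  std::vector<pid_t> children;
  for (int i = 0; i < nbChildren; ++i) {
    pid_t pid = fork();
    if (pid == 0) {
      // Child process: do the work, then leave with _exit() so that the
      // parent's atexit handlers and stdio buffers are not run a second time.
      _exit(i + 1);
    }
    if (pid > 0) children.push_back(pid);  // parent: remember the child
  }
  // Parent: reap every child and collect its result.
  int sum = 0;
  for (pid_t pid : children) {
    int status = 0;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
      sum += WEXITSTATUS(status);
  }
  return sum;
}
```

Separate processes do not share memory, so workers need no locking among themselves, and a crash in one child does not take down the others. The price is a larger memory footprint than with threads.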
Code Samples
  Excerpt from the Monitoring daemon's main()
    – Different thread pools are mixed together
    – The start() method from BaseDaemon spawns all the requested threads
      RmMasterDaemon daemon;
      ...
      // db threadpool
      daemon.addThreadPool(new castor::server::SignalThreadPool(
        "DatabaseActuator",
        new DbActuatorThread(daemon.clusterStatus()),
        updateInterval));
      // 'D' refers to the "DatabaseActuator" pool created above
      daemon.getThreadPool('D')->setNbThreads(1);
      // update threadpool
      daemon.addThreadPool(new castor::server::UDPListenerThreadPool(
        "Update",
        new UpdateThread(daemon.clusterStatus()),
        listenPort));
      ...
      // Start daemon
      daemon.parseCommandLine(argc, argv);
      daemon.start();
Code Samples
  User threads
    – As easy as inheriting from IThread:
      class UpdateThread : public castor::server::IThread {
      public:
        virtual void run(void *param) throw();
        virtual void stop();
      };
  Typical pitfall: the code is shared among all threads of a given pool
    – Mutex sections and synchronization must be implemented explicitly: there are no synchronized methods as in Java
  Consequence: data members of the user class are thread-shared, only local variables are thread-safe
    – But you may need thread-safe singletons…
  Our solution (provided by Cthreads): for each thread-safe global variable, keep a hash map indexed by TID
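The hash-map approach to thread-safe singletons can be sketched with a hypothetical PerThreadSingleton template (not the Cthread API). It keys instances by std::this_thread::get_id(). DbConnection is an invented placeholder for a per-thread resource.

```cpp
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>

// Per-thread singleton sketch: one T per thread, stored in a hash map indexed
// by the calling thread's id. Entries are never removed in this sketch.
template <typename T>
class PerThreadSingleton {
public:
  static T& instance() {
    std::lock_guard<std::mutex> guard(lock());  // the map itself is shared
    std::unique_ptr<T>& slot = table()[std::this_thread::get_id()];
    if (!slot) slot.reset(new T());             // first call from this thread
    return *slot;
  }
private:
  static std::mutex& lock() { static std::mutex m; return m; }
  static std::unordered_map<std::thread::id, std::unique_ptr<T>>& table() {
    static std::unordered_map<std::thread::id, std::unique_ptr<T>> t;
    return t;
  }
};

// Hypothetical per-thread resource, e.g. a database connection handle.
struct DbConnection {
  int queries = 0;
};
```

The map is shared, so every access takes a global lock. Standard alternatives with the same per-thread semantics and no global lock are pthread_key_create()/pthread_getspecific() and, since C++11, thread_local. Thread ids may be reused after a thread exits, so a production version would also drop the entry of a terminating thread.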
The Internals
  …So, where are the (p)threads?
  BaseThreadPool serves as the basic infrastructure
    – A friend function _thread_run() is the thread entry point, which runs the user code
    – All specialized thread pools use this function when spawning threads
      void* castor::server::_thread_run(void* param) {
        struct threadArgs *args = (struct threadArgs*)param;
        castor::server::BaseThreadPool* pool =
          dynamic_cast<castor::server::BaseThreadPool*>(args->handler);
        // Executes the thread
        try {
          pool->m_thread->run(args->param);
        } catch (castor::exception::Exception& any) {
          // error handling
        }
        return 0;
      }
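A stripped-down, self-contained version of this entry-point pattern follows. IThread, ThreadArgs, threadRun and FailingThread are hypothetical simplifications, not the Castor definitions. The point it illustrates is that no exception may escape the start routine.

```cpp
#include <pthread.h>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical, simplified user interface and argument block.
class IThread {
public:
  virtual ~IThread() {}
  virtual void run(void* param) = 0;
};

struct ThreadArgs {
  IThread* user;      // user code shared by the whole pool
  void* param;        // per-thread parameter passed to run()
  std::string error;  // filled in if run() throws
};

// Thread entry point: every exception is caught here, because one escaping
// the start routine would call std::terminate() and abort the whole daemon.
extern "C" void* threadRun(void* arg) {
  ThreadArgs* args = static_cast<ThreadArgs*>(arg);
  try {
    args->user->run(args->param);
  } catch (const std::exception& e) {  // by reference: no slicing
    args->error = e.what();
  } catch (...) {
    args->error = "unknown exception";
  }
  return nullptr;
}

// Example user code that fails.
class FailingThread : public IThread {
public:
  void run(void*) override { throw std::runtime_error("db connection lost"); }
};
```

Catching by reference also keeps the dynamic type of derived exception classes, which catching by value would slice off.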
The Internals
  SignalThreadPool encapsulates pthread_create() calls and condition variables
  Threads wait until a condition variable is notified, or until a timeout has passed
    – pthread_cond_wait() and pthread_cond_signal()
    – One (or more) thread in the pool is woken up and executes the user code
    – The pool keeps track of the current number of busy threads
      void castor::server::SignalThreadPool::run() throw (...) {
        ...
        // create pool of detached threads
        for (int i = 0; i < m_nbThreads; i++) {
          if (Cthread_create_detached(castor::server::_thread_run, args) >= 0) {
            ++n;  // for later error handling
          }
          ...
        }
        ...
      }
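The wait-with-timeout scheme can be sketched with pthread_cond_timedwait(), the timed variant of pthread_cond_wait(). SignalPool, signalWorker, userCode and runSignalPool are hypothetical names, not the Castor implementation. The sketch tracks busy threads and counts signalled and periodic runs separately.

```cpp
#include <pthread.h>
#include <time.h>
#include <errno.h>
#include <vector>

struct SignalPool {
  pthread_mutex_t lock;
  pthread_cond_t cond;
  int pending = 0;        // notifications not yet consumed by a worker
  int busy = 0;           // threads currently executing user code
  int signalledRuns = 0;  // runs triggered by a notification
  int periodicRuns = 0;   // runs triggered by the timeout
  bool stopping = false;
  long timeoutMs = 100;
};

static void userCode() { /* e.g. poll the database for the next task */ }

static void* signalWorker(void* arg) {
  SignalPool* p = static_cast<SignalPool*>(arg);
  pthread_mutex_lock(&p->lock);
  while (!p->stopping) {
    bool signalled = p->pending > 0;
    if (!signalled) {
      struct timespec deadline;  // absolute time, as timedwait requires
      clock_gettime(CLOCK_REALTIME, &deadline);
      deadline.tv_nsec += p->timeoutMs * 1000000L;
      deadline.tv_sec += deadline.tv_nsec / 1000000000L;
      deadline.tv_nsec %= 1000000000L;
      int rc = pthread_cond_timedwait(&p->cond, &p->lock, &deadline);
      if (p->stopping) break;
      if (rc != ETIMEDOUT) continue;  // woken up: re-check 'pending'
    } else {
      --p->pending;
    }
    ++p->busy;
    pthread_mutex_unlock(&p->lock);
    userCode();  // runs without holding the lock
    pthread_mutex_lock(&p->lock);
    --p->busy;
    if (signalled) ++p->signalledRuns; else ++p->periodicRuns;
  }
  pthread_mutex_unlock(&p->lock);
  return nullptr;
}

// Starts nThreads workers, sends nSignals notifications, waits until all are
// served, then stops the pool. Returns the number of signalled runs.
int runSignalPool(int nThreads, int nSignals) {
  SignalPool p;
  pthread_mutex_init(&p.lock, nullptr);
  pthread_cond_init(&p.cond, nullptr);
  std::vector<pthread_t> tids;
  for (int i = 0; i < nThreads; ++i) {
    pthread_t t;
    if (pthread_create(&t, nullptr, signalWorker, &p) == 0) tids.push_back(t);
  }
  for (int i = 0; i < nSignals; ++i) {
    pthread_mutex_lock(&p.lock);
    ++p.pending;
    pthread_cond_signal(&p.cond);  // wake up one waiting thread
    pthread_mutex_unlock(&p.lock);
  }
  while (!tids.empty()) {  // wait until no work is pending and nobody is busy
    pthread_mutex_lock(&p.lock);
    bool idle = p.pending == 0 && p.busy == 0;
    pthread_mutex_unlock(&p.lock);
    if (idle) break;
    struct timespec oneMs = {0, 1000000L};
    nanosleep(&oneMs, nullptr);
  }
  pthread_mutex_lock(&p.lock);
  p.stopping = true;
  pthread_cond_broadcast(&p.cond);  // wake every thread so that it can exit
  pthread_mutex_unlock(&p.lock);
  for (pthread_t t : tids) pthread_join(t, nullptr);
  int runs = p.signalledRuns;
  pthread_cond_destroy(&p.cond);
  pthread_mutex_destroy(&p.lock);
  return runs;
}
```

pthread_cond_signal() has no effect when no thread is waiting. The pending counter ensures that a notification sent while all threads are busy is not lost, because a worker checks it before going back to sleep.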
The Internals
  BaseDaemon manages all threads and encapsulates the system signal handling
    – A signal sent to the process is delivered to an arbitrary thread that does not block it, so to avoid unpredictable behaviour all threads need to be protected from signals via:
      pthread_sigmask(SIG_BLOCK, &signalSet, NULL)
      where signalSet includes all the usual system signals (new threads inherit the signal mask of the thread that creates them)
    – Yet another pthread performs the signal handling by looping on sigwait()
    – After spawning all user threads, the main thread waits for a notification from the dedicated signal-handling thread, and broadcasts an appropriate message to all running threads
      E.g. on SIGTERM, the stop() method of every user thread is called; once the number of busy threads drops to 0, exit() is called
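The signal-handling scheme can be sketched as follows. SignalState, signalThread and runSignalDemo are hypothetical names. SIGTERM, SIGINT and SIGHUP stand in for "the usual system signals", and the sketch sends itself SIGTERM to simulate an operator stopping the daemon.

```cpp
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

// State shared by the signal-handling thread and the main thread.
struct SignalState {
  sigset_t set;   // signals handled by the dedicated thread
  pthread_mutex_t lock;
  pthread_cond_t cond;
  int received;   // signal caught, 0 while none
};

// Dedicated thread: loops on sigwait() until a termination signal arrives,
// then notifies the main thread through the condition variable.
static void* signalThread(void* arg) {
  SignalState* s = static_cast<SignalState*>(arg);
  int sig = 0;
  for (;;) {
    if (sigwait(&s->set, &sig) != 0) continue;
    if (sig == SIGTERM || sig == SIGINT) break;
    // other signals in the set (e.g. SIGHUP) could trigger a restart here
  }
  pthread_mutex_lock(&s->lock);
  s->received = sig;
  pthread_cond_signal(&s->cond);
  pthread_mutex_unlock(&s->lock);
  return nullptr;
}

// Returns the signal number seen by the main thread, or -1 on error.
int runSignalDemo() {
  SignalState s;
  s.received = 0;
  pthread_mutex_init(&s.lock, nullptr);
  pthread_cond_init(&s.cond, nullptr);
  sigemptyset(&s.set);
  sigaddset(&s.set, SIGTERM);
  sigaddset(&s.set, SIGINT);
  sigaddset(&s.set, SIGHUP);
  // Block the signals BEFORE creating any thread: new threads inherit the mask.
  sigset_t oldSet;
  pthread_sigmask(SIG_BLOCK, &s.set, &oldSet);

  pthread_t handler;
  if (pthread_create(&handler, nullptr, signalThread, &s) != 0) {
    pthread_sigmask(SIG_SETMASK, &oldSet, nullptr);
    pthread_cond_destroy(&s.cond);
    pthread_mutex_destroy(&s.lock);
    return -1;
  }
  // ... the user thread pools would be started here ...

  kill(getpid(), SIGTERM);  // simulate an operator stopping the daemon

  // Main thread: wait for the notification from the signal thread.
  pthread_mutex_lock(&s.lock);
  while (s.received == 0) pthread_cond_wait(&s.cond, &s.lock);
  int sig = s.received;
  pthread_mutex_unlock(&s.lock);
  // A real daemon would now call stop() on every user thread and wait for
  // the number of busy threads to reach zero before exiting.

  pthread_join(handler, nullptr);
  pthread_sigmask(SIG_SETMASK, &oldSet, nullptr);
  pthread_cond_destroy(&s.cond);
  pthread_mutex_destroy(&s.lock);
  return sig;
}
```

Because every thread blocks these signals, a process-directed signal stays pending until the dedicated thread collects it with sigwait(). The handling therefore runs in ordinary thread context, where locks and condition variables can be used safely.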
The Internals
  Additional facilities in the framework
    – BaseDbThread implements the IThread interface and provides graceful termination of a thread-specific database connection upon stop()
    – Mutex wraps common pthread functions to handle mutexes on integer variables
      wait() and signal() methods provided
      Generic mutexes on variables of any type are left to the user code
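Below is a minimal sketch of a mutex-protected integer with wait() and signal() semantics. It serves as the busy-thread counter that a graceful stop waits on. IntMutex, DrainState and runAndDrain are hypothetical names, not the framework's Mutex interface.

```cpp
#include <pthread.h>
#include <vector>

// Hypothetical mutex-protected integer with wait()/signal() semantics.
class IntMutex {
public:
  explicit IntMutex(int value = 0) : m_value(value) {
    pthread_mutex_init(&m_lock, nullptr);
    pthread_cond_init(&m_cond, nullptr);
  }
  ~IntMutex() {
    pthread_cond_destroy(&m_cond);
    pthread_mutex_destroy(&m_lock);
  }
  IntMutex(const IntMutex&) = delete;
  IntMutex& operator=(const IntMutex&) = delete;

  // Changes the value and signals all waiters.
  void add(int delta) {
    pthread_mutex_lock(&m_lock);
    m_value += delta;
    pthread_cond_broadcast(&m_cond);
    pthread_mutex_unlock(&m_lock);
  }
  // Waits until the value equals 'target'.
  void waitFor(int target) {
    pthread_mutex_lock(&m_lock);
    while (m_value != target) pthread_cond_wait(&m_cond, &m_lock);
    pthread_mutex_unlock(&m_lock);
  }
  int value() {
    pthread_mutex_lock(&m_lock);
    int v = m_value;
    pthread_mutex_unlock(&m_lock);
    return v;
  }
private:
  int m_value;
  pthread_mutex_t m_lock;
  pthread_cond_t m_cond;
};

struct DrainState {
  IntMutex busy;  // threads currently running user code
  IntMutex done;  // threads that have finished their work
};

static void* busyWorker(void* arg) {
  DrainState* s = static_cast<DrainState*>(arg);
  // ... user code would run here ...
  s->done.add(1);
  s->busy.add(-1);  // last action: report that this thread is idle
  return nullptr;
}

// Starts n workers, then waits (as a graceful stop would) for the busy
// counter to drop to zero. Returns the number of completed workers.
int runAndDrain(int n) {
  DrainState s;
  std::vector<pthread_t> tids;
  for (int i = 0; i < n; ++i) {
    s.busy.add(1);  // counted before it starts, so waitFor(0) cannot return early
    pthread_t t;
    if (pthread_create(&t, nullptr, busyWorker, &s) == 0) tids.push_back(t);
    else s.busy.add(-1);
  }
  s.busy.waitFor(0);
  int completed = s.done.value();
  for (pthread_t t : tids) pthread_join(t, nullptr);  // cleanup before 's' dies
  return completed;
}
```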
The Framework in Action
  [Figure: class diagram from the Castor doxygen documentation]
    – Most Castor daemons inherit from BaseDaemon
    – They all support graceful stop, e.g.:
      DATE= HOST=lxb1952.cern.ch LVL=System FACILITY=Stager PID=11439 […] MESSAGE="GRACEFUL STOP [SIGTERM] - Shutting down the service"
      DATE= HOST=lxb1952.cern.ch LVL=System FACILITY=Stager PID=11439 […] MESSAGE="GRACEFUL STOP [SIGTERM] - Shut down successfully completed"
The Framework in Action
  Typical load on a node
    – 8 cores run a total of ~90 threads, each owning a db connection, using only a fraction of the available CPU and memory even during high load peaks
      The stager daemon alone runs 53 threads
    – This is the current deployment of a production Castor instance!
  top - 16:17:53 up 115 days, 11:00, 4 users, load average: 1.06, 0.78, 0.59
  Tasks: 173 total, 2 running, 171 sleeping, 0 stopped, 0 zombie
  Cpu(s): 6.5% us, 1.9% sy, 0.0% ni, 91.3% id, 0.0% wa, 0.0% hi, 0.3% si
  Swap: 220k used
  PID    USER   RES    SHR    S  COMMAND
  ?      stage  32m    11m    S  migrator
  3107   root   76m    5972   S  dlfserver
  ?      stage  32m    11m    S  migrator
  3309   root   109m   9.8m   S  stagerDaemon
  3315   root   109m   9.8m   S  stagerDaemon
  3314   root   109m   9.8m   S  stagerDaemon
  3238   root   29m    8380   S  rhserver
  ...
Conclusions
  We have shown that the pthread API is powerful enough to support many high-level multithreaded tasks
    – But don't forget that we started from an embarrassingly parallel scenario…
  The CASTOR service moved from 6 dual-CPU nodes to one 8-core node
    – No way out of multithreading
      I know, that's become pretty obvious by now…
  Comments, questions?