Published by Lenard Gordon. Modified over 9 years ago.
1
MDTM Implementation Design
Liang Zhang, Wenji Wu
11/11/2013
2
MDTM Modules
MDTM Daemon
– Acquiring and publishing system information
– Communicating with MDTM consoles and applications
– Scheduling and binding application threads
MDTM API
– Interfacing with the MDTM consoles and applications
– Communicating with the MDTM Daemon
– Requesting and reading system information
MDTM Console
– Giving users access to system information and status
– Monitoring and development utility
[Figure: module diagram with the labels OS (Linux), Middleware, Shared Memory, App., MDTM API, MDTM Daemon, MDTM Console]
3
M2M Interaction: Intra Process
The module-to-module (M2M) interaction between the MDTM Console/App. and the MDTM API takes place inside the application's own process, via C-style function calls.
4
M2M Interaction: IPC
The interaction between the MDTM API and the MDTM Daemon is inter-process communication (IPC). We chose shared memory (SHM) for its speed. Depending on its nature, data is accessed in one of two patterns:
– Publish-Read: Global data, such as static or dynamic system topology information, is visible to all users. It is published in predefined SHM areas called Bulletin SHM, e.g. “System Layout” and “Core Workload”. Interested users all read the same object (saving system resources), without involving the publisher (fewer process context switches) and without synchronizing with other readers (fast). Synchronization between the publisher and the readers is still needed, but it can be implemented with a lock-free algorithm to limit the performance impact.
– Message Queuing: Private data, such as a query and its response, involves only one API instance and the Daemon. It is encapsulated in messages that travel through queues defined in SHM. Lock-free algorithms are needed here as well.
5
MDTM IPC: Publish-Read
– Static data, such as System Layout, is published once; the API retrieves it by calling the synchronous read function.
– Dynamic data, such as Core Workload, is published periodically. Our implementation provides two ways to handle this case:
  – Polling: call the synchronous read function repeatedly to pick up changes.
  – Asynchronous reading: register for the data-change event; when the event occurs, a callback function is called to trigger a read.
[Figure: the MDTM Daemon publishes System Layout once and Core Workload dynamically into Shared Memory; the App. uses sync reading for System Layout and polling or async reading for Core Workload]
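The difference between polling and asynchronous reading can be sketched in a few lines. This is a single-process illustration only: `workload_channel_t`, `channel_publish`, `channel_poll`, `channel_register` and `record_load` are hypothetical names, and in a real deployment the change event would have to cross the process boundary (the slides do not specify the notification mechanism).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dynamic record with a version that grows on every publish. */
typedef struct { unsigned version; unsigned load_pct; } workload_t;
typedef void (*workload_handler)(const workload_t *w, void *ctx);

typedef struct {
    workload_t data;
    workload_handler on_change;  /* NULL when nobody registered */
    void *ctx;
} workload_channel_t;

/* Asynchronous reading: register interest in the data-change event. */
static void channel_register(workload_channel_t *ch, workload_handler h, void *ctx) {
    assert(ch != NULL);
    ch->on_change = h;
    ch->ctx = ctx;
}

/* Publisher: update the record, then fire the change event if registered. */
static void channel_publish(workload_channel_t *ch, unsigned load_pct) {
    assert(ch != NULL);
    ch->data.load_pct = load_pct;
    ch->data.version++;
    if (ch->on_change != NULL)
        ch->on_change(&ch->data, ch->ctx);
}

/* Polling: a synchronous read that reports whether anything changed
 * since the version the caller saw last. Returns 1 on new data. */
static int channel_poll(const workload_channel_t *ch, unsigned *last_seen, workload_t *out) {
    assert(ch != NULL && last_seen != NULL && out != NULL);
    if (ch->data.version == *last_seen)
        return 0;
    *out = ch->data;
    *last_seen = out->version;
    return 1;
}

/* Example callback: copy the new load into caller-owned storage. */
static void record_load(const workload_t *w, void *ctx) {
    *(unsigned *)ctx = w->load_pct;
}
```

Polling costs a read on every check even when nothing changed, while the callback path does work only when the publisher actually updates the record.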
6
MDTM IPC: Message Queuing
– All MDTM API instances share one common queue for sending messages to the MDTM Daemon; each instance owns an individual queue for receiving messages from the MDTM Daemon.
– A lock-free algorithm is to be implemented for the shared queue.
– The “MDTM Messaging Protocol (MMP)” defines the internal messages exchanged between the APIs and the Daemon.
Example scenario:
① App. #0 puts a Query message on the common queue
② The Daemon reads the Query message from the common queue
③ The Daemon processes the Query message
④ The Daemon puts a Response message on the queue for App. #0
⑤ App. #0 reads the Response message
[Figure: Shared Memory holding the common queue, the queue for App. #0 and the queue for App. #1, connected to App. #0, App. #1 and the MDTM Daemon; arrows numbered 1 to 5 follow the scenario steps]
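The five-step scenario can be traced with a small queue layout. The sketch below is an assumption-laden illustration, not the MMP wire format: `msg_t`, `queue_t`, `mmp_shm_t` and `daemon_serve_one` are hypothetical, the "processing" step is a placeholder that doubles the payload, and the ring buffers use plain indices, so they are NOT lock-free or safe for concurrent producers. They only show the message flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QCAP 8       /* slots per queue (power of two) */
#define MAX_APPS 2   /* hypothetical number of API instances */

typedef enum { MSG_QUERY, MSG_RESPONSE } msg_type_t;
typedef struct { msg_type_t type; int app_id; int payload; } msg_t;

/* Plain ring buffer; a real shared queue would need atomic indices. */
typedef struct { msg_t slot[QCAP]; unsigned head, tail; } queue_t;

static bool queue_put(queue_t *q, msg_t m) {
    assert(q != NULL);
    if (q->tail - q->head == QCAP) return false;  /* full */
    q->slot[q->tail++ % QCAP] = m;
    return true;
}

static bool queue_get(queue_t *q, msg_t *out) {
    assert(q != NULL && out != NULL);
    if (q->head == q->tail) return false;         /* empty */
    *out = q->slot[q->head++ % QCAP];
    return true;
}

/* Layout mirroring the slide: one common queue plus one queue per app. */
typedef struct { queue_t common; queue_t to_app[MAX_APPS]; } mmp_shm_t;

/* Daemon side, steps 2 to 4: read a query, process it, post the response. */
static bool daemon_serve_one(mmp_shm_t *shm) {
    msg_t query;
    if (!queue_get(&shm->common, &query)) return false;
    if (query.app_id < 0 || query.app_id >= MAX_APPS) return false;
    msg_t resp = { MSG_RESPONSE, query.app_id, query.payload * 2 };  /* placeholder work */
    return queue_put(&shm->to_app[query.app_id], resp);
}
```

An application performs step 1 with `queue_put(&shm.common, query)` and step 5 with `queue_get(&shm.to_app[id], &resp)`. Because only the common queue has multiple producers, it is the one that needs the lock-free treatment mentioned on the slide.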
7
MDTM Daemon: Acquiring System Info
Hardware topology
– System calls
– 3rd-party libraries like libpci
Core workload detection
Intensive threads detection
8
MDTM Daemon: Scheduling
Policy
– Local affinity first: “IO intensive threads allocated/reallocated to CPU cores local to the IO devices.”
– Load balance: “Workload evenly distributed across NUMA nodes and cores.”
9
Alg. #1: Shortest Distance First
– A hash table stores a list of reachable cores for each IO device.
– Each core list is sorted by the physical distance between the IO device and the cores.
– Each element of a core list refers to a core descriptor, which contains the core's current workload information.
– The scheduler picks the first core in the list. If its workload is under the watermark, the job is done; otherwise it moves to the next core. If every core is busy, it chooses either the first core or the least busy core.
Pros and Cons
– Pros: simple data structure, fast
– Cons: considers distance and workload only, not intermediate devices
Steps:
① Search for the core list
② Find the core with the shortest distance
③ Check whether the workload of the core is under the watermark; if yes, pick the current core
④ If no, move to the next core and repeat the previous step
[Figure: keys (NIC 0, Disk 0) pass through a hash function to sorted lists of cores; each list element points to a core descriptor holding the core status (cpu, …); arrows numbered 1 to 4 follow the steps]
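The selection loop of Alg. #1 can be sketched as follows. The hash-table lookup from IO device to core list is left out, and `core_desc_t`, `pick_core` and the percentage-based watermark are hypothetical choices made for illustration. When every core is busy, the sketch falls back to the least busy core, which is one of the two options the slide allows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical core descriptor: id plus current workload in percent. */
typedef struct { int id; int load_pct; } core_desc_t;

/* cores[] is the list found for one IO device, already sorted by
 * ascending distance from that device. Returns the chosen core id,
 * or -1 if the list is empty. */
static int pick_core(const core_desc_t *const cores[], size_t n, int watermark_pct) {
    assert(cores != NULL || n == 0);
    if (n == 0)
        return -1;
    const core_desc_t *least_busy = cores[0];
    for (size_t i = 0; i < n; i++) {
        if (cores[i]->load_pct < watermark_pct)
            return cores[i]->id;                 /* nearest core under the watermark */
        if (cores[i]->load_pct < least_busy->load_pct)
            least_busy = cores[i];
    }
    return least_busy->id;                       /* all busy: least busy core wins */
}
```

Because the list stores pointers to shared descriptors, a workload update made by the daemon is visible to every IO device's list without copying, which matches the indirection drawn on the slide.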
10
Alg. #2: Lowest Cost First
– Each connection is associated with a cost value that reflects scheduling factors such as distance and traffic throughput.
– Each node contains a table of costs to its neighbors.
– All CPU cores are treated as one single node, but the connections leaving it represent different cores.
– Dijkstra's algorithm finds the lowest cost path from the CPU node to the NIC/Disk node in question; the scheduler picks the core associated with the lowest cost path.
Pros and Cons
– Pros: more extensive system picture; scalable; dynamic
– Cons: more complicated data structure
[Figure: graph of devices with CPUs/Cores, PCI Hubs/Bridges…, and NICs/Disks as nodes, connections between devices as edges, and the lowest cost path for the pair (CPU, NIC/Disk) highlighted]
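A compact way to see how the "one CPU node, one connection per core" idea works with Dijkstra's algorithm is to tag each connection leaving the CPU node with a core id and carry that tag along the best path. The sketch below is hypothetical in its details: `edge_t`, `lowest_cost_core` and `MAX_NODES` are illustrative names, connections are assumed to be directed from the CPU side toward the devices, and costs are assumed to be non-negative.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MAX_NODES 16

/* Connection between two devices. `core` is meaningful only on
 * connections that leave the CPU node; use -1 elsewhere. */
typedef struct { int from, to, cost, core; } edge_t;

/* Returns the core whose connection starts the lowest cost path from
 * node `cpu` to node `target`, or -1 if the target is unreachable.
 * The path cost is stored in *out_cost when out_cost is not NULL. */
static int lowest_cost_core(const edge_t *edges, int n_edges, int n_nodes,
                            int cpu, int target, int *out_cost) {
    int dist[MAX_NODES], first_core[MAX_NODES], done[MAX_NODES] = {0};
    assert(edges != NULL && n_nodes > 0 && n_nodes <= MAX_NODES);
    assert(cpu >= 0 && cpu < n_nodes && target >= 0 && target < n_nodes);
    for (int i = 0; i < n_nodes; i++) { dist[i] = INT_MAX; first_core[i] = -1; }
    dist[cpu] = 0;
    for (int iter = 0; iter < n_nodes; iter++) {
        int u = -1;  /* closest unfinished node that is reachable */
        for (int i = 0; i < n_nodes; i++)
            if (!done[i] && dist[i] != INT_MAX && (u < 0 || dist[i] < dist[u]))
                u = i;
        if (u < 0 || u == target)
            break;
        done[u] = 1;
        for (int e = 0; e < n_edges; e++) {      /* relax connections leaving u */
            if (edges[e].from != u) continue;
            int v = edges[e].to, d = dist[u] + edges[e].cost;
            if (d < dist[v]) {
                dist[v] = d;
                first_core[v] = (u == cpu) ? edges[e].core : first_core[u];
            }
        }
    }
    if (dist[target] == INT_MAX)
        return -1;
    if (out_cost != NULL)
        *out_cost = dist[target];
    return first_core[target];
}
```

Storing the connections as a list rather than a matrix allows several parallel connections from the CPU node to the same hub, one per core. Since costs can encode current traffic as well as distance, rerunning the search after a cost update can yield a different core for the same device.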
11
MDTM API: C-style API Functions

typedef struct {
    int id;
    int type;
    int src_ip;
    int dst_ip;
    int storage;
} thread_desc_t;

struct sysinfo;
typedef void (*sysinfo_handler)(struct sysinfo *info);
struct coreinfo;
typedef void (*coreinfo_handler)(struct coreinfo *info);

int mdtm_init();
int mdtm_schedule_threads(const thread_desc_t thread_descs[], const int numofdescs);
int mdtm_retrieve_sysinfo(struct system_info *info);
int mdtm_retrieve_sysinfo_asynch(sysinfo_handler cbfunc, struct system_info *info);
int mdtm_retrieve_coreinfo(struct coreinfo *info);
int mdtm_retrieve_coreinfo_asynch(coreinfo_handler cbfunc, struct coreinfo *info);

Visit our project webpage for the latest updates: https://web.fnal.gov/project/mdtm/
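A caller might use the scheduling entry points roughly as sketched below. Much of this is assumed rather than taken from the slides: the two `mdtm_*` functions here are stand-in stubs so the example builds without libmdtm, the return convention (0 for success) is a guess, and the `THREAD_TYPE_*` constants, the field values and `schedule_transfer_threads` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Descriptor layout as shown on the slide (normally from mdtm.h). */
typedef struct {
    int id;
    int type;
    int src_ip;
    int dst_ip;
    int storage;
} thread_desc_t;

/* Stand-in stubs, NOT the real library: they only let the sketch build.
 * Assumption: both functions return 0 on success. */
int mdtm_init(void) { return 0; }

int mdtm_schedule_threads(const thread_desc_t thread_descs[], const int numofdescs) {
    assert(thread_descs != NULL || numofdescs == 0);
    return numofdescs >= 0 ? 0 : -1;
}

/* Hypothetical thread types; the slide does not define the values. */
enum { THREAD_TYPE_NET_IO = 1, THREAD_TYPE_DISK_IO = 2 };

/* Describe one network thread and one disk thread, then ask the
 * daemon (via the API) to schedule and bind them. */
static int schedule_transfer_threads(void) {
    if (mdtm_init() != 0)
        return -1;
    thread_desc_t descs[2] = {
        { .id = 0, .type = THREAD_TYPE_NET_IO,  .src_ip = 0, .dst_ip = 0, .storage = 0 },
        { .id = 1, .type = THREAD_TYPE_DISK_IO, .src_ip = 0, .dst_ip = 0, .storage = 1 },
    };
    return mdtm_schedule_threads(descs, 2);
}
```

With the real library, the stubs would be dropped in favor of including mdtm.h and linking against libmdtm; the meaning of each descriptor field should be taken from the project documentation rather than from this sketch.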
12
MDTM API: Scheduling (optional)
– The scheduling algorithms can optionally run within the API, in the user application's address space.
– Performance might benefit from the reduced IPC between the applications and the MDTM Daemon.
– A comparison is needed to optimize performance.
13
MDTM Console
– Monitoring and profiling
– Managing
– Development debugging
– Command line
– Web based (bonus): remotely control multiple machines from anywhere
14
MDTM Package and Website
– API prototypes and data structures are included in mdtm.h.
– The library has both static and dynamic versions: libmdtm.lib and libmdtm.so.
– The daemon and console executables are mdtmd and mdtm.
– For more information, see https://web.fnal.gov/project/mdtm/