Published by Bryce Harvey. Modified over 9 years ago.
1
Implementation of Package Management in a Cluster Environment
So, Jung-ki
Distributed Computing System LAB
School of Computer Science and Engineering
Seoul National University
2
So, Jung-ki (SNU DCS Lab) Introduction Related Work Design Evaluation Conclusion 2 / 20
Introduction (1/2)
Supercomputer
  - High-performance processors / high network bandwidth
  - Expensive, but a Beowulf system is cost-effective
Motivation
  - Focus on cluster systems
Cluster management system
  - Manual method / add-on method / integrated method
Registry
  - Central repository of information about all aspects of the computer
3
Introduction (2/2)
Challenge
  - The integrated method has low availability and reliability
  - Computation nodes cannot be managed separately
  - When a failure occurs, the system cannot be rejuvenated
Goal (using Registry)
  - Improve the availability and reliability of the integrated method
  - Let the administrator manage a cluster system easily
  - Restore the cluster system from a backup snapshot
4
Supercomputer
Domestic supercomputers
  - Quantity: 14
  - Cluster: 4
  - MPP: 4
  - Constellation: 6
  - ※ SNU: 2 (51/413)
60.8%
5
Cluster Management System
Manual approach
  - The system administrator brings up the entire system manually
Add-on method
  - Bring up a frontend node, then add cluster packages
  - OSCAR / Warewulf / OpenMosix
Integrated method
  - Cluster packages are installed and configured during the initial installation
  - Rocks / Scyld
6
Cluster Management System Software Stack
[Diagram: layered software stack]
  - Linux kernel
  - Linux environment
  - HPC device drivers
  - Job scheduling and launching
  - Cluster software management
  - Cluster state management / monitoring
  - Message passing / communication layer
  - Parallel code / Grid / computer lab …
Diagram labels: OS (Linux), SGE, Application, HPC
7
Rocks Overview
Identity
  - System to build and manage a Linux cluster
  - Free: open-source project
Goal
  - Make clusters easy
Philosophy
  - Computation nodes are 100% automatically installed
  - Roll: a set of packages
  - Graph / Kickstart
  - Runs on heterogeneous system architectures
  - Does not attempt to incrementally update software
8
Rocks System Architecture
[Diagram: a front-end node and nodes on a local network; interface labels eth1 and eth0; Internet]
9
What is Registry?
Central repository of information about all aspects of the computer
  - Hardware, OS, applications, user information
Function
  - Retrieve system information
  - Update / add / delete software
  - Back up and restore the system
Advantage
  - Easier for applications to access the system
  - Stores large amounts of structured data (system info)
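The backup and restore function can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it assumes the registry lives in a SQL database and uses Python's built-in sqlite3 module as a stand-in for the cluster's database. The one-table layout and the helper names snapshot and restore are hypothetical.

```python
import sqlite3

def snapshot(registry: sqlite3.Connection) -> str:
    """Return the whole registry as SQL text (schema plus rows)."""
    return "\n".join(registry.iterdump())

def restore(dump: str) -> sqlite3.Connection:
    """Rebuild a fresh registry database from a snapshot."""
    fresh = sqlite3.connect(":memory:")
    fresh.executescript(dump)
    return fresh

# Hypothetical one-table registry, for illustration only.
registry = sqlite3.connect(":memory:")
registry.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT)")
registry.execute("INSERT INTO nodes (name) VALUES ('compute-0-0')")
registry.commit()

saved = snapshot(registry)

# Simulate a failure that wipes the node table.
registry.execute("DELETE FROM nodes")
registry.commit()

restored = restore(saved)
names = [row[0] for row in restored.execute("SELECT name FROM nodes")]
print(names)
```

Keeping the snapshot as plain SQL text makes it easy to store it outside the running database, which is the property a restore after failure depends on.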
10
Registry Design
Table          Columns
Nodes          ID (primary key), Name, Membership, CPUs, Rack, Rank, Comment
Network        ID (primary key), Node, MAC, IP, Gateway, Name, Device, Module
Package        ID (primary key), Node, Name, Version, Release, Install
Aliases        ID (primary key), Node, Name
Memberships    ID (primary key), Name, Appliance, Distribution
Appliances     ID (primary key), Name, Graph, Node
Distribution   ID (primary key), Name, Release, Lang
Diagram legend: Original Relational Schema / Appended Relation; H/W information / S/W information
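The relationship between the node and package relations might look like the sketch below. It assumes, from the slide's column lists, that the package relation refers to a node through its Node column. The column types, the lowercase table names, the 'installed' value, and the helper packages_on are illustrative guesses, and sqlite3 stands in for the real database.

```python
import sqlite3

# Column names follow the slide; types and constraints are assumptions.
SCHEMA = """
CREATE TABLE nodes (
    id INTEGER PRIMARY KEY,
    name TEXT, membership INTEGER, cpus INTEGER,
    rack INTEGER, "rank" INTEGER, comment TEXT
);
CREATE TABLE packages (
    id INTEGER PRIMARY KEY,
    node INTEGER REFERENCES nodes(id),
    name TEXT, version TEXT, "release" TEXT, install TEXT
);
"""

def packages_on(conn, node_name):
    """List (name, version, release) for every package recorded on a node."""
    rows = conn.execute(
        'SELECT p.name, p.version, p."release" '
        "FROM packages AS p JOIN nodes AS n ON p.node = n.id "
        "WHERE n.name = ? ORDER BY p.name",
        (node_name,),
    )
    return rows.fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO nodes (id, name) VALUES (1, 'compute-0-0')")
conn.execute(
    "INSERT INTO packages VALUES (NULL, ?, ?, ?, ?, ?)",
    (1, "amanda", "2.4.5", "2", "installed"),  # 'installed' is a made-up status value
)
conn.commit()
print(packages_on(conn, "compute-0-0"))
```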
11
Strategy of Management
Rocks setup
  - Minimum modification
    - Take advantage of the original Rocks system
    - Deploy the cluster system easily
  - Modify the related source code
    - insert-ethers, kickstart.cgi, Kpp, Kgen, Rgen
Running system
  - Apply package modifications
    - Package management program: add / update / delete packages
    - DB consistency management program
12
Collection Method
[Diagram labels: Rgen, Registry variables, Package variables, Appended component]
13
Modification Method
Insert command
  - Packages table: package name / version / release
  - Instruction: add / update / delete
      add -c=compute-0-0 -i=amanda-2.4.5-2.i386
      add -c=all -i=all
      del -c=compute-0-0 -i=amanda-2.4.5-2.i386
      del -c=all -i=all
[Diagram labels: Packages table, add / delete / update, compute nodes]
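How such a command could be broken into its parts is sketched below. The command syntax is taken from the slide; the function names parse_command and split_rpm_name are hypothetical, and splitting the package string relies on the usual RPM name-version-release.arch file naming rather than on anything the slides state about the tool's internals.

```python
ACTIONS = {"add", "del"}  # the two command forms shown on the slide

def parse_command(line):
    """Split e.g. 'add -c=compute-0-0 -i=pkg' into (action, node, package)."""
    action, *options = line.split()
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    # '-c=compute-0-0' becomes the pair ('c', 'compute-0-0')
    opts = dict(opt.lstrip("-").split("=", 1) for opt in options)
    return action, opts["c"], opts["i"]

def split_rpm_name(package):
    """Split 'name-version-release.arch' into its four fields.

    The special value 'all' must be handled by the caller before this step.
    """
    base, arch = package.rsplit(".", 1)
    name, version, release = base.rsplit("-", 2)
    return name, version, release, arch

action, node, package = parse_command("add -c=compute-0-0 -i=amanda-2.4.5-2.i386")
print(action, node, split_rpm_name(package))
```

Splitting from the right matters because package names may themselves contain hyphens, while the version and release fields do not.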
14
Registry Consistency
Setup time
  - When the frontend node removes / updates a computation node
  - Dependency: change to the node table → change to the package table
  - Modify kickstart.cgi / Kgen
  - Apply cascading table changes
  - ※ The MySQL setup used here does not support transactions
Running system
  - Package install / delete / update
  - Compute node rpm information = frontend node's registry DB
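Both consistency rules can be illustrated with a short sketch. It assumes a two-table layout (nodes and packages) and uses sqlite3 as a stand-in database. The function names remove_node and registry_drift, and the package example-tool-1.0-1, are hypothetical. Because the slide notes that transactions are unavailable, the cascade is written as explicitly ordered statements.

```python
import sqlite3

def remove_node(conn, node_id):
    """Cascade by hand: delete dependent package rows, then the node row.

    Deleting children first means an interruption can leave a node with no
    packages recorded, but never package rows that point at a missing node.
    """
    conn.execute("DELETE FROM packages WHERE node = ?", (node_id,))
    conn.execute("DELETE FROM nodes WHERE id = ?", (node_id,))
    conn.commit()

def registry_drift(registry_pkgs, node_pkgs):
    """Compare the registry's package list with what a node reports."""
    registry_pkgs, node_pkgs = set(registry_pkgs), set(node_pkgs)
    missing_on_node = sorted(registry_pkgs - node_pkgs)
    unknown_to_registry = sorted(node_pkgs - registry_pkgs)
    return missing_on_node, unknown_to_registry

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE packages (id INTEGER PRIMARY KEY, node INTEGER, name TEXT)")
conn.execute("INSERT INTO nodes VALUES (1, 'compute-0-0')")
conn.execute("INSERT INTO packages VALUES (NULL, 1, 'amanda-2.4.5-2')")
remove_node(conn, 1)
orphans = conn.execute("SELECT COUNT(*) FROM packages").fetchone()[0]

# 'example-tool-1.0-1' is a made-up package used only to show a mismatch.
drift = registry_drift(
    ["amanda-2.4.5-2", "example-tool-1.0-1"],  # what the registry records
    ["amanda-2.4.5-2"],                        # what the node reports
)
print(orphans, drift)
```

An empty pair of lists from registry_drift corresponds to the slide's condition that the compute node's rpm information equals the frontend node's registry DB.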
15
Experiment Setup
[Diagram: frontend node and 14 compute nodes on public Ethernet]
Frontend node (Rocks.snu.ac.kr): CPU 800 MHz, RAM 768 MB, HDD 40 GB
Compute nodes (Compute-0-(1~14)): CPU 850 MHz, RAM 1 GB, HDD 10 GB

Experiment Data
name          volume   capacity
amanda        3        468 KB
HPC           53       117 MB
Rocks roll    479      1.5 GB
16
Original Rocks Evaluation
  - Average service time: 18 min 14 sec
  - Average transmit time: 11 min 28 sec
[Figure labels: Network card, DHCP request]
17
Amanda Packages Evaluation
  - Average install time: 6.62 sec
  - Average delete time: 5.57 sec
18
HPC Roll Evaluation
  - Average install time: 3 min 38 sec
  - Average delete time: 1 min 18 sec
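The reported averages can be put on a common scale with a little arithmetic. The sketch below only converts the slide figures to seconds and divides them. It assumes the original Rocks service time is the cost a package change would otherwise incur through the regular Rocks installation path; the slides imply this comparison but do not state it.

```python
def to_seconds(minutes, seconds):
    """Convert a 'min sec' pair from the slides into seconds."""
    return minutes * 60 + seconds

reinstall = to_seconds(18, 14)    # original Rocks, average service time
hpc_install = to_seconds(3, 38)   # HPC roll, average install time
amanda_install = 6.62             # amanda packages, average install time (sec)

hpc_ratio = reinstall / hpc_install
amanda_ratio = reinstall / amanda_install

print(f"original Rocks service time: {reinstall} s")
print(f"HPC roll install: {hpc_install} s ({hpc_ratio:.1f}x shorter)")
print(f"amanda install: {amanda_install} s ({amanda_ratio:.0f}x shorter)")
```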
19
Conclusion
A Registry benefits the cluster system
  - Improves availability and reliability
  - Administrators can manage cluster systems easily
  - Cluster systems can be restored from backup snapshots
20
Q & A
Questions or comments?
Thank you!