1
- Eddy Caron
2
E. Caron - LEGO kickoff meeting (Réunion de lancement LEGO) - 10/02/06
Lego Team from GRAAL
- Anne Benoît (McF)
- Eddy Caron (McF)
- Frédéric Desprez (DR)
- Yves Caniou (McF)
- Raphaël Bolze (PhD)
- Pushpinder Kaur Chouhan (PhD)
- Jean-Sébastien Gay (PhD)
- Cedric Tedeschi (PhD)
4
DIET Architecture
[Diagram labels: Client; Master Agent (MA), with MAs linked through JXTA; Local Agent (LA); server front end; FAST library (application modeling, system availabilities) over LDAP and NWS]
5
Data Management
Joint work with G. Antoniu, E. Caron, B. Del Fabbro, M. Jan
6
Data/replica management
Two needs
- Keep the data in place to reduce the overhead of communications between clients and servers
- Replicate data whenever possible
Two approaches for DIET
- DTM (LIFC, Besançon)
  - Hierarchy similar to DIET's
  - Distributed data manager
  - Redistribution between servers
- JuxMem (Paris, Rennes)
  - P2P data cache
NetSolve
- IBP (Internet Backplane Protocol): data cache
- Request Sequencing to find data dependences
Work done within the GridRPC Working Group (GGF)
Relations with workflow management
[Diagram labels: Client, A, F, G, Y, Server 1, Server 2, X, B]
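The idea behind request sequencing can be sketched as follows. This is a minimal illustration and not NetSolve's implementation: the function `find_dependences` and the request tuples are hypothetical, and the example assumes one possible reading of the figure, in which service F produces X from A and service G produces Y from X and B.

```python
def find_dependences(requests):
    """Return (producer, consumer, data) triples for a sequence of requests.

    Each request is (service_name, input_names, output_names). A dependence
    exists when a request reads data written by an earlier request.
    """
    last_writer = {}   # data name -> service that produced it most recently
    dependences = []
    for service, inputs, outputs in requests:
        for data in inputs:
            if data in last_writer:
                dependences.append((last_writer[data], service, data))
        for data in outputs:
            last_writer[data] = service
    return dependences


# Assumed reading of the figure: X = F(A), then Y = G(X, B).
requests = [("F", ["A"], ["X"]), ("G", ["X", "B"], ["Y"])]
deps = find_dependences(requests)
```

In this reading X is the only intermediate result, so it is the one piece of data that could stay on the servers instead of travelling back through the client.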
7
Data management with DTM within DIET
- Persistence at the server level
  - To avoid useless data transfers
  - Intermediate results (C, D)
  - Between clients and servers
  - Between servers
  - “Transparent” for the client
- Data Manager/Loc Manager
  - Hierarchy mapped on the DIET one
  - Modularity
- Proposition to the Grid-RPC WG (GGF)
  - Data handles
  - Persistence flag
  - Data management functions
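To make the benefit of server-level persistence concrete, here is a small counting model. It is only a sketch under stated assumptions: the `Server` class is hypothetical and is not the DTM API, there is a single server, every input missing on the server costs one transfer, and without persistence every result is sent back and the server forgets everything. The call sequence (C = A * B; D = E + C; A = tA) is the one used in the performance slides.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server that may keep data between calls."""
    persistent: bool
    store: set = field(default_factory=set)
    transfers: int = 0

    def call(self, inputs, output):
        for name in inputs:
            if name not in self.store:
                self.transfers += 1      # client -> server transfer
                self.store.add(name)
        self.store.add(output)           # result is produced on the server
        if not self.persistent:
            self.transfers += 1          # result returned to the client
            self.store.clear()           # nothing is kept for the next call


def count_transfers(persistent):
    server = Server(persistent)
    server.call(["A", "B"], "C")   # C = A * B
    server.call(["E", "C"], "D")   # D = E + C
    server.call(["A"], "A")        # A = tA
    return server.transfers


without_persistence = count_transfers(False)
with_persistence = count_transfers(True)
```

In the persistent case the results stay on the server until the client asks for them, so the final retrieval is not counted in this model.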
8
Performances
[Figure: performance for A = C * B]
9
Performances
[Figure: performance for the sequence C = A * B; D = E + C; A = tA]
10
JUXMEM
PARIS project, IRISA, France
- A peer-to-peer architecture for a data-sharing service in memory
- Persistence and data coherency mechanism
- Transparent data localization
- Toolbox for the development of P2P applications
  - Set of protocols
  - One peer: unique ID, several communication protocols (TCP, HTTP, …)
[Diagram labels: Peer, Peer ID, TCP/IP, Firewall, HTTP]
11
Visualization Work with Raphaël Bolze
12
VizDIET: A visualization tool
Current view of the DIET platform
A postmortem analysis from log files is available
Good scalability
We can show:
- Communication between agents
- State of SeD
- Available services
- Persistent data
- Name information
- CPU, memory and network load
13
LogService
- CORBA communications
- Message ordering and scheduling
- Message filtering
- System state
14
LogService & DIET
- LogService components: LogManager (LM) and LogCentral
- Each LogManager receives information from its agent and sends it to LogCentral, outside the DIET structure
- VizDIET graphically shows all messages from LogService
- Messages are transferred from the agent through the LogManager
- No disk storage
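The LogManager/LogCentral relationship, together with the ordering and filtering duties of LogService, can be sketched as below. This is an in-process toy model, not the CORBA-based LogService: the class and method names, the tag names, and the choice to filter inside LogCentral are all assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Message:
    timestamp: float                          # only field used for ordering
    component: str = field(compare=False)
    tag: str = field(compare=False)
    text: str = field(compare=False)


class LogCentral:
    """Collects messages in memory only (no disk storage)."""

    def __init__(self, accepted_tags):
        self.accepted_tags = set(accepted_tags)
        self._heap = []

    def receive(self, message):
        # Filtering: drop messages whose tag nobody subscribed to.
        if message.tag in self.accepted_tags:
            heapq.heappush(self._heap, message)

    def drain(self):
        # Ordering: hand messages out by increasing timestamp.
        ordered = []
        while self._heap:
            ordered.append(heapq.heappop(self._heap))
        return ordered


class LogManager:
    """Sits next to one DIET component and forwards its messages."""

    def __init__(self, component, central):
        self.component = component
        self.central = central

    def log(self, timestamp, tag, text):
        self.central.receive(Message(timestamp, self.component, tag, text))


central = LogCentral(accepted_tags={"REQUEST", "STATE"})
ma = LogManager("MA", central)
sed = LogManager("SeD1", central)
sed.log(2.0, "STATE", "solving")
ma.log(1.0, "REQUEST", "request received")
sed.log(1.5, "DEBUG", "filtered out")
timeline = [(m.timestamp, m.component) for m in central.drain()]
```

A viewer such as VizDIET would then consume the drained timeline instead of reading log files from each machine.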
15
VizDIET v1.0
[Diagram labels: XML (DIET Agents, DIET Servers, Physical Machines, Physical Storage); GoDIET; Distributed DIET Deployment; LogService; VizDIET]
16
Screenshot : Platform Visualization
17
Screenshots: Statistic module
18
Platform Deployment Work from E. Caron, P.-K. Chouhan and A. Legrand
19
GoDIET: A tool for automated DIET deployment
- Automate configuration, staging, execution and management of a distributed DIET platform
- Support experiments at large scale
- Faster and easier bulk testing
- Reduce errors & debugging time for users
Constraints:
- Simple XML file
- Console & batch mode
- Integrate with visualization tools and CORBA tools
[written in Java]
20
DIET usage with contrib services
[Diagram labels: Distributed DIET deployment; DIET administration; Traces; XML; GoDIET; Subset of traces; LogService; Subset of traces; VizDIET]
21
Launch process
- GoDIET follows the DIET hierarchy in launch order
- For each element to be launched:
  - Configuration file written to local disk [including parent agent, naming service location, hostname and/or port endpoint…]
  - Configuration file staged to remote disk (scp)
  - Remote command launched (ssh) [PID retrieved, stdout & stderr saved on request]
  - Feedback from LogCentral used to time launch of next element
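The launch steps above can be sketched as a planner that only builds the commands and never runs them. This is not GoDIET's code (GoDIET is written in Java): the `Element` type, the configuration keys, the `launch_<kind>` executable names and the `/tmp` staging directory are all hypothetical, and the wait for LogCentral feedback between launches is left out.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    kind: str                      # "MA", "LA" or "SeD"
    host: str
    children: list = field(default_factory=list)


def launch_order(root):
    """Breadth-first walk, so every parent precedes its children."""
    order, queue = [], [(root, None)]
    while queue:
        element, parent = queue.pop(0)
        order.append((element, parent))
        queue.extend((child, element) for child in element.children)
    return order


def plan(root, remote_dir="/tmp"):
    steps = []
    for element, parent in launch_order(root):
        cfg = f"{element.name}.cfg"
        parent_name = parent.name if parent else ""
        # Hypothetical configuration keys, for illustration only.
        config_text = f"name = {element.name}\nparent = {parent_name}\n"
        remote_cfg = f"{remote_dir}/{cfg}"
        # Hypothetical executable name derived from the element kind.
        remote_cmd = f"launch_{element.kind.lower()} {remote_cfg}"
        steps.append({
            "element": element.name,
            "config": config_text,
            "stage": shlex.join(["scp", cfg, f"{element.host}:{remote_cfg}"]),
            "launch": shlex.join(["ssh", element.host, remote_cmd]),
        })
    return steps


root = Element("MA1", "MA", "node0", [
    Element("LA1", "LA", "node1", [Element("SeD1", "SeD", "node2")]),
    Element("LA2", "LA", "node3"),
])
steps = plan(root)
```

Building the plan separately from executing it makes it possible to review the commands before anything is staged or started on a remote host.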
22
GoDIET Console
java -jar GoDIET.jar vthd4site.xml
23
GoDIET: before launch
24
GoDIET: after launch
27 sec launch, including waiting for feedback
25
Grid’5000 DIET deployment
- 7 sites / 8 clusters: Bordeaux, Lille, Lyon, Orsay, Rennes, Sophia, Toulouse
- 1 MA
- 8 LA
- 574 SeD
26
Scheduling Work with Alan Su, Peter Frauenkron, Eric Boix
27
The scheduling
- Plug-in scheduler
- Round robin as default scheduling
- Advanced scheduling only possible with more information
- Existing schedulers in DIET use data from FAST and/or NWS
Limitations:
- Deployment of appropriate hierarchies for a given grid platform is non-obvious
- Limited consideration of inter-task factors
- Non-standard application- and platform-specific performance measures
- FAST, NWS: low availability, SeD idles; for NWS, no default, weighting difficult (possible?)
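A round-robin default needs no performance information at all, which is why it works as a fallback. The sketch below shows the general technique only. The class name and server names are hypothetical, and DIET's actual default scheduler may be implemented differently.

```python
from itertools import cycle


class RoundRobinScheduler:
    """Hands out servers in a fixed rotation, ignoring load and speed."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def choose(self):
        return next(self._rotation)


scheduler = RoundRobinScheduler(["SeD1", "SeD2", "SeD3"])
picks = [scheduler.choose() for _ in range(5)]
```

Anything smarter than this rotation needs measurements from the servers, which is the gap the plug-in scheduler is meant to fill.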
28
Plugin Scheduling
Plugin scheduling facilities to enable
- application-specific definitions of appropriate performance metrics
- an extensible measurement system
- tunable comparison/aggregation routines for scheduling
Composite requirements enable various selection methods
- basic resource availability
- processor speed, memory
- database contention
- future requests

Component | Before                                     | After
SeD       | automatic performance estimate (FAST/NWS)  | chosen/defined by application programmer
Agents    | exec. time sorting                         | “menu” of aggregation methods
Client    | CLIENT CODE UNCHANGED
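The division of labor in the table can be sketched as follows: the application programmer supplies an estimator on the SeD side, and the agent ranks candidates with a method picked from a fixed menu. All names here (`SeD`, `memory_estimator`, `AGGREGATION_MENU`, `rank`, the `free_mem_mb` metric) are hypothetical and do not reproduce the DIET plug-in scheduler API.

```python
class SeD:
    """Server daemon with an application-defined performance estimator."""

    def __init__(self, name, state, estimator):
        self.name = name
        self.state = state
        self.estimator = estimator

    def estimate(self):
        # The SeD fills in whatever metrics the application cares about.
        return self.estimator(self.state)


def memory_estimator(state):
    """Example estimator chosen by the application programmer."""
    return {"free_mem_mb": state["free_mem_mb"]}


# "Menu" of aggregation methods offered on the agent side.
# The value says whether candidates are sorted in descending order.
AGGREGATION_MENU = {
    "smallest_first": False,
    "largest_first": True,
}


def rank(seds, metric, method):
    """Agent-side aggregation: sort candidate SeDs on one metric."""
    descending = AGGREGATION_MENU[method]
    return sorted(seds, key=lambda sed: sed.estimate()[metric],
                  reverse=descending)


seds = [
    SeD("SeD1", {"free_mem_mb": 512}, memory_estimator),
    SeD("SeD2", {"free_mem_mb": 2048}, memory_estimator),
    SeD("SeD3", {"free_mem_mb": 1024}, memory_estimator),
]
ranking = [sed.name for sed in rank(seds, "free_mem_mb", "largest_first")]
```

Nothing in this sketch touches the client, which mirrors the "client code unchanged" row of the table.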
29
CoRI
- Collector: an easy interface for gathering performance and load information about a specific SeD
- Two modules (currently): CoRI-Easy and FAST
- Possible to extend (new modules): Ganglia, Nagios, R-GMA, Hawkeye, INCA, MDS, …
CoRI-Easy
- Uses fast and basic functions or simple performance tests
- Keeps the independence of DIET
- Able to run on “all” operating systems, to allow a default scheduling with basic information
[Diagram labels: CoRI; CoRI-Easy; FAST; other (R-GMA, Hawkeye, INCA, MDS, …)]
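A collector with pluggable modules can be sketched like this. It is an illustration of the design and not the CoRI API: the `Collector` class, the module name "easy" and the metric names are assumptions, and the basic module uses only Python's standard library so that it stays portable.

```python
import os


def easy_module():
    """Basic, portable measurements using only the standard library."""
    info = {"cpu_count": os.cpu_count() or 1}
    try:
        info["load_avg_1min"] = os.getloadavg()[0]
    except (AttributeError, OSError):
        pass  # load average is not available on every operating system
    return info


class Collector:
    """Single entry point; each registered module is one information source."""

    def __init__(self):
        self._modules = {}

    def register(self, name, module):
        self._modules[name] = module

    def collect(self, name):
        return self._modules[name]()


collector = Collector()
collector.register("easy", easy_module)
info = collector.collect("easy")
```

A richer source (a monitoring system, for instance) would be registered as one more module without changing the code that calls `collect`.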
30
Batch and parallel submissions
Work with Yves Caniou
31
Difficulties of the problem
- Several SeD types
  - Parallel or sequential jobs
- Submit a parallel job (pdgemm, ...)
- Transparent for the user
- General API
[Diagram labels: agent, agent, SeD_seq, SeD_batch, SeD_parallel]
32
SeD_parallel on the front end
- Submit a parallel job → system dependent
- NFS: copy the code?
- MPI: LAM, MPICH?
- Reservation?
- Monitoring & performance prediction
[Diagram labels: agent, agent, SeD_parallel, front end, NFS]
33
SeD_batch on the front end
- Submit a parallel job → even more system dependent
- Previously mentioned problems
- Numerous batch systems → homogenization?
- Batch scheduler behavior → queues, scripts, etc.
[Diagram labels: agent, GLUE, SeD_batch; batch systems: OAR, SGE, LSF, PBS, Condor, LoadLeveler]
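One small piece of the homogenization question is that each batch system has its own submission command. The lookup below is only a sketch of that first step. The table and the function `submit_command` are hypothetical and not part of DIET, and argument conventions, script directives and queue behavior, which also differ between systems, are deliberately not modeled.

```python
# Usual job submission command for each batch system named on the slide.
SUBMIT_COMMANDS = {
    "OAR": "oarsub",
    "SGE": "qsub",
    "LSF": "bsub",
    "PBS": "qsub",
    "Condor": "condor_submit",
    "LoadLeveler": "llsubmit",
}


def submit_command(batch_system):
    """Return the submission command name for a known batch system."""
    try:
        return SUBMIT_COMMANDS[batch_system]
    except KeyError:
        raise ValueError(f"unsupported batch system: {batch_system}") from None


oar_command = submit_command("OAR")
```

Even with such a table, how the script is passed and how resources are requested still has to be handled per system, which is why the slide flags homogenization as an open issue.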
34
Batch & parallel submissions
- Asynchronous, long-term production jobs
- Still more problems
  - System dependent, numerous batch systems and their behavior
  - Performance prediction!
    → Application makespan as a function of #proc?
    → If reservation available, how to compute deadline?
  - Scheduling problems
    → Do we reserve when probing? How long do we hold it?
    → How to manage data transfers when waiting in the queue?
  - Co-scheduling?
  - Data & job migration?
35
Future Work
36
Future work
- LEGO applications with DIET
  - CRAL (RAMSES)
  - CERFACS
  - TLSE (Update)
- Components and DIET
  - Which architecture?
- Deployment
  - Link between ADAGE and theoretical solution on cluster [IJHPCA06]?
  - Anne Benoît approach …
37
Questions ?