Published by Erick Newton. Modified over 9 years ago.
1
Wide-Area Parallel Computing in Java
Henri Bal
Vrije Universiteit Amsterdam, Faculty of Sciences
2
Introduction
Distributed supercomputing
- Parallel applications on a geographically distributed computing system (computational grid)
- Examples: SETI@home, RSA-155
Programming support
- Language-neutral systems: Legion, Globus
- Language-centric: Java
Goal: study wide-area parallel computing in Java
- Programming model: Remote Method Invocation
3
Outline
Wide-area parallel computing
Java Remote Method Invocation (RMI)
Performance of JDK RMI
The Manta high-performance Java system
Wide-area parallel Java applications using RMI
Application performance
4
Wide-area parallel computing
Challenge
- Tolerating poor latency and bandwidth of WANs
Basic assumption: wide-area system is hierarchical
- Connect clusters, not individual workstations
- Most links are fast
General approach
- Optimize applications to exploit the hierarchical structure, so that most communication is local
5
Distributed ASCI Supercomputer
Clusters (nodes): VU (128), UvA (24), Leiden (24), Delft (24)
Wide-area links: 6 Mb/s ATM
Node configuration
- 200 MHz Pentium Pro
- 64-128 MB memory
- 2.5 GB local disks
- Myrinet LAN
- Redhat Linux 2.0.36
6
Java
Growing interest in Java for parallel applications
- Java Grande forum
Parallel programming support in Java
- Shared memory: multithreading
- Distributed memory: Remote Method Invocation
Study suitability of Java RMI for (wide-area) parallel programming
- Optimizing performance of local RMI [PPoPP'99]
- Wide-area parallel programming using RMI [JavaGrande'99]
7
RMI (1)
Flexible object-oriented RPC-like primitive
- Easy interoperability between Java Virtual Machines
- Polymorphism, which relies on dynamic bytecode loading
Example:
    void species(Animal x) throws … {
        System.out.println("Species " + x.name());
    }
    o.species(new Orca());    // "Species orca"
    o.species(new Panda());   // "Species panda"
    o.species(new Manta());   // "Species manta"
Class hierarchy (figure): Animal, with subclasses Orca, Panda, Manta
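A minimal sketch of the species example as compilable Java. It assumes a hypothetical remote interface Zoo and implementation ZooImpl (neither name appears on the slide) and returns the string instead of printing it, so the result is easy to check. The object is called directly here rather than exported and looked up through an RMI registry, so no bytecode is actually shipped between JVMs.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;

// Parameter types must be serializable so RMI can pass them by value.
interface Animal extends Serializable {
    String name();
}

class Orca implements Animal { public String name() { return "orca"; } }
class Panda implements Animal { public String name() { return "panda"; } }
class Manta implements Animal { public String name() { return "manta"; } }

// Hypothetical remote interface: every remote method declares RemoteException.
interface Zoo extends Remote {
    String species(Animal x) throws RemoteException;
}

// Hypothetical implementation; the callee only knows the static type Animal.
class ZooImpl implements Zoo {
    public String species(Animal x) {
        return "Species " + x.name();
    }
}

class RmiPolymorphismDemo {
    public static void main(String[] args) throws RemoteException {
        Zoo o = new ZooImpl(); // assumption: local object, not exported via RMI
        System.out.println(o.species(new Orca()));
        System.out.println(o.species(new Panda()));
        System.out.println(o.species(new Manta()));
    }
}
```

In a real deployment the receiver may never have seen Orca, Panda, or Manta before the call arrives, which is why the slide ties polymorphism to dynamic bytecode loading.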
8
RMI (2)
Designed for client-server applications
Automatic serialization (marshalling)
Normally used in a high-latency environment
- E.g. Internet
Is RMI fast enough for parallel programming?
9
JDK RMI performance (200 MHz Pentium Pro, JDK 1.1.4)
10
Why is JDK RMI slow?
Serialization uses run-time type inspection
Protocol overhead (class information)
Thread creation for incoming calls
TCP/IP
Most code is written in Java
11
The Manta system
Designed for high-performance computing
Native (static) compilation
- Source code is compiled directly to an executable
New fast RMI protocol between Manta nodes
Support (polymorphic) RMIs with JVMs
Implemented on wide-area DAS system
12
JDK versus Manta
Setup: 200 MHz Pentium Pro, Myrinet, JDK 1.1.4 interpreter, 1 object as parameter
13
Manta serialization
Java source:
    class Test implements Serializable {
        int i;
        double d;
        Object o;
    }
Compiler-generated serialization routine (Manta side of the Manta/JDK comparison figure):
    void PackageClass__Test(…) {
        WRITE_INT(type_id);
        WRITE_INT(i);
        WRITE_DOUBLE(d);
        WRITE_OBJECT(o);
    }
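The difference between run-time type inspection and a per-class generated routine can be sketched in plain Java. This is an illustration only, not Manta's generated code (which the slides describe as C): TestRecord, SerializationSketch, the type id value, and the wire layout are all assumptions, and the Object field of the slide's Test class is left out to keep the sketch short.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for the slide's Test class (Object field omitted).
class TestRecord {
    int i = 7;
    double d = 2.5;
}

class SerializationSketch {
    static final int TEST_RECORD_TYPE_ID = 1; // hypothetical compact type id

    // Run-time type inspection: fields are discovered on every call and the
    // full class name is written as type information.
    static byte[] writeReflective(Object obj) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(obj.getClass().getName());
            for (Field f : obj.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())) continue;
                f.setAccessible(true);
                if (f.getType() == int.class) out.writeInt(f.getInt(obj));
                else if (f.getType() == double.class) out.writeDouble(f.getDouble(obj));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return buf.toByteArray();
    }

    // What a compiler could emit for this one class: a fixed sequence of
    // writes with no inspection, preceded by a small type id.
    static byte[] writeGenerated(TestRecord t) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(TEST_RECORD_TYPE_ID);
            out.writeInt(t.i);
            out.writeDouble(t.d);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }
}
```

The generated variant writes a 4-byte id plus the two fields (16 bytes in total), while the reflective variant also carries the class name and pays for field lookup on every call.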
14
RMI protocol
Light-weight RMI protocol
- Send minimal type information
Avoid thread creation
- Simple nonblocking methods executed directly
Avoid interrupts
- Poll network when processor is idle
Everything is written in C
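The "avoid thread creation" point can be illustrated with a small dispatcher. This is a sketch under assumptions, not Manta's runtime: UpcallDispatcher and the mayBlock flag are hypothetical, and the slides do not say how Manta decides that a method is simple and nonblocking.

```java
import java.util.concurrent.atomic.AtomicInteger;

class UpcallDispatcher {
    // Counts how many incoming calls needed their own thread.
    final AtomicInteger threadsCreated = new AtomicInteger();

    // mayBlock is a hypothetical flag standing in for whatever analysis
    // decides that a method could block. Returns the thread used, or null
    // when the method ran directly on the caller (the receiving) thread.
    Thread dispatch(Runnable method, boolean mayBlock) {
        if (!mayBlock) {
            method.run(); // executed directly: no thread creation, no context switch
            return null;
        }
        threadsCreated.incrementAndGet();
        Thread t = new Thread(method);
        t.start();
        return t;
    }
}
```

Only calls flagged as possibly blocking pay for a new thread; everything else runs inline on the thread that received the message.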
15
Communication software
Panda user-space RPC protocol
LFC Myrinet control program
- Similar to active messages
- Implemented partly on Myrinet network interfaces
- Myrinet network interfaces mapped in user space
Protocol stack (figure labels): Manta RMI, Panda RPC, LFC, UDP, TCP, Ethernet, Myrinet, ATM
16
Interoperability with JVMs
Manta RMI protocol incompatible with JDK
- Use fast RMI between Manta nodes
- Use JDK-compliant protocol with JVMs
Polymorphic RMI requires exchanging bytecodes
- Also generate bytecodes when compiling a program
- Dynamically compile and link bytecodes into running program
17
Null-RMI latency
18
RMI throughput
19
Outline
Wide-area parallel computing
Java Remote Method Invocation (RMI)
Performance of JDK RMI
The Manta high-performance Java system
Wide-area parallel Java applications using RMI
Application performance
20
Manta on wide-area DAS
Two orders of magnitude between intra-cluster (LAN) and inter-cluster (WAN) communication performance
Manta exposes hierarchical structure to application
- Applications are optimized to reduce WAN overhead
21
Wide-area programming
Problem: how to tolerate difference between LAN and WAN performance
Wide-area system is structured hierarchically
- Most links are fast
Approach: application-level optimizations that exploit the hierarchical structure
- Reduce wide-area communication
22
Application experience
Parallel applications
- Successive overrelaxation (SOR)
- All-pairs shortest paths problem (ASP)
- Traveling salesperson problem (TSP)
- Iterative Deepening A* (IDA*)
Measurements on wide-area DAS
- 1-4 clusters with 16 nodes each
- Comparison with single 64-node cluster
23
Successive Overrelaxation
Red/black SOR
- Neighbor communication, using RMI
Problem: nodes at cluster boundaries
- Overlap wide-area communication with computation
- RMI is synchronous, so use multithreading
(Figure: Cluster 1 with CPU 1-3 and Cluster 2 with CPU 4-6; latency labels 40 µs and 5600 µs)
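Overlapping a synchronous wide-area call with local computation could look roughly like the sketch below. It is a toy under stated assumptions: exchangeBoundaryRow is a hypothetical stand-in for the blocking RMI to the neighbor in the other cluster (it sleeps for about 6 ms, close to the 5600 µs label in the figure, and simply echoes the row back), and summing arrays stands in for the actual relaxation step.

```java
import java.util.concurrent.CompletableFuture;

class SorOverlapSketch {
    // Hypothetical stand-in for a synchronous RMI across the WAN link.
    static double[] exchangeBoundaryRow(double[] myRow) {
        try {
            Thread.sleep(6); // roughly the wide-area latency from the figure
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return myRow.clone(); // toy reply: the neighbor echoes the row
    }

    static double sum(double[] values) {
        double total = 0.0;
        for (double v : values) total += v;
        return total;
    }

    // One iteration: start the blocking call on another thread, compute the
    // interior while it is in flight, then wait for the neighbor's row.
    static double step(double[] boundaryRow, double[] interior) {
        CompletableFuture<double[]> pending =
            CompletableFuture.supplyAsync(() -> exchangeBoundaryRow(boundaryRow));
        double interiorResult = sum(interior);   // overlaps with the WAN call
        double[] neighborRow = pending.join();   // blocks only if still in flight
        return interiorResult + sum(neighborRow);
    }
}
```

Because the call itself stays synchronous, the overlap comes entirely from running it on a second thread, which is the workaround the slide describes.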
24
All-pairs shortest paths
Broadcast at beginning of each iteration
Problem: broadcasting over wide-area links
- Lack of broadcast in Java, so use a spanning tree
- Use coordinator node per cluster
- Do asynchronous send to all remote coordinators
- Implemented using threads
(Figure: clusters 1, 2, 3)
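The coordinator scheme can be sketched as a small in-process simulation. Everything here is an assumption made for illustration: the class name ClusterBroadcast, the choice of node 0 as each cluster's coordinator, the message counters, and the flat forwarding inside a cluster (the slide mentions a spanning tree, which this sketch does not build).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

class ClusterBroadcast {
    final int clusters;
    final int nodesPerCluster;
    final AtomicInteger wanMessages = new AtomicInteger();
    final AtomicInteger lanMessages = new AtomicInteger();
    final AtomicIntegerArray received; // one slot per node

    ClusterBroadcast(int clusters, int nodesPerCluster) {
        this.clusters = clusters;
        this.nodesPerCluster = nodesPerCluster;
        this.received = new AtomicIntegerArray(clusters * nodesPerCluster);
    }

    // Node 'from' stores the value and forwards it to every other local node.
    private void deliverLocally(int cluster, int from, int value) {
        received.set(cluster * nodesPerCluster + from, value);
        for (int n = 0; n < nodesPerCluster; n++) {
            if (n == from) continue;
            lanMessages.incrementAndGet();
            received.set(cluster * nodesPerCluster + n, value);
        }
    }

    void broadcast(int senderCluster, int senderNode, int value) {
        List<Thread> sends = new ArrayList<Thread>();
        for (int c = 0; c < clusters; c++) {
            if (c == senderCluster) continue;
            final int remote = c;
            // One thread per remote cluster makes the WAN sends asynchronous.
            Thread t = new Thread(() -> {
                wanMessages.incrementAndGet();    // single message to coordinator
                deliverLocally(remote, 0, value); // coordinator (node 0) forwards
            });
            t.start();
            sends.add(t);
        }
        deliverLocally(senderCluster, senderNode, value);
        for (Thread t : sends) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

With 4 clusters of 16 nodes, one broadcast crosses the wide-area links 3 times instead of 48, and the remaining 60 messages stay on the fast local network.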
25
Traveling salesperson problem
Replicated-worker style parallel search algorithm
Problem: work distribution
- Central job-queue has high overhead
- Statically distribute jobs over clusters
- Use centralized job-queue per cluster
- Easy to express using RMI
(Figure: clusters 1, 2, 3)
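A rough sketch of the per-cluster job queue, assuming jobs are plain integers and are dealt out round-robin (the slides say only that the distribution is static; the class name ClusterJobQueues and the round-robin rule are assumptions). In the real system each queue would sit behind a remote object on one node of its cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class ClusterJobQueues {
    private final List<Queue<Integer>> queues = new ArrayList<Queue<Integer>>();

    // Static distribution: jobs are assigned to clusters once, up front.
    ClusterJobQueues(int clusters, int jobs) {
        for (int c = 0; c < clusters; c++) {
            queues.add(new ConcurrentLinkedQueue<Integer>());
        }
        for (int j = 0; j < jobs; j++) {
            queues.get(j % clusters).add(j);
        }
    }

    // Workers only ever ask their own cluster's queue, so fetching a job
    // never crosses a wide-area link. Returns null when the queue is empty.
    Integer nextJob(int cluster) {
        return queues.get(cluster).poll();
    }

    int pending(int cluster) {
        return queues.get(cluster).size();
    }
}
```

The trade-off is load balance: once a cluster's queue runs dry its workers sit idle, even if other clusters still have jobs.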
26
Iterative Deepening A*
Parallel search algorithm using work stealing
Problem: inter-cluster work stealing
Optimization: first look for work in local cluster
- Easy to express using RMI
(Figure: clusters 1, 2)
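The local-first stealing order might be sketched as follows. This is a single-threaded illustration with hypothetical names (LocalFirstStealing, wanSteals); nodes are numbered so that node / nodesPerCluster gives the cluster, and a real implementation would need synchronization and would issue the steal requests as RMIs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class LocalFirstStealing {
    final int nodesPerCluster;
    final List<Deque<Integer>> work = new ArrayList<Deque<Integer>>();
    int wanSteals = 0; // successful steals that crossed a cluster boundary

    LocalFirstStealing(int clusters, int nodesPerCluster) {
        this.nodesPerCluster = nodesPerCluster;
        for (int n = 0; n < clusters * nodesPerCluster; n++) {
            work.add(new ArrayDeque<Integer>());
        }
    }

    void push(int node, int job) {
        work.get(node).addLast(job);
    }

    // Pass 0 probes victims in the thief's own cluster; pass 1 probes the
    // remaining (remote) nodes only if no local work was found.
    Integer steal(int thief) {
        int myCluster = thief / nodesPerCluster;
        for (int pass = 0; pass < 2; pass++) {
            for (int victim = 0; victim < work.size(); victim++) {
                if (victim == thief) continue;
                boolean local = (victim / nodesPerCluster) == myCluster;
                if (local != (pass == 0)) continue;
                Integer job = work.get(victim).pollFirst();
                if (job != null) {
                    if (!local) wanSteals++;
                    return job;
                }
            }
        }
        return null; // no work anywhere
    }
}
```

A thief pays the wide-area latency only after every node in its own cluster has come up empty.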
27
Performance
Wide-area DAS system: 4 clusters of 16 CPUs
Comparison with single 16-node and 64-node cluster
28
Conclusions
Fast RMI possible through
- Compiler-generated serialization, light-weight communication and RMI protocols
Optimized wide-area applications are efficient
- Reduce wide-area communication, or hide its latency
Java RMI is easy to use, but some optimizations are awkward to express
- No asynchronous communication or collective communication
Programming systems should take hierarchical structure of wide-area systems into account
http://www.cs.vu.nl/manta
29
Performance breakdown: Manta (Fast Ethernet)