
gluepy: A Framework for Flexible Programming in Complex Grid Environments
Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura (University of Tokyo)
{kenny, h_saito, kay, tau}@logos.ic.i.u-tokyo.ac.jp
Package available from Home Page: www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Overview
- Grid-enabled distributed object-oriented programming model
  - Distributed object model with implicit mutual exclusion
  - Programming model that allows join/failure of nodes
  - Incorporates NAT/firewalled clusters by using an overlay
- gluepy: “glue Python”
  - Distributed object-oriented library extension for Python
  - Implements our proposed programming model for flexible Grid computing
- Real Grid applications on real Grid environments
  - Over 900 real nodes across 9 clusters
  - Heterogeneous network settings (including NAT, firewalls)

Related Works
- Grid-enabled Programming Models: Satin [Wrzesinska et al. 2006], Jojo [Nakada et al. 2004], Jojo2 [Aoki et al. 2006]
- Distributed Objects on the Grid: ProActive [Huet et al. 2004], Ibis RMI [van Nieuwpoort et al. 2005]
- Wide-area Connection Management: SmartSockets [Maassen et al. 2007], MC-MPI [Saito et al. 2007]

Programming Model
- Asynchronous RMIs (Remote Method Invocations) with Futures
  - Any invocation may be made asynchronous
  - An asynchronous invocation returns a future, a placeholder in which results will be returned
- Serialization Semantics (Synchronization)
  - At most 1 running thread per object
  - RMIs are handled by a separate thread
  - At any given time, at most 1 thread can execute an object’s method: the owner thread (eliminates race conditions)
  - If a thread blocks while in the method’s scope, other threads are permitted to execute methods on the object (eliminates deadlocks for common usage)
  [Figure labels: Th, object, owner thread, waiting threads, atomic section; on block: give up ownership, new owner thread; on signal: unblock, re-contest for ownership]
- Signals to Object
  - Signals may be sent to objects
  - Any thread blocking in the object’s context will unblock and return None
- Runtime Node Joins
  - Need to obtain references to existing objects
  - A fully decentralized remote object lookup scheme
  - Query for remote reference via random walking among peers
- Node failure (RMI failure) detection
  - RMI failures are returned as Exceptions
  - Failure of object host process
  - Failure of communication or intermediate processes

Automatic Overlay Construction on Grid
- Construction Scheme: steps for each peer
  - Obtain endpoint information of other peers
  - Attempt TCP connections to a selected few peers
- NAT-Cluster Peers: connectable to global IP peers
- Firewall-Cluster Peers: automatic SSH port forwarding
- Adaptive routing on overlay [Perkins et al. 1997]

Failure Detection on Overlay
- Communication path is maintained for each RMI
- Intermediate peers remember the next peer: Path Pointer
- Path pointer is garbage collected on return
- On failure of a connection, an error is returned along the path
[Figure labels: Path pointer, RMI handler, failure, return error]

Evaluation Results

Experimental Environment
- Clusters (number of nodes): hongo (98), chiba (186), okubo (28), suzuk (72), imade (60), kototoi (88), kyoto (70), istbs (316), tsubame (64)
[Figure labels: Global IPs, Firewall, Private IPs, All packets dropped, NAT, Firewall, Global IP, Attempt connection, SSH, Firewall traversal]

Overlay Connectivity Simulation
- Probability of connected random graph
- 3 Cluster Combinations
  - hongo, chiba, okubo, suzuk, imade, kyoto, kototoi (4 Global clusters (384 peers), 3 Private clusters (218 peers))
  - okubo, suzuk, imade, kyoto, kototoi (2 Global clusters (100 peers), 3 Private clusters (218 peers))
  - okubo, imade, kyoto, kototoi (1 Global cluster (28 peers), 3 Private clusters (218 peers))

Master-Worker application with node joins/failures
- A simple Master-Worker program that distributes tasks to workers
- New tasks are given to new workers via async. RMIs
- Tasks given to failed workers are redistributed
  - By catching and handling RMI failure exceptions

Grid Application: Parallel Permutation Flowshop Solver
- A combinatorial optimization problem
- Given a sequence of n jobs that use m machines, find a permutation of jobs that gives the shortest makespan
- Finds the optimal solution by parallel branch and bound
- Master divides the search space into sub-tasks for workers
- Workers periodically exchange latest bounds with master
[Figure labels: Master, Worker, doJob(), exchange_bound()]

Example Master-Worker Excerpt

    class Master:
        def __init__(self):
            self.nodes = []
            self.jobs = []

        def nodeJoin(self, node):
            self.nodes.append(node)
            self.signal()  # signal thread blocking in master object

        def run(self):
            assigned = {}
            while True:
                # async. RMI, doJob() to idle workers
                while len(self.nodes) > 0 and len(self.jobs) > 0:
                    node = self.nodes.pop()
                    job = self.jobs.pop()
                    f = node.doJob.future(job)
                    assigned[f] = (node, job)
                # block and wait for some results
                readys = wait(assigned.keys())
                # None returns when unblocked by signal
                if readys == None:
                    continue
                for f in readys:
                    node, job = assigned.pop(f)
                    try:
                        # retrieve results
                        print "done:", f.get()
                        self.nodes.append(node)
                    except RemoteException, e:
                        # exception raised on failure
                        self.jobs.append(job)

Future Work
- Application to a much wider range of Grid applications
- Development of library package
- A prototype package is available at Home Page!!
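The master-worker pattern in the excerpt (hand jobs out asynchronously, wait for some futures, requeue jobs whose invocation failed) can be tried locally without gluepy. The following is a minimal sketch in Python 3 that swaps in the standard library's `concurrent.futures` for remote invocations. `WorkerFailure`, `make_flaky_job`, and `run_master` are hypothetical names used for illustration only and are not part of gluepy.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


class WorkerFailure(Exception):
    """Hypothetical stand-in for an RMI failure exception."""


def make_flaky_job(fail_first_attempt):
    """Return a doJob-like callable; jobs in fail_first_attempt fail once."""
    attempts = {}

    def do_job(job):
        attempts[job] = attempts.get(job, 0) + 1
        if job in fail_first_attempt and attempts[job] == 1:
            raise WorkerFailure("worker lost while running job %r" % (job,))
        return job * job

    return do_job


def run_master(jobs, do_job, max_workers=3):
    """Distribute jobs, collect results, and requeue jobs that failed."""
    pending = list(jobs)
    assigned = {}   # future -> job, like `assigned` in the excerpt
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or assigned:
            # Hand every pending job to the pool asynchronously.
            while pending:
                job = pending.pop()
                assigned[pool.submit(do_job, job)] = job
            # Block until at least one future is ready.
            done, _ = wait(list(assigned), return_when=FIRST_COMPLETED)
            for f in done:
                job = assigned.pop(f)
                try:
                    results[job] = f.result()
                except WorkerFailure:
                    # The failure surfaces as an exception: redistribute.
                    pending.append(job)
    return results
```

Calling `run_master(range(6), make_flaky_job({2, 5}))` still completes jobs 2 and 5, because each is requeued after its first failed attempt. Unlike the excerpt, this sketch has no node join handling and terminates once all jobs are done.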

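The serialization and signal semantics described on the poster resemble a monitor: one thread owns the object at a time, a blocking thread gives up ownership, and a signal wakes blocked threads so that they return None. The sketch below illustrates that idea in Python 3 with `threading.Condition`. It is an analogy under that assumption, not gluepy's implementation, and the `Mailbox` class with its `put`, `signal`, and `take` methods is hypothetical.

```python
import threading


class Mailbox:
    """Monitor-style object: at most one thread runs a method at a time."""

    def __init__(self):
        self._cond = threading.Condition()  # holding its lock = ownership
        self._items = []
        self._signals = 0

    def put(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify_all()

    def signal(self):
        """Wake every thread currently blocked inside take()."""
        with self._cond:
            self._signals += 1
            self._cond.notify_all()

    def take(self):
        """Block for an item; return None if a signal arrives first."""
        with self._cond:
            seen = self._signals
            while not self._items and self._signals == seen:
                # wait() releases the lock, so other threads may enter
                # methods of this object while this thread is blocked.
                self._cond.wait()
            # The lock has been re-acquired here: ownership was re-contested.
            if self._items:
                return self._items.pop(0)
            return None
```

A thread blocked in `take()` does not prevent another thread from calling `put()` or `signal()`, which mirrors the claim that giving up ownership while blocked avoids deadlock in common usage.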

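The overlay connectivity simulation asks how likely a random overlay is to be connected when each peer attempts only a few connections. The poster does not give the simulation's details, so the following Python 3 sketch makes simplifying assumptions: every peer attempts `k` connections, all attempts target peers with global IPs (since only those accept connections from anywhere), and connectivity inside a private cluster is ignored. The function names and parameters are hypothetical.

```python
import random


def is_connected(n, edges):
    """Union-find check that n vertices form a single component."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    return components == 1


def connectivity_probability(n_global, n_private, k, trials=200, seed=0):
    """Monte Carlo estimate of the probability that the overlay is connected."""
    rng = random.Random(seed)
    n = n_global + n_private
    global_ids = list(range(n_global))  # peers 0..n_global-1 have global IPs
    connected = 0
    for _ in range(trials):
        edges = []
        for peer in range(n):
            # Assumption: only global-IP peers can be connection targets.
            targets = [g for g in global_ids if g != peer]
            for t in rng.sample(targets, min(k, len(targets))):
                edges.append((peer, t))
        if is_connected(n, edges):
            connected += 1
    return connected / trials
```

For the first cluster combination on the poster one could call `connectivity_probability(384, 218, k=2, trials=20)` and vary `k` to see how many connection attempts per peer are needed. The estimate depends on the assumptions above and is not a reproduction of the poster's results.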