Atlas: An Infrastructure for Global Computing

People: Eric Baldeschwieler (UC Berkeley), Bobby Blumofe (UT Austin), Eric Brewer (UC Berkeley)

Outline: Introduction; Programming model; Architecture; Examples; Discussion; Limitations & Conclusion

Introduction. Properties of an Internet computing infrastructure: Scalability: to 10^6 nodes. Heterogeneity: of machines & OSs. Fault tolerance: completion probability comparable to that of a sequential program. Adaptive parallelism: dynamic set of resources.

Properties ... Safety: Hosts must be secure. Anonymity: Secure privacy of client: data & program. Hierarchy: Locality of communication (local bandwidth typically is higher). Ease of use: Minimize “costs” of participating. Reasonable performance: Low overhead => benefit from a small set of machines.

Introduction ... Atlas combines mechanisms from Java and Cilk with new mechanisms. Java “ensures”: heterogeneity, safety.

Introduction ... Atlas: extends Cilk’s work-stealing scheduler to a hierarchical Internet setting; uses Cilk-NOW’s mechanisms for adaptive parallelism and fault tolerance.

Programming Model. Applications are written in Java. When a native library is used, heterogeneity is limited to platforms that support it. The programming model is: a Java-based implementation of Cilk (non-blocking, explicit continuation-passing threads); a Unix-like URL-based file system & local caching with coherence.
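
As a rough illustration of what "non-blocking, explicit continuation-passing threads" can look like in Java, the sketch below computes Fib in that style. It is not Atlas's actual API: the names `CpsFib`, `Cont`, `Join`, and `spawnFib`, and the single-machine ready queue, are assumptions made only for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of Cilk-style continuation passing; not the Atlas API. */
final class CpsFib {
    /** A continuation: where a thread sends its result instead of returning it. */
    interface Cont { void send(long value); }

    /** Ready queue of threads; each thread runs to completion without blocking. */
    private final Deque<Runnable> ready = new ArrayDeque<>();

    /** Successor thread: forwards the sum once both argument slots are filled. */
    private static final class Join implements Cont {
        private final Cont parent;
        private long sum;
        private int missing = 2;
        Join(Cont parent) { this.parent = parent; }
        @Override public void send(long value) {
            sum += value;
            if (--missing == 0) parent.send(sum);
        }
    }

    /** Spawn a thread for fib(n); it never waits for its children. */
    private void spawnFib(int n, Cont k) {
        ready.push(() -> {
            if (n < 2) { k.send(n); return; }
            Join join = new Join(k);   // successor that receives both results
            spawnFib(n - 1, join);
            spawnFib(n - 2, join);
        });
    }

    /** Run the ready queue until it is empty and return fib(n). */
    static long fib(int n) {
        CpsFib scheduler = new CpsFib();
        long[] result = new long[1];
        scheduler.spawnFib(n, v -> result[0] = v);
        while (!scheduler.ready.isEmpty()) scheduler.ready.pop().run();
        return result[0];
    }
}
```

Calling `CpsFib.fib(24)` returns 46368. Because no thread ever blocks on a child, any entry in the ready queue is a self-contained unit of work, which is what makes such threads candidates for stealing.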

Architecture: Basic architecture. [Figure: components shown are Client, Manager, and several Compute Servers; software layers shown are Application (Java), Runtime library, Java interpreter, Native libraries (C or C++).]

Architecture ... The client is a Java application that connects to compute servers on machines other than its manager’s. Idle servers steal work from busy ones.

Architecture Compute server: relinquishes control when there is non-Atlas work (a screensaver?) Runs as a daemon: working pings manager & siblings for work to steal

Architecture: Porting Atlas. Requires a Java runtime system. To port: the natively written URL-based file system and some support routines.

Hierarchical Work Stealing. [Figure: a hierarchy of Managers with Compute Servers beneath them.]

Hierarchical Work Stealing ... A manager keeps track of when its subtree is idle. If a manager’s subtree is idle, the manager steals work from its siblings. If a subtree has “too much” work, it “allows” work stealing from above. What is the definition & implementation of “too much”?

Hierarchical Work Stealing ... The authors claim that proven properties of Cilk hold in this hierarchical setting. Goals: Localize communication. Sub-trees map to domain hierarchy. Administrators can control thread migration: Outflow: privacy; Inflow: host security.
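
The stealing rule described on these slides can be modelled in a few lines of Java. This is only a sketch under stated assumptions: the class and method names are invented, the tree is held in one process rather than spread over the Internet, and "too much" is taken to mean "at least `TOO_MUCH` queued tasks", a threshold the slides leave undefined.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical single-process model of a manager tree; not Atlas code. */
final class HierStealSketch {
    /** Assumed meaning of "too much" work: a subtree holding at least this many tasks. */
    static final int TOO_MUCH = 2;

    /** A manager (inner node) or a compute server (leaf holding tasks). */
    static final class Node {
        final String name;
        final Node parent;
        final List<Node> children = new ArrayList<>();
        final Deque<String> tasks = new ArrayDeque<>();

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
            if (parent != null) parent.children.add(this);
        }

        /** Number of queued tasks anywhere in this subtree. */
        int load() {
            int total = tasks.size();
            for (Node child : children) total += child.load();
            return total;
        }

        boolean subtreeIdle() { return load() == 0; }

        /** Remove the oldest task found in this subtree, or return null if there is none. */
        String giveUpTask() {
            if (!tasks.isEmpty()) return tasks.pollFirst();
            for (Node child : children) {
                String task = child.giveUpTask();
                if (task != null) return task;
            }
            return null;
        }
    }

    /**
     * An idle node first asks its siblings; if none can spare work and the
     * enclosing subtree is idle as well, the search repeats one level up.
     * Returns the stolen task (now queued on the thief), or null.
     */
    static String steal(Node thief) {
        Node n = thief;
        while (n.parent != null && n.subtreeIdle()) {
            for (Node sibling : n.parent.children) {
                if (sibling != n && sibling.load() >= TOO_MUCH) {
                    String task = sibling.giveUpTask();
                    thief.tasks.addLast(task);
                    return task;
                }
            }
            n = n.parent; // nothing to spare at this level: escalate
        }
        return null;
    }
}
```

In this model a steal crosses a manager boundary only when the thief's whole subtree is idle, which is one way to keep communication local, as the goals above ask.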

Examples. Fib: fine-grained threads. POV-Ray: coarse-grained threads.
            Base     1 Node   3 Nodes     8 Nodes
Fib (24)    1.3      80       40 (2.0)    31 (2.6)
POV-Ray     20700    21000    -           2700 (7.8)
Numbers in ( ) are speedups over the 1-node case.
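
The parenthesized speedups can be checked by dividing the 1-node figure by the P-node figure. The symbols T_1, T_P, and S_P are introduced here for the check, and the slide does not state the time unit:

```latex
S_P = \frac{T_1}{T_P}, \qquad
S_3^{\text{Fib}} = \frac{80}{40} = 2.0, \quad
S_8^{\text{Fib}} = \frac{80}{31} \approx 2.6, \quad
S_8^{\text{POV-Ray}} = \frac{21000}{2700} \approx 7.8
```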

Examples ... POV-Ray is not written in Java. Partitioning is done in Java. 8 nodes: only 2% overhead. What about larger P?

Discussion. Scalable: Yes. Heterogeneity: Incomplete until Atlas divorces itself from all native libraries. Safety: Java: OK. Native libraries: ?

Discussion ... Fault tolerance: A timed-out thread is recomputed from a checkpoint maintained by its subtree (manager?). What is the effect of checkpointing on performance? The subtree rooted at a thread is its subcomputation.

Fault Tolerance ... Subcomputations are transactions. Authors claim: side effects can be undone. How does this relate to hierarchical work stealing?
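
To make "recomputed from a checkpoint" concrete, here is a toy Java model. It is not how Atlas implements checkpointing: the names, the chunked summation workload, and the step-count "timeout" are all assumptions for illustration. A worker commits a checkpoint after each chunk; when it is lost mid-chunk, its uncommitted work is discarded and a replacement resumes from the last checkpoint.

```java
/** Hypothetical toy model of checkpoint-based recovery; not Atlas code. */
final class CheckpointSketch {
    /** Immutable snapshot of committed progress: next index to add and sum so far. */
    static final class Checkpoint {
        final int next;
        final long sum;
        Checkpoint(int next, long sum) { this.next = next; this.sum = sum; }
    }

    /**
     * Sum the integers in [start.next, end) in chunks, committing after each chunk.
     * The worker is "lost" after stepsBeforeTimeout additions; whatever it did
     * since its last commit is thrown away, as in an aborted transaction.
     */
    static Checkpoint runWorker(Checkpoint start, int end, int chunk, int stepsBeforeTimeout) {
        Checkpoint committed = start;
        int next = start.next;
        long sum = start.sum;
        int steps = 0;
        while (next < end) {
            int stop = Math.min(next + chunk, end);
            for (int i = next; i < stop; i++) {
                if (steps++ == stepsBeforeTimeout) return committed; // worker lost
                sum += i;
            }
            next = stop;
            committed = new Checkpoint(next, sum); // chunk finished: commit
        }
        return committed;
    }

    /** Keep handing the latest checkpoint to a fresh worker until the sum is complete. */
    static long sumWithRecovery(int end, int chunk, int stepsBeforeTimeout) {
        if (chunk < 1 || stepsBeforeTimeout < chunk) {
            throw new IllegalArgumentException("a worker must be able to finish one chunk");
        }
        Checkpoint cp = new Checkpoint(0, 0L);
        while (cp.next < end) cp = runWorker(cp, end, chunk, stepsBeforeTimeout);
        return cp.sum;
    }
}
```

The chunk size is the knob behind the slide's performance question: smaller chunks lose less work per failure but commit more often.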

Discussion ... Anonymity: A host executing a stolen subtree cannot determine the client. Managers are assumed to be trustworthy. Hierarchy: Yes, via manager hierarchy. Ease of use: Interface incomplete; clients submit jobs via a special “shell”.

Discussion ... Adaptive parallelism: The “owner” (?) of a compute server sets a policy that defines when the server is idle. How? When a compute server becomes unavailable for Atlas work, all its subcomputations are moved to another compute server.

Adaptive Parallelism ... Moving a subcomputation requires updating the information linking the subcomputation to its parent and children. How long does it take to retreat? Is the subcomputation restarted? From a checkpoint?
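
The link updates mentioned here might look roughly like the following Java sketch. Everything in it is hypothetical: the `Sub` class, the host-name strings, and the direct field updates stand in for whatever messages a real distributed implementation would exchange.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of relinking a moved subcomputation; not Atlas code. */
final class MigrationSketch {
    static final class Sub {
        final String id;
        final Sub parent;
        final List<Sub> children = new ArrayList<>();
        String host;                                             // where this subcomputation runs
        String parentHost;                                       // link: where results are sent
        final Map<String, String> childHosts = new HashMap<>();  // link: child id -> host

        Sub(String id, Sub parent, String host) {
            this.id = id;
            this.parent = parent;
            this.host = host;
            if (parent != null) {
                parentHost = parent.host;
                parent.children.add(this);
                parent.childHosts.put(id, host);
            }
        }
    }

    /** Move s to newHost and repair the links held by its parent and its children. */
    static void migrate(Sub s, String newHost) {
        s.host = newHost;
        if (s.parent != null) s.parent.childHosts.put(s.id, newHost);
        for (Sub child : s.children) child.parentHost = newHost;
    }
}
```

The cost of a move in this model is one update to the parent plus one per child, which gives a first handle on the "how long does it take to retreat?" question.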

Limitations. Atlas inherits the tree-structured program limitation from Cilk. But this is still a rich set! Generalizing to non-tree-structured programs seems hard. No shared variables among threads. Global file system is read-only.

Conclusion. Jicos design goals = those for Atlas. Use JXTA to give Jicos a “file system”. Then, Jicos becomes Atlas’s heir.