Grid InterNetworking CAIA guest talk

Slides:



Advertisements
Similar presentations
Network Resource Broker for IPTV in Cloud Computing Lei Liang, Dan He University of Surrey, UK OGF 27, G2C Workshop 15 Oct 2009 Banff,
Advertisements

EC-GIN Europe-China Grid InterNetworking. Enriched with customised network mechanisms Original Internet technology Overview of Project Traditional Internet.
A Workflow Engine with Multi-Level Parallelism Supports Qifeng Huang and Yan Huang School of Computer Science Cardiff University
The Anatomy of the Grid: An Integrated View of Grid Architecture Carl Kesselman USC/Information Sciences Institute Ian Foster, Steve Tuecke Argonne National.
Crucial Patterns in Service- Oriented Architecture Jaroslav Král, Michal Žemlička Charles University, Prague.
Towards a Virtual European Supercomputing Infrastructure Vision & issues Sanzio Bassini
What is Grid Computing? Cevat Şener Dept. of Computer Engineering, METU.
High Performance Computing Course Notes Grid Computing.
Uni Innsbruck Informatik - 1 The Grid: From Parallel to Virtualized Parallel Computing Michael Welzl DPS NSG Team.
Workload Management Workpackage Massimo Sgaravatto INFN Padova.
CSCD 433/533 Advanced Computer Networks Lecture 1 Course Overview Fall 2011.
Introduction  What is an Operating System  What Operating Systems Do  How is it filling our life 1-1 Lecture 1.
Workload Management Massimo Sgaravatto INFN Padova.
What Is TCP/IP? The large collection of networking protocols and services called TCP/IP denotes far more than the combination of the two key protocols.
Cambodia-India Entrepreneurship Development Centre - : :.... :-:-
Grid Computing Net 535.
Introducing EC-GIN: Europe-China Grid InterNetworking Europe-China Grid InterNetworking European Sixth Framework STREP FP IST Michael Welzl,
EUROPEAN UNION Polish Infrastructure for Supporting Computational Science in the European Research Space Cracow Grid Workshop’10 Kraków, October 11-13,
A User Experience-based Cloud Service Redeployment Mechanism KANG Yu.
Cloud MapReduce : a MapReduce Implementation on top of a Cloud Operating System Speaker : 童耀民 MA1G Authors: Huan Liu, Dan Orban Accenture.
©Ian Sommerville 2006Software Engineering, 8th edition. Chapter 12 Slide 1 Distributed Systems Architectures.
GRACE Project IST EGAAP meeting – Den Haag, 25/11/2004 Giuseppe Sisto – Telecom Italia Lab.
MulTFRC: TFRC with weighted fairness draft-welzl-multfrc-00 Michael Welzl, Dragana Damjanovic 75th IETF Meeting Stockholm, Sweden 29 July 2009.
Tufts Wireless Laboratory School Of Engineering Tufts University “Network QoS Management in Cyber-Physical Systems” Nicole Ng 9/16/20151 by Feng Xia, Longhua.
SWE 316: Software Design and Architecture – Dr. Khalid Aljasser Objectives Lecture 11 : Frameworks SWE 316: Software Design and Architecture  To understand.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED.
Introduction and Overview Questions answered in this lecture: What is an operating system? How have operating systems evolved? Why study operating systems?
A Lightweight Platform for Integration of Resource Limited Devices into Pervasive Grids Stavros Isaiadis and Vladimir Getov University of Westminster
DISTRIBUTED COMPUTING
Miguel Branco CERN/University of Southampton Enabling provenance on large-scale e-Science applications.
Through the development of advanced middleware, Grid computing has evolved to a mature technology in which scientists and researchers can leverage to gain.
Uni Innsbruck Informatik - 1 Grid InterNetworking Michael Welzl DPS NSG Team
What are the main differences and commonalities between the IS and DA systems? How information is transferred between tasks: (i) IS it may be often achieved.
Uni Innsbruck Informatik - 1 Grid InterNetworking and QoS for the Grid Michael Welzl DPS NSG Team
Uni Innsbruck Informatik - 1 Network Support for Grid Computing (NSG) Michael Welzl DPS NSG Team
Virtual Data Grid Architecture Ewa Deelman, Ian Foster, Carl Kesselman, Miron Livny.
1 4/23/2007 Introduction to Grid computing Sunil Avutu Graduate Student Dept.of Computer Science.
Grid Execution Management for Legacy Code Applications Grid Enabling Legacy Code Applications Tamas Kiss Centre for Parallel.
Service - Oriented Middleware for Distributed Data Mining on the Grid ,劉妘鑏 Antonio C., Domenico T., and Paolo T. Journal of Parallel and Distributed.
Introduction to Grid Computing Ed Seidel Max Planck Institute for Gravitational Physics
Uni Innsbruck Informatik - 1 Open Issues and New Challenges for End-to-End Transport E2E RG Meeting - July 28/29, MIT, Cambridge MA Michael Welzl
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
MulTFRC: TFRC with weighted fairness draft-welzl-multfrc-01 Michael Welzl, Dragana Damjanovic 76th IETF Meeting Hiroshima, Japan 10 November2009.
What is SAM-Grid? Job Handling Data Handling Monitoring and Information.
GRID Overview Internet2 Member Meeting Spring 2003 Sandra Redman Information Technology and Systems Center and Information Technology Research Center National.
CEOS Working Group on Information Systems and Services - 1 Data Services Task Team Discussions on GRID and GRIDftp Stuart Doescher, USGS WGISS-15 May 2003.
Uni Innsbruck Informatik - 1 Network Support for Grid Computing... a new research direction! Michael Welzl DPS NSG Team
Introduction to Grids By: Fetahi Z. Wuhib [CSD2004-Team19]
Uni Innsbruck Informatik - 1 Network Support for Grid Computing Recent Research Work and Plans at the University of Innsbruck Michael Welzl
Architecture View Models A model is a complete, simplified description of a system from a particular perspective or viewpoint. There is no single view.
7. Grid Computing Systems and Resource Management
Internet of Things. IoT Novel paradigm – Rapidly gaining ground in the wireless scenario Basic idea – Pervasive presence around us a variety of things.
AMQP, Message Broker Babu Ram Dawadi. overview Why MOM architecture? Messaging broker like RabbitMQ in brief RabbitMQ AMQP – What is it ?
Globus and PlanetLab Resource Management Solutions Compared M. Ripeanu, M. Bowman, J. Chase, I. Foster, M. Milenkovic Presented by Dionysis Logothetis.
Reading TCP/IP Protocol. Training target: Read the following reading materials and use the reading skills mentioned in the passages above. You may also.
GRID ANATOMY Advanced Computing Concepts – Dr. Emmanuel Pilli.
1 Architecture and Behavioral Model for Future Cognitive Heterogeneous Networks Advisor: Wei-Yeh Chen Student: Long-Chong Hung G. Chen, Y. Zhang, M. Song,
Cyberinfrastructure Overview of Demos Townsville, AU 28 – 31 March 2006 CREON/GLEON.
Admela Jukan jukan at uiuc.edu March 15, 2005 GGF 13, Seoul Issues of Network Control Plane Interactions with Grid Applications.
Introducing EC-GIN: Europe-China Grid InterNetworking Sven Hessler Institute of Computer Science, NSG team University of Innsbruck, Austria
U Innsbruck Informatik - 1 Specification of a Network Adaptation Layer for the Grid GGF7 presentation Michael Welzl University.
A service Oriented Architecture & Web Service Technology.
Clouds , Grids and Clusters
Muhammad Murtaza Yousaf, Michael Welzl
University of Technology
Large Scale Distributed Computing
The Anatomy and The Physiology of the Grid
The Anatomy and The Physiology of the Grid
Presentation transcript:

Grid InterNetworking CAIA guest talk Swinburne University Melbourne AUS 12 February, 2008 Michael Welzl http://www.welzl.at DPS NSG Team http://dps.uibk.ac.at/nsg Institute of Computer Science University of Innsbruck, Austria

Outline Problem scope (Grid introduction) Grid InterNetworking Identifying a research gap Some issues and proposed solutions Conclusion Practical problems Future perspectives Final words

Shrinking the problem space Problem scope Shrinking the problem space

Introducing the Grid History: parallel processing at a growing scale Parallel CPU architectures Multiprocessor machines Clusters (“Massively Distributed“) computers on the Internet GRID logical consequence of HPC metaphor: power grid just plug in, don‘t care where (processing) power comes from, don‘t care how it reaches you Common definition: The real and specific problem that underlies the Grid concept is coordinated resource sharing and problem solving in dynamic, multi institutional virtual organizations [Ian Foster, Carl Kesselman and Steven Tuecke, “The Anatomy of the Grid – Enabling Scalable Virtual Organizations”, International Journal on Supercomputer Applications, 2001]

Scope Definition quite broad (“resource sharing“) Reasonable - e.g., computers also have harddisks But also led to some confusion - e.g., new research areas / buzzwords: Wireless Grid, Data Grid, Semantic / Knowledge Grid, Pervasive Grid, [this space reserved for your favorite research area] Grid Example of confusion due to broad Grid interpretation: “One of the first applications of Grid technologies will be in remote training and education. Imagine the productivity gains if we had routine access to virtual lecture rooms! (..) What if we were able to walk up to a local ‘power wall‘ and give a lecture fully electronically in a virtual environment with interactive Web materials to an audience gathered from around the country - and then simply walk back to the office instead of going back to a hotel or an airplane?“ [I. Foster, C. Kesselman (eds): “The Grid: Blueprint for a New Computing Infrastructure“, 2nd edition, Elsevier Inc. / MKP, 2004]  Clear, narrower scope is advisable for thinking/talking about the Grid Traditional goal: processing power Grid people = parallel people; thus, main goal has not changed much

The next Web? Ways of looking at the Internet Communication medium (email) Truly large kiosk (web) The Grid way of looking at the Internet Infrastructure for Virtual Teams Most of the time... the “real and specific goal“ is High Performance Computing Virtual Organizations and Virtual Teams are well defined i.e. not an „open“ system, e.g. security is a big issue Virtual Teams Geographically distributed Organizationally distributed Yet work on a common problem But Web 2.0 is already here :-) It has been called “the next web“

Virtual Organizations and Virtual Teams Distributed resources and people Linked by networks, crossing admin domains Sharing resources, common goals Dynamic R VO-B VO-A Source: Globus presentation by Ian Foster

The Grid and P2P systems Look quite similar Goal in both cases: resource sharing Major difference: clearly defined VOs / VTs No incentive considerations Availability not such a big problem as in P2P case It is an issue, but at larger time scales (e.g. computers in student labs should be available after 22:00, but are sometimes shut down by tutors) Scalability not such a big issue as in P2P case ...so far!  convergence as Grids grow coordinated resource sharing and problem solving in dynamic, multi institutional virtual organizations (Grid, P2P)

Austrian Grid E-science Grid applications Medical Sciences Distributed Heart Simulation Virtual Lung Biopsy Virtual Eye Surgery Medical Multimedia Data Management and Distribution Virtual Arterial Tree Tomography and Morphometry High-Energy Physics CERN experiment analyses Applied Numerical Simulation Distributed Scientific Computing: Advanced Computational Methods in Life Science Computational Engineering High Dimensional Improper Integration Procedures Astrophysical Simulations and Solar Observations Astrophysical Simulations Hydrodynamic Simulations Federation of Distributed Archives of Solar Observation Meteorologal Simulations Environmental GRID Applications

Example: CERN Large Hadron Collider Source: Globus presentation by Ian Foster Largest machine built by humans: particle accelerator and collider with a circumference of 27 kilometers Will generate 10 Petabytes (107 Gigabytes) of information per year … starting 2007 (?) This information must be processed and stored somewhere Beyond the scope of a single institution to manage this problem Projects: LCG (LHC Computing Grid), EGEE (Enabling Grids for E-sciencE)

Massively parallel Heterogeneous distributed systems Complexity GRID Grid poses difficult problems Heterogeneity and dynamicity of resources Secure access to resources with different users in various roles, belonging to VTs which belong to VOs Efficient assignment of data and tasks to machines (“scheduling“) Cluster with dedicated network links Heterogeneous distributed systems Massively parallel SETI@home GRID distributed memory Systems with Shared memory

Grid requirements Computer scientists can tackle these problems Grid application users and programmers are often not computer scientists Important goal: ease of use Programmer should not worry (too much) about the Grid User should worry even less Ultimate goal: write and use an application as if using a single computer (power grid metaphor) How do computer scientists simplify? Abstraction. We build layers. In a Grid, we typically have Middleware.

Toolkits Most famous: Globus Toolkit Other well-known examples Evolution from GT2 via GT3 to GT4 influenced the whole Grid community Reference implementation of Open Grid Forum (OGF) standards Other well-known examples Condor Exists since mid-1980‘s No Grid back then - system gradually evolved towards it Traditional goal: harvest CPU power of normal user workstations  many Grid issues always had to be addressed anyway Special interfaces now enable Condor-Globus communication (“Condor-G“) Unicore (used in D-Grid) gLite (used in EGEE) Issues that these middlewares (should) address Load Balancing, error management Authentification, Authorization and Accounting (AAA) Resource discovery, naming Resource access and monitoring Resource reservation and QoS management

Evolution: moving towards an architecture OGSI / OGSA: Open Grid Service Infrastructure / Architecture Open Grid Forum (OGF) standards OGSA = service-oriented architecture; key concept for virtualization use a resource = call a service OGSI = Web Services + state management failed: too complex, not compliant with Web Service standards GT3 GT4 Source: Globus presentation by Ian Foster

Current SoA Standards are only specified when mechanisms are known to work Globus only includes such working elements Lots of important features missing Practical issues with existing middlewares Submitting a Globus job is very slow (Austrian Grid: approx. 20 seconds)  significant granularity limit for parallelization! Globus is a huge piece of software Currently, some confusion about right location of features On top of middleware? (research on top of Globus) In middleware? (other Middleware projects) In the OS? (XtreemOS)  Upcoming slides concern mechanisms which are mostly on top and partially within middleware

Automatic parallelization in Grids Scheduling; important issue for “power outlet“ goal! Automatic distribution of tasks and inter-task data transmissions = scheduling Grid scheduling encompasses Resource Discovery Authorization Filtering, Application Requirement Definition, Minimal Requirement Filtering System Selection Dynamic Information Gathering Job Execution (optional) Advance Reservation Job Submission Preparation Tasks Monitoring Progress Job Completion Clean-up Tasks So far, most scheduling efforts consider embarassingly parallel applications - typically parameter sweeps (no dependencies)

Grid workflow applications Dependencies between applications (or large parts of applications) typically specified in Directed Acyclic Graph (DAG) Condor: DAG manager (DAGMan) uses .dag file for simple dependencies “Do not run job ‘B’ until job ‘A’ has completed successfully” DAGMan scheduling: for all tasks do... Find task with earliest starting time Allocate it to processor with Earlierst Finish Time Remove task from list GriPhyN (Grid Physics Network) facilitates workflow design with “Pegasus“ (Planning for Execution in Grids) framework Specification of abstract workflow: identify application components, formulate workflow specifying the execution order, using logical names for components and files Automatic generation of concrete workflow (map components to resources) Concrete workflow submitted to Condor-G/DAGMan

Grid Workflow Applications /2 Legacy codes Components Web Services Grid Workflows based on activities MPI HPF OMP Java Legacy Codes Descriptor Generation Component Interaction Optimization, Adaptation Service Description Discovery, Selection Deployment, Invocation Dynamic Instantiation Service Orchestration Quality of Service Source: presentation by Thomas Fahringer Components are built, Web (Grid) Services are defined, Activities are specified Several projects (e.g. K-WF Grid) and systems (e.g. ASKALON) exist Most applications have simple workflows E.g. Montage: dissects space image, distributes processing, merges results

Source: http://www.dps.uibk.ac.at/projects/teuta/

Grid InterNetworking

Research gap: Grid-specific network enhancements Enriched with customised network mechanisms Original Internet technology Bringing the Grid to its full potential ! EC-GIN Today‘s Grid applications EC-GIN enabled Grid applications Applications with special network properties and requirements Driving a racing car on a public road Traditional Internet applications (web browser, ftp, ..) Real-time multimedia applications (VoIP, video conference, ..)

Grid-network peculiarities Special behavior Predictable traffic pattern - this is totally new to the Internet! Web: users create traffic FTP download: starts ... ends Streaming video: either CBR or depends on content! (head movement, ..) Could be exploited by congestion control mechanisms Distinction: Bulk data transfer (e.g. GridFTP) vs. control messages (e.g. SOAP) File transfers are often “pushed“ and not “pulled“ Distributed System which is active for a while overlay based network enhancements possible Multicast P2P paradigm: “do work for others for the sake of enhancing the whole system (in your own interest)“ can be applied - e.g. act as a PEP, ... sophisticated network measurements possible can exploit longevity and distributed infrastructure Special requirements file transfer delay predictions note: useless without knowing about shared bottlenecks QoS, but for file transfers only (“advance reservation“)

What is EC-GIN? European project: Europe-China Grid InterNetworking STREP in IST FP6 Call 6 2.2 MEuro, 11 partners (7 Europe + 4 China) Networkers developing mechanisms for Grids

Research Challenges Research Challenges: How to model Grid traffic? Much is known about web traffic (e.g. self-similarity) - but the Grid is different! How to simulate a Grid-network? Necessary for checking various environment conditions May require traffic model (above) Currently, Grid-Sim / Net-Sim are two separate worlds (different goals, assumptions, tools, people) How to specify network requirements? Explicit or implicit, guaranteed or “elastic“, various possible levels of granularity How to align network and Grid economics? Combined usage based pricing for various resources including the network What P2P methods are suitable for the Grid? What is the right means for storing short-lived performance data?

Open issue: abstract-concrete WF mapping Tasks = {T1, T2, T3, T4} Resources = {R1, R2, R3, R4} Data transfers = {D1, D2, D3, D4} Unnoticed by scheduling algorithms!

Large stacks Middleware WS-RF SOAP HTTP TCP IP Grid apps DoD Source: http://img.dell.com/images/global/topics/power/ps1q02-broadcom1.gif DoD WS-RF SOAP HTTP TCP IP

Open issue: layering inefficiency Can‘t reuse connections! Grid Service Breaking the chain Stateful Web Service WS-RF Stateless SOAP Doesn‘t care, can do both Could reuse connections, but doesn‘t! HTTP 1.0 HTTP 1.1 Stateless Can‘t reuse connections! Connection state TCP Reuses connections Connection state IP Stateless

NWS: The Network Weather Service Most common tool for performance prediction Important for making good scheduling decisions Distributed system consisting of Name Server (boring) Sensor - actual measurement instance, regularly stores values in...... Persistent State Forecaster (calculations based on data in Persistent State) Interesting parts: Sensor Measured resources: availableCpu, bandwidthTcp, connectTimeTcp, currentCpu, freeDisk, freeMemory, latencyTcp Forecaster Apply different models for prediction, compare with actual measurement data, choose best match Duration of a long TCP transfer RTT of a small message

NWS critique Architecture (splitting into sensors, forecaster etc.) seems reasonable; open source  consider integrating new work in NWS Sensor active measurements even though non-intrusiveness was an important design goal - does not passively monitor TCP (i.e. ignores available data) strange methodology: (Large message throughput) “Empirically, we have observed that a message size of 64K bytes (..) yields meaningful results“ ignores packet size ( = measurement granularity ) and path characteristics trivial method - much more sophisticated methods available point-to-point measurements: distributed infrastructure not taken into account Forecaster relies on these weird measurements, where we don‘t know much about the distribution (but we do know some things about net traffic IFF properly measured) uses quite trivial models (but they may in fact suffice...) ssthresh

NWS measurements (Austrian Grid) Muhammad Murtaza Yousaf, Michael Welzl, Malik Muhammad Junaid: "Fog in the Network Weather Service: A case for novel Approaches", MetroGrid workshop, co-located with GridNets 2007, Lyon, France, 19 October 2007. Salzburg-Linz (left): more than 20 MB needed to saturate link Within Innsbruck (right), Gigabit link: around 100 MB needed NWS supposedly designed to be non-intrusive...

The impact of shared bottlenecks Example problem: C allocates tasks to A and B (CPU, memory available); both send results to C B hinders A - task of B should have been kept at C! Path changes are rare - thus, possible to detect potential problem in advance generate test messages from A, B to C - identify signature from B in A‘s traffic Another issue in this scenario: how valid is a prediction that A obtains if a measurement / prediction system does not know about the shared bottleneck?

EC-GIN Large File Transfer Scenario (LFTS) Multipath file transfer (AB + ACB) beneficial Multipath file transfer not beneficial due to shared bottleneck Questions: when does this make sense, how to expose this functionality, how to authenticate and authorize…?

Shared bottleneck detection with SVD Muhammad Murtaza Yousaf, Michael Welzl, Bulent Yener (2007); under submission Input: end-to-end forward delays of multiple flows Analysis Multivariate Analysis Method SVD (Singular Value Decomposition) Matrix operation which yields clustered values for correlating flows Calculate differences between values, consider changes between clusters as outliers, apply simple outlier detection method Output: clusters of flows which share a bottleneck Very precise, easy to calculate, can cluster multiple flows at the same time (other work uses pairwise cross correlation)

Extending the Padhye equation to N flows Dragana Damjanovic, Werner Heiss, Michael Welzl: "An Extension of the TCP Steady-State Throughput Equation for Parallel TCP Flows", poster, ACM SIGCOMM 2007, 27-31 August, Kyoto, Japan. Fair amount of work done, but so far, no (easily usable) approximation exists which also takes loss into account Useful in a Grid for multiple reasons: Prediction of GridFTP throughput (multiple TCP flows) Protocol with tunable aggression (MulTFRC) Because, if a Grid application uses two flows which share a bottleneck, flow 1 may be 3.7 times as important to it as flow 2 (e.g. if flow 2 is from replication) For fairness: if we take a break, we may earn “aggression points“ we get E[W] depending on n, p and b

Other current EC-GIN work INRIA + UIBK working on scheduling of advance reservations for bulk data transfers (using high-speed congestion control mechanisms) WP2 dedicated to modeling (and ns-2 simulation code) Lead: ULANC; currently collecting measurements from everywhere... Traffic model from INRIA UIBK developed a Grid-Net scheduling simulator using ns-2 ISCAS developed a high-speed SOAP engine UniZH working on P2P incentive mechanisms for the Grid + security ...stay tuned for more!

Practical problems Future perspectives Final words Conclusion Practical problems Future perspectives Final words

Problem: How Grid people see the Internet Just like Web Service community Abstraction - simply use what is available still: performance = main goal Conflict! Existing transport system (TCP/IP + Routing + ..) works well QoS makes things better, the Grid needs it! we now have a chance for that, thanks to IPv6 Absolutely not like Web Service community ! Wrong. Quote from a paper review: “In fact, any solution that requires changing the TCP/IP protocol stack is practically unapplicable to real-world scenarios, (..).“ How to change this view Create awareness - e.g. GGF GHPN-RG published documents such as “net issues with grids“, “overview of transport protocols“ Develop solutions and publish them! (EC-GIN, GridNets)

A time-to-market issue Typical Grid project Research begins (Real-life) coding begins Real-life tests begin Thesis writing Result: thesis + running code; tests in collaboration with different research areas Ideal Typical Network project Research begins (Simulation) coding begins Thesis writing Result: thesis + simulation code; perhaps early real-life prototype (if students did well)

Machine-only communication Trend in networks: from support of Human-Human Communication email, chat via Human-Machine Communication web surfing, file downloads (P2P systems), streaming media to Machine-machine Communication Growing number of commercial web service based applications New “hype“ technologies: Sensor nets, Autonomic Computing vision Semantic Web (Services): first big step for supporting machine-only communication at a high level So far, no steps at a lower level This would be like RTP, RTCP, SIP, DCCP, ... for multimedia apps: not absolutely necessary, but advantageous

The long-term value of Grid-net research Key for achieving this: change viewpoint from “what can we do for the Grid“ to “what can the Grid do for us“ (or from “what does the Grid need“ to “what does the Grid mean to us“) A subset of Grid-net developments will be useful for other machine-only communication systems! Grid Future work Sensor nets Web service applications

Conclusion Grid applications show special requirements and properties from a network perspective and it is reasonable to develop tailored Internet technology for them. There is another class of such applications... Multimedia. For multimedia applications, an immense number of network enhancements (even IETF standards) exist. For the Grid, there is nothing. This is a research gap; let‘s fill it together! submit a paper to GridNets 2008 in Beijing! :-) Reminder: if done right, such research is also applicable to other systems with machine-only traffic

More information: http://www.ec-gin.eu Thank you! Questions?