Grid performance analysis Directions, issues and open problems Zsolt Németh MTA SZTAKI Computer and Automation Research Institute.

1 Grid performance analysis Directions, issues and open problems Zsolt Németh MTA SZTAKI Computer and Automation Research Institute

2 Outline
- What is the grid?
- What is grid performance?
- Elementary problems of grid performance evaluation
  - Directions
  - Issues
  - Open questions

3 Distributed applications
- A set of cooperative processes

4 Distributed applications
- Processes require resources: CPU, memory, network, printer, storage, database, libraries, I/O devices

5 Distributed applications
- Resources can be found on computational nodes
[Figure labels: CPU, memory, network, printer, storage, database, libraries, I/O devices; mapping]

6 Distributed applications
- Application processes are mapped onto computational nodes
- Computational nodes
  - Form a loosely coupled computer system
  - Interact via messages

7 Distributed applications
- Application: cooperative processes
- In between: Process control? Security? Naming? Communication? Input/output? File access?
- Physical layer: computational nodes

8 Distributed applications
- Application: cooperative processes
- Virtual machine: process control, security, naming, communication, input/output, file access
- Physical layer: computational nodes

9 Conventional distributed environments and grids
- Distributed resources are virtually unified by a software layer
  - A virtual machine is introduced between the application and the physical layer
  - Provides a single system image to the application
- Types
  - “Conventional” (PVM, some implementations of MPI)
  - Grid (Globus, Legion)

10
- What is the essential difference?

11 Conventional distributed environments and grids
- Geographical extent?

12 Conventional distributed environments and grids
- Performance?

13 Conventional distributed environments and grids
- Tools and services?

14 Conventional distributed environments and grids
- How is the virtual machine built up?
- What does execution mean?
- What is the semantics of execution?

15 Modeling the semantics
- Abstract State Machines (ASM)
  1. Model for a distributed application assuming a conventional environment
  2. The same model (with minimal modifications) assuming a grid
- If the latter model works, there are no real differences
- If it does not work, what are the fundamental differences?
- The semantic differences derived from the formal model are presented

16 Conventional environments
- Physical level
  - Set of nodes (node = collection of resources)
  - Login access
  - Static
- Virtual machine
  - Constructed on a priori information
- Processes
  - Have resource requests
- Mapping
  - Processes are mapped onto nodes
  - Resource assignment is implicit

17 Description of grid
- “flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources” (The anatomy of the grid)
- “single, seamless, computational environment in which cycles, communication and data are shared” (Legion: the Next Step Toward a Nationwide Virtual Computer)
- “wide-area environment that transparently consists of workstations, personal computers, graphic rendering engines, supercomputers and nontraditional devices” (Legion - A View from 50,000 Feet)
- “collection of geographically separated resources connected by a high speed network”, “a software layer which transforms a collection of independent resources into a single, coherent virtual machine” (Metacomputing - What’s in it for me)

18 Grid
- Physical layer
  - Set of resources
  - Shared
  - Dynamic
- Virtual machine
  - Resources are assigned to processes
  - Consists of the selected resources
- Processes
  - Have resource requirements
- Mapping
  - Assign nodes to resources?

19 Grid: the resource abstraction
- Physical layer
- Processes
  - Have resource needs
- Resource abstraction
  - Explicit mapping between virtual and physical resources
  - Cannot be solved at user/application level

20 Grid: the user abstraction
- Physical layer
  - Local, physical users (user accounts)
- Processes
  - Belong to a user
  - The user of the virtual machine is authorised to use the constituent resources
  - The user has no login access to the node a resource belongs to
- User abstraction
  - The user of the virtual machine is temporarily mapped onto some local accounts
  - Cannot be solved at user/application level

21 Fundamental grid functionalities
- By formal modeling the essential functionalities can be identified
- Resource abstraction
  - Physical resources can be assigned to virtual resource needs (matched by properties)
  - The grid provides a mapping between virtual and physical resources
- User abstraction
  - The user of the physical machine may be different from the user of the virtual machine
  - The grid provides a temporary mapping between virtual and physical users
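
The two abstractions on this slide can be illustrated with a small sketch. It is not taken from the presentation or from any grid middleware: the `Resource` and `Request` types, the `capacity` property, the first-fit matching rule and the account table are all assumptions made for illustration (the account names echo the labels on slide 22).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    kind: str       # e.g. "cpu", "storage"
    node: str       # node that owns the physical resource
    capacity: int   # illustrative, unit-free property (assumption)


@dataclass(frozen=True)
class Request:
    kind: str           # virtual resource need of a process
    min_capacity: int   # property the physical resource must satisfy


def map_resources(requests, resources):
    """Resource abstraction: match each virtual need to a free physical resource."""
    free = list(resources)
    mapping = []
    for req in requests:
        # First-fit matching by properties (an assumed, simplistic policy).
        match = next(
            (r for r in free
             if r.kind == req.kind and r.capacity >= req.min_capacity),
            None,
        )
        if match is None:
            raise LookupError(f"no resource satisfies {req}")
        free.remove(match)  # each resource serves one request in this sketch
        mapping.append((req, match))
    return mapping


def map_user(grid_user, node, local_accounts):
    """User abstraction: temporary mapping of a grid user to a local account.

    A real system would also check that grid_user is authorised; omitted here.
    """
    return f"{local_accounts[node]}@{node}"


resources = [
    Resource("cpu", "n1.edu", 2),
    Resource("cpu", "n2.edu", 8),
    Resource("storage", "n2.edu", 500),
]
requests = [Request("cpu", 4), Request("storage", 100)]
local_accounts = {"n1.edu": "griduser", "n2.edu": "p12"}  # hypothetical table

assignment = map_resources(requests, resources)
accounts = [map_user("Smith", res.node, local_accounts) for _, res in assignment]
```

The point of the sketch is that the process never names a node or a local account: both mappings are made by the layer between application and physical resources.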

22 Conventional distributed environments and grids
[Figure labels: Smith, 4 nodes; Smith, 4 CPU, memory, storage; Smith, 1 CPU; smith@n1.edu, smith@n2.edu, p12@n2.edu, griduser@n1.edu]

23 Performance analysis
- Instrumentation
- Monitoring
- Data reduction
- Analysis and presentation
- Optimisation

24 The scope of this presentation
- Instrumentation
- Monitoring
- Data reduction
- Analysis and presentation
- Optimisation

25 What is grid performance at all?
- Traditionally ‘performance’ is
  - Speed
  - Throughput
  - Bandwidth, etc.
- Using grids
  - Quantitative reasons
  - Qualitative reasons (QoS)
  - Economic aspects

26 Grid performance analysis
1. Performance is not characteristic of an application itself but of the interaction between the application and the infrastructure.
2. The more complex and dynamic nature of a grid introduces more possible performance flaws.
3. Usual metrics and characteristic parameters are not necessarily applicable to grids.
4. The larger event data volume needs careful reduction, feature extraction and intelligent presentation.
5. Due to the permanently changing environment, on-line and semi-on-line techniques are advantageous over post-mortem methods.
6. Performance tuning is more difficult due to the dynamic environment and changing infrastructure.
7. Observation, comparison and analysis are more complex due to the diversity and heterogeneity of resources.

27 Interaction of application and the infrastructure
- Performance = application performance × infrastructure performance
- Signature model (Pablo group)
  - Application signature, e.g. instructions/FLOP
  - Scaling factor (capabilities of the resources), e.g. FLOPs/second
  - Execution signature = application signature × scaling factor
  - E.g. instructions/second = instructions/FLOP × FLOPs/second
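
As a worked instance of the signature model, the sketch below multiplies an application signature by a scaling factor to obtain an execution signature. The function name and both numeric values are hypothetical; they only illustrate how the units cancel.

```python
import math


def execution_signature(application_signature, scaling_factor):
    """Execution signature = application signature * scaling factor."""
    return application_signature * scaling_factor


# Hypothetical values, chosen only to make the unit arithmetic visible.
instructions_per_flop = 2.5   # application signature (instructions/FLOP)
flops_per_second = 4.0e9      # scaling factor of the resource (FLOPs/second)

# instructions/second = instructions/FLOP * FLOPs/second
instructions_per_second = execution_signature(
    instructions_per_flop, flops_per_second
)
assert math.isclose(instructions_per_second, 1.0e10)
```

The same application signature combined with a different resource's scaling factor gives a different execution signature, which is the sense in which performance belongs to the pair rather than to the application alone.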

29 Possible performance problems in grids
- All that may occur in a distributed application
- Plus
  - Effectiveness of resource brokering
  - Synchronous availability of resources
  - Resources may change during execution
  - Various local policies
  - Shared use of resources
  - Higher costs of some activities
- The corresponding symptoms must be characterised

31 Grid performance metrics
- Abstract representation of measurable quantities
- $M = R_1 \times R_2 \times \dots \times R_n$
- Usual metrics
  - Speedup, efficiency
  - Queue length
- Such strict values are not characteristic in a grid
  - Cannot be interpreted
  - Cannot be compared
- New metrics
  - Local metrics and grid metrics
  - Symbolic description / metrics
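
For reference, the usual metrics named above are commonly defined as follows. The sketch uses the textbook definitions (speedup as serial time over parallel time, efficiency as speedup per processor); the timing values are made up.

```python
def speedup(t_serial, t_parallel):
    """Classical speedup: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_processors):
    """Classical efficiency: speedup divided by the number of processors."""
    return speedup(t_serial, t_parallel) / n_processors


# Made-up timings in seconds, on an assumed fixed set of 4 identical processors.
s = speedup(120.0, 40.0)
e = efficiency(120.0, 40.0, 4)
```

Both definitions presuppose a fixed, homogeneous set of processors. One reading of the slide is that in a grid, where the resource set is heterogeneous and may change during a run, `n_processors` has no stable meaning, so the resulting numbers can be neither interpreted nor compared.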

33 Processing monitoring information
- Trace data reduction
  - Trace data volume is proportional to time t, number of processes P and metrics dimension n
- Statistical clustering (reducing P)
  - Similar temporal behaviours are classified
  - Representative processes are recorded for each class
  - Questionable whether it works for grids
- Statistical projection pursuit (reducing n)
  - Reduces the dimension by identifying significant metrics
- Sampling frequency (reducing t)
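
Two of the reductions above (fewer processes, fewer samples) can be sketched in a few lines. The clustering here is a plain threshold-based "leader" clustering on Euclidean distance, swapped in for simplicity; it is not the statistical clustering method the slide refers to, and the traces, threshold and step are invented.

```python
import math


def distance(a, b):
    """Euclidean distance between two equally long metric traces."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_processes(traces, threshold):
    """Group processes with similar temporal behaviour (reducing P).

    traces maps a process id to its list of samples. The first process of
    each class acts as the representative of that class.
    """
    clusters = []  # list of (representative_pid, [member pids])
    for pid, trace in traces.items():
        for rep, members in clusters:
            if distance(traces[rep], trace) <= threshold:
                members.append(pid)
                break
        else:
            clusters.append((pid, [pid]))
    return clusters


def downsample(trace, step):
    """Keep every step-th sample (reducing t)."""
    return trace[::step]


# Invented utilisation traces for four processes.
traces = {
    "p0": [0.9, 0.8, 0.9, 0.1],
    "p1": [0.9, 0.9, 0.8, 0.2],
    "p2": [0.1, 0.2, 0.1, 0.9],
    "p3": [0.2, 0.1, 0.1, 0.8],
}

clusters = cluster_processes(traces, threshold=0.3)
# Record only the representative of each class, at half the sampling rate.
reduced = {rep: downsample(traces[rep], 2) for rep, _ in clusters}
```

Here four traces of four samples shrink to two traces of two samples. The slide's doubt about grids applies directly: the scheme only helps if many processes really do behave alike.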

35 Processing monitoring information
- On-line and semi-on-line techniques are preferred
  - Off-line techniques assume that runs can be reproduced
- Event ordering
  - No global clock can be assumed
  - Only partial ordering is possible
- Automatic analysis instead of human observation
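
One standard way to obtain a partial order of events without a global clock is vector clocks. The slide does not name a technique, so the choice is an assumption of this sketch, as are the `Process` class and the two-process scenario.

```python
def happened_before(a, b):
    """True if vector timestamp a causally precedes vector timestamp b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    """True if neither event causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


class Process:
    """A monitored process carrying one logical counter per process."""

    def __init__(self, index, n_processes):
        self.index = index
        self.clock = [0] * n_processes

    def local_event(self):
        self.clock[self.index] += 1
        return tuple(self.clock)

    def send(self):
        # Sending is an event; the timestamp travels with the message.
        return self.local_event()

    def receive(self, timestamp):
        # Merge the sender's knowledge, then count the receive event itself.
        self.clock = [max(c, t) for c, t in zip(self.clock, timestamp)]
        return self.local_event()


p0, p1 = Process(0, 2), Process(1, 2)
e1 = p0.local_event()   # independent event on p0
e2 = p1.local_event()   # independent event on p1
msg = p0.send()         # p0 sends a message to p1
e3 = p1.receive(msg)    # p1 receives it

assert concurrent(e1, e2)        # no order can be established
assert happened_before(e1, e3)   # ordered through the message
```

Events `e1` and `e2` stay unordered, which is exactly the "only partial ordering" situation: a monitor can order events linked by communication, but not independent events on different nodes.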

37 Performance tuning, optimisation
- The execution cannot be reproduced
  - Post-mortem optimisation is not viable
  - On-line steering is necessary, though hard to realise
    - Sensors and actuators
    - Application and implementation dependent
    - E.g. Autopilot, Falcon
- Average behaviour of applications can be improved
  - Post-mortem tuning of the infrastructure (if possible)
    - Brokering decisions
    - Supporting services

39 Performance visualisation
- Static 2D, 3D
  - Representing statistical data ‘off-line’
- Dynamic 2D, 3D
  - Better suited to grid environment
  - Co-visualisation of application and infrastructure, rendering symbolic values, etc.?
- Virtual reality, immersive environments
  - Puts the real world user into the virtual world of grid
  - Allows steering
  - Not widespread

40 Grid performance prediction
- Past cannot be replayed, present is volatile, future?
- Application behaviour (temporal and spatial patterns)
  - Markov models
- Infrastructure behaviour (e.g. NWS)
  - Mean-based methods
  - Median-based methods
  - Autoregressive methods
  - Assumes a priori knowledge about the resources
- Multivariate methods
  - Correlated metrics can be estimated in order to reduce intrusion
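
The mean-based and median-based predictors listed for infrastructure behaviour can be sketched as sliding-window forecasters. This is an illustration only, not the NWS implementation: the window size, the extra last-value predictor, the rule of picking the predictor with the smallest past error and the bandwidth series are all assumptions, and autoregressive methods are left out for brevity.

```python
from statistics import mean, median


def sliding_mean(history, window=3):
    """Mean-based forecast: average of the most recent measurements."""
    return mean(history[-window:])


def sliding_median(history, window=3):
    """Median-based forecast: median of the most recent measurements."""
    return median(history[-window:])


def last_value(history):
    """Naive baseline: the next value equals the last measurement."""
    return history[-1]


FORECASTERS = {
    "mean": sliding_mean,
    "median": sliding_median,
    "last": last_value,
}


def mean_abs_error(forecaster, series):
    """Replay the series and average the one-step-ahead forecast errors."""
    errors = [
        abs(forecaster(series[:i]) - series[i])
        for i in range(1, len(series))
    ]
    return mean(errors)


def best_forecast(series):
    """Pick the forecaster with the smallest past error and apply it."""
    name = min(
        FORECASTERS,
        key=lambda n: mean_abs_error(FORECASTERS[n], series),
    )
    return name, FORECASTERS[name](series)


# Invented bandwidth measurements with one outlier.
bandwidth = [10.0, 10.0, 50.0, 10.0, 10.0, 10.0]
name, prediction = best_forecast(bandwidth)
```

On this series the median predictor has the smallest replay error because it ignores the single spike, which hints at why several simple predictors are worth keeping side by side for volatile resources.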

41 Some initial thoughts about performance analysis
- First steps
  - Grid metrics
    - How to derive metrics from monitored data?
  - Possible grid performance problems
    - How to detect from metrics?
  - What exactly should be monitored?

42

43 Technical differences
[Diagram labels: User abstraction, Resource abstraction (What?); Security, Information system (How?); Resource management, Information provider (How?); successive refinement steps]

