An Agent Based, Dynamic Service System to Monitor, Control and Optimize Distributed Systems
Iosif Legrand, California Institute of Technology, February 2006

Presentation transcript:

Slide 1: An Agent Based, Dynamic Service System to Monitor, Control and Optimize Distributed Systems
Iosif Legrand, California Institute of Technology, February 2006

Slide 2: The MonALISA Framework
- MonALISA is a dynamic, distributed service system capable of collecting any type of information from different systems, analyzing it in near real time, and supporting automated control decisions and global optimization of workflows in complex grid systems.
- The MonALISA system is designed as an ensemble of autonomous, multi-threaded, self-describing agent-based subsystems that are registered as dynamic services and are able to collaborate and cooperate in performing a wide range of monitoring tasks. These agents can analyze and process the information in a distributed way and provide optimization decisions in large-scale distributed applications.

Slide 3: MonALISA is a Dynamic, Distributed Service Architecture
- The framework is based on a hierarchical structure of loosely coupled agents acting as distributed services. These services are independent, autonomous entities able to discover one another and to cooperate using a dynamic set of proxies or self-describing protocols.
- An agent-based architecture makes it possible to invest the system with increasing degrees of intelligence, to reduce complexity, and to make global systems manageable in real time. These services provide the adaptability and self-organization needed for effective use of distributed resources.

Slide 4: MonALISA Service & Data Handling
[Diagram labels: Lookup Service (two instances); Registration; Discovery; MonALISA Service; Data Cache Service & DB; Data Stores (Postgres, MySQL); Configuration Control (SSL); WEB Service (WSDL, SOAP); Java; Client (other service); Web client; Applications; User-defined loadable modules to write/send data; Predicates & Agents; Communications via the ML Proxy]

Slide 5: The MonALISA Discovery System & Services
[Diagram layers: Network of JINI-LUSs (secure & public); MonALISA services; Proxies; Clients, HL services, repositories; AGENTS]
- Distributed dynamic discovery, based on a lease mechanism and REN
- Distributed system for gathering and analyzing information
- Dynamic load balancing; scalability & replication; security; AAA for clients
- Global services or clients
- Fully distributed system with no single point of failure
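The lease mechanism named on this slide can be illustrated with a small sketch. It is not MonALISA or Jini code: the class `LeaseRegistry`, its methods, and the injectable `clock` are hypothetical names chosen for the example. The idea shown is only that a registration stays visible while its lease is renewed and disappears from lookups once the lease expires.

```python
import time
from typing import Callable, Dict, List


class LeaseRegistry:
    """Toy lookup service (hypothetical): entries vanish unless renewed."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # service name -> time at which its lease expires
        self._expiry: Dict[str, float] = {}

    def register(self, name: str, lease_seconds: float) -> None:
        """Register a service, or renew it, for lease_seconds from now."""
        self._expiry[name] = self._clock() + lease_seconds

    # In this sketch, renewing is the same as registering with a fresh lease.
    renew = register

    def lookup(self) -> List[str]:
        """Return services with a live lease and forget the expired ones."""
        now = self._clock()
        self._expiry = {n: t for n, t in self._expiry.items() if t > now}
        return sorted(self._expiry)
```

A service that stops renewing (for example because its host went down) drops out of `lookup()` without any explicit deregistration, which is one way a discovery layer can avoid stale entries.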

Slide 6: Monitoring Internet2 Backbone Network
- Test for a Land Speed Record
- ~7 Gb/s in a single TCP stream from Geneva to Caltech

Slide 7: The UltraLight Network
[Figure labels: BNL, ESnet, IN/OUT]

Slide 8: Monitoring Network Topology, Latency, Routers
[Figure labels: NETWORKS, AS, ROUTERS]

Slide 9: Monitoring the GLORIAD Ring

Slide 10: Monitoring Grid Sites, Running Jobs, Network Traffic, and Connectivity
[Figure panels: TOPOLOGY, JOBS, ACCOUNTING]

Slide 11: Monitoring OSG: Resources, Jobs & Accounting
[Figure labels: 42 SITES; ~ Nodes ( CPUs); Thousands of Jobs parameters; Running Jobs; Accounting]

Slide 12: FTP Data Transfer between GRID Sites
[Figure: Total FTP Traffic per VO]

Slide 13: Bandwidth Challenge at SC
[Figure labels: Gbs; ~500 TB total in 4h]

Slide 14: End User / Client Agent
LISA: Localhost Information Service Agent
- Authorization
- Service discovery
- Local detection of the hardware and software configuration
- Complete end-system monitoring: per-process load, I/O and network throughput, etc.
- End-to-end performance measurements
- Will act as an active listener for all events related to the requests generated by its local applications

Slide 15: Host Monitoring at SC2005
- Many “network” problems are actually end-host problems: misconfigured or underpowered end-systems.
- The LISA application was designed to monitor the end host and its view of the network.
- For SC|05 we used LISA to gather the relevant host details related to network performance.
- System information, TCP configuration and network device setup were gathered and made accessible from one site.
- Future plans are to coordinate this with LISA and deploy it as part of OSG. The Tier-2 centers are a primary target.
[Figure panels: Network Device Information; TCP Settings; Host/System Information]

Slide 16: Available Bandwidth Measurements
- Embedded Pathload module

Slide 17: Coordination Service for Available Bandwidth Measurements
- Enforces measurement fairness
- Avoids multiple probes on shared network segments
- Dynamic configuration of measurement timing
- Logs events
- Provides service redundancy by using a master-slave model
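The first, second and fourth points above (fairness, no concurrent probes on a shared segment, event logging) can be sketched as a small scheduler. This is an illustrative assumption about how such a coordinator could behave, not the MonALISA implementation: `ProbeCoordinator`, its methods, and the segment names are hypothetical. A probe starts only if none of its segments is claimed by a running or earlier-queued probe, and queued probes are served oldest first.

```python
from collections import deque
from typing import Deque, Dict, FrozenSet, Iterable, List, Set, Tuple


class ProbeCoordinator:
    """Hypothetical coordinator: one probe at a time per network segment."""

    def __init__(self) -> None:
        self._active: Dict[str, FrozenSet[str]] = {}
        self._waiting: Deque[Tuple[str, FrozenSet[str]]] = deque()
        self.log: List[str] = []  # simple event log

    def _active_segments(self) -> Set[str]:
        busy: Set[str] = set()
        for segs in self._active.values():
            busy |= segs
        return busy

    def request(self, probe_id: str, segments: Iterable[str]) -> bool:
        """Start the probe now (True) or queue it behind conflicts (False)."""
        segs = frozenset(segments)
        claimed = self._active_segments()
        for _, waiting_segs in self._waiting:
            claimed |= waiting_segs  # earlier waiters keep their claim
        if segs & claimed:
            self._waiting.append((probe_id, segs))
            self.log.append(f"queued {probe_id}")
            return False
        self._active[probe_id] = segs
        self.log.append(f"started {probe_id}")
        return True

    def finish(self, probe_id: str) -> List[str]:
        """Release a probe's segments and start waiters that now fit."""
        self._active.pop(probe_id)
        self.log.append(f"finished {probe_id}")
        started: List[str] = []
        blocked = self._active_segments()
        still_waiting: Deque[Tuple[str, FrozenSet[str]]] = deque()
        for pid, segs in self._waiting:  # oldest request first
            if segs & blocked:
                still_waiting.append((pid, segs))
            else:
                self._active[pid] = segs
                self.log.append(f"started {pid}")
                started.append(pid)
            blocked |= segs  # later waiters may not overtake on these segments
        self._waiting = still_waiting
        return started
```

The sketch keeps all state in one process; the master-slave redundancy mentioned on the slide is outside its scope.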

Slide 18: Monitoring the Execution of Jobs and the Time Evolution
[Figure labels: SPLIT JOBS; LIFELINES for JOBS; Job, Job1, Job2, Job3, Job 31, Job 32; Submit a Job DAG]

Slide 19: ApMon: Application Monitoring
[Diagram labels: APPLICATION; ApMon; Monitoring Data UDP/XDR; MonALISA Service (two instances); MonALISA hosts; Config Servlet; ApMon Config (parameter1: value, parameter2: value); App. Monitoring (Mbps_out: 0.52, Status: reading, MB_inout: ...); System Monitoring (load1: 0.24, processes: 97, pages_in: 83); Time;IP;procID; No Lost Packages]
- Library of APIs (C, C++, Java, Perl, Python) that can be used to send any information to MonALISA services
- Flexibility, dynamic configuration, high communication performance
- Dynamic reloading; ApMon configuration generated automatically by a servlet / CGI script
- Automated system monitoring
- Accounting information
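The "Monitoring Data UDP/XDR" label above can be made concrete with a minimal sketch of XDR-style encoding sent as a UDP datagram. This is not the ApMon API and not its actual wire format: the datagram layout (a count followed by name/value pairs) and the function names are assumptions made for illustration. Only the XDR primitives themselves (big-endian integers and doubles, strings padded to a multiple of four bytes) follow the XDR standard.

```python
import socket
import struct
from typing import Dict


def xdr_string(text: str) -> bytes:
    """XDR string: 4-byte length, the bytes, zero padding to a 4-byte boundary."""
    data = text.encode("utf-8")
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def encode_params(params: Dict[str, float]) -> bytes:
    """Hypothetical layout: parameter count, then (name, double) pairs."""
    out = struct.pack(">I", len(params))
    for name, value in params.items():
        out += xdr_string(name) + struct.pack(">d", float(value))
    return out


def decode_params(payload: bytes) -> Dict[str, float]:
    """Inverse of encode_params, as a receiving service might do it."""
    (count,) = struct.unpack_from(">I", payload, 0)
    offset = 4
    result: Dict[str, float] = {}
    for _ in range(count):
        (length,) = struct.unpack_from(">I", payload, offset)
        offset += 4
        name = payload[offset:offset + length].decode("utf-8")
        offset += length + (4 - length % 4) % 4  # skip the padding too
        (value,) = struct.unpack_from(">d", payload, offset)
        offset += 8
        result[name] = value
    return result


def send_params(params: Dict[str, float], host: str, port: int) -> None:
    """Fire-and-forget UDP send; UDP itself gives no delivery guarantee."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_params(params), (host, port))
```

For example, `encode_params({"load1": 0.24, "processes": 97})` produces a single compact datagram that `decode_params` turns back into the same dictionary. Sending over UDP keeps the monitored application from blocking on the monitoring service, at the cost of possible packet loss.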

Slide 20
[Diagram labels: Optical Switch; MonALISA ML Agent (three instances); ML Daemon; Control and Monitor the switch; Discovery & Secure Connection; ML proxy services used in agent communication]
- Runs an ML Daemon: > ml_path IP1 IP4 “copy file IP4”
- MonALISA agents create, on demand, an optical path or tree
- Time to create a path on demand: <1 s, independent of the location and the number of connections
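The slide does not say how the agents choose a path between the two endpoints. As one possible illustration only, the sketch below uses a plain breadth-first search over a hypothetical switch topology to find a fewest-hop path from `IP1` to `IP4`. The function `find_path`, the topology dictionary and the switch names are assumptions for the example, not part of MonALISA.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


def find_path(links: Dict[str, List[str]], src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first search: a fewest-hop path from src to dst, or None."""
    previous: Dict[str, Optional[str]] = {src: None}
    queue: Deque[str] = deque([src])
    while queue:
        node: Optional[str] = queue.popleft()
        if node == dst:
            path: List[str] = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in links.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical topology: two end hosts connected through optical switches.
TOPOLOGY: Dict[str, List[str]] = {
    "IP1": ["switch-A"],
    "switch-A": ["IP1", "switch-B", "switch-C"],
    "switch-B": ["switch-A", "IP4"],
    "switch-C": ["switch-A"],
    "IP4": ["switch-B"],
}

print(find_path(TOPOLOGY, "IP1", "IP4"))
```

In a real deployment each hop would also require a cross-connect to be configured on the corresponding switch by its agent; that step is not modelled here.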

Slide 21: Monitoring and Controlling Optical Planes
[Figure panels: Port power monitoring; Controlling]

Slide 22: Monitoring Optical Switches
[Figure: Agents to create an optical path on demand]

Slide 23: Communities Using MonALISA
Major communities: OSG, CMS, ALICE, D0, STAR, VRVS, LCG RUSSIA, SE Europe GRID, APAC Grid, UNAM Grid, ABILENE, ULTRALIGHT, GLORIAD, LHC Net, RoEduNET
Demonstrated at: SC2003, Telecom World 2003, WSIS 2003, SC 2004, I, TERENA 2005, IGrid 2005, SC 2005
MonALISA is running 24x7 at 250 sites:
- Collecting 250,000 parameters in near real time
- Update rate of 25,000 parameter updates per second
- Monitoring 12,000 computers, > 100 WAN links, and thousands of Grid jobs running concurrently
[Figure labels: ABILENE, VRVS, ALICE, CMS-DC04]

Slide 24: The MonALISA Architecture Provides:
- Distributed registration and discovery for services and applications
- Monitoring of all aspects of complex systems:
  - System information for computer nodes and clusters
  - Network information: WAN and LAN
  - Performance of applications, jobs or services
  - The end-user systems and their performance
  - Video streaming
- Interaction with any other services to provide, in near real time, customized information based on monitoring data
- Secure, remote administration for services and applications
- Agents to supervise applications, trigger alarms, restart or reconfigure them, and notify other services when certain conditions are detected
- Higher-level decision services, developed with the MonALISA framework and implemented as a distributed network of communicating agents, to perform global optimization tasks
- Graphical user interfaces to visualize complex information
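The supervising agents described above (trigger alarms, restart or reconfigure, notify when conditions are detected) can be pictured as a set of condition/action rules evaluated against each monitoring sample. The sketch below is a hypothetical illustration of that pattern; `Rule`, `SupervisorAgent` and the threshold used in the usage note are assumptions, not MonALISA classes or settings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Rule:
    """A named condition on a monitoring sample plus the action to take."""
    name: str
    condition: Callable[[Dict[str, float]], bool]  # True when the rule fires
    action: Callable[[], None]                     # e.g. restart or notify


@dataclass
class SupervisorAgent:
    """Hypothetical agent: checks every rule against each incoming sample."""
    rules: List[Rule] = field(default_factory=list)
    alarms: List[str] = field(default_factory=list)  # names of fired rules

    def observe(self, sample: Dict[str, float]) -> List[str]:
        """Evaluate all rules; record an alarm and run the action for each hit."""
        fired: List[str] = []
        for rule in self.rules:
            if rule.condition(sample):
                self.alarms.append(rule.name)
                rule.action()
                fired.append(rule.name)
        return fired
```

As a usage example, a rule such as `Rule("high-load", lambda s: s.get("load1", 0.0) > 8.0, restart_service)` (with `restart_service` supplied by the caller) would stay silent for a sample like `{"load1": 0.24}` and fire once the load crosses the chosen threshold.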