Winter Retreat - 2003 Connecting the Dots: Using Runtime Paths for Macro Analysis Mike Chen, Emre Kıcıman, Anthony Accardi, Armando Fox, Eric Brewer

Slides:



Advertisements
Similar presentations
1 VLDB 2006, Seoul Mapping a Moving Landscape by Mining Mountains of Logs Automated Generation of a Dependency Model for HUG’s Clinical System Mirko Steinle,
Advertisements

Declarative sensor networks David Chu Computer Science Division EECS Department UC Berkeley DBLunch UC Berkeley 2 March 2007.
Chapter 3 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University Building Dependable Distributed Systems.
Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.
Pinpoint: Problem Determination in Large, Dynamic Internet Services Mike Chen, Emre Kıcıman, Eugene Fratkin {emrek,
CSE 598B: Self-* Systems Path Based Failure and Evolution Management Mike Y. Chen, Anthony Accardi, Emre Kiciman, Jim Lloyd, Dave Patterson, Armando Fox,
Transaction.
CS 582 / CMPE 481 Distributed Systems Communications.
Extensible Scalable Monitoring for Clusters of Computers Eric Anderson U.C. Berkeley Summer 1997 NOW Retreat.
Networking Theory (Part 1). Introduction Overview of the basic concepts of networking Also discusses essential topics of networking theory.
Multi-Scale Analysis for Network Traffic Prediction and Anomaly Detection Ling Huang Joint work with Anthony Joseph and Nina Taft January, 2005.
Architectural Design Principles. Outline  Architectural level of design The design of the system in terms of components and connectors and their arrangements.
Matnet – Matlab Network Simulator for TinyOS Alec WooTerence Tong July 31 st, 2002.
Mining Behavior Models Wenke Lee College of Computing Georgia Institute of Technology.
seminar on Intrusion detection system
Presenter: Chi-Hung Lu 1. Problems Distributed applications are hard to validate Distribution of application state across many distinct execution environments.
LÊ QU Ố C HUY ID: QLU OUTLINE  What is data mining ?  Major issues in data mining 2.
Lucent Technologies – Proprietary Use pursuant to company instruction Learning Sequential Models for Detecting Anomalous Protocol Usage (work in progress)
MOBILE AD-HOC NETWORK(MANET) SECURITY VAMSI KRISHNA KANURI NAGA SWETHA DASARI RESHMA ARAVAPALLI.
Implementation Yaodong Bi. Introduction to Implementation Purposes of Implementation – Plan the system integrations required in each iteration – Distribute.
SWE 316: Software Design and Architecture – Dr. Khalid Aljasser Objectives Lecture 11 : Frameworks SWE 316: Software Design and Architecture  To understand.
Software Component Technology and Component Tracing CSC532 Presentation Developed & Presented by Feifei Xu.
Division of IT Convergence Engineering Towards Unified Management A Common Approach for Telecommunication and Enterprise Usage Sung-Su Kim, Jae Yoon Chung,
Enterprise JavaBeans. What is EJB? l An EJB is a specialized, non-visual JavaBean that runs on a server. l EJB technology supports application development.
Cluster Reliability Project ISIS Vanderbilt University.
Scalable Analysis of Distributed Workflow Traces Daniel K. Gunter and Brian Tierney Distributed Systems Department Lawrence Berkeley National Laboratory.
Reviewing Recent ICSE Proceedings For:.  Defining and Continuous Checking of Structural Program Dependencies  Automatic Inference of Structural Changes.
©NEC Laboratories America 1 Huadong Liu (U. of Tennessee) Hui Zhang, Rauf Izmailov, Guofei Jiang, Xiaoqiao Meng (NEC Labs America) Presented by: Hui Zhang.
Towards Low Overhead Provenance Tracking in Near Real-Time Stream Filtering Nithya N. Vijayakumar, Beth Plale DDE Lab, Indiana University {nvijayak,
Class 5 Architecture-Based Self-Healing Systems David Garlan Carnegie Mellon University.
Architecture-based Reliability of web services Presented in SRG Group meeting January 24, 2011 Cobra Rahmani.
Peer Pressure: Distributed Recovery in Gnutella Pedram Keyani Brian Larson Muthukumar Senthil Computer Science Department Stanford University.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 20 Slide 1 Critical systems development 3.
Distributed Computing Systems CSCI 4780/6780. Geographical Scalability Challenges Synchronous communication –Waiting for a reply does not scale well!!
Prepared By Aakanksha Agrawal & Richa Pandey Mtech CSE 3 rd SEM.
1 Computing Challenges for the Square Kilometre Array Mathai Joseph & Harrick Vin Tata Research Development & Design Centre Pune, India CHEP Mumbai 16.
EEC 688/788 Secure and Dependable Computing Lecture 8 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
S. Shumilov – Zürich Analytical Visualization Framework - a visual data processing and knowledge discovery system Ivan Denisovich, Serge Shumilov Department.
Dynamic and Selective Combination of Extension in Component-based Applications Eddy Truyen, Bart Vanhaute, Wouter Joosen, Pierre Verbaeten, Bo N. Jørgensen.
CISC Machine Learning for Solving Systems Problems Presented by: Suman Chander B Dept of Computer & Information Sciences University of Delaware Automatic.
© 2006, National Research Council Canada © 2006, IBM Corporation Solving performance issues in OTS-based systems Erik Putrycz Software Engineering Group.
David Adams ATLAS DIAL: Distributed Interactive Analysis of Large datasets David Adams BNL August 5, 2002 BNL OMEGA talk.
Plethora: Infrastructure and System Design. Introduction Peer-to-Peer (P2P) networks: –Self-organizing distributed systems –Nodes receive and provide.
ROC Retreat 1/14/2003 Connecting the Dots: Using Runtime Paths for Macro Analysis Mike Chen
1 Technical & Business Writing (ENG-715) Muhammad Bilal Bashir UIIT, Rawalpindi.
Recovery-Oriented Computing Detecting and Diagnosing Application-Level Failures in Internet Services Emre Kıcıman and Armando Fox {emrek,
Progress Report Armando Fox with George Candea, James Cutler, Ben Ling, Andy Huang.
A Security Framework with Trust Management for Sensor Networks Zhiying Yao, Daeyoung Kim, Insun Lee Information and Communication University (ICU) Kiyoung.
Detecting, Managing, and Diagnosing Failures with FUSE John Dunagan, Juhan Lee (MSN), Alec Wolman WIP.
Application Communities Phase 2 (AC2) Project Overview Nov. 20, 2008 Greg Sullivan BAE Systems Advanced Information Technologies (AIT)
The Utilization of Artificial Intelligence in a Hybrid Intrusion Detection System Authors : Martin Botha, Rossouw von Solms, Kent Perry, Edwin Loubser.
Dynamic Verification of Sequential Consistency Albert Meixner Daniel J. Sorin Dept. of Computer Dept. of Electrical and Science Computer Engineering Duke.
Pinpoint: Problem Determination in Large, Dynamic Internet Services Mike Chen, Emre Kıcıman, Eugene Fratkin {emrek,
TraceBench: An Open Data Set for Trace-Oriented Monitoring Jingwen Zhou 1, Zhenbang Chen 1, Ji Wang 1, Zibin Zheng 2, and Michael R. Lyu 1,2 1 PDL, National.
Performing Fault-tolerant, Scalable Data Collection and Analysis James Jolly University of Wisconsin-Madison Visualization and Scientific Computing Dept.
Beyond Application Profiling to System Aware Analysis Elena Laskavaia, QNX Bill Graham, QNX.
Constraint Framework, page 1 Collaborative learning for security and repair in application communities MIT site visit April 10, 2007 Constraints approach.
Copyright © 2006, Oracle. All rights reserved Oracle Web Services Manager.
Experience Report: System Log Analysis for Anomaly Detection
Cluster-Based Scalable
Welcome to the Winter 2004 ROC Retreat
Self Healing and Dynamic Construction Framework:
Applying Control Theory to Stream Processing Systems
CHAPTER 3 Architectures for Distributed Systems
Plethora: Infrastructure and System Design
CLUSTER COMPUTING.
RM3G: Next Generation Recovery Manager
CS240: Advanced Programming Concepts
Distributed Hash Tables
Dynamic Verification of Sequential Consistency
Presentation transcript:

Winter Retreat Connecting the Dots: Using Runtime Paths for Macro Analysis Mike Chen, Emre Kıcıman, Anthony Accardi, Armando Fox, Eric Brewer

Winter Retreat Motivation  Difficult to monitor and debug dynamic, distributed systems – Divide and conquer, layering, and replication disperse execution context throughout the system – E.g., Internet services, P2P, and sensor networks  Lots of existing low-level tools that help with debugging individual components, but not a collection of them  Need tools that help us understand how components/peers/sensors interact and depend on each other

Winter Retreat Macro Analysis  Macro analysis exploits non-local context to improve reliability and performance – Performance examples: Scout, ILP, Magpie – Statistical view is essential for large, complex systems  Beehive Analogy: – micro analysis allows you to understand the details of individual honeybee; macro analysis is needed to understand how the bees interact to keep the beehive functioning Open challenges macro analysis helps with: – Deducing system structure  Detecting and diagnosing application-level failures (improve MTTR) – Verifying macro invariants

Winter Retreat Our Approach  Use runtime paths to connect the dots! – dynamically captures the component interactions – Analyze many paths to get the overall system behavior  Components are only partially known (“gray boxes”) Apache Java Bean Java Bean Database Java Bean path paths

Winter Retreat Runtime Paths  Instrument code to dynamically trace requests through a system at the component/peer/sensor level – record call path + the runtime properties – e.g. components, latency, success/failure, and resources used to service each request  Use statistical analysis to detect and diagnose problems – e.g., data mining, machine learning, etc.  Runtime analysis tells you how the system is actually being used, not how it may be used  Complements existing micro analysis tools

Winter Retreat Reusable Analysis Framework Architecture  Tracer – Tags each request with a unique ID, and carries it with the request throughout the system – Report observations (component name + resource + performance properties) for each component  Aggregator + Repository – Reconstructs paths and stores them  Declarative Query Engine – Supports statistical queries on paths – Data mining and machine learning routines  Visualization A Tracer C B D E F Aggregator Path Repository Query Engine Visualization Developers/ Operators request

Winter Retreat Inferring System Structure  Key idea: paths directly capture application structure 2 requests Request types Database tables  Key idea: paths associate requests with internal state  Track shared state across requests

Winter Retreat Anomalies as Likely Failures  What is an anomaly? – Paths: deviant structures or latencies – Components: performance/behavior variation – Compared to historical norms, or current peers  Generic method of detecting likely errors 1 23 Database Component Behavior Variation

Winter Retreat Detecting Anomalies in Paths 1. Generate path traces from observations 2. Separate paths by request type 3. Cluster similar structures together 4a. Peer comparison: differences between clusters – But, differences are often normal 4b. Historical comparison: compare number, structure and size of clusters to history Type A Type B

Winter Retreat Future Directions: P2P and Sensors  Key idea: violation of macro invariants are signs of buggy implementation or intrusion  Message paths in P2P and sensor networks – a general mechanism to provide visibility into the collective behavior of multiple nodes micro or static approaches by themselves don’t work well in dynamic, distributed settings – e.g. algorithms have upper bounds on the # of hops Although hop count violation can be detected locally, paths help identify nodes that route messages incorrectly – e.g. detecting nodes that are slow or corrupt msgs

Winter Retreat Sensor Networks – Network Topology  Use message paths to infer network topology and membership – directed messaging may reduce resource consumption  Coping with limited bandwidth – each message records a subset of the nodes – statistically reconstruct the paths

Winter Retreat Conclusion  Macro analysis fills the need when monitoring and debugging systems where local context is of insufficient use  Runtime path-based approach dynamically traces request paths and statistically infer macro properties  A shared analysis framework that is reusable across many systems – Simplifies the construction of effective tools for other systems and the integration with recovery techniques like RR  – Paper includes a commercial example from Tellme! (thanks to Anthony Accardi and Mark Verber)