Progress Report Armando Fox with George Candea, James Cutler, Ben Ling, Andy Huang.

© 2002 Armando Fox
Philosophical Direction
- Use only dynamic, observed behavior to determine recovery technique/policy
- Application-independent recovery techniques
- Specialize designs for fast recovery
- Putting it all together: all software should be crash-only

Dynamic, Observed Behavior
- A priori fault models are suspect. Base recovery strategy only on dynamically observed behavior.
  - Behavior may change as system or workload evolves => addresses a key difference between Internet-oriented ROC systems and traditional mission-critical systems
- Kinds of observations
  - PinPoint: use statistical analysis to determine which groups of components are correlated with observed external faults
  - Automatic failure-propagation inference: use fault injection and tracing to determine propagation paths and extent of different kinds of faults
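The idea of correlating components with observed external faults can be sketched in a few lines. This is a minimal illustration, not the PinPoint implementation: the trace format, the function name `failure_correlation`, and the Jaccard-style score are all assumptions chosen to keep the example small.

```python
from collections import defaultdict


def failure_correlation(traces):
    """Rank components by how strongly they co-occur with failed requests.

    traces: iterable of (components, failed) pairs (hypothetical format), where
    components is the set of component names a request touched and failed is
    the externally observed outcome of that request.
    """
    seen = defaultdict(int)         # requests that touched the component
    failed_with = defaultdict(int)  # failed requests that touched it
    total_failed = 0
    for components, failed in traces:
        if failed:
            total_failed += 1
        for c in components:
            seen[c] += 1
            if failed:
                failed_with[c] += 1
    scores = {}
    for c in seen:
        # Jaccard-style score (an assumption): |uses c AND failed| / |uses c OR failed|
        union = seen[c] + total_failed - failed_with[c]
        scores[c] = failed_with[c] / union if union else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


traces = [
    ({"web", "cart"}, False),
    ({"web", "pay"}, True),
    ({"web", "pay", "cart"}, True),
    ({"web", "search"}, False),
]
ranking = failure_correlation(traces)
```

Here `pay` ranks first because every request that touched it failed and every failure touched it, while `web` scores lower because it also appears in successful requests. No fault model is needed, only observed outcomes.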

Making techniques application-generic
- True application-generic recovery is hard [Lowell & Chen]
  - But that’s because “generic” applications are too unconstrained
- Idea: if an application uses a particular “rich runtime”, that runtime may constrain application structure
- Example: J2EE, a widely used enterprise app. framework
  - Modular Java applications, well-defined component boundaries
  - Rich runtime system (“application server”) provides services for deployment/undeployment, naming, load balancing, integration with Web servers & databases, etc.
  - Instrument the platform with generic methods for fault injection and recovery (e.g., using Recursive Restartability)
  - Generic mechanisms: timeouts, exception propagation
  - Parametrizable mechanisms: progress counters, application-level pings
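A progress counter is one of the parametrizable mechanisms listed above: the platform supplies the detection logic, and the application supplies only a counter and a timeout. The sketch below assumes a hypothetical `ProgressWatchdog` class; nothing here is a J2EE or JBoss API.

```python
import time


class ProgressWatchdog:
    """Flags a component as stalled when its progress counter stops advancing.

    read_counter and stall_timeout are the application-specific parameters;
    the detection logic itself is generic. All names are illustrative.
    """

    def __init__(self, read_counter, stall_timeout, clock=time.monotonic):
        self.read_counter = read_counter
        self.stall_timeout = stall_timeout
        self.clock = clock
        self.last_value = read_counter()
        self.last_change = clock()

    def check(self):
        """Return True if healthy, False if no progress within stall_timeout."""
        now = self.clock()
        value = self.read_counter()
        if value != self.last_value:
            # The counter advanced, so the component is making progress.
            self.last_value = value
            self.last_change = now
            return True
        return (now - self.last_change) < self.stall_timeout
```

A supervisor would call `check()` periodically and trigger recovery (for example a restart) when it returns False. The clock is injectable so the behavior can be tested without sleeping.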

Example: Automatic Failure Propagation Inference
- When a failure occurs in a particular software component of an application, how far does it propagate?
  - i.e., what part(s) of the application must be recovered
  - Traditionally, failure propagation information is derived by hand
- Our approach: modify J2EE application server to allow capture of failure-propagation information in any J2EE app
- Automatic Failure-Propagation Inference (AFPI) for JBoss:
  + automatically and dynamically generates f-maps (failure-propagation maps) with no performance overhead
  + no application knowledge required
  + finds dependencies that other analyses might miss, omits “false” dependencies that don’t result in actual failure propagation
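The inference loop described above can be sketched as: inject a fault into each component, record which other components are observed to fail, and treat the result as a propagation graph. This is a simplified illustration, not the JBoss-based AFPI code; `infer_f_map`, `recovery_set`, and the `inject_and_observe` callback are hypothetical names, and the callback stands in for real fault injection and tracing.

```python
def infer_f_map(components, inject_and_observe):
    """Build a failure-propagation map from fault-injection experiments.

    inject_and_observe(c) is a caller-supplied hook (an assumption) that
    injects a fault into component c and returns the set of components
    observed to fail as a direct result.
    """
    f_map = {}
    for c in components:
        observed = set(inject_and_observe(c))
        observed.discard(c)  # a component trivially "propagates" to itself
        f_map[c] = observed
    return f_map


def recovery_set(f_map, failed):
    """Return every component reachable from `failed` along propagation edges."""
    affected = {failed}
    to_visit = [failed]
    while to_visit:
        c = to_visit.pop()
        for nxt in f_map.get(c, ()):
            if nxt not in affected:
                affected.add(nxt)
                to_visit.append(nxt)
    return affected


# Simulated system: a db fault breaks cart, a cart fault breaks checkout.
simulated = {"db": {"cart"}, "cart": {"checkout"}, "checkout": set(), "search": set()}
f_map = infer_f_map(simulated, lambda c: simulated[c])
```

With this map, `recovery_set(f_map, "db")` yields `db`, `cart`, and `checkout`, while a fault in `search` requires recovering only `search`. Because edges come from observed failures, a static dependency that never propagates a failure does not appear.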

Design for Fast Recovery
- Recursive Restartability as a technique for recovery assumes...
  - For correctness: all components are independent and restartable (i.e., no data loss or other bad effects)
  - For performance: restarts are relatively fast
- For stateless components, this is “easy”; what about stateful components?
  - Correctness: e.g., filesystems may suffer data loss if the OS is not cleanly shut down
  - Performance: e.g., commercial RDBMSs are crash-safe, but take a long time (minutes to hours) to recover
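As a rough illustration of why both assumptions matter, the sketch below restarts the smallest group containing the suspected component and escalates to the enclosing group only if the failure persists. The tree structure and the names `RestartNode` and `recover` are assumptions made for this example, not code from the project.

```python
class RestartNode:
    """A node in a restart tree; leaves are individually restartable components."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def leaves(self):
        if not self.children:
            return [self.name]
        return [leaf for child in self.children for leaf in child.leaves()]


def recover(node, restart, is_healthy):
    """Restart the subtree at `node`, escalating to ancestors while unhealthy.

    restart(name) and is_healthy() are caller-supplied hooks (hypothetical).
    Returns the name of the node whose restart cured the failure, or None.
    """
    while node is not None:
        for leaf in node.leaves():
            restart(leaf)
        if is_healthy():
            return node.name
        node = node.parent
    return None


cart = RestartNode("cart")
pay = RestartNode("pay")
backend = RestartNode("backend", [cart, pay])
web = RestartNode("web")
app = RestartNode("app", [web, backend])

log = []
cured_by = recover(pay, log.append, lambda: "cart" in log)
```

In this run the fault is really in `cart`, so restarting `pay` alone does not help and recovery escalates to `backend`, leaving `web` untouched. Each escalation is only safe if restarts lose no data, and only worthwhile if restarts are fast.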

Fast-Recovering State Stores
- Isolate state exclusively in state store components; make all other “application logic” components stateless
- Instead of building a general state store, specialize it for its intended use
  - Goal: identify combination of specializations that facilitates construction of a very-large-scale state store (O(10^3) requests/sec on O(10^6) entries) with near-zero recovery time
- Possible axes for specialization…
  - Is state shared across clients or not? (user profile/session state vs. updating a message board)
  - How powerful must the query API be? (single-key lookup, free-text search, fully relational…)
  - What is the intended lifetime of state? (short/session, long/forever)

Putting it together: crash-only software
- Already assumed: software must be able to recover from a crash rapidly and correctly
- But if it can do that…then why include separate code paths for “clean shutdown”?
- All software should be crash-only; this makes it robust, easy to administer/upgrade, and amenable to Recursive Restartability (RR) as a recovery technique (among others)
- Current explorations:
  - RR-ifying the platform (J2EE appserver) vs. individual applications
  - Improving ability to detect anomalies and failure correlations using path-based statistical analysis
  - Designing crash-only state stores for both session state and persistent state
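One way to picture a crash-only component is a store with a single start path (recovery) and no shutdown path at all. The sketch below is an assumption-laden toy, not one of the state stores under exploration: `CrashOnlyStore`, its append-only JSON log, and the per-write `fsync` are illustrative choices.

```python
import json
import os


class CrashOnlyStore:
    """Toy key-value store: starting always means recovering; stopping means crashing."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        self._recover()
        self.log = open(path, "ab")

    def _recover(self):
        """Replay the log and cut off any torn record left by a crash mid-write."""
        if not os.path.exists(self.path):
            return
        good_bytes = 0
        with open(self.path, "rb") as f:
            for line in f:
                try:
                    if not line.endswith(b"\n"):
                        raise ValueError("incomplete record")
                    key, value = json.loads(line)
                except (ValueError, TypeError):
                    break  # everything from here on is an unfinished write
                self.data[key] = value
                good_bytes += len(line)
        with open(self.path, "r+b") as f:
            f.truncate(good_bytes)

    def put(self, key, value):
        record = json.dumps([key, value]).encode("utf-8") + b"\n"
        self.log.write(record)
        self.log.flush()
        os.fsync(self.log.fileno())  # durable before the write is acknowledged
        self.data[key] = value

    def get(self, key, default=None):
        return self.data.get(key, default)
```

There is deliberately no `close()` or `shutdown()` method: killing the process is the only way to stop the store, so the recovery path is exercised on every start rather than only after rare failures. The cost in this toy version is a replay proportional to log size, which a design aiming at near-zero recovery time would need to avoid.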

Outrageous Opinions session tomorrow
- Tomorrow after dinner: controversial ideas/opinions, open challenges, predicting the future, ...
  - Please sign up on easel (coming this afternoon)
  - ~5-8 minutes per person to pound the pulpit and stimulate later discussion
- Retreat proceedings, slides, etc. (mostly) online
  - Internet keyword “retreat” :-) or or