Team 3: Spam’n’Beans. 17-654: Analysis of Software Artifacts / 18-846: Dependability Analysis of Middleware. Gary Ackley, Andrew Boyer, Charles Fry, Philippe M. Wilson

1 Philippe

2 Team 3: Spam’n’Beans
17-654: Analysis of Software Artifacts / 18-846: Dependability Analysis of Middleware
Gary Ackley, Andrew Boyer, Charles Fry, Philippe M. Wilson

3 Team Members
– Gary Ackley
– Andrew Boyer
– Charles Fry
– Philippe M. Wilson

4 Background
Project: Spam’n’Beans
What is it?
– An e-mail content analysis system
– Benefits servers by offloading expensive analysis to our system
– Exhibits fault-tolerant, real-time, and high-performance qualities

5 Background
What makes it interesting?
– E-mail analysis is a real-world issue
   – Unsolicited Commercial E-mail (UCE), or spam, accounts for 60-80% of all e-mail traffic
   – Viruses pose a security risk
– Such analysis is often too expensive for high-volume mail servers
– Few products yet exist to address this need: Amavisd-new, Postini

7 Development Environment
Language: Java
Middleware: EJB (JBoss)
Database: PostgreSQL
Content filtering: SpamAssassin
Operating system: GNU/Linux
Test data: Public Corpus
– Database of real-world e-mail
– Made available by SpamAssassin

8 High-Level Overview
A client’s MTA (Mail Transfer Agent) uses the Spam’n’Beans client to send incoming messages to a cluster of replicas for filtering
Each replica runs all necessary content filters as daemon processes
The replica-side middleware accepts incoming e-mail from clients and feeds it to the appropriate daemon processes over a local socket connection
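
A minimal sketch of the replica-side hand-off to one content filter, assuming the filter is SpamAssassin's spamd daemon listening on a local TCP port (783 by default) and accepting the SPAMC protocol's CHECK command. `SpamdClient` and its methods are hypothetical names; the slides do not show the project's middleware code.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical replica-side hand-off to a local spamd process.
 * Assumption: spamd accepts "CHECK" requests framed with a Content-length header.
 */
final class SpamdClient {
    private SpamdClient() {}

    /** Builds a CHECK request: request line, Content-length header, blank line, raw message. */
    static byte[] checkRequest(byte[] message) {
        String head = "CHECK SPAMC/1.2\r\n"          // protocol version string is an assumption
                + "Content-length: " + message.length + "\r\n"
                + "\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        byte[] request = new byte[headBytes.length + message.length];
        System.arraycopy(headBytes, 0, request, 0, headBytes.length);
        System.arraycopy(message, 0, request, headBytes.length, message.length);
        return request;
    }

    /** Reads the "Spam: True ; 15.0 / 5.0" header of a spamd reply. */
    static boolean isSpam(String reply) {
        for (String line : reply.split("\r\n")) {
            if (line.startsWith("Spam:")) {
                return line.substring("Spam:".length()).trim().startsWith("True");
            }
        }
        throw new IllegalArgumentException("spamd reply has no Spam header: " + reply);
    }

    /** Sends one message over a fresh local socket and returns spamd's verdict. */
    static boolean check(int port, byte[] message) throws IOException {
        try (Socket socket = new Socket("127.0.0.1", port)) {
            OutputStream out = socket.getOutputStream();
            out.write(checkRequest(message));
            out.flush();
            socket.shutdownOutput();                             // no more request bytes
            byte[] reply = socket.getInputStream().readAllBytes(); // assumes spamd closes after replying
            return isSpam(new String(reply, StandardCharsets.US_ASCII));
        }
    }
}
```

Usage, assuming spamd runs on the same host as the replica: `boolean spam = SpamdClient.check(783, rawMessageBytes);`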

9 High-Level Overview

10 Baseline Architecture

11 Gary

12 Fault-Tolerance Goals
E-mail processing servers are replicated so that the service stays available despite a fault on any one replica
– The system remains available despite up to N-1 faults (N is the number of replicas)
– Clients keep retrying while no replicas are active
State, stored in a remote database, consists of:
– Replica state and statistics
– Client authentication information
– Message state and statistics
Client requests are idempotent
Non-replicated components:
– Replication Manager / Fault Detector
– Database backend
– EJB name server

13 Fault-Tolerance Elements
Replication Manager
– A process that starts/stops replicas and manages the list of available replicas
Fault Detector
– A dedicated thread monitors each replica
Fault Recovery
– The monitor thread restarts a replica automatically as needed
Fault Injector
– A separate script used during testing
– Forcefully kills a random replica every S seconds
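
The Fault Injector is described only as a script that kills a random replica every S seconds. A Java sketch of the same idea, assuming the caller supplies the current replica list and the kill action (for replicas started as child processes, `Process::destroyForcibly` fits `Consumer<Process>`). `FaultInjector` and its methods are hypothetical names, not the team's script.

```java
import java.util.List;
import java.util.Random;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Hypothetical fault injector for test runs: every S seconds it forcefully
 * kills one replica chosen uniformly at random. How a replica is killed is pluggable.
 */
final class FaultInjector<R> {
    private final Supplier<List<R>> replicas; // current replicas, re-read on every tick
    private final Consumer<R> kill;
    private final Random random;
    private ScheduledExecutorService timer;

    FaultInjector(Supplier<List<R>> replicas, Consumer<R> kill, Random random) {
        this.replicas = replicas;
        this.kill = kill;
        this.random = random;
    }

    /** One injection: kills a random replica and returns it, or null if none is running. */
    R injectOnce() {
        List<R> current = replicas.get();
        if (current.isEmpty()) {
            return null;
        }
        R victim = current.get(random.nextInt(current.size()));
        kill.accept(victim);
        return victim;
    }

    /** Starts injecting every periodSeconds (the S of the slide). */
    synchronized void start(long periodSeconds) {
        timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(this::injectOnce, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    synchronized void stop() {
        if (timer != null) {
            timer.shutdownNow();
        }
    }
}
```

Passing a `Random` with a fixed seed makes a test run's kill sequence reproducible for a given replica list.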

14 FT-Baseline Architecture

15 Fault Detection
The client application receives an exception and reports it to the Replication Manager
– From EJB (RemoteException)
– From the server application (Fatal Exception, Non-Fatal Exception)
Periodic ping by the Fault Detector
– K failed pings initiate a replica restart
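
The "K failed pings initiate a restart" rule can be sketched as the body of the per-replica monitor thread. This sketch assumes the K failures must be consecutive (a successful ping resets the count), which matches "unreachable for K pings" on slide 26; `ReplicaMonitor`, `ping`, and `restart` are hypothetical names.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical per-replica monitor thread body: pings the replica every
 * periodMillis and asks for a restart after K consecutive failed pings.
 */
final class ReplicaMonitor implements Runnable {
    private final BooleanSupplier ping;   // true if the replica answered the ping
    private final Runnable restart;       // e.g. ask the Replication Manager to restart it
    private final int k;                  // failed pings in a row that trigger a restart
    private final long periodMillis;
    private int consecutiveFailures = 0;  // in normal use, accessed only by the monitor thread
    private volatile boolean running = true;

    ReplicaMonitor(BooleanSupplier ping, Runnable restart, int k, long periodMillis) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be >= 1");
        }
        this.ping = ping;
        this.restart = restart;
        this.k = k;
        this.periodMillis = periodMillis;
    }

    /** Records one ping result; returns true if this result triggered a restart. */
    boolean onPing(boolean answered) {
        if (answered) {
            consecutiveFailures = 0;      // transient failures are forgiven
            return false;
        }
        consecutiveFailures++;
        if (consecutiveFailures >= k) {
            consecutiveFailures = 0;
            restart.run();
            return true;
        }
        return false;
    }

    @Override
    public void run() {
        while (running) {
            onPing(ping.getAsBoolean());
            try {
                Thread.sleep(periodMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    void stop() {
        running = false;
    }
}
```

One thread per replica, as on slide 13, would be `new Thread(new ReplicaMonitor(() -> pingReplica(r), () -> restartReplica(r), 3, 1000)).start();`, where `pingReplica` and `restartReplica` are hypothetical helpers.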

16 Client-Side Fail-Over
Notify the Replication Manager of the replica failure
Request another replica
– Retry if none are available
Connect to the new replica and re-issue the original request
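
The fail-over steps map onto a short loop. A sketch under stated assumptions: `ReplicaDirectory` stands in for the Replication Manager's client-facing interface, `RemoteCall` for one EJB invocation, and `maxAttempts` is an added cap (the slides say the client keeps retrying). All three names are hypothetical. Re-issuing the request is safe only because client requests are idempotent (slide 12).

```java
import java.util.Optional;

/** Hypothetical client-side view of the Replication Manager. */
interface ReplicaDirectory<R> {
    Optional<R> requestReplica();   // empty when no replica is currently live
    void reportFailure(R replica);  // client-side fault report
}

/** One remote call against a given replica (e.g. an EJB method invocation). */
@FunctionalInterface
interface RemoteCall<R, A> {
    A invoke(R replica) throws Exception;
}

/** Minimal fail-over loop; relies on the request being idempotent. */
final class FailoverClient<R, A> {
    private final ReplicaDirectory<R> directory;
    private final long retryDelayMillis;
    private final int maxAttempts; // assumption: added cap, not from the slides

    FailoverClient(ReplicaDirectory<R> directory, long retryDelayMillis, int maxAttempts) {
        this.directory = directory;
        this.retryDelayMillis = retryDelayMillis;
        this.maxAttempts = maxAttempts;
    }

    A call(RemoteCall<R, A> request) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<R> replica = directory.requestReplica();
            if (!replica.isPresent()) {
                pause(retryDelayMillis);  // no live replica: wait, then ask again
                continue;
            }
            try {
                return request.invoke(replica.get());  // (re-)issue the original request
            } catch (Exception e) {
                directory.reportFailure(replica.get()); // notify, then fail over
            }
        }
        throw new IllegalStateException("no replica answered after " + maxAttempts + " attempts");
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new IllegalStateException("interrupted while waiting for a live replica", e);
        }
    }
}
```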

17 Fail-Over Measurements
[Plot: time (ms) vs. message #]

18 Charles

19 Real-Time Baseline
Bounded fail-over achieved by:
– Removing replicas from the pool when
   – the client disables use of a replica after receiving an exception, or
   – the Fault Detector identifies an unresponsive replica
– Only choosing live replicas on fail-over
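
Removing a replica from the pool on the first report is what keeps a fail-over from landing on a replica already known to be dead. A minimal sketch, assuming one thread-safe list shared by the client-report path and the Fault Detector; `LiveReplicaPool` is a hypothetical name.

```java
import java.util.Optional;
import java.util.concurrent.CopyOnWriteArrayList;

/**
 * Hypothetical live-replica pool. A replica leaves the pool as soon as a client
 * reports an exception for it or the Fault Detector finds it unresponsive.
 */
final class LiveReplicaPool<R> {
    private final CopyOnWriteArrayList<R> live = new CopyOnWriteArrayList<>();

    /** Called when a replica is (re)started. */
    void markLive(R replica) {
        live.addIfAbsent(replica);
    }

    /** Called on a client report or by the Fault Detector. */
    void markDead(R replica) {
        live.remove(replica);
    }

    /** Fail-over choice: the first replica still believed live, if any. */
    Optional<R> pickLive() {
        return live.stream().findFirst();
    }
}
```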

20 Bounded Fail-Over Measurements
Fail-over is now bounded by 600 ms
Fail-over time reduced by one order of magnitude
[Plot: time (ms) vs. message #]

21 Performance Strategy
Clustering
– Any middle-tier replica can handle any request
– All replicas handle requests in parallel
Load Balancing
– Minimizes response latency
– Adjusts to static system resources and dynamic system utilization

22 Load Balancer Implementation
Load Balancer on the golden machine
– Maintains a list of all live replicas and their associated load
– Replica load is updated by the Fault Detector ping
Clients request replicas from the Load Balancer
– Every M messages
Load-balancing strategies:
– Round-Robin
– Priority (inversely proportional to relative CPU load)
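
A sketch of the two load-balancing strategies and of the client's "every M messages" rule. Assumptions: replicas are identified by strings, the Fault Detector ping stores each replica's CPU load as a number in a map, and "priority" is read as weighted random selection with weight 1/load (normalizing loads to relative values does not change those ratios). All class names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Picks one replica id from the live list, given each replica's last reported CPU load. */
interface LoadBalancingStrategy {
    String choose(List<String> liveReplicas, Map<String, Double> cpuLoad);
}

/** Round-Robin: cycle through the live replicas in order, ignoring load. */
final class RoundRobin implements LoadBalancingStrategy {
    private final AtomicInteger next = new AtomicInteger();

    @Override
    public String choose(List<String> liveReplicas, Map<String, Double> cpuLoad) {
        if (liveReplicas.isEmpty()) throw new IllegalStateException("no live replicas");
        return liveReplicas.get(Math.floorMod(next.getAndIncrement(), liveReplicas.size()));
    }
}

/** Priority: selection probability proportional to 1 / CPU load. */
final class InverseLoadPriority implements LoadBalancingStrategy {
    private static final double MIN_LOAD = 0.01; // keeps an idle replica's weight finite
    private final Random random;

    InverseLoadPriority(Random random) {
        this.random = random;
    }

    @Override
    public String choose(List<String> liveReplicas, Map<String, Double> cpuLoad) {
        if (liveReplicas.isEmpty()) throw new IllegalStateException("no live replicas");
        double[] weights = new double[liveReplicas.size()];
        double total = 0;
        for (int i = 0; i < weights.length; i++) {
            // a replica with no reported load yet is treated as fully loaded (1.0)
            double load = Math.max(cpuLoad.getOrDefault(liveReplicas.get(i), 1.0), MIN_LOAD);
            weights[i] = 1.0 / load;
            total += weights[i];
        }
        double x = random.nextDouble() * total;
        for (int i = 0; i < weights.length; i++) {
            x -= weights[i];
            if (x < 0) return liveReplicas.get(i);
        }
        return liveReplicas.get(weights.length - 1); // floating-point rounding guard
    }
}

/** Client side: ask the Load Balancer for a new replica only every M messages. */
final class ReplicaLease {
    private final int m;
    private int sent = 0;
    private String current;

    ReplicaLease(int m) {
        this.m = m;
    }

    String replicaFor(Supplier<String> loadBalancer) {
        if (current == null || sent % m == 0) current = loadBalancer.get();
        sent++;
        return current;
    }

    /** Forces a fresh request on the next message, e.g. after a fail-over. */
    void invalidate() {
        current = null;
    }
}
```

Weighted random choice, rather than always picking the least-loaded replica, avoids sending every client to the same replica between two ping updates of the load values.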

23 Round-Robin Performance
[Plot: time (ms) vs. message #]

24 Priority-Based Performance
[Plot: time (ms) vs. message #]

25 Andrew

26 Other Features
Multi-threaded administrative console
Run-time replica management
– Individual replicas can be added/removed as needed
Run-time selection of load-balancing strategy
Optimization for transient failures
– Don’t restart a replica until it has been unreachable for K pings
– Verify client-reported errors
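
"Verify client-reported errors" can be read as: before acting on a client's report, the Replication Manager pings the replica itself, and only a failed ping removes it. A sketch under that assumption; `FailureReportHandler`, `ping`, and `markDead` are hypothetical names.

```java
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical handler that double-checks a client's failure report before acting on it. */
final class FailureReportHandler<R> {
    private final Predicate<R> ping;     // true if the replica answers a direct ping
    private final Consumer<R> markDead;  // e.g. remove the replica from the live pool

    FailureReportHandler(Predicate<R> ping, Consumer<R> markDead) {
        this.ping = ping;
        this.markDead = markDead;
    }

    /** Returns true if the report was confirmed and the replica was removed. */
    boolean onClientReport(R replica) {
        if (ping.test(replica)) {
            return false;               // replica answers: treat the report as transient
        }
        markDead.accept(replica);
        return true;
    }
}
```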

27 Insights from Measurements
The system bottleneck is CPU-intensive analysis
Message processing time is highly correlated with message size
Increases in system load cause temporary increases in jitter and delay
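
The slide does not report a correlation coefficient. One way to put a number on "highly correlated" is Pearson's r over per-message (size, processing time) pairs taken from the measurement logs; `Correlation.pearson` is a generic, hypothetical helper, not the team's analysis code.

```java
/** Pearson correlation coefficient r between two equal-length samples. */
final class Correlation {
    private Correlation() {}

    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two samples of equal length >= 2");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy); // NaN if either sample is constant
    }
}
```

An r close to 1 for (message size, processing time) would support the slide's claim; a value near 0 would contradict it.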

28 Fixed Big Messages (~90 KB)
[Plot: time (ms) vs. message #]

29 Variable-Sized Messages
[Plot: time (ms) vs. message #]

30 Fixed Small Messages (~0.4 KB)
[Plot: time (ms) vs. message #]

31 Open Issues
Multiple simultaneous replica connections
Increase throughput
– Experiment with other load-balancing strategies
– Add automatic capacity scaling
– Enqueue client requests
Add virus checking (via ClamAV)
Remove single points of failure
Enhance administrative consoles
– Add graphical/web interface

32 Conclusions
What did we learn?
– Tradeoffs between fault tolerance, real-time behavior, and performance can be difficult to manage
What did we accomplish?
– We built a working system with fault-tolerance, real-time, and high-performance attributes to solve a real-world problem
What would we do differently now?
– Start with a better architecture definition
– Adhere to the “KISS” principle

33 Q & A
Any questions?