An open source middleware to build Redundant Array of Inexpensive Databases Emmanuel Cecchet.

Slides:



Advertisements
Similar presentations
Implementing Tableau Server in an Enterprise Environment
Advertisements

DynaTrace Platform.
Tableau Software Australia
Distributed Data Processing
Module 13: Performance Tuning. Overview Performance tuning methodologies Instance level Database level Application level Overview of tools and techniques.
ObjectWeb Architecture Meeting, 25 sept 03 Our experience of open source.
Database Architectures and the Web
High Availability Group 08: Võ Đức Vĩnh Nguyễn Quang Vũ
Adding scalability to legacy PHP web applications Overview Mario A. Valdez-Ramirez.
Objektorienteret Middleware Presentation 2: Distributed Systems – A brush up, and relations to Middleware, Heterogeneity & Transparency.
ManageEngine TM Applications Manager 8 Monitoring Custom Applications.
City University London
GGF Toronto Spitfire A Relational DB Service for the Grid Peter Z. Kunszt European DataGrid Data Management CERN Database Group.
Overview Distributed vs. decentralized Why distributed databases
EEC-681/781 Distributed Computing Systems Lecture 3 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
McGraw-Hill/Irwin Copyright © 2007 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 17 Client-Server Processing, Parallel Database Processing,
Definition of terms Definition of terms Explain business conditions driving distributed databases Explain business conditions driving distributed databases.
Module 14: Scalability and High Availability. Overview Key high availability features available in Oracle and SQL Server Key scalability features available.
Chapter 9 Overview  Reasons to monitor SQL Server  Performance Monitoring and Tuning  Tools for Monitoring SQL Server  Common Monitoring and Tuning.
Overview SAP Basis Functions. SAP Technical Overview Learning Objectives What the Basis system is How does SAP handle a transaction request Differentiating.
Sitefinity Performance and Architecture
Dynamics AX Technical Overview Application Architecture Dynamics AX Technical Overview.
© 2011 IBM Corporation 11 April 2011 IDS Architecture.
Hands-On Microsoft Windows Server 2008 Chapter 1 Introduction to Windows Server 2008.
Word Wide Cache Distributed Caching for the Distributed Enterprise.
Research on cloud computing application in the peer-to-peer based video-on-demand systems Speaker : 吳靖緯 MA0G rd International Workshop.
GORDA Kickoff meeting INRIA – Sardes project Emmanuel Cecchet Sara Bouchenak.
Highly available web sites with Tomcat and Clustered JDBC Emmanuel Cecchet.
SSI-OSCAR A Single System Image for OSCAR Clusters Geoffroy Vallée INRIA – PARIS project team COSET-1 June 26th, 2004.
JOnAS developer workshop – /02/2004 status Emmanuel Cecchet
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED SYSTEMS.
Oracle on Windows Server Introduction to Oracle10g on Microsoft Windows Server.
5 Chapter Five Web Servers. 5 Chapter Objectives Learn about the Microsoft Personal Web Server Software Learn how to improve Web site performance Learn.
/11/2003 C-JDBC: a High Performance Database Clustering Middleware Nicolas Modrzyk
© Emic 2005 Emmanuel Cecchet | ApacheCon EU Building Highly Available Database Applications for Apache Derby Emmanuel Cecchet.
Module 10: Monitoring ISA Server Overview Monitoring Overview Configuring Alerts Configuring Session Monitoring Configuring Logging Configuring.
Chokchai Junchey Microsoft Product Specialist Certified Technical Training Center.
Scalable Web Server on Heterogeneous Cluster CHEN Ge.
Copyright 2006 MySQL AB The World’s Most Popular Open Source Database MySQL Cluster: An introduction Geert Vanderkelen MySQL AB.
Week 5 Lecture Distributed Database Management Systems Samuel ConnSamuel Conn, Asst Professor Suggestions for using the Lecture Slides.
Consistent and Efficient Database Replication based on Group Communication Bettina Kemme School of Computer Science McGill University, Montreal.
Module 13 Implementing Business Continuity. Module Overview Protecting and Recovering Content Working with Backup and Restore for Disaster Recovery Implementing.
Applications Web et bases de données en grappe Séminaire InTech 3 Février 2005 – Grenoble.
Middleware for FIs Apeego House 4B, Tardeo Rd. Mumbai Tel: Fax:
Advanced Computer Networks Topic 2: Characterization of Distributed Systems.
Usenix Annual Conference, Freenix track – June 2004 – 1 : Flexible Database Clustering Middleware Emmanuel Cecchet – INRIA Julie Marguerite.
ArcGIS Server for Administrators
Issues Autonomic operation (fault tolerance) Minimize interference to applications Hardware support for new operating systems Resource management (global.
Kjell Orsborn UU - DIS - UDBL DATABASE SYSTEMS - 10p Course No. 2AD235 Spring 2002 A second course on development of database systems Kjell.
Databases Illuminated
Highly available database clusters with JDBC
Configuring and Troubleshooting Identity and Access Solutions with Windows Server® 2008 Active Directory®
1 e-Science AHM st Aug – 3 rd Sept 2004 Nottingham Distributed Storage management using SRB on UK National Grid Service Manandhar A, Haines K,
Nicolas Modrzyk /02/2004 C-JDBC: a High Performance Database Clustering.
Module 7: SQL Server Special Considerations. Overview SQL Server High Availability Unicode.
ATLAS Database Access Library Local Area LCG3D Meeting Fermilab, Batavia, USA October 21, 2004 Alexandre Vaniachine (ANL)
Chapter 1 Database Access from Client Applications.
CNAF Database Service Barbara Martelli CNAF-INFN Elisabetta Vilucchi CNAF-INFN Simone Dalla Fina INFN-Padua.
Replicazione e QoS nella gestione di database grid-oriented Barbara Martelli INFN - CNAF.
Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine and Mendel Rosenblum Presentation by Mark Smith.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED SYSTEMS.
Oracle Database Architectural Components
Introduction to Distributed Platforms
Open Source distributed document DB for an enterprise
Consulting Services JobScheduler Architecture Decision Template
A Technical Overview of Microsoft® SQL Server™ 2005 High Availability Beta 2 Matthew Stephen IT Pro Evangelist (SQL Server)
Storage Virtualization
Introduction of Week 6 Assignment Discussion
Database System Architectures
Presentation transcript:

An open source middleware to build Redundant Array of Inexpensive Databases Emmanuel Cecchet

/11/2003 Budget: 120 M€ (tax not incl.) 900 permanent staff 400 researchers 500 engineers, technical and administrative 450 researchers from other organizations 700 Ph.D students 200 external collaborators 750 trainees, post-doctoral students, visiting researchers from abroad (universities or industry)) 6 Research Units A scientific force of 3,000 INRIA key figures Jan A public scientific and technological research institute in computer science and control under the dual authority of the Ministry of Research and the Ministry of Industry INRIA Rhône-Alpes

/11/2003 Itanium-2 processors 104 nodes (Dual 64 bits 900 MHz processors, 3 GB memory, 72 GB local disk) connected through a Myrinet network 208 processors, 312 GB memory, 7.5 TB disk Connected to the GRID Linux OS (RedHat Advanced Server) First Linpack experiments at INRIA (Aug. 03) have reached 560 GFlop/s Applications : Grid computing, classical scientific computing, high performance Internet servers, … iCluster 2

/11/2003 ObjecWeb key figures è Open source middleware development è Based on open standard  J2EE, CORBA, OSGi… è International consortium  Founded by INRIA, Bull and FT R&D in 2001 è Academic partners  European universities and research centers è Industrial partners  RedHat, Suse, …  NEC, Bull, France Telecom, …

/11/2003 Common Software & Architecture for Component Based Development OSGi Hardware & OS Applications J2EECCM... Dedicated Middleware Platform Networked Resources JVM TransactionsPersistence Deployment... Composition... Application Servers Database Connectivity Interoperability Tools FrameworksComponents... JOnAS OpenCCM JEFREE ProActiive JORM/MEDOR JOTMOSCAR Fractal Think JORAM DotNetJ C-JDBC RmiJdbc Octopus Enhydra Zeus Kilim CAROL Jonathan Speedo RUBiS Bonita JAWE XMLC JMOB

/11/2003 Outline è Motivations è RAIDb è C-JDBC è Performance è Lessons learned è Conclusion

/11/2003 Why did we design ? è Scalability evaluation of J2EE servers  performance bounded by database even with a single server  how to compare middleware performance ?  how to evaluate clustering features in J2EE servers ? è Solutions  Large SMP machine: too expensive  Open source solution: do it yourself! è Features we wanted ordered by priority  scalability  on commodity hardware  using open source databases  fault tolerance (high availability + failover)  without modifying the client application

/11/2003 Internet How do we want to use ? è From small dynamic content web sites using a centralized open source database è To an end-to-end open source solution for large scale J2EE clustered application servers Apache MySQL

/11/2003 Sardes project objectives: Autonomic J2EE clusters Self administration and reconfiguration Apache MySQL JSP HTML Tomcat admin module Monitoring JMX JVM EJB JMX JVM JOnAS admin module C-JDBC admin module Fault tolerance policy Load balancing policy JMX Apache admin module Event channels SNMP Network QoS

/11/2003 Outline è Motivations è RAIDb è C-JDBC è Performance è Lessons learned è Conclusion

/11/2003 RAIDb - Definition è Redundant Array of Inexpensive Databases è better performance and fault tolerance than a single database, at a low cost, by combining multiple database instances into an array of databases è RAIDb levels offers various tradeoff of performance and fault tolerance

/11/2003 Key ideas è RAIDb controller  gives the view of a single database to the client  balance the load on the database backends è RAIDb levels  RAIDb-0: full partitioning  RAIDb-1: full mirroring  RAIDb-2: partial replication  composition possible

/11/2003 RAIDb levels è RAIDb-0  partitioning  no duplication and no fault tolerance  at least 2 nodes

/11/2003 RAIDb levels è RAIDb-1  mirroring  performance bounded by write broadcast  at least 2 nodes

/11/2003 RAIDb levels è RAIDb-1ec  mirroring + error checking  support Byzantine failures  error checking read request sent to multiple databases replies compared result returned only if a quorum is reached  at least 3 nodes

/11/2003 RAIDb levels è RAIDb-2  partial replication  at least 2 copies of each table  at least 3 nodes

/11/2003 RAIDb levels composition è RAIDb-0-1  RAIDb-0 at the top level  RAIDb-1 underneath

/11/2003 RAIDb levels composition è RAIDb-1-0  no limit to the composition deepness

/11/2003 Outline è Motivations è RAIDb è C-JDBC  Overview  Internals  Scalability  Checkpointing and Recovery è Performance è Lessons learned è Conclusion

/11/2003 C-JDBC – Key ideas è Middleware implementing RAIDb è Two components  generic JDBC 2.0 driver (C-JDBC driver)  C-JDBC Controller è C-JDBC Controller provides  performance scalability  high availability  failover  caching, logging, monitoring, … è Supports heterogeneous databases

/11/2003 C-JDBC – Overview

/11/2003 C-JDBC RAIDb-1 example è no client code modification è original PostgreSQL driver and RDBMS engine

/11/2003 C-JDBC RAIDb-2 example è supports cluster of heterogeneous RDBMS è unload a single Oracle DB with several MySQL

/11/2003 Outline è Motivations è RAIDb è C-JDBC  Overview  Internals  Scalability  Checkpointing and Recovery è Performance è Lessons learned è Conclusion

/11/2003 Inside the C-JDBC Controller Sockets JMX

/11/2003 Virtual Database è gives the view of a single database è establishes the mapping between the database name used by the application and the backend specific settings è backends can be added and removed dynamically è configured using an XML configuration file

/11/2003 Authentication Manager è Matches real login/password used by the application with backend specific login/ password è Administrator login to manage the virtual database

/11/2003 Scheduler è Manages concurrency control è Specific implementations for Single DB, RAIDb 0, 1 and 2 è Query-level è Optimistic and pessimistic transaction level  uses the database schema that is automatically fetched from backends

/11/2003 Request cache è caches results from SQL requests è improved SQL statement analysis to limit cache invalidations  table based invalidations  column based invalidations  single-row SELECT optimization è request parsing possible in the C-JDBC driver  offload the controller  parsing caching in the driver

/11/2003 Load balancer 1/2 è RAIDb-0  query directed to the backend having the needed tables è RAIDb-1  read executed by current thread  write executed in parallel by a dedicated thread per backend  result returned if one, majority or all commit  if one node fails but others succeed, failing node is disabled è RAIDb-2  same as RAIDb-1 except that writes are sent only to nodes owning the written table

/11/2003 Load balancer 2/2 è Static load balancing policies  Round-Robin (RR)  Weighted Round-Robin (WRR) è Least Pending Requests First (LPRF)  request sent to the node that has the shortest pending request queue  efficient if backends are homogeneous in terms of performance

/11/2003 Connection Manager è Connection pooling for a backend  Simple : no pooling  RandomWait : blocking pool  FailFast : non-blocking pool  VariablePool : dynamic pool è Connection pools defined on a per login basis  resource management per login  dedicated connections for admin

/11/2003 Recovery Log è Checkpoints are associated with database dumps è Record all updates and transaction markers since a checkpoint è Used to resynchronize a database from a checkpoint è JDBCRecoveryLog  store information in a database  can be re-injected in a C-JDBC cluster for fault tolerance

/11/2003 Outline è Motivations è RAIDb è C-JDBC  Overview  Internals  Scalability  Checkpointing and Recovery è Performance è Lessons learned è Conclusion

/11/2003 C-JDBC scalability è Horizontal scalability  prevents the controller to be a Single Point Of Failure (SPOF)  distributes the load among several controllers  uses group communications for synchronization è C-JDBC Driver  multiple controllers automatic failover jdbc:c-jdbc://node1:25322,node2:12345/myDB  connection caching  URL parsing/controller lookup caching

/11/2003 C-JDBC – Horizontal scalability è Writes broadcasted by JGroups è Each backend is accessed in write by only one controller but possibly shared by all for reads è global transaction id computed locally è Group commit only for write transactions

/11/2003 C-JDBC scalability è Vertical scalability  allows nested RAIDb levels  allows tree architecture for scalable write broadcast  necessary with large number of backends  C-JDBC driver re-injected in C-JDBC controller

/11/2003 C-JDBC vertical scalability è RAIDb-1-1 with C-JDBC è no limit to composition deepness

/11/2003 C-JDBC vertical scalability è RAIDb-0-1 with C-JDBC

/11/2003 Outline è Motivations è RAIDb è C-JDBC  Overview  Internals  Scalability  Checkpointing and Recovery è Performance è Lessons learned è Conclusion

/11/2003 Checkpointing è Octopus is an ETL tool è Use Octopus to store a dump of the initial database state è Currently done by the user using the database specific dump tool

/11/2003 Checkpointing è Backend is enabled è All database updates are logged (SQL statement, user, transaction, …)

/11/2003 Checkpointing è Add new backends while system online è Restore dump corresponding to initial checkpoint with Octopus

/11/2003 Checkpointing è Replay updates from the log

/11/2003 Checkpointing è Enable backends when done

/11/2003 Making new checkpoints è Disable one backend to have a coherent snapshot è Mark the new checkpoint entry in the log è Use Octopus to store the dump

/11/2003 Making new checkpoints è Replay missing updates from log

/11/2003 Making new checkpoints è Re-enable backend when done

/11/2003 Recovery è A node fails! è Automatically disabled but should be fixed or changed by administrator

/11/2003 Recovery è Restore latest dump with Octopus

/11/2003 Recovery è Replay missing updates from log

/11/2003 Recovery è Re-enable backend when done

/11/2003 Outline è Motivations è RAIDb è C-JDBC è Performance è Lessons learned è Conclusion

/11/2003 TPC-W è Browsing mix performance

/11/2003 TPC-W è Shopping mix performance

/11/2003 TPC-W è Ordering mix performance

/11/2003 Fine-grain caching è Cache hit rate with TPC-W  browsing mix  only one database backend Throughput Response time Hit rate No cache9.1 req/s3.30s Table12.9 req/s1.96s12.6% Column16 req/s1.36s48.8% Column+ single-row 16 req/s1.35s49.2%

/11/2003 Fine-grain caching è Cache hit rate with TPC-W  shopping mix  only one database backend Throughput Response time Hit rate No cache12.8 req/s3.11s Table13.5 req/s2.58s3.5% Column19.0 req/s0.93s30.0% Column+ single-row 20.2 req/s0.84s30.4%

/11/2003 Outline è Motivations è RAIDb è C-JDBC è Performance è Lessons learned è Conclusion

/11/2003 Why did we design ? … Reloaded è Features we wanted ordered by priority  scalability  on commodity hardware  using open source databases  fault tolerance (high availability + failover)  without modifying the client application

/11/2003 Why users are using ? è JDBC standard è Open source solution è Features they want ordered by priority  fault tolerance (high availability + failover) [was 4/5]  using open source existing databases [was 3/5]  on commodity hardware [was 2/5]  administration tools [was not]  security [was not]  scalability [was 1/5]  without modifying the client application [was 5/5]

/11/2003 How users are using ? è Hard to really know è Just default settings! è Most common usage  existing applications (Tomcat/JBoss/JOnAS) with one MySQL/Postgres backend  add a second backend for fault tolerance and scalability è For things it was not designed for  write mostly workloads  distributed databases  hosting centers (administration tools missing)

/11/2003 Lessons learned è Users do not use it for what it was first designed for  advanced features are never used  concerned about ease of use and TCO è Default settings are important è Good technology is necessary but not sufficient  [administration] tools are needed  minor bugs are ok for open source users

/11/2003 Open problems è Partition of clusters è Users want control on failure policy è Reconciliation must also be user controlled

/11/2003 Open problems è Opening the architecture to the users  user defined strategies when a fault or exception occurs  which interfaces/callbacks to provide ? è Monitoring  needed for more accurate load balancing algorithms è Benchmarking  need automatic evaluation of clustered servers  platform available: new INRIA 208 itanium-2 cluster è Sun Test Suite  should help strengthening C-JDBC code  interoperability with J2EE servers

/11/2003 Outline è Motivations è RAIDb è C-JDBC è Performance è Lessons learned è Conclusion

/11/2003 Current status è C-JDBC 1.0b15 release  Generic JDBC 2.0 driver  Schedulers and load balancers for RAIDb 0, 1 and 2  Fine grain query caching  JDBC recovery log  Logger/request player  Java installer  User documentation è Currently missing  Octopus integration  Recovery for horizontal scalability  RAIDb-1ec and RAIDb-2ec  Dynamic reconfiguration

/11/2003 Stats as of Nov 6, 03 è Downloads  total : > 8300 downloads since may 2003  last 30 days : > 2800 downloads  > hits since first release  2nd most downloaded ObjectWeb project è Mailing lists  101 subscribers  18 subscribers è Team  9 committers  1 full-time INRIA engineer

/11/2003 Conclusion è RAIDb  classification of replication techniques  difficult to publish è C-JDBC  open platform for database replication at the middleware level  RDBMS independent  no application modification required è Lot of features missing … join us !

Questions ? Visit

Bonus slides

/11/2003 Fine-grain caching è increased throughput è better response time è even with a single database backend

/11/2003 How do we build a community? è Necessary features (but not sufficient)  open source  standard API  responsiveness on the mailing list è Visibility  Web: slashdot, TheServerSide, freshmeat, …  Conferences: JAX, Middleware, LinuxWorld, ICAR, … è Our weak points  no detailed design documentation  beta phase

/11/2003 How do we interact with the user community? è only one mailing list è being very responsive on the mailing list  reply even if we don’t have a response yet  no direct communication with team but share everything on the mailing list è benefit from engineers who work 1 week full- time to evaluate C-JDBC for their corporation è plan every feature request in the task list

/11/2003 How do we interact with the developer community? è single user/developer mailing list è post all design questions/choices on the mailing list  most users use default settings  hard to get feedback about usage è very permissive to accept new commiters  8 committers (3 outside ObjectWeb – 2 full time)  2 contributors who didn’t want to become committers  no problem so far è involve people in testing

/11/2003 Lessons learned è Visibility  perpetual involvement  time consuming but necessary è Responsiveness to user queries  always on the mailing list  makes first impression for many users è Involve users in all decisions è Be open  source, CVS, contributions, patches, …

/11/2003 Freshmeat.net è Links to projects è Users can subscribe to be notified of new releases è Need to register project in all possible categories to have good visibility è One release per week is a good timing è 3 new subscribers with every release Slashdot Weekly release Friday release out of town