An open source middleware to build Redundant Array of Inexpensive Databases Emmanuel Cecchet.


1 An open source middleware to build Redundant Array of Inexpensive Databases
Emmanuel Cecchet

2 INRIA key figures (Jan. 2003)
http://c-jdbc.objectweb.org/ - c-jdbc@objectweb.org - 07/11/2003
INRIA Rhône-Alpes
A public scientific and technological research institute in computer science and control, under the dual authority of the Ministry of Research and the Ministry of Industry
- 6 Research Units
- A scientific force of 3,000
  - 900 permanent staff: 400 researchers; 500 engineers, technical and administrative
  - 450 researchers from other organizations
  - 700 Ph.D. students
  - 200 external collaborators
  - 750 trainees, post-doctoral students, visiting researchers from abroad (universities or industry)
- Budget: 120 M€ (tax not incl.)

3 iCluster 2
- Itanium-2 processors
- 104 nodes (dual 64-bit 900 MHz processors, 3 GB memory, 72 GB local disk) connected through a Myrinet network
- 208 processors, 312 GB memory, 7.5 TB disk
- Connected to the GRID
- Linux OS (RedHat Advanced Server)
- First Linpack experiments at INRIA (Aug. 03) have reached 560 GFlop/s
- Applications: Grid computing, classical scientific computing, high performance Internet servers, …

4 ObjectWeb key figures
- Open source middleware development
- Based on open standards
  - J2EE, CORBA, OSGi…
- International consortium
  - Founded by INRIA, Bull and FT R&D in 2001
- Academic partners
  - European universities and research centers
- Industrial partners
  - RedHat, Suse, …
  - NEC, Bull, France Telecom, …

5 Common Software & Architecture for Component Based Development
[Diagram. Layer labels: Hardware & OS, Networked Resources, JVM, Dedicated Middleware Platform, Applications, J2EE, CCM, OSGi, Transactions, Persistence, Deployment, Composition, Application Servers, Database Connectivity, Interoperability, Tools, Frameworks, Components. Project names: JOnAS, OpenCCM, JEFREE, ProActive, JORM/MEDOR, JOTM, OSCAR, Fractal, Think, JORAM, DotNetJ, C-JDBC, RmiJdbc, Octopus, Enhydra, Zeus, Kilim, CAROL, Jonathan, Speedo, RUBiS, Bonita, JAWE, XMLC, JMOB]

6 Outline
- Motivations
- RAIDb
- C-JDBC
- Performance
- Lessons learned
- Conclusion

7 Why did we design C-JDBC?
- Scalability evaluation of J2EE servers
  - performance bounded by database even with a single server
  - how to compare middleware performance?
  - how to evaluate clustering features in J2EE servers?
- Solutions
  - Large SMP machine: too expensive
  - Open source solution: do it yourself!
- Features we wanted ordered by priority
  - scalability
  - on commodity hardware
  - using open source databases
  - fault tolerance (high availability + failover)
  - without modifying the client application

8 How do we want to use C-JDBC?
- From small dynamic content web sites using a centralized open source database
- To an end-to-end open source solution for large scale J2EE clustered application servers
[Diagram labels: Internet, Apache, MySQL]

9 Sardes project objectives: Autonomic J2EE clusters
- Self administration and reconfiguration
[Diagram labels: Apache, MySQL, JSP, HTML, Tomcat admin module, Monitoring, JMX, JVM, EJB, JOnAS admin module, C-JDBC admin module, Fault tolerance policy, Load balancing policy, Apache admin module, Event channels, SNMP, Network QoS]

10 Outline
- Motivations
- RAIDb
- C-JDBC
- Performance
- Lessons learned
- Conclusion

11 RAIDb - Definition
- Redundant Array of Inexpensive Databases
- better performance and fault tolerance than a single database, at a low cost, by combining multiple database instances into an array of databases
- RAIDb levels offer various tradeoffs between performance and fault tolerance

12 Key ideas
- RAIDb controller
  - gives the view of a single database to the client
  - balances the load on the database backends
- RAIDb levels
  - RAIDb-0: full partitioning
  - RAIDb-1: full mirroring
  - RAIDb-2: partial replication
  - composition possible

13 RAIDb levels
- RAIDb-0
  - partitioning
  - no duplication and no fault tolerance
  - at least 2 nodes

14 RAIDb levels
- RAIDb-1
  - mirroring
  - performance bounded by write broadcast
  - at least 2 nodes

15 RAIDb levels
- RAIDb-1ec
  - mirroring + error checking
  - supports Byzantine failures
  - error checking
    - read request sent to multiple databases
    - replies compared
    - result returned only if a quorum is reached
  - at least 3 nodes
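The error-checking read described on this slide can be illustrated with a minimal Python sketch. This is not C-JDBC code: the function name, the representation of a backend as a plain callable, and the choice to query every backend (the slide only says "multiple") are assumptions made for illustration.

```python
from collections import Counter


def read_with_error_checking(backends, query, quorum):
    """Run `query` on every backend and return a result only if a quorum agrees.

    `backends` is a list of callables standing in for database connections
    (hypothetical); each takes the query and returns a hashable result.
    """
    replies = [backend(query) for backend in backends]
    # Count identical replies and keep the most frequent one.
    result, votes = Counter(replies).most_common(1)[0]
    if votes >= quorum:
        return result
    raise RuntimeError("no quorum: only %d of %d replies agree" % (votes, len(replies)))
```

With three mirrors and a quorum of two, one backend returning a wrong answer is outvoted by the other two, which is why this level needs at least three nodes.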

16 RAIDb levels
- RAIDb-2
  - partial replication
  - at least 2 copies of each table
  - at least 3 nodes

17 RAIDb levels composition
- RAIDb-0-1
  - RAIDb-0 at the top level
  - RAIDb-1 underneath

18 RAIDb levels composition
- RAIDb-1-0
  - no limit to the composition depth

19 Outline
- Motivations
- RAIDb
- C-JDBC
  - Overview
  - Internals
  - Scalability
  - Checkpointing and Recovery
- Performance
- Lessons learned
- Conclusion

20 C-JDBC – Key ideas
- Middleware implementing RAIDb
- Two components
  - generic JDBC 2.0 driver (C-JDBC driver)
  - C-JDBC Controller
- C-JDBC Controller provides
  - performance scalability
  - high availability
  - failover
  - caching, logging, monitoring, …
- Supports heterogeneous databases

21 C-JDBC – Overview

22 C-JDBC RAIDb-1 example
- no client code modification
- original PostgreSQL driver and RDBMS engine

23 C-JDBC RAIDb-2 example
- supports clusters of heterogeneous RDBMS
- offload a single Oracle DB with several MySQL backends

24 Outline
- Motivations
- RAIDb
- C-JDBC
  - Overview
  - Internals
  - Scalability
  - Checkpointing and Recovery
- Performance
- Lessons learned
- Conclusion

25 Inside the C-JDBC Controller
[Diagram labels: Sockets, JMX]

26 Virtual Database
- gives the view of a single database
- establishes the mapping between the database name used by the application and the backend specific settings
- backends can be added and removed dynamically
- configured using an XML configuration file

27 Authentication Manager
- Matches the real login/password used by the application with the backend specific login/password
- Administrator login to manage the virtual database

28 Scheduler
- Manages concurrency control
- Specific implementations for Single DB, RAIDb 0, 1 and 2
- Query-level
- Optimistic and pessimistic transaction level
  - uses the database schema that is automatically fetched from backends

29 Request cache
- caches results from SQL requests
- improved SQL statement analysis to limit cache invalidations
  - table based invalidations
  - column based invalidations
  - single-row SELECT optimization
- request parsing possible in the C-JDBC driver
  - offload the controller
  - parsing caching in the driver
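The difference between table based and column based invalidation can be sketched in a few lines of Python. This is an illustration only, not the C-JDBC cache: the class name and the convention of describing what a statement reads or writes as a set of strings (`"item"` for a table, `"item.title"` for a column) are assumptions, and the SQL parsing that would produce those sets is left out.

```python
class RequestCache:
    """Result cache whose entries are dropped when a write touches what they read."""

    def __init__(self):
        # SQL text -> (cached result, frozenset of tables or columns read)
        self._entries = {}

    def put(self, sql, result, reads):
        self._entries[sql] = (result, frozenset(reads))

    def get(self, sql):
        entry = self._entries.get(sql)
        return entry[0] if entry else None

    def invalidate(self, writes):
        """Drop every entry that reads something in `writes`; return how many."""
        writes = set(writes)
        stale = [sql for sql, (_, reads) in self._entries.items() if reads & writes]
        for sql in stale:
            del self._entries[sql]
        return len(stale)
```

At table granularity, an update to any column of `item` evicts every cached SELECT on `item`. At column granularity, a SELECT that reads only `item.title` survives an update to `item.stock`, which is one way to read the higher hit rates reported for column based invalidation later in the talk.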

30 Load balancer 1/2
- RAIDb-0
  - query directed to the backend having the needed tables
- RAIDb-1
  - read executed by current thread
  - write executed in parallel by a dedicated thread per backend
  - result returned if one, majority or all commit
  - if one node fails but others succeed, failing node is disabled
- RAIDb-2
  - same as RAIDb-1 except that writes are sent only to nodes owning the written table
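The write routing rules on this slide reduce to one question: which backends own the written table? The Python sketch below shows that idea under stated assumptions. The function names, the `placement` mapping and the policy names `"first"`, `"majority"` and `"all"` are hypothetical, chosen to mirror "one, majority or all commit"; they are not C-JDBC identifiers.

```python
def write_targets(table, placement):
    """Return the backends that must execute a write on `table`.

    `placement` maps a backend name to the set of tables it hosts.
    RAIDb-0: each table has one owner. RAIDb-1: every backend hosts every
    table. RAIDb-2: each table is hosted by a subset of the backends.
    """
    return sorted(name for name, tables in placement.items() if table in tables)


def acks_required(policy, targets):
    """Number of successful commits to wait for before answering the client."""
    n = len(targets)
    return {"first": 1, "majority": n // 2 + 1, "all": n}[policy]
```

For example, with three backends where `item` is hosted on two of them, a write on `item` goes to those two only, and a `"majority"` policy over two targets waits for both commits.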

31 Load balancer 2/2
- Static load balancing policies
  - Round-Robin (RR)
  - Weighted Round-Robin (WRR)
- Least Pending Requests First (LPRF)
  - request sent to the node that has the shortest pending request queue
  - efficient if backends are homogeneous in terms of performance
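The three policies named above are easy to compare side by side. The following Python sketch is a generic illustration, not the C-JDBC implementation; the function names and the data shapes (a list of backend names, a weight dictionary, a pending-count dictionary) are assumptions.

```python
import itertools


def round_robin(backends):
    """Cycle through the backends in order, forever."""
    return itertools.cycle(backends)


def weighted_round_robin(weights):
    """Cycle through the backends, repeating each one `weight` times per round.

    `weights` maps a backend name to a positive integer. This simple expansion
    sends consecutive requests to a heavy backend; smoother interleavings exist.
    """
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_pending_requests_first(pending):
    """Pick the backend with the shortest pending request queue.

    `pending` maps a backend name to its current number of pending requests.
    """
    return min(pending, key=pending.get)
```

RR and WRR ignore the current state of the backends, which is why the slide calls them static; LPRF reacts to queue length at the time of each request.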

32 Connection Manager
- Connection pooling for a backend
  - Simple: no pooling
  - RandomWait: blocking pool
  - FailFast: non-blocking pool
  - VariablePool: dynamic pool
- Connection pools defined on a per login basis
  - resource management per login
  - dedicated connections for admin
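The difference between a blocking pool and a non-blocking pool can be sketched with Python's standard `queue` module. This is an illustration of the general idea only: the class name and the `blocking` flag are hypothetical, and the exact behavior of the RandomWait and FailFast managers is not described on the slide beyond "blocking" and "non-blocking".

```python
import queue


class ConnectionPool:
    """Fixed-size pool; `blocking` selects wait-for-a-connection or fail-at-once."""

    def __init__(self, connections, blocking):
        self._free = queue.Queue()
        for connection in connections:
            self._free.put(connection)
        self._blocking = blocking

    def acquire(self, timeout=None):
        """Return a free connection.

        A non-blocking pool fails immediately when empty; a blocking pool
        waits up to `timeout` seconds (forever if `timeout` is None).
        """
        try:
            return self._free.get(block=self._blocking, timeout=timeout)
        except queue.Empty:
            raise RuntimeError("no connection available") from None

    def release(self, connection):
        self._free.put(connection)
```

A fail-fast pool pushes back on the caller as soon as the backend is saturated, while a blocking pool queues callers and trades latency for fewer errors.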

33 Recovery Log
- Checkpoints are associated with database dumps
- Records all updates and transaction markers since a checkpoint
- Used to resynchronize a database from a checkpoint
- JDBCRecoveryLog
  - stores information in a database
  - can be re-injected in a C-JDBC cluster for fault tolerance
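Resynchronizing from a checkpoint amounts to restoring the dump and replaying the logged updates that follow it. The Python sketch below models this with an in-memory log and a dictionary standing in for a database. All names are hypothetical, and real log entries would be SQL statements with user and transaction information rather than key/value pairs.

```python
class RecoveryLog:
    """Ordered log of updates with named checkpoint markers."""

    def __init__(self):
        self._entries = []      # logged updates, in execution order
        self._checkpoints = {}  # checkpoint name -> position in the log

    def log(self, update):
        self._entries.append(update)

    def checkpoint(self, name):
        # The dump associated with this name must reflect exactly the
        # updates logged before this position.
        self._checkpoints[name] = len(self._entries)

    def since(self, name):
        return self._entries[self._checkpoints[name]:]


def apply_update(state, update):
    """Stand-in for executing a write statement on a backend."""
    key, value = update
    state[key] = value


def recover(dump, log, checkpoint_name):
    """Rebuild a backend: restore the dump, then replay the updates after it."""
    state = dict(dump)
    for update in log.since(checkpoint_name):
        apply_update(state, update)
    return state
```

The same procedure serves both for adding a new backend and for repairing a failed one: only the dump and the starting checkpoint differ.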

34 Outline
- Motivations
- RAIDb
- C-JDBC
  - Overview
  - Internals
  - Scalability
  - Checkpointing and Recovery
- Performance
- Lessons learned
- Conclusion

35 C-JDBC scalability
- Horizontal scalability
  - prevents the controller from being a Single Point Of Failure (SPOF)
  - distributes the load among several controllers
  - uses group communications for synchronization
- C-JDBC Driver
  - multiple controllers, automatic failover
    jdbc:c-jdbc://node1:25322,node2:12345/myDB
  - connection caching
  - URL parsing/controller lookup caching
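The multi-controller URL shown on this slide can be read as an ordered list of controllers to try. The Python sketch below parses such a URL and walks the list until one controller accepts the connection. It is a sketch of the idea, not the driver's code: the function names and the `connect` callable are hypothetical, every controller is assumed to carry an explicit port, and trying controllers in URL order is an assumption about the failover strategy.

```python
PREFIX = "jdbc:c-jdbc://"  # URL scheme as written on the slide


def parse_controller_url(url):
    """Split 'jdbc:c-jdbc://host1:port1,host2:port2/db' into controllers and db name."""
    if not url.startswith(PREFIX):
        raise ValueError("not a C-JDBC URL: " + url)
    hosts, _, database = url[len(PREFIX):].partition("/")
    controllers = []
    for entry in hosts.split(","):
        host, _, port = entry.partition(":")
        controllers.append((host, int(port)))  # assumes the port is always given
    return controllers, database


def connect_with_failover(url, connect):
    """Try each controller in turn; `connect(host, port, database)` is supplied by the caller."""
    controllers, database = parse_controller_url(url)
    for host, port in controllers:
        try:
            return connect(host, port, database)
        except ConnectionError:
            continue  # this controller is down, try the next one
    raise ConnectionError("no controller reachable")
```

Caching the parsed controller list, as the slide's last bullet suggests, avoids repeating the parsing step on every new connection.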

36 C-JDBC – Horizontal scalability
- Writes are broadcast using JGroups
- Each backend is written to by only one controller but may be shared by all controllers for reads
- Global transaction id computed locally
- Group commit only for write transactions

37 C-JDBC scalability
- Vertical scalability
  - allows nested RAIDb levels
  - allows tree architecture for scalable write broadcast
  - necessary with large number of backends
  - C-JDBC driver re-injected in C-JDBC controller

38 C-JDBC vertical scalability
- RAIDb-1-1 with C-JDBC
- no limit to composition depth

39 C-JDBC vertical scalability
- RAIDb-0-1 with C-JDBC

40 Outline
- Motivations
- RAIDb
- C-JDBC
  - Overview
  - Internals
  - Scalability
  - Checkpointing and Recovery
- Performance
- Lessons learned
- Conclusion

41 Checkpointing
- Octopus is an ETL tool
- Use Octopus to store a dump of the initial database state
- Currently done by the user using the database specific dump tool

42 Checkpointing
- Backend is enabled
- All database updates are logged (SQL statement, user, transaction, …)

43 Checkpointing
- Add new backends while the system is online
- Restore the dump corresponding to the initial checkpoint with Octopus

44 Checkpointing
- Replay updates from the log

45 Checkpointing
- Enable backends when done

46 Making new checkpoints
- Disable one backend to have a coherent snapshot
- Mark the new checkpoint entry in the log
- Use Octopus to store the dump

47 Making new checkpoints
- Replay missing updates from log

48 Making new checkpoints
- Re-enable backend when done

49 Recovery
- A node fails!
- It is automatically disabled, but must be fixed or replaced by the administrator

50 Recovery
- Restore latest dump with Octopus

51 Recovery
- Replay missing updates from log

52 Recovery
- Re-enable backend when done

53 Outline
- Motivations
- RAIDb
- C-JDBC
- Performance
- Lessons learned
- Conclusion

54 TPC-W
- Browsing mix performance

55 TPC-W
- Shopping mix performance

56 TPC-W
- Ordering mix performance

57 Fine-grain caching
- Cache hit rate with TPC-W
  - browsing mix
  - only one database backend

Cache                 Throughput   Response time   Hit rate
No cache              9.1 req/s    3.30 s          -
Table                 12.9 req/s   1.96 s          12.6%
Column                16 req/s     1.36 s          48.8%
Column + single-row   16 req/s     1.35 s          49.2%

58 Fine-grain caching
- Cache hit rate with TPC-W
  - shopping mix
  - only one database backend

Cache                 Throughput   Response time   Hit rate
No cache              12.8 req/s   3.11 s          -
Table                 13.5 req/s   2.58 s          3.5%
Column                19.0 req/s   0.93 s          30.0%
Column + single-row   20.2 req/s   0.84 s          30.4%

59 Outline
- Motivations
- RAIDb
- C-JDBC
- Performance
- Lessons learned
- Conclusion

60 Why did we design C-JDBC? … Reloaded
- Features we wanted ordered by priority
  - scalability
  - on commodity hardware
  - using open source databases
  - fault tolerance (high availability + failover)
  - without modifying the client application

61 Why are users using C-JDBC?
- JDBC standard
- Open source solution
- Features they want ordered by priority
  - fault tolerance (high availability + failover) [was 4/5]
  - using existing open source databases [was 3/5]
  - on commodity hardware [was 2/5]
  - administration tools [was not listed]
  - security [was not listed]
  - scalability [was 1/5]
  - without modifying the client application [was 5/5]

62 How are users using C-JDBC?
- Hard to really know
- Just default settings!
- Most common usage
  - existing applications (Tomcat/JBoss/JOnAS) with one MySQL/Postgres backend
  - add a second backend for fault tolerance and scalability
- For things it was not designed for
  - write-mostly workloads
  - distributed databases
  - hosting centers (administration tools missing)

63 Lessons learned
- Users do not use it for what it was first designed for
  - advanced features are never used
  - concerned about ease of use and TCO
- Default settings are important
- Good technology is necessary but not sufficient
  - [administration] tools are needed
  - minor bugs are ok for open source users

64 Open problems
- Partition of clusters
- Users want control on failure policy
- Reconciliation must also be user controlled

65 Open problems
- Opening the architecture to the users
  - user defined strategies when a fault or exception occurs
  - which interfaces/callbacks to provide?
- Monitoring
  - needed for more accurate load balancing algorithms
- Benchmarking
  - need automatic evaluation of clustered servers
  - platform available: new INRIA 208-processor Itanium-2 cluster
- Sun Test Suite
  - should help strengthen C-JDBC code
  - interoperability with J2EE servers

66 Outline
- Motivations
- RAIDb
- C-JDBC
- Performance
- Lessons learned
- Conclusion

67 Current status
- C-JDBC 1.0b15 release
  - Generic JDBC 2.0 driver
  - Schedulers and load balancers for RAIDb 0, 1 and 2
  - Fine grain query caching
  - JDBC recovery log
  - Logger/request player
  - Java installer
  - User documentation
- Currently missing
  - Octopus integration
  - Recovery for horizontal scalability
  - RAIDb-1ec and RAIDb-2ec
  - Dynamic reconfiguration

68 Stats as of Nov 6, 03
- Downloads
  - total: > 8300 downloads since May 2003
  - last 30 days: > 2800 downloads
  - > 430,000 hits since first release
  - 2nd most downloaded ObjectWeb project
- Mailing lists
  - c-jdbc@objectweb.org: 101 subscribers
  - c-jdbc-commits@objectweb.org: 18 subscribers
- Team
  - 9 committers
  - 1 full-time INRIA engineer

69 Conclusion
- RAIDb
  - classification of replication techniques
  - difficult to publish
- C-JDBC
  - open platform for database replication at the middleware level
  - RDBMS independent
  - no application modification required
- Lots of features missing … join us!

70 Questions ? Visit http://c-jdbc.objectweb.org c-jdbc@objectweb.org

71 Bonus slides

72 Fine-grain caching
- increased throughput
- better response time
- even with a single database backend

73 How do we build a community?
- Necessary features (but not sufficient)
  - open source
  - standard API
  - responsiveness on the mailing list
- Visibility
  - Web: slashdot, TheServerSide, freshmeat, …
  - Conferences: JAX, Middleware, LinuxWorld, ICAR, …
- Our weak points
  - no detailed design documentation
  - beta phase

74 How do we interact with the user community?
- only one mailing list
- being very responsive on the mailing list
  - reply even if we don’t have an answer yet
  - no direct communication with the team; everything is shared on the mailing list
- benefit from engineers who work 1 week full-time to evaluate C-JDBC for their corporation
- plan every feature request in the task list

75 How do we interact with the developer community?
- single user/developer mailing list
- post all design questions/choices on the mailing list
  - most users use default settings
  - hard to get feedback about usage
- very permissive about accepting new committers
  - 8 committers (3 outside ObjectWeb – 2 full time)
  - 2 contributors who didn’t want to become committers
  - no problem so far
- involve people in testing

76 Lessons learned
- Visibility
  - perpetual involvement
  - time consuming but necessary
- Responsiveness to user queries
  - always on the mailing list
  - makes first impression for many users
- Involve users in all decisions
- Be open
  - source, CVS, contributions, patches, …

77 Freshmeat.net
- Links to projects
- Users can subscribe to be notified of new releases
- Need to register project in all possible categories to have good visibility
- One release per week is a good timing
- 3 new subscribers with every release
[Chart annotations: Slashdot; Weekly release; Friday release; out of town]

