Publish / Subscribe Database Log Shipping over BitTorrent P2P
CS 848, Fall 2006, University of Waterloo
Project Presentation by N. Tchervenski
Intro
- Implemented a tool to facilitate publish / subscribe of databases.
- Technologies used:
  - Log shipping
  - BitTorrent
  - RSS
Motivation
- Looking for an easy and quick way to create read-only replicated databases with minimal new infrastructure and minimal overhead
- A standby replica need not sit idle: it can be used for queries
- Log shipping can be performed on many of the popular DB systems: DB2, Oracle, MS SQL Server, Postgres, Teradata, etc.
- Transferring large amounts of data can be done over P2P, such as BitTorrent
Architecture
[Diagram; labels recovered from the slide]
- Publisher side: DB Server; archived logs & backup images; archived logs directory; Publishing Tool (BitTorrent seeder, RSS feeds of tracker data)
- Between the two sides: Internet
- Subscriber side: Subscription Tool (BitTorrent client, RSS client); archived logs directory; DB restore and rollforward; DB Replica
- Also labeled: DB log management tool commands
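The subscriber half of the architecture can be sketched in a few lines. This is a minimal illustration, not the presented tool: it assumes the publisher exposes a plain RSS 2.0 feed in which each item carries the torrent as an `<enclosure>` URL. The feed layout, the `tracker.example` host, and the function name `new_torrent_urls` are all hypothetical.

```python
import xml.etree.ElementTree as ET


def new_torrent_urls(feed_xml, seen):
    """Return enclosure URLs in an RSS 2.0 feed that are not yet in `seen`.

    Assumption: each <item> describes one archived log or backup image and
    points at its .torrent file through an <enclosure url="..."> element.
    """
    root = ET.fromstring(feed_xml)
    urls = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # item without a torrent attached
        url = enclosure.get("url")
        if url and url not in seen:
            urls.append(url)
    return urls


# Hypothetical feed, as a publishing tool might emit it.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Archived logs (hypothetical feed)</title>
    <item>
      <title>S0000001.LOG</title>
      <enclosure url="http://tracker.example/S0000001.LOG.torrent"
                 type="application/x-bittorrent" length="0"/>
    </item>
    <item>
      <title>S0000002.LOG</title>
      <enclosure url="http://tracker.example/S0000002.LOG.torrent"
                 type="application/x-bittorrent" length="0"/>
    </item>
  </channel>
</rss>"""

already_fetched = {"http://tracker.example/S0000001.LOG.torrent"}
pending = new_torrent_urls(SAMPLE_FEED, already_fetched)
```

Each URL in `pending` would then be handed to the BitTorrent client, and the downloaded log placed in the replica's archived logs directory.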
Features
- Minimum impact on the server
  - No need to capture data
  - Can be part of the regular backup / replication process
- Can send data to as many or as few peers as needed
- Log shipping is popular: existing scripts and infrastructure can be reused
- Sharing through BitTorrent is flexible: can limit upload speed, number of connections, disable IPs, etc.
Current Limitations
- Database backups are not portable across platforms or database versions
- The whole database is moved, rather than just the data, so similarly configured machines are needed (access control, paths, etc.)
- Delay when bringing the database up after rollforward (index rebuilding, etc.)
- To include new logs, need to rollback and then rollforward again; this cannot be done too often
- Not suitable for databases with lots of updates
- When a LOAD is done (DB2), a tablespace backup needs to be provided, or the data location must be available to the remote DB
- Security
  - Authorization to download
  - BitTorrent transfers can be slowed down by malicious peers sending garbage data
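The cost of picking up new logs is easier to see when the replay cycle is written out. The sketch below only builds the DB2 command lines and does not run them. It assumes one reading of the "rollback" step: the replica is restored from the shipped backup image and then rolled forward through every log received so far. The function name, paths, database name, and timestamp are hypothetical.

```python
def replay_commands(db, backup_dir, backup_timestamp, log_dir):
    """Build the DB2 CLP commands for one restore-and-rollforward cycle.

    Assumptions: the backup image sits in `backup_dir`, and the shipped
    archived logs sit in `log_dir`, passed as the overflow log path.
    """
    restore = (
        f"RESTORE DATABASE {db} FROM {backup_dir} "
        f"TAKEN AT {backup_timestamp} REPLACE EXISTING"
    )
    rollforward = (
        f"ROLLFORWARD DATABASE {db} TO END OF LOGS AND STOP "
        f"OVERFLOW LOG PATH ({log_dir})"
    )
    # Each entry is an argv list suitable for subprocess.run().
    return [["db2", restore], ["db2", rollforward]]


commands = replay_commands(
    db="SAMPLE",                        # hypothetical database alias
    backup_dir="/replica/backup",       # hypothetical path
    backup_timestamp="20061201120000",  # hypothetical backup image timestamp
    log_dir="/replica/archived_logs",   # hypothetical path
)
for argv in commands:
    print(" ".join(argv))
```

Under this reading, every refresh repeats the full restore before replaying logs, which is why frequent refreshes are impractical.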
Related Work
- DPROPR: IBM DataPropagator Relational
  - Clients subscribe to particular rows / columns of tables
  - Can receive a full refresh or just updates
  - For updates-only mode, capture control tables are used
DPROPR
- Use DPROPR for [1]:
  - Operational to Decision Support System data propagation
  - Improved network load balancing
  - Data consolidation
  - Data distribution
  - Improved application availability
  - Multivendor replication
  - Data archiving
  - Data audit trailing
  - Mobile computing
- Consider DPROPR as a potential vehicle for:
  - Building databases for logical recovery
  - Two-way propagation between databases
- DPROPR is not recommended for:
  - Synchronous propagation
  - Hot site recovery
Testing
- Testing and implementation were done using:
  - DB2 V9
  - Linux (Ubuntu)
  - BitTorrent client: Enhanced CTorrent
Conclusion
- Based on gluing together existing technologies
- A way to use a standby replica
- Legitimate use of BitTorrent
- Hope this will stir more related research
- Ideal for public databases
References
[1] DPROPR Planning and Design Guide, 1.html 1.html 1.html
DB2 Replication Guide and Reference, ftp://ftp.software.ibm.com/ps/products/db2/info/vr82/pdf/en_US/db2e0e82.pdf
Warm Standby Servers for High Availability, standby.html