Oracle Solutions for Data Archiving
Luca Canali, CERN/IT-DM
Distributed Database Operations Workshop, November 11th, 2008
Data life cycle management - WHAT
- Use cases ultimately come from the experiments
- Typically require removal of large chunks of data on a time-based selection
- Related use case: making part of the data read only (a one-line sketch follows)
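As a minimal sketch of the read-only use case, assuming a year's worth of data lives in its own tablespace (the tablespace name is hypothetical):

    -- Freeze last year's data; its datafiles then need to be
    -- backed up only once and can be excluded from daily backups
    ALTER TABLESPACE data_2007 READ ONLY;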
Data life cycle management - WHY
- Manageability
  - A database should not become a 'data dump'
  - Very large DBs are hard to manage
- Performance
  - Some application-critical SQL statements may not scale well (full scans, hash joins, etc.)
  - More and more attention to SQL tuning is needed with very large tables
- Cost
  - When the DB grows, extra HW resources are needed to maintain the service levels
  - e.g. extra spindles to provide the IOPS needed to read stale data
Why it's hard
- There is no default Oracle mechanism to archive parts of applications
  - It must be custom implemented
- Rules must be set in place by the 'data owners' on how to move and restore
What can help
- Oracle features for bulk data movement
  - These typically require DBA privileges
- Additional HW to hold 'offline data'
Oracle and large data movements
- Normal DML is too slow
  - delete operations in particular
- Alternatives (two are sketched below)
  - DDL such as partition exchange/movement
  - Transportable tablespaces
  - CTAS (create table as select)
  - Data Pump
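A minimal sketch of two of these techniques, assuming a range-partitioned EVENTS table (all table and partition names are hypothetical):

    -- Partition exchange: swap a full partition with an empty,
    -- identically structured table; metadata-only, no rows copied
    ALTER TABLE events
      EXCHANGE PARTITION events_2007
      WITH TABLE events_2007_staging;

    -- CTAS: bulk-copy a time slice with a direct-path load,
    -- optionally without generating redo
    CREATE TABLE events_2007_copy NOLOGGING AS
      SELECT * FROM events
      WHERE  event_time >= DATE '2007-01-01'
      AND    event_time <  DATE '2008-01-01';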
Oracle and large data sets
- Data (tables) must allow extraction of 'chunks' in a time-based manner
- Oracle partitioning by time is one option (see the sketch below)
  - Beware of the many limitations of partitioning (it's not a magic wand)
- Do-it-yourself partitioning: generate multiple tables/schemas, for example one per year
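For illustration, a sketch of a table range-partitioned by time (column and partition names are assumptions):

    CREATE TABLE events (
      event_id   NUMBER,
      event_time DATE,
      payload    VARCHAR2(4000)
    )
    PARTITION BY RANGE (event_time) (
      PARTITION events_2007 VALUES LESS THAN (DATE '2008-01-01'),
      PARTITION events_2008 VALUES LESS THAN (DATE '2009-01-01')
    );

The do-it-yourself variant would instead create EVENTS_2007, EVENTS_2008, etc. as separate tables, possibly united behind a view.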
Additional challenge
- Lookup tables/metadata need to be archived too
  - Besides the main table(s) to archive
- Can be a large number of tables
- Not easy to identify a 'consistent set' to archive (a dictionary query sketch follows)
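As a starting point (not a complete solution), the data dictionary can list foreign-key dependencies one level deep; the schema name is hypothetical, and the query must be applied recursively to get the full closure:

    -- Child tables and the parent tables they reference via FKs
    SELECT c.table_name AS child_table,
           p.table_name AS parent_table
    FROM   dba_constraints c
    JOIN   dba_constraints p
           ON  c.r_owner = p.owner
           AND c.r_constraint_name = p.constraint_name
    WHERE  c.constraint_type = 'R'  -- referential constraints
    AND    c.owner = 'MYAPP';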
Technology
Main requirements
- Must be able to restore the data if and when needed (over a time scale of 10 years, i.e. 3-4 Oracle versions)
- Must keep track of what is where
- Should use one of the few Oracle technologies suited to moving large chunks of data
Possible solutions
- Move data to a specialized Oracle database that collects 'application chunks' (see the Data Pump sketch below)
- Export to files (flat files or Data Pump) is not considered reliable, given current experience
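One hedged sketch of such a move, using Data Pump in network mode so that no intermediate dump files are written (connect string, link, schema and directory names are hypothetical):

    # Run against the archive DB; rows flow over a database link
    impdp system@archdb schemas=MYAPP network_link=PROD_LINK \
          directory=DATA_PUMP_DIR logfile=myapp_archive.log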
Possible Scenarios
Scenario 1
- Application owners identify consistent sets of data to be moved
- Identified data sets are moved by the DBAs (a sketch follows)
- Data is then dropped from production
- Drawback: requires considerable effort from the developers
- Benefit: gives the cleaner result
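A possible shape of scenario 1, assuming a database link from the archive DB to production (all names hypothetical); partition-extended table names cannot be used across a link, hence the time predicate:

    -- On the archive DB: pull one year of identified data
    CREATE TABLE events_2007 NOLOGGING AS
      SELECT * FROM events@prod_link
      WHERE  event_time >= DATE '2007-01-01'
      AND    event_time <  DATE '2008-01-01';

    -- On production, once the copy is verified:
    -- ALTER TABLE events DROP PARTITION events_2007;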
Possible Scenarios
Scenario 2
- A clone (complete copy) of the selected applications is taken (for example at the end of each year) and archived with the chosen archive technology (one possible sketch follows)
- Application owners then take care of deleting from production, trusting that the data is archived and kept
- Benefit: easy to implement
- Drawback: more data than strictly needed is copied
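One way the yearly clone could be taken is with transportable tablespaces, which ship datafiles plus a small metadata dump instead of exporting row data (tablespace, directory and file names are hypothetical):

    SQL> ALTER TABLESPACE myapp_data READ ONLY;

    expdp system directory=DATA_PUMP_DIR \
          transport_tablespaces=MYAPP_DATA \
          dumpfile=myapp_2008_meta.dmp

    # Copy the datafiles and the metadata dump to the archive DB,
    # then plug in with: impdp ... transport_datafiles=<paths>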
Example of data movements
Example of data purging after archive
Conclusions
- LHC databases are growing fast
- Offlining stale data can bring significant performance and cost gains
- Mechanisms for offlining are application dependent
- DB services have ideas for tools to move and restore data
- A large Oracle 'Archive DB' could be part of the solution
Q&A Thank you