Getting the Most from Oracle Data Pump
NEDC
Roy Swonger, Director, Oracle Database Utilities
January 2007
Objectives of Talk
- Provide an overview of Oracle Data Pump
- Give a quick-start primer for users of original Export/Import
- Highlight useful features that differ from those offered in original Export/Import
Although we have been developing and working with Data Pump for three to four years now, fewer than half of our customers have had much exposure to it. As the user community migrates to 10g and beyond, Data Pump will become a larger part of their lives. For those who have not used Data Pump, this talk gives a primer on getting started. It also covers many features that customers have not yet discovered because they are buried in the documentation.
Table of Contents
- Overview of Oracle Data Pump
- Data Pump Quick Start for Exp/Imp Users
- New Features of Oracle Data Pump
- Advanced Features of Oracle Data Pump
- Frequently Asked Questions
Data Pump Overview
- Background
- Usage Scenarios
- Performance
- Features
Data Pump Overview: Background
- Replacement for the old exp and imp utilities; faster and more feature-rich
- Available starting in Oracle Database 10g, Data Pump is the new export/import mechanism
- As of Oracle Database 11g, original Export is no longer supported for general use
- Reasons for creating Data Pump:
  - Take advantage of modern machine architectures
  - Use available database features
  - Scale to ever-growing database workloads and sizes
Data Pump Overview: Usage Scenarios
- Typical uses for Data Pump Export/Import:
  - Logical backup of a schema or table
  - Refresh a test system from production
  - Upgrade (either cross-platform, or with storage reorganization)
  - Move data from production to offline usage (e.g., data warehouse, ad-hoc query)
  - "Quick and dirty" backup
- Data Pump complements other Oracle features:
  - RMAN physical backups
  - Oracle Warehouse Builder for Extract/Transform/Load operations
Data Pump Overview: Performance
- Typical results for data load/unload:
  - expdp is roughly 2x faster than original exp
  - impdp is roughly 15-40x faster than original imp
  - Using PARALLEL can further improve performance
- Your mileage may vary!
  - Metadata performance is essentially unchanged, and sometimes slower
  - Data performance is vastly improved
  - A small amount of fixed overhead will affect the performance of small jobs
  - Storage and file system characteristics are a major factor in performance
- More on Data Pump performance later in this talk!
Data Pump Performance Test
- Results by Prof. Carl Dudley, University of Wolverhampton, UK
- Timings taken for sample employee tables containing 1, 0.5M, 1M, 2M, 4M, 8M, and 16M rows:
  - Original Export and original Import
  - Data Pump Export using direct path and external tables
- Sizes of the dump file sets were also compared
Sample table data:
EMPNO ENAME JOB      MGR  HIREDATE   SAL COMM DEPTNO SEQID
----- ----- -------- ---- --------- ---- ---- ------ -----
 7369 SMITH CLERK    7902 17-DEC-80  800        20       1
 7499 ALLEN SALESMAN 7698 20-FEB-81 1600  300   30       2
 7521 WARD  SALESMAN 7698 22-FEB-81 1250  500   30       3
 7566 JONES MANAGER  7839 02-APR-81 2975        20       4
Source: Prof. Carl Dudley, University of Wolverhampton, UK
Data Pump Export Timings and Sizes
Timings (seconds):
Rows                          1   0.5M     1M     2M     4M     8M
Data Pump direct path        28     38     58     69     90    114
Data Pump external tables            51     70    105    120    146
Original Export               2     21     32     52    100    192
Size of export file (MB):
Rows                          1   0.5M     1M     2M     4M     8M
Data Pump direct path     0.008     20     41     82    169    331
Data Pump external tables
Original Export                     23     47     89    190    389
(blank cells: not reported)
Source: Prof. Carl Dudley, University of Wolverhampton, UK
Data Pump Export Performance
[Chart: response time (up to 3 minutes) vs. number of rows (0.5M to 8M) for original Export conventional path, Data Pump direct path, and Data Pump external table]
Source: Prof. Carl Dudley, University of Wolverhampton, UK
Data Pump Import Timings
Timings (seconds):
Rows                          1   0.5M     1M     2M     4M     8M
Data Pump direct path         7     12     16     25     46     86
Data Pump external tables            33     44     63    120
Original Import               2     15     73    157    306
(blank cells: not reported)
Source: Prof. Carl Dudley, University of Wolverhampton, UK
Data Pump Import Performance
[Chart: response time (up to 5 minutes) vs. number of rows (0.5M to 8M) for original Import, Data Pump direct path, and Data Pump external table]
Source: Prof. Carl Dudley, University of Wolverhampton, UK
Data Pump Overview: Features
- Improved speed:
  - Direct path load/unload
  - Parallel workers
- Flexibility:
  - INCLUDE or EXCLUDE many more object types
  - REMAP schema, tablespace, or data file (see the sketch below)
  - Use multiple dump files for parallelism and ease of file management
- Database feature support:
  - Encrypted columns
  - Network moves over database links
  - Newer datatypes
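To make the remapping concrete, here is a minimal sketch (the schema, tablespace, and file names are illustrative, not from the original talk) of importing one user's objects into another schema while relocating their storage:
$ impdp system/<pwd> directory=mydir dumpfile=scott.dmp \
    remap_schema=scott:blake remap_tablespace=users:sales
Every object owned by SCOTT in the dump file is created under BLAKE, and segments that lived in the USERS tablespace are created in SALES instead.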
Data Pump Quick Start
- New Concepts
- Syntax Changes
Data Pump Quick Start: New Concepts
- Directory object:
  - Used for reading and writing dump files
  - Allows the DBA to control where files will be written on the server system
  - Default object DATA_PUMP_DIR created as of Oracle Database 10g Release 2
- Interactive command line:
  - Allows the user to monitor and control Data Pump jobs
  - Many job parameters can be adjusted "on the fly": stop/start jobs, monitor progress from another location, add or subtract parallelism, free up disk space (see the sketch below)
- Tuning parameters:
  - Data Pump handles most tuning internally
  - The BUFFER, COMMIT, COMPRESS, CONSISTENT, DIRECT, and RECORDLENGTH parameters are no longer needed
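As a taste of the interactive command line, a sketch of a monitoring session (the job name MYJOB is hypothetical; pressing Ctrl-C in a running client, or attaching with the ATTACH parameter, reaches this prompt):
$ expdp system/<pwd> attach=myjob
Export> status
Export> add_file=extra01.dmp
Export> continue_client
STATUS reports the state and progress of each worker; ADD_FILE adds another dump file to the set, which is one way to relieve a job that has run out of space; CONTINUE_CLIENT returns to logging mode.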
Data Pump Quick Start: Directory Object
Example of directory object usage. Create the directory as a privileged user:
$ sqlplus sys/<pwd> as SYSDBA
SQL> CREATE DIRECTORY scott_dir AS '/usr/apps/datafiles';
SQL> GRANT READ, WRITE ON DIRECTORY scott_dir TO scott;
SQL> exit
User scott can then export/import using Data Pump:
$ expdp scott/<pwd> DIRECTORY=scott_dir dumpfile=scott.dmp
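To double-check the directory and its grants afterward, one option is to query the data dictionary:
SQL> SELECT directory_name, directory_path FROM dba_directories;
SQL> SELECT grantee, privilege FROM dba_tab_privs WHERE table_name = 'SCOTT_DIR';
Directory objects are owned by SYS, and their READ/WRITE grants show up in DBA_TAB_PRIVS just like table grants.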
Data Pump Quick Start: Syntax Changes
- New command-line clients: expdp/impdp instead of exp/imp
- Parameter changes, a few examples:
Data Pump Parameter        Original Exp/Imp Parameter
SCHEMAS                    OWNER
REMAP_SCHEMA               TOUSER
CONTENT=METADATA_ONLY      ROWS=N
EXCLUDE=TRIGGER            TRIGGERS=N
Note: A full mapping of parameters from original Export/Import to Data Pump Export/Import can be found in the Oracle Database Utilities manual.
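As a simple before/after illustration, a schema-level export that was written like this with original Export (file and directory names are illustrative):
$ exp scott/<pwd> file=scott.dmp owner=scott
becomes this with Data Pump Export:
$ expdp scott/<pwd> directory=scott_dir dumpfile=scott.dmp schemas=scott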
New Features of Oracle Data Pump
- Network Mode
- Restartability
- Parallelization
- Include/Exclude
- SQLFILE
New Features: Network Mode Export
expdp scott/tiger network_link=db1 tables=emp dumpfile=scott.dmp directory=mydir
- Produces a local dump file set from the contents of a remote database
- The only way to export from a database that cannot be written to (e.g., a standby database or a read-only database)
- Requires a local, writeable database to act as an agent
- May be parallelized
- Will generally be significantly slower than exporting to a file on a local device
New Features: Network Mode Import
impdp system/manager network_link=db1 directory=mydir
- Moves a portion of a database to a new database without creating a dump file
- Ideal when the footprint of the dump file set needs to be minimized
- May be parallelized
- Primarily a convenience: will generally be slower than exporting to a file, copying the file over the network, and importing into the target
New Features: Restartability
- Data Pump jobs may be restarted without loss of data and with only minimal loss of time
- Restarts may follow:
  - System failure (e.g., loss of power)
  - Database shutdown
  - Database failure
  - User stop of a Data Pump job
  - Internal failure of a Data Pump job
  - Exceeding dump file space on export
- Great when trying to fit a database migration within a window (see the sketch below)
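A sketch of the stop/restart cycle, assuming the job was given an explicit name with the JOB_NAME parameter (all names here are illustrative):
$ expdp system/<pwd> directory=mydir dumpfile=full%u.dmp full=y job_name=myjob
(press Ctrl-C to reach the interactive prompt)
Export> stop_job=immediate
... later, perhaps from a different terminal ...
$ expdp system/<pwd> attach=myjob
Export> start_job
The job picks up from the state recorded in its master table.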
New Features: Restartability - Export
expdp system/manager attach=myjob
Export> start_job
- Export writes out objects based upon object type
- On restart, any uncompleted object types are removed from the dump file and the queries to regenerate them are repeated
- For data, incompletely written data segments (i.e., partitions or unpartitioned tables) are removed, and those segments are totally rewritten when the job continues
New Features: Restartability - Import (1)
impdp system/manager attach=myjob
Import> start_job
- Restart is based upon the state of the individual objects recorded in the master table:
  - If an object was completed, it is ignored on restart
  - If an object was not completed, it is reprocessed on restart
  - If an object was 'in progress' and its creation time is consistent with the previous run, it is reprocessed, but 'duplicate object' errors are ignored
New Features: Restartability - Import (2)
impdp system/manager attach=myjob
Import> start_job=skip_current
- If skip_current is specified, objects that were 'in progress' are marked as having failed at the time of restart
- skip_current is useful for getting past corrupted objects in the dump file that repeatedly cause impdp to exit fatally
New Features: Parallelization
- Multiple threads of execution may be used within a Data Pump job
- Jobs complete faster, but use more database and system resources
- Only available with Enterprise Edition
- Speedup will not be realized if there are bottlenecks in I/O bandwidth, memory, or CPU
- Speedup will not be realized if the bulk of the job involves work that is not parallelizable
- Two levels of parallelization are available: multiple workers within a job, as well as the parallel query/execution facilities (see the sketch below)
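Because PARALLEL is one of the parameters that can be changed while a job runs, the degree can be tuned to the time of day. A sketch, assuming a client attached to a running job:
Export> parallel=8
Export> status
Export> parallel=2
Roughly speaking, raising PARALLEL takes effect as new work is dispatched, while lowering it takes effect as workers finish their current items.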
New Features: Parallel Export
expdp system/manager directory=mydir dumpfile=a%u.dmp parallel=2
- There should be at least one file available per degree of parallelism; wildcarding file names (%u) is helpful
- All metadata is exported in a single thread of execution
- Export of user data uses all other threads of execution:
  - Typically, each partition or unpartitioned table is processed by a single worker thread
  - In certain cases, a very large partition is processed across multiple threads of execution using parallel query
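One common pattern is to pair PARALLEL with a wildcarded DUMPFILE and a FILESIZE cap, so that every worker always has a file to write and no single file grows unbounded (the names and sizes here are illustrative):
$ expdp system/<pwd> directory=mydir dumpfile=exp%u.dmp filesize=2g parallel=4 schemas=hr
As each file reaches 2 GB, Data Pump substitutes the next number for %u and continues in a new file.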
New Features: Parallel Import
impdp system/manager directory=mydir dumpfile=a%u.dmp parallel=6
- The degree of parallelism on import does not have to match the degree of parallelism used for export
- Processing of user data is split among the workers as is done for export
- Creation of package bodies is parallelized by splitting the definitions of packages across multiple parallel workers
- Index building is parallelized by temporarily specifying a degree clause when an index is created
New Features: Include/Exclude
impdp hr/hr directory=mydir dumpfile=mydump exclude=index
- Fine-grained object selection is allowed for both expdp and impdp; objects may be either excluded or included
- A list of object types and short descriptions may be found in the following views (see the query sketch below):
  - DATABASE_EXPORT_OBJECTS
  - SCHEMA_EXPORT_OBJECTS
  - TABLE_EXPORT_OBJECTS
- Original Export/Import had limited object selection through the INDEXES, CONSTRAINTS, STATISTICS, and TRIGGERS parameters; a Data Pump job works upon whole classes of objects
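To find the exact type names that INCLUDE and EXCLUDE accept, these views can be queried directly; for example (the LIKE pattern is only an illustration):
SQL> SELECT object_path, comments
  2  FROM schema_export_objects
  3  WHERE object_path LIKE '%GRANT%';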
New Features: Exclude
expdp hr/hr directory=mydir dumpfile=mydump exclude=index,trigger
- Objects described by the EXCLUDE parameter are omitted from the job
- Objects that are dependent upon an excluded object are also excluded (e.g., grants and statistics upon an index are excluded if the index is excluded)
- Multiple object types may be excluded in a single job
New Features: Include
impdp hr/hr directory=mydir dumpfile=mydump include=procedure,function
- Objects described by the INCLUDE parameter are the only objects included in the job
- Objects that are dependent upon an included object are also included (e.g., grants upon a function are included if the function is included)
- Multiple object types may be included in a single job
- Note: the INCLUDE and EXCLUDE parameters may not be mixed on the same command
New Features: SQLFILE
- Specifies a file into which the DDL that would have been executed by the import job is written
- The actual import is not performed; only the DDL file is created
- Can be combined with EXCLUDE/INCLUDE to tailor the contents of the SQLFILE
- Example: to get a SQL script that will create just the tables and indexes contained in a dump file:
$ impdp user/pwd DIRECTORY=DPUMP_DIR1 DUMPFILE=export.dmp INCLUDE=TABLE,INDEX SQLFILE=create_tables.sql
- The output of SQLFILE is executable, but will not include passwords
- In original Import, the INDEXFILE parameter generated a text file containing the SQL commands needed to recreate tables and indexes, which you could then edit to get a workable DDL script
Advanced Features of Oracle Data Pump
- Network Mode Transportable Move
- Exclude/Include with Object Paths
- Exclude/Include with Expressions
Advanced Features: Network Mode Transportable Import
impdp system/manager network_link=db1 transport_tablespaces=tbs_1 transport_datafiles=/disk1/tbs1.f directory=mydir
- The best way to perform transportable tablespace moves
- Tablespaces must be set to read-only, and datafiles copied/converted to the target location, before the import is started
- The full sequence (sketched below):
  1. Set the source tablespaces to read-only
  2. Copy/convert the data files to the target location
  3. Perform the import
  4. Set the source tablespaces back to read/write
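Put together, an over-the-network transportable move looks roughly like this (tablespace, file, and link names are illustrative). On the source database:
SQL> ALTER TABLESPACE tbs_1 READ ONLY;
Copy (or convert, for cross-platform moves) the datafile to the target system, then on the target:
$ impdp system/<pwd> network_link=db1 transport_tablespaces=tbs_1 transport_datafiles=/disk1/tbs1.f directory=mydir
Finally, back on the source:
SQL> ALTER TABLESPACE tbs_1 READ WRITE;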
Advanced Features: Include/Exclude with Object Paths
- Object types may also be selected by the object paths that appear in log files: include="/TABLE_EXPORT/TABLE/TRIGGER"
- The full list of object paths is available in the DATAPUMP_PATHS view
- Abbreviated object paths may also be specified: exclude="PACKAGE" will exclude from the job all object types whose paths match LIKE '%/PACKAGE%':
  - all package specifications
  - all package bodies
  - all package grants
- Data Pump does not actually consider object types for inclusion and exclusion; instead, it acts upon object paths
Advanced Features: Include/Exclude with Expressions
- Named objects may be included or excluded based upon the object's name:
expdp scott/tiger directory=mydir dumpfile=scott.dmp include=table:\"LIKE \'E%\'\"
- The most commonly used operators in name expressions are "=", "LIKE", "BETWEEN", and "IN":
expdp scott/tiger directory=mydir dumpfile=scott.dmp include=table:\"IN \(select table_name from my_exp_tables\)\"
- Allowed on both export and import
- Because of arcane shell quoting rules, it is best to specify name expressions within parameter files (see the sketch below)
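A sketch of the parameter-file approach, which sidesteps shell quoting entirely (the file name and table list are illustrative). Put the filter in a file, say exp_tabs.par:
directory=mydir
dumpfile=scott.dmp
include=TABLE:"IN ('EMP', 'DEPT')"
Then reference it on the command line:
$ expdp scott/<pwd> parfile=exp_tabs.par
Inside a parameter file, no backslash escaping is needed.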
Frequently Asked Questions (1)
Q: Can original Export dump files be used with Data Pump?
A: No. The dump file formats for original exp/imp and Data Pump Export/Import are not compatible.
Q: Can Data Pump work with 9i databases?
A: No. Data Pump works with Oracle Database 10g and later.
Q: Can Data Pump work over a network link to an earlier-version database?
A: No. You cannot use a network link to run Data Pump Export against a database earlier than version 10g.
Frequently Asked Questions (2)
Q: Can I use Enterprise Manager with Data Pump?
A: Yes, there is an EM interface for Data Pump.
Q: How do I pipe a Data Pump job through gzip?
A: This compression technique cannot be used with Data Pump, because Data Pump does not support named pipes. In Oracle Database 10g Release 2, Data Pump compresses metadata by default. Stay tuned for a data compression solution in a future release.
Q: What will happen to original Export/Import?
A: Original Export will no longer be supported for general use after Oracle Database 10g Release 2. Original Import will be supported indefinitely, to handle existing legacy Export files.
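For reference, the metadata compression mentioned above is controlled by the COMPRESSION parameter in 10g Release 2, which accepts METADATA_ONLY (the default) or NONE. A sketch (file names are illustrative), turning it off for a dump that will be read by tools expecting uncompressed metadata:
$ expdp hr/<pwd> directory=mydir dumpfile=hr.dmp compression=none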
References
- Oracle Database Utilities, 10g Release 2 (10.2), Part Number B14215-01
- Data Pump on Oracle Technology Network: http://www.oracle.com/technology/products/database/utilities/index.html
- Lots of information is available on OTN, including Oracle World presentations, Oracle by Example tutorials, and white papers
The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.