The ORACLE_DATAPUMP Access Driver: Abilities and Applications in External Tables
The ORACLE_DATAPUMP access driver is a specialized component within Oracle's External Tables framework, enabling the use of Oracle Data Pump dump files as the data source or target for external tables. This driver is unique in its dual capability for both unloading (exporting) and loading (importing) data, offering a high-performance, parallelizable, and feature-rich mechanism for data movement within the Oracle Database ecosystem.
1. Core Applications: Unloading and Loading Data
The primary function of the ORACLE_DATAPUMP access driver is to facilitate the movement of data between database tables or between databases and Data Pump dump files via external tables.
1.1 Unloading Data (Export)
Data is unloaded from a database table into a Data Pump dump file when an external table is created with the ORACLE_DATAPUMP driver in a CREATE TABLE AS SELECT statement. A single statement both generates the dump file and creates the external table that points to it.
The data in the file is written in a binary format that can only be read by the ORACLE_DATAPUMP access driver. Once the dump file is created, it cannot be modified (that is, no data manipulation language (DML) operations can be performed on it). However, the file can be read any number of times and used as the dump file for another external table in the same database or in a different database.
Example 1: Unloading Data and Creating an External Table
This example demonstrates creating an external table and unloading data from the inventories table into a dump file named inv_xt.dmp.
CREATE TABLE inventories_xt
ORGANIZATION EXTERNAL
(
TYPE ORACLE_DATAPUMP
DEFAULT DIRECTORY def_dir1
LOCATION ('inv_xt.dmp')
)
AS SELECT * FROM inventories;
-- Table created.
The resulting external table can be queried like any standard table.
Example 2: Querying and Verifying Unloaded Data
SQL> DESCRIBE inventories
Name
--------------------------------------------------------------
PRODUCT_ID
WAREHOUSE_ID
QUANTITY_ON_HAND
SQL> DESCRIBE inventories_xt
Name
-------------------------------------------------------------
PRODUCT_ID
WAREHOUSE_ID
QUANTITY_ON_HAND
SQL> SELECT COUNT(*) FROM inventories_xt;
COUNT(*)
----------
1112
SQL> SELECT * FROM inventories MINUS SELECT * FROM inventories_xt;
no rows selected
1.2 Loading Data (Import)
To load data, an external table is created to point to an existing Data Pump dump file, without the AS SELECT clause. This external table then acts as the source for loading data into a new database table using a SQL INSERT INTO...SELECT statement.
Example 3: Creating an External Table with an Existing Dump File
The dump file created for the external table can now be moved and used as the dump file for another external table in the same database or in a different database. Note that when you create an external table over an existing file, the CREATE TABLE statement has no AS SELECT clause.
CREATE TABLE inventories_xt2
(
product_id NUMBER(6),
warehouse_id NUMBER(3),
quantity_on_hand NUMBER(8)
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_DATAPUMP
DEFAULT DIRECTORY def_dir1
LOCATION ('inv_xt.dmp')
);
-- Table created.
Example 4: Loading Data into a New Database Table
CREATE TABLE inventories_new
(
product_id NUMBER(6),
warehouse_id NUMBER(3),
quantity_on_hand NUMBER(8)
);
-- Table created.
INSERT INTO inventories_new SELECT * FROM inventories_xt2; -- or query through a database link to load from a remote database
-- 1112 rows created.
SQL> SELECT COUNT(*) FROM inventories_new;
COUNT(*)
----------
1112
1.3 Parallelism and Combining Dump Files
The driver supports parallel operations, where the degree of parallelism for unloading is specified by the PARALLEL clause, and the number of dump files created equals the degree of parallelism. For loading, the degree of parallelism is determined by the number of dump files listed in the LOCATION clause.
Example 5: Parallel Unloading with Multiple Dump Files
CREATE TABLE inventories_xt3
ORGANIZATION EXTERNAL
(
TYPE ORACLE_DATAPUMP
DEFAULT DIRECTORY def_dir1
LOCATION ('inv_xt_1.dmp', 'inv_xt_2.dmp', 'inv_xt_3.dmp')
)
PARALLEL 3
AS SELECT * FROM inventories;
-- Table created.
Example 6: Combining Multiple Dump Files for Loading
CREATE TABLE inventories_xt7
(
product_id NUMBER(6),
warehouse_id NUMBER(3),
quantity_on_hand NUMBER(8)
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_DATAPUMP
DEFAULT DIRECTORY def_dir1
LOCATION ('inv_xt_1.dmp', 'inv_xt_2.dmp', 'inv_xt_3.dmp')
);
2. Access Parameters (access_parameters Clause)
When you create an external table that uses the ORACLE_DATAPUMP access driver, you can specify parameters in an access_parameters clause.
This clause is optional, as are its individual parameters; for example, you can specify LOGFILE but not VERSION, or vice versa. The available parameters are described below.
Note
These access parameters are collectively referred to as the opaque_format_spec in the SQL CREATE TABLE...ORGANIZATION EXTERNAL statement.
The access_parameters clause allows for fine-grained control over how the dump file is created and accessed.
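As an illustrative sketch of the clause (the table, dump file, and log file names are assumptions, and the version string shown is only an example value), LOGFILE and VERSION might be combined like this:

```sql
-- Sketch: unload with a named log file and a dump file readable by an
-- earlier release. Names other than def_dir1 are illustrative assumptions.
CREATE TABLE inventories_xt_v10
ORGANIZATION EXTERNAL
(
TYPE ORACLE_DATAPUMP
DEFAULT DIRECTORY def_dir1
ACCESS PARAMETERS
(
LOGFILE inv_xt.log   -- messages generated while the dump file is accessed
VERSION '10.0.0'     -- target an earlier database release
)
LOCATION ('inv_xt_v10.dmp')
)
AS SELECT * FROM inventories;
```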
• Comments
The ORACLE_DATAPUMP access driver comments access parameter enables you to place comments with external tables.
• COMPRESSION
The ORACLE_DATAPUMP access driver compression access parameter specifies whether and how data is compressed before the external table data is written to the dump file set.
• ENCRYPTION
The ORACLE_DATAPUMP access driver encryption access parameter specifies whether to encrypt data before it is written to the dump file set.
• LOGFILE | NOLOGFILE
The ORACLE_DATAPUMP access driver LOGFILE|NOLOGFILE access parameter specifies the name of the log file that contains any messages generated while the dump file was being accessed.
• VERSION Clause
The ORACLE_DATAPUMP access driver VERSION clause access parameter enables you to generate a dump file that can be read by an earlier Oracle Database release.
• HADOOP_TRAILERS Clause
The ORACLE_DATAPUMP access driver provides a HADOOP_TRAILERS clause that specifies whether to write Hadoop trailers to the dump file.
• Effects of Using the SQL ENCRYPT Clause
Review the requirements and guidelines for external tables when you encrypt columns using the ORACLE_DATAPUMP access driver ENCRYPT clause.
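As a sketch of one of the parameters listed above, COMPRESSION could be specified as follows (the table and file names are assumptions, and the available compression levels depend on the database release and licensing):

```sql
-- Sketch (assumed names): compress data before writing it to the dump file,
-- which reduces file size and transfer time over slow networks.
CREATE TABLE inventories_xtc
ORGANIZATION EXTERNAL
(
TYPE ORACLE_DATAPUMP
DEFAULT DIRECTORY def_dir1
ACCESS PARAMETERS (COMPRESSION ENABLED BASIC)
LOCATION ('inv_xt_c.dmp')
)
AS SELECT * FROM inventories;
```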
2.1 ENCRYPTION Access Parameter
The ORACLE_DATAPUMP access driver encryption access parameter specifies whether to encrypt data before it is written to the dump file set.
Default: DISABLED
Purpose:
Specifies whether to encrypt data before it is written to the dump file set.
If ENABLED is specified, then all data is written to the dump file set in encrypted format.
If DISABLED is specified, then no data is written to the dump file set in encrypted format.
Restrictions:
This parameter is used only for export operations.
Example:
In the following example, the ENCRYPTION parameter is set to ENABLED. Therefore, all data written to the dept.dmp file will be in encrypted format.
CREATE TABLE deptXTec3
ORGANIZATION EXTERNAL
(
TYPE ORACLE_DATAPUMP
DEFAULT DIRECTORY def_dir1
ACCESS PARAMETERS (ENCRYPTION ENABLED)
LOCATION ('dept.dmp')
);
3. Data Type Support and Workarounds
3.1. Supported Data Types
The ORACLE_DATAPUMP access driver resolves many data types automatically during loads and unloads.
When you use external tables to move data between databases, you may encounter the following situations:
• The database character set and the database national character set may be different between the two platforms.
• The endianness of the platforms for the two databases may be different.
The ORACLE_DATAPUMP access driver automatically resolves some of these situations.
The following data types are automatically converted during loads and unloads:
• Character (CHAR, NCHAR, VARCHAR2, NVARCHAR2)
• RAW
• NUMBER
• Date/Time
• BLOB
• CLOB and NCLOB
• ROWID and UROWID
If you attempt to use a data type that is not supported for external tables, then you receive an error. This is demonstrated in the following example, in which the unsupported data type, LONG, is used:
CREATE TABLE bad_datatype_xt
(
product_id NUMBER(6),
language_id VARCHAR2(3),
translated_name NVARCHAR2(50),
translated_description LONG
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_DATAPUMP
DEFAULT DIRECTORY def_dir1
LOCATION ('proddesc.dmp')
);
Error:
translated_description LONG
*
ERROR at line 6:
ORA-30656: column type not supported on external organized table
3.2. Unsupported Data Types
You can use the ORACLE_DATAPUMP access driver to unload and reload data for some of the unsupported data types.
An external table supports a subset of all possible data types for columns. In particular, it supports character data types (except LONG), the RAW data type, all numeric data types, and all date, timestamp, and interval data types.
The unsupported data types for which you can use the ORACLE_DATAPUMP access driver to unload and reload data include the following:
• BFILE
• LONG and LONG RAW
• Final object types
• Tables of final object types
• Unloading and Loading BFILE Data Types
The BFILE data type has two pieces of information stored in it: the directory object for the file and the name of the file within that directory object.
• Unloading LONG and LONG RAW Data Types
The ORACLE_DATAPUMP access driver can be used to unload LONG and LONG RAW columns, but that data can only be loaded back into LOB fields.
• Unloading and Loading Columns Containing Final Object Types
Final column objects are populated into an external table by moving each attribute in the object type into a column in the external table.
• Tables of Final Object Types
Object tables have an object identifier that uniquely identifies every row in the table.
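As a sketch of the LONG workaround mentioned above, a LONG column can be unloaded by converting it to a CLOB in the SELECT list with TO_LOB (the table prod_desc and its columns are assumed names for illustration):

```sql
-- Sketch: unload a LONG column as a CLOB so the data can later be
-- loaded into a LOB field. prod_desc and its columns are assumptions.
CREATE TABLE prod_desc_xt
ORGANIZATION EXTERNAL
(
TYPE ORACLE_DATAPUMP
DEFAULT DIRECTORY def_dir1
LOCATION ('prod_desc_xt.dmp')
)
AS SELECT product_id, TO_LOB(long_desc) AS long_desc FROM prod_desc;
```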
4. Performance Hints and Restrictions
4.1 Performance Hints
To maximize performance when using the ORACLE_DATAPUMP access driver:
• Parallelism: Use the PARALLEL clause and ensure the degree of parallelism matches the number of dump files in the LOCATION clause.
• Compression: Utilize the COMPRESSION clause to reduce dump file size, which is particularly beneficial over slower networks.
• Logging: Use LOGFILE for monitoring or NOLOGFILE to eliminate logging overhead if not needed.
When you monitor performance, the most important measurement is the elapsed time for a load. Other important measurements are CPU usage, memory usage, and I/O rates.
You can alter performance by increasing or decreasing the degree of parallelism. The degree of parallelism indicates the number of access drivers that can be started to process the data files. The degree of parallelism enables you to choose on a scale between slower load with little resource usage and faster load with all resources utilized. The access driver cannot automatically tune itself, because it cannot determine how many resources you want to dedicate to the access driver.
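The degree of parallelism of an existing external table can be changed with ALTER TABLE, as a sketch (inventories_xt3 is the table from Example 5):

```sql
-- Change the number of access drivers that may be started to read the files.
ALTER TABLE inventories_xt3 PARALLEL 2;
-- Or disable parallel access entirely:
ALTER TABLE inventories_xt3 NOPARALLEL;
```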
An additional consideration is that the access drivers use large I/O buffers for better performance. On databases with shared servers, all memory used by the access drivers comes out of the system global area (SGA). For this reason, you should be careful when using external tables on shared servers.
4.2 Restrictions
Because a dump file cannot be modified after it is created, DML statements against the external table fail:
Example 7: Attempting DML on an External Table (Error)
SQL> DELETE FROM inventories_xt WHERE warehouse_id = 1;
DELETE FROM inventories_xt WHERE warehouse_id = 1
*
ERROR at line 1:
ORA-30657: operation not supported on external organized table
• Dump File Format: The dump file must be in the Oracle Data Pump format.
• Directory Object: The directory object specified in DEFAULT DIRECTORY must exist, and the user must have READ and WRITE privileges.
• No DDL on Dump File: Data Definition Language (DDL) statements (ALTER, DROP) cannot be used on the dump file itself.
• No AS SELECT for Existing Files: When creating an external table that points to an existing dump file, the AS SELECT clause is prohibited.
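The directory object referenced by DEFAULT DIRECTORY can be created and granted as in the following sketch (the operating-system path and the grantee are illustrative assumptions; run as a suitably privileged user):

```sql
-- The path and the user hr are assumptions for illustration only.
CREATE OR REPLACE DIRECTORY def_dir1 AS '/u01/app/oracle/ext_data';
GRANT READ, WRITE ON DIRECTORY def_dir1 TO hr;
```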
Practical Scenario: Combining Dump Files in a Data Warehouse Environment
Dump files populated by different external tables can all be specified in the LOCATION clause of another external table.
For example, data from different production databases can be unloaded into separate files, and then those files can all be included in an external table defined in a data warehouse. This provides an easy way of aggregating data from multiple sources.
The only restriction is that the metadata for all of the external tables must be exactly the same. This means that the character set, time zone, schema name, table name, and column names must all match. Also, the columns must be defined in the same order, and their data types must be exactly alike.
In the example below, the first external table is dropped after its dump file is created so that the same table name can be reused for the second external table. This guarantees that the metadata recorded in the two dump files is identical, allowing both files to be used together to populate a single external table.
SQL> CREATE TABLE inv_part_1_xt
ORGANIZATION EXTERNAL
(
TYPE ORACLE_DATAPUMP
DEFAULT DIRECTORY def_dir1
LOCATION ('inv_p1_xt.dmp')
)
AS SELECT * FROM oe.inventories WHERE warehouse_id < 5;
SQL> DROP TABLE inv_part_1_xt;
-- Only the dump file is needed; the external table definition is recreated below.
SQL> CREATE TABLE inv_part_1_xt
ORGANIZATION EXTERNAL
(
TYPE ORACLE_DATAPUMP
DEFAULT DIRECTORY def_dir1
LOCATION ('inv_p2_xt.dmp')
)
AS SELECT * FROM oe.inventories WHERE warehouse_id >= 5;
-- Use the two dump files exported in the previous steps to populate the inv_part_all_xt table:
SQL> CREATE TABLE inv_part_all_xt
(
PRODUCT_ID NUMBER(6),
WAREHOUSE_ID NUMBER(3),
QUANTITY_ON_HAND NUMBER(8)
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_DATAPUMP
DEFAULT DIRECTORY def_dir1
LOCATION ('inv_p1_xt.dmp','inv_p2_xt.dmp')
);
SQL> SELECT * FROM inv_part_all_xt MINUS SELECT * FROM oe.inventories;
no rows selected
In this way, two separate dump files can easily be merged into a complete table in another database.
-------------------------------
Alireza Kamrani
Oracle senior Solution Architect