Improving SQL*Loader’s Performance

# by Sanjay Mishra

Drop indexes before loading data

If you are loading a substantial amount of data, I recommend dropping the indexes on the target table before doing the load. Index maintenance takes time, and it generates extra disk I/O.

Each time a new row is inserted into a table, all the indexes for the table must be updated. It’s possible even that the I/O requirements of updating indexes will exceed those for inserting the row in the first place. By dropping indexes on the target table, you allow the load to proceed more quickly. After the load is complete, you can recreate the indexes. You may need to experiment a bit to find the point at which it becomes more efficient to rebuild indexes after the load than to maintain them during the load.

There is one exception to my suggestion of dropping indexes before a load. Indexes associated with primary key and unique key constraints should never be dropped. To do so sacrifices data integrity. You need such indexes in place at all times so that the proper business rules are enforced.
#

Use TRUNCATE instead of REPLACE

If you want to delete existing data from a table before loading it, I recommend using the TRUNCATE keyword instead of the REPLACE keyword in your INTO TABLE clause.

The REPLACE keyword causes SQL*Loader to issue a DELETE statement against the target table before beginning the load. A SQL DELETE operation carries with it a lot of baggage. Transactional integrity must be maintained, so deleted rows are written to rollback segments so they are available in the event the transaction is rolled back. In addition, a database redo log entry will be written for each database block affected by the DELETE. Given a large table, the amount of extra I/O can be substantial.

You also risk running out of rollback segment space. The TRUNCATE keyword, on the other hand, causes SQL*Loader to issue a SQL TRUNCATE TABLE statement. Such a statement does not generate any rollback segment entries. In addition, it only results in one redo log entry–for the statement itself.

Bottom line: If you can use TRUNCATE instead of REPLACE and your target table is not of a trivial size, your performance gain will be significant.
#

Use a larger bind array

The bind array is the area in memory where SQL*Loader stores a batch of data to be loaded. When the bind array fills, it is transmitted to the database. Generally, the larger the bind array, the more rows can be loaded in one operation, and the better your performance will be.

The bind array size is controlled by the READSIZE and BINDSIZE parameters, and the default size is 64KB (256KB in Oracle9i). Finding the optimal setting for a given load takes a bit of trial and error.

First, increase the READSIZE and BINDSIZE settings (usually, both should be the same). Then, determine the number of rows that will fit within the bind array, and set the ROWS parameter to that value. Run the load. If you see a performance gain, try increasing the bind array size once again. Continue this process until you are comfortable with the performance.
#

Avoid use of CONTINUEIF and CONCATENATE

If at all possible, prepare your input data in such a way that you have a one-to-one correspondence between physical and logical records.

SQL*Loader does allow you to join multiple physical records into one logical record using the CONTINUEIF and CONCATENATE keywords, but this joining involves some amount of overhead. You’re better off if you can avoid that overhead, so whenever you have control over the format of your input data, strive to avoid the need to concatenate multiple physical records together.

5.

Load fixed-width data instead of delimited data

If you have the option of loading from either fixed-width or delimited data files, choose the fixed-width data. You’ll save the rather significant overhead of scanning each record for delimiters.
6.

Avoid character set conversion

SQL*Loader supports character set conversion between the character set used in the input data file and the character set used by the database server. However, character set conversion takes time and requires CPU resources. If you can keep all character sets the same, you avoid that overhead and your load will run faster than if these conversions needed to be made.
7.

Avoid unnecessary NULLIF and DEFAULTIF clauses

The NULLIF and DEFAULTIF clauses allow you to set a column to NULL or to its default value given a specified condition in the input record. Any NULLIF and DEFAULTIF clauses that you specify in your control file need to be evaluated for each row in your input file. This poses a considerable amount of processing overhead.

Avoid NULLIF and DEFAULTIF clauses where you can and your load will run that much faster.
8.

Perform direct path loads

A direct path load is one in which SQL*Loader writes directly to your database’s datafiles, as opposed to loading data by executing SQL INSERT statements. Because the direct path bypasses standard SQL statement processing, it results in a significant performance improvement. Direct path loads can’t be used in all circumstances, but when they can be used, the performance benefits are significant.

While it will not always be possible to apply all of these techniques in every environment, the more you can employ the better. If you implement the tips that are most appropriate to your situation, you should notice a significant improvement in SQL*Loader’s performance.

2 thoughts on “Improving SQL*Loader’s Performance

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s