Change Capture Stage - How to avoid Scratch space usage

rleishman
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia

Post by rleishman »

I have a job that uses the Change Capture Stage to identify differences between two similar data sets. Both data sets are pre-sorted and partitioned on the change-key. Each data set has 1M rows. I want it to stream the output straight through, but it is writing the entire thing to scratch disk before outputting a single row.

Watching DataStage as it runs, the CC stage accepts input from the two pre-sorted sources simultaneously and at roughly the same speed, but it does not output ANYTHING until both sources are completely consumed. After the inputs complete, it pauses for 30 seconds or so and then starts outputting the combined data set.

Looking at the Scratch disk whilst this is happening, I can see it creating files. This doesn't seem necessary to me because it does not need to sort the data.

I suspect that it is unnecessarily sorting my data, but I do not know how to make it stop. In the Partitioning tab of the CC Input tab, I am NOT checking the box that asks it to force a sort.

Question: Is this normal? If I ask it to force a sort, it does take a little bit longer, but does not use more temp space.

I want it to stream the output without writing it to scratch.
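
To illustrate why no sort or spill should be needed here, this is a minimal Python sketch of a merge-style comparison over two inputs that are already sorted on the change key. It is not how the Change Capture stage is implemented; the row shape `(key, payload)`, the string change codes, and the assumption of unique keys are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical row shape: (change key, payload). Keys are assumed unique
# and both inputs are assumed to be sorted ascending on the key.
Row = Tuple[int, str]


def change_capture(before: Iterable[Row],
                   after: Iterable[Row]) -> Iterator[Tuple[str, Row]]:
    """Yield (change_code, row) pairs while reading both inputs once."""
    b_it, a_it = iter(before), iter(after)
    b: Optional[Row] = next(b_it, None)
    a: Optional[Row] = next(a_it, None)
    while b is not None or a is not None:
        if a is None or (b is not None and b[0] < a[0]):
            # Key exists only in the "before" input.
            yield ("delete", b)
            b = next(b_it, None)
        elif b is None or a[0] < b[0]:
            # Key exists only in the "after" input.
            yield ("insert", a)
            a = next(a_it, None)
        else:
            # Same key on both sides: report it only if the payload differs.
            if b[1] != a[1]:
                yield ("edit", a)
            b, a = next(b_it, None), next(a_it, None)


before_rows = [(1, "a"), (2, "b"), (4, "d")]
after_rows = [(1, "a"), (2, "B"), (3, "c")]
for code, row in change_capture(before_rows, after_rows):
    print(code, row)
```

Because the function only ever holds one row from each side, output can start as soon as the first differing key is seen, which is the streaming behaviour I was expecting from pre-sorted inputs.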
Ross Leishman
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Take a look at the score to see whether tsort operators and/or buffer operators are being inserted.
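
If the score has been copied out of the job log into a text file or string (it is typically written to the log when APT_DUMP_SCORE is set), a rough way to check for these operators is a simple text scan. The sketch below is Python and only counts lines that mention `tsort` or `buffer`; the sample text is made up for illustration and is not a real score dump, and the function name is hypothetical.

```python
import re
from collections import Counter


def count_inserted_operators(score_text: str) -> Counter:
    """Count lines of a saved score dump that mention tsort or buffer."""
    counts: Counter = Counter()
    for line in score_text.splitlines():
        lowered = line.lower()
        for name in ("tsort", "buffer"):
            # Whole-word match so unrelated identifiers are not counted.
            if re.search(rf"\b{name}\b", lowered):
                counts[name] += 1
    return counts


# Illustrative text only; a real score dump will look different.
sample_score = """\
op0[2p] {(parallel inserted tsort operator)
op1[2p] {(parallel buffer(0))
op2[2p] {(parallel ChangeCapture)
"""

print(count_inserted_operators(sample_score))
```

A non-zero `tsort` count on the links feeding the Change Capture stage would suggest that sorts are being inserted even though the data is already sorted.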

Add explicit Sort stages on the input links, with the Sort Key Mode set to "Don't Sort (Previously Sorted)" (to prevent insertion of tsort operators) and with the memory limit raised as high as you can afford.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sohasaid
Premium Member
Posts: 115
Joined: Tue May 20, 2008 3:02 am
Location: Cairo, Egypt

Post by sohasaid »

Also, you can turn off automatic sort insertion by setting the APT_NO_SORT_INSERTION environment variable.
rleishman

Post by rleishman »

Both suggestions work perfectly. Thanks guys.
Ross Leishman