I have a job that uses the Change Capture Stage to identify differences between two similar data sets. Both data sets are pre-sorted and partitioned on the change-key. Each data set has 1M rows. I want it to stream the output straight through, but it is writing the entire thing to scratch disk before outputting a single row.
Watching DataStage as it runs, the CC stage accepts input from the two pre-sorted sources simultaneously and at roughly the same speed, but it does not output ANYTHING until both sources are completely consumed. After the inputs complete, it pauses for 30 seconds or so and then starts outputting the combined data set.
Looking at the Scratch disk whilst this is happening, I can see it creating files. This doesn't seem necessary to me because it does not need to sort the data.
I suspect that it is unnecessarily sorting my data, but I do not know how to make it stop. In the Partitioning tab of the CC Input tab, I am NOT checking the box that asks it to force a sort.
Question: Is this normal? If I ask it to force a sort, it does take a little bit longer, but does not use more temp space.
I want it to stream the output without writing it to scratch.
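To illustrate the streaming behaviour being asked for, here is a minimal Python sketch of a change capture done as a merge over two key-sorted inputs. It is only a sketch of the idea and not how the Change Capture operator is implemented; the function name `change_capture`, the `(key, value)` row shape and the assumption of unique keys are all hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple

Row = Tuple[int, str]  # hypothetical row shape: (change key, value)


def change_capture(before: Iterable[Row],
                   after: Iterable[Row]) -> Iterator[Tuple[str, Row]]:
    """Yield (change, row) pairs from two inputs already sorted on the key.

    Assumes keys are unique within each input. Only one row per input is
    held in memory at a time, so nothing has to be buffered or sorted.
    """
    b_it, a_it = iter(before), iter(after)
    b: Optional[Row] = next(b_it, None)
    a: Optional[Row] = next(a_it, None)
    while b is not None or a is not None:
        if a is None or (b is not None and b[0] < a[0]):
            # Key exists only in the before set.
            yield ("delete", b)
            b = next(b_it, None)
        elif b is None or a[0] < b[0]:
            # Key exists only in the after set.
            yield ("insert", a)
            a = next(a_it, None)
        else:
            # Same key on both sides: compare the values.
            yield ("copy" if a == b else "edit", a)
            b, a = next(b_it, None), next(a_it, None)


before_rows = [(1, "a"), (2, "b"), (4, "d")]
after_rows = [(1, "a"), (2, "B"), (3, "c")]
changes = list(change_capture(before_rows, after_rows))
```

Because each output row depends only on the current row from each side, the first result can be emitted as soon as the first pair of rows has been read.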
Change Capture Stage - How to avoid Scratch space usage
-
- Premium Member
- Posts: 252
- Joined: Mon Sep 19, 2005 10:28 pm
- Location: Melbourne, Australia
Ross Leishman
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Take a look at the score to see whether tsort operators and/or buffer operators are being inserted.
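The score is written to the job log when the environment variable APT_DUMP_SCORE is set on the job. As a rough aid, the Python sketch below counts lines in a saved score dump that look like inserted tsort operators or buffer operators. The sample text is an illustrative excerpt made up for this example (including the key name `CUST_ID` and the stage name `Change_Capture_1`), the helper name `count_inserted_operators` is hypothetical, and the exact wording of a real score may differ between versions, so treat the patterns as assumptions to adjust.

```python
import re
from collections import Counter

# Illustrative excerpt only; a real score dump varies by version and job design.
SAMPLE_SCORE = """\
op0[2p] {(parallel inserted tsort operator {key={value=CUST_ID}}(0))
    on nodes (node1[op0,p0] node2[op0,p1])}
op1[2p] {(parallel buffer(0))
    on nodes (node1[op1,p0] node2[op1,p1])}
op2[2p] {(parallel Change_Capture_1)
    on nodes (node1[op2,p0] node2[op2,p1])}
"""


def count_inserted_operators(score_text: str) -> Counter:
    """Count lines that appear to describe inserted tsort or buffer operators."""
    counts: Counter = Counter()
    for line in score_text.splitlines():
        if re.search(r"\binserted tsort operator\b", line):
            counts["tsort"] += 1
        elif re.search(r"\bbuffer\(\d+\)", line):
            counts["buffer"] += 1
    return counts


counts = count_inserted_operators(SAMPLE_SCORE)
```

A non-zero tsort count would suggest the framework is adding its own sorts upstream of the Change Capture stage; buffer operators alone would point to buffering rather than sorting.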
Add explicit Sort stages on the input links, with sort mode set to "don't sort, already sorted" (to prevent insertion of tsort operators) and with memory boosted as high as you can afford.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.