Insert/Update Oracle table in CDC job?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply

Perform insert & update action on Oracle table in CDC job or in separate jobs?

Poll ended at Mon Apr 10, 2017 8:36 pm

In separate jobs
0
No votes
In CDC jobs
0
No votes
 
Total votes: 0

reddy.vinod
Participant
Posts: 36
Joined: Mon Jul 16, 2007 3:37 am
Location: USA

Insert/Update Oracle table in CDC job?

Post by reddy.vinod »

Hi All,

We are trying to build a new data warehouse, i am currently doing POC, we will have multiple tables in warehouse, we will build CDC job for every table to capture the data changes. In every CDC job i am planning to capture inserts into a dataset & edits into another data set. Next i planned to build couple of jobs, one for loading inserts into Oracle table & another for updating Oracle table.

Now my question, instead of inserting/updating data in separate jobs, can i do these tasks in CDC job itself. If i perform insert/update in CDC job, i hope it improve jobs performance because i don't need to run(build) insert/update jobs for every table. Please let me know if this method is recommended & also let me know what kind of issues i may face in long run.

Some additional information, in CDC jobs we are doing CDC, joining few reference tables & find key max value. Each table may have rows between 1 million to 2 millions.

Greatly appreciate any help, thanks in advance.
VINOD
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

For a "POC" I would think you would be trying both ways in order to make that determination for yourself.
-craig

"You can never have too many knives" -- Logan Nine Fingers
reddy.vinod
Participant
Posts: 36
Joined: Mon Jul 16, 2007 3:37 am
Location: USA

Post by reddy.vinod »

Thanks for your reply chulett.

Sure, i will test in both the ways.If performance is same in both the ways, then i would like to build insert/update actions in CDC job. I am seeing following advantages in this way.

1. If suppose i have 100 tables, then if i need to build separate jobs for inserts/updates then i need to build 200 jobs for inserts & updates. If i keep insert/update actions in CDC jobs, then we can save development time.

2. These jobs runs daily, running 200 insert/update jobs daily may take longer time, like every job will need time for startup.

My main concern in developing insert/update actions in CDC job is, what happens if job fails, does Oracle tables get locked? do i start face any Oracle related issues in long run?
VINOD
reddy.vinod
Participant
Posts: 36
Joined: Mon Jul 16, 2007 3:37 am
Location: USA

Post by reddy.vinod »

I am also thinking of another approach.

1. Keep update action in CDC job.
2. Load inserts into a dataset.
3. Build a generic load job with RCP & parameterise dataset file name & table name.
4. Use load RCP job for all inserts.
VINOD
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

reddy.vinod wrote:My main concern... is, what happens if job fails
To me, that's one of the primary drivers of any design, CDC or otherwise - how does it recover from failure? We always keep "jobs" as atomic as possible, discrete units of work so that failure during one step doesn't mean repeating multiple steps to get back to that point. When dealing with CDC in the past, we ensured that all changes for that "run" were captured before starting any processing of those changes so that we had a static set to work with. Which meant in the event of failure, we knew we could either start over from the point of failure (or the beginning) without fear of a changing source complicating that. Meaning, IMHO, it would be best to not combine the process of capturing the CDC information with processing it. Depending on exactly what "CDC" means here. Just something to keep in mind.

Now, I also don't see a need to handle inserts separately from updates unless you are in a situation where the inserts FAR outnumber the updates so you'd want to bulk load those before performing the updates. But again, that's just me. You may also need to take into account updates that happen to the inserts, meaning you would not be able to do the updates first. Depends on your knowledge of the data and what kind of changes you'll see to any given record between loads. You can't necessarily assume you get one or the other for a record, or that you will only get one. Make sure you are know for certain what you will be processing.

ps. Not a big fan of polls here. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply