Tips on how to optimize DataStage jobs

This forum is in support of all issues about Data Quality regarding DataStage and other strategies.

Moderators: chulett, rschirm

marionpt
Participant
Posts: 4
Joined: Mon Sep 29, 2008 12:41 am
Location: Manila, Philippines

Tips on how to optimize DataStage jobs

Post by marionpt »

Hello experts,

My DS jobs process about 12 million customers.
Objectives of the DS jobs:
- To find out how many duplicate customers there are.
- To pick the best-of-breed record among the duplicates.
- To integrate customers from different sources.
- To generate an enterprise customer data warehouse.

But the jobs take several hours to complete, and sometimes I get fatal errors such as:

"APT_CombinedOperatorController(0),0: Write to dataset on [fd 15] failed (Error 0) on node node1, hostname CRM01"

Does anybody have any ideas about optimizing the jobs, or know how to avoid the above error?

Can I remove the DS-generated (unnecessary) columns? Would that help?

Thanks!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Presumably you have some QualityStage stages in the job. But we cannot help you yet, because you have allowed operator combination to occur, which has hidden the true source of the error. Disable operator combination, run the job again, and post the exact error message.

Moderator: please move this to the QualityStage forum.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
phonseau
Participant
Posts: 3
Joined: Fri Sep 10, 2010 12:50 pm
Location: dallas

Re: Tips on how to optimize DataStage jobs

Post by phonseau »

Are you able to estimate system resources before you start your job?

Here are some tips that might help you get better matches:
- In the Match Frequency stage, use only the columns that are actually used in your Match Designer specification. This reduces processing time.
- Use disagreement weights to penalize columns that don't agree.
- Use more standardized columns during matching, especially the unhandled data for names. Reward this column generously if it agrees.
- Profile your data after standardization to know your data better. This will help you design better matching logic.
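
To make the point about agreement and disagreement weights a bit more concrete, here is a minimal Python sketch of how such weights are usually computed in probabilistic record matching. It assumes the standard log2 formulas based on an m-probability (the chance a column agrees when two records really are the same customer) and a u-probability (the chance it agrees by accident). The column names and the probability values are made up for illustration only and are not taken from the original job; your own match specification would supply the real ones.

```python
import math


def agreement_weight(m: float, u: float) -> float:
    """Weight added to the score when a column agrees: log2(m / u)."""
    return math.log2(m / u)


def disagreement_weight(m: float, u: float) -> float:
    """Weight (normally negative) added when a column disagrees."""
    return math.log2((1.0 - m) / (1.0 - u))


def composite_weight(comparisons: dict, probabilities: dict) -> float:
    """Sum the per-column weights for one candidate record pair.

    comparisons:   column name -> True if the two records agree on it
    probabilities: column name -> (m, u) pair for that column
    """
    total = 0.0
    for column, agrees in comparisons.items():
        m, u = probabilities[column]
        if agrees:
            total += agreement_weight(m, u)
        else:
            total += disagreement_weight(m, u)
    return total


if __name__ == "__main__":
    # Hypothetical columns and (m, u) values, for illustration only.
    probabilities = {
        "name_unhandled": (0.95, 0.01),  # rewarded generously when it agrees
        "postal_code": (0.90, 0.10),
    }
    pair = {"name_unhandled": True, "postal_code": False}
    print(composite_weight(pair, probabilities))
```

A column with a high m and a low u earns a large reward when it agrees and a large penalty when it does not, which is the effect the tips above are after.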

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Assuming you are on version 8, you can use the Resource Estimation tool (on the toolbar of the Designer client).