
Tips on how to optimize DataStage Jobs

Posted: Thu Nov 13, 2008 8:42 pm
by marionpt
Hello experts,

My DS jobs process about 12 million customer records.
Objectives of the DS jobs:
- To find out how many duplicate customers there are.
- To get the best-of-breed record among the duplicates (see the sketch below).
- To integrate customers from different sources.
- To generate the enterprise customer data warehouse.
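
Conceptually (outside DataStage), the duplicate count and best-record objectives boil down to something like the Python sketch below. This is only an illustration of the logic, not my actual job; the field names, match key, and completeness-based survivorship rule are made-up assumptions.

    # Group customers by an assumed match key, count duplicates, and keep
    # the "best" (here: most complete) record per group.
    from collections import defaultdict

    def match_key(rec):
        # Assumed key: standardized name + postal code.
        return (rec.get("name", "").strip().upper(), rec.get("postcode", "").strip())

    def best_record(group):
        # Assumed survivorship rule: prefer the most complete record.
        return max(group, key=lambda r: sum(1 for v in r.values() if v))

    def dedupe(customers):
        groups = defaultdict(list)
        for rec in customers:
            groups[match_key(rec)].append(rec)
        duplicates = sum(len(g) - 1 for g in groups.values() if len(g) > 1)
        survivors = [best_record(g) for g in groups.values()]
        return duplicates, survivors

    customers = [
        {"name": "Ann Lee", "postcode": "1000", "phone": ""},
        {"name": "ANN LEE", "postcode": "1000", "phone": "555-0101"},
        {"name": "Bo Tan",  "postcode": "2000", "phone": "555-0202"},
    ]
    dups, survivors = dedupe(customers)
    print(dups, len(survivors))  # 1 duplicate, 2 surviving records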

But the jobs take several hours to complete, and sometimes I get fatal errors:

"APT_CombinedOperatorController(0),0: Write to dataset on [fd 15] failed (Error 0) on node node1, hostname CRM01"

Does anybody have an idea about optimizing these jobs, or know how to avoid the above error?

Can I remove the DS-generated (unnecessary) columns? Does that help?

Thanks!

Posted: Thu Nov 13, 2008 9:56 pm
by ray.wurlod
Presumably you have some QualityStage stages in the job. But we cannot help you, because you have allowed operator combination to occur, which has hidden the true source of the error. Disable operator combination, try again, and post the exact error message.
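
The usual way to do that is to add the environment variable $APT_DISABLE_COMBINATION to the job as a parameter and set it to True (or set it project-wide in the Administrator client). For example, from the command line (the project and job names below are placeholders, and the variable must already be exposed as a job parameter):

    dsjob -run -param '$APT_DISABLE_COMBINATION=True' MyProject MyCustomerJob

With combination disabled, the log names the individual operator that failed instead of APT_CombinedOperatorController.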

Moderator: please move this to the QualityStage forum

Re: Tips on how to optimize DataStage Jobs

Posted: Fri Sep 24, 2010 10:41 am
by phonseau
Are you able to estimate system resources before you start your job?

Here are some tips that might help you get better matches:
- Use only the columns that your Match Designer specification actually uses when you run the Match Frequency stage; this reduces processing time.
- Use disagreement weights to penalize columns that don't agree (see the sketch after this list).
- Use more standardized columns during matching, especially the unhandled name data; reward such a column generously when it agrees.
- Profile your data after standardization to know your data better. This will help you design better matching logic.
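
To make the weight idea concrete, here is a rough standalone Python sketch of the kind of agreement/disagreement scoring used in probabilistic matching. The column names and m/u probabilities are invented for illustration; this is not what QualityStage computes internally.

    # Fellegi-Sunter style scoring: agreement on a column adds log2(m/u),
    # disagreement adds log2((1-m)/(1-u)), which is negative (a penalty).
    from math import log2

    COLUMNS = {            # assumed (m, u) probabilities per column
        "name":     (0.95, 0.01),
        "postcode": (0.90, 0.10),
        "phone":    (0.85, 0.02),
    }

    def pair_score(rec_a, rec_b):
        score = 0.0
        for col, (m, u) in COLUMNS.items():
            if rec_a.get(col) and rec_a.get(col) == rec_b.get(col):
                score += log2(m / u)              # agreement weight (reward)
            else:
                score += log2((1 - m) / (1 - u))  # disagreement weight (penalty)
        return score

    a = {"name": "ANN LEE", "postcode": "1000", "phone": "555-0101"}
    b = {"name": "ANN LEE", "postcode": "1000", "phone": ""}
    print(round(pair_score(a, b), 2))  # name and postcode agree, phone disagrees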

marionpt wrote:
My DS jobs process about 12 million customer records. [...] Does anybody have an idea about optimizing these jobs, or know how to avoid the above error?

Posted: Fri Sep 24, 2010 2:38 pm
by ray.wurlod
Assuming you have version 8, you can use the Resource Estimation tool (on the toolbar of the Designer client).