Implementation of QualityStage

Infosphere's Quality Product

Moderators: chulett, rschirm

MukundShastri
Premium Member
Posts: 103
Joined: Tue Oct 14, 2003 4:07 am

Post by MukundShastri »

Hi,

We have not used QualityStage before, so we have some basic questions about it.
We want to do data cleansing, using different business rules, on around 460 sequential files. For each input file we should get one output file after cleansing.
Does QualityStage have the template job facility that is available in Enterprise Edition parallel jobs? All the files have the same metadata. Can one QualityStage template job handle different input files and different cleansing rules dynamically?

How difficult will it be to create such jobs? Roughly how much effort do you estimate it would take?


Thanks
Mukund
PilotBaha
Premium Member
Posts: 202
Joined: Mon Jan 12, 2004 8:05 pm

Post by PilotBaha »

Besides hiring a consultant who knows what to do and how to do it, I'd recommend making things as homogeneous as possible BEFORE you feed the data to QS. Gathering data from different layouts and bringing it into QS is an easy task for DataStage. Use that to the fullest.
(Try inserting a field into a QS data structure... that's what the interns are for.)
Besides all this, I recommend getting a good consultant who understands both the products and your business needs.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

Since all your files have the same columns, you can write one parallel job that processes the files using a file mask, or pass in the different file names as a job parameter, and process them all with one DataStage job and one plug-in QualityStage job. Whether the rules can be dynamic depends entirely on which rules you need. It is possible to set rules at run time via DataStage job parameters that prevent certain changes from occurring, or to carry out certain string substitutions within DataStage prior to cleansing.
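A minimal Python sketch of the file-mask approach, assuming the runs are launched through the DataStage `dsjob` command-line client. The project name, job name, and the `InputFile`, `OutputFile` and `RuleSet` parameter names are hypothetical placeholders, not taken from the post; `RuleSet` stands in for whatever run-time switches your job actually exposes. The sketch only builds the command lines and does not execute them, so check the exact `dsjob` options against your release before using them.

```python
import fnmatch
from pathlib import PurePosixPath

# Hypothetical placeholders: not from the original post.
PROJECT = "MyProject"
JOB = "CleanseFiles"


def build_runs(file_names, mask, rules_by_file, default_rules="STANDARD"):
    """Return one dsjob argument list per input file that matches the mask.

    Every file goes through the same job; only the job parameters change,
    so one job design covers all files with identical metadata.
    """
    runs = []
    for name in sorted(file_names):
        if not fnmatch.fnmatchcase(name, mask):
            continue  # skip files outside the mask
        stem = PurePosixPath(name).stem
        # Per-file rule selection passed as a job parameter (hypothetical name).
        rules = rules_by_file.get(name, default_rules)
        runs.append([
            "dsjob", "-run",
            "-param", f"InputFile={name}",
            "-param", f"OutputFile={stem}_clean.txt",  # one output per input
            "-param", f"RuleSet={rules}",
            "-jobstatus",
            PROJECT, JOB,
        ])
    return runs


runs = build_runs(
    ["cust_001.txt", "cust_002.txt", "readme.md"],
    "cust_*.txt",
    {"cust_002.txt": "ADDRESS_ONLY"},
)
for cmd in runs:
    print(" ".join(cmd))
```

In practice the file list could come from `glob.glob` over the landing directory, and each argument list could be handed to `subprocess.run` one after another.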

If your sequential files are small and can be processed quickly, I would consider making the parallel and QS jobs multiple-instance and running several copies of them at once. Run each instance on a single node and allocate the instances to different nodes to spread the load. If they are large sequential files, stick to one instance per job and let it run in parallel across the nodes.
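The node allocation can also be sketched in Python. This assumes, beyond what the post says, that each instance is pinned to a node by passing a different parallel configuration file through `$APT_CONFIG_FILE` as a job parameter, and that the file stem serves as the invocation ID of the multiple-instance job. The size threshold, node names and configuration file paths are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical values: not from the original post.
SMALL_FILE_BYTES = 50 * 1024 * 1024
NODES = ["node1", "node2", "node3", "node4"]
FULL_CONFIG = "/opt/configs/all_nodes.apt"


@dataclass
class Run:
    invocation_id: str  # instance name of the multiple-instance job
    file_name: str
    config_file: str    # value to pass as $APT_CONFIG_FILE (assumption)


def plan_runs(file_sizes):
    """Small files: one single-node instance each, spread round-robin over NODES.
    Large files: one instance using the full multi-node configuration."""
    runs = []
    small_count = 0
    for name, size in sorted(file_sizes.items()):
        if size < SMALL_FILE_BYTES:
            node = NODES[small_count % len(NODES)]
            small_count += 1
            config = f"/opt/configs/{node}.apt"
        else:
            config = FULL_CONFIG
        invocation = name.rsplit(".", 1)[0]  # file stem as invocation ID
        runs.append(Run(invocation, name, config))
    return runs


plan = plan_runs({"a.txt": 10, "b.txt": 20, "big.txt": 10**9,
                  "c.txt": 5, "d.txt": 1, "e.txt": 2})
for run in plan:
    print(run)
```

Round-robin by file count ignores file size; if the small files vary a lot, assigning each file to the node with the least total bytes so far is a simple refinement.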