QS: limitation using datafile

ponzio
Participant
Posts: 165
Joined: Mon Dec 05, 2005 9:13 am
Location: Italy


Post by ponzio »

Hi.
I've been working with QualityStage for some years, but only a few days ago did I discover a major limitation on using a data file within a single QS project.

PROBLEM
The same data file can't be used as both File A and File B in two different jobs of the same project;
for example, as File A in an undup stage and as File B in a geomatch stage.

MY SOLUTIONS

1] Create two distinct projects, one for the undup job and another for the geomatch job. The common file must then be copied (or moved) into the Data directory of the second project.

2] Create a new data file definition, identical to the common one, and use it in one of the jobs (the geomatch job, for example). Then create, on the filesystem, a symbolic link to the real file, named after the new data file (the one used in the geomatch job).
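
For anyone who wants to script solution 2, here is a minimal Python sketch of the symbolic-link step. It assumes a Unix-like filesystem. The alias name INPUT_B and the helper link_datafile are hypothetical, and the demo runs in a temporary directory that only stands in for the project's Data directory; nothing here calls QualityStage itself.

```python
import os
import tempfile
from pathlib import Path


def link_datafile(data_dir: Path, real_name: str, alias_name: str) -> Path:
    """Create alias_name inside data_dir as a symbolic link to real_name."""
    real = data_dir / real_name
    alias = data_dir / alias_name
    if not real.is_file():
        raise FileNotFoundError(real)
    if alias.is_symlink() or alias.exists():
        raise FileExistsError(alias)
    # A relative target is resolved against the link's own directory,
    # so the link keeps working if the whole directory is moved.
    os.symlink(real_name, alias)
    return alias


if __name__ == "__main__":
    # Throwaway directory standing in for the project's Data directory.
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        (data_dir / "INPUT").write_text("sample record\n")
        alias = link_datafile(data_dir, "INPUT", "INPUT_B")
        print(alias.name, "->", os.readlink(alias))
```

Both names then resolve to the same bytes on disk, so the data is not duplicated; only the data file definition inside the project is.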

Can anyone suggest another way?
Thanks,
Andrea
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Welcome aboard! :D

Are you running your QualityStage jobs independently or from DataStage? If the latter, why not have it split the source data into two streams for the separate QualityStage jobs? OK, it's two copies of the data, but they're in memory rather than in two physical files.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ponzio

Post by ponzio »

Hi Ray :D

I'm running QS independently.
I'm not familiar with running QS from DS; I've only tried it a couple of times ;-)
But I think the problem will persist with your method too!
The problem lies in the deployment information, and that information doesn't change depending on how you run the QS job, does it?
The problem is not the actual data used, but how it is used: "File A" rather than "File B".

Many thanks :D
ponzio

Post by ponzio »

ponzio wrote: The problem lies in the deployment information
....
The problem is not the actual data used, but how it is used: "File A" rather than "File B".
To be precise, if the name of the data file is INPUT (for example),
the deploy creates the file INPUT.DIC in the DIC directory under the project directory.

Consider two jobs that use that file: an undup job and a geomatch job.
INPUT is the reference file (File B) of the geomatch job, and the only input file (File A) of the undup job.

We have 2 jobs but only 1 file INPUT.DIC!!
So if we deploy the undup job first, deploying the geomatch job will overwrite the INPUT.DIC created for the undup job...
and if the geomatch job is deployed first, deploying the undup job will overwrite the INPUT.DIC created for the geomatch job :!:

The difference between the two versions of the file is the line

FILE ${DATAA}
in the file created for the undup job

FILE ${DATAB}
in the file created for the geomatch job

This line indicates the role the file plays when it is used in a job :!:

Different problems will occur depending on the order in which the two jobs are deployed and run :!:
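
One way to catch this before running a job is to check which role the currently deployed .DIC file declares. The Python sketch below is only a rough illustration: it assumes the .DIC file is plain text and contains the FILE line exactly as quoted above. The helper name deployed_role is hypothetical, and the demo file holds nothing but that one line, since the rest of the real file's layout is not shown in this thread.

```python
import re
import tempfile
from pathlib import Path

# Matches the line quoted in the post: FILE ${DATAA} or FILE ${DATAB}
FILE_LINE = re.compile(r"^\s*FILE\s+\$\{(DATAA|DATAB)\}\s*$")


def deployed_role(dic_path: Path) -> str:
    """Return 'DATAA' or 'DATAB' according to the FILE line in a .DIC file."""
    for line in dic_path.read_text().splitlines():
        match = FILE_LINE.match(line)
        if match:
            return match.group(1)
    raise ValueError("no FILE line with DATAA or DATAB in " + str(dic_path))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dic = Path(tmp) / "INPUT.DIC"
        # Stand-in for deploying the undup job (INPUT used as File A).
        dic.write_text("FILE ${DATAA}\n")
        print("after undup deploy:", deployed_role(dic))
        # Stand-in for deploying the geomatch job, which overwrites the file.
        dic.write_text("FILE ${DATAB}\n")
        print("after geomatch deploy:", deployed_role(dic))
```

A wrapper script could call deployed_role on INPUT.DIC and refuse to start the undup job unless it returns DATAA (or the geomatch job unless it returns DATAB), which at least turns a confusing run-time failure into an explicit "redeploy first" message.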
ray.wurlod

Post by ray.wurlod »

(I don't have an immediate answer. I shall give it some thought.) Meanwhile, if you post on Developer Net you may get an earlier response.
ponzio

Post by ponzio »

After this discovery I found the following passage in the QS documentation file MatchConcepts.pdf:


Record Linkage Projects
You assign each record linkage application a project name. This
project name is used in all of the steps of the linkage except for data
dictionary creation


That passage confirms what we saw in the files ;-)