100 columns only need to scrub five

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Posted on behalf of Nag, who sent a private message. :oops:

Hello Ray,
I have raw data and data files. I need to do address scrubbing on five columns out of the 100 columns in the raw data. What procedure should I follow, and should I keep the raw data in the data/projects directory?

As I am new to this forum, I don't know how to post to all users.
Thanks,
Nag
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by ray.wurlod »

Create the following initial procedures:
ADDKEY: add a unique identifier to each row (can be a sequence)
STORE95: put the 95 unneeded columns into a file, with the key
STORE5: put the 5 needed columns into a file, with the key

The STORE5 output file becomes the input file for scrubbing. Preserve the key values added by the ADDKEY procedure.

Create a final UNIjoin procedure to re-associate the other 95 columns with your scrubbed data.
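
For anyone who wants to see the shape of this outside the tool, here is a minimal sketch of the same key / split / scrub / rejoin pattern in plain Python. It is not how the procedures themselves are written. The function names, the sample rows, and the `scrub` step (which only collapses whitespace and upper-cases) are hypothetical stand-ins, and the example uses two columns out of four rather than five out of 100.

```python
def add_key(rows):
    """ADDKEY step: prefix each row with a sequential unique identifier."""
    return [[str(i)] + list(row) for i, row in enumerate(rows, start=1)]


def split_columns(keyed_rows, scrub_idx):
    """STORE5 / STORE95 steps: split each keyed row into the columns to
    scrub and the remaining columns, both carrying the key."""
    wanted = set(scrub_idx)
    to_scrub, rest = [], []
    for row in keyed_rows:
        key, data = row[0], row[1:]
        to_scrub.append([key] + [data[i] for i in scrub_idx])
        rest.append([key] + [v for i, v in enumerate(data) if i not in wanted])
    return to_scrub, rest


def scrub(rows):
    """Hypothetical placeholder for the real scrubbing: collapse whitespace
    and upper-case each value. The key in position 0 is left untouched."""
    return [[row[0]] + [" ".join(v.split()).upper() for v in row[1:]]
            for row in rows]


def rejoin(scrubbed, rest):
    """Final join step: re-associate the other columns by key."""
    rest_by_key = {row[0]: row[1:] for row in rest}
    return [row + rest_by_key[row[0]] for row in scrubbed]


# Made-up sample data: columns 0 and 1 are the address columns to scrub.
raw = [
    ["  12 main st ", "springfield", "x1", "y1"],
    ["5  high  rd", "shelbyville", "x2", "y2"],
]
keyed = add_key(raw)
to_scrub, rest = split_columns(keyed, scrub_idx=[0, 1])
result = rejoin(scrub(to_scrub), rest)
```

Note that the rejoined rows come back as key, scrubbed columns, then the other columns, so the original column order is not restored. The join only works because the scrub step leaves the key alone.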