Dynamic arrays/or pivot

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dj
Participant
Posts: 78
Joined: Thu Aug 24, 2006 5:03 am
Location: india

Dynamic arrays/or pivot

Post by dj »

Hi All,
We have got requirement to compare 2 datasets and generate reports for the mismatches/ detail records and counts.

And also we are trying to keep it generic for any file metadata.(to enable rcp later)

1. After passing thr diff stage, the below layout is achieved

Code: Select all

 
Custid, File1name, File2Name, Matchflg, File1address, File2Address, matchflg_addr
1, a, aa, n, x, x, y
2, b, b, y, x, x, y

2.to get the non match records and counts, we did horizontal pivot.
Custid, file1value, file2value, match
1, a, aa, n
1, x, x, y
2, b, b, y
2, x x, y

How to add the columnnames to above layout, so we can filter n records and group by column name to get the counts.
ie..
Custid, columnname, file1value, file2value, match
1, name, a, aa, n
1, address, x, x, y
2, name, b, b, y
2, address, x, x, y

The file has got 50+ columns.

Instead of pivot, are there any dynamic arrays that can be build and read the values form arrays ? Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Dynamic array structures are natively supported in server jobs. I'd recommend looking at using an approach based on server jobs.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Where do you get the column names (name, address) from in your design?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You might also consider reducing the 50+ non-key columns to one using a Column Output stage, which generates a delimited string. You should then be easily able to compare those delimited strings to fulfil your requirement.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dj
Participant
Posts: 78
Joined: Thu Aug 24, 2006 5:03 am
Location: india

Post by dj »

Thanks for the details.

The column names "name" and "address" are the file field names.File1_name and File2_name.

For the comparision part, even after using column export how to get the fieldnames?

how to get the field names over there without hardcoding to make it generic

custid, columnname , file1value,file2value, match
1, name ,a, aa, n
1, address,x, x, y
2, name,b, b, y
2, address,x x, y

so that later the report can be generated as

columname mismatch count
name 1
address 1

Thanks
Post Reply