Group by clause implementation in Datastage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
rumu
Participant
Posts: 286
Joined: Mon Jun 06, 2005 4:07 am

Group by clause implementation in Datastage

Post by rumu »

Dear Members,

I have a requirement as below:

Select first row
From TEXT
Where TEXT.OID = 2498
And TEXT.TEXT in ('Discharge', 'Expired')
Group by TEXT.OID
Order by TEXT.STARTDTIME Asc

The source TEXT is Flat file.There are mutiple STARTDTIME for same OID and we have to pick up the least one .
We designed as below:
SEQFile Stage---->RDC-----Xfrm-----Target dataset

In RDC,we used OID as key and Partitioning tab
did HashPartitioning on OID and Sort on STARTDTIME.Not used Sort on OID.

We received proper output.Wanted to check with you if this approach is fine or we need to sort OID also along with Partitioning.

Thanks,
Rumu
Rumu
IT Consultant
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So... RDC is the "Remove Duplicates" stage, yes? :?

Yes, one would need to sort on all of the "group by" fields, in this case OID. I would wager if you did more extensive testing on larger data sets you would no longer receive 'proper output'.
-craig

"You can never have too many knives" -- Logan Nine Fingers
rumu
Participant
Posts: 286
Joined: Mon Jun 06, 2005 4:07 am

Post by rumu »

Thanks Chulett,noted this point.
Rumu
IT Consultant
rameshrr3
Premium Member
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

Only one group - since TEXT.OID is filtered on single value ? Whats the point - curious
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What's the point of what?
-craig

"You can never have too many knives" -- Logan Nine Fingers
rameshrr3
Premium Member
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

From TEXT
Where TEXT.OID = 2498
And TEXT.TEXT in ('Discharge', 'Expired')
Group by TEXT.OID

If they were using SQL , a group by would not be needed
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

When people post examples they rarely fully match reality (if at all) so I'll typically answer with general / generic advice. Did much the same here so they can apply it to whatever their specific situation is, as needed.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply