how to get the distinct count

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
lakshmipriya
Participant
Posts: 31
Joined: Tue Jul 13, 2004 5:26 am
Location: chennai
Contact:

how to get the distinct count

Post by lakshmipriya »

how to count the distinct of a column value

if the column values are

2
3
4
2

then the count should be 3

how to acheive this in datastage other than DB2
Lakshmi
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Use an Aggregator stage. Group your column and count your column.
Refer to Chapter 17 of Parallel Job Developer's Guide (parjdev.pdf) for information on how to use this stage type. It works best if the input data are sorted.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mandyli
Premium Member
Premium Member
Posts: 898
Joined: Wed May 26, 2004 10:45 pm
Location: Chicago

Post by mandyli »

Hi priya,

Yes you can achieve this using Aggregator stage only.
lakshmipriya
Participant
Posts: 31
Joined: Tue Jul 13, 2004 5:26 am
Location: chennai
Contact:

Post by lakshmipriya »

I tired this with too different aggregators thats working fine, whether i can acheive this in a single aggregator.

I tired it with converting the input column to double and having the same coulmn in both group by and count column too. It worked but output i got is not what is expected for distinct count

wht can i do? :oops:
Lakshmi
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Give an example of what you are processing and what you expect to see. A single Aggregator will satisfy the request you originally posted.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
lakshmipriya
Participant
Posts: 31
Joined: Tue Jul 13, 2004 5:26 am
Location: chennai
Contact:

Post by lakshmipriya »

I want to take a distinct of combination of two columns transaction_date(date) and transaction_sequence_id(integer)
Lakshmi
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

In DataStage you need two Aggregator stages.

In most databases you need two separate queries.

Red Brick is the only database of which I am aware that allows multiple DISTINCT clauses in the one SELECT query. But, then, it's designed for data warehouse (business intelligence) queries.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply