SQL Logic in Datastage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
kpsita
Participant
Posts: 99
Joined: Tue Jul 21, 2009 11:43 pm

SQL Logic in Datastage

Post by kpsita »

Hi,

How can I implement this following logic in datastage.

select
B.AMT,
rank ()
OVER (PARTITION BY B.FIELD_ID
ORDER BY B.AMT DESC)
ROW_RANK
FROM TABLE B

I wanted to implement this in datastage and here my table represents a dataset.

Appriciate your help.

Thanks
KPSITA
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Do you understand what the sql does? Can you spell out in words what kind of information you need from the dataset in question?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

RANK is a display function, not a selection function. As a general rule it's not something I'd ask an ETL tool to do - it's something I'd ask a reporting (or business intelligence tool) to do.

While it can be done in ETL, you need to decide in advance how you're going to handle tied values.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ajay.vaidyanathan
Participant
Posts: 53
Joined: Fri Apr 18, 2008 8:13 am
Location: United States

SQL Logic in Datastage

Post by ajay.vaidyanathan »

How can you write an SQL Query on a DataSet in the first place? You either need to load the DataSet into a tabel and then try out whatever query you were planning to implement.
Regards
Ajay
kpsita
Participant
Posts: 99
Joined: Tue Jul 21, 2009 11:43 pm

Post by kpsita »

My input dataset has two fileds, FIELD1 and a AMOUNT field. I have to select maximum amout from the dataset corresponding to FIELD1.

Example:
Field1 amt
1 0
1 15
1 10
2 6
2 20

and my output should be
1 15
2 20

Thanks,
KPSITA
anbu
Premium Member
Premium Member
Posts: 596
Joined: Sat Feb 18, 2006 2:25 am
Location: india

Post by anbu »

Use remove duplicates to retain the maximum. Use hash partition and sort on the input to remove duplicates
You are the creator of your destiny - Swami Vivekananda
kpsita
Participant
Posts: 99
Joined: Tue Jul 21, 2009 11:43 pm

Post by kpsita »

Hi Anbu,

Thanks for your reply. I didn't understand your post. Should I first hash partition, and then remove the duplicates retaining the first value.

Thanks
KPSITA
anbu
Premium Member
Premium Member
Posts: 596
Joined: Sat Feb 18, 2006 2:25 am
Location: india

Post by anbu »

Yes do the hash partition first and then use remove duplicates
You are the creator of your destiny - Swami Vivekananda
hemanthakumar
Participant
Posts: 34
Joined: Mon May 05, 2008 1:31 am

Post by hemanthakumar »

Hi,
you can use Aggregator stage. In Aggregator group by on field1 and find max(Amount).
way246
Participant
Posts: 7
Joined: Wed Sep 01, 2010 9:06 am
Location: Europe

Post by way246 »

anbu wrote:Use remove duplicates to retain the maximum. Use hash partition and sort on the input to remove duplicates
If I am not wrong it shud be retain Last in remove duplicates if the sort is
asc
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What ever happened to the requirement for RANK?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DSShishya
Premium Member
Premium Member
Posts: 37
Joined: Tue Oct 27, 2009 9:43 pm

Post by DSShishya »

Got lost in the crowd I guess :lol:
way246
Participant
Posts: 7
Joined: Wed Sep 01, 2010 9:06 am
Location: Europe

Post by way246 »

Ray you are correct, RANK() is used to display records when one or more rows have same value in reporting.e.g when filter applied to display top 10 ranks(rows) and when rank 10 shared among 3 in this case 13 rows will be displayed even though filter is applied to display top 10 rows. Hope aggregate and filter can be applied to achive this in datastage.
prakashdasika
Premium Member
Premium Member
Posts: 72
Joined: Mon Jul 06, 2009 9:34 pm
Location: Sydney

Post by prakashdasika »

While using Remove duplicates sort by Field1 (ASC) and Amount (DESC) on the input link and use Field1 as the Key. This should work.
Prakash Dasika
ETL Consultant
Sydney
Australia
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Try it. It doesn't handle tied ranks.

To handle rank properly in ETL - and I continue to maintain that it's not an ETL task but a display task - you can use stage variables that detect change (perhaps from a Sort stage key change column) and decide whether to preserve or increment the rank. At the same time another stage variable, always incrementing, is needed to determine what the next value needs to be after tied values are encountered.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply