SQL Logic in Datastage

kpsita · Post by **kpsita** » Wed Sep 01, 2010 8:16 pm

Hi,

How can I implement this following logic in datastage.

select
B.AMT,
rank ()
OVER (PARTITION BY B.FIELD_ID
ORDER BY B.AMT DESC)
ROW_RANK
FROM TABLE B

I wanted to implement this in datastage and here my table represents a dataset.

Appriciate your help.

Thanks

chulett · Post by **chulett** » Wed Sep 01, 2010 9:40 pm

Do you understand what the sql does? Can you spell out in words what kind of information you need from the dataset in question?

ray.wurlod · Post by **ray.wurlod** » Wed Sep 01, 2010 10:49 pm

RANK is a display function, not a selection function. As a general rule it's not something I'd ask an ETL tool to do - it's something I'd ask a reporting (or business intelligence tool) to do.

While it can be done in ETL, you need to decide in advance how you're going to handle tied values.

ajay.vaidyanathan · Post by **ajay.vaidyanathan** » Thu Sep 02, 2010 5:23 am

How can you write an SQL Query on a DataSet in the first place? You either need to load the DataSet into a tabel and then try out whatever query you were planning to implement.

kpsita · Post by **kpsita** » Thu Sep 02, 2010 2:28 pm

My input dataset has two fileds, FIELD1 and a AMOUNT field. I have to select maximum amout from the dataset corresponding to FIELD1.

Example:
Field1 amt
1 0
1 15
1 10
2 6
2 20

and my output should be
1 15
2 20

Thanks,

anbu · Post by **anbu** » Thu Sep 02, 2010 2:35 pm

Use remove duplicates to retain the maximum. Use hash partition and sort on the input to remove duplicates

kpsita · Post by **kpsita** » Thu Sep 02, 2010 2:59 pm

Hi Anbu,

Thanks for your reply. I didn't understand your post. Should I first hash partition, and then remove the duplicates retaining the first value.

Thanks

anbu · Post by **anbu** » Thu Sep 02, 2010 3:07 pm

Yes do the hash partition first and then use remove duplicates

hemanthakumar · Post by **hemanthakumar** » Fri Sep 03, 2010 2:28 am

Hi,
you can use Aggregator stage. In Aggregator group by on field1 and find max(Amount).

way246 · Post by **way246** » Fri Sep 03, 2010 3:03 am

anbu wrote:Use remove duplicates to retain the maximum. Use hash partition and sort on the input to remove duplicates

If I am not wrong it shud be retain Last in remove duplicates if the sort is
asc

ray.wurlod · Post by **ray.wurlod** » Fri Sep 03, 2010 6:02 am

What ever happened to the requirement for RANK?

DSShishya · Post by **DSShishya** » Fri Sep 03, 2010 2:33 pm

Got lost in the crowd I guess

way246 · Post by **way246** » Mon Sep 06, 2010 8:36 am

Ray you are correct, RANK() is used to display records when one or more rows have same value in reporting.e.g when filter applied to display top 10 ranks(rows) and when rank 10 shared among 3 in this case 13 rows will be displayed even though filter is applied to display top 10 rows. Hope aggregate and filter can be applied to achive this in datastage.

prakashdasika · Post by **prakashdasika** » Mon Sep 06, 2010 7:06 pm

While using Remove duplicates sort by Field1 (ASC) and Amount (DESC) on the input link and use Field1 as the Key. This should work.

ray.wurlod · Post by **ray.wurlod** » Mon Sep 06, 2010 11:04 pm

Try it. It doesn't handle tied ranks.

To handle rank properly in ETL - and I continue to maintain that it's not an ETL task but a display task - you can use stage variables that detect change (perhaps from a Sort stage key change column) and decide whether to preserve or increment the rank. At the same time another stage variable, always incrementing, is needed to determine what the next value needs to be after tied values are encountered.