case statement in datastage px

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

satheesh_color
Participant
Posts: 182
Joined: Thu Jun 16, 2005 2:05 am

case statement in datastage px

Post by satheesh_color »

Hi All,

I would like to obtain the result of the transformation below using a SQL statement. We have millions of rows in the tables, and while executing it the whole temp space got filled up.

CASE WHEN obj.Id = 8
     THEN COUNT(DISTINCT CASE
                           WHEN act.code = 'DEL' THEN acc.item
                           ELSE NULL
                         END)
     ELSE 0
END AS CNT_A,

CASE WHEN obj.Id = 8
     THEN SUM(CASE
                WHEN act.code = 'DEL' AND acc.item IS NULL THEN 1
                ELSE 0
              END)
     ELSE 0
END AS CNT_B

and so on for CNT_C.

Is there any way we can simplify the above transformation using DataStage?

Regards,
Satheesh
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sure, but that's not your issue as a CASE statement (even a big one) won't kill temp space. That's going to be because of an ORDER BY primarily, augmented perhaps by the type of joins you are doing. In your shoes I'd work with your DBA on this or anyone who can help optimize the SQL. That or convince your DBA to increase the TEMP tablespace - sometimes, that's the only way to solve the problem.

What database is this?
-craig

"You can never have too many knives" -- Logan Nine Fingers
satheesh_color
Participant
Posts: 182
Joined: Thu Jun 16, 2005 2:05 am

Post by satheesh_color »

Hi Chulett,

Oracle is the DB.

I am doing a left outer join across 5 of the tables, and the case statements (almost 5 of them) eat up the whole temp space.

For me, it's tricky to achieve the same in DataStage.

Regards,
Satheesh
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So, no ORDER BY? Only the LEFT OUTER joins and CASE statement shenanigans to work on?

A case statement in-and-of itself is not the issue, if anything it's the nature of the work you are doing inside the case statements - COUNT, SUM, etc. As a test, I'd suggest running the query outside of DataStage in something like Toad, after removing all of the extra work from inside the CASE statements. Replace them with a simple field select or constant so there's no aggregate functions being used in them. See then if that runs to completion.
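A stripped-down version of that test might look like the sketch below. It keeps the joins as-is and replaces the aggregate bodies of the CASE expressions with constants, reusing the column names from the earlier post; the FROM clause is elided since the full join list wasn't shown.

```sql
-- Same SELECT shape, but constants in place of COUNT/SUM so no
-- aggregation work is done. If this version completes, it is the
-- aggregates (not the CASE syntax) that are filling TEMP.
SELECT CASE WHEN obj.Id = 8 THEN 1 ELSE 0 END AS CNT_A,
       CASE WHEN obj.Id = 8 THEN 1 ELSE 0 END AS CNT_B
FROM   ...  -- the same five-table LEFT OUTER JOIN as the real query
```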

Then I would work with someone to better optimize this. Perhaps move those internal aggregates out to inline views and conditionally join to them. Or move them into the job as lookups and do conditional 'joins' that way. Hard to say without seeing the whole thing and knowing your data structures. There are usually multiple ways to approach any SQL issue, some more 'efficient' than others. Bottom line, this is all about Oracle and your SQL, not really a DataStage problem. Well, unless of course you can't get this to fly in your database and you need to move part of the functionality into the job.
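The inline-view idea might look something like the sketch below. The table aliases (obj, act, acc) come from the fragments earlier in the thread, but the join column acc_key is hypothetical; the real join keys would come from the actual schema.

```sql
-- Pre-aggregate the expensive counts once in an inline view, then
-- join the (much smaller) aggregated result back to the driving table.
SELECT obj.Id,
       CASE WHEN obj.Id = 8 THEN NVL(agg.cnt_a, 0) ELSE 0 END AS CNT_A,
       CASE WHEN obj.Id = 8 THEN NVL(agg.cnt_b, 0) ELSE 0 END AS CNT_B
FROM   obj
LEFT OUTER JOIN (
    SELECT acc.acc_key,   -- hypothetical join key
           COUNT(DISTINCT CASE WHEN act.code = 'DEL'
                               THEN acc.item END)              AS cnt_a,
           SUM(CASE WHEN act.code = 'DEL' AND acc.item IS NULL
                    THEN 1 ELSE 0 END)                         AS cnt_b
    FROM   acc
    JOIN   act ON act.acc_key = acc.acc_key
    GROUP  BY acc.acc_key
) agg ON agg.acc_key = obj.acc_key;
```

The point of the restructure is that the aggregation happens once over the detail tables rather than inside a five-way outer join, which typically shrinks the sort work Oracle has to spill to TEMP.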

An explain plan would also help, if you have someone there that can walk through it with you.
-craig

"You can never have too many knives" -- Logan Nine Fingers
satheesh_color
Participant
Posts: 182
Joined: Thu Jun 16, 2005 2:05 am

Post by satheesh_color »

Hi Chulett,

When I run the normal query (a simple select) in SQL Developer, it returns rows within a minute.

But the same query runs forever in a DataStage job using the Oracle Connector stage.



Regards,
Satheesh
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

They're both just clients and issue the same SQL via OCI. :?

Are you running the job on more than one node? Perhaps try a single node to see if the issue changes, that or put the Connector in sequential (serial?) mode... that's about all of the difference between those two environments that I can think of off the top of my head.
-craig

"You can never have too many knives" -- Logan Nine Fingers
satheesh_color
Participant
Posts: 182
Joined: Thu Jun 16, 2005 2:05 am

Post by satheesh_color »

Thanks Chulett,

I am just running the query below in SQL Developer:

select col1, col2, col3 from DIM_CUSTOMER; -- returns 110,000 rows

When I run the same query through the Oracle Connector (extracting on one node (sequential) or in parallel on 2 nodes) and write it to a Dataset, the job runs forever.. it shows just one row :(

Logs (with $APT_STARTUP_STATUS=True set):
waiting for players to start
setting up data connections among players
starting step execution
waiting for step completion


Regards,
Satheesh
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Okay, not sure why we've gone off track here.

When I asked you to run the SQL outside of DataStage, I meant the same SQL you were using inside it - just minus all of the aggregation in the CASE statements. This other thing you are doing, while certainly some kind of an issue, is a distraction and not germane to the issue at hand. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
satheesh_color
Participant
Posts: 182
Joined: Thu Jun 16, 2005 2:05 am

Post by satheesh_color »

Hi Chulett,

I apologize for not mentioning that in the previous post.

Yes.. I ran the query outside of DataStage (in Toad) without the case statements, and it ran fine and fetched the results. When I put the case statements back and ran it again in Toad, it failed with the error below:

ORA-01652: unable to extend temp segment by 32 in tablespace TEMP.



Regards,
Satheesh.R
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No worries.

We've now come full circle. To recap, it's not the case statement that is the problem, it is what you are doing inside the case statement that is causing your TEMP issue. I'm not going to type it all up again as it's all in my first two replies here. Optimize or restructure.
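If it helps, whoever investigates can watch what is consuming TEMP while the query runs with something along these lines. Access to the V$ views is required, and the 8 KB block size below is an assumption; check DB_BLOCK_SIZE for the actual value.

```sql
-- Snapshot of current temp segment usage, in MB, per tablespace
-- and segment type (SORT, HASH, etc.). Assumes an 8 KB block size.
SELECT tablespace,
       segtype,
       ROUND(SUM(blocks) * 8192 / 1024 / 1024) AS mb_used
FROM   v$tempseg_usage
GROUP  BY tablespace, segtype;
```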

I'll mention this again - have you asked your DBA to increase the TEMP space? It's a perfectly valid solution to issues like this when you are doing aggregates with large volumes of data and they really shouldn't push back (much) unless you are on a really tight space budget. It's an instant solution to the problem. :wink:
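A DBA-side fix along those lines might look like this; the file path and sizes are placeholders only, and an autoextending tempfile is one common choice so the space grows on demand up to a cap.

```sql
-- Add another tempfile to the TEMP tablespace.
-- Path and sizes are examples; adjust to the environment.
ALTER TABLESPACE TEMP
  ADD TEMPFILE '/u01/oradata/ORCL/temp02.dbf'
  SIZE 4G AUTOEXTEND ON NEXT 512M MAXSIZE 16G;
```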

If that's not an option, you've got a lot of work on your hands. Involve someone there to get more eyes (and brains) on the solution. Good luck.
-craig

"You can never have too many knives" -- Logan Nine Fingers