Performance issue because of huge data volume

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Abdulwahabza
Premium Member
Posts: 25
Joined: Sat Feb 18, 2006 12:27 pm
Location: Erie

Performance issue because of huge data volume

Post by Abdulwahabza »

Hi,

I am running the below query as user-defined SQL in an Oracle OCI stage, in a job designed as OCI -> Transformer -> OCI.

Code:

SELECT
    CMSITEM.CUST_ACCT_ID,
    CA.PRI_CUST_ACCT_ID,
    CMSITEM.ITEM_NUM,
    I2.EM_ITEM_NUM    ALT_EQ_ITEM_NUM,
    PRC.CNTRC_BID_PRC ALT_EQ_ITEM_PRC
FROM
    (
        SELECT
            CMS.ITEM_NUM,
            CMS.CUST_ACCT_ID,
            I.GNRC_ID
        FROM L_CMS_RLZD_SAV CMS,
             L_ITEM I
        WHERE CMS.ITEM_NUM = I.EM_ITEM_NUM
    ) CMSITEM
INNER JOIN L_ITEM I2
        ON CMSITEM.GNRC_ID = I2.GNRC_ID
INNER JOIN L_CUST_ACCT CA
        ON CMSITEM.CUST_ACCT_ID = CA.CUST_ACCT_ID
INNER JOIN S_DRUG_ITEM_PRC PRC
        ON (I2.EM_ITEM_NUM = PRC.ITEM_NUM
            AND CA.PRI_CUST_ACCT_ID = TRIM(PRC.CUST_ACCT_ID))
L_CMS_RLZD_SAV has 7,000 rows
L_ITEM has 142,000 rows
L_CUST_ACCT has 188,000 rows
S_DRUG_ITEM_PRC has 400 million rows

The job is not returning any rows in the test environment. In dev, where S_DRUG_ITEM_PRC contains 72 million rows, the job takes 2 hours but does give results. In production, S_DRUG_ITEM_PRC will have 400 million rows. I tried running the SQL in TOAD in the test environment and it gives results in 2 minutes, but in the DataStage job the SQL never returns and the job keeps running.

Please help me with an approach to solving this problem.

I tried splitting the SQL, taking S_DRUG_ITEM_PRC out. Without S_DRUG_ITEM_PRC the query returns 1.2 million rows. I stored that result in a hashed file and then tried an inner join back to the S_DRUG_ITEM_PRC table.
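For clarity, the split-out driver query (without S_DRUG_ITEM_PRC) looks roughly like this; it is just the SQL above with the PRC join and the price column removed:

Code:

SELECT
    CMSITEM.CUST_ACCT_ID,
    CA.PRI_CUST_ACCT_ID,
    CMSITEM.ITEM_NUM,
    I2.EM_ITEM_NUM ALT_EQ_ITEM_NUM
FROM
    (
        SELECT CMS.ITEM_NUM, CMS.CUST_ACCT_ID, I.GNRC_ID
        FROM L_CMS_RLZD_SAV CMS, L_ITEM I
        WHERE CMS.ITEM_NUM = I.EM_ITEM_NUM
    ) CMSITEM
INNER JOIN L_ITEM I2
        ON CMSITEM.GNRC_ID = I2.GNRC_ID
INNER JOIN L_CUST_ACCT CA
        ON CMSITEM.CUST_ACCT_ID = CA.CUST_ACCT_ID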

When I create the hashed file I have to declare all 4 columns as keys, or else rows get rejected. After that I tried to do the lookup using only the 2 keys that match S_DRUG_ITEM_PRC, deselecting the 2 columns not required for the join. At that point I was not able to view data, and after the job ran for 5 hours it had loaded 0 rows into the target.

Experts, please help me run this job successfully on DataStage Server.

Thanks a lot for reading this with patience

How do I achieve a successful run and load of the target?
Why does the query run in TOAD against 400 million rows but refuse to give results in DataStage?
How do I join a hashed file that has 4 keys using only 2 joining keys? (I deselected 2 keys of the hashed file.) I know something is terribly wrong there.
I talked with the DBA and he is proposing a function-based index on S_DRUG_ITEM_PRC. S_DRUG_ITEM_PRC is a view over four tables, T_DRUG_ITEM_PRC_A1 through A4, with 148, 72, 72 and 108 million records respectively.
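For what it's worth, my understanding of his suggestion is sketched below. The index name is invented, and since S_DRUG_ITEM_PRC is a view, the index would have to be created on each of the four base tables, not on the view itself:

Code:

-- Sketch only: a function-based index matching the TRIM(CUST_ACCT_ID)
-- predicate, created on each base table behind the view.
-- The index name is hypothetical.
CREATE INDEX IDX_DRUG_PRC_A1_TRIM
    ON T_DRUG_ITEM_PRC_A1 (TRIM(CUST_ACCT_ID), ITEM_NUM);
-- ...repeated for T_DRUG_ITEM_PRC_A2, _A3 and _A4.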

Thanks in advance
Wahab
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Millions of rows will obviously prolong the output.
The result you get in TOAD is actually the throughput. Still, you should be able to see the progress of the rows flowing in the job statistics.
Check with a Sequential File stage as the target, connected directly to the input OCI stage. Try increasing the row buffer.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

In the Transformer, specify the constraint as @FALSE to see the throughput of the SQL select, and see how much time that takes. I am guessing it shouldn't be too long; I think the bottleneck is your target OCI.
Are these all inserts? If yes, then you can utilize the bulk loader.
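If the load could instead be done inside Oracle, a direct-path insert gives the same effect as the bulk loader. A rough sketch, with hypothetical staging and target table names:

Code:

-- Hedged sketch only: ALT_EQ_PRC_STG and ALT_EQ_PRC_TGT are
-- hypothetical names. The APPEND hint requests a direct-path
-- (bulk) load, bypassing the conventional insert path.
INSERT /*+ APPEND */ INTO ALT_EQ_PRC_TGT
SELECT CUST_ACCT_ID, PRI_CUST_ACCT_ID, ITEM_NUM,
       ALT_EQ_ITEM_NUM, ALT_EQ_ITEM_PRC
FROM   ALT_EQ_PRC_STG;
COMMIT;  -- the table cannot be queried again until the direct-path load is committed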
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Abdulwahabza
Premium Member
Posts: 25
Joined: Sat Feb 18, 2006 12:27 pm
Location: Erie

Post by Abdulwahabza »

Hi,

I checked the explain plan for that SQL and it shows a cost of 15490, which is not bad. But it is still not giving any results when executed through DataStage.

Regards
Wahab
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

kumar_s wrote:The result you get in TOAD is actually the throughput.
Actually, the result you get in TOAD is fairly meaningless, pretty much only telling you how much time will elapse before the query starts returning rows. However, if it 'runs' in TOAD it will run in DataStage, assuming the two queries are actually the same.
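One way to prove that to yourself is to wrap the query in a count, so TOAD has to fetch every row before it can display anything; the elapsed time is then comparable to what the DataStage job experiences. This is just your original query, wrapped:

Code:

SELECT COUNT(*)
FROM (
    SELECT CMSITEM.CUST_ACCT_ID, CA.PRI_CUST_ACCT_ID, CMSITEM.ITEM_NUM,
           I2.EM_ITEM_NUM, PRC.CNTRC_BID_PRC
    FROM ( SELECT CMS.ITEM_NUM, CMS.CUST_ACCT_ID, I.GNRC_ID
           FROM L_CMS_RLZD_SAV CMS, L_ITEM I
           WHERE CMS.ITEM_NUM = I.EM_ITEM_NUM ) CMSITEM
    INNER JOIN L_ITEM I2 ON CMSITEM.GNRC_ID = I2.GNRC_ID
    INNER JOIN L_CUST_ACCT CA ON CMSITEM.CUST_ACCT_ID = CA.CUST_ACCT_ID
    INNER JOIN S_DRUG_ITEM_PRC PRC
            ON I2.EM_ITEM_NUM = PRC.ITEM_NUM
           AND CA.PRI_CUST_ACCT_ID = TRIM(PRC.CUST_ACCT_ID)
)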

There's nothing magical about DataStage that causes a query to not run. Your issue is primarily an Oracle tuning issue: you need to get your query running in a performant manner with production data volumes. The 'cost' of the query is also fairly meaningless; it's all about the specifics of the plan in the explain plan. You need to work with your DBAs. Perhaps stats are not optimal in prod, perhaps you need to hint the query, something to get a 'better' plan regardless of the cost reported.
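To see what the optimizer is actually doing, run the problem join through EXPLAIN PLAN and read the whole plan rather than just the cost. A sketch against the big view alone; the USE_HASH hint is purely an example of hinting, not a recommendation:

Code:

-- Inspect the actual access path for the join against the
-- 400-million-row view; the hint here is illustrative only.
EXPLAIN PLAN FOR
SELECT /*+ USE_HASH(PRC) */ COUNT(*)
FROM   L_CUST_ACCT CA
INNER JOIN S_DRUG_ITEM_PRC PRC
        ON CA.PRI_CUST_ACCT_ID = TRIM(PRC.CUST_ACCT_ID);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);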

Once you've got the query tuned, then yes settings like the Array Size can get the data delivered from the source into the job in an optimal fashion, but it can't take it faster than Oracle can push it.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That should address the extract speed. Your next issue will then be how fast can you shove it into the other table. You can follow DSGuru2B's advice at that point. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
Abdulwahabza
Premium Member
Posts: 25
Joined: Sat Feb 18, 2006 12:27 pm
Location: Erie

Post by Abdulwahabza »

I tried using a Sequential File stage as the target and the query still does not yield results in DataStage, so the OCI target is not the issue. Please help.

Regards,
Wahab