Problem Extracting Data From Oracle Database

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
moalik
Participant
Posts: 39
Joined: Thu Sep 15, 2011 8:15 am
Location: Melbourne

Problem Extracting Data From Oracle Database

Post by moalik »

Hi All,

My question concerns a strange issue we are facing in DataStage when extracting data from an Oracle table.

We are extracting around 250 million records from the table and loading them into a dataset.

Some days the job finishes in 19 minutes at 200k rows/sec, and other days it crawls along at 600 rows/sec and never finishes.

When we manually kill the long-running job and re-trigger it, it finishes within 19 to 20 minutes.

We are not able to find where the issue is. We investigated from the database side: the query finishes in 15-20 minutes, but the writing takes a long time. The DBAs insist that everything is fine on their end and that it is DataStage that is misbehaving.

So far we only have wild guesses, such as network connectivity.

Are there any parameters I need to set to monitor the process and speed up the read from DataStage?

Any help in this space is much appreciated.

Thanks
Mohsin Khan
Datastage Consultant
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

From the outside looking in, I'm going to agree with the assessment of your DBA team - I really don't think you have an issue 'extracting data'. I would put my money on the issue being the writing of the dataset. And whenever I see someone with an intermittent issue where a simple kill and restart solves the problem, that says resource issue to me... meaning the job hits resource contention at the time it runs, but by the time you kill and restart it those resources have been freed up and the restarted job is running (essentially) in isolation.

What else is going on when this job runs successfully versus when it has a problem, and is there any correlation between the two? Other DataStage jobs, server backups, overly aggressive auditing, anything putting pressure on the filesystem? Be warned, these kinds of problems can be quite difficult to track down.
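As a starting point, a snapshot like the following, taken each time the extract job kicks off, makes it easy to compare a good run with a bad one. This is only a sketch, not anything DataStage-specific; it assumes a Linux host with a procps-style ps:

```shell
#!/bin/sh
# Sketch: snapshot system pressure at job start so good and bad runs
# can be compared side by side. Assumes a Linux host.
snapshot() {
    echo "=== $(date) ==="
    uptime                                      # load averages: is the box busy?
    df -h                                       # free space on every filesystem
    ps aux --sort=-%cpu 2>/dev/null | head -6   # top CPU consumers right now
}
snapshot
```

Run it from a before-job subroutine or cron, append the output to a log, and look for what differs between the fast and slow runs.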
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

How many directories do you have assigned as resource disk in your configuration file, and how much free space is available on those disks?

The resource disk directories are where the segment files of data sets are stored. The segment files contain the actual data.
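To make that question concrete, here is a sketch that pulls the resource disk directories out of an APT configuration (.apt) file and reports free space on each; the file path in the usage comment is an example, not taken from this thread:

```shell
#!/bin/sh
# Sketch: list free space on every "resource disk" directory named in
# an APT configuration (.apt) file passed as the first argument.
check_resource_disks() {
    sed -n 's/.*resource disk "\([^"]*\)".*/\1/p' "$1" |
    while read -r d; do
        df -h "$d"
    done
}
# Usage (path is an example; point it at your own configuration file):
#   check_resource_disks /opt/IBM/InformationServer/Server/Configurations/default.apt
```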
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
moalik
Participant
Posts: 39
Joined: Thu Sep 15, 2011 8:15 am
Location: Melbourne

Post by moalik »

Thanks Craig,

Whenever we face the long-running job issue, CPU utilization clocks 99%. We are now considering whether the time at which the job is triggered is causing the problem.

Ray,

We are using a default.apt configuration file which is indeed pointing to Node01.apt

We have two resource disks for datasets and three resource scratchdisks in the default configuration file.

As my company's license has expired, I cannot view the premium content.

Do I need to monitor the scratch space while the job is running?
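It can't hurt. A sketch of a simple sampler that can be left running in the background, appending to a log, while the job executes; the directory arguments are examples, so substitute the paths from your own configuration file:

```shell
#!/bin/sh
# Sketch: sample filesystem usage on given directories; run repeatedly
# while the job executes. Directory arguments are examples.
sample() {
    date
    df -h "$@" 2>/dev/null || df -h
}
# While the job runs, e.g.:
#   while :; do sample /opt/IBM/scratch1 /opt/IBM/Datasets1 >> /tmp/watch.log; sleep 30; done
sample /tmp
```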
Mohsin Khan
Datastage Consultant
moalik
Participant
Posts: 39
Joined: Thu Sep 15, 2011 8:15 am
Location: Melbourne

Post by moalik »

Hi Ray,

Sorry for missing the requested information.

Here are contents of Configuration file and disk usage:

Code: Select all

{
        node "node1"
        {
                fastname "XXXXXXX.YYYYYYY.ZZZZZZZ"
                pools ""
                resource disk "/opt/IBM/Datasets1" {pools ""}
                resource disk "/opt/IBM/Datasets2" {pools ""}
                resource scratchdisk "/opt/IBM/scratch1" {pools ""}
                resource scratchdisk "/opt/IBM/scratch2" {pools ""}
                resource scratchdisk "/opt/IBM/scratch3" {pools ""}
        }
}


Size   Used   Avail  Use%  Mounted On
*************************************
50G    200M   47G    1%    /opt/IBM/scratch1
50G    143M   47G    1%    /opt/IBM/scratch2
50G    146M   47G    1%    /opt/IBM/scratch3
79G    36G    39G    49%   /opt/IBM/Datasets1
79G    36G    39G    49%   /opt/IBM/Datasets2
Mohsin Khan
Datastage Consultant
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

I would reach out to your Oracle DBA team to see if stats were gathered on those tables. From what I remember, there is something on the Oracle side that you can do to optimize execution: when it was done recently, fast result sets; when it hadn't been done in a long time, poor performance. I simply forget what the buzzword is (Runstats / Reorg?)

Poke your Oracle DBA with the question.
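For reference, the buzzword on the Oracle side is optimizer statistics, gathered with the DBMS_STATS package (RUNSTATS and REORG are the DB2 equivalents). A sketch that just prints the statement to hand to the DBA; the schema and table names are placeholders, not taken from the thread:

```shell
#!/bin/sh
# Sketch: emit a DBMS_STATS call for the DBA to run. Arguments are the
# schema and table name; the values passed below are placeholders.
gather_stats_sql() {
    cat <<SQL
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => '$1', tabname => '$2', cascade => TRUE);
SQL
}
gather_stats_sql MYSCHEMA MYTABLE
```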
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Someone would need to check the explain plan while the job was running to see how the optimizer decided to execute the SQL: joins, filters, nested loops vs. hash joins, etc. Where I'm working, that would have to come from the DBA, as mere mortals typically don't have access to the tables required to get it. :( This is to ensure it is well tuned, which can mean a whole bunch of different things to different people and can change dynamically as the stats change.

And typically, unless you are running a pretty old version or have turned off all of those helpful automatic features, Oracle will gather / update stats automagically whenever 'it needs to'. Sometimes, though, it needs to be done manually, which is again typically a DBA decision and task.

Back on the subject of resource constraints, if your tables or DB are set up to automatically 'parallel-ize' your SQL (parallel query execution threads) or you've embedded a PARALLEL hint, you may see that get throttled back if other already running processes have used up that pool. Meaning, on a normal day it runs 16 or 32 parallel threads and finishes up nicely, but if others have grabbed those resources before you (timing again, first come first served) Oracle will happily and silently drop it back to a single thread. And even if those resources become available while your query is running, you're stuck in single-threaded mode until it completes. Or until you kill and restart it, when it goes 'Hey, I can run 32 parallel threads again!' and you are magically back to your speedy time. Lessons hard learned on the database of life. :wink:
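If DBA access can be arranged, one way to confirm such a downgrade is to compare the requested and granted degree in the V$PX_SESSION view while the extract is running. A sketch that just prints the query to run in sqlplus (the V$ views need elevated privileges):

```shell
#!/bin/sh
# Sketch: V$PX_SESSION reports requested vs. granted parallel degree for
# active parallel queries. DEGREE < REQ_DEGREE means Oracle downgraded you.
px_check_sql='SELECT qcsid, sid, degree, req_degree
  FROM v$px_session
 WHERE req_degree IS NOT NULL;'
echo "$px_check_sql"
```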

Still don't think Oracle is your issue.
Last edited by chulett on Wed Aug 22, 2018 4:47 am, edited 1 time in total.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

All of your resource disk is on a single file system, which introduces a bottleneck. Also, you're contending with resource scratchdisk, whose directories are on the same file system (I'm making an assumption there).

Can you look for ways to increase the disk I/O bandwidth, either by creating resource disk directories on other file systems or by adding channels into the SAN on which they're stored?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
moalik
Participant
Posts: 39
Joined: Thu Sep 15, 2011 8:15 am
Location: Melbourne

Post by moalik »

Hi All,

Thanks for the information.

We were able to engage the network team to see exactly what is happening on the server when the job runs for a long time. We could see that some packets were getting dropped.

On the database side the query completed successfully, and we could see the message "SQL*Net message from client".

We tried replicating the same issue in the Test environment, but with no luck. Then we had a look at the SQLNET.ORA file in both the Test and Production environments and found that the parameters were completely different.

We are planning to change the Production SQLNET.ORA file to have the same parameters as Test.

Before updating the parameters in the sqlnet.ora file, I just wanted to confirm whether the new sqlnet.ora file will be picked up by the Oracle client when the DataStage job runs, or whether it will use the old cached sqlnet.ora file.
Mohsin Khan
Datastage Consultant
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

If you're really concerned, ask your DBA, as DataStage won't play a role there. I've never seen any issue editing it "on the fly", by the way; there's no need to restart anything from what I've seen. I don't really see how the things that sqlnet.ora controls would help here, but... standing by. :wink:
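For what it's worth, my understanding is that each new client process reads sqlnet.ora at connect time, so there is no long-lived cache to flush; just make sure you are editing the copy the client actually resolves. A sketch of the standard lookup order ($TNS_ADMIN first, then $ORACLE_HOME/network/admin):

```shell
#!/bin/sh
# Sketch: resolve which sqlnet.ora the Oracle client would pick up,
# following the usual lookup order. TNS_ADMIN, if set, wins.
find_sqlnet_ora() {
    for d in "${TNS_ADMIN:-}" "${ORACLE_HOME:-}/network/admin"; do
        if [ -n "$d" ] && [ -f "$d/sqlnet.ora" ]; then
            echo "$d/sqlnet.ora"
            return 0
        fi
    done
    echo "sqlnet.ora not found; set TNS_ADMIN or ORACLE_HOME" >&2
    return 1
}
find_sqlnet_ora || true
```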
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply