dsenv and performance tuning

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

rjhcc
Premium Member
Posts: 34
Joined: Thu Jan 27, 2005 4:20 pm

dsenv and performance tuning

Post by rjhcc »

I'm trying to tune our environment and am not finding much in the docs about how to set things up. There are some descriptions with minimums for the settings, a lot of warnings about not changing them, and so on. Can anyone help with which ones are safe to change and what their values really mean?

Also, are there other files I can examine for tuning?

Thanks!
rjhcc
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

The misconception is that the DS Engine needs special configuration to accelerate your jobs. Most performance problems are related to design choices and lack of understanding of what processing demands are being made.

There's nothing in the dsenv file that will accelerate poorly written SQL, or even efficient SQL with a complicated explain plan. There's no magic configuration parameter that will make 100M rows flow faster across a network. Likewise, there's no parameter that will make a 100M row hashed file more efficient (hashed file tuning/optimization is a whole other topic).

The reasons to touch the dsenv file are usually connectivity related. This file is used to define database connections and set environment variables. Once you move into the PX world, yes, there are environmental settings that "tune" performance but that's in different files than dsenv.
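For illustration only, a rough sketch of the kind of connectivity entries that typically end up in dsenv - plain shell environment settings picked up when the engine processes start (the Oracle paths below are placeholders, not real values, and you generally need to stop and restart the DS Engine for changes to take effect):

  # example Oracle client settings - adjust names and paths to your own install
  ORACLE_HOME=/u01/app/oracle/product/10.2.0 ; export ORACLE_HOME
  LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib ; export LD_LIBRARY_PATH
  PATH=$PATH:$ORACLE_HOME/bin ; export PATH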

You may be confusing dsenv with uvconfig. The uvconfig file has a handful of parameters that do need to be upsized for specific reasons, such as the number of dynamic files that can be open at any moment or the default hashed file bit size (32 or 64). Those can affect performance or operability of the job processes.
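For example, the uvconfig entries that most often get resized look something like this (the parameter names are real uvconfig tunables, but the values shown are placeholders and the trailing comments are added here for explanation - check Administering UniVerse before changing anything, and remember that changes only take effect after regenerating the shared configuration with uvregen from $DSHOME/bin and restarting the engine):

  T30FILE 512        # how many dynamic (type 30) hashed files can be open at once
  MFILES 150         # size of the engine's rotating file pool
  64BIT_FILES 0      # default hashed file addressing: 0 = 32-bit, 1 = 64-bit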

But, for the most part, you need to focus pretty much all of your attention on job design. You need to understand performance and resource monitoring in your environment. If you're on Solaris, learn what prstat shows you. For AIX, check out topas, and for HP-UX see glance. On anything else, get top installed on your server.
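For example, a rough sketch of watching the engine's processes while a job runs (dsadm is just an example user name for the DataStage owner):

  prstat -u dsadm 5     # Solaris: refresh every 5 seconds, sorted by cpu
  topas                 # AIX
  glance                # HP-UX
  top -u dsadm          # Linux, or anywhere top is installed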

You need to see what your job is doing so that you can pinpoint whether you're cpu constrained. You may think a job is slow, but when you check its cpu usage you may see it's at 100%, which means it can't go any faster. If your job is not using cpu, you need to ask why. Read the history here: OCI lookups suck on high volume batch processing, and inserts and updates need to be separated on high volume processing.
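If you want to tie a busy process back to a particular job, something like the following usually does it - process names vary by platform and release, and DSD.RUN and MyJobName here are only examples:

  ps -ef | grep "DSD.RUN MyJobName"   # the phantom running that server job
  ps -ef | grep phantom               # everything the engine has spawned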
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ds

Post by ds »

Certainly, DataStage doesn't behave the same everywhere with the same settings. You need to optimize its performance to suit your environment's requirements and constraints.

Do you have routine housekeeping or a regular maintenance cycle set up for the DataStage server? I've seen environments gain huge performance improvements once a housekeeping cycle was put in place.

- James
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Maintenance tasks, such as periodic cleansing of each project's &PH& directory, sensible job log purging intervals, etc., all impact job performance. A job, in addition to processing its data, also has to write log entries and the like, so a huge job log file can impact runtime. I didn't mention this as configuration tuning because it's not really "engine" configuration. Just like running 100 jobs simultaneously is not configuration tuning, it's more a matter of sensible choices. :lol:
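For what it's worth, a rough sketch of the &PH& cleanup: from the Administrator client's Command window (or a uvsh session in the project directory), with no jobs running in that project, the usual command is

  CLEAR.FILE &PH&

and the job log side is normally handled by the project's auto-purge setting in the Administrator rather than by hand.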
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The ability to capture execution statistics for active stages should be your first port of call, noting what Ken (kcbland) had to say. By this means you can identify the "hot spots" where, say, a Transformer stage is spending most of its time, and address those areas first.
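If you prefer the command line to the Director, the dsjob client can dump much of the same run detail; a rough sketch (project and job names are examples, and the exact report contents vary by version):

  dsjob -report MyProject MyJob DETAIL   # per-stage and per-link detail for the last run
  dsjob -jobinfo MyProject MyJob         # overall status and timing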
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
shin0066
Premium Member
Posts: 69
Joined: Tue Jun 12, 2007 8:42 am

Post by shin0066 »

Thanks for sharing the very good information, gurus.

I wonder - is there a way to get a job's statistics in terms of CPU consumption?
With those figures we could work on the heaviest jobs and reduce the CPU time they take to complete.

Thanks,
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

see above, I described it
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
rjhcc
Premium Member
Posts: 34
Joined: Thu Jan 27, 2005 4:20 pm

Post by rjhcc »

Came back today after the weekend and was thankful to see all of your input.

I have to say that all of this proves to be true. I wondered about the claim of dsenv not being a place to tune and was glad to see the reply stating that it is. It contains disk cache, memory, and file handling parameters among others.

I would appreciate any other posts or suggestions!
rjhcc
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

rjhcc wrote:I wondered about the claim of dsenv not being a place to tune and was glad to see the reply stating that it is. It contains disk cache, memory, and file handling parameters among others.
No, it doesn't. As noted earlier, you are confusing dsenv with uvconfig.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Job Monitor would also give you the %CPU utilisation of that job.
shin0066 wrote:Thanks for sharing the very good information, gurus.

I wonder - is there a way to get a job's statistics in terms of CPU consumption?
With those figures we could work on the heaviest jobs and reduce the CPU time they take to complete.

Thanks,
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

If you're tuning, you need to be aware of everything running on your DS server. Looking at cpu utilization from the DS Monitor is deceiving - it only shows you what the job used. It doesn't tell you why the job didn't use 100%. If a job is all file-based processing there are no network or database delays, so it will run flat out until it hits disk delays (swapping/paging/reading/writing) or cpu wait time.

Monitor is like the speedometer in your car. It tells you how fast you are going; it doesn't tell you whether you're behind a slow car, stuck in traffic, going around tight corners, climbing a hill, or pulling a trailer. It doesn't tell you the circumstances dictating the speed. Tuning is identifying those circumstances and making adjustments.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

%CPU as reported by the monitor is usually wrong, for the same reasons that rows/sec is usually wrong. The clock is running even when rows are not being processed, and the time is rounded to whole seconds before the division has been done - %CPU is calculated as (CPU seconds) / (clock seconds) * 100. So it's approximate at best. The CPU figures obtained with stage tracing are in microseconds if the platform supports them; in milliseconds otherwise, and are reported raw (not as a percentage).
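To put numbers on that: a stage that used 2.4 CPU seconds over 5.6 elapsed seconds is really running at about 43%, but once both figures are rounded to whole seconds the monitor reports 2/6 = 33% or 2/5 = 40% depending on which way the clock rounds, and anything that finishes inside a second can show 0%. Treat the figure as a rough indicator only.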
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rjhcc
Premium Member
Posts: 34
Joined: Thu Jan 27, 2005 4:20 pm

an exercise in miscommunication...

Post by rjhcc »

Again I return to say thanks for the suggestions!

However, I must apologize and agree with the postings regarding dsenv and uvconfig. Thanks for your patience with that one - I repeatedly and incorrectly called it dsenv when I meant uvconfig.

Respectfully,
rjhcc
rjhcc
Premium Member
Posts: 34
Joined: Thu Jan 27, 2005 4:20 pm

Post by rjhcc »

Can anyone point me toward a good source that explains the settings in uvconfig? The descriptions are a little vague...
rjhcc
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The manual Administering UniVerse contains the best coverage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.