Slow DataStage performance


jackson.eyton
Premium Member
Posts: 145
Joined: Thu Oct 26, 2017 10:43 am

Slow DataStage performance

Post by jackson.eyton »

Hi everyone,
So I have been wondering how to present this question to IBM and figured I would ask here first. A quick summary is that DataStage Designer and all other DataStage client applications (Administrator, Director, etc.) will periodically tweak out and become incredibly slow. There are two scenarios in which this happens.

1. If my fellow ETL developer and I are both in DataStage Designer, working on our separate jobs, and one of us decides to run a job, DataStage will then crunch for both of us until that job has completed. Meaning if he runs his job, I cannot open any stages in my job without DataStage going "Not Responding" until his job has completed. This seems very odd to me.

2. Various scenarios can occur where DataStage Designer simply starts hanging for one of us for no obvious reason. Notably, yesterday I highlighted about 8 jobs that were backup copies and attempted to delete them. My DataStage stopped responding, and as soon as that happened my coworker's DataStage started working incredibly slowly. I waited an hour and ended up killing the process and disconnecting all of my sessions via the admin console. The slowness, however, did not resolve and persists even today. We have had this happen before with no obvious rhyme or reason, and we've had to reboot the server to correct it.

We have a development environment on one server, which we'll call Dev1, and our web services for InfoSphere on another server, which we'll call Svc1. Both servers run fully patched Windows Server 2012. We have monitored the system performance and resources of both servers during both of the above scenarios, and there are no persistent spikes in any resource (CPU, memory, disk, or even network). We are using fat clients, not terminal services, so the bogging down of DataStage Designer when another individual is merely running a job confuses me.

Does anyone have any suggestions on where to start with this?

*EDIT*
We have multiple projects on this development server and I have confirmed that the performance issue is persistent among the projects, pointing to the server as the common denominator.

*UPDATE*
We were able to get some improvement by closing the log pane in Designer. We then cleared some of the logs (retention is set to the last 3 runs of a job), and this also helped somewhat. Opening a stage in a job and looking at its properties, for example, now seems normal. However, saving a new job and compiling still take far longer than on a freshly booted server. A job as simple as SRC---TRXFM---DEST, which would normally take 30 seconds or so to compile, now takes nearly 10 minutes (6 minutes as of last clocking).
Last edited by jackson.eyton on Wed Nov 29, 2017 12:11 pm, edited 2 times in total.
-Me
boxtoby
Premium Member
Posts: 138
Joined: Mon Mar 13, 2006 5:11 pm
Location: UK

Post by boxtoby »

Hi
I have had scenario 1 when working from home, where the connection is not as good as in the office. I found that turning off "show performance statistics" in Designer helped a lot.

Hope that helps,
Bob
Bob Oxtoby
jackson.eyton
Premium Member
Posts: 145
Joined: Thu Oct 26, 2017 10:43 am

Post by jackson.eyton »

I may give that a shot and see if that makes any improvement, though obviously we won't want to leave it off permanently. What strikes me as really odd is that it happens at all. I can understand MY Designer going slow when I run a job MYSELF. However, when my coworker runs a job, that should not slow down my Designer for things as simple as opening a transform stage and getting to the stage properties. It's almost as if Designer runs as an instance on the server itself and isn't REALLY a fat client.
-Me
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Slow DataStage performance

Post by chulett »

jackson.eyton wrote:A job as simple as SRC---TRXFM---DEST would normally take 30 seconds or so to compile, now takes nearly 10 minutes (6 minutes as of last clocking).
Is the same thing true if a transformer is not involved? Don't forget they compile down to C++ code and we've seen sites with a single concurrent user compiler license...
-craig

"You can never have too many knives" -- Logan Nine Fingers
jackson.eyton
Premium Member
Posts: 145
Joined: Thu Oct 26, 2017 10:43 am

Post by jackson.eyton »

THAT is an interesting question.... At this stage of our warehouse development, most of our jobs contain at least one transform stage. Could you point me in the right direction where to check our compiler license status?
-Me
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

We don't generally run into that problem but we don't run our servers on Windows either.

I would be mostly suspicious of any kind of security software, especially anti-virus software running scans on your clients, or even worse, on your server.

My next suspicion would be gremlins on your network. Years ago we did have some extreme network problems that would crop up and bring everything to a grinding halt.
Choose a job you love, and you will never have to work a day in your life. - Confucius
jackson.eyton
Premium Member
Posts: 145
Joined: Thu Oct 26, 2017 10:43 am

Post by jackson.eyton »

Yea, we have done the standard AV disabling, as well as network sniffing and monitoring, and so far that all looks fine. I was able to verify the CPU is pegged when I run a transformation job.

This does not occur when compiling a job. I did a little digging into the compiler license, but I am unsure exactly how to confirm how many concurrent compiler licenses we have available. Additionally, there are only two of us, so we rarely compile anything at the same time. Judging from the following quote, simultaneous compiling is the real benefit of being multi-licensed:
"For some compilers, each developer must have a license at the time that the developer compiles the job with the Designer client. The maximum number of simultaneous processes that compile jobs determines the number of licenses."
-SRC https://www.ibm.com/support/knowledgece ... s_cpp.html

As of now the server is still acting up, as we are refusing to reboot it just to temporarily resolve the pain. As such, even just running a job takes 2+ minutes between the job log saying "starting job" and data actually starting to process. :-(
-Me
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Only two of you? Not so much an issue then. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
jackson.eyton
Premium Member
Posts: 145
Joined: Thu Oct 26, 2017 10:43 am

Post by jackson.eyton »

Correct, as far as the compiler licensing goes. However, the issues we see when we run jobs, combined with the fact that there are only two of us, really raise a red flag for me. I imagine there are companies with dozens of developers all building and testing jobs. It's rather odd that me running a job causes my partner's Designer to slow to a crawl and vice versa...

Does anyone have the actual InfoEng server recommended specifications? What I am finding from the IBM knowledge center doesn't make a whole lot of sense.
-Me
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

it's not odd if someone went and changed your apt file and went nuts with the quantity of nodes.
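
A quick way to check that is to count the node declarations in the parallel configuration file (the one APT_CONFIG_FILE points at). The snippet below is only a rough sketch: the sample configuration, the host name "dev1", and the disk paths are made up for illustration, and the regular expression assumes each node is declared on its own line as `node "name"`.

```python
import re

# Matches declaration lines such as:   node "node1"
NODE_DECL = re.compile(r'^\s*node\s+"([^"]+)"', re.MULTILINE)


def list_nodes(config_text: str) -> list[str]:
    """Return the logical node names declared in a parallel configuration file."""
    return NODE_DECL.findall(config_text)


# Made-up two-node configuration, for illustration only.
SAMPLE = '''
{
  node "node1"
  {
    fastname "dev1"
    pools ""
    resource disk "C:/IBM/Datasets" {pools ""}
    resource scratchdisk "C:/IBM/Scratch" {pools ""}
  }
  node "node2"
  {
    fastname "dev1"
    pools ""
    resource disk "C:/IBM/Datasets" {pools ""}
    resource scratchdisk "C:/IBM/Scratch" {pools ""}
  }
}
'''

nodes = list_nodes(SAMPLE)
print(len(nodes), nodes)
```

To use it on a real file, read the file's contents into a string and pass that to `list_nodes`; a count far above the number of cores on the box would be worth a second look.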
jackson.eyton
Premium Member
Posts: 145
Joined: Thu Oct 26, 2017 10:43 am

Post by jackson.eyton »

I have done some playing around with those myself in the past, however our APT settings are configured as a two node system. Our Dev server does only have two cores however. I am of the mind that the server cores should be the result of the following equation: (N * E)+1
Where N is the number of Nodes in APT and E is the number of employees who could be running jobs simultaneously.
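
Worked through for the setup described here (two nodes, two developers), that rule of thumb looks like the sketch below. The function name is hypothetical, and the formula is just the heuristic stated above, not an IBM sizing recommendation.

```python
def cores_by_rule_of_thumb(nodes: int, developers: int) -> int:
    """Cores suggested by the (N * E) + 1 heuristic from this post.

    nodes      -- N, the number of nodes in the APT configuration
    developers -- E, the people who could be running jobs at the same time
    """
    return nodes * developers + 1


# Two-node configuration, two developers.
print(cores_by_rule_of_thumb(2, 2))  # 5
```

By that estimate the two-core Dev server is three cores short.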

Additionally, this is not the whole of our issue, as the performance hit can occur and persist even when jobs are not running, such that simply opening jobs, opening stages in a job, opening wizards, and even creating build packages are all extremely sluggish.
-Me
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Perhaps you are on a virtual server and other virtual machines on the same hardware are hammering away. It could be thin-provisioned and overloaded.
Choose a job you love, and you will never have to work a day in your life. - Confucius
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

On the subject of cores versus nodes, don't forget that "cores" are a physical concept while "nodes" are a logical concept... basically worker threads in the O/S where you typically have no control over which ones run where. There's WAY more to it than that, of course and I'm sure Ray has given a masters class or twelve on the subject but wanted to put out there that there is no simple equation for the number of nodes any given system could support.

<tip-toes silently away from the keyboard>
-craig

"You can never have too many knives" -- Logan Nine Fingers
jackson.eyton
Premium Member
Posts: 145
Joined: Thu Oct 26, 2017 10:43 am

Post by jackson.eyton »

qt_ky,
Yes, our server is a VM on a cluster, and our IT department is resistant to adding more cores to our VM, as that could potentially decrease performance because the VM may have to wait for a higher number of cores to become available.

chulett,
..... O_o ....well hmm....
-Me
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Here's my two cents:

1) I dislike VMs for any ETL work. Too much politics with the back room server folks. You've paid a lot of money to license this software, it's used to push the data that your company feeds off of... get some hardware under it, not a virtual layer.

2) Ditch Windows and go to Unix for this ETL tool.

3) My phone has more cores than your company ETL tool host.

4) Do you have your xmeta running on the same host as your engine? You could farm that off onto a separate host. Thus freeing up CPU/Mem. I would keep the domain tier (WAS) with the engine.


Once you said you had two cores running on windows... there is little surprise that you are getting laggy performance once someone actually runs a job and person number 2 tries to do anything else.


Given that you have 2 cores... I am guessing that money is an issue for your dept. I would recommend getting a second host (since hosts are cheaper than software licensing for DataStage) and farm off all non essential "Data Manipulation" activities to that. Like a zip, an SFTP, etc... have the data on a shared mount. But that zip and ftp, placed upon a different host, will free up CPU and Memory on your primary ETL host. Thus you can run more concurrent jobs. You should write a standard set of scripts to consistently farm off that work to the other box.
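
As a very rough sketch of what one of those standard scripts could look like: the helper below only builds an ssh command that compresses a file on the second host instead of on the ETL host. Everything specific in it is an assumption for illustration: the host name "etl-helper", the shared path, a Unix-like helper host reachable over ssh, and the shared mount being visible at the same path on both machines.

```python
import shlex


def build_offload_command(helper_host: str, shared_file: str) -> list[str]:
    """Build an ssh command that gzips a file on the helper host.

    The file must live on the shared mount so the helper host can see it.
    shlex.join quotes the remote command so paths with spaces survive.
    """
    remote_command = shlex.join(["gzip", "--keep", shared_file])
    return ["ssh", helper_host, remote_command]


# Hypothetical host and path, for illustration only.
cmd = build_offload_command("etl-helper", "/shared/exports/daily extract.csv")
print(cmd)
# To actually run it: subprocess.run(cmd, check=True)
```

Keeping the command construction in one function means every job farms work out the same way, and an SFTP step could follow the same pattern.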