Datastage server crashed on failover to secondary server

Madhumitha_Raghunathan
Premium Member
Posts: 59
Joined: Fri Apr 22, 2011 8:02 am

Post by Madhumitha_Raghunathan »

Hi,
We have IIS 11.5 installed on an active/passive cluster (PowerHA version 7). When the application fails over to the secondary host and the engine starts up, more than 45k DB connections are opened to XMETA (sample below), and the server crashes as memory spikes to 100% and swap to 99%. The application comes up fine on the primary host, where we don't see that many internal connections, and it is stable. All the tiers are installed on a single host.

tcp4 0 0 iis-uat.db2c_db2 iis-uat.33010 ESTABLISHED
tcp 0 0 iis-uat.33010 iis-uat.db2c_db2 ESTABLISHED
tcp4 0 0 iis-uat.db2c_db2 iis-uat.33013 ESTABLISHED
tcp 0 0 iis-uat.33013 iis-uat.db2c_db2 ESTABLISHED
tcp4 0 0 iis-uat.db2c_db2 iis-uat.33016 ESTABLISHED
tcp 0 0 iis-uat.33016 iis-uat.db2c_db2 ESTABLISHED
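
Since all tiers run on one host, each loopback connection to DB2 appears twice in the sample above: once with `db2c_db2` as the local address (server side) and once as the foreign address (client side). A minimal sketch for counting the two sides separately from saved netstat output, assuming the six-column layout shown in the sample; the function name and the `service` default are illustrative, not part of any IBM tooling:

```python
from collections import Counter


def count_db_connections(netstat_text, service="db2c_db2"):
    """Count ESTABLISHED connections to `service`, split by side."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state
        if len(fields) < 6 or fields[5] != "ESTABLISHED":
            continue
        local, foreign = fields[3], fields[4]
        if local.endswith("." + service):
            counts["server_side"] += 1
        elif foreign.endswith("." + service):
            counts["client_side"] += 1
    return counts


# The six lines quoted in the post
sample = """\
tcp4 0 0 iis-uat.db2c_db2 iis-uat.33010 ESTABLISHED
tcp 0 0 iis-uat.33010 iis-uat.db2c_db2 ESTABLISHED
tcp4 0 0 iis-uat.db2c_db2 iis-uat.33013 ESTABLISHED
tcp 0 0 iis-uat.33013 iis-uat.db2c_db2 ESTABLISHED
tcp4 0 0 iis-uat.db2c_db2 iis-uat.33016 ESTABLISHED
tcp 0 0 iis-uat.33016 iis-uat.db2c_db2 ESTABLISHED
"""

print(count_db_connections(sample))
```

If the 45k figure came from counting every matching line, the number of distinct connections may be closer to half of that, which would be worth confirming before comparing the two hosts.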

Would be great if someone could help. Appreciate it.
Thanks,
Madhumitha
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Is this your first failover?
Madhumitha_Raghunathan
Premium Member
Posts: 59
Joined: Fri Apr 22, 2011 8:02 am

Post by Madhumitha_Raghunathan »

Yes. Another thing I forgot to mention is that the repository (IBM DB2) was installed first, and the databases and objects were created with the scripts provided by IBM. The Engine and Services tiers were installed from the IIS installation package after HACMP was set up.
Thanks,
Madhumitha
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

When the DB fails over, you will still have to restart WAS and DataStage. Did you do that?

The nature of the failover... Is the database still accessible by the name defined in Version.xml and the DB2 Catalog?
Madhumitha_Raghunathan
Premium Member
Posts: 59
Joined: Fri Apr 22, 2011 8:02 am

Post by Madhumitha_Raghunathan »

Yes. Our start order is always DB2, WAS, NodeAgent, and Engine. Once all of these tiers start up, I am able to log in to DataStage as well. But the number of connections increases drastically after 5 minutes, and the failover server crashes because memory and paging space are exhausted. We have 48 GB of memory on each box. The whole process takes up only 5 GB of memory on the primary server, but even 48 GB is not enough on failover.
Thanks,
Madhumitha
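
To see how quickly the connections build up in the minutes after startup, the count can be sampled at a fixed interval on both hosts. A rough sketch, assuming that plain `netstat -a` prints the service name `db2c_db2` as in the sample earlier in the thread; `established_to_service`, `run_netstat`, and `watch` are hypothetical helper names:

```python
import subprocess
import time


def established_to_service(netstat_text, service="db2c_db2"):
    """Count ESTABLISHED lines whose foreign address ends in .<service>."""
    count = 0
    for line in netstat_text.splitlines():
        fields = line.split()
        if (len(fields) >= 6 and fields[5] == "ESTABLISHED"
                and fields[4].endswith("." + service)):
            count += 1
    return count


def run_netstat():
    # Assumption: `netstat -a` resolves the DB2 port to its service name
    return subprocess.run(["netstat", "-a"],
                          capture_output=True, text=True).stdout


def watch(rounds=30, interval=10, sample=run_netstat, sleep=time.sleep):
    """Sample the client-side connection count `rounds` times."""
    history = []
    for i in range(rounds):
        count = established_to_service(sample())
        history.append(count)
        print(f"sample {i}: {count} client-side connections")
        if i < rounds - 1:
            sleep(interval)
    return history
```

Calling `watch(rounds=30, interval=10)` right after the Engine starts covers roughly the five-minute window described above. Comparing the resulting series from the primary and the failover host could show whether the growth is steady or starts at a particular moment.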
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Ahh, so the failover was the result of a broken process and not a planned failover test?
Madhumitha_Raghunathan
Premium Member
Posts: 59
Joined: Fri Apr 22, 2011 8:02 am

Post by Madhumitha_Raghunathan »

When IIS is up and running on the primary host, it is all good and doesn't consume much memory (about 5 GB of the available 48 GB). We are trying to switch to the secondary server to complete a maintenance activity on the primary host, not because of a broken process. When we fail over, we move the resource groups to the failover box; the application starts up fine and I am able to log in as well. After about 5 minutes the failover server crashes because it exceeds the 48 GB due to the looping DB connections.
Thanks,
Madhumitha
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Get IBM engaged.

We won't be able to do much from here since we are not poking on your system.

I would still try to look at Version.xml to see if you can communicate properly with your XMETA and with the entry that is cataloged in DB2.

You should run your check xmeta script to see if that communication path is good.

$DSHOME/../../ASBServer/bin/AppServerAdmin.sh -w