Deadlock in DB2 z/OS

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Palermo
Participant
Posts: 24
Joined: Fri Dec 12, 2014 2:05 pm
Location: Belarus

Deadlock in DB2 z/OS

Post by Palermo »

Hi Everyone,

I have a problem with a deadlock in a DB2 database (z/OS). The ETL job consists of four DB2 Connector stages and a Change Capture stage. The table contains about 1.5 million records; roughly 150 thousand records are updated, inserted or deleted per run. The job commits every 100 records. The table is not used by any other application. The deadlock occurs randomly, for example in the DB2 connector that performs the updates:
SQLExecute reported: SQLSTATE = 57033: Native Error Code = -913: Msg = [IBM][CLI Driver][DB2] SQL0913N Unsuccessful execution caused by deadlock or timeout. Reason code "00C90088". SQLSTATE=57033 (CC_DB2DBStatement::processRowStatusArray, file CC_DB2DBStatement.cpp, line 2,636)
Could you please help or give advice? Thanks.

The job and the table space for the table (1 table=1 table space):
[Screenshots: job design and table space definition for the table]
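
In case it helps, this is roughly how the locking granularity of the table space can be checked in the catalog (MYDB and MYTS below are only placeholders for the real database and table space names):

Code:

-- Locking granularity (LOCKRULE) and lock escalation limit (LOCKMAX) of the table space
-- MYDB / MYTS are placeholders for the actual database / table space names
SELECT DBNAME, NAME, LOCKRULE, LOCKMAX
FROM   SYSIBM.SYSTABLESPACE
WHERE  DBNAME = 'MYDB'
  AND  NAME   = 'MYTS';

-- If this shows page locking, row-level locking is something the DBA could
-- consider to reduce contention (not applied here, just an idea):
-- ALTER TABLESPACE MYDB.MYTS LOCKSIZE ROW;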
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

This is typically caused by (improper) partitioning, which can be tested by running the job on a single node.
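
For reference, the single-node test is usually just a matter of pointing APT_CONFIG_FILE at a one-node configuration file, something along these lines (the host name and resource paths are placeholders for your environment):

Code:

{
    node "node1"
    {
        fastname "your_engine_host"
        pools ""
        resource disk "/path/to/datasets" {pools ""}
        resource scratchdisk "/path/to/scratch" {pools ""}
    }
}

Expose $APT_CONFIG_FILE as a job parameter (or override it temporarily) and run the job against that file to see whether the deadlock still appears.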
-craig

"You can never have too many knives" -- Logan Nine Fingers
Palermo
Participant
Posts: 24
Joined: Fri Dec 12, 2014 2:05 pm
Location: Belarus

Post by Palermo »

Do you mean improper partitioning in the Change Capture stage? I used hash partitioning for the before and after links. If I rerun the job, it can either finish successfully or fail with the deadlock message. It happens about once a week, so it is difficult to catch. I have little time left and need to promote the jobs to production. I have tried to reproduce the same deadlock, without success.

The data in the dataset is partitioned and sorted in another job:

[Screenshot: dataset partitioning and sorting in the upstream job]
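
Since it is so hard to catch, I am thinking of asking the DBA to watch the locks on the table space while the job runs, using the DB2 display command, roughly like this (MYDB and MYTS are placeholders for the real names):

Code:

-DISPLAY DATABASE(MYDB) SPACENAM(MYTS) LOCKS LIMIT(*)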
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So this job does run on multiple nodes? And you don't have time to test it over time running on a single node before it needs to go to production? Just want to make sure we're all on the same page.

My "partitioning" comment was extremely generic. And would typically be more about the target stages.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Palermo
Participant
Posts: 24
Joined: Fri Dec 12, 2014 2:05 pm
Location: Belarus

Post by Palermo »

The job runs on multiple nodes. I think the problem is on the database side, but I have no idea how to catch or reproduce the same situation, or test it on a single node, because when I rerun it the job finishes successfully. All the DB2 connectors run in sequential mode.