DSXchange: DataStage and IBM Websphere Data Integration Forum
View next topic
View previous topic
Add To Favorites
Author Message
d-woo
Participant



Joined: 18 Feb 2015
Posts: 5
Location: United States
Points: 81

Post Posted: Fri Jan 25, 2019 7:36 am Reply with quote    Back to top    

DataStage® Release: 11x
Job Type: Parallel
OS: Unix
We're having intermittent job hanging running 11.5.0.2, updating tables against Oracle 11g targets on a client database. We run hundreds of jobs per day that perform various table updates. Each day, we can have between zero and 4 jobs hang at various times. The only common theme appears to be that we are attempting to connect in parallel to the target table, and in the log in Director, we only see one connection success. Then the job hangs until we kill it.

It is rarely the same job and is not specific to a time or node we're on. In every case, we're able to kill and then successfully rerun the job. We have attempted multiple straces and suggestions from Fix Central, but haven't really made any progress. The latest evidence notes that this may be an issue when attempting the parallel connect as we do not see anything further than the process executing OCIServerAttach.

We continue to debug, just hoping someone has seen this behavior before. We have other jobs running in the same environment that do not establish an Oracle connection to a target, all of those jobs run without issue. Thanks.
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42926
Location: Denver, CO
Points: 221381

Post Posted: Fri Jan 25, 2019 7:51 am Reply with quote    Back to top    

Can you confirm for us what target stage you are using, please?

_________________
-craig

The Old Ones were, the Old Ones are, and the Old Ones shall be. Not in the spaces we know, but between them. They walk serene and primal, undimensioned and to us unseen.
Rate this response:  
Not yet rated
d-woo
Participant



Joined: 18 Feb 2015
Posts: 5
Location: United States
Points: 81

Post Posted: Fri Jan 25, 2019 8:13 am Reply with quote    Back to top    

My fault. We are using an Oracle connector stage.

Here's an example of what we see in the logs:

Orc_insert_update_table: The connector connected to Oracle server TARGET1A.
Orc_insert_update_table: Connector code revision: 263892_0223017
Orc_insert_update_table: The connector will run in parallel on 2 processing nodes.

But then later, we only see this one entry, whereas for the other tables there are both a 0 and 1 instance:

Orc_insert_update_table,0: The connector connected to Oracle server TARGET1A.

When we rerun, we'll see both entries and the job is successful:

Orc_insert_update_table,0: The connector connected to Oracle server TARGET1A.
Orc_insert_update_table,1: The connector connected to Oracle server TARGET1A.


This is not specific to an insert/update, we see this with deletes as well. It appears we are unable to establish a connection although the original "non-parallel" log states we have connected, but the second node does not appear to connect and just hangs forever.
Rate this response:  
Not yet rated
ray.wurlod

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 23 Oct 2002
Posts: 54501
Location: Sydney, Australia
Points: 295558

Post Posted: Sat Jan 26, 2019 8:40 pm Reply with quote    Back to top    

Check with your Oracle DBA how many listeners are available. Work out how many the totality of DataStage jobs are trying to connect to Oracle at the same time.

_________________
RXP Services Ltd
Melbourne | Canberra | Sydney | Hong Kong | Hobart | Brisbane
Rate this response:  
Not yet rated
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42926
Location: Denver, CO
Points: 221381

Post Posted: Sun Jan 27, 2019 4:00 am Reply with quote    Back to top    

Yeah, wondering as well if your DBA has been involved yet. Several limits could in play per user, per session, etc... which I imagine would show up in the alert log. Or could be monitored.

_________________
-craig

The Old Ones were, the Old Ones are, and the Old Ones shall be. Not in the spaces we know, but between them. They walk serene and primal, undimensioned and to us unseen.
Rate this response:  
Not yet rated
PaulVL



Group memberships:
Premium Members

Joined: 17 Dec 2010
Posts: 1267

Points: 8325

Post Posted: Mon Jan 28, 2019 5:10 am Reply with quote    Back to top    

Is your environment set in a grid where the nodes get distributed to different hosts? Or do you have a consistent hard coded APT file?
Rate this response:  
Not yet rated
FranklinE



Group memberships:
Premium Members

Joined: 25 Nov 2008
Posts: 735
Location: Malvern, PA
Points: 6974

Post Posted: Mon Jan 28, 2019 6:40 am Reply with quote    Back to top    

I'll describe a similar issue we had recently with the same symptoms. The DB is MSSQL Server, and I don't know if Oracle should handle this differently.

We transmit up to 1,000 rows per day for an "upsert", except that the command line has separate queries for update and insert. Running on two nodes meant that an unintended pair of key column duplicates were on separate nodes, and when the rows arrived they created a contention that caused the DB session to hang. We resolved it by reducing to one node in default.apt, with the rows coming in sequentially and being processed separately.

Our key column duplication was caused by truncation. The lesson for us was to check for duplication in the data, and bring out the nerf bats to use on the project team that failed to fix the code to avoid the truncation.

_________________
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: http://www.dsxchange.com/viewtopic.php?t=143596 Using CFF FAQ: http://www.dsxchange.com/viewtopic.php?t=157872
Rate this response:  
Not yet rated
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42926
Location: Denver, CO
Points: 221381

Post Posted: Mon Jan 28, 2019 10:55 am Reply with quote    Back to top    

From what I've seen, Oracle would handle it much the same way. Should show up as a "deadlock" in the alert logs, I believe.

_________________
-craig

The Old Ones were, the Old Ones are, and the Old Ones shall be. Not in the spaces we know, but between them. They walk serene and primal, undimensioned and to us unseen.
Rate this response:  
Not yet rated
rkashyap



Group memberships:
Premium Members

Joined: 02 Dec 2011
Posts: 521
Location: Richmond VA
Points: 4709

Post Posted: Tue Jan 29, 2019 4:22 am Reply with quote    Back to top    

We had faced similar situation and our support provider suspected it to be a symptom of oracle bug 10178982, which was related to partition maintenance.
Similar to Franklin's suggestion, workaround was to run Oracle connector in sequential mode.
Rate this response:  
Not yet rated
d-woo
Participant



Joined: 18 Feb 2015
Posts: 5
Location: United States
Points: 81

Post Posted: Fri Feb 01, 2019 2:35 am Reply with quote    Back to top    

Thank you all. We're looking in to the user/sessions/connections limitations with our Oracle connectivity now. One would think we'd have a job failure, rather than a hang though. We're also looking into the deadlock possibility, but again I'd think that we'd receive an error there and a failure. None of these jobs fail, despite leaving them running for hours.

We have turned on debug logging for several jobs on the Oracle connectors in the hopes we'll see a hang on one of them. We've had two clean days now, without making any changes.

Does anyone know if there's a way to set the Oracle connectors default to run sequentially at a project level? We have hundreds of jobs set up and do not want to modify each if we can avoid it.

Also - we are running grid with work being distributed to different hosts, and the issue isn't specific to one node. It's completely random - different times, different jobs, different days. There's no consistency in the hangs. Extremely frustrating at this point.

Thank you all for your suggestions thus far!
Rate this response:  
Not yet rated
FranklinE



Group memberships:
Premium Members

Joined: 25 Nov 2008
Posts: 735
Location: Malvern, PA
Points: 6974

Post Posted: Fri Feb 01, 2019 6:30 am Reply with quote    Back to top    

d-woo,

I understand your frustration very well. We were lucky in that the app is very small and has a maximum data volume of about 5,000 records. We don't need more than one node.

There should be an environment variable which you can use in each job that needs to run on only one node. I don't know how to find it, hopefully someone else reading your thread will know.

_________________
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: http://www.dsxchange.com/viewtopic.php?t=143596 Using CFF FAQ: http://www.dsxchange.com/viewtopic.php?t=157872
Rate this response:  
Not yet rated
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42926
Location: Denver, CO
Points: 221381

Post Posted: Sat Feb 02, 2019 3:42 am Reply with quote    Back to top    

You are right in that Oracle deadlocks will generate specific errors / failures when detected. However, since Oracle has been known to have the occasional bug or forty at any given time for any given version, there could be issues with communicating that back to the connector via the client or even perhaps with the connector detecting that. But it would definitely be in the alert logs which hopefully your DBA is monitoring for you when this happens.

Involve support if you haven't already.

_________________
-craig

The Old Ones were, the Old Ones are, and the Old Ones shall be. Not in the spaces we know, but between them. They walk serene and primal, undimensioned and to us unseen.
Rate this response:  
Not yet rated
Display posts from previous:       

Add To Favorites
View next topic
View previous topic
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum



Powered by phpBB © 2001, 2002 phpBB Group
Theme & Graphics by Daz :: Portal by Smartor
All times are GMT - 6 Hours