Job extracts data from Orcale tables compares the data and loads it to a flat file. One of the queries that is executed against the database is going to the hanging state.
The query fired against the database is like
SELECT DISTINCT COL1,COL2,COL3
FROM tab1
WHERE EXISTS
(
SELECT 'X' FROM TAB2
WHERE TAB2.COL1 = tab1.COL1
AND TAB2.COL2 = tab1.COL2
AND TAB2.COL3 = tab1.COL3
)
Or
Exists (
select 'X' from tab3
where upper(trim(tab3.Last_nm)) = upper(trim(tab1.LAST_NM))
and tab3.COL1 = tab1.COL1
and tab3.COL2 = tab1.COL2
Tab1 - rowcount around 10000
tab2 - rowcount around 830000
tab3 - rowcount around 400000
Query output count - around 3000
Certain days job runs for around 1 hr and finshes succesfully. Eventhough the job takes 1 hr ,performance statistcs shows 518 rows processed per second for this link . . 1 hr for processing 9000 rows , i guess is very poor performance.
job design is like
Oracle stage ---> Transformer ---> FlatFile
(Mutiple queries executed)
Is it something which i need to followup with the DBA???
Perfromance Issues
Moderators: chulett, rschirm, roy
Your issue is database related 99%. Your query is ugly, and not a DS issue. The only point DS comes into play is spooling and writing the data to the output file.
I would look to load on the server, degree of parallelism requested, minimum, and actual, and other factors that will vary the performance of the query. If you are competing with other queries, you may be getting a different number of parallel slaves assigned one day to the next. Other options are that your data set varies from day to day, and therefore your runtime fluctuates. Another issue could be invalid table statistics playing with the cost based optimizer.
I would look to load on the server, degree of parallelism requested, minimum, and actual, and other factors that will vary the performance of the query. If you are competing with other queries, you may be getting a different number of parallel slaves assigned one day to the next. Other options are that your data set varies from day to day, and therefore your runtime fluctuates. Another issue could be invalid table statistics playing with the cost based optimizer.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
Thanks for the inputs.
How can we explain the link refresh rate of 500 when this link causes the job to run for more than 1 hr.
DataStage Server and Oracle Server resides in different m/c . Is the communication drop between the m/c has to do something with the job. If the communication drop happens whether we get a warning/error msg in datastage logs.
How can we explain the link refresh rate of 500 when this link causes the job to run for more than 1 hr.
DataStage Server and Oracle Server resides in different m/c . Is the communication drop between the m/c has to do something with the job. If the communication drop happens whether we get a warning/error msg in datastage logs.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Mention to your DBA that you're wanting to execute a query that includes a correlated subquery to a 830000 row table and another correlated subquery to a 400000 row table, then run for your life!
Seriously, this is a tough query for a database server. It could probably be made much more efficient if you used two reference lookups from DataStage instead (add an additional constraint "only return one row", for example ROWNUM = 1 in Oracle).
Index the constrained columns.
Seriously, this is a tough query for a database server. It could probably be made much more efficient if you used two reference lookups from DataStage instead (add an additional constraint "only return one row", for example ROWNUM = 1 in Oracle).
Index the constrained columns.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
The link monitor shows rows/second based on rows traveled on that link divided by the time that the link connected to that transformer has been open.
So, if the link opens at 1:00pm it will start with the status of WAITING as the query is being prepared by the source database. Then, when the database signals it's ready to start sending data, the status changes to RUNNING. Now the clock starts. As the rows trickle from the database, the row counts will increase. However, if the database is busy scanning, finding, building, etc, there will be lapses in rows coming over. This is not an indication of network issues, but you can't tell that yet.
You will need to time the effort for the database to produce the result set. Once the result set is built, it's just an effort to spool. But, some queries don't produce a complete result set and then spool, it's more of a spool as you find qualifying rows. For example, a query with GROUP BY or ORDER BY requires that the result set be built first. But, SELECT * does not.
So, if the link opens at 1:00pm it will start with the status of WAITING as the query is being prepared by the source database. Then, when the database signals it's ready to start sending data, the status changes to RUNNING. Now the clock starts. As the rows trickle from the database, the row counts will increase. However, if the database is busy scanning, finding, building, etc, there will be lapses in rows coming over. This is not an indication of network issues, but you can't tell that yet.
You will need to time the effort for the database to produce the result set. Once the result set is built, it's just an effort to spool. But, some queries don't produce a complete result set and then spool, it's more of a spool as you find qualifying rows. For example, a query with GROUP BY or ORDER BY requires that the result set be built first. But, SELECT * does not.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
To add to what the experts have said here, here is a how you could do it in datastage manageable chunks:Sainath.Srinivasan wrote:Better split the query into DataStage manageable chunks and design the job in DataStage.
Take the distinct values of Tab2. col1,col2,col3 and stage it in a hashfile HF2
Take the distinct values of Tab3.upper(trim(tab3.Last_nm)), col1,col2 and stage it in another hashfile HF3
Write this query in the source Oracle OCI :
SELECT DISTINCT COL1,COL2,COL3
FROM tab1
Then lookup on HF2 & HF3 and output to your flat file only if values exist in both hashfiles.
Alternately, you could write an outer join query to get the same results. You may need to change some Oracle session parameters (sort_area_size etc .. ask the DBAs) and may also need indexes on the columns you are using in the where clause. Since you are using a trim in one of the columns (of Tab3), the Tab3 index maynot be used by Oracle.
Bibhu