Perfromance Issues

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
Aravind
Participant
Posts: 16
Joined: Mon Dec 27, 2004 4:17 pm

Perfromance Issues

Post by Aravind »

Job extracts data from Orcale tables compares the data and loads it to a flat file. One of the queries that is executed against the database is going to the hanging state.
The query fired against the database is like

SELECT DISTINCT COL1,COL2,COL3
FROM tab1
WHERE EXISTS
(
SELECT 'X' FROM TAB2
WHERE TAB2.COL1 = tab1.COL1
AND TAB2.COL2 = tab1.COL2
AND TAB2.COL3 = tab1.COL3
)

Or

Exists (
select 'X' from tab3
where upper(trim(tab3.Last_nm)) = upper(trim(tab1.LAST_NM))
and tab3.COL1 = tab1.COL1
and tab3.COL2 = tab1.COL2

Tab1 - rowcount around 10000
tab2 - rowcount around 830000
tab3 - rowcount around 400000
Query output count - around 3000


Certain days job runs for around 1 hr and finshes succesfully. Eventhough the job takes 1 hr ,performance statistcs shows 518 rows processed per second for this link . . 1 hr for processing 9000 rows , i guess is very poor performance.
job design is like

Oracle stage ---> Transformer ---> FlatFile
(Mutiple queries executed)
Is it something which i need to followup with the DBA???
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Your issue is database related 99%. Your query is ugly, and not a DS issue. The only point DS comes into play is spooling and writing the data to the output file.

I would look to load on the server, degree of parallelism requested, minimum, and actual, and other factors that will vary the performance of the query. If you are competing with other queries, you may be getting a different number of parallel slaves assigned one day to the next. Other options are that your data set varies from day to day, and therefore your runtime fluctuates. Another issue could be invalid table statistics playing with the cost based optimizer.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Better split the query into DataStage manageable chunks and design the job in DataStage.
Aravind
Participant
Posts: 16
Joined: Mon Dec 27, 2004 4:17 pm

Post by Aravind »

Thanks for the inputs.

How can we explain the link refresh rate of 500 when this link causes the job to run for more than 1 hr.
DataStage Server and Oracle Server resides in different m/c . Is the communication drop between the m/c has to do something with the job. If the communication drop happens whether we get a warning/error msg in datastage logs.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Mention to your DBA that you're wanting to execute a query that includes a correlated subquery to a 830000 row table and another correlated subquery to a 400000 row table, then run for your life! :lol:

Seriously, this is a tough query for a database server. It could probably be made much more efficient if you used two reference lookups from DataStage instead (add an additional constraint "only return one row", for example ROWNUM = 1 in Oracle).

Index the constrained columns.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

You can discuss with your DBA to analyse the query, tune it with appropriate hints and indexes.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

The link monitor shows rows/second based on rows traveled on that link divided by the time that the link connected to that transformer has been open.

So, if the link opens at 1:00pm it will start with the status of WAITING as the query is being prepared by the source database. Then, when the database signals it's ready to start sending data, the status changes to RUNNING. Now the clock starts. As the rows trickle from the database, the row counts will increase. However, if the database is busy scanning, finding, building, etc, there will be lapses in rows coming over. This is not an indication of network issues, but you can't tell that yet.

You will need to time the effort for the database to produce the result set. Once the result set is built, it's just an effort to spool. But, some queries don't produce a complete result set and then spool, it's more of a spool as you find qualifying rows. For example, a query with GROUP BY or ORDER BY requires that the result set be built first. But, SELECT * does not.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
bibhudc
Charter Member
Charter Member
Posts: 20
Joined: Thu Jun 19, 2003 12:26 pm

Post by bibhudc »

Sainath.Srinivasan wrote:Better split the query into DataStage manageable chunks and design the job in DataStage.
To add to what the experts have said here, here is a how you could do it in datastage manageable chunks:

Take the distinct values of Tab2. col1,col2,col3 and stage it in a hashfile HF2

Take the distinct values of Tab3.upper(trim(tab3.Last_nm)), col1,col2 and stage it in another hashfile HF3

Write this query in the source Oracle OCI :
SELECT DISTINCT COL1,COL2,COL3
FROM tab1

Then lookup on HF2 & HF3 and output to your flat file only if values exist in both hashfiles.

Alternately, you could write an outer join query to get the same results. You may need to change some Oracle session parameters (sort_area_size etc .. ask the DBAs) and may also need indexes on the columns you are using in the where clause. Since you are using a trim in one of the columns (of Tab3), the Tab3 index maynot be used by Oracle.
Bibhu
Post Reply