Oracle GroupBy wrong output

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
ponzio
Participant
Posts: 165
Joined: Mon Dec 05, 2005 9:13 am
Location: Italy

Oracle GroupBy wrong output

Post by ponzio »

Hi.
I'm working with Datastage for years and I've never faced a problem like that.

I'm loading a dwh aggregatate table. The job is very simple

OCI (User Defied Sql) ---> seq file --->OCI (Truncate / Insert)


select field1, field2, field3, sum(field4), sum(field5)
from table
group by field1, field2, field3


The SQL launched by Datastage produces a wrong output for some group, extracting couple record for the same group by fields:

field1 field2 field3
a b c 10000 15000
a b c 20000 40000

The same query launched with SQL Developer has the right output:
field1 field2 field3
a b c 30000, 45000

Oracle db version is 10g (10.2.0.5.0).
I've read about problem related to Oracle Algorithm HASH GROUP BY.

Anyway the question is why SQL Developer gets the right result and OCI not ?

Many thanks for your help.
Andrea
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What Oracle client version are you using? While your target DB might be 10gR2 there can be two other clients involved - the one on your ETL server (assuming the target instance is located elsewhere) and the one on your PC that you used with SQL Developer. What versions are they? They could all be different.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ponzio
Participant
Posts: 165
Joined: Mon Dec 05, 2005 9:13 am
Location: Italy

Post by ponzio »

Hi Craig

I've mentioned SQL Developer but it's not pertinent because according to more recent tests the problem seems related to the output dimension.
Moreover the source and target db are the same and the Oracle Client on Datastage Server is the same for years...so I don't think it's a client version problem....

In SQL Developer I've always launched filtered query reducing the output dimension in a way.
At the same way when I launch the query with Datastage OCI using filters on the values causing the problem, this doesn't occur.
But as soon as I launch the OCI query on the full output I get the error.

Maybe the problem could be related to Oracle temporary areas ...
I've recently added a second file to TEMP tablespace and maybe not in the rigth way.
This is the script:
-- 21/7/2014
alter TABLESPACE "TEMP"
add tempfile 'D:\ORACLE\BISE1\ORADATA\BISE1DB\TEMP02.DBF'
SIZE 4096M REUSE AUTOEXTEND
ON NEXT 1024M MAXSIZE 32767M
;
When the OCI query is running Oracle fills first the second file TEMP02.DBF (5 GB added few days ago) and then the the orginal TEMP file TEM01.DBF (32 GB).
I've also read a problem with HASH GROUP BY Oracle algorithm ...


Thanks,
Andrea
ponzio
Participant
Posts: 165
Joined: Mon Dec 05, 2005 9:13 am
Location: Italy

Post by ponzio »

I've removed TEMP02.DBF ... same problem

A
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I mentioned the client issue because I have seen it cause problems so I thought it was worth checking. You've said they haven't changed for years but that doesn't really answer the question as to the version number of each, up to you if you want to double-check on that. And can't really help with the DBA questions on the temp space but don't really think that would be an issue either.
ponzio wrote:according to more recent tests the problem seems related to the output dimension.
Not really sure what this means. However, seems like you'll need to involve support of some kind for this. Probably DataStage at first and then perhaps Oracle as well seeing as how DS is just a client and really just uses OCI and SQLNet like any other tool... other than the binding of the output columns into the data flow, of course.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ponzio
Participant
Posts: 165
Joined: Mon Dec 05, 2005 9:13 am
Location: Italy

Post by ponzio »

chulett wrote:However, seems like you'll need to involve support of some kind for this. Probably DataStage at first and then perhaps Oracle as well seeing as how DS is just a client and really just uses OCI and SQLNet like any other tool... other than the binding of the output columns into the data flow, of course.
You're right. I need to contact support.
But first of all I'll try to launch the full query with Sql Plus and with another etl tool to confirm it's not a client problem.


Thanks.
Andrea
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Just a note to say that I've seen "client issues" with Oracle where everything works great for years and then one specific thing doesn't work properly - something similar to what you might be seeing. Oddly enough, there can be bugs in Oracle releases :wink: and you may find that there are known issues with the exact version you are on and may need to update to a higher maintenance release.

For whatever that is worth.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Try forcing the Oracle SELECT stage to execute in sequential mode.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ponzio
Participant
Posts: 165
Joined: Mon Dec 05, 2005 9:13 am
Location: Italy

Post by ponzio »

Hi Ray
ray.wurlod wrote:Try forcing the Oracle SELECT stage to execute in sequential mode. ...
sorry for the question but how can do it ?

Anyway, the problem is not in Datastage because it occurs also in SQL Developer , SQL*Plus and using other ETL tool.

Oracle gives a bigger numer of record than the correct one, in fact Aggregator Stage (after OCI or Connector) can reduce the number of rows from 4022708 to 3966088 rows (aggregating on the same query group by columns).

This could be a work around but it's not acseptable in my opinion.
I've requested Oracle support.

Many thanks
Andrea
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Mind the forum and job type, Ray. :wink:

Andrea, he's thinking you're running a PX job.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ponzio
Participant
Posts: 165
Joined: Mon Dec 05, 2005 9:13 am
Location: Italy

Post by ponzio »

Ok I was supposed too ;-)
I let you know what Oracle says, maybe it could be useful for someone else.

Thanks,
Andrea
Post Reply