Duplicated entries although using RemoveDuplicates stage

ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Could you add a peek of all rows between remove duplicates and lookup and manually check row 2257 +- 1?
manuel.gomez
Premium Member
Posts: 291
Joined: Wed Sep 26, 2007 11:23 am
Location: Madrid, Spain

Post by manuel.gomez »

ArndW wrote:Could you add a peek of all rows between remove duplicates and lookup and manually check row 2257 +- 1?
I know that the Peek stage takes its input from another stage (in this case, the Remove Duplicates stage), but can it also output the same data to the Lookup?
How should I check those records manually? In the Peek? How?

Thanks a lot
manuel.gomez
Premium Member
Posts: 291
Joined: Wed Sep 26, 2007 11:23 am
Location: Madrid, Spain

Post by manuel.gomez »

OK, I did it myself. I put a Peek stage between the Remove Duplicates and Lookup stages, but it did not help much.

I configured the peek like this:
All records = False
Number of records = 3
Period = 2256
It gave me this, but I don't know whether I am getting row 2257; it does not seem to be correct:
Peek_50,0: REF_OFERTA:10 CIF:A08829699 COD_GESTOR:mruizji COD_EST_OF:AC IN_ANEXOS:NO EST_PRO:NO
Peek_50,0: REF_OFERTA:12880 CIF:Q2800540C COD_GESTOR:acalvo1 COD_EST_OF:AC IN_ANEXOS:NO EST_PRO:NO
Peek_50,0: REF_OFERTA:15076 CIF:B38480752 COD_GESTOR:rperezd6 COD_EST_OF:AC IN_ANEXOS:NO EST_PRO:NO
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

manuel.gomez wrote:
I know that the Peek stage takes its input from another stage (in this case, the Remove Duplicates stage), but can it also output the same data to the Lookup?
How should I check those records manually? In the Peek? How?
Use a Copy stage between Remove Duplicates and Lookup, and from the Copy stage send a second output to a Peek (to analyze it in the log), a sequential file, or a dataset, wherever you want to analyze the data.

Try this:

Split the job into two. In job 1, read from ODBC and, after the transformation, write the data to a dataset.

In job 2, perform the lookup.

Run that and reply with the result.

Regards,
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

One more thing: check the keys defined in Remove Duplicates and in Lookup. I hope both are the same.
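
To show why the keys matter, here is a minimal Python sketch (plain Python, not DataStage code). The rows are made up and the column names are borrowed from the peek output only for readability; the helper functions are hypothetical. It shows that removing duplicates on one key combination can still leave several rows per value of a narrower lookup key.

```python
# Made-up rows for illustration only.
rows = [
    {"REF_OFERTA": 10, "CIF": "A1"},
    {"REF_OFERTA": 10, "CIF": "A1"},   # exact duplicate on both columns
    {"REF_OFERTA": 11, "CIF": "A1"},   # same CIF, different REF_OFERTA
]

def remove_duplicates(rows, keys):
    """Keep the first row seen for each distinct combination of key values."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[k] for k in keys)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

def count_duplicate_keys(rows, keys):
    """Count rows whose key combination is shared with an earlier row."""
    return len(rows) - len({tuple(r[k] for k in keys) for r in rows})

# Deduplicate on two columns, then look up on only one of them.
deduped = remove_duplicates(rows, ["REF_OFERTA", "CIF"])
print(len(deduped))                            # 2
print(count_duplicate_keys(deduped, ["CIF"]))  # 1
```

If the lookup key is CIF alone, the reference data still contains a duplicate even though the dedup step did its job on the keys it was given.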

And don't use Period:
it gives you one record after every N records, where N is the value defined as the period.
Use Skip instead.
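
A rough Python model of the difference, as a sketch only. It assumes Period prints every N-th record starting with the first one and Skip drops the first N records, both counted within a single partition; the `peek` function and its parameter names are hypothetical and simply mirror the stage properties.

```python
from itertools import islice

def peek(records, number=10, period=1, skip=0):
    """Simplified single-partition model of the Peek options (an assumption,
    not the real implementation): drop `skip` records, then take every
    `period`-th record, `number` records in total."""
    return list(islice(islice(records, skip, None, period), number))

row_numbers = range(1, 10001)   # 1-based row numbers within one partition

# Period = 2256, Number of records = 3: widely spaced rows.
print(peek(row_numbers, number=3, period=2256))   # [1, 2257, 4513]

# Skip = 2255, Number of records = 3: the rows around 2257.
print(peek(row_numbers, number=3, skip=2255))     # [2256, 2257, 2258]
```

Note that if the job runs on more than one node, row numbers counted per partition will not match the overall row number reported in the warning.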

Get record number 2257, add its key values to a constraint in the Transformer above, and check how many records you get.

Can you tell us which properties you have explicitly defined in Remove Duplicates (not the default properties)?
Priyadarshi Kunal

manuel.gomez
Premium Member
Posts: 291
Joined: Wed Sep 26, 2007 11:23 am
Location: Madrid, Spain

Post by manuel.gomez »

priyadarshikunal wrote:1 more thing check the keys defined in remove duplicate and lookup i hope both are same
Oops... it seems you found the problem.

Anyway, this fixes the warning in the Lookup stage, but I still can't get the desired results.

This has just become an SQL query issue. Maybe you can help me, because I must be missing something really stupid.

How is it possible that this query:
SELECT G.REF_OFERTA, R.COD_GR_EMP
FROM
    REL_GR_EMP_EMP R, GRUPOS_EMPRESARIALES GR,
    COFER_LISTA_CIFS C, COFER_DATOS_GRAL G
WHERE
    R.COD_GR_EMP = GR.COD_GR_EMP AND
    R.CIF = C.CIF AND
    C.REF_OFERTA = G.REF_OFERTA AND
    G.COD_EST_OF = 'AC'

returns 9851 rows,

and this one only 9785:
SELECT A.COD_GR_EMP
FROM
    (SELECT
         REL_GR_EMP_EMP.COD_GR_EMP,
         REL_GR_EMP_EMP.CIF
     FROM
         dbo.GRUPOS_EMPRESARIALES AS GRUPOS_EMPRESARIALES
         INNER JOIN
         dbo.REL_GR_EMP_EMP AS REL_GR_EMP_EMP
         ON GRUPOS_EMPRESARIALES.COD_GR_EMP = REL_GR_EMP_EMP.COD_GR_EMP
     GROUP BY
         REL_GR_EMP_EMP.COD_GR_EMP,
         REL_GR_EMP_EMP.CIF) A,
    (SELECT COFER_LISTA_CIFS.CIF
     FROM
         dbo.COFER_DATOS_GRAL AS COFER_DATOS_GRAL
         INNER JOIN
         dbo.COFER_LISTA_CIFS AS COFER_LISTA_CIFS
         ON COFER_DATOS_GRAL.REF_OFERTA = COFER_LISTA_CIFS.REF_OFERTA
     WHERE COFER_DATOS_GRAL.COD_EST_OF = 'AC'
     GROUP BY COFER_LISTA_CIFS.CIF) B
WHERE A.CIF = B.CIF

To me, they do the same thing (but obviously they don't, as I don't get the same results).

Thanks for your help!!!!
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

As we don't know the keys and the relationships between the tables used in the query, I am unable to answer.

However, the two queries are different: grouping on a few columns turns the result into a set of unique records. Get help from someone in your team; others can often spot one's mistakes easily. :wink:
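
As a small illustration of that difference, here is a Python sketch using an in-memory SQLite database. The two tables are hypothetical, simplified stand-ins for the real ones, and the rows are made up, so this only shows the mechanism and says nothing about the actual data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical stand-ins for REL_GR_EMP_EMP and COFER_LISTA_CIFS.
    CREATE TABLE rel_group_cif (cod_gr_emp TEXT, cif TEXT);
    CREATE TABLE offer_cif (ref_oferta INTEGER, cif TEXT);
    INSERT INTO rel_group_cif VALUES ('G1', 'A1'), ('G2', 'B1');
    -- CIF 'A1' appears in two offers.
    INSERT INTO offer_cif VALUES (10, 'A1'), (11, 'A1'), (12, 'B1');
""")

# Plain join: one row per matching (group, offer) pair.
flat = con.execute("""
    SELECT c.ref_oferta, r.cod_gr_emp
    FROM rel_group_cif r, offer_cif c
    WHERE r.cif = c.cif
""").fetchall()

# Grouped subqueries: each CIF is collapsed to one row before the join.
grouped = con.execute("""
    SELECT a.cod_gr_emp
    FROM (SELECT cod_gr_emp, cif FROM rel_group_cif GROUP BY cod_gr_emp, cif) a,
         (SELECT cif FROM offer_cif GROUP BY cif) b
    WHERE a.cif = b.cif
""").fetchall()

print(len(flat), len(grouped))   # 3 2
```

Any CIF that appears in more than one offer adds extra rows to the plain join but only one row to the grouped version, so the counts need not match.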

Also, don't post two different questions in one thread.
Priyadarshi Kunal

ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

manuel.gomez - just an additional tidbit of information regarding the Peek stage. You can put it inline in a data stream, but you need to make sure it uses ALL data rows, and just drag the input columns to the output.
manuel.gomez
Premium Member
Posts: 291
Joined: Wed Sep 26, 2007 11:23 am
Location: Madrid, Spain

Post by manuel.gomez »

Thanks ArndW

Can anybody help me with the queries? I know this is not a DataStage question, but I am so frustrated about this. To me, they are SO the same!!!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Relating to "duplicates coming out of Remove Duplicates" stage, no-one seems to have picked up on the possibility that the data are not partitioned on the keys used to identify duplicates. This would cause the symptom described.
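
A minimal Python sketch of that symptom, under the assumption that each partition removes duplicates independently of the others. The partitioning helpers and the key values are hypothetical and only model the idea; they are not DataStage APIs.

```python
def dedup_per_partition(partitions):
    """Remove duplicates inside each partition independently."""
    out = []
    for part in partitions:
        seen = set()
        for key in part:
            if key not in seen:
                seen.add(key)
                out.append(key)
    return out

def round_robin(keys, n):
    """Deal records to partitions in turn, ignoring the key value."""
    parts = [[] for _ in range(n)]
    for i, k in enumerate(keys):
        parts[i % n].append(k)
    return parts

def hash_partition(keys, n):
    """Send all records with the same key to the same partition."""
    parts = [[] for _ in range(n)]
    for k in keys:
        parts[hash(k) % n].append(k)
    return parts

keys = ["A", "B", "A", "C", "B", "A"]

# Not partitioned on the key: equal keys land in different partitions.
print(sorted(dedup_per_partition(round_robin(keys, 2))))     # ['A', 'A', 'B', 'B', 'C']

# Partitioned on the key: every duplicate meets its twin.
print(sorted(dedup_per_partition(hash_partition(keys, 2))))  # ['A', 'B', 'C']
```

With key-based partitioning, records that share a key always end up together, so a per-partition dedup is enough.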
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dinakaran_s
Participant
Posts: 22
Joined: Wed Jul 02, 2008 7:01 am
Location: London

Sort the rows before removing duplicates

Post by dinakaran_s »

Hi,

Whenever you use the "Remove Duplicates" stage, the incoming data should be sorted. Looking at your design, I don't think your incoming data are properly sorted. Please try sorting the rows before the "Remove Duplicates" stage.
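
A small Python sketch of why the sort matters, assuming the dedup step only compares each record with its neighbours. The function name is hypothetical and the keys are made up.

```python
from itertools import groupby

def remove_adjacent_duplicates(keys):
    """Keep one record per run of equal adjacent keys."""
    return [k for k, _ in groupby(keys)]

unsorted_keys = ["A", "B", "A", "C", "B"]

# Unsorted input: equal keys are never adjacent, so nothing is removed.
print(remove_adjacent_duplicates(unsorted_keys))          # ['A', 'B', 'A', 'C', 'B']

# Sorted input: equal keys sit next to each other and collapse.
print(remove_adjacent_duplicates(sorted(unsorted_keys)))  # ['A', 'B', 'C']
```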