Refresh Environments - Issue related to hashed files

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

kinu008
Premium Member
Posts: 36
Joined: Fri Mar 16, 2007 1:35 pm

Refresh Environments - Issue related to hashed files

Post by kinu008 »

Hi,
Our current environment - DataStage 8.1 Server Edition with Fix Pack 1, AIX, PeopleSoft EPM 9.1

We recently refreshed our test environment from production, migrating all of our hashed files, ETL code and database objects from production to test. On prod, some of our hashed files are in a custom directory and some are in the account (project) directory, so we moved the hashed files into the same directory structure in test.

A few things:
1. We used the scp command to move the hashed files from the production UNIX server to the test UNIX server, overwriting all existing test hashed files with the production ones (the command is sketched below).
2. Our test and production DataStage projects reside on different UNIX servers.
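
For reference, the copy was essentially of this shape (the host name and paths here are made up for illustration):

    # run from the production server; -r is needed because each
    # hashed file is a directory, and it brings hidden files along
    scp -r /prod/hashed/MYHASH dsadm@testhost:/test/hashed/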


Issue: we have an ETL job with a source DRS stage, a transformer stage (for the lookup) and a target stage. The purpose of the hashed lookup is to check whether a row already exists in the target and to stop it if it does. In prod we have no problem, but in test the hashed lookup is not doing what it is supposed to do. The lookup is failing for multiple records and letting those rows through to the target. I randomly checked some key values in the hashed file, and the records I checked already exist in the source, so the lookup should have matched them. The same code works fine in prod; the issue only occurs in test after the migration. Can anyone think of where the issue is? Please let me know if you need more information.

Thank you very much in advance.

Kiran
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You may already know this but hashed files can be copied easily provided one of two conditions is met:

1. They are 'pathed' hashed files, meaning they were not created in an account
2. Your account-based hashed files already exist in the target location

New account-based hashed files would not be found until a VOC record had been created for them. Not the problem you are having, it would seem; I just wanted to state that for the record. Note that you need to ensure all components of the hashed file are copied over, including the hidden file that controls the type, in case that had changed or the hashed file is new. I'm assuming you are fine here, are probably using dynamic (Type 30) hashed files, and have confirmed all of the pieces are in place.
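
For example, a quick listing of one of the copied hashed files on the test server (the path here is illustrative) should show all of its pieces:

    # a dynamic hashed file is a directory holding its data file,
    # overflow file and a hidden type marker
    ls -la /test/hashed/MYHASH
    # expect to see DATA.30, OVER.30 and .Type30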

Now... your issue is what, exactly? You've confirmed the hashed file is populated correctly, and yet when the lookup is performed and should find a match (key values are identical), it behaves as if the lookup failed and allows rows to be sent to the target again?

First suggestion would be to verify that you do not have any kind of trim issue in the test data - meaning you understand that a string with "FRED " in it would not match a value of "FRED". Other than that, detail for us what kind of check / constraint you are using post-lookup - the NOTFOUND link variable?
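
For instance, if you dump the source key column to a flat file (the file name here is hypothetical), a quick check for trailing whitespace would be:

    # list any key values that end in a space
    grep -n ' $' source_keys.txt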

I'm also curious, how did you "randomly check some key values" in the hashed file?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Did you perhaps miss running the job that populates the hashed file from the target table?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kinu008
Premium Member
Posts: 36
Joined: Fri Mar 16, 2007 1:35 pm

Post by kinu008 »

Craig & Ray,

Thank you very much for the reply.

The hashed file is pre-loaded from the database table so that it can be referenced in the following job. As I mentioned earlier, the pre-loaded hashed file should stop source records that already exist in the target table. In prod we have no issues, but in test the referenced hashed file won't stop the existing records and is letting every record go to the target. Yes Craig, we are using the NOTFOUND link variable to check whether the row already exists.

I wanted to know whether copying hashed files the way we did is allowed. Based on your replies so far, it seems that's not a problem. Anyway, I am doing this exercise one more time to make sure we do it carefully. I will do some more testing on this newly initiated refresh process and will let you know if there are any issues.

One more question.

What is the best and fastest way to refresh the TEST ETL code from PROD? We have thousands of ETL jobs, and a regular export/import takes several days, so I would like to know if there is a faster way.

Thanks again.
Kiran
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Make sure to copy hashed files recursively (use the -r option with the cp command), since you must copy the DATA.30, OVER.30 and .Type30 files.
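
A minimal sketch, with made-up paths:

    # copy the whole directory so the hidden .Type30 marker
    # comes along with DATA.30 and OVER.30
    cp -r /prod/hashed/MYHASH /staging/hashed/

Since your projects live on different servers, the same reasoning applies to scp -r or to tar piped over the network.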

Probably the fastest way to refresh TEST from PROD is to discard the TEST environment and start by creating a new one. Next best is to be selective about what you refresh. The istool export/import may be marginally faster than the DataStage export/import, but there won't be a lot in it, as both have to accomplish the same work.
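
If you want to time istool against your current process, the invocation is roughly of this shape - I'm quoting the flags from memory, so confirm them with istool export -help on your release; the host, credentials and project names are placeholders:

    istool export -domain dshost:9080 -username dsadm -password secret \
        -archive prod_jobs.isx -datastage 'dshost/PRODPROJECT/*/*.*'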
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kinu008
Premium Member
Posts: 36
Joined: Fri Mar 16, 2007 1:35 pm

Post by kinu008 »

Hi all,

I found something interesting.

Earlier I indicated that the migrated code wouldn't work in test even though the same code works fine in prod. We migrated the job designs multiple times and compiled the code in test, and it still wouldn't work.

So this time I imported JUST the executable from PROD to TEST, and it worked perfectly. The job did exactly what it was supposed to do.

Could it be that the compiler in test is corrupted or set up differently? Any thoughts?

Thank you,
Kiran
kinu008
Premium Member
Posts: 36
Joined: Fri Mar 16, 2007 1:35 pm

Post by kinu008 »

Any thoughts on this?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

This would not be a compiler issue - unless your two environments aren't using the same version of DataStage. Are you certain that you have the same release and patch level on the machines?
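
One quick way to compare, assuming the default Information Server 8.x install path on both machines, is to look at Version.xml, which records the installed components and patches:

    # path assumes a default 8.x install location
    grep -iE 'fixpack|version' /opt/IBM/InformationServer/Version.xml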
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What is the actual error message from the compiler?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kinu008
Premium Member
Posts: 36
Joined: Fri Mar 16, 2007 1:35 pm

Post by kinu008 »

ArndW,
Both test and prod are DataStage 8.1, and both are at Fix Pack 1.

Ray,
There are no compiler errors. It's just that the exact same job design compiled on two different servers works differently. If I copy the executable from PROD to TEST, the ETL works fine; the executable generated on TEST doesn't work correctly even though the job design was copied from PROD.

Thank you
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Check that everything about the hashed file stages is the same in both environments, particularly the "lock for update" and cache settings.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.