Update field when there is a key match, else Insert

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.




Post by elinzeyjr »

I have a requirement to read a mainframe flat file and, using two fields from each source record, check whether a record with that key exists on the target Greenplum table where the column END_EFF_DT = '9999-12-31'. If a match exists, I need to update that target record, setting END_EFF_DT to the value of INMOL-END-EFF-DT from the source file, and then insert a new record into the target containing all fields from the source, with END_EFF_DT = '9999-12-31' on the new record. If there is no match, I only need to insert the new record from the source, again with END_EFF_DT = '9999-12-31'.

Example of Update statement:

UPDATE TARGET_TABLE
SET END_EFF_DT = :INMOL-END-EFF-DT (SOURCE FIELD)
WHERE MOL_PRFX_POL_NO = :INMOL-MOL-PRFX-POL-NO (SOURCE FIELD)
AND MOL_POL_NO = :INMOL-MOL-POL-NO (SOURCE FIELD)
AND END_EFF_DT = '9999-12-31'
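
For completeness, the insert of the new current record would be something along these lines. The column list is abbreviated here; the real statement carries every field from the source:

INSERT INTO TARGET_TABLE (MOL_PRFX_POL_NO, MOL_POL_NO, END_EFF_DT)
VALUES (:INMOL-MOL-PRFX-POL-NO, :INMOL-MOL-POL-NO, '9999-12-31')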

To do this, I am taking the following approach:

1) Created a job to read the source file and load it into a staging table that is truncated prior to each load.
2) Once loaded, an after SQL script joins the staging table to the target and replicates the update statement listed above (a rough version of that join is sketched after this list).
3) Once the job from step 2 completes, a second job reads the staging table and inserts each record into the target table.
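
For reference, the after SQL in step 2 is roughly the statement below. The staging table name STG_INMOL is just a placeholder, and I'm assuming the staging columns use underscores in place of the COBOL hyphens:

UPDATE TARGET_TABLE T
SET END_EFF_DT = S.INMOL_END_EFF_DT
FROM STG_INMOL S
WHERE T.MOL_PRFX_POL_NO = S.INMOL_MOL_PRFX_POL_NO
AND T.MOL_POL_NO = S.INMOL_MOL_POL_NO
AND T.END_EFF_DT = '9999-12-31'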

*****The target table is in a PostgreSQL database (Greenplum); we are using the native Greenplum connector and utilizing its after SQL option.

*****The source file will have only approx. 100 records on a heavy day, while the target table has approx. 2 million records.

My concern with this approach is that the number of records updated by the after SQL script will not be captured; only the inserts will be.

Any suggestions on an alternate/better approach?

I'm fairly new at DataStage and would appreciate any and all feedback.

Post by chulett »

Sounds a bit like you are reinventing the wheel. :wink:

What you've got is a Type 2 "Slowly Changing Dimension" and there's an actual stage dedicated to them. Regardless of whether you use it, there's no need for two jobs or anything "after SQL". If you decide not to use the stage, or have a tool like we do that doesn't have anything like it, you can go all Old School on it.

High level: you bring in your source data and stage it if desired, but that's technically not required. The job that targets the SCD table does a lookup, source to target, to determine whether you have a new row or a changed row. Optionally, you can check that an existing row really is a change by comparing what's coming in to what's already in the target and discarding the record if you've already seen the change. We have some processes that do that, while others "know" that something must have changed for the record to reach us, so we skip the checking part and process it as a change.

Lookup failure? Insert it into the target as a new record. Lookup success? Use the looked-up key to end-date the existing record and, again, insert the record from the source as a new record.
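
Roughly speaking, the two target links in that Old School job end up driving SQL along these lines: the update only fires when the lookup finds a current row, the insert fires for every source row, the ? markers stand in for the columns coming down the link, and the column list is trimmed to the keys for the sake of the example:

UPDATE TARGET_TABLE
SET END_EFF_DT = ?
WHERE MOL_PRFX_POL_NO = ?
AND MOL_POL_NO = ?
AND END_EFF_DT = '9999-12-31'

INSERT INTO TARGET_TABLE (MOL_PRFX_POL_NO, MOL_POL_NO, END_EFF_DT)
VALUES (?, ?, '9999-12-31')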

Hope all that helps.

ps. IMHO it is well worth your time to learn to use the SCD stage mentioned above; it can be a bit daunting at first but is good to have in yer Bag o' Tricks. 8)
-craig

"You can never have too many knives" -- Logan Nine Fingers