Incremental load with no key field at source

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Post Reply
rafik2k
Participant
Posts: 182
Joined: Wed Nov 23, 2005 1:36 am
Location: Sydney

Incremental load with no key field at source

Post by rafik2k »

This is one of the interveiw question.

Source tables/files does not contain any key field.

How to implement incremental load in the target tables?

I was guesing using unix shell script commands like compare and other commmands, it be can be possible.

Any other tips/solution would be a great help.

Thanks in advance

Regards,
Rafiq
sud
Premium Member
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Re: Incremental load with no key field at source

Post by sud »

rafik2k wrote:This is one of the interveiw question.

Source tables/files does not contain any key field.

How to implement incremental load in the target tables?

I was guesing using unix shell script commands like compare and other commmands, it be can be possible.

Any other tips/solution would be a great help.

Thanks in advance

Regards,
Rafiq
The "key" to understanding whether a incoming record is already present or not are the "key columns". When you don't have any, what it necessarily means is that 1>no primary keys are specified in the table or 2>there are actually no keys. Option 1 gives you the alternate or natural keys based on which you can do a change capture in datastage. Option 2 tells you that all the columns are a key, so you do a change capture on all columns as keys.

In a practical situation, I cannot think of a Logical Datamodel where you have a table which does not have any keys and for which you want to write incremental load ETL. Probably when you find one, you should bust your data modeller.
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
nick.bond
Charter Member
Charter Member
Posts: 230
Joined: Thu Jan 15, 2004 12:00 pm
Location: London

Post by nick.bond »

Without a key what identifies the records to compare from one run to the next?

If there is no natural key in the source field then all you can test for is whether their is already a record in the target that is exactly the same, in which case all fields are key fields???

Or is this a trick question? was there a modified time field in the source?
Regards,

Nick.
chucksmith
Premium Member
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA
Contact:

Post by chucksmith »

If you trust your load portion of ETL, you can compare yesterday's full load file with today's full load file to identify inserts, updates, and deletes.

Look at the unix "comm" command to compare your load files.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

It is a trick question. An incremental load is based on only selecting records in the source which have changed. You can do that without keys. All you need is last updated field in the source.

What to update in the target may require keys.
Mamu Kim
katz
Charter Member
Charter Member
Posts: 52
Joined: Thu Jan 20, 2005 8:13 am

Post by katz »

There might be a source system log/journal file from which you can extract the incremental set of source data. Updating the target without key definitions is a bit harder to do.
rafik2k
Participant
Posts: 182
Joined: Wed Nov 23, 2005 1:36 am
Location: Sydney

Post by rafik2k »

chucksmith wrote:If you trust your load portion of ETL, you can compare yesterday's full load file with today's full load file to identify inserts, updates, and deletes.

Look at the unix "comm" command to compare your load files.
Thanks for your tips
Though I can indentify changed records using unix utilities, but how can I update or deletes these recorods at target database without any key :?:
rafik2k
Participant
Posts: 182
Joined: Wed Nov 23, 2005 1:36 am
Location: Sydney

Post by rafik2k »

kduke wrote:It is a trick question. An incremental load is based on only selecting records in the source which have changed. You can do that without keys. All you need is last updated field in the source.

What to update in the target may require keys.
I also believe that it is trick question.
That means if we have last updated field in the source then we can insert fresh records only.

We can't assume all fields are key fields as some field even may contain null values.

So updating existing records in target not possible without key.
Post Reply