handling DELETES in DW.

Moderators: chulett, rschirm

Post Reply
Madhav_M
Participant
Posts: 43
Joined: Sat Jul 10, 2004 5:47 am

Post by Madhav_M »

Hi Satish
This defeats the purpose of having datawarehouse in place!! Ok.. if you want to keep your target intact with source.. only way is reload the target on a defined frequency!! BTW what is your load frequecy?
satyasur
Participant
Posts: 15
Joined: Fri Apr 07, 2006 12:11 am

@ madhav

Post by satyasur »

hello sir,

I have mentioned that we do not intend to actually delete the records.
We plan to mark records with a flag, say 'D' for deleted.

we have huge tables with millions of rows in them.
so we cannot afford to load them every time.
so we are looking at CDC kind of an approach.
we have a constraint of not putting the operational system( source application ) on pressure by writing triggers.
also we want the solution to be generic( in terms of database as we have different DBs (oracle,DB2,MS SQL server).
That was the reason I wanted to know if any of you have used any tool to perform such a task.


Best regards,
Satish.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You're on the right track. Once you detect that a source record has been deleted (and there's lots of ways to do that), simply update the "active" indicator in the Data Warehouse.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
satyasur
Participant
Posts: 15
Joined: Fri Apr 07, 2006 12:11 am

Post by satyasur »

hello Ray sir,

Thanks for the motivation :)

I was interested in knowing the first part : detecting the deleted rows.
I had come across CDC as a viable method.
So wanted to know more about it.
any place you can point out where I can find some related reading about its implementation?

Best regards,
Satish.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There are many different kinds of CDC (change data capture). Ideally the database itself will maintain a log of deleted records, perhaps via a trigger or through the application itself, and you can leverage that. Some CDC tools inspect the transaction logs. Comparing periodic snapshots of the tables is another technique that exists out there. A good first step for you would be to find out whether the source database can easily identify deleted records, or at least the keys for them.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
satyasur
Participant
Posts: 15
Joined: Fri Apr 07, 2006 12:11 am

Post by satyasur »

ray.wurlod wrote:There are many different kinds of CDC (change data capture). Ideally the database itself will maintain a log of deleted records, perhaps via a trigger or through the application itself, and you can l ...
well yes I have come across an article which says that one of the efficient methods to do CDC is by reading the transaction logs. any tools you know of which do it really well?

If I start writing my own code I will have to write it for all the DB's. That is why i wanted to know about the existing tools.
any inputs on IBM data mirror?

P.S. thanks for all your inputs Ray sir :)

cheers,
Satish.
Post Reply