DSXchange: DataStage and IBM Websphere Data Integration Forum
View next topic
View previous topic
Add To Favorites
Author Message
satyasur
Participant



Joined: 07 Apr 2006
Posts: 15

Points: 135

Post Posted: Tue Mar 25, 2008 3:15 am Reply with quote    Back to top    

DataStage® Release: 7x
Job Type: Server
OS: Windows
Hello all,

I am trying to figure out a strategy with which we can handle DELETES in our data warehouse. The situation is a follows:

In our data warehouse we load the data incrementally in the staging. This means we have rows updated or inserted depending on whether they are modified or new. The source system application allows users to delete rows. Take for example: If a product is retired the customers can delete it from the source system’s master table. So if a report is generated on the application system the n that product does not show up. Now they want the data to be consistent in the DW too.
If a record is deleted from the source system we do not have a method as of now to determine it. So we miss those rows and they are still persistent in our DW.
We are not actually thinking of deleting those master rows or transaction rows from our DW.

Any inputs on how to handle this situation are most welcome.
Are there any generic tools (database independent) which can do this (CDC)? Please suggest.

Thanking you in advance.
Satish.
Madhav_M
Participant



Joined: 10 Jul 2004
Posts: 43

Points: 355

Post Posted: Tue Mar 25, 2008 3:52 am Reply with quote    Back to top    

Hi Satish
This defeats the purpose of having datawarehouse in place!! Ok.. if you want to keep your target intact with source.. only way is reload the target on a defined frequency!! BTW what is your load frequecy?
Rate this response:  
Not yet rated
satyasur
Participant



Joined: 07 Apr 2006
Posts: 15

Points: 135

Post Posted: Tue Mar 25, 2008 4:14 am Reply with quote    Back to top    

hello sir,

I have mentioned that we do not intend to actually delete the records.
We plan to mark records with a flag, say 'D' for deleted.

we have huge tables with millions of rows in them.
so we cannot afford to load them every time.
so we are looking at CDC kind of an approach.
we have a constraint of not putting the operational system( source application ) on pressure by writing triggers.
also we want the solution to be generic( in terms of database as we have different DBs (oracle,DB2,MS SQL server).
That was the reason I wanted to know if any of you have used any tool to perform such a task.


Best regards,
Satish.
Rate this response:  
Not yet rated
ray.wurlod

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 23 Oct 2002
Posts: 54431
Location: Sydney, Australia
Points: 295208

Post Posted: Tue Mar 25, 2008 5:36 am Reply with quote    Back to top    

You're on the right track. Once you detect that a source record has been deleted (and there's lots of ways to do that), simply update the "active" indicator in the Data Warehouse.

_________________
RXP Services Ltd
Melbourne | Canberra | Sydney | Hong Kong | Hobart | Brisbane
currently hiring: Canberra, Sydney and Melbourne (especially seeking good business analysts)
Rate this response:  
Not yet rated
satyasur
Participant



Joined: 07 Apr 2006
Posts: 15

Points: 135

Post Posted: Tue Mar 25, 2008 5:40 am Reply with quote    Back to top    

hello Ray sir,

Thanks for the motivation :)

I was interested in knowing the first part : detecting the deleted rows.
I had come across CDC as a viable method.
So wanted to know more about it.
any place you can point out where I can find some related reading about its implementation?

Best regards,
Satish.
Rate this response:  
Not yet rated
ray.wurlod

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 23 Oct 2002
Posts: 54431
Location: Sydney, Australia
Points: 295208

Post Posted: Tue Mar 25, 2008 5:49 am Reply with quote    Back to top    

There are many different kinds of CDC (change data capture). Ideally the database itself will maintain a log of deleted records, perhaps via a trigger or through the application itself, and you can l ...

_________________
RXP Services Ltd
Melbourne | Canberra | Sydney | Hong Kong | Hobart | Brisbane
currently hiring: Canberra, Sydney and Melbourne (especially seeking good business analysts)
Rate this response:  
Not yet rated
satyasur
Participant



Joined: 07 Apr 2006
Posts: 15

Points: 135

Post Posted: Tue Mar 25, 2008 6:29 am Reply with quote    Back to top    

ray.wurlod wrote:
There are many different kinds of CDC (change data capture). Ideally the database itself will maintain a log of deleted records, perhaps via a trigger or through the application itself, and you can l ...


well yes I have come across an article which says that one of the efficient methods to do CDC is by reading the transaction logs. any tools you know of which do it really well?

If I start writing my own code I will have to write it for all the DB's. That is why i wanted to know about the existing tools.
any inputs on IBM data mirror?

P.S. thanks for all your inputs Ray sir :)

cheers,
Satish.
Rate this response:  
Not yet rated
Display posts from previous:       

Add To Favorites
View next topic
View previous topic
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum



Powered by phpBB © 2001, 2002 phpBB Group
Theme & Graphics by Daz :: Portal by Smartor
All times are GMT - 6 Hours