How to handle huge historical record loads in Azure DataLake



satheesh_color
Participant
Posts: 182
Joined: Thu Jun 16, 2005 2:05 am

How to handle huge historical record loads in Azure DataLake

Post by satheesh_color »

Hi All,

We have a scenario where we need to load 500 million historical records from SQL Server to Azure Data Lake, followed by roughly 200k daily incremental records, for which we have to capture the changed records and then load them into the data lake.

The catch is that the source tables don't have timestamp columns. We are looking for your thoughts and assistance on how to design the DataStage jobs for this.


Thanks & Regards,
S.R
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

My first thought is to request they add a timestamp column to the source.

Is your problem that you are trying to identify and update changed records?

I have the same problem with another client whose feed is far larger than yours. They say they can't modify the source since it's a mainframe file, so they threw hardware at the problem and reprocess the entire feed (clear and reload) each time it comes in.
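
If a full clear-and-reload isn't feasible at your volumes, another pattern worth considering is a row-checksum (hash) comparison against the previous extract to identify inserts, updates and deletes without a timestamp column. In DataStage you would typically do this with a Checksum stage or a Transformer feeding a Change Capture stage; below is only a minimal sketch of the idea in Python, with purely illustrative key/column handling, to show the logic.

Code:

import hashlib

def row_hash(row_values):
    # Deterministic hash over all non-key columns; None becomes an empty string
    payload = "|".join("" if v is None else str(v) for v in row_values)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def detect_changes(current_rows, previous_hashes):
    # current_rows:    dict of business key -> tuple of non-key column values (today's extract)
    # previous_hashes: dict of business key -> hash stored from the last run
    inserts, updates = [], []
    current_hashes = {}
    for key, values in current_rows.items():
        h = row_hash(values)
        current_hashes[key] = h
        if key not in previous_hashes:
            inserts.append(key)
        elif previous_hashes[key] != h:
            updates.append(key)
    deletes = [k for k in previous_hashes if k not in current_rows]
    # current_hashes becomes the reference set for the next run
    return inserts, updates, deletes, current_hashes

The reference hashes have to be persisted between runs (for example in a lookup dataset or table), seeded once by the historical load; each daily feed is then hashed and compared against that set.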
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
satheesh_color
Participant
Posts: 182
Joined: Thu Jun 16, 2005 2:05 am

Post by satheesh_color »

Hi asorrell,

Thanks for your response. I am in the same boat: no change is possible and the source will not be modified.



Regards,
Satheesh.R