How to handle huge historical record loads in Azure DataLake



satheesh_color
Participant
Posts: 182
Joined: Thu Jun 16, 2005 2:05 am

How to handle huge historical record loads in Azure DataLake

Post by satheesh_color »

Hi All,

We have a scenario where we need to load 500 million historical records from SQL Server to Azure Data Lake, followed by roughly 200k daily incremental records, for which we have to capture the changed records and then load them into the data lake.

The catch is that the source tables don't have timestamp columns. We are looking for your thoughts and assistance on how to design the DataStage jobs for this.


Thanks & Regards,
S.R
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

My first thought is to request they add a timestamp column to the source.

Is your problem that you are trying to identify and update changed records?

I have the same problem with another client whose feed is far larger than yours. They say they can't modify the source since it's a mainframe file, so they threw hardware at the problem and reprocess the entire feed (clear and reload) each time it comes in.
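
If a full clear-and-reload isn't feasible at your volumes, another pattern worth considering is a row-checksum (hash) comparison against the previous extract to identify inserts, updates and deletes without a timestamp column. In DataStage you would typically do this with a Checksum stage or a Transformer feeding a Change Capture stage; below is only a minimal sketch of the idea in Python, with purely illustrative key/column handling, to show the logic.

Code:

import hashlib

def row_hash(row_values):
    # Deterministic hash over all non-key columns; None becomes an empty string
    payload = "|".join("" if v is None else str(v) for v in row_values)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def detect_changes(current_rows, previous_hashes):
    # current_rows:    dict of business key -> tuple of non-key column values (today's extract)
    # previous_hashes: dict of business key -> hash stored from the last run
    inserts, updates = [], []
    current_hashes = {}
    for key, values in current_rows.items():
        h = row_hash(values)
        current_hashes[key] = h
        if key not in previous_hashes:
            inserts.append(key)
        elif previous_hashes[key] != h:
            updates.append(key)
    deletes = [k for k in previous_hashes if k not in current_rows]
    # current_hashes becomes the reference set for the next run
    return inserts, updates, deletes, current_hashes

The reference hashes have to be persisted between runs (for example in a lookup dataset or table), seeded once by the historical load; each daily feed is then hashed and compared against that set.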
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
satheesh_color
Participant
Posts: 182
Joined: Thu Jun 16, 2005 2:05 am

Post by satheesh_color »

Hi asorrell,

Thanks for your response. I am in the same boat: no change is possible and the source will not be modified.



Regards,
Satheesh.R