IIS on Hadoop Recommendations

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

betterthanever
Participant
Posts: 152
Joined: Tue Jan 13, 2009 8:59 am

IIS on Hadoop Recommendations

Post by betterthanever »

Hello All,

I am at a client site running version 9.1.2. The client is considering moving to a managed Grid environment on version 11.5, and an enterprise-level modernization initiative is pushing all teams to look for alternatives that work with or within the Big Data ecosystem.

Since the migration effort and cost are going to be significant, I am wondering whether it makes sense to move to BigIntegrate/BigQuality, which runs natively on Hadoop, so that we can include that option in the migration strategy. I would appreciate any input.

Thanks.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

If you are going to move to Big Integrate on Yarn, the Hadoop environment needs to be configured to support BI from the start. The Hadoop team will probably not like some of the changes that need to be made to support BI, and they'll like it even less if they have to retrofit those changes and disrupt a working environment.

Scratch space and storage for the Big Integrate binaries have to be allocated on every active Yarn node. Container memory settings can also be an issue (BI needs a lower default than most standard Hadoop applications). Pre-emption also needs to be disabled, as BI can't recover from being pre-empted.
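
As a rough illustration of the settings mentioned above, here is a minimal Python sketch that reads the text of a yarn-site.xml and flags the container memory minimum and the pre-emption switches. The property names are standard YARN ones, but treating "a lower default" as the scheduler's minimum allocation is my assumption, and the 1024 MB ceiling is purely illustrative, not a documented Big Integrate requirement. The function names and the sample config are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard YARN property names; which ones apply depends on the scheduler in use.
MIN_ALLOC = "yarn.scheduler.minimum-allocation-mb"
PREEMPTION_KEYS = (
    "yarn.resourcemanager.scheduler.monitor.enable",  # Capacity Scheduler monitor
    "yarn.scheduler.fair.preemption",                 # Fair Scheduler pre-emption
)

# ASSUMPTION: illustrative ceiling only, not taken from any product documentation.
ASSUMED_MAX_MIN_ALLOC_MB = 1024


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_yarn_site(xml_text):
    """Return a list of warnings; the list is empty if nothing was flagged."""
    props = read_properties(xml_text)
    warnings = []
    min_alloc = props.get(MIN_ALLOC)
    if min_alloc is not None and int(min_alloc) > ASSUMED_MAX_MIN_ALLOC_MB:
        warnings.append(f"{MIN_ALLOC}={min_alloc} exceeds the assumed ceiling")
    for key in PREEMPTION_KEYS:
        # A missing key is treated as "false" (not enabled).
        if props.get(key, "false").lower() == "true":
            warnings.append(f"{key} is enabled")
    return warnings


# Hypothetical sample config used only to exercise the checks.
SAMPLE = """<configuration>
  <property><name>yarn.scheduler.minimum-allocation-mb</name><value>4096</value></property>
  <property><name>yarn.scheduler.fair.preemption</name><value>true</value></property>
</configuration>"""

if __name__ == "__main__":
    for warning in check_yarn_site(SAMPLE):
        print(warning)
```

A check like this is only a conversation starter with the Hadoop team; the real values should come from the product documentation for your release.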

The setup and configuration can be moderate to very complex depending on some of the design choices (like Kerberos security). Once it is working, the performance characteristics also take a bit to get used to, since there can be significant delays on startup as containers are allocated and binaries or HDFS files may need to be synchronized on active nodes.

And last but not least - don't expect HDFS / Hive files to replace database functionality. Little things like the inability to efficiently update an existing file can be very disconcerting if you aren't prepared for it. Customers that expect to start up a Yarn cluster and use it to replace their more expensive DB2 / Teradata / etc. environments are finding out that it isn't that easy.
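
To make the update problem concrete, here is a small Python sketch of what "updating" one record looks like when the storage layer has no in-place update: the whole file is rewritten and swapped in. It uses the local filesystem as a stand-in for HDFS (no HDFS API is called), and the function name and the two-column comma-separated layout are assumptions made up for the example.

```python
import os
import tempfile


def update_record(path, key, new_value):
    """Change one record by rewriting the whole file, then swapping it in.

    Returns the number of lines written, which shows that the cost scales
    with the size of the file rather than with the one record that changed.
    """
    directory = os.path.dirname(os.path.abspath(path))
    written = 0
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as out, open(path) as src:
        for line in src:
            k, _, v = line.rstrip("\n").partition(",")
            if k == key:
                v = new_value  # the only record that actually changes
            out.write(f"{k},{v}\n")
            written += 1
    os.replace(tmp_path, path)  # swap the rewritten copy into place
    return written


if __name__ == "__main__":
    # Hypothetical three-row file; one row is changed, all three are rewritten.
    workdir = tempfile.mkdtemp()
    sample = os.path.join(workdir, "customers.csv")
    with open(sample, "w") as f:
        f.write("a,1\nb,2\nc,3\n")
    print(update_record(sample, "b", "20"))
```

A database does this kind of change with an index lookup and a single row write, which is why the difference surprises people on multi-gigabyte files.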

Also - you might want to investigate using 11.7, which has a lot of fixes and features for Big Integrate. Right now 11.5 doesn't support an in-place upgrade to 11.7 (maybe in a few months).
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020