DSXchange: DataStage and IBM Websphere Data Integration Forum


Posted: Sat Feb 10, 2018 12:28 pm

DataStage® Release: 9x
Job Type: Parallel
OS: Unix
Additional info: 9.1 to 11.5 that runs natively on Hadoop
Hello All,

I am at a client site running version 9.1.2. The client is considering a move to a managed Grid environment on 11.5, and an enterprise-level modernization initiative is pushing all teams to look for alternatives that work with or within the Big Data ecosystem.

Since the migration is going to be significant in effort and cost, I am wondering whether it makes sense to move to BigIntegrate/BigQuality, which run natively on Hadoop, so that we can include that option in the migration strategy. I would appreciate any input.


Posted: Wed Feb 14, 2018 12:26 pm

If you are going to move to BigIntegrate on YARN, the Hadoop environment needs to be configured to support BigIntegrate (BI) from the start. The Hadoop team will probably not like some of the changes that need to be made to support BI, and they'll like it even less if they have to retrofit those changes and disrupt a working environment.

Scratch space and storage for the BigIntegrate binaries have to be allocated on every active YARN node. Container memory settings can also be an issue (BI needs a lower default than most standard Hadoop applications). Pre-emption also needs to be disabled, as BI can't recover from being pre-empted.
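As a rough illustration of the pre-emption and container-memory points, here is a minimal Python sketch that reads a Hadoop-style configuration file and reports the relevant switches. The property names are standard YARN settings, but which of them matters depends on the scheduler in use, and none of this comes from BigIntegrate documentation. The sample XML and the helper names (`load_properties`, `preemption_enabled`) are made up for the example.

```python
import xml.etree.ElementTree as ET

# Standard YARN property names; which one applies depends on the scheduler.
# Nothing here is specific to BigIntegrate.
PREEMPTION_PROPERTIES = (
    "yarn.resourcemanager.scheduler.monitor.enable",  # Capacity Scheduler
    "yarn.scheduler.fair.preemption",                 # Fair Scheduler
)
MIN_ALLOCATION_PROPERTY = "yarn.scheduler.minimum-allocation-mb"


def load_properties(xml_text):
    """Parse Hadoop-style configuration XML into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def preemption_enabled(props):
    """Return the preemption-related properties that are set to true.

    A property that is not set is treated as false (an assumption made
    for this sketch; check the defaults of your Hadoop distribution).
    """
    return [p for p in PREEMPTION_PROPERTIES
            if props.get(p, "false").lower() == "true"]


# Hypothetical yarn-site.xml content, for illustration only.
SAMPLE_YARN_SITE = """<configuration>
  <property>
    <name>yarn.resourcemanager.scheduler.monitor.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    props = load_properties(SAMPLE_YARN_SITE)
    print("preemption switches on:", preemption_enabled(props))
    print("minimum container MB:", props.get(MIN_ALLOCATION_PROPERTY))
```

A check like this could be run against the cluster's real yarn-site.xml before the install, so the discussion with the Hadoop team starts from what is actually configured.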

The setup and configuration can be moderately to very complex depending on design choices such as Kerberos security. Once it is working, the performance characteristics also take some getting used to: startup can be significantly delayed while containers are allocated and binaries or HDFS files are synchronized to the active nodes.

And last but not least - don't expect HDFS / Hive files to replace database functionality. Little things like the inability to efficiently update an existing file can be very disconcerting if you aren't prepared for them. Customers who expect to start up a YARN cluster and use it to replace their more expensive DB2 / Teradata / etc. environments are finding out that it isn't that easy.
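To make the update problem concrete, here is a small Python sketch in which a local CSV file stands in for a file on HDFS (an assumption made purely for illustration; no Hadoop API is used). Because the file cannot be modified in place, changing one record means copying every record to a new file and swapping it in. The function name `update_by_rewrite` and the two-column layout are hypothetical.

```python
import csv
import os
import tempfile


def update_by_rewrite(path, key, new_value):
    """Change one row's value by copying every row to a new file.

    Returns the number of rows written, to show that the cost scales
    with the size of the file rather than the size of the change.
    """
    directory = os.path.dirname(os.path.abspath(path))
    rows_written = 0
    with open(path, newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, delete=False
    ) as dst:
        writer = csv.writer(dst)
        for row_key, value in csv.reader(src):
            writer.writerow([row_key, new_value if row_key == key else value])
            rows_written += 1
    os.replace(dst.name, path)  # swap the rewritten copy into place
    return rows_written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "customers.csv")
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows((f"id{i}", "old") for i in range(1000))
        print(update_by_rewrite(path, "id42", "new"), "rows rewritten for 1 change")
```

A database would touch one row here. The rewrite approach touches all of them, which is the kind of difference that catches people out when they plan to swap a warehouse for files on a cluster.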

Also - you might want to investigate 11.7, which has a lot of fixes and features for BigIntegrate. Right now there is no supported in-place upgrade from 11.5 to 11.7 (maybe in a few months).

Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2017
All times are GMT - 6 Hours