DSXchange: DataStage and IBM Websphere Data Integration Forum


Posted: Tue Sep 12, 2017 10:47 pm

DataStage® Release: 11x
Job Type: Parallel
OS: Unix
Additional info: I have a few questions about how BI 11.5 works on Hadoop

We have BigIntegrate (BI) 11.5 on Linux, on the Hortonworks Data Platform. We are building a huge data lake in Hadoop. I have a few questions about how BI 11.5 works. I couldn't find answers to these on Google, so could you please shed some light?

1. Hive queries - If I run a Hive query in interactive mode through PuTTY, the default execution engine is Tez, but we can change this to the MapReduce (MR) engine. How does BigIntegrate work, though? For example, when I use a Hive stage, does it run on the Tez or the MR engine? What if I use the File Connector stage to build a Hive table?
2. Dynamic configuration file - Can someone provide an IBM link with details on what the dynamic configuration file is and how it differs from the traditional DS configuration file?
3. Is there a way to start a Tez session and keep it open, run all my Hive queries, and then close the Tez session once they have all completed? I would like to do this in a BI job. For example, I want to open the Tez session using an Execute Command stage (shell script), run all my jobs that use Hive loads and extractions, and finally close the Tez session. I don't know if BI uses Tez in the background, but when I run Hive SQL statements they run on Tez, so I'm assuming BI would also use Tez in the background. I may be completely wrong, though.

So any advice on these would be very helpful.



Posted: Sun Sep 24, 2017 7:16 am

1) AFAIK the Hive stage is merely a wrapper around the Hive JDBC connector. My recollection is that execution is deferred to the default engine for your environment (doesn't switching to MR require restarting Hive anyway?). If you're using the File Connector to create a Hive table, I don't imagine it would need to invoke an MR/Tez job at all; it should be restricted to the name node/metastore.
2) The DS Knowledge Center pages are OK, but I found this IBM document floating around on LinkedIn to be massively useful:
https://app.box.com/s/b0wonh8vv5bn8g8eaaj76cy7deui27cx (specifically page 30 onwards for your question)
(credit: https://www.linkedin.com/pulse/information-server-yarnhadoop-new-release-vik-malhotra/)
3) Sorry, no idea on this one. You may not have that level of granular control within DataStage/BI.
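
For anyone who wants to experiment outside DataStage, here is a minimal sketch of two things touched on above: pinning the execution engine per connection, and running a batch of statements in a single Hive session. It assumes the stock Apache Hive JDBC URL syntax (`jdbc:hive2://host:port/db?hive_conf_list`) and the Beeline client with its `-u` and `-f` options; whether the BigIntegrate Hive stage accepts the same URL form is an assumption I have not verified. The host name, script path, and helper function names are hypothetical. In stock Hive, `hive.execution.engine` can normally be set per session unless an administrator has restricted it, and statements run in one Hive session are generally expected to share that session's Tez application.

```python
# Sketch only: helper names, host and paths are hypothetical examples.
VALID_ENGINES = {"mr", "tez", "spark"}  # values accepted by hive.execution.engine


def hive_jdbc_url(host, port=10000, database="default", engine=None):
    """Build an Apache Hive JDBC URL, optionally pinning the execution engine.

    The engine is passed in the URL's hive-conf section (after the '?').
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if engine is not None:
        if engine not in VALID_ENGINES:
            raise ValueError(f"unknown execution engine: {engine!r}")
        url += f"?hive.execution.engine={engine}"
    return url


def single_session_script(queries, engine="tez"):
    """Join statements into one HiveQL script so they run in one session."""
    lines = [f"SET hive.execution.engine={engine};"]
    for query in queries:
        # Normalise each statement to end with exactly one semicolon.
        lines.append(query.strip().rstrip(";") + ";")
    return "\n".join(lines) + "\n"


def beeline_command(url, script_path):
    """Return the argument list for one Beeline run over the whole script."""
    return ["beeline", "-u", url, "-f", script_path]


url = hive_jdbc_url("hs2.example.com", engine="tez")  # hypothetical host
script = single_session_script(["SELECT 1", "SELECT 2;"])
command = beeline_command(url, "/tmp/batch.hql")      # hypothetical path
```

The script text would be written to the file named in the command, and the command could then be launched from a shell script or an Execute Command stage. This only addresses queries issued through one Beeline connection; it does not keep a Tez session open across separate DataStage jobs.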
All times are GMT - 6 Hours