DSXchange: DataStage and IBM WebSphere Data Integration Forum
rsmohankumar
Participant



Joined: 04 Mar 2013
Posts: 1

Points: 21

Posted: Fri Sep 16, 2016 12:54 am

DataStage® Release: 11x
Job Type: Parallel
OS: Unix
Hi all,

We are loading data from a CSV file in HDFS (read with the Big Data File stage) into a Hive table using the JDBC stage in DataStage 11.5. The load performance is very poor: it takes 22 seconds to insert a single record into the Hive table. Can you please let us know what can be done to improve the performance of loading through the JDBC stage?

We suspect the data is being inserted into the Hive table one row at a time, even though we set 2000 rows per transaction.

Thanks in advance.

_________________
Thanks,
Mohan
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42113
Location: Denver, CO
Points: 216181

Posted: Fri Sep 16, 2016 6:32 am

Welcome!

Rows per transaction just tells the stage when to commit. If you have an 'Array Size' property there, that is what controls how many rows are sent to the target at a time.
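
Outside of DataStage, the distinction looks roughly like this; a minimal Python DB-API sketch, where the connection, SQL, column names and sizes are only placeholders:

Code:
# Sketch only: how "Array Size" differs from "Rows per transaction", using a
# generic Python DB-API connection. The SQL, table and sizes are placeholders.
ARRAY_SIZE = 200              # rows sent to the target per round trip ("Array Size")
ROWS_PER_TRANSACTION = 2000   # rows written between commits ("Rows per transaction")

def load(rows, conn):
    cur = conn.cursor()
    sql = "INSERT INTO target_table (col1, col2) VALUES (?, ?)"
    pending = 0
    for i in range(0, len(rows), ARRAY_SIZE):
        batch = rows[i:i + ARRAY_SIZE]
        cur.executemany(sql, batch)       # one batched round trip to the target
        pending += len(batch)
        if pending >= ROWS_PER_TRANSACTION:
            conn.commit()                 # commit boundary, not batch boundary
            pending = 0
    conn.commit()                         # flush the final partial transaction

If the driver behind the stage cannot batch and still ships one row per round trip, changing the commit interval alone won't change the seconds-per-row behaviour you are seeing.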

_________________
-craig

Can't keep my eyes from the circling skies
Tongue tied and twisted just an earth bound misfit, I
Timato
Participant



Joined: 30 Sep 2014
Posts: 23

Points: 143

Posted: Fri Sep 23, 2016 1:59 am

Which distribution of Hadoop are you using? From what I can gather, the Big Data File stage is primarily aimed at IBM's BigInsights, and I'd imagine there may be issues when interacting with other distributions.

Have you tried using the File Connector stage instead? WebHDFS/HttpFS comes as standard with most HDFS versions, I think.
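
For what it's worth, the WebHDFS interface the File Connector relies on can be exercised directly, which is an easy way to check the cluster side. A rough sketch with the Python requests library; host, port, user and paths are placeholders:

Code:
# Minimal WebHDFS write: a two-step PUT (CREATE on the namenode answers with a
# redirect to a datanode). Host, port, user and paths are placeholders.
import requests

NAMENODE = "http://namenode.example.com:50070"
HDFS_PATH = "/user/etl/landing/sample.csv"

# Step 1: ask the namenode where to write; it replies with a 307 redirect.
r = requests.put(
    NAMENODE + "/webhdfs/v1" + HDFS_PATH,
    params={"op": "CREATE", "overwrite": "true", "user.name": "etl"},
    allow_redirects=False,
)
datanode_url = r.headers["Location"]

# Step 2: send the actual bytes to the datanode it pointed us at.
with open("sample.csv", "rb") as f:
    requests.put(datanode_url, data=f)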
TNZL_BI



Group memberships:
Premium Members

Joined: 20 Aug 2012
Posts: 24
Location: NZ
Points: 243

Posted: Mon Apr 10, 2017 6:22 pm

Hi, I have been facing a similar issue. I am using the Hive Connector stage to load / extract data.

However, the speed is dismal. Is there something we can do to improve the performance of loading into Hive? Having said that, I don't expect Hive loading to be as fast as a conventional database: Hive is an interface that looks database-like but is not a database in the typical sense, since beneath the Hive surface there are complex Java MapReduce programs running.

Nevertheless, do we know of some ways to get this tuned? I see an array size in the ODBC stage but not in the native Hive Connector stage.

Any info on fine-tuning performance here would be really helpful.
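
To make that concrete: classically each INSERT ... VALUES statement runs as its own Hive job, so row-at-a-time loading pays the full job start-up cost per row, while a single bulk statement such as LOAD DATA INPATH pays it once. A rough sketch with the PyHive client (host, database, table and path are placeholders, and PyHive is just one possible client):

Code:
# Sketch: bulk-load an already-landed HDFS file into a Hive table instead of
# issuing row-by-row INSERTs. Host, database, table and path are placeholders;
# the PyHive package is an assumption, any Hive client would do.
from pyhive import hive

conn = hive.Connection(host="hiveserver2.example.com", port=10000, username="etl")
cur = conn.cursor()

# Slow pattern: every statement is compiled and run as its own Hive job.
# for row in rows:
#     cur.execute("INSERT INTO stage_db.target VALUES (%s, %s)", row)

# Faster pattern: one statement that moves the landed file under the table.
cur.execute("LOAD DATA INPATH '/user/etl/landing/sample.csv' INTO TABLE stage_db.target")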
TNZL_BI



Group memberships:
Premium Members

Joined: 20 Aug 2012
Posts: 24
Location: NZ
Points: 243

Posted: Sun Apr 30, 2017 6:01 pm

IBM has suggested that I apply some patches. I will install them and then post an update.
eostic

Premium Poster



Group memberships:
Premium Members

Joined: 17 Oct 2005
Posts: 3701

Points: 29548

Posted: Mon May 01, 2017 9:48 am

I have talked with other customers who use the File Connector exclusively for loading, writing directly to the HDFS file that Hive is abstracting, precisely for performance reasons.
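
In code terms the idea is simply to land delimited files in the directory the Hive table is defined over, so no INSERT statements run at all. A rough sketch using the Python hdfs (WebHDFS) client; the namenode URL, user, delimiter and paths are placeholders:

Code:
# Sketch: skip SQL entirely and write a delimited file straight into the HDFS
# directory that the Hive table's LOCATION points at. Namenode URL, user and
# paths are placeholders; the 'hdfs' package (a WebHDFS client) is an assumption.
from hdfs import InsecureClient

client = InsecureClient("http://namenode.example.com:50070", user="etl")

rows = [("1", "first"), ("2", "second")]
payload = "\n".join("|".join(r) for r in rows) + "\n"

# Hive sees these rows on the next query; no per-row SQL is involved.
client.write("/warehouse/stage_db/target/part-0001.txt",
             data=payload, encoding="utf-8", overwrite=True)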

Ernie

_________________
Ernie Ostic

blogit!
Open IGC is Here!
TNZL_BI



Group memberships:
Premium Members

Joined: 20 Aug 2012
Posts: 24
Location: NZ
Points: 243

Posted: Sun May 14, 2017 10:04 pm

Exactly. I have been using the File Connector stage now, and it is a better / faster way to put data onto Hadoop than the Hive or ODBC connector stages.
The other advantage is that the File Connector stage also provides an option to create the Hive table as well, which is like two steps in one.
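
Done by hand, the second of those two steps is just the matching DDL over the directory you wrote to. A rough sketch via PyHive; table name, columns, delimiter and location are placeholders, and the File Connector does the equivalent for you:

Code:
# Sketch: the DDL half of "write the file and create the Hive table over it".
# Table name, columns, delimiter and LOCATION are placeholders; PyHive is an
# assumption.
from pyhive import hive

cur = hive.Connection(host="hiveserver2.example.com", port=10000,
                      username="etl").cursor()
cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS stage_db.target (
        id    STRING,
        descr STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
    STORED AS TEXTFILE
    LOCATION '/warehouse/stage_db/target'
""")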
dsuser_cai



Group memberships:
Premium Members, Heartland Usergroup

Joined: 13 Feb 2009
Posts: 151

Points: 1201

Posted: Tue Sep 12, 2017 10:34 pm

We use the Big Data File stage in a job to load data to HDFS and then use a script to create the Hive table with the correct partitions. We store the data in a /folder/structure/for_Hive/tableName/yyyy/mm/dd folder format, and the Hive tables are partitioned on year, month and day. Both the HDFS load and the Hive table creation are executed from a job sequence.
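
For anyone building a similar script, registering one day's yyyy/mm/dd directory as a partition looks roughly like this. A sketch only: the database, table and base path are placeholders following the layout above, and PyHive stands in for whatever client the script actually uses (a beeline script works just as well):

Code:
# Sketch: register one yyyy/mm/dd directory as a Hive partition after the load.
# Database, table and the base path follow the layout described in the post but
# are placeholders; PyHive is an assumption.
from pyhive import hive

def add_partition(cur, year, month, day):
    location = f"/folder/structure/for_Hive/tableName/{year:04d}/{month:02d}/{day:02d}"
    cur.execute(
        f"ALTER TABLE stage_db.tableName ADD IF NOT EXISTS "
        f"PARTITION (year='{year:04d}', month='{month:02d}', day='{day:02d}') "
        f"LOCATION '{location}'"
    )

cur = hive.Connection(host="hiveserver2.example.com", port=10000,
                      username="etl").cursor()
add_partition(cur, 2017, 9, 12)   # .../2017/09/12 becomes a queryable partition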

_________________
Thanks
Karthick