Hi all,
We are loading data from a CSV file in HDFS (accessed using the Big Data File stage) into a HIVE table using the JDBC stage in DataStage 11.5. Load performance is very poor: it takes about 22 seconds to insert one record into the HIVE table. Can you please let us know what can be done to improve the performance of loading through the JDBC stage?
We suspect the data is being inserted one row at a time into the HIVE table, even though we set 2000 rows per transaction.
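For comparison, below is a minimal sketch (standalone Java over the HiveServer2 JDBC driver, not the JDBC stage itself) of the single-statement bulk load that per-row INSERTs are competing with: since the CSV is already in HDFS, one LOAD DATA INPATH statement moves the whole file into the table. The host, port, user, table name and path are placeholders, not values from our job.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveBulkLoad {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; the URL, user, table and path below are assumptions.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver.example.com:10000/default", "etl", "");
             Statement stmt = con.createStatement()) {
            // Moves (not copies) the staged CSV from its HDFS location into the
            // table's directory in one operation, instead of row-by-row INSERTs
            // that each pay the full job-startup overhead.
            stmt.execute("LOAD DATA INPATH '/user/etl/staging/customers.csv' "
                       + "INTO TABLE customers");
        }
    }
}

If your version of the JDBC stage exposes before/after SQL properties, the same statement could presumably be issued from there rather than from standalone code.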
Thanks in advance.
Loading data from HDFS file into HIVE table using Datastage
Which distribution of Hadoop are you using? From what I can gather, the BigData File stage is primarily aimed at IBM's BigInsights, and I'd imagine there may be issues when interacting with other distributions.
Have you tried using the File Connector stage instead? WebHDFS/HTTPFS is standard with most HDFS versions I think?
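If the File Connector is not an option, here is a minimal sketch of what a WebHDFS write looks like underneath, using nothing but HttpURLConnection: the NameNode is asked for a write location and answers with a redirect to a DataNode, which then receives the file content. The host, port, user and paths are placeholders, and the default HTTP port (50070) may differ on your distribution.

import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

public class WebHdfsPut {
    public static void main(String[] args) throws IOException {
        String nameNode = "http://namenode.example.com:50070";  // assumed host/port
        String hdfsPath = "/user/etl/staging/customers.csv";    // assumed target path
        String localCsv = "customers.csv";                      // assumed local file

        // Step 1: ask the NameNode where to write; it answers with a 307
        // redirect whose Location header points at a DataNode.
        URL createUrl = new URL(nameNode + "/webhdfs/v1" + hdfsPath
                + "?op=CREATE&overwrite=true&user.name=etl");
        HttpURLConnection nn = (HttpURLConnection) createUrl.openConnection();
        nn.setRequestMethod("PUT");
        nn.setInstanceFollowRedirects(false);
        String dataNodeUrl = nn.getHeaderField("Location");
        nn.disconnect();

        // Step 2: PUT the file content to the DataNode URL returned above.
        HttpURLConnection dn = (HttpURLConnection) new URL(dataNodeUrl).openConnection();
        dn.setRequestMethod("PUT");
        dn.setDoOutput(true);
        try (OutputStream out = dn.getOutputStream()) {
            Files.copy(Paths.get(localCsv), out);
        }
        System.out.println("WebHDFS responded: " + dn.getResponseCode()); // 201 = created
    }
}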
Hi, I have been facing a similar issue. I am using the Hive Connector stage to load and extract data.
However, the speed is dismal. Is there something we can do to improve the performance of loading into Hive? Having said that, I don't expect Hive loading to be as fast as a conventional database, since Hive is a database-like interface rather than a database in the typical sense: beneath the surface there are MapReduce jobs doing the work.
Nevertheless, do we know of some ways to get this tuned? I see an array size property in the ODBC stage but not in the native Hive Connector stage.
Any info on fine-tuning performance here would be really helpful.
I have talked with other customers who use the File Connector exclusively for loading, writing directly to the HDFS file that Hive is abstracting, precisely for performance reasons.
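Stripped down to the plain Hadoop FileSystem API, that approach looks roughly like the sketch below: the extract file is copied straight under the directory the Hive table points at, so no per-row INSERT traffic is involved. The cluster URI, local file name and warehouse path are placeholders, not anything from a specific job.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDirectWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020"); // assumed cluster URI

        try (FileSystem fs = FileSystem.get(conf)) {
            // Copy the extract straight under the directory the Hive table
            // (external or managed) points at; the next query sees the rows
            // without any per-row inserts.
            fs.copyFromLocalFile(
                    new Path("customers.csv"),
                    new Path("/apps/hive/warehouse/customers/customers.csv"));
        }
    }
}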
Ernie
Ernie Ostic
blogit!
Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
We use the BigData stage in a job to load data to HDFS and then use a script to create the HIVE table with the correct partitions. We store data in a /folder/structure/for_Hive/tableName/yyyy/mm/dd folder format, and the HIVE tables are partitioned on year, month and day. Both loading to HDFS and creating the HIVE table are executed from a job sequence.
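For anyone wanting to reproduce that pattern outside the script, a rough sketch of the "create the table, then register the loaded folder as a partition" step over Hive JDBC might look like this; the table name, columns and the specific yyyy/mm/dd values are assumptions based on the folder layout above, not the actual job design.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveAddPartition {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver.example.com:10000/default", "etl", "");
             Statement stmt = con.createStatement()) {
            // External table partitioned the same way the HDFS folders are laid out.
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS tableName ("
                       + "  id INT, name STRING) "
                       + "PARTITIONED BY (yyyy STRING, mm STRING, dd STRING) "
                       + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
                       + "LOCATION '/folder/structure/for_Hive/tableName'");
            // Point one partition at the folder the DataStage job just loaded.
            stmt.execute("ALTER TABLE tableName ADD IF NOT EXISTS "
                       + "PARTITION (yyyy='2016', mm='01', dd='15') "
                       + "LOCATION '/folder/structure/for_Hive/tableName/2016/01/15'");
        }
    }
}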
Thanks
Karthick