Writing to Hadoop using Datastage 11.5.2 Parallel Hive

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »


You posted this in three places. One works just fine. I deleted the other two.
-craig

"You can never have too many knives" -- Logan Nine Fingers
sachinshankrath
Participant
Posts: 7
Joined: Mon Mar 29, 2010 1:36 pm
Location: WASHINGTON, DC

Post by sachinshankrath »

Sorry, could not figure out how to post my own topic at first. Thank you for deleting the other two.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Two questions:

1) Does your site use Kerberos for security? If so, I might be able to help (I have only worked at Kerberos sites).

2) Have you or your admin set up the config file for Hive?

https://www.ibm.com/support/knowledgece ... river.html

The file should look like this (but with the correct installation location for your site), assuming you are using the Hive library that comes with BigIntegrate:

$ pwd
/InformationServer/Server/DSEngine
$ cat isjdbc.config
CLASSPATH=/InformationServer/ASBNode/lib/java/IShive.jar
CLASS_NAMES=com.ibm.isf.jdbc.hive.HiveDriver

Answer those two questions first... Then I might be able to assist with getting the stage working (assuming Kerberos is used).
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
skathaitrooney
Participant
Posts: 103
Joined: Tue Jan 06, 2015 4:30 am

Post by skathaitrooney »

Andy, can I ask you a question: are you able to connect to HiveServer2 using the IShive.jar that is shipped with IIS itself?

I also have a Kerberos setup. I get this error:
java.sql.SQLException: [IBM][Hive JDBC Driver]THRIFT protocol error.



This is the JDBC URL I use:

Code: Select all

jdbc:ibm:hive://hivehostname:2181;DataBaseName=test;AuthenticationMethod=kerberos;ServicePrincipalName=datastage@XXX.NET;loginConfigName=JDBC_DRIVER_dsadm_keytab
Here is the JDBCDriverLogin.conf that I created in the same folder as IShive.jar:

Code: Select all

JDBC_DRIVER_dsadm_keytab {
    com.ibm.security.auth.module.Krb5LoginModule required
    credsType=both
    principal="datastage@XXX.NET"
    useKeytab="FILE:/etc/security/keytabs/datastage.hdfs.headless.keytab";
};
JDBC_DRIVER_cache {
    com.ibm.security.auth.module.Krb5LoginModule required
    credsType=initiator
    principal="datastage@XXX.NET"
    useCcache="FILE:/tmp/krb5cc_22367";
};


The above does not work with the Hive Connector for me, although the beeline client does work. I am running this as dsadm:

Code: Select all

kinit -kt /etc/security/keytabs/datastage.hdfs.headless.keytab datastage
beeline --verbose=true -u "jdbc:hive2://hivehostname:2181/hive;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
And I am absolutely clueless as to what I am doing wrong here...
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Yes, we connect to HiveServer2, but our URL looks slightly different...

jdbc:ibm:hive://hiveserver.company.com:10000;AuthenticationMethod=kerberos;ServicePrincipalName=hive/hiveserver.company.com@KERBEROS_DEFAULT_REALM;loginConfigName=JDBC_DRIVER_USERID

Look in /etc/krb5.conf under [libdefaults] for the Kerberos default realm. Also, the USERID seems to be case-sensitive and should be upper case.
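For reference, a minimal sketch of the relevant section of /etc/krb5.conf; the realm name here is just the same placeholder used in the URL above, not a real value:

Code: Select all

[libdefaults]
    # the value of default_realm is the Kerberos default realm referenced above
    default_realm = KERBEROS_DEFAULT_REALM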

I also recommend debugging in "non-YARN mode" because it simplifies things. To do that, use an APT configuration file that only runs nodes on the edge node (no dynamic hosts); a sketch is below. Also set APT_YARN_CONFIG to the empty string and APT_YARN_MODE to 0 (zero) in your job. That should make it run edge-node only.
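As an illustration only, here is a minimal single-node APT configuration file pinned to the edge node. The host name and resource paths are placeholders and need to match your installation:

Code: Select all

{
    node "node1"
    {
        fastname "edgenode.company.com"
        pools ""
        resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
    }
}

Point APT_CONFIG_FILE at that file for the test job, together with the APT_YARN_CONFIG and APT_YARN_MODE settings mentioned above.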

Hopefully that will get your connection up and running on the edge node. If that works, then there are probably other steps that have to be done to get the keytab dispersed to the data nodes for "YARN mode".
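If it comes to that, one possible sketch of the keytab distribution step, assuming hypothetical data node names and that each node expects the keytab at the same local path used in the earlier post:

Code: Select all

# hypothetical host names - copy the keytab to the same path on every data node
for host in datanode1 datanode2 datanode3; do
    scp /etc/security/keytabs/datastage.hdfs.headless.keytab \
        "${host}:/etc/security/keytabs/"
done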
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
sachinshankrath
Participant
Posts: 7
Joined: Mon Mar 29, 2010 1:36 pm
Location: WASHINGTON, DC

Post by sachinshankrath »

Hi -

Sorry for the six-month delay, but we abandoned this project and have now restarted work on it. At this stage, the update is that we were able to successfully establish a connection using the Hive Connector, in the sense that the "Test" connection returns a successful message. We then built a simple test job to read a few records from a flat file and write them to a target table in Impala using the Hive Connector. The job does not return any error messages, but we find that it does not load any rows into the target table either. What could be going on? Again, we are on v11.5 Parallel DataStage running on a Linux box.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Just to add to this thread: I, too, have been bitten by the case-sensitive user ID, both in this case and in certain connections to/from Enterprise Search.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pkhadapk
Participant
Posts: 1
Joined: Fri Aug 28, 2009 3:09 am
Location: Mumbai

Re: Writing to Hadoop using Datastage 11.5.2 Parallel Hive

Post by pkhadapk »

I am using this format:

jdbc:ibm:hive://hiveserver.company.com:10000;AuthenticationMethod=kerberos;ServicePrincipalName=hive/hiveserver.company.com@KERBEROS_DEFAULT_REALM;loginConfigName=JDBC_DRIVER_USERID

in the JDBC Connector and am able to get through for a limited number of records using the default queue. Is there any way to set the queue in the above connection string?