DSXchange: DataStage and IBM Websphere Data Integration Forum
sachinshankrath
Participant



Joined: 29 Mar 2010
Posts: 7
Location: WASHINGTON, DC
Points: 78

Posted: Fri Jun 15, 2018 12:56 pm

DataStage® Release: 11x
Job Type: Parallel
OS: Unix
Hi -

We are on DataStage v11.5.2 Parallel on a Unix platform and are trying to write to Hadoop using the Hive Connector, but we are not able to establish a connection. Have you done this before and been successful in writing with the Hive Connector? Please share your tips; at this stage, any tip helps!

- Sachin
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42984
Location: Denver, CO
Points: 221720

Posted: Fri Jun 15, 2018 2:41 pm


You posted this in three places. One works just fine. I deleted the other two.

_________________
-craig

Help I'm steppin' into the twilight zone, place is a madhouse, feels like being cold
My beacon's been moved under moon and star, where am I to go now that I've gone too far?
sachinshankrath
Participant



Joined: 29 Mar 2010
Posts: 7
Location: WASHINGTON, DC
Points: 78

Posted: Sat Jun 16, 2018 6:41 pm

Sorry, I could not figure out how to post my own topic at first. Thank you for deleting the other two.
asorrell
Site Admin

Group memberships:
Premium Members, DSXchange Team, Inner Circle, Server to Parallel Transition Group

Joined: 04 Apr 2003
Posts: 1694
Location: Colleyville, Texas
Points: 23058

Posted: Thu Jun 21, 2018 11:48 am

Two questions:

1) Does your site use Kerberos for security? If so, I might be able to help (I have only worked at Kerberos sites).

2) Have you or your admin set up the config file for Hive?

https://www.ibm.com/support/knowledgecenter/en/SSZJPZ_11.7.0/com.ibm.swg.im.iis.conn.hive.usage.doc/topics/hive_config_driver.html

The file should look like this (but with the correct installation location for your site), assuming you are using the Hive library that comes with BigIntegrate:

$ pwd
/InformationServer/Server/DSEngine
$ cat isjdbc.config
CLASSPATH=/InformationServer/ASBNode/lib/java/IShive.jar
CLASS_NAMES=com.ibm.isf.jdbc.hive.HiveDriver
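As a quick sanity check (the path below is just the example location from above; use your site's actual install path), confirm the jar named in CLASSPATH actually exists:

$ ls -l /InformationServer/ASBNode/lib/java/IShive.jar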

Answer those two questions first... Then I might be able to assist with getting the stage working (assuming Kerberos is used).

_________________
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2017
skathaitrooney
Participant



Joined: 06 Jan 2015
Posts: 101

Points: 904

Posted: Fri Jun 22, 2018 6:31 am

Andy, can I ask you a question: are you able to connect to Hive Server2 using the ISHive.jar that is shipped with IIS itself?

I also have a Kerberos setup. I get this error:
java.sql.SQLException: [IBM][Hive JDBC Driver]THRIFT protocol error.



This is the JDBC URL I use:

Code:
jdbc:ibm:hive://hivehostname:2181;DataBaseName=test;AuthenticationMethod=kerberos;ServicePrincipalName=datastage@XXX.NET;loginConfigName=JDBC_DRIVER_dsadm_keytab


Here is the JDBCDriverLogin.conf that I created in the same folder as ISHive.jar:

Code:
JDBC_DRIVER_dsadm_keytab {
com.ibm.security.auth.module.Krb5LoginModule required
credsType=both
principal="datastage@XXX.NET"
useKeytab="FILE:/etc/security/keytabs/datastage.hdfs.headless.keytab";
};
JDBC_DRIVER_cache{
com.ibm.security.auth.module.Krb5LoginModule required
credsType=initiator
principal="datastage@XXX.NET"
useCcache="FILE:/tmp/krb5cc_22367";
};




The above does not work with the Hive Connector for me, although the beeline client does work. I am running this as dsadm:

Code:

kinit -kt /etc/security/keytabs/datastage.hdfs.headless.keytab datastage
beeline --verbose=true -u "jdbc:hive2://hivehostname:2181/hive;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"


And I am absolutely clueless about what I am doing wrong here...
asorrell
Site Admin

Group memberships:
Premium Members, DSXchange Team, Inner Circle, Server to Parallel Transition Group

Joined: 04 Apr 2003
Posts: 1694
Location: Colleyville, Texas
Points: 23058

Posted: Fri Jun 22, 2018 12:00 pm

Yes, we connect to Hive Server2, but our URL looks slightly different:

jdbc:ibm:hive://hiveserver.company.com:10000;AuthenticationMethod=kerberos;ServicePrincipalName=hive/hiveserver.company.com@KERBEROS_DEFAULT_REALM;loginConfigName=JDBC_DRIVER_USERID

Look in /etc/krb5.conf under [libdefaults] for the Kerberos default realm. Also, the USERID appears to be case sensitive and should be upper case.
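For reference, the relevant section of /etc/krb5.conf looks something like this (the realm name is only a placeholder):

[libdefaults]
    default_realm = KERBEROS_DEFAULT_REALM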

I also recommend debugging in "non-YARN mode" because it makes things simpler. To do that, use an APT configuration file that only runs nodes on the edge node (no dynamic hosts). Also set APT_YARN_CONFIG to the empty string and APT_YARN_MODE to 0 (zero) in your job. That should make it run edge-node only.
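A rough sketch of an edge-node-only configuration file (the host name and resource paths are placeholders for your site):

{
    node "node1"
    {
        fastname "edgenode.company.com"
        pools ""
        resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
    }
}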

Hopefully that will get your connection up and running on the edge node. If that works, then there are probably other steps that have to be done to get the keytab dispersed to the data nodes for "YARN mode".

_________________
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2017
sachinshankrath
Participant



Joined: 29 Mar 2010
Posts: 7
Location: WASHINGTON, DC
Points: 78

Posted: Wed Jan 23, 2019 11:47 am

Hi -

Sorry for the six-month delay, but we abandoned this project and have only now restarted work on it. The update at this stage is that we were able to establish a connection using the Hive Connector, in the sense that the "Test" connection returns a success message. We then built a simple test job to read a few records from a flat file and write to a target table in Impala using the Hive Connector. The job does not return any error messages, but we find that it does not load any rows into the target table either. What could be going on? Again, we are on v11.5 parallel DataStage running on a Linux box.
ray.wurlod

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 23 Oct 2002
Posts: 54524
Location: Sydney, Australia
Points: 295662

Posted: Wed Jan 23, 2019 1:55 pm

Just to add to this thread: I, too, have been bitten by the case-sensitive user ID, both in this case and in certain connections to/from Enterprise Search.

_________________
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pkhadapk
Participant



Joined: 28 Aug 2009
Posts: 1
Location: Mumbai
Points: 6

Posted: Wed Feb 27, 2019 6:54 am

I am using this format:

jdbc:ibm:hive://hiveserver.company.com:10000;AuthenticationMethod=kerberos;ServicePrincipalName=hive/hiveserver.company.com@KERBEROS_DEFAULT_REALM;loginConfigName=JDBC_DRIVER_USERID

in the JDBC Connector and am able to get through for a limited number of records using the default queue. Is there any way to set the queue in the above connection string?