File Connector with Avro / WebHDFS issue

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

pavankvk
Participant
Posts: 202
Joined: Thu Dec 04, 2003 7:54 am

File Connector with Avro / WebHDFS issue

Post by pavankvk »

Hi,

We are trying to read an Avro file using the File Connector over WebHDFS. I am specifying only the name node (host name), the path to the .avro file, and the user name and password for HDFS access.

When I run the job, I get the message "file not found or you don't have permissions." I can see the file, and its permissions are read/write for all. Has anyone faced this issue?
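
For reference, my understanding is that the connector issues WebHDFS REST calls under the covers, so this is roughly the check I'd expect to succeed outside DataStage (the host, port, path, and user below are placeholders for my environment; 50070 is the usual Hadoop 2.x name node port):

    # Confirm the file is visible over WebHDFS with the same credentials.
    # Host, port, path, and user.name are placeholders.
    curl -i "http://namenode-host:50070/webhdfs/v1/path/to/file.avro?op=GETFILESTATUS&user.name=hdfsuser"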
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

First things first - has your site applied patch JR54938 to enable Avro support? Check the Version.xml file in the base install directory on the server if you are not sure.
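
A quick way to check from the engine tier (the install root below is the default location; adjust for your site):

    # Look for the fix ID in the installed-patch history.
    grep -i JR54938 /opt/IBM/InformationServer/Version.xml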

http://www-01.ibm.com/support/docview.w ... wg24041535
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
pavankvk
Participant
Posts: 202
Joined: Thu Dec 04, 2003 7:54 am

Post by pavankvk »

Yes, it's 11.5.0.1, and this patch is included in that release. I also tried with a delimited file and get the same error, even though the file is present with full permissions.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Have you added the required Hadoop libraries to the CLASSPATH? The list is documented in the updated Guide to Accessing Files documentation.

The only way I know of to get the new guide (i46defcn.pdf) is to download the patch from Fix Central. It's in the tar file, along with additional documentation on the patch.

There's also a list of Avro restrictions in a separate txt file with the patch.
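
As a rough sketch of where those entries usually land (the folder and jar names below are placeholders - use the exact jars named in the guide), the additions typically go in $DSHOME/dsenv so every job picks them up:

    # Appended to $DSHOME/dsenv; restart the engine afterwards.
    # Placeholder folder and jar names - substitute the list from the guide.
    CLASSPATH=$CLASSPATH:/opt/hadoop-jars/hadoop-common.jar:/opt/hadoop-jars/hadoop-hdfs.jar:/opt/hadoop-jars/hadoop-auth.jar
    export CLASSPATH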
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
pavankvk
Participant
Posts: 202
Joined: Thu Dec 04, 2003 7:54 am

Post by pavankvk »

Hi,

For the third-party jar files, should they be copied from the server (the Hadoop distribution) to the engine server? I see that the document specifies three jar files. I am assuming they are on the server where the Hadoop distribution is installed, and that we should copy them manually to the engine server, place them in some folder, and put that folder in the CLASSPATH. Is that the right way to do it? Thanks so much for your help.
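
In case it clarifies what I mean, this is roughly what I had in mind (the hosts, paths, and jar names are just guesses for my environment):

    # Copy the three jars from a Hadoop node to a folder on the engine server,
    # then put that folder's jars on the CLASSPATH as described above.
    scp hadoop-node:/usr/lib/hadoop/client/hadoop-common.jar /opt/hadoop-jars/
    scp hadoop-node:/usr/lib/hadoop/client/hadoop-hdfs.jar /opt/hadoop-jars/
    scp hadoop-node:/usr/lib/hadoop/client/hadoop-auth.jar /opt/hadoop-jars/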