How to read .aspx file using datastage

mo13
Premium Member
Posts: 3
Joined: Thu Oct 30, 2014 10:09 am
Location: USA

Post by mo13 »

Hi DSXchange friends,

Since this is my first post, kindly pardon me if this query is not feasible :)

I am looking for guidance: has anyone read a .aspx file or page (basically a SharePoint list) in DataStage, or can it somehow be converted to a format that I can read in my jobs?

The requirement is to read data from a SharePoint list (mostly a .aspx file if I connect to or download the SharePoint content using a script).

Platform : Linux
DS Version : 9.1

Any experience with this kind of scenario would be helpful.

Thanks in advance.

Best,
Mo
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Post by ray.wurlod »

Welcome aboard.

I've never tried reading a .aspx file using DataStage.

To do so, you would first need to understand something about its structure.

There may be something you can do using the Unstructured Data stage.
mouthou
Participant
Posts: 208
Joined: Sun Jul 04, 2004 11:57 pm

Post by mouthou »

It seems possible to do this using the XML stage in version 9.x, along with REST web services and HTTP POST methods. Exploring along these lines may lead to a solution. I have also copied a link to an old post below in case it is relevant.

viewtopic.php?p=444412
mo13
Premium Member
Posts: 3
Joined: Thu Oct 30, 2014 10:09 am
Location: USA

Post by mo13 »

Thanks for your kind inputs Ray (learned a lot from your posts) and mouthou.

As far as I know, the Unstructured Data stage can only be used for Excel sheets in 9.1 (please correct me if that's wrong), so it only helps if the input is in spreadsheet format. That doesn't resolve the problem.

The .aspx file is basically the downloaded form of a URL, and it looks different from a regular XML file (I tried using the XML stages to read it). That's where I got stuck, since I have never read that format/structure of file before in DS.

As for RESTful web services and the HTTP POST method, my knowledge there is limited. I tried reading the post mentioned, but it didn't really give me any insight beyond Lotus Domino. I might need a little more insight from that perspective.

Reading it in a Web Services stage, though, will need a WSDL for the SharePoint URL. Is there a way we can get the WSDL file ourselves, or any alternative?

I am able to code the download and read of any file shared on a SharePoint site. It is only when the SharePoint site has a list that I am stuck. Instead of 'downloading' it as a .aspx file, I was wondering if we can read it directly from the website, as @mouthou mentioned.

Thanks.
mouthou
Participant
Posts: 208
Joined: Sun Jul 04, 2004 11:57 pm

Post by mouthou »

You are right that the Unstructured Data stage is for Excel sheets. Maybe the Hierarchical stage was meant, which is present in 11.x only and could fit your requirement much better. Since the XML stage was the earlier version of the Hierarchical stage, I suggested trying the XML stage. I was going to mention the web services option too, but left it out thinking it is present only in 11.3. As for generating the WSDL, is it an option to use the import metadata feature on the menu? Or log in to the IS console as ADMIN and work with the services-->binding sections, which will lead you to WSDL generation. Some online forums also discuss how to create a WSDL from ASP code.

I haven't come across any feature in 9.x for your exact requirement of reading directly from the website. Another option may be to introduce some Java transformations (if you are well versed in Java classes etc.) into your existing design, which could get complicated!
mo13
Premium Member
Posts: 3
Joined: Thu Oct 30, 2014 10:09 am
Location: USA

Post by mo13 »

I just wanted to share what I tried as a workaround.
Since the SharePoint list had an MS Access database underneath, I created an MS Access macro and run it as a Windows batch job to populate the data I need in the correct format, then read that in DataStage.

It is more of a collective workaround, but I still wanted to post it in case someone has a similar scenario.

If anyone finds a proper solution, please post it here. In the meantime, I am marking this as a workaround. Thanks for your time.
cdp
Premium Member
Posts: 113
Joined: Tue Dec 15, 2009 9:28 pm
Location: New Zealand

Post by cdp »

So you are trying to retrieve 'List data' from "Sharepoint Online", is that correct ?

I went through this exercise recently. It was a pain to get working, mainly because Microsoft's authentication procedure is not well documented.

You need to read this:
https://allthatjs.com/2012/03/28/remote ... nt-online/

and to get an understanding of how it works, try this step by step procedure with a REST client (eg. SOAP UI, Postman, Fiddler etc.)
http://paulryan.com.au/2014/spo-remote- ... tion-rest/

- To summarize: every time you read from SharePoint Online, you need to provide an authentication "Cookie" header and a "User-Agent" header with each web service or GET request.

- If you write to SharePoint Online (e.g. POST method), you'll need to provide an extra header called 'X-RequestDigest' (I haven't tried it).
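
For readers who want to try the two read headers outside DataStage first, here is a minimal Python sketch. It only builds the GET request and does not send it. The function name, the example site URL and list title are hypothetical, and the `_api/web/lists/getbytitle('...')/items` path is an assumption about which SharePoint REST endpoint you want to call; substitute your own.

```python
import urllib.parse
import urllib.request


def build_list_request(site_url, list_title, cookie_header, user_agent):
    """Build (but do not send) a GET request for the items of a SharePoint list."""
    # Assumed endpoint shape; adjust if your site exposes the list differently.
    quoted_title = urllib.parse.quote(list_title)
    url = f"{site_url.rstrip('/')}/_api/web/lists/getbytitle('{quoted_title}')/items"
    request = urllib.request.Request(url, method="GET")
    # The two headers described above as required on every read.
    request.add_header("Cookie", cookie_header)
    request.add_header("User-Agent", user_agent)
    return request


if __name__ == "__main__":
    # Placeholder values only; nothing is sent over the network here.
    req = build_list_request(
        "https://example.sharepoint.com",
        "My List",
        "rtFa=<your rtFa value>; FedAuth=<your FedAuth value>",
        "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)",
    )
    print(req.get_method(), req.full_url)
    for name, value in req.header_items():
        print(f"{name}: {value}")
```

Passing the returned object to `urllib.request.urlopen` would actually send it; that step is left out so the sketch can be run without credentials.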


But then the Question is, how do I get this 'Cookie' thing from DataStage ?

I just used this Java code:
https://gist.github.com/mirontoli/3702971

Edit the code with your Microsoft account login, get a Java compiler (it worked like a charm with Eclipse), run it, and in the output you'll find something like this:

Code: Select all

Set-Cookie: rtFa=0U1zw+TnLmLfDtzmppbuJgD
Set-Cookie: FedAuth=77u/PD94bWfdfW path=/secure; HttpOnly

You can then create a single Cookie header and a User-Agent header that you'll need to pass with each request:

Code: Select all

Cookie: rtFa=0U1zw+TnLmLfDtzmppbuJgD; FedAuth=77u/PD94bWfdfW

User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)
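
If you would rather not assemble the Cookie header by hand, the step can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes each input line is a well-formed `Set-Cookie` line whose first `;`-separated part is the `name=value` pair. Attributes such as `path` and `HttpOnly` are directives for the client and are normally not sent back, so the sketch drops them.

```python
def cookie_header_from_set_cookie(lines):
    """Combine the name=value pairs of several Set-Cookie lines into one Cookie value."""
    pairs = []
    for line in lines:
        text = line.strip()
        # Drop the "Set-Cookie:" prefix if the line still carries it.
        if text.lower().startswith("set-cookie:"):
            text = text[len("set-cookie:"):]
        # The first ';'-separated part is the cookie itself; the rest are attributes.
        pair = text.split(";", 1)[0].strip()
        if pair:
            pairs.append(pair)
    return "; ".join(pairs)


if __name__ == "__main__":
    # Placeholder cookie values, not real tokens.
    output_lines = [
        "Set-Cookie: rtFa=AAA; path=/; HttpOnly",
        "Set-Cookie: FedAuth=BBB; path=/secure; HttpOnly",
    ]
    print("Cookie: " + cookie_header_from_set_cookie(output_lines))
```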

I have tried these two approaches in DataStage:

1) WebService Client Stage

The WSDL you were looking for is here:
https://<yoursite>.sharepoint.com/_vti_bin/Lists.asmx?WSDL

But as usual, it is not well formatted, and DataStage will complain that you have duplicate 'namespaces':
<s:import namespace="http://www.w3.org/2001/XMLSchema" />
<s:import namespace="http://microsoft.com/wsdl/types/" />
I removed the first one and then was able to have all the SOAP 1.1/1.2 Methods recognized by DataStage.
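
If you save the WSDL to a file before importing it, that manual edit could be scripted. The Python sketch below is one possible way, not the method used above (which was done by hand); the function name is hypothetical, and it assumes the import element appears exactly in the self-closing form shown above.

```python
import re

# Matches only the <s:import .../> element for the XML Schema namespace,
# together with its indentation and line ending. The xmlns:s declaration
# that uses the same URI is not touched.
XSD_IMPORT = re.compile(
    r'[ \t]*<s:import\s+namespace="http://www\.w3\.org/2001/XMLSchema"\s*/>[ \t]*\r?\n?'
)


def strip_xsd_import(wsdl_text):
    """Return the WSDL text with the XML Schema <s:import> element removed."""
    return XSD_IMPORT.sub("", wsdl_text)


if __name__ == "__main__":
    # Tiny stand-in for the real WSDL, for illustration only.
    sample = (
        '<s:schema xmlns:s="http://www.w3.org/2001/XMLSchema">\n'
        '  <s:import namespace="http://www.w3.org/2001/XMLSchema" />\n'
        '  <s:import namespace="http://microsoft.com/wsdl/types/" />\n'
        '</s:schema>\n'
    )
    print(strip_xsd_import(sample))
```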

You then need to create, in the WebService Client Stage, an input column that contains the SOAP header with 'Cookie' and 'User-Agent' (this has to be structured in XML format :?)
Read this to see how it is done:
https://www.ibm.com/developerworks/webs ... index.html

But that is just FYI... I never got this approach to work :lol: (authentication fails with a '403 Forbidden' error message). I guess my XML syntax was incorrect. But you might be luckier, and if you are, what's great about this approach is that you'll have all the necessary output columns ready to be used.


2) REST approach with Hierarchical Stage - It worked like a charm !
(available from DataStage 11.3 only! formerly known as XML Stage in 9.1, but you don't have the REST feature in 9.1! ... So not an option for you, until you upgrade to newer DS version... Support for 9.1 expires next year btw :P)

- Used the GET method in the Hierarchical Stage
- Easy to add the 'Cookie' and 'User-Agent' header
BUT, with this approach you'll get the response in HTML or XML format... You'll need to find out how to retrieve the data you're after from all the mess. (I haven't done it yet, but I'm sure it's doable, and maybe the response structure can be found somewhere on the Microsoft website.)
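
As a starting point for that extraction step, here is a hedged Python sketch. It assumes the XML response is an OData-style Atom feed in which each list item is an `entry` whose fields sit under `m:properties`; verify this against your actual response. The sample payload and the `Id`/`Title` fields are invented for illustration, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for an OData Atom feed.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
}


def extract_items(xml_text):
    """Return one dict per feed entry, mapping field name to its text value."""
    root = ET.fromstring(xml_text)
    rows = []
    for props in root.iterfind(".//atom:entry/atom:content/m:properties", NS):
        row = {}
        for field in props:
            # Tags look like "{namespace-uri}Title"; keep only the local name.
            row[field.tag.split("}", 1)[-1]] = field.text
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Invented sample payload, not a captured SharePoint response.
    sample = (
        '<feed xmlns="http://www.w3.org/2005/Atom"'
        ' xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"'
        ' xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">'
        '<entry><content type="application/xml"><m:properties>'
        '<d:Id m:type="Edm.Int32">1</d:Id><d:Title>First item</d:Title>'
        '</m:properties></content></entry>'
        '</feed>'
    )
    for row in extract_items(sample):
        print(row)
```

Inside DataStage the equivalent work would be done with the Hierarchical stage's XML parsing steps; the sketch is only meant to show which part of the response carries the list data under the stated assumption.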


But if your workaround works fine, maybe there's not much value in going through all this trouble again...
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Awesome details, cdp. Thanks!

Ernie