Reading xls using Datastage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Reading xls using Datastage

Post by SURA »

Hi

We have to start a new project to read the xls file and load the data into a sql server table .

The challenges I have is

1. ETL need to read the data from 3,4,5...N sheets in a file .
2. in each sheet need to pull some specific range cell values, not the whole file !
3. Other known issue we had was, if the data in the first 15 rows is integer and then date: in such case irrespective of the data type the odbc driver scan and consider it as an int and convert the date as int.

One person is recommending to use SSIS!

I can understand the 11X can do better job than 8.7. But with 8.7 what is the best way to handle it!!

Please throw some light !
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
UCDI
Premium Member
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

I know 11x can read excel directly, but yours may not have that. The stage that can do it is the unstructured data stage.

Two things you should know:
1) excel is capable of talking to a database; you can load data from it directly.
2) if your version does not have the stage, you can export the excel as CSV which is plain text, comma delimited and use that in datastage very easily.

You may need to change the column data formats in the excel before you export it, but the tasks you list all sound possible.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Oh, they are all possible but you'll have all of the "normal" PITA problems we all have with using Excel as a source... as noted, one awesome issue is how it loves to turn things into numbers. And the people building the spreadsheets need to be very disciplined and consistent, something I have yet to see. :( The problems are worse when you don't have something like the Unstructured Data stage to help. From what I recall, the earliest version of that stage showed up in the 9.1 release.

As noted, on a Windows server Excel can talk directly to SQL Server so there may not be a need to involve DataStage until Excel has "staged" it and then you can do the specific ranges / not the whole file bits from there via ETL. Or use the ODBC driver or worst case a csv file per sheet.

Did I mention I hate having to use Excel as a source? :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

chulett wrote:Oh, they are all possible but
Thank you so much for your reply.

Is that means in 8.7 we can do the range selection ?
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

Thank you UCDI
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

SURA wrote:Is that means in 8.7 we can do the range selection ?
You'd have to answer that... seems to me the answer is "yes" only if that functionality is provided by the ODBC drivers. Failing that you stage the whole file and then from the staging area pull just the data elements / ranges / rows you need.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Using the ODBC driver for Microsoft Excel, each worksheet must be treated as a separate "table" (and there are some restrictions on the format). When importing the table definitions, you must specify to include system tables, because the driver treats each worksheet as a system table for some reason.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

I written a PowerShell script to store the data as csv and used Datastage to load.

Thanks to all for your help and suggestions.
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
Post Reply