New to DataStage and need advice on a massive file layout

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Post Reply
precisionhealth
Participant
Posts: 1
Joined: Thu Mar 01, 2018 2:56 pm
Location: Canada
Contact:

New to DataStage and need advice on a massive file layout

Post by precisionhealth »

I am very new to DataStage and have a request that I'm not even sure is possible or not in DataStage. I'm hoping someone can give me some general advice to point me in the right direction.

I have been tasked with creating a file extract from an Oracle based system. This extract has a file layout that is 180 printed pages long. The extract is a completely custom format that has a number of groups and sub-groups. For example, some of the record layout is as follows:

PATIENT_VISIT (will be thousands of records per run)
-> PATIENT_INFORMATION (e.g. name, address, SSN, etc.)
-> PROCEDURE_HEADER
-> PROCEDURE (can be many of these)
-> PROCEDURE_STAFF (will be multiple per procedure)
-> BILLING_HEADER
-> CHARGES (can be many)
etc...

At any rate, this request is a beast and I've already written it on another platform but need to convert it to DataStage. There are roughly 60-80 different Oracle tables being used in about 20-30 separate SQL statements that all need to be put together into this massive format.

Some of my questions are:

1. Is this even possible in DataStage?
2. If it is possible, would a Parallel or Server job be better suited?
3. I've been told that developing it all in BASIC in a Server job may work but I don't know how to connect to Oracle from BASIC. Is this even possible?

Being new to DataStage I may not even know if I'm asking the right questions here or not so if I'm not, I'd appreciate being pointed in the right direction.

Thanks,

John
PaulVL
Premium Member
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Welcome to DataStage.



Yes this can be done in DataStage.

I would always go parallel with any new DataStage jobs going forward.

Do not write it in Basic.

Eat the Elephant one bite at a time.

The first thing you have to do is to properly understand how the tables are combined together to formulate your result set.

If the tables are all on the same database, you might just have one magic big SQL on your hands.

Sounds like you will have a lot of JOINS based upon key columns in your tables.

File layouts being 180 pages long... depends on your font. ;)


Given so many tables, you might want to break the job into steps for restart-ability and connection timeout issues.

Get to know the Oracle DBA because you will want to ask him how much transformation would he like on the Oracle box before you extract the data.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I'm wondering how much of this can be done in separate (possibly parallel) chunks that could then be stitched together to create the final product or does it all need to be created in one go? And by 'chunks' I don't necessarily mean file fragments that you concatenate together but I've done similar things in the past where you build logical sets of transformed data and then the last step joins those sets into a cohesive whole that gets dumped to a file.

Curious what your approach was when you wrote it on another platform... perhaps a goodly portion of that methodology could be reused in this tool.
-craig

"You can never have too many knives" -- Logan Nine Fingers
UCDI
Premium Member
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

datastage support calling basic routines, c++ executables, unix shell commands, etc directly. You "can" do everything for all your jobs this way, never actually USING the datastage tool. Its a case of just because you can do something, does not mean its the right way.

Here, let me just say it again, its not the right way :)
DS is a powerful tool and what would take you months in vb is already done in many cases with a drag and drop + a few setup clicks.

By all means, use the basic/C/etc tools when you find something weird that datastage can't do or can't do cleanly. As an example, I recently needed to fix a standardizer block written in datastage's pattern action language by someone no longer around. I needed it done NOW. I am not great at pattern action language, the need was simple .. in 15 min I had a vb routine that did everything the old standardizer did + the new requirements. A couple of iterations with "that was what we said but not what we meant" pulled that to 30 min and done. It would have taken me 2-3 days to do it some other way... its doable in a big messy transformer, its doable in pattern action, or it was 20 lines of VB. So I used VB.

If you run into something you can't do, ask here first. Might learn a new trick. If it still is a no-can-do, or the datastage solution feels clunky/overly complex, then consider an alternate language. But it really should be close to a last resort, as that means maintaining hand written code which is one of the things datastage is good for (elimination of reams of buggy handrolled code).
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If all you need is a file extract, why can't you use Oracle facilities?

That said, DataStage can deal with the complexities or otherwise of the SQL. Build your job a bit at a time, using Oracle Connector stages to retrieve the data from Oracle and appropriate stages (such as Join stages) to assemble them.

Then use DataStage's optimization tool (Balanced Optimization) to see how much of that can be pushed back into the database. Inspect both of those designs, then come up with the best compromise for your particular requirements.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply