Parallel Load Jobs

nag0143
Premium Member
Posts: 159
Joined: Fri Nov 14, 2003 1:05 am

Parallel Load Jobs

Post by nag0143 »

Hi All,

Since the DataStage Server bulk loader doesn't support DB2 EEE, and our loads are huge daily jobs, we don't want to use the load utility. That leaves us with only the option of using Parallel Extender for the load jobs, or waiting until DataStage 7.5.

In your experience, is it okay to use server jobs to extract and transform, and parallel jobs for the load? What are the trade-offs, and what do you recommend?

Thanks
Nag
crouse
Charter Member
Posts: 204
Joined: Sun Oct 05, 2003 12:59 pm

Post by crouse »

Why don't you want to use the load utilities, called from a shell script in an after-job routine in Server?
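
Something like this as the after-job script would do it. A rough sketch only; the database, table, and file names below are made up, and you'd want to check the LOAD options for your own setup:

    #!/bin/ksh
    # Rough sketch of an after-job bulk load script -- all names are placeholders.
    # Connect, load the delimited extract the Server job just wrote, disconnect.
    db2 connect to DWDB || exit 1

    db2 "LOAD FROM /data/extract/orders.del OF DEL
         MESSAGES /data/logs/orders_load.msg
         INSERT INTO DW.ORDERS" || exit 1

    db2 terminate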

Shouldn't be any issues with doing extract and transform in Server and load in Parallel.

The only likely issues are the learning curve and possible confusion about the differences between how Server and Parallel work.

Some folks experience frustration once they get used to hashed files in Server and then try to do the same thing in Parallel.

And as you've probably experienced, the extract and transform are "easier" and more flexible in Server... or else you'd be doing it all in Parallel. :D

My $.02
Craig Rouse
Griffin Resources, Inc
www.griffinresources.com
nag0143
Premium Member
Posts: 159
Joined: Fri Nov 14, 2003 1:05 am

Post by nag0143 »

crouse wrote:Why don't you want to use the load utilities, called from a shell script in an after-job routine in Server?
My Unix admin says it's a big maintenance issue if I want to call or use shell scripts from a Server job.

But I believe that if we use Server and Parallel for the same flow, we might be misusing resources.

Is my point justified? Do you have any other ideas?

Thanks
Nag
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

You are fine to mix and match a series of server and parallel jobs. In fact, using PX jobs for extracting from and loading to DB2/EEE has the advantage of directly using the method by which the data is partitioned, bypassing the coordinator node for faster data access.

If you have a separate job for each of Extract, Transform, and Load, then the E and L can be PX jobs and the T can be Server jobs. This works really well, and from job control's point of view there is no difference.
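
As a rough sketch of what that job control could look like from the command line (the project and job names here are invented), the dsjob utility runs all three the same way regardless of job type:

    #!/bin/ksh
    # Sketch of a mixed Server/PX run sequence -- project and job names are made up.
    PROJ=dw_project

    # -jobstatus waits for each job and sets the exit status from its finishing status.
    dsjob -run -jobstatus $PROJ px_extract_orders || exit 1   # PX extract (E)
    dsjob -run -jobstatus $PROJ srv_xform_orders  || exit 1   # Server transform (T)
    dsjob -run -jobstatus $PROJ px_load_orders    || exit 1   # PX load (L)
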
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

nag0143 wrote:My Unix admin says it's a big maintenance issue if I want to call or use shell scripts from a Server job.
Huh? :? I'm unsure what the 'big maintenance issue' would be. This is a very common thing - server jobs leveraging shell scripts. We routinely use scripts to manipulate files, do bulk loads, handle FTP and SQL*Plus, you name it. Your shell scripts and the jobs that use them can be parameterized to minimize any perceived maintenance issues. Version Control can be used to both version and promote them.
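
For example, a parameterized loader script might be no more than this - the script name, arguments, and job parameter names are all hypothetical:

    #!/bin/ksh
    # bulk_load.sh -- hypothetical parameterized load wrapper.
    # A Server job could invoke it from an after-job ExecSH as, say:
    #   bulk_load.sh #DBName# #TableName# #ExtractFile#
    DB=$1; TABLE=$2; FILE=$3

    db2 connect to "$DB" || exit 1
    db2 "LOAD FROM $FILE OF DEL INSERT INTO $TABLE" || exit 1
    db2 terminate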

I'm not commenting on the whole mix of Server and PX, I'll leave that to others. On the issue of using shell scripts in Server jobs, however... :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

The problem with a bulk load versus PX is the coordinator node. In a partitioned database like DB2, you have a series of clustered servers (nodes) that each contain a portion of the data for partitioned tables. ALL data has to go through the coordinator node, which decides where each row will reside.

Using a bulk loader on DB2 has serious issues, such as locking the tablespace and bottlenecking at the coordinator node. PX, however, has intelligence that allows it to "act" as the coordinator without going through it: it determines the target partitions itself and writes directly to the nodes where the data will reside, without the limitations imposed by the bulk loader.

Sooooo, this "mixed-bag" approach is the best of both the Server and PX worlds. It gives you the flexibility of Server jobs and the power of PX jobs for E and L. If you're running PX on the same nodes as the database, you get this wonderful contention for CPU time. However, if you're running Server on an isolated node, you can get a lot of performance and still use PX as the database communicator. It just sucks how nasty PX E and L jobs are to set up when Server E and L jobs are so easy and intuitive.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle