Parallel Load Jobs

Posted: Wed Jul 07, 2004 3:02 pm
by nag0143
Hi All,

Since the DataStage Server bulk loader doesn't support DB2 EEE, our loads are huge daily jobs, and we don't want to use the load utility, we are left with the options of using Parallel Extender for the load jobs or waiting until DS 7.5...

In your experience, is it okay to use Server jobs to extract and transform, and Parallel jobs to load? What are the trade-offs, and what do you recommend?

Thanks
Nag

Posted: Wed Jul 07, 2004 3:21 pm
by crouse
Why don't you want to use the load utilities, called from a shell script in an after job in Server?
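For illustration, a minimal sketch of such an after-job script (the file path, table name, and message-file location are made-up placeholders, not from this thread); a Server job could call it via the ExecSH after-job subroutine, passing job parameters as arguments:

```shell
#!/bin/sh
# Hypothetical after-job load script -- data file and target table are
# placeholders. A Server job can invoke this via the ExecSH after-job
# subroutine, substituting job parameters for the arguments.

build_load_cmd() {
    datafile="$1"
    table="$2"
    # db2's LOAD utility reads a delimited file straight into the table;
    # MESSAGES captures warnings and rejected rows for later inspection
    echo "db2 \"LOAD FROM $datafile OF DEL MESSAGES /tmp/${table}.msg INSERT INTO $table\""
}

# Build (and, in a real run, execute) the load for the staged extract file
cmd=$(build_load_cmd "${1:-/data/stage/customer.dat}" "${2:-STAGE.CUSTOMER}")
echo "$cmd"   # a real script would: eval "$cmd" and check the exit status
```

Keeping the command construction in one function makes the script easy to parameterize per table, which is most of the "maintenance" anyway.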

Shouldn't be any issues with doing extract and transform in Server and load in Parallel.

The only real issues are the learning curve and possible confusion between how Server works and how Parallel works.

Some folks experience frustration once they get used to hash files in Server, then try to do the same thing in Parallel.

And as you've probably experienced, the extract and transform are "easier" and more flexible in Server... or else you'd be doing it all in Parallel. :D

My $.02

Posted: Wed Jul 07, 2004 5:18 pm
by nag0143
crouse wrote:Why don't you want to use the load utilities, called from a shell script in an after job in Server?
My Unix admin says it's a big maintenance issue if I want to call or use shell scripts from a Server job...

But I believe that if we use Server and Parallel for the same flow, we might be misusing resources...

Is my point justified? Do you have any other ideas?

Thanks
Nag

Posted: Wed Jul 07, 2004 9:08 pm
by kcbland
You are fine to mix and match a series of server and parallel jobs. In fact, using PX jobs for extracting and loading to/from DB2/EEE has the advantage of directly accessing the method by which data is partitioned and bypasses the coordinator node for faster data access.

If you have separate jobs for Extract, Transform, and Load, then the E and L can be PX jobs and the T can be Server jobs. This works really well, and from job control's perspective there is no difference.
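As a sketch of what that mixed sequence could look like driven from a shell wrapper (the project and job names here are invented; `dsjob` is the command-line client that ships with DataStage):

```shell
#!/bin/sh
# Hypothetical E/T/L sequence -- project and job names are made up.
# dsjob -run -jobstatus blocks until the job finishes and returns a
# nonzero exit status if the job did not finish OK.
PROJECT=DWPROJ

run_job() {
    dsjob -run -jobstatus "$PROJECT" "$1" || { echo "$1 failed" >&2; return 1; }
}

run_sequence() {
    run_job pxExtractCustomer &&    # PX job: partitioned extract from DB2/EEE
    run_job svTransformCustomer &&  # Server job: transforms, hashed-file lookups
    run_job pxLoadCustomer          # PX job: load direct to the data nodes
}
```

The `&&` chaining stops the sequence at the first failed job, which is the same short-circuit behavior you'd build into a job-control routine.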

Posted: Wed Jul 07, 2004 9:18 pm
by chulett
nag0143 wrote: My Unix admin says it's a big maintenance issue if I want to call or use shell scripts from a Server job...
Huh? :? I'm unsure what the 'big maintenance issue' would be. This is a very common thing - server jobs leveraging shell scripts. We routinely use scripts to manipulate files, do bulk loads, handle FTP and SQL*Plus, you name it. Your shell scripts and the jobs that use them can be parameterized to minimize any perceived maintenance issues. Version Control can be used to both version and promote them.

I'm not commenting on the whole mix of Server and PX, I'll leave that to others. On the issue of using shell scripts in Server jobs, however... :wink:

Posted: Wed Jul 07, 2004 9:43 pm
by kcbland
The problem with a bulk load versus PX is the coordinator node. In a partitioned database like DB2 EEE, you have a series of clustered servers (nodes) that each contain a portion of the data for partitioned tables. ALL data has to go through the coordinator node, which decides where each row will reside.

Using a bulk loader on DB2 has serious issues, such as locking the tablespace and choking at the coordinator node. PX, however, has intelligence in it that allows it to "act" as the coordinator: it bypasses the coordinator, directly accesses the nodes where the data will reside, and avoids the limitations imposed by the bulk loader.

Sooooo, this "mixed-bag" approach is the best of both the Server and PX worlds. It gives you the flexibility of Server jobs and the power of PX jobs for E and L. If you're running PX on the same nodes as the database, you get this wonderful contention for CPU time. However, if you're running Server on an isolated node, you can get a lot of performance and still use PX as the database communicator. It just sucks how nasty PX E and L jobs are to set up when Server E and L jobs are so easy and intuitive.