Query regarding how PX 7.1 processes records

akash_nitj
Participant
Posts: 27
Joined: Fri Aug 13, 2004 3:36 am
Location: INDIA

Post by akash_nitj »

Hi Datascions,
I have a query regarding how PX internally processes records.

My query is based on the following design:

1. You read input from a table having three columns (table
definition: command_type, timestamp, data).

2. The input is presorted on the timestamp value.

3. Then you have a Transformer stage which has three outputs based
on the command_type value (C/U/D).

4. The data read is passed to the three outputs of the Transformer based on
command_type and finally loaded into table XYZ.

5. All the links load data into the same table XYZ.



QUERY: Will the data be loaded into table XYZ in the same sequence in which it is read (ascending order of timestamp), assuming no record is rejected by any intermediate stage?


My current understanding is that DataStage processes records one by one, i.e. one record is read from the input and written to the output. Parallelism is possible, but even in that case records are written to the output in the same sequence as they appear in the input.

Please validate this, and correct my understanding if it is wrong.


Regards
Akash
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

The answer is:

No.

The real question is:

Is there any particular reason why you MUST have a database table with data in any specific order? This flies against the grain of just about any practical DBMS philosophy I am aware of.

Running on multiple nodes causes partitioning. This breaks up the data, which will most likely be out of order when it is merged back together at the output stage.
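
To illustrate the point, here is a toy Python model of what partitioning does to ordering. It is only a sketch and not how DataStage is implemented: the function names, the two-node setup, the round-robin scheme, and the "read one partition after another" collector are all assumptions made for the example.

```python
# Toy model only: round-robin partitioning across two hypothetical nodes,
# followed by a collector that drains one partition after another.

def round_robin_partition(records, nodes):
    """Deal records out to `nodes` partitions, one at a time."""
    partitions = [[] for _ in range(nodes)]
    for i, rec in enumerate(records):
        partitions[i % nodes].append(rec)
    return partitions

def collect_in_partition_order(partitions):
    """Gather the partitions back into one stream, partition by partition."""
    out = []
    for part in partitions:
        out.extend(part)
    return out

# Input presorted on timestamp: (timestamp, command_type)
records = [(1, "C"), (2, "U"), (3, "D"), (4, "U"), (5, "C"), (6, "D")]

partitions = round_robin_partition(records, nodes=2)
collected = collect_in_partition_order(partitions)
timestamps = [ts for ts, _ in collected]
print(timestamps)  # [1, 3, 5, 2, 4, 6]
```

Each partition is still internally sorted, but the combined stream is not. In a real run the interleaving also depends on timing, so the result may differ from run to run.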

If you MUST have the data sorted on loading to a database table, then explicitly sort it within the stage (or attach a sort stage before the output stage.) This is the only way to be sure.
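
As a rough sketch of that idea in Python (again a toy model, not DataStage code): the three lists below stand in for the C/U/D output links from the original design, and the variable names are made up for the example. Writing the links one after another loses the global timestamp order, while merging them on timestamp just before the load restores it.

```python
import heapq
from operator import itemgetter

# Hypothetical contents of the three Transformer output links,
# each still in timestamp order: (timestamp, command_type)
link_c = [(1, "C"), (5, "C")]
link_u = [(2, "U"), (4, "U")]
link_d = [(3, "D"), (6, "D")]

# Loading link by link: each link is ordered, the table load is not.
unsorted_load = link_c + link_u + link_d
print([ts for ts, _ in unsorted_load])  # [1, 5, 2, 4, 3, 6]

# Explicit sort step before the output: merge the sorted links on timestamp.
sorted_load = list(heapq.merge(link_c, link_u, link_d, key=itemgetter(0)))
print([ts for ts, _ in sorted_load])    # [1, 2, 3, 4, 5, 6]
```

The merge only works here because each link is already sorted; if that cannot be assumed, a full sort on timestamp before the output is the safer choice.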

I would advise that you reconsider the sort requirement; that is, focus on ordering only when you are presenting data for human viewing (or for other purposes that require it, such as aggregation, et cetera).