How to use Copy stage Force property

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


Enzopre
Participant
Posts: 57
Joined: Thu Feb 07, 2013 2:04 pm
Location: Italy


Post by Enzopre »

Hi all,

I have a question about the Copy stage.

From the documentation, it looks like if the Force property is set to TRUE (FORCE=TRUE), DataStage does not optimize the job by removing the copy operator at compile time, whereas if the Force property is set to FALSE (FORCE=FALSE), DataStage can decide whether or not to optimize the job by removing the copy operator.

Now, what is not clear is when we should set FORCE=TRUE or FORCE=FALSE, and why. More precisely, what happens, and why, if:

1) we have one input link, one output link AND FORCE=TRUE

2) we have one input link, one output link AND FORCE=FALSE

3) we have one input link, more than one output link AND FORCE=TRUE

4) we have one input link, more than one output link AND FORCE=FALSE

Regards.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Simple enough to test that yourself and see.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Enzopre
Participant
Posts: 57
Joined: Thu Feb 07, 2013 2:04 pm
Location: Italy

Post by Enzopre »

Where, and what exactly, should I look at?

I have already tested all four cases and, for each case, looked at the generated OSH script and the job score, but I do not see any changes. The copy operator is present in all four cases.
ssreeni3
Participant
Posts: 29
Joined: Fri May 18, 2012 1:35 am

Post by ssreeni3 »

How many outputs?
One or many?

-----------------------
Thanks,
Ssreeni3
Enzopre
Participant
Posts: 57
Joined: Thu Feb 07, 2013 2:04 pm
Location: Italy

Post by Enzopre »

What do you mean?
As I already said, I tested all four cases mentioned above and I do not see any changes... the copy operator is present in all four cases.
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

You have to look at the job score to see the difference. If you have multiple outputs from the Copy stage, you may not find any difference.

Also, the optimizer will not change your job design, only the way it executes, so the job score is where you should look.
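A quick way to act on this is to pull the operator list straight out of the score text. The sketch below uses hypothetical helper names (`score_operators`, `stage_in_score`, not DataStage APIs) and assumes the plain score layout posted later in this thread, with one `opN[Np] {(mode name)` header per operator; other layouts (for example when operators are combined) are not handled. If the optimizer had removed a Copy stage, one would expect no operator for it in this list.

```python
import re

# Operator section of the score dump posted later in this thread (case 1, FORCE=FALSE).
SCORE = """\
It has 3 operators:

op0[1p] {(sequential Sequential_File_1)
    on nodes (
      node1[op0,p0]
    )}

op1[2p] {(parallel Copy_0)
    on nodes (
      node1[op1,p0]
      node2[op1,p1]
    )}

op2[1p] {(sequential APT_RealFileExportOperator in Sequential_File_2)
    on nodes (
      node2[op2,p0]
    )}

It runs 4 processes on 2 nodes.
"""

# Matches operator header lines such as "op1[2p] {(parallel Copy_0)".
# Dataset lines ("ds0: {op0[1p] (...") do not match because of the ^ anchor and "{(".
OP_HEADER = re.compile(r"^(op\d+)\[(\d+)p\] \{\((sequential|parallel) (.+)\)$", re.MULTILINE)


def score_operators(score: str) -> list[tuple[str, int, str, str]]:
    """Return (operator id, partition count, mode, name) for each operator header."""
    return [(op, int(parts), mode, name)
            for op, parts, mode, name in OP_HEADER.findall(score)]


def stage_in_score(score: str, stage: str) -> bool:
    """True if an operator is named after the stage or runs 'in' that stage."""
    return any(name == stage or name.endswith(" in " + stage)
               for _, _, _, name in score_operators(score))


for entry in score_operators(SCORE):
    print(entry)
print("Copy_0 in score:", stage_in_score(SCORE, "Copy_0"))
```

In this dump the partition counts (1 + 2 + 1) add up to the 4 processes the score reports, which is one more way to see that `Copy_0` is really running.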
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

Also, you might need to set APT_DUMP_SCORE to see the job score; it makes the score visible in the job log.
Priyadarshi Kunal

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Just looking at the row counts ("performance statistics") in Designer will give you a clue. If there are row counts on both sides of the Copy stage, then it is both in the design and in the runtime.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Enzopre
Participant
Posts: 57
Joined: Thu Feb 07, 2013 2:04 pm
Location: Italy

Post by Enzopre »

So, I have tried to recompile and rerun in the following two cases:

1) One input link, one output link AND FORCE=FALSE

in this case the OSH script generated at compile time is the following:


# OSH / orchestrate script for Job Copy_Stage compiled at 09.54.30 23 apr 2013

#################################################################
#### STAGE: Copy_0
## Operator
copy
## General options
[ident('Copy_0'); jobmon_ident('Copy_0')]
## Inputs
0< [] 'Sequential_File_1:DSLink3.v'
## Outputs
0> [modify (
keep
  CUSTOMER_NUMBER,DATE_1,DATE_2,DATE_3;
)] 'Copy_0:DSLink4.v'
;


#################################################################
#### STAGE: Sequential_File_1
## Operator
import
## Operator options
-schema record
  {final_delim=end, record_delim_string='\r\n', delim=',', quote=double}
(
  CUSTOMER_NUMBER:string[max=10];
  DATE_1:date;
  DATE_2:date;
  DATE_3:date;
)
-file  'C:\\DS_DISK\\BI_CORSO\\TutorialIBMVin\\DataStage87TutorialFiles\\GlobalCo_BillTo_CONVDATE.txt'
-rejects continue
-reportProgress yes
## General options
[ident('Sequential_File_1'); jobmon_ident('Sequential_File_1')]
## Outputs
0> [] 'Sequential_File_1:DSLink3.v'
;


#################################################################
#### STAGE: Sequential_File_2
## Operator
export
## Operator options
-schema record
  {final_delim=end, record_delim_string='\r\n', delim=',', quote=double}
(
  CUSTOMER_NUMBER:string[max=10];
  DATE_1:date;
  DATE_2:date;
  DATE_3:date;
)
-file 'C:\\DS_DISK\\BI_CORSO\\TutorialIBMVin\\copy_stage.txt'
-overwrite
-rejects continue
## General options
[ident('Sequential_File_2'); jobmon_ident('Sequential_File_2')]
## Inputs
0< [] 'Copy_0:DSLink4.v'
;
# End of OSH code
and the Job SCORE DUMP generated at run time is the following:



main_program: This step has 2 datasets:

ds0: {op0[1p] (sequential Sequential_File_1)
      eAny<>eCollectAny
      op1[2p] (parallel Copy_0)}

ds1: {op1[2p] (parallel Copy_0)
      >>eCollectAny
      op2[1p] (sequential APT_RealFileExportOperator in Sequential_File_2)}

It has 3 operators:

op0[1p] {(sequential Sequential_File_1)
    on nodes (
      node1[op0,p0]
    )}


op1[2p] {(parallel Copy_0)
    on nodes (
      node1[op1,p0]
      node2[op1,p1]
    )}


op2[1p] {(sequential APT_RealFileExportOperator in Sequential_File_2)
    on nodes (
      node2[op2,p0]
    )}

It runs 4 processes on 2 nodes.
2) One input link, one output link AND FORCE=TRUE

in this case the OSH script generated at compile time is the following:


# OSH / orchestrate script for Job Copy_Stage compiled at 10.00.35 23 apr 2013
#################################################################
#### STAGE: Copy_0
## Operator
copy
## Operator options
-force
## General options
[ident('Copy_0'); jobmon_ident('Copy_0')]
## Inputs
0< [] 'Sequential_File_1:DSLink3.v'
## Outputs
0> [modify (
keep
  CUSTOMER_NUMBER,DATE_1,DATE_2,DATE_3;
)] 'Copy_0:DSLink4.v'
;


#################################################################
#### STAGE: Sequential_File_1
## Operator
import
## Operator options
-schema record
  {final_delim=end, record_delim_string='\r\n', delim=',', quote=double}
(
  CUSTOMER_NUMBER:string[max=10];
  DATE_1:date;
  DATE_2:date;
  DATE_3:date;
)
-file  'C:\\DS_DISK\\BI_CORSO\\TutorialIBMVin\\DataStage87TutorialFiles\\GlobalCo_BillTo_CONVDATE.txt'
-rejects continue
-reportProgress yes
## General options
[ident('Sequential_File_1'); jobmon_ident('Sequential_File_1')]
## Outputs
0> [] 'Sequential_File_1:DSLink3.v'
;


#################################################################
#### STAGE: Sequential_File_2
## Operator
export
## Operator options
-schema record
  {final_delim=end, record_delim_string='\r\n', delim=',', quote=double}
(
  CUSTOMER_NUMBER:string[max=10];
  DATE_1:date;
  DATE_2:date;
  DATE_3:date;
)
-file 'C:\\DS_DISK\\BI_CORSO\\TutorialIBMVin\\copy_stage.txt'
-overwrite
-rejects continue
## General options
[ident('Sequential_File_2'); jobmon_ident('Sequential_File_2')]
## Inputs
0< [] 'Copy_0:DSLink4.v'
;
# End of OSH code
and the Job SCORE DUMP generated at run time is the following:


main_program: This step has 2 datasets:

ds0: {op0[1p] (sequential Sequential_File_1)
      eAny<>eCollectAny
      op1[2p] (parallel Copy_0)}

ds1: {op1[2p] (parallel Copy_0)
      >>eCollectAny
      op2[1p] (sequential APT_RealFileExportOperator in Sequential_File_2)}

It has 3 operators:

op0[1p] {(sequential Sequential_File_1)
    on nodes (
      node1[op0,p0]
    )}

op1[2p] {(parallel Copy_0)
    on nodes (
      node1[op1,p0]
      node2[op1,p1]
    )}

op2[1p] {(sequential APT_RealFileExportOperator in Sequential_File_2)
    on nodes (
      node2[op2,p0]
    )}

It runs 4 processes on 2 nodes.
Apart from the -force option that FORCE=TRUE adds to the copy operator, the OSH scripts in cases 1) and 2) are the same, and the score dumps in cases 1) and 2) show no differences at all.
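The comparison of the two scripts can be done mechanically with a line-level diff of the Copy_0 sections. This sketch uses Python's standard `difflib.ndiff` on the stage text copied from the two OSH scripts posted here; `osh_changes` is only an illustrative helper name. For these two sections, the only lines it reports are the `## Operator options` comment and `-force`, and nothing is removed.

```python
import difflib

# Copy_0 stage section from the OSH generated with FORCE=FALSE (case 1).
COPY_FORCE_FALSE = """\
#### STAGE: Copy_0
## Operator
copy
## General options
[ident('Copy_0'); jobmon_ident('Copy_0')]
## Inputs
0< [] 'Sequential_File_1:DSLink3.v'
## Outputs
0> [modify (
keep
  CUSTOMER_NUMBER,DATE_1,DATE_2,DATE_3;
)] 'Copy_0:DSLink4.v'
;"""

# Same section from the OSH generated with FORCE=TRUE (case 2).
COPY_FORCE_TRUE = """\
#### STAGE: Copy_0
## Operator
copy
## Operator options
-force
## General options
[ident('Copy_0'); jobmon_ident('Copy_0')]
## Inputs
0< [] 'Sequential_File_1:DSLink3.v'
## Outputs
0> [modify (
keep
  CUSTOMER_NUMBER,DATE_1,DATE_2,DATE_3;
)] 'Copy_0:DSLink4.v'
;"""


def osh_changes(before: str, after: str) -> tuple[list[str], list[str]]:
    """Return (lines only in `after`, lines only in `before`) from a line-level ndiff."""
    diff = list(difflib.ndiff(before.splitlines(), after.splitlines()))
    added = [line[2:] for line in diff if line.startswith("+ ")]
    removed = [line[2:] for line in diff if line.startswith("- ")]
    return added, removed


added, removed = osh_changes(COPY_FORCE_FALSE, COPY_FORCE_TRUE)
print("only with FORCE=TRUE: ", added)
print("only with FORCE=FALSE:", removed)
```

The score itself carries no trace of the Force setting, so when the optimizer keeps the copy either way (as here), the two scores come out identical.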

Also, in the "performance statistics" I do not see any difference at all!

So, what is the difference between setting the FORCE property to TRUE (FORCE=TRUE) and setting it to FALSE (FORCE=FALSE)?

From the documentation, it looks like if the Force property is set to TRUE (FORCE=TRUE), DataStage does not optimize the job by removing the copy operator at compile time, whereas if the Force property is set to FALSE (FORCE=FALSE), DataStage can decide whether or not to optimize the job by removing the copy operator.


I have tried to recompile and rerun many times, but the result is always the same: the copy operator is always present!
Last edited by Enzopre on Tue Apr 23, 2013 2:07 pm, edited 1 time in total.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So, as you are finding, having that property set to FALSE just means that the Copy Stage may be optimized out - not that it will be. Unfortunately, I can't tell you under what exact circumstances it will decide it's not really needed and remove it... perhaps others can. Or you could open a support case and see if you can find out what the official rules are for Copy Stage removal, or suggested best practices on when you would set FORCE=TRUE.
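The documented rule can be written down as an eligibility check for the four cases in the original question. This is a minimal sketch with a hypothetical function name, assuming the documented condition (one input link, one output link, Force not set). Eligibility is necessary but not sufficient, so the function cannot predict whether the optimizer will actually drop the stage. For the multiple-output cases, the Copy stage sends rows to several links, so it is assumed never to be a candidate for removal.

```python
def copy_may_be_optimized_out(n_inputs: int, n_outputs: int, force: bool) -> bool:
    """Eligibility only: True means the optimizer MAY remove the Copy, not that it will."""
    # FORCE=TRUE always keeps the stage; otherwise only a 1-in/1-out copy is a candidate.
    return not force and n_inputs == 1 and n_outputs == 1


# The four cases from the original question; 2 stands in for "more than one output link".
CASES = [
    ("1) one input, one output, FORCE=TRUE", 1, 1, True),
    ("2) one input, one output, FORCE=FALSE", 1, 1, False),
    ("3) one input, more outputs, FORCE=TRUE", 1, 2, True),
    ("4) one input, more outputs, FORCE=FALSE", 1, 2, False),
]

for label, n_in, n_out, force in CASES:
    verdict = "may be optimized out" if copy_may_be_optimized_out(n_in, n_out, force) else "kept"
    print(f"{label}: {verdict}")
```

Only case 2 is eligible, and the job tested in this thread is an example of an eligible copy that the optimizer kept anyway.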
-craig
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Is the Copy stage doing anything, such as dropping or renaming columns?
Enzopre
Participant
Posts: 57
Joined: Thu Feb 07, 2013 2:04 pm
Location: Italy

Post by Enzopre »

chulett wrote:So, as you are finding, having that property set to FALSE just means that the Copy Stage may be optimized out - not that it will be. .......
Now it's all clearer. This is what I had thought. Indeed, from the documentation:

[....] if the Force property is set to FALSE (FORCE=FALSE), DataStage can decide whether or not to optimize the job by removing the copy operator.
ray.wurlod wrote:Is the Copy stage doing anything, such as dropping or renaming columns?
Not at all! The Copy stage simply copies the data from Sequential_File_1 to Sequential_File_2. You can see this in the OSH scripts!
Last edited by Enzopre on Tue Apr 23, 2013 3:54 pm, edited 1 time in total.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Enzopre wrote:Indeed, from the documentation:
We know, you've now posted that paraphrased section of the documentation three times in the thread. :wink:
-craig
eph
Premium Member
Posts: 110
Joined: Mon Oct 18, 2010 10:25 am

Post by eph »

Hi,

Try this example, which will show you when DS optimizes the copy operation and when it does not.

I suppose that your copy operator is not optimized out because it is partitioning the data (it is the only parallel stage in your job) and it is the only active stage?
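The posted score can be read to check this supposition. In the case 1 dump, `ds0` goes from a 1-partition sequential operator into the 2-partition `Copy_0`, so the data is partitioned on the Copy stage's input, and `ds1` collects it back into one partition for the export. The sketch below is a hypothetical helper (`classify_datasets` is an illustrative name) that reads this off the dataset entries by comparing partition counts. It assumes the exact `dsN: {opN[Np] (mode name) link opN[Np] (mode name)}` layout shown in this thread, and it does not by itself prove why the optimizer kept the copy.

```python
import re

# Dataset section of the case 1 score dump posted earlier in this thread.
SCORE_DATASETS = """\
ds0: {op0[1p] (sequential Sequential_File_1)
      eAny<>eCollectAny
      op1[2p] (parallel Copy_0)}

ds1: {op1[2p] (parallel Copy_0)
      >>eCollectAny
      op2[1p] (sequential APT_RealFileExportOperator in Sequential_File_2)}
"""

# Groups: dataset id, producer partitions, producer name, link notation,
#         consumer partitions, consumer name.
DATASET = re.compile(
    r"(ds\d+): \{op\d+\[(\d+)p\] \((?:sequential|parallel) ([^)]+)\)\s+(\S+)\s+"
    r"op\d+\[(\d+)p\] \((?:sequential|parallel) ([^)]+)\)\}"
)


def classify_datasets(score: str) -> list[tuple[str, str, str, str]]:
    """Return (dataset, producer, consumer, kind) based on partition counts only."""
    result = []
    for ds, p_from, producer, _link, p_to, consumer in DATASET.findall(score):
        if int(p_from) < int(p_to):
            kind = "partitioned"  # fan-out: rows are spread over more partitions
        elif int(p_from) > int(p_to):
            kind = "collected"    # fan-in: partitions are merged into fewer
        else:
            kind = "same partition count"
        result.append((ds, producer, consumer, kind))
    return result


for row in classify_datasets(SCORE_DATASETS):
    print(row)
```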

Also check this answer from Ray.

Eric
Enzopre
Participant
Posts: 57
Joined: Thu Feb 07, 2013 2:04 pm
Location: Italy

Post by Enzopre »

eph wrote:[....]

I suppose that your copy operator is not optimized out because it is partitioning the data (it is the only parallel stage in your job) and it is the only active stage?

[....]
Yes, exactly! Anyway, thanks for the examples you suggested.

I'll let you know...