DSXchange: DataStage and IBM Websphere Data Integration Forum
This topic has been marked "Resolved."
sandhya.budhi
Posted: Wed Apr 25, 2018 9:05 pm

DataStage® Release: 11x
Job Type: Parallel
OS: Unix
I have a requirement to replace a string with another string in a text file. The string appears on different lines, and I need to replace every occurrence in the file.

The text file is a mainframe JCL template.

//XXXXXX &PGMCHAR JOB (A1,xx,A123456),'TEST',REGION=0M,
// MSGCLASS=4
//*******************************************************
//DELET EXEC UNCAT
//SYSIN DD *
DELETE DSN=TEST.FILE. &PLDSN
DELETE DSN=TEST.FILE. &PLDSN .DONE.TXT
/*
//PGM1 EXEC E1234,PROGRAM=PGM1
//GO.FILEIN DD *
&PLIDO&PLDATE
/*
//BACKUP DD DUMMY
//RATEIN DD DUMMY
//GO.FILEOUT DD DUMMY
//*

The strings PGMCHAR, PLID, PLDATE and PLDSN are the ones that need to be replaced. Their values will come from a separate data file.

PGMCHAR PLID PLDATE PLDSN
4 123 20180425 1234
4 456 20180425 4567

The final output file should look like this

//XXXXXX 4 JOB (A1,xx,A123456),'TEST',REGION=0M,
// MSGCLASS=4
//*******************************************************
//DELET EXEC UNCAT
//SYSIN DD *
DELETE DSN=TEST.FILE. 1234
DELETE DSN=TEST.FILE. 1234 .DONE.TXT
/*
//PGM1 EXEC E1234,PROGRAM=PGM1
//GO.FILEIN DD *
123O20180425
/*
//BACKUP DD DUMMY
//RATEIN DD DUMMY
//GO.FILEOUT DD DUMMY
//*

//XXXXXX 4 JOB (A1,xx,A123456),'TEST',REGION=0M,
// MSGCLASS=4
//*******************************************************
//DELET EXEC UNCAT
//SYSIN DD *
DELETE DSN=TEST.FILE. 4567
DELETE DSN=TEST.FILE. 4567 .DONE.TXT
/*
//PGM1 EXEC E1234,PROGRAM=PGM1
//GO.FILEIN DD *
456O20180425
/*
//BACKUP DD DUMMY
//RATEIN DD DUMMY
//GO.FILEOUT DD DUMMY
//*

For every record in the data file the JCL template should be repeated.

Can we implement this using DataStage, or should I go completely with UNIX scripting?

My architect suggested doing it in DataStage, and I am looking for suggestions on how to replace different strings in a text file.

_________________
Thanks,
Sandhya
ray.wurlod
Posted: Wed Apr 25, 2018 11:39 pm

I'd use a UNIX tool such as sed or awk. It could be done with DataStage (using Replace() or Change() function) but that seems to me to be overkill. This is precisely the kind of task tha ...
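
A minimal sketch of the sed approach, for illustration only. It assumes the template and the whitespace-delimited values file (with a header row) look like the samples in the question; the function names `escape_repl` and `expand_template` and any file names are hypothetical. Because `/`, `&` and backslash are special in a sed replacement, the values are escaped first.

```shell
#!/bin/sh
# Sketch: expand a JCL template once per data record using sed.
# Function names are hypothetical; the column order follows the sample data file.

# Escape the characters that are special in a sed replacement: \ / &
escape_repl() {
    printf '%s' "$1" | sed -e 's,[\\/&],\\&,g'
}

# Usage: expand_template TEMPLATE_FILE VALUES_FILE
expand_template() {
    template=$1
    values=$2
    # Skip the header row, then read one record per line.
    tail -n +2 "$values" | while read -r pgmchar plid pldate pldsn; do
        # Ignore empty lines in the values file.
        [ -n "$pgmchar" ] || continue
        sed -e "s/&PGMCHAR/$(escape_repl "$pgmchar")/g" \
            -e "s/&PLDSN/$(escape_repl "$pldsn")/g" \
            -e "s/&PLDATE/$(escape_repl "$pldate")/g" \
            -e "s/&PLID/$(escape_repl "$plid")/g" \
            "$template"
        # Blank line between the generated copies.
        echo
    done
}
```

It could be called as `expand_template jcl_template.txt values.txt > all_jobs.jcl`, where all three file names are placeholders.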

_________________
RXP Services Ltd
Melbourne | Canberra | Sydney | Hong Kong | Hobart | Brisbane
currently hiring: Canberra, Sydney and Melbourne (especially seeking good business analysts)
chulett
Posted: Thu Apr 26, 2018 6:47 am

IMHO the only "pro" that DataStage would bring to the table would be the ability to handle the table-driven substitution values. And I'd use a Server job.

_________________
-craig

Research shows that 6 out of 7 dwarves aren't happy
FranklinE
Posted: Thu Apr 26, 2018 7:36 am

I strongly caution that, since this is JCL -- job code that will be executed by the operating system, not data to be processed (at this level) -- DataStage is a bad choice for accomplishing your task.

In JCL -- with a strong parallel in the job sequence -- it is best practice to break up distinct tasks into job steps called procs. The proc is a separate member of a library, a text file that contains JCL for the task. One may think of it as a standard routine, with linkage variables which are capable of being set or overridden in the JCL step that invokes the proc.

With that, it is very easy to just set the values you want for the substitution variables. You won't be doing it dynamically -- an advantage of using DS or an edit command line -- but you are also executing high-level processes which, if set dynamically, can cause huge headaches in support and maintenance... well, especially at runtime.

In your example, too, I must point out that those variables used in dataset names contain spaces. That is your first clue that doing this dynamically is dangerous.

An excellent reference for the basics of JCL, for versions JES2 & JES3, is "MVS JCL" by Doug Lowe, Mike Murach & Assocs publisher. I keep a copy handy.

_________________
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: http://www.dsxchange.com/viewtopic.php?t=143596 Using CFF FAQ: http://www.dsxchange.com/viewtopic.php?t=157872
FranklinE
Posted: Thu Apr 26, 2018 7:41 am

As a second-level thought, if you really need to set these values at runtime, a routine that builds a proc for you seems best. You would call that proc each cycle, and how you build it is up to you. I would still avoid using DataStage, but if your architects believe it to be the best choice, have at it. I would be very curious to see your solution.

sandhya.budhi
Posted: Mon Apr 30, 2018 6:56 pm

I discussed this with my architect and told him that using DataStage to replace multiple values in a file would be complicated. He asked me to give it a try using routines in DataStage.

I was trying to use the sed and awk commands in a DataStage server routine, and I am finding it difficult to pass the input values to the routine.

The values for PLDSN, PLID, PLDATE and PGMCHAR need to be retrieved from a dataset, and these values will change every time.

I have 10 different sets of values for PLDSN, PLID, PLDATE and PGMCHAR in a dataset. I have to create 10 JCL jobs and append them to the same output file.

I am planning to call the routine in a loop to replace the values in the JCL output file, but I am not able to pass the input values to the routine.

The input values will change on each iteration of the loop.


Will a User Variables activity stage be a good choice for passing the input values?


Please advise.

Thanks,
Sandhya

FranklinE
Posted: Tue May 01, 2018 11:09 am

I have a job that generically sets a loop to run individual FTP sessions from a text file with a list of file names to get. It goes like this:

Exec Command -- script to determine the number of files (sets end value of loop).

User Variable -- set variable values from script.

Loop:

Exec Command -- get next file name from text file.
User Variable -- set variable with file name.
Activity Stage -- calls parallel job for FTP, passing file name in variable.
Next/End Loop.

You can design the loop to always run 10 times, or set it up for a varying number of outputs. How you edit and create the output file is up to you and your architecture team.
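
For readers more at home in shell than in sequence jobs, the same control flow can be sketched as a script. This is only an analogy, not DataStage code: it assumes a list file with one entry per line, and `process_one` and `run_loop` are hypothetical names, with `process_one` standing in for the job that the activity stage would call.

```shell
#!/bin/sh
# Shell analogy of the sequence-job loop described above; all names are hypothetical.

# Stand-in for the job that the activity stage would call with one parameter.
process_one() {
    printf 'processing %s\n' "$1"
}

# Usage: run_loop LIST_FILE
run_loop() {
    list=$1
    # "Exec Command": count the entries to set the end value of the loop.
    total=$(awk 'END { print NR }' "$list")
    i=1
    while [ "$i" -le "$total" ]; do
        # "Exec Command": fetch the i-th entry from the list file.
        name=$(sed -n "${i}p" "$list")
        # "User Variable" and "Activity": pass the value on as a parameter.
        process_one "$name"
        i=$((i + 1))
    done
}
```

Counting the entries first mirrors the fixed end value of the sequence loop; a plain `while read` loop would be the more usual shell idiom when no counter is needed.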

UCDI
Posted: Tue May 01, 2018 11:22 am

a VB routine could do this without a lot of pain.
You may also find it handy to use the UNIX shell commands in a parallel job. You can do this with a two-line C program (literally a wrapper around system(input);) that you convert to a parallel routine; then you can feed it one line at a time or whatever. system() is considered to be a risk, but then again, DataStage is a back door too, so whether this will be acceptable or not is another question. There are a couple of safer OS command calls in the language, but at the end of the day, if some jerk replaced awk with a trojan version, it's game over (this is a flaw in doing it in DataStage also).

You may also be able to use a transformer on a line of text to find and replace the offending values. Datastage is poor at string processing, but it has a couple of solid ways to do this task.

Just throwing ideas out there.
ray.wurlod
Posted: Wed May 02, 2018 5:20 pm

UCDI wrote:
a VB routine could do this without a lot of pain.

Good luck getting VB to work on Unix.

ray.wurlod
Posted: Wed May 02, 2018 5:25 pm

UCDI wrote:
Datastage is poor at string processing, but it has a couple of solid ways to do this task.

DataStage server jobs are particularly good at string processing.

sandhya.budhi
Posted: Wed May 02, 2018 10:18 pm

Hi Frank,

Thanks for your comments. I designed my job with the steps below:

1. Extract the values that need to be substituted in the JCL text file:
PLID, PLDSN, PLDATE
2. Start loop
In the loop process:
3. Execute Command to get the individual values from the file, using an awk command to read each value.
4. Replace multiple strings in the JCL text file using the sed command and create a JCL file with the replaced strings for PLID, PLDSN, PLDATE
5. Next record
6. End loop

I am having an issue when using the sed command. In the sed command I am using the Execute Command output value as the replacement string:

sed -e 's/&PLID/#ec_Get_PLID_Value.$CommandOutput#/g'

Data from Command Output is GRIF-EPG-F W

When running the job in DataStage I am getting the below error:
Executed: sed -e 's/&PLID/GRIF-EPG-F W
Reply=)/g'
Output from command ====>
-1

I think the # in sed is being interpreted as a comment. Is there a better way of using the Execute Command output value in the sed command?

Thanks,
Sandhya

FranklinE
Posted: Thu May 03, 2018 8:12 am

Sandhya,

I'm just an adequate Unix guy. You have a sed issue that others here can better address. My only advice is to use a script with input parameters instead of a command line, because you can better control the processing, particularly how you pass the results back to the stage.

Two best practices in a script that help me a lot: create a log and write to it at every step; code for errors and use the return code to abend your job when appropriate. Verbose messaging is my friend.
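
A rough sketch of what such a script might look like, with input parameters, a log entry at each step, and distinct return codes that the calling stage can test. The function names, argument order, return-code values and default log path are all hypothetical, and the sketch assumes the replacement value contains no `/`, `&` or backslash.

```shell
#!/bin/sh
# Sketch of a parameterised substitution step with logging and return codes.
# Function names, argument order and log location are hypothetical.

LOG=${LOG:-/tmp/jcl_subst.log}

# Append a timestamped message to the log file.
log() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG"
}

# Usage: subst_plid TEMPLATE PLID OUTPUT
subst_plid() {
    if [ "$#" -ne 3 ]; then
        log "ERROR: expected 3 arguments, got $#"
        return 2
    fi
    template=$1
    plid=$2
    output=$3
    if [ ! -r "$template" ]; then
        log "ERROR: cannot read template $template"
        return 3
    fi
    log "substituting &PLID with '$plid' in $template"
    # Assumes the value contains no '/', '&' or backslash.
    if ! sed -e "s/&PLID/$plid/g" "$template" >> "$output"; then
        log "ERROR: sed failed"
        return 4
    fi
    log "appended result to $output"
    return 0
}
```

A non-zero return code gives the calling job something concrete to check, and the log shows which step was reached when a run fails.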

UCDI
Posted: Thu May 03, 2018 11:39 am

ray.wurlod wrote:
UCDI wrote:
Datastage is poor at string processing, but it has a couple of solid ways to do this task.

DataStage server jobs are particularly good at string processing.


For some reason that I am not privy to, my org has a no-server-job policy (exceptions allowed, but the red tape is deep), so I have not worked with them. Good to know. It feels like the transformer stage needs about 5 or 10 more functions to really round it out for string transforms.

As for the #variable# issue, that is strange. I have several run command stages with parameters like this and they worked fine for me. But if you can't get it to take, maybe poke the #blah# into a user variable stage, let that cook up the string, and pass it forward that way?
FranklinE
Posted: Thu May 03, 2018 12:31 pm

Quote:
sed -e 's/&PLID/#ec_Get_PLID_Value.$CommandOutput#/g'


Sorry, I missed this one, and UCDI has the right idea. Command output is only available in the subsequent stages in "raw" format, meaning it could contain spaces or control characters that can't be read by DataStage.

Put the command output in a User variable first, then use it in your command line. Sample:
Code:
uvFileName Field(GET_FILE_NAME.$CommandOutput,":","2")


I remove the field mark (crlf) from it before passing it to the stage that uses it.
Code:
pFileName Trim(Convert(@FM,"",UVAR_FILE_NAME.uvFileName))
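
The error log quoted earlier, where the sed expression was split across two lines, is consistent with a line terminator at the end of the command output. The sketch below reproduces that effect in plain shell and strips control characters before the substitution. It is a shell analogue of the Trim(Convert(...)) expression above, not DataStage code, and `clean_value` is a hypothetical name.

```shell
#!/bin/sh
# Shell analogue of cleaning a raw command output before using it in sed.
# The function name is hypothetical.

# Delete control characters, then trim leading and trailing blanks.
clean_value() {
    printf '%s' "$1" | tr -d '[:cntrl:]' | sed -e 's/^ *//' -e 's/ *$//'
}

# A value that still carries its line terminator, as in the failing run.
raw='GRIF-EPG-F W
'

# Using "$raw" directly would put a newline inside the s command, and sed
# would reject the expression. With the cleaned value the substitution works.
printf '%s\n' '&PLIDO&PLDATE' | sed -e "s/&PLID/$(clean_value "$raw")/g"
```

The embedded space in the value is harmless as long as the whole sed expression stays inside one pair of quotes; only the control characters need to go.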

sandhya.budhi
Posted: Wed May 09, 2018 1:00 pm

Hi Frank,

I removed the control characters and everything worked well. With the field mark conversion in the User Variables stage, the data is now passed correctly. Thanks for all your valuable comments and help.

All times are GMT - 6 Hours