Replace Strings in a text file

sandhya.budhi
Premium Member
Posts: 18
Joined: Wed Nov 15, 2017 10:50 am

Post by sandhya.budhi »

I have a requirement to replace a string with another string in a text file. The string appears on several different lines, and I need to replace every occurrence in the file.

The text file is a mainframe JCL template.

//XXXXXX&PGMCHAR JOB (A1,xx,A123456),'TEST',REGION=0M,
// MSGCLASS=4
//*******************************************************
//DELET EXEC UNCAT
//SYSIN DD *
DELETE DSN=TEST.FILE.&PLDSN
DELETE DSN=TEST.FILE.&PLDSN.DONE.TXT
/*
//PGM1 EXEC E1234,PROGRAM=PGM1
//GO.FILEIN DD *
&PLIDO&PLDATE
/*
//BACKUP DD DUMMY
//RATEIN DD DUMMY
//GO.FILEOUT DD DUMMY
//*

The strings PGMCHAR, PLID, PLDATE and PLDSN are the ones that need to be replaced. Their values come from a separate data file.

PGMCHAR  PLID  PLDATE    PLDSN
4        123   20180425  1234
4        456   20180425  4567

The final output file should look like this

//XXXXXX4 JOB (A1,xx,A123456),'TEST',REGION=0M,
// MSGCLASS=4
//*******************************************************
//DELET EXEC UNCAT
//SYSIN DD *
DELETE DSN=TEST.FILE.1234
DELETE DSN=TEST.FILE.1234.DONE.TXT
/*
//PGM1 EXEC E1234,PROGRAM=PGM1
//GO.FILEIN DD *
123O20180425
/*
//BACKUP DD DUMMY
//RATEIN DD DUMMY
//GO.FILEOUT DD DUMMY
//*

//XXXXXX4 JOB (A1,xx,A123456),'TEST',REGION=0M,
// MSGCLASS=4
//*******************************************************
//DELET EXEC UNCAT
//SYSIN DD *
DELETE DSN=TEST.FILE.4567
DELETE DSN=TEST.FILE.4567.DONE.TXT
/*
//PGM1 EXEC E1234,PROGRAM=PGM1
//GO.FILEIN DD *
456O20180425
/*
//BACKUP DD DUMMY
//RATEIN DD DUMMY
//GO.FILEOUT DD DUMMY
//*

For every record in the data file the JCL template should be repeated.

Can we implement this using DataStage, or should I go completely with UNIX scripting?

My architect suggested doing it in DataStage, and I am looking for suggestions on how to replace different strings in a text file.
Thanks,
Sandhya
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I'd use a UNIX tool such as sed or awk.

It could be done with DataStage (using Replace() or Change() function) but that seems to me to be overkill. This is precisely the kind of task that sed and awk are designed to do.
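
As a rough sketch of the sed route (not a tested production script), the function below expands the template once per data row. The function name expand_template and the file layout are assumptions: a values file with one header line followed by whitespace-separated PGMCHAR, PLID, PLDATE and PLDSN columns, ending in a newline, and values that contain no /, & or \ characters.

```shell
#!/bin/sh
# Sketch only: expand a JCL template once per data row.
# Assumes the values file has a header line, then whitespace-separated
# columns PGMCHAR PLID PLDATE PLDSN, and that no value contains /, & or \
# (all of which are special inside a sed substitution).

# expand_template TEMPLATE VALUES
expand_template() {
    template=$1
    values=$2
    # Skip the header row, then emit one copy of the template per record.
    tail -n +2 "$values" | while read -r pgmchar plid pldate pldsn; do
        [ -n "$pgmchar" ] || continue   # ignore blank lines
        sed -e "s/&PGMCHAR/$pgmchar/g" \
            -e "s/&PLID/$plid/g" \
            -e "s/&PLDATE/$pldate/g" \
            -e "s/&PLDSN/$pldsn/g" "$template"
        echo    # blank line between generated jobs
    done
}
```

It could be called as `expand_template template.jcl values.txt > all_jobs.jcl`, where all three file names are hypothetical.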
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

IMHO the only "pro" that DataStage would bring to the table would be the ability to handle the table-driven substitution values. And I'd use a Server job.
-craig

"You can never have too many knives" -- Logan Nine Fingers
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

I strongly caution that, since this is JCL -- job code that will be executed by the operating system, not data to be processed (at this level) -- DataStage is a bad choice for accomplishing your task.

In JCL -- with a strong parallel in the job sequence -- it is best practice to break up distinct tasks into job steps called procs. The proc is a separate member of a library, a text file that contains JCL for the task. One may think of it as a standard routine, with linkage variables which are capable of being set or overridden in the JCL step that invokes the proc.

With that, it is very easy to just set the values you want for the substitution variables. You won't be doing it dynamically -- an advantage of using DS or an edit command line -- but you are also executing high-level processes which, if set dynamically, can cause huge headaches in support and maintenance... well, especially at runtime.

In your example, too, I must point out that those variables used in dataset names contain spaces. That is your first clue that doing this dynamically is dangerous.

An excellent reference for the basics of JCL, for versions JES2 & JES3, is "MVS JCL" by Doug Lowe, Mike Murach & Assocs publisher. I keep a copy handy.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

On second thought, if you really need to set these values at runtime, a routine that builds a proc for you seems best. You would call that proc each cycle, and how you build it is up to you. I would still avoid using DataStage, but if your architects believe it to be the best choice, have at it. I would be very curious to see your solution.
sandhya.budhi
Premium Member
Posts: 18
Joined: Wed Nov 15, 2017 10:50 am

Post by sandhya.budhi »

I discussed this with my architect and said that the process will be complicated when using DataStage to replace multiple values in a file. But he said to give it a try using routines in DataStage.

I was trying to use the sed and awk commands in a DataStage server routine, and I am finding it difficult to pass the input values to the routine.

The values for PLDSN, PLID, PLDATE and PGMCHAR need to be retrieved from a dataset, and they change on every run.

I have 10 different values for PLDSN, PLID, PLDATE and PGMCHAR in a dataset. I have to create 10 JCL jobs and append them to the same output file.

I am planning to loop the process of calling the routine to replace the values in the JCL output file. But I am not able to pass the input values to the routine.

The input values will change on each pass through the loop.


Would a User Variables activity stage be a good choice for passing the input values?


Please advise.

Thanks,
Sandhya
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

I have a job that generically sets a loop to run individual FTP sessions from a text file with a list of file names to get. It goes like this:

Exec Command -- script to determine the number of files (sets end value of loop).

User Variable -- set variable values from script.

Loop:

Exec Command -- get next file name from text file.
User Variable -- set variable with file name.
Activity Stage -- calls parallel job for FTP, passing file name in variable.
Next/End Loop.

You can design the loop to always run 10 times, or set it up for a varying number of outputs. How you edit and create the output file is up to you and your architecture team.
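
For anyone prototyping the same pattern outside a sequence job, here is a loose shell sketch of the loop above. The helper names count_records, nth_record and process_all are made up for illustration, and the echo line is only a stand-in for the Job Activity that would receive the value.

```shell
#!/bin/sh
# Sketch only: mirrors the sequence loop with plain shell helpers.
# All function names here are hypothetical.

# Number of lines in the list (what the first Exec Command would return).
count_records() {
    wc -l < "$1" | tr -d ' '
}

# The Nth line of the list (what the in-loop Exec Command would return).
nth_record() {
    sed -n "${1}p" "$2"
}

# Loop from 1 to the record count, handing each value to the next step.
process_all() {
    list=$1
    total=$(count_records "$list")
    i=1
    while [ "$i" -le "$total" ]; do
        item=$(nth_record "$i" "$list")
        echo "iteration $i: $item"   # stand-in for the job call
        i=$((i + 1))
    done
}
```

Because the end value comes from the file itself, the same loop covers both the fixed case of 10 records and a varying number of records.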
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

A VB routine could do this without a lot of pain.
You may also find it handy to use the UNIX shell commands in a parallel job. You can do this with a two-line C program (literally a wrapper around system(input);) that you convert to a parallel routine; then you can feed it one line at a time or whatever. system() is considered to be a risk, but then again, DataStage is a back door too, so whether this will be acceptable or not is another question. There are a couple of safer OS command calls in the language, but at the end of the day, if some jerk replaced awk with a trojan version, it's game over (this is a flaw in doing it in DataStage also).

You may also be able to use a transformer on a line of text to find and replace the offending values. DataStage is poor at string processing, but it has a couple of solid ways to do this task.

Just throwing ideas out there.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

UCDI wrote:a VB routine could do this without a lot of pain.
Good luck getting VB to work on Unix.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

UCDI wrote:Datastage is poor at string processing, but it has a couple of solid ways to do this task.
DataStage server jobs are particularly good at string processing.
sandhya.budhi
Premium Member
Posts: 18
Joined: Wed Nov 15, 2017 10:50 am

Post by sandhya.budhi »

Hi Frank,

Thanks for your comments. I designed my job with the steps below:

1. Extract the values that need to be substituted in the JCL text file (PLID, PLDSN, PLDATE).
2. Start loop.
In the loop:
3. Execute Command to get the individual values from the file, using an awk command to read each value.
4. Replace multiple strings in the JCL text file using a sed command and create a JCL file with the replaced strings for PLID, PLDSN, PLDATE.
5. Next record.
6. End loop.

I am having an issue with the sed command. In it, I use the Execute Command output value as the replacement string:

sed -e 's/&PLID/#ec_Get_PLID_Value.$CommandOutput#/g'

Data from Command Output is GRIF-EPG-F W

When running the job in DataStage I get the error below:
Executed: sed -e 's/&PLID/GRIF-EPG-F W
Reply=)/g'
Output from command ====>
-1

I think the # is being interpreted by sed as a comment. Is there a better way to use the Execute Command output value in the sed command?

Thanks,
Sandhya
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Sandhya,

I'm just an adequate Unix guy. You have a sed issue that others here can better address. My only advice is to use a script with input parameters instead of a command line, because you can better control the processing, particularly how you pass the results back to the stage.

Two best practices in a script that help me a lot: create a log and write to it at every step; code for errors and use the return code to abend your job when appropriate. Verbose messaging is my friend. :)
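
To make those two practices concrete, here is a minimal sketch of a parameter-driven replacement with a log and return codes. The names LOG_FILE, log and replace_token, the log location and the specific return codes are all my own placeholders, and it assumes the value holds no /, & or \ characters.

```shell
#!/bin/sh
# Sketch only: one placeholder replacement with logging and return codes.
# LOG_FILE, log and replace_token are hypothetical names.
# Assumes the value contains no /, & or \ (all special to sed).

LOG_FILE=${LOG_FILE:-./replace_token.log}

# Append a timestamped message to the log.
log() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
}

# replace_token NAME VALUE INFILE OUTFILE
# Returns 0 on success; a non-zero code lets the calling job abort.
replace_token() {
    if [ "$#" -ne 4 ]; then
        log "ERROR: expected 4 arguments, got $#"
        return 2
    fi
    name=$1; value=$2; infile=$3; outfile=$4
    if [ ! -r "$infile" ]; then
        log "ERROR: cannot read $infile"
        return 3
    fi
    log "replacing &$name with [$value] in $infile"
    if ! sed "s/&$name/$value/g" "$infile" > "$outfile"; then
        log "ERROR: sed failed for &$name"
        return 4
    fi
    log "wrote $outfile"
    return 0
}
```

Logging the value inside brackets makes a stray trailing space or line break visible in the log, which is usually the first thing to check when the substitution misbehaves.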
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

ray.wurlod wrote:
UCDI wrote:Datastage is poor at string processing, but it has a couple of solid ways to do this task.
DataStage server jobs are particularly good at string processing.
For some reason that I am not privy to, my org has a no-server-job policy (exceptions allowed, but the red tape is deep), so I have not worked with them. Good to know. It feels like the transformer stage needs about 5 or 10 more functions to really round it out for string transforms.

As for the #variable# issue, that is strange. I have several run command stages with parameters like this and they worked fine for me. But if you can't get it to take, maybe poke the #blah# into a user variable stage, let that cook up the string, and pass it forward that way?
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

sed -e 's/&PLID/#ec_Get_PLID_Value.$CommandOutput#/g'
Sorry, I missed this one, and UCDI has the right idea. Command output is only available in the subsequent stages in "raw" format, meaning it could contain spaces or control characters that can't be read by DataStage.

Put the command output in a User variable first, then use it in your command line. Sample:

Code:

uvFileName Field(GET_FILE_NAME.$CommandOutput,":","2")

I remove the field mark (crlf) from it before passing it to the stage that uses it.

Code:

pFileName Trim(Convert(@FM,"",UVAR_FILE_NAME.uvFileName))
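
If the cleanup had to happen on the shell side instead, a rough analogue of the Convert and Trim step might look like the sketch below. The function name clean_value is hypothetical, and it assumes the unwanted characters are carriage returns, newlines and surrounding whitespace; it is not an exact equivalent of the DataStage expressions.

```shell
#!/bin/sh
# Sketch only: shell-side analogue of Trim(Convert(@FM,"",value)).
# clean_value is a hypothetical helper name.

# Drop carriage returns and newlines, then strip leading/trailing whitespace.
clean_value() {
    printf '%s' "$1" \
        | tr -d '\r\n' \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Example use (file names are placeholders):
#   raw=$(awk 'NR==2 {print $2}' values.txt)
#   sed "s/&PLID/$(clean_value "$raw")/g" template.jcl
```

Embedded spaces, such as the one in GRIF-EPG-F W, are left alone; only the line terminators and the padding at either end are removed.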
sandhya.budhi
Premium Member
Posts: 18
Joined: Wed Nov 15, 2017 10:50 am

Post by sandhya.budhi »

Hi Frank,

I removed the control characters and everything worked well. After handling the field mark in the User Variables stage, the data is passed correctly. Thanks for all your valuable comments and help.
Thanks,
Sandhya