Manipulate sequential file using regular expression

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
celso.pupio
Participant
Posts: 4
Joined: Mon Aug 20, 2018 10:41 am
Location: Brasília/Brazil

Manipulate sequential file using regular expression

Post by celso.pupio »

I have a sequential file where the delimiter is pipe character ("|"), but it is possible that a specific column has "|" in your content.

Example of a row of the file:
1234567|12345|Specific column|with pipe in the content|123456|2016-12-31 10:20:53.0|2017-01-31 19:10:26.0|||0|tx_tx_txtxtx|Txtxtxtxt|tx_tx_txtxt

I need to manipulate this column to replace the pipe character, before mapping the columns.

A friend is using regular expression to "isolate" the column on Pentaho Spoon software, like this:
1) searches the regex (^[0-9]{1,8}\|[0-9]{1,8}\|) to put double quotes before the column:
1234567|12345|"Specific column|with pipe in the content|123456|2016-12-31 10:20:53.0|2017-01-31 19:10:26.0|||0|tx_tx_txtxtx|Txtxtxtxt|tx_tx_txtxt

2) searches the regex (\|[0-9]{5,7}\|[0-9]{4}\-[0-9]{2}\-[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2}\.[0-9]\|[0-9]{4}\-[0-9]{2}\-[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2}\.[0-9]\||\|\|[0-9]{4}\-[0-9]{2}\-[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2}\.[0-9]\|[0-9]{4}\-[0-9]{2}\-[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2}\.[0-9]\|) to put double quotes after the column:
1234567|12345|"Specific column|with pipe in the content"|123456|2016-12-31 10:20:53.0|2017-01-31 19:10:26.0|||0|tx_tx_txtxtx|Txtxtxtxt|tx_tx_txtxt

3) and then he manipulates the substring between double quotes, to replace the pipe character.


Is it possible to implement this solution on DataStage?

Thanks for any help
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard.

Yes, the answer is quite simple. In the Sequential File stage that you are using to read the file, enable the Filter Command option. This will contain a command (or command pipeline) to be executed before the stage starts consuming rows, and the rows will then be consumed from stdout of that command.

So you would apply your regular expressions to the file in the Filter Command, and the stage would read the result.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
celso.pupio
Participant
Posts: 4
Joined: Mon Aug 20, 2018 10:41 am
Location: Brasília/Brazil

Post by celso.pupio »

Thanks a lot, Ray.

Now I'm trying to implement the command to insert the double quotes after the first regex, but I'm having issues (it's new to me).

After looking for a solution on internet, I tried to use sed 's/(^[0-9]{1,8}\|[0-9]{1,8}\|)/\1"/' but I got an error that I could not solve yet.

Would anyone know where the error is?
celso.pupio
Participant
Posts: 4
Joined: Mon Aug 20, 2018 10:41 am
Location: Brasília/Brazil

Post by celso.pupio »

Guys, I got it using "sed -r".

Thanks for all!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

8)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply