Page 1 of 1

Manipulate sequential file using regular expression

Posted: Mon Aug 20, 2018 11:58 am
by celso.pupio
I have a sequential file where the delimiter is pipe character ("|"), but it is possible that a specific column has "|" in your content.

Example of a row of the file:
1234567|12345|Specific column|with pipe in the content|123456|2016-12-31 10:20:53.0|2017-01-31 19:10:26.0|||0|tx_tx_txtxtx|Txtxtxtxt|tx_tx_txtxt

I need to manipulate this column to replace the pipe character, before mapping the columns.

A friend is using regular expression to "isolate" the column on Pentaho Spoon software, like this:
1) searches the regex (^[0-9]{1,8}\|[0-9]{1,8}\|) to put double quotes before the column:
1234567|12345|"Specific column|with pipe in the content|123456|2016-12-31 10:20:53.0|2017-01-31 19:10:26.0|||0|tx_tx_txtxtx|Txtxtxtxt|tx_tx_txtxt

2) searches the regex (\|[0-9]{5,7}\|[0-9]{4}\-[0-9]{2}\-[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2}\.[0-9]\|[0-9]{4}\-[0-9]{2}\-[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2}\.[0-9]\||\|\|[0-9]{4}\-[0-9]{2}\-[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2}\.[0-9]\|[0-9]{4}\-[0-9]{2}\-[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2}\.[0-9]\|) to put double quotes after the column:
1234567|12345|"Specific column|with pipe in the content"|123456|2016-12-31 10:20:53.0|2017-01-31 19:10:26.0|||0|tx_tx_txtxtx|Txtxtxtxt|tx_tx_txtxt

3) and then he manipulates the substring between double quotes, to replace the pipe character.


Is it possible to implement this solution on DataStage?

Thanks for any help

Posted: Mon Aug 20, 2018 8:52 pm
by ray.wurlod
Welcome aboard.

Yes, the answer is quite simple. In the Sequential File stage that you are using to read the file, enable the Filter Command option. This will contain a command (or command pipeline) to be executed before the stage starts consuming rows, and the rows will then be consumed from stdout of that command.

So you would apply your regular expressions to the file in the Filter Command, and the stage would read the result.

Posted: Tue Aug 21, 2018 3:11 pm
by celso.pupio
Thanks a lot, Ray.

Now I'm trying to implement the command to insert the double quotes after the first regex, but I'm having issues (it's new to me).

After looking for a solution on internet, I tried to use sed 's/(^[0-9]{1,8}\|[0-9]{1,8}\|)/\1"/' but I got an error that I could not solve yet.

Would anyone know where the error is?

Posted: Tue Aug 21, 2018 3:35 pm
by celso.pupio
Guys, I got it using "sed -r".

Thanks for all!

Posted: Tue Aug 21, 2018 9:31 pm
by ray.wurlod
8)