Manipulate sequential file using regular expression
Posted: Mon Aug 20, 2018 11:58 am
I have a sequential file where the delimiter is pipe character ("|"), but it is possible that a specific column has "|" in your content.
Example of a row of the file:
1234567|12345|Specific column|with pipe in the content|123456|2016-12-31 10:20:53.0|2017-01-31 19:10:26.0|||0|tx_tx_txtxtx|Txtxtxtxt|tx_tx_txtxt
I need to manipulate this column to replace the pipe character, before mapping the columns.
A friend is using regular expression to "isolate" the column on Pentaho Spoon software, like this:
1) searches the regex (^[0-9]{1,8}\|[0-9]{1,8}\|) to put double quotes before the column:
1234567|12345|"Specific column|with pipe in the content|123456|2016-12-31 10:20:53.0|2017-01-31 19:10:26.0|||0|tx_tx_txtxtx|Txtxtxtxt|tx_tx_txtxt
2) searches the regex (\|[0-9]{5,7}\|[0-9]{4}\-[0-9]{2}\-[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2}\.[0-9]\|[0-9]{4}\-[0-9]{2}\-[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2}\.[0-9]\||\|\|[0-9]{4}\-[0-9]{2}\-[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2}\.[0-9]\|[0-9]{4}\-[0-9]{2}\-[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2}\.[0-9]\|) to put double quotes after the column:
1234567|12345|"Specific column|with pipe in the content"|123456|2016-12-31 10:20:53.0|2017-01-31 19:10:26.0|||0|tx_tx_txtxtx|Txtxtxtxt|tx_tx_txtxt
3) and then he manipulates the substring between double quotes, to replace the pipe character.
Is it possible to implement this solution on DataStage?
Thanks for any help
Example of a row of the file:
1234567|12345|Specific column|with pipe in the content|123456|2016-12-31 10:20:53.0|2017-01-31 19:10:26.0|||0|tx_tx_txtxtx|Txtxtxtxt|tx_tx_txtxt
I need to manipulate this column to replace the pipe character, before mapping the columns.
A friend is using regular expression to "isolate" the column on Pentaho Spoon software, like this:
1) searches the regex (^[0-9]{1,8}\|[0-9]{1,8}\|) to put double quotes before the column:
1234567|12345|"Specific column|with pipe in the content|123456|2016-12-31 10:20:53.0|2017-01-31 19:10:26.0|||0|tx_tx_txtxtx|Txtxtxtxt|tx_tx_txtxt
2) searches the regex (\|[0-9]{5,7}\|[0-9]{4}\-[0-9]{2}\-[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2}\.[0-9]\|[0-9]{4}\-[0-9]{2}\-[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2}\.[0-9]\||\|\|[0-9]{4}\-[0-9]{2}\-[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2}\.[0-9]\|[0-9]{4}\-[0-9]{2}\-[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2}\.[0-9]\|) to put double quotes after the column:
1234567|12345|"Specific column|with pipe in the content"|123456|2016-12-31 10:20:53.0|2017-01-31 19:10:26.0|||0|tx_tx_txtxtx|Txtxtxtxt|tx_tx_txtxt
3) and then he manipulates the substring between double quotes, to replace the pipe character.
Is it possible to implement this solution on DataStage?
Thanks for any help