I have a sequential file where the delimiter is pipe character ("|"), but it is possible that a specific column has "|" in your content.
Example of a row of the file:
1234567|12345|Specific column|with pipe in the content|123456|2016-12-31 10:20:53.0|2017-01-31 19:10:26.0|||0|tx_tx_txtxtx|Txtxtxtxt|tx_tx_txtxt
I need to manipulate this column to replace the pipe character, before mapping the columns.
A friend is using regular expression to "isolate" the column on Pentaho Spoon software, like this:
1) searches the regex (^[0-9]{1,8}\|[0-9]{1,8}\|) to put double quotes before the column:
1234567|12345|"Specific column|with pipe in the content|123456|2016-12-31 10:20:53.0|2017-01-31 19:10:26.0|||0|tx_tx_txtxtx|Txtxtxtxt|tx_tx_txtxt
2) searches the regex (\|[0-9]{5,7}\|[0-9]{4}\-[0-9]{2}\-[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2}\.[0-9]\|[0-9]{4}\-[0-9]{2}\-[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2}\.[0-9]\||\|\|[0-9]{4}\-[0-9]{2}\-[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2}\.[0-9]\|[0-9]{4}\-[0-9]{2}\-[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2}\.[0-9]\|) to put double quotes after the column:
1234567|12345|"Specific column|with pipe in the content"|123456|2016-12-31 10:20:53.0|2017-01-31 19:10:26.0|||0|tx_tx_txtxtx|Txtxtxtxt|tx_tx_txtxt
3) and then he manipulates the substring between double quotes, to replace the pipe character.
Is it possible to implement this solution on DataStage?
Thanks for any help
Manipulate sequential file using regular expression
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 4
- Joined: Mon Aug 20, 2018 10:41 am
- Location: BrasÃlia/Brazil
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Welcome aboard.
Yes, the answer is quite simple. In the Sequential File stage that you are using to read the file, enable the Filter Command option. This will contain a command (or command pipeline) to be executed before the stage starts consuming rows, and the rows will then be consumed from stdout of that command.
So you would apply your regular expressions to the file in the Filter Command, and the stage would read the result.
Yes, the answer is quite simple. In the Sequential File stage that you are using to read the file, enable the Filter Command option. This will contain a command (or command pipeline) to be executed before the stage starts consuming rows, and the rows will then be consumed from stdout of that command.
So you would apply your regular expressions to the file in the Filter Command, and the stage would read the result.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 4
- Joined: Mon Aug 20, 2018 10:41 am
- Location: BrasÃlia/Brazil
Thanks a lot, Ray.
Now I'm trying to implement the command to insert the double quotes after the first regex, but I'm having issues (it's new to me).
After looking for a solution on internet, I tried to use sed 's/(^[0-9]{1,8}\|[0-9]{1,8}\|)/\1"/' but I got an error that I could not solve yet.
Would anyone know where the error is?
Now I'm trying to implement the command to insert the double quotes after the first regex, but I'm having issues (it's new to me).
After looking for a solution on internet, I tried to use sed 's/(^[0-9]{1,8}\|[0-9]{1,8}\|)/\1"/' but I got an error that I could not solve yet.
Would anyone know where the error is?
-
- Participant
- Posts: 4
- Joined: Mon Aug 20, 2018 10:41 am
- Location: BrasÃlia/Brazil
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact: