lookup with LIKE and not with EQUAL
Moderators: chulett, rschirm, roy
lookup with LIKE and not with EQUAL
Hello,
We have a small table , that contains "words".
Then I have a table with one field of address
What I have to do is to check if the address contains the words".
In SQL is something like
select * from table_adress , table_words
where table_adress.address like '%words%'
As we are not working with a database but with a file, then How can I do this in datastage ?
Thank you -
We have a small table , that contains "words".
Then I have a table with one field of address
What I have to do is to check if the address contains the words".
In SQL is something like
select * from table_adress , table_words
where table_adress.address like '%words%'
As we are not working with a database but with a file, then How can I do this in datastage ?
Thank you -
Sorry , it is not 2 input tables, but 2 input files
Exactly what we have to do is we have one file with the addresses and another that is a catalog ( with swear words ) and we find one of those swear word in the address , then we have to blank it.
The catalog file with the swear word is not fixed , it can change
thank you
Exactly what we have to do is we have one file with the addresses and another that is a catalog ( with swear words ) and we find one of those swear word in the address , then we have to blank it.
The catalog file with the swear word is not fixed , it can change
thank you
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
In the controlling sequence read the WORDS file and construct the SQL statement. Pass this to the job as a job parameter.
You could use an awk script or a DataStage server routine to construct the SQL - whichever you are comfortable with.
You could use an awk script or a DataStage server routine to construct the SQL - whichever you are comfortable with.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
You have mentioned OS as Windows. Is that the DataStage Server OS? If yes, then some of the options for solving this issue are:
1. Load the contents of the file/files to a database table(s). Subsequently you can follow Ray's suggestion (above) or use sparse lookup with "LIKE/NOT LIKE" clause.
2. (Server job) Another option for partial key lookup used to be hash file/UV stage. You will be able to search for some examples on this forum.
Following options use Unix commands. You may be able to use these commands on a Windows OS leveraging MKS Toolkit.
Assumption: Word file contains just the undesirable words.
3. To drop the lines with the "word", use following command in External source stage
4. To drop the "word" from address file, use following command in External source stage
Note: There is a finite limit on length of argument passed to xargs. These limits vary by flavor of Unix.
1. Load the contents of the file/files to a database table(s). Subsequently you can follow Ray's suggestion (above) or use sparse lookup with "LIKE/NOT LIKE" clause.
2. (Server job) Another option for partial key lookup used to be hash file/UV stage. You will be able to search for some examples on this forum.
Following options use Unix commands. You may be able to use these commands on a Windows OS leveraging MKS Toolkit.
Assumption: Word file contains just the undesirable words.
3. To drop the lines with the "word", use following command in External source stage
Code: Select all
egrep -v -f <WordFile> <AddressFile>
Code: Select all
sed -e 's/^/s\//' -e 's/$/\/\/g;/' <WordFile>|tr '\n' ' '|xargs -i sed {}<AddressFile>
Note: There is a finite limit on length of argument passed to xargs. These limits vary by flavor of Unix.
You can also use the DataStage functions, ex :
Code: Select all
If Count(FullString,Word) > 0
Then --> Word is in FullString
Else --> Word is not in FullString
Code: Select all
If index(FullString,Word,1) > 0
Then --> Word is in FullString
Else --> Word is not in FullString
BI Consultant
DSXConsult
DSXConsult
Yes, it can be done with a bit of careful work in three steps:
1) Transformer stage loop to parse each address record into individual words or tokens, based on whitespace (stage var using Dcount() function, loop to iterate and use Field() function, also per record output both the Dcount value and the @ITERATION value as you will need them in step 3 below)
2) Lookup stage with caseless comparison with the reference input as the swear words file (set the lookup failure condition to Continue)
3) Transformer stage with stage variables used to reassemble the address records with swear words blanked out, and constraint to pass out only the final, full address record (where the iteration value = the DCount value)
I've had to do half a dozen operations like this, for lack of a better way, and found it can actually perform quite well with correct results.
1) Transformer stage loop to parse each address record into individual words or tokens, based on whitespace (stage var using Dcount() function, loop to iterate and use Field() function, also per record output both the Dcount value and the @ITERATION value as you will need them in step 3 below)
2) Lookup stage with caseless comparison with the reference input as the swear words file (set the lookup failure condition to Continue)
3) Transformer stage with stage variables used to reassemble the address records with swear words blanked out, and constraint to pass out only the final, full address record (where the iteration value = the DCount value)
I've had to do half a dozen operations like this, for lack of a better way, and found it can actually perform quite well with correct results.
Last edited by qt_ky on Tue Feb 16, 2016 7:01 am, edited 1 time in total.
Choose a job you love, and you will never have to work a day in your life. - Confucius
You are using version 11 so this is actually a lot easier than some earlier versions.
Join your Full String to every word in your list (ie add a column which contains the offending word).
You can then just sort and group your data into a transformer, set a stage variable to hold the updated output value.
set the stage variable (eg svOutput) to be something like
Output the last row in the group.
This will replace the offending words in your input string.
(You could also accomplish the same by concatenating all your words into a single string and using a loop function)
Join your Full String to every word in your list (ie add a column which contains the offending word).
You can then just sort and group your data into a transformer, set a stage variable to hold the updated output value.
set the stage variable (eg svOutput) to be something like
Code: Select all
If NewRecord then Ereplace(FullString,Word,'') else Ereplace(svOutput,Word,'')
This will replace the offending words in your input string.
(You could also accomplish the same by concatenating all your words into a single string and using a loop function)