Page 1 of 1

lookup with LIKE and not with EQUAL

Posted: Mon Feb 15, 2016 8:34 am
by Gius
Hello,

We have a small table , that contains "words".
Then I have a table with one field of address

What I have to do is to check if the address contains the words".
In SQL is something like

select * from table_adress , table_words
where table_adress.address like '%words%'

As we are not working with a database but with a file, then How can I do this in datastage ?

Thank you -

Posted: Mon Feb 15, 2016 8:59 am
by chulett
Where does the file come into play? All you've mentioned are two tables and then you say that you are 'not working with a database'. :?

Posted: Mon Feb 15, 2016 9:59 am
by Gius
Sorry , it is not 2 input tables, but 2 input files
Exactly what we have to do is we have one file with the addresses and another that is a catalog ( with swear words ) and we find one of those swear word in the address , then we have to blank it.

The catalog file with the swear word is not fixed , it can change

thank you

Posted: Mon Feb 15, 2016 4:26 pm
by ray.wurlod
In the controlling sequence read the WORDS file and construct the SQL statement. Pass this to the job as a job parameter.

You could use an awk script or a DataStage server routine to construct the SQL - whichever you are comfortable with.

Posted: Mon Feb 15, 2016 5:00 pm
by chulett
Ray, the SQL was just there as an example of how they would do it if a database was involved. One isn't. These are two files they need to work with.

Posted: Mon Feb 15, 2016 10:21 pm
by rkashyap
You have mentioned OS as Windows. Is that the DataStage Server OS? If yes, then some of the options for solving this issue are:

1. Load the contents of the file/files to a database table(s). Subsequently you can follow Ray's suggestion (above) or use sparse lookup with "LIKE/NOT LIKE" clause.

2. (Server job) Another option for partial key lookup used to be hash file/UV stage. You will be able to search for some examples on this forum.


Following options use Unix commands. You may be able to use these commands on a Windows OS leveraging MKS Toolkit.

Assumption: Word file contains just the undesirable words.

3. To drop the lines with the "word", use following command in External source stage

Code: Select all

egrep  -v -f <WordFile> <AddressFile>
4. To drop the "word" from address file, use following command in External source stage

Code: Select all

sed -e 's/^/s\//' -e 's/$/\/\/g;/' <WordFile>|tr '\n' ' '|xargs -i sed {}<AddressFile>

Note: There is a finite limit on length of argument passed to xargs. These limits vary by flavor of Unix.

Posted: Tue Feb 16, 2016 5:04 am
by Thomas.B
You can also use the DataStage functions, ex :

Code: Select all

If Count(FullString,Word) > 0
    Then --> Word is in FullString
Else     --> Word is not in FullString

Code: Select all

If index(FullString,Word,1) > 0
	Then --> Word is in FullString
Else    --> Word is not in FullString

Posted: Tue Feb 16, 2016 5:21 am
by Gius
Hello,
the problem is the "Word" , is not just ONE word , is a list of words
thank you -

Posted: Tue Feb 16, 2016 6:53 am
by qt_ky
Yes, it can be done with a bit of careful work in three steps:

1) Transformer stage loop to parse each address record into individual words or tokens, based on whitespace (stage var using Dcount() function, loop to iterate and use Field() function, also per record output both the Dcount value and the @ITERATION value as you will need them in step 3 below)

2) Lookup stage with caseless comparison with the reference input as the swear words file (set the lookup failure condition to Continue)

3) Transformer stage with stage variables used to reassemble the address records with swear words blanked out, and constraint to pass out only the final, full address record (where the iteration value = the DCount value)

I've had to do half a dozen operations like this, for lack of a better way, and found it can actually perform quite well with correct results.

Posted: Tue Feb 16, 2016 6:59 am
by ShaneMuir
You are using version 11 so this is actually a lot easier than some earlier versions.

Join your Full String to every word in your list (ie add a column which contains the offending word).
You can then just sort and group your data into a transformer, set a stage variable to hold the updated output value.
set the stage variable (eg svOutput) to be something like

Code: Select all

If NewRecord then Ereplace(FullString,Word,'') else Ereplace(svOutput,Word,'')
Output the last row in the group.

This will replace the offending words in your input string.

(You could also accomplish the same by concatenating all your words into a single string and using a loop function)