lookup with LIKE and not with EQUAL

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Gius
Participant
Posts: 29
Joined: Mon Mar 09, 2015 2:10 am

lookup with LIKE and not with EQUAL

Post by Gius »

Hello,

We have a small table , that contains "words".
Then I have a table with one field of address

What I have to do is to check if the address contains the words".
In SQL is something like

select * from table_adress , table_words
where table_adress.address like '%words%'

As we are not working with a database but with a file, then How can I do this in datastage ?

Thank you -
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Where does the file come into play? All you've mentioned are two tables and then you say that you are 'not working with a database'. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Gius
Participant
Posts: 29
Joined: Mon Mar 09, 2015 2:10 am

Post by Gius »

Sorry , it is not 2 input tables, but 2 input files
Exactly what we have to do is we have one file with the addresses and another that is a catalog ( with swear words ) and we find one of those swear word in the address , then we have to blank it.

The catalog file with the swear word is not fixed , it can change

thank you
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

In the controlling sequence read the WORDS file and construct the SQL statement. Pass this to the job as a job parameter.

You could use an awk script or a DataStage server routine to construct the SQL - whichever you are comfortable with.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ray, the SQL was just there as an example of how they would do it if a database was involved. One isn't. These are two files they need to work with.
-craig

"You can never have too many knives" -- Logan Nine Fingers
rkashyap
Premium Member
Premium Member
Posts: 532
Joined: Fri Dec 02, 2011 12:02 pm
Location: Richmond VA

Post by rkashyap »

You have mentioned OS as Windows. Is that the DataStage Server OS? If yes, then some of the options for solving this issue are:

1. Load the contents of the file/files to a database table(s). Subsequently you can follow Ray's suggestion (above) or use sparse lookup with "LIKE/NOT LIKE" clause.

2. (Server job) Another option for partial key lookup used to be hash file/UV stage. You will be able to search for some examples on this forum.


Following options use Unix commands. You may be able to use these commands on a Windows OS leveraging MKS Toolkit.

Assumption: Word file contains just the undesirable words.

3. To drop the lines with the "word", use following command in External source stage

Code: Select all

egrep  -v -f <WordFile> <AddressFile>
4. To drop the "word" from address file, use following command in External source stage

Code: Select all

sed -e 's/^/s\//' -e 's/$/\/\/g;/' <WordFile>|tr '\n' ' '|xargs -i sed {}<AddressFile>

Note: There is a finite limit on length of argument passed to xargs. These limits vary by flavor of Unix.
Thomas.B
Participant
Posts: 63
Joined: Thu Apr 09, 2015 6:40 am
Location: France - Nantes

Post by Thomas.B »

You can also use the DataStage functions, ex :

Code: Select all

If Count(FullString,Word) > 0
    Then --> Word is in FullString
Else     --> Word is not in FullString

Code: Select all

If index(FullString,Word,1) > 0
	Then --> Word is in FullString
Else    --> Word is not in FullString
BI Consultant
DSXConsult
Gius
Participant
Posts: 29
Joined: Mon Mar 09, 2015 2:10 am

Post by Gius »

Hello,
the problem is the "Word" , is not just ONE word , is a list of words
thank you -
qt_ky
Premium Member
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Yes, it can be done with a bit of careful work in three steps:

1) Transformer stage loop to parse each address record into individual words or tokens, based on whitespace (stage var using Dcount() function, loop to iterate and use Field() function, also per record output both the Dcount value and the @ITERATION value as you will need them in step 3 below)

2) Lookup stage with caseless comparison with the reference input as the swear words file (set the lookup failure condition to Continue)

3) Transformer stage with stage variables used to reassemble the address records with swear words blanked out, and constraint to pass out only the final, full address record (where the iteration value = the DCount value)

I've had to do half a dozen operations like this, for lack of a better way, and found it can actually perform quite well with correct results.
Last edited by qt_ky on Tue Feb 16, 2016 7:01 am, edited 1 time in total.
Choose a job you love, and you will never have to work a day in your life. - Confucius
ShaneMuir
Premium Member
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

You are using version 11 so this is actually a lot easier than some earlier versions.

Join your Full String to every word in your list (ie add a column which contains the offending word).
You can then just sort and group your data into a transformer, set a stage variable to hold the updated output value.
set the stage variable (eg svOutput) to be something like

Code: Select all

If NewRecord then Ereplace(FullString,Word,'') else Ereplace(svOutput,Word,'')
Output the last row in the group.

This will replace the offending words in your input string.

(You could also accomplish the same by concatenating all your words into a single string and using a loop function)
Post Reply