Page 1 of 1

Pattern Matching

Posted: Sat Jul 02, 2011 12:27 am
by datastagedw
Hi All,

We have a requirement where in after the look up, not only the records with exact match but also any source records matching the pattern in the look up dataset should also pass.

We used the below option however there can be performance issues as the source data cann be huge:

Have a left outer join based on some dummy key column (generated on either side of the join). This will resut in cartesian product. Then using a transformer, using count function doing the pattern match and placing a constraint accordingly. the output will be 1 for pattern matched records and 0 for unmatched records.

Is there any other approach which can give better performance? Please help

Posted: Sat Jul 02, 2011 1:29 am
by pandeesh
you can try with index function

Posted: Sun Jul 03, 2011 7:52 am
by datastagedw
Thanks for the quick reply. The issue is that the substring to match is dynamic rather than static. The substring values to be looked for come from the reference dataset.

Posted: Sun Jul 03, 2011 9:13 am
by pandeesh
Can you please post your sample source and reference data as well as your expected result?

Posted: Sun Jul 03, 2011 10:14 pm
by datastagedw
Sample Source data:

earabcrt
qewtuy
ghruo
ehraabc
abcrtch
anmnoas
mnopwwas
asdasmno
asdad
asdas
ryrED

Reference dataset
abc
mno

Output results:
earabcrt
ehraabc
abcrtch
anmnoas
mnopwwas
asdasmno

'abc' and 'mno' sub strings should be available in the source column

Posted: Mon Jul 04, 2011 1:27 am
by samyamkrishna
try this unix script


while read record
do
cat srcsam.txt | grep '$record' >> tgtsam.txt
done < refsam.txt

Thsi should work

Posted: Mon Jul 04, 2011 3:16 pm
by ray.wurlod
I don't believe DataStage can do this. You might try using a sparse lookup (assuming you have a database that supports the same) with LIKE in the user-defined SQL.

Posted: Tue Jul 05, 2011 12:54 am
by datastagedw
Thanks for the reply.. We are using SQL 2005, need to check if it supports sparse look up.

In the meanwhile, we decided to go with the approach I mentioned in the post and do a performance testing and observe the results and time taken

Posted: Tue Jul 05, 2011 2:33 am
by ray.wurlod
Only DB2 and Oracle support sparse lookup. Sorry.

Re: Pattern Matching

Posted: Tue Jul 05, 2011 5:19 am
by pandeesh
datastagedw wrote:Hi All,
Have a left outer join based on some dummy key column (generated on either side of the join). This will resut in cartesian product. Then using a transformer, using count function doing the pattern match and placing a constraint accordingly. can give better performance? Please help
Hi,

Can you please explain this approach little bit?

My source file contains below records:
source
--------
abc
bca
axdbc
cbhss
bcda


Reference
----------
bc

My output should be

abc
bca
axdbc

Thanks

Posted: Tue Jul 05, 2011 8:02 am
by pandeesh
ray.wurlod wrote: You might try using a sparse lookup (assuming you have a database that supports the same) with LIKE in the user-defined SQL.
Hi Ray,

I have tried the sparse lookup like below using oracle tables:

Source table contains a field VAL

and the records are

abc
bca
abcde
be

Reference table contains a field VAL and it contains only one record bc.

So the result should be

abc
bca
abcde

Job design:

Code: Select all



src table------------lkp stage(sparse)----------->seq file
                              |
                              |
                            ref table

In the ref table , i have specified the lookup as sparse and the user-defined sql is

Code: Select all

SELECT VAL FROM WHERE VAL LIKE '%ORCHESTRATE.VAL%'
in the src table, the query is like

Code: Select all

SELECT VAL FROM src
But nothing gets matched.
Please help me.

Thanks

Posted: Tue Jul 05, 2011 9:48 am
by samyamkrishna
Hi Pandeesh,

I think the source and reference should be interchanged when you are trying that.