Phonetic Code Generation

Infosphere's Quality Product

Moderators: chulett, rschirm

nilanjan
Participant
Posts: 16
Joined: Fri Jan 18, 2013 4:12 am

Phonetic Code Generation

Post by nilanjan »

Hi,

My requirement is to generate a reverse Soundex phonetic code for a complete string, say, Byblos Restaurent. Is there a function for this? How can I do it?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It's possible, but only if you're creating your own PAL script. Is this the case?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

That's an odd requirement to be given.
Let's step back from the "requirement" for a minute. What is the problem? Is there another way to fulfill it?

As you have seen, Soundex or RSoundex have limitations. Even more than the length issue, they are a little bit loose for some tastes.
Have you tried RNYSIIS? It's both longer and a bit more comprehensive phonetically than RSoundex. What region is your data from? A lot of the "standard" phonetic algorithms work best (if at all) with names where English is the primary language. If you have other needs, you'll have to do as Ray said, but using an algorithm that makes more sense for the data.

Are you standardising the name first?
I suggest you do for a couple of reasons:
- "Fix" some spelling errors and standardise full word vs abbreviations etc
- The standard name rulesets split the name into its important words and generate NYSIIS and RSoundex codes for each word individually. That might remove the "requirement".
- Nicknames: Bob and Robert won't match but most likely should.
- Gives you the option to check the phonetic keys in varying orders that can allow you to match names where the word order is a little swapped around.
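To make the per-word idea concrete, here is a minimal Python sketch. It is not what the QualityStage rulesets do internally: it uses plain American Soundex, takes "reverse Soundex" to mean Soundex of the reversed word (an assumption), and the helper names word_keys and order_free_key are made up for illustration.

```python
# Standard American Soundex letter groups.
_GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
CODES = {ch: digit for letters, digit in _GROUPS.items() for ch in letters}


def soundex(word, length=4):
    """Plain American Soundex; anything that is not A-Z is ignored."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out = [letters[0]]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate consonants with the same code
        code = CODES.get(c, "")  # vowels have no code and reset prev
        if code and code != prev:
            out.append(code)
        prev = code
    return "".join(out)[:length].ljust(length, "0")


def rsoundex(word, length=4):
    """Reverse Soundex, assumed here to be Soundex of the reversed word."""
    return soundex(word[::-1], length)


def word_keys(name):
    """Hypothetical helper: one (word, soundex, rsoundex) triple per word."""
    return [(w, soundex(w), rsoundex(w)) for w in name.upper().split()]


def order_free_key(name):
    """Hypothetical helper: sorted per-word codes, so word order is ignored."""
    return tuple(sorted(soundex(w) for w in name.split()))


print(word_keys("Byblos Restaurent"))
print(order_free_key("Byblos Restaurent") == order_free_key("Restaurant Byblos"))
```

Each word gets its own pair of keys, so a long name is no longer squeezed into one four-character code, and the sorted key lets "Restaurant Byblos" line up with "Byblos Restaurent" despite the spelling and word-order differences.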
nilanjan
Participant
Posts: 16
Joined: Fri Jan 18, 2013 4:12 am

Post by nilanjan »

Hi Ray, Stuart,

My requirement is this: we have a base table that holds about 12 million records. We now receive delta records, which need to be compared with the base table data on DBA NAME and STREET NAME by phonetic code match. The matching itself will be done by another tool (Ab Initio); my concern is only to generate the phonetic codes for DBA NAME and STREET NAME. For STREET NAME, the Soundex function does not generate a proper code. For example, '2825 N SCOTTSDALE RD' and '25 N SCOTTSDALE RD' generate the same code, which is not appropriate. I am new to DataStage and really don't have any idea how to do this.

What is a PAL script?
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

If you have a complete address and just want the street name, you'll need to parse it first. Check out lesson 2 of the QualityStage tute: it will do what you want it to.

If you do want some version of the whole address for matching, then maybe a single phonetic value is neither doable nor sufficient.
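As a rough illustration of why parsing first helps, the Python sketch below shows the two addresses from the question colliding under a whole-string Soundex, and then a key built from the house number plus a phonetic code of the street name only. It assumes a Soundex that ignores non-letters; the token lists and the functions parse_address and address_key are hypothetical and far cruder than the QualityStage address rulesets.

```python
_GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
CODES = {ch: digit for letters, digit in _GROUPS.items() for ch in letters}

# Tiny illustrative lookups; a real ruleset covers far more cases.
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
STREET_TYPES = {"RD", "ST", "AVE", "BLVD", "DR", "LN", "HWY"}


def soundex(text, length=4):
    """Plain American Soundex; digits, spaces and punctuation are ignored."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out = [letters[0]]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate consonants with the same code
        code = CODES.get(c, "")
        if code and code != prev:
            out.append(code)
        prev = code
    return "".join(out)[:length].ljust(length, "0")


def parse_address(address):
    """Hypothetical parser: house number, direction, street name, street type."""
    tokens = address.upper().split()
    house = tokens.pop(0) if tokens and tokens[0].isdigit() else ""
    direction = tokens.pop(0) if tokens and tokens[0] in DIRECTIONS else ""
    street_type = tokens.pop() if len(tokens) > 1 and tokens[-1] in STREET_TYPES else ""
    return {"house": house, "direction": direction,
            "name": " ".join(tokens), "type": street_type}


def address_key(address):
    """Keep the house number exact; code only the street-name words."""
    parts = parse_address(address)
    return (parts["house"], tuple(soundex(w) for w in parts["name"].split()))


a, b = "2825 N SCOTTSDALE RD", "25 N SCOTTSDALE RD"
print(soundex(a) == soundex(b))          # whole-string codes collide
print(address_key(a), address_key(b))    # house number now separates them
```

The house number is compared exactly rather than phonetically, which is why the two addresses no longer share a key, while a misspelt street name would still land on the same street code.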
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Not any kind of helpful, I know, but I find it amusing that you are using one ETL tool to feed another. Seems to me either one should be able to do the whole task. [shrug]
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You can do all of this with QualityStage - no need to use Ab Initio. Maybe Ab Initio can do it all too - that I don't know.
nilanjan
Participant
Posts: 16
Joined: Fri Jan 18, 2013 4:12 am

Post by nilanjan »

chulett wrote: Not any kind of helpful, I know, but I find it amusing that you are using one ETL tool to feed another. Seems to me either one should be able to do the whole task. [shrug]
Actually, I am completely new to DataStage, and it is also new to my project. We also have a lot of data (about 12 million records), so I am worried about the performance of DataStage/QualityStage. Is there any possibility of a performance issue?
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

Is there any possibility of a performance issue?
Actually, given that your DataStage server has sufficient resources, no.
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I would say there is always a possibility. That possibility goes up with any tool if you are new to it and there's no-one onsite to mentor you. However, I wouldn't let that stop you... why solve only half the problem? :wink: