Recognition of junk

Infosphere's Quality Product

skp
Premium Member
Posts: 135
Joined: Wed Dec 26, 2007 1:56 am
Location: India

Post by skp »

Issue: I have a column, CustomerName varchar(50), in which the data looks like this:

MHI三原 ホストダウンサイジング 棚卸フェーズ


I want to know whether this data consists of Japanese/Chinese characters or junk characters.

Can anyone guide me to the correct stage or rules to define in QualityStage to classify the data as Chinese, Japanese, or junk?

Any advice would be appreciated.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

These are Japanese characters.

There are no rules in QualityStage for dividing characters based on the character sets to which each belongs. That's not really what QualityStage is for, although you might be able to create a heavily customised rule set.

The preferred tool would be DataStage, and you'd still need some custom code to identify whether a particular character belongs to a particular character set (aka code page). But why?

Note also that Chinese, Japanese and Korean share a large set of characters (the CJK Unified Ideographs in the Unicode standard), so a shared character on its own does not tell you which language you are looking at.
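As a rough illustration of the kind of custom code meant here, the following Python sketch classifies each character by the prefix of its Unicode character name. It is not DataStage or QualityStage code, and the names `script_of`, `profile` and `guess_language` are made up for this example. Treating control, unassigned and private-use characters (and U+FFFD) as "suspect" is an assumption, not a rule from either product, and should be agreed with the owners of the data.

```python
import unicodedata
from collections import Counter


def script_of(ch: str) -> str:
    """Return a coarse script label for one character (illustrative only)."""
    # Assumption: these categories usually indicate a damaged or mis-decoded value.
    if ch == "\ufffd" or unicodedata.category(ch) in {"Cc", "Cn", "Co", "Cs"}:
        return "suspect"
    name = unicodedata.name(ch, "")
    if name.startswith(("CJK UNIFIED IDEOGRAPH", "CJK COMPATIBILITY IDEOGRAPH")):
        return "han"  # shared by Chinese, Japanese and Korean
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith(("KATAKANA", "HALFWIDTH KATAKANA")):
        return "katakana"
    if name.startswith("HANGUL"):
        return "hangul"
    if name.startswith(("LATIN", "FULLWIDTH LATIN")):
        return "latin"
    return "other"


def profile(text: str) -> Counter:
    """Count characters per script label, ignoring whitespace."""
    return Counter(script_of(ch) for ch in text if not ch.isspace())


def guess_language(text: str) -> str:
    """Very rough guess: kana implies Japanese, hangul implies Korean."""
    counts = profile(text)
    if counts["suspect"]:
        return "suspect"
    if counts["hiragana"] or counts["katakana"]:
        return "japanese"
    if counts["hangul"]:
        return "korean"
    if counts["han"]:
        return "han-only (chinese or japanese)"
    return "other"


if __name__ == "__main__":
    sample = "MHI三原 ホストダウンサイジング 棚卸フェーズ"
    print(profile(sample))
    print(guess_language(sample))
```

Because Han characters are shared, a value containing only Han characters cannot be assigned to one language by this method; the presence of kana is what marks the sample above as Japanese.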
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

That is standard Japanese text, discussing host downsizing.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In my experience it is totally undesirable to dismiss any character as "junk" without consultation with the owners of the data.
skp
Premium Member
Posts: 135
Joined: Wed Dec 26, 2007 1:56 am
Location: India

Post by skp »

Actually, what I want is to convert the given characters to English.

As this data arrives on a daily basis, I want to work out a process in DataStage that will convert Japanese characters to English characters.

Can this process be implemented in DataStage itself?

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Yes, but not meaningfully. For example you can transliterate (specify the sound of a Japanese character using English characters, such as "東" and "京" becoming "Tō" and "kyō") but there is no one-to-one correspondence between CJK characters and English characters. Anyone who specifies such a requirement is ignorant of the differences. Resist stupid requirements!
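To make the distinction concrete, here is a minimal Python sketch of transliteration by lookup. The `READINGS` table and the `transliterate` function are hypothetical and hold only the two readings mentioned above; a real table would need a reading for every character, and the right reading depends on context.

```python
# Hypothetical, illustrative table: one reading per character.
# Real Japanese text needs context-dependent readings, which a flat
# per-character lookup like this cannot provide.
READINGS = {
    "東": "tō",
    "京": "kyō",
}


def transliterate(text: str, readings: dict = READINGS) -> str:
    """Replace each character that has a known reading; keep the rest as-is."""
    return "".join(readings.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(transliterate("東京"))
```

The result spells out a sound, not a meaning. The same "東" standing alone as a word is read "higashi", which a one-reading-per-character table gets wrong, and that is one reason a simple mapping is not meaningful.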
rjdickson
Participant
Posts: 378
Joined: Mon Jun 16, 2003 5:28 am
Location: Chicago, USA

Post by rjdickson »

Ray is spot on. QualityStage and DataStage are not translation engines. You may want to look at a translation engine such as Google's.
Regards,
Robert
sendmkpk
Premium Member
Posts: 97
Joined: Mon Apr 02, 2007 2:47 am

Post by sendmkpk »

Hi,

I just had an idea: can't we use a web service to connect to Google and get the translated response back?

Would it work?

Regards,
Praveen
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Quite possibly; translation engines are getting better. But it would not be DataStage or QualityStage doing the work, which was the gist of the original post.