Hi,
I'm probably posting something that is a fairly common problem; unfortunately, the posts I've found only really solve it for strings in a consistent format. We have a fullname field that needs to be split into title, forename/initials and surname fields. The problem is that the fullname field contains a variety of name formats,
e.g.
Mr Edward Smith
D Jones
Miss F G H Underhill
Peter Morrison
T.D. Watson
etc.
Obviously, using Field with a space as the delimiter is not going to work reliably for this, and DataStage itself is possibly unsuitable. Has anyone devised code for this scenario, or should I be looking at QualityStage?
Thanks in advance,
Alex
Splitting name into title, forename, surname
This is precisely the task that QualityStage performs, almost out of the box. You can invoke a QualityStage standardization task through a QualityStage stage in a DataStage job.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod wrote:
This is precisely the task that QualityStage performs, almost out of the box. You can invoke a QualityStage standardization task through a QualityStage stage in a DataStage job.
The USNAME/GBNAME rulesets achieve this with little or no customization. You can try your examples in the Rules Analyzer to see whether it meets your needs.
J.
Alex,
There is no built-in function in DataStage that does this accurately. There are third-party products for name cleansing (Trillium Software comes to mind).
In the past I've programmed my own logic in short routines: I use spaces, commas and periods as delimiters, strip out the non-name portions using a list of known prefixes and titles, then take the last word as the family name and whatever remains as the first and second names.
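The heuristic described above could look roughly like this. It is a minimal sketch in Python for illustration only (in DataStage itself it would be written as a BASIC routine or transformer logic), and the `TITLES` set and `split_name` function are hypothetical names; the title list is an assumption you would extend to match your own data.

```python
import re

# Hypothetical list of known titles/prefixes; extend to suit your data.
TITLES = {"MR", "MRS", "MS", "MISS", "DR", "PROF", "SIR", "REV"}


def split_name(fullname: str) -> tuple[str, str, str]:
    """Split a free-form full name into (title, forenames, surname).

    Heuristic: tokenise on spaces, commas and periods; strip leading
    tokens found in TITLES; take the last remaining token as the surname
    and join whatever is left as the forenames/initials.
    """
    # Runs of spaces, commas and periods all act as a single delimiter,
    # so "T.D. Watson" becomes ["T", "D", "Watson"].
    tokens = [t for t in re.split(r"[\s,.]+", fullname.strip()) if t]

    # Peel known titles off the front (case-insensitive match).
    titles = []
    while tokens and tokens[0].upper() in TITLES:
        titles.append(tokens.pop(0))

    # Last remaining word is treated as the family name.
    surname = tokens.pop() if tokens else ""
    forenames = " ".join(tokens)
    return " ".join(titles), forenames, surname


if __name__ == "__main__":
    for name in ["Mr Edward Smith", "D Jones", "Miss F G H Underhill",
                 "Peter Morrison", "T.D. Watson"]:
        print(name, "->", split_name(name))
```

With the sample names from the original post, "Miss F G H Underhill" yields `("Miss", "F G H", "Underhill")` and "T.D. Watson" yields `("", "T D", "Watson")`. As a last-word heuristic it will mis-handle surname-first input such as "Smith, John" and multi-word surnames, which is where a ruleset-driven tool like QualityStage earns its keep.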