Data Validation of sequential files

This forum is in support of all issues about Data Quality regarding DataStage and other strategies.

Moderators: chulett, rschirm

Champa
Participant
Posts: 88
Joined: Wed Dec 14, 2005 1:44 pm


Post by Champa »

Hi,

I need to validate the columns of a sequential file against their expected formats. Which tool would you recommend?

Thanks
Champa
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

A more detailed specification would help, for a start.

Does "format matching" simply refer to the fields in the file (the right number of fields, the correct width for fixed-width, etc.) or does it refer to something more specific on a per-field basis?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Champa

Post by Champa »

Hi,

Thanks, Ray. Yes, it means the right number of fields, the correct width for fixed-width fields, a format mask, and validating each value against a check condition.

For example, for format checking:

Phone number format in the USA:

215-937-8323

Social Security Number format in the USA:

249-91-0000

Thanks
Champa
ray.wurlod

Post by ray.wurlod »

OK, DataStage can do all of that. Assuming you're using a parallel job, the Sequential File stage will reject any record that does not match the expected number or width of fields, and it can capture those records on a reject link. Format checking you would have to do field by field (obviously) in a Transformer stage.
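For anyone who wants to prototype the record-level checks outside DataStage first, here is a minimal Python sketch of the idea behind the reject link for fixed-width data: any line whose length does not match the layout goes to a reject list instead of the output. The layout (field names and widths) is hypothetical, and this only approximates what the Sequential File stage does; it is not how DataStage implements it.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical fixed-width layout as (field name, width); not taken from the thread.
LAYOUT: List[Tuple[str, int]] = [("phone", 12), ("ssn", 11), ("name", 10)]


def parse_fixed_width(line: str, layout: List[Tuple[str, int]]) -> Optional[Dict[str, str]]:
    """Return the record as a dict, or None if the line length does not match the layout."""
    expected = sum(width for _, width in layout)
    if len(line) != expected:
        return None
    record: Dict[str, str] = {}
    pos = 0
    for name, width in layout:
        record[name] = line[pos:pos + width]
        pos += width
    return record


def split_rejects(
    lines: List[str], layout: List[Tuple[str, int]]
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Send well-formed records to the output list and malformed raw lines to a reject list."""
    good: List[Dict[str, str]] = []
    rejects: List[str] = []
    for line in lines:
        record = parse_fixed_width(line, layout)
        if record is None:
            rejects.append(line)  # analogous to a row going down the reject link
        else:
            good.append(record)
    return good, rejects
```

Per-field format checks would then run only on the records in `good`, which mirrors doing them afterwards in a Transformer stage.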
In a server job, read the entire line as a single VarChar (I am assuming here that the file has line terminators), parse and check everything in a Transformer stage and, again, set up a rejects link.

Format matching is easier in server jobs, or in the BASIC Transformer stage in parallel jobs, because the DataStage BASIC Matches operator can match by data class (alphabetic, numeric or any). For example:

InLink.US_Phone Matches "3N'-'3N'-'4N"
InLink.SSN Matches "3N'-'2N'-'4N"

or, to be more flexible about the delimiter:

InLink.US_Phone Matches "3N1X3N1X4N"
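To make the pattern codes concrete, here is a rough Python emulation of a small subset of Matches that translates a pattern into an anchored regular expression. It rests on assumptions about the semantics: nN is n digits, nA is n letters, nX is n characters of any kind, a count of 0 means any number of that class, and text in single quotes is a literal. Other Matches features (such as alternative patterns) are not handled, and the function names are made up for illustration.

```python
import re

# One pattern element: a count followed by a class code, or a single-quoted literal.
_TOKEN = re.compile(r"(\d+)([NAX])|'([^']*)'")

# Assumed meaning of the class codes: N = digit, A = letter, X = any character.
_CLASSES = {"N": "[0-9]", "A": "[A-Za-z]", "X": "."}


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a subset of BASIC Matches patterns into a compiled regex.

    Assumption: a count of 0 (e.g. 0N) means "any number, including none".
    """
    parts = []
    pos = 0
    while pos < len(pattern):
        m = _TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError(f"unsupported pattern element at {pos}: {pattern[pos:]!r}")
        count, code, literal = m.groups()
        if code is not None:
            cls = _CLASSES[code]
            n = int(count)
            parts.append(f"{cls}*" if n == 0 else f"{cls}{{{n}}}")
        else:
            parts.append(re.escape(literal))
        pos = m.end()
    return re.compile("".join(parts), re.DOTALL)


def matches(value: str, pattern: str) -> bool:
    """Rough stand-in for 'value Matches pattern': the whole value must match."""
    return pattern_to_regex(pattern).fullmatch(value) is not None
```

For example, `matches("215.937.8323", "3N1X3N1X4N")` accepts any single delimiter character, while `"3N'-'3N'-'4N"` accepts only hyphens. In the job itself, rows failing the check would be sent to the rejects link by a Transformer constraint.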
Champa

Post by Champa »

Thank you once again, Ray.
Champa