Sequential(CSV) file structure validation?

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

jackson.eyton
Premium Member
Posts: 145
Joined: Thu Oct 26, 2017 10:43 am

Post by jackson.eyton »

Hi,
I have a potential issue I am hoping to head off before it happens (it is only a matter of time). We bring in data from a few external sources as sequential files, and some of those files have to be generated manually by that department. Sooner or later someone will generate one of these files with a different structure, so it no longer matches the file definition we have in the DataStage job. In our setup, unfortunately, that would stop our staging process, and we would need to review and fix the issue manually before nightly processing could continue. Also unfortunately, DataStage's Validate does not check whether the incoming source file itself matches its definition. I was hoping someone had some suggestions on this.

Thanks,
Jackson
-Me
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So, you're looking for something to validate the file structure so you can skip loading any that could jam up your staging process? I had to build something several years ago where the sequential files we had issues with were csv files from a spreadsheet source that people would sometimes... "adjust". That or mess up the formatting or add a ton of blank lines at the end from improper deletions. From what I recall, we read each record in as a single string and then did some simple validations:

Validate the delimiter count
Validate the header column names are there in order
Validate data types / conversion to type
Validate any columns with "valid value" lists

Caught most of the issues from what I recall but not all. I could pull the code out of archives and see what (if anything else) we did if that would help.
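
For illustration only, here is a rough Python sketch of those four checks. It is not the archived code mentioned above; the column names, converters, valid-value list and the `validate_lines` helper are all hypothetical and would need to match your real file definition.

```python
import csv
from datetime import datetime

# Hypothetical layout: column name -> converter that raises ValueError on bad data.
EXPECTED_COLUMNS = {
    "customer_id": int,
    "order_date": lambda s: datetime.strptime(s, "%Y-%m-%d"),
    "amount": float,
    "status": str,
}
# Hypothetical "valid value" lists, keyed by column name.
VALID_VALUES = {"status": {"OPEN", "CLOSED", "HOLD"}}
DELIMITER = ","


def validate_lines(lines):
    """Return a list of error strings; an empty list means the file passed."""
    errors = []
    names = list(EXPECTED_COLUMNS)
    # Read each record as a single string and drop trailing blank lines.
    lines = [ln.rstrip("\r\n") for ln in lines]
    while lines and not lines[-1].strip():
        lines.pop()
    if not lines:
        return ["file is empty"]

    # Header check: the expected names must be present, in order.
    header = next(csv.reader([lines[0]], delimiter=DELIMITER), [])
    if [h.strip() for h in header] != names:
        return [f"line 1: header {header} does not match {names}"]

    for lineno, line in enumerate(lines[1:], start=2):
        fields = next(csv.reader([line], delimiter=DELIMITER), [])
        # Delimiter check: count fields after quote handling.
        if len(fields) != len(names):
            errors.append(f"line {lineno}: expected {len(names)} fields, got {len(fields)}")
            continue
        for name, value in zip(names, fields):
            # Type check: the value must convert to the expected type.
            try:
                EXPECTED_COLUMNS[name](value)
            except ValueError:
                errors.append(f"line {lineno}: {name}={value!r} is not valid")
                continue
            # Valid-value check, only for columns that define a list.
            allowed = VALID_VALUES.get(name)
            if allowed is not None and value not in allowed:
                errors.append(f"line {lineno}: {name}={value!r} not in {sorted(allowed)}")
    return errors
```

An open file handle can be passed straight in as `lines`. Because each record is parsed on its own, a quoted field that contains a line break would be reported as a field-count error, which may or may not be what you want.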

I wonder if there is any kind of third party validation utility that could be leveraged? Did a quick search for "csv file format checker" and seems like that may be an option for you.
-craig

"You can never have too many knives" -- Logan Nine Fingers
jackson.eyton

Post by jackson.eyton »

Looks like CSV Lint might work. I was considering writing my own and may still do that, but I will research other tools that already exist first.

https://csvlint.io/
-Me
chulett

Post by chulett »

8)
-craig

"You can never have too many knives" -- Logan Nine Fingers
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

I have used Craig's suggestion in the past, especially the first two checks.
Compare the header of the incoming file against the predefined list. And... that's it.
The assumption is that the header accurately represents all of the data columns and their values.
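
A header-only comparison along those lines could look something like the sketch below. It is written in Python purely as an illustration; `check_header` and the column names in the usage line are hypothetical.

```python
import csv


def check_header(header_line, expected, delimiter=","):
    """Compare one header line against the predefined column list.

    Returns (ok, problems), where problems is a list of readable messages.
    """
    parsed = next(csv.reader([header_line], delimiter=delimiter), [])
    actual = [h.strip() for h in parsed]
    problems = []
    missing = [c for c in expected if c not in actual]
    extra = [c for c in actual if c not in expected]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    # Same set of names but a different sequence (or a repeated name).
    if not missing and not extra and actual != list(expected):
        problems.append(f"columns out of order or duplicated: {actual}")
    return (not problems, problems)
```

For example, `check_header("id,amount,name", ["id", "name", "amount"])` reports the columns as out of order. As noted, this says nothing about the data rows themselves; it relies on the header being a faithful description of them.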
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Sending threatening messages to the providers works wonders.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.