
Schemas for flat files and datasets

Posted: Thu May 31, 2018 1:45 pm
by hobocamp
My searching for this hasn't turned up any results, but figured I'd put the question to the experts.

Is there a method for performing a mass extract of the file and dataset schemas within a DS project? I know that for a particular instance, the columns could be saved as a table definition and then exported.

But in light of GDPR requirements, there is a request to be able to look at these definitions en masse, in order to locate any sensitive data (SSN, Bank info, etc.) being captured and stored.

Thanks in advance for any advice.

Tom Smith

Posted: Thu May 31, 2018 11:18 pm
by ray.wurlod
Not that I'm aware of.

Are the files guaranteed to have column headers? Does it violate any rules to inspect the data to guess the data types?

In a perfect world you could do this with the Connector Import Wizard in DataStage Designer, using the File Connector but, alas, you can't.
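If the files do have header rows, one quick way to review column names in bulk is to print the first line of each file. The following is a minimal sketch, not a tested tool. It assumes comma-delimited files with a `.csv` extension in a single directory, and both the directory layout and the function name `print_headers` are hypothetical:

```shell
#!/bin/sh
# Sketch: print the header row of each delimited flat file in a directory
# so the column names can be reviewed in bulk.
# Assumptions (hypothetical): files end in .csv, sit in one directory,
# and their first line is a header row.

print_headers() {
    dir=$1
    for f in "$dir"/*.csv; do
        [ -f "$f" ] || continue      # skip if the glob matched nothing
        # Print "filename: header line" for each file
        printf '%s: %s\n' "$(basename "$f")" "$(head -n 1 "$f")"
    done
}

# Example: print_headers /data/landing
```

This only reads the first line of each file, so it stays cheap even on large files. It tells you nothing about files without headers, where the data itself would have to be inspected.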

For data sets, you have the orchadmin command; you could build a script to loop through all the descriptor file names, which I hope you keep in a consistent location.
As with the wizard, importing an Orchestrate schema definition processes only one descriptor file at a time.
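A loop over the descriptors could look roughly like this. It is a sketch under stated assumptions, not a drop-in script. The descriptors are assumed to end in `.ds` and to live in one directory, which is a hypothetical layout. `ORCHADMIN` and `dump_ds_schemas` are hypothetical names. The default orchadmin path is the one used later in this thread. orchadmin typically also needs the DataStage environment to be set up (dsenv sourced, `APT_CONFIG_FILE` set) before it will run:

```shell
#!/bin/sh
# Sketch: dump the schema of every dataset descriptor in one directory.
# Assumptions (hypothetical): descriptors end in .ds and share a directory;
# ORCHADMIN points at the orchadmin binary; the DataStage environment is
# already set up so orchadmin can run.

ORCHADMIN=${ORCHADMIN:-$DSHOME/../PXEngine/bin/orchadmin}

dump_ds_schemas() {
    ds_dir=$1    # directory holding the .ds descriptor files
    out_dir=$2   # directory that receives one .txt file per descriptor
    mkdir -p "$out_dir" || return 1
    for ds in "$ds_dir"/*.ds; do
        [ -f "$ds" ] || continue                 # no descriptors matched
        name=$(basename "$ds" .ds)
        # "describe -s" prints some header text followed by the record schema
        if ! "$ORCHADMIN" describe -s "$ds" > "$out_dir/$name.txt" 2>/dev/null; then
            echo "describe failed: $ds" >&2      # report it and keep going
        fi
    done
}

# Example: dump_ds_schemas /data/datasets /tmp/ds_schemas
```

Writing one output file per descriptor keeps the results easy to grep later. It also means a single bad descriptor does not stop the whole run.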

Posted: Fri Jun 01, 2018 12:17 pm
by PaulVL
You can extract the schema from a dataset into a schema file like this:

Put this into an Execute Command activity in a job sequence.


echo "record" > #seqfilepath##ds_filename#.schema; echo "{record_delim='\n', final_delim=end,delim=',', quote=double}" >> #seqfilepath##ds_filename#.schema; $DSHOME/../PXEngine/bin/orchadmin describe -s #dataset_path##ds_filename# 2>/dev/null | sed 1,11d >> #seqfilepath##ds_filename#.schema; echo #seqfilepath##ds_filename#.schema


You can figure out the rest.
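One possible version of "the rest" turns the one-liner into a function, runs it for every descriptor, and then does a crude scan of the resulting schema files for column names that look sensitive. This is a hedged sketch, not PaulVL's actual process. The function names, the `.ds` layout and the name patterns are hypothetical examples. The `sed 1,11d` header count is copied from the one-liner and may differ between versions, so check it against your own `orchadmin describe -s` output:

```shell
#!/bin/sh
# Sketch: the one-liner as a reusable function, applied to every descriptor,
# followed by a crude scan for sensitive-looking column names.
# Assumptions (hypothetical): descriptors end in .ds; the first 11 lines of
# "orchadmin describe -s" output are header text (as in the one-liner);
# the grep patterns are examples only.

ORCHADMIN=${ORCHADMIN:-$DSHOME/../PXEngine/bin/orchadmin}

# Write a delimited-file schema for one descriptor ($1) to file $2.
make_schema() {
    {
        echo "record"
        # printf keeps \n as two literal characters, whatever the shell's echo does
        printf '%s\n' "{record_delim='\n', final_delim=end, delim=',', quote=double}"
        "$ORCHADMIN" describe -s "$1" 2>/dev/null | sed 1,11d
    } > "$2"
}

# Build one .schema file per descriptor, then list lines naming likely sensitive fields.
scan_datasets() {
    ds_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for ds in "$ds_dir"/*.ds; do
        [ -f "$ds" ] || continue
        make_schema "$ds" "$out_dir/$(basename "$ds" .ds).schema"
    done
    # -i: case-insensitive, -n: line numbers, -E: extended regex
    grep -inE 'ssn|social|iban|account|bank' "$out_dir"/*.schema
}

# Example: scan_datasets /data/datasets /tmp/ds_schemas
```

Matching on column names only catches fields whose names give them away. Sensitive data stored under generic names would still need a look at the data itself.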