My searching for this hasn't turned up any results, but I figured I'd put the question to the experts.
Is there a method for performing a mass extract of the file and dataset schemas within a DS project? I know that for a particular instance, the columns could be saved as a table definition and then exported.
But in light of GDPR requirements, there is a request to be able to look at these definitions en masse, in order to locate any sensitive data (SSN, Bank info, etc.) being captured and stored.
Thanks in advance for any advice.
Tom Smith
Schemas for flat files and datasets
None that I'm aware of.
Are the files guaranteed to have column headers? Does it violate any rules to inspect the data to guess the data types?
In a perfect world it would be possible to use the Connector Import Wizard in DataStage Designer, specifying the File Connector but, alas, it's not.
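If inspecting the data doesn't violate any rules, a rough first pass over the flat files can be done outside DataStage entirely. A minimal sketch, assuming comma-delimited files with a header row — the helper name and the sensitive-name pattern list are mine, and only a starting point:

```shell
# Sketch only: assumes comma-delimited flat files with a header row;
# extend the pattern list for whatever counts as sensitive in your shop.
scan_headers() {
  dir=$1
  for f in "$dir"/*.txt; do
    [ -e "$f" ] || continue
    # flag any header column whose name looks sensitive
    head -1 "$f" | tr ',' '\n' | grep -Ei 'ssn|social|bank|account|iban' |
      while read -r col; do echo "$f: $col"; done
  done
}
# e.g. scan_headers /data/source
```

That only finds suspicious column *names*, of course — guessing data types from the values themselves is a bigger job.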
For data sets, you have the orchadmin command; you could build a script to loop through all the descriptor file names, which I hope you keep in a consistent location.
Similarly, importing an Orchestrate schema definition processes only one descriptor file at a time.
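The orchadmin loop might look something like the sketch below. The helper name, the `$DSHOME/../PXEngine` layout and the descriptor-file directory are assumptions for illustration — adjust to wherever your .ds files actually live:

```shell
# Sketch only: $DSHOME and the descriptor-file directory are assumptions.
dump_ds_schemas() {
  ds_dir=$1; out=$2
  : > "$out"
  for d in "$ds_dir"/*.ds; do
    [ -e "$d" ] || continue
    echo "=== $d ===" >> "$out"
    # describe -s prints the record schema of the data set
    "$DSHOME/../PXEngine/bin/orchadmin" describe -s "$d" >> "$out" 2>&1
  done
}
# e.g. dump_ds_schemas /data/datasets /tmp/ds_schemas.txt
```

One file of schemas is then something you can actually search, rather than opening data sets one at a time.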
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
You can filter out the schema file from a dataset like this:
Put this into an Execute Command stage in a sequence job.
echo "record" > #seqfilepath##ds_filename#.schema
echo "{record_delim='\n', final_delim=end, delim=',', quote=double}" >> #seqfilepath##ds_filename#.schema
$DSHOME/../PXEngine/bin/orchadmin describe -s #dataset_path##ds_filename# 2>/dev/null | sed 1,11d >> #seqfilepath##ds_filename#.schema
echo #seqfilepath##ds_filename#.schema
You can figure out the rest.
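And to close the loop on the original GDPR question: once those per-dataset .schema files exist, the hunt for sensitive fields can be a simple grep. A sketch — the helper name and pattern list are illustrative, and the directory is whatever you used for #seqfilepath#:

```shell
# Sketch only: assumes the per-dataset .schema files sit in one directory;
# -l prints just the matching file names, -i ignores case.
find_sensitive_schemas() {
  grep -Eil 'ssn|social|bank|account|iban' "$1"/*.schema 2>/dev/null
}
# e.g. find_sensitive_schemas /tmp/schemas
```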