My searching for this hasn't turned up any results, but I figured I'd put the question to the experts.
Is there a method for performing a mass extract of the file and dataset schemas within a DS project? I know that for a particular instance, the columns could be saved as a table definition and then exported.
But in light of GDPR requirements, there is a request to be able to look at these definitions en masse, in order to locate any sensitive data (SSN, Bank info, etc.) being captured and stored.
Thanks in advance for any advice.
Tom Smith
Schemas for flat files and datasets
None that I'm aware of.
Are the files guaranteed to have column headers? Does it violate any rules to inspect the data to guess the data types?
In a perfect world it would be possible to use the Connector Import Wizard in DataStage Designer, specifying the File Connector but, alas, it's not.
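If inspecting the data doesn't violate any rules, a rough first pass over the flat files can be done outside DataStage entirely. A minimal sketch, assuming comma-delimited files with a header row — the helper name and the sensitive-name pattern list are mine, and only a starting point:

```shell
# Sketch only: assumes comma-delimited flat files with a header row;
# extend the pattern list for whatever counts as sensitive in your shop.
scan_headers() {
  dir=$1
  for f in "$dir"/*.txt; do
    [ -e "$f" ] || continue
    # flag any header column whose name looks sensitive
    head -1 "$f" | tr ',' '\n' | grep -Ei 'ssn|social|bank|account|iban' |
      while read -r col; do echo "$f: $col"; done
  done
}
# e.g. scan_headers /data/source
```

That only finds suspicious column *names*, of course — guessing data types from the values themselves is a bigger job.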
For data sets, you have the orchadmin command; you could build a script to loop through all the descriptor file names, which I hope you keep in a consistent location.
Similarly, importing an Orchestrate schema definition processes only one descriptor file at a time.
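The orchadmin loop might look something like the sketch below. The helper name, the `$DSHOME/../PXEngine` layout and the descriptor-file directory are assumptions for illustration — adjust to wherever your .ds files actually live:

```shell
# Sketch only: $DSHOME and the descriptor-file directory are assumptions.
dump_ds_schemas() {
  ds_dir=$1; out=$2
  : > "$out"
  for d in "$ds_dir"/*.ds; do
    [ -e "$d" ] || continue
    echo "=== $d ===" >> "$out"
    # describe -s prints the record schema of the data set
    "$DSHOME/../PXEngine/bin/orchadmin" describe -s "$d" >> "$out" 2>&1
  done
}
# e.g. dump_ds_schemas /data/datasets /tmp/ds_schemas.txt
```

One file of schemas is then something you can actually search, rather than opening data sets one at a time.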
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
You can filter out the schema file from a dataset like this:
Put this into an Execute Command stage in a sequence job.
echo "record" > #seqfilepath##ds_filename#.schema
echo "{record_delim='\n', final_delim=end, delim=',', quote=double}" >> #seqfilepath##ds_filename#.schema
$DSHOME/../PXEngine/bin/orchadmin describe -s #dataset_path##ds_filename# 2>/dev/null | sed 1,11d >> #seqfilepath##ds_filename#.schema
echo #seqfilepath##ds_filename#.schema
You can figure out the rest.
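And to close the loop on the original GDPR question: once those per-dataset .schema files exist, the hunt for sensitive fields can be a simple grep. A sketch — the helper name and pattern list are illustrative, and the directory is whatever you used for #seqfilepath#:

```shell
# Sketch only: assumes the per-dataset .schema files sit in one directory;
# -l prints just the matching file names, -i ignores case.
find_sensitive_schemas() {
  grep -Eil 'ssn|social|bank|account|iban' "$1"/*.schema 2>/dev/null
}
# e.g. find_sensitive_schemas /tmp/schemas
```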