
Can we copy Datasets from one server to another?

Posted: Mon Jul 23, 2018 9:24 am
by rumu
We have copied Datasets within the same server. How can we copy Datasets from one server to another?

Posted: Mon Jul 23, 2018 12:18 pm
by UCDI
It is better if you do not do this. In an emergency, you can move the dataset AND its underlying data files AND edit the 'header' file (the one you interface with in DataStage when you open or write a dataset) so that it points to the new locations. The header file is in an odd partly-text, partly-binary format, so editing it and getting it working carries a little risk. You can also mount the other server's disks and re-create the dataset there.

You can also write the data to another type of file for transport, such as a flat file, or land it in a temporary database table and fetch it on the other server.

Posted: Mon Jul 23, 2018 9:58 pm
by ray.wurlod
I too counsel against trying to copy Data Sets from one server to another.


You could mount all the required disks from the new system, create a configuration file whose resource disk entries point to those mounts, set the environment variables appropriately, and use a command-line interface to effect the copy.
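
A rough sketch of such a configuration file, assuming the new system's disks are mounted under /mnt/newserver (host name and paths here are hypothetical):

{
    node "node1"
    {
        fastname "source_host"
        pools ""
        resource disk "/mnt/newserver/datasets" {pools ""}
        resource scratchdisk "/mnt/newserver/scratch" {pools ""}
    }
}

Point APT_CONFIG_FILE at that file before running the copy.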

But UCDI has identified the preferred approaches.

Posted: Tue Jul 24, 2018 9:27 am
by PaulVL
I have written a custom set of jobs to dump a generically named dataset (a parameter to the job) into a sequential file plus a schema file. The schema file is created with the orchadmin command that dumps the schema; filter that through a few awk commands and echo statements and you can easily recreate the schema text-file layout that a Sequential File stage needs to read the file back in.

So, here is my generic process.

Step 1) Create a sequencer that has an Execute stage and a parallel job.

The Execute stage runs the orchadmin command.

The parallel job is the Dataset-to-Sequential-File process.

Step 2) Create a generic job for the Sequential-File-to-Dataset creation.

Parameters to that job would include the schema file created via the orchadmin command.


Now you should be able to fill in the blanks on the above process. What this does is let you pass in ANY dataset and dump it to a text file + schema file.

SFTP those over to your target system and run job #2, which takes a sequential file + schema file and creates a Dataset.
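
The transfer itself can be as simple as this (user, host, and staging paths are hypothetical):

# push the data file and its schema file to the target system
sftp etluser@targethost <<'EOF'
put /staging/mydataset.txt
put /staging/mydataset.schema
EOF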


Here is my Execute Stage command:

echo "record" > #seqfilepath##ds_filename#.schema; echo "{record_delim='\n', final_delim=end,delim=',', quote=double}" >> #seqfilepath##ds_filename#.schema; $DSHOME/../PXEngine/bin/orchadmin describe -s #dataset_path##ds_filename# 2>/dev/null | sed 1,11d >> #seqfilepath##ds_filename#.schema; echo #seqfilepath##ds_filename#.schema


That should set you on the right path.

Please keep in mind that this is A method, not the only method. Some far craftier people out there have better ways of skinning the cat.

Posted: Tue Jul 24, 2018 11:41 am
by rumu
Thanks all for your responses. We decided not to copy datasets at this point.

Posted: Wed Jul 25, 2018 8:58 am
by asorrell
IBM's official answer:

How to move dataset from one server to another in IBM InfoSphere DataStage
https://www-01.ibm.com/support/docview. ... wg21392477

Posted: Wed Jul 25, 2018 10:54 am
by PaulVL
Andy, that method lacks the schema information that will be required for the dataset. The method I listed above is basically that + schema.

I'm (not) surprised that IBM left the schema information out of their explanation.

How the heck would you translate anything into a field other than a varchar without the schema file?!?

Posted: Wed Jul 25, 2018 11:48 am
by chulett
Isn't that what they mean by the 'header file'?
The IBM document says: "Once the job has run, the data will be on the target system but the dataset header file for the target dataset will still be on the source server. You will have to move the header files to your target system (same paths)."

Posted: Wed Jul 25, 2018 3:33 pm
by PaulVL
If you make your own schema, then you do not have to worry about an APT file that has different node name info. You could even have a different degree of parallelism.

I'm dead set against copying the .ds header file over to another host. That is just plain silly. Shame on IBM for recommending that.

Posted: Thu Jul 26, 2018 2:04 pm
by UCDI
There is something that works for many small, simple datasets too:

orchadmin dump -delim ',' dsname.ds outputname.csv

But this is a quick fix for simple problems; there are times when it won't be sufficient, and it's terrible for anything big.

It will also lose the schema, but you can export a table definition for the dataset (DataStage can create one from the .ds) and send it alongside the CSV.
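
If you would rather capture the schema from the command line than export a table definition, the describe option PaulVL used above works here too (the dataset name is hypothetical, and the sed filter strips the banner lines as in his command):

orchadmin dump -delim ',' dsname.ds outputname.csv
orchadmin describe -s dsname.ds 2>/dev/null | sed 1,11d > dsname.schema

Note that to read the CSV back in you would still need to prepend the record-properties header lines, as PaulVL's echo statements do.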

Posted: Mon Jul 30, 2018 9:27 am
by PaulVL
The process I listed above works for us. It makes the schema for any dataset and dumps it to a sequential file... works great.

Posted: Tue Jul 31, 2018 3:52 am
by ArndW
UCDI wrote: There is something that works for many small, simple datasets too:

orchadmin dump -delim ',' dsname.ds outputname.csv
...
I seem to remember that you lose any NULL values when dumping a dataset, so one needs to be careful.