Can we copy Datasets from one server to another?


rumu
Participant
Posts: 286
Joined: Mon Jun 06, 2005 4:07 am

Can we copy Datasets from one server to another?

Post by rumu »

We have copied Datasets within the same server. How can we copy Datasets from one server to another server?
Rumu
IT Consultant
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

It is better if you do not do this. In an emergency, you can move the dataset AND its underlying data files AND edit the 'header' file (the one you point to in DataStage when you open or write a dataset) so that it references the new locations. The header file is in a partly text, partly binary format, so editing it and getting it working again carries some risk. You can also mount the other server's filesystem and re-create the dataset there.
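If you do go down that road, it helps to see which data-file paths the descriptor actually references before touching it. A rough sketch only, with a made-up path, and descriptor layout does vary by version:

# list the segment-file paths embedded in the dataset descriptor
strings /proj/data/mydataset.ds | grep '/'

Whatever paths show up there have to exist on the target host, with the segment files already in place, before an edited descriptor will open.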

You can also write another type of file to transport, like a flat file, or land the data in a temporary database table, and fetch it on the other server.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I too counsel against trying to copy Data Sets from one server to another.


You could mount all the required disks from the new system, then create a configuration file whose resource disk entries point to those mounts, set the environment variables appropriately, and use a command line interface to effect the copy.
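A minimal sketch of what such a configuration file could look like; the host name and mount points are made up:

{
  node "node1"
  {
    fastname "source_host"
    pools ""
    resource disk "/mnt/newserver/datasets" {pools ""}
    resource scratchdisk "/mnt/newserver/scratch" {pools ""}
  }
}

With APT_CONFIG_FILE pointing at something like this, a simple Data Set to Data Set copy job writes its segment files onto the mounted disks, so the data is already sitting physically on the new machine.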

But UCDI has identified the preferred approaches.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

I have written a custom set of jobs to dump a generically named dataset (passed as a parameter to the job) into a sequential file plus a schema file. The schema file is created with the orchadmin command that dumps the schema; filter that output through a few awk commands and echo statements and you can easily recreate the schema text-file layout that a Sequential File stage needs in order to read the file back in.

So, here is my generic process.

Step 1) Create a sequencer that has an execute stage and a parallel job.

Execute stage has the orchadmin command.

Job is the dataset to Sequential file process.

Step 2) Create a generic job for sequential file to dataset creation.

Parms to that job would contain the schema file created via that orchadmin command.


Now you should be able to fill in the blanks on the above process. What this does is allow you to pass in ANY dataset and dump it to a text file + schema file.

SFTP those over to your target system and run job #2, which takes a Sequential file + Schema file and creates a Dataset.
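Strung together from the command line, the round trip looks roughly like this; the project, job, file and parameter names below are placeholders only:

# source server: dump the dataset to a flat file plus schema file
dsjob -run -param dataset_path=/proj/ds/ -param ds_filename=customers.ds \
      -param seqfilepath=/stage/out/ -wait MyProject seq_dump_dataset

# ship both artifacts across
sftp target_host <<EOF
put /stage/out/customers.ds.txt
put /stage/out/customers.ds.schema
EOF

# target server: rebuild the dataset from the flat file + schema file
dsjob -run -param seqfile=/stage/in/customers.ds.txt \
      -param schemafile=/stage/in/customers.ds.schema -wait MyProject seq_to_dataset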


Here is my Execute Stage command:

echo "record" > #seqfilepath##ds_filename#.schema; echo "{record_delim='\n', final_delim=end,delim=',', quote=double}" >> #seqfilepath##ds_filename#.schema; $DSHOME/../PXEngine/bin/orchadmin describe -s #dataset_path##ds_filename# 2>/dev/null | sed 1,11d >> #seqfilepath##ds_filename#.schema; echo #seqfilepath##ds_filename#.schema


That should set you on the right path.

Please keep in mind that this is A method, not the only method. Some far craftier people out there have better ways of skinning the cat.
rumu
Participant
Posts: 286
Joined: Mon Jun 06, 2005 4:07 am

Post by rumu »

Thanks, all, for your responses. We decided not to copy datasets at this point.
Rumu
IT Consultant
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

IBM's official answer:

How to move dataset from one server to another in IBM InfoSphere DataStage
https://www-01.ibm.com/support/docview. ... wg21392477
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Andy, that method lacks the schema information that will be required for the dataset. The method I listed above is basically that + schema.

I'm (not) surprised that IBM left the schema information out of their explanation.

How the heck would you translate anything into a field other than a varchar without the schema file?!?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Isn't that what they mean by the 'header file'?
"Once the job has run, the data will be on the target system but the dataset header file for the target dataset will still be on the source server. You will have to move the header files to your target system (same paths)."
-craig

"You can never have too many knives" -- Logan Nine Fingers
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

If you make your own schema, then you do not have to worry about an APT file that has different node name info. You could even have a different degree of parallelism.

I'm dead set against someone copying over the .ds header file to another host. That is just plain silly. Shame on IBM for recommending that.
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

There is something that works for many small, simple datasets too:

orchadmin dump -delim ',' dsname.ds outputname.csv

But this is a quick fix for simple problems; there are times when it won't be sufficient, and it's terrible for anything big.

It will also lose the schema, but you can export a table definition for the dataset (DataStage can create one from the .ds) and send it alongside the CSV.
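Whichever flavour you use, the data file and its definition have to travel together; something along these lines, with purely illustrative file names:

scp /stage/out/dsname.csv /stage/out/dsname_tabledef.dsx user@target_host:/stage/in/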
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

The process I listed above works for us. Makes the schema for any dataset. Dumps to a sequential file... works great.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

UCDI wrote: there is something that works for many small simple DS too:

orchadmin dump -delim ',' dsname.ds outputname.csv
...
I seem to remember that you will lose any NULL values when dumping a dataset, so one needs to be careful.
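For the schema-file route described earlier, one way to keep NULLs distinguishable (a sketch only, assuming nullable columns and a marker string that never occurs in real data) is to declare the affected fields nullable and set an explicit null_field property in the schema file:

record
{final_delim=end, delim=',', quote=double, null_field='NULL'}
(
  cust_id: int32;
  cust_name: nullable string[max=50];
)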