Dataset Magic

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

daignault
Premium Member
Posts: 165
Joined: Tue Mar 30, 2004 2:44 pm
Contact:

Dataset Magic

Post by daignault »

I'm looking to update a Unix (Linux) magic file so I can scan a directory and identify both dataset headers and dataset part files, for cleaning/purging older datasets on the system, using the file command to identify each object.

I've identified that all dataset headers start with "Torrent Orchestrate", so this can be placed in the magic file, but I'm not sure about the dataset parts.

It appears that dataset parts start with x001 x000 x000 x000 x000, but I suspect this is going to be too generic. Has anyone tried to set up magic for DataStage?
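In case it helps anyone picture it, the kind of entry I had in mind for the header is sketched below, in magic(5) string syntax. The description text on the right is just a placeholder of mine, and the commented-out line is the too-generic part-file test:

    # DataStage / Orchestrate parallel dataset descriptor (.ds header file)
    0   string  Torrent\ Orchestrate    DataStage parallel dataset descriptor
    # Part (segment) files appear to start 01 00 00 00 ..., but matching a
    # little-endian long of 1 at offset 0 would hit far too many other files:
    #0  lelong  1   possibly a DataStage dataset segment file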

Thanks in advance,

Ray D
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Don't know about magic, but Data Set and File Set segment files do have a very idiosyncratic naming convention that you might be able to leverage. There are probably some identifying bytes farther into the header than the first few bytes, but I haven't looked in there in some time.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

I believe you are approaching this incorrectly. You should not attempt to automate the cleanup/purge by identifying the dataset header and then identifying the dataset data partitions; rather, you should identify the headers and then use the tools (orchadmin) to handle the cleanup/archive/purge. That way you know it is handled correctly.
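For example, something along these lines (a sketch only; the path is made up, the exact subcommand names can vary a little by release, and APT_CONFIG_FILE must point at the same configuration file the jobs used):

    # show what the descriptor points at, then remove descriptor and segment files together
    orchadmin describe /data/datasets/daily_load.ds
    orchadmin rm /data/datasets/daily_load.ds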
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

mhester - unfortunately, the ".ds" descriptor files often get deleted using OS commands, leaving orphaned (large) data files. So one needs to find all descriptor files; any data files that don't link to a descriptor are orphans and can be deleted.

I posted This FAQ on Magic/Orphans which you can use, Ray (Daignault, not Wurlod).
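For anyone who cannot follow the link, one way to approach it is sketched below. It is not necessarily word for word what the FAQ does; the directory paths are placeholders, and it assumes the segment file paths appear as readable text inside the descriptor (orchadmin describe is the supported way to list them):

    #!/bin/sh
    # Sketch: list segment files on a resource disk that no surviving .ds
    # descriptor references. Review the candidate list by hand before deleting.
    DS_DIR=/data/datasets          # where the .ds descriptor files live
    RES_DIR=/data/resource_disk1   # a resource disk from the configuration file

    # Pull the segment file paths out of every descriptor that still exists.
    # Assumption: the paths are stored as plain text in the descriptor header.
    for d in "$DS_DIR"/*.ds
    do
        strings "$d" | grep "^$RES_DIR/"
    done | sort -u > /tmp/referenced.lst

    # Anything on the resource disk not referenced by a descriptor is an orphan candidate.
    find "$RES_DIR" -type f | sort > /tmp/present.lst
    comm -13 /tmp/referenced.lst /tmp/present.lst > /tmp/orphan_candidates.lst

    wc -l /tmp/orphan_candidates.lst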
daignault
Premium Member
Posts: 165
Joined: Tue Mar 30, 2004 2:44 pm
Contact:

Post by daignault »

Thanks Arnd, you took the words from my typing fingers :). I searched the site for "dataset" and "magic" but did not discover your posting. That would have saved the group my rantings.

FYI, I'm doing some DataStage admin work for a large DataStage site that uses offshore contractors. Sometimes the jobs developed are not quite up to spec: datasets and dataset parts named with a ".txt" suffix, job cleanup that does an rm on the dataset header but not on the dataset part files, and so on.

Background on the file command is here: ( http://unixhelp.ed.ac.uk/CGI/man-cgi?file ). "file" uses a number of methods to identify the file type. It must either already be aware of the file structure, or have some unique part of that structure defined in a file normally labelled "magic". Not to be confused with the old poster some of us owned ( http://boldt.us/humor/unix_magic.html ).
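Once the custom entries are in place you can point file straight at them with -m; the file names and the output line here are only illustrative:

    file -m /usr/local/etc/datastage.magic /data/datasets/*
    # a matching descriptor would then report something like:
    # /data/datasets/daily_load.ds: DataStage parallel dataset descriptor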

Cheers,

Ray D
karrisuresh
Participant
Posts: 57
Joined: Sat Jun 09, 2007 1:14 am
Location: chicago

Post by karrisuresh »

Hi. You can use the orchadmin command followed by the dataset filename.

1) Better to write a small program or script to which you pass the dataset filenames as parameters (a sketch follows below).

2) Or, in DataStage: Designer -> Tools -> Data Set Management -> browse to the path, select the dataset and delete it.

3) To flush a dataset:

Develop a job in which the source is a Row Generator with the number of rows set to 0, connected to the Data Set stage. When you run the job it flushes the dataset, i.e. it empties the dataset but does not remove it.

As per your requirements you can use any of the above three options.
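A minimal sketch of option 1, assuming the orchadmin rm subcommand behaves the same on your release (the script name and paths are invented):

    #!/bin/sh
    # usage: ds_purge.sh /data/datasets/old_load.ds [more .ds descriptors ...]
    # APT_CONFIG_FILE must point at the same configuration file the jobs used.
    for d in "$@"
    do
        echo "Deleting dataset $d"
        orchadmin rm "$d"    # removes the descriptor and its segment files together
    done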
Hi, I have experience in Parallel Extender DataStage and am ready to give and take help from others;
I hope we all help each other hand in hand.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

karrisuresh - the problem is that if you delete the descriptor file by mistake, then you have no way of knowing which of the files in the dataset directory are "orphaned" and will never be used again; therefore your suggestion, while it is the correct way to do things in a perfect world, will not solve the OP's problem.
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

Ray,

I get it :D

This can be a problem and I have asked one of the framework developers at IBM if there is a way to find orphaned resource data.

I will let you know what I hear.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

mike - I'm a bit chagrined by your response since the FAQ that I wrote explains exactly how to locate such orphaned files.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I'm assuming he means something more... official. A button, perhaps. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

ArndW wrote:mike - I'm a bit chagrined by your response since the FAQ that I wrote explains exactly how to locate such orphaned files. ...
I guess if I had paid for the premium membership I would have seen your response and not responded in the first place :lol:

Sorry to trump your answer - did not mean to.

And Craig, nothing official, but I did verify with the framework developer @ IBM that there is nothing within the data that will identify the header other than the name on disk (which, if I had the premium membership, Arnd has probably already mentioned).

You know us pilots Arnd - too cheap to buy anything :wink: