Dataset

Post questions here related to DataStage Enterprise/PX Edition, covering areas such as parallel job design, parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

balaya.ds
Participant
Posts: 27
Joined: Fri Dec 25, 2009 10:50 pm

Dataset

Post by balaya.ds »

While loading a dataset, how many files are created internally?

And what is the default path of a dataset?
Sudheer
arvind_ds
Participant
Posts: 428
Joined: Thu Aug 16, 2007 11:38 pm
Location: Manali

Post by arvind_ds »

No idea about the number of files, but the path is whatever you have defined in your configuration file for the dataset/scratch folder locations. By default it should go to the project directory.
Arvind
ramsubbiah
Participant
Posts: 40
Joined: Tue Nov 11, 2008 5:49 am

Re: Dataset

Post by ramsubbiah »

balaya.ds wrote:while loading dataset how many files are created internally?

and what is default path of dataset ...?
You need to provide your dataset path; the dataset will reside on whatever path you have defined.

While loading the dataset, the records are distributed across the nodes you have defined in the configuration file.
Knowledge is Fair, execution is matter!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

At least one file per resource disk mentioned in the node pool that the Data Set stage is using from the configuration file. More than one file if the operating system limits file size (for example to 2GB). More than one file (potentially) if you append to the Data Set.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kvsudheer
Premium Member
Posts: 20
Joined: Fri Aug 18, 2006 12:01 am
Location: India

Post by kvsudheer »

ray.wurlod wrote:At least one file per resource disk mentioned in the node pool that the Data Set stage is using from the configuration file. More than one file if the operating system limits file size (for example t ...
My question is along the same lines as this post, so I am continuing here. Please let me know if I should start a new thread.

Till now I have been thinking that when we create a dataset, the descriptor file is stored on the resource disk and the data file(s) are stored in the scratch disk space. But as per this post, I understand that even the data files are stored on the resource disk. Does that mean the dataset has nothing to do with the scratch disk?

I request you to kindly guide me in this regard.

Thanks,
Sudheer
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

The descriptor is stored in the folder specified in the Data Set stage property (the project directory if no folder is specified), and the data files are stored on the resource disk specified in the configuration file. Hence, the scratch disk is never used for dataset storage, unless resource and scratch disk are the same in the configuration file, or perhaps for virtual datasets (not for persistent datasets themselves).

The scratch disk is used as a buffer between/for processes and should be cleaned up after job completion.
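To make the distinction concrete, here is a minimal single-node configuration file sketch (the hostname and paths are hypothetical, not from this thread): persistent Data Set data files land under `resource disk`, while `resource scratchdisk` is only buffer/sort space.

```
{
	node "node1"
	{
		fastname "etl_server"
		pools ""
		resource disk "/data/px/resource" {pools ""}
		resource scratchdisk "/data/px/scratch" {pools ""}
	}
}
```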
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
vivekgadwal
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

priyadarshikunal wrote: The descriptor is stored in folder specified in dataset stage property (project directory if no folder specified) and datafiles are stored in Resource disk specified in configuration file.
Right on.

The resource disk is permanent storage for the data set's data files. However, given where this discussion is going, I think the question I am about to post is relevant here. Our resource disk path is getting filled up pretty fast. I was going through that directory the other day and found data files sitting there from a couple of years ago. However, it is quite intermittent: the same data file is not present for all the days, but it is present on random dates (at least they seem random to me).

The question is: why are these data files still sitting there on all those dates? All of these data sets are set to "Overwrite" on every run. So why are these data files sitting there from times immemorial?
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

There are many reasons. The first one I would suspect is that you changed the path of your descriptor file, or someone deleted the descriptor file. When the stage is set to overwrite, it reads the descriptor to delete the data from each of the node locations. If someone deleted your descriptor, it wouldn't have this information. If you moved the path of your descriptor, it would be treated like a new data set and wouldn't overwrite anything.

Compare the date on the descriptors to the date on the data files in your resource locations and clean up the ones where the dates don't match.
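As a sketch of that comparison, stale data files can be listed with `find` before anything is removed. The example below simulates a resource directory so it is self-contained; the directory and the data-file naming pattern are hypothetical, and a real cleanup would point `find` at the actual resource disk and check descriptor dates first.

```shell
# Self-contained sketch: simulate a resource directory, then list data
# files older than a cutoff as cleanup candidates.
RESOURCE_DIR=$(mktemp -d)
touch "$RESOURCE_DIR/orders.ds.node1.0000.0000.1a2b"                 # recent file
touch -t 202001010000 "$RESOURCE_DIR/stale.ds.node1.0000.0000.9f8e"  # old file
find "$RESOURCE_DIR" -type f -mtime +365 -print                      # only the old one prints
```

Running the listing first and removing files only after the descriptor dates have been checked keeps the cleanup reversible up to the last moment.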
vivekgadwal
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

kwwilliams wrote: Compare the date on the descriptors to the date on the data files in your resource locations and clean up the ones where the dates don't match.
Thanks for your response. I am not sure if somebody deleted that descriptor file, as I am relatively new at this place; ever since I got here, though, none of that has happened. Anyway, is it okay if I do a simple "rm" on those unnecessary data files? Would it have any other repercussions?
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

You would need to ensure that the location referenced by your dataset descriptors does not match the data file you are removing. You wouldn't want to delete data that someone depends upon. Most environments have a handful of environment variables used to direct the location of the descriptor. If you're not sure, ask someone who has been there for a while. If they're as old as you say, then I would think it would be safe to remove them.
vivekgadwal
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

Thanks. I will make sure about that. :D
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

kwwilliams wrote:You would need to ensure that the location referenced by your dataset descriptors does not match the data file you are removing. You wouldn't want to delete data that someone depends upon. Most environments have a handful of environment variables used to direct the location of the descriptor. If you're not sure, ask someone who has been there for a while. If they're as old as you say, then I would think it would be safe to remove them.
Righto!
Deleting the data file without the right information may lead to disaster.
Why not "orchadmin delete" the descriptor file which you are sure about?
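For reference, a hedged sketch of that approach (the dataset path is hypothetical, and `orchadmin` is only available on the DataStage engine tier, so the sketch guards for its absence):

```shell
# Sketch: remove a Data Set the supported way, so the descriptor and all
# node data files are deleted together rather than orphaning either side.
DS=/proj/datasets/orders.ds
if command -v orchadmin >/dev/null 2>&1; then
    orchadmin describe "$DS"   # show the data files the descriptor references
    orchadmin delete "$DS"     # delete descriptor plus all data files
else
    echo "orchadmin not found; run this on the DataStage engine tier"
fi
```

This avoids the orphaned-data-file problem discussed above, since the engine walks the descriptor and removes every segment it references.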
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
vivekgadwal
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

kumar_s wrote: Why not "orchadmin delete" the descriptor file which you are sure about.
Normally we would do that. However, these files have been sitting there for 2 years. I see the same file (of course, the extended name, some hex string appended to the data set name, is different) again on later dates (including the latest date).
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

vivekgadwal wrote:
kumar_s wrote: Why not "orchadmin delete" the descriptor file which you are sure about.
The question being posed is why he would have older dates on the dataset resource files than on the descriptor files. His dataset overwrite function is not working, and he was seeking an answer as to why. Orchadmin is not needed in this situation.