Hi,
I am new to DataStage.
Can somebody help me with DataSet?
I did read the documentation, but I can't understand what it is used for.
SP.
DataSet
Welcome aboard, Srinu!
A Dataset is a DataStage-specific file with a .ds extension. It is used in parallel jobs for faster data loading, as the dataset resides in the temporary disk space used by DataStage Parallel Extender. The metadata should be the same on the input and output links. You can check the manuals or the forum for more information, as this has been discussed before.
Sam
Welcome Aboard.
What is a data set?
A data set is like a flat file. DataStage Parallel Extender jobs use data sets to manage data within a job. You can think of each link in a job as carrying a data set.
The Data Set stage allows you to store data being operated on in a persistent form, which can then be used by other DataStage jobs. Data sets are operating system files, each referred to by a control file, which by convention has the suffix .ds. Using data sets wisely can be key to good performance in a set of linked jobs. You can also manage data sets independently of a job using the Data Set Management utility, available from the DataStage Designer, Manager, or Director.
If you want more information, please see Chapter 56 in Parjdev.pdf.
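To make the "control file plus data files" idea concrete, here is a minimal Python sketch. It is only an illustration of the concept: the JSON layout, the `.partN` file names and the `write_dataset` / `read_dataset` helpers are all made up for this example and are not DataStage's actual on-disk format or API.

```python
import json
import tempfile
from pathlib import Path


def write_dataset(descriptor, schema, rows, nodes):
    """Spread rows round-robin over one data file per node and record
    their locations in a small descriptor file (hypothetical layout)."""
    partitions = [rows[n::nodes] for n in range(nodes)]
    segments = []
    for node, part in enumerate(partitions):
        segment = descriptor.with_name(f"{descriptor.stem}.part{node}")
        segment.write_text(json.dumps(part))
        segments.append(str(segment))
    # The descriptor holds no rows, only the schema and where the data lives.
    descriptor.write_text(json.dumps({"schema": schema, "segments": segments}))


def read_dataset(descriptor):
    """Follow the descriptor to every data file and collect the rows."""
    meta = json.loads(descriptor.read_text())
    rows = []
    for segment in meta["segments"]:
        rows.extend(json.loads(Path(segment).read_text()))
    return meta["schema"], rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ds = Path(tmp) / "customers.ds"
        schema = {"id": "int32", "name": "string"}
        write_dataset(ds, schema, [[1, "Ann"], [2, "Bob"], [3, "Cy"]], nodes=2)
        print(ds.read_text())    # small descriptor: schema plus file locations
        print(read_dataset(ds))  # rows gathered from the per-node files
```

The point of the sketch is that the `.ds` file itself stays small: another job only needs its path to find the schema and every data file behind it.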
Thanks
Man.
Hi,
What Ray has mentioned is partly right. The data in a Dataset is distributed across the processing nodes, and the control file records where it is distributed. But a Dataset is more than a text file. In a text file everything is represented as a string or a number, whereas in a Dataset the data is represented according to its data type, and null values have their own representation.
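As a rough illustration of the text-versus-typed difference, the Python sketch below sends a row through a CSV text round trip and then packs a nullable integer in a typed binary form. The one-byte null flag followed by a 32-bit integer is an assumption chosen for the example, not the real Dataset encoding, and the helper names are hypothetical.

```python
import csv
import io
import struct


def text_round_trip(row):
    """Write a row as CSV text and read it back."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    buf.seek(0)
    return next(csv.reader(buf))


def pack_nullable_int(value):
    """Typed form: a null-flag byte followed by a little-endian int32."""
    return struct.pack("<?i", value is None, 0 if value is None else value)


def unpack_nullable_int(data):
    is_null, value = struct.unpack("<?i", data)
    return None if is_null else value


if __name__ == "__main__":
    # In text, the integer comes back as a string, and None and ""
    # both come back as the same empty string.
    print(text_round_trip([42, None, ""]))
    # In the typed form, the integer and the null both survive.
    print(unpack_nullable_int(pack_nullable_int(42)))
    print(unpack_nullable_int(pack_nullable_int(None)))
```

After the text round trip the reader has to guess which fields were numbers and which empty fields were really null; the typed form carries that information with the data.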
HTH
--Rich
Pride comes before a fall
Humility comes before honour
Hi,
With Datasets you have the advantage of parallelism in reading and writing. Though you have the option of multiple readers with a sequential file, there you cannot use variable-length columns; you have to use fixed-length columns.
By using datasets you get the advantage of multiple nodes reading while still using variable-length columns.
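One way to see why fixed-length records suit multiple readers of a single flat file: each reader can work out its starting byte offset by arithmetic alone, without scanning for delimiters. The Python sketch below shows that general idea only. It is not how DataStage implements its readers, and `RECORD_LEN`, `read_slice` and `parallel_read` are names invented for the example.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

RECORD_LEN = 8  # assumed fixed record width in bytes


def read_slice(path, first_record, count):
    """Seek straight to a record boundary and read `count` records."""
    with open(path, "rb") as f:
        f.seek(first_record * RECORD_LEN)
        data = f.read(count * RECORD_LEN)
    return [data[i:i + RECORD_LEN].decode("ascii")
            for i in range(0, len(data), RECORD_LEN)]


def parallel_read(path, readers):
    """Give each reader a contiguous range of records, then merge in order."""
    total = os.path.getsize(path) // RECORD_LEN
    per_reader = max(1, -(-total // readers))  # ceiling division
    starts = range(0, total, per_reader)
    with ThreadPoolExecutor(max_workers=readers) as pool:
        chunks = list(pool.map(lambda s: read_slice(path, s, per_reader), starts))
    return [record for chunk in chunks for record in chunk]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fixed.dat")
        with open(path, "wb") as f:
            for i in range(10):
                f.write(f"rec{i:05d}".encode("ascii"))  # exactly 8 bytes each
        print(parallel_read(path, readers=3))
```

With variable-length records the offset `first_record * RECORD_LEN` has no meaning, so a reader cannot jump to its share of the file this way. A dataset avoids the problem because the data is already split into separate files per node.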
Happy DataStaging