Difference between File Set and Data Set

bks_prasad
Participant
Posts: 10
Joined: Thu Mar 18, 2004 12:08 am

Post by bks_prasad »

Hi All,

Can anybody explain a scenario where I should use the File Set stage rather than the Data Set stage? What is the exact difference between a File Set and a Data Set? Both are operating system files and both are proprietary to DataStage.

Thanks in Advance

Regards
Prasad
mandyli
Premium Member
Posts: 898
Joined: Wed May 26, 2004 10:45 pm
Location: Chicago

Post by mandyli »

I believe the main differences between File Set and Data Set are the following.

File Set stage: only executes in parallel mode. You can't manage File Sets independently of a job.

Data Set stage: can be configured to execute in parallel or sequential mode. You can also manage Data Sets independently of a job using the Data Set Management utility, which is available from the DataStage Designer, Manager, or Director.
bks_prasad
Participant
Posts: 10
Joined: Thu Mar 18, 2004 12:08 am

Post by bks_prasad »

Hi,

Thanks for your reply, but I need to know the exact scenario where I would use a File Set rather than a Data Set.

Regards
Prasad
mandyli
Premium Member
Posts: 898
Joined: Wed May 26, 2004 10:45 pm
Location: Chicago

Post by mandyli »

You can choose between a File Set and a Data Set based on the volume of data.
nivas
Participant
Posts: 117
Joined: Sun Mar 21, 2004 4:40 pm

Post by nivas »

mandyli wrote: Based on volume of data you can choose file or data set.
My assumption is that for high volumes we should use a File Set, and a Data Set otherwise. Am I correct?
atul9806
Participant
Posts: 96
Joined: Tue Mar 06, 2012 6:12 am
Location: Pune

Post by atul9806 »

Yes. On older operating systems you could not create a Data Set file greater than 2 GB.
Nowadays no OS has that kind of restriction (though it also depends on the sys admin ;) ).

The File Set stage can handle a lot of data if you want the data in a readable format while preserving the partitioning.
The Data Set stage can do the same, but you cannot view the data without a DataStage tool.

So, if we set aside the file size condition, it depends on your need.
~Atul Singh
DataGenX (http://www.datagenx.net) | LinkedIn (https://www.linkedin.com/in/atulsinghds)
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You are all on the wrong track entirely. Both Data Sets and File Sets are parallel structures for storing data on disk while retaining partitioning and sorted order. The difference between them is that Data Sets store data in the same internal format that the DataStage parallel engine uses (the operator for reading and writing Data Sets is copy), whereas writing a File Set uses the export operator and reading from a File Set uses the import operator. Data stored in File Sets is in the same format as that used within text files, and therefore the data in a File Set can be read by humans and by other applications.
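As a rough analogy only (this is plain Python, not DataStage or its on-disk formats), the sketch below writes the same partitioned rows twice: once as delimited text, which any tool can read, and once as packed binary, which only a reader that knows the layout can decode. The row data, the modulo partitioner, the file names, and the `<id` binary layout are all assumptions made up for illustration.

```python
import csv
import struct
import tempfile
from pathlib import Path

# Hypothetical rows of (id, amount); not a real DataStage schema.
ROWS = [(1, 10.5), (2, 20.0), (3, 7.25), (4, 3.0)]
PARTITIONS = 2  # stand-in for the number of processing nodes


def partition(rows, n):
    """Split rows into n partitions by id modulo n (a simple hash partitioner)."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[0] % n].append(row)
    return parts


def write_text_partitions(parts, directory):
    """File Set analogy: one human-readable delimited file per partition."""
    paths = []
    for i, part in enumerate(parts):
        path = Path(directory) / f"part{i}.txt"
        with path.open("w", newline="") as handle:
            csv.writer(handle).writerows(part)
        paths.append(path)
    return paths


def write_binary_partitions(parts, directory):
    """Data Set analogy: one packed binary file per partition."""
    paths = []
    for i, part in enumerate(parts):
        path = Path(directory) / f"part{i}.bin"
        with path.open("wb") as handle:
            for row_id, amount in part:
                # Little-endian 4-byte int followed by an 8-byte double.
                handle.write(struct.pack("<id", row_id, amount))
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        parts = partition(ROWS, PARTITIONS)
        for path in write_text_partitions(parts, workdir):
            print(path.name, path.read_text().splitlines())
        for path in write_binary_partitions(parts, workdir):
            print(path.name, path.stat().st_size, "bytes")
```

In both cases the partitioning survives on disk as one file per partition; the only thing that changes is whether something other than the writing program can make sense of the contents.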

Both the Data Set stage and the File Set stage can operate in either parallel or sequential mode. However, I can't think of a good reason to execute either in sequential mode.

Either can have up to 10,000 data ("segment") files per node. Even with a 2 GB file size limit, that means you can store up to 20,000 GB per node.
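The arithmetic behind that figure can be checked with a few lines of Python. The two constants are simply the numbers quoted above; the function name and the `nodes` parameter are made up for illustration.

```python
MAX_SEGMENT_FILES_PER_NODE = 10_000  # segment-file count quoted above
FILE_SIZE_LIMIT_GB = 2               # the older 2 GB per-file limit


def max_capacity_gb(nodes: int = 1) -> int:
    """Upper bound on stored data, in GB, across the given number of nodes."""
    return nodes * MAX_SEGMENT_FILES_PER_NODE * FILE_SIZE_LIMIT_GB


print(max_capacity_gb())   # 20000
print(max_capacity_gb(4))  # 80000
```

So even under the old per-file limit, the ceiling scales with the number of nodes in the configuration.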
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.