Amount of data a dataset can handle

lakshmipriya
Participant
Posts: 31
Joined: Tue Jul 13, 2004 5:26 am
Location: chennai

Post by lakshmipriya »

We have a dataset with the following columns and data types:

Dset1
C1 char(40)
C2 smallint
C3 char(6)
C4 decimal(9,2)
C5 integer
C6 decimal(9,2)
C7 integer
C8 decimal(9,2)
C9 integer

This dataset will have 150 million rows. How large will it be?

Can DataStage handle a dataset of that size?
Lakshmi
richdhan
Premium Member
Posts: 364
Joined: Thu Feb 12, 2004 12:24 am

Post by richdhan »

Hi Lakshmi,

Please look at this post and see if it helps:

viewtopic.php?t=88791

According to that post, they have a dataset that holds 4.5 GB of data.
Lakshmi wrote: This data set will have 150 million rows so how much will be the size of the Dataset.
I think the Data Set Management tool available in DataStage Manager can give you the details you want.

If I were doing this, I would load the dataset with several different row volumes and check how much space it takes each time.
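The measure-and-extrapolate idea can be sketched roughly as below. This is only an illustration: the function name extrapolate_size is hypothetical, the sample figures are made-up placeholders rather than real measurements, and it assumes the dataset size grows linearly with the row count (a fixed overhead plus a constant number of bytes per row).

```python
def extrapolate_size(samples, target_rows):
    """Fit bytes = overhead + per_row * rows by least squares, then extrapolate.

    samples is a list of (row_count, measured_bytes) pairs from test loads.
    Returns (estimated_bytes, per_row_bytes, overhead_bytes).
    """
    n = len(samples)
    mean_rows = sum(r for r, _ in samples) / n
    mean_bytes = sum(b for _, b in samples) / n
    sxx = sum((r - mean_rows) ** 2 for r, _ in samples)
    sxy = sum((r - mean_rows) * (b - mean_bytes) for r, b in samples)
    per_row = sxy / sxx                       # slope: bytes added per row
    overhead = mean_bytes - per_row * mean_rows  # intercept: fixed cost
    return overhead + per_row * target_rows, per_row, overhead


# Placeholder measurements (hypothetical, not from a real DataStage run).
samples = [
    (1_000_000, 75_001_000),
    (2_000_000, 150_001_000),
    (4_000_000, 300_001_000),
]
estimate, per_row, overhead = extrapolate_size(samples, 150_000_000)
print(f"about {per_row:.1f} bytes/row, estimated total {estimate / 1e9:.2f} GB")
```

Replace the placeholder pairs with the sizes you actually observe for your own test loads; if the fit is poor, the linear assumption does not hold for your data.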

Rgds
--Rich
leo_t_nice
Participant
Posts: 25
Joined: Thu Oct 02, 2003 8:57 am

Post by leo_t_nice »

Hi

I would estimate the total size of the dataset as 150 million * the row width, plus a small amount for the 'file pointer'. The size of each of the actual files making up the dataset will depend on the number of nodes you are running on.

We have datasets in excess of 20 GB, but we run with 8 nodes, so each file is around 2.5 GB.
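As a rough worked example of the rows * row width estimate for the Dset1 layout in the original question, here is a minimal sketch. The byte widths are assumptions, not figures from this thread: 1 byte per char, 2 bytes for smallint, 4 bytes for integer, and packed decimal stored in precision // 2 + 1 bytes. It also ignores any per-record or per-block overhead and null indicators, and assumes rows are spread evenly across nodes. The 8-node figure is borrowed from the example above, since the original poster's node count is not stated.

```python
def column_bytes(col_type):
    """Return an ASSUMED fixed storage width in bytes for one column type."""
    kind = col_type[0]
    if kind == "char":       # assumption: fixed width, 1 byte per character
        return col_type[1]
    if kind == "smallint":   # assumption: 16-bit integer
        return 2
    if kind == "integer":    # assumption: 32-bit integer
        return 4
    if kind == "decimal":    # assumption: packed decimal, precision // 2 + 1 bytes
        return col_type[1] // 2 + 1
    raise ValueError(f"unknown column type: {kind}")


# Dset1 as posted: C1..C9
DSET1 = [
    ("char", 40),
    ("smallint",),
    ("char", 6),
    ("decimal", 9, 2),
    ("integer",),
    ("decimal", 9, 2),
    ("integer",),
    ("decimal", 9, 2),
    ("integer",),
]
ROWS = 150_000_000
NODES = 8  # assumption: taken from the 8-node example, not from the question

row_width = sum(column_bytes(c) for c in DSET1)
total_bytes = ROWS * row_width
per_node_bytes = total_bytes // NODES

print(f"row width: {row_width} bytes")
print(f"total:     {total_bytes / 1e9:.2f} GB")
print(f"per node:  {per_node_bytes / 1e9:.2f} GB across {NODES} nodes")
```

Under these assumptions the row is 75 bytes wide, so 150 million rows come to about 11.25 GB in total, or roughly 1.4 GB per file on 8 nodes, before any overhead. Treat this as a lower bound and confirm against a real test load.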

Hope this helps
mandyli
Premium Member
Posts: 898
Joined: Wed May 26, 2004 10:45 pm
Location: Chicago

Post by mandyli »

Hi Lakshmi,

I hope the replies from rich and leo_t_nice answered your question. The size of the dataset files will depend on the number of nodes and CPUs you are running with. Also check the memory size of your DataStage server.

Thanks