Issue with Dataset performance in RedHat Linux

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy
John Corbin
Premium Member
Posts: 12
Joined: Fri Jun 13, 2008 2:51 pm

Post by John Corbin »

We have a very similar issue in our shop... writing to or reading from datasets seems to take forever.

As an extreme example, I had a job that read 145 rows from a sequential file and wrote them to a dataset.

We use an 8-node config file and run DS 8.5.

seqFile------------------>Transformer-------------------->Dataset

2 columns on seq file

COL1 VARCHAR(4)
COL2 VARCHAR(4)

Transformer

Does nothing. I did not write this job, or it would not be there.

2 columns on Dataset

COL1 VARCHAR(4)
COL2 VARCHAR(4)

Here are the Director job start and end times. Not kidding, either...

Starts at 2015-12-19 11:26:19 AM
Ends at 2015-12-19 04:01:25 PM
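
For scale, the elapsed time can be worked out from the two Director timestamps above. This is only a quick arithmetic sketch: the timestamps and the row count are copied from this post, and the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the Director start/end times quoted above.
FMT = "%Y-%m-%d %I:%M:%S %p"
start = datetime.strptime("2015-12-19 11:26:19 AM", FMT)
end = datetime.strptime("2015-12-19 04:01:25 PM", FMT)

elapsed = end - start
rows = 145  # row count stated in the post

print(elapsed)                          # 4:35:06
print(int(elapsed.total_seconds()))     # 16506
print(round(elapsed.total_seconds() / rows, 1))  # seconds per row
```

That is a little over four and a half hours for 145 two-column rows.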

No warnings.
Nothing captured/ignored in a message handler at the job or project level.

On our older Solaris server, the job ran fine in seconds.

I also tried this variant:

RowGenerator------------------>Transformer-------------------->Dataset

It took seconds to run for 145 rows.

Some other info....

In July 2015, we migrated from a Solaris server to a server running Red Hat Linux 2.6.32-431.5.1.el6.x86_64.

When I ran the RowGenerator job, it was during a quiet time on our production server, so this may explain why it ran fast.

When the original job ran, it was during the busy time on the production server.

Could Linux be trying to implement some sort of resource management when the box is busy?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

No, that's just a weird result. Can you reproduce it? How big are the Data Set segment files?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by John Corbin »

Ray,

It has happened every week since we moved to Red Hat Linux. Other jobs are experiencing the same performance issue. The one I wrote about is an extreme example.

Not sure how to tell how big each segment is...

Is this info from Dataset Management useful?

##I IIS-DSEE-TFCN-00001 08:47:18(000) <main_program> 
IBM WebSphere DataStage Enterprise Edition 8.5.0.6152 
Copyright (c) 2001, 2005-2008 IBM Corporation. All rights reserved
 


##I IIS-DSEE-TUTL-00031 08:47:18(001) <main_program> The open files limit is 16384; raising to 32768.
##I IIS-DSEE-TFCN-00006 08:47:18(002) <main_program> conductor uname: -s=Linux; -r=2.6.32-431.3.1.el6.x86_64; -v=#1 SMP Fri Dec 13 06:58:20 EST 2013; -n=EC24LP4060; -m=x86_64
##I IIS-DSEE-TFSC-00001 08:47:19(000) <main_program> APT configuration file: /disk/temp/datastage/ADW_GST_AUDIT/TMPDIR/aptoa6660cc0f71fc
##I IIS-DSEE-TOIX-00059 08:47:19(000) <APT_RealFileExportOperator in APT_FileExportOperator,0> Export complete; 101 records exported successfully, 0 rejected.
                 Name:  /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds
              Version:  ORCHESTRATE V8.5.0 DM Block Format 6.
     Time of Creation:  12/19/2015 14:30:39
 Number of Partitions:  8
   Number of Segments:  1
       Valid Segments:  1
Preserve Partitioning:  false
Segment Creation Time:
            0:  12/19/2015 14:30:39

Partition 0
  node   : node1
  records: 19
  blocks : 1
  bytes  : 168
  files  :
    Segment 0 :
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0000.0000.aea.d83efb5f.0000.0527aae4  131072 bytes
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0000.0001.aea.d83efb5f.0001.ef31cece  0 bytes
  total   : 131072 bytes
Partition 1
  node   : node2
  records: 18
  blocks : 1
  bytes  : 160
  files  :
    Segment 0 :
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0001.0000.aea.d83efb5f.0002.6e1bf330  131072 bytes
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0001.0001.aea.d83efb5f.0003.0f1ca6f1  0 bytes
  total   : 131072 bytes
Partition 2
  node   : node3
  records: 18
  blocks : 1
  bytes  : 158
  files  :
    Segment 0 :
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0002.0000.aea.d83efb5f.0004.0b748e6b  131072 bytes
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0002.0001.aea.d83efb5f.0005.287499f0  0 bytes
  total   : 131072 bytes
Partition 3
  node   : node4
  records: 18
  blocks : 1
  bytes  : 160
  files  :
    Segment 0 :
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0003.0000.aea.d83efb5f.0006.d8c1bf43  131072 bytes
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0003.0001.aea.d83efb5f.0007.b0f249f6  0 bytes
  total   : 131072 bytes
Partition 4
  node   : node5
  records: 18
  blocks : 1
  bytes  : 160
  files  :
    Segment 0 :
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0004.0000.aea.d83efb5f.0008.fbbfcb0d  131072 bytes
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0004.0001.aea.d83efb5f.0009.432e22ca  0 bytes
  total   : 131072 bytes
Partition 5
  node   : node6
  records: 18
  blocks : 1
  bytes  : 158
  files  :
    Segment 0 :
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0005.0000.aea.d83efb5f.000a.8edb13b2  131072 bytes
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0005.0001.aea.d83efb5f.000b.7505a734  0 bytes
  total   : 131072 bytes
Partition 6
  node   : node7
  records: 18
  blocks : 1
  bytes  : 162
  files  :
    Segment 0 :
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0006.0000.aea.d83efb5f.000c.3e74c98b  131072 bytes
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0006.0001.aea.d83efb5f.000d.be714afc  0 bytes
  total   : 131072 bytes
Partition 7
  node   : node8
  records: 18
  blocks : 1
  bytes  : 160
  files  :
    Segment 0 :
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0007.0000.aea.d83efb5f.000e.48465cc3  131072 bytes
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0007.0001.aea.d83efb5f.000f.0a10ea2d  0 bytes
  total   : 131072 bytes

Totals:
  records : 145
  blocks  : 8
  bytes   : 1286
  filesize: 1048576
  min part: 131072
  max part: 131072

Schema:
record
( ADW_Office_Value: string;
  ORG_office: string;
)
##I IIS-DSEE-TFSC-00010 08:47:19(001) <main_program> Step execution finished with status = OK.

Post by ray.wurlod »

Yes, your segment files are minimally sized (128KB each). Therefore that's not the problem. Data are moved to/from data sets in units of not less than 32KB, so you should be seeing very few I/O operations.
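
The sizes can be cross-checked against the Dataset Management listing with a few lines of arithmetic. This is only a sketch: the byte counts are copied from the listing above, and the last figure assumes each transfer is exactly 32KB (the lower bound mentioned), which is an assumption rather than something the listing shows.

```python
# Per-partition payload bytes, copied from the listing (partitions 0 to 7).
partition_bytes = [168, 160, 158, 160, 160, 158, 162, 160]
segment_file_size = 131072  # bytes on disk per partition, from the listing

payload = sum(partition_bytes)
on_disk = segment_file_size * len(partition_bytes)

print(payload)                    # 1286
print(on_disk)                    # 1048576
print(segment_file_size // 1024)  # 128 (KB per segment file)

# Assumption: transfers of exactly 32KB, the stated minimum unit.
print(segment_file_size // (32 * 1024))  # 4 transfers to cover one file
```

Both totals match the "Totals" block in the listing, so the volume of data being written is tiny.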

Time to involve your official support provider, methinks.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

APT file on a temp disk... are you running on a GRID?

Post by John Corbin »

Just confirmed with our support area: we are not on a GRID.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Do you know what filesystem is used on "/disk/data/datastage/" and does that reside on a SAN or mounted/remote disk?

Post by John Corbin »

Red Hat Enterprise Linux on mounted/remote disk.
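
For anyone wanting to answer the filesystem question precisely, one way is to look up which mount holds the dataset path. The sketch below parses text in the `/proc/mounts` format and picks the longest matching mount point. The function name `find_mount` and the sample mount table are hypothetical, made up for illustration; they are not taken from the server in this thread.

```python
import os

def find_mount(path, mounts_text):
    """Return (device, mount_point, fs_type) for the mount that holds path."""
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type = fields[:3]
        # Match on whole path components so /disk/data does not match /disk/database.
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            if best is None or len(mount_point) > len(best[1]):
                best = (device, mount_point, fs_type)
    return best

# Hypothetical mount table, for illustration only.
SAMPLE = """\
/dev/mapper/vg-root / ext4 rw 0 0
nfshost:/export/ds /disk/data nfs rw,hard 0 0
"""

print(find_mount("/disk/data/datastage/", SAMPLE))
```

On a real Linux host the second argument would be the contents of `/proc/mounts`, for example `open("/proc/mounts").read()`. The third field of the result is the filesystem type (such as `nfs` or `ext4`).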
Post by John Corbin »

Update...

We have just upgraded to 11.5. We were on 8.5 without support, as we were a year past our license expiry date.

We contacted IBM as soon as we were on 11.5. Here is what we were told...

Datasets make use of a Linux system call named fsync. As a test, IBM told our support area how to disable calls to fsync. Jobs then ran in seconds without fail.

Sadly, we cannot disable fsync permanently, but this proved the issue is not with DataStage itself but rather with our setup...

I did not know at the time, but we are also using VMware over top of Linux... VMware is interfering with calls to fsync.

Upshot... we will soon move to hardware running Linux but without VMware.
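
A rough way to see whether fsync is slow on a given disk, independent of DataStage, is to time it directly. The sketch below is not the test IBM supplied (that procedure is not described in this thread); it is a minimal stand-in that writes small blocks to a scratch file and times each `os.fsync` call. The function name `time_fsync` and its defaults are illustrative.

```python
import os
import tempfile
import time

def time_fsync(directory, writes=20, payload=b"x" * 4096):
    """Write small blocks to a scratch file in directory, fsync after each
    write, and return the per-fsync durations in seconds."""
    durations = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            os.write(fd, payload)
            t0 = time.perf_counter()
            os.fsync(fd)  # force the data to stable storage
            durations.append(time.perf_counter() - t0)
    finally:
        os.close(fd)
        os.remove(path)  # clean up the scratch file
    return durations

# Uses the system temp directory here; point it at the disk under suspicion.
samples = time_fsync(tempfile.gettempdir())
print(f"max fsync: {max(samples) * 1000:.2f} ms over {len(samples)} calls")
```

To compare disks, pass a directory you can write to on the dataset resource disk (in this thread, something under `/disk/data/datastage/`) and run it once during a quiet period and once during a busy one. Large per-call times on the busy run would be consistent with the behaviour described above.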
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Thanks for the update!
-craig

"You can never have too many knives" -- Logan Nine Fingers
thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

John - do you have any further details or a case number from IBM that you could share?

Post by John Corbin »

Apologies for reviving this thread...

Sorry, no case number from IBM, only because I am not in the support group that would deal with them.

I asked my support area how they fixed it, and this is what I was told:

The group that maintains our DataStage added this line to the dsenv file:

APT_DATASET_FLUSH_NOSYNC=1; export APT_DATASET_FLUSH_NOSYNC