Issue with Dataset performance in RedHat Linux

Posted: Fri Jan 01, 2016 9:45 am
by John Corbin
We have a very similar issue in our shop... writing to or reading from datasets seems to take forever.

As an extreme example, I had a job that read 145 rows from a sequential file and wrote them to a dataset.

We use an 8 node config file running DS 8.5

seqFile------------------>Transformer-------------------->Dataset

2 columns on seq file

COL1 VARCHAR(4)
COL2 VARCHAR(4)

Transformer

Does nothing. I did not write this job, or it would not be there.

2 columns on Dataset

COL1 VARCHAR(4)
COL2 VARCHAR(4)

Here are the Director job start and end times. Not kidding either...

Starts at 2015-12-19 11:26:19 AM
Ends at 2015-12-19 04:01:25 PM

No warnings.
Nothing captured/ignored in a message handler at the job or project level.

On our older Solaris server, the job ran fine in seconds.

I also tried this variant:

RowGenerator------------------>Transformer-------------------->Dataset

It took seconds to run for 145 rows.

Some other info....

In July 2015, we migrated from a Solaris server to a server running Red Hat Linux 2.6.32-431.5.1.el6.x86_64.

When I ran the RowGenerator job, it was during a quiet time on our production server, so this may explain why the job ran fast.

When the original job ran, it was during the busy time on the production server.

Could Linux be trying to implement some sort of resource management when the box is busy?

Posted: Fri Jan 01, 2016 11:02 pm
by ray.wurlod
No, that's just a weird result. Can you reproduce it? How big are the Data Set segment files?

Posted: Sat Jan 02, 2016 7:54 am
by John Corbin
Ray

It has happened every week since we moved to Red Hat Linux. Other jobs are experiencing the same performance issue. The one I wrote about is an extreme example.

Not sure how to tell how big each segment is...

Is this info from Dataset Management useful?

##I IIS-DSEE-TFCN-00001 08:47:18(000) <main_program> 
IBM WebSphere DataStage Enterprise Edition 8.5.0.6152 
Copyright (c) 2001, 2005-2008 IBM Corporation. All rights reserved
 


##I IIS-DSEE-TUTL-00031 08:47:18(001) <main_program> The open files limit is 16384; raising to 32768.
##I IIS-DSEE-TFCN-00006 08:47:18(002) <main_program> conductor uname: -s=Linux; -r=2.6.32-431.3.1.el6.x86_64; -v=#1 SMP Fri Dec 13 06:58:20 EST 2013; -n=EC24LP4060; -m=x86_64
##I IIS-DSEE-TFSC-00001 08:47:19(000) <main_program> APT configuration file: /disk/temp/datastage/ADW_GST_AUDIT/TMPDIR/aptoa6660cc0f71fc
##I IIS-DSEE-TOIX-00059 08:47:19(000) <APT_RealFileExportOperator in APT_FileExportOperator,0> Export complete; 101 records exported successfully, 0 rejected.
                 Name:  /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds
              Version:  ORCHESTRATE V8.5.0 DM Block Format 6.
     Time of Creation:  12/19/2015 14:30:39
 Number of Partitions:  8
   Number of Segments:  1
       Valid Segments:  1
Preserve Partitioning:  false
Segment Creation Time:
            0:  12/19/2015 14:30:39

Partition 0
  node   : node1
  records: 19
  blocks : 1
  bytes  : 168
  files  :
    Segment 0 :
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0000.0000.aea.d83efb5f.0000.0527aae4  131072 bytes
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0000.0001.aea.d83efb5f.0001.ef31cece  0 bytes
  total   : 131072 bytes
Partition 1
  node   : node2
  records: 18
  blocks : 1
  bytes  : 160
  files  :
    Segment 0 :
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0001.0000.aea.d83efb5f.0002.6e1bf330  131072 bytes
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0001.0001.aea.d83efb5f.0003.0f1ca6f1  0 bytes
  total   : 131072 bytes
Partition 2
  node   : node3
  records: 18
  blocks : 1
  bytes  : 158
  files  :
    Segment 0 :
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0002.0000.aea.d83efb5f.0004.0b748e6b  131072 bytes
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0002.0001.aea.d83efb5f.0005.287499f0  0 bytes
  total   : 131072 bytes
Partition 3
  node   : node4
  records: 18
  blocks : 1
  bytes  : 160
  files  :
    Segment 0 :
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0003.0000.aea.d83efb5f.0006.d8c1bf43  131072 bytes
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0003.0001.aea.d83efb5f.0007.b0f249f6  0 bytes
  total   : 131072 bytes
Partition 4
  node   : node5
  records: 18
  blocks : 1
  bytes  : 160
  files  :
    Segment 0 :
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0004.0000.aea.d83efb5f.0008.fbbfcb0d  131072 bytes
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0004.0001.aea.d83efb5f.0009.432e22ca  0 bytes
  total   : 131072 bytes
Partition 5
  node   : node6
  records: 18
  blocks : 1
  bytes  : 158
  files  :
    Segment 0 :
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0005.0000.aea.d83efb5f.000a.8edb13b2  131072 bytes
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0005.0001.aea.d83efb5f.000b.7505a734  0 bytes
  total   : 131072 bytes
Partition 6
  node   : node7
  records: 18
  blocks : 1
  bytes  : 162
  files  :
    Segment 0 :
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0006.0000.aea.d83efb5f.000c.3e74c98b  131072 bytes
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0006.0001.aea.d83efb5f.000d.be714afc  0 bytes
  total   : 131072 bytes
Partition 7
  node   : node8
  records: 18
  blocks : 1
  bytes  : 160
  files  :
    Segment 0 :
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0007.0000.aea.d83efb5f.000e.48465cc3  131072 bytes
           /disk/data/datastage/ADW_DWMA/GST_AUDIT/Datasets/LkupFoundInADWorg.ds.iwadwp.ec24lp4060.0000.0007.0001.aea.d83efb5f.000f.0a10ea2d  0 bytes
  total   : 131072 bytes

Totals:
  records : 145
  blocks  : 8
  bytes   : 1286
  filesize: 1048576
  min part: 131072
  max part: 131072

Schema:
record
( ADW_Office_Value: string;
  ORG_office: string;
)
##I IIS-DSEE-TFSC-00010 08:47:19(001) <main_program> Step execution finished with status = OK.

Posted: Sat Jan 02, 2016 3:18 pm
by ray.wurlod
Yes, your segment files are minimally sized (128KB each). Therefore that's not the problem. Data are moved to/from data sets in units of not less than 32KB, so you should be seeing very few I/O operations.

Time to involve your official support provider, methinks.
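
The sizes discussed above can be cross-checked with a few lines of arithmetic. This is only a sketch: the numbers are copied from the Dataset Management report earlier in the thread, and the variable names are made up for illustration.

```python
# Figures copied from the Dataset Management report posted above.
partition_records = [19, 18, 18, 18, 18, 18, 18, 18]
partition_bytes = [168, 160, 158, 160, 160, 158, 162, 160]
segment_file_bytes = 131072  # size of each non-empty segment file

KIB = 1024

total_records = sum(partition_records)
total_data_bytes = sum(partition_bytes)
total_file_bytes = segment_file_bytes * len(partition_records)
segment_file_kib = segment_file_bytes // KIB

print("records :", total_records)      # 145
print("bytes   :", total_data_bytes)   # 1286
print("filesize:", total_file_bytes)   # 1048576
print("KiB/file:", segment_file_kib)   # 128
```

The totals match the report: 145 records carrying 1286 bytes of data sit in eight 128KB segment files, so the volume of data by itself is tiny.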

Posted: Mon Jan 04, 2016 4:28 am
by PaulVL
APT file on a temp disk... are you running on a GRID?

Posted: Tue Jan 05, 2016 6:20 am
by John Corbin
Just confirmed with our support area.. we are not on a GRID

Posted: Tue Jan 05, 2016 9:09 am
by ArndW
Do you know what filesystem is used on "/disk/data/datastage/" and does that reside on a SAN or mounted/remote disk?

Posted: Wed Jan 06, 2016 6:23 am
by John Corbin
Red Hat Enterprise Linux on mounted/remote disk.

Posted: Sat Feb 24, 2018 4:49 pm
by John Corbin
Update...

We have just upgraded to 11.5. We were on 8.5 without support, as we were a year past our license expiry date.

We contacted IBM as soon as we were on 11.5... here is what we were told...

Datasets make use of a Linux system call named fsync. As a test, IBM told our support area how to disable calls to fsync. Jobs then ran in seconds without fail.

Sadly, we cannot disable fsync permanently, but this proved the issue is not with DataStage itself but rather with our setup...

I did not know at the time, but we are also using VMware over top of Linux... VMware is interfering with calls to fsync.

Upshot... we will soon move to hardware running Linux but without VMware.
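
For anyone wanting to check whether fsync is slow on a given filesystem, a rough timing sketch follows. It does not reproduce how DataStage writes datasets; it only compares small writes with and without `os.fsync`. The function name `time_writes`, the record contents, and the use of the system temp directory are assumptions for illustration. Point `target` at the directory you actually want to test.

```python
import os
import tempfile
import time


def time_writes(directory, rows=145, use_fsync=True):
    """Write `rows` small records to a scratch file in `directory`,
    optionally calling fsync after each one. Returns elapsed seconds."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(rows):
                f.write(b"AAAA,BBBB\n")
                f.flush()  # push Python's buffer to the OS
                if use_fsync:
                    os.fsync(f.fileno())  # ask the OS to commit to storage
        return time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the scratch file


if __name__ == "__main__":
    # Assumption: replace with the filesystem under test,
    # for example the dataset resource disk.
    target = tempfile.gettempdir()
    with_sync = time_writes(target, use_fsync=True)
    without_sync = time_writes(target, use_fsync=False)
    print(f"with fsync:    {with_sync:.4f} s")
    print(f"without fsync: {without_sync:.4f} s")
```

A large gap between the two timings, especially one that grows when the box is busy, would point at the storage or virtualization layer rather than at the job design.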

Posted: Sat Feb 24, 2018 7:52 pm
by chulett
Thanks for the update!

Posted: Wed May 16, 2018 9:13 am
by thompsonp
John - do you have any further details or a case number from IBM that you could share?

Posted: Sun Apr 28, 2019 6:03 pm
by John Corbin
Apologies for reviving this thread....

Sorry, no case number from IBM, only because I am not in the support group that would deal with them.

I asked my support area how they fixed it and this is what I was told:

The group that maintains our DataStage added this line to the dsenv file:

APT_DATASET_FLUSH_NOSYNC=1; export APT_DATASET_FLUSH_NOSYNC