Dataset corruption, SIGSEGV while reading

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

niremy
Participant
Posts: 23
Joined: Tue Sep 22, 2009 3:17 am

Dataset corruption, SIGSEGV while reading

Post by niremy »

Hello,

I'm facing an odd problem and need your enlightenment:
I have a job that fails to read a dataset with the following error:

Code: Select all

Event Id: 5834
Time    : Wed Feb 16 18:03:24 2011
Type    : FATAL
User    : ...
Message :
        DS_001,0: Unable to map file /.../dataset/node1/DS_001.ds...0000.0000.0000.7080.cf254e0d.0005.ce3515ca: Invalid argument
        The error occurred on Orchestrate node node1 (hostname ...)
Event Id: 5835
Time    : Wed Feb 16 18:03:24 2011
Type    : FATAL
User    : ...
Message :
        DS_001,1: Unable to map file /.../dataset/node2/DS_001.ds...0000.0001.0000.7080.cf254e0d.0006.9934fd0a: Invalid argument
        The error occurred on Orchestrate node node2 (hostname ...)
Event Id: 5836
Time    : Wed Feb 16 18:03:25 2011
Type    : WARNING
User    : ...
Message :
        DS_001,0: /bin/echo: write error: Broken pipe
Event Id: 5837
Time    : Wed Feb 16 18:03:25 2011
Type    : FATAL
User    : ...
Message :
        DS_001,1: Operator terminated abnormally: received signal SIGSEGV
Event Id: 5838
Time    : Wed Feb 16 18:03:30 2011
Type    : FATAL
User    : ...
Message :
        DS_001,0: Operator terminated abnormally: received signal SIGSEGV
I checked disk space during execution and nothing seemed to be consuming much space on the disks.

The source file is 84 lines long and weight 20K.

I tried running the same job with the same file on my test server and everything ran smoothly.

I tried rerunning the job several times with the same file, but each time it fails with the very same error.

I also searched this forum and couldn't find any clue to the source of my problem.

Thanks in advance for any remarks that could lead me to a resolution of this issue :wink:
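As an aside, "Unable to map file …: Invalid argument" is the wording you typically get when a memory-mapping call fails with EINVAL, and one plausible trigger is a truncated or zero-length segment file (mapping a whole file of length 0 is rejected by POSIX `mmap()`). Below is a minimal sketch of how such damaged segment files could be hunted down; the `/tmp/ds_check/...` layout and the `DS_001.seg` file names are purely hypothetical stand-ins for the real `resource disk` directories.

```shell
#!/bin/sh
# Sketch: flag zero-length dataset segment files under the "resource disk"
# directories. Paths and file names here are made up for demonstration;
# substitute the directories from your own APT_CONFIG_FILE.
DATASET_DIRS="/tmp/ds_check/node1 /tmp/ds_check/node2"

# Build a throwaway layout: one good segment and one empty one.
mkdir -p /tmp/ds_check/node1 /tmp/ds_check/node2
printf 'data' > /tmp/ds_check/node1/DS_001.seg
: > /tmp/ds_check/node2/DS_001.seg   # zero-length: mapping len 0 fails with EINVAL

# -size 0c matches files of exactly 0 bytes.
find $DATASET_DIRS -type f -size 0c -print
# prints /tmp/ds_check/node2/DS_001.seg
```

Any file this prints would be a strong corruption suspect worth comparing against the sizes recorded in the dataset descriptor.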
Sreenivasulu
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

What is the meaning of 'weight 20K' :)
gssr
Participant
Posts: 243
Joined: Fri Jan 09, 2009 12:51 am
Location: India

Post by gssr »

The dataset was not properly loaded. Check the job that creates the Dataset
RAJ
niremy
Participant
Posts: 23
Joined: Tue Sep 22, 2009 3:17 am

Post by niremy »

Sreenivasulu wrote:What is the meaning of 'weight 20K' :)
20 KBytes
It was to prevent the response "The file is too big" :wink:
niremy
Participant
Posts: 23
Joined: Tue Sep 22, 2009 3:17 am

Post by niremy »

gssr wrote:The dataset was not properly loaded. Check the job that creates the Dataset
How come the job works perfectly on another server?

I've already checked it multiple times and it doesn't differ from my other dataset-creation jobs :(
Vidyut
Participant
Posts: 24
Joined: Wed Oct 13, 2010 12:45 am

Post by Vidyut »

Are you using the same dataset created in your Test Environment?
niremy
Participant
Posts: 23
Joined: Tue Sep 22, 2009 3:17 am

Post by niremy »

Vidyut wrote:Are you using the same dataset created in your Test Environment?
In fact I have a job sequence that runs a first job, which creates the dataset from the flat file, and then a second job, which reads the dataset.

The dataset is clearly corrupted on one of the servers, as even the orchadmin dump command fails to read it properly.

I'm puzzled because the creation of the dataset produces no warnings :(
devesh_ssingh
Participant
Posts: 148
Joined: Thu Apr 10, 2008 12:47 am

Post by devesh_ssingh »

Check the environment in which you are reading it.
Since a dataset is partitioned, it won't work across two different environments unless the configuration file is the same for both.

I mean, if you read a dataset created on an 8-node configuration server on a 4-node server, it won't work.

For that you should create a new dataset on the 4-node server.
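The node-count comparison suggested above can be scripted in a couple of lines. This is only a sketch: the two `.apt` files below are made-up miniature examples, and it assumes the standard `node "name"` block syntax of parallel configuration files.

```shell
#!/bin/sh
# Sketch: compare the number of node definitions in the writer's and
# reader's APT config files. The file contents are fabricated examples.
cat > /tmp/writer.apt <<'EOF'
{
    node "node1" { fastname "hostA" pools "" }
    node "node2" { fastname "hostA" pools "" }
}
EOF
cat > /tmp/reader.apt <<'EOF'
{
    node "node1" { fastname "hostB" pools "" }
}
EOF

# Count lines that open a node definition.
w=$(grep -c '^[[:space:]]*node "' /tmp/writer.apt)
r=$(grep -c '^[[:space:]]*node "' /tmp/reader.apt)
echo "writer: $w nodes, reader: $r nodes"
[ "$w" -eq "$r" ] || echo "node counts differ"
```

If the two counts match (as they turn out to in this thread), the mismatch theory can be ruled out quickly.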
niremy
Participant
Posts: 23
Joined: Tue Sep 22, 2009 3:17 am

Post by niremy »

devesh_ssingh wrote:Check the environment in which you are reading it.
Since a dataset is partitioned, it won't work across two different environments unless the configuration file is the same for both.

I mean, if you read a dataset created on an 8-node configuration server on a 4-node server, it won't work.

For that you should create a new dataset on the 4-node server.
Thanks for the hint ...

For the job creating the dataset:

Code: Select all

Environment variable settings: 
APT_CONFIG_FILE=/app/EQOPIGL/ISF/Projects/EQOPIGL1/EQOPIGL1_INIT/apt_config_file_2_nodes.apt
For the job reading the dataset:

Code: Select all

Environment variable settings: 
APT_CONFIG_FILE=/app/EQOPIGL/ISF/Projects/EQOPIGL1/EQOPIGL1_INIT/apt_config_file_2_nodes.apt
So no luck ...
I forgot to mention that the very same jobs work fine with a tiny file of 2 or 3 lines.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Show us the content of your APT file.

I'd be interested to see whether your data segment paths are valid on the server you are executing on.

Also, do you have proper read/write authority to that path?
niremy
Participant
Posts: 23
Joined: Tue Sep 22, 2009 3:17 am

Post by niremy »

PaulVL wrote:Show us the content of your APT file.

I'd be interested to see whether your data segment paths are valid on the server you are executing on.

Also, do you have proper read/write authority to that path?
Here is the content of the APT_CONFIG_FILE:

Code: Select all

 cat /app/EQOPIGL/ISF/Projects/EQOPIGL1/EQOPIGL1_INIT/apt_config_file_2_nodes.apt
{
        node "node1"
        {
                fastname "slxd2003.app.eiffage.loc"
                pools ""
                resource disk "/app/EQOPIGL/ISF/Files/EQOPIGL1/dataset/node1" {pools ""}
                resource scratchdisk "/app/EQOPIGL/ISF/Files/EQOPIGL1/scratch/node1" {pools ""}
        }
        node "node2"
        {
                fastname "slxd2003.app.eiffage.loc"
                pools ""
                resource disk "/app/EQOPIGL/ISF/Files/EQOPIGL1/dataset/node2" {pools ""}
                resource scratchdisk "/app/EQOPIGL/ISF/Files/EQOPIGL1/scratch/node2" {pools ""}
        }
}

Code: Select all

tree -dpugfDi /app/EQOPIGL/ISF/Files/EQOPIGL1/dataset
/app/EQOPIGL/ISF/Files/EQOPIGL1/dataset
[drwxrwxr-x eqopigl1 eqopigl1 Feb 18 15:05]  /app/EQOPIGL/ISF/Files/EQOPIGL1/dataset/node1
[drwxrwxr-x eqopigl1 eqopigl1 Feb 18 15:05]  /app/EQOPIGL/ISF/Files/EQOPIGL1/dataset/node2
And my user is eqopigl1 of course.

As a reminder, with the same APT_CONFIG_FILE and the same job, a very small input file runs flawlessly.
I'm thinking more of a server misconfiguration, whereas you seem to suspect bad job design :wink:

Nevertheless, I appreciate your efforts in helping me find the problem :)
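The path and permission checks PaulVL asked about can also be automated. The sketch below pulls the `resource disk` entries out of a config file with `sed` and tests each one for existence and writability; the config content and the `/tmp/apt_check/...` paths are stand-ins for the real file, and one directory is deliberately left missing to show the failure case.

```shell
#!/bin/sh
# Sketch: extract "resource disk" paths from an APT config file and
# verify each exists and is writable by the current user.
# The config below is a fabricated example, not the real one.
cat > /tmp/apt_check.apt <<'EOF'
{
    node "node1" { resource disk "/tmp/apt_check/node1" {pools ""} }
    node "node2" { resource disk "/tmp/apt_check/node2" {pools ""} }
}
EOF
mkdir -p /tmp/apt_check/node1          # node2 deliberately missing

sed -n 's/.*resource disk "\([^"]*\)".*/\1/p' /tmp/apt_check.apt |
while read -r path; do
    if [ -d "$path" ] && [ -w "$path" ]; then
        echo "OK      $path"
    else
        echo "MISSING $path"
    fi
done
# prints:
#   OK      /tmp/apt_check/node1
#   MISSING /tmp/apt_check/node2
```

In this thread both directories exist with `rwx` for the owning user, so (as the `tree` output above already shows) permissions look clean; a script like this just makes the check repeatable.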
niremy
Participant
Posts: 23
Joined: Tue Sep 22, 2009 3:17 am

Post by niremy »

May I ask for some more comments?
I'm stuck with this problem and can't see any solution ... :?
kshah9
Participant
Posts: 7
Joined: Wed Oct 06, 2010 11:32 am
Location: Pune

Post by kshah9 »

Hey buddy,

Just contact your ADMIN team once. I can see the error "DS_001,0: /bin/echo: write error: Broken pipe"; I have faced the same issue, and contacting the DS-ADMIN (Server Team) resolved it. So just a suggestion: contact the ADMIN team and mention the error message.

Not sure whether it will resolve the problem, but you can try.

Regards,
Kunal shah
niremy
Participant
Posts: 23
Joined: Tue Sep 22, 2009 3:17 am

Post by niremy »

kshah9 wrote: Just contact your ADMIN team once. I can see the error "DS_001,0: /bin/echo: write error: Broken pipe"; I have faced the same issue, and contacting the DS-ADMIN (Server Team) resolved it. So just a suggestion: contact the ADMIN team and mention the error message.
Thanks, but I'm posting here on behalf of my admin team; we have the same level of knowledge on this issue :roll:
So again, any tips will help :wink:
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Have you involved your official support provider yet?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply