DS Server Restart
Hi,
We want to restart the DS server.
We know that everybody has logged out.
When we execute the netstat -na |grep 31538 command,
we find that one user is still connected to DS:
tcp XXXXX 100.200.20.30 100.200.10.15.1740 .....ESTABLISHED.
I am sure that nobody is using 100.200.10.15.
How is this possible?
Is there any command to get more details about that connection?
Unless I stop/kill that process, I can't proceed further.
regards
kcs
You have the IP 100.200.10.15.
Check on that system whether any application is still running.
If not, run $DSHOME/bin/list_readu to get the list of locks.
It might be a daemon running in the background.
Try to UNLOCK ALL.
Use ps -eaf | grep dscs to get the user id that holds the lock (you probably won't see any).
Then kill the process.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
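For what it's worth, the ps step above can be narrowed so that it prints only the owner and pid of each DataStage client process. This is just a minimal sketch: the function name ds_client_pids is made up for illustration, and it assumes the usual ps -ef column order (owner first, pid second), which may differ on your UNIX flavor.

```shell
# Hypothetical helper: reads `ps -ef` style output on stdin and prints
# "owner pid" for every dscs or dsapi_slave process.
# Assumption: column 1 is the owner and column 2 is the pid.
ds_client_pids() {
    # The [c] and [s] brackets keep the pattern from matching this awk
    # command's own entry in the ps listing.
    awk '/ds[c]s|dsapi_[s]lave/ { print $1, $2 }'
}
```

It would be called as ps -ef | ds_client_pids, and the pids it prints are the candidates to investigate before killing anything.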
It's not always possible to clear all users and you may not personally have the permissions that you need to do so. In that case, stop the server which will kill the connection... eventually. In most cases.
I've had to do that many times for various and sundry reasons. Once the server is down, start checking to see what state any remaining busy sockets are in:
Code:
netstat -a |grep dsrpc
Ideally, this will produce an empty list. In case it doesn't, you'll see sockets in states like FIN_WAIT1, FIN_WAIT2, LAST_ACK (etc), with the first two being the most typical. Access to an Admin or root privileges can help speed up the process, but on my HP-UX box they usually free themselves, draining out in 5 to 15 minutes. I've only had one time where I needed help from an SA to free one last socket that seemed like it was going to hang on forever, hung with a status I hadn't seen before and don't recall.
Once I've seen the sockets, I usually switch to a short version that just shows the count involved, waiting for it to go to zero:
Code:
netstat -a |grep dsrpc|wc -l
Don't even think about restarting if this doesn't return zero - the engine will come up but the dsrpcd listener will 'bind bomb' and not be able to start. Meaning, you will not be able to connect from any of your client installations. You can grep for it as a final check of successful engine start. If you do this and it is not running, just stop it again and wait longer. If people get impatient, see if an SA or other Root Wielder can help.
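The "wait for the count to go to zero" step could be scripted instead of re-run by hand. The following is only a sketch under some assumptions: the function names count_dsrpc and wait_for_dsrpc_drain are invented here, the 30 second interval and 40 polls are illustrative defaults, and it relies on netstat -a showing the service name dsrpc as in the commands above.

```shell
# Count lines mentioning dsrpc in netstat-style output read from stdin.
# `|| true` keeps the exit status at 0 when grep finds no matches
# (grep -c still prints 0 in that case).
count_dsrpc() {
    grep -c dsrpc || true
}

# Poll until no dsrpc sockets remain, or give up after max_tries polls.
# $1 = seconds between polls, $2 = maximum number of polls
# (both defaults are illustrative, not recommended values).
wait_for_dsrpc_drain() {
    interval=${1:-30}
    max_tries=${2:-40}
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        remaining=$(netstat -a | count_dsrpc)
        if [ "$remaining" -eq 0 ]; then
            echo "no dsrpc sockets left"
            return 0
        fi
        echo "$remaining dsrpc socket(s) still busy, waiting..."
        sleep "$interval"
        tries=$((tries + 1))
    done
    echo "gave up waiting; time to ask an SA" >&2
    return 1
}
```

Calling wait_for_dsrpc_drain 30 40 would then return success only once the list is empty, which is the point at which a restart is safe according to the advice above.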
-craig
"You can never have too many knives" -- Logan Nine Fingers
Hi Kumar, Craig,
Thanks for your replies.
We executed ps -fe | grep dsapi_slave | more to find the process id of this connection.
We got user2 7623 Sep12 .......ESTABLISHED.
We killed that process (7623) as root.
When we executed netstat -na |grep 31538, we got
user2 7623 Sep12 .......FIN_WAIT_1.
Does this mean that the process has been running since Sep 12?
Can we stop the DS server while the process is showing FIN_WAIT_1?
regards
kcs
The process should disappear in (at worst) 90 minutes depending upon your UNIX settings. I'm not sure what the network configuration parameter for that socket timeout is right now, though. I think that there isn't much to do between the FIN_WAIT and CLOSED states, although each hardware vendor / UNIX flavor has some means of doing this. If you do a Google search on your OS and FIN_WAIT you might get some help on how to get rid of the sockets prior to their natural timeouts.
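While waiting for the timeouts, it can help to see at a glance which states the sockets on the DS port are sitting in. This is a rough sketch only: states_for_port is a made-up name, and it assumes netstat -na prints the TCP state as the last field and the port as the tail of an address field (after a "." or ":"), which is true of the formats I have seen but may not hold everywhere.

```shell
# Hypothetical helper: reads `netstat -na` output on stdin and prints a
# count per TCP state for sockets whose local or remote port is $1.
states_for_port() {
    awk -v port="$1" '
        # Only consider lines whose last field looks like a TCP state.
        $NF ~ /^[A-Z][A-Z_0-9]*$/ {
            for (i = 1; i < NF; i++) {
                # Port is the tail of an address field, e.g. 1.2.3.4.31538
                if ($i ~ ("[.:]" port "$")) { count[$NF]++; break }
            }
        }
        END { for (s in count) print s, count[s] }
    ' | sort
}
```

Used as netstat -na | states_for_port 31538, it prints one line per state with the number of sockets in it; an empty result means nothing is left on that port.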
Quote: "We kill that process(7623) through root."
Aarrggh! Did you check to see whether pid 7623 had any child processes before you killed it? Killing a parent can turn a nohup child into a zombie, and they're almost impossible to get rid of without re-booting UNIX.
And killing a process does not, at least not immediately, release any associated socket, particularly if it is in a wait state. If you're on Solaris, you can use the ndd command to find out what the TCP timeout is. There are similar commands in other UNIX variants, but none in my head at the moment.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
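One way to check for child processes before killing a pid is to look at the parent-pid column of ps -ef. A minimal sketch, assuming the common column order (owner, pid, parent pid); the function name children_of is hypothetical.

```shell
# Hypothetical helper: reads `ps -ef` style output on stdin and prints the
# pid of every process whose parent pid equals $1.
# Assumption: column 2 is the pid and column 3 is the parent pid.
children_of() {
    awk -v parent="$1" '$3 == parent { print $2 }'
}
```

For the process discussed here that would be ps -ef | children_of 7623. No output means no children were found at that moment.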
Hi,
Sorry for responding late.
After we got user2 7623 Sep12 .......FIN_WAIT_1,
we restarted the server.
When we checked netstat -na |grep 31538, it returned nothing.
DS is working fine now.
Quote: "Did you check to see whether pid 7623 had any child processes before you killed it? Killing a parent can turn a nohup child into a zombie, and they're almost impossible to get rid of without re-booting UNIX."
We didn't check whether pid 7623 had any child processes.
I think we can find the child processes using an awk script.
This has to be executed as root.
Can you tell me:
if the child process is still there,
what kind of problem will it create?
Is rebooting UNIX the only solution?
regards
kcs
Yes, but not always.
If the child process is left orphaned, the socket may still be open (though the actual process is no longer running on the server), which leaves a zombie process behind. Hence, if you start the server before the connection dies, the RPC daemon may not start, and it may give you error code 81016. That will most likely lead to rebooting the UNIX machine.