Today one of our Gluster NFS servers went wacko. No better name for it. On the console it was nice and responsive and could talk to the outside world, but it seems to have had NFS issues. So we sent it off for a reboot and remounted our /home directory from another Gluster node. To do that I performed a lazy umount:
# umount -l /home
The idea, of course, is to get the filesystem unmounted, mount the correct one, and have the various tasks end up on the new filesystem as soon as their pending I/O is finalized. That turns out not to be a good idea, though: even though the majority of functionality recovered without any users having to do anything, it somehow screwed up the locking. Any task that wanted to set a lock on a file got an access denied error.
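For what it's worth, a quick way to check whether locking works on a mount is to try taking a lock by hand. This is only a sketch: `try_lock` is a helper name I made up, it assumes the `flock` utility is installed, and the path in the usage comment is just an example.

```shell
# try_lock FILE: try to take a non-blocking exclusive lock on FILE.
# (hypothetical helper; relies on the flock utility being available)
try_lock() {
    target="$1"
    # -n: fail instead of waiting, -x: exclusive lock; "true" is a no-op
    # command that runs only if the lock was obtained.
    if flock -n -x "$target" true 2>/dev/null; then
        echo "lock ok: $target"
    else
        echo "lock FAILED: $target"
        return 1
    fi
}

# Example (the path is an assumption, pick any file on the NFS mount):
#   try_lock /home/.locktest
```

If this fails on the NFS mount but works on a local disk, the problem is most likely on the locking side rather than in the application.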
Restarting the VM didn't help. On the headnode we saw in dmesg:
[1038502.797998] lockd: cannot monitor 192.168.1.244
[1038508.865280] lockd: cannot monitor 192.168.1.244
which quickly pointed to an issue with the nfslock service. Restarting it fixed everything. Let me now go and bang my head against the wall for a while.
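For next time, here is a small sketch for spotting this in the kernel log. `lockd_failed_hosts` is a name I made up; it only filters text on stdin for the message shown above, and the restart command in the comments assumes a distribution that ships an `nfslock` init script.

```shell
# lockd_failed_hosts: read kernel log text on stdin and print each host
# that lockd reported it could not monitor, once per host.
# (hypothetical helper; matches only the "lockd: cannot monitor" message)
lockd_failed_hosts() {
    sed -n 's/.*lockd: cannot monitor \(.*\)$/\1/p' | sort -u
}

# Example usage on the affected machine:
#   dmesg | lockd_failed_hosts
# If that prints anything, restarting the lock service is worth a try
# (assumes an init script named nfslock, as on our headnode):
#   service nfslock restart
```

Fed the two dmesg lines above, it prints 192.168.1.244 once.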