[opensuse-security] Apparent NFS related crash on SLES 10?


If you are on the linux-poweredge mailing list, sorry for the
duplicate post.  I am posting this to suse-security because this
problem seems like an effective denial of service.

We have started seeing what I think are NFS related crashes on a
PowerEdge 2900.  The machine ran fine for about a month and a half of
testing and for the first week of production.  After about a week it
crashed a few times, then ran fine again for about three weeks.  Now
it's crashing frequently.  It seems to be NFS related: I'm fairly sure
I triggered the crash on two or three occasions by installing a large
number of RPMs from a directory NFS-mounted from the server.  It never
logs anything to syslog; it locks up hard and does not respond to ping
or the keyboard.  The only "fix" is to cycle power.  The only record of
the crash is in the ESM log.  Here's what it says each time it crashes:

-> omreport system esmlog
Embedded System Management (ESM) Log

Health : Ok

Embedded System Management Log contains...

Severity      : Critical
Date and Time : Fri Apr  6 16:59:39 2007
Description : System Software event: run-time critical stop was asserted

I've talked at length with Dell tech support but the only thing that has
been resolved so far is that it does not seem to be a hardware issue.

Has anyone else seen this problem?   Any fixes or workarounds?

Here's more information on the server:

OS:             SLES 10 x86_64 with all available updates installed
FS type:        xfs
Export options: rw,sync,wdelay,insecure,subtree_check

Hardware:       2 x 2.33 GHz Xeon 5140 PowerEdge 2900
RAM:            4 GB

This machine is, among other things, an NFS server to a group of about
100 machines, mostly SuSE/OpenSuSE 10.0, 10.1, 10.2 and a smaller number
of OS X machines.  The main filesystem that is exported is the users'
home directory, which is on a RAID 5 array composed of four 300 GB SAS
disks attached to a PERC 5/i controller.


----- Stephan

