If you have a Dell PowerEdge server with a RAID array then you’ll probably want to be notified when disks are misbehaving, so that you can replace the disks in a timely manner. Hopefully this article will help you to achieve this.
These tools are only really useful if they can send you email alerts, so make sure you have a functioning MTA installed which can successfully send email to you from the root account. Setting up an MTA is beyond the scope of this article, so hopefully you already know how to do that (or you can check out my new post on setting up a Postfix-based mail system).
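Before going any further, it's worth confirming that mail from root really does arrive. The following is a minimal sketch, assuming a `mail` command is available (typically from a mailx package); the function name `send_test_mail` and the subject line are made up for illustration.

```shell
# Send a test message to check that the MTA delivers mail from this machine.
# Assumption: a `mail` command is installed (e.g. from a mailx package).
send_test_mail() {
    recipient="$1"
    if [ -z "$recipient" ]; then
        echo "usage: send_test_mail ADDRESS" >&2
        return 2
    fi
    # The message body is read from standard input by `mail`.
    echo "Test message from $(uname -n) sent at $(date)" |
        mail -s "RAID monitoring mail test" "$recipient"
}

# Example (run as root):
#   send_test_mail root
```

If the message doesn't turn up in your inbox, sort that out first, because none of the alerts below will reach you either.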
Monitoring with SMART
Firstly, it’s probably worth installing smartmontools to perform SMART monitoring on the disks in the storage array (though, to be honest, in all my years as a sysadmin I have yet to receive a single alert from smartd before a disk failure… but anyway).
apt-get install smartmontools
Comment out everything in /etc/smartd.conf and add a line akin to the following:
/dev/sda -d megaraid,0 -a -m firstname.lastname@example.org
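Replace /dev/sda if you’re using a different device, and replace firstname.lastname@example.org with your email address. The number in -d megaraid,0 selects which physical disk behind the controller is monitored, so add a similar line for each disk in the array, using each disk’s own number. Restart smartd: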
service smartd restart
If you check /var/log/syslog, you should see that the service has successfully started up and is monitoring the disk behind your RAID controller. You should now in theory receive emails if smartd suspects impending disk failure (though don’t bet on it).
If you want to get information about a disk behind the controller, try this command:
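Since the number after `megaraid,` picks out a single physical disk, an array with several disks needs one smartd.conf line per disk. Here is a small sketch that prints those lines for you; the function name `smartd_lines` is hypothetical, and the disk numbers you pass in are whatever your controller actually uses (0 to 3 below is just an example).

```shell
# Print one smartd.conf line per physical disk behind the controller.
# Arguments: block device, alert address, then the disk numbers to monitor.
smartd_lines() {
    device="$1"
    address="$2"
    shift 2
    for id in "$@"; do
        printf '%s -d megaraid,%s -a -m %s\n' "$device" "$id" "$address"
    done
}

# Example: print lines for four disks, ready to paste into /etc/smartd.conf
#   smartd_lines /dev/sda firstname.lastname@example.org 0 1 2 3
```

Remember to restart smartd again after changing the file.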
smartctl -d megaraid,0 -a /dev/sda
That should show you hardware details, current operating temperatures, error information, etc.
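To run a quick health check across several disks rather than reading the full report for each, something like the following might do. It's only a sketch: `check_disks` and the `SMARTCTL` override variable are my own names, it relies on smartctl's `-H` health option, and it treats any non-zero exit status as "worth a look" rather than as a definite failure.

```shell
# Command to run; override SMARTCTL to try this out without real hardware.
SMARTCTL="${SMARTCTL:-smartctl}"

# Run a SMART health check (-H) on each given disk number behind the controller.
# Returns 0 only if every check exits with status 0.
check_disks() {
    device="$1"
    shift
    failed=0
    for id in "$@"; do
        if "$SMARTCTL" -H -d "megaraid,$id" "$device" >/dev/null 2>&1; then
            echo "disk $id: OK"
        else
            echo "disk $id: needs attention"
            failed=1
        fi
    done
    return "$failed"
}

# Example (run as root):
#   check_disks /dev/sda 0 1 2 3
```

For any disk flagged here, run the full smartctl -a command on it to see what is actually going on.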
Monitoring and querying the RAID controller
So, let’s get down to the proper monitoring tools we really need. Dell’s “PERC” RAID controllers are apparently supplied by LSI, and LSI produce a utility called megacli for managing these cards, but megacli seems fairly unfriendly and unintuitive. I haven’t needed to use it, though, because I’ve had success with a friendlier tool called megactl. So my instructions are for installing and using megactl; if that doesn’t work for you then you’ll probably need to figure out how to install and use megacli, in which case I wish you the best of luck.
To install megactl, firstly follow these straightforward instructions to install the required repository on Debian or Ubuntu, then install the tools:
apt-get install megaraid-status
This will automatically start the necessary monitoring daemon and ensure that it gets started on boot. This daemon should send an email to root if it detects any problems with the array.
The following command can be used to show the current state of the array and its disks, in detail:
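The exact invocation depends on what the package installed on your system. As a hedged sketch, assuming the status tool is called `megasasctl` (a name inferred from the check_megasasctl plugin mentioned further down, so treat it as an assumption and override `MEGASASCTL` if yours differs), a small wrapper can check the tool is present and print a timestamped status report. The function name `array_status` is hypothetical.

```shell
# Name of the status tool. `megasasctl` is an assumption; override if needed.
MEGASASCTL="${MEGASASCTL:-megasasctl}"

# Print a timestamped status report, passing any extra arguments to the tool.
array_status() {
    if ! command -v "$MEGASASCTL" >/dev/null 2>&1; then
        echo "$MEGASASCTL not found; check what the package installed" >&2
        return 127
    fi
    echo "# array status at $(date)"
    "$MEGASASCTL" "$@"
}

# Example (run as root):
#   array_status
```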
To get more information and instructions on megactl and related tools, this page is a good starting point.
Monitoring with Nagios
In order to get alerted via Nagios, it seems that the check_megasasctl plugin for Nagios will do the trick, so long as you have megactl installed as described above. I haven’t actually tried this in Nagios myself yet, so I can’t vouch for it.
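For a rough idea of what such a plugin has to do, here is a minimal sketch of a Nagios-style wrapper. It is not the check_megasasctl plugin, and I haven't run it against Nagios: it simply runs whatever health command you give it and maps the result onto the standard Nagios plugin exit codes. The function name `nagios_raid_check` is hypothetical, and it assumes your health command exits non-zero when something is wrong.

```shell
# Wrap a health-check command as a Nagios-style check.
# Nagios plugin exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
nagios_raid_check() {
    if [ "$#" -eq 0 ]; then
        echo "RAID UNKNOWN - no health command given"
        return 3
    fi
    rc=0
    # Capture the command's output and remember its exit status.
    output=$("$@" 2>&1) || rc=$?
    # Use only the first line of output as the status summary.
    summary=$(printf '%s\n' "$output" | head -n 1)
    if [ "$rc" -eq 0 ]; then
        echo "RAID OK - ${summary:-no problems reported}"
        return 0
    fi
    echo "RAID CRITICAL - ${summary:-health command exited with status $rc}"
    return 2
}

# Example, where my_health_command stands in for your own check:
#   nagios_raid_check my_health_command
```

This sketch never returns WARNING; a real plugin would probably want to tell a rebuilding array apart from a failed one.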