How to monitor HP ProLiant DL360 hardware in CentOS, optionally using Nagios

My original post on monitoring HP storage hardware in CentOS is now out of date, so I decided to write an updated post covering all the hardware, not just storage, and showing how to optionally include this monitoring in Nagios.

This is written primarily for CentOS 6. It should be largely fine for CentOS 5 and CentOS 7 too, although one or two modifications may be needed. It should also work with some other HP ProLiant servers such as the DL380.

smartd for (supposedly) predicting drive failure

Before we get onto the HP software, it’s worth taking a minute to install smartd, which is provided by the smartmontools package in CentOS. It uses the SMART system to try to predict when drives are going to fail, and it’s easy to configure so that it (supposedly) emails you as soon as it detects a problem with a drive.

Here’s an older example of an /etc/smartd.conf file on a server which has two SAS disks configured as a single RAID logical drive, with one line per physical disk:

/dev/cciss/c0d0 -d cciss,0 -a -m root@ourdomain.com
/dev/cciss/c0d0 -d cciss,1 -a -m root@ourdomain.com
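The older example follows a simple pattern: the same device node on every line, with -d cciss,N selecting physical drive N. If a server has more drives, a small shell function like the following can generate the lines for you. It’s a hypothetical helper (smartd_conf_lines is not part of smartmontools) and assumes the drives are numbered from 0:

```shell
#!/bin/sh
# Hypothetical helper: print one smartd.conf directive per physical drive on a
# cciss controller, following the pattern of the example above.
#   $1 = device node (e.g. /dev/cciss/c0d0)
#   $2 = number of physical drives behind the controller
#   $3 = address that smartd should mail warnings to
smartd_conf_lines() {
    dev=$1
    count=$2
    email=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        # -d cciss,N picks physical drive N, -a enables smartd's default set of
        # checks and -m sets the warning address.
        printf '%s -d cciss,%d -a -m %s\n' "$dev" "$i" "$email"
        i=$((i + 1))
    done
}
```

For example, smartd_conf_lines /dev/cciss/c0d0 2 root@ourdomain.com prints exactly the two lines above; append its output to /etc/smartd.conf and restart smartd (service smartd restart on CentOS 6).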

Here’s a more recent example of an /etc/smartd.conf file on a server which has two SSDs configured as RAID 1:

/dev/sda -a -m root@ourdomain.com

However, I’ve never found smartd to be very useful. It starts up fine and indicates via syslog that it’s monitoring the disks, but I’ve never had smartd give a warning before a drive failure even though I’m quite sure it’s configured correctly.

HP software for hardware monitoring

So, onto the really useful stuff. If you try to do this using the official methods as advised by HP, you’ll probably end up installing a whole bunch of awful, bloated software that you don’t need, taking up resources on your servers. In fact there are only two or three fairly small components that you actually need.

Previously it was necessary to get the first two of these from the HP Service Pack for ProLiant, but HP have recently changed everything once again. Now you need the Management Component Pack for CentOS 6 (also known as hp-mcp) from CentOS 6 Downloads on the Support section of the HP website; this provides the hp-health (previously known as hpasm) and hpssacli (previously known as hpacucli) components that you’ll need.

If you have SSDs installed, you’ll also want to get the HP Smart Storage Administrator Diagnostic Utility (also known as HP SSADU or hpssaducli, previously known as hpadu) from the Software – System Management section in Red Hat Enterprise Linux 6 Server (x86-64) Downloads on the Support section of the HP website.

Sorry if that all seems a bit long-winded, but HP do have a way of making things complicated.

When you extract the hp-mcp tarball after downloading the Management Component Pack for CentOS 6, you’ll find a subdirectory called something like mcp/CentOS/6/x86_64/10.10 in which there are a bunch of RPM files. Upload the hp-health and hpssacli RPMs to your servers, along with the hpssaducli RPM you got from the HP Smart Storage Administrator Diagnostic Utility if you have SSDs. Then install them the usual way, with rpm -i ... etc.
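If you’d rather not pick the files out by hand, here’s a rough sketch of a helper that prints just the RPMs you need from a directory. The name select_hp_rpms is hypothetical, and the hp-health-*, hpssacli-* and hpssaducli-* file name patterns are assumptions based on the package names:

```shell
#!/bin/sh
# Hypothetical helper: list only the RPMs this setup needs from a directory of
# extracted HP packages (for example the mcp/CentOS/6/x86_64/10.10 subdirectory).
# The file name patterns below are assumptions based on the package names.
select_hp_rpms() {
    dir=$1
    for pattern in 'hp-health-*.rpm' 'hpssacli-*.rpm' 'hpssaducli-*.rpm'; do
        # $pattern is deliberately unquoted so that the shell expands the glob.
        for f in "$dir"/$pattern; do
            # An unmatched glob is left as literal text, so skip names that don't exist.
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    done
}
```

For example, after copying the hpssaducli RPM into the same directory (if you have SSDs), rpm -ivh $(select_hp_rpms mcp/CentOS/6/x86_64/10.10) would install everything in one go, assuming the paths contain no spaces.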

Checking server hardware with hpasmcli

Once these are installed, you can check the server hardware by running hpasmcli. At the hpasmcli prompt, type show to see what you can check. For example, show powersupply gives you up-to-date information on, unsurprisingly, the power supplies:

Power supply #1
        Present  : Yes
        Redundant: Yes
        Condition: Ok
        Hotplug  : Supported
        Power    : 40 Watts
Power supply #2
        Present  : Yes
        Redundant: Yes
        Condition: Ok
        Hotplug  : Supported
        Power    : 30 Watts

Type help to get more information.
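hpasmcli can also run a single command non-interactively via its -s option, e.g. hpasmcli -s "show powersupply", which makes the output easy to script. As a rough sketch (check_power_supplies is a hypothetical name, and it assumes the output format shown above), this shell function flags any power supply whose Condition isn’t Ok and exits with the standard Nagios plugin codes: 0 for OK, 2 for CRITICAL and 3 for UNKNOWN.

```shell
#!/bin/sh
# Hypothetical Nagios-style check: read "show powersupply" output on stdin and
# exit 0 (OK), 2 (CRITICAL) or 3 (UNKNOWN) following the Nagios plugin
# convention. Assumes the output layout shown above.
check_power_supplies() {
    awk '
        /^Power supply/ { psu = $0 }
        /Condition[ \t]*:/ {
            cond = $0
            sub(/^[^:]*:[ \t]*/, "", cond)   # drop everything up to the colon
            sub(/[ \t]+$/, "", cond)         # and any trailing whitespace
            seen++
            if (cond != "Ok") { print "CRITICAL: " psu ": Condition " cond; bad = 1 }
        }
        END {
            if (seen == 0) { print "UNKNOWN: no power supply Condition lines found"; exit 3 }
            if (bad) exit 2
            print "OK: " seen " power supplies report Condition Ok"
            exit 0
        }
    '
}
```

Usage would be something like hpasmcli -s "show powersupply" | check_power_supplies (as root), followed by echo $? to see the exit code. It only looks at the Condition lines, so it’s no substitute for a proper plugin.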

Checking storage hardware with hpssacli

Next, to check the RAID controller and installed drives, run commands like the following:

hpssacli ctrl all show status
hpssacli ctrl slot=0 ld all show status
hpssacli ctrl slot=0 pd all show status

Together these should show something like this:

Smart Array P440ar in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: Not Configured
   Battery/Capacitor Status: OK

   logicaldrive 1 (111.8 GB, 1): OK

   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, 120 GB): OK
   physicaldrive 1I:1:2 (port 1I:box 1:bay 2, 120 GB): OK

Type hpssacli help to get more information on how to use it.
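To turn that output into a quick pass/fail check, here’s a rough sketch which exits with the standard Nagios plugin codes (0 for OK, 2 for CRITICAL, 3 for UNKNOWN). The function name is hypothetical, it assumes the output format shown above, and it deliberately ignores Cache Status because the healthy example reports it as Not Configured:

```shell
#!/bin/sh
# Hypothetical Nagios-style check for the hpssacli status output shown above:
# every controller, battery/capacitor, logical drive and physical drive line
# must end in "OK". Cache Status is deliberately not checked.
check_hpssacli_status() {
    awk '
        /Controller Status:/ || /Battery\/Capacitor Status:/ || /^[ \t]*(logical|physical)drive / {
            status = $0
            sub(/.*:[ \t]*/, "", status)   # keep only the text after the last colon
            sub(/[ \t]+$/, "", status)
            line = $0
            sub(/^[ \t]+/, "", line)
            checked++
            if (status != "OK") { print "CRITICAL: " line; bad = 1 }
        }
        END {
            if (checked == 0) { print "UNKNOWN: no status lines found"; exit 3 }
            if (bad) exit 2
            print "OK: " checked " components report OK"
            exit 0
        }
    '
}
```

For example: { hpssacli ctrl all show status; hpssacli ctrl slot=0 ld all show status; hpssacli ctrl slot=0 pd all show status; } | check_hpssacli_status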

Checking SSDs with hpssaducli

If you have SSDs and you installed hpssaducli, you can also check SSD status with this command:

hpssaducli -ssd -txt -f /tmp/ssd.txt ; cat /tmp/ssd.txt

That should show you a bunch of information about wear on the SSDs, e.g.:

Smart Array P440ar in Embedded Slot : Internal Drive Cage at Port 1I : Box 1 : Physical Drive (120 GB SATA SSD) 1I:1:1 : SmartSSD Wear Gauge

   Status                               OK
   Supported                            TRUE
   Log Full                             FALSE
   Utilization                          0.000000
   Power On Hours                       47
   Has Smart Trip SSD Wearout           FALSE
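A similar sketch can flag problems in the SSD report. It’s a hypothetical illustration that assumes the layout shown above (one SmartSSD Wear Gauge heading per drive, followed by name/value lines), treats any Status other than OK or a TRUE wearout trip as CRITICAL, and strips a trailing carriage return from each line in case the report uses Windows-style line endings:

```shell
#!/bin/sh
# Hypothetical Nagios-style check for the text report written by
# "hpssaducli -ssd -txt -f FILE". Assumes the layout shown above: one
# "SmartSSD Wear Gauge" heading per drive followed by "Name  Value" lines.
check_ssd_wear() {
    awk '
        { sub(/\r$/, "") }                  # tolerate CRLF line endings
        /SmartSSD Wear Gauge/ {
            # The drive name is the field just before ": SmartSSD Wear Gauge".
            n = split($0, parts, " : ")
            drive = (n > 1) ? parts[n - 1] : $0
            drives++
        }
        /^[ \t]*Status[ \t]/ && $NF != "OK" {
            print "CRITICAL: " drive ": Status " $NF; bad = 1
        }
        /^[ \t]*Has Smart Trip SSD Wearout[ \t]/ && $NF == "TRUE" {
            print "CRITICAL: " drive ": SSD wearout trip"; bad = 1
        }
        END {
            if (drives == 0) { print "UNKNOWN: no SmartSSD Wear Gauge sections found"; exit 3 }
            if (bad) exit 2
            print "OK: " drives " SSD(s) checked"
            exit 0
        }
    '
}
```

For example: hpssaducli -ssd -txt -f /tmp/ssd.txt && check_ssd_wear < /tmp/ssd.txt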

Integrating HP hardware monitoring with Nagios

If you’re not using Nagios then obviously you can stop reading now!

Server hardware

I’ve always used the check_hpasm plugin for checking server hardware, and it’s worked well for me. Just follow the plugin’s installation instructions, then integrate it into your Nagios configuration as needed.

Note that you’ll need to add the following line to your /etc/sudoers so that the nrpe user has permission to run hpasmcli:

nrpe              ALL=NOPASSWD: /sbin/hpasmcli

Storage hardware

I’ve always used the check_hparray plugin for checking storage hardware, and it’s always worked perfectly for me, notifying me every time there’s been a drive failure. However, I see that it apparently hasn’t worked for some people, and it’s not clear why not, so use at your own risk.

It does need to be modified now that HP have changed the name of their software: just replace all instances of “hpacucli” in the script with “hpssacli” and it should work fine. Put the script in your Nagios plugins directory, then integrate it into your Nagios configuration as needed.
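That rename is a one-line sed job. As a sketch, assuming GNU sed (as on CentOS) and that you’ve saved the script as check_hparray, the following hypothetical wrapper does the replacement in place and keeps the untouched original alongside it with a .orig suffix:

```shell
#!/bin/sh
# Hypothetical helper for the rename described above: swap every "hpacucli"
# in the given script for "hpssacli", keeping a backup copy with a .orig suffix.
update_check_hparray() {
    sed -i.orig 's/hpacucli/hpssacli/g' "$1"
}
```

Run update_check_hparray check_hparray, then grep -n hpssacli check_hparray to review the changes before copying the script into place.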

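You’ll also need to add the following line to your /etc/sudoers so that the nrpe user has permission to run hpssacli (check where the binary actually lives with which hpssacli, and adjust the path if it differs):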

nrpe              ALL=NOPASSWD: /sbin/hpssacli

SSDs

To check the wear status of SSDs, I wrote a simple Nagios plugin which you can obtain from my GitHub repository. You’ll need to install the dos2unix command if it’s not already installed (with yum -y install dos2unix). Just install the plugin in your Nagios plugins directory, then you can integrate it into your Nagios configuration as needed.