Monitoring PERC RAID controllers and storage arrays on Dell PowerEdge servers with Debian and Ubuntu

If you have a Dell PowerEdge server with a RAID array then you’ll probably want to be notified when disks are misbehaving, so that you can replace the disks in a timely manner. Hopefully this article will help you to achieve this.

These tools generally rely on being able to send you email alerts; otherwise their usefulness is somewhat limited. So make sure you have a functioning MTA installed that can successfully send email to you from the root account. Setting up an MTA is beyond the scope of this article, so hopefully you already know how to do that.
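A quick way to confirm that mail from root actually leaves the box is to send yourself a test message before setting anything else up. Below is a minimal sketch. It assumes a mail command that accepts -s is installed (for example from the bsd-mailx or mailutils package). The function name send_test_mail is purely illustrative.

```shell
# send_test_mail (hypothetical helper): send a one-line test message so you
# can confirm that your MTA really delivers mail sent from this machine.
# Usage: send_test_mail [recipient]   (defaults to root)
send_test_mail() {
    local recipient="${1:-root}"
    local host
    host="$(uname -n)"   # node name of this machine
    printf 'Test message from %s, sent at %s\n' "$host" "$(date)" \
        | mail -s "MTA test from ${host}" "$recipient"
}
```

Run send_test_mail foo@bar.com as root, or run it with no argument to mail root, and check that the message arrives. If it doesn't, the alerts described below won't reach you either.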

Monitoring with SMART

Firstly, it’s probably worth installing smartmontools to perform SMART monitoring on the disks in the array (though, to be honest, in all my years as a sysadmin I have yet to receive a single alert from smartd before a disk failure… but anyway).

apt-get install smartmontools

Comment out everything in /etc/smartd.conf and add a line akin to the following:

/dev/sda -d megaraid,0 -a -m foo@bar.com

Replace /dev/sda if you’re using a different device, and replace foo@bar.com with your email address. Restart smartd:

service smartd restart

If you check /var/log/syslog, you should see that smartd has started successfully and is monitoring the disk behind your RAID controller. In theory, you should now receive emails if smartd suspects impending disk failure (though don’t bet on it).
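Note that megaraid,N selects a single physical disk behind the controller (N is that disk’s device ID), so the line above watches only disk 0. Covering every disk takes one line per disk. The sketch below prints such lines. It assumes the disks have device IDs 0 to N-1, which you should check against your own controller. The daily short self-test at 02:00 (-s S/../.././02) and the helper name smartd_megaraid_lines are illustrative additions, not part of the setup above.

```shell
# smartd_megaraid_lines (hypothetical helper): print one smartd.conf line per
# physical disk behind a MegaRAID-based PERC controller.
#   $1 = block device exposed by the controller (e.g. /dev/sda)
#   $2 = number of disks; device IDs are ASSUMED to be 0..N-1
#   $3 = address that smartd should mail warnings to
smartd_megaraid_lines() {
    local device="$1" count="$2" email="$3" i=0
    while [ "$i" -lt "$count" ]; do
        # -a: smartd's default set of checks
        # -s S/../.././02: short self-test every day at 02:00
        # -m: mail warnings to the given address
        printf '%s -d megaraid,%d -a -s S/../.././02 -m %s\n' "$device" "$i" "$email"
        i=$((i + 1))
    done
}
```

For example, smartd_megaraid_lines /dev/sda 4 foo@bar.com prints four lines. Review them and paste them into /etc/smartd.conf before restarting smartd.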

If you want to get information about a disk behind your controller, try this command:

smartctl -d megaraid,0 -a /dev/sda

That should show you the disk’s hardware details, current operating temperature, error information, etc.
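For a quick pass/fail view of several disks rather than the full report, you can ask smartctl for just the overall health verdict (-H) of each disk in turn. Below is a minimal sketch, again assuming device IDs 0 to N-1 on /dev/sda. The function name is hypothetical. The pattern match relies on the “PASSED” wording smartctl uses for ATA disks and the “SMART Health Status: OK” wording it uses for SCSI/SAS disks.

```shell
# smart_health_summary (hypothetical helper): print one line per disk with
# the overall SMART health verdict reported by "smartctl -H".
#   $1 = device exposed by the controller (e.g. /dev/sda)
#   $2 = number of disks; device IDs are ASSUMED to be 0..N-1
# Returns non-zero if any disk did not report a passing verdict.
smart_health_summary() {
    local device="$1" count="$2" i=0 status=0
    while [ "$i" -lt "$count" ]; do
        # ATA disks: "...self-assessment test result: PASSED"
        # SCSI/SAS disks: "SMART Health Status: OK"
        if smartctl -d "megaraid,$i" -H "$device" | grep -Eq 'result: PASSED|Health Status: OK'; then
            echo "megaraid,$i: OK"
        else
            echo "megaraid,$i: CHECK THIS DISK"
            status=1
        fi
        i=$((i + 1))
    done
    return "$status"
}
```

Run it as root, for example smart_health_summary /dev/sda 4. A non-zero exit status means at least one disk needs a closer look with the full smartctl -a output.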

Monitoring and querying the RAID controller

So, let’s get down to the proper monitoring tools we really need. Dell’s “PERC” RAID controllers are apparently supplied by LSI, and LSI produces a utility called megacli for managing these cards, but megacli seems fairly unfriendly and unintuitive. However, I haven’t needed to use megacli because I’ve had success using a friendlier tool called megactl. So my instructions are for installing and using megactl. If that doesn’t work for you, you’ll probably need to figure out how to install and use megacli, in which case I wish you the best of luck.

To install megactl, firstly follow the instructions on hwraid.le-vert.net to add the required repository on Debian or Ubuntu, then install the tools:

apt-get install megaraid-status

This will automatically start the necessary monitoring daemon and ensure that it gets started on boot. This daemon should send an email to root if it detects any problems with the array.

The following command can be used to show the current state of the array and its disks, in detail:

megaraidsas-status

To get more information and instructions on megactl and related tools, this page is a good starting point.

Monitoring with Nagios

In order to get alerted via Nagios, it seems that the check_megasasctl plugin for Nagios will do the trick, so long as you have megactl installed as described above. I haven’t actually tried this in Nagios myself yet, so I can’t vouch for it.
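For reference, a Nagios plugin only has to print a one-line status and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch below shows that idea wrapped around megasasctl, which is installed alongside megactl. It is a hypothetical example, not the check_megasasctl plugin. It assumes that megasasctl -H prints nothing when all is well and lists the affected adapters, arrays or disks otherwise.

```shell
# check_raid_health (hypothetical): Nagios-style wrapper around "megasasctl -H".
# ASSUMPTION: "megasasctl -H" prints nothing when everything is healthy and
# prints the affected adapters/arrays/disks when something is wrong.
# Nagios plugin convention: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
check_raid_health() {
    local output
    if ! command -v megasasctl >/dev/null 2>&1; then
        echo "RAID UNKNOWN - megasasctl not found"
        return 3
    fi
    # Capture stdout and stderr; judge health by whether anything was printed
    output="$(megasasctl -H 2>&1)" || true
    if [ -n "$output" ]; then
        # Collapse the multi-line report into Nagios' single status line
        echo "RAID CRITICAL - $(printf '%s' "$output" | tr '\n' ' ')"
        return 2
    fi
    echo "RAID OK - megasasctl -H reported no problems"
    return 0
}
```

To use it as a standalone plugin, put it in a script that ends with check_raid_health; exit $?. megasasctl normally needs root, so under NRPE you would most likely have to run it through sudo.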

  • Chris Boley

    I thought, given your work, you might find this useful. Possibly
    others can benefit from this too.

    Add repo for raid software:

    sudo nano /etc/apt/sources.list

    Paste these two lines at the bottom of your sources list:

    ====================================================================

    # For hardware RAID repository, Ubuntu 14.04
    deb http://hwraid.le-vert.net/ubuntu trusty main

    To save: ‘ctrl+x’ then ‘y’ then ‘[enter]’

    Add the repository’s GPG key from the command line, then refresh the package lists:

    wget -O - http://hwraid.le-vert.net/debian/hwraid.le-vert.net.gpg.key | sudo apt-key add -
    sudo apt-get update

    Install heirloom-mailx and raid control / monitoring software

    sudo apt-get -y install heirloom-mailx megactl megaraid-status megamgr dellmgr

    We’re going to use mailx as the vehicle to move the alert to your local SMTP relay server.

    You’re on your own to make the proper mailx configs for your own SMTP setups.

    I needed to create a message to remind myself what to do when I got a RAID alert.

    Otherwise, I forget the whole procedure and don’t remember what to do.

    touch /home/idsadmin/RAIDALERT.txt
    nano /home/idsadmin/RAIDALERT.txt

    Paste everything between the lines, changing the server name and IP where applicable:

    ====================================================
    Hello Admin! This is SRVCHSRVR01[172.16.x.x]
    Something on your Raid Array has an issue!
    Please SSH into the box with Putty and a compatible profile.
    Profile instructions are as described below.
    Creating a new Session for accessing dell raid software:
    [WINDOWS]

    Download and install PuTTY on your computer.
    1. Open PuTTY.
    2. Go to the field entitled ‘Saved Sessions’.
    3. Enter a name like “linux-sessions” and click Save.
    4. Click that name in the Saved Sessions box next to the Load/Save/Delete buttons.
    5. Click Load, then enter a hostname or IP address.
    6. Go to the ‘Data’ page under ‘Connection’ in the Category tree on the left.
    7. Under ‘Terminal details’, select the ‘Terminal-type string’ field.
    8. The string should read “xterm-color”, or it won’t work with the RAID management software.
    9. Click the ‘Session’ category on the left again.
    10. Make sure the name you created is highlighted in the Saved Sessions box and click ‘Save’.
    11. This saves all the settings in the session you created.
    12. The next time you start PuTTY, load that session and connect.

    Skip down to +++From your newly set up terminal session+++
    [LINUX]
    If you’re using Linux with an X Window GUI, I suggest
    installing the gnome-terminal application.
    Adjust its settings as follows so that they work the same way as the PuTTY setup:

    Open gnome-terminal
    Click Edit
    Click Profile Preferences
    Click on the Title and Command tab
    Check the box under the Command section of that screen that says:
    “Run a custom command instead of my shell”
    On the “Custom command:” field please copy paste the following..
    env TERM=xterm-color /bin/bash

    Afterward click close. Then close the app and start it back up.
    You will have all necessary functionality.

    +++From your newly set up terminal session+++
    Run one of the following:
    sudo megaraid-status (or sudo megaraidsas-status, depending on your controller)
    OR
    sudo dellmgr
    Get in there and see what the issue is / fix it.

    End of MSG..

    To save: ‘ctrl+x’ then ‘y’ then ‘[enter]’
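If you’d rather not change a terminal profile at all, setting TERM for a single command may be enough. The terminal-type settings above ultimately just give the remote programs TERM=xterm-color. Below is a minimal sketch. The helper name is hypothetical, and this has not been tested against dellmgr specifically.

```shell
# with_xterm_color (hypothetical helper): run one command with TERM set to
# xterm-color, the terminal type the RAID management tools reportedly need,
# without touching your PuTTY or gnome-terminal profile.
with_xterm_color() {
    env TERM=xterm-color "$@"
}
```

On the server you could then run with_xterm_color dellmgr as root. Without the helper, the equivalent is sudo env TERM=xterm-color dellmgr.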

    Then you have to create a script to check the RAID status:

    NOTE modify email address where applicable.

    touch /home/idsadmin/checkraidscript.sh
    nano /home/idsadmin/checkraidscript.sh

    Canned script: copy everything below, from “#!/bin/bash” through “fi”, and paste it in.

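    #!/bin/bash
    # Append megactl's health report (empty when healthy) to degraded.txt.
    # On SAS-based PERC cards, megasasctl -H may be the command you need instead.
    touch /home/idsadmin/degraded.txt
    megactl -H >> /home/idsadmin/degraded.txt
    # -s is true when the file is non-empty, i.e. megactl reported something
    if [ -s /home/idsadmin/degraded.txt ];
    then
    mailx -v -r "blahblahblah@cogentrix.com" -s "SRVCHSRVR01_Raid_Array_Degraded" -S smtp="172.16.x.x:25" admin@somewhere.com,blahblah@somewhere.com < /home/idsadmin/RAIDALERT.txt
    fi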

    =====================================================================================================

    To save: ‘ctrl+x’ then ‘y’ then ‘[enter]’

    Then make the script executable:

    sudo chmod +x /home/idsadmin/checkraidscript.sh

    Add that script to a cron job.

    sudo crontab -e

    Notes:

    Ubuntu has a really polite way of asking you which editor you like as the default crontab editor. Very cool IMHO!

    See the output below from my terminal:

    ADMIN@ROUTERSIM:~$ sudo crontab -e
    no crontab for root - using an empty one

    Select an editor.  To change later, run 'select-editor'.
      1. /bin/ed
      2. /bin/nano        <---- easiest
      3. /usr/bin/vim.basic
      4. /usr/bin/vim.tiny

    Choose 1-4 [2]:

    Pressing [enter] selects the default shown in brackets (nano), which is recommended.

    Paste the following into the crontab file:

    ====================================================

    00 01-23 * * * /home/idsadmin/checkraidscript.sh

    ====================================================

    EXPLANATION

    00 – 0th minute (top of the hour)

    01-23 – every hour from 1 am through 11 pm (midnight is skipped)

    * – every day of the month
    * – every month
    * – every day of the week

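    Finally, because the check script appends to degraded.txt, you have to delete that file after each run. Otherwise a single error would keep triggering an alert every hour, even after it’s fixed. This needs to be done with a second script.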

    touch /home/idsadmin/killdegradedfile.sh
    nano /home/idsadmin/killdegradedfile.sh

    ================================================================

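    #!/bin/bash
    # Remove the report so the next hourly check starts from an empty file
    # (-f: don't complain if the file is already gone)
    rm -f /home/idsadmin/degraded.txt

    exit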

    To save: ‘ctrl+x’ then ‘y’ then ‘[enter]’

    Add that script to a cron job.

    sudo crontab -e

    Add the following line to the crontab file, below the existing entry:

    ====================================================

    05 01-23 * * * /home/idsadmin/killdegradedfile.sh

    ====================================================

    To save: ‘ctrl+x’ then ‘y’ then ‘[enter]’

    Make it executable:

    sudo chmod +x /home/idsadmin/killdegradedfile.sh
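The 01-23 hour range leaves out midnight. If you want the check to run in every hour of the day, put * in the hour field instead. This is a variation, not the setup described above. A sketch of the two crontab entries:

```
# m  h  dom mon dow  command
00   *  *   *   *    /home/idsadmin/checkraidscript.sh
05   *  *   *   *    /home/idsadmin/killdegradedfile.sh
```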

    From here forward, a checking job runs at the top of every hour from 1 am to 11 pm.

    A cleanup job then deletes the checking script’s output file at five minutes past each of those hours.

    ‘megactl -H’ runs a health check against the array. If ANY errors whatsoever pop up, they end up in the degraded.txt file, and at that point the script sends an email telling you to go check the array.
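The two-script arrangement exists only because the check appends to degraded.txt. Keeping megactl’s output in a shell variable instead avoids the file and, with it, the cleanup job. Below is a minimal sketch, assuming (as above) that megactl -H prints nothing when the array is healthy. The function name raid_check_and_alert and the ALERT_FILE variable are hypothetical. The mailx options are copied from checkraidscript.sh, so change the addresses and relay to your own. As a small extra, the email also includes megactl’s actual output after the reminder text.

```shell
# raid_check_and_alert (hypothetical): run "megactl -H" and, if it printed
# anything, mail the reminder text plus the report. No temporary file is
# written, so no cleanup cron job is needed.
# ASSUMPTION: "megactl -H" prints nothing when the array is healthy.
# If megactl itself fails, its error text counts as output and triggers an alert.
# Recipients are passed as arguments.
ALERT_FILE="${ALERT_FILE:-/home/idsadmin/RAIDALERT.txt}"

raid_check_and_alert() {
    local report
    # Keep the health report in a variable instead of appending to degraded.txt
    report="$(megactl -H 2>&1)" || true
    if [ -z "$report" ]; then
        return 0    # no output: nothing to report
    fi
    # Mail the reminder text from ALERT_FILE followed by the megactl output
    { cat "$ALERT_FILE"; printf '\nmegactl -H reported:\n%s\n' "$report"; } \
        | mailx -r "blahblahblah@cogentrix.com" \
                -s "SRVCHSRVR01_Raid_Array_Degraded" \
                -S smtp="172.16.x.x:25" \
                "$@"
    return 1
}
```

For example, put the block in a script that ends with raid_check_and_alert admin@somewhere.com blahblah@somewhere.com. Schedule that script with the same 00 01-23 * * * line. killdegradedfile.sh and its cron entry are then no longer needed.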