SNMP Monitoring of LSI MegaRAID Cards

By Alasdair Lumsden on 18 Nov 2008

We use LSI 3041E RAID cards (which use the SAS1064ET chipset) in a bunch of our Sun x2100 and x2200 servers, and naturally we want a simple, straightforward way to monitor the RAID status.

Checking the RAID Status on Linux

On Linux, we opted for the simple, easily scripted mpt-status utility. You can install it straight from the Debian repositories with apt-get, although it doesn’t seem to be in the standard CentOS yum repositories. It’s pretty easy to use, as this demonstrates:

# mpt-status
open /dev/mptctl: No such file or directory
  Try: mknod /dev/mptctl c 10 220
Make sure mptctl is loaded into the kernel

# modprobe mptctl
# mpt-status

You seem to have no SCSI disks attached to your HBA or you have
them on a different scsi_id. To get your SCSI id, run:

    mpt-status -p

# mpt-status -p
Checking for SCSI ID:0
Checking for SCSI ID:1
Checking for SCSI ID:2
Found SCSI id=2, use ''mpt-status -i 2`` to get more information.

# mpt-status -i 2
ioc0 vol_id 2 type IM, 2 phy, 135 GB, state OPTIMAL, flags ENABLED
ioc0 phy 1 scsi_id 4 SEAGATE  ST314654SSUN146G 022D, 136 GB, state ONLINE, flags NONE
ioc0 phy 0 scsi_id 3 SEAGATE  ST3146855SS      0002, 136 GB, state ONLINE, flags NONE

You can then write a simple bash script to check that the status is “OPTIMAL”, and set up some kind of remote monitoring to access it via SNMP or Nagios NRPE.
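The check described above might look like the following minimal sketch. It assumes the "mpt-status -i <id>" output format shown earlier; the function name check_mpt_status and the Nagios-style return codes (0 OK, 2 CRITICAL, 3 UNKNOWN) are illustrative choices, not something mpt-status itself defines.

```shell
#!/bin/bash
# Sketch: classify the output of `mpt-status -i <id>` read from stdin.
# Return codes follow the Nagios convention: 0 = OK, 2 = CRITICAL, 3 = UNKNOWN.
check_mpt_status() {
    local output vol_lines
    output=$(cat)
    # Volume lines look like:
    #   ioc0 vol_id 2 type IM, 2 phy, 135 GB, state OPTIMAL, flags ENABLED
    vol_lines=$(printf '%s\n' "$output" | grep 'vol_id' || true)
    if [ -z "$vol_lines" ]; then
        echo "UNKNOWN: no RAID volume found in mpt-status output"
        return 3
    fi
    # Any volume line that does not say "state OPTIMAL" counts as a failure.
    if printf '%s\n' "$vol_lines" | grep -qv 'state OPTIMAL'; then
        echo "CRITICAL: $(printf '%s\n' "$vol_lines" | grep -v 'state OPTIMAL' | head -n 1)"
        return 2
    fi
    echo "OK: all RAID volumes are OPTIMAL"
    return 0
}
```

On the server above it would be called as "mpt-status -i 2 | check_mpt_status", and the return code can then be handed to whatever does the remote monitoring, for example an NRPE command definition.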

Checking the RAID Status on Windows

On Windows Server 2003/2008, your best (only?) option for remote monitoring is to install the Windows SNMP service along with LSI MegaRAID Storage Manager and its SNMP plugin. You can download LSI MegaRAID Storage Manager from LSI’s website. Once SNMP and the MegaRAID SNMP plugin are installed, you should be able to snmpwalk your Windows server:

root mibs (mon01): snmpwalk -v1 -c public w01.someserver.everycity.co.uk | head
SNMPv2-MIB::sysDescr.0 = STRING: Hardware: x86 Family 15 Model 67 Stepping 3 AT/AT COMPATIBLE - Software: Windows Version 5.2 (Build 3790 Multiprocessor Free)
SNMPv2-MIB::sysObjectID.0 = OID: SNMPv2-SMI::enterprises.311.1.1.3.1.2
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (819029) 2:16:30.29
SNMPv2-MIB::sysContact.0 = STRING:
SNMPv2-MIB::sysName.0 = STRING: W01-SOMESERVER
...

Great! Now you need the LSI MIB files. Technically you don’t “need” them to check the relevant SNMP OIDs, but it’s helpful to know what you’re querying. I obtained them by downloading and digging through the Linux version of LSI MegaRAID Storage Manager. At the time of writing this was MSM_Linux_28800.zip. Inside this is a tar.gz file called MSM_linux_installer-2.88-00.tar.gz. Inside this are 4 RPM files. This is starting to remind me of Russian dolls. The two you want are sas_ir_snmp-3.16-1002.i386.rpm and sas_snmp-3.16-1002.i386.rpm, each of which you can extract with “rpm2cpio <package>.rpm | cpio -idmv” (rpm2cpio takes one package at a time). Finally you can get your two MIB files:

./etc/lsi_mrdsnmp/sas/LSI-AdapterSAS.mib
./etc/lsi_mrdsnmp/sas-ir/LSI-AdapterSASIR.mib
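Because rpm2cpio converts a single package per invocation, unpacking several RPMs is easiest in a loop. The sketch below shows one way to do it; the function name extract_rpms is made up for illustration, and it assumes rpm2cpio and cpio are installed and that the RPMs sit in the current directory.

```shell
#!/bin/bash
# Sketch: unpack each RPM given on the command line into the current
# directory, one package at a time.
extract_rpms() {
    local rpm
    for rpm in "$@"; do
        # -i extract, -d create directories, -m keep mtimes, -v list files
        rpm2cpio "$rpm" | cpio -idmv || return 1
    done
}

# Example call, using the package names from this article:
# extract_rpms sas_snmp-3.16-1002.i386.rpm sas_ir_snmp-3.16-1002.i386.rpm
```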

If you don’t want to arse around, and let’s face it, who enjoys arsing around, please enjoy LSI-AdapterSAS.mib and LSI-AdapterSASIR.mib.

On a typical UCD/Net-SNMP install, you’d place these in /usr/share/snmp/mibs. There’s a good guide on ensuring the MIBs get loaded when you call tools such as snmpwalk, which means instead of getting:

# snmpwalk -v1 -c public w01.someserver.everycity.co.uk .1.3.6.1.4.1.3582 | head -n 50
SNMPv2-SMI::enterprises.3582.4.1.1.1 = STRING: "W01-SOMESERVER"
SNMPv2-SMI::enterprises.3582.4.1.2.1 = STRING: "Microsoft Windows 2003 Service Pack 2.0"
SNMPv2-SMI::enterprises.3582.4.1.3.1.1 = STRING: "1.23-02"
SNMPv2-SMI::enterprises.3582.4.1.3.2.1 = STRING: "lsi_mrdsnmpagent.dll"
SNMPv2-SMI::enterprises.3582.4.1.3.3.1 = STRING: "3.16.0.1"
SNMPv2-SMI::enterprises.3582.4.1.3.4.1 = STRING: "28th May 2008"
SNMPv2-SMI::enterprises.3582.4.1.9.1.1 = STRING: "LSI Corporation"
SNMPv2-SMI::enterprises.3582.5.1.1.1 = STRING: "W01-SOMESERVER"
SNMPv2-SMI::enterprises.3582.5.1.2.1 = STRING: "Microsoft Windows 2003 Service Pack 2.0"
SNMPv2-SMI::enterprises.3582.5.1.3.1.1 = STRING: "1.14-01"
SNMPv2-SMI::enterprises.3582.5.1.3.2.1 = STRING: "lsi_mrdsnmpagent.dll"
SNMPv2-SMI::enterprises.3582.5.1.3.3.1 = STRING: "3.16.0.1"
SNMPv2-SMI::enterprises.3582.5.1.3.4.1 = STRING: "28th May 2008"

You get:

LSI-MegaRAID-SAS-MIB::hostName.1 = STRING: "W01-SOMESERVER"
LSI-MegaRAID-SAS-MIB::hostOSInfo.1 = STRING: "Microsoft Windows 2003 Service Pack 2.0"
LSI-MegaRAID-SAS-MIB::mibVersion.1 = STRING: "1.23-02"
LSI-MegaRAID-SAS-MIB::agentModuleName.1 = STRING: "lsi_mrdsnmpagent.dll"
LSI-MegaRAID-SAS-MIB::agentModuleVersion.1 = STRING: "3.16.0.1"
LSI-MegaRAID-SAS-MIB::releaseDate.1 = STRING: "28th May 2008"
LSI-MegaRAID-SAS-MIB::copyright.1 = STRING: "LSI Corporation"
LSI-megaRAID-SAS-IR-MIB::hostName.1 = STRING: "W01-SOMESERVER"
LSI-megaRAID-SAS-IR-MIB::hostOSInfo.1 = STRING: "Microsoft Windows 2003 Service Pack 2.0"
LSI-megaRAID-SAS-IR-MIB::mibVersion.1 = STRING: "1.14-01"
LSI-megaRAID-SAS-IR-MIB::agentModuleName.1 = STRING: "lsi_mrdsnmpagent.dll"
LSI-megaRAID-SAS-IR-MIB::agentModuleVersion.1 = STRING: "3.16.0.1"
LSI-megaRAID-SAS-IR-MIB::releaseDate.1 = STRING: "28th May 2008"

This is obviously much more readable and understandable. You can also view the comments in the MIB file, for example:

pdDiskPredFailureCount                OBJECT-TYPE
    SYNTAX                      INTEGER
    ACCESS                      read-only
    STATUS                      optional
    DESCRIPTION                 "Number of disk devices in this adapter those are critical"

alarmStatus                OBJECT-TYPE
    SYNTAX                      INTEGER{
                                status-ok(1),
                                status-critical(2),
                                status-nonCritical(3),
                                status-unrecoverable(4),
                                status-not-installed(5),
                                status-unknown(6),
                                status-not-available(7)
                                }
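For symbolic names like these to resolve, the Net-SNMP tools have to load the LSI modules. One way to do that, assuming the two .mib files were copied to /usr/share/snmp/mibs as described earlier, is to set the MIBS environment variable to ALL, which asks the command-line tools to load every module they find in their MIB directories. This is only a sketch of one approach; loading everything is convenient, but it may surface parse warnings from unrelated MIB files.

```shell
# Assumption: LSI-AdapterSAS.mib and LSI-AdapterSASIR.mib are already in
# /usr/share/snmp/mibs, the MIB directory mentioned earlier.
# Ask Net-SNMP tools started from this shell to load every available module.
export MIBS=ALL
```

The same thing can be done for a single command with the -m ALL option, for example "snmpwalk -m ALL -v1 -c public w01.someserver.everycity.co.uk .1.3.6.1.4.1.3582".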

Depending on the model of your RAID card, the most useful OIDs to monitor are:

# snmptranslate -IR -On vdDegradedCount
.1.3.6.1.4.1.3582.4.1.4.1.2.1.19

# snmptranslate -IR -On vdOfflineCount
.1.3.6.1.4.1.3582.4.1.4.1.2.1.20

# snmptranslate -IR -On pdDiskFailedCount
.1.3.6.1.4.1.3582.4.1.4.1.2.1.24

# snmptranslate -IR -On pdDiskPredFailureCount
.1.3.6.1.4.1.3582.4.1.4.1.2.1.23

Or, if your card is served by the SAS-IR MIB (enterprises.3582.5) instead:

# snmptranslate -IR -On vdDegradedCount
.1.3.6.1.4.1.3582.5.1.4.1.1.3.1.20

# snmptranslate -IR -On vdOfflineCount
.1.3.6.1.4.1.3582.5.1.4.1.1.3.1.21

# snmptranslate -IR -On pdDiskFailedCount
.1.3.6.1.4.1.3582.5.1.4.1.1.3.1.25

# snmptranslate -IR -On pdDiskPredFailureCount
.1.3.6.1.4.1.3582.5.1.4.1.1.3.1.24

All of these should be zero. You can script snmpget or use Nagios’s SNMP plugin directly to monitor these values.
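A scripted version of that check could look like the sketch below. It walks each counter column with snmpwalk rather than calling snmpget, so it does not have to guess the adapter index that snmpget would need appended to the OID. The function name check_lsi_snmp and the Nagios-style return codes are illustrative assumptions; the -Oqv option makes snmpwalk print bare values only.

```shell
#!/bin/bash
# Sketch: walk each counter OID on a host and fail if any value is non-zero.
# Usage: check_lsi_snmp <host> <community> <oid> [<oid> ...]
# Return codes: 0 = OK, 2 = CRITICAL, 3 = UNKNOWN.
check_lsi_snmp() {
    local host=$1 community=$2 oid values status=0
    shift 2
    for oid in "$@"; do
        if ! values=$(snmpwalk -v1 -c "$community" -Oqv "$host" "$oid"); then
            echo "UNKNOWN: snmpwalk failed for $oid"
            return 3
        fi
        # Anything other than plain integers (including no rows) is unexpected.
        if [ -z "$values" ] || printf '%s\n' "$values" | grep -qv '^[0-9][0-9]*$'; then
            echo "UNKNOWN: unexpected reply for $oid"
            return 3
        fi
        # A non-zero digit in any row means something is degraded or failed.
        if printf '%s\n' "$values" | grep -q '[1-9]'; then
            echo "CRITICAL: $oid is non-zero"
            status=2
        fi
    done
    if [ "$status" -eq 0 ]; then
        echo "OK: all counters are zero"
    fi
    return "$status"
}

# Example call with the first set of OIDs listed above:
# check_lsi_snmp w01.someserver.everycity.co.uk public \
#     .1.3.6.1.4.1.3582.4.1.4.1.2.1.19 .1.3.6.1.4.1.3582.4.1.4.1.2.1.20 \
#     .1.3.6.1.4.1.3582.4.1.4.1.2.1.24 .1.3.6.1.4.1.3582.4.1.4.1.2.1.23
```

Treating an empty or non-numeric reply as UNKNOWN rather than OK is deliberate: an unreachable agent or a wrong OID should not look like a healthy array.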

Last but not least, checking on Solaris

Solaris is the easiest of all:

# raidctl -l
Controller: 1
        Volume:c1t0d0
        Disk: 0.1.0
        Disk: 0.2.0

# raidctl -l c1t0d0
Volume                  Size    Stripe  Status   Cache  RAID
        Sub                     Size                    Level
                Disk
----------------------------------------------------------------
c1t0d0                  135.9G  N/A     OPTIMAL  OFF    RAID1
                0.1.0   135.9G          GOOD
                0.2.0   135.9G          GOOD
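As on Linux, this output is easy to wrap in a check. The sketch below assumes the "raidctl -l <volume>" layout shown above, where the volume row ends in a RAID level and disk rows start with an ID such as 0.1.0; the function name check_raidctl and the Nagios-style return codes are illustrative. It sticks to bash built-ins, so it does not depend on which grep or awk is first in the PATH.

```shell
#!/bin/bash
# Sketch: classify `raidctl -l <volume>` output read from stdin.
# Return codes: 0 = OK, 2 = CRITICAL, 3 = UNKNOWN.
check_raidctl() {
    local line bad="" volumes=0
    local disk_re='^[0-9]+\.[0-9]+\.[0-9]+[[:space:]]'
    while read -r line; do              # read trims leading whitespace
        if [[ $line == *RAID[0-9]* ]]; then
            # Volume row, e.g. "c1t0d0  135.9G  N/A  OPTIMAL  OFF  RAID1"
            volumes=$((volumes + 1))
            [[ $line == *OPTIMAL* ]] || bad=$line
        elif [[ $line =~ $disk_re ]]; then
            # Disk row, e.g. "0.1.0  135.9G  GOOD"
            [[ $line == *GOOD* ]] || bad=$line
        fi
    done
    if [ "$volumes" -eq 0 ]; then
        echo "UNKNOWN: no RAID volume in raidctl output"
        return 3
    fi
    if [ -n "$bad" ]; then
        echo "CRITICAL: $bad"
        return 2
    fi
    echo "OK: all volumes OPTIMAL and all disks GOOD"
    return 0
}
```

For the volume above it would be run as "raidctl -l c1t0d0 | check_raidctl".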

Enjoy!