Linux: Dell's "omreport storage controller" Degraded State Issue

System

* Redhat Enterprise Linux (multiple versions)

Note that I am bouncing around between different boxes, so I've kept this article version-agnostic. In general I'm talking about 'an older version', not specific versions. The goal here is to get the kernel module for the RAID controller updated to resolve the issue.


Background Information and Disclaimer

The following article references the 'megaraid_sas' kernel driver required for the Dell 2950/2970 and many other Dell systems. In my case, I am working with those two boxes in particular. For your system, you may need to alter some of the steps to reference a different kernel driver.

Commands written in this article assume you have root access on the box in question.

Disclaimer: This article should be used as a reference for resolving similar issues. Please do not trust it like the holy grail. All actions should first be implemented and tested on non-production systems to guarantee there are no issues.

Disclaimer: This article is not necessarily outlining best practices for system administration. I myself would not perform the manual installation of 'dkms' and non-Redhat RPMs for megaraid_sas unless it was absolutely necessary.


Overview:

I came across this issue during a project to automate the upgrade of BIOS firmware across... well, a large number of systems. The issue that I've seen across a number of boxes is that omreport shows the RAID controller as degraded even though the disk status is fine:

~]# omreport storage controller
 
 Controller  PERC 6/i Integrated (Embedded)
 
Controllers
ID                                            : 0
Status                                        : Non-Critical
Name                                          : PERC 6/i Integrated
Slot ID                                       : Embedded
State                                         : Degraded
Firmware Version                              : 6.1.1-0047
Minimum Required Firmware Version             : Not Applicable
Driver Version                                : 00.00.03.13
 
Minimum Required Driver Version               : 00.00.03.20
 
 
 
 
~]# omreport storage vdisk
 
List of Virtual Disks in the System
 
Controller PERC 6/i Integrated (Embedded)
ID                  : 0
Status              : Ok
Name                : Virtual Disk 0
State               : Ready
Progress            : Not Applicable
Layout              : RAID-1
Size                : 67.75 GB (72746008576 bytes)
Device Name         : /dev/sda
Type                : SAS
Read Policy         : Adaptive Read Ahead
Write Policy        : Write Back
Cache Policy        : Not Applicable
Stripe Element Size : 64 KB
Disk Cache Policy   : Disabled



The issue is really that the 'Degraded' state is misleading. The controller shows as degraded because the installed megaraid_sas kernel driver (or the controller firmware) is older than the minimum version the report expects, even though the disks themselves are fine.

As you can see, the current driver version is 00.00.03.13, but the minimum required version is 00.00.03.20:

Driver Version                                : 00.00.03.13
 
Minimum Required Driver Version               : 00.00.03.20
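
If you have a pile of boxes to check, eyeballing these two lines gets old. Here is a rough shell sketch that pulls both values out of the omreport output and compares them. The helper names (omreport_field, version_lt, driver_is_outdated) are my own, not part of OpenManage, and the comparison assumes plain dotted numeric versions, ignoring anything after a dash:

```shell
# Print the value of a "Label : value" line from omreport-style text on stdin.
omreport_field() {
    awk -F' *: *' -v label="$1" '
        $1 == label { sub(/[ \t]+$/, "", $2); print $2; exit }'
}

# Succeed (exit 0) if dotted version $1 is older than $2.
# Assumption: only the numeric dotted part is compared; a "-suffix" is dropped.
version_lt() {
    awk -v a="${1%%-*}" -v b="${2%%-*}" 'BEGIN {
        na = split(a, A, /[.]/); nb = split(b, B, /[.]/)
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            x = (i <= na) ? A[i] + 0 : 0
            y = (i <= nb) ? B[i] + 0 : 0
            if (x < y) exit 0
            if (x > y) exit 1
        }
        exit 1   # equal versions are not "older"
    }'
}

# Read a controller report on stdin; succeed if the driver is below the minimum.
driver_is_outdated() {
    local report cur min
    report=$(cat)
    cur=$(printf '%s\n' "$report" | omreport_field "Driver Version")
    min=$(printf '%s\n' "$report" | omreport_field "Minimum Required Driver Version")
    case $min in
        [0-9]*) version_lt "$cur" "$min" ;;
        *) return 1 ;;   # "Not Applicable" or missing: nothing to compare
    esac
}
```

On a real box this would be used as something like: omreport storage controller | driver_is_outdated && echo "driver is older than the minimum". It only looks at the first controller in the report.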


The Short Version

In short, the srvadmin utilities (omreport, etc) will show the controller as 'Non-Critical/Degraded' if they detect that either the RAID controller firmware or the Linux kernel driver is out of date. The rest of this article goes deep into the details of updating both of those. If you know what you're doing, that's really all you need to know.


The Long Version

After a few hours of scouring the internet and finding that pretty much everyone and their mother has come across this issue, and with the help of my colleagues, I found the resolution.

Updating the kernel driver isn't really that difficult a task (nowadays), though you might come across some snags.

That said, something to keep in mind is that your OS vendor may have already included these changes in their latest kernel. If you are skipping automatic kernel updates, like a lot of people are, there may be a newer kernel available that resolves this for you. If upgrading the kernel is an option, I would suggest that route.

When Not Able To Upgrade The Kernel

This section explains the steps required when you're not able to upgrade your kernel, or if the latest kernel available from your OS Vendor (Redhat in this case) doesn't resolve the issues for you.



Get the Latest Kernel Driver:

As mentioned previously, we are using Dell 2950/2970 boxes here, and need to update the megaraid_sas driver to at least the minimum version that omreport lists as required.

You might have to search around a bit, but I came across the latest (at present time) here:

* Dell Support Downloads



Download that to your box, unpack it, and install the files:

~]# wget http://tinyurl.com/c7gv56
 
~]# tar -zxvf megaraid_sas-v00.00.03.21-4-R193772.tar.gz 
dkms-2.0.19-1.noarch.rpm
megaraid_sas-v00.00.03.21-4.noarch.rpm
megaraid_sas-v00.00.03.21-4.txt
megaraid_sas-v00.00.03.21.dkms.spec
megaraid_sas-v00.00.03.21-src.tgz



As you can see, this provides us with the 'dkms' and 'megaraid_sas' RPMs. Let's install those:

~]# rpm -Uvh dkms-2.0.19-1.noarch.rpm 
Preparing...                ########################################### [100%]
   1:dkms                   ########################################### [100%]
 
 
~]# rpm -Uvh megaraid_sas-v00.00.03.21-4.noarch.rpm 
Preparing...                ########################################### [100%]
   1:megaraid_sas           warning: user compiler does not exist - using root
warning: group compiler does not exist - using root
########################################### [100%]
 
Loading tarball for module: megaraid_sas / version: v00.00.03.21
 
Loading /usr/src/megaraid_sas-v00.00.03.21...
Creating /var/lib/dkms/megaraid_sas/v00.00.03.21/source symlink...
 
DKMS: ldtarball Completed.
 
Error! Your kernel source for kernel 2.6.18-92.1.13.el5 cannot be found at
/lib/modules/2.6.18-92.1.13.el5/build or /lib/modules/2.6.18-92.1.13.el5/source.
You can use the --kernelsourcedir option to tell DKMS where it's located.
 
Error! Could not locate megaraid_sas.ko for module megaraid_sas in the DKMS tree.
You must run a dkms build for kernel 2.6.18-92.1.13.el5 (x86_64) first.
error: %post(megaraid_sas-v00.00.03.21-4.noarch) scriptlet failed, exit status 4
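
The first error above is DKMS saying it looked for the kernel build tree under /lib/modules/<running kernel>/build and found nothing usable. A quick way to check that yourself is sketched below. The function name kernel_build_dir_ok is hypothetical (it is not a DKMS command), and the optional second argument is only there so the check can be pointed at a scratch directory:

```shell
# Succeed if the kernel build directory DKMS wants exists for a given release.
# $1 = kernel release (e.g. the output of `uname -r`)
# $2 = optional filesystem root prefix (assumption: only useful for testing)
kernel_build_dir_ok() {
    local dir="${2:-}/lib/modules/$1/build"
    # -d follows symlinks, so a dangling "build" symlink counts as missing.
    [ -d "$dir" ]
}
```

Typical use would be: kernel_build_dir_ok "$(uname -r)" || echo "no build tree for the running kernel". On the box above this would have failed, since the 'build' path had nothing behind it.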



Whoa, what? This brings us to our next point...

Installing The Right Kernel Devel Package

On Redhat and other RPM-based distros, the kernel can sometimes get 'upgraded' but not actually booted, meaning the installed RPM packages are newer than the running kernel. If this is the case, you may need to manually download the RPM for 'kernel-devel', 'kernel-smp-devel', or 'kernel-xen-devel' (depending on your kernel flavor) for the exact version of your running kernel.

On this box, however, the running kernel matches the installed kernel RPM. The error we are receiving is because we are missing the matching kernel-devel package:

[root@122242-r900test1 ~]# uname -r
2.6.18-92.1.13.el5
 
[root@122242-r900test1 ~]# rpm -qa | grep kernel
kernel-2.6.18-92.1.13.el5
kernel-devel-2.6.18-92.1.6.el5
kernel-headers-2.6.18-92.1.6.el5



In my case, kernel version 2.6.18-92.1.13.el5 is the running kernel, but the installed kernel-devel is for 2.6.18-92.1.6.el5, so there is no kernel-devel for the running version. Installing via up2date/yum should resolve this if you are running the latest kernel. In our environment we exclude the kernel from updates by default, so installing kernel-devel now would pull a package from RHN that is newer than my running kernel, and maybe I don't want to upgrade my kernel at this point. If that is the case, you may need to log in to RHN directly to download the proper kernel-devel package.
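
The mismatch is easy to miss when you are scanning 'rpm -qa' output by eye, so here is a small sketch that checks it for you. The function name has_matching_devel is my own, and it assumes the plain 'kernel-devel' package (not the smp or xen flavors) with package names in the form 'rpm -qa' prints them, with or without a trailing arch:

```shell
# Read a package list on stdin (one name per line, as from `rpm -qa`).
# Succeed if it contains kernel-devel for kernel release $1,
# either as an exact match or followed by ".<arch>".
has_matching_devel() {
    awk -v want="kernel-devel-$1" '
        $0 == want || index($0, want ".") == 1 { found = 1 }
        END { exit !found }'
}
```

Something like: rpm -qa | has_matching_devel "$(uname -r)" || echo "kernel-devel does not match the running kernel". If you would rather not parse anything, 'rpm -q kernel-devel-$(uname -r)' answers the same question directly.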

Navigate your RHN account to find the package you are looking for, take that URL, and wget the package to your box. Then install it:

~]# wget http://rhn.redhat.com/.../kernel-devel-2.6.18-92.1.13.el5.x86_64.rpm
 
 
~]# rpm -Uvh kernel-devel-2.6.18-92.1.13.el5.x86_64.rpm 
Preparing...                ########################################### [100%]
   1:kernel-devel           ########################################### [100%]



EL5+/Yum

With systems that use Yum as their updater, you can also specify the package exactly including the version-release.arch:

~]# yum install kernel-2.6.18-92.1.22.el5.x86_64  --disableexcludes=all


Alternatively, Upgrade Your Kernel

On one of these boxes, the latest kernel update available from Redhat EL 5.3 ships megaraid_sas 00.00.04.xx, meaning that simply upgrading the kernel would fix our issue:

~]# yum install kernel kernel-devel --disableexcludes=all



Verify the right kernel is set as the default in /boot/grub/menu.lst, and if so... reboot:

~]# reboot
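
Before actually issuing that reboot, the 'default' check can be done without opening the file. This is a minimal sketch for a legacy GRUB menu.lst: it reads the 'default' index (entries count from 0) and prints the kernel path of that entry. The function name grub_default_kernel is made up for this article, and it assumes a numeric 'default=' line and ordinary (non-Xen) entries where the 'kernel' line names the vmlinuz image:

```shell
# Print the kernel image of the default boot entry in a legacy GRUB config.
# $1 = path to the config (defaults to /boot/grub/menu.lst)
grub_default_kernel() {
    awk '
        /^[ \t]*default[ \t=]/ { sub(/^[ \t]*default[ \t=]+/, ""); def = $1 + 0 }
        /^[ \t]*title[ \t]/    { entry++ }
        /^[ \t]*kernel[ \t]/   { kernel[entry - 1] = $2 }
        END {
            # def stays 0 when no default line is present
            if ((def + 0) in kernel) print kernel[def + 0]
            else exit 1
        }
    ' "${1:-/boot/grub/menu.lst}"
}
```

Run it as grub_default_kernel and compare the printed /vmlinuz-... name against the kernel you just installed.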


Once you reboot... verify your raid controller again to see if this resolved your issue with the controller showing as degraded:

~]$ omreport storage controller
 Controller  PERC 6/i Integrated (Embedded)
 
Controllers
ID                                            : 0
Status                                        : Non-Critical
Name                                          : PERC 6/i Integrated
Slot ID                                       : Embedded
State                                         : Degraded
Firmware Version                              : 6.0.2-0002
Minimum Required Firmware Version             : 6.1.1-0034
Driver Version                                : 00.00.03.15-RH1
 
Minimum Required Driver Version               : 00.00.03.20

Note: this is currently a Redhat EL 5.2 box... the kernel update that brings megaraid_sas to a new enough version is in EL 5.3.



In this case, I still need to manually compile the megaraid_sas update in... or upgrade to EL 5.3. For the sake of showing this upgrade, I'm not going to update to EL 5.3.


Build The megaraid_sas DKMS Module

Now that the kernel and kernel-devel packages are installed and match our currently running kernel, we can proceed. The following steps outline building the DKMS module against our current kernel.

~]# dkms build -m megaraid_sas -v v00.00.03.21
Kernel preparation unnecessary for this kernel.  Skipping...
applying patch rhel5.patch...patching file megaraid_sas.c
patching file megaraid_sas.h
 
 
Building module:
cleaning build area....
make KERNELRELEASE=2.6.18-92.1.22.el5 -C /lib/modules/2.6.18-92.1.22.el5/build SUBDIRS=/var/lib/dkms/megaraid_sas/v00.00.03.21/build modules....
cleaning build area....
 
DKMS: build Completed.


Once built, we can examine it in dkms status:

~]# dkms status
megaraid_sas, v00.00.03.21, 2.6.18-92.1.22.el5, x86_64: built
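
If you are scripting this across many boxes, the state at the end of that line is the part you care about. Here is a rough sketch that extracts it. The function name dkms_state is hypothetical, and it assumes the comma-separated 'module, version, kernel, arch: state' layout shown above (other dkms versions may print the line differently):

```shell
# Read `dkms status` text on stdin and print the state (e.g. "built")
# for one module/version/kernel combination. Fails if no line matches.
# $1 = module name, $2 = module version, $3 = kernel release
dkms_state() {
    awk -F', *' -v m="$1" -v v="$2" -v k="$3" '
        $1 == m && $2 == v && $3 == k {
            split($4, part, /: */)      # "x86_64: built" -> "x86_64", "built"
            sub(/ .*/, "", part[2])     # drop any trailing note after the state
            print part[2]; found = 1; exit
        }
        END { exit !found }'
}
```

For example: dkms status | dkms_state megaraid_sas v00.00.03.21 "$(uname -r)" should print 'built' at this point in the walkthrough.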



The dkms megaraid_sas module is built, but not installed... we can verify that it isn't installed with the modinfo utility:

 ~]# modinfo megaraid_sas
filename:       /lib/modules/2.6.18-92.1.22.el5/kernel/drivers/scsi/megaraid/megaraid_sas.ko
description:    LSI MegaRAID SAS Driver
author:         megaraidlinux@lsi.com
version:        00.00.03.15-RH1
license:        GPL
srcversion:     9E19F51B1AACEC3540858D4
alias:          pci:v00001028d00000015sv*sd*bc*sc*i*
alias:          pci:v00001000d00000413sv*sd*bc*sc*i*
alias:          pci:v00001000d00000060sv*sd*bc*sc*i*
alias:          pci:v00001000d00000411sv*sd*bc*sc*i*
depends:        scsi_mod
vermagic:       2.6.18-92.1.22.el5 SMP mod_unload gcc-4.1
parm:           poll_mode_io:Complete cmds from IO path, (default=0) (int)
module_sig:     883f35049393feaeac96a84dfee9f6112209d09e301eb6e8d0cd
3343beceda63b3ff608028cd56d809e2b309a719f4770d2bc06b
59e17068c18ae9c



The modinfo utility reads the megaraid_sas module file currently installed on disk for this kernel. Once we install the new dkms module, this should reference our new version of 00.00.03.21:

~]# dkms install -m megaraid_sas -v v00.00.03.21
Running module version sanity check.
 
megaraid_sas.ko:
 - Original module
   - Found /lib/modules/2.6.18-92.1.22.el5/kernel/drivers/scsi/megaraid/megaraid_sas.ko
   - Storing in /var/lib/dkms/megaraid_sas/original_module/2.6.18-92.1.22.el5/x86_64/
   - Archiving for uninstallation purposes
 - Installation
   - Installing to /lib/modules/2.6.18-92.1.22.el5/extra/
Adding any weak-modules
 
depmod....
 
Saving old initrd as /boot/initrd-2.6.18-92.1.22.el5_old.img
Making new initrd as /boot/initrd-2.6.18-92.1.22.el5.img
(If next boot fails, revert to the _old initrd image)
mkinitrd.....
 
DKMS: install Completed.



And is it loaded??? .... wait for it.... wait for it...

~]# modinfo megaraid_sas
filename:       /lib/modules/2.6.18-92.1.22.el5/extra/megaraid_sas.ko
description:    LSI Logic MegaRAID SAS Driver
author:         megaraidlinux@lsi.com
version:        00.00.03.21
license:        GPL
srcversion:     FC21D12115DCD2FF4A2ABDD
alias:          pci:v00001028d00000015sv*sd*bc*sc*i*
alias:          pci:v00001000d00000413sv*sd*bc*sc*i*
alias:          pci:v00001000d0000007Csv*sd*bc*sc*i*
alias:          pci:v00001000d00000060sv*sd*bc*sc*i*
alias:          pci:v00001000d00000411sv*sd*bc*sc*i*
depends:        scsi_mod
vermagic:       2.6.18-92.1.22.el5 SMP mod_unload gcc-4.1
parm:           fast_load:megasas: Faster loading of the driver, skips physical devices!         (default=0) (int)
parm:           max_sectors:Maximum number of sectors per IO command (int)
parm:           cmd_per_lun:Maximum number of commands per logical unit (default=128) (int)
parm:           poll_mode_io:Complete cmds from IO path, (default=0) (int)
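
One thing worth knowing here: modinfo reports on the module file on disk, not on the copy the kernel has in memory. A loaded module that declares a version exposes it under /sys/module/<name>/version, so the two can be compared. The sketch below does that. The helper names (loaded_module_version, needs_reload) are my own, and the optional second argument only exists so the function can be pointed at a fake sysfs tree:

```shell
# Print the version of a currently loaded module, if it exposes one.
# $1 = module name, $2 = optional sysfs root (defaults to /sys)
loaded_module_version() {
    local file="${2:-/sys}/module/$1/version"
    [ -r "$file" ] && cat "$file"
}

# Succeed if the loaded version ($2) is known and differs from the on-disk one ($1).
needs_reload() {
    [ -n "$2" ] && [ "$1" != "$2" ]
}

# Example use on a real system (left as comments so nothing runs on its own):
#   disk=$(modinfo -F version megaraid_sas)
#   loaded=$(loaded_module_version megaraid_sas)
#   needs_reload "$disk" "$loaded" && echo "new driver not active until reboot"
```

Since megaraid_sas is usually driving the disk the box boots from, swapping it in place is not really an option. The rebuilt initrd from the dkms install output is what picks up the new driver on the next boot.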


That's awesome, but did it resolve our raid controller issue? Not yet... because the same issue exists for the actual RAID Controller firmware also:

~]# omreport storage controller
 
Controller  PERC 6/i Integrated (Embedded)
 
Controllers
ID                                            : 0
Status                                        : Non-Critical
Name                                          : PERC 6/i Integrated
Slot ID                                       : Embedded
State                                         : Degraded
Firmware Version                              : 6.0.2-0002
Minimum Required Firmware Version             : 6.1.1-0034
Driver Version                                : 00.00.03.21
 
Minimum Required Driver Version               : Not Applicable



As you can see, the Driver Version is now fine, but the firmware is still out of date. This is the easy upgrade... find the firmware .BIN update file for your system on ftp.dell.com, and execute it. You can find the right files by looking up your Dell model and finding the support/downloads for it... they are usually on ftp.dell.com somewhere. My file for this box is 'RAID_FRMW_LX_R201071.BIN', but that's as far as I can go into that.

Download the RAID Firmware Update to your box, and execute. Then reboot:

~]# wget http://ftp.dell.com/.../RAID_FRMW_LX_R201071.BIN
 
~]# bash RAID_FRMW_LX_R201071.BIN -q
 
Collecting inventory...
..
Running validation...
 
PERC 6/i Integrated Controller 0
 
The version of this Update Package is newer than the currently installed version.
Software application name: PERC 6/i Integrated Controller 0 Firmware
Package version: 6.1.1-0047
Installed version: 6.0.2-0002
 
 
Executing update...
WARNING: DO NOT STOP THIS PROCESS OR INSTALL OTHER DELL PRODUCTS WHILE UPDATE IS IN PROGRESS.
THESE ACTIONS MAY CAUSE YOUR SYSTEM TO BECOME UNSTABLE!
..................................
The system should be restarted for the update to take effect.


If all went well, reboot.

~]# reboot



And now....

~]# omreport storage controller
 Controller  PERC 6/i Integrated (Embedded)
 
Controllers
ID                                            : 0
Status                                        : Ok
Name                                          : PERC 6/i Integrated
Slot ID                                       : Embedded
State                                         : Ready
Firmware Version                              : 6.1.1-0047
Minimum Required Firmware Version             : Not Applicable
Driver Version                                : 00.00.03.21



Boom!

Conclusion

This all went into a lot of detail for the different issues that might come up. In summary, the following needs to happen to resolve this type of 'degraded' state:

* Get the latest megaraid_sas kernel driver
* Install the kernel-devel package for the running kernel version.
* dkms build -m megaraid_sas -v v00.00.03.21
* dkms install -m megaraid_sas -v v00.00.03.21
* Upgrade to the latest RAID Controller Firmware
* Reboot
* Verify



As mentioned, if you can... use the versions of the Linux Kernel Driver provided by your OS Vendor.

Hope this helps.. please feel free to comment and provide any further information/road blocks/etc that might help other people.