Hot Swap RAID 10 Drive with no Reboot on Dell PowerEdge R530 with PERC H730P

Dell’s documentation on the R530 and the PERC H730P leaves a lot to be desired, especially because the information on hand is often contradictory. This Dell community post from 2019 shows just how confusing the result can be. In it, the OP laments that they are “Still in disbelief, with how much money I paid for this Gen-12 server, that I have to reboot to take a hot-swap drive offline in preparation for replacement!” Whether that was true of the iDRAC firmware available at the time is unclear, but the confusion doesn’t end there: the same belief carries on in this community post from 2021, in which a DellEMC employee states “If it is iDRAC8 then you can do it from iDRAC but need a system reboot. You can also do it from PERC configuration utility during boot.” And, just to add insult to injury, this Dell EMC PowerEdge RAID Controller 9 User’s Guide H330, H730, and H830 article only defines hot swapping and tells you it is only possible if the controller supports it and the drives match.

This spec sheet for the R530 from Dell lists a variety of RAID controller options, but says nothing about whether any of them support hot swap. The same goes for this Dell spec sheet for the H730P, which mentions only hot spare and nothing about hot swap. In several places online I found mention of the R530 supporting Hot Plug, rather than Hot Swap (though it does point readers to the owner’s manual).

Is there any hope?

The Dell PowerEdge R530 owner’s manual is the first place we really see any hope that the server does, in fact, support hot swap, both in the “Front Panel Features” and in the “Installing a hot-swap hard drive” sections. Okay. That’s great. Now what? Dell lays out one option in this article titled “PowerEdge HDD: How to physically replace an HDD (Hot Swap procedure).” Finally!

By this point, it would be easy to rush into the replacement process. But there are prerequisites that could easily be missed in a rush, and one option for making sure they are completed is laid out in this article titled “Dell PowerEdge: How to switch offline a hard disk using OpenManage Server Administrator.”

Can this be done using racadm?

Yes, and no.

On a Windows Server, using Admin permissions, you can run the following :

  • racadm storage get pdisks
    • The results will look something like this :
      • Disk.Bay.0:Enclosure.Internal.0-1:RAID.Integrated.1-1
    • Copy and paste the entire line for the drive that is in imminent failure.
  • racadm raid forceoffline:Disk.Bay.#:Enclosure.Internal.#-#:RAID.Integrated.#-#
    • Remember, everything after forceoffline: will be copied and pasted from the appropriate line from the get pdisks command.
  • If all goes well, the results will include something along the lines of :
    • STOR094 : The storage configuration operation is successfully completed and the change is in pending state.
      • Those last few words are important : change is in pending state. Now, read further on :
      • To apply the configuration operation immediately, create a configuration job using the --realtime option.
  • racadm jobqueue create RAID.Integrated.#-# -s TIME_NOW --realtime
    • Note that the RAID.Integrated.#-# portion is simply the text after the last colon in the racadm storage get pdisks line.
    • If the operation is successful, it will read as follows :
      • RAC1024: Successfully scheduled a job.
        Verify the job status using “racadm jobqueue view -i JID_XXXX” command.
        Commit JID = JID_#########
    • racadm jobqueue view -i JID_#########
    • Continue re-running this command until the Percent Complete=[100]
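Because those steps are mostly copy and paste, here is a minimal Python sketch of the string handling involved. It only builds the command lines and reads the Percent Complete value out of the job view text; it does not call racadm itself. The function names (controller_fqdd, build_commands, percent_complete) are made up for illustration, and the disk identifier is a placeholder rather than one from a real server.

```python
import re
from typing import List, Optional


def controller_fqdd(pdisk_fqdd: str) -> str:
    """Return the controller part of a disk identifier (the text after the last colon)."""
    return pdisk_fqdd.strip().rsplit(":", 1)[-1]


def build_commands(pdisk_fqdd: str) -> List[str]:
    """Build the two racadm command lines from the steps above for one disk."""
    pdisk = pdisk_fqdd.strip()
    return [
        # Take the failing disk offline (the change stays pending until a job is created).
        f"racadm raid forceoffline:{pdisk}",
        # Create the job that applies the pending change now (two plain hyphens on realtime).
        f"racadm jobqueue create {controller_fqdd(pdisk)} -s TIME_NOW --realtime",
    ]


def percent_complete(job_view_output: str) -> Optional[int]:
    """Pull the number out of 'Percent Complete=[NN]' in jobqueue view output, if present."""
    match = re.search(r"Percent Complete=\[(\d+)\]", job_view_output)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Placeholder identifier, in the same shape as a line from 'racadm storage get pdisks'.
    for command in build_commands("Disk.Bay.0:Enclosure.Internal.0-1:RAID.Integrated.1-1"):
        print(command)
```

Printing the commands rather than running them is deliberate: you get a chance to check the disk identifier before anything is forced offline.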

Technically, the job is done and you can use several different options to view the state of the disks and confirm that the disk is offline. Once replaced, the array should detect it is in a degraded state and begin rebuilding with the new drive. But, how do you know? This is where racadm fails us. As far as I know, there is no way to query the state of the rebuild with racadm. But you can do it with omreport :

  • omreport storage pdisk controller=#
    • Look for the State, and the Progress to confirm that it is rebuilding, and then you can monitor the % complete.
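If you would rather not eyeball the report each time, a small Python sketch like the one below could pick out the rebuilding disk. It assumes the omreport text is a series of "Name : Value" lines and that each disk's block starts with an ID line. The sample text is made up to show the shape, not captured from a real server, and the function names are hypothetical.

```python
from typing import Dict, List

# Illustrative sample only; real omreport output has more fields per disk.
SAMPLE = """
ID                        : 0:1:0
State                     : Online
Progress                  : Not Applicable

ID                        : 0:1:1
State                     : Rebuilding
Progress                  : 37% complete
"""


def parse_pdisks(report: str) -> List[Dict[str, str]]:
    """Split 'Name : Value' lines into one dict per disk, starting a new disk at each ID line."""
    disks: List[Dict[str, str]] = []
    for line in report.splitlines():
        if ":" not in line:
            continue  # blank lines and headings without a colon
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "ID":
            disks.append({})  # assumption: ID is the first field of every disk block
        if disks:
            disks[-1][key] = value
    return disks


def rebuilding_disks(report: str) -> List[Dict[str, str]]:
    """Return only the disks whose State is Rebuilding."""
    return [d for d in parse_pdisks(report) if d.get("State", "").lower() == "rebuilding"]


if __name__ == "__main__":
    for disk in rebuilding_disks(SAMPLE):
        print(disk["ID"], disk.get("Progress", "unknown"))
```

In practice you would feed it the text from omreport storage pdisk controller=# instead of SAMPLE, and re-run it until no disk reports Rebuilding.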

And that’s it! A successful hot swap of a hard drive in a RAID array with no reboot on a Dell PowerEdge R530 with a PERC H730P.