Upgrading a RAID array in-flight

I have a home server I built several years ago. It used to be a mail server, but now it's mainly just a home file server. Still, it runs a three-drive RAID 5 array for safety. Recently, one of the drives failed and I decided it was time to replace the whole array (it was all old spinning-disk drives, and those are so early-2010s).

You'd think this would be easy, and it would be, if documentation for it were any good. The best I could find skipped over some rather important details, which I had to figure out from extensive Duck Duck Going and the friendly folks in the #ubuntu channel on irc.libera.chat. Much of this is derived from this tutorial, but with a lot more detail here.

So that I don't have to re-research all of this next time, and hopefully to help someone else in a similar situation, I'm going to document the whole process here in detail, with descriptions. This isn't quite a blow-by-blow, since I am not going to include all of my missteps along the way, but it's close.

Buckle up.

The setup

My old configuration was three 1 TB spinning disk drives, in RAID 5 configuration using Linux software RAID. It was set up by the Ubuntu installer somewhere on the order of 8 years ago, at least. For the past few months, the whole system has been extremely sluggish on anything involving disk IO. After some pondering and checking for things like hidden trojan infections, I concluded that one of the drives was dying. Given the age, I figured it was probably worth the time to port the whole thing over to larger SSD drives.

The system is running Ubuntu 22.04, although the hardware is quite old at this point.

The way software RAID works in Linux, each hard drive is divided into partitions (as always), and then a RAID array is defined that spans multiple partitions on (presumably) separate disks. Then the OS can mount the RAID array as a drive like any other drive.

On my system, I had /dev/sda, /dev/sdb, and /dev/sdc physical drives, each with partition 1 (8 GB, for swap, which is also RAIDed), partition 2 (an extended partition), and the rest of the space as logical partition 5 inside that extended partition. (If that's all Greek to you: hard drive partitioning is utterly weird and still based on what 386 computers could handle, leading to weirdness like primary/extended/logical partitions. Sorry. Welcome to computers.) The RAID arrays are /dev/md0 (the swap partitions) and /dev/md1 (mounted at /).

If you're not sure how your server is set up, the lsblk command will give an overview of what devices are defined and how they're configured, RAID-wise.
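
For example, asking for a few extra columns makes the RAID relationships easier to see (member partitions show up with an fstype of linux_raid_member, and the md devices appear nested under them):

# lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT

That lists each physical drive, its partitions, any md arrays built on top of them, and where each one is mounted.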

Note that all commands listed below are run as root; you could also use sudo for all of them if you prefer. Any time I refer to /dev/sdX, the X is for any of a, b, or c.

The diagnostics

The first step was to confirm my theory. Doing that involved the smartctl utility, which on Debian-based distributions is included in the smartmontools package.

# apt-get install smartmontools

smartctl is a hard disk checking utility. (SMART is a drive-diagnostics standard.) Running smartctl -a /dev/sda shows the health status of that drive. It can also run diagnostics, in both "short" and "long" versions. I ran the following:

# smartctl -t short /dev/sda
# smartctl -t short /dev/sdb
# smartctl -t short /dev/sdc

The diagnostics run in the background. On sdb and sdc, they finished within seconds. On sda, it ran for a half hour and never actually finished. That confirmed my suspicion that the issue was a dying drive, which is exactly what RAID is there for.
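
The results of those self-tests can be checked afterward with something like:

# smartctl -l selftest /dev/sdb

which lists each completed test and its status. (The same information is also buried in the full smartctl -a output.)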

The new hardware

My motherboard can handle up to four SATA devices, but first I needed to confirm what kind of SATA I have, since, like everything else, SATA comes in multiple versions. The most straightforward way I found to determine that (someone in IRC told me about it) was:

# dmesg | grep -i sata | grep 'link up'
[    3.019890] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    3.031035] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    3.042040] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)

The 6.0 Gbps part indicates that I'm using SATA 3.0, since that's the version that handles that speed. After a little searching of review sites (mainly Wirecutter), I settled on a Western Digital 2 TB WD Blue SA510 SATA SSD. Or, rather, three of them.

Serial problems

One important issue I ran into during the process is that every time I added or removed a drive, the sdX letters for each drive changed. Those assignments are handed out in the order of the ports on the motherboard; they are not bound to the drive itself. Since everything is RAIDed there's no risk of data getting lost, just of me needing to keep track of which drive was which every time I booted. The way to find out is with this command:

# udevadm info --query=all --name=/dev/sda | grep ID_SERIAL
E: ID_SERIAL=WDC_WD10EZEX-00BN5A0_WD-WCC3F7UX7FFR
E: ID_SERIAL_SHORT=WD-WCC3F7UX7FFR

That tells me the serial number of whatever is connected to sda, and then I could match up the serial number with what's printed on the physical drive to know which is which. I won't repeat that step each time, but I did have to run that over again every time I changed the hardware configuration.
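
To save some typing, a quick shell loop will print the serial for every drive at once; it's just a convenience wrapper around the same udevadm query:

# for d in /dev/sd[a-d]; do echo -n "$d: "; udevadm info --query=property --name="$d" | grep '^ID_SERIAL_SHORT'; done

(Adjust the sd[a-d] range to match however many drives you have plugged in at the moment.)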

Fail safely

Software RAID is controlled by a command called mdadm, available in a package of the same name.

I'm not sure if it's "better" to physically install the new drive before or after removing the old one. I did it by adding it first, which is probably part of why I ended up with my sdX letters moving around on me so much. Other tutorials say to remove first, so pick your preference.

The status of the arrays can be checked at any time by examining the /proc/mdstat pseudo-file (because in Linux, half the diagnostics are available by reading from some special file or another).

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md0 : active raid5 sdc1[2] sdb1[1] sdd1[3]
    15612928 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
     
md1 : active raid5 sdc5[2] sdb5[1] sdd5[3]
    1937634304 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

Which shows the two arrays.
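
If you want more detail about a single array than mdstat gives, mdadm can report it directly:

# mdadm --detail /dev/md1

That lists the array's state, size, and exactly which partitions are members (and which, if any, have failed).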

Once I had the first SSD installed (it got assigned /dev/sdd), it was time to disable the old drive. That needs to be done for both RAID arrays (md0 and md1), as it's per-partition.

# mdadm --manage /dev/md0 --fail /dev/sda1
# mdadm --manage /dev/md1 --fail /dev/sda5

The first command tells the md0 array that its member partition sda1 is bad and it should feel bad, so stop using it. The second does the same for the md1 array's sda5 partition. Checking mdstat again shows an F next to each disabled partition.

Now that the partitions are marked "failed" and the arrays are avoiding them, it's time to remove them from the arrays:

# mdadm --manage /dev/md0 --remove /dev/sda1
# mdadm --manage /dev/md1 --remove /dev/sda5

(You may need to run swapoff to disable swap for the swap partition to let you remove sda1. I did it, but I'm not sure if that's because I messed up some other things along the way. If you do, remember to run swapon when you're all done to re-enable it.)
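
For reference, and assuming the swap array is /dev/md0 as on my system, that looks like:

# swapoff /dev/md0

before the removal, and then once everything is rebuilt:

# swapon /dev/md0

(swapon -a also works if the swap device is listed in /etc/fstab.)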

Set up the new drive

I already had the first new drive installed, but if you don't, this is the time to physically install it. Then, it needs an identical partition table to the other drives. The easiest way to copy that over is with the sfdisk command, piped to itself:

# sfdisk -d /dev/sda | sfdisk /dev/sdd

You can use any of the old drives as the copying source, and how you plugged in the drives will determine whether the new drive is sdd or something else. Check the serial numbers (as above) to be sure.

Note that this will set up a 1 TB partition layout on the new drive, even though it's a 2 TB drive. That's OK. It has to be, as all members of a RAID 5 array have to be the same size. We'll be able to resize everything when we're done.
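
It's worth double-checking that the copy did what you expected before touching the arrays; dumping the new drive's partition table is enough:

# sfdisk -d /dev/sdd

The partition sizes and types it prints should match what the old drives have.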

Now, add the new drive's partitions to the RAID array:

# mdadm --manage /dev/md0 --add /dev/sdd1
# mdadm --manage /dev/md1 --add /dev/sdd5

(Again, your letters may vary.)

As soon as a partition is added, the software RAID controller will begin rebuilding data onto it to bring it up to date. This can take anywhere from a few seconds to a few hours, depending on how much data there is. Once again, checking mdstat will report on the progress.

# cat /proc/mdstat  
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]  
md0 : active raid5 sda1[4] sdc1[2] sdd1[3]
     15612928 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [U_U]
     [=============>.......]  recovery = 67.7% (5286012/7806464) finish=0.2min speed=167805K/sec
      
md1 : active raid5 sda5[4] sdc5[2] sdd5[3]
     1937634304 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [U_U]
       resync=DELAYED

(The above shows the md0 resync 2/3 of the way done, and the md1 resync waiting for it to finish.)
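
Rather than re-running cat by hand, you can leave a live view up until the rebuild completes:

# watch cat /proc/mdstat

watch re-runs the command every couple of seconds and redraws the screen, so you can see the percentage tick up.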

Again and again

That done, I checked and disk IO was nice and fast again, confirming that it was a hardware failure. I removed the dead drive and set it aside for later disposal.

Then, repeat the process for the other two old drives. It's identical, but if you're as sloppy as I was about which drives went into which ports on the motherboard, you'll need to recheck which serial number maps to which device each time.

Boot problems

Once I swapped out the last drive, the machine refused to boot. Strange. I put back in one of the old drives and it booted again, even though the booted configuration was using only the new SSD drives. I finally figured out (by attaching a monitor to the server and watching the boot process drop into the BIOS configuration) that copying over the partition tables did not copy over the boot loader code in the Master Boot Record (MBR), so the computer couldn't find boot instructions on any of the available devices. The BIOS scans each SATA connection in turn looking for a device with a working MBR, and it wasn't finding one unless I had one of the old drives plugged in as /dev/sdd. Oops.

Fortunately, the solution was to just reinstall grub, the Linux boot loader ("GRand Unified Bootloader"), using the settings it could already derive from my existing system. So, with the old drive installed so that I could boot, I ran:

# dpkg-reconfigure grub-pc

It interactively asked which devices to install to. Just to be safe, I had it install to sda, sdb, and sdc, all three of the new drives. That way the computer can boot with any of them installed; the MBR is entirely independent of RAID and comes into play long before RAID is even loaded. (Note: grub will give you the option of installing to md, but will fail if you try.)
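
If you'd rather skip the interactive prompt, the same result can (as far as I know) be had by running grub-install against each new drive directly, assuming they're sda, sdb, and sdc as above:

# grub-install /dev/sda
# grub-install /dev/sdb
# grub-install /dev/sdc
# update-grub

That should write the boot code to each drive's MBR and regenerate the grub configuration.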

Now I was able to shut down, remove the old HDD, reboot, and it booted correctly from the new drives. Huzzah!

Growth mindset

The final step is to expand the RAID array configuration to use the extra 3 TB worth of raw space I have now. There are actually three different layers that have to be grown, in order; fortunately, each one is very fast.

Grow the partitions

Linux has an annoying number of possible tools to use here, ranging from the minimalist growpart to the hyper-versatile parted. In my case, growpart was the easiest option, as I only wanted to expand the existing partitions into the free space at the end of each disk, not move any partitions around. (Had I wanted to do that, I would have needed parted.) It's found in the cloud-guest-utils package, which makes no sense to me at all, but then I'm not a distribution packager.

Fortunately, growpart has a dry-run mode. When I told it to expand sda5, it revealed that it would give me an sda5 larger than the extended partition sda2 it technically lives in. (As I said, partition design is ancient, complicated, and dumb.) Fortunately, expanding sda2 first is easy enough.
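
The dry-run check is just the normal growpart invocation with -N (or --dry-run) added, something like:

# growpart --dry-run /dev/sda 5

That prints what the resize would do without actually touching the partition table. The real runs were then, sda2 before sda5 on each disk: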

# growpart /dev/sda 2
# growpart /dev/sda 5
# growpart /dev/sdb 2
# growpart /dev/sdb 5
# growpart /dev/sdc 2
# growpart /dev/sdc 5

That expanded the partitions to fill all remaining space after them, which is what we wanted. (I'm not resizing the swap partitions.)

Grow the array

The next step is to tell the RAID array itself that it should be bigger. Fortunately, that's one quick command:

# mdadm --grow /dev/md1 --size max

Which tells md1 to grow to use all available space on its partitions.
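
To confirm the array actually picked up the new space, check the detail view:

# mdadm --detail /dev/md1 | grep -i 'array size'

The number it reports should now reflect the larger size (the same block count also shows up in /proc/mdstat).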

Grow the file system

Finally, we need to expand the file system itself on the RAID array. That is, again, fortunately a simple command.

# resize2fs /dev/md1

Which, again, just tells the file system to grow as big as it can.

The result

All that finally done, I now have twice as much space available as I used to:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           782M  1.5M  781M   1% /run
/dev/md1        3.6T  1.1T  2.4T  32% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.9G     0  3.9G   0% /run/qemu
tmpfs           782M  8.0K  782M   1% /run/user/1000

Yay! Remember, RAID 5 spreads parity data across the drives, so the available disk space you get is (n-1) times the capacity of one drive, where n is the number of drives. With my three 2 TB drives, that's (3-1) × 2 TB = 4 TB, which is why my 6 TB of raw disk space only shows up as 3.6 TB (well, tebibytes) of usable space. Still plenty for now, and if I ever need more I still have space to add one more drive. And since I've actually written down all these instructions, I'll even know how to do it! Yay!

I hope this was helpful to someone else as well.

Tim Düsterhus on Mastodon noted that if the drive is not-quite-dead, as in this case, it's a bit more efficient to do a direct replacement, as per this post.