Redundant Array of Independent Disks (RAID) is a utility that mitigates data loss on a server by replicating data across two or more disks.
This guide shows how to remove a failed or corrupted hard drive from a Linux RAID1 array (software RAID), and how to add a new hard disk to the RAID1 array without losing data.
1 Preliminary Note
In this example I have two hard drives, /dev/sda and /dev/sdb, with the partitions /dev/sda1 and /dev/sda2 as well as /dev/sdb1 and /dev/sdb2.
/dev/sda1 and /dev/sdb1 make up the RAID1 array /dev/md0.
/dev/sda2 and /dev/sdb2 make up the RAID1 array /dev/md1.
/dev/sdb has failed, and we want to replace it.
2 How Do I Tell If A Hard Disk Has Failed?
If a disk has failed, you will probably find a lot of error messages in the log files, e.g. /var/log/messages or /var/log/syslog.
You can also run
and instead of the string [UU] you will see [U_] if you have a degraded RAID1 array.
3 Removing The Failed Disk
To remove /dev/sdb, we will mark /dev/sdb1 and /dev/sdb2 as failed and remove them from their respective RAID arrays (/dev/md0 and /dev/md1).
First we mark /dev/sdb1 as failed:
The output of
should look like this:
Then we remove /dev/sdb1 from /dev/md0:
The output should be like this:
And
should show this:
server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0]
24418688 blocks [2/1] [U_]
md1 : active raid1 sda2[0] sdb2[1]
24418688 blocks [2/2] [UU]
unused devices: <none>
This guide shows how to remove a failed or corrupted hard drive from a Linux RAID1 array (software RAID), and how to add a new hard disk to the RAID1 array without losing data.
1 Preliminary Note
In this example I have two hard drives, /dev/sda and /dev/sdb, with the partitions /dev/sda1 and /dev/sda2 as well as /dev/sdb1 and /dev/sdb2.
/dev/sda1 and /dev/sdb1 make up the RAID1 array /dev/md0.
/dev/sda2 and /dev/sdb2 make up the RAID1 array /dev/md1.
Code:
/dev/sda1 + /dev/sdb1 = /dev/md0
Code:
/dev/sda2 + /dev/sdb2 = /dev/md1
2 How Do I Tell If A Hard Disk Has Failed?
If a disk has failed, you will probably find a lot of error messages in the log files, e.g. /var/log/messages or /var/log/syslog.
You can also run
Code:
cat /proc/mdstat
3 Removing The Failed Disk
To remove /dev/sdb, we will mark /dev/sdb1 and /dev/sdb2 as failed and remove them from their respective RAID arrays (/dev/md0 and /dev/md1).
First we mark /dev/sdb1 as failed:
Code:
mdadm --manage /dev/md0 --fail /dev/sdb1
Code:
cat /proc/mdstat
server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0] sdb1[2](F)
24418688 blocks [2/1] [U_]
md1 : active raid1 sda2[0] sdb2[1]
24418688 blocks [2/2] [UU]
unused devices: <none>
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0] sdb1[2](F)
24418688 blocks [2/1] [U_]
md1 : active raid1 sda2[0] sdb2[1]
24418688 blocks [2/2] [UU]
unused devices: <none>
Then we remove /dev/sdb1 from /dev/md0:
Code:
mdadm --manage /dev/md0 --remove /dev/sdb1
server1:~# mdadm --manage /dev/md0 --remove /dev/sdb1 mdadm: hot removed /dev/sdb1
Code:
cat /proc/mdstat
should show this:
server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0]
24418688 blocks [2/1] [U_]
md1 : active raid1 sda2[0] sdb2[1]
24418688 blocks [2/2] [UU]
unused devices: <none>