== Check for failed drives ==
 
If one of the disks in the RAID1 array fails, do not panic. Follow the procedure below to replace the failed disk.


Failed disks are marked with (F) in /proc/mdstat:
 
 
<pre>
 
# cat /proc/mdstat
 
Personalities : [raid1]
 
md2 : active raid1 dm-11[0](F) dm-6[1]
 
      291703676 blocks super 1.1 [2/1] [_U]
 
      bitmap: 1/1 pages [64KB], 65536KB chunk
 
 
md1 : active raid1 dm-8[0](F) dm-3[1]
 
      1048568 blocks super 1.1 [2/1] [_U]
 
 
md0 : active raid1 dm-9[0](F) dm-4[1]
 
      204788 blocks super 1.0 [2/1] [_U]
 
 
unused devices: <none>
 
</pre>
 
 
In this example the first disk has failed: all of the RAID partitions on that disk are marked with (F).
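
In a multipath configuration the array members appear as dm-* devices, so it is not always obvious which physical disk they belong to. As a rough aid (the array name below is only an example), mdadm --detail lists each member and its state, and multipath -ll maps a dm-N name back to its physical paths:

<pre>
# mdadm --detail /dev/md0
# multipath -ll
</pre>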
 
 
== Remove the failed disk from RAID array ==
 
 
mdadm is the command used to query and manage software RAID arrays on Linux. To remove the failed disk from a RAID array, use:
 
 
<pre>
 
mdadm --manage /dev/mdx --remove /dev/xxx
 
</pre>
 
 
Here /dev/mdx is one of the RAID arrays listed in /proc/mdstat (md0, md1 and md2 in the example above), and /dev/xxx is the failed member device: dm-11, dm-8 and dm-9 in a multipath configuration, or partitions such as sda2, sda3 and sda5 in a non-multipath configuration.
 
 
Here is an example of removing the failed disk from the RAID1 arrays in a non-multipath configuration:
 
<pre>
 
mdadm --manage /dev/md0 --remove /dev/sda3
 
mdadm --manage /dev/md1 --remove /dev/sda2
 
mdadm --manage /dev/md2 --remove /dev/sda5
 
</pre>
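
Note that mdadm will only remove a member that is already marked as failed. If a partition on the failing disk is still shown as active, it can be marked failed first and then removed; the sketch below reuses the example device names above:

<pre>
mdadm --manage /dev/md0 --fail /dev/sda3
mdadm --manage /dev/md0 --remove /dev/sda3
</pre>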
 
 
After the failed disk has been removed from the RAID1 arrays, its partitions no longer appear in /proc/mdstat or in the mdadm --detail output:
 
 
<pre>
 
# cat /proc/mdstat
 
Personalities : [raid1]
 
md2 : active raid1 dm-6[1]
 
      291703676 blocks super 1.1 [2/1] [_U]
 
      bitmap: 1/1 pages [64KB], 65536KB chunk
 
 
md1 : active raid1 dm-3[1]
 
      1048568 blocks super 1.1 [2/1] [_U]
 
 
md0 : active raid1 dm-4[1]
 
      204788 blocks super 1.0 [2/1] [_U]
 
 
unused devices: <none>
 
</pre>
 
 
<pre>
 
# mdadm --detail /dev/md0
 
/dev/md0:
 
        Version : 1.0
 
  Creation Time : Tue Jul 19 02:39:03 2011
 
    Raid Level : raid1
 
    Array Size : 204788 (200.02 MiB 209.70 MB)
 
  Used Dev Size : 204788 (200.02 MiB 209.70 MB)
 
  Raid Devices : 2
 
  Total Devices : 1
 
    Persistence : Superblock is persistent
 
 
    Update Time : Wed Jul 20 02:00:04 2011
 
          State : clean, degraded
 
Active Devices : 1
 
Working Devices : 1
 
Failed Devices : 0
 
  Spare Devices : 0
 
 
          Name : c250f17c01ap01:0  (local to host c250f17c01ap01)
 
          UUID : eba4d8ad:8f08f231:3c60e20f:1f929144
 
        Events : 26
 
 
    Number  Major  Minor  RaidDevice State
 
      0      0        0        0      removed
 
      1    253        4        1      active sync  /dev/dm-4
 
</pre>
 
 
== Replace the disk ==
 
The first thing to do now is to recreate on the new disk the exact same partitioning as on the remaining good disk. We can do this with one simple command:
 
 
<pre>
 
sfdisk -d /dev/<good_disk> | sfdisk /dev/<new_disk>
 
</pre>
 
 
If you get the error message “sfdisk: I don’t like these partitions - nothing changed.”, add the --force option to the sfdisk command:
 
 
<pre>
 
sfdisk -d /dev/sdb | sfdisk /dev/sda --force
 
</pre>
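
The sfdisk commands above assume an MBR (msdos) partition table. If the disks use GPT labels and the installed sfdisk does not handle them, sgdisk from the gdisk package can copy the layout instead; this is only a sketch, assuming /dev/sdb is the good disk and /dev/sda is the new disk:

<pre>
# save the partition table of the good disk, write it to the new disk,
# then randomize the GUIDs so they do not collide with the good disk
sgdisk --backup=/tmp/sdb.parttable /dev/sdb
sgdisk --load-backup=/tmp/sdb.parttable /dev/sda
sgdisk -G /dev/sda
</pre>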
 
 
Check the new disk's partition layout:
 
<pre>
 
fdisk -l /dev/<new_disk>
 
</pre>
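
To confirm that the layouts match, the new disk and the good disk can be listed side by side (the device names are examples):

<pre>
fdisk -l /dev/sda /dev/sdb
</pre>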
 
 
== Add the new disk into the RAID1 array ==
 
 
After the partitions have been created on the new disk, add each one back into its RAID1 array with:
 
 
<pre>
 
mdadm --manage /dev/mdx --add /dev/xxx
 
</pre>
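
For example, using the same non-multipath device names as in the remove step earlier (substitute your own partitions):

<pre>
mdadm --manage /dev/md0 --add /dev/sda3
mdadm --manage /dev/md1 --add /dev/sda2
mdadm --manage /dev/md2 --add /dev/sda5
</pre>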
 
 
While the RAID1 arrays are rebuilding, you will see the recovery progress in /proc/mdstat:
 
 
<pre>
 
# cat /proc/mdstat
 
Personalities : [raid1]
 
md2 : active raid1 dm-11[0] dm-6[1]
 
      291703676 blocks super 1.1 [2/1] [_U]
 
      [>....................]  recovery =  0.7% (2103744/291703676) finish=86.2min speed=55960K/sec
 
      bitmap: 1/1 pages [64KB], 65536KB chunk
 
 
md1 : active raid1 dm-8[0] dm-3[1]
 
      1048568 blocks super 1.1 [2/1] [_U]
 
      [=============>.......]  recovery = 65.1% (683904/1048568) finish=0.1min speed=48850K/sec
 
 
md0 : active raid1 dm-9[0] dm-4[1]
 
      204788 blocks super 1.0 [2/1] [_U]
 
      [===================>.]  recovery = 96.5% (198016/204788) finish=0.0min speed=14144K/sec
 
 
unused devices: <none>
 
</pre>
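
To follow the rebuild without re-running cat by hand, the progress can be watched continuously, or queried per array with mdadm; the array name below is only an example:

<pre>
# watch -n 5 cat /proc/mdstat
# mdadm --detail /dev/md2
</pre>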
 
 
== Make the new disk bootable ==
 
 
If the new disk does not have a PReP partition, or the PReP partition has a problem, the disk will not be bootable. Here is an example of how to make the new disk bootable; you may need to substitute the device names with your own values.
 
 
<pre>
 
[RHEL]:
 
# run on the node you replaced the drive in
 
mkofboot -b /dev/sda
 
bootlist -m normal sda sdb
 
</pre>
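
Before running mkofboot, it is worth verifying that the new disk actually has a PReP boot partition (fdisk lists it with the type “PPC PReP Boot”); this check is only a sketch and the device name is an example:

<pre>
# fdisk -l /dev/sda | grep -i prep
</pre>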
 
