This is an automatically generated mail message from mdadm
running on vs0
A DegradedArray event had been detected on md device /dev/md/0.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [raid1]
md1 : active raid1 sda2[2]
976236408 blocks super 1.2 [2/1] [U_]
md0 : active raid1 sda1[2]
523252 blocks super 1.2 [2/1] [U_]
unused devices: <none>
In other words, /proc/mdstat shows that both volumes are degraded:
[U_] - one drive is up (U) and the other is missing from the array (_); a member marked as failed would appear with (F) after its device name
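The [U_] check can also be scripted. The following is a minimal sketch, assuming the usual /proc/mdstat layout shown in the mail above; degraded_arrays is a hypothetical helper name, not something shipped with mdadm.

```shell
#!/bin/sh
# degraded_arrays (hypothetical helper): read /proc/mdstat-style text on stdin
# and print the name of every array whose member map, e.g. [U_], has a "_" in it.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                      # remember the current array
        name != "" && $NF ~ /^\[[U_]*_[U_]*\]$/ {        # member map with a missing slot
            print name
            name = ""
        }
    '
}

# Sample input copied from the alert mail.
sample='Personalities : [raid1]
md1 : active raid1 sda2[2]
976236408 blocks super 1.2 [2/1] [U_]
md0 : active raid1 sda1[2]
523252 blocks super 1.2 [2/1] [U_]
unused devices: <none>'

printf '%s\n' "$sample" | degraded_arrays
```

On a live system the same function can be fed directly: degraded_arrays < /proc/mdstat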
More details:
mdadm --detail /dev/md0
Raid Level : raid1
Array Size : 523252 (511.07 MiB 535.81 MB)
Used Dev Size : 523252 (511.07 MiB 535.81 MB)
Raid Devices : 2
Total Devices : 1
Persistence : Superblock is persistent
Update Time : Thu Nov 27 09:38:27 2014
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Name : vs0:0 (local to host vs0)
UUID : b053cdfd:6e0a2b48:f10dcc45:23fdc34a
Events : 185
Number   Major   Minor   RaidDevice   State
2        8       1       0            active sync   /dev/sda1
1        0       0       1            removed
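To use this output in a script, the State line is the field that matters. A small sketch, assuming the "State : ..." layout shown above; array_state is a hypothetical helper name.

```shell
#!/bin/sh
# array_state (hypothetical helper): print the value of the "State :" line
# from "mdadm --detail" output read on stdin.
array_state() {
    awk -F' : ' '
        $1 ~ /^[ \t]*State$/ {
            sub(/[ \t]+$/, "", $2)   # drop trailing whitespace, if any
            print $2
            exit
        }
    '
}

# A few lines copied from the output above.
detail='Raid Level : raid1
State : clean, degraded
Active Devices : 1
Number Major Minor RaidDevice State
2 8 1 0 active sync /dev/sda1'

state=$(printf '%s\n' "$detail" | array_state)
case "$state" in
    *degraded*) echo "degraded: $state" ;;
    *)          echo "ok: $state" ;;
esac
```

On a live system: mdadm --detail /dev/md0 | array_state. The mdadm man page also describes a --test option for --detail that reflects the array status in the exit code, which may be a simpler route if it is available in your version.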
So we need to find out what is going on with the second drive of the array: /dev/sdb.
Using the SMART tools
The smartctl tool reports various information from SMART-capable disks and can run self-tests on them.
To check the health of a drive:
smartctl -H /dev/sdb
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-2.6.32-33-pve] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
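For a scripted check, the verdict can be pulled out of that last line. A minimal sketch, assuming the ATA-style wording shown above; smart_health is a hypothetical helper name.

```shell
#!/bin/sh
# smart_health (hypothetical helper): print the verdict that follows
# "test result:" in "smartctl -H" output read on stdin.
smart_health() {
    sed -n 's/^SMART overall-health self-assessment test result: *//p'
}

# Lines copied from the output above.
health_out='=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED'

verdict=$(printf '%s\n' "$health_out" | smart_health)
if [ "$verdict" = "PASSED" ]; then
    echo "drive reports: $verdict"
else
    echo "drive needs attention: ${verdict:-no verdict found}"
fi
```

On a live system: smartctl -H /dev/sdb | smart_health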
Short self-test of the drive:
smartctl --test=short /dev/sdb
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-2.6.32-33-pve] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Thu Nov 27 10:28:19 2014
Use smartctl -X to abort test.
The drive runs the short test on its own in the background (about 1 minute in practice, although smartctl estimates 2).
To see the drive's self-test history:
smartctl -a /dev/sdb
(...)
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed without error  00%        18225            -
(...)
Long self-test of the drive:
smartctl --test=long /dev/sdb
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-2.6.32-33-pve] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 164 minutes for test to complete.
Test will complete after Thu Nov 27 13:15:45 2014
The drive runs the extended test on its own in the background (about 164 minutes, per the estimate above).
To see the drive's self-test history:
smartctl -a /dev/sdb
SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline  Completed without error  00%        18228            -
# 2  Short offline     Completed without error  00%        18225            -
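Reading the log by eye works for two entries; for longer histories a filter helps. A rough sketch, assuming log entries start with "# N" as in the output above; selftests_clean is a hypothetical helper name.

```shell
#!/bin/sh
# selftests_clean (hypothetical helper): read a SMART self-test log on stdin,
# print every entry whose status is anything other than
# "Completed without error", and return non-zero if any such entry exists.
selftests_clean() {
    awk '
        /^# *[0-9]+/ && !/Completed without error/ { print; bad = 1 }
        END { exit bad + 0 }
    '
}

# Log lines copied from the output above.
log='Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 18228 -
# 2 Short offline Completed without error 00% 18225 -'

if printf '%s\n' "$log" | selftests_clean; then
    echo "all logged self-tests completed without error"
else
    echo "at least one self-test needs a closer look"
fi
```

On a live system: smartctl -a /dev/sdb | selftests_clean. Note that a test still in progress is also reported, since its status is not "Completed without error".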
If the disk is OK, we can rebuild the array.
Rebuilding the array
If /dev/sdb1 and /dev/sdb2 are still members of their arrays, they can be marked as failed:
mdadm /dev/md0 -f /dev/sdb1
mdadm /dev/md1 -f /dev/sdb2
And then removed from the arrays:
mdadm /dev/md0 -r /dev/sdb1
mdadm /dev/md1 -r /dev/sdb2
In this case they had already been removed from the arrays, so the superblock of each partition can be zeroed:
mdadm --zero-superblock /dev/sdb1
mdadm --zero-superblock /dev/sdb2
To add them back to their arrays:
mdadm --manage -a /dev/md0 /dev/sdb1
mdadm --manage -a /dev/md1 /dev/sdb2
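Each partition has to go back to its own array (sdb1 to md0, sdb2 to md1), and it is easy to mix them up when typing the commands by hand. The following dry-run sketch only prints the commands for review and runs nothing; readd_commands is a hypothetical helper name.

```shell
#!/bin/sh
# readd_commands ARRAY PARTITION (hypothetical helper): print, without running,
# the zero-superblock and re-add steps from this section for one pair.
readd_commands() {
    array=$1
    part=$2
    printf 'mdadm --zero-superblock %s\n' "$part"
    printf 'mdadm --manage -a %s %s\n' "$array" "$part"
}

# The two array/partition pairs used in this post.
readd_commands /dev/md0 /dev/sdb1
readd_commands /dev/md1 /dev/sdb2
```

Once the printed commands look right they can be run by hand. Zeroing a superblock is destructive, so the sketch deliberately stops at printing.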
To follow the progress of the array rebuild:
cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb2[3] sda2[2]
976236408 blocks super 1.2 [2/1] [U_]
[>....................] recovery = 0.2% (2303296/976236408) finish=169.1min speed=95970K/sec
md0 : active raid1 sdb1[3] sda1[2]
523252 blocks super 1.2 [2/2] [UU]
unused devices: <none>
Rebuilding the /dev/md0 volume is very fast because it is a small partition.
Rebuilding the /dev/md1 volume takes considerably longer.
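To keep an eye on a long rebuild, the percentage and estimated time can be extracted from the recovery line. A minimal sketch, assuming the line format shown above; recovery_progress is a hypothetical helper name.

```shell
#!/bin/sh
# recovery_progress (hypothetical helper): read /proc/mdstat-style text on stdin
# and print "array percent eta" for every array that is recovering or resyncing.
recovery_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                 # remember the current array
        /(recovery|resync) *=/ {
            pct = ""; eta = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/) pct = $i             # e.g. 0.2%
                if ($i ~ /^finish=/) {              # e.g. finish=169.1min
                    eta = $i
                    sub(/^finish=/, "", eta)
                }
            }
            print name, pct, eta
        }
    '
}

# Sample copied from the output above.
rebuilding='Personalities : [raid1]
md1 : active raid1 sdb2[3] sda2[2]
976236408 blocks super 1.2 [2/1] [U_]
[>....................] recovery = 0.2% (2303296/976236408) finish=169.1min speed=95970K/sec
md0 : active raid1 sdb1[3] sda1[2]
523252 blocks super 1.2 [2/2] [UU]
unused devices: <none>'

printf '%s\n' "$rebuilding" | recovery_progress
# prints: md1 0.2% 169.1min
```

On a live system: recovery_progress < /proc/mdstat. For simply blocking until the rebuild is done, mdadm also documents a --wait option.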
Once the rebuild has completed successfully, the state of the arrays can be checked:
cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb2[3] sda2[2]
976236408 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sdb1[3] sda1[2]
523252 blocks super 1.2 [2/2] [UU]
unused devices: <none>
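The [2/2] counter (configured members / active members) gives a second way to confirm the result. A sketch, assuming the [n/m] layout shown above; arrays_in_sync is a hypothetical helper name.

```shell
#!/bin/sh
# arrays_in_sync (hypothetical helper): read /proc/mdstat-style text on stdin,
# report every array whose [configured/active] counter differs, and return
# non-zero if any array is short of members.
arrays_in_sync() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                          # remember the current array
        name != "" && match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[1] != n[2]) {
                print name ": " n[2] " of " n[1] " members active"
                bad = 1
            }
            name = ""
        }
        END { exit bad + 0 }
    '
}

# Sample copied from the output above.
rebuilt='Personalities : [raid1]
md1 : active raid1 sdb2[3] sda2[2]
976236408 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sdb1[3] sda1[2]
523252 blocks super 1.2 [2/2] [UU]
unused devices: <none>'

if printf '%s\n' "$rebuilt" | arrays_in_sync; then
    echo "all arrays in sync"
fi
```

On a live system: arrays_in_sync < /proc/mdstat, for example from a cron job that mails the output when the exit status is non-zero.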