
The truth about the disk hotswap feature

Ben Ko (SINCE 2013) 2013. 1. 21. 16:04

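Now and then, when a backup disk is replaced via hotswap, the newly inserted disk is recognized as sdc instead of sdb,
and we have no choice but to reboot at night to set it right.

There is a way to fix the device name without rebooting.

Let me explain with a real case from today: nhkotest2-030.

To expand the backup disk, I used hotswap to pull out a smaller disk and insert a 500 GB one.
But, ugh, it came up as sdc.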

***********************************************************************************************

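★ [Cause] => Most likely the new disk was inserted before the removal process had finished normally, so the OS assigned it a new device name.
                   (It seems to take at least 30 seconds to 1 minute after removal before the OS finishes the disk removal process.)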
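Given the 30 s to 1 min observation above, one way to avoid the race is to wait until the old disk's entry has actually disappeared before inserting the new disk. This is a minimal sketch, not part of the original procedure: `wait_for_removal` is a hypothetical helper, and it assumes that `/sys/block/sdb` going away means the kernel has finished detaching the old disk.

```shell
#!/bin/bash
# Hypothetical helper (not from the original post): poll until PATH disappears.
# Returns 0 once it is gone, 1 if it is still present after TIMEOUT seconds.
wait_for_removal() {
    local path=$1            # e.g. /sys/block/sdb (assumed old device entry)
    local timeout=${2:-60}   # seconds; the post observed 30 s to 1 min
    local elapsed=0
    while [ -e "$path" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "still present after ${timeout}s: $path" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

For example, after pulling the old disk: `wait_for_removal /sys/block/sdb 60 && echo "safe to insert the new disk"`. Polling once per second keeps the loop cheap while still reacting quickly once the entry is gone.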

## Normal (disk removal log)

Apr 25 14:57:38 nhkotest1-002 kernel: ata2: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen
Apr 25 14:57:38 nhkotest1-002 kernel: ata2: irq_stat 0x00400040, connection status changed
Apr 25 14:57:38 nhkotest1-002 kernel: ata2: SError: { HostInt PHYRdyChg 10B8B DevExch }
Apr 25 14:57:38 nhkotest1-002 kernel: ata2: hard resetting link
Apr 25 14:57:39 nhkotest1-002 kernel: ata2: SATA link down (SStatus 0 SControl 300)
Apr 25 14:57:39 nhkotest1-002 kernel: ata2: failed to recover some devices, retrying in 5 secs
Apr 25 14:57:44 nhkotest1-002 kernel: ata2: hard resetting link
Apr 25 14:57:44 nhkotest1-002 kernel: ata2: SATA link down (SStatus 0 SControl 300)
Apr 25 14:57:44 nhkotest1-002 kernel: ata2: failed to recover some devices, retrying in 5 secs
Apr 25 14:57:49 nhkotest1-002 kernel: ata2: hard resetting link
Apr 25 14:57:50 nhkotest1-002 kernel: ata2: SATA link down (SStatus 0 SControl 300)
Apr 25 14:57:50 nhkotest1-002 kernel: ata2.00: disabled
Apr 25 14:57:50 nhkotest1-002 kernel: ata2: EH complete
Apr 25 14:57:50 nhkotest1-002 kernel: ata2.00: detaching (SCSI 1:0:0:0)
Apr 25 14:57:50 nhkotest1-002 kernel: sd 1:0:0:0: [sdb] Synchronizing SCSI cache
Apr 25 14:57:50 nhkotest1-002 kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Apr 25 14:57:50 nhkotest1-002 kernel: sd 1:0:0:0: [sdb] Stopping disk
Apr 25 14:57:50 nhkotest1-002 kernel: sd 1:0:0:0: [sdb] START_STOP FAILED
Apr 25 14:57:50 nhkotest1-002 kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK

## Abnormal (disk removal log)

Apr 25 15:05:28 nhkotest2-030 kernel: ata2: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen
Apr 25 15:05:28 nhkotest2-030 kernel: ata2: irq_stat 0x00400040, connection status changed
Apr 25 15:05:28 nhkotest2-030 kernel: ata2: SError: { HostInt PHYRdyChg 10B8B DevExch }
Apr 25 15:05:28 nhkotest2-030 kernel: ata2: hard resetting link
Apr 25 15:05:29 nhkotest2-030 kernel: ata2: SATA link down (SStatus 0 SControl 300)
Apr 25 15:05:29 nhkotest2-030 kernel: ata2: failed to recover some devices, retrying in 5 secs
Apr 25 15:05:34 nhkotest2-030 kernel: ata2: hard resetting link
Apr 25 15:05:40 nhkotest2-030 kernel: ata2: port is slow to respond, please be patient (Status 0x80)
Apr 25 15:05:44 nhkotest2-030 kernel: ata2: COMRESET failed (errno=-16)
Apr 25 15:05:44 nhkotest2-030 kernel: ata2: hard resetting link
Apr 25 15:05:45 nhkotest2-030 kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 25 15:05:45 nhkotest2-030 kernel: ata2.00: model number mismatch 'ST3250318AS' != 'WDC WD5003ABYX-18WERA0'
Apr 25 15:05:45 nhkotest2-030 kernel: ata2.00: revalidation failed (errno=-19)
Apr 25 15:05:45 nhkotest2-030 kernel: ata2: failed to recover some devices, retrying in 5 secs
Apr 25 15:05:50 nhkotest2-030 kernel: ata2: hard resetting link
Apr 25 15:05:55 nhkotest2-030 kernel: ata2: port is slow to respond, please be patient (Status 0x80)
Apr 25 15:06:00 nhkotest2-030 kernel: ata2: COMRESET failed (errno=-16)
Apr 25 15:06:00 nhkotest2-030 kernel: ata2: hard resetting link
Apr 25 15:06:00 nhkotest2-030 kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 25 15:06:00 nhkotest2-030 kernel: ata2.00: model number mismatch 'ST3250318AS' != 'WDC WD5003ABYX-18WERA0'
Apr 25 15:06:00 nhkotest2-030 kernel: ata2.00: revalidation failed (errno=-19)
Apr 25 15:06:00 nhkotest2-030 kernel: ata2.00: disabled
Apr 25 15:06:01 nhkotest2-030 kernel: ata2: soft resetting link
Apr 25 15:06:01 nhkotest2-030 kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 25 15:06:01 nhkotest2-030 kernel: ata2.00: ATA-8: WDC WD5003ABYX-18WERA0, 01.01S02, max UDMA/133
Apr 25 15:06:01 nhkotest2-030 kernel: ata2.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 25 15:06:01 nhkotest2-030 kernel: ata2.00: configured for UDMA/133
Apr 25 15:06:01 nhkotest2-030 kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4
Apr 25 15:06:01 nhkotest2-030 kernel: ata2: irq_stat 0x00400040, connection status changed
Apr 25 15:06:01 nhkotest2-030 kernel: ata2.00: configured for UDMA/133
Apr 25 15:06:01 nhkotest2-030 kernel: ata2: EH complete
Apr 25 15:06:01 nhkotest2-030 kernel: ata2.00: detaching (SCSI 1:0:0:0)
Apr 25 15:06:01 nhkotest2-030 kernel: sd 1:0:0:0: [sdb] Synchronizing SCSI cache
Apr 25 15:06:01 nhkotest2-030 kernel: sd 1:0:0:0: [sdb] Stopping disk ==> removal process still in progress...

Apr 25 15:06:02 nhkotest2-030 kernel: scsi 1:0:0:0: Direct-Access     ATA      WDC WD5003ABYX-1 01.0 PQ: 0 ANSI: 5 ==> new disk inserted...
Apr 25 15:06:02 nhkotest2-030 kernel: sd 1:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
Apr 25 15:06:02 nhkotest2-030 kernel: sd 1:0:0:0: [sdc] Write Protect is off
Apr 25 15:06:02 nhkotest2-030 kernel: sd 1:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 25 15:06:02 nhkotest2-030 kernel: sd 1:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
Apr 25 15:06:02 nhkotest2-030 kernel: sd 1:0:0:0: [sdc] Write Protect is off
Apr 25 15:06:02 nhkotest2-030 kernel: sd 1:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 25 15:06:02 nhkotest2-030 kernel:  sdc: unknown partition table
Apr 25 15:06:02 nhkotest2-030 kernel: sd 1:0:0:0: [sdc] Attached SCSI disk
Apr 25 15:06:02 nhkotest2-030 kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0
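The two logs differ in a recognizable way: the abnormal one contains `model number mismatch` (a different disk showed up on a port still being revalidated) and ends with the disk attached under a new name. A small sketch for spotting this in a saved kernel log follows; `has_model_mismatch` and `attached_names` are hypothetical helper names, and the patterns are taken only from the log lines shown above.

```shell
#!/bin/bash
# Hypothetical helpers (not from the original post) for reading a saved kernel
# log, e.g. /var/log/messages or `dmesg` output written to a file.

# Succeeds if libata reported that a different disk appeared on a port it was
# still revalidating: the symptom in the abnormal log above.
has_model_mismatch() {
    grep -q 'model number mismatch' "$1"
}

# Prints every sdX name the sd driver attached, in log order.
attached_names() {
    sed -n 's/.*\[\(sd[a-z]*\)] Attached SCSI disk.*/\1/p' "$1"
}
```

For example, `dmesg > /tmp/hotswap.log; has_model_mismatch /tmp/hotswap.log && attached_names /tmp/hotswap.log` would be expected to print `sdc` for the abnormal case above.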

***********************************************************************************************

★ [Solution]

1. Remove the disk to be replaced.

2. Check which host to rescan.

[root@nhkotest2-030 0:0:0:0]# ll
total 0
lrwxrwxrwx  1 root root    0 Apr 25 22:49 device -> ../../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/
lrwxrwxrwx  1 root root    0 Apr 25 22:49 subsystem -> ../../../class/scsi_device/
--w-------  1 root root 4096 Apr 25 22:49 uevent

[root@nhkotest2-030 0:0:0:0]# pwd
/sys/class/scsi_device/0:0:0:0

===> So the master disk is devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0. That means host0 must never be touched, right? ^^
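Instead of reading the symlink by eye, the host can be pulled out of the sysfs path. This is a sketch under assumptions: `host_of_path` and `host_of_disk` are hypothetical helpers, and they assume the disk's sysfs path contains a `hostN` component, as in the listing above. The directory name `0:0:0:0` itself is in host:channel:target:LUN form, so its first field is the host number too.

```shell
#!/bin/bash
# Hypothetical helpers (not from the original post).

# Extracts the SCSI host ("host0", "host1", ...) from a sysfs device path like
# the `device ->` symlink target in the listing above.
host_of_path() {
    printf '%s\n' "$1" | grep -o 'host[0-9][0-9]*' | head -n 1
}

# Resolves /sys/block/<dev>/device and prints the host it hangs off.
# Assumes the usual .../hostN/targetN:C:T/H:C:T:L layout for SATA disks.
host_of_disk() {
    host_of_path "$(readlink -f "/sys/block/$1/device")"
}
```

On the machine in this post, `host_of_disk sda` would be expected to print `host0`, i.e. the host to leave alone.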

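3. Then cleanly issue the command below. (The host number may differ depending on the system; here the master disk is on host0 (sda), so the target is host1 (sdb), right?)
    => Rescanning host1 with the command below makes the OS explicitly recognize that the device is no longer there.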

echo "- - -" > /sys/class/scsi_host/host1/scan

Looking at dmesg at this point, it feels like whatever was left over has been cleanly cleared out.

Apr 25 22:35:16 nhkotest2-030 kernel: ata2: soft resetting link
Apr 25 22:35:16 nhkotest2-030 kernel: ata2: SATA link down (SStatus 0 SControl 300)
Apr 25 22:35:16 nhkotest2-030 kernel: ata2: EH complete
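Since writing to the wrong host's `scan` file is the main risk in step 3, the command can be wrapped with a guard. This is a minimal sketch: `rescan_host` and the `SYSFS` override are hypothetical additions, and the caller still has to pick the host numbers, which vary per machine as noted above.

```shell
#!/bin/bash
# Hypothetical wrapper (not from the original post) around
#   echo "- - -" > /sys/class/scsi_host/hostN/scan
# "- - -" asks the kernel to scan every channel, target and LUN on that host.
# SYSFS defaults to /sys; overriding it is only meant for dry runs and tests.
rescan_host() {
    local host=$1      # host to rescan, e.g. host1
    local protect=$2   # host holding the master disk, e.g. host0
    local scan="${SYSFS:-/sys}/class/scsi_host/$host/scan"
    if [ "$host" = "$protect" ]; then
        echo "refusing to rescan $host: it holds the master disk" >&2
        return 1
    fi
    if [ ! -e "$scan" ]; then
        echo "no such file: $scan" >&2
        return 1
    fi
    echo "- - -" > "$scan"
}
```

On the machine in this post the call would be `rescan_host host1 host0`. Refusing outright when the two hosts match is deliberately blunt: a typo should fail loudly rather than touch the master disk's host.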

4. Insert the new disk, and it is correctly recognized as sdb.

Apr 25 22:35:55 nhkotest2-030 kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x4050002 action 0xa frozen
Apr 25 22:35:55 nhkotest2-030 kernel: ata2: irq_stat 0x00400040, connection status changed
Apr 25 22:35:55 nhkotest2-030 kernel: ata2: SError: { RecovComm PHYRdyChg CommWake DevExch }
Apr 25 22:35:56 nhkotest2-030 kernel: ata2: soft resetting link
Apr 25 22:36:01 nhkotest2-030 kernel: ata2: port is slow to respond, please be patient (Status 0x80)
Apr 25 22:36:03 nhkotest2-030 kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 25 22:36:03 nhkotest2-030 kernel: ata2.00: ATA-8: WDC WD5003ABYX-18WERA0, 01.01S02, max UDMA/133
Apr 25 22:36:03 nhkotest2-030 kernel: ata2.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 25 22:36:03 nhkotest2-030 kernel: ata2.00: configured for UDMA/133
Apr 25 22:36:03 nhkotest2-030 kernel: ata2: EH complete
Apr 25 22:36:03 nhkotest2-030 kernel: scsi 1:0:0:0: Direct-Access     ATA      WDC WD5003ABYX-1 01.0 PQ: 0 ANSI: 5
Apr 25 22:36:03 nhkotest2-030 kernel: sd 1:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Apr 25 22:36:03 nhkotest2-030 kernel: sd 1:0:0:0: [sdb] Write Protect is off
Apr 25 22:36:03 nhkotest2-030 kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 25 22:36:03 nhkotest2-030 kernel: sd 1:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Apr 25 22:36:03 nhkotest2-030 kernel: sd 1:0:0:0: [sdb] Write Protect is off
Apr 25 22:36:03 nhkotest2-030 kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 25 22:36:03 nhkotest2-030 kernel:  sdb: unknown partition table
Apr 25 22:36:03 nhkotest2-030 kernel: sd 1:0:0:0: [sdb] Attached SCSI disk
Apr 25 22:36:03 nhkotest2-030 kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0
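To double-check step 4 without reading dmesg, the size sysfs reports for sdb can be compared with the sector count in the log above (976773168 sectors). A sketch, assuming the standard `/sys/block/<dev>/size` attribute (counted in 512-byte sectors); `verify_disk` and the `SYSFS` override are hypothetical.

```shell
#!/bin/bash
# Hypothetical post-swap check (not from the original post): is DEV present
# with the expected size? /sys/block/<dev>/size is counted in 512-byte sectors.
# SYSFS defaults to /sys; overriding it is only meant for tests.
verify_disk() {
    local dev=$1 expected=$2
    local sizefile="${SYSFS:-/sys}/block/$dev/size"
    local actual
    if [ ! -r "$sizefile" ]; then
        echo "$dev not present" >&2
        return 1
    fi
    actual=$(cat "$sizefile")
    if [ "$actual" != "$expected" ]; then
        echo "$dev has $actual sectors, expected $expected" >&2
        return 1
    fi
    echo "$dev OK ($actual sectors)"
}
```

For the 500 GB disk here, the call would be `verify_disk sdb 976773168`.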
***********************************************************************************************
