Down the Docks

November 6, 2006

More RAID Pain

Filed under: Uncategorized — ealing @ 1:17 pm

Once it’s working, wouldn’t it be nice if it stayed working?

The RAID array fell over again. At first, I thought Samba might be causing the network access problem, but restarting it didn’t achieve anything. Then I ran top on the server box to see if anything had spun out of control, but everything was calm: no process was using more than 0.2% of CPU. Then I checked the syslog, which is where I started to find the problems.

Nov  5 03:09:57 chino kernel: ata4: command 0x35 timeout, stat 0x50 host_stat 0x24
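Hunting for lines like that by eye is tedious, so here is a rough sketch of how the search could be scripted. It is only an illustration: the function name find_ata_timeouts and the regular expression are my own, written to match the one line quoted above, and other kernels may word the message differently.

```python
import re

# Pattern modelled on the single syslog line quoted above (an assumption:
# other kernel versions may format ATA timeouts differently).
ATA_TIMEOUT = re.compile(
    r"kernel: (?P<port>ata\d+): command (?P<cmd>0x[0-9a-fA-F]+) timeout, "
    r"stat (?P<stat>0x[0-9a-fA-F]+) host_stat (?P<host_stat>0x[0-9a-fA-F]+)"
)


def find_ata_timeouts(lines):
    """Return one dict per ATA command timeout found in the given log lines."""
    hits = []
    for line in lines:
        match = ATA_TIMEOUT.search(line)
        if match:
            hits.append(match.groupdict())
    return hits


# The line from my syslog, used as sample input.
sample = [
    "Nov  5 03:09:57 chino kernel: ata4: command 0x35 timeout, "
    "stat 0x50 host_stat 0x24"
]
print(find_ata_timeouts(sample))
```

To run it against a real log, pass in the open file instead of the sample list. The path to the log varies between distributions, so I have left it out.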

I looked up the syntax of some mdadm commands, and ran this, which I hoped would give more info:

mdadm --detail /dev/md0

That one froze the terminal, and so did this:

cat /proc/mdstat

and so did this:

mdadm --stop /dev/md0

At this point I’d given up on a nice easy fix, and I tried to reboot the machine. It wouldn’t restart, though. Something was preventing it from shutting down. I’m not a big fan of the power switch reboot, so I tried to kill the relevant md processes. However, one process:

root     17110     6  0 Sep12 ?        00:24:39 [md0_raid5]

wouldn’t die, even when hit with a kill -9, and shutdown -h wouldn’t work. So power switch it was.
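The square brackets around md0_raid5 are a hint as to why kill -9 did nothing. ps prints a command in brackets when the process has no command line, which is typical of kernel threads, and as far as I understand it kernel threads ignore signals sent from user space. Below is a small sketch that spots such entries. It assumes the line is in ps -ef column order (UID, PID, PPID, C, STIME, TTY, TIME, CMD), and the helper names are made up for illustration.

```python
def parse_ps_line(line):
    """Split one line into fields, assuming `ps -ef` column order:
    UID PID PPID C STIME TTY TIME CMD."""
    uid, pid, ppid, _c, _stime, _tty, _time, cmd = line.split(None, 7)
    return {"uid": uid, "pid": int(pid), "ppid": int(ppid), "cmd": cmd.strip()}


def looks_like_kernel_thread(cmd):
    """Heuristic only: ps shows the name in square brackets when the
    process has no command line, which is typical of kernel threads."""
    return cmd.startswith("[") and cmd.endswith("]")


# The process that refused to die, copied from the ps output above.
line = "root     17110     6  0 Sep12 ?        00:24:39 [md0_raid5]"
info = parse_ps_line(line)
print(info["pid"], looks_like_kernel_thread(info["cmd"]))  # prints: 17110 True
```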

After restart, I looked through the dmesg output, and found this:

SCSI device sda: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sda: drive cache: write back
sda: sda1
sd 0:0:0:0: Attached scsi disk sda
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sdb: drive cache: write through
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sdb: drive cache: write through
sdb: sdb1
sd 1:0:0:0: Attached scsi disk sdb
SCSI device sdc: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sdc: drive cache: write back
SCSI device sdc: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sdc: drive cache: write back
sdc: sdc1
sd 2:0:0:0: Attached scsi disk sdc
SCSI device sdd: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sdd: drive cache: write back
SCSI device sdd: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sdd: drive cache: write back
sdd: sdd1
sd 3:0:0:0: Attached scsi disk sdd
md: md0 stopped.
md: bind<sdc1>
md: bind<sda1>
md: bind<sdb1>
md: bind<sdd1>
md: kicking non-fresh sdc1 from array!
md: unbind<sdc1>
md: export_rdev(sdc1)
raid5: device sdd1 operational as raid disk 1
raid5: device sdb1 operational as raid disk 3
raid5: device sda1 operational as raid disk 2
raid5: allocated 4202kB for md0
raid5: raid level 5 set md0 active with 3 out of 4 devices, algorithm 2
RAID5 conf printout:
--- rd:4 wd:3 fd:1
disk 1, o:1, dev:sdd1
disk 2, o:1, dev:sda1
disk 3, o:1, dev:sdb1
kjournald starting.  Commit interval 5 seconds

One line in particular stood out:

md: kicking non-fresh sdc1 from array!
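That line is easy to miss in a long boot log. As a sketch, something like the following could pull out the kicked devices and any array that came up short of members. The patterns are copied from the dmesg output above, so they are only known to fit this kernel's wording, and summarise_md_boot is a name I made up.

```python
import re

# Both patterns are taken from the dmesg lines shown above; other kernel
# versions may phrase these messages differently.
KICKED = re.compile(r"md: kicking non-fresh (\S+) from array!")
ACTIVE = re.compile(
    r"raid5: raid level 5 set (\S+) active with (\d+) out of (\d+) devices"
)


def summarise_md_boot(lines):
    """Report devices kicked from an array and arrays running degraded."""
    kicked = []
    degraded = {}
    for line in lines:
        match = KICKED.search(line)
        if match:
            kicked.append(match.group(1))
        match = ACTIVE.search(line)
        if match:
            name = match.group(1)
            active, total = int(match.group(2)), int(match.group(3))
            if active < total:
                degraded[name] = (active, total)
    return {"kicked": kicked, "degraded": degraded}


# A few lines from the dmesg output above, used as sample input.
boot_log = [
    "md: bind<sdd1>",
    "md: kicking non-fresh sdc1 from array!",
    "md: unbind<sdc1>",
    "raid5: raid level 5 set md0 active with 3 out of 4 devices, algorithm 2",
]
print(summarise_md_boot(boot_log))
```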

A quick Google search reveals that this same message came up during my last RAID failure. It looks like a hardware problem. For now, I’ve just run a read-only check of the filesystem (the -n flag means e2fsck changes nothing), then added the disk back in so that the array rebuilds:

e2fsck -n -f -v  /dev/md0

followed by:

mdadm -a /dev/md0 /dev/sdc1

In fairness, this is exactly what RAID is designed for. I’ve had hardware failure, but my data is safe because of the redundancy of RAID5. On the other hand, I will have to get that drive replaced. It’s less than a year old, so I’d expected it to last a bit longer.
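For anyone wondering how three disks can stand in for four, the idea behind RAID5 is XOR parity: the parity block is the XOR of the data blocks in a stripe, so any one missing block can be recomputed from the rest. Here is a toy sketch of that arithmetic. The block contents are made up, and it ignores how a real array rotates parity across the disks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(
        reduce(lambda a, b: a ^ b, column) for column in zip(*blocks)
    )


# Three made-up data blocks, standing in for one stripe on three disks.
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block that would live on the fourth disk for this stripe.
parity = xor_blocks(data)

# Pretend the disk holding the second block has dropped out of the array.
survivors = [data[0], data[2], parity]

# XORing everything that is left gives back the missing block.
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])  # prints: True
```

The same trick only covers one missing block per stripe, which is why a second drive failing before the rebuild finishes would lose the array.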


2 Comments »

  1. I recently heard that the massive density of modern drives makes them less reliable. Apparently smaller drives of 150MB or so are more reliable.

    Comment by Iain — November 6, 2006 @ 3:30 pm

  2. Well, the more data you put on a single drive, the less reliable it is per megabyte, ceteris paribus.
    Another factor favouring smaller disks is performance. In a RAID system with a high throughput requirement, larger disks mean fewer disk heads per megabyte, which means slower throughput.

    Comment by ealing — November 8, 2006 @ 1:34 pm

