Down the Docks

September 10, 2006

RAID Failure

Filed under: Technology — ealing @ 12:50 am

My fileserver keeled over, and I’m still trying to put it right.

The first thing I noticed was that I couldn’t access my remote drives, and when I tried to connect over SSH with PuTTY, I couldn’t get to a command prompt. Before I hauled a monitor and keyboard over to the headless machine, I did a hard reset on it from the case. This may have been a mistake, and in any case it didn’t let me log in remotely afterwards.

I set up a monitor and keyboard on the machine, and turned it on again. It reported a problem while mounting and fscking disks. I bypassed it to let it complete booting, then ran dmesg to find out more about the problem. The log contained this:

SCSI device sda: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sda: drive cache: write back
sda: sda1
sd 0:0:0:0: Attached scsi disk sda
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sdb: drive cache: write through
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sdb: drive cache: write through
sdb: sdb1
sd 1:0:0:0: Attached scsi disk sdb
SCSI device sdc: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sdc: drive cache: write back
SCSI device sdc: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sdc: drive cache: write back
sdc: sdc1
sd 2:0:0:0: Attached scsi disk sdc
SCSI device sdd: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sdd: drive cache: write back
SCSI device sdd: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sdd: drive cache: write back
sdd: sdd1
sd 3:0:0:0: Attached scsi disk sdd
md: md0 stopped.
md: bind<sdc1>
md: bind<sda1>
md: bind<sdb1>
md: bind<sdd1>
md: kicking non-fresh sdc1 from array!
md: unbind<sdc1>
md: export_rdev(sdc1)
md: md0: raid array is not clean -- starting background reconstruction
raid5: device sdd1 operational as raid disk 1
raid5: device sdb1 operational as raid disk 3
raid5: device sda1 operational as raid disk 2
raid5: cannot start dirty degraded array for md0
RAID5 conf printout:
--- rd:4 wd:3 fd:1
disk 1, o:1, dev:sdd1
disk 2, o:1, dev:sda1
disk 3, o:1, dev:sdb1
raid5: failed to run raid set md0
md: pers->run() failed ...
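
The two lines that matter in that log are "kicking non-fresh sdc1 from array!" and "cannot start dirty degraded array for md0": one member was dropped as out of date, and the kernel then refused to start the remaining three because the array had not been shut down cleanly. The terse "rd:4 wd:3 fd:1" line can be spelled out with a small filter. This is only a sketch: decode_conf is a made-up helper name, and reading rd, wd and fd as the raid, working and failed disk counts is an assumption about the kernel's abbreviations.

```shell
# Hypothetical helper: expand the "--- rd:N wd:N fd:N" line of the
# RAID5 conf printout into words. Reads log text on stdin.
# Assumption: rd = raid disks, wd = working disks, fd = failed disks.
decode_conf() {
  sed -n 's/^.*rd:\([0-9][0-9]*\) wd:\([0-9][0-9]*\) fd:\([0-9][0-9]*\).*$/raid disks \1, working \2, failed \3/p'
}

# Typical use on a live system:
#   dmesg | decode_conf
```

Fed the line from the log above, it reports four raid disks, three working and one failed, which matches sdc1 having been kicked.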

With little expectation of success, I tried this:

chino:/var/log# mount -a
mount: wrong fs type, bad option, bad superblock on /dev/md0,
missing codepage or other error
(could this be the IDE device where you in fact use
ide-scsi so that sr0 or sda or so is needed?)
In some cases useful info is found in syslog - try
dmesg | tail  or so

No joy there, so time to reach for the diagnostics:

chino:/var/log# mdadm -Q /dev/sdc1
/dev/sdc1: is not an md array
/dev/sdc1: device 0 in 4 device mismatch raid5 md0.  Use mdadm --examine for more detail.
chino:/var/log# mdadm --examine /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 00.90.00
UUID : ebdfaedf:59e64777:d81d8f6e:8d6b0392
Creation Time : Tue Jan 31 21:45:55 2006
Raid Level : raid5
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0

Update Time : Fri Aug 25 10:00:19 2006
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : ebd9e059 - correct
Events : 0.150982
Layout : left-symmetric
Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       33        0      active sync   /dev/sdc1
   0     0       8       33        0      active sync   /dev/sdc1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       8        1        2      active sync   /dev/sda1
   3     3       8       17        3      active sync   /dev/sdb1
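
A member is treated as "non-fresh" when its event counter lags behind the other members, so the Events line is the one to compare across all four partitions. The following is a minimal sketch for doing that. examine_events is a hypothetical helper name, and it assumes the report layout shown above: a "/dev/...:" header line followed by an "Events : N" line.

```shell
# Hypothetical helper: read one or more `mdadm --examine` reports on
# stdin and print "<device> <event counter>" for each report.
examine_events() {
  awk '
    /^\/dev\//     { dev = $1; sub(/:$/, "", dev) }  # header such as "/dev/sdc1:"
    $1 == "Events" { print dev, $3 }                 # line such as "Events : 0.150982"
  '
}

# Typical use on a live system (needs root):
#   for d in /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1; do
#     mdadm --examine "$d"
#   done | examine_events
```

The device whose counter is lower than the rest is the one md considers stale. The Update Time field gives a similar hint: for sdc1 it is 25 August, more than two weeks before this post.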

And the config file looks like this:

chino:/var/log# cat /etc/mdadm/mdadm.conf
DEVICE partitions
ARRAY /dev/md0 level=raid5 num-devices=4 UUID=ebdfaedf:59e64777:d81d8f6e:8d6b0392
   devices=/dev/sda1,/dev/sdb1,/dev/sdc1,/dev/sdd1

The mdadm man page says: “The config file is only used if explicitly named with --config or requested with (a possibly implicit) --scan. In the later case, /etc/mdadm/mdadm.conf is used.” Since the config file looked right, I gave it a try:

chino:/var/log# mdadm --assemble /dev/md0
mdadm: device /dev/md0 already active - cannot assemble it
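
An "already active" message only says that md0 exists as a device with members bound to it. It may still be assembled without having been started. Here is a minimal sketch for checking that, assuming the usual /proc/mdstat layout in which such an array is listed as "inactive". inactive_arrays is a hypothetical helper name, and the sample line in the comment is illustrative, not captured from this machine.

```shell
# Hypothetical helper: read /proc/mdstat text on stdin and print the
# name of every md array that is listed as inactive.
# Assumed line shape: "md0 : inactive sdd1[1] sdb1[3] sda1[2]"
inactive_arrays() {
  awk '$2 == ":" && $3 == "inactive" { print $1 }'
}

# Typical use on a live system:
#   inactive_arrays < /proc/mdstat
```

If md0 shows up in that list, the members are bound but the array was never started, which would explain why mount cannot read a superblock from it.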

So mdadm regards md0 as already active, even though the kernel log above says it failed to run, and I can’t use it. This is what /etc/fstab says:

/dev/md0        /home           ext3    defaults        0       2

And that seems right to me. So I tried “mount -a” again, in case it had miraculously sorted itself out. No such luck. Next stop was the Linux Software-RAID howto, which had this:

Q: After creating a raid array on /dev/md0,  I try to mount it and get the following error:  mount: wrong fs type, bad option, bad superblock on /dev/md0, or too many mounted file systems. What's wrong?
A: You need to create a file system on /dev/md0 before you can mount it.  Use mke2fs.

Well, that’s a non-starter, because I know very well that there’s a file system on there, and making another one over the top of it will destroy my chances of getting it working again. So I rebooted, to see if the array had rebuilt itself into a sensible state. The relevant parts of the system log this time were:

SCSI device sda: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sda: drive cache: write back
sda: sda1
sd 0:0:0:0: Attached scsi disk sda
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sdb: drive cache: write through
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sdb: drive cache: write through
sdb: sdb1
sd 1:0:0:0: Attached scsi disk sdb
SCSI device sdc: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sdc: drive cache: write back
SCSI device sdc: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sdc: drive cache: write back
sdc: sdc1
sd 2:0:0:0: Attached scsi disk sdc
SCSI device sdd: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sdd: drive cache: write back
SCSI device sdd: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sdd: drive cache: write back
sdd: sdd1
sd 3:0:0:0: Attached scsi disk sdd
md: md0 stopped.
md: bind<sdc1>
md: bind<sda1>
md: bind<sdb1>
md: bind<sdd1>
md: kicking non-fresh sdc1 from array!
md: unbind<sdc1>
md: export_rdev(sdc1)
md: md0: raid array is not clean -- starting background reconstruction
raid5: device sdd1 operational as raid disk 1
raid5: device sdb1 operational as raid disk 3
raid5: device sda1 operational as raid disk 2
raid5: cannot start dirty degraded array for md0
RAID5 conf printout:
--- rd:4 wd:3 fd:1
disk 1, o:1, dev:sdd1
disk 2, o:1, dev:sda1
disk 3, o:1, dev:sdb1
raid5: failed to run raid set md0
md: pers->run() failed ...
EXT3-fs: unable to read superblock

I can’t see any change there, so what now? I’ll have to trawl the Internet for better ideas, I guess. I trust there’s some way of re-synching the array. In any case, if mdadm is happy with three of the four disks, it should be able to rebuild the array completely.
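
For reference, a commonly documented route out of a "dirty degraded" RAID5 is to stop the half-assembled array, force-assemble it from the three members that still agree, and then add the stale partition back so it gets rebuilt. The sketch below only prints those commands unless RUN=1 is set. It has not been tested on this machine, recover_md0 and run are hypothetical helper names, and the device names are the ones from the logs above. Forcing assembly on an array that was not shut down cleanly carries some risk to the data, so treat it as an outline rather than a recipe.

```shell
#!/bin/sh
# Dry-run sketch: prints each command by default; set RUN=1 to execute.
# Assumes sda1, sdb1 and sdd1 are the fresh members and sdc1 is stale.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

recover_md0() {
  run mdadm --stop /dev/md0                    # release the half-assembled array
  run mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdd1
  run mdadm /dev/md0 --add /dev/sdc1           # add sdc1 back; md rebuilds onto it
  run cat /proc/mdstat                         # watch the rebuild progress
}

# Typical use:
#   recover_md0          # show what would be run
#   RUN=1 recover_md0    # actually run it (needs root)
```

Printing first makes it easy to check the device list against the --examine output before anything touches the disks.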


3 Comments »

  1. […] My RAID array is broken. So can I get it to work again, preferably with my data intact? […]

    Pingback by RAID Victory « High Above Ealing — September 13, 2006 @ 3:27 am

  2. […] The RAID array fell over again. At first, I thought it might be samba causing the network access problem, but restarting that didn’t achieve anything. Then I ran top on the server box, to see if anything had spun out of control, but everything was calm – no processes running at over 0.2% of CPU. Then I thought I’d check the syslog, which is where I started to find the problems. […]

    Pingback by More RAID Pain « High Above Ealing — November 6, 2006 @ 1:17 pm

  3. did you try hot-adding the disk?

    mdadm /dev/md0 -a /dev/sdc1

    Comment by MikeKlem — July 31, 2007 @ 3:11 am

