The case for RAIDZ2

By Alasdair Lumsden on 10 Apr 2010

We have an old x4500 knocking around which is getting on for 3 years old now. At the beginning of last month, we did a scrub, and to our horror discovered checksum errors on almost all the drives:

  pool: pool01
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 23h0m with 0 errors on Wed Mar  3 12:55:36 2010
config:

        NAME         STATE     READ WRITE CKSUM
        pool01       DEGRADED     0     0     0
          raidz1-0   ONLINE       0     0     0
            c11t3d0  ONLINE       0     0     4  2.50K repaired
            c10t3d0  ONLINE       0     0     0
            c13t3d0  ONLINE       0     0     4  1.50K repaired
            c7t1d0   ONLINE       0     0     0
            c8t3d0   ONLINE       0     0     5  1K repaired
            c7t3d0   ONLINE       0     0     4  2K repaired
            c10t2d0  ONLINE       0     0     3  1K repaired
            c13t2d0  ONLINE       0     0     2  1K repaired
            c11t6d0  ONLINE       0     0     3  1K repaired
            c8t2d0   ONLINE       0     0    16  7K repaired
            c7t2d0   ONLINE       0     0     4  2.50K repaired
          raidz1-1   DEGRADED     0     0     0
            c11t7d0  ONLINE       0     0     6  64K repaired
            c10t7d0  DEGRADED     0     0    58  too many errors
            c13t7d0  ONLINE       0     0     4  3.50K repaired
            c12t7d0  ONLINE       0     0     3  7K repaired
            c8t7d0   ONLINE       0     0     2  4.50K repaired
            c7t7d0   ONLINE       0     0     4  11.5K repaired
            c10t6d0  ONLINE       0     0     4  11K repaired
            c13t6d0  ONLINE       0     0     8  86K repaired
            c12t6d0  ONLINE       0     0     0
            c8t6d0   ONLINE       0     0     2  1K repaired
            c7t6d0   ONLINE       0     0     2  2.50K repaired
          raidz1-2   DEGRADED     0     0     0
            c11t5d0  ONLINE       0     0     1  9K repaired
            c10t5d0  ONLINE       0     0     1  13K repaired
            c13t5d0  ONLINE       0     0     2  1.50K repaired
            c12t5d0  ONLINE       0     0     1  1K repaired
            c8t5d0   DEGRADED     0     0   135  too many errors
            c7t5d0   ONLINE       0     0     2  1.50K repaired
            c10t4d0  ONLINE       0     0     8  44K repaired
            c13t4d0  ONLINE       0     0     3  5K repaired
            c12t4d0  ONLINE       0     0     3  2K repaired
            c8t4d0   ONLINE       0     0     2  6.50K repaired
            c7t4d0   ONLINE       0     0     2  13.5K repaired

errors: No known data errors

Thankfully it’s not used for production, so this didn’t bother us a huge amount. ZFS repaired the data errors without issue (hurrah for ZFS!), and we have been replacing the worst affected disks. We’re now doing weekly scrubs to keep the data “fresh” and stop it rotting away.
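
For completeness, the scrubs themselves are trivial to run and to schedule – something along these lines does the job (the pool name and the timing here are just an example):

  # zpool scrub pool01
  # zpool status pool01

And in root's crontab, to kick one off every Sunday at 2am:

  0 2 * * 0 /usr/sbin/zpool scrub pool01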

However, one interesting issue cropped up. We're using RAIDZ1, which only stores enough parity for one disk to be out of service. Since ZFS uses the parity data to reconstruct blocks with checksum errors, if you're one disk down and hit a block with a checksum error, you're in trouble – ZFS can't repair it and your data is corrupted.

So when you replace a failed disk in a RAIDZ1 set, you had better hope you don’t encounter any checksum errors on the other disks during the resilver process. Because ZFS has to read in all the data from the other disks to resilver the new disk, you’re at a high risk of encountering checksum errors, especially in our situation where the disks are wearing out.
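
For reference, replacing a disk is just a zpool replace once the new drive is physically in the slot – the device name here is purely illustrative:

  # zpool replace pool01 c10t7d0
  # zpool status pool01

The second command shows the resilver progress, and that resilver window is exactly when any further checksum errors on the surviving disks in that vdev become unrecoverable.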

And this is precisely what happened next. We replaced a failed disk, and during the resilver ZFS encountered checksum errors on the other disks that it couldn't repair, and we started to lose data:

  pool: pool01
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 15h47m with 219 errors on Sat Apr 10 16:14:59 2010
config:

        NAME         STATE     READ WRITE CKSUM
        pool01       DEGRADED     0     0   331
          raidz1-0   ONLINE       0     0     0
            c11t3d0  ONLINE       0     0     0
            c10t3d0  ONLINE       0     0     0
            c13t3d0  ONLINE       0     0     0
            c8t5d0   ONLINE       0     0     0
            c8t3d0   ONLINE       0     0     0
            c7t3d0   ONLINE       0     0     0
            c10t2d0  ONLINE       0     0     0
            c13t2d0  ONLINE       0     0     0
            c11t6d0  ONLINE       0     0     0
            c8t2d0   ONLINE       0     0     0
            c7t2d0   ONLINE       0     0     0
          raidz1-1   ONLINE       0     0     0
            c11t7d0  ONLINE       0     0     0
            c11t2d0  ONLINE       0     0     0
            c13t7d0  ONLINE       0     0     0
            c12t7d0  ONLINE       0     0     0
            c8t7d0   ONLINE       0     0     1
            c7t7d0   ONLINE       0     0     0
            c10t6d0  ONLINE       0     0     0
            c13t6d0  ONLINE       0     0     0
            c12t6d0  ONLINE       0     0     0
            c8t6d0   ONLINE       0     0     0
            c7t6d0   ONLINE       0     0     0
          raidz1-2   DEGRADED     0     0   888
            c11t5d0  DEGRADED     0     0     0  too many errors
            c10t5d0  DEGRADED     0     0     0  too many errors
            c13t5d0  DEGRADED     0     0     0  too many errors
            c12t5d0  ONLINE       0     0     0  401G resilvered
            c12t3d0  DEGRADED     0     0     0  too many errors
            c7t5d0   DEGRADED     0     0     0  too many errors
            c10t4d0  DEGRADED     0     0     0  too many errors
            c13t4d0  DEGRADED     0     0     0  too many errors
            c12t4d0  DEGRADED     0     0     0  too many errors
            c8t4d0   DEGRADED     0     0     0  too many errors
            c7t4d0   DEGRADED     0     0     0  too many errors

errors: 219 data errors, use '-v' for a list

Ouch! 219 data errors.

Thankfully ZFS knows precisely which files are affected, so you can simply delete, replace, or restore the affected files and snapshots, and the pool keeps running.
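
The list of damaged files comes straight out of zpool status, and once they've been dealt with you clear the error counters and re-scrub. Roughly (the path is obviously a placeholder):

  # zpool status -v pool01
  # rm /pool01/path/to/damaged/file
  # zpool clear pool01
  # zpool scrub pool01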

However, after this I'm sold on RAIDZ2. I don't think I'll be using RAIDZ1 again – the risk of losing data while replacing a failed disk is just too high.
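
For anyone building a pool from scratch, the difference is just in the layout given to zpool create – for example, two 6-disk RAIDZ2 vdevs instead of wider RAIDZ1 ones (disk names made up):

  # zpool create pool02 \
        raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 \
        raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0

Each RAIDZ2 vdev can then lose any two disks – or one disk plus checksum errors on another – without losing data.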