Pool Health

From the ‘zpool’ man pages:

A pool’s health status is described by one of three states: online, degraded, or faulted. An online pool has all devices operating normally. A degraded pool is one in which one or more devices have failed, but the data is still available due to a redundant configuration. A faulted pool has corrupted metadata, or one or more faulted devices, and insufficient replicas to continue functioning.
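Because the three states differ in how urgently they need attention, a monitoring script might map each one to a severity level. The sketch below is a hypothetical helper and not part of ZFS itself. It assumes its input comes from ‘zpool list -H -o name,health’, which prints one tab-separated pool name and health value per line with no header. The function names ‘pool_severity’ and ‘report_pools’ and the severity labels are made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: map a pool health state to a severity label.
# Only the three states from the man page excerpt above are recognised;
# anything else is reported as "unknown".
pool_severity() {
    case "$1" in
        ONLINE)   echo "ok" ;;        # all devices operating normally
        DEGRADED) echo "warning" ;;   # device(s) failed, redundancy still serving data
        FAULTED)  echo "critical" ;;  # insufficient replicas to continue functioning
        *)        echo "unknown" ;;
    esac
}

# Hypothetical helper: read "name<TAB>health" lines on stdin (the format
# assumed from `zpool list -H -o name,health`) and print one line per pool.
report_pools() {
    local name health
    while IFS=$'\t' read -r name health; do
        printf '%s: %s (%s)\n' "$name" "$health" "$(pool_severity "$health")"
    done
}
```

Usage would look like ‘zpool list -H -o name,health | report_pools’, printing a line such as ‘tank: ONLINE (ok)’ for each pool.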

A pool’s health status can be viewed with the ‘zpool status’ command. In the following example, the pool ‘tank’ is ‘ONLINE’ and operating normally.

#> zpool status
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0

errors: No known data errors
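When a script only needs the state of a single pool, the ‘state:’ line of the ‘zpool status’ output can be extracted with awk. This is a minimal sketch: the function name ‘status_state’ is hypothetical, and it assumes the field layout shown in the example above, which may differ between ZFS releases.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `zpool status` output on stdin and print the
# value of the first "state:" line (e.g. ONLINE, DEGRADED or FAULTED).
# awk's default field splitting ignores the leading spaces, so $1 is
# "state:" and $2 is the state itself.
status_state() {
    awk '$1 == "state:" { print $2; exit }'
}
```

For the pool above, ‘zpool status tank | status_state’ would print ONLINE.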

Alternatively, the ‘zpool status -x’ command can be used to display the status of only those pools that are exhibiting errors or are otherwise unavailable. In the following example, all pools are healthy.

#> zpool status -x
all pools are healthy

The following bash script was written to actively monitor pool health on my home servers. The script parses the output of ‘zpool status -x’ and sends an email if a pool is degraded or faulted. I added the script to the cron table to run hourly, but shorter intervals can be used if you have a strict mean time to recovery requirement.

