ZFS runs *really* slowly when disk usage goes above 80%

By Alasdair Lumsden on 18 Jul 2010

You’re sat at your desk, sipping a nice beverage, and all is well in the world. Your busy database is humming away happily in the background as always. Then, out of the blue, performance drops significantly. You’re perplexed – nothing has changed. The DBAs go wild and fingers get pointed: they insist the DB is fine and the hardware/OS is at fault. You investigate and see a lot of IO, but you can’t pin it down. What’s going on?

Well, you could very well just have hit 80% disk space usage, and now your disk performance has gone down the toilet.
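
If you’re not sure whether you’ve actually crossed that line, zpool will tell you how full each pool is. The commands below are bog-standard zpool usage; “tank” is just an example pool name, so substitute your own:

# The CAP column shows how full each pool is
zpool list

# Or query a single pool directly
zpool get capacity tank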

You can fix the issue by running this:

echo "metaslab_df_free_pct/W 4" | mdb -kw

And you can make it permanent by doing:

echo "set zfs:metaslab_df_free_pct=4" >> /etc/system

What does this do? Well, ZFS normally uses a “first fit” block allocation policy. Once you hit 80% disk space usage, it switches to “best fit”. To quote the source code:

     * The minimum free space, in percent, which must be available
     * in a space map to continue allocations in a first-fit fashion.
     * Once the space_map's free space drops below this level we dynamically
     * switch to using best-fit allocations.

All current Solaris 10 releases, and versions of OpenSolaris prior to 22-Nov-2009, use a default of 30 for this value. How does a value of 30 equate to 80% disk space usage? I have no idea – I’ve never figured that one out. All I know is that if you run the above commands, the problem goes away. Perhaps someone with more knowledge of how metaslabs work can enlighten me :)
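
If you fancy poking around yourself, zdb can dump each vdev’s metaslabs along with how much free space they have left, which is one way to see the space maps that source comment is talking about. Again, “tank” is just an example pool name, and the output is fairly verbose:

zdb -m tank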