It was over 6 years ago that I started EveryCity with my cousin Duncan, and whilst a great deal has changed with my daily routine, it’s surprising what also hasn’t [...]
You’re sat at your desk, sipping a nice beverage, all is well in the world. Your busy database is sat happily in the background as always. Suddenly, out of the blue, performance drops significantly. You’re perplexed – nothing has changed. The DBAs go wild – fingers get pointed. They claim the DB is fine – the hardware/OS is at fault. You investigate, you see a lot of IO, but you can’t pin it down. What’s going on?
Well, you may very well have just hit 80% disk space usage, and now your disk performance has gone down the toilet.
You can fix the issue by running this:
echo "metaslab_df_free_pct/W 4" | mdb -kw
And you can make it permanent (/etc/system is read at boot, so this keeps the setting across reboots) by doing:
echo "set zfs:metaslab_df_free_pct=4" >> /etc/system
What does this do? Well, ZFS normally uses a “first fit” block allocation policy. When you hit 80% disk space usage, it switches to “best fit”. To quote the source code:
 * The minimum free space, in percent, which must be available
 * in a space map to continue allocations in a first-fit fashion.
 * Once the space_map's free space drops below this level we dynamically
 * switch to using best-fit allocations.
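To make that comment a bit more concrete, here is a rough sketch in C of what switching from first fit to best fit on a free-space threshold could look like. This is not the ZFS code: the free_seg_t type, the function names and the flat array of free segments are all made up for illustration, and the real allocator is considerably more involved. The only thing taken from the text above is the idea of a percentage threshold checked against a single space map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free segment: an offset and a length, both in bytes. */
typedef struct {
    uint64_t offset;
    uint64_t size;
} free_seg_t;

/* Illustrative threshold; 30 is the old default mentioned in this post. */
static int free_pct_threshold = 30;

/* Percentage of one space map that is still free (no overflow guard). */
static int
free_pct(uint64_t free_bytes, uint64_t total_bytes)
{
    assert(total_bytes > 0);
    return (int)(free_bytes * 100 / total_bytes);
}

/* First fit: take the first segment, in array order, that is big enough. */
static int
first_fit(const free_seg_t *segs, size_t n, uint64_t want)
{
    for (size_t i = 0; i < n; i++)
        if (segs[i].size >= want)
            return (int)i;
    return -1;
}

/* Best fit: scan everything and take the smallest segment that still fits. */
static int
best_fit(const free_seg_t *segs, size_t n, uint64_t want)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (segs[i].size < want)
            continue;
        if (best < 0 || segs[i].size < segs[best].size)
            best = (int)i;
    }
    return best;
}

/* Choose the policy from the free percentage, as the comment describes. */
static int
pick_segment(const free_seg_t *segs, size_t n, uint64_t total_bytes,
    uint64_t want)
{
    uint64_t free_bytes = 0;
    for (size_t i = 0; i < n; i++)
        free_bytes += segs[i].size;

    if (free_pct(free_bytes, total_bytes) < free_pct_threshold)
        return best_fit(segs, n, want);
    return first_fit(segs, n, want);
}
```

With free segments of 64, 16 and 32 bytes in a 1000-byte map (11% free), a 16-byte request goes to the 16-byte segment when the threshold is 30, because 11 is below 30 and best fit kicks in. Drop the threshold to 4 and the same request takes the first segment that fits, the 64-byte one. Note that best fit has to look at every candidate, whereas first fit can stop early.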
All current Solaris 10 releases, and versions of OpenSolaris prior to 22-Nov-2009, use a default of 30 for this value. How does a value of 30 equate to 80% disk space usage? I have no idea – I’ve never figured that one out. All I know is, run the above commands, and the problem goes away. Perhaps someone with more knowledge of how metaslabs work can enlighten me.