illumos watch – November 2013

By Alasdair Lumsden on 25 Nov 2013

Hi Folks, welcome to the first illumos watch blog post, where each month I’ll be sharing some of the latest developments in the world of illumos, OpenSolaris, SmartOS and OpenIndiana!

As this is the first post, I’ll include a few from earlier months that I thought were noteworthy too…

Commits down at the illumos-gate

2989 LOGNAME_MAX should be increased to 32

My personal favourite: this raises the username length limit from 8 characters to 32. It was already possible to use usernames longer than 8 characters, but they didn’t work everywhere and various utilities would complain.

4236 Internet Packet Disturber

The man page says it best:

The ipdadm utility is used to administer the illumos facility for
simulating pathological networks by induce packet drops, delays, and
corruption.

This functionality is only able to the global zone and zones with
exclusive networking stacks. If this is enabled for the global zone,
any zone with a shared networking stack will be affected.

More info on the Joyent blog: What is ipdadm(1M) Used For?

2583 Add -p (parsable) option to zfs list

This one has been in SmartOS for quite some time, and it’s great to see it land in illumos as it’s long overdue. By default, zfs list prints values in human-readable form (e.g. MB/GB/TB). With the new -p option you can print them in a machine-parsable format instead: sizes are given in bytes and times in seconds since the epoch, which is perfect for scripting:

zfs list -p
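As a rough illustration of why this helps scripting, here is a minimal Python sketch that parses the parsable output. It assumes the command is run as `zfs list -H -p -o name,used,avail` (the existing -H flag drops the header and separates fields with tabs); the helper names and the sample data are hypothetical and only there so the parser can be tried without a ZFS host.

```python
import subprocess

FIELDS = ("name", "used", "avail")

def parse_zfs_list(text):
    """Parse tab-separated `zfs list -H -p` output into {name: {field: int}}."""
    datasets = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, *values = line.split("\t")
        # zfs prints "-" when a property does not apply to a dataset.
        datasets[name] = {
            field: (None if value == "-" else int(value))
            for field, value in zip(FIELDS[1:], values)
        }
    return datasets

def list_datasets():
    """Run zfs and parse its output (needs a host with ZFS installed)."""
    result = subprocess.run(
        ["zfs", "list", "-H", "-p", "-o", ",".join(FIELDS)],
        check=True, capture_output=True, text=True,
    )
    return parse_zfs_list(result.stdout)

# Hypothetical sample output, so the parser can be tried without ZFS.
SAMPLE = "tank\t1073741824\t5368709120\ntank/home\t524288000\t5368709120\n"

if __name__ == "__main__":
    for name, props in parse_zfs_list(SAMPLE).items():
        print(name, props["used"] // 2**20, "MiB used")
```

Because every size is a plain byte count, there is no unit suffix to strip and no rounding to undo before doing arithmetic on the values.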

4303 want local zone TCP kstats in the global zone

This one is fairly self-explanatory: you can now inspect the TCP kstats of a zone from the global zone, which is useful for mass monitoring without requiring agents in every zone.

4128 disks in zpools never go away when pulled

This is one I’m curious to see in action. At the moment, if you boot a machine with a missing drive, the drive shows up in state "UNAVAIL". There’s also state "REMOVED", which in my experience is correctly used when a drive is erroring and is removed by the OS. However, I can’t recall what happens if you just pull a drive out, so perhaps this addresses that. Either way, improvements such as this are always useful.

Fixes and minor enhancements

There’s a long list of fixes and minor enhancements. Notable ones include improved argument handling for ld(1) from richlowe (4270), logadm improvements (4301, 4300, 4299), zfs deadlock fixes (4161, 4322), and tail -f now noticing file truncation (3928).

There’s also a fix for a very serious kernel panic in the client side of the new NFS lock manager (4198). We hit this bug in production, so special thanks to Marcel Telka for fixing it so darn quickly.

Commits over at SmartOS Towers

ZFS Throttle Improvements

OS-2531 zfs/zone IO throttle comment improvement and code cleanup

OS-2576 zfs throttle throttling too much

Joyent have done some work to improve the commenting of the ZFS IO throttling code, along with enhancing it slightly. The enhancement is mostly covered by this comment from the code:

   * To minimize porpoising, we have three separate states for our
   * assessment of I/O performance:  overutilized, underutilized, and
   * neither overutilized nor underutilized.  We will increment the
   * throttle if a zone is using more than its fair share _and_ I/O
   * is overutilized; we will decrement the throttle if a zone is using
   * less than its fair share _or_ I/O is underutilized.
   */
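To make the rule in that comment concrete, here is a small Python sketch of just the decision step. It is not Joyent's code: the function and type names are hypothetical, and how the kernel measures a zone's utilisation or decides which of the three states I/O is in is left out entirely, since the comment doesn't say.

```python
from enum import Enum

class IOState(Enum):
    """The three assessments of overall I/O performance named in the comment."""
    OVERUTILIZED = "overutilized"
    UNDERUTILIZED = "underutilized"
    NEITHER = "neither"

def throttle_adjustment(zone_util, fair_share, io_state):
    """Return +1 to increase the zone's throttle, -1 to decrease it, 0 to hold.

    zone_util and fair_share are in the same (arbitrary) unit; both are
    placeholders for whatever the real code measures.
    """
    # Increment only when BOTH conditions hold.
    if zone_util > fair_share and io_state is IOState.OVERUTILIZED:
        return 1
    # Decrement when EITHER condition holds.
    if zone_util < fair_share or io_state is IOState.UNDERUTILIZED:
        return -1
    # Otherwise leave the throttle alone, which is what damps the porpoising.
    return 0

if __name__ == "__main__":
    print(throttle_adjustment(10, 5, IOState.OVERUTILIZED))
    print(throttle_adjustment(10, 5, IOState.NEITHER))
    print(throttle_adjustment(3, 5, IOState.OVERUTILIZED))
```

The middle "neither" state is the interesting part: a zone that is over its fair share while I/O is merely busy, not overutilized, gets no adjustment at all.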

svcadm enhancements

  • OS-1315 Please, oh please, let me clear N services in maintenance across zones from the GZ
  • OS-2566 Want svcadm restart -d […] for taking cores before restart
  • OS-2567 SMF: allow svcadm to act on multiple instances simultaneously
  • OS-2574 svcadm could use -Z option

The titles are mostly self-explanatory, and it’s nice to see this tool receive some enhancements as it’s used so frequently.

OS-2556 make existing zfs filesystem limit feature obsolete

This one is quite interesting. Joyent added a filesystem limits feature to limit the number of child datasets, snapshots, clones, etc. that can be created under a parent dataset. This is useful for zone-delegated datasets in a multi-tenanted environment. Here, the actual feature isn’t being obsoleted; rather, the code is being reworked for upstreaming into illumos. (Thanks to Robert Mustacchi for providing more info on this.)

OS-2495 add support for multiple mac addresses per client

Another useful feature: this adds support for multiple MAC addresses per client to the dladm layer, presumably to allow KVM guests to have multiple MAC addresses.

OS-2544 ipf rules from the GZ should be add to in-zone rules, not replace them

 * For each non-global zone, we create two ipf stacks: the per-zone stack and
 * the GZ-controlled stack.  The per-zone stack can be controlled and observed
 * from inside the zone or from the global zone.  The GZ-controlled stack can
 * only be controlled and observed from the global zone (though the rules
 * still only affect that non-global zone).
 *
 * The two hooks are always arranged so that the GZ-controlled stack is always
 * "outermost" with respect to the zone.

I wasn’t aware of this feature, but it seems on SmartOS there are two IPF firewall stacks per zone, allowing you to control firewalling from the global zone. I can see this being a very useful feature if you want to mandate global firewall policy.
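Here is how I read the "outermost" arrangement, as a toy Python model rather than anything taken from the SmartOS source. The assumptions are mine: each stack is reduced to a simple pass/block function, a packet must pass both stacks, inbound traffic meets the GZ-controlled stack first, and outbound traffic meets it last. All names and rules below are hypothetical.

```python
def stack_order(direction):
    """Order in which the two stacks see a packet, GZ stack always outermost."""
    if direction == "in":
        return ["gz", "zone"]   # from the network, GZ-controlled stack first
    if direction == "out":
        return ["zone", "gz"]   # from the zone, GZ-controlled stack last
    raise ValueError("direction must be 'in' or 'out'")

def filter_packet(packet, direction, stacks):
    """Return (passed, blocked_by); stacks maps 'gz'/'zone' to pass functions."""
    for name in stack_order(direction):
        if not stacks[name](packet):
            return (False, name)
    return (True, None)

# Hypothetical policies: the GZ blocks telnet, the zone owner blocks port 80.
STACKS = {
    "gz": lambda pkt: pkt["port"] != 23,
    "zone": lambda pkt: pkt["port"] != 80,
}

if __name__ == "__main__":
    for port in (23, 80, 443):
        print(port, filter_packet({"port": port}, "in", STACKS))
```

In this model the zone's own rules can only tighten what the global zone already allows, which matches the "add to, not replace" wording of the ticket.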

In the hosting world, it’s also common to provide basic "network-level" firewalling, which clients will assume is being done by some perimeter firewall. If you can do it from the global zone, all the better, as logically there’s no real distinction from a client’s perspective.

From the Water Cooler

L2ARC Persistence & ZFS UNMAP/SATA TRIM

Saso Kiselkov has been working on some excellent ZFS features over the past year, such as L2ARC cache persistency and ZFS SCSI Unmap/SATA TRIM support. By the sounds of it the ZFS bit is pretty much done, but the SCSI layer isn’t. Saso is asking for reviews of his L2ARC persistency code, so hopefully that will arrive in-gate soon. More on the thread over on listbox.

N-Way mirror read performance, Steven Hartland (FreeBSD)

Looks like some work done by Steven Hartland of the FreeBSD project on N-Way mirror read performance is on its way to illumos, although the thread it generated was quite a long one.

This is quite interesting in light of the benchmark work I did on the different ZFS RAID Levels (Google Docs results, Blog Post), where ZFS RAID10 read performance is already ahead of RAIDZ[23] by quite a bit.

Ongoing 4k Sector Drive Discussions

It seems a month doesn’t go by without more discussions about 4k sector drives and zpool ashift settings.

My advice is basically to avoid 4k sector drives if you can. If you must use them, try to avoid RAIDZ. Think of them as being for bulk storage, not VM storage or general purpose usage.

This Delphix blog post by George Wilson on 4K Sectors, ZFS and ashift is useful.
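For anyone new to the term, ashift is just the base-2 logarithm of the sector size the pool assumes, so 512-byte sectors give ashift 9 and 4k sectors give ashift 12. The Python sketch below shows that relationship and the simplest space consequence, namely that allocations are rounded up to whole sectors. The function names are hypothetical, and the rounding deliberately ignores RAIDZ parity and padding, which is where the real pain with 4k drives comes from.

```python
def ashift_for(sector_size):
    """Return log2 of the sector size, e.g. 512 -> 9 and 4096 -> 12."""
    if sector_size <= 0 or sector_size & (sector_size - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_size.bit_length() - 1

def allocated_size(block_bytes, ashift):
    """Round a block up to a whole number of 2**ashift byte sectors.

    Simplified: models a single disk or mirror, no RAIDZ parity or padding.
    """
    sector = 1 << ashift
    return -(-block_bytes // sector) * sector  # ceiling division

if __name__ == "__main__":
    for size in (512, 4096):
        print(size, "byte sectors -> ashift", ashift_for(size))
    print("512 byte block at ashift 12 occupies", allocated_size(512, 12), "bytes")
```

Small blocks are the worst case: anything under 4k still occupies a full 4k sector once ashift is 12.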

OpenJDK vs Oracle JDK

A short thread on OpenJDK vs Oracle JDK cropped up on the SmartOS list. The short answer is, of course, that there’s not much of a difference, but the binary Oracle JDK can’t be redistributed. Alain O’Dea reports from the field that he’s been using OpenJDK in production for over a year without issue, which is reassuring.

Fascinating exchange between Andrew Galloway and Richard Elling

Here’s an example of a discussion between two different styles of thinking. Personally, I’d side with Andrew – statements should reflect the common case found in the real world.

The takeaway is that with illumos as it stands, you may, depending on your workload, want to limit your ARC to 128GB of RAM until Nexenta upstream a fix to a performance problem with large ARCs.
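If you do want to cap the ARC, the usual knob on illumos is the zfs_arc_max tunable set in /etc/system, which takes a value in bytes. That tunable isn't mentioned in the thread summary above, so treat its use here as my assumption, along with reading "128GB" as 128 GiB. This small Python sketch only does the arithmetic and prints the line you would add; the helper name is hypothetical.

```python
GIB = 1024 ** 3

def arc_max_line(limit_gib):
    """Build an /etc/system line capping the ARC at limit_gib GiB."""
    if limit_gib <= 0:
        raise ValueError("limit must be positive")
    # /etc/system accepts hex values, which are easier to eyeball for
    # power-of-two sizes than a long decimal byte count.
    return "set zfs:zfs_arc_max = 0x%x" % (limit_gib * GIB)

if __name__ == "__main__":
    print(arc_max_line(128))
```

Changes to /etc/system only take effect after a reboot, so this is one to plan for rather than apply in a hurry.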

High Availability with Zones

Providing high availability with Zones often requires some kind of IP address failover. Joyent have managed to get VRRP almost working (it’s still a bit buggy), but fear not: there are multiple options available to solve this particular issue. At EveryCity we use Wackamole with Spread, and there’s also ucarp. The thread "VRRP and 2 HAProxy on different smartos servers" discusses the different approaches.

There’s a good write-up by Guillaume Hilt on the SmartOS wiki: High Availability with Wackamole. Theo Schlossnagle, co-creator of Wackamole, mentioned that people should be using Vippy, which is the successor to Wackamole.

OpenIndiana Hipster work continues

It’s good to see that work continues on the hipster branch of OpenIndiana at quite a pace. I’m also pleased to see some other names committing, such as Aurélien Larcher, Gordon Ross of Nexenta and Marcel Telka. As always, there’s also a fantastic stream of work from Alexander Pyhalov, followed closely by Adam Števko and Andrzej Szeszo.

Closing Remarks

Whoa, what a lot of goings on in the past month or so. I didn’t realise this blog post would end up quite so big! It just goes to show that the illumos and friends community is really thriving in this post-Oracle world. Catch you all next month for more updates… :-)