Docker, NFS, ZFS, and extended attributes

It may be difficult to develop an emotional connection to all of the features of filesystems and filers. Take deduplication for instance. Dedup is cool. Rabin-Karp rolling hash, sliding-window Content Defined Chunking (CDC) – those were cool 15 years ago and remain cool today. Improvements and products (and startups) keep pouring in.

But when it comes to extended file attributes (xattrs), emotions range from a blank stare to dismay. As in: wouldn’t touch with a ten-foot pole.

Come to think of it, part of the problem is – NFS. And part of the NFS problem is that neither v3 nor v4 supports xattrs. There is no support whatsoever: none, nada, zilch. And how could there be, with no interoperable standard?

To that end, work is being done and progress is being made, at least on the IETF side. It’s a relatively recent addition to the IETF drafts, considering that xattrs have been supported locally for what feels like millennia, while NFSv4 was first standardized in 2003.

Could this lackluster history be a telltale sign that NFS, heaven forbid, is gradually phasing out? Hell, no. Au contraire – lots of good turbulence in ecosystems: VMware (added NFSv4.1 client), OpenStack (Manila project), Amazon (NFSv4.0 in AWS EFS) to name a few. The scaling demand for shared files further fosters improvements inside common OSes and dedicated NAS devices aka filers. IDC analytics continues to see strong growth on the scale-out (server) side of POSIX-compliant services and products.

Plus, NFSv4.2 is practically around the corner, adding a bunch of features that look very familiar to those of us who have done business with VMware VAAI: Sparse File, Space Reservation, Server Side Copy, Application Data Blocks. And more.

In short, NFS is here to stay and thrive for at least another decade. But.

The fact of the matter is that slowly but surely NFS gets relegated to a space where the following two things exist simultaneously:

  1. the files are relatively large, meaning – not small
  2. there are no extended attributes

For instance, already today it is almost unthinkable to place a docker image on an NFS server. It won’t run.

Why? First, more often than not there’s a bunch of Linux capabilities in use by root filesystems and apps – and those are in fact stored as a (security) type of extended attribute.
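To make the capabilities point concrete: on Linux, file capabilities are kept in an extended attribute named security.capability. The sketch below is a minimal illustration, not anything taken from Docker itself; the helper name file_capability_blob is made up for this example, and it assumes a Linux host where Python exposes os.getxattr.

```python
import errno
import os

# Name of the xattr in which Linux stores file capabilities.
CAP_XATTR = "security.capability"


def file_capability_blob(path):
    """Return the raw security.capability value of `path` as bytes.

    Returns None when the file carries no capabilities, or when the
    underlying filesystem does not support extended attributes at all.
    (Hypothetical helper, written for illustration only.)
    """
    try:
        return os.getxattr(path, CAP_XATTR)
    except OSError as exc:
        # ENODATA: attribute not set; ENOTSUP: no xattr support here.
        if exc.errno in (errno.ENODATA, errno.ENOTSUP):
            return None
        raise
```

Calling it on a binary that was given capabilities (for example with setcap) should return a short binary blob; on a plain file it returns None.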

Second, even when there are no extended attributes whatsoever, performance of the container will leave a lot to be desired. And it won’t matter whether you’ve got a bunch of run-of-the-mill 7200 RPM SATA drives at the back end, or the latest-greatest NVMe-PCIe-Gen3 SSDs.

The first scenario, by the way, will manifest itself in an honest “operation not supported”. No extended attributes, period, end of story:

[Terminal screenshot: xattrs-term-1]
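The “operation not supported” failure is easy to probe for before pointing anything important at a mount. What follows is a rough sketch under stated assumptions: Linux only, the function name probe_xattrs and the attribute name user.xattr_probe are invented for this example, and it only exercises the user namespace, not the security one that capabilities rely on.

```python
import errno
import os
import tempfile

# Hypothetical attribute name, used only for this check.
PROBE_NAME = "user.xattr_probe"


def probe_xattrs(directory):
    """Set and read back one xattr on a scratch file inside `directory`.

    Returns (True, None) if the round trip works, otherwise
    (False, "<ERRNO NAME>: <message>") describing why it failed.
    """
    with tempfile.NamedTemporaryFile(dir=directory) as scratch:
        try:
            os.setxattr(scratch.name, PROBE_NAME, b"1")
            os.getxattr(scratch.name, PROBE_NAME)
            return True, None
        except OSError as exc:
            # Report the failure instead of raising, so callers can log it.
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            return False, f"{name}: {os.strerror(exc.errno)}"
```

Running probe_xattrs against a directory on a local ext4 or ZFS filesystem and then against an NFS mount point should show the difference described above.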

On the performance front, the following comparison should look convincing:

[Table: xattrs-table-1]

These are apples-to-apples ‘docker pull’ times measured for an identical bunch of images, with the only difference on the client side being (roughly): loopback=no/yes.

Running (‘docker run’) containers from the unadulterated remote backend (e.g., the same NFS + loopback combination, or an iSCSI LUN) is more of a boring routine – no surprises. Except maybe the cases when the containerized apps are interpreted: Ruby on Rails, PHP, and friends will definitely prefer a totally local presence:

[Figure: xattrs-docker-run-bench-1]

Fig. 1: Excerpt from one ‘docker run’ benchmark – higher is slower

The above is an illustration and case in point. Y-axis (Fig. 1) counts seconds until php-zendserver becomes ready for service – for a variety of backends (for instance, “nfs-loopback-fs” stands for NFSv4 + loopback + ext4 + aufs).
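How “ready for service” was detected in this benchmark is not spelled out here. One plausible way to take such a measurement, assuming readiness simply means that the service’s TCP port accepts connections, is to poll the port and record the elapsed time. The function name and parameters below are illustrative only.

```python
import socket
import time


def seconds_until_ready(host, port, timeout=60.0, interval=0.1):
    """Poll host:port until a TCP connect succeeds.

    Returns the elapsed time in seconds, or None if `timeout` expires
    first. "Ready" here means "accepts TCP connections", which is an
    assumption of this sketch, not the benchmark's stated method.
    """
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            # Not listening yet (or connect timed out): wait and retry.
            time.sleep(interval)
    return None
```

Started right after ‘docker run’, a call such as seconds_until_ready("127.0.0.1", 80) would give one number per backend, comparable to the Y-axis values in Fig. 1.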

Unless explicitly specified, the local filesystem here is always ext4. One provisional observation that can be drawn: while local ZFS wins by a margin, device mapper (dm) in general makes things worse, performance-wise.

And by the way

While this particular container appears to prefer the local ZFS filesystem and driver, arguably many (if not most) other containers would exhibit the exact same preference. Recall now that docker image layers are read-only. Whenever there’s read-only content, the first thing that should come to mind is – caching. RAM size matters of course, but all things being equal, some filesystems read-cache better than others…

And some of those can even do extended attributes!

Related text: Docker Detour