Storage management in LXD 2.15

 

containers

For a long time LXD has supported multiple storage drivers. Users could choose between zfs, btrfs, lvm, or plain directory storage pools but they could only ever use a single storage pool. A frequent feature request was to support not just a single storage pool but multiple storage pools. This way users would for example be able to maintain a zfs storage pool backed by an SSD to be used by very I/O intensive containers and another simple directory based storage pool for other containers. Luckily, this is now possible since LXD gained its own storage management API a few versions back.

Creating storage pools

A new LXD installation comes without any storage pool defined. If you run lxd init LXD will offer to create a storage pool for you. The storage pool created by lxd init will be the default storage pool on which containers are created.

asciicast

Creating further storage pools

Our client tool makes it really simple to create additional storage pools. In order to create and administer new storage pools you can use the lxc storage command. So if you wanted to create an additional btrfs storage pool on a block device /dev/sdb you would simply use lxc storage create my-btrfs btrfs source=/dev/sdb. But let’s take a look:

asciicast

Creating containers on the default storage pool

If you started from a fresh install of LXD and created a storage pool via lxd init LXD will use this pool as the default storage pool. That means if you’re doing a lxc launch images:ubuntu/xenial xen1 LXD will create a storage volume for the container’s root filesystem on this storage pool. In our examples we’ve been using my-first-zfs-pool as our default storage pool:

asciicast

Creating containers on a specific storage pool

But you can also tell lxc launch and lxc init to create a container on a specific storage pool by simply passing the -s argument. For example, if you wanted to create a new container on the my-btrfs storage pool you would do lxc launch images:ubuntu/xenial xen-on-my-btrfs -s my-btrfs:

asciicast

Creating custom storage volumes

If you need additional space for one of your containers to for example store additional data the new storage API will let you create storage volumes that can be attached to a container. This is as simple as doing lxc storage volume create my-btrfs my-custom-volume:

asciicast

Attaching custom storage volumes to containers

Of course this feature is only helpful because the storage API let’s you attach those storage volume to containers. To attach a storage volume to a container you can use lxc storage volume attach my-btrfs my-custom-volume xen1 data /opt/my/data:

asciicast

Sharing custom storage volumes between containers

By default LXD will make an attached storage volume writable by the container it is attached to. This means it will change the ownership of the storage volume to the container’s id mapping. But Storage volumes can also be attached to multiple containers at the same time. This is great for sharing data among multiple containers. However, this comes with a few restrictions. In order for a storage volume to be attached to multiple containers they must all share the same id mapping. Let’s create an additional container xen-isolated that has an isolated id mapping. This means its id mapping will be unique in this LXD instance such that no other container does have the same id mapping. Attaching the same storage volume my-custom-volume to this container will now fail:

asciicast

But let’s make xen-isolated have the same mapping as xen1 and let’s also rename it to xen2 to reflect that change. Now we can attach my-custom-volume to both xen1 and xen2 without a problem:

asciicast

Summary

The storage API is a very powerful addition to LXD. It provides a set of essential features that are helpful in dealing with a variety of problems when using containers at scale. This short introducion hopefully gave you an impression on what you can do with it. There will be more to come in the future.

Advertisements

Storage Tools

Having implemented or at least rewritten most storage backends in LXC as well as LXD has left me under the impression that most storage tools suck. Most advanced storage drivers provide a set of tools that allow userspace to administer storage without having to link against an external library. This is a huge advantage if one wants to keep the amount of external dependencies to a minimum. This is a policy to which LXC and LXD always try to adhere. One of the most crucial features such tools should provide is the ability to retrieve each property for each storage entity they allow to administer in a predictable and machine-readable way. As far as I can tell, only the ZFS and LVM tools allow one to do this. For example

zfs get -H -p -o "value" <key> <storage-entity>

will let you retrieve (nearly) all properties. The RBD and BTRFS tools lack this ability which makes them inconvenient to use at times.

lxc exec vs ssh

Recently, I’ve implemented several improvements for lxc exec. In case you didn’t know, lxc exec is LXD‘s client tool that uses the LXD client api to talk to the LXD daemon and execute any program the user might want. Here is a small example of what you can do with it:

asciicast

One of our main goals is to make lxc exec feel as similar to ssh as possible since this is the standard of running commands interactively or non-interactively remotely. Making lxc exec behave nicely was tricky.

1. Handling background tasks

A long-standing problem was certainly how to correctly handle background tasks. Here’s an asciinema illustration of the problem with a pre LXD 2.7 instance:

asciicast

What you can see there is that putting a task in the background will lead to lxc exec not being able to exit. A lot of sequences of commands can trigger this problem:

chb@conventiont|~
> lxc exec zest1 bash
root@zest1:~# yes &
y
y
y
.
.
.

Nothing would save you now. yes will simply write to stdout till the end of time as quickly as it can…
The root of the problem lies with stdout being kept open which is necessary to ensure that any data written by the process the user has started is actually read and sent back over the websocket connection we established.
As you can imagine this becomes a major annoyance when you e.g. run a shell session in which you want to run a process in the background and then quickly want to exit. Sorry, you are out of luck. Well, you were.
The first, and naive approach is obviously to simply close stdout as soon as you detect that the foreground program (e.g. the shell) has exited. Not quite as good as an idea as one might think… The problem becomes obvious when you then run quickly executing programs like:

lxc exec -- ls -al /usr/lib

where the lxc exec process (and the associated forkexec process (Don’t worry about it now. Just remember that Go + setns() are not on speaking terms…)) exits before all buffered data in stdout was read. In this case you will cause truncated output and no one wants that. After a few approaches to the problem that involved, disabling pty buffering (Wasn’t pretty I tell you that and also didn’t work predictably.) and other weird ideas I managed to solve this by employing a few poll() “tricks” (In some sense of the word “trick”.). Now you can finally run background tasks and cleanly exit. To wit:
asciicast

2. Reporting exit codes caused by signals

ssh is a wonderful tool. One thing however, I never really liked was the fact that when the command that was run by ssh received a signal ssh would always report -1 aka exit code 255. This is annoying when you’d like to have information about what signal caused the program to terminate. This is why I recently implemented the standard shell convention of reporting any signal-caused exits using the standard convention 128 + n where n is defined as the signal number that caused the executing program to exit. For example, on SIGKILL you would see 128 + SIGKILL = 137 (Calculating the exit codes for other deadly signals is left as an exercise to the reader.). So you can do:

chb@conventiont|~
> lxc exec zest1 sleep 100

Now, send SIGKILL to the executing program (Not to lxc exec itself, as SIGKILL is not forwardable.):

kill -KILL $(pidof sleep 100)

and finally retrieve the exit code for your program:

chb@conventiont|~
> echo $?
137

Voila. This obviously only works nicely when a) the exit code doesn’t breach the 8-bit wall-of-computing and b) when the executing program doesn’t use 137 to indicate success (Which would be… interesting(?).). Both arguments don’t seem too convincing to me. The former because most deadly signals should not breach the range. The latter because (i) that’s the users problem, (ii) these exit codes are actually reserved (I think.), (iii) you’d have the same problem running the program locally or otherwise.
The main advantage I see in this is the ability to report back fine-grained exit statuses for executing programs. Note, by no means can we report back all instances where the executing program was killed by a signal, e.g. when your program handles SIGTERM and exits cleanly there’s no easy way for LXD to detect this and report back that this program was killed by signal. You will simply receive success aka exit code 0.

3. Forwarding signals

This is probably the least interesting (or maybe it isn’t, no idea) but I found it quite useful. As you saw in the SIGKILL case before, I was explicit in pointing out that one must send SIGKILL to the executing program not to the lxc exec command itself. This is due to the fact that SIGKILL cannot be handled in a program. The only thing the program can do is die… like right now… this instance… sofort… (You get the idea…). But a lot of other signals SIGTERM, SIGHUP, and of course SIGUSR1 and SIGUSR2 can be handled. So when you send signals that can be handled to lxc exec instead of the executing program, newer versions of LXD will forward the signal to the executing process. This is pretty convenient in scripts and so on.

In any case, I hope you found this little lxc exec post/rant useful. Enjoy LXD it’s a crazy beautiful beast to play with. Give it a try online https://linuxcontainers.org/lxd/try-it/ and for all you developers out there: Checkout https://github.com/lxc/lxd and send us patches. 🙂 We don’t require any CLA to be signed, we simply follow the kernel style of requiring a Signed-off-by line. 🙂