Reading view

There are new articles available, click to refresh the page.

Linux Fu: The SSD Super Cache

NVMe solid state disk drives have become inexpensive unless you want the very largest sizes. But how do you get the most out of one? There are two basic strategies: you can use the drive as a fast drive for things you use a lot, or you can use it to cache a slower drive.

Each method has advantages and disadvantages. If you have an existing system, moving high-traffic directories over to SSD requires a bind mount or, at least, a symbolic link. If your main filesystem uses RAID, for example, then those files are no longer protected.

Caching sounds good, in theory, but there are at least two issues. You generally have to choose whether your cache “writes through”, which means that writes will be slow because you have to write to the cache and the underlying disk each time, or whether you will “write back”, allowing the cache to flush to disk occasionally. The problem is, if the system crashes or the cache fails between writes, you will lose data.

Compromise

For some time, I’ve adopted a hybrid approach. I have an LVM cache for most of my SSD that hides the terrible performance of my root drive’s RAID array. However, I have some selected high-traffic, low-importance files in specific SSD directories that I either bind-mount or symlink into the main directory tree. In addition, I have as much as I can in tmpfs, a RAM drive, so things like /tmp don’t hit the disks at all.

There are plenty of ways to get SSD caching on Linux, and I won’t explain any particular one. I’ve used several, but I’ve wound up on the LVM caching because it requires the least odd stuff and seems to work well enough.

This arrangement worked just fine and gives you the best of both worlds. Things like /var/log and /var/spool are super fast and don’t bog down the main disk. Yet the main disk is secure and much faster thanks to the cache setup. That’s been going on for a number of years until recently.

The Upgrade Issue

I recently decided to give up using KDE Neon on my main desktop computer and switch to OpenSUSE Tumbleweed, which is a story in itself. The hybrid caching scheme seemed to work, but in reality, it was subtly broken. The reason? SELinux.

Tumbleweed uses SELinux as a second level of access protection. On vanilla Linux, you have a user and a group. Files have permissions for a specific user, a specific group, and everyone else. Permission, in general, means if a given user or group member can read, write, or execute the file.

SELinux adds much more granularity to protection. You can create rules that, for example, allow certain processes to write to a directory but not read from it. This post, though, isn’t about SELinux fundamentals. If you want a detailed deep dive from Red Hat, check out the video below.

The Problem

The problem is that when you put files in SSD and then overlay them, they live in two different places. If you tell SELinux to “relabel” files — that is, put them back to their system-defined permissions, there is a chance it will see something like /SSD/var/log/syslog and not realize that this is really the same file as /var/log. Once you get the wrong label on a system file like that, bad, unpredictable things happen.

There is a way to set up an “equivalence rule” in SELinux, but there’s a catch. At first, I had the SSD mounted at /usr/local/FAST. So, for example, I would have /usr/local/FAST/var/log. When you try to equate /usr/local/FAST/var to /usr/var, you run into a problem. There is already a rule that /usr and /usr/local are the same. So you have difficulties getting it to understand that throws a wrench in the works.

There are probably several ways to solve this, but I took the easy way out: I remounted to /FAST. Then it was easy enough to create rules for /var/log to /FAST/var/log, and so on. To create an equivalence, you enter:


semanage fcontext -a -e /var/log /FAST/var/log

The Final Answer

So what did I wind up with? Here’s my current /etc/fstab:


UUID=6baad408-2979-2222-1010-9e65151e07be /              ext4    defaults,lazytime,commit=300 0 1
tmpfs                                     /tmp           tmpfs   mode=1777,nosuid,nodev 0 0
UUID=cec30235-3a3a-4705-885e-a699e9ed3064 /boot          ext4    defaults,lazytime,commit=300,inode_readahead_blks=64 0 2
UUID=ABE5-BDA4                            /boot/efi      vfat    defaults,lazytime 0 2
tmpfs                                       /var/tmp    tmpfs  rw,nosuid,nodev,noexec,mode=1777 0 0

<h1>NVMe fast tiers</h1>

UUID=c71ad166-c251-47dd-804a-05feb57e37f1 /FAST  ext4  defaults,noatime,lazytime  0  2
/FAST/var/log /var/log  none  bind,x-systemd.requires-mounts-for=/FAST 0 0
/FAST/usr/lib/sysimage/rpm /usr/lib/sysimage/rpm none bind,x-systemd.requires-mounts-for=/FAST 0 0
/FAST/var/spool /var/spool  none  bind,x-systemd.requires-mounts-for=/FAST 0 0

As for the SELinux rules:


/FAST/var/log = /var/log
/FAST/var/spool = /var/spool
/FAST/alw/.cache = /home/alw/.cache
/FAST/usr/lib/sysimage/rpm = /usr/lib/sysimage/rpm
/FAST/alw/.config = /home/alw/.config
/FAST/alw/.zen = /home/alw/.zen

Note that some of these don’t appear in /etc/fstab because they are symlinks.

A good rule of thumb is that if you ask SELinux to relabel the tree in the “real” location, it shouldn’t change anything (once everything is set up). If you see many changes, you probably have a problem:


restorecon -Rv /FAST/var/log

Worth It?

Was it worth it? I can certainly feel the difference in the system when I don’t have this setup, especially without the cache. The noisy drives quiet down nicely when most of the normal working set is wholly enclosed in the cache.

This setup has worked well for many years, and the only really big issue was the introduction of SELinux. Of course, for my purposes, I could probably just disable SELinux. But it does make sense to keep it on if you can manage it.

If you have recently switched on SELinux, it is useful to keep an eye on:


ausearch -m AVC -ts recent

That shows you if SELinux denied any access recently. Another useful command:


systemctl status setroubleshootd.service

Another good systemdstupid trick.” Often, any mysterious issues will show up in one of those two places. If you are on a single-user desktop, it isn’t a bad idea to retry any strange anomalies with SELinux turned off as a test: setenforce 0. If the problem goes away, it is a sure bet that something is wrong with the SELinux system.

Of course, every situation is different. If you don’t need RAID or a huge amount of storage, maybe just use an SSD as your root system and be done with it. That would certainly be easier. But, in typical Linux fashion, you can make of it whatever you want. We like that.

❌