tldr: Take me to the process
What Is a Kernel Module?
Have you ever wondered how your computer’s operating system manages to support a seemingly endless variety of devices, from your mouse and keyboard to your graphics card and Wi-Fi adapter? The secret lies in kernel modules, also known as Loadable Kernel Modules (LKMs). Think of a kernel module (kmod) as a plug-in for your computer’s brain, the kernel. It’s a piece of code that can be loaded into the operating system while it’s running, without requiring a full reboot. This dynamic nature is what makes modern operating systems so flexible and adaptable.
The most common use for kernel modules is as device drivers. When you plug in a new piece of hardware, the operating system doesn’t necessarily have the code to talk to it built-in. Instead, it can load a specific kernel module that contains the instructions for that device. This modular approach keeps the core kernel—the part of the OS that’s always running—small and efficient. When a module is needed, it’s loaded into the kernel’s memory, giving it direct, privileged access to system resources.
This approach offers significant advantages. By not including every possible feature in the core kernel, boot times are faster and the system uses less memory. Furthermore, developers can update or fix a specific module without the need to rebuild and reinstall the entire operating system. This makes maintenance and security updates much more streamlined. So, the next time you see your computer instantly recognize a new device, you can thank the power of kernel modules for that seamless experience.
Not all kernel drivers are dynamically loaded after the system boots. Many, particularly those for essential hardware like network cards or storage controllers, are compiled directly into the kernel itself. Drivers whose source code resides within the main Linux kernel source tree are known as in-tree drivers, and when you compile the kernel you can build each of them either as a loadable module (a .ko file) or bake it directly into the kernel’s main binary file, vmlinux. This is a crucial choice during system configuration. For instance, the drivers for most common network interface cards (NICs) are often built in to ensure the system has network connectivity from the very beginning of the boot process, which is essential for tasks like fetching updates or logging into a remote server. This is in contrast to out-of-tree modules, which are developed and compiled separately from the main kernel source and must be loaded dynamically after the system has booted.
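To see this distinction on a running Linux system, you can check how a given driver was configured for the running kernel. A quick illustration (the config option names CONFIG_SFC and CONFIG_E1000E are just examples; options and config file paths vary by driver and distribution):
# "=y" means the driver is baked into vmlinux, "=m" means it was built as a loadable .ko module
grep -E 'CONFIG_SFC=|CONFIG_E1000E=' /boot/config-$(uname -r)
# List the modules currently loaded into the running kernel
lsmod | head
# Show details for one module, including its .ko path and the kernel it was built against
modinfo sfc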
Enter the Solarflare NIC
A Solarflare network card, now an AMD product, is a high-performance network interface card (NIC) designed for applications where every microsecond counts. Unlike standard NICs, which are built for general-purpose use, Solarflare cards are optimized for ultra-low latency, high message rates, and high-speed data transfer. This makes them a staple in specialized environments such as electronic trading, financial services, and high-performance computing.
The core of their technology is the Onload kernel module, a software component that works with the NIC hardware to bypass the traditional Linux networking stack. Instead of packets and data having to pass through multiple layers of the kernel, Onload allows applications to directly access the card’s network buffers. This “kernel bypass” dramatically reduces latency and CPU overhead. Solarflare cards are known for their sub-microsecond latency and near-zero jitter, which are essential for applications like algorithmic trading, where minimizing the time it takes to process a transaction can mean the difference between profit and loss.
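In practice, applications opt in to this bypass by running under the onload wrapper, which preloads the acceleration library so the application’s sockets are serviced in user space. A minimal illustration (the application name is a placeholder):
# Run an application with Onload acceleration (its TCP/UDP sockets bypass the kernel stack)
onload ./my_trading_app
# Roughly equivalent: preload the Onload library directly
LD_PRELOAD=libonload.so ./my_trading_app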
The Problem
Since the Onload kernel module is not included in the standard Linux kernel distribution, using a Solarflare network card requires a multi-step installation process. First, you must obtain the Onload source code from AMD’s website or a repository. Next, you need to compile the module against your specific Linux kernel. This is a crucial step because kernel modules are tightly coupled with the kernel version they’re built for. The compilation process ensures compatibility by using your system’s kernel headers to create a .ko file—the compiled kernel object. Finally, after a successful build, you must install the module onto your system using a tool like onload_install or by manually loading it and managing its dependencies with depmod.
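On a plain RHEL host, the manual flow looks roughly like this. This is an illustrative sketch only; the exact archive names, spec file, and helper scripts vary by Onload release, so follow the README that ships with the package:
# Unpack the release package and rebuild the SRPM against the running kernel's headers
unzip sf-*-openonload-srpm-release-package.zip
rpmbuild --rebuild *.src.rpm
# Install the resulting kmod RPM, then register and load the new .ko files
sudo rpm -ivh ~/rpmbuild/RPMS/x86_64/onload-kmod-*.rpm
sudo depmod -a
sudo modprobe onload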
The Complication
This compile-and-install process must be repeated whenever the Linux kernel is updated in order to keep the Solarflare hardware working. On a traditional Red Hat Enterprise Linux (RHEL) host, we could use Dynamic Kernel Module Support (DKMS), a framework designed to automate the otherwise tedious, manual process of rebuilding and installing out-of-tree kernel modules every time the kernel is updated. Without DKMS, a kernel update would break drivers for devices like specialized network cards or certain graphics cards, forcing the user to manually recompile and reinstall the modules to restore functionality. DKMS eliminates this problem by managing module source code and automatically handling the build process.
The process begins when a module’s source code is first installed into a specific location on the system, along with a configuration file that contains the build instructions. The module is then registered with the DKMS framework. From that point on, DKMS takes over. It works by integrating with the distribution’s package manager, such as dnf on RHEL. When a new kernel is installed via a package update, a special “hook” is triggered. This hook tells DKMS that a new kernel is present and instructs it to rebuild all registered modules. DKMS then reads the build instructions, compiles the modules against the new kernel’s headers, and installs the newly created kernel objects (.ko files) into the correct directory, ensuring a seamless transition and preserving hardware functionality across upgrades.
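As a concrete illustration of that flow, registering a module with DKMS typically looks like the following (the module name and version are placeholders; the source and its dkms.conf are assumed to live under /usr/src/example-module-1.0):
# Register the source tree (and its dkms.conf build instructions) with DKMS
sudo dkms add -m example-module -v 1.0
# Compile against the currently running kernel's headers
sudo dkms build -m example-module -v 1.0
# Install the resulting .ko into /lib/modules and run depmod
sudo dkms install -m example-module -v 1.0
# Show which kernels each registered module has been built for
dkms status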
The Problem with Immutability
On traditional RHEL, you have full control over the host’s operating system, but OpenShift is fundamentally different. OpenShift nodes run on Red Hat CoreOS (RHCOS), a specialized, immutable, and minimal operating system designed specifically for running containers. This immutability is the primary reason why manual kernel module management with tools like DKMS doesn’t work.
RHCOS nodes are not like standard RHEL servers where you can simply install packages and modify the system at will. They operate on a read-only root filesystem. This means you cannot use commands like dnf install to add packages or make to compile new code directly on the host. The entire OS is treated as a single, atomic unit that is updated by swapping out the whole image, not by applying patches or individual package updates. This design enhances security and consistency across the cluster, preventing configuration drift and making updates more reliable.
The OpenShift Way: Machine Config Operator and Kernel Module Management Operator
Red Hat provides a different, Kubernetes-native approach to managing node configurations and kernel modules.
- Machine Config Operator (MCO): The MCO is a core component of OpenShift that manages the entire lifecycle of the operating system on each node. It takes high-level declarative configurations (called MachineConfigs) and applies them to nodes by generating and applying a new, complete OS image. Any changes you want to make, such as adding a new user, a file, or a systemd service, must be done through a MachineConfig object. You cannot just SSH into a node and make changes because they would be lost on the next reboot or OS update.
- Kernel Module Management (KMM) Operator: For out-of-tree kernel modules, the KMM Operator is the officially supported and recommended method. Instead of compiling the module on each node, you first build a container image that contains the compiled .ko file. This container is typically built using a “driver-toolkit” image that has the kernel headers and build tools matching the RHCOS kernel. You then use a custom resource (CR) to tell the KMM Operator which nodes need the module. The KMM Operator handles the rest, compiling and signing the kernel modules, automatically deploying the container, extracting the module, loading it onto the host, and ensuring it’s available after every kernel update, just like DKMS would.
This container-centric approach aligns with OpenShift’s philosophy, treating the kernel module as a deployable artifact rather than a manually managed system component. It provides a secure, reliable, and scalable way to manage specialized hardware drivers across a large, dynamic cluster.
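To make the MachineConfig side of this concrete, here is a minimal, illustrative MachineConfig that drops a file onto all worker nodes. The name, path, and contents are placeholders and have nothing to do with Onload; it only shows the declarative shape the MCO expects:
cat <<'EOF' | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-example-file
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/example.conf
          mode: 420
          contents:
            source: data:,managed%20by%20the%20MCO
EOF
The MCO rolls a change like this out to the worker pool, rebooting nodes one at a time. Kernel modules themselves, however, are handled by KMM as described above.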
The Implementation: Prerequisites
This implementation assumes the following:
- An OpenShift cluster (4.19 used for this writeup)
- Kernel Module Management operator is installed
- Node Feature Discovery Operator is installed
- NFD has labeled the nodes with Solarflare network cards with feature.node.kubernetes.io/pci-1924.present=true, as expected
- OpenShift internal registry is configured and accessible
- Username/password for user with cluster-admin on the OpenShift cluster
- Bastion host running RHEL 9.6
- A Red Hat account with valid subscriptions for the required products
Everything from here on out will be done from the bastion host. From the bastion host, make sure you can log in to all the necessary systems.
# Login to your cluster
oc login --server=https://api.<cluster_suffix>:6443 -u kubeadmin -p <password>
# Login to OpenShift's exposed internal registry (change cluster and domain values)
podman login -u kubeadmin -p $(oc whoami -t) default-route-openshift-image-registry.apps.<cluster_suffix> --tls-verify=false
# Login to registry.redhat.io from local Podman environment
podman login registry.redhat.io -u <user> -p <password>
# Clone the repository to use as a working directory
git clone https://github.com/openshift-tigerteam/onload-kmm.git
cd onload-kmm
Download the OpenOnload SRPM Release Package
- Download the OpenOnload SRPM Release Package to the project root folder. Example:
curl --http2 -O https://www.xilinx.com/content/dam/xilinx/publications/solarflare/onload/openonload/9_0_2_47/sf-122450-ls-17-openonload-srpm-release-package.zip
- Note the version as <srpm_version> for later use. Example: 9.0.2.140
Create OpenShift Project
Create a new project for the onload-kmm work. Create the service account and add the privileged SCC to it. Copy the entitlement certs from openshift-config-managed to the new project so that the build process can access the RHEL CRB repository. This is one of the gotchas of this particular implementation and is the main reason the Driver Toolkit (DTK) is not used in the builder: the DTK image has no references to the RHEL CRB repository, and subscription-manager doesn’t allow you to make those kinds of changes during an image build.
# Create project
oc new-project onload-kmm
# Add privileged RBAC for the onload-kmm-sa service account
oc apply -f onload-kmm-sa.yaml -n onload-kmm
# Copy the entitlement certs from openshift-config-managed to onload-kmm project
# This is required to access the RHEL CRB repository during the build process
oc get secret etc-pki-entitlement -n openshift-config-managed -o yaml | \
sed 's/namespace: openshift-config-managed/namespace: onload-kmm/' | \
sed '/resourceVersion:/d; /uid:/d; /creationTimestamp:/d' | \
oc apply -f -
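A quick sanity check that the entitlement secret actually landed in the new project before any builds run:
oc get secret etc-pki-entitlement -n onload-kmm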
Creating and Importing the Machine Owner Key (MOK) for Onload
When Secure Boot is enabled, a system’s firmware only allows code that is cryptographically signed with a trusted key to be executed during the boot process. This is a crucial security feature that prevents malicious software, such as rootkits, from compromising the system at a low level before the operating system even starts. RHCOS, as an immutable OS with a focus on security, takes this a step further by requiring all kernel modules—including out-of-tree modules—to be signed. If a module isn’t signed or the signature doesn’t match a trusted key, the kernel will refuse to load it.
The Machine Owner Key (MOK) is a mechanism that allows an administrator to enroll their own public signing key into the system’s UEFI firmware. This enables the system to trust and load custom-built kernel modules, like the AMD Solarflare Onload driver, that have been signed with the corresponding private key. Without MOK and module signing, the high-security posture of RHCOS would prevent any third-party kernel code from running, effectively blocking the use of specialized hardware.
Create the keys for the Machine Owner Key (MOK) for Onload and all the formats.
# Generate a new private key and a self-signed X.509 certificate in DER format (valid ~100 years)
openssl req -new -x509 -newkey rsa:2048 -keyout mok-onload.priv -outform DER -out mok-onload.der -nodes -days 36500 -subj "/CN=OnloadModule/"
# Convert the DER certificate to PEM format for the KMM signing step
openssl x509 -in mok-onload.der -inform DER -out mok-onload.pem -outform PEM
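Optionally, you can inspect what was just generated before wiring it into secrets:
# Confirm the subject and validity window of the signing certificate
openssl x509 -in mok-onload.der -inform DER -noout -subject -dates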
Create secrets from the generated key and certificate in the onload-kmm project; KMM will use them to sign the kernel modules.
oc create secret generic onload-signing-key \
--from-file=key=mok-onload.priv \
-n onload-kmm --dry-run=client -o yaml | oc apply -f -
oc create secret generic onload-signing-cert \
--from-file=cert=mok-onload.pem \
-n onload-kmm --dry-run=client -o yaml | oc apply -f -
Importing the Key: Begin Tedious Loop Here
This part is tedious. If your company does key rotations, you will need to figure out how to automate this piece of the puzzle. Copy the key to the host (DER format) and then import the key into MOK. Do this for every host where the kernel driver needs to be present.
scp mok-onload.der core@<node_ip>:/var/home/core/
oc debug node/<node_name>
chroot /host
sudo mokutil --import /var/home/core/mok-onload.der
# Enter a one-time password when prompted (you will re-enter it at the MOK screen after reboot)
exit
exit
Reboot the host to complete the MOK enrollment.
oc adm drain --ignore-daemonsets --delete-emptydir-data <node_name>
oc debug node/<node_name>
chroot /host
sudo reboot
When the host reboots, the blue MOK management screens come up around the time of the GRUB menu. They’re on a timer, so you have to be quick and hit a key to get into the MOK management screen. Then follow these steps:
- [Shim UEFI Key Management]: Press any key to perform MOK management.
- [Perform MOK Management]: Select Enroll MOK.
- [Enroll MOK]: Select Continue.
- Enroll the key(s)?: Select Yes.
- Enroll the key(s)?: Enter the password.
- [Perform MOK Management]: Select Reboot.
- The system reboots.
- Wait for the node to be available.
After the node reboots and is available, uncordon it from the bastion host.
oc adm uncordon <node_name>
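Optionally, confirm the key was actually enrolled before moving on to the next node (mokutil is already present on the node, since it was used for the import above):
oc debug node/<node_name> -- chroot /host mokutil --test-key /var/home/core/mok-onload.der
# "is already enrolled" in the output confirms the MOK enrollment succeeded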
> End Tedious Loop 🙂
KMM – Building the SRPM source container image
We are going to use OpenShift’s build capabilities to create our SRPM source image. The reason we have to do this is that the SRPM for Onload is huge and doesn’t fit in a ConfigMap. You could use object storage instead, but this solution feels right, especially with the tagging.
Let’s take a quick look at the Dockerfile.onload-srpm to see that there isn’t much there. Just a quick unzip and copy.
FROM registry.redhat.io/ubi9/ubi-minimal:latest
# Install the minimum tools we need
RUN microdnf install -y unzip && microdnf clean all
# Copy in your SRPM zip
COPY *-openonload-srpm-release-package.zip /root/openonload-srpm-release-package.zip
# Unpack it
RUN unzip /root/openonload-srpm-release-package.zip -d /root/ \
&& rm -f /root/openonload-srpm-release-package.zip
# Default to a shell (or adjust if you want this to build/run something directly)
CMD ["/bin/bash"]
Build the SRPM source container image. The tag should match the <srpm_version> of the Onload package you downloaded. The build uses the Dockerfile.onload-srpm Dockerfile.
# Create the build config
oc new-build --name=onload-srpm --strategy=docker --binary -n onload-kmm
# Run the build
tar --transform='s|Dockerfile.onload-srpm|Dockerfile|' \
-czf - Dockerfile.onload-srpm *.zip \
| oc start-build onload-srpm --from-archive=- -F -n onload-kmm
# Tag the image to the internal registry with the version of the srpm
oc tag onload-kmm/onload-srpm:latest onload-kmm/onload-srpm:<srpm_version> -n onload-kmm
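You can confirm the tagged SRPM image is present in the internal registry before moving on:
oc get imagestreamtags -n onload-kmm | grep onload-srpm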
KMM Build Process
The Kernel Module Management operator has a CR named Module that defines how the kernel module is built from the SRPM and deployed. The build itself is deferred to a Dockerfile stored in a ConfigMap. Here’s the build Dockerfile in the ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: onload-kmm-build-dockerfile
  namespace: onload-kmm
data:
  dockerfile: |
    ARG KERNEL_FULL_VERSION
    ARG SRPM_IMAGE

    FROM ${SRPM_IMAGE} AS srpm

    FROM registry.redhat.io/ubi9/ubi:latest AS builder
    ARG KERNEL_FULL_VERSION
    RUN echo "KERNEL_FULL_VERSION=${KERNEL_FULL_VERSION}"

    # Bring in SRPM
    COPY --from=srpm /root/*.src.rpm /root/

    # Expand and build SRPM
    RUN rpm -ivh /root/*.src.rpm
    RUN dnf builddep -y /root/rpmbuild/SPECS/openonload.spec && \
        dnf install -y \
          rpm-build \
          kernel-devel-${KERNEL_FULL_VERSION} \
          kernel-headers-${KERNEL_FULL_VERSION} \
          kmod \
        && dnf clean all
    RUN rpmbuild -bb /root/rpmbuild/SPECS/openonload.spec
    RUN rpm -ivh --nodeps --noscripts /root/rpmbuild/RPMS/x86_64/onload-kmod-*.rpm

    FROM registry.redhat.io/ubi9/ubi-minimal:latest
    ARG KERNEL_FULL_VERSION
    RUN microdnf install -y kmod \
        && microdnf clean all

    # Preserve the directory structure KMM expects
    COPY --from=builder /lib/modules/${KERNEL_FULL_VERSION} /opt/lib/modules/${KERNEL_FULL_VERSION}
    RUN ls -laR /opt/lib/modules/${KERNEL_FULL_VERSION}
    RUN depmod -b /opt ${KERNEL_FULL_VERSION}
    RUN sleep 10
    CMD ["/bin/bash"]
Notes:
- The KERNEL_FULL_VERSION is a built-in ARG passed by KMM as part of the build process.
- The SRPM_IMAGE argument is passed via config. More on this later.
- We are using DNF in two different ways: first, dnf builddep reads openonload.spec to enumerate and install the known build dependencies; second, dnf install pulls in the tools needed for the RPM build itself, specifically rpm-build and the kernel headers. Remember, we HAVE to build against the exact kernel.
Apply the ConfigMap yaml.
oc apply -f onload-kmm-build-dockerfile.cm.yaml -n onload-kmm
Create the KMM Module CR
This is the KMM Module CR which ties everything together: it drives the build, the signing of the .ko files, and the resulting DaemonSet that installs them onto the identified nodes.
apiVersion: kmm.sigs.x-k8s.io/v1beta1
kind: Module
metadata:
  name: onload
  namespace: onload-kmm
spec:
  selector:
    # kubernetes.io/hostname: "<node_name>"
    # node-role.kubernetes.io/worker: ""
    # feature.node.kubernetes.io/pci-1924.present: "true"
  moduleLoader:
    serviceAccountName: onload-kmm-sa
    container:
      modprobe:
        moduleName: sfc
        modulesLoadingOrder:
          - sfc
          - sfc_resource
          - sfc_char
          - onload
        dirName: /opt
      lifecycle:
        preStart:
          exec:
            command:
              - /bin/bash
              - -c
              - |
                echo "Pre-loading MTD module..."
                modprobe mtd
                echo "MTD module loaded, ready for sfc"
      kernelMappings:
        - regexp: '^.+$'
          containerImage: >-
            image-registry.openshift-image-registry.svc:5000/onload-kmm/onload-kmod:<srpm_version>-${KERNEL_FULL_VERSION}
          registryTLS:
            insecureSkipTLSVerify: true
          build:
            dockerfileConfigMap:
              name: onload-kmm-build-dockerfile
              key: dockerfile
            buildArgs:
              - name: SRPM_IMAGE
                value: image-registry.openshift-image-registry.svc:5000/onload-kmm/onload-srpm:<srpm_version>
            secrets:
              - name: etc-pki-entitlement
          sign:
            keySecret:
              name: onload-signing-key
            certSecret:
              name: onload-signing-cert
            filesToSign:
              - /opt/lib/modules/${KERNEL_FULL_VERSION}/extra/onload.ko
              - /opt/lib/modules/${KERNEL_FULL_VERSION}/extra/sfc.ko
              - /opt/lib/modules/${KERNEL_FULL_VERSION}/extra/sfc_char.ko
              - /opt/lib/modules/${KERNEL_FULL_VERSION}/extra/sfc_resource.ko
You must edit this manifest to match your environment.
- Edit this file to add your version of the SRPM container image created above (<srpm_version>).
- Change the node selector to match your environment. If you are using the feature.node.kubernetes.io/pci-1924.present=true label from NFD, make sure that the nodes you want to install the module on have that label.
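If you are using the NFD label as the selector, a quick way to confirm which nodes will be targeted before applying:
oc get nodes -l feature.node.kubernetes.io/pci-1924.present=true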
oc apply -f onload.module.yaml -n onload-kmm
Watch the Process: An Explanation of KMM
Applying the module kicks off the process. Here’s how it works.
- KMM matches the kernel against regexp: '^.+$' to see if this mapping even applies here. In our case, we actually use the presence of the image to manage the complexity.
- After matching, it checks whether the containerImage defined exists in the registry. If it doesn’t, it kicks off the build. The inclusion of the ${KERNEL_FULL_VERSION} variable in the name of the image is the key to making sure KMM builds a new image every time OpenShift updates. But this build doesn’t create the final image, because the .ko files aren’t signed yet; it creates an intermediate image with the unsigned kmods. If the image already exists, the build is skipped.
- You can watch the build happen as a pod in the onload-kmm namespace. If the build fails, you can iterate by updating the build Dockerfile ConfigMap and then deleting and re-applying the Module CR to the project.
- After the build, KMM takes that intermediate image with the unsigned kmods and kicks off another pod to grab the files, sign them, and place them in the final image.
- Once the final image is present, KMM deploys a pod to every node matching the selector defined in the Module CR and loads the kernel modules.
This process happens every time a node joins the cluster. Reboot the node? It runs. Upgrade the cluster? Remember, nodes are drained one by one and the entire OS image is replaced, so after a node reboots, the KMM process runs, sees that the kernel version has changed, builds a new set of kmods, and installs them.
Links
- Github Repo: https://github.com/openshift-tigerteam/onload-kmm
- Kernel Module Management Operator: https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html/specialized_hardware_and_driver_enablement/kernel-module-management-operator
- Solarflare Onload Documentation: https://www.xilinx.com/support/download/nic-software-and-drivers.html