ublk (or ubd) is a generic framework for implementing userspace block devices based on io_uring. It is designed to create a highly efficient data path for userspace storage software to provide high-performance block device service on the local host.
The whole ublk service involves three parts: ublk driver, ublk server and ublk workload.
The ublk driver delivers I/O requests from a ublk block device (/dev/ublkbN) into a ublk server. SPDK ublk target acts as a ublk server. It can handle ublk I/O requests within the whole SPDK userspace storage software stack.
A typical usage scenario is container attached storage.
Previously, designs that put I/O processing logic into userspace software always had a noticeable interaction overhead between the kernel module and the userspace part. ublk utilizes io_uring, which has been proven to be very efficient in decreasing the interaction overhead. The I/O request is delivered to the userspace ublk server via the newly added io_uring command. A shared buffer, mapped via mmap, is used by the kernel driver to share I/O descriptors with userspace. The I/O data is copied only once by the ublk driver, between the specified userspace buffer address and the request/bio's pages.
A control device is created by the ublk kernel module at /dev/ublk-control. The userspace server sends control commands to the kernel module via the control device using io_uring.
Control commands include adding, configuring, and starting a new ublk block device, as well as retrieving device information and stopping and deleting an existing ublk block device.
The add device command creates a ublk char device /dev/ublkcN. It will be used by the ublk userspace server to mmap the I/O descriptor buffer. The start device command exposes a ublk block device /dev/ublkbN. The block device can be formatted and mounted by a kernel filesystem, or read/written directly by other processes.
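The lifecycle described above can be pictured with a small toy model. This is only a sketch of which device nodes are visible after each control command. The class and method names are hypothetical, no kernel interface is touched, and it assumes that deleting a device also removes a still-running block device.

```python
class UblkDeviceModel:
    """Toy model (hypothetical) of the device nodes visible after each control command."""

    def __init__(self, dev_id):
        self.dev_id = dev_id
        self.state = "none"  # none -> added -> started

    def add(self):
        if self.state != "none":
            raise RuntimeError("device already added")
        self.state = "added"  # the char device now exists

    def start(self):
        if self.state != "added":
            raise RuntimeError("device must be added before it is started")
        self.state = "started"  # the block device is now exposed

    def stop(self):
        if self.state != "started":
            raise RuntimeError("device is not started")
        self.state = "added"

    def delete(self):
        if self.state == "none":
            raise RuntimeError("no such device")
        # Assumption: deleting also tears down a running block device.
        self.state = "none"

    def nodes(self):
        """Return the device nodes that exist in the current state."""
        found = []
        if self.state in ("added", "started"):
            found.append(f"/dev/ublkc{self.dev_id}")
        if self.state == "started":
            found.append(f"/dev/ublkb{self.dev_id}")
        return found


dev = UblkDeviceModel(1)
dev.add()
after_add = dev.nodes()
dev.start()
after_start = dev.nodes()
print(after_add, after_start)
```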
The datapath between the ublk server and the kernel driver consists of io_uring and a shared memory buffer. The shared memory buffer is an array of I/O descriptors. Each SQE (Submission Queue Entry) in io_uring is assigned one I/O descriptor and one user buffer address. When the ublk kernel driver receives I/O requests from the upper layer, it fills the information about each request into an I/O descriptor. The I/O data is copied between the specified user buffer address and the request/bio's pages at the proper time.
At start, the ublk server needs to fill the io_uring SQ (Submission Queue). Each SQE is marked with the operation flag UBLK_IO_FETCH_REQ, which means the SQE is ready to receive an I/O request.
When a CQE (Completion Queue Entry) indicating an I/O request is returned from io_uring, the ublk server gets the position of the I/O descriptor from the CQE. The ublk server handles the I/O request based on the information in the I/O descriptor.
After the ublk server completes the I/O request, it updates the I/O's completion status and the ublk operation flag. This time, the operation flag is UBLK_IO_COMMIT_AND_FETCH_REQ, which informs the kernel module that one I/O request is completed and that the SQE slot is free to fetch a new I/O request.
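The fetch, handle, and commit cycle can be illustrated with a toy model. The sketch below does not use io_uring or the ublk driver. ToyUblkQueue and serve are hypothetical names, the completion queue is a plain deque of descriptor positions, and a completion status of 0 is assumed to mean success.

```python
from collections import deque

FETCH_REQ = "UBLK_IO_FETCH_REQ"
COMMIT_AND_FETCH_REQ = "UBLK_IO_COMMIT_AND_FETCH_REQ"


class ToyUblkQueue:
    """Stand-in for the kernel side of one ublk queue (not real io_uring)."""

    def __init__(self, depth):
        self.descriptors = [None] * depth  # shared I/O descriptor array, indexed by tag
        self.armed = set()                 # tags whose SQE is waiting for a request
        self.pending = deque()             # block requests not yet delivered
        self.cq = deque()                  # completion queue: tags holding a new request
        self.completed = []                # (request, status) pairs committed by the server

    def submit(self, tag, op, status=None):
        """Server side: submit the SQE that belongs to `tag`."""
        if op == COMMIT_AND_FETCH_REQ:
            # Commit the finished request, then fall through to re-arm the slot.
            self.completed.append((self.descriptors[tag], status))
            self.descriptors[tag] = None
        elif op != FETCH_REQ:
            raise ValueError(f"unknown operation: {op}")
        self.armed.add(tag)
        self._deliver()

    def incoming(self, request):
        """Kernel side: a block request arrives from the upper layer."""
        self.pending.append(request)
        self._deliver()

    def _deliver(self):
        # Fill a descriptor and post a CQE while requests and free slots both exist.
        while self.pending and self.armed:
            tag = min(self.armed)
            self.armed.remove(tag)
            self.descriptors[tag] = self.pending.popleft()
            self.cq.append(tag)


def serve(queue, handler):
    """Drain the completion queue: handle each request and re-arm its slot."""
    handled = 0
    while queue.cq:
        tag = queue.cq.popleft()            # the CQE carries the descriptor position
        request = queue.descriptors[tag]
        queue.submit(tag, COMMIT_AND_FETCH_REQ, handler(request))
        handled += 1
    return handled


q = ToyUblkQueue(depth=2)
for tag in range(2):                        # at start: fill the SQ with fetch commands
    q.submit(tag, FETCH_REQ)
for req in ("read 0", "write 8", "read 16"):
    q.incoming(req)
handled = serve(q, lambda request: 0)       # every request completes with status 0
print(handled, q.completed)
```

With a depth of 2, the third request waits until the first slot is committed and re-armed. This is the behavior the combined commit-and-fetch flag is meant to provide.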
UBLK_IO_COMMIT_AND_FETCH_REQ is designed for efficiency in ublk. At runtime, the ublk server needs to commit I/O results back and then provide new free SQE slots for fetching new I/O requests. Without the UBLK_IO_COMMIT_AND_FETCH_REQ flag, io_uring_submit() would have to be called twice: once for committing I/O results back, and once for providing free SQE slots. With the UBLK_IO_COMMIT_AND_FETCH_REQ flag, calling io_uring_submit() once is enough, because the ublk driver understands that the submitted SQEs are reused both for committing back I/O results and for fetching new requests.
SPDK ublk target is implemented as a high performance ublk server.
It creates one ublk spdk_thread on each spdk_reactor by default, or on user-specified reactors. When adding a new ublk block device, SPDK ublk target assigns the queues of the ublk block device to ublk spdk_threads in round-robin order. That means one ublk device queue is only ever processed by one spdk_thread. One ublk device with multiple queues can get multiple SPDK reactors involved in processing its I/O requests, and one spdk_thread created by ublk target may process multiple queues, each from a different ublk device. In this way, SPDK reactors can be fully utilized to achieve the best performance, even when there are only a few ublk devices.
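As a rough illustration of the round-robin placement, the sketch below maps (device, queue) pairs to thread indices. QueueAssigner is a hypothetical name. The sketch assumes the round-robin cursor carries over from one device to the next, which the text above does not state explicitly.

```python
class QueueAssigner:
    """Hypothetical model of round-robin queue placement across ublk spdk_threads."""

    def __init__(self, num_threads):
        if num_threads < 1:
            raise ValueError("at least one thread is required")
        self.num_threads = num_threads
        self.next_thread = 0  # assumption: the cursor persists across devices

    def add_device(self, dev_id, num_queues):
        """Return {(dev_id, queue_id): thread_index} for a newly added device."""
        placement = {}
        for queue_id in range(num_queues):
            placement[(dev_id, queue_id)] = self.next_thread
            self.next_thread = (self.next_thread + 1) % self.num_threads
        return placement


assigner = QueueAssigner(num_threads=4)
first = assigner.add_device(dev_id=1, num_queues=2)
second = assigner.add_device(dev_id=2, num_queues=3)
print(first)
print(second)
```

In this example, thread 0 ends up serving one queue from each device, while each individual queue is still bound to exactly one thread.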
ublk is io_uring based. All ublk I/O queues are mapped to io_uring instances. A ublk spdk_thread gets I/O requests from available CQEs by polling all of its assigned io_urings. When there are completed I/O requests, the ublk spdk_thread submits them back to io_uring as SQEs in batches.
Currently, the ublk driver has a system thread context limitation: one ublk device queue can only be processed in the context of the system thread that initialized it. As a result, SPDK can't move a ublk spdk_thread between different SPDK reactors. In other words, the SPDK dynamic scheduler can't rebalance ublk workload by rescheduling ublk spdk_threads.
Build SPDK with SPDK ublk target enabled.
SPDK ublk target related libraries will then be linked into the SPDK application spdk_tgt. Set up some hugepages for SPDK, and then run the SPDK application spdk_tgt.
Once spdk_tgt is initialized, the user can enable the SPDK ublk feature by creating the ublk target. However, before creating the ublk target, the ublk kernel module ublk_drv should be loaded using modprobe.
SPDK bdevs are block devices which will be exposed to the local host kernel as ublk block devices. SPDK supports several different types of storage backends, including NVMe, Linux AIO, malloc ramdisk and Ceph RBD. Refer to the Block Device User Guide for additional information on configuring SPDK storage backends.
This guide will use a malloc bdev (ramdisk) named Malloc0. The following RPC will create a 256MB malloc bdev with 512-byte block size.
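As a hedged sketch, the request could be expressed as a JSON-RPC 2.0 payload like the one built below. The method name bdev_malloc_create and the parameter names name, num_blocks and block_size are assumptions based on SPDK's JSON-RPC interface and should be checked against the installed SPDK version. The helper function itself is hypothetical. The block count is simply the total size divided by the block size.

```python
import json


def bdev_malloc_create_request(name, size_mb, block_size, request_id=1):
    """Build a JSON-RPC payload (assumed schema) for creating a malloc bdev."""
    total_bytes = size_mb * 1024 * 1024
    if total_bytes % block_size != 0:
        raise ValueError("size must be a multiple of the block size")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "bdev_malloc_create",          # assumed method name
        "params": {
            "name": name,
            "num_blocks": total_bytes // block_size,
            "block_size": block_size,
        },
    }


malloc_request = bdev_malloc_create_request("Malloc0", size_mb=256, block_size=512)
print(json.dumps(malloc_request, indent=2))
```

For 256MB and 512-byte blocks this yields 524288 blocks.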
The following RPC will create a ublk block device exposing Malloc0 bdev. The created ublk block device has ID 1. It internally has 2 queues with queue depth 128.
The RPC replies with the ID of the ublk block device.
The path of the ublk block device is determined by its ID. It is created at /dev/ublkb${ID}, so the device we just created will be accessible to other processes via /dev/ublkb1. Applications like FIO or DD can now work on /dev/ublkb1 directly.
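To tie the last two paragraphs together, the sketch below builds a request for the device described above (Malloc0, ID 1, 2 queues, queue depth 128) and derives the device node from the ID. The method name ublk_start_disk and the parameter names bdev_name, ublk_id, num_queues and queue_depth are assumptions based on SPDK's JSON-RPC interface. Both helper functions are hypothetical.

```python
import json


def ublk_start_disk_request(bdev_name, ublk_id, num_queues, queue_depth, request_id=1):
    """Build a JSON-RPC payload (assumed schema) for exposing a bdev as a ublk device."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "ublk_start_disk",             # assumed method name
        "params": {
            "bdev_name": bdev_name,
            "ublk_id": ublk_id,
            "num_queues": num_queues,
            "queue_depth": queue_depth,
        },
    }


def ublk_block_device_path(ublk_id):
    """The block device node is named after the device ID."""
    return f"/dev/ublkb{ublk_id}"


start_request = ublk_start_disk_request("Malloc0", ublk_id=1, num_queues=2, queue_depth=128)
device_path = ublk_block_device_path(start_request["params"]["ublk_id"])
print(json.dumps(start_request, indent=2))
print(device_path)
```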
A ublk block device is a generic kernel block device that can be formatted and mounted by a kernel filesystem.
After use, a ublk block device can be stopped and deleted with the RPC ublk_stop_disk and its ID. Specifying ID 1 removes the device /dev/ublkb1.
If ublk is no longer needed, the SPDK ublk target can be destroyed to free the related SPDK resources.
The SPDK ublk target and all ublk block devices are also destroyed automatically when the SPDK application terminates.