Test environment for this article: Ubuntu 16.04.

Running systemd directly in a plain container does not work

Running systemd directly in a plain container produces no output, and systemd does not work properly:

docker run --name test -ti --rm hanyong/ubuntu:16.04 /sbin/init

Inspecting the container from inside:

# ll /sbin/init
lrwxrwxrwx 1 root root 20 Mar  8 17:51 /sbin/init -> /lib/systemd/systemd

# env
HOSTNAME=9dab51b984ad
TERM=xterm
LS_COLORS=......
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
SHLVL=1
HOME=/root
_=/usr/bin/env

root@9dab51b984ad:/
# ps -ef f
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
root         7     0  0 09:16 ?        Ss     0:00 /bin/bash
root        18     7  0 09:16 ?        R+     0:00  \_ ps -ef f
root         1     0  0 09:15 ?        Ss     0:00 /sbin/init

root@9dab51b984ad:/
# systemctl status
Failed to connect to bus: No such file or directory
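
The "Failed to connect to bus" error above is consistent with systemd never having created its control socket. The following is a minimal sketch for checking that, assuming systemctl run as root reaches PID 1 through /run/systemd/private. The function name systemd_socket_status and its optional root-directory argument are hypothetical and only there for illustration.

```shell
#!/bin/sh
# Report whether systemd's private control socket exists under a given root.
# Assumption: systemctl (as root) talks to PID 1 via /run/systemd/private.
# The function name and the optional root argument are illustrative only.
systemd_socket_status() {
    root="${1:-}"
    if [ -S "$root/run/systemd/private" ]; then
        echo "present"
    else
        echo "missing"
    fi
}
```

Inside the container, calling systemd_socket_status with no argument checks the real /run. A "missing" result would match the systemctl error shown above.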

Conditions systemd needs to work properly in a container

Reference: What is the scoop on running systemd in a container?

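Both systemd and docker needed some adaptation to work together (recent versions already include the necessary changes). Beyond that, systemd needs the following conditions to work properly: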

  • Systemd expects /run is mounted as a tmpfs.
  • Systemd expects /sys/fs/cgroup filesystem is mounted. It can work with it being mounted read/only.
  • Systemd expects /sys/fs/cgroup/systemd be mounted read/write.
  • Systemd does not exit on sigterm. Systemd defines that shutdown signal as SIGRTMIN+3, docker upstream should send this signal when user does a docker stop.
  • Systemd wants to have a unique /etc/machine-id to identify the system.
  • Journald expects to write content to memory or to the /var/log/journal if it exists.

Summary of the relevant conditions:

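  • Besides the conditions above, there is one more: set the environment variable container=docker to indicate that systemd is running inside a docker container.
  • File systems. Inspecting the container shows that by default /etc/machine-id is empty, /var/log/journal does not exist, and /run is not mounted as tmpfs. /etc/machine-id can be ignored for now; the tmpfs mount can be specified when the container is created. Mounting /tmp as tmpfs is also recommended.
  • The cgroup file systems (/sys/fs/cgroup and so on) are not mounted by default and can be mounted from the host into the container. Going further, everything except /sys/fs/cgroup/systemd can be mounted read-only for better security.
  • The stop signal can be specified when the container is created.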
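
The mount-related conditions above can be checked from inside a container. The following is a minimal sketch, assuming the mount table is readable in /proc/mounts format (source, target, fstype, options). The function names are hypothetical, and the optional second argument exists only so the functions can be tried against a sample file.

```shell
#!/bin/sh
# Print the filesystem type of a mount point, or nothing if it is not mounted.
# Reads a table in /proc/mounts format; the last matching entry wins.
mount_fstype() {
    awk -v t="$1" '$2 == t { fs = $3 } END { if (fs != "") print fs }' \
        "${2:-/proc/mounts}"
}

# Print "ro" or "rw" for a mount point, or nothing if it is not mounted.
mount_mode() {
    awk -v t="$1" '
        $2 == t {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro" || opts[i] == "rw") mode = opts[i]
        }
        END { if (mode != "") print mode }
    ' "${2:-/proc/mounts}"
}

# Summarize the three mount conditions listed above.
report_systemd_mounts() {
    table="${1:-/proc/mounts}"
    echo "/run fstype: $(mount_fstype /run "$table")"
    echo "/sys/fs/cgroup mode: $(mount_mode /sys/fs/cgroup "$table")"
    echo "/sys/fs/cgroup/systemd mode: $(mount_mode /sys/fs/cgroup/systemd "$table")"
}
```

Calling report_systemd_mounts with no argument reads the container's own /proc/mounts. The remaining conditions can be checked by hand, for example with [ -s /etc/machine-id ] for a non-empty machine ID and echo "$container" for the environment variable.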

The resulting docker options:

docker run --name test -ti --rm -e container=docker --tmpfs /run -v /sys/fs/cgroup:/sys/fs/cgroup:ro --stop-signal SIGRTMIN+3
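
To make the purpose of each option easier to see, here is a minimal sketch that builds the same command line one option at a time and only prints it. The function name systemd_docker_cmd is hypothetical; the options themselves are the ones listed above.

```shell
#!/bin/sh
# Print (do not run) the docker command line for a systemd container.
# The image name is supplied by the caller; nothing is executed here.
systemd_docker_cmd() {
    image="$1"
    set -- --name test -ti --rm
    set -- "$@" -e container=docker                   # tell systemd it runs under docker
    set -- "$@" --tmpfs /run                          # systemd expects /run to be a tmpfs
    set -- "$@" -v /sys/fs/cgroup:/sys/fs/cgroup:ro   # cgroup file systems from the host
    set -- "$@" --stop-signal SIGRTMIN+3              # systemd's shutdown signal
    printf '%s\n' "docker run $* $image /sbin/init"
}

systemd_docker_cmd hanyong/ubuntu:16.04
```

Following the recommendation above, a further line with --tmpfs /tmp could be added in the same way.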

Testing a container configured to run systemd

After applying these settings and running the container again, there is still no output and systemd still does not work properly:

docker run --name test -ti --rm -e container=docker --tmpfs /run -v /sys/fs/cgroup:/sys/fs/cgroup:ro --stop-signal SIGRTMIN+3 hanyong/ubuntu:16.04 /sbin/init

Inspecting the container from inside:

root@01802d3fa9a9:/
# ps -ef f
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
root         7     0  0 10:08 ?        Ss     0:00 /bin/bash
root        24     7  0 10:09 ?        R+     0:00  \_ ps -ef f
root         1     0  0 10:07 ?        Ss     0:00 /sbin/init

root@01802d3fa9a9:/
# findmnt /sys/fs/cgroup/ -R
TARGET                            SOURCE FSTYPE OPTIONS
/sys/fs/cgroup                    tmpfs  tmpfs  ro,mode=755
|-/sys/fs/cgroup/systemd          cgroup cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,n
|-/sys/fs/cgroup/perf_event       cgroup cgroup rw,nosuid,nodev,noexec,relatime,perf_event,release_agent=/run/cgmanager/agents/cgm-relea
|-/sys/fs/cgroup/blkio            cgroup cgroup rw,nosuid,nodev,noexec,relatime,blkio
|-/sys/fs/cgroup/cpu,cpuacct      cgroup cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct
|-/sys/fs/cgroup/pids             cgroup cgroup rw,nosuid,nodev,noexec,relatime,pids,release_agent=/run/cgmanager/agents/cgm-release-age
|-/sys/fs/cgroup/net_cls,net_prio cgroup cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio
|-/sys/fs/cgroup/hugetlb          cgroup cgroup rw,nosuid,nodev,noexec,relatime,hugetlb,release_agent=/run/cgmanager/agents/cgm-release-
|-/sys/fs/cgroup/cpuset           cgroup cgroup rw,nosuid,nodev,noexec,relatime,cpuset,clone_children
|-/sys/fs/cgroup/freezer          cgroup cgroup rw,nosuid,nodev,noexec,relatime,freezer
|-/sys/fs/cgroup/devices          cgroup cgroup rw,nosuid,nodev,noexec,relatime,devices
`-/sys/fs/cgroup/memory           cgroup cgroup rw,nosuid,nodev,noexec,relatime,memory

root@01802d3fa9a9:/
# systemctl status
Failed to connect to bus: No such file or directory

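Could it be that the systemd version in the container is too old (without the container adaptations)? Checking the systemd version in the Ubuntu 16.04 container: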

# systemd --version
systemd 229
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN

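In privileged mode there is no need to mount the cgroup file systems explicitly; systemd in the container starts very slowly, but it does start successfully.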

docker run --name test -ti --rm -e container=docker --tmpfs /run --stop-signal SIGRTMIN+3 --privileged hanyong/ubuntu:16.04 /sbin/init

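Inspecting the container shows that the cgroup file systems were mounted automatically, that services useless in a container (such as getty) were started, and that /etc/machine-id was set automatically. This looks much like booting in a virtual machine?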

root@6a6f771ae4da:/
# ps -ef f
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
root        19     0  0 10:11 ?        Ss     0:00 /bin/bash
root       435    19  0 11:27 ?        R+     0:00  \_ ps -ef f
root         1     0  0 10:11 ?        Ds     0:01 /sbin/init
root        32     1  0 10:11 ?        Ss     0:00 /lib/systemd/systemd-journald
root        87     1  0 10:12 tty5     Ss+    0:00 /sbin/agetty --noclear tty5 linux
root        89     1  0 10:12 tty2     Ss+    0:00 /sbin/agetty --noclear tty2 linux
root        92     1  0 10:12 tty3     Ss+    0:00 /sbin/agetty --noclear tty3 linux
root        95     1  0 10:12 tty4     Ss+    0:00 /sbin/agetty --noclear tty4 linux
root        97     1  0 10:12 tty6     Ss+    0:00 /sbin/agetty --noclear tty6 linux
root       434     1  0 11:27 ?        Rs     0:00 (agetty)

root@6a6f771ae4da:/
# findmnt /sys/fs/cgroup/ -R
TARGET                            SOURCE                                                                           FSTYPE OPTIONS
/sys/fs/cgroup                    tmpfs                                                                            tmpfs  ro,nosuid,node
|-/sys/fs/cgroup/systemd          cgroup[/docker/6a6f771ae4da447c4d69e5b30cb46e0df21824cb0ac23f0f01c2a4be6cb8d840] cgroup rw,nosuid,node
|-/sys/fs/cgroup/perf_event       cgroup[/docker/6a6f771ae4da447c4d69e5b30cb46e0df21824cb0ac23f0f01c2a4be6cb8d840] cgroup rw,nosuid,node
|-/sys/fs/cgroup/blkio            cgroup[/docker/6a6f771ae4da447c4d69e5b30cb46e0df21824cb0ac23f0f01c2a4be6cb8d840] cgroup rw,nosuid,node
|-/sys/fs/cgroup/cpu,cpuacct      cgroup[/docker/6a6f771ae4da447c4d69e5b30cb46e0df21824cb0ac23f0f01c2a4be6cb8d840] cgroup rw,nosuid,node
|-/sys/fs/cgroup/pids             cgroup[/docker/6a6f771ae4da447c4d69e5b30cb46e0df21824cb0ac23f0f01c2a4be6cb8d840] cgroup rw,nosuid,node
|-/sys/fs/cgroup/net_cls,net_prio cgroup[/docker/6a6f771ae4da447c4d69e5b30cb46e0df21824cb0ac23f0f01c2a4be6cb8d840] cgroup rw,nosuid,node
|-/sys/fs/cgroup/hugetlb          cgroup[/docker/6a6f771ae4da447c4d69e5b30cb46e0df21824cb0ac23f0f01c2a4be6cb8d840] cgroup rw,nosuid,node
|-/sys/fs/cgroup/cpuset           cgroup[/docker/6a6f771ae4da447c4d69e5b30cb46e0df21824cb0ac23f0f01c2a4be6cb8d840] cgroup rw,nosuid,node
|-/sys/fs/cgroup/freezer          cgroup[/docker/6a6f771ae4da447c4d69e5b30cb46e0df21824cb0ac23f0f01c2a4be6cb8d840] cgroup rw,nosuid,node
|-/sys/fs/cgroup/devices          cgroup[/docker/6a6f771ae4da447c4d69e5b30cb46e0df21824cb0ac23f0f01c2a4be6cb8d840] cgroup rw,nosuid,node
`-/sys/fs/cgroup/memory           cgroup[/docker/6a6f771ae4da447c4d69e5b30cb46e0df21824cb0ac23f0f01c2a4be6cb8d840] cgroup rw,nosuid,node

root@6a6f771ae4da:/
# systemctl status
● 6a6f771ae4da
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Tue 2018-06-26 10:11:43 UTC; 1h 16min ago
   CGroup: /docker/6a6f771ae4da447c4d69e5b30cb46e0df21824cb0ac23f0f01c2a4be6cb8d840
           ├─init.scope
           │ └─1 /sbin/init
           └─system.slice
             ├─systemd-journald.service
             │ └─32 /lib/systemd/systemd-journald
             ├─system-getty.slice
             │ ├─getty@tty4.service
             │ │ └─95 /sbin/agetty --noclear tty4 linux
             │ ├─getty@tty6.service
             │ │ └─97 /sbin/agetty --noclear tty6 linux
             │ ├─getty@tty3.service
             │ │ └─92 /sbin/agetty --noclear tty3 linux
             │ ├─getty@tty5.service
             │ │ └─87 /sbin/agetty --noclear tty5 linux
             │ └─getty@tty2.service
             │   └─89 /sbin/agetty --noclear tty2 linux
             └─console-getty.service
               └─439 (agetty)  

root@6a6f771ae4da:/
# cat /etc/machine-id 
5bc4776a7dca4c9fb776baa55c903101
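
The getty services seen above serve no purpose in a container. One possible way to keep them from starting is to mask the units, which on disk means linking the unit name to /dev/null under /etc/systemd/system (the effect of systemctl mask). The following is a minimal sketch; the function name mask_units is hypothetical, and the choice of getty.target and console-getty.service in the usage note is an assumption that was not tested in this article.

```shell
#!/bin/sh
# Mask units by linking them to /dev/null under ROOT/etc/systemd/system,
# which is the on-disk effect of `systemctl mask`.
# ROOT is "" for the running system, or an image root directory.
mask_units() {
    root="$1"
    shift
    mkdir -p "$root/etc/systemd/system"
    for unit in "$@"; do
        ln -sf /dev/null "$root/etc/systemd/system/$unit"
    done
}
```

For example, mask_units "" getty.target console-getty.service could be run while building the image, before systemd is started for the first time.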

Testing fedora:24 run directly

docker run --name test -ti --rm fedora:24 /sbin/init

Checking the systemd version: it is the same as on Ubuntu 16.04 (229), and the feature flags are the same except for APPARMOR, LZ4, and IDN:

[root@2e50623efb6d /]# /lib/systemd/systemd --version
systemd 229
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN
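
The two feature-flag lines printed so far (Ubuntu 16.04 and fedora:24) are long enough that differences are easy to miss by eye. A minimal sketch for comparing them mechanically follows; the function name flag_diff is hypothetical, and the two strings are copied from the outputs above.

```shell
#!/bin/sh
# Print every flag of the first list that does not appear, with the same
# sign, in the second list.
flag_diff() {
    for flag in $1; do
        case " $2 " in
            *" $flag "*) ;;                 # same flag and sign in both lists
            *) printf '%s\n' "$flag" ;;     # missing or sign differs
        esac
    done
}

ubuntu='+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN'
fedora='+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN'

# Flags as built on Ubuntu 16.04 that differ from fedora:24.
flag_diff "$ubuntu" "$fedora"
```

The printed flags are the Ubuntu-side values, so swapping the arguments gives the fedora-side values.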

Checking the container state:

[root@2e50623efb6d /]# findmnt /sys/fs/cgroup -R
TARGET                            SOURCE                                                                           FSTYPE OPTIONS
/sys/fs/cgroup                    tmpfs                                                                            tmpfs  ro,nosuid,node
|-/sys/fs/cgroup/systemd          cgroup[/docker/2e50623efb6d7f4e9a46605f7f497745702bb6b379620711d3ee3e4ead7af314] cgroup ro,nosuid,node
|-/sys/fs/cgroup/perf_event       cgroup[/docker/2e50623efb6d7f4e9a46605f7f497745702bb6b379620711d3ee3e4ead7af314] cgroup ro,nosuid,node
|-/sys/fs/cgroup/blkio            cgroup[/docker/2e50623efb6d7f4e9a46605f7f497745702bb6b379620711d3ee3e4ead7af314] cgroup ro,nosuid,node
|-/sys/fs/cgroup/cpu,cpuacct      cgroup[/docker/2e50623efb6d7f4e9a46605f7f497745702bb6b379620711d3ee3e4ead7af314] cgroup ro,nosuid,node
|-/sys/fs/cgroup/pids             cgroup[/docker/2e50623efb6d7f4e9a46605f7f497745702bb6b379620711d3ee3e4ead7af314] cgroup ro,nosuid,node
|-/sys/fs/cgroup/net_cls,net_prio cgroup[/docker/2e50623efb6d7f4e9a46605f7f497745702bb6b379620711d3ee3e4ead7af314] cgroup ro,nosuid,node
|-/sys/fs/cgroup/hugetlb          cgroup[/docker/2e50623efb6d7f4e9a46605f7f497745702bb6b379620711d3ee3e4ead7af314] cgroup ro,nosuid,node
|-/sys/fs/cgroup/cpuset           cgroup[/docker/2e50623efb6d7f4e9a46605f7f497745702bb6b379620711d3ee3e4ead7af314] cgroup ro,nosuid,node
|-/sys/fs/cgroup/freezer          cgroup[/docker/2e50623efb6d7f4e9a46605f7f497745702bb6b379620711d3ee3e4ead7af314] cgroup ro,nosuid,node
|-/sys/fs/cgroup/devices          cgroup[/docker/2e50623efb6d7f4e9a46605f7f497745702bb6b379620711d3ee3e4ead7af314] cgroup ro,nosuid,node
`-/sys/fs/cgroup/memory           cgroup[/docker/2e50623efb6d7f4e9a46605f7f497745702bb6b379620711d3ee3e4ead7af314] cgroup ro,nosuid,node

[root@2e50623efb6d /]# findmnt /run

[root@2e50623efb6d /]# systemctl status
Failed to connect to bus: No such file or directory

[root@2e50623efb6d /]# cat /etc/machine-id 
cat: /etc/machine-id: No such file or directory

Running fedora:24 with the same settings applied, systemd still does not work properly, and this time error messages are printed:

$ docker run --name test -ti --rm -e container=docker --tmpfs /run -v /sys/fs/cgroup:/sys/fs/cgroup:ro --stop-signal SIGRTMIN+3 fedora:24 /sbin/init
Failed to determine whether /sys is a mount point: Operation not permitted
Failed to determine whether /proc is a mount point: Operation not permitted
Failed to determine whether /dev is a mount point: Operation not permitted
Failed to determine whether /dev/shm is a mount point: Operation not permitted
Failed to determine whether /run is a mount point: Operation not permitted
Failed to determine whether /sys/fs/cgroup is a mount point: Operation not permitted
Failed to determine whether /sys/fs/cgroup/systemd is a mount point: Operation not permitted
[!!!!!!] Failed to mount API filesystems, freezing.
Freezing execution.

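Compared with the working systemd in the reference article, the version number is the same (229), and the feature flags differ as follows (though this does not appear to be the cause of the problem?):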

+APPARMOR -LZ4 -IDN

A search turned up another configuration for starting systemd, see: https://github.com/maci0/docker-systemd-unpriv/blob/master/Dockerfile