353 lines
12 KiB
Plaintext
353 lines
12 KiB
Plaintext
== System Administration
|
|
|
|
=== Services on a running system
|
|
|
|
Liminix services are built on s6-rc, which is itself layered on s6.
|
|
Services are defined at build time in your configuration (see
|
|
<<_services>> for information) and can't be added
|
|
to/changed at runtime, but to monitor events or diagnose problems you
|
|
may need to inspect them on the running system. Here are some of the
|
|
most commonly used s6,-rc commands:
|
|
|
|
.Service management quick reference
|
|
[width="100%",cols="55%,45%",options="header",]
|
|
|===
|
|
|What |How
|
|
|List all running services |`+s6-rc -a list+`
|
|
|
|
|List all services that are *not* running |`+s6-rc -da list+`
|
|
|
|
|List services that `+wombat+` depends on
|
|
|`+s6-rc-db dependencies wombat+`
|
|
|
|
|... transitively |`+s6-rc-db all-dependencies wombat+`
|
|
|
|
|List services that depend on service `+wombat+`
|
|
|`+s6-rc-db -d dependencies wombat+`
|
|
|
|
|... transitively |`+s6-rc-db -d all-dependencies wombat+`
|
|
|
|
|Stop service `+wombat+` and everything depending on it
|
|
|`+s6-rc -d change wombat+`
|
|
|
|
|Start service `+wombat+` (but not any services depending on it)
|
|
|`+s6-rc -u change wombat+`
|
|
|
|
|Start service `+wombat+` and all* services depending on it
|
|
|`+s6-rc-up-tree wombat+`
|
|
|===
|
|
|
|
`+s6-rc-up-tree+` brings up a service and all services that depend on
|
|
it, except for any services that depend on a "controlled" service that
|
|
is not currently running. Controlled services are not started at boot
|
|
time but in response to external events (e.g. plugging in a particular
|
|
piece of hardware) so you probably don't want to be starting them by
|
|
hand if the conditions aren't there.
|
|
|
|
A service may be *up* or *down* (there are no intermediate states like
|
|
"started" or "stopping" or "dying" or "cogitating"). Some (but not all)
|
|
services have "readiness" notifications: the dependents of a service
|
|
with a readiness notification won't be started until the service signals
|
|
(by writing to a nominated file descriptor) that it's prepared to start
|
|
work. Most services defined by Liminix also have a `+timeout-up+`
|
|
parameter, which means that if a service has readiness notifications and
|
|
doesn't become ready in the allotted time (defaults 20 seconds) it will
|
|
be terminated and its state set to *down*.
|
|
|
|
If the process providing a service dies, it will be restarted
|
|
automatically. Liminix does not automatically set it to *down*.
|
|
|
|
(If the process providing a service dies without ever notifying
|
|
readiness, Liminix will restart it as many times as it has to until the
|
|
timeout period elapses, and then stop it and mark it down.)
|
|
|
|
==== Controlled services
|
|
|
|
*Controlled* services are those which are started/stopped on demand by a
|
|
*controller* (another service) instead of being started at boot time.
|
|
For example:
|
|
|
|
* `+svc.uevent-rule.build+` creates a controlled service which is active
|
|
when a particular hardware device (identified by uevent/sysfs directory)
|
|
is present.
|
|
* `+svc.round-robin.build+` creates a service controller that invokes
|
|
two or more services in turn, running the next one when the process
|
|
providing the previous one exits. We use this for failover from one
|
|
network connection to a backup connection, for example.
|
|
* `+svc.health-check.build+` creates a service controller that runs a
|
|
controlled service and periodically tests whether it is healthy by
|
|
running an external health check command or script. If the check command
|
|
repeatedly fails, the controlled service is restarted.
|
|
+
|
|
The Configuration section of the manual describes controlled services in
|
|
more detail. Some operational considerations
|
|
* `+round-robin+` detects a service status by looking at its `+outputs+`
|
|
directory, so it won't work unless the service creates some outputs.
|
|
This is considered a bug and will be fixed in a future release
|
|
* `+health-check+` works for longruns but not for oneshots, as it
|
|
internally relies on `+s6-svc+` to restart the process
|
|
|
|
==== Logs
|
|
|
|
Logs for all services are collated into `+/run/log/current+`. The log
|
|
file is rotated when it reaches a threshold size, into another file in
|
|
the same directory whose name contains a TAI64 timestamp.
|
|
|
|
Each log line is prefixed with a TAI64 timestamp and the name of the
|
|
service, if it is a longrun. If it is a oneshot, a timestamp and the
|
|
name of some other service. To convert the timestamp into a
|
|
human-readable format, use `+s6-tai64nlocal+`.
|
|
|
|
[source,console]
|
|
----
|
|
# ls -l /run/log/
|
|
-rw-r--r-- 1 0 lock
|
|
-rw-r--r-- 1 0 state
|
|
-rwxr--r-- 1 98059 @4000000000025cb629c311ac.s
|
|
-rwxr--r-- 1 98061 @40000000000260f7309c7fb4.s
|
|
-rwxr--r-- 1 98041 @40000000000265233a6cc0b6.s
|
|
-rwxr--r-- 1 98019 @400000000002695d10c06929.s
|
|
-rwxr--r-- 1 98064 @4000000000026d84189559e0.s
|
|
-rwxr--r-- 1 98055 @40000000000271ce1e031d91.s
|
|
-rwxr--r-- 1 98054 @400000000002760229733626.s
|
|
-rwxr--r-- 1 98104 @4000000000027a2e3b6f4e12.s
|
|
-rwxr--r-- 1 98023 @4000000000027e6f0ed24a6c.s
|
|
-rw-r--r-- 1 42374 current
|
|
|
|
# tail -2 /run/log/current
|
|
@40000000000284f130747343 wan.link.pppoe Connect: ppp0 <--> /dev/pts/0
|
|
@40000000000284f230acc669 wan.link.pppoe sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0x667a9594> <pcomp> <accomp>]
|
|
# tail -2 /run/log/current | s6-tai64nlocal
|
|
1970-01-02 21:51:45.828598156 wan.link.pppoe sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0x667a9594> <pcomp> <accom
|
|
p>]
|
|
1970-01-02 21:51:48.832588765 wan.link.pppoe sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0x667a9594> <pcomp> <accom
|
|
p>]
|
|
----
|
|
|
|
===== Log persistence
|
|
|
|
Logs written to `+/run/log/+` will not survive a reboot or crash, as it
|
|
is an ephemeral filesystem.
|
|
|
|
On supported hardware you can enable logging to
|
|
https://www.kernel.org/doc/Documentation/ABI/testing/pstore[pstore]
|
|
which means the most recent log messages will be preserved on reboot.
|
|
Set the config option `+logging.persistent.enable = true+` to log
|
|
messages to `+/dev/pmsg0+` as well as to the regular log. This is a
|
|
circular buffer, so when it fills up newer messages will overwrite the
|
|
oldest messages.
|
|
|
|
Logs found in pstore after a reboot will be moved at startup to
|
|
`+/run/log/previous-boot+`
|
|
|
|
=== Updating an installed system
|
|
|
|
If your system has a writable root filesystem (JFFS2, btrfs etc
|
|
-anything but squashfs), we have mechanisms for in-places updates
|
|
analogous to `+nixos-rebuild+`, but the operation is a bit different
|
|
because it expects to run on a build machine and then copy to the host
|
|
device using `+ssh+`.
|
|
|
|
To use this, build the `+outputs.updater+` target and then run the
|
|
`+update.sh+` script it generates.
|
|
|
|
[source,console]
|
|
----
|
|
nix-build -I liminix-config=./my-configuration.nix \
|
|
--arg device "import ./devices/mydevice" \
|
|
-A outputs.updater
|
|
./result/bin/update.sh root@the-device
|
|
----
|
|
|
|
The update script uses min-copy-closure to copy new or changed packages
|
|
to the device, then (perhaps) reboots it. The reboot behaviour can be
|
|
affected by flags:
|
|
|
|
* [.title-ref]#--no-reboot# will cause it not to reboot at all, if you
|
|
would rather do that yourself. Note that none of the newly-installed or
|
|
updated services will be running until you do.
|
|
* [.title-ref]#--fast# causes it tn not do a full reboot, but instead to
|
|
restart only the services that have been changed. This will restart all
|
|
of the services that have updated store paths (and anything that depends
|
|
on them), but will not affect services that haven't changed.
|
|
|
|
It doesn't delete old packages automatically: to do that run
|
|
`+min-collect-garbage+`, which will delete any packages not in the
|
|
current system closure. Note that Liminix does not have the NixOS
|
|
concept of environments or generations, and there is no way back from
|
|
this except for building the previous configuration again.
|
|
|
|
==== Caveats
|
|
|
|
* it needs there to be enough free space on the device for all the new
|
|
packages in addition to all the packages already on it - which may be a
|
|
problem if there is little flash storage or if a lot of things have
|
|
changed (e.g. a new version of nixpkgs).
|
|
* it may not be able to upgrade the kernel: this is device-dependent. If
|
|
your device boots from a kernel image on a raw MTD partition or or UBI
|
|
volume, update.sh is unable to alter the kernel partition. If your
|
|
device boots from a kernel inside the filesystem (e.g. using
|
|
bootloader.extlinux or bootloder.fit) then the kernel will be upgraded
|
|
along with the userland
|
|
|
|
==== Recovery/downgrades
|
|
|
|
The `+update.sh+` script also creates a timestamped symlink on the
|
|
device which points to the system configuration it installs. If you
|
|
install a configuration that doesn't work, you can revert to any other
|
|
installed configuration by
|
|
|
|
[arabic]
|
|
. booting to some kind of rescue or recovery system (which may be some
|
|
vendor-provided rescue option, or your own recovery system perhaps based
|
|
on `+examples/recovery.nix+`) and mounting your Liminix filesystem on
|
|
/mnt
|
|
. picking another previously-installed configuration that _link:[did]
|
|
work, and switching back to it:
|
|
|
|
[source,console]
|
|
----
|
|
# ls -ld /mnt/*configuration
|
|
lrwxrwxrwx 1 90 /mnt/20252102T182104.configuration -> nix/store/v1w0h4zw65ah4c2r0k7nyy125qrxhq78-system-configuration-aarch64-unknown-linux-musl
|
|
lrwxrwxrwx 1 90 /mnt/20251802T181822.configuration -> nix/store/wqjl9s9xljl2wg8257292zghws9ssidk-system-configuration-aarch64-unknown-linux-musl
|
|
# : 20251802T181822 is the working system, so reinstall it
|
|
# /mnt/20251802T181822.configuration/bin/install /mnt
|
|
# umount /mnt
|
|
# reboot
|
|
----
|
|
|
|
This will install the previous configuration's activation binary into
|
|
/bin, and copy its kernel and initramfs into /boot. Note that it depends
|
|
on the previous system not having been garbage-collected.
|
|
|
|
==== Adding packages
|
|
|
|
If you simply wish to add a package without any change to services, you
|
|
can call `+min-copy-closure+` directly to install any package in Nixpkgs
|
|
or in the Liminix overlay
|
|
|
|
[source,console]
|
|
----
|
|
nix-build -I liminix-config=./my-configuration.nix \
|
|
--arg device "import ./devices/mydevice" -A pkgs.tcpdump
|
|
|
|
nix-shell -p min-copy-closure root@the-device result/
|
|
----
|
|
|
|
Note that this only copies the package and its dependencies to the
|
|
device: it doesn't update any profile to add it to `+$PATH+`
|
|
|
|
[reftext="Levitate"]
|
|
[[levitate]]
|
|
=== Levitate: Reinstalling on a running system
|
|
|
|
Liminix is initially installed from a monolithic `+firmware.bin+` - and
|
|
unless you're running a writable filesystem, the only way to update it
|
|
is to build and install a whole new `+firmware.bin+`. However, you
|
|
probably would prefer not to have to remove it from its installation
|
|
site, unplug it from the network and stick serial cables in it all over
|
|
again.
|
|
|
|
It is not (generally) safe to install a new firmware onto the flash
|
|
partitions that the active system is running on. To address this we have
|
|
`+levitate+`, which a way for a running Liminix system to "soft restart"
|
|
into a ramdisk running only a limited set of services, so that the main
|
|
partitions can then be safely flashed.
|
|
|
|
==== Configuration
|
|
|
|
Levitate _needs to be configured when you create the initial system_ to
|
|
specify which services/packages/etc to run in maintenance mode. Most
|
|
likely you want to configure a network interface and an ssh for example
|
|
so that you can login to reflash it.
|
|
|
|
[source,nix]
|
|
----
|
|
defaultProfile.packages = with pkgs; [
|
|
...
|
|
(levitate.override {
|
|
config = {
|
|
services = {
|
|
inherit (config.services) dhcpc sshd watchdog;
|
|
};
|
|
defaultProfile.packages = [ mtdutils ];
|
|
users.root = config.users.root;
|
|
};
|
|
})
|
|
];
|
|
----
|
|
|
|
==== Use
|
|
|
|
Connect (with ssh, probably) to the running Liminix system that you wish
|
|
to upgrade.
|
|
|
|
[source,console]
|
|
----
|
|
bash$ ssh root@the-device
|
|
----
|
|
|
|
Run `+levitate+`. This takes a little while (perhaps a few tens of
|
|
seconds) to execute, and copies all config required for maintenance mode
|
|
to `+/run/maintenance+`.
|
|
|
|
[source,console]
|
|
----
|
|
# levitate
|
|
----
|
|
|
|
Reboot into maintenance mode. You will be logged out
|
|
|
|
[source,console]
|
|
----
|
|
# reboot
|
|
----
|
|
|
|
Connect to the device again - note that the ssh host key will have
|
|
changed.
|
|
|
|
[source,console]
|
|
----
|
|
# ssh -o UserKnownHostsFile=/dev/null root@the-device
|
|
----
|
|
|
|
Check we're in maintenance mode
|
|
|
|
[source,console]
|
|
----
|
|
# cat /etc/banner
|
|
|
|
LADIES AND GENTLEMEN WE ARE FLOATING IN SPACE
|
|
|
|
Most services are disabled. The system is operating
|
|
with a ram-based root filesystem, making it safe to
|
|
overwrite the flash devices in order to perform
|
|
upgrades and maintenance.
|
|
|
|
Don't forget to reboot when you have finished.
|
|
----
|
|
|
|
Perform the upgrade, using flashcp. This is an example, your device will
|
|
differ
|
|
|
|
[source,console]
|
|
----
|
|
# cat /proc/mtd
|
|
dev: size erasesize name
|
|
mtd0: 00030000 00010000 "u-boot"
|
|
mtd1: 00010000 00010000 "u-boot-env"
|
|
mtd2: 00010000 00010000 "factory"
|
|
mtd3: 00f80000 00010000 "firmware"
|
|
mtd4: 00220000 00010000 "kernel"
|
|
mtd5: 00d60000 00010000 "rootfs"
|
|
mtd6: 00010000 00010000 "art"
|
|
# flashcp -v firmware.bin mtd:firmware
|
|
----
|
|
|
|
All done
|
|
|
|
[source,console]
|
|
----
|
|
# reboot
|
|
----
|