Using CentOS 5.2 Stateless Linux support on a flash-based root filesystem

Notes on using Stateless Linux support with a compact flash based root filesystem.

The Stateless Linux support in CentOS 5.2 is provided by the initscripts package (8.45.19.1.EL-1.el5). According to comments found through Google, Stateless Linux support is intended for live images. Stateless Linux provides support for:

a read-only root filesystem
putting temporary files in a temporary filesystem
mounting read/write persistent state from a local filesystem or NFS

Stateless Linux Documentation

The documentation below was produced by reverse engineering the scripts.

Files

The files and directories involved in a Stateless Linux configuration are:

File/Directory	Description
/etc/sysconfig/readonly-root	the top level configuration file
/etc/rwtab	a configuration file for the list of files and directories that should be mounted in the temporary read-write filesystem
/etc/rwtab.d/	a directory of rwtab configuration files
/.snapshot	the default mount point for the stateless configuration filesystem [Note: This is /var/lib/stateless/state on later versions of the initscripts]
/var/lib/stateless/writable	the default mount point for the temporary read-write filesystem
<STATE_MOUNT>/etc	this directory must be present on the state device. The script checks for it before processing any persistent state.
<STATE_MOUNT>/files	the list of files/directories to mount. The files must be listed one per line. The files/directories must exist in both the state filesystem and the root filesystem [Gotcha: If the file is to be mounted in a directory that is also listed in the rwtab configuration file, it needs to be present in the tmpfs]
/etc/statetab	[Note: Not supported in CentOS 5.2 initscripts v8.45.19.1]
/etc/statetab.d	[Note: Not supported in CentOS 5.2 initscripts v8.45.19.1]
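
The layout expected on the state filesystem can be sketched as follows. This is a minimal illustration, not part of initscripts: prepare_state_tree is a hypothetical helper name, and it assumes GNU cp (for --parents) and that the state filesystem is already mounted.

```shell
#!/bin/bash
# prepare_state_tree STATE_DIR PATH...
# Hypothetical helper: seeds STATE_DIR with copies of the given paths and
# records them in STATE_DIR/files, one per line.
prepare_state_tree() {
    local state_dir="$1"; shift
    # rc.sysinit only processes the state filesystem if STATE_DIR/etc exists
    mkdir -p "$state_dir/etc" || return 1
    local path
    for path in "$@"; do
        # The path must already exist on the root filesystem
        [ -e "$path" ] || { echo "skipping missing $path" >&2; continue; }
        # Recreate the full path below STATE_DIR
        cp -a --parents "$path" "$state_dir" || return 1
        echo "$path" >> "$state_dir/files"
    done
}

# Example (as root, with the state partition mounted on /.snapshot):
#   prepare_state_tree /.snapshot /etc/ssh
```

The helper copies each path so that it exists in both places, which is what the CentOS 5.2 script requires before it bind mounts an entry.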

Kernel parameters

The following kernel parameters are supported:

Parameter	Description
'readonlyroot'	overrides the 'READONLY' variable of the /etc/sysconfig/readonly-root configuration with the value 'yes'
'noreadonlyroot'	overrides the 'READONLY' variable of the /etc/sysconfig/readonly-root configuration with the value 'no'. Setting this parameter will not override the setting of 'TEMPORARY_STATE'.
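
The order in which the two parameters are evaluated can be sketched as below. This is an illustration of the logic in rc.sysinit (see the appendices), not the script itself: resolve_readonly is a hypothetical helper, and the case patterns stand in for the substring test that the script performs with strstr.

```shell
#!/bin/bash
# resolve_readonly CMDLINE CONFIGURED_VALUE
# Hypothetical helper: prints the effective READONLY value.
resolve_readonly() {
    local cmdline="$1" value="$2"
    # Substring match, so "noreadonlyroot" also matches this first test ...
    case "$cmdline" in *readonlyroot*) value=yes ;; esac
    # ... but the second test runs afterwards and wins.
    case "$cmdline" in *noreadonlyroot*) value=no ;; esac
    echo "$value"
}

resolve_readonly "ro root=LABEL=/ readonlyroot" no      # prints "yes"
resolve_readonly "ro root=LABEL=/ noreadonlyroot" yes   # prints "no"
resolve_readonly "ro root=LABEL=/" no                   # prints "no"
```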

Readonly-root configuration

The configuration file /etc/sysconfig/readonly-root supports the following variables:

Variable Name	Values	Description
READONLY	yes | no	Whether to enable support for 'Stateless Linux'.
TEMPORARY_STATE	yes | no	Whether to mount the files/directories listed in the rwtab configuration files into a temporary filesystem. Implied when READONLY is 'yes'.
RW_MOUNT	<directory> [default=/var/lib/stateless/writable]	The mount point for the temporary scratch writable space. The script tries three ways of mounting it, in order: (1) using the device and options defined in /etc/fstab, which allows mount options to be set there; (2) using the filesystem label defined in RW_LABEL; (3) as a tmpfs filesystem.
RW_LABEL	<filesystem label> [default 'stateless-rw'].	Label on local filesystem which can be used for temporary scratch space. Note: UUIDs are not supported.
RW_OPTIONS		[Note: Not supported in CentOS 5.2 initscripts v8.45.19.1]
STATE_MOUNT	<directory> [default '/.snapshot', or '/var/lib/stateless/state' on later versions]	Where to mount the persistent data. The script tries three ways of mounting it, in order: (1) using the device and options defined in /etc/fstab; (2) using the filesystem label defined in STATE_LABEL; (3) as an NFS filesystem (only if CLIENTSTATE is defined).
STATE_LABEL	[default 'stateless-state']	The label for partition with persistent data.
STATE_OPTIONS		[Note: Not supported in CentOS 5.2 initscripts v8.45.19.1]
CLIENTSTATE		Used to mount NFS state filesystem

A nearly read-only root filesystem

These notes are aimed at a compact flash based root filesystem where a truly read-only root filesystem is not required. Limiting write cycles is a good thing, but it is useful to keep the convenience of being able to install or update packages and edit configuration.

Given a root filesystem backed by a simple flash device, it is desirable to limit the number of write cycles performed. To this end, configure the machine so that:

the root filesystem doesn't write atimes
logs are sent to another host (because /var/log will be lost on reboot)
'TEMPORARY_STATE=yes' is set in /etc/sysconfig/readonly-root
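
For this setup, /etc/sysconfig/readonly-root might look like the following. This is a sketch based on the default file listed in the appendices; only TEMPORARY_STATE differs from the defaults, and the comments are my own.

```shell
# /etc/sysconfig/readonly-root (sketch for a writable flash root)
# Keep the root filesystem itself mounted read-write
READONLY=no
# Move the paths listed in /etc/rwtab and /etc/rwtab.d/* to scratch space
TEMPORARY_STATE=yes
# Mount point of the scratch space (the default)
RW_MOUNT=/var/lib/stateless/writable
```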

noatime

Use the 'noatime' (no access time) mount option on the root filesystem:

LABEL=/                 /                       ext3    noatime 1 1

Note: The CentOS 5.2 'util-linux' package doesn't support the 'relatime' mount option.
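
After a reboot, it may be worth confirming that the option took effect. The sketch below reads the options of the root filesystem from /proc/mounts; the function names are hypothetical, and the last "/" entry is used because /proc/mounts can list a rootfs entry before the real root device.

```shell
#!/bin/bash
# root_mount_options [MOUNTS_FILE]
# Prints the option field of the last "/" entry in a file in
# /proc/mounts format (default: /proc/mounts).
root_mount_options() {
    awk '$2 == "/" { opts = $4 } END { print opts }' "${1:-/proc/mounts}"
}

# root_has_noatime [MOUNTS_FILE]
# Succeeds if "noatime" is among the root filesystem options.
root_has_noatime() {
    case ",$(root_mount_options "$@")," in
        *,noatime,*) return 0 ;;
        *)           return 1 ;;
    esac
}

if root_has_noatime; then
    echo "root filesystem is mounted with noatime"
else
    echo "root filesystem is NOT mounted with noatime"
fi
```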

Accessing the 'real' root filesystem

The root filesystem has various files and directories mounted on top of it, which obscures the 'real' files. I added a bind mount of the root filesystem so that it is (easily) possible to edit files like /etc/fstab. I added the following to the fstab (note: you must edit the real fstab file, not the copy on the temporary filesystem):

/                       /mnt/root               none    bind            0 0

Temporary filesystem

The script makes up to three attempts to mount the temporary filesystem (see RW_MOUNT above). With no additional configuration, the last attempt mounts a tmpfs filesystem with default options. On a machine with no swap it is a good idea to size the tmpfs explicitly: the default maximum size is half the physical RAM, which is fine when swap is available.

Provide an '/etc/fstab' entry for the temporary filesystem:

tmpfs     /var/lib/stateless/writable tmpfs noauto,size=128M 0 0

Note: Consider sizing '/dev/shm' in the /etc/fstab configuration.
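
To help choose a size, the default limit on a given machine can be estimated from /proc/meminfo. This is a small sketch; default_tmpfs_kb is a hypothetical helper name, and the result is simply half of MemTotal.

```shell
#!/bin/bash
# default_tmpfs_kb [MEMINFO_FILE]
# Prints half of MemTotal in kB, the default tmpfs size limit.
default_tmpfs_kb() {
    awk '/^MemTotal:/ { print int($2 / 2) }' "${1:-/proc/meminfo}"
}

echo "default tmpfs limit: $(default_tmpfs_kb) kB"
```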

Monitoring

Use inotify-tools to monitor filesystem write access. Install the package directly from the Dag (RPMForge) repository; since it is only one package, there is no need to set up the RPMForge yum repository.

# rpm -Uvh http://rpmforge.sw.be/redhat/el5/en/i386/rpmforge/RPMS/inotify-tools-3.13-1.el5.rf.i386.rpm
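
Once installed, inotifywait can report writes as they happen. The wrapper below is a sketch: watch_writes is a hypothetical name, the choice of events is mine, and the example paths assume the bind mount of the root filesystem on /mnt/root described earlier.

```shell
#!/bin/bash
# watch_writes DIR...
# Hypothetical wrapper: reports write-type events below the given
# directories until interrupted with Ctrl-C.
watch_writes() {
    inotifywait -m -r \
        -e modify -e create -e delete -e move \
        "$@"
}

# Example: watch_writes /mnt/root/etc /mnt/root/var
```

Watching large trees recursively takes a while to set up and uses one watch per directory, so it is probably best to start with a few directories.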

Once the machine has been restarted, it is possible to view the effect of the Stateless Linux configuration by viewing '/proc/mounts'. The content of '/etc/mtab' is incomplete since most of the mounts are performed with the '-n' option.

$ cat /proc/mounts
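
To narrow the output down, the mount points backed by tmpfs can be filtered out of /proc/mounts. This sketch assumes the scratch space is a tmpfs, so the bind mounts created from it are reported with that filesystem type; tmpfs_mount_points is a hypothetical helper name, and the list will also include unrelated tmpfs mounts such as /dev/shm.

```shell
#!/bin/bash
# tmpfs_mount_points [MOUNTS_FILE]
# Prints the mount point of every tmpfs entry in a file in
# /proc/mounts format (default: /proc/mounts).
tmpfs_mount_points() {
    awk '$3 == "tmpfs" { print $2 }' "${1:-/proc/mounts}"
}

tmpfs_mount_points
```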

Appendices

/etc/sysconfig/readonly-root

# Set to 'yes' to mount the system filesystems read-only.
READONLY=no
# Set to 'yes' to mount various temporary state as either tmpfs
# or on the block device labelled RW_LABEL. Implied by READONLY
TEMPORARY_STATE=no
# Place to put a tmpfs for temporary scratch writable space
RW_MOUNT=/var/lib/stateless/writable
# Label on local filesystem which can be used for temporary scratch space
RW_LABEL=stateless-rw
# Label for partition with persistent data
STATE_LABEL=stateless-state
# Where to mount to the persistent data
STATE_MOUNT=/.snapshot

/etc/rwtab

dirs    /var/cache/man
dirs    /var/gdm
dirs    /var/lock
dirs    /var/log
dirs    /var/run

empty   /tmp
empty   /var/cache/foomatic
empty   /var/cache/logwatch
empty   /var/cache/mod_ssl
empty   /var/cache/mod_proxy
empty   /var/cache/php-pear
empty   /var/cache/systemtap
empty   /var/db/nscd
empty   /var/lib/dav
empty   /var/lib/dhcp
empty   /var/lib/dhclient
empty   /var/lib/php
empty   /var/lib/ups
empty   /var/tmp
empty   /var/tux

files   /etc/adjtime
files   /etc/fstab
files   /etc/mtab
files   /etc/ntp.conf
files   /etc/resolv.conf
files   /etc/lvm/.cache
files   /var/account
files   /var/arpwatch
files   /var/cache/alchemist
files   /var/lib/iscsi
files   /var/lib/logrotate.status
files   /var/lib/ntp
files   /var/lib/xen
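
The three entry types differ in what is copied into the scratch space before the bind mount, as can be read from the mount_empty, mount_dirs and mount_files functions in rc.sysinit. The sketch below prints that interpretation for each line of an rwtab file; describe_rwtab is a hypothetical helper, not part of initscripts.

```shell
#!/bin/bash
# describe_rwtab FILE
# Hypothetical helper: prints what rc.sysinit copies into the scratch
# space for each entry before bind mounting it over the original path.
describe_rwtab() {
    local type path
    while read -r type path; do
        case "$type" in
            empty) echo "$path: the path itself only, without contents" ;;
            dirs)  echo "$path: the directory tree, without files" ;;
            files) echo "$path: a full copy (cp -a)" ;;
            *)     ;;   # blank lines and unknown types are ignored
        esac
    done < "$1"
}

# Example: describe_rwtab /etc/rwtab
```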

initscripts v8.45.19.1 /etc/rc.sysinit (CentOS v5.2)

This is the small section of the init script relating to the Stateless Linux support.

READONLY=
if [ -f /etc/sysconfig/readonly-root ]; then
        . /etc/sysconfig/readonly-root
fi
if strstr "$cmdline" readonlyroot ; then
        READONLY=yes
        [ -z "$RW_MOUNT" ] && RW_MOUNT=/var/lib/stateless/writable
fi
if strstr "$cmdline" noreadonlyroot ; then
        READONLY=no
fi

if [ "$READONLY" = "yes" -o "$TEMPORARY_STATE" = "yes" ]; then

        mount_empty() {
                if [ -e "$1" ]; then
                        echo "$1" | cpio -p -vd "$RW_MOUNT" &>/dev/null
                        mount -n --bind "$RW_MOUNT$1" "$1"
                fi
        }

        mount_dirs() {
                if [ -e "$1" ]; then
                        mkdir -p "$RW_MOUNT$1"
                        # fixme: find is bad
                        find "$1" -type d -print0 | cpio -p -0vd "$RW_MOUNT" &>/dev/null
                        mount -n --bind "$RW_MOUNT$1" "$1"
                fi
        }

        mount_files() {
                if [ -e "$1" ]; then
                        cp -a --parents "$1" "$RW_MOUNT"
                        mount -n --bind "$RW_MOUNT$1" "$1"
                fi
        }

        # Common mount options for scratch space regardless of
        # type of backing store
        mountopts=

        # Scan partitions for local scratch storage
        rw_mount_dev=$(blkid -t LABEL="$RW_LABEL" -o device | awk '{ print ; exit }')

        # First try to mount scratch storage from /etc/fstab, then any
        # partition with the proper label.  If either succeeds, be sure
        # to wipe the scratch storage clean.  If both fail, then mount
        # scratch storage via tmpfs.
        if mount $mountopts "$RW_MOUNT" > /dev/null 2>&1 ; then
                rm -rf "$RW_MOUNT" > /dev/null 2>&1
        elif [ x$rw_mount_dev != x ] && mount $rw_mount_dev $mountopts "$RW_MOUNT" > /dev/null 2>&1; then
                rm -rf "$RW_MOUNT"  > /dev/null 2>&1
        else
                mount -n -t tmpfs $mountopts none "$RW_MOUNT"
        fi

        for file in /etc/rwtab /etc/rwtab.d/* ; do
                is_ignored_file "$file" && continue
                [ -f $file ] && cat $file | while read type path ; do
                        case "$type" in
                                empty)
                                        mount_empty $path
                                        ;;
                                files)
                                        mount_files $path
                                        ;;
                                dirs)
                                        mount_dirs $path
                                        ;;
                                *)
                                        ;;
                        esac
                        [ -n "$SELINUX_STATE" -a -e "$path" ] && restorecon -R "$path"
                done
        done

        # In theory there should be no more than one network interface active
        # this early in the boot process -- the one we're booting from.
        # Use the network address to set the hostname of the client.  This
        # must be done even if we have local storage.
        ipaddr=
        if [ "$HOSTNAME" = "localhost" -o "$HOSTNAME" = "localhost.localdomain" ]; then
                ipaddr=$(ip addr show to 0/0 scope global | awk '/[[:space:]]inet / { print gensub("/.*","","g",$2) }')
                if [ -n "$ipaddr" ]; then
                        eval $(ipcalc -h $ipaddr 2>/dev/null)
                        hostname ${HOSTNAME}
                fi
        fi

        # Clients with read-only root filesystems may be provided with a
        # place where they can place minimal amounts of persistent
        # state.  SSH keys or puppet certificates for example.
        #
        # Ideally we'll use puppet to manage the state directory and to
        # create the bind mounts.  However, until that's all ready this
        # is sufficient to build a working system.

        # First try to mount persistent data from /etc/fstab, then any
        # partition with the proper label, then fallback to NFS
        state_mount_dev=$(blkid -t LABEL="$STATE_LABEL" -o device | awk '{ print ; exit }')
        if mount $mountopts "$STATE_MOUNT" > /dev/null 2>&1 ; then
                /bin/true
        elif [ x$state_mount_dev != x ] && mount $state_mount_dev $mountopts "$STATE_MOUNT" > /dev/null 2>&1;  then
                /bin/true
        elif [ -n "$CLIENTSTATE" ]; then
                # No local storage was found.  Make a final attempt to find
                # state on an NFS server.

                mount -t nfs $CLIENTSTATE/$HOSTNAME $STATE_MOUNT -o rw,nolock
        fi

        if [ -d $STATE_MOUNT/etc ]; then
                # Copy the puppet CA's cert from the r/o image into the
                # state directory so that we can create a bind mount on
                # the ssl directory for storing the client cert.  I'd really
                # rather have a unionfs to deal with this stuff
                cp --parents -f -p /var/lib/puppet/ssl/certs/ca.pem $STATE_MOUNT 2>/dev/null

                # In the future this will be handled by puppet
                for i in $(grep -v "^#" $STATE_MOUNT/files); do
                        if [ -e $i ]; then
                                mount -n -o bind $STATE_MOUNT/${i} ${i}
                        fi
                done
        fi
fi

initscripts v8.86 /etc/rc.sysinit

READONLY=
if [ -f /etc/sysconfig/readonly-root ]; then
        . /etc/sysconfig/readonly-root
fi
if strstr "$cmdline" readonlyroot ; then
        READONLY=yes
        [ -z "$RW_MOUNT" ] && RW_MOUNT=/var/lib/stateless/writable
        [ -z "$STATE_MOUNT" ] && STATE_MOUNT=/var/lib/stateless/state
fi
if strstr "$cmdline" noreadonlyroot ; then
        READONLY=no
fi

if [ "$READONLY" = "yes" -o "$TEMPORARY_STATE" = "yes" ]; then

        mount_empty() {
                if [ -e "$1" ]; then
                        echo "$1" | cpio -p -vd "$RW_MOUNT" &>/dev/null
                        mount -n --bind "$RW_MOUNT$1" "$1"
                fi
        }

        mount_dirs() {
                if [ -e "$1" ]; then
                        mkdir -p "$RW_MOUNT$1"
                        find "$1" -type d -print0 | cpio -p -0vd "$RW_MOUNT" &>/dev/null
                        mount -n --bind "$RW_MOUNT$1" "$1"
                fi
        }

        mount_files() {
                if [ -e "$1" ]; then
                        cp -a --parents "$1" "$RW_MOUNT"
                        mount -n --bind "$RW_MOUNT$1" "$1"
                fi
        }

        # Common mount options for scratch space regardless of
        # type of backing store
        mountopts=

        # Scan partitions for local scratch storage
        rw_mount_dev=$(blkid -t LABEL="$RW_LABEL" -l -o device)

        # First try to mount scratch storage from /etc/fstab, then any
        # partition with the proper label.  If either succeeds, be sure
        # to wipe the scratch storage clean.  If both fail, then mount
        # scratch storage via tmpfs.
        if mount $mountopts "$RW_MOUNT" > /dev/null 2>&1 ; then
                rm -rf "$RW_MOUNT" > /dev/null 2>&1
        elif [ x$rw_mount_dev != x ] && mount $rw_mount_dev $mountopts "$RW_MOUNT" > /dev/null 2>&1; then
                rm -rf "$RW_MOUNT"  > /dev/null 2>&1
        else
                mount -n -t tmpfs $RW_OPTIONS $mountopts none "$RW_MOUNT"
        fi

        for file in /etc/rwtab /etc/rwtab.d/* ; do
                is_ignored_file "$file" && continue
                [ -f $file ] && cat $file | while read type path ; do
                        case "$type" in
                                empty)
                                        mount_empty $path
                                        ;;
                                files)
                                        mount_files $path
                                        ;;
                                dirs)
                                        mount_dirs $path
                                        ;;
                                *)
                                        ;;
                        esac
                        [ -n "$SELINUX_STATE" -a -e "$path" ] && restorecon -R "$path"
                done
        done

        # In theory there should be no more than one network interface active
        # this early in the boot process -- the one we're booting from.
        # Use the network address to set the hostname of the client.  This
        # must be done even if we have local storage.
        ipaddr=
        if [ "$HOSTNAME" = "localhost" -o "$HOSTNAME" = "localhost.localdomain" ]; then
                ipaddr=$(ip addr show to 0.0.0.0/0 scope global | awk '/[[:space:]]inet / { print gensub("/.*","","g",$2) }')
                if [ -n "$ipaddr" ]; then
                        eval $(ipcalc -h $ipaddr 2>/dev/null)
                        hostname ${HOSTNAME}
                fi
        fi

        # Clients with read-only root filesystems may be provided with a
        # place where they can place minimal amounts of persistent
        # state.  SSH keys or puppet certificates for example.
        #
        # Ideally we'll use puppet to manage the state directory and to
        # create the bind mounts.  However, until that's all ready this
        # is sufficient to build a working system.

        # First try to mount persistent data from /etc/fstab, then any
        # partition with the proper label, then fallback to NFS
        state_mount_dev=$(blkid -t LABEL="$STATE_LABEL" -l -o device)
        if mount $mountopts $STATE_OPTIONS "$STATE_MOUNT" > /dev/null 2>&1 ; then
                /bin/true
        elif [ x$state_mount_dev != x ] && mount $state_mount_dev $mountopts "$STATE_MOUNT" > /dev/null 2>&1;  then
                /bin/true
        elif [ ! -z "$CLIENTSTATE" ]; then
                # No local storage was found.  Make a final attempt to find
                # state on an NFS server.

                mount -t nfs $CLIENTSTATE/$HOSTNAME $STATE_MOUNT -o rw,nolock
        fi

        if [ -w "$STATE_MOUNT" ]; then

                mount_state() {
                        if [ -e "$1" ]; then
                                [ ! -e "$STATE_MOUNT$1" ] && cp -a --parents "$1" "$STATE_MOUNT"
                                mount -n --bind "$STATE_MOUNT$1" "$1"
                        fi
                }

                for file in /etc/statetab /etc/statetab.d/* ; do
                        is_ignored_file "$file" && continue
                        [ ! -f "$file" ] && continue

                        if [ -f "$STATE_MOUNT/$file" ] ; then
                                mount -n --bind "$STATE_MOUNT/$file" "$file"
                        fi

                        for path in $(grep -v "^#" "$file" 2>/dev/null); do
                                mount_state "$path"
                                [ -n "$SELINUX_STATE" -a -e "$path" ] && restorecon -R "$path"
                        done
                done

                if [ -f "$STATE_MOUNT/files" ] ; then
                        for path in $(grep -v "^#" "$STATE_MOUNT/files" 2>/dev/null); do
                                mount_state "$path"
                                [ -n "$SELINUX_STATE" -a -e "$path" ] && restorecon -R "$path"
                        done
                fi
        fi
fi
