- Package Handling & Automatic Updates
- Adding a new SL release
- Booting a rescue system
- Building Kernel Packages
- Notes on 10GbE
- Workarounds for certain hardware/OS combinations
SL systems are installed using kickstart
The repository is mirrored from the ftp server at FNAL and is located on the installation server, pallas.ifh.de, in /nfs1/pallas/SL. The host profiles for the kickstart install are kept in /project/linux/SL/profiles, and some files needed during the postinstallation, before the AFS client is available, in /project/linux/SL/firstboot (accessible through the http server running on pallas). More files, and most utility scripts, are located in /project/linux/SL.
Installation takes the following steps:
Configure the host in VAMOS
This is important, because several variables must be set correctly since they are needed by the tools used in the following steps.
Create a system profile
Using CKS, together with information from VAMOS and possibly from the AMS directory or the live host, a kickstart file is generated that steers the installation process. Today, only the partitioning information in that file is still used, and CKS will eventually be stripped down to produce no more than that.
Activate private key distribution
Only after this step will the host be able to request its private keys and initial configuration cache from mentor.
Prepare system boot into installation
Current options include PXE and hard disk. Other possible methods like USB stick, CD-ROM, or a tftp grub floppy are not currently available.
Boot the system into installation
During boot, the system will load the kernel and initrd made available in the previous step. Networking information comes from a DHCP server or is provided on the kernel command line.
A generic kickstart profile for the major release is retrieved by anaconda, according to the ks parameter on the kernel command line.
The kickstart profile contains all other information needed, including the repository location, partitioning & package selection, and a postinstall script that will do some very basic configuration and retrieve and install a one-time init script.
After the first reboot, this init script (executing as the very last one) will retrieve the system's private keys and initial vamos configuration cache and some essential packages, and then bootstrap our site mechanisms for system maintenance.
Beginning with SL6, and now backported to SL5, the kickstart profile is generic (SL5.ks/SL6.ks/SL7.ks/...) because retrieval of per-host profiles is broken in current EL releases. A per-host profile is still written by CKS, but only the partitioning information from this file is actually used during installation (the whole file is retrieved with wget in %pre, and partitioning info extracted with sed).
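The %pre extraction described above can be sketched like this (a sketch only: the profile URL, file locations, and the exact sed expression are illustrative, not the site's actual code):

```shell
# Sketch of the %pre idea: fetch the per-host profile and keep only
# the partitioning commands. URL and sed pattern are illustrative.
ks=$(mktemp)
cat > "$ks" <<'EOF'
lang en_US.UTF-8
clearpart --all --initlabel
part /boot --fstype ext3 --size 512
part / --fstype ext3 --size 20000
url --url http://pallas.ifh.de/SL/57/x86_64
EOF
# In the real %pre the file would come from the install server, e.g.:
#   wget -q -O "$ks" http://pallas.ifh.de/profiles/<host>.ks
partinfo=$(sed -n '/^\(clearpart\|part\|volgroup\|logvol\|raid\) /p' "$ks")
echo "$partinfo"
```

Everything except the extracted partitioning lines is ignored, which is exactly why the rest of the per-host profile no longer matters.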
System Configuration in VAMOS
Choose a default derived from the current slX-def, where X = 5, 6, 7, ... Before SL6, defaults starting with "slX-" were 32-bit, those starting with "slXa-" were 64-bit. These mainly differ in the settings for OS_ARCH and AFS_SYSNAME (see the slXa-mod modifier). 64-bit capable systems can run the 32-bit version as well. As of SL6, only 64-bit systems are supported, and slX-def will be 64-bit. The slX-32-mod modifier is used for the few special purpose 32-bit systems. At the time of writing, the only such SL6 system is used for testing building and running the 32-bit OpenAFS module from the SRPM provided to the SL developers. Supporting 32-bit systems for users is not foreseen for SL6+.
OS_ARCH is read by several tools in the following steps to determine what to install. The same is true for CF_SL_release: This variable determines which minor SL release the system will use. Both OS_ARCH and CF_SL_release affect the choice of installation kernel & initrd, installation repository, and yum repositories for updating and installing additional packages.
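As a rough illustration of how the two variables work together (the directory layout is an assumption inferred from the paths mentioned on this page, not authoritative):

```shell
# Illustration only: OS_ARCH and CF_SL_release combining into an
# installation repository path (layout inferred from this page).
OS_ARCH=x86_64
CF_SL_release=57
repo="/nfs1/pallas/SL/${CF_SL_release}/${OS_ARCH}"
echo "$repo"
```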
It should now be safe to do this step without disabling sue on the system, since sue.bootstrap will no longer permit OS_ARCH to change.
Run the Workflow whenever a system changes between major SL releases (say, from SL5 to SL6 or back), changes netgroups etc. If in doubt, just do it and wait for half an hour before proceeding.
Creating System Profiles
This is done with the tool CKS.pl which reads "host.cks" files and creates "host.ks" files from them, using additional information from VAMOS, the AMS directory, or the live system still running SL.
CKS.pl is located in /project/linux/SL/scripts, and is fully perldoc'd. There should be a link pointing to it in the profiles directory as well. A sample DEFAULT.cks with many comments is located in the same directory.
To create a profile:
You need:
- write access to /project/linux/SL/profiles
- read access to VAMOS
- ssh access to the system to be installed, if it is up and partitions are to be kept
Then cd into /project/linux/SL/profiles.
- Check whether a .cks file for your host exists.
- If it does, and you find you have to modify the file, make sure it is not a link to some other file before you do so.
- If it does not, create one by starting with a copy from a similar machine, or a copy of DEFAULT.cks
NO host.cks IS NEEDED AT ALL if you just want to upgrade or reinstall a normal system without changing disk partitioning, since DEFAULT.cks is always read and should cover this case completely.
Run CKS.pl, like this: ./CKS.pl host
Always rerun CKS before installing a system, even if the existing .cks file looks fine.
- Check the output! It contains a lot of information. Make sure you understand what the profile is going to do to the machine! If in doubt, read and understand the SLx.ks and host.ks files before actually installing. Also make sure the SL release and architecture are what you want.
In particular, make sure you understand the partitioning, and any clearpart statements.
Unlike with the good old SuSE-based DL5, these may wipe the disks even in "interactive" installs!
Notice that as of SL6, only the partitioning information is actually used from the generated file. Anything else is not!
Activating Private Key Distribution
If you followed the instructions above (read the CKS output), you already know what to do:
ssh configsrv sudo prepare-ai <host>
This will activate the one-shot mechanism for giving the host (back) its private keys (root password, kerberos keyfile, vamos/ssh keys, ...). The init script retrieved during postinstall will start the script /products/ai/scripts/ai-web which will retrieve a tarball with these keys and a few other files from the install server. This works only once after each prepare-ai for a host. If after the installation the host has its credentials, it worked and no other system can possibly have them as well. If it hasn't, the keys are burned and have to be scrubbed. Hasn't happened yet, but who knows.
If ai-web fails, the system will retry after 5 minutes. Mails will be sent to linuxroot from both the AI server and the installing system, indicating that this happened. The reason is usually that the prepare-ai step was forgotten. Remember it has to be repeated before every reinstallation. The ai daemon writes a log /var/log/ai/ai_script.log.
Booting the system into installation
There are several options:
If the system is still running a working SL installation, this is the most convenient and reliable method: After logging on as root, run the script
/project/linux/SL/scripts/SLU.pl yes please
The script will create an additional default boot loader entry to start the installation system. By default, all needed information is appended to the kernel command line, including networking information; hence not even DHCP is needed. The script comes with full perldoc documentation. Some additional options are available or may even be necessary for certain hosts. To mention the two most important ones:
-dhcp will make the installation system use dhcp to find the IP address, which is useful if a system will be installed with a different address
-reboot will make the system reboot itself after a countdown (which can be interrupted with ^C)
Booting over the network with PXE requires entries on both the DHCP and TFTP servers. The client will receive IP, netmask, gateway etc. from the DHCP server, plus the information that it should fetch "pxelinux.0" from the TFTP server and run it. Then, pxelinux.0 will request the host configuration file (IP address in hex notation) from the TFTP server (a link in /tftpboot/pxelinux.cfg/). This will in turn tell pxelinux.0 which kernel & initrd to retrieve from the TFTP server, and what parameters the kernel should receive on the command line. As root on pallas, run
/project/linux/SL/scripts/pxe <host>
This will add the right link for the system in /tftpboot/pxelinux.cfg, and also attempt to update the DHCP configuration on the right server.
To undo this after the installation, run
/project/linux/SL/scripts/unpxe <host>
If the client boots via PXE afterwards, it will pick up the default configuration, which tells it to boot from its local disk. Anyway, using PXE is not recommended for systems which have neither a "one time boot menu" nor a "request network boot" key stroke.
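The configuration file name pxelinux asks for is just the client's IP address written as eight uppercase hex digits; a small sketch (the helper function name is made up here):

```shell
# pxelinux.0 requests /tftpboot/pxelinux.cfg/<IP-in-hex>, where the
# IP address is written as eight uppercase hex digits.
# The helper name ip_to_pxe_hex is made up for this sketch.
ip_to_pxe_hex() {
  local IFS=.
  set -- $1
  printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}
ip_to_pxe_hex 141.34.32.10   # prints 8D22200A
```

This is the name of the link the pxe script creates (and unpxe removes) in /tftpboot/pxelinux.cfg/.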
Package Handling & Automatic Updates
See the "aaru" feature for how all this (except kernels) is handled.
There are three distinct mechanisms for package handling on the client:
aaru (package updates)
Handled by the aaru feature, the scripts /sbin/aaru.yum.daily and /sbin/aaru.yum.boot run yum to update installed packages. Yum is told to use specific repository descriptions for these tasks, which are created beforehand by /sbin/aaru.yum.create, according to the values of the VAMOS variables OS_ARCH, CF_SL_release, CF_YUM_extrarepos*, CF_YUM_bootonly* and CF_DZPM_AGING.
yumsel (addition and removal of packages)
Handled by the aaru feature, the script /sbin/yumsel installs additional packages or removes installed ones. Configuration files for this task are read from /etc/yumsel.d/, which is populated beforehand by /sbin/yumsel.populate, according to the values of the VAMOS variables CF_yumsel_*.
yumsel documentation, including the file format, is available with perldoc /sbin/yumsel
KU[SL(3)] (everything related to kernels)
Handled by the kernel feature, this script deals with kernels and related packages (modules, source), according to the values of VAMOS variable Linux_kernel_version and a few others. On SL5, KUSL3 was replaced by KUSL, and this should happen eventually on SL3/4 as well. SL3/4 systems in update class "A" already use KUSL.pl.
SL Standard & Errata Packages
For yum on SL3, the command to create the necessary repository data is yum-arch <dir>
Errata are synced to pallas with /project/linux/SL/scripts/sync-pallas.pl (still manually). Packages to be installed additionally by /sbin/yumsel or updated by /sbin/aaru.yum.boot and /sbin/aaru.yum.daily are NOT taken from the errata mirror created like this, but instead from "staged errata" directories created (also still manually) by the script /project/linux/SL/scripts/stage-errata. The sync/stage scripts send mail to linuxroot unless in dryrun mode. The stage-errata script is fully perldoc'ed; the others are too simple to need it.
Addon Packages (Zeuthen)
These are generally found in /nfs1/pallas/SL/Z, with their (no)src rpms in /afs/ifh.de/packages/SRPMS/System and the source tarballs in /afs/ifh.de/packages/SOURCES (for .nosrc.rpms). Some come from external sources like the dag repository (http://dag.wieers.com/home-made/), freshrpms (http://freshrpms.net/) or the SuSE 8.2/9.0 distributions. These latter ones are typically not accompanied by a src rpm.
Available subdirectories under Z:
- really common packages: used by all SL5 systems
- typical addons for a major release: used by all 32-bit SL5 systems, and by all 64-bit SL5 systems
- bug fixes included in the next release: used by all SL5.4 systems, all 32-bit SL5.4 systems, and all 64-bit SL5.4 systems
- as above, but available already during system installation (the INSTALL repositories): used by all SL5.4 systems, all 32-bit SL5.4 systems, and all 64-bit SL5.4 systems
The distinction of the INSTALL repositories is necessary because some of our add-ons do not work correctly when installed during initial system installation. Notice these repos may be populated by symlinks to packages or subdirectories, e.g. afs -> ../../x86_64/afs, but the metadata must be updated separately.
After adding a package, make it available to yum like this:
/project/linux/SL/scripts/UpdateRepo.pl -x /path/to/repo
Do not use createrepo manually.
Selectable Addon Packages (Zeuthen)
There's a way to provide packages in selectable repositories. For example, this was used to install an openafs-1.2.13 update on selected systems while the default for SL3 was still 1.2.11, and we didn't want to have 1.2.13 on every system.
These packages reside in directories Z/<major release>/extra/<arch>/<name> on the installation server. For example, the afs update packages for SL3/i386 would be in /nfs1/pallas/SL/Z/3/extra/i386/afs1213 . To have clients access this repository, set any vamos variable starting with CF_YUM_extrarepos (CF_YUM_extrarepos or CF_YUM_extrarepos_host or ...) to a space separated list of subdirectories in <arch>_extra.
For example, CF_YUM_extrarepos='afs1213' will make aaru.yum.create add this repository (accessible via nf or http) to the host's yum configuration.
To make available packages in such a repository, you must provide the full path, including the *sub*directory, to the repo update script:
/project/linux/SL/scripts/UpdateRepo.pl -x /nfs1/pallas/SL/Z/3/extra/i386/afs1213
Note that matching kernel modules must still reside in a directory searched by the update script (see below). This should generally not cause problems since these aren't updated by yum anyway.
Additional Modules for Kernel Updates
Starting with SL5, KUSL3.pl is being replaced by KUSL.pl. As of May 2007, the new script is still being tested on SL3/4, but eventually it should be used on all platforms. SL6 uses an even newer script called KU.pl.
Handled by the kernel feature, the script /usr/sbin/KUSL3.pl reads its information about which kernels to install from VAMOS variables Linux_kernel_version and a few others, and carries out whatever needs to be done in order to install new kernels and remove old ones. The script is perldoc'ed.
Basically, set Linux_kernel_version in VAMOS, and on the host (after a sue.bootstrap) run KUSL3.pl. Make sure you like what it would do, then run it again with -x.
Kernels and additional packages are found in the repository mirror including the errata directory (CF_SL_release is used to find those), and in /afs/ifh.de/packages/RPMS/@sys/System (and some subdirectories).
If the variable Linux_kernel_modules is set to a (whitespace separated) list of module names, KUSL(3) will install (and require the availability of) the corresponding kernel-module rpm. For example, if Linux_kernel_version is 2.4.21-20.0.1.EL 2.4.21-27.0.2.EL, and Linux_kernel_modules is foo bar, the mandatory modules are:
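For this example, the mandatory packages would be one kernel-module package per (module, kernel) pair, sketched here following the SL naming convention (the exact names are illustrative):

```shell
# Enumerate the mandatory kernel-module packages for the example in
# the text (name pattern follows the SL convention; illustrative).
names=$(
  for kver in 2.4.21-20.0.1.EL 2.4.21-27.0.2.EL; do
    for mod in foo bar; do
      echo "kernel-module-${mod}-${kver}"
    done
  done
)
echo "$names"
```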
Generally speaking, kernel module packages must comply with the SL conventions. The new KUSL.pl will also handle packages complying with the kmod conventions introduced with RHEL5.
KU(SL)(3) will refuse to install a kernel if mandatory packages are not available. Non-mandatory packages include kernel-source, sound modules, and kernel-doc.
ESD CAN Module (for PITZ Radiation Monitor)
NVIDIA Drivers
Only the 'gen2' packages are required:
rpmbuild --rebuild --sign --define 'kernel 2.6.18-308.8.1.el5' --define 'nvgen 2' /packages/SRPMS/System/nvidia/nvidia-driver-gx-080102-3.sl.src.rpm --target i686
We need the 'gen2' and 'gen3' packages:
rpmbuild --rebuild --sign --define 'kernel 2.6.18-308.8.1.el5' --define 'nvgen 3' /packages/SRPMS/System/nvidia/nvidia-driver-gx-120416-1.sl.src.rpm
rpmbuild --rebuild --sign --define 'kernel 2.6.18-308.8.1.el5' --define 'nvgen 2' /packages/SRPMS/System/nvidia/nvidia-driver-gx-080102-3.sl.src.rpm
On SL6, the nvidia drivers are packaged as "kABI-tracking kmods". It shouldn't be required to rebuild them for normal kernel updates.
Lustre Modules
We only provide these for the normal (non-Xen) kernel:
KVERSION=2.6.18-308.8.1.el5 rpmbuild --rebuild --sign /packages/SRPMS/System/lustre/lustre-1.8.7-1.wc1.1.src.rpm
The modules also have to be copied to the NAF.
KVERSION='2.6.32-220.17.1.el6.x86_64' rpmbuild --rebuild --sign /packages/SRPMS/System/lustre/lustre-1.8.7-1.wc1.2.el6.src.rpm
ARECA RAID (SL4)
Adding a new SL release
There are quarterly releases of SL, following Red Hat's updates to RHEL. Each new release must be made available for installation and updates. The procedure is the same for SL3 and SL4. Just substitute filenames and paths as appropriate:
Step 1: Mirror the new subdirectory
- Create a new logical volume on a:
lvcreate -L 30G -n SL44 vg00
Add an according line in /etc/fstab (mount with the acl option)
- Create the directory, mount the volume, and make sure permissions and security context are right:
chgrp sysprog /nfs1/pallas/SL/44
chmod g+w /nfs1/pallas/SL/44
getfacl /nfs1/pallas/SL/43 | setfacl --set-file=- /nfs1/pallas/SL/44
chcon system_u:object_r:httpd_sys_content_t /nfs1/pallas/SL/44
The last command makes it possible to access the directory through apache. The chgrp and chmod are actually redundant if ACLs are used.
- Modify sync-pallas.pl to include the new release. Now run sync-pallas.pl (do a dryrun first, and check whether additional subdirectories should be excluded).
- If you're using xrolling for testing, make a link like this:
/nfs1/a/SL/44 -> 40rolling
- Check access through http
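The xrolling link from the step above can be created like this (sketched in a scratch directory so it is self-contained; on the server the base would be /nfs1/a/SL):

```shell
# Create the rolling-release compatibility link; done in a temp dir
# here so the sketch is self-contained and harmless to run.
base=$(mktemp -d)
mkdir "$base/40rolling"
ln -s 40rolling "$base/44"
readlink "$base/44"   # prints 40rolling
```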
Step 2: Create staged errata directories
Modify /project/linux/SL/scripts/stage-errata.cf to include the new release. Note that if you're using 30rolling as a test for the release, you must configure 30rolling, not 304 (or whatever); the same goes for SL4. Now run stage-errata.
Step 3: Make the kernel/initrd available by TFTP for PXE boot
Run the script
/project/linux/SL/scripts/tftp_add_sl_release 44 i386
and accordingly for other releases and architectures. This will copy the kernel and the initrd, and create a pxelinux configuration file. You may still want/have to add a few lines in /tftpboot/pxelinux.cfg/default (for example, for Tier2 installs).
Step 4: Make the release available in VAMOS
Fire up the GUI, select "vars" as the top object, go to CF_SL_release, choose the "values_host" tab, and add the new value to the available choices. Set it on some test host.
Step 5: test
Make sure this works and sets the right link:
Make sure this chooses the right directory:
cd /project/linux/SL/profiles
./CKS.pl <testhost>
Make sure SLU works correctly:
ssh <testhost> /project/linux/SL/scripts/SLU.pl yes please
Try an installation:
- then boot it
Try updating an existing installation:
- set CF_SL_release for the host in VAMOS
- have a look into /var/log/yum.log, and check everything still works
Booting a rescue system
There are several ways to do this, including:
From CD1 of the distribution
Simply boot from CD1 of the distribution. At the boot prompt, type linux rescue.
Over the network using PXE
Make sure the system gets the "next-server" and "filename" responses from the dhcp server, but that there is no link for the system in /tftpboot on the install server. Then, at the boot prompt, enter something like "sa53 rescue".
Building Kernel Packages
First install the kernel srpm (not kernel-source). Make your changes to the spec, add patches etc.
rpmbuild --sign -ba kernel-2.4.spec
This will build
rpmbuild --sign -ba --target i686 kernel-2.4.spec
This will build
Trying to turn off the build of the hugemem kernel breaks the spec.
Additional modules for these are built as for any other SL kernel, with one exception of course:
Building the 1.2.x kernel-module-openafs (and openafs) packages
For ordinary SL kernels, this is done at FNAL, hence we needn't bother. But for our own kernels, or if we want to apply a change to the SRPM, we have to do this ourselves.
The kernel version must always be defined, and always without the "smp" suffix.
For each kernel version you want to build modules for, install the kernel-source, kernel, and kernel-smp RPMs on the build system, they're all needed. Then:
- To build the base packages on i686:
PATH=/usr/kerberos/bin:$PATH rpmbuild --rebuild --sign --define 'kernel 2.4.21...' openafs-...src.rpm
- To build the kernel module packages (UP and SMP) on i686:
PATH=/usr/kerberos/bin:$PATH rpmbuild --rebuild --sign --target i686 --define 'kernel 2.4.21...' openafs-...src.rpm
- To build the base packages plus the kernel modules (UP and SMP) on x86_64:
PATH=/usr/kerberos/bin:$PATH rpmbuild --rebuild --sign --define 'kernel 2.4.21...' openafs-...src.rpm
- To build just the kernel modules (UP and SMP) on x86_64:
PATH=/usr/kerberos/bin:$PATH rpmbuild --rebuild --sign --define 'kernel 2.4.21...' --define 'build_modules 1' openafs-...src.rpm
To build just the kernel module for ia32e (intel CPUs) on x86_64:
PATH=/usr/kerberos/bin:$PATH rpmbuild --rebuild --sign --target ia32e --define 'kernel 2.4.21...' openafs-...src.rpm
Building the NEW 1.4.x kernel-module-openafs packages
As of December 2006, there exists a unified SRPM for OpenAFS 1.4.2+, which doesn't have the build problems described above, and works in exactly the same way on SL3, SL4, and SL5. It's named openafs.SLx, but will create packages named openafs with SL3, SL4, SL5 in the release number. The SRPM can (and should) be rebuilt without being root. The steps are the same on every platform:
First install the right kernel-source (SL3) or matching kernel[-smp|-largesmp|-xen|...]-devel package for the target kernel.
rpmbuild --rebuild --sign --target i686 --define 'kernel 2.4.21...' --define 'build_modules 1' openafs.SLx-...src.rpm
There's always just one module built per invocation. Building on SMP systems is ok.
Supported targets include i686, athlon (SL3 only), ia32e (SL3 only, must build on 64bit system), x86_64 (must build on 64bit system), ia64 (untested).
Supported kernel flavours include smp (SL3/4), hugemem (SL3/4), largesmp (SL4), xen (SL5), xenU (SL4), PAE (SL5).
Building the 1.6.x kmod-openafs packages (SL6)
Normally, this should not be required, as these packages needn't be updated with the kernel as long as the kernel ABI used by the modules is kept stable. At the time of writing (late in the 6.2 cycle), this has been the case since EL6 GA: the modules in use are still the ones built against the very first SL6 kernel.
If it's required to rebuild against a new kernel:
rpmbuild --rebuild --sign --define 'kernel 2.6.32...' --define 'build_kmod 1' openafs.SLx-1.6.....src.rpm
Notes on 10GbE
So far we have used two different cards:
- Intel 82598EB
Broadcom NetXtreme II BCM57710
The Intel card is used in zyklop41..43. It has no PXE ROM (but supposedly can be flashed), is cooled passively, and works out of the box. It can also do 1Gb, but not 10/100. Throughput is not very good, maybe because the cards won't fit into the PCIe x8 slot in the Dell R510.
The Broadcom card is used in dusk & dawn so far. It does have a PXE ROM (not tested yet though), has a cooling fan, and currently needs an updated driver because the bnx2x driver in the SL5.6 kernel (2.6.18-238.*) has a bug. The host will need a power cycle after the SL5.6 driver was used; a reset is not sufficient. The driver update is packaged in kernel-module-bnx2 rpms. These can currently only be built if the target kernel is running on the build host. The driver update should become unnecessary with SL5.7. The cards (purchased from Dell) can only do 10000baseT (not even Gigabit is supported), hence they must be connected to a 10GbE switch with an appropriate cable.
LRO/TPA and bridged networking
Large Receive Offload is a hardware feature significantly enhancing throughput. Alas, it is incompatible with bridged networking. To use such a device with Xen or KVM virtualization, LRO has to be turned off by specifying options bnx2x disable_tpa=1 in /etc/modprobe.conf (or modprobe.d/...). Otherwise, all kinds of weird things happen although the network basically works. Unfortunately, this reduces the throughput as measured with qperf (under the Xen kernel in DOM0) to just 30% of what can be achieved with the normal kernel and LRO enabled. NB "TPA" = "Transparent Packet Aggregation".
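The option from the text as a drop-in modprobe fragment (a sketch: it writes to a scratch directory so it can be run safely; on a real SL6 host the target would be a file under /etc/modprobe.d/, and on SL5 the line would go into /etc/modprobe.conf):

```shell
# Write the module option that disables LRO/TPA for bnx2x. The file
# name bnx2x.conf is a site choice; a temp dir keeps this runnable.
confdir=$(mktemp -d)
echo 'options bnx2x disable_tpa=1' > "$confdir/bnx2x.conf"
cat "$confdir/bnx2x.conf"
```

The option takes effect the next time the bnx2x module is loaded, so a reboot (or module reload) is needed after adding it.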
Workarounds for certain hardware/OS combinations
SL5 on Dell Precision T3500
Requires vars.CF_kernel_append='acpi_mcfg_max_pci_bus_num=on'. Without this, performance is very bad.
SL6 on Dell Poweredge M620 with HT disabled
Requires vars.CF_kernel_append='acpi=ht noapic'. Without this, performance is abysmal.