Debian SPARC in a T1000 Logical Domain

Last updated: Wed, 22 Apr 2009 19:37:00 GMT

It can be done, with a little bit of jiggery here, and some pokery there. Here are a few notes that might save you some time, couched in a verbose and conversational tone that's guaranteed to waste you some.

It's a bit of a strange corner to find yourself in, but here I am. I've got a fair bit of Solaris behind me but I'm looking to keep my hand in with various other operating systems. I have a T1000 to play with, and I'm nowhere near utilising the many-threaded grunt it has. Logical Domains look like a great solution to both requirements.

I was aiming for a couple of guest operating systems; a Linux and a BSD. I'd prefer FreeBSD, but the intarnets tell me that FreeBSD isn't a go, lacking drivers to support the virtual network interfaces and virtual disc controllers that the LDom host presents to guests. The latest snapshot of OpenBSD might be worth a try. That'll be Phase Two.

So, to Linux. I prefer something Debian-ish, simply because I see no reason to suffer anything other than apt-get for package management. I have painful and thankfully distant memories of RPM. With no other advice, the Debian SPARC port was the obvious place to start.

Set Up Your Domains

First things first, patch the Solaris instance on the T1000 up to a reasonable level. I used PCA, obviously. If you're not using PCA, you really should.

Next, install the LDom 1.1 software, as obtained from Sun. I chose not to fart around with their JASS stuff -- I'll secure this myself when I'm comfortable with it, but for now I don't want their package doing anything to my server that's not related directly to LDoms.

My T1000 has 8GiB of memory, 32 virtual CPUs, and one 80GB SATA drive. I was aiming for a 2:1:1 split on just about everything: 4GiB RAM and 16 vCPUs for Solaris, 2GiB RAM and 8 vCPUs for each of two guest domains. The LDom Administration Guide I was using was pretty straightforward and helpful, especially Chapter 4: "Setting Up Service and Logical Domains".

Already having all four onboard NICs configured as two bonded channels, I simply replaced my primary aggregate (aggr1) with my virtual switch (vsw0). Painless.
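
For reference, the control-domain setup from Chapter 4 boils down to roughly the following. This is a sketch rather than a transcript of what was typed: the service names, port range, net-dev and resource figures are read back from the working configuration shown further down, the function name is made up for illustration, and the MAU count assumes the crypto units follow the cores.

```shell
# Hypothetical wrapper around the control-domain setup.
# Run as root in the primary domain; the chain stops at the first failure.
setup_primary_domain() {
    ldm add-vds primary-vds0 primary &&                        # virtual disk server
    ldm add-vcc port-range=6969-6996 primary-vcc0 primary &&   # virtual console concentrator
    ldm add-vsw net-dev=aggr1 primary-vsw0 primary &&          # virtual switch on the aggregate
    ldm set-mau 4 primary &&                                   # assumption: one MAU per core kept
    ldm set-vcpu 16 primary &&
    ldm set-memory 4G primary &&
    ldm add-config initial                                     # save the configuration to the SP
}
```

Call setup_primary_domain, then reboot the primary domain so that the new split takes effect.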

Division of Memory didn't quite work as I'd expected. With the primary domain allotted 4GiB, I can't give 2GiB each to two guest domains. Obviously, I'm using a certain amount of memory already, but... isn't that coming out of the primary domain's share? I've obviously skimmed over something important somewhere. For now, the two guest domains get 1.5GiB each. Probably more than enough anyway, especially with no TMPFS to suck it up.
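
The sums, for what they're worth, as a throwaway shell function. The name is invented here and all figures are in MiB:

```shell
# Print the MiB left over after subtracting each allocation from the total.
unallocated_mib() {
    total=$1
    shift
    for alloc in "$@"; do
        total=$((total - alloc))
    done
    echo "$total"
}

unallocated_mib 8192 4096 1536 1536    # the split that worked: prints 1024
unallocated_mib 8192 4096 2048 2048    # the split I wanted: prints 0
```

On paper the 2GiB-each split fits exactly, with nothing to spare, so anything held back from the nominal 8GiB before the domains get their share would be enough to sink it. That is an assumption, not something confirmed here. Running ldm list-devices memory in the primary domain should show what is actually still unallocated.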

On the subject of filesystems, the only aspect of guest domain setup that caused me any grief was supplying virtual disc for storage. I had planned to keep everything neatly separate by following the LDom documentation's example and supplying each guest domain with a slice of the only disc to hand. I spent a few hours shuffling stuff around so that I could destroy my existing ZFS pool and create two big slices at the end of the disc.

I never got it to work. I don't know why. With problems ranging from the Debian installer just not being able to see any storage, to it hanging when the partitioner tried to write a disc label, I just couldn't find the magic combination of vdsdev configuration options.

Avoid.

Instead, read the later example showing use of ZFS zvols, 6.11: "Using ZFS with Virtual Disks". That works a treat.

Here's my ZFS config:

trevor_concat_p0/vols/ldg0_root  type             volume                   -
trevor_concat_p0/vols/ldg0_root  creation         Wed Apr 22 10:08 2009    -
trevor_concat_p0/vols/ldg0_root  used             12G                      -
trevor_concat_p0/vols/ldg0_root  available        36.3G                    -
trevor_concat_p0/vols/ldg0_root  referenced       2.23G                    -
trevor_concat_p0/vols/ldg0_root  compressratio    1.00x                    -
trevor_concat_p0/vols/ldg0_root  reservation      12G                      local
trevor_concat_p0/vols/ldg0_root  volsize          12G                      -
trevor_concat_p0/vols/ldg0_root  volblocksize     8K                       -
trevor_concat_p0/vols/ldg0_root  checksum         on                       default
trevor_concat_p0/vols/ldg0_root  compression      off                      inherited from trevor_concat_p0/vols
trevor_concat_p0/vols/ldg0_root  readonly         off                      default
trevor_concat_p0/vols/ldg0_root  shareiscsi       off                      default
trevor_concat_p0/vols/ldg0_root  copies           1                        default
trevor_concat_p0/vols/ldg0_root  refreservation   12G                      local

You can see that I've fully-reserved the root volume for ldg0. That was superstition. As is the disabling of compression, which I otherwise leave on, all other things being equal. I had some trouble later with EXT2 filesystems over sunvdc devices. More later on that.
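
Creating and exporting such a volume might look like the sketch below. The function name is hypothetical; the pool, volume and service names are the ones from the listings in this article, and it assumes the parent dataset trevor_concat_p0/vols already exists. A zvol created with -V and without -s is not sparse, so ZFS reserves its full size up front.

```shell
# Hypothetical helper: create a zvol and hand it to a guest as a virtual disk.
# Run as root in the primary domain.
export_zvol_to_guest() {
    vol=$1 size=$2 vdsdev=$3 vdisk=$4 domain=$5
    zfs create -V "$size" "$vol" &&                                              # fixed-size volume
    ldm add-vdsdev options=excl "/dev/zvol/dsk/$vol" "$vdsdev@primary-vds0" &&   # export it via the disk server
    ldm add-vdisk "$vdisk" "$vdsdev@primary-vds0" "$domain"                      # attach it to the guest
}
```

For the first guest that would be: export_zvol_to_guest trevor_concat_p0/vols/ldg0_root 12G zdisk0 vdisk0 ldg0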

This, finally, is my working domain configuration:

NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -n-cv-  SP      16    4G       0.2%  21h 1m

SOFTSTATE
Solaris running

MAC
    00:14:4f:be:ef:ba

HOSTID
    0x4fbeefba

VCPU
    VID    PID    UTIL STRAND
    0      0      0.8%   100%
    1      1      0.1%   100%
    2      2      0.2%   100%
    3      3      0.0%   100%
    4      4      0.0%   100%
    5      5      0.0%   100%
    6      6      0.0%   100%
    7      7      0.0%   100%
    8      8      0.1%   100%
    9      9      0.0%   100%
    10     10     0.0%   100%
    11     11     0.0%   100%
    12     12     0.8%   100%
    13     13     0.4%   100%
    14     14     0.0%   100%
    15     15     0.0%   100%

MAU
    ID     CPUSET
    0      (0, 1, 2, 3)
    1      (4, 5, 6, 7)
    2      (8, 9, 10, 11)
    3      (12, 13, 14, 15)

MEMORY
    RA               PA               SIZE            
    0x8000000        0x8000000        4G

VARIABLES
    keyboard-layout=US-English
    nvramrc=." ChassisSerialNumber 4FBEEFBA " cr
    security-#badlogins=4294967295

IO
    DEVICE           PSEUDONYM        OPTIONS
    pci@780          bus_a           
    pci@7c0          bus_b           

VCC
    NAME             PORT-RANGE
    primary-vcc0     6969-6996

VSW
    NAME             MAC               NET-DEV   DEVICE     DEFAULT-VLAN-ID PVID VID                  MODE  
    primary-vsw0     00:14:4f:be:ef:de aggr1     switch@0   1               1                               

VDS
    NAME             VOLUME         OPTIONS          MPGROUP        DEVICE
    primary-vds0     zdisk0         excl                            /dev/zvol/dsk/trevor_concat_p0/vols/ldg0_root
                     zdisk1         excl                            /dev/zvol/dsk/trevor_concat_p0/vols/ldg1_root

VLDC
    NAME            
    primary-vldc0   
    primary-vldc3   

VLDCC
    NAME             SERVICE                     DESC
    ds               primary-vldc0@primary       domain-services  
    vldcc1           primary-vldc0@primary       ldmfma           
    vldcc2           SP                          spfma            

VCONS
    NAME             SERVICE                     PORT
    SP

------------------------------------------------------------------------------
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
ldg0             active     -n----  6969    8     1536M    0.0%  2h 24m

SOFTSTATE
Linux running

MAC
    00:14:4f:be:ef:d7

HOSTID
    0x4fbeefd7

VCPU
    VID    PID    UTIL STRAND
    0      16     0.0%   100%
    1      17     0.0%   100%
    2      18     0.0%   100%
    3      19     0.0%   100%
    4      20     0.0%   100%
    5      21     0.0%   100%
    6      22     0.0%   100%
    7      23     0.0%   100%

MAU
    ID     CPUSET
    4      (16, 17, 18, 19)
    5      (20, 21, 22, 23)

MEMORY
    RA               PA               SIZE            
    0x8000000        0x108000000      1536M

VARIABLES
    auto-boot?=true
    boot-device=disk

NETWORK
    NAME             SERVICE                     DEVICE     MAC               MODE   PVID VID                 
    vnet0            primary-vsw0@primary        network@0  00:14:4f:be:ef:62        1                        

DISK
    NAME             VOLUME                      TOUT DEVICE  SERVER         MPGROUP       
    vdisk0           zdisk0@primary-vds0              disk@0  primary                      

VLDCC
    NAME             SERVICE                     DESC
    ds               primary-vldc0@primary       domain-services  

VCONS
    NAME             SERVICE                     PORT
    ldg0             primary-vcc0@primary        6969  

------------------------------------------------------------------------------
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
ldg1             active     -t----  6996    8     1536M     12%  7h 7m

SOFTSTATE
OpenBoot Primary Boot Loader

MAC
    00:14:4f:be:ef:a1

HOSTID
    0x4fbeefa1

VCPU
    VID    PID    UTIL STRAND
    0      24     100%   100%
    1      25     0.0%   100%
    2      26     0.0%   100%
    3      27     0.0%   100%
    4      28     0.0%   100%
    5      29     0.0%   100%
    6      30     0.0%   100%
    7      31     0.0%   100%

MAU
    ID     CPUSET
    6      (24, 25, 26, 27)
    7      (28, 29, 30, 31)

MEMORY
    RA               PA               SIZE            
    0x8000000        0x168000000      1536M

VARIABLES
    auto-boot?=true
    boot-device=disk

NETWORK
    NAME             SERVICE                     DEVICE     MAC               MODE   PVID VID                 
    vnet1            primary-vsw0@primary        network@0  00:14:4f:be:ef:c2        1                        

DISK
    NAME             VOLUME                      TOUT DEVICE  SERVER         MPGROUP       
    vdisk1           zdisk1@primary-vds0              disk@0  primary                      

VLDCC
    NAME             SERVICE                     DESC
    ds               primary-vldc0@primary       domain-services  

VCONS
    NAME             SERVICE                     PORT
    ldg1             primary-vcc0@primary        6996  

Of note, here, I've set each guest domain's boot-device to disk. This works. The LDom docs suggest that this should be set to vdisk. This does not work.
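
Pieced together from the listing above, building one of these guests might go something like this. It is a sketch: the function name is made up, the figures are simply those shown for ldg0 and ldg1, and the virtual disc is assumed to be attached separately with ldm add-vdisk.

```shell
# Hypothetical helper: define a guest domain with the resources listed above.
# Run as root in the primary domain; the chain stops at the first failure.
create_guest_domain() {
    domain=$1 vnet=$2
    ldm add-domain "$domain" &&
    ldm add-vcpu 8 "$domain" &&
    ldm set-mau 2 "$domain" &&
    ldm add-memory 1536M "$domain" &&
    ldm add-vnet "$vnet" primary-vsw0 "$domain" &&
    ldm set-variable 'auto-boot?=true' "$domain" &&   # quoted so the shell leaves the ? alone
    ldm set-variable boot-device=disk "$domain"       # disk, not vdisk, as noted above
}
```

For example: create_guest_domain ldg0 vnet0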

Netboot the Debian Installer

The Debian SPARC installation documentation is very helpful. I already have a JumpStart/PXEboot setup at home, for various platforms I have that don't have any other practical way to install an OS. So I skimmed much of the netboot installation, simply sticking the boot.img on my boot server and amending ISC-DHCPd's config accordingly.

	host terry {
		hardware ethernet 00:14:4f:be:ef:62;
		fixed-address 192.168.0.8;
		filename "debian_sparc_boot.img";
		next-server 192.168.0.200;
	}

Then, making sure that the guest domain was bound and started, I connected to its PROM by telnet, from the primary domain to the appropriate local port. Then DHCP boot:

-bash-3.00$ telnet 0 6969
Trying 0.0.0.0...
Connected to 0.
Escape character is '^]'.

Connecting to console "ldg0" in group "ldg0" ....
Press ~? for control options ..

{0} ok boot net:dhcp
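
The bind, start and connect steps that lead up to that ok prompt could be wrapped up as below. A sketch only: the function name is hypothetical, the port is whatever the CONS column shows for the domain, and it assumes the vntsd console service may not have been enabled yet.

```shell
# Hypothetical helper: make sure a guest is up, then attach to its console.
# Run as root in the primary domain.
open_guest_console() {
    domain=$1 port=$2
    svcadm enable vntsd          # console server; a no-op if it is already online
    ldm bind-domain "$domain"    # should just report an error if already bound
    ldm start-domain "$domain"   # likewise if already running
    telnet localhost "$port"
}
```

For example: open_guest_console ldg0 6969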

As an aside, if you run into trouble later, you can perform a rescue boot with the following, from the fake PROM:

{0} ok boot net:dhcp - rescue/enable=true

But you won't be getting into any trouble later.

The installation should proceed as you'd expect. Knock yourself out. Luckily, due to recent changes, the installer does have drivers that are happy to chat to the virtual network and disc that are attached to your domain. Unfortunately, though, the kernel doesn't load them automatically. Consequently, it'll fail to find an ethernet interface; select sunvnet from the list. It'll fail to find a disc controller; select sunvdc from the list.

I had a number of difficulties trying to lay an EXT2 filesystem out in the root partition. I've no idea why. In the end I partitioned my drive with an EXT2 /boot partition at 1, an EXT3 / partition at 2, and swap at 4.

At the task selection phase, I just picked "Standard System". I wish it came with sshd installed. I heartily support their decision not to have sshd running by default, but not to install it seems a little strange to me. No matter.

The End?

Reboot to Breakage

Once the installation is complete the domain will reset. If you get the chance, pass the kernel a rootdelay=2 parameter. This will save you some time shortly, cutting the timeout drastically. The booting process will start, tear along noisily, and then come grinding to an unceremonious halt, like this:

Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Running /scripts/local-top ... done.
Begin: Waiting for root file system ... done.
Gave up waiting for root device.  Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
   - Check root= (did the system wait for the right device?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT! /dev/vdiska2 does not exist. Dropping to a shell!

BusyBox v1.10.2 (Debian 1:1.10.2-2) built-in shell (ash)
Enter 'help' for a list of built-in commands.

/bin/sh: can't access tty; job control turned off
(initramfs)

Fix Breakage

When I got this far, I led myself down a lot of dead-ends. It's not silo at fault. It's not a missing bootable partition flag. There are no parameters that you can pass to the kernel to make this any better. Not that I could find, at least.

The problem is exactly the same as the problem we came across earlier with the installer -- the kernel hasn't loaded the sunvnet and sunvdc drivers for us. It's easy to fix, so we can be on our way:

(initramfs) modprobe sunvnet
[324164.637414] eth0: Sun LDOM vnet 00:14:4f:be:ef:62
[324164.646112] eth0: PORT ( remote-mac 00:14:4f:be:ef:de switch-port )
[324164.651743] eth0: PORT ( remote-mac 00:14:4f:be:ef:c2 )

(initramfs) modprobe sunvdc
[324208.648989] sunvdc.c:v1.0 (June 25, 2007)
[324208.653931] sunvdc: vdiska: 25164600 sectors (12287 MB)
[324208.657314]  vdiska: vdiska1 vdiska2 vdiska3 vdiska4 vdiska4

With this done, we could continue to boot. The problem with that would be that the default install doesn't spawn a working getty on our fake serial console. It doesn't start sshd, either. We should fix both of those now. Start by mounting our filesystems:

(initramfs) mkdir /target
(initramfs) modprobe ext3
(initramfs) mount /dev/vdiska2 /target
[324285.663774] EXT2-fs warning (device vdiska2): ext2_fill_super: mounting ext3 filesystem as ext2
(initramfs) mount /dev/vdiska1 /target/boot

Note that, even though we've loaded the EXT3 driver, the root filesystem is still mounted as EXT2. This time. Sometimes, it mounts as EXT3. I'm not going to call it non-deterministic, because that's just fancy-talk for "I don't know enough about what's going on to predict how this will behave." It's not a problem either way, just remember to unmount it cleanly when you're done.

In order to get a login prompt on our console, we need to edit /target/etc/inittab and uncomment the following line:

T0:23:respawn:/sbin/getty -L ttyS0 9600 vt100
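
If the initramfs shell has no editor you can live with, the same edit can be made non-interactively. A minimal sketch, assuming the line is present exactly as above but prefixed with a #, and that a sed is available in the BusyBox on hand; the function name is made up:

```shell
# Uncomment the ttyS0 getty entry in the given inittab, leaving other lines alone.
# Writes through a scratch copy so it does not rely on sed -i being supported.
enable_serial_getty() {
    inittab=$1
    sed 's/^#\(T0:23:respawn:.*ttyS0.*\)$/\1/' "$inittab" > "$inittab.new" &&
    cat "$inittab.new" > "$inittab" &&     # overwrite in place, keeping ownership and mode
    rm "$inittab.new"
}
```

For example: enable_serial_getty /target/etc/inittab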

And now we need to install sshd. Start by configuring the network, then chroot to our target filesystem, before using apt-get to install openssh-server:

(initramfs) ifconfig eth0 inet 192.168.0.8 netmask 255.255.255.0 up
(initramfs) route add default gw 192.168.0.1
(initramfs) chroot /target
(initramfs) apt-get install openssh-server
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
  libx11-6 libx11-data libxau6 libxcb-xlib0 libxcb1 libxdmcp6 libxext6
  libxmuu1 openssh-blacklist openssh-blacklist-extra x11-common xauth
Suggested packages:
  ssh-askpass rssh molly-guard
The following NEW packages will be installed:
  libx11-6 libx11-data libxau6 libxcb-xlib0 libxcb1 libxdmcp6 libxext6
  libxmuu1 openssh-blacklist openssh-blacklist-extra openssh-server x11-common
  xauth
0 upgraded, 13 newly installed, 0 to remove and 0 not upgraded.
Need to get 5844kB of archives.

You'll probably see a bunch of errors like this:

Setting up x11-common (1:7.3+18) ...
Can not write log, openpty() failed (/dev/pts not mounted?)

but these don't matter, because we end up getting to here:

Setting up openssh-server (1:5.1p1-5) ...
Creating SSH2 RSA key; this may take some time ...
Creating SSH2 DSA key; this may take some time ...
Restarting OpenBSD Secure Shell server: sshd.

Now, having installed and started sshd, it's time to kill it. If you don't, you'll find sshd camping with its cwd on your chrooted filesystem, and you won't be able to unmount. Once killed, exit chroot. Now you can unmount /target/boot and /target. Exit the ash shell.
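
Spelled out, that clean-up might look like the sketch below. The function names are invented for illustration, and the pid file path assumes sshd's stock Debian configuration.

```shell
# Inside the chroot: stop the sshd that the package installation started.
stop_chroot_sshd() {
    pidfile=${1:-/var/run/sshd.pid}    # assumption: the default PidFile location
    kill "$(cat "$pidfile")"
}

# Back in the initramfs shell, after leaving the chroot: innermost mount first.
unmount_target() {
    umount /target/boot && umount /target
}
```

Typed by hand, that is kill $(cat /var/run/sshd.pid) and exit inside the chroot, then umount /target/boot, umount /target and a final exit from the initramfs shell.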

The boot process should pick up where it left off and continue all the way to a login prompt on the fake serial interface at ttyS0. Login as root and fix up the initrd archive so that, next time we reboot, it contains instructions to load the sunvnet and sunvdc drivers:

Debian GNU/Linux 5.0 terry.local ttyS0

terry.local login: root
Password:

Linux terry.local 2.6.26-2-sparc64-smp #1 SMP Sat Mar 28 12:03:31 UTC 2009 sparc64

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
terry:~# echo sunvnet >> /etc/initramfs-tools/modules
terry:~# echo sunvdc >> /etc/initramfs-tools/modules
terry:~# update-initramfs -u
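
One caveat with the echo lines above: run them twice and each module is listed twice. A slightly more careful variant, as a sketch with a made-up function name:

```shell
# Append a module name to the initramfs-tools modules list only if it is
# not already there as a line of its own.
add_initramfs_module() {
    module=$1
    modules_file=${2:-/etc/initramfs-tools/modules}
    grep -qxF "$module" "$modules_file" 2>/dev/null || echo "$module" >> "$modules_file"
}
```

Call add_initramfs_module sunvnet and add_initramfs_module sunvdc, then update-initramfs -u as before. Afterwards, something like zcat /boot/initrd.img-$(uname -r) | cpio -it | grep sunv should list both drivers, assuming the usual gzip-compressed cpio image.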

We're done. Reboot, and watch in amazement as your domain boots cleanly, all the way to the login prompt. It should have started sshd, too.

Enjoy!