Xen networking for OpenSolaris

Posted on Feb 21, 2006

As Tim mentioned, I’ve been spending some time figuring out how to add dom0 networking support to OpenSolaris. At one level it’s a pretty straightforward problem - write a netback driver for OpenSolaris, in a manner similar to the xennetf driver that Stu wrote for domU networking. That will certainly get bits moving, but we really need to do more.

Network configuration for Xen dom0 under Linux is generally done in one of three ways:

  1. bridged,
  2. routed,
  3. NAT.

The first of these, bridged, seems to be the most common, judging from the various Xen mailing lists. It involves bridging the Linux netback driver together with a physical network interface (actually, in Xen 3 it’s bridged with a pseudo-device that passes packets to the physical interface). It’s a convenient mechanism, as it allows easy integration into an existing network. Using a software bridge can sometimes be a performance problem, though, particularly in a non-switched network environment (do any of those still exist? :-)). Bridging is also useful if guest domains are expected to migrate from one physical machine to another: as long as the new physical machine is connected to the same layer 2 network, the guest OS can use gratuitous ARP to get packets flowing again once the migration completes.
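To make the migration trick concrete, here’s a minimal sketch of the gratuitous ARP frame a migrated guest would broadcast. The layout follows RFC 826; the frame-building code, the MAC and the IP address are purely illustrative (though the 00:16:3e prefix is the OUI Xen assigns to guest interfaces):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a gratuitous ARP request: sender and target IP are both the
 * guest's own address, and the destination MAC is broadcast, so every
 * host on the segment refreshes its ARP cache and any bridge re-learns
 * the port behind which the guest's MAC now lives.
 */
static size_t
build_gratuitous_arp(uint8_t *frame, const uint8_t mac[6], const uint8_t ip[4])
{
	static const uint8_t bcast[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
	uint8_t *p = frame;

	memcpy(p, bcast, 6); p += 6;	/* ethernet dst: broadcast */
	memcpy(p, mac, 6); p += 6;	/* ethernet src: guest MAC */
	*p++ = 0x08; *p++ = 0x06;	/* ethertype: ARP */

	*p++ = 0x00; *p++ = 0x01;	/* hardware type: ethernet */
	*p++ = 0x08; *p++ = 0x00;	/* protocol type: IPv4 */
	*p++ = 6;			/* hardware address length */
	*p++ = 4;			/* protocol address length */
	*p++ = 0x00; *p++ = 0x01;	/* opcode: request */
	memcpy(p, mac, 6); p += 6;	/* sender MAC */
	memcpy(p, ip, 4); p += 4;	/* sender IP */
	memset(p, 0, 6); p += 6;	/* target MAC: unknown */
	memcpy(p, ip, 4); p += 4;	/* target IP == sender IP */

	return (size_t)(p - frame);
}

int
main(void)
{
	uint8_t frame[64];
	const uint8_t mac[6] = { 0x00, 0x16, 0x3e, 0x01, 0x02, 0x03 };
	const uint8_t ip[4] = { 192, 168, 1, 10 };
	size_t len = build_gratuitous_arp(frame, mac, ip);

	for (size_t i = 0; i < len; i++)
		printf("%02x%c", frame[i], ((i + 1) % 14) ? ' ' : '\n');
	putchar('\n');
	return 0;
}
```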

The routed approach works exactly as it sounds - domain 0 acts as a router between the various physical interfaces in the machine and the guest domains. Guest domains are again connected to domain 0 using the netback driver, but this time the interfaces in domain 0 have IP addresses, and each netback-to-netfront connection between dom0 and a domU is a distinct layer 2 (and hence layer 3) network.
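Concretely, each netback link becomes its own small subnet, and dom0 moves packets between those and the physical interface with an ordinary route lookup. A toy version, with invented addresses and interface numbering, and the table kept most-specific-first so the first match wins:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative routing table: one /30 per netback/netfront link. */
struct route {
	uint32_t net, mask;	/* host byte order, for simplicity */
	int	 ifidx;		/* 0 = physical NIC, 1.. = netback links */
};

static const struct route rtab[] = {
	{ 0xc0a80a00, 0xfffffffc, 1 },	/* 192.168.10.0/30 -> domU 1 */
	{ 0xc0a80a04, 0xfffffffc, 2 },	/* 192.168.10.4/30 -> domU 2 */
	{ 0x00000000, 0x00000000, 0 },	/* default -> physical NIC */
};

/* Pick the outgoing interface for a destination address. */
static int
lookup(uint32_t dst)
{
	for (size_t i = 0; i < sizeof (rtab) / sizeof (rtab[0]); i++)
		if ((dst & rtab[i].mask) == rtab[i].net)
			return rtab[i].ifidx;
	return -1;	/* unreachable: the default route matches everything */
}
```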

NAT is similar to routed, with the addition of NAT functionality in domain 0. This allows guest domains access to the wider world via domain 0’s physical interfaces, but complicates access into the guest domains - some form of inbound NAT is necessary.
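A sketch of the extra per-packet work the NAT case implies: rewriting the source address of an outbound IPv4 header and patching the header checksum incrementally (the RFC 1624 method) rather than recomputing it. The structure layout and function names are illustrative, not taken from ipfilter:

```c
#include <stdint.h>

/* Fold a 32-bit sum into 16 bits, as for the Internet checksum. */
static uint16_t
csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * RFC 1624 incremental update: one 16-bit word of the packet changes
 * from old_w to new_w; return the adjusted checksum without touching
 * the rest of the packet.
 */
static uint16_t
csum_update(uint16_t csum, uint16_t old_w, uint16_t new_w)
{
	uint32_t sum = (uint16_t)~csum;

	sum += (uint16_t)~old_w;
	sum += new_w;
	return (uint16_t)~csum_fold(sum);
}

struct ipv4_hdr {		/* illustrative, not a system header */
	uint8_t	 ver_ihl, tos;
	uint16_t len, id, frag;
	uint8_t	 ttl, proto;
	uint16_t csum;
	uint16_t src[2], dst[2];	/* addresses as 16-bit words */
};

/* Rewrite the source address, patching the header checksum as we go. */
static void
nat_rewrite_src(struct ipv4_hdr *ip, const uint16_t new_src[2])
{
	for (int i = 0; i < 2; i++) {
		ip->csum = csum_update(ip->csum, ip->src[i], new_src[i]);
		ip->src[i] = new_src[i];
	}
	/*
	 * A real NAT must also patch the TCP/UDP checksum (it covers a
	 * pseudo-header including the addresses) and keep a translation
	 * table so that replies can be rewritten in the other direction.
	 */
}
```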

Once an OpenSolaris netback driver exists, the routed configuration should “just work”. Thanks to the integration of ipfilter, the same is true of a NAT configuration. Bridged is more difficult, though being able to build on Mike and Yukun’s bridging module will be a big head start.

With all of this done, there will be an opportunity to move on to some more interesting aspects of OpenSolaris on Xen networking.

For example, the Xen patches for Linux introduce the idea of deferred checksumming. In a typical domain 0 to domain U virtual network, packets are passed between domains either by copying between memory buffers or by flipping pages. This should be a relatively safe operation, with little opportunity for corruption along the way, so it is perhaps reasonable to skip the protocol layer checksum that would usually be performed. In both Linux and OpenSolaris, the domU network driver avoids performing the checksum calculation on packets destined for dom0 and sets a bit in the packet control block to indicate that this is the case. If the packet is actually destined for a remote host (i.e. one across a physical network), dom0 is expected to calculate and insert the checksum.

A similar mechanism applies in reverse, where packets from dom0 to domU are generally not checksummed. There’s a small amount of code in the OpenSolaris xennetf (domU network) driver to support this and it works well.
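A sketch of what the dom0 transmit side of this amounts to: if the guest deferred the checksum and the packet is about to leave on a physical interface, compute and insert it now. The flag and structure names are invented for illustration; the checksum routine itself is the standard RFC 1071 one:

```c
#include <stddef.h>
#include <stdint.h>

#define PKT_CSUM_BLANK	0x1	/* illustrative: checksum not yet filled in */

struct pkt {			/* illustrative packet control block */
	uint8_t	*data;
	size_t	 len;
	unsigned flags;
};

/* Standard Internet checksum (RFC 1071) over a buffer. */
static uint16_t
inet_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (; len > 1; buf += 2, len -= 2)
		sum += (uint32_t)buf[0] << 8 | buf[1];
	if (len)			/* pad an odd trailing byte */
		sum += (uint32_t)buf[0] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * dom0 transmit path: the guest left the checksum field zeroed and set
 * the "blank" bit, so fill the checksum in before the packet reaches a
 * physical wire.  csum_start/csum_off locate the checksummed region and
 * the checksum field within the packet.
 */
static void
tx_csum_fixup(struct pkt *p, size_t csum_start, size_t csum_off)
{
	if (p->flags & PKT_CSUM_BLANK) {
		uint16_t c = inet_csum(p->data + csum_start,
		    p->len - csum_start);

		p->data[csum_off] = c >> 8;
		p->data[csum_off + 1] = c & 0xff;
		p->flags &= ~PKT_CSUM_BLANK;
	}
}
```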

So far this is similar to a typical hardware checksum offload mechanism. An added complication comes from the ability of the Linux dom0 to note that it has already verified the checksum of a packet received from a physical interface (either because the hardware did the verification or as the result of a software check). A “checksum good” bit is carried around with the packet and can be used, when the packet is forwarded for example, to reduce the cost of updating the checksum. Adding a similar feature to the OpenSolaris IP stack is on my “to do” list.
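Continuing the same illustrative sketch (reusing struct pkt and inet_csum from above, with another invented flag), the receive and forwarding paths would maintain and consult the bit like this:

```c
#define PKT_CSUM_GOOD	0x2	/* illustrative: checksum already verified */

/*
 * Trust an earlier verdict (hardware or software) where one exists;
 * otherwise verify in software and record the result so that later
 * stages need not repeat the work.  Reuses struct pkt and inet_csum()
 * from the previous sketch.
 */
static int
rx_csum_ok(struct pkt *p, size_t csum_start)
{
	if (p->flags & PKT_CSUM_GOOD)
		return 1;		/* verified earlier, nothing to do */
	if (inet_csum(p->data + csum_start, p->len - csum_start) != 0)
		return 0;		/* bad checksum: drop the packet */
	p->flags |= PKT_CSUM_GOOD;	/* remember for later stages */
	return 1;
}
```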

Another interesting option to examine is how the proposed Crossbow virtual NIC might be used. One possibility is to use a MAC-based virtual NIC in domain 0 as the source/sink of packets for a domain U network interface. This would result in a network topology very similar to that of the bridged approach, but without the need to actually use a software bridge (so no need for spanning tree protocols, etc.). Further, if the Crossbow vNIC can take advantage of a physical NIC’s ability to receive packets destined for multiple MAC addresses, the need to put the physical NIC in promiscuous mode (a consequence of most software bridge implementations) can be removed. This would be good from a performance perspective.
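The shape of that idea in toy form, below: the physical NIC is told the exact set of unicast addresses to accept, and receive processing in dom0 becomes a straight table lookup rather than bridge learning and flooding. None of these names are Crossbow’s; this is only an illustration:

```c
#include <stdint.h>
#include <string.h>

#define MAX_VNICS	8

/* Illustrative table mapping a unicast MAC address to a guest domain. */
struct vnic {
	uint8_t	mac[6];
	int	domid;
};

static struct vnic vnic_tab[MAX_VNICS];
static int vnic_count;

/*
 * Classify a received frame by destination MAC.  A NIC programmed with
 * per-address filters would never deliver a frame that matches no
 * entry, so there is no need for promiscuous mode at all.
 */
static int
classify(const uint8_t *frame)
{
	for (int i = 0; i < vnic_count; i++)
		if (memcmp(frame, vnic_tab[i].mac, 6) == 0)
			return vnic_tab[i].domid;
	return -1;	/* not ours; with hardware filtering, can't happen */
}
```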

Finally, the current Xen inter-domain network protocol is “point to point”. If two guest domains wish to communicate, all of their packets must flow through either the software bridge or the IP forwarding path in domain 0. A fully-connected mesh of links between domains would allow guest domains to communicate directly. Sun’s Logical Domains feature for the Niagara machines appears to have something similar; it would be good to add such support to Xen (maybe even take the LDoms code?). There are some possible issues with doing this, but also interest in a simpler inter-domain transport, which is a good starting point.
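For a sense of what each pairwise link involves, here is a heavily simplified single-producer, single-consumer ring in the spirit of Xen’s shared-memory rings. In real use the structure would live in a page shared between exactly two domains (via grant tables), with event channels for notification and memory barriers around the index updates:

```c
#include <stdint.h>
#include <string.h>

#define RING_SLOTS	32	/* must be a power of two */
#define SLOT_SIZE	128

/* One direction of a two-domain channel; a link needs a ring each way. */
struct ring {
	volatile uint32_t prod;	/* written by the sender only */
	volatile uint32_t cons;	/* written by the receiver only */
	uint8_t slots[RING_SLOTS][SLOT_SIZE];
};

static int
ring_send(struct ring *r, const void *msg, size_t len)
{
	if (len > SLOT_SIZE || r->prod - r->cons == RING_SLOTS)
		return -1;	/* message too big, or ring full */
	memcpy(r->slots[r->prod % RING_SLOTS], msg, len);
	r->prod++;		/* real code needs a barrier before this */
	return 0;
}

static int
ring_recv(struct ring *r, void *msg, size_t len)
{
	if (len > SLOT_SIZE || r->prod == r->cons)
		return -1;	/* bad size, or ring empty */
	memcpy(msg, r->slots[r->cons % RING_SLOTS], len);
	r->cons++;		/* and a barrier here, likewise */
	return 0;
}
```

The cost of the mesh is that N guests need on the order of N² such channels; the benefit is that guest-to-guest traffic no longer transits dom0 at all.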