Xen networking for OpenSolaris
As Tim mentioned, I’ve been spending some time figuring out how to add dom0 networking support to OpenSolaris. At one level it’s a pretty straightforward problem - write a netback driver for OpenSolaris, in a manner similar to the xennetf driver that Stu wrote for domU networking. That will certainly get bits moving, but we really need to do more.
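For the curious, the netback/netfront pair communicate over shared-memory rings of small request/response descriptors. The sketch below paraphrases the transmit request from Xen’s public netif.h (Xen 3 era) from memory - the field names are real, but check the header before relying on the exact layout. A netback driver’s core job is to turn a stream of these into packets the rest of the dom0 stack can use.

```c
#include <stdint.h>

/*
 * Transmit-side descriptor, paraphrased from Xen's public
 * xen/include/public/io/netif.h (Xen 3 era); verify against the
 * real header before relying on the exact layout.
 */
typedef uint32_t grant_ref_t;

struct netif_tx_request {
    grant_ref_t gref;   /* grant reference naming the buffer page */
    uint16_t offset;    /* offset of the packet data within that page */
    uint16_t flags;     /* NETTXF_* flags (deferred checksum, etc.) */
    uint16_t id;        /* echoed back in the matching response */
    uint16_t size;      /* total packet size in bytes */
};
```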
Network configuration for Xen dom0 under Linux is generally done in one of three ways:
- bridged,
- routed,
- NAT.
The first of these, bridged, seems to be the most common from reading the various Xen mailing lists. It involves bridging together the Linux netback driver with a physical network interface (actually, in Xen 3 it’s bridged with a pseudo-device that passes packets to the physical interface). It’s a convenient mechanism, as it allows easy integration into an existing network. Using a software bridge can sometimes be a performance problem though, particularly in a non-switched network environment (do any of those still exist? :-)). Bridged is also useful if guest domains are expected to migrate from one physical machine to another. As long as the new physical machine is connected to the same layer 2 network, gratuitous ARP can be used by the guest OS to get packets flowing after the migration is completed.
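To make the migration trick concrete, here’s a minimal sketch of the gratuitous ARP a guest would emit after migrating: an ARP request in which the sender and target protocol addresses are both the guest’s own IP address, prompting switches and neighbours to update their tables. The structures are just the standard Ethernet and ARP wire formats (RFC 826) - nothing here is Xen-specific.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Standard Ethernet and ARP wire formats (pad-free as written). */
struct ether_hdr {
    uint8_t  dst[6], src[6];
    uint16_t type;              /* 0x0806 = ARP */
};

struct arp_pkt {
    uint16_t htype, ptype;      /* 1 = Ethernet, 0x0800 = IPv4 */
    uint8_t  hlen, plen;        /* 6, 4 */
    uint16_t oper;              /* 1 = request */
    uint8_t  sha[6], spa[4];    /* sender hardware/protocol address */
    uint8_t  tha[6], tpa[4];    /* target hardware/protocol address */
};

/*
 * Build a gratuitous ARP: broadcast it, with the sender and target
 * IP both set to our own address so that peers refresh their caches.
 */
static void
build_gratuitous_arp(uint8_t *frame, const uint8_t mac[6],
    const uint8_t ip[4])
{
    struct ether_hdr eh;
    struct arp_pkt ap;

    memset(eh.dst, 0xff, sizeof (eh.dst));      /* broadcast */
    memcpy(eh.src, mac, sizeof (eh.src));
    eh.type = htons(0x0806);

    ap.htype = htons(1);
    ap.ptype = htons(0x0800);
    ap.hlen = 6;
    ap.plen = 4;
    ap.oper = htons(1);
    memcpy(ap.sha, mac, 6);
    memcpy(ap.spa, ip, 4);
    memset(ap.tha, 0, 6);
    memcpy(ap.tpa, ip, 4);                      /* target IP == sender IP */

    memcpy(frame, &eh, sizeof (eh));
    memcpy(frame + sizeof (eh), &ap, sizeof (ap));
}
```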
The routed approach works exactly as it sounds - domain 0 acts as a router between the various physical interfaces in the machine and the guest domains. Guest domains are connected to domain 0 using the netback driver again, but this time the interfaces in domain 0 have IP addresses and the netback to netfront connections between dom0 and domU are distinct layer 2 (and hence layer 3) networks.
NAT is similar to routed, with the addition of NAT functionality in domain 0. This allows guest domains access to the wider world via domain 0’s physical interfaces, but it complicates access into the guest domains - some type of inbound NAT (port forwarding) is necessary.
Once an OpenSolaris netback driver exists, the routed configuration should “just work”. Thanks to the integration of ipfilter, the same is true of a NAT configuration. Bridged is more difficult, though being able to build on Mike and Yukun’s bridging module will give a big head start.
With all of this done, there will be an opportunity to move on to some more interesting aspects of OpenSolaris on Xen networking.
For example, the Xen patches for Linux introduce the idea of deferred checksumming. In a typical domain 0 to domain U virtual network, packets are passed between domains either by copying between memory buffers or by flipping pages. This should be a relatively safe operation, with little opportunity for corruption along the way, so it is arguably reasonable to skip the protocol-layer checksum that would usually be calculated. In both Linux and OpenSolaris, the domU network driver avoids performing the checksum calculation on packets destined for dom0 and sets a bit in the packet control block to indicate that this is the case. If the packet is actually destined for a remote host (i.e. one across a physical network), dom0 is expected to calculate and insert the checksum.
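A rough sketch of the dom0 side of this decision, assuming a netback-style driver: when a transmit descriptor arrives with the checksum-blank flag set, the packet only needs a real checksum if it’s leaving the machine. NETTXF_csum_blank is the real Xen flag (its bit position here is illustrative); everything else - xnb_handle_tx(), is_local_dest(), fill_in_cksum() and the pkt_t type - is invented for this example.

```c
#include <stdint.h>
#include <stdbool.h>

#define NETTXF_csum_blank (1U << 0)   /* real Xen flag; bit illustrative */
#define CSUM_DEFERRED     (1U << 0)   /* invented for this sketch */

typedef struct pkt {
    uint32_t csum_flags;
    /* ... payload, headers, etc. ... */
} pkt_t;

/* Invented helpers standing in for real driver/stack machinery. */
static bool is_local_dest(pkt_t *);
static void fill_in_cksum(pkt_t *);
static void forward_packet(pkt_t *);

static void
xnb_handle_tx(pkt_t *pkt, uint16_t flags)
{
    if (flags & NETTXF_csum_blank) {
        if (is_local_dest(pkt)) {
            /*
             * Staying on this host (dom0 itself or another domU):
             * the data never crosses a real wire, so keep deferring
             * and carry the flag along with the packet.
             */
            pkt->csum_flags |= CSUM_DEFERRED;
        } else {
            /*
             * Leaving via a physical NIC: compute and insert the
             * checksum now (or hand the job to hardware offload).
             */
            fill_in_cksum(pkt);
        }
    }
    forward_packet(pkt);
}
```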
A similar mechanism applies in reverse, where packets from dom0 to domU are generally not checksummed. There’s a small amount of code in the OpenSolaris xennetf (domU network) driver to support this and it works well.
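On the receive side in domU, the natural way to express “this checksum doesn’t need verifying” in OpenSolaris is the existing hardware-checksum DDI: hcksum_assoc(9F) with the HCK_FULLCKSUM_OK flag, which is how a NIC that verifies checksums in hardware reports the same thing. A rough sketch of the idea (simplified, not the actual xennetf code; NETRXF_data_validated is the real Xen flag, defined locally here):

```c
#include <sys/stream.h>
#include <sys/strsubr.h>    /* hcksum_assoc(9F) */
#include <sys/pattr.h>      /* HCK_FULLCKSUM_OK */

#define NETRXF_data_validated (1U << 0)  /* Xen flag; bit illustrative */

/*
 * Simplified domU receive path: if dom0 marked the packet as already
 * validated, tell the stack via the standard hardware-checksum
 * attribute, just as a checksum-offloading NIC driver would.
 */
static void
xnf_rx_deliver(mblk_t *mp, uint16_t rxflags)
{
    if (rxflags & NETRXF_data_validated) {
        (void) hcksum_assoc(mp, NULL, NULL, 0, 0, 0, 0,
            HCK_FULLCKSUM_OK, 0);
    }
    /* ... pass mp up the stack ... */
}
```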
So far this is similar to a typical hardware checksum offload mechanism. An added complication comes from the ability of the Linux dom0 to note that it has already verified the checksum of a packet received on a physical interface (either because hardware did the verification or as the result of a software check). A “checksum good” bit is carried around with the packet and can be used to reduce the cost of updating the checksum if, for example, the packet is forwarded. Adding a similar feature to the OpenSolaris IP stack is on my “to do” list.
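To make the potential saving concrete, here’s a rough sketch of what an IPv4 forwarding path can do with a trusted “checksum good” bit: skip the full header verification entirely, and patch the checksum incrementally after the TTL decrement (the well-known RFC 1141/1624 trick) instead of recomputing it. All names and the packet handling here are invented for illustration - this is not code from any actual stack.

```c
#include <stdint.h>
#include <stdbool.h>
#include <arpa/inet.h>

/* Invented per-packet metadata for this sketch. */
typedef struct fwd_pkt {
    uint8_t *ip_hdr;    /* IPv4 header; assume no options (20 bytes) */
    bool csum_ok;       /* "already verified" bit from dom0 or the NIC */
} fwd_pkt_t;

/* Full one's-complement header checksum; returns 0 for a valid header. */
static uint16_t
ipv4_cksum(const uint8_t *hdr, int len)
{
    uint32_t sum = 0;

    for (int i = 0; i < len; i += 2)
        sum += (uint16_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return ((uint16_t)~sum);
}

/*
 * Forwarding fixup: trust csum_ok to skip the full verification, then
 * update the checksum incrementally after decrementing the TTL - the
 * 16-bit word holding the TTL drops by 0x0100, so the checksum rises
 * by the same amount, with end-around carry.
 */
static bool
forward_fixup(fwd_pkt_t *pkt)
{
    uint8_t *ttl = pkt->ip_hdr + 8;                 /* TTL offset */
    uint16_t *ck = (uint16_t *)(pkt->ip_hdr + 10);  /* checksum offset */
    uint32_t sum;

    if (!pkt->csum_ok && ipv4_cksum(pkt->ip_hdr, 20) != 0)
        return (false);     /* corrupt header - drop it */

    if (--(*ttl) == 0)
        return (false);     /* TTL expired */

    sum = *ck + htons(0x0100);
    *ck = (uint16_t)(sum + (sum >> 16));
    return (true);
}
```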
Another interesting option to examine is how the proposed Crossbow virtual NICs might be used. One possibility is to use a MAC-based virtual NIC in domain 0 as the source/sink of packets for a domain U network interface. This would result in a network topology very similar to that of the bridged approach, but without the need to actually use a software bridge (so no need for spanning tree protocols, etc.). Further, if the Crossbow vNIC can take advantage of a physical NIC’s ability to receive packets destined for multiple MAC addresses, the need to have the physical NIC in promiscuous mode (a consequence of most software bridge implementations) can be removed. This would be good from a performance perspective.
Finally, the current Xen inter-domain network protocol is “point to point”. If two guest domains wish to communicate, all of their packets must flow through either the software bridge or the IP forwarding path in domain 0. A fully-connected mesh of links between domains would allow guest domains to communicate directly. Sun’s Logical Domains feature for the Niagara machines appears to have something similar; it would be good to add comparable support to Xen (maybe even take the LDoms code?). There are some possible issues with doing this, but also interest in a simpler inter-domain transport, which is a good starting point.