sexta-feira, 19 de janeiro de 2018

Xen synchronicity between frontend and backend devices

So I bumped into a problem last month and it took me too much time to figure out the big picture of the problem since I didn't find too much documentation about that. The help I could find when trying to figure out this was mostly from good people on the channel #xendevel @ Freenode, mostly maintainers. So if you want to understand a little bit of Xen without pinging people on IRC, that's the place.

The problem is the following: I'm running RHEL on Xen Hypervisor and whenever I try to unload and reload xen_netfront kernel module I see outputs like that on dmesg:
# modprobe -r xen_netfront

# dmesg|tail
[ 105.236836] xen:grant_table: WARNING: g.e. 0x903 still in use!
[ 105.236839] deferring g.e. 0x903 (pfn 0x35805)
[ 105.237156] xen:grant_table: WARNING: g.e. 0x904 still in use!
[ 105.237160] deferring g.e. 0x904 (pfn 0x35804)
[ 105.237163] xen:grant_table: WARNING: g.e. 0x905 still in use!
[ 105.237166] deferring g.e. 0x905 (pfn 0x35803)
[ 105.237545] xen:grant_table: WARNING: g.e. 0x906 still in use!
[ 105.237550] deferring g.e. 0x906 (pfn 0x35802)
[ 105.237553] xen:grant_table: WARNING: g.e. 0x907 still in use!
[ 105.237556] deferring g.e. 0x907 (pfn 0x35801)

Moreover, the interface is not usable as well:

# dmesg|tail
[ 105.237163] xen:grant_table: WARNING: g.e. 0x905 still in use!
[ 105.237166] deferring g.e. 0x905 (pfn 0x35803)
[ 105.237545] xen:grant_table: WARNING: g.e. 0x906 still in use!
[ 105.237550] deferring g.e. 0x906 (pfn 0x35802)
[ 105.237553] xen:grant_table: WARNING: g.e. 0x907 still in use!
[ 105.237556] deferring g.e. 0x907 (pfn 0x35801)
[ 160.050882] xen_netfront: Initialising Xen virtual ethernet driver
[ 160.066937] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
[ 160.067270] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 160.069355] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready

# ifconfig eth0
eth0: flags=4098 mtu 1500
ether 00:00:00:00:00:00 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

# ifconfig eth0 up
SIOCSIFFLAGS: Cannot assign requested address

The first problem happens because the backend part of the module (xen_netback) is still using some pieces of memory (g.e. which states for grant entries) that are shared between guest and host. The ideal scenario would be to wait for the netback to free those entries and only then unload the netfront module. This was actually a bug on the synchronicity of the netfront and netback parts.

The state of the drivers are kept in separate structs, as defined in include/xen/xenbus.h:69:

/* A xenbus device. */
struct xenbus_device {
    const char *devicetype;
    const char *nodename;
    const char *otherend; 
    int otherend_id;
    struct xenbus_watch otherend_watch; 
    struct device dev;
    enum xenbus_state state;
    struct completion down;
    struct work_struct work; 
};

And the netfront state can be seen from the hypervisor with the command:

# xenstore-ls -fp
[...]
/local/domain/1/device/vif/0/state = "4" (n1,r0)
[...]

The number 4 indicates XenbusStateConnected (as defined in include/xen/interface/io/xenbus.h:17). So it means everything is a matter of wait for one end to finish using the memory region and the other to free, this first piece of the puzzle is solved by this patch:

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 8b8689c6d887..391432e2725d 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -87,6 +87,8 @@ struct netfront_cb {
 /* IRQ name is queue name with "-tx" or "-rx" appended */
 #define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 3)
 
+static DECLARE_WAIT_QUEUE_HEAD(module_unload_q);
+
 struct netfront_stats {
        u64                     packets;
        u64                     bytes;
@@ -2021,10 +2023,12 @@ static void netback_changed(struct xenbus_device *dev,
                break;
 
        case XenbusStateClosed:
+               wake_up_all(&module_unload_q);
                if (dev->state == XenbusStateClosed)
                        break;
                /* Missed the backend's CLOSING state -- fallthrough */
        case XenbusStateClosing:
+               wake_up_all(&module_unload_q);
                xenbus_frontend_closed(dev);
                break;
        }
@@ -2130,6 +2134,20 @@ static int xennet_remove(struct xenbus_device *dev)
 
        dev_dbg(&dev->dev, "%s\n", dev->nodename);
 
+       if (xenbus_read_driver_state(dev->otherend) != XenbusStateClosed) {
+               xenbus_switch_state(dev, XenbusStateClosing);
+               wait_event(module_unload_q,
+                          xenbus_read_driver_state(dev->otherend) ==
+                          XenbusStateClosing);
+
+               xenbus_switch_state(dev, XenbusStateClosed);
+               wait_event(module_unload_q,
+                          xenbus_read_driver_state(dev->otherend) ==
+                          XenbusStateClosed ||
+                          xenbus_read_driver_state(dev->otherend) ==
+                          XenbusStateUnknown);
+       }
+
        xennet_disconnect_backend(info);
 
        unregister_netdev(info->netdev);

The second piece of the problem is that the interface is not usable when reloaded back. And that's a lack of initializing the state of the device so the backend notices it, and hence, connects the two drivers together (frontend and backend). This was easily solved by the following patch:

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index c5a34671abda..9bd7ddeeb6a5 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1326,6 +1326,7 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
 
        netif_carrier_off(netdev);
 
+       xenbus_switch_state(dev, XenbusStateInitialising);
        return netdev;
 
  exit: