SWITCH Cloud Blog


Tuning a Virtualized Network Node: multi-queue virtio-net

The infrastructure used by SWITCHengines is composed of about 100 servers. Each one uses two 10 Gb/s network ports. Ideally, a given instance (virtual machine) on SWITCHengines would be able to achieve 20 Gb/s of throughput when communicating with the rest of the Internet. In the real world, however, several bottlenecks limit this rate. We are working hard to address these bottlenecks and bring actual performance closer to the theoretical optimum.

An important bottleneck in our infrastructure is the network node for each region. All logical routers are implemented on this node, using Linux network namespaces and Open vSwitch (OVS). That means that all packets between the Internet and all the instances of the region need to pass through the node.

In our architecture, the OpenStack services run inside various virtual machines (“service VMs” or “pets”) on a dedicated set of redundant “provisioning” (or “prov”) servers. This is good for serviceability and reliability, but has some overhead—especially for I/O-intensive tasks such as network packet forwarding. Our network node is one of those service VMs.

In the original configuration, a single instance would never get more than about 2 Gb/s of throughput to the Internet when measured with iperf. What’s worse, the aggregate Internet throughput for multiple VMs was not much higher, which meant that a single high-traffic VM could easily “starve” all other VMs.

We had investigated many options for improving this situation: DVR, multiple network nodes, SR-IOV, DPDK, moving work to switches, and so on. But each of these methods has drawbacks, such as additional complexity (and thus potential for new and exciting bugs and hard-to-debug failure modes), lock-in, and in some cases the loss of features like IPv6 support. So we stayed with our inefficient but simple configuration, which has worked very reliably for us so far.

Multithreading to the Rescue!

Our network node is a VM with multiple logical CPUs. But when running e.g. “top” during high network load, we noticed that only one (virtual) core was busy forwarding packets. So we started looking for a way to distribute the work over several cores. We found that we could achieve this by enabling three things:

Multi-queue virtio-net interfaces

Our service nodes run under libvirt/Qemu/KVM and use virtio-net network devices. These interfaces can be configured to expose multiple queues. Here is an example of an interface definition in libvirt XML syntax which has been configured for eight queues:

 <interface type='bridge'>
   <mac address='52:54:00:e0:e1:15'/>
   <source bridge='br11'/>
   <model type='virtio'/>
   <driver name='vhost' queues='8'/>
   <virtualport type='openvswitch'/>
 </interface>

A good rule of thumb is to set the number of queues to the number of (virtual) CPU cores of the system.
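
For a hand-managed libvirt guest, one way to apply and verify such a change is sketched below; "netnode1" is just a placeholder for the actual domain name, and "eth3" stands for whichever guest interface corresponds to this definition:

virsh edit netnode1        # add queues='8' to the <driver> element of the interface
virsh shutdown netnode1    # wait for the guest to power off completely...
virsh start netnode1       # ...then start it again so the device is re-created with 8 queues
ethtool -l eth3            # inside the guest: show the channel counts the device now exposes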

Multi-threaded forwarding in the network node VM

Within the VM, kernel threads need to be allocated to the interface queues. This can be achieved using ethtool -L:

ethtool -L eth3 combined 8

This should be done during interface initialization, for example in a “pre-up” action in /etc/network/interfaces. But it seems to be possible to change this configuration on a running interface without disruption.
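
As a minimal sketch, such a stanza in /etc/network/interfaces could look like this (the interface name and the addressing method are placeholders for whatever the network node actually uses):

auto eth3
iface eth3 inet manual
    # configure all 8 queues before the interface is brought up
    pre-up ethtool -L eth3 combined 8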

A recent version of the Open vSwitch datapath

Much of the packet forwarding on the network node is performed by OVS, whose “datapath” portion runs inside the Linux kernel. Our systems normally run Ubuntu 14.04, which ships the Linux 3.13 kernel. The OVS kernel module isn’t part of that kernel package; it is installed separately from the openvswitch-datapath-dkms package, which corresponds to the relatively old OVS version 2.0.2. Although the OVS kernel datapath is supposed to have been multi-threaded since forever, we found that in our setup, upgrading to a newer kernel is vital for getting good OVS forwarding performance.

The current Ubuntu 16.04.1 LTS release includes a fairly new Linux kernel based on 4.4. That kernel also has the OVS datapath module included by default, so that the separate DKMS package is no longer necessary. Unfortunately we cannot upgrade to Ubuntu 16.04 because that would imply upgrading all OpenStack packages to OpenStack “Mitaka”, and we aren’t quite ready for that. But thankfully, Canonical makes newer kernel packages available for older Ubuntu releases as part of their “hardware enablement” effort, so it turns out to be very easy to upgrade 14.04 to the same new kernel:

sudo apt-get install -y --install-recommends linux-generic-lts-xenial

And after a reboot, the network node should be running a fresh Linux 4.4 kernel with the OVS 2.5 datapath code inside.
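
To double-check that the node is really running the new kernel and its in-tree OVS datapath module (rather than the old DKMS build), something like the following can be run after the reboot:

uname -r                              # should now report a 4.4-series kernel
modinfo openvswitch | grep filename   # path should point into the 4.4 kernel tree, not a DKMS directory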

Results

A simple test is to run multiple netperf TCP_STREAM tests in parallel from a single bare-metal host to six VMs running on separate nova-compute nodes behind the network node.

Each run consists of six netperf TCP_STREAM measurements started in parallel, whose throughput values are added together. Each figure is the average over ten consecutive runs with identical configuration.

The network node VM is set up with 8 vCPUs, and the two interfaces that carry traffic are configured with 8 queues each. We vary the number of queues that are actually in use with ethtool -L iface combined n. (Note that even the 1-queue case does not exactly correspond to the original situation, but it’s the closest approximation that we had time to test.)
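
For illustration, a single run can be scripted roughly as follows on the bare-metal load generator; the VM addresses and the test length are placeholders, and the six per-stream throughput values still have to be summed afterwards:

# start six TCP_STREAM tests in parallel, one per target VM behind the network node
for vm in 10.0.0.11 10.0.0.12 10.0.0.13 10.0.0.14 10.0.0.15 10.0.0.16; do
    netperf -H "$vm" -t TCP_STREAM -l 30 &
done
wait    # each netperf process prints its throughput when it finishes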

Network node running 3.13.0-95-generic kernel

1 queue:  3.28 Gb/s
2 queues: 3.41 Gb/s
4 queues: 3.51 Gb/s
8 queues: 3.57 Gb/s

Making use of multiple queues gives very little benefit.

Network node running 4.4.0-36-generic kernel

1 queue:  3.23 Gb/s
2 queues: 6.00 Gb/s
4 queues: 8.02 Gb/s
8 queues: 8.42 Gb/s (8.75 Gb/s with 12 target VMs)

Here we see that performance scales up nicely with multiple queues.

The maximum possible throughput in our setup is lower than 10 Gb/s, because the network node VM uses a single physical 10GE interface for both sides of traffic. And traffic between the network node and the hypervisors is sent encapsulated in VXLAN, which has some overhead.
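
As a rough back-of-the-envelope figure (assuming a standard 1500-byte MTU on the physical network): the outer Ethernet, IP, UDP and VXLAN headers add about 50 bytes per packet, so only roughly 1450 of every 1500 bytes on the wire carry instance payload, i.e. around 3% of encapsulation overhead on top of the shared 10GE port.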

Outlook

Now we know how to enable multi-core networking for hand-configured service VMs (“pets”) such as our network node. But what about the VMs under OpenStack’s control?

Starting in Liberty, Nova supports multi-queue virtio-net. Our benchmarking cluster was still running Kilo, so we could not test that yet. But stay tuned!
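
For reference, and untested by us so far: in Liberty and later, multi-queue virtio-net is requested through an image metadata property, with Nova sizing the number of queues from the flavor’s vCPU count; the image name below is a placeholder, and the guest still has to activate the extra queues with ethtool -L as described above:

openstack image set --property hw_vif_multiqueue_enabled=true ubuntu-16.04-cloudimg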



SDN/NFV Paradigm: Where Industry Meets Academia

On Thursday, 16 June 2016, the Software Defined Networking (SDN) Switzerland community met for its SDN workshop, co-located with the Open Cloud Day at the ZHAW School of Engineering in Winterthur. It was the first time the SDN workshop was held as a separate, dedicated track at a larger event, and we expected synergies between cloud computing and SDN topics, especially from the industry point of view. Participants were free to attend the Open Cloud Day main track, the SDN event, or both.

The objectives of the SDN workshop were basically the same as last time: to share knowledge, hold hands-on sessions, promote best practices and implementations that support operations, and present current research topics and prototypes in SDN. This time, I would say the focus was on SDNization, sometimes called “Softwarization”, which will have a deep impact on techno-economic aspects. Examples of Softwarization are reducing costs by digitalizing and automating processes, optimizing the usage of resources, and creating new forms of coordination, and also competition, along the value chain. The consequence is the emergence of new business models.

Up to 30 experts, mostly from industry and academia, took part in the SDN session (Slides). Lively discussions came up on SDN/NFV infrastructure deployments, service presentations and implementation forms, establishing microservice architectures, making cloud-native applications more agile, and how to open the field to innovation.

Furthermore, two open source projects, ONOS and CORD, were introduced and discussed. These two conceptual architectures are set up modularly and address scalability, high availability, performance, and NB/SB abstraction, which allows communities that provide research and production networks to establish their infrastructure by leveraging open source software and white boxes. The first SDN deployments powered by ONOS started at the beginning of 2015 with GEANT and GARR in Europe, Internet2 in the USA, and AMLIGHT and FIU in South America. The target audience of such a global SDN network deployment is RENs, network operators, and users. The motivations for a global SDN network are manifold, but can be summarized as (1) enabling network and service innovation and (2) learning and improvement through an agile collaboration model. In addition, new apps enabling network innovation can be offered: Castor (providing L2/L3 connectivity for SDX), SDN-IP (transforming an SDN into a transit IP network, i.e. an SDN AS that uses BGP to communicate with its neighbors and provides L3 connectivity without legacy routers), SDX L2/L3, and VPLS.

Another trend was interesting to follow: combining open source tools and technologies, where e.g. Snabb switch technology meets vMX, a full-featured carrier-grade router, within a Docker container to build a high-performance, lightweight carrier-grade 4over6 solution. In a demonstration, a service directory licensing model that delivers a vADC (virtual Application Delivery Controller) as-a-service solution was presented.

The answer to the question “What can the network do for clouds?” was given by the contribution of Cumulus Networks. With Cumulus VX and an OpenStack topology, routing to the host allows server admins to use multipath capabilities on the server via multiple uplinks, and thus to take an active role without being bound to L2 networking. Various stages of Cumulus network implementations are possible: from a full MLAG (Multi-Chassis Link Aggregation) fabric with MLAG in the backbone, LACP (Link Aggregation Control Protocol) from the servers, and L2 connectivity with limited scalability, to a full Layer 3 fabric with highly capable, scalable networking and an IP fabric to the hosts, enabled by Cumulus’ Quagga improvements.

In the context of big data and applications like Hadoop and MapReduce in a 10-100 Gb/s data centre network, there are many monitoring challenges. One of them is the need for faster and more scalable network monitoring methods. A deep understanding of workflows and their communication patterns is essential for designing future data centres. Since the currently available monitoring tools are based on outdated software, high-resolution, non-intrusive monitoring in the data plane has to be addressed. Thus a high-resolution network monitoring architecture called zMon, which targets large-scale data centre and IXP networks, was presented and discussed.

The statement “a bird in the hand is worth two in the bush” here means boosting existing networks with SDN, and it implies the question: wouldn’t it be nice to be able to apply SDN principles on top of existing network architectures? The answer is yes, but how do you turn a legacy network into an SDN-enabled environment? Three issues were discussed: SDN requires upgrading (1) network devices, (2) management systems, and (3), in a sense, the network operators themselves. Using SDN this way should mean (a) a small investment that provides benefits under partial deployment (mostly a single switch), (b) low risk, i.e. minimal impact on operational practices and compatibility with existing technologies, and, last but not least, (c) high return, i.e. solving a timely problem. Two approaches were presented: (A) Fibbing (intra-domain routing), an architecture that allows central control of routers’ forwarding tables on top of distributed routing, and (B) SDX (inter-domain routing), an approach that highlights flexible Internet policies and open, flexible APIs. Fibbing (“lying to the network”) received special attention, since it combines flexibility, expressivity, and manageability, the advantages of SDN. Technically speaking, fake nodes and links are introduced into an underlying link-state routing protocol (e.g. OSPF), so that routers compute their own forwarding tables based on the extended (fake plus real) network topology. Flexible load balancing, traffic engineering, and backup routes can all be realized with Fibbing.

Under the title “Supercharging SDN security”, two projects were presented: (1) a secure SDN architecture that confines the damage caused by a compromised controller or switch by isolating their processes, and (2) an SDN security extension for path enforcement and path validation. For (1), the motivation of the work builds on the OpenDaylight NETCONF vulnerabilities (DoS, information disclosure, topology spoofing), the ONOS deserialization bug (DoS), and privilege escalation on the Cisco Application Policy Infrastructure Controller and the Cisco Nexus 9000 Series ACI Mode Switch. The question came up of what a secure SDN architecture should look like. The architecture’s building blocks are based on the principle of isolated virtual controllers and switches, connected to each other by an SB API (an OpenFlow channel) protected by TLS (Transport Layer Security); since isolation is per tenant, it is not possible to communicate across isolated environments. A prototype implementation was demonstrated. For (2), the stated driver is the lack of data-plane accountability, i.e. no policy enforcement or data-plane policy validation mechanisms are in place. An SDN security extension should therefore support the enforcement of network paths and reactively inspect data-plane behaviour. Path enforcement mechanisms and path validation procedures were deployed and evaluated in terms of network latency and throughput.

In conclusion, SDN/NFV infrastructure deployments are coming closer to vendors’ and ISPs’ understanding of SDN, e.g. focusing on vDC and DC use cases. The question “What can the network do for clouds?” became important for vendor-specific implementations using OpenDaylight, Docker, OpenStack, etc., where services are orchestrated and provided across a cloud network topology. Furthermore, there was a big discussion on NB and SB APIs supporting control and data plane programmability. OpenFlow, the traditional SB API, provides only a simple “match-action” paradigm and lacks stateful processing in the SDN data plane. So more flexibility is needed and will appear in future SDN approaches, as pointed out with P4: protocol independence (P4 is a programming language that defines how a switch processes packets), target independence (P4 can describe everything from high-performance ASICs to virtual switches), and field reconfigurability (P4 allows the packet processing of a switch to be reconfigured in the field). Further, combining open source tools and software with closed source or proprietary protocols speeds up the SDNization process and brings together researchers, DevOps teams, and industry.

The SDN Switzerland group is an independent initiative started by SWITCH and the ICCLab (ZHAW) in 2013. The aim of SDN Switzerland is to organize SDN workshops addressing topics from research, academic ICT operations, and industry (implementation forms of SDN). This constellation allows us to bring together knowledge and to use the synergy of an interdisciplinary group for future steps and collaborations.