SWITCH Cloud Blog

OpenStack Horizon runs on Kubernetes in production at SWITCH

In April we upgraded the SWITCHengines OpenStack Horizon dashboard to the OpenStack Pike version. This upgrade was a little bit special: it was more than a Horizon upgrade from Newton to Pike.

Our Horizon deployment is now hosted on a Kubernetes cluster. The cluster is deployed using the playbook k8s-on-openstack that we actively develop. We have been testing this Kubernetes deployment for a while, but only when you deploy an application on top of it in production do you really learn and fix real problems.

Horizon is a good application to start learning Kubernetes, because it is completely stateless and it does not require any persistent storage. It is just a GUI to the OpenStack API. The user logs in with their credentials; Horizon then obtains a token and makes API calls on the user’s behalf.

Running Horizon in a single Kubernetes pod for a demo probably takes 5 minutes, but deploying it for production usage is far more complex. We needed to address the following issues:

  • Horizontally scale the number of pods, keeping a central memcached or redis cache
  • Allow both IPv4 and IPv6 access to engines.switch.ch
  • Define the Load Balancing architecture
  • Implement a persistent logging system

If you want to jump straight to the solution to all these problems, you can have a look at the project SWITCH-openstack-horizon-k8s-deployment, where we have published all the Dockerfiles and the Kubernetes descriptors needed to recreate our deployment.

Scale Horizontally

Horizon performs much faster when it has access to a memory cache, which is the recommended setup for production. We decided to use Redis.

By creating a Redis service named redis-master in our namespace, we can use the environment variable ${REDIS_MASTER_SERVICE_HOST}, which Kubernetes sets for us, when booting the Horizon container. This makes sure all the instances point to the same cache server.

This is a good example of how you combine two services together in a Kubernetes namespace. We can horizontally scale the Horizon pods, but the Horizon deployment is independent from the Redis deployment.
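The service discovery described above can be sketched in a few lines. Kubernetes sets `<NAME>_SERVICE_HOST` and `<NAME>_SERVICE_PORT` variables in each container for the Services that already exist when the pod starts, with the Service name upper-cased and dashes turned into underscores. The snippet below is only a minimal sketch and not our actual settings file: the helper name, the django-redis backend and the database index are assumptions made for illustration.

```python
import os


def redis_cache_settings(environ=os.environ, service="redis-master"):
    """Build a Django-style CACHES dict from Kubernetes service variables.

    Hypothetical helper: the backend path assumes the django-redis package,
    and database index 1 is an arbitrary choice.
    """
    # Kubernetes upper-cases the Service name and replaces dashes with underscores.
    prefix = service.upper().replace("-", "_")
    host = environ[prefix + "_SERVICE_HOST"]
    # Fall back to the default Redis port if the port variable is missing.
    port = environ.get(prefix + "_SERVICE_PORT", "6379")
    return {
        "default": {
            "BACKEND": "django_redis.cache.RedisCache",
            "LOCATION": "redis://{}:{}/1".format(host, port),
        }
    }
```

Every Horizon pod that evaluates such a function at startup ends up with the same cache location, no matter how many replicas are running.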

IPv4 and IPv6

We always publish our services on IPv6. In our previous Kubernetes demos we used the OpenStack LBaaS to expose services to the outside world. Unfortunately, in the Newton version of OpenStack, the LBaaS lacks proper IPv6 integration. To publish a production service on Kubernetes, we suggest using an ingress controller. There are several kinds available, but we used the standard Nginx ingress controller. The key idea is that we have a K8s node with an interface exposed to the public Internet, where a privileged Docker container is running with --net=host. The container runs Nginx, which can bind to IPv6 and IPv4 on the node, but of course it can also reach any other pod on the cluster network.

Define the Load Balancing architecture

I already wrote above that if you need IPv6, you should not use the OpenStack LBaaSv2. However, I am going to explain why I would not use that kind of load balancer even for IPv4.

The first picture shows the network diagram of an LBaaSv2 deployment. The load balancer is implemented as a network namespace on the network node, called qlbaas-<uuid>, in which an HAProxy process is running. This is an L4 load balancer. The drawback of this architecture is that when an instance boots, the default gateway configured via DHCP is the IP address of the Neutron router. When we expose a service with the floating IP configured on the outer interface of the LBaaS, the load balancer must perform both DNAT and SNAT to force the traffic to follow a symmetric return path. This means that the IP packets hitting the pod have completely lost the source IP address of the original client. Because it is a pure L4 load balancer, we cannot carry this lost information in an HTTP header. This prevents the operator from building any useful logging system, because by the time the traffic arrives at the pod, the information about the client is gone.

In the next picture we look at how the Nginx ingress works. In this case the external traffic is received on a public floating IP that is configured on the virtual machine running the ingress pod, in this case the master. We terminate the TLS connection at the nginx-ingress. This is necessary because the ingress also has to perform SNAT and DNAT, but it adds the X-Forwarded-For header to the HTTP requests, and we use that header to populate our log files. We could not add the header if we were just moving encrypted packets around.
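To illustrate how the X-Forwarded-For header brings the client address back after SNAT, here is a minimal sketch of the lookup a logging component could perform. It is not taken from our deployment: the function name and the idea of passing an explicit set of trusted proxy addresses are assumptions for illustration. The header is only believed when the connection really comes from the ingress, since any client can send the header itself.

```python
import ipaddress


def client_ip(headers, peer_ip, trusted_proxies):
    """Return the client address to log for one HTTP request.

    headers: dict of request headers with lower-case keys.
    peer_ip: source address of the TCP connection as seen by the pod.
    trusted_proxies: set of addresses allowed to supply X-Forwarded-For.
    """
    forwarded = headers.get("x-forwarded-for", "")
    if peer_ip not in trusted_proxies or not forwarded.strip():
        return peer_ip
    # The header is a comma-separated list; the first entry is the
    # address recorded for the original client.
    candidate = forwarded.split(",")[0].strip()
    try:
        return str(ipaddress.ip_address(candidate))
    except ValueError:
        # Not a valid IPv4 or IPv6 address: fall back to the peer address.
        return peer_ip
```

Because `ipaddress.ip_address` accepts both families, the same code covers IPv4 and IPv6 clients.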

Another advantage of this solution is that it uses just a normal instance to implement the ingress, which means that you can use it independently of the version of OpenStack you are running on.

In the future you might be able to use the newer OpenStack Octavia load balancer, but I have not investigated that yet. All I know is that the solution is really similar, but you will have an OpenStack service VM running an HAProxy instance.

Implement a persistent logging system

Pods are short-lived and distributed over different VMs that are also ephemeral. To collect the logs, we run Docker with the journald log driver. Once this is set up, all the Docker containers running on the host send their logging output to journald. We then collect this information with journalbeat and send the data to our Elasticsearch cluster. This part is not yet released in our public playbook because it is not very portable. If you don’t have a ready-to-use ELK cluster, you would get no benefit from running journalbeat.
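To give an idea of what ends up in the journal, here is a minimal sketch that groups log messages by container. It assumes the input is the line-oriented output of `journalctl -o json`, and it relies on the CONTAINER_NAME field that Docker's journald log driver attaches to each entry. The function name is hypothetical, and this is not how journalbeat itself is implemented.

```python
import json
from collections import defaultdict


def messages_by_container(lines):
    """Group journal messages by the Docker container that wrote them."""
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)  # one JSON object per journal entry
        name = entry.get("CONTAINER_NAME")
        if name is None:
            continue  # entry was not written by a Docker container
        grouped[name].append(entry.get("MESSAGE", ""))
    return dict(grouped)
```

A shipper would forward these entries to Elasticsearch instead of keeping them in memory, but the fields it works with are the same.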


We have now been running in production for almost a month, and we have found the system to be robust and stable. We have had no complaints from our users, so we can say that the migration was seamless for them. We have learned a lot from this experience.

In the next blog post we will describe how we implemented the metrics monitoring, to observe how much memory and CPU cores each pod is consuming. Make sure you keep an eye on our blog for updates.

SWITCHdrive Over IPv6

When we built the SWITCHdrive service on the OpenStack platform that was to become SWITCHengines, that platform didn’t really support IPv6 yet. But since Spring 2016 it does. This week, we enabled IPv6 in SWITCHdrive and performed some internal tests. Today around noon, we published its IPv6 address (“AAAA record”) in the DNS. We quickly saw around 5% of accesses use IPv6 instead of IPv4.

In the evening, this percentage climbed to about 14%. This shows the relatively good support for IPv6 on Swiss broadband (home) networks, notably by the good folks at Swisscom.

The lower percentage during office (and lecture, etc.) hours shows that the IPv6 roll-out to higher education campuses still has some way to go. Our SWITCHlan backbone has been running “dual-stack” (IPv4 and IPv6 in parallel) in production for more than 10 years, and most institutions have added IPv6 configuration to their connections to us. But campus networks are wonderfully complex, so getting IPv6 deployed to every network plug and every wireless access point is a daunting task. Some schools are almost there, including some large ones that don’t use SWITCHdrive—yet!?—so the 5% may underestimate the extent of the roll-out for the overall SWITCH community. The others will follow in their footsteps. They can count on the help of the community and benefit from IPv6 training courses organized by our colleagues in the security and network teams. Contact us if you need help!

[Update: After a few weeks, the proportion of IPv6 traffic increased somewhat. Now we typically see around 10% during office hours and 20% during weekends. So the “retail” sector is still clearly ahead of (our academic) enterprise networks in terms of IPv6 penetration.]

IPv6 Address Assignment in OpenStack

In an inquiry “IPv6 and Liberty (or Mitaka)” on the openstack mailing list,

Ken D’Ambrosio writes:
> Hey, all. I have a Liberty cloud, and decided for the heck of it to
> start dipping my toe into IPv6. I do have some confusion, however. I
> can choose between SLAAC, DHCPv6 stateful and DHCPv6 stateless — and
> I see some writeups on what they do, but I don’t understand what
> differentiates them. As far as I can tell, they all do pretty much
> the same thing, just with different pieces doing different things.
> E.g., the chart, found here
> (http://docs.openstack.org/liberty/networking-guide/adv-config-ipv6.html
> — page down a little) shows those three options, but it isn’t clear:
> * How to configure the elements involved
> * What they exactly do (e.g., “optional info”? What’s that?)
> * Why there even *are* different choices. Do they offer functionally
> different results?

SLAAC and DHCPv6-stateless use the same mechanism (SLAAC) to provide instances with IPv6 addresses. The only difference between them is that with DHCPv6-stateless, the instance can also use DHCPv6 requests to get information other than its own address, such as nameserver addresses. So between SLAAC and DHCPv6-stateless, I would always prefer DHCPv6-stateless: it’s a strict superset in terms of functionality, and I don’t see any particular risks associated with it.

DHCPv6-stateful is a different beast: It will use DHCPv6 to give an instance its IPv6 address. DHCPv6 actually fits OpenStack’s model better than SLAAC.
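The difference between the three modes can be summarized by the flags carried in the Router Advertisements. The table below reflects my reading of the chart in the OpenStack networking guide mentioned in the question, so treat the flag values as an assumption to check against that guide; the `describe` helper is hypothetical and only there to make the consequences explicit.

```python
# Router Advertisement flags for each Neutron IPv6 mode (assumed from the
# OpenStack networking guide):
#   A: the advertised prefix may be used for autoconfiguration (SLAAC)
#   M: addresses are managed, i.e. handed out by DHCPv6
#   O: other configuration (e.g. name servers) is available via DHCPv6
RA_FLAGS = {
    "slaac":            {"A": 1, "M": 0, "O": 0},
    "dhcpv6-stateless": {"A": 1, "M": 0, "O": 1},
    "dhcpv6-stateful":  {"A": 0, "M": 1, "O": 1},
}


def describe(mode):
    """Say where an instance gets its address and its other settings from."""
    flags = RA_FLAGS[mode]
    address_source = "DHCPv6" if flags["M"] else "SLAAC"
    # With M set, the other information travels in the same DHCPv6 exchange.
    other_source = "DHCPv6" if (flags["M"] or flags["O"]) else "RA only"
    return {"address": address_source, "other_info": other_source}
```

Read this way, SLAAC and DHCPv6-stateless differ only in the O flag, while DHCPv6-stateful is the only mode in which the address itself comes from DHCPv6.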

Why DHCPv6-Stateful Fits OpenStack Better

OpenStack (Nova) sees it as part of its job to control the IP address(es) that an instance uses. In IPv4 it uses DHCP (always did). DHCP assigns complete addresses—which are under control of OpenStack. In IPv6, stateful DHCPv6 would be the equivalent.

SLAAC is different in that the node (instance) actually chooses its address based on information it gets from the router. The most common method is that the node uses an “EUI-64” address as the local part (host ID) of the address. The EUI-64 is derived from the MAC address by a fixed algorithm. This can work with OpenStack because OpenStack controls the MAC addresses too, and can thus “guess” what IPv6 address an instance will auto-configure on a given network. You see how this is a little less straightforward than OpenStack simply telling the instance what IPv6 address it should use.

In practice, OpenStack’s guessing fails when an instance uses other methods to get the local part, for example “privacy addresses” according to RFC 4941. These will lead to conflicts with OpenStack’s built-in anti-spoofing filters. So such mechanisms need to be disabled when SLAAC is used under OpenStack (including under “DHCPv6-stateless”).
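The EUI-64 "guess" can be reproduced with a few lines of code: flip the universal/local bit of the first MAC octet, insert ff:fe in the middle, and append the result to the /64 prefix. This is a sketch of the standard derivation, not of OpenStack's own code; the function names and the MAC address in the usage example are made up for illustration.

```python
import ipaddress


def eui64_interface_id(mac):
    """Derive the modified EUI-64 interface identifier from a 48-bit MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC address.
    return bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])


def slaac_address(prefix, mac):
    """Return the address a host would autoconfigure with EUI-64."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("SLAAC with EUI-64 needs a /64 prefix")
    interface_id = int.from_bytes(eui64_interface_id(mac), "big")
    return ipaddress.IPv6Address(int(network.network_address) | interface_id)


def matches_expected(observed, prefix, mac):
    """True if an observed source address is the EUI-64 address for this MAC."""
    return ipaddress.IPv6Address(observed) == slaac_address(prefix, mac)
```

For example, `slaac_address("2001:620:5ca1:80f0::/64", "fa:16:3e:00:30:d7")` yields 2001:620:5ca1:80f0:f816:3eff:fe00:30d7. A privacy address generated by the instance would make `matches_expected` return False, which is the situation in which the anti-spoofing filters get in the way.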

Why We Use SLAAC/DHCPv6-Stateless Anyway

Unfortunately, most GNU/Linux distributions don’t support Stateful DHCPv6 “out of the box” today.

Because we want our users to use unmodified operating systems images and still get usable IPv6, we have grudgingly decided to use DHCPv6-stateless. For configuration information, see SWITCHengines Under the Hood: Basic IPv6 Configuration.

If you decide to go for DHCPv6-stateful, then there’s a Web page that explains how to enable it client-side for a variety of GNU/Linux distributions.

It would be nice if all systems honored the “M” (Managed) flag in Router Advertisements, using DHCPv6 if it is set and SLAAC otherwise.

[This is an edited version of my response, which I wasn’t sure I was allowed to post because I use GMANE to read the list– SL.]

SWITCHengines Under the Hood: Basic IPv6 Configuration

My last post, IPv6 Finally Arriving on SWITCHengines, described what users of our IaaS offering can expect from our newly introduced IPv6 support: Instances using the shared default network (“private”) will get publicly routable IPv6 addresses.

This post explains how we set this up, and why we decided to go this route.  We hope that this is interesting to curious SWITCHengines users, and useful for other operators of OpenStack infrastructure.

Before IPv6: Neutron, Tenant Networks and Floating IP

[Feel free to skip this section if you are familiar with Tenant Networks and Floating IPs.]

SWITCHengines uses Neutron, the current generation of OpenStack Networking.  Neutron supports user-definable networks, routers and additional “service” functions such as Load Balancers or VPN gateways.  In principle, every user can build her own network (or several) isolated from the other tenants.  There is a default network accessible to all tenants.  It is called private, which I find quite confusing because it is totally not private, but shared between all the tenants.  But it has a range of private (in the sense of RFC 1918) IPv4 addresses—a subnet in OpenStack terminology—that is used to assign “fixed” addresses to instances.

There is another network called public, which provides global connectivity.  Users cannot connect instances to it directly, but they can use Neutron routers (which include NAT functionality) to route between the private (RFC 1918) addresses of their instances on tenant networks (whether the shared “private” or their own) and the public network, and by extension, the Internet.  By default, they get “outbound-only” connectivity using 1:N NAT, like users behind a typical broadband router.  But they can also request a Floating IP, which can be associated with a particular instance port.  In this case, a 1:1 NAT provides both outbound and inbound connectivity to the instance.

The router between the shared private network and the external public network was provisioned by us; it is called private-router.  Users who build their own tenant networks and want to connect them with the outside world need to set up their own routers.

This is a fairly standard setup for OpenStack installations, although some operators, especially in the “public cloud” business, forgo private addresses and NAT, and let customers connect their VMs directly to a network with publicly routable addresses.  (Sometimes I wish we’d have done that when we built SWITCHengines—but IPv4 address conservation arguments were strong in our minds at the time.  Now it seems hard to move to such a model for IPv4.  But let’s assume that IPv6 will eventually displace IPv4, so this will become moot.)

Adding IPv6: Subnet, router port, return route—that’s it

So at the outset, we have

  • a shared internal network called private
  • a provider network with Internet connectivity called public
  • a shared router between private and public called private-router

We use the “Kilo” (2015.1) version of OpenStack.

As another requirement, the “real” network underlying the public network (in our case a VLAN) needs connectivity to the IPv6 Internet.

Create an IPv6 Subnet with the appropriate options

And of course we need a fresh range of IPv6 addresses that we can route on the Internet.  A single /64 will be sufficient.  We use this to define a new Subnet in Neutron:

neutron subnet-create --ip-version 6 --name private-ipv6 \
  --ipv6-ra-mode dhcpv6-stateless --ipv6-address-mode dhcpv6-stateless \
  private 2001:620:5ca1:80f0::/64

Note that we use dhcpv6-stateless for both ra-mode and address-mode.  This will actually use SLAAC (stateless address autoconfiguration) and router advertisements to configure IPv6 on the instance.  Stateless DHCPv6 could be used to convey information such as name server addresses, but I don’t think we’re actively using that now.

We should now see a radvd process running in an appropriate namespace on the network node.  And instances—both new and pre-existing!—will start to get IPv6 addresses if they are configured to use SLAAC, as is the default for most modern OSes.

Create a new port on the shared router to connect the IPv6 Subnet

Next, we need to add a port to the shared private-router that connects this new subnet with the outside world via the public network:

neutron router-interface-add private-router private-ipv6

Configure a return route on each upstream router

Now the outside world also needs a route back to our IPv6 subnet.  The subnet is already part of a larger aggregate that is routed toward our two upstream routers.  It is sufficient to add a static route for our subnet on each of them.  But where do we point that route to, i.e. what should be the “next hop”? We use the link-local address of the external (gateway) port of our Neutron router, which we can find out by looking inside the namespace for the router on the network node.  Our router private-router has UUID 2b8d1b4f-1df1-476a-ab77-f69bb0db3a59.  So we can run the following command on the network node:

$ sudo ip netns exec qrouter-2b8d1b4f-1df1-476a-ab77-f69bb0db3a59 ip -6 addr list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
 inet6 ::1/128 scope host
 valid_lft forever preferred_lft forever
55: qg-2d73d3fb-f3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
 inet6 2001:620:5ca1:80fd:f816:3eff:fe00:30d7/64 scope global dynamic
 valid_lft 2591876sec preferred_lft 604676sec
 inet6 fe80::f816:3eff:fe00:30d7/64 scope link
 valid_lft forever preferred_lft forever
93: qr-02b9a67d-24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
 inet6 2001:620:5ca1:80f0::1/64 scope global
 valid_lft forever preferred_lft forever
 inet6 fe80::f816:3eff:fe7d:755b/64 scope link
 valid_lft forever preferred_lft forever
98: qr-6aaf629f-19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
 inet6 fe80::f816:3eff:feb6:85f4/64 scope link
 valid_lft forever preferred_lft forever

The port we’re looking for is the one whose name starts with qg-, the gateway port.  The address we’re looking for is the one starting with fe80:, the link-local address.
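If you have to do this lookup for several routers, picking the address out of the listing by hand gets tedious. The following sketch extracts the link-local address of the external gateway port, which is the qg- interface in the listing above and whose fe80: address is the next hop we need. It assumes output in the format printed by `ip -6 addr list`; the function name is hypothetical.

```python
import re


def gateway_link_local(ip_addr_output):
    """Return (interface, link-local address) for the qg- port, or None."""
    current = None
    for line in ip_addr_output.splitlines():
        # Interface header lines look like "55: qg-2d73d3fb-f3: <...> mtu 1500".
        header = re.match(r"^\d+:\s+([^:@\s]+)", line)
        if header:
            current = header.group(1)
            continue
        # Address lines look like " inet6 fe80::.../64 scope link".
        addr = re.match(r"^\s*inet6\s+(fe80:[0-9a-f:]*)/\d+\s+scope link", line)
        if addr and current and current.startswith("qg-"):
            return current, addr.group(1)
    return None
```

Fed with the listing above, it returns the interface qg-2d73d3fb-f3 together with fe80::f816:3eff:fe00:30d7.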

The “internal” subnet has address 2001:620:5ca1:80f0::/64, and VLAN 908 (ve908 in router-ese) is the VLAN that connects our network node to the upstream router.  So this is what we configure on each of our routers using the “industry-standard CLI”:

ipv6 route 2001:620:5ca1:80f0::/64 ve 908 fe80::f816:3eff:fe00:30d7

And we’re done! IPv6 packets can flow between instances on our private network and the Internet.

Coming up

Of course this is not the end of the story.  While our customers were mostly happy that they suddenly got IPv6, there are a few surprises that came up.  In a future episode, we’ll tell you more about them and how they can be addressed.

IPv6 Finally Arriving on SWITCHengines

As you may have heard or noticed, the Internet is running out of addresses. It’s time to upgrade from the 35-year-old IPv4 protocol, which doesn’t even have a single public address per human on Earth, to the brand new (?) IPv6, which offers enough addresses for every grain of sand in the known universe, or something like that.

SWITCH is a pioneer in IPv6 adoption, and has been supporting IPv6 on all network connections and most services in parallel with IPv4 (“dual stack”) for many years.

To our embarrassment, we hadn’t been able to integrate IPv6 support into SWITCHengines from the start. While OpenStack had some IPv6 support, the implementation wasn’t mature, and we didn’t know how to fit it into our network model in a user-friendly way.

IPv6: “On by default” and globally routable

About a month ago we took a big step to change this: IPv6 is now enabled by default for all instances on the shared internal network (“private”).  So if you have an instance running on SWITCHengines, and it isn’t connected to a tenant network of your own, then the instance probably has an IPv6 address right now, in addition to the IPv4 address(es) it always had.  Note that this is true even for instances that were created or last rebooted before we turned on IPv6. On Linux-derived systems you can check using ifconfig eth0 or ip -6 addr list dev eth0; if you see an address that starts with 2001:620:5ca1:, then your instance can speak IPv6.

Note that these IPv6 addresses are “globally unique” and routable, i.e. they are recognized by the general Internet.  In contrast, the IPv4 addresses on the default network are “private” and can only be used locally inside the cloud; communication with the general Internet requires Network Address Translation (NAT).

What you can do with an IPv6 address

Your instance will now be able to talk to other Internet hosts over IPv6. For example, try ping6 mirror.switch.ch or traceroute6 www.facebook.com. This works just like IPv4, except that so far only a subset of hosts on the Internet speaks IPv6. Fortunately, this subset already includes important services and is growing.  Because IPv6 doesn’t need NAT, routing between your instances and the Internet is less resource-intensive and a tiny bit faster than with IPv4.

But you will also be able to accept connections from other Internet hosts over IPv6. This is different from before: To accept connections over IPv4, you need(ed) a separate public address, a Floating IP in OpenStack terminology.  So if you can get by with IPv6, for example because you only need (SSH or other) access from hosts that have IPv6, then you don’t need to reserve a Floating IP anymore.  This saves you not just work but also money—public IPv4 addresses are scarce, so we need to charge a small “rent” for each Floating IP reserved.  IPv6 addresses are plentiful, so we don’t charge for them.

But isn’t this dangerous?

Instances are now globally reachable by default, but they are still protected by OpenStack’s Security Groups (corresponding to packet filters or access control lists).  The default Security Group only allows outbound connections: Your instance can connect to servers elsewhere, but attempts to connect to your instance will be blocked.  You have probably opened some ports such as TCP port 22 (for SSH) or 80 or 443 (for HTTP/HTTPS) by adding corresponding rules to your own Security Groups.  In these rules, you need to specify address “prefixes” specifying where you want to accept traffic from.  These prefixes can be IPv4 or IPv6—if you want to accept both, you need two rules.

If you want to accept traffic from anywhere, your rules will contain 0.0.0.0/0 as the prefix. To accept IPv6 traffic as well, simply add identical rules with ::/0 as the prefix instead; this is the IPv6 version of the “global” prefix.
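Why two rules are needed can be seen with a small experiment: an IPv6 source address never matches an IPv4 prefix, not even the catch-all 0.0.0.0/0. The snippet is a simplified model of prefix matching and not the actual Security Group implementation; the function name is hypothetical.

```python
import ipaddress


def allowed(source, rule_prefixes):
    """True if the source address falls inside any of the rule prefixes."""
    address = ipaddress.ip_address(source)
    # A membership test across address families is simply False.
    return any(address in ipaddress.ip_network(prefix) for prefix in rule_prefixes)
```

With only `["0.0.0.0/0"]`, a client such as 2001:db8::1 is rejected; with `["0.0.0.0/0", "::/0"]`, both IPv4 and IPv6 clients are accepted.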

What about domain names?

These IPv6 addresses can be entered in the DNS using “AAAA” records. For Floating IPs, we provided pre-registered hostnames of the form fl-34-56.zhdk.cloud.switch.ch. We cannot do that in IPv6, because there are just too many possible addresses. If you require your IPv6 address to map back to a hostname, please let us know and we can add it manually.

OpenStack will learn how to (optionally) register such hostnames in the DNS automatically; but that feature was only added to the latest release (“Mitaka”), and it will be several months before we can deploy this in SWITCHengines.


We would like to also offer IPv6 connectivity to user-created “tenant networks”. Our version of OpenStack almost supports this, but it cannot be fully automated yet. If you need IPv6 on your non-shared network right now, please let us know via the normal support channel, and we’ll set something up manually. But eventually (hopefully soon), getting a globally routable IPv6 prefix for your network should be (almost) as easy as getting a globally routable Floating IP is now.

You can also expect services running on SWITCHengines (SWITCHdrive, SWITCHfilesender and more) to become exposed over IPv6 over the next couple of months. Stay tuned!