SWITCH Cloud Blog


Hack Neutron to add more IP addresses to an existing subnet

When we designed our OpenStack cloud at SWITCH, we created a network in the service tenant, and we called it private.

This network is shared with all tenants and it is the default choice when you start a new instance. The name private comes from the fact that you will get a private IP via DHCP. The subnet we chose for this network is 10.0.0.0/24. The allocation pool goes from 10.0.0.2 to 10.0.0.254 and cannot be enlarged any further. This is a problem because we need IP addresses for many more instances.

In this article we explain how we successfully enlarged this subnet to a wider range: 10.0.0.0/16. This operation is not supported by Neutron in Juno, so we show how to hack into Neutron internals. We were able to enlarge the subnet and modify the allocation pool without interrupting service for the existing instances.

In the following we assume that the network in question has only one router; the procedure can easily be extended to more complex setups.

What you should know about Neutron is that a Neutron network has two important namespaces on the OpenStack network node:

  • The qrouter namespace is the namespace of the virtual router. In our setup one interface is attached to the private network we need to enlarge and a second interface is attached to the external physical network.
  • The qdhcp namespace has a single interface to the private network. On your OpenStack network node you will find a dnsmasq process bound to this interface, providing IP addresses via DHCP. (A quick way to list both namespaces is shown below.)
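If you want to check which namespaces are present on your network node, iproute2 can list them; the exact names will of course contain your own router and network IDs:

# list the Neutron namespaces on the network node
sudo ip netns | grep qrouter
sudo ip netns | grep qdhcp
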
Neutron Architecture

In the figure Neutron Architecture we try to give an overview of the overall system. A Virtual Machine (VM) can run on any remote compute node. Each compute node runs an Open vSwitch process that collects the traffic from its VMs and delivers it to the network node with VXLAN encapsulation. The Open vSwitch on the network node has a bridge containing both the internal interface of the qrouter namespace and the interface of the qdhcp namespace, so the VMs see both the default gateway and the DHCP server on the virtual L2 network. The qrouter namespace has a second interface to the external network.
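If you want to see the corresponding bridges and ports on the network node, ovs-vsctl gives a quick overview (the output will of course differ per deployment):

sudo ovs-vsctl show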

Step 1: hack the Neutron database

In the Neutron database, look for the subnet; you can easily find it in the subnets table by matching the service tenant id:

select * from subnets WHERE tenant_id='d447c836b6934dfab41a03f1ff96d879';

Take note of the id (which in this table is the subnet_id) and the network_id of the subnet. In our example we had these values:

id (subnet_id) = 2e06c039-b715-4020-b609-779954fa4399
network_id = 1dc116e9-1ec9-49f6-9d92-4483edfefc9c
tenant_id = d447c836b6934dfab41a03f1ff96d879
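In case you need to look up the service tenant id in the first place, the identity service provides it; here with the keystone client that shipped with Juno (on newer installations, openstack project list does the same):

keystone tenant-list | grep service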

Now let’s look into the routers database table:

select * from routers WHERE tenant_id='d447c836b6934dfab41a03f1ff96d879';

Again filter for the service tenant, and take note of the router id.

 id (router_id) = aba1e526-05ca-4aca-9a80-01601cdee79d

At this point we have all the information we need to enlarge the subnet in the Neutron database.

update subnets set cidr='NET/MASK' WHERE id='subnet_id';

So in our example:

update subnets set cidr='10.0.0.0/16' WHERE id='2e06c039-b715-4020-b609-779954fa4399';

Nothing happens immediately after you update the values in the Neutron MySQL database. You could reboot your network node and Neutron would rebuild the virtual routers with the new database values, but we show a better solution that avoids downtime.
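Before touching the namespaces it is worth double checking that the new CIDR really is in place, with a quick query in the same MySQL session:

select id, cidr from subnets WHERE id='2e06c039-b715-4020-b609-779954fa4399';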

Step 2: Update the interface of the qrouter namespace

On the network node there is a namespace qrouter-<router_id>. Let's have a look at the interfaces using iproute2:

sudo ip netns exec qrouter-<router_id> ip addr show

With the values in our example:

sudo ip netns exec qrouter-aba1e526-05ca-4aca-9a80-01601cdee79d ip addr show

You will see the typical Linux output with all the interfaces that live in this namespace. Take note of the name of the interface with the address 10.0.0.1/24 that we want to change; in our case:

 qr-396e87de-4b

Now that we know the interface name, we can change the IP address and netmask, adding the new address before removing the old one:

sudo ip netns exec qrouter-aba1e526-05ca-4aca-9a80-01601cdee79d ip addr add 10.0.0.1/16 dev qr-396e87de-4b
sudo ip netns exec qrouter-aba1e526-05ca-4aca-9a80-01601cdee79d ip addr del 10.0.0.1/24 dev qr-396e87de-4b
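A quick check confirms that the interface now carries the /16 address:

sudo ip netns exec qrouter-aba1e526-05ca-4aca-9a80-01601cdee79d ip addr show dev qr-396e87de-4b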

Step 3: Update the interface of the qdhcp namespace

Still on the network node, there is a namespace qdhcp-<network_id>. Exactly as we did for the qrouter namespace, we find the interface name and change the IP address to the new netmask:

sudo ip netns exec qdhcp-1dc116e9-1ec9-49f6-9d92-4483edfefc9c ip addr show
sudo ip netns exec qdhcp-1dc116e9-1ec9-49f6-9d92-4483edfefc9c ip addr add 10.0.0.2/16 dev tapadebc2ff-10
sudo ip netns exec qdhcp-1dc116e9-1ec9-49f6-9d92-4483edfefc9c ip addr show
sudo ip netns exec qdhcp-1dc116e9-1ec9-49f6-9d92-4483edfefc9c ip addr del 10.0.0.2/24 dev tapadebc2ff-10
sudo ip netns exec qdhcp-1dc116e9-1ec9-49f6-9d92-4483edfefc9c ip addr show

The dnsmasq process bound to the interface in the qdhcp namespace is smart enough to detect the change in the interface configuration automatically. This means that from now on new instances will get a /16 netmask via DHCP.
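If you want to double check which dnsmasq process serves this network, its command line references the network id in the per-network configuration paths (the exact paths depend on your DHCP agent configuration):

ps aux | grep dnsmasq | grep 1dc116e9-1ec9-49f6-9d92-4483edfefc9c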

Step 4: (Optional) Adjust the subnet name in Horizon

We had named the subnet 10.0.0.0/24. Purely for cosmetic reasons, we logged in to the Horizon web interface as admin and changed the name of the subnet to 10.0.0.0/16.

Step 5: Adjust the allocation pool for the subnet

Now that the subnet is wider, the neutron client will let you configure a wider allocation pool. First check the existing allocation pool:

$ neutron subnet-list | grep 2e06c039-b715-4020-b609-779954fa4399

| 2e06c039-b715-4020-b609-779954fa4399 | 10.0.0.0/16     | 10.0.0.0/16      | {"start": "10.0.0.2", "end": "10.0.0.254"}           |

You can easily resize the allocation pool like this:

neutron subnet-update 2e06c039-b715-4020-b609-779954fa4399 --allocation-pool start='10.0.0.2',end='10.0.255.254'
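To verify the result, neutron subnet-show prints the updated allocation pool:

neutron subnet-show 2e06c039-b715-4020-b609-779954fa4399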

Step 6: Check status of the VMs

At this point the new instances will get an IP address from the new allocation pool.

As for the existing instances, they will continue to work with the /24 netmask. If rebooted, they will get the same IP address via DHCP, but with the new netmask. Also, when the DHCP lease expires, depending on the DHCP client implementation, they will hopefully pick up the updated netmask. This is not the case with the default Ubuntu dhclient, which will not refresh the netmask when the IP address offered by the DHCP server does not change.

The worst case scenario is a machine that keeps the old /24 netmask for a long time: traffic it sends to machines in the private network outside its /24 will take a suboptimal route through the network node, which acts as the default gateway, instead of going directly over the virtual L2 network.
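If you do not want to wait for the DHCP client, the same add-then-delete trick used for the namespaces also works inside an affected instance; here with a hypothetical instance address 10.0.0.42 on eth0:

# run inside the instance; 10.0.0.42 is just an example address
sudo ip addr add 10.0.0.42/16 dev eth0
sudo ip addr del 10.0.0.42/24 dev eth0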

Conclusion

We successfully expanded a Neutron network to a wider IP range without service interruption. With an understanding of Neutron internals it is possible to make changes that go beyond the features exposed by Neutron itself. It is very important to understand how the values in the Neutron database are used to create the network namespaces.

We understood that a better design for our cloud would be to have a default Neutron network per tenant, instead of a shared default network for all tenants.


SWITCHengines upgraded to OpenStack 2014.2 “Juno”

Our Infrastructure-as-a-Service (IaaS) offering SWITCHengines is based on the OpenStack platform.  OpenStack releases new alphabetically-nicknamed versions every six months.  When we built SWITCHengines in 2014, we based it on the then-current “Icehouse” (2014.1) release.  Over the past few months, we have worked on upgrading the system to the newer “Juno” (2014.2) version.  As we already announced via Twitter, this upgrade was finally completed on 26 August.  The upgrade was intended to be interruption-free for running customer VMs (including the SWITCHdrive service, which is built on top of such VMs), and we mostly achieved that.

Why upgrade?

Upgrading a live infrastructure is always a risk, so we should only do so if we have good reasons.  On a basic level, we see two drivers: (a) functionality and (b) reliability.  Functionality: OpenStack is a very dynamic project to which new features—and entire new subsystems—are added all the time.  We want to make sure that our users can benefit from these enhancements.  Reliability: Like all complex software, OpenStack has bugs, and we want to offer reliable and unsurprising service.  Fortunately, OpenStack also has more and more users, so bugs get reported and eventually fixed, and it has quality assurance (QA) processes that improve over time.  Bugs are usually fixed in the most recent releases only.  Fixes to serious bugs such as security vulnerabilities are often “backported” to one or two previous releases.  But at some point it is no longer safe to use an old release.

Why did it take so long?

We wanted to make sure that the upgrade be as smooth as possible for users.  In particular, existing VMs and other resources should remain in place and continue to work throughout the upgrade.  So we did a lot of testing on our internal development/staging infrastructure.  And we experimented with various different methods for switching over.  We also needed to integrate the—significant—changes to the installation system recipes (from the OpenStack Puppet project) with our own customizations.

We also decided to upgrade the production infrastructure in three phases.  Two of them had been announced: The LS region (in Lausanne) was upgraded on 17 August, the ZH (Zurich) region one week later.  But there are some additional servers with special configuration which host a critical component of SWITCHdrive.  Those were upgraded separately a day later.

Because we couldn’t upgrade all hypervisor nodes (the servers on which VMs are hosted) at the same time, we had to run in a compatibility mode that allowed Icehouse hypervisors to work against a Juno controller.  After all hypervisor hosts were upgraded, this somewhat complex compatibility mechanism could be disabled again.
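Nova supports this through RPC version pinning; a minimal sketch of what such a compatibility setting looks like in nova.conf on the controller (we are not reproducing our exact configuration here) is:

# nova.conf on the Juno controller, while Icehouse computes are still around
[upgrade_levels]
compute = icehouse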

The whole process took us around five months.  Almost as long as the interval between OpenStack releases! But we learned a lot, and we made some modifications to our setup that will make future upgrades easier.  So we are confident that the next upgrade will be quicker.

So it all went swimmingly, right?

Like I wrote above, “mostly”.  All VMs kept running throughout the upgrade.  As announced, the “control plane” was unavailable for a few hours, during which users couldn’t start new VMs.  As also announced, there was a short interruption of network connectivity for every VM.  Unfortunately, this interruption turned out to be much longer for some VMs behind user-defined software routers.  Some of these routers were misconfigured after the upgrade, and it took us a few hours to diagnose and repair those.  Sorry about that!

What changes for me as a SWITCHengines user?

OpenStack dashboard in Juno: The new combined region and project selector

Not much, so far.  There are many changes “under the hood”, but only a few are visible.  If you use the dashboard (“Horizon”), you will notice a few slight improvements in the user interface.  For instance, the selectors for region—LS or ZH—and project—formerly called “tenant”—have been combined into a single element.

The many bug fixes between Icehouse and Juno should make the overall SWITCHengines experience more reliable.  If you notice otherwise, please let us know through the usual support channel.

What’s next?

With the upgrade finished, we will switch back to our previous agile process of rolling out small features and fixes every week or so.  There are a few old and new glitches that we know we have to fix over the next weeks.  We will also add more servers to accommodate increased usage.  To support this growth, we will replace the current network in the ZH region with a more scalable “leaf/spine” network architecture based on “bare-metal” switches.  We are currently testing this in a separate environment.

By the end of the year, we will have a solid infrastructure basis for SWITCHengines, which will “graduate” from its current pilot phase and become a regular service offering in January 2016.  In the SCALE-UP project, which started in August 2015 with the generous support of swissuniversities’ SUC P-2 program, many partners from the Swiss university community will work together to add higher-level services and additional platform enhancements.  Stay tuned!