SWITCH Cloud Blog

Starting 1000 instances on SWITCHengines

1 Comment

Is it really possible with Openstack to start 1000 instances, make a parallel computation, and then save the data and delete the instances ?
To answer this question we tested it on SWITCHengines. I had a lot of troubles getting this work, and I have to thank other Openstack Operators I been chatting with: Mattia Belluco, Matteo Panella and Anton Aksola.
Our Openstack control plane is deployed with a dedicated pet VM for each Openstack service (Nova, Cinder, Neutron, Glance and Keystone) and a generic controller VM where we run the mysql and the rabbitmq services. This configuration makes possible to monitor each Openstack service as an isolated VM, and it makes easier for us to identify bottlenecks in the control plane.
For this experiment we never used the web interface, but the Openstack CLI with this reference command line.

openstack server create \
--image "Ubuntu Xenial 16.04 (SWITCHengines)" \
--flavor c1.small \
--network demo-network \
--user-data cloud-init.txt \
--key-name mykey \
--min 100 \
--max 100 test

The c1.small flavor has just 1 CPU core and 1GB of RAM.

We did the experiment in 4 steps, trying with 100, 200, 400 and 1000 instances. To make sure that the instances were really started and operational, we used cloud-init to make them phone home to a registration server. This is a very easy cloud-init feature to use, here is an example cloud-init.txt file:


#cloud-config
phone_home:
url: http://x.x.x.x:8000/$INSTANCE_ID/
post: [ hostname, fqdn ]

In this github gist we share the python code to run the registration service.

The first test with 100 instances did not work. We tried a few runs and we always had a minimum of 4 to a maximum of 7 instances that did not start for various reasons. Monitoring our control plane we noticed that we were saturating the CPUs and memory of the nova and neutron pets.
We increased the resources for both the nova and the neutron pets from 4 to 16 CPU cores and we doubled the memory from 8 GB to 16 GB.
After these changes we were able to start 100 instances without problems. We noticed that the neutron pet had an higher load than the nova pet during the process of creating 100 instances.

When we tried with 200 instances, those were all reported as Running by Openstack but we always had a minimum of 8 to a maximum of 20 instances not phoning home. Looking at the serial console with the command:

openstack console log show

we noticed that these instances were not able to get an IP address from the DHCP server, and the DHCP client would give up after 300 seconds. Using the hint that the neutron pet was more loaded than the nova pet, we found out that the nova instances reached the RUNNING state while the corresponding neutron ports were still in the BUILDING phase.
Thinking of a race condition between nova instances and neutron ports, I asked on the Openstack Developers mailing list, and it turned out that we had a wrong configuration.

We changed our nova.conf as follows:

vif_plugging_is_fatal=True
vif_plugging_timeout=300

After fixing the configuration we had the same result, but instead of the instances starting and not being able to obtain an IP address, they never started and were reported in ERROR state by Openstack.
The real challenge was not to schedule 200 instances, but to allocate 200 network ports.
Troubleshooting in this direction we observed that the rabbitmq queues of the neutron dhcp agents were filling up during the ports creation. For each created port the dhcp agent had to add a corresponding line to the file /var/lib/neutron/dhcp//host

We looked into the detail of what happens when a neutron port is created. Using the guru meditation report we traced down the culprit in a slow “ip route list” call.
This command is called everytime a neutron port is created:

time sudo ip netns exec qdhcp-7a1cfb7f-2960-45f5-903f-0d602450525a ip route list
default via 10.10.0.1 dev tapaf136b11-a5
10.10.0.0/16 dev tapaf136b11-a5 proto kernel scope link src 10.10.0.2
real 0m0.048s
user 0m0.000s
sys 0m0.016s

However calling the same command within neutron-rootwrap takes about 10 times longer:

time sudo neutron-rootwrap /etc/neutron/rootwrap.conf ip netns exec qdhcp-7a1cfb7f-2960-45f5-903f-0d602450525a ip route list dev tapaf136b11-a5
default via 10.10.0.1
10.10.0.0/16 proto kernel scope link src 10.10.0.2
real 0m0.713s
user 0m0.472s
sys 0m0.172s

Once we identified this bottleneck, we changed the configuration again to enable the Openstack rootwrap to work in daemon mode.
We had to change the agent section of neutron.conf

[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
root_helper_daemon=sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

After this change, we were able to start successfully 200, 400 and 1000 instances.
With 1000 instances we still get a HTTP 504 gateway timeout error.
This is because the nova-api server takes longer than the reverse proxy timeout to answer the request. The reverse proxy replies with HTTP 504 but the nova-api server will later finish to process the request with a HTTP 200. This is easily fixed using a longer timeout, but we plan to trace the problem in detail to shorten the processing time of the request.

Finally the answer is yes, with Openstack it is really possible to start 1000 instances quickly to have compute power just when needed.

One thought on “Starting 1000 instances on SWITCHengines

  1. Cool stuff – congrats!

    Like

What's your opinion?

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s