SWITCH Cloud Blog



Enable Keystone federated users to use CLI tools with Application Credentials

On this blog I have talked in the past about Keystone authentication for your Kubernetes cluster. The solution described there works great if your OpenStack users are stored in the Keystone MySQL database. However, in real production systems it is common to access OpenStack with a federated login: the web login redirects to an Identity Provider that confirms the user's identity and then redirects back to the OpenStack dashboard.

It is a well-known problem that the federated login process relies on web page redirects to enter the necessary information, and this does not work for users who need to authenticate with CLI tools. In our case the CLI tool is kubectl.

A small team of people at SWITCH and GARR worked jointly to find a solution for this use case.

The good news is that the Keystone developers have already implemented a solution for this problem: Keystone Application Credentials, a feature available since the Queens release of OpenStack. The key idea is that a user can log in on the web interface through the federated login process and then, from the Identity panel of the dashboard, create new credentials to be consumed directly by CLI tools.

The following three screenshots show the user journey to create an Application Credential in the OpenStack Horizon dashboard:

Select Application Credential in the Identity Panel

Enter the data to create an application credential

Download the openrc file to store your application credential

So what is missing to authenticate with kubectl using a Keystone application credential?

Starting with kubectl v1.11, the cloud-provider-specific API implementations were moved out of the Kubernetes source tree. You have to modify your client configuration as follows:

contexts:
- context:
    cluster: kubernetes
    namespace: keystone-daadb4bcc9704054b108de8ed263dfc2
    user: openstackgarr
  name: garr

users:
- name: openstackgarr
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: /bin/client-keystone-auth
      args:
      - --domain-name=myDomain
      - --keystone-url=https://keystone.domain.tld:5000/v3
      env:
      - name: OS_USERNAME
        value: username
      - name: OS_PASSWORD
        value: secret
      - name: OS_PROJECT_NAME
        value: test
 

Note that in this configuration snippet we are still using username and password. If you want to test this setup, make sure the client-keystone-auth version is newer than v0.2.0, or that it is patched to include commit 66961abd. Version v0.2.0 is not able to request a Keystone project-scoped token, so your setup will not work.

The client-keystone-auth tool uses the Go library gophercloud to talk to the Keystone API.

To reach our goal, the first step was to patch gophercloud to implement the application_credential authentication method described in the Queens spec.

This patch enables any Go application to easily use the application credentials authentication method, so it could be useful to other Go software tools, such as Terraform.

Please note that the patch only implements issuing a token by authenticating with an application credential. I did not propose a gophercloud patch implementing the full create/update/delete workflow for application credentials, because that went beyond the scope of this work.

Once the gophercloud PR was merged, I proposed a PR for client-keystone-auth to use the new gophercloud feature.

At the time of this writing the latest PR is still not merged, so you will need to compile the code yourself to test it.

Now that all the code is there, we can change the client configuration to use the application credential instead of username and password as we show in the example below:

contexts:
- context:
    cluster: kubernetes
    namespace: keystone-daadb4bcc9704054b108de8ed263dfc2
    user: openstackgarr
  name: garr

users:
- name: openstackgarr
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: /bin/client-keystone-auth
      args:
      - --domain-name=myDomain
      - --keystone-url=https://keystone.domain.tld:5000/v3
      - --user-name=username
      - --application-credential-name=kuberneteslogin
      - --application-credential-secret=thisismysecret 

You can also use environment variables instead of command line arguments. client-keystone-auth supports the same variable names as the official OpenStack client.
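
For example, assuming your build of client-keystone-auth picks up the same application-credential variables as the official OpenStack client (worth verifying against your version), you could drop the credential arguments from the kubeconfig and export the values before calling kubectl:

export OS_AUTH_URL=https://keystone.domain.tld:5000/v3
export OS_USERNAME=username
export OS_APPLICATION_CREDENTIAL_NAME=kuberneteslogin
export OS_APPLICATION_CREDENTIAL_SECRET=thisismysecret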

Conclusion: you can already use kubectl with your Keystone application credentials today. This is really useful if you have a federated login to the OpenStack cloud.

I would like to thank the people who participated in this development work with me, especially Giuseppe Attardi, Roberto Di Lallo and Joe Topjian, who helped with implementation, discussion and code review.


Openstack Horizon runs on Kubernetes in production at SWITCH

In April we upgraded the SWITCHengines OpenStack Horizon dashboard to the OpenStack Pike version. But this upgrade was a little bit special: it was more than a Horizon upgrade from Newton to Pike.

Our Horizon deployment is now hosted on a Kubernetes cluster. The cluster is deployed using the k8s-on-openstack playbook that we actively develop. We had been testing this Kubernetes deployment for a while, but it is only when you have to deploy an application on top of it in production that you really learn and fix real problems.

Horizon is a good application to start learning Kubernetes with, because it is completely stateless and does not require any persistent storage. It is just a GUI to the OpenStack API: the user logs in with their credentials, and Horizon obtains a token and starts making API calls on the user's behalf.

Running Horizon in a single Kubernetes pod for a demo probably takes 5 minutes, but deploying it for production usage is far more complex. We needed to address the following issues:

  • Horizontally scale the number of pods, keeping a central memcached or redis cache
  • Allow both IPv4 and IPv6 access to engines.switch.ch
  • Define the Load Balancing architecture
  • Implement a persistent logging system

If you want to jump straight to the solution of all these problems, have a look at the project SWITCH-openstack-horizon-k8s-deployment, where we have published all the Dockerfiles and the Kubernetes descriptors needed to recreate our deployment.

Scale Horizontally

Horizon performs much faster when it has access to a memory cache, and this is the recommended way to deploy it in production. We decided to go with Redis.

By creating a Redis service named redis-master in our namespace, we can use the special environment variable ${REDIS_MASTER_SERVICE_HOST} when booting the Horizon container, to make sure all the Horizon instances point to the same cache server.

This is a good example of how you combine two services together in a Kubernetes namespace. We can horizontally scale the Horizon pods, but the Horizon deployment is independent from the Redis deployment.
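
As a minimal sketch (labels and namespace are assumptions, not our exact manifests), the Service that makes ${REDIS_MASTER_SERVICE_HOST} available to the Horizon pods could look like this:

apiVersion: v1
kind: Service
metadata:
  name: redis-master        # the Service name determines the REDIS_MASTER_SERVICE_HOST variable
  namespace: horizon
spec:
  selector:
    app: redis-master       # must match the labels on the Redis pod(s)
  ports:
  - port: 6379
    targetPort: 6379

Keep in mind that Kubernetes injects the *_SERVICE_HOST environment variables only for Services that already exist when a pod starts, so the Redis Service must be created before the Horizon pods (or you can rely on the DNS name redis-master instead).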

IPv4 and IPv6

We always publish our services on IPv6. In our previous Kubernetes demos we used the OpenStack LBaaS to expose services to the outside world. Unfortunately, in the Newton version of OpenStack the LBaaS lacks proper IPv6 integration. To publish a production service on Kubernetes, we suggest using an ingress controller. There are several kinds available, but we used the standard Nginx ingress controller. The key idea is that we have a K8s node with an interface exposed to the public Internet, where a privileged Docker container is running with --net=host. The container runs Nginx, which can bind to IPv6 and IPv4 on the node, but of course it can also reach any other pod on the cluster network.
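
A minimal sketch of this pattern (image tag, labels and node selector are assumptions, and the usual controller arguments are omitted) is an ingress controller pod running with hostNetwork: true on the node that owns the public interface:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nginx-ingress-controller
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app: nginx-ingress
  template:
    metadata:
      labels:
        app: nginx-ingress
    spec:
      hostNetwork: true            # same effect as docker run --net=host: bind IPv4/IPv6 on the node
      nodeSelector:
        role: ingress              # schedule only on the node(s) with the public interface
      containers:
      - name: nginx-ingress-controller
        image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.24.1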

Define the Load Balancing architecture

I already wrote above that if you need IPv6, you should not use the OpenStack LBaaSv2. However, I am going to explain why I would not use that kind of load balancer even for IPv4.

The first picture shows the network diagram of a LBaaSv2 deployment. The LoadBalancer is implemented as a network namespace on the network node, called qlbaas-<uuid>, in which an HAProxy process is running. This is a L4 load balancer. The drawback of this architecture is that when an instance boots, the default gateway configured via DHCP is the IP address of the Neutron router. When we expose a service with the floating IP configured on the outer interface of the LBaaS, the load balancer must perform a DNAT and SNAT operation in order to force the traffic onto a symmetric return path. This means that the IP packets hitting the pod have completely lost the information about the source IP address of the original client. Because it is a pure L4 load balancer, we have no way to carry this lost information in an HTTP header. This prevents the operator from building any useful logging system, because by the time the traffic arrives at the pod, the information about the client has been filtered out.

The next picture shows how the Nginx ingress works. In this case the external traffic is received on a public floating IP that is configured on the virtual machine running the ingress pod, in this case the master. We terminate the TLS connection at the nginx-ingress. This is necessary because the ingress also has to perform SNAT and DNAT, but it adds the X-Forwarded-For header to the HTTP requests, which we use to populate our log files. We could not add this header if we were just moving encrypted packets around.

Another advantage of this solution is that it uses just a normal instance to implement the ingress, which means you can use it completely independently of the OpenStack version you are running on.

In the future you might be able to use the newer OpenStack Octavia load balancer, but at the moment I have not investigated it. All I know is that the solution is really similar, but you will have an OpenStack-managed service VM (an amphora) running HAProxy.

Implement a persistent logging system

Pods are short-lived and distributed over different VMs that are also ephemeral. To collect the logs, we run Docker with the log-driver journald. Once this is set up, all the Docker containers running on the host send their logging output to journald. We then collect this information with journalbeat and ship the data to our Elasticsearch cluster. This part is not yet released in our public playbook because it is not very portable: if you don't have a ready-to-use ELK cluster, you would get no benefit from running journalbeat.
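
Switching the Docker log driver is a small daemon configuration change; a minimal sketch (default file location, Docker restart required, only newly started containers are affected) is the following /etc/docker/daemon.json:

{
  "log-driver": "journald"
}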

Conclusion

We have now been running in production for almost a month, and we have found the system to be robust and stable. We had no complaints, so we can say that the migration was seamless for our users. We have learned a lot from this experience.

In the next blog post we will describe how we implemented metrics monitoring, to observe how much memory and how many CPU cores each pod is consuming. Make sure you keep an eye on our blog for updates.



Openstack Keystone authentication for your Kubernetes cluster

At SWITCH we are looking to provide a container Platform-as-a-Service solution. We are working with Kubernetes and OpenShift to gauge what is possible and how a service could be structured. It would be really nice to use the existing OpenStack username and password to authenticate to Kubernetes. We tested this solution and it works great.

How does it work? Let's start from the client side.

Kubernetes users use the kubectl client to access the cluster. The good news is that since version v1.8.0 of the client, kubectl is able to read the usual OpenStack environment variables, contact Keystone to request a token, and forward the request to the Kubernetes cluster using that token. This was merged on the 7th of August 2017. I could not find anywhere how to correctly configure the client to use this functionality, so I wrote some documentation notes HERE.
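
For reference, a minimal sketch of the corresponding entries in ~/.kube/config could look like the following (cluster and user names are placeholders; the openstack auth provider reads the OS_* variables from the environment at run time):

users:
- name: openstackuser
  user:
    auth-provider:
      name: openstack

contexts:
- context:
    cluster: kubernetes
    user: openstackuser
  name: openstack@kubernetes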

How does it work on the Kubernetes master side ?

The Kubernetes API server receives a request with a Keystone token; in Kubernetes terminology this is a Bearer Token. To verify the Keystone token, the Kubernetes API server uses a webhook. What does that mean? It means the Kubernetes API server contacts yet another component that is able to authenticate the Keystone token.

The k8s-keystone-auth component developed by Dims does exactly this. I tested his code and created a Docker container to integrate k8s-keystone-auth in my kube-system namespace. When you run the k8s-keystone-auth container, you pass the URL of your Keystone server as an argument.
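
On the API server side the webhook is wired up with a kubeconfig-format file; the server address and file path below are assumptions matching a typical deployment, so adapt them to wherever your k8s-keystone-auth service listens:

apiVersion: v1
kind: Config
clusters:
- name: k8s-keystone-auth
  cluster:
    server: https://localhost:8443/webhook    # endpoint of the k8s-keystone-auth service
users:
- name: kube-apiserver
contexts:
- context:
    cluster: k8s-keystone-auth
    user: kube-apiserver
  name: webhook
current-context: webhook

The kube-apiserver is then started with --authentication-token-webhook-config-file pointing at this file.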

If you are deploying your cluster with k8s-on-openstack, you will find this integration summarized in a single commit.

Now that everything is set up, I can try:

source ~/openstackcredentials
kubectl get pods

Keystone will correctly authenticate me and verify my identity, but I will not be authorized to do anything:

Error from server (Forbidden): pods is forbidden: User "saverio.proto@switch.ch" cannot list pods in the namespace "default"

This is because we need to set up some authorization for this Keystone user. You can find detailed documentation about RBAC, but here is a simple example:

kubectl create rolebinding saverio-view --clusterrole view --user saverio.proto@switch.ch --namespace default

Now my user is able to view everything in the default namespace, and kubectl get pods works.
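
If you want to double-check what a user can do, recent kubectl versions offer kubectl auth can-i; run it as the Keystone user (or add --as if you have impersonation rights), for example:

kubectl auth can-i list pods --namespace default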

Of course setting up specific RBAC rules for every user is not optimal. You can at least use the Keystone projects, which are mapped to kind: Group in Kubernetes. Here is an example:

---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: Group
  name: <openstack_project_uuid>
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
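
The RoleBinding above references a Role named pod-reader that you need to create as well; a minimal sketch could be:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pod-reader
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]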

You can then achieve a “soft multitenancy”, where every user belonging to a specific Keystone project is limited to a specific namespace. I call it soft multitenancy because, depending on your networking solution, all the pods from all the namespaces could end up on the same network with a completely open policy.

I would like to thank Dims and the other people on the Slack channel #sig-openstack for the great help while developing this Kubernetes deployment.


Deploy Kubernetes v1.8.3 on Openstack with native Neutron networking

Hello,
I wrote in the past how to deploy Kubernetes on SWITCHengines (OpenStack) using this ansible playbook. When I wrote that article I did not care about the networking setup, and I used the proposed weavenet plugin. Then I went to the OpenStack Summit in Sydney and saw the great presentation by Angus Lees. It was the right time to see it, because I had recently watched this video explaining how Kubernetes networking works when running on GCE. Back to OpenStack: Angus mentioned that the Kubernetes master can talk to Neutron to inject routes into the tenant router, providing connectivity without NAT among pods that live on different instances. This makes troubleshooting easier and keeps an MTU of 1500 between the pods.

It looked very easy, just use:

--network-plugin=kubenet

and specify the router UUID in the cloud config.
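
For reference, a minimal sketch of the relevant part of the cloud config read by the in-tree OpenStack cloud provider (the router UUID is the one from the example below; the [Global] credentials are omitted):

[Global]
# auth-url, credentials and tenant settings go here

[Route]
router-id = b11216cb-a725-4006-9a55-7853d66e5894   # UUID of the tenant's Neutron router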

Our first tests with version 1.7.0 did not work. First of all I had to fix the Kubernetes documentation, because the syntax to specify the router UUID was wrong. Then I had a problem with security groups disappearing from the instances. After troubleshooting and asking for help on the Kubernetes Slack channel, I found out that I was hitting a known gophercloud bug.

The bug was already fixed in gophercloud at the time of my finding, but I learned that Kubernetes vendors an older version of this library in the folder “vendor/github.com/gophercloud/gophercloud”. So the only way to get the updated library version was to upgrade to Kubernetes v1.8.0, or any newer version including this commit.

After a bit of testing, everything works now. The changes are summarised in this PR, or you can just use the master branch from my git repository.

After you deploy, the K8s master will assign each OpenStack instance a smaller /24 subnet out of the cluster CIDR (usually a /16 address space). The pods will get addresses from the subnet assigned to their instance. The Kubernetes master will inject static routes into the Neutron router to be able to route packets to the pods. It will also configure the Neutron ports of the instances with the correct allowed_address_pairs value, so that the traffic is not dropped by the OpenStack anti-spoofing rules.

This is what the output of openstack router show looks like for our router:

$ openstack router show b11216cb-a725-4006-9a55-7853d66e5894 -c routes
+--------+--------------------------------------------------+
| Field  | Value                                            |
+--------+--------------------------------------------------+
| routes | destination='10.96.0.0/24', gateway='10.8.10.3'  |
|        | destination='10.96.1.0/24', gateway='10.8.10.8'  |
|        | destination='10.96.2.0/24', gateway='10.8.10.11' |
|        | destination='10.96.3.0/24', gateway='10.8.10.10' |
+--------+--------------------------------------------------+

And this is what the allowed_address_pairs on the port of one instance looks like:

$ openstack port show 42f2a063-a316-4fe2-808c-cd2d4ed6592f -c allowed_address_pairs
+-----------------------+------------------------------------------------------------+
| Field                 | Value                                                      |
+-----------------------+------------------------------------------------------------+
| allowed_address_pairs | ip_address='10.96.1.0/24', mac_address='fa:16:3e:3e:34:2c' |
+-----------------------+------------------------------------------------------------+

There is of course more work to be done.

I will improve the ansible playbook to create the OpenStack router and network automatically; at the moment these steps are done manually before starting the playbook.

Working with network-plugin=kubenet is actually deprecated, so I have to understand what the long-term plan for this deployment method is.

The Kubernetes master is still running on a single VM; the playbook could be extended to support an HA setup.

I really would like to have feedback from users of Kubernetes on Openstack. If you use this playbook please let me know, and if you improve it, the Pull Requests on github are very welcome! 🙂



Starting 1000 instances on SWITCHengines

Is it really possible with OpenStack to start 1000 instances, run a parallel computation, and then save the data and delete the instances?
To answer this question we tested it on SWITCHengines. I had a lot of trouble getting this to work, and I have to thank the other OpenStack operators I have been chatting with: Mattia Belluco, Matteo Panella and Anton Aksola.
Our OpenStack control plane is deployed with a dedicated pet VM for each OpenStack service (Nova, Cinder, Neutron, Glance and Keystone) and a generic controller VM where we run the MySQL and RabbitMQ services. This configuration makes it possible to monitor each OpenStack service as an isolated VM, and it makes it easier for us to identify bottlenecks in the control plane.
For this experiment we never used the web interface, only the OpenStack CLI with this reference command line.

openstack server create \
--image "Ubuntu Xenial 16.04 (SWITCHengines)" \
--flavor c1.small \
--network demo-network \
--user-data cloud-init.txt \
--key-name mykey \
--min 100 \
--max 100 test

The c1.small flavor has just 1 CPU core and 1GB of RAM.

We did the experiment in 4 steps, trying with 100, 200, 400 and 1000 instances. To make sure that the instances were really started and operational, we used cloud-init to make them phone home to a registration server. This is a very easy cloud-init feature to use; here is an example cloud-init.txt file:


#cloud-config
phone_home:
  url: http://x.x.x.x:8000/$INSTANCE_ID/
  post: [ hostname, fqdn ]

In this github gist we share the python code to run the registration service.

The first test with 100 instances did not work. We tried a few runs and always had between 4 and 7 instances that did not start, for various reasons. Monitoring our control plane, we noticed that we were saturating the CPUs and memory of the Nova and Neutron pets.
We increased the resources for both the Nova and the Neutron pets from 4 to 16 CPU cores and doubled the memory from 8 GB to 16 GB.
After these changes we were able to start 100 instances without problems. We noticed that the Neutron pet had a higher load than the Nova pet while creating the 100 instances.

When we tried with 200 instances, they were all reported as Running by OpenStack, but we always had between 8 and 20 instances not phoning home. Looking at the serial console with the command:

openstack console log show

we noticed that these instances were not able to get an IP address from the DHCP server, and the DHCP client would give up after 300 seconds. Using the hint that the Neutron pet was more loaded than the Nova pet, we found out that the Nova instances reached the RUNNING state while the corresponding Neutron ports were still in the BUILDING phase.
Suspecting a race condition between Nova instances and Neutron ports, I asked on the OpenStack developers mailing list, and it turned out that we had a wrong configuration.

We changed our nova.conf as follows:

vif_plugging_is_fatal=True
vif_plugging_timeout=300

After fixing the configuration we had the same result, but instead of the instances starting and not being able to obtain an IP address, they never started and were reported in ERROR state by Openstack.
The real challenge was not to schedule 200 instances, but to allocate 200 network ports.
Troubleshooting in this direction, we observed that the RabbitMQ queues of the Neutron DHCP agents were filling up during port creation. For each created port, the DHCP agent had to add a corresponding line to the file /var/lib/neutron/dhcp/$UUID/host, where $UUID is the corresponding Neutron network UUID.

We looked into the details of what happens when a Neutron port is created. Using the guru meditation report, we traced the culprit to a slow “ip route list” call.
This command is called every time a Neutron port is created:

time sudo ip netns exec qdhcp-7a1cfb7f-2960-45f5-903f-0d602450525a ip route list
default via 10.10.0.1 dev tapaf136b11-a5
10.10.0.0/16 dev tapaf136b11-a5 proto kernel scope link src 10.10.0.2
real 0m0.048s
user 0m0.000s
sys 0m0.016s

However, calling the same command through neutron-rootwrap takes about 10 times longer:

time sudo neutron-rootwrap /etc/neutron/rootwrap.conf ip netns exec qdhcp-7a1cfb7f-2960-45f5-903f-0d602450525a ip route list dev tapaf136b11-a5
default via 10.10.0.1
10.10.0.0/16 proto kernel scope link src 10.10.0.2
real 0m0.713s
user 0m0.472s
sys 0m0.172s

Once we identified this bottleneck, we changed the configuration again to enable the OpenStack rootwrap to work in daemon mode.
We had to change the [agent] section of neutron.conf:

[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
root_helper_daemon=sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

After this change, we were able to start 200, 400 and 1000 instances successfully.
With 1000 instances we still get an HTTP 504 gateway timeout error.
This is because the nova-api server takes longer than the reverse proxy timeout to answer the request. The reverse proxy replies with HTTP 504, but the nova-api server later finishes processing the request with an HTTP 200. This is easily fixed with a longer timeout, but we plan to trace the problem in detail to shorten the processing time of the request.
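
As an illustration only: if the reverse proxy in front of nova-api were HAProxy (an assumption, and the right value depends on your setup), raising the server timeout for the nova-api backend would look like this:

backend nova_api
    timeout server 300s    # allow nova-api more time before the proxy returns a 504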

Finally, the answer is yes: with OpenStack it is really possible to quickly start 1000 instances to have compute power just when needed.


Deploy Kubernetes on the SWITCHengines Openstack cloud

Increasing demand for container orchestration tools is coming from our users. Kubernetes currently has a lot of hype, and we often get the question whether we provide a Kubernetes cluster at SWITCH.

At the moment we suggest that our users deploy their own Kubernetes cluster on top of SWITCHengines. To make sure our OpenStack deployment works with this solution, we tried it ourselves.

After deploying manually with kubeadm to learn the tool, I found a well-written ansible playbook by Francois Deppierraz. I extended the playbook to make Kubernetes aware that SWITCHengines implements LBaaSv2, and the patch is now merged into the original version.

The first problem I discovered deploying Kubernetes is the total lack of support for IPv6. Because instances in SWITCHengines get IPv6 addresses by default, I ran into problems running the playbook and nothing was working. The first thing you should do is to create your own tenant network with a router, with IPv4-only connectivity. This is already explained in detail in our standard documentation.

Now we are ready to clone the ansible playbook:

git clone https://github.com/infraly/k8s-on-openstack

Because the ansible playbook creates instances through the OpenStack API, you will have to source your OpenStack configuration file. We extend the usual configuration file a little with more variables that are specific to this ansible playbook. Let's look at a template:

export OS_USERNAME=username
export OS_PASSWORD=mypassword
export OS_PROJECT_NAME=myproject
export OS_PROJECT_ID=myproject_uuid
export OS_AUTH_URL=https://keystone.cloud.switch.ch:5000/v2.0
export OS_REGION_NAME=ZH
export KEY=keyname
export IMAGE="Ubuntu Xenial 16.04 (SWITCHengines)"
export NETWORK=k8s
export SUBNET_UUID=subnet_uuid
export FLOATING_IP_NETWORK_UUID=network_uuid

Let's review what changes. It is important to also add the variable OS_PROJECT_ID, because the Kubernetes code that creates load balancers requires this value and is not able to derive it from the project name. To find the UUID, just use the OpenStack CLI:

openstack project show myprojectname -f value -c id

KEY is the name of an existing keypair that will be used to start the instances. IMAGE is also self-explanatory; at the moment I have only tested Xenial. The variable NETWORK is the name of the tenant network you created earlier. When you created the network you also created a subnet, and you need to set its UUID in SUBNET_UUID (see below for how to look it up). The last variable, FLOATING_IP_NETWORK_UUID, tells Kubernetes which network to get floating IPs from. In SWITCHengines this network is always called public, so you can extract the UUID like this:

openstack network show public -f value -c id
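
Similarly, to find the subnet UUID for SUBNET_UUID (assuming your tenant network is called k8s as in the template above):

openstack subnet list --network k8s -f value -c ID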

You can customize your configuration even more: in the README file in the git repository you will find more options, like the flavors to use or the cluster size. When your configuration file is ready you can run the playbook:

source /path/to/config_file
cd k8s-on-openstack
ansible-playbook site.yaml

It will take a few minutes to go through all the tasks. When everything is done you can ssh into the kubernetes master instance and check that everything is running as expected:

ubuntu@k8s-master:~$ kubectl get nodes
NAME         STATUS    AGE       VERSION
k8s-1        Ready     2d        v1.6.2
k8s-2        Ready     2d        v1.6.2
k8s-3        Ready     2d        v1.6.2
k8s-master   Ready     2d        v1.6.2

I found it very useful to add bash completion for kubectl:

source <(kubectl completion bash)

Let's deploy an instance of nginx to test that everything works:

kubectl run my-nginx --image=nginx --replicas=2 --port=80

This will create two containers with nginx. You can monitor the progress with the commands:

kubectl get pods
kubectl get events

At this stage your containers are running, but the service is not yet accessible from the outside. One option is to use the OpenStack LBaaS to expose it; you can do that with this command:

kubectl expose deployment my-nginx --port=80 --type=LoadBalancer

The expose command will create the OpenStack load balancer and configure it. To find out the public floating IP address, you can use this command to describe the service:

ubuntu@k8s-master:~$ kubectl describe service my-nginx
Name:			my-nginx
Namespace:		default
Labels:			run=my-nginx
Annotations:		
Selector:		run=my-nginx
Type:			LoadBalancer
IP:			10.109.12.171
LoadBalancer Ingress:	10.8.10.15, 86.119.34.151
Port:				80/TCP
NodePort:			30620/TCP
Endpoints:		10.40.0.1:80,10.43.0.1:80
Session Affinity:	None
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----			-------------	--------	------			-------
  1m		1m		1	service-controller			Normal		CreatingLoadBalancer	Creating load balancer
  10s		10s		1	service-controller			Normal		CreatedLoadBalancer	Created load balancer

Conclusion

Following this blog post you should be able to deploy Kubernetes on OpenStack and understand how things work. For a real deployment you might want to make some customisations; we encourage you to share any patch to the ansible playbook via GitHub pull requests.
Please note that Kubernetes is not bug-free. When you delete your deployment you might hit this bug, where Kubernetes is not able to delete the load balancer correctly. Hopefully this is fixed by the time you read this blog post.