Setting up Event Store with Kubernetes on Google Cloud

To prepare myself for my new job, which will involve some Kubernetes work, I've been playing around with it lately, as you could see in this post. This post takes things one step further without making it that much more advanced. The goal is to set up an Event Store cluster on Google Container Engine with a simple script. A prerequisite to get any of this working is that you have gcloud and kubectl installed.
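If you want to make sure everything is wired up before you start, a quick sanity check could look something like this; the cluster name and zone are placeholders for your own:

gcloud --version
# point kubectl at your container engine cluster; name and zone below are placeholders
gcloud container clusters get-credentials my-es-cluster --zone europe-west1-b
kubectl version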

If you don't want to read the whole post and would rather go straight to the code, you can find it on GitHub. The naïve and master branches are the ones described in this post.

Disclaimer

What I describe here exposes the Event Store cluster to the public, which is something you should not do; I only do it to make it easier to verify that everything works. I haven't done any performance or reliability tests on this setup either, which you should definitely do before using it in production.

The end goal

I tried two different approaches, both covered in this post, with the same end goal. I wanted a cluster of Event Store nodes running behind a headless Kubernetes service, with nginx on top to provide public access. A headless Kubernetes service gets a DNS record that resolves to the IPs of all the associated pods, and that is exactly what Event Store needs to do discovery through DNS.

Configuring nginx

I put this section first since it is the same for both approaches.

Nginx configuration

The configuration of nginx will be stored in a configmap and looks like this (https://github.com/mastoj/eventstore-kubernetes/blob/master/nginx/frontend.conf):

upstream es {
    server es.default.svc.cluster.local:2113;
}
server {
    listen 2113;
    location / {
        proxy_set_header    X-Real-IP $remote_addr;
        proxy_set_header    Host      $http_host;
        proxy_pass          http://es;
    }
}

If you know nginx, this is just basic configuration. First we create an upstream that can be referenced later on by proxy_pass when someone visits the path /. The URL es.default.svc.cluster.local is the DNS record that our Event Store service will get when we get to that point. The server section defines that we should listen on port 2113 and proxy the traffic to the upstream defined above.

To create the configmap we can execute this command:

kubectl create configmap nginx-es-frontend-conf --from-file=nginx/frontend.conf
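If you want to verify that the configuration actually ended up in the configmap, you can inspect it afterwards:

kubectl get configmap nginx-es-frontend-conf -o yaml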

The nginx deployment

This is basically the same as I used in the previous post, if you read that one. The specification looks like this:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: frontend-es
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: frontend-es
        track: stable
    spec:
      containers:
        - name: nginx
          image: "nginx:1.9.14"
          lifecycle:
            preStop:
              exec:
                command: ["/usr/sbin/nginx","-s","quit"]
          volumeMounts:
            - name: "nginx-es-frontend-conf"
              mountPath: "/etc/nginx/conf.d"
      volumes:
        - name: "nginx-es-frontend-conf"
          configMap:
            name: "nginx-es-frontend-conf"
            items:
              - key: "frontend.conf"
                path: "frontend.conf"

It only contains one container, the nginx one, which uses the configmap created above for configuration. We only need one replica at the moment.

To create it we run the following command:

kubectl create -f deployments/frontend-es.yaml

The nginx service

To create a public IP we need to create a service on top of this deployment. The specification for the service looks like this:

kind: Service
apiVersion: v1
metadata:
  name: "frontend-es"
spec:
  selector:
    app: "frontend-es"
  ports:
    - protocol: "TCP"
      port: 2113
      targetPort: 2113
  type: LoadBalancer

To create the service and finish up the nginx part of the post, we run the following command:

kubectl create -f services/frontend-es.yaml
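It usually takes a minute or two before Google Cloud has assigned the external IP, so it can be handy to keep an eye on the service until it shows up:

kubectl get service frontend-es --watch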

First approach - the naïve one

The first thing I wanted to do was to get a cluster up and running without persisting the data to disk, that is, only keeping the data in the container. A cluster like that might work for development, but not in production. My grand master plan was to just add persistence to that cluster after it was up and running, which did not work. How to get persistence is covered under the second approach. Before we get to the persistence part, let's get this cluster up and running.

Creating the deployment

The deployment file is really simple:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: es
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: es
    spec:
      containers:
        - name: es
          image: "eventstore/eventstore"
          env: 
            - name: EVENTSTORE_INT_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: EVENTSTORE_EXT_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: EVENTSTORE_INT_TCP_PORT
              value: "1111"
            - name: EVENTSTORE_EXT_TCP_PORT
              value: "1112"
            - name: EVENTSTORE_INT_HTTP_PORT
              value: "2114"
            - name: EVENTSTORE_EXT_HTTP_PORT
              value: "2113"
            - name: EVENTSTORE_CLUSTER_SIZE
              value: "3"
            - name: EVENTSTORE_CLUSTER_DNS
              value: "es.default.svc.cluster.local"
            - name: EVENTSTORE_CLUSTER_GOSSIP_PORT
              value: "2114"
            - name: EVENTSTORE_GOSSIP_ALLOWED_DIFFERENCE_MS
              value: "600000"
            - name: EVENTSTORE_INT_HTTP_PREFIXES
              value: "http://*:2114/"
            - name: EVENTSTORE_EXT_HTTP_PREFIXES
              value: "http://*:2113/"
          ports:
            - containerPort: 2113
            - containerPort: 2114
            - containerPort: 1111
            - containerPort: 1112

We use the Event Store Docker image from Docker Hub. This image doesn't allow command line arguments, so we need to use environment variables to configure it. You can read about Event Store configuration here. Every container (we are running three here) needs to use its own IP during configuration, and with Kubernetes we can access that value through status.podIP when we define the environment variables.

Creating it is as simple as before:

kubectl create -f deployments/eventstore.yaml

That should create three nodes, but at this point they will fail to find each other, and that is why we need the service.
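You can see this for yourself by listing the pods with their IPs and looking at the log output of one of them; the pod name below is just a made-up example, use one from your own output:

kubectl get pods -l app=es -o wide
kubectl logs es-1234567890-abcde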

Creating the service

Creating the service is even easier than the deployment. The specification file looks like this:

kind: Service
apiVersion: v1
metadata:
  name: "es"
spec:
  selector:
    app: "es"
  ports:
    - protocol: "TCP"
      port: 2113
      targetPort: 2113
  clusterIP: None

We only expose port 2113, which means we will only be able to talk HTTP to the Event Store cluster. The service selects all the pods that have the label app: "es". The last thing to note is that we set clusterIP to None; instead of creating one single IP in DNS for this service, the DNS name will resolve to the IPs of all the pods, which is exactly what Event Store needs to be able to configure itself.

Again we are using kubectl to create the service:

kubectl create -f services/eventstore.yaml

Once the service is created we should be ready to test it.
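If you are curious, you can also check that the headless service really resolves to one IP per pod before moving on. One way is to run nslookup from a temporary pod inside the cluster; the busybox image is just a convenient choice here:

kubectl run dns-test --image=busybox --restart=Never -- nslookup es.default.svc.cluster.local
kubectl logs dns-test
kubectl delete pod dns-test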

Test

To test that it works run the following command:

kubectl get services

In the output from that command you will find an external IP for the frontend-es service. To access Event Store, open a browser and go to <external ip>:2113. If everything works as expected you should now have access to Event Store.
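You can also check the cluster state from the command line. Event Store exposes a gossip endpoint over HTTP, so something like the following should list the members and whether they have found each other; replace <external ip> with the IP from above:

curl http://<external ip>:2113/gossip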

Challenges with approach one

The major challenge is persistent data: how do you map one persistent volume to each node in the cluster? Leave a comment on the post if you have any ideas.

Let us say you solve the first problem; how do you then make sure each node gets the same persistent volume the next time you restart the cluster?

Those two problems are what got me started working on the second approach.

There is a third problem, which we won't fix, and that is that increasing the number of replicas won't work. The reason is that Event Store doesn't support elastic scaling, so increasing the number of replicas will only add "clones" to the cluster, not increase its size.

Second approach - adding persistent data

We are still going to use deployments, but instead of letting the number of replicas define the number of nodes we will create one deployment with one replica per node. This way we can force each node to get access to the same persistent data volume when it is restarted, and using deployments will also handle restart for us.

Generating deployment

The difference from the first approach here is that we will generate the deployment from a template instead of creating one. For this to work we need both a template and a simple script that generates the deployments. The template file looks like this:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: es-${nodenumber}
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: es-${nodenumber}
        escluster: es
    spec:
      containers:
        - name: es-${nodenumber}
          image: "eventstore/eventstore"
          env: 
            - name: EVENTSTORE_INT_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: EVENTSTORE_EXT_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: EVENTSTORE_INT_TCP_PORT
              value: "1111"
            - name: EVENTSTORE_EXT_TCP_PORT
              value: "1112"
            - name: EVENTSTORE_INT_HTTP_PORT
              value: "2114"
            - name: EVENTSTORE_EXT_HTTP_PORT
              value: "2113"
            - name: EVENTSTORE_CLUSTER_SIZE
              value: "${nodecount}"
            - name: EVENTSTORE_CLUSTER_DNS
              value: "es.default.svc.cluster.local"
            - name: EVENTSTORE_CLUSTER_GOSSIP_PORT
              value: "2114"
            - name: EVENTSTORE_GOSSIP_ALLOWED_DIFFERENCE_MS
              value: "600000"
            - name: EVENTSTORE_INT_HTTP_PREFIXES
              value: "http://*:2114/"
            - name: EVENTSTORE_EXT_HTTP_PREFIXES
              value: "http://*:2113/"
            - name: EVENTSTORE_DB
              value: "/usr/data/eventstore/data"
            - name: EVENTSTORE_LOG
              value: "/usr/data/eventstore/log"
          ports:
            - containerPort: 2113
            - containerPort: 2114
            - containerPort: 1111
            - containerPort: 1112
          volumeMounts:
            - mountPath: "/usr/data/eventstore"
              name: espd
      volumes:
        - name: espd
          gcePersistentDisk:
            pdName: esdisk-${nodenumber}
            fsType: ext4

The template looks almost the same as in the first case, but we have now added two variables: nodenumber and nodecount. We have also added a gcePersistentDisk which we mount at /usr/data/eventstore, and that is the parent folder we use for data and logs in the Event Store configuration, EVENTSTORE_DB and EVENTSTORE_LOG. Finally, we have added a new label, escluster, which the service will use to identify which nodes should be included.

To generate the actual deployment files we run a bash script with the following content:

    for ((c=1; c<=$count; c++ ))
    do
        cat ../templates/es_deployment_template.yaml | sed -e "s/\${nodenumber}/$c/" | sed -e "s/\${nodecount}/$count/" > .tmp/es_deployment_$c.yaml
    done

Running that will generate one file for each node, and each file mounts its own persistent disk. The script as a whole can be found here.
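As a quick sanity check of the generated files you can grep for the substituted values; each file should reference its own disk, and all of them should have the same cluster size:

grep -H "pdName" .tmp/es_deployment_*.yaml
grep -H -A 1 "EVENTSTORE_CLUSTER_SIZE" .tmp/es_deployment_*.yaml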

When the files have been generated we can run:

    for ((c=1; c<=$count; c++ ))
    do
        kubectl apply -f .tmp/es_deployment_$c.yaml
    done

This code will create one deployment per file. Since we are using deployments, our pods will be restarted if they crash.

Creating the service

This is almost the same as in the first approach, with one minor change: we use the escluster label to identify the pods to add to the service. The file is here.
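In other words, the only interesting difference from the service in the first approach is the selector. A sketch of what that would look like (the linked file is the authoritative version):

kind: Service
apiVersion: v1
metadata:
  name: "es"
spec:
  selector:
    escluster: "es"
  ports:
    - protocol: "TCP"
      port: 2113
      targetPort: 2113
  clusterIP: None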

The create cluster script

It is not reasonable to execute any of this by hand, so I created this script. I will dissect it here:

#!/bin/bash

function init {
    rm -rf .tmp
    mkdir -p .tmp
}

function validateInput {
    count=$1
    re='^[0-9]+$'
    if ! [[ $count =~ $re ]] ; then
        echo "error: Not a number" >&2; exit 1
    fi
}

The first part is just some housekeeping and validation of the input argument. The plan is that you should be able to create a cluster of any size by running ./create_cluster.sh <size>.

function createSpecs {
    local count=$1
    for ((c=1; c<=$count; c++ ))
    do
        cat ../templates/es_deployment_template.yaml | sed -e "s/\${nodenumber}/$c/" | sed -e "s/\${nodecount}/$count/" > .tmp/es_deployment_$c.yaml
    done
}

function createDeployments {
    local count=$1
    for ((c=1; c<=$count; c++ ))
    do
        kubectl apply -f .tmp/es_deployment_$c.yaml
    done
}

The next part defines the functions that generate the deployment files and apply them with kubectl.

function createEsService {
    kubectl create -f ../services/eventstore.yaml
}

This finishes off the Event Store part of the script by creating a service on top of the nodes created by the deployments.

function addNginxConfig {
    kubectl create configmap nginx-es-frontend-conf --from-file=../nginx/frontend.conf
}

function createFrontendDeployment {
    kubectl create -f ../deployments/frontend-es.yaml
}

function createFrontendService {
    kubectl create -f ../services/frontend-es.yaml
}

The next part is basically what we described in the section about nginx setup.

function createDisks {
    local count=$1
    for ((c=1; c<=$count; c++ ))
    do
        # only create the disk if it does not already exist
        if ! gcloud compute disks list esdisk-$c | grep esdisk-$c; then
            echo "creating disk: esdisk-$c"
            gcloud compute disks create --size=10GB esdisk-$c
        else
            echo "disk already exists: esdisk-$c"
        fi
    done
}

A simple helper function that creates the disks on Google Cloud, skipping the ones that already exist.
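After the script has run you can verify that the disks exist with a plain listing:

gcloud compute disks list | grep esdisk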

function createEsCluster {
    local count=$1
    createSpecs $count
    createDeployments $count
    createEsService
}

function createFrontEnd {
    addNginxConfig
    createFrontendDeployment
    createFrontendService
}

The last two functions are just there to make it easier to read what is going on.

init
validateInput $1 #sets the variable $count
createDisks $count
createEsCluster $count
createFrontEnd

With all the functions defined it is quite clear what is going on in this script.

Test

The easiest way to test it is to create the cluster:

./create_cluster.sh 5

Then add some data to the cluster. You do that by finding the external IP of the nginx service and then following the getting started instructions for Event Store to write some events.
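If you just want a quick smoke test instead of going through the full getting started guide, writing a single event over HTTP could look roughly like this. The stream name, event type and payload are made up; the headers follow the Event Store HTTP API for writing events:

curl -i -X POST http://<external ip>:2113/streams/smoke-test \
    -H "Content-Type: application/json" \
    -H "ES-EventType: TestEvent" \
    -H "ES-EventId: 535e2d35-09b4-4d5b-b1b5-2b6a0f5dab2c" \
    -d '{"message": "hello from kubernetes"}'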

With that in place you can kill pods as you like to simulate failures with:

kubectl delete pod --now <pod id>

When you kill a pod a new one should be created, and this time it should use the same persistent disk as the old one.

You can even delete the whole cluster and rebuild it. I have added a delete cluster script that deletes everything but the disks. If you then create the cluster again, the data you added should still be there, since we are reusing the same disks.

If you want to change the size of the cluster you can actually do that as well: just delete the cluster and create it again with a larger size, and that should work.

Note that if you delete and recreate the cluster you might end up with a new external IP.

Summary

I wouldn't say that this is perfect, but it is definitely a start. One drawback is that it doesn't allow for zero-downtime resizing of the cluster. That could probably be added, but it is out of scope for this post. I haven't tested the performance either, and that is something you should probably do before actually using it. As I mentioned earlier, in a production environment you shouldn't expose Event Store to the public.

There are probably a lot more comments one could have about this setup, but I leave that up to you. Feel free to leave both positive and negative feedback :).