Kafka-aware (L7) security for Kubernetes

When it comes to Kafka, fine-grained security is very important. Data flowing through Kafka usually comes from many components and often consists of information that shouldn’t be readable by anyone unprivileged. With the help of Cilium we can enforce Kafka-aware security policies directly at the Linux kernel level, which means without any changes to the application code or container configuration.

With a standard Kafka setup, any user or application can write any messages to any topic, as well as read data from any topic.

A typical approach to securing Kafka is to issue SSL certificates for each client and force the Kafka brokers to verify their validity. Another option is SASL, which can be applied in a few different ways (classic username/password, Kerberos, etc.), but it’s not that easy to get started with. And that’s only authentication; moving on to authorization, things become even more complicated (using the kafka-acls command at a larger scale is challenging). Let’s see how we can do it completely differently and independently from Kafka.
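For comparison, granting a single producer access to a single topic with Kafka’s own ACL tooling looks roughly like this (a sketch; the bootstrap address and the User:producer1 principal are assumptions, and it only takes effect once an authorizer is configured on the brokers):

# Allow the principal User:producer1 to produce to example-topic
bin/kafka-acls.sh --bootstrap-server localhost:9092 \
  --add --allow-principal User:producer1 \
  --producer --topic example-topic

# List the ACLs currently applied to that topic
bin/kafka-acls.sh --bootstrap-server localhost:9092 \
  --list --topic example-topic

Now multiply that by every client, topic and operation in a bigger cluster, and keeping the ACLs in sync becomes a job of its own.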

We are going to use a tool called Cilium which uses eBPF technology to secure Kafka on a completely different level. First, let’s talk about Cilium itself. A good explanation is on their own website:

Existing Linux network security mechanisms (e.g., iptables) only operate at the network and transport layers (i.e., IP addresses and ports) and lack visibility into the microservices layer.

Cilium brings API-aware network security filtering to Linux container frameworks like Docker and Kubernetes. Using a new Linux kernel technology called eBPF, Cilium provides a simple and efficient way to define and enforce both network-layer and application-layer security policies based on container/pod identity.

So what you need to know about Cilium is that it’s a tool you can install on Kubernetes, and it uses eBPF to bring next-level, cloud-native network filtering. What is eBPF then? It’s best explained by Brendan Gregg:

eBPF does to Linux what JavaScript does to HTML. (Sort of.) So instead of a static HTML website, JavaScript lets you define mini programs that run on events like mouse clicks, which are run in a safe virtual machine in the browser. And with eBPF, instead of a fixed kernel, you can now write mini programs that run on events like sending/receiving TCP packet, which are run in a safe virtual machine in the kernel. In reality, eBPF is more like the v8 virtual machine that runs JavaScript, rather than JavaScript itself. eBPF is part of the Linux kernel.

How does all this translate to securing Kafka on Kubernetes? Cilium, using eBPF, talks to the Linux kernel directly and applies security policies to pods such as “pod X can only read from Kafka topic Y, and pod Z can only write to Kafka topic W”. All of that is enforced by applying simple YAML definitions to our Kubernetes cluster.

We have enough theory now, so let’s get to the point. Assuming you have a Kafka cluster running on your Kubernetes cluster, the first thing you need to do is, of course, install Cilium. I won’t cover that here, as it’s pretty straightforward if you follow the instructions in the Cilium docs. Once Cilium is installed, you are ready to secure your Kafka cluster with one (or more) simple YAML file(s).
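For reference, a basic installation with the cilium CLI usually boils down to a couple of commands (a sketch; flags and versions may differ, so follow the official docs for your environment):

# Deploy the Cilium agent and operator into the cluster (kube-system namespace)
cilium install

# Wait until all Cilium components report as ready
cilium status --wait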

As I mentioned earlier, “with a standard Kafka setup, any user or application can write any messages to any topic, as well as read data from any topic”. So if you have just installed Kafka, it’s wide open. Let’s say we want one producer that is only allowed to write to two specific topics, and two consumers that can each only read from one of the two topics. It’s a very simple example, but it shows how easily we can move from a completely insecure Kafka cluster to a fully secured one. The only thing we need to do to achieve the above is to apply (kubectl apply -f make-my-kafka-secure.yaml) the following YAML file:

# Kafka policies
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "secure-kafka"
specs:
  - endpointSelector:
      matchLabels:
        app: kafka
    ingress:
    - fromEndpoints:
      - matchLabels:
          app: kafka-producer1
      toPorts:
      - ports:
        - port: "9092"
          protocol: TCP
        rules:
          kafka:
          - role: "produce"
            topic: "example-topic"
          - role: "produce"
            topic: "another-example-topic"
    - fromEndpoints:
      - matchLabels:
          app: kafka
  - endpointSelector:
      matchLabels:
        app: kafka
    ingress:
    - fromEndpoints:
      - matchLabels:
          app: consumer1
      toPorts:
      - ports:
        - port: "9092"
          protocol: TCP
        rules:
          kafka:
          - role: "consume"
            topic: "example-topic"
  - endpointSelector:
      matchLabels:
        app: kafka
    ingress:
    - fromEndpoints:
      - matchLabels:
          app: consumer2
      toPorts:
      - ports:
        - port: "9092"
          protocol: TCP
        rules:
          kafka:
          - role: "consume"
            topic: "another-example-topic"

So what happens when we do that? Kubernetes creates a new CiliumNetworkPolicy resource, which is picked up by Cilium. From that point on, Cilium watches the Kafka pods (it finds them by their labels) and rejects any incoming request that isn’t allowed by the rules sections of the YAML file:

rules:
  kafka:
  - role: "produce"
    topic: "topicA"
  - role: "consume"
    topic: "topicB"

Let’s get back to the whole YAML file and break it down into pieces.

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "secure-kafka"

That’s pretty simple, right? As with any Kubernetes resource, we need to specify apiVersion, kind and at least a name in the metadata section.

  - endpointSelector:
      matchLabels:
        app: kafka
    ingress:
    - fromEndpoints:
      - matchLabels:
          app: kafka-producer1
      toPorts:
      - ports:
        - port: "9092"
          protocol: TCP
        rules:
          kafka:
          - role: "produce"
            topic: "example-topic"
          - role: "produce"
            topic: "another-example-topic"
    - fromEndpoints:
      - matchLabels:
          app: kafka

Next, in the specs section, we need to tell Cilium how to find the Kafka broker, by providing its pod label in “endpointSelector”.

Then we define which rules to apply to that broker. In “ingress” > “fromEndpoints” we specify our clients, i.e. the pods that will be connecting to our Kafka cluster (to produce or consume something to/from Kafka). “toPorts” simply tells Cilium which port Kafka is listening on, and then in “rules” we define what the pods specified in “fromEndpoints” are allowed to do. A full list of possible rule parameters is available here:

https://docs.cilium.io/en/stable/policy/language/#layer-7-examples
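For instance, besides the high-level produce/consume roles used above, the Kafka rules can also match individual Kafka API keys (a sketch; check the linked reference for the exact fields and values supported by your Cilium version):

rules:
  kafka:
  # allow only the low-level "fetch" request (what consumers send) for one topic
  - apiKey: "fetch"
    topic: "example-topic"
  # metadata requests are needed by most clients to discover brokers and partitions
  - apiKey: "metadata"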

And that’s how simple it is to apply Kafka-aware security policies on Kubernetes. Logic like “allow a pod to only produce to Kafka topic topicA and consume from topic topicB” is easily achievable with Cilium.

You also need to keep in mind that neither the application code in the containers nor the pod configurations have been changed, yet we gained full control over authentication and authorization of our Kafka setup simply by applying a YAML configuration. Last but not least, when a consumer (or producer) tries to do something it is not allowed to do, it receives an actual Kafka error message, like:

WARN Error while fetching metadata with correlation id 20 : {example-topic=TOPIC_AUTHORIZATION_FAILED} (org.apache.kafka.clients.NetworkClient)
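You can reproduce this by exec’ing into one of the client pods and requesting a topic it is not allowed to touch (a sketch; the pod name, the kafka service name and the path to the Kafka scripts are assumptions that depend on your deployment):

# consumer2 may only consume "another-example-topic", so this should be rejected
kubectl exec -it consumer2 -- \
  /opt/kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server kafka:9092 \
  --topic example-topic --from-beginning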

And here we come to the point of how Cilium works: it understands the traffic flowing within your cluster at the application layer and can secure it accordingly. And it’s not limited to Kafka. You can use it in much the same way for HTTP traffic (think of rules like “pod A can only do a GET call to pod B on the /info endpoint”). If you are interested in all of its capabilities, take a look at the Cilium docs, or wait for my next article ;)
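To give you a taste, the HTTP equivalent of the Kafka policy above could look roughly like this (a sketch; the pod-a/pod-b labels and the port are assumptions):

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "allow-get-info"
spec:
  endpointSelector:
    matchLabels:
      app: pod-b              # the pod being protected
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: pod-a            # the only client allowed in
    toPorts:
    - ports:
      - port: "80"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/info"       # only GET /info is allowed; everything else is denied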