Kafka Operator for Kubernetes – Confluent Operator to establish a Cloud-Native Apache Kafka Platform

Confluent Operator is now GA for production deployments (Download Confluent Operator for Kafka here). This is a Kafka Operator for Kubernetes which provides automated provisioning and operations of an Apache Kafka cluster and its whole ecosystem (Kafka Connect, Schema Registry, KSQL, etc.) on any Kubernetes infrastructure.

I want to share a slide deck which explains:

  • Why Kubernetes is getting more and more traction to build a cloud-native infrastructure
  • Why this is relevant for Apache Kafka and Confluent Platform
  • The challenges of running Kafka on Kubernetes
  • How Confluent Operator solves these problems by providing a powerful Kafka Operator for Kubernetes

Cloud-Native vs. SaaS / Serverless

Software as a Service (SaaS) and Serverless Platforms provide software and services in the public cloud as a managed service. Cloud-native infrastructures allow you to leverage the features of SaaS / Serverless in your own self-managed infrastructure (either on premise or in the public cloud, and without vendor lock-in).

What is Cloud Native? Many different definitions exist on the web. Two definitions which I like are “The Twelve-Factor App” and “10 key characteristics of The New Stack”.

Some of the key benefits of cloud-native infrastructures:

  • Scalable
  • Flexible
  • Agile
  • Elastic
  • Automated

This is very different from traditional bare metal or VM infrastructures. Even if you use containers like Docker, you don’t automatically get the above benefits. Providing cloud-native infrastructure is a key requirement to build a DevOps infrastructure and culture. Note that technology is just one part of a fully successful DevOps mentality, of course.

Kubernetes Won the Container War

In the beginning, many container platform vendors built their own cloud-native technology and infrastructure. Many of these solutions were open source, but only one took over. Just take a look at the Google Trends of the last five years:

In the meantime, most cloud-native infrastructure providers (such as Red Hat OpenShift, Mesosphere, Pivotal Cloud Foundry) have built their whole strategy around supporting Kubernetes. These vendors enhance the user experience and add features to differentiate from vanilla Kubernetes. OpenShift made this decision a few years earlier than most others; take a look at how the above trends reflect this decision. Furthermore, Kubernetes is now also available as a managed service on all major cloud providers (AWS, Azure, GCP).

Stateful Kubernetes Deployments using Operator Pattern

Kubernetes was mainly used for stateless deployments in the early phases (for instance to deploy REST microservices). Today, people deploy everything on Kubernetes because it adds a lot of value – as discussed in the section about cloud-native infrastructure above. This includes the Kafka backend and clients.

Stateful deployments of backend services leverage the Kubernetes Operator pattern. This applies to many infrastructure components, like databases, messaging systems and search engines. The implementation of the Operator pattern includes standard Kubernetes objects like StatefulSets, ConfigMaps, Secrets and Persistent Volumes. However, the secret sauce is the combination of a custom Kubernetes Controller and Custom Resource Definitions (CRDs), which implement unique application functionality for the specific stateful deployment.
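To make the controller idea more concrete, here is a minimal sketch of the reconcile step at the heart of an operator: compare the desired state declared in a custom resource with the observed state, and derive the actions needed to converge. This is a plain Python illustration and not how Confluent Operator is implemented. The names KafkaClusterSpec, KafkaClusterStatus and reconcile, as well as their fields, are assumptions made up for this example, and no real Kubernetes API is called.

```python
from dataclasses import dataclass, field


@dataclass
class KafkaClusterSpec:
    """Desired state, as a user might declare it in a custom resource (hypothetical fields)."""
    replicas: int
    version: str


@dataclass
class KafkaClusterStatus:
    """Observed state: one version string per running broker (hypothetical field)."""
    broker_versions: list = field(default_factory=list)


def reconcile(spec, status):
    """Compare desired and observed state and return the actions needed to converge."""
    actions = []
    current = len(status.broker_versions)
    # Scale up: create brokers until the desired replica count is reached.
    for broker_id in range(current, spec.replicas):
        actions.append(f"create broker-{broker_id} at version {spec.version}")
    # Scale down: remove the highest-numbered brokers first.
    for broker_id in range(current - 1, spec.replicas - 1, -1):
        actions.append(f"drain and delete broker-{broker_id}")
    # Upgrade: every remaining broker on a different version gets upgraded.
    for broker_id, version in enumerate(status.broker_versions[:spec.replicas]):
        if version != spec.version:
            actions.append(f"upgrade broker-{broker_id} from {version} to {spec.version}")
    return actions
```

A real controller would run this comparison continuously, triggered by changes to the custom resource or to the cluster, and would execute the actions against the Kubernetes API instead of returning strings. When desired and observed state already match, the function returns an empty list and nothing happens.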

Challenges running Kafka on Kubernetes

Apache Kafka has become the de facto standard for event streaming platforms. Apache Kafka and its ecosystem provide a powerful option to build reliable, scalable, mission-critical distributed systems. As you can imagine, such a distributed system is harder to operate than a traditional messaging system or database, which does not scale elastically without downtime and just uses active/passive for high availability.

Kubernetes environments are similar: very powerful, but not easy to operate. Hence, combining Kafka and Kubernetes does not make operations easier. Here are some challenges of running the Apache Kafka ecosystem on Kubernetes:

  • Translating an existing architecture to Kubernetes
  • Failover handling
  • Data rebalancing
  • Communication between ZooKeeper, Kafka Brokers, Clients (Java, REST, Connect, KSQL), Schema Registry, etc.
  • External access from / to outside the Kubernetes cluster
  • Persistent storage options on premise and in the cloud
  • Security configuration
  • Rolling upgrades
  • Etc.

This is the secret sauce which a Kubernetes Operator has to implement and automate. Consequently, a Kafka Operator sounds like a very good and valuable component.
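As one illustration of what automating such a task can look like, here is a minimal sketch of a rolling upgrade: brokers are restarted one at a time, and the next broker is only touched once the cluster reports healthy again. This is a simplified sketch under stated assumptions, not the logic of Confluent Operator. The function rolling_upgrade and the callbacks restart and is_healthy are hypothetical; in practice the health check might be something like "no under-replicated partitions".

```python
import time


def rolling_upgrade(broker_ids, restart, is_healthy, max_checks=10, poll_interval=0.0):
    """Restart brokers one by one, waiting for a healthy cluster after each restart.

    restart(broker_id) and is_healthy() are hypothetical callbacks supplied by the caller.
    Returns the list of brokers that were upgraded successfully.
    """
    upgraded = []
    for broker_id in broker_ids:
        restart(broker_id)
        # Poll the health check a bounded number of times before giving up.
        for _ in range(max_checks):
            if is_healthy():
                break
            time.sleep(poll_interval)
        else:
            # Stop the rollout so that at most one broker is affected.
            raise RuntimeError(f"cluster unhealthy after restarting broker {broker_id}")
        upgraded.append(broker_id)
    return upgraded
```

The key design choice is that the rollout aborts at the first broker that does not come back healthy, so a bad version never takes down more than one broker at a time.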

Confluent Operator as Kafka Operator to establish a Cloud-Native Kafka Platform

Confluent has long experience running Kafka on Kubernetes:

Confluent Cloud runs on Kubernetes using a Kafka Operator to offer “Serverless Kafka”: Confluent Cloud provides mission-critical SLAs on all three major cloud providers (Google GCP, Microsoft Azure, Amazon AWS), consumption-based pricing and throughput of several GB / sec using a single Kafka cluster. Seems like running Kafka on Kubernetes using a Kafka Operator is not a bad idea.

Slide Deck: Confluent Operator for Kafka Ecosystem on Kubernetes

My slide deck describes the journey and the features of Confluent Operator to deploy and operate Kafka in a cloud-native way, similar to how Kafka and its ecosystem (like Kafka Connect, Schema Registry, KSQL) are deployed in Confluent Cloud.

Confluent Operator enables you to:

  • Provision, manage and operate Confluent Platform (including ZooKeeper, Apache Kafka, Kafka Connect, KSQL, Schema Registry, REST Proxy, Control Center)
  • Deploy on any Kubernetes platform (Vanilla K8s, OpenShift, Rancher, Mesosphere, Cloud Foundry, Amazon EKS, Azure AKS, Google GKE, etc.)
  • Automate provisioning of Kafka pods in minutes
  • Monitor SLAs through Confluent Control Center or Prometheus
  • Scale Kafka elastically, handle failover and automate rolling updates
  • Automate security configuration
  • Benefit from Confluent's first-hand knowledge of running Kafka at scale
  • Run in production with full support

Here is the agenda of the slide deck:

  • Cloud Native vs. SaaS / Serverless Kafka
  • The Emergence of Kubernetes
  • Kafka on K8s Deployment Challenges
  • Confluent Operator as Kafka Operator

Also check out the documentation for Confluent Operator.

Please let me know if you have any comments or feedback.

Kai Waehner

bridging the gap between technical innovation and business value for real-time data streaming, processing and analytics
