Event streaming with Apache Kafka at the edge is not cutting edge anymore. It is a common approach to providing the same open, flexible, and scalable architecture at the edge as in the cloud or data center. Possible locations for a Kafka edge deployment include retail stores, cell towers, trains, small factories, restaurants, etc. I already discussed the concepts and architectures in detail in the past: “Apache Kafka is the New Black at the Edge” and “Architecture patterns for distributed, hybrid, edge and global Apache Kafka deployments”. This blog post is an add-on focusing on use cases across industries for Kafka at the edge.
To be clear before you read on: Edge is NOT a data center.
And “Edge Kafka” is not simply yet another IoT project using Kafka in a remote location. Edge Kafka is actually an essential component of a streaming nervous system that spans IoT (or OT in Industrial IoT) and non-IoT (traditional data-center / cloud infrastructures).
The post’s focus is scenarios where the Kafka clients AND the Kafka brokers are running on the edge. This enables edge processing, integration, decoupling, low latency, and cost-efficient data processing.
Some IoT projects are built like “normal Kafka projects”, i.e., deployed in an (edge) data center or the cloud. For instance, bigger factories can provide the infrastructure to run a reliable Kafka cluster with stable network connectivity to the cloud. However, many IoT projects require real edge capabilities.
What’s different at the edge?
Offline business continuity is important: sites must keep working even if the connection to the central data center or cloud is not available. Disconnected or offline sites often neither require nor provide high availability (because it is not worth the effort). Typical characteristics are local pre-processing, real-time analytics with low latency, and a connection to the data center or cloud that exists only from time to time or with low bandwidth.
Often, these projects need to deploy Kafka brokers across hundreds of locations. A single broker without high availability is often good enough; it is there to handle backpressure and enable local processing. Use cases exist across industries, including retail stores, trains, restaurants, cell towers, small factories, etc.
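To make this concrete, here is a minimal Java sketch of provisioning such a single-broker site. The broker address, topic name, and sizing are hypothetical; the point is the replication factor of 1, which trades high availability for a tiny footprint:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class EdgeTopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Hypothetical address of the single on-site broker.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "edge-broker:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // One partition, replication factor 1: no high availability,
            // which is acceptable for many disconnected edge sites.
            NewTopic sensorTopic = new NewTopic("machine-sensors", 1, (short) 1);
            admin.createTopics(Collections.singletonList(sensorTopic)).all().get();
        }
    }
}
```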
Low-footprint, low-touch, little-or-no-DevOps-required installations of Kafka brokers (not just clients) are mandatory for many of these use cases, because no IT experts are available on-site to operate Kafka. Hence, certified OEM hardware is a great option for installing and operating Kafka at the edge.
Hybrid and not cloud-only: Consumer IoT (CIoT) always includes the users (in their smart home, ride-share, retail store, etc.), and Industrial IoT (IIoT) always includes tangible goods (cars, food, energy, …).
Using one single technical infrastructure enables building edge and hybrid architectures. There is no need for a ton of different frameworks and products. This is a huge benefit from a development, testing, operations, and support point of view!
Industries for Kafka at the Edge include manufacturing, pharma, carmakers, telecommunications, retailing, energy, restaurants, gaming, healthcare, public sector, aerospace, transportation, and others.
Architectures and use cases include data integration, pre-processing and replication to the cloud, big and small data edge processing and analytics, disconnected offline scenarios, very low-footprint scenarios with hundreds of locations, scenarios without high-availability requirements, and others.
Various examples for Kafka deployments at the edge exist. Almost all of these use cases relate to several of the above categories and requirements, such as low hardware footprint, disconnected offline processing, hundreds of locations, and hybrid architectures.
I have worked with enterprises across industries and the globe on the following scenarios:
A great practical example of edge computing in retailing is the fast-food chain Chick-fil-A. They deployed a Kubernetes cluster in each of their 2,000 restaurants for real-time analytics at the edge without an internet connection. The hardware is pretty small: an Intel quad-core processor with 8 GB RAM and an SSD.
Let’s make this “Kafka at the edge” story more concrete with a specific example. In this case, I use the railway and transportation industry. But this can easily be mapped to your industry and use case.
The following example shows an edge and hybrid solution for railways to improve the customer experience and increase the revenue of the railway company. It leverages offline edge processing for customer communication, replication to the cloud for analytics, and integration with 3rd party interfaces and APIs from partners.
Local processing at the edge happens on the train. But each train also replicates relevant data in real-time to the cloud, if internet connectivity and free network resources are available. If the train is not online, Kafka handles the backpressure and replicates to the cloud once the train is online again.
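How this buffering can look is sketched below with standard producer settings; the endpoint, topic, and values are hypothetical assumptions, not a definitive setup. The producer buffers records in memory and keeps retrying for a long window, so short disconnects are bridged instead of treated as failures:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TrainToCloudProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical endpoint of the central cloud cluster.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "cloud-kafka:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        // Buffer generously and retry for a long window so records survive
        // tunnels and other temporary disconnects instead of failing fast.
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 128L * 1024 * 1024);   // 128 MB local buffer
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 10 * 60 * 1000); // keep trying for 10 minutes
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);            // no duplicates on retry
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "zstd");            // save scarce bandwidth

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("train-telemetry", "train-4711", "{\"delayMinutes\":3}"));
        }
    }
}
```

In a real deployment, the application would write to the local broker on the train and a replication tool would move the data to the cloud; the same buffering principles apply there.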
Kafka on the train is NOT just used for real-time messaging and handling backpressure. These are already great reasons for using Kafka at the edge. Still, even bigger value is created when Kafka is also used for data integration (restaurant, traveler information, loyalty system, etc.) and data processing (up-/cross-selling, real-time delay information, etc.) at the edge. This way, only one single platform is required to solve all the different problems.
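As a rough illustration of such edge processing, here is a Kafka Streams sketch that runs entirely against the broker on the train; the topics and the naive cross-selling rule are hypothetical:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class OnTrainUpsell {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "on-train-upsell");
        // The broker running on the train itself - no internet required.
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("restaurant-orders");

        // Naive cross-selling rule: whoever orders coffee gets a dessert
        // offer pushed to the traveler app, processed entirely on the train.
        orders.filter((passengerId, order) -> order.contains("coffee"))
              .mapValues(order -> "{\"offer\":\"dessert-discount\"}")
              .to("traveler-app-offers");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}
```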
Trains (and many other edge locations) are offline regularly. For instance, a train drives through a tunnel or reaches an area with no cell connectivity. Local processing is still possible. Business continuity is the key to improving the customer experience and increasing sales, even if the train is disconnected from the internet. Passengers can still use the mobile app to see traveler information, buy food in the restaurant, or watch movies stored on the train’s local server. As soon as the train has internet connectivity again, the purchases from passengers are transferred to the loyalty system in the cloud, the latest delay information is consumed from the cloud and stored on the edge Kafka broker in the train, and so on.
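One possible shape of that synchronization logic is sketched below in plain Java (cluster addresses, topics, and group id are hypothetical): a small bridge that consumes delay information from the cloud and writes it to the local broker on the train. While offline, poll() simply returns nothing; the loop catches up automatically once connectivity returns:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class DelayInfoSync {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "cloud-kafka:9092"); // hypothetical cloud cluster
        consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "train-4711-sync");
        consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // broker on the train
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        try (KafkaConsumer<String, String> fromCloud = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> toEdge = new KafkaProducer<>(producerProps)) {
            fromCloud.subscribe(Collections.singletonList("delay-information"));
            while (true) {
                // Returns an empty batch while the train is offline.
                ConsumerRecords<String, String> records = fromCloud.poll(Duration.ofSeconds(5));
                for (ConsumerRecord<String, String> record : records) {
                    toEdge.send(new ProducerRecord<>("delay-information-local", record.key(), record.value()));
                }
            }
        }
    }
}
```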
Data processing does not stop with the hybrid integration between the edge (train) and cloud (CRM, loyalty system, etc.). Different divisions or partner companies need to integrate, too. Instead of non-scalable, synchronous REST API calls and API management for partner integration, streaming replication with Kafka-native technologies is a much better, scalable approach.
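One Kafka-native option is MirrorMaker 2, which ships with Apache Kafka. A minimal configuration sketch for replicating selected topics from the edge to the cloud; the cluster aliases, addresses, and topic names are hypothetical:

```properties
# Minimal MirrorMaker 2 sketch, run with:
#   bin/connect-mirror-maker.sh mm2.properties
clusters = edge, cloud
edge.bootstrap.servers = localhost:9092
cloud.bootstrap.servers = cloud-kafka:9092

# Replicate selected edge topics to the cloud, Kafka-native and asynchronous
edge->cloud.enabled = true
edge->cloud.topics = train-telemetry|restaurant-orders
```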
I hope this story about Kafka at the edge helped you better understand how you can leverage event streaming in your industry and use cases to build an end-to-end streaming infrastructure from edge to cloud.
Finally, it is important to discuss how to deploy Kafka at the edge. To be clear: Kafka still needs some computing power.
Obviously, this depends on many factors: the hardware vendors and infrastructure you are working with, specific SLAs and HA requirements, and so on. The good news is that Kafka can be deployed on many infrastructures, including bare metal, VMs, containers, Kubernetes, etc. The other good news is that new computing hardware (even for the “edge”) typically ships with 4, 8, or even 16 GB RAM because this is the smallest configuration vendors produce these days for such environments (small factories, retail stores, etc.).
Minimum hardware requirements for running a very small footprint Kafka deployment are a single-core processor and a few hundred MB of RAM. This already allows decent edge processing with 100+ Mb/sec throughput on a single Kafka node (with replication factor = 1). However, real values depend on the number of partitions, message size, network speed, and other characteristics. Don’t expect the same performance and scalability as in the data center or cloud!
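Rather than trusting generic numbers, it is worth running a quick smoke test on your actual edge hardware. The following rough Java sketch (topic name, message size, and record count are hypothetical) measures sustained producer throughput against a single local broker:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class EdgeThroughputSmokeTest {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // single edge broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);

        byte[] payload = new byte[1024]; // 1 KB messages
        int count = 100_000;

        long start = System.nanoTime();
        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < count; i++) {
                producer.send(new ProducerRecord<>("perf-test", payload));
            }
            producer.flush(); // wait until everything is actually written
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        // count records of 1 KB each => count/1024 MB in total
        System.out.printf("%.1f MB/sec on this hardware%n", count / 1024.0 / seconds);
    }
}
```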
Thus, you can deploy a Kafka broker on a Raspberry Pi, but not on some small embedded device! The latter is where the Kafka clients can run.
Check out the “Infrastructure Checklist for Apache Kafka at the Edge” if you plan to go that direction!
From a technical perspective, deployment of Kafka at the edge is the same as in a data center or cloud. However, the environment and requirements are a little bit different as we learned above. Some additional features definitely help with deploying and operating Kafka at the edge.
I work for Confluent. Hence, I can share “The Confluent Way” of deploying Kafka at the edge for your future projects, including innovative, differentiating features.
Kafka is a great solution for the edge. It enables deploying the same open, scalable, and reliable technology at the edge, in the data center, and in the cloud. This is relevant across industries. Kafka shows up in more and more places where nobody expected it before. Edge sites include retail stores, restaurants, cell towers, trains, and many others. I hope the various use cases and architectures inspired you a little bit.
What are your experiences at the edge? What are your use cases? Did you or do you plan to use Apache Kafka and its ecosystem? What is your strategy? Let’s connect on LinkedIn and discuss it!