Choosing between Apache Kafka, Azure Event Hubs, and Confluent Cloud for data streaming is critical when building a Microsoft Fabric Lakehouse. Apache Kafka offers scalability and flexibility but requires self-management and additional features for security and governance. Azure Event Hubs provides a fully managed service with tight Azure integration but has limitations in Kafka compatibility, scalability, and advanced features. Confluent Cloud delivers a complete, managed data streaming platform for analytical and transactional scenarios with enterprise features like multi-cloud support and disaster recovery. Each option caters to different needs, and this blog post will guide you in selecting the right data streaming solution for your use case.
This is part three of a blog series about Microsoft Fabric and its relation to other data platforms on the Azure cloud:
Subscribe to my newsletter to get an email about a new blog post every few weeks.
Please read the other two articles to understand why Microsoft Fabric is not a silver bullet for every data problem, and how data streaming and Microsoft Fabric are complementary. This article focuses on choosing the right data streaming service for Microsoft Fabric data ingestion and for many use cases beyond the lakehouse.
Apache Kafka has established itself as the cornerstone of data streaming, offering far more than traditional messaging systems. It provides a persistent event log that guarantees ordering, enables true decoupling of data producers and consumers, and ensures data consistency across real-time, batch, and request-response APIs. Kafka Connect, which facilitates seamless integration with various data sources and sinks, and Kafka Streams, which allows for continuous stateless and stateful stream processing, complement the Kafka architecture. With its robust capabilities, Kafka is used by over 150,000 organizations worldwide. This underscores its status as a new software category, as recognized in the Forrester Wave for Streaming Data.
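The core idea of a persistent, ordered event log with per-consumer offsets can be illustrated with a minimal in-memory sketch. This is not the real Kafka API (the `EventLog` class and its fields are invented for illustration), but it shows why producers and consumers are truly decoupled: producers only append, and each consumer advances its own offset independently.

```python
from dataclasses import dataclass, field

@dataclass
class EventLog:
    """Toy append-only log: events keep their append order, and each
    consumer tracks its own read offset, independent of producers."""
    events: list = field(default_factory=list)
    offsets: dict = field(default_factory=dict)  # consumer name -> next offset

    def produce(self, event: dict) -> None:
        # Ordering is simply append order; nothing is ever overwritten.
        self.events.append(event)

    def consume(self, consumer: str, max_events: int = 10) -> list:
        # Each consumer resumes from its own offset, then advances it.
        start = self.offsets.get(consumer, 0)
        batch = self.events[start:start + max_events]
        self.offsets[consumer] = start + len(batch)
        return batch

log = EventLog()
log.produce({"order_id": 1, "status": "created"})
log.produce({"order_id": 1, "status": "paid"})

# Two independent consumers read the same events at their own pace.
analytics = log.consume("analytics")               # reads both events
billing = log.consume("billing", max_events=1)     # reads only the first
```

Because the log is persistent and replayable, a consumer added later (for example, a new lakehouse ingestion job) can read the same history from offset zero without any change to the producers.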
Benefits:
Cons:
In summary, self-managed Apache Kafka rarely makes sense in the cloud when you already leverage other SaaS offerings such as Microsoft Fabric, Snowflake, Databricks, or MongoDB Atlas, not least from a TCO perspective.
The Kafka protocol has become a de facto standard for many cloud-native services, such as Azure Event Hubs, Confluent's KORA engine, or WarpStream. These services implement the protocol without relying on some or all of the open-source Kafka implementation itself, which enables a cloud-native experience.
Many Kafka cloud services are available today. Every large software and cloud vendor has some kind of fully managed or partially managed Kafka offering: Amazon, Microsoft, Google, IBM, Oracle, and so on. While Confluent is the leader in the cloud-agnostic Kafka space, there are plenty of other vendors, such as Cloudera, Instaclustr, Aiven, Redpanda, and StreamNative, to name a few.
Check out the latest data streaming landscape to learn more about all these Kafka (and Flink) vendors and their trade-offs.
The following focuses on a comparison between Azure Event Hubs vs. Confluent Cloud, the two most common options for Kafka on the Azure cloud. Each offers unique advantages and limitations. The following is not a complete list, but the most critical aspects to compare.
Azure Event Hubs is a proprietary, real-time data ingestion service on Microsoft Azure, designed for large-scale data ingestion into lakehouses. While it offers some Kafka API compatibility, it is not a complete replacement for Kafka.
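Event Hubs' partial Kafka API compatibility means standard Kafka clients can connect to an Event Hubs namespace through its Kafka-compatible endpoint (SASL_SSL on port 9093, with the connection string as the SASL password). The sketch below builds such a client configuration in the shape accepted by librdkafka-based clients such as confluent-kafka-python; the namespace name and connection string are placeholders, and Java clients would use `sasl.jaas.config` instead of the username/password keys.

```python
def event_hubs_kafka_config(namespace: str, connection_string: str) -> dict:
    """Build a Kafka client config for the Event Hubs Kafka endpoint.

    Key/value names follow librdkafka conventions; the special username
    "$ConnectionString" and port 9093 are how Event Hubs exposes its
    Kafka-compatible listener.
    """
    return {
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,  # full Event Hubs connection string
    }

# Placeholder values for illustration only.
cfg = event_hubs_kafka_config("my-namespace", "Endpoint=sb://...")
```

This compatibility covers the core produce/consume protocol, but not the full Kafka ecosystem: features such as Kafka Streams, log compaction, or exactly-once transactions are where the differences to a complete Kafka platform show up.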
Confluent Cloud offers a fully managed data streaming platform powered by Apache Kafka and Flink and integrates seamlessly with the Azure ecosystem. As a strategic Microsoft partner, Confluent provides a unified security, management, and billing experience, with integrations across Azure services.
Azure Event Hubs works well as the data ingestion layer into Microsoft Fabric (if you can live with the drawbacks listed above). However, its many limitations make it easy to qualify out Azure Event Hubs as the right Kafka solution.
"Qualifying out" a product because of its limitations is often much easier than trying to fit several products into an architecture and comparing them.
Choosing the right Kafka option requires careful consideration of your specific use cases. Here are scenarios where Azure Event Hubs may not be suitable:
If you have any of the above requirements, it is an easy decision to qualify out Azure Event Hubs. Instead, look at Confluent and other vendors that provide the required capabilities.
When embarking on a data streaming journey, it’s essential to focus on business value and long-term strategy. Establishing a data streaming organization with a center of excellence can maximize the platform’s strategic value.
Don’t just look at the first use case: a data streaming platform is strategic and adds more value as more teams use the same data products. Expertise and 24/7 support are crucial, and Confluent excels in this area with its dedicated focus on data streaming and its vast customer base. By fostering a data-driven culture, organizations can unlock the full potential of their data streaming investments.
Choosing the right data streaming platform – Apache Kafka, Azure Event Hubs, or Confluent Cloud – depends on your specific use case within the Microsoft Fabric Lakehouse and beyond. Apache Kafka offers flexibility and scalability but requires self-management. Azure Event Hubs is a good choice for plain data ingestion into the Azure ecosystem powered by OneLake and Microsoft Fabric, but it has limitations in Kafka compatibility and in advanced features for a more complex enterprise architecture and, especially critical, operational workloads. Confluent Cloud provides a full-featured, managed service with enterprise-level capabilities, making it ideal for strategic deployments across multiple use cases. Each option has its strengths, and careful consideration of your requirements will guide you to the best fit.
What cloud services do you use for data streaming on the Azure cloud? Is the use case just data ingestion into one lakehouse or do you have multiple consumers of the data? Do you also build operational applications with the Apache Kafka ecosystem, maybe including hybrid cloud or disaster recovery scenarios? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.