Microsoft Fabric and OneLake Azure Lakehouse vs Databricks and Snowflake Cloud
Read More

What is Microsoft Fabric for Azure Cloud (Beyond the Buzz) and how it Competes with Snowflake and Databricks

If you ask your favorite large language model, Microsoft Fabric appears to be the ultimate solution for any data challenge you can imagine. That’s also the impression many people get from Microsoft’s sales teams. But is it really the silver bullet it’s made out to be? This article takes a closer look exploring the glossy marketing and sales definition of the platform and then deconstructing it from a more practical perspective. Learn what Microsoft Fabric is truly built for, and how it fits into the wider data landscape, especially in comparison to other major players in the data analytics market like Databricks and Snowflake.
Read More
Multi-Cloud Replication in Real-Time with Apache Kafka and Cluster Linking
Read More

Multi-Cloud Replication in Real-Time with Apache Kafka and Cluster Linking

Multiple Apache Kafka clusters are the norm; not an exception anymore. Hybrid integration and multi-cloud replication for migration or disaster recovery are common use cases. This blog post explores a real-world success story from financial services around the transition of a large traditional bank from on-premise data centers into the public cloud for multi-cloud data sharing between AWS and Azure.
Read More
Apache Iceberg Open Table Format for Data Lake Lakehouse Streaming wtih Kafka Flink Databricks Snowflake AWS GCP Azure
Read More

Apache Iceberg – The Open Table Format for Lakehouse AND Data Streaming

An open table format framework like Apache Iceberg is essential in the enterprise architecture to ensure reliable data management and sharing, seamless schema evolution, efficient handling of large-scale datasets and cost-efficient storage. This blog post explores market trends, adoption of table format frameworks like Iceberg, Hudi, Paimon, Delta Lake and XTable, and the product strategy of leading vendors of data platforms such as Snowflake, Databricks (Apache Spark), Confluent (Apache Kafka / Flink), Amazon Athena and Google BigQuery.
Read More
Tiered Storage for Apache Kafka - Use Cases Architecture Benefits.png
Read More

Why Tiered Storage for Apache Kafka is a BIG THING…

Apache Kafka added Tiered Storage to separate compute and storage. The capability enables more scalable, reliable and cost-efficient enterprise architectures. This blog post explores the architecture, use cases, benefits, and a case study for storing Petabytes of data in the Kafka commit log. The end discusses why Tiered Storage does NOT replace other databases and how Apache Iceberg might change future Kafka architectures even more.
Read More
The Heart of the Data Mesh Beats Real Time with Apache Kafka
Read More

The Heart of the Data Mesh Beats Real-Time with Apache Kafka

If there were a buzzword of the hour, it would undoubtedly be “data mesh”! This new architectural paradigm unlocks analytic and transactional data at scale and enables rapid access to an ever-growing number of distributed domain datasets for various usage scenarios. The data mesh addresses the most common weaknesses of the traditional centralized data lake or data platform architecture. And the heart of a decentralized data mesh infrastructure must be real-time, reliable, and scalable. Learn how the de facto standard for data streaming, Apache Kafka, plays a crucial role in building a data mesh.
Read More
Stream Exchange for Data Sharing with Apache Kafka in a Data Mesh
Read More

Streaming Data Exchange with Kafka and a Data Mesh in Motion

Data Mesh is a new architecture paradigm that gets a lot of buzzes these days. This blog post looks into this principle deeper to explore why no single technology is the perfect fit to build a  Data Mesh. Examples show why an open and scalable decentralized real-time platform like Apache Kafka is often the heart of the Data Mesh infrastructure, complemented by many other data platforms to solve business problems.
Read More
Serverless Kafka for Data in Motion as Rescue for Data at Rest in the Data Lake
Read More

Serverless Kafka in a Cloud-native Data Lake Architecture

Apache Kafka became the de facto standard for processing data in motion. Kafka is open, flexible, and scalable. Unfortunately, the latter makes operations a challenge for many teams. Ideally, teams can use a serverless Kafka SaaS offering to focus on business logic. However, hybrid scenarios require a cloud-native platform that provides automated and elastic tooling to reduce the operations burden. This blog post explores how to leverage cloud-native and serverless Kafka offerings in a hybrid cloud architecture. We start from the perspective of data at rest with a data lake and explore its relation to data in motion with Kafka.
Read More
How to choose the right Apache Kafka Offering - Confluent Cloudera Red Hat IBM Amazon AWS MSK
Read More

Comparison of Open Source Apache Kafka vs Vendors including Confluent, Cloudera, Red Hat, Amazon MSK

Apache Kafka became the de facto standard for event streaming. Various vendors added Kafka and related tooling to their offerings or provide a Kafka cloud service. This blog post uses the car analogy – from the motor engine to the self-driving car – to explore the different Kafka offerings available on the market. The goal is not a feature-by-feature comparison. Instead, the intention is to educate about the different deployment models, product strategies, and trade-offs from the available options.
Read More