Databricks Archives - Kai Waehner

12 minute read

How Microsoft Fabric Lakehouse Complements Data Streaming (Apache Kafka, Flink, et al.)

By Kai Waehner
12. October 2024

In today’s data-driven world, understanding data at rest versus data in motion is crucial for businesses. Data streaming frameworks like Apache Kafka and Apache Flink enable real-time data processing. Meanwhile, lakehouses like Snowflake, Databricks, and Microsoft Fabric excel in long-term data storage and detailed analysis, perfect for reports and AI training. This blog post explores how these technologies complement each other in enterprise architecture.

8 minute read

What is Microsoft Fabric for Azure Cloud (Beyond the Buzz) and how it Competes with Snowflake and Databricks

By Kai Waehner
4. October 2024

If you ask your favorite large language model, Microsoft Fabric appears to be the ultimate solution for any data challenge you can imagine. That’s also the impression many people get from Microsoft’s sales teams. But is it really the silver bullet it’s made out to be? This article takes a closer look, first exploring the glossy marketing and sales definition of the platform and then deconstructing it from a more practical perspective. Learn what Microsoft Fabric is truly built for and how it fits into the wider data landscape, especially in comparison to other major players in the data analytics market like Databricks and Snowflake.

8 minute read

The Shift Left Architecture – From Batch and Lakehouse to Real-Time Data Products with Data Streaming

By Kai Waehner
15. June 2024

Data integration is a hard challenge in every enterprise. Batch processing and Reverse ETL are common practices in a data warehouse, data lake, or lakehouse. Data inconsistency, high compute cost, and stale information are the consequences. This blog post introduces a new design pattern to solve these problems: the Shift Left Architecture enables a data mesh with real-time data products to unify transactional and analytical workloads with Apache Kafka, Flink, and Iceberg. Consistent information is handled with stream processing or ingested into Snowflake, Databricks, Google BigQuery, or any other analytics / AI platform. This increases flexibility, reduces cost, and enables a data-driven company culture with faster time-to-market for innovative software applications.

13 minute read

The Data Streaming Landscape 2023

By Kai Waehner
21. December 2022

Data streaming is a new software category to process data in motion. Apache Kafka is the de facto standard used by over 100,000 organizations. Plenty of vendors offer Kafka platforms and cloud services. Many complementary stream processing engines like Apache Flink and SaaS offerings have emerged. Competing technologies like Pulsar and Redpanda are trying to gain market share. This blog post explores the data streaming landscape of 2023 to summarize existing solutions and market trends.

7 minute read

Case Studies: Cloud-native Data Streaming for Data Warehouse Modernization

By Kai Waehner
18. July 2022

The concepts and architectures of a data warehouse, a data lake, and data streaming are complementary in solving business problems. Unfortunately, the underlying technologies are often misunderstood, overused for monolithic and inflexible architectures, and pitched for the wrong use cases by vendors. Let’s explore this dilemma in a blog series. This is part 4: Case Studies for cloud-native data streaming and data warehouses.

9 minute read

Data Warehouse and Data Lake Modernization: From Legacy On-Premise to Cloud-Native Infrastructure

By Kai Waehner
15. July 2022

The concepts and architectures of a data warehouse, a data lake, and data streaming are complementary in solving business problems. Unfortunately, the underlying technologies are often misunderstood, overused for monolithic and inflexible architectures, and pitched for the wrong use cases by vendors. Let’s explore this dilemma in a blog series. This is part 3: Data Warehouse Modernization: From Legacy On-Premise to Cloud-Native Infrastructure.

9 minute read

When to Use Reverse ETL and when it is an Anti-Pattern

By Kai Waehner
30. September 2021

This blog post explores why software vendors (try to) introduce new solutions for Reverse ETL, when Reverse ETL is really needed, and how it fits into the enterprise architecture. The involvement of event streaming to process data in motion is a key piece of Reverse ETL for real-time use cases.

Technology Evangelist

Kai Waehner


Global Field CTO
