ETL Archives - Kai Waehner

2.7K views
8 minute read

Apache Flink: Overkill for Simple, Stateless Stream Processing and ETL?

By Kai Waehner
14. January 2025

Discover when Apache Flink is the right tool for your stream processing needs. Explore its role in stateful and stateless processing, the advantages of serverless Flink SaaS solutions like Confluent Cloud, and how it supports advanced analytics and real-time data integration together with Apache Kafka. Dive into the trade-offs, deployment options, and strategies for leveraging Flink effectively across cloud, on-premise, and edge environments, and when to use Kafka Streams or Single Message Transforms (SMT) within Kafka Connect for ETL instead of Flink.

14.3K views
11 minute read

Apache Iceberg – The Open Table Format for Lakehouse AND Data Streaming

By Kai Waehner
13. July 2024

An open table format framework like Apache Iceberg is essential in the enterprise architecture to ensure reliable data management and sharing, seamless schema evolution, efficient handling of large-scale datasets and cost-efficient storage. This blog post explores market trends, adoption of table format frameworks like Iceberg, Hudi, Paimon, Delta Lake and XTable, and the product strategy of leading vendors of data platforms such as Snowflake, Databricks (Apache Spark), Confluent (Apache Kafka / Flink), Amazon Athena and Google BigQuery.

18.4K views
8 minute read

The Shift Left Architecture – From Batch and Lakehouse to Real-Time Data Products with Data Streaming

By Kai Waehner
15. June 2024

Data integration is a hard challenge in every enterprise. Batch processing and Reverse ETL are common practices in a data warehouse, data lake or lakehouse. Data inconsistency, high compute cost, and stale information are the consequences. This blog post introduces a new design pattern to solve these problems: The Shift Left Architecture enables a data mesh with real-time data products to unify transactional and analytical workloads with Apache Kafka, Flink and Iceberg. Consistent information is handled with stream processing or ingested into Snowflake, Databricks, Google BigQuery, or any other analytics / AI platform to increase flexibility, reduce cost and enable a data-driven company culture with faster time-to-market for building innovative software applications.

5.3K views
9 minute read

Snowflake Integration Patterns: Zero ETL and Reverse ETL vs. Apache Kafka

By Kai Waehner
19. April 2024

Snowflake is a leading cloud-native data warehouse. Integration patterns include batch data integration, Zero ETL and near real-time data ingestion with Apache Kafka. This blog post explores the different approaches and their trade-offs. Following industry recommendations, it suggests avoiding anti-patterns like Reverse ETL and using data streaming instead to enhance the flexibility, scalability, and maintainability of the enterprise architecture.

3.7K views
6 minute read

The State of Data Streaming for Healthcare with Apache Kafka and Flink

By Kai Waehner
27. November 2023

This blog post explores the state of data streaming for the healthcare industry powered by Apache Kafka and Apache Flink. It covers IT modernization and innovation with pioneering technologies like sensors, telemedicine, and AI/machine learning. I look at enterprise architectures and customer stories from Humana, Recursion, BHG (formerly Bankers Healthcare Group), and more. A complete slide deck and on-demand video recording are included.

6.8K views
5 minute read

How Lufthansa uses Apache Kafka for Middleware and Analytics

By Kai Waehner
24. September 2023

Aviation and travel are notoriously vulnerable to social, economic, and political events, as well as the ever-changing expectations of consumers. The coronavirus was just one piece of the challenge. This post explores how Lufthansa leverages data streaming powered by Apache Kafka as cloud-native middleware for mission-critical data integration projects and as a data fabric for AI/machine learning scenarios such as real-time predictions in fleet management. If you want to learn more, an on-demand video of an interactive conversation with Lufthansa is included at the end.

5.3K views
4 minute read

Streaming ETL with Apache Kafka in the Healthcare Industry

By Kai Waehner
1. April 2022

IT modernization and innovative new technologies change the healthcare industry significantly. This blog series explores how data streaming with Apache Kafka enables real-time data processing and business process automation. This is part three: Streaming ETL. Examples include Babylon Health and Bayer.

20.4K views
13 minute read

When to use Apache Camel vs. Apache Kafka?

By Kai Waehner
28. January 2022
4 shares

Should I use Apache Camel or Apache Kafka for my next integration project? The question is a valid one and comes up regularly. This blog post explores both open-source frameworks and explains the difference between application integration and event streaming. The comparison discusses when to use Kafka or Camel, when to combine them, and when not to use them at all. A decision tree shows how you can quickly rule out one in favor of the other.

9.4K views
13 minute read

Is Apache Kafka an iPaaS or is Event Streaming its own Software Category?

By Kai Waehner
3. November 2021
3 shares

This post explores why Apache Kafka is the new black for integration projects, how Kafka fits into the discussion around cloud-native iPaaS solutions, and why event streaming is a new software category. A concrete real-world example shows the difference between event streaming and traditional integration platforms or iPaaS.

7.7K views
9 minute read

When to Use Reverse ETL and when it is an Anti-Pattern

By Kai Waehner
30. September 2021
1 share

This blog post explores why software vendors (try to) introduce new solutions for Reverse ETL, when Reverse ETL is really needed, and how it fits into the enterprise architecture. The involvement of event streaming to process data in motion is a key piece of Reverse ETL for real-time use cases.

