Shift Left Architecture Archives

1.3K views
5 minute read

Online Model Training and Model Drift in Machine Learning with Apache Kafka and Flink

By Kai Waehner
23. February 2025

The rise of real-time AI and machine learning is reshaping the competitive landscape. Traditional batch-trained models struggle with model drift, leading to inaccurate predictions and missed opportunities. Platforms like Apache Kafka and Apache Flink enable continuous model training and real-time inference, which keeps predictions up to date and accurate. This blog post explores TikTok’s groundbreaking AI architecture, its use of data streaming for real-time recommendations, and how businesses can leverage Kafka and Flink to modernize their ML pipelines. I also examine how data streaming complements platforms like Databricks, Snowflake, and Microsoft Fabric to create scalable, adaptive AI systems.

15.0K views
11 minute read

Apache Iceberg – The Open Table Format for Lakehouse AND Data Streaming

By Kai Waehner
13. July 2024

An open table format framework like Apache Iceberg is essential in an enterprise architecture to ensure reliable data management and sharing, seamless schema evolution, efficient handling of large-scale datasets and cost-efficient storage. This blog post explores market trends, the adoption of table format frameworks like Iceberg, Hudi, Paimon, Delta Lake and XTable, and the product strategy of leading data platform vendors such as Snowflake, Databricks (Apache Spark), Confluent (Apache Kafka / Flink), Amazon Athena and Google BigQuery.