The Shift Left Architecture
Read More

The Shift Left Architecture – From Batch and Lakehouse to Real-Time Data Products with Data Streaming

Data integration is a hard challenge in every enterprise. Batch processing and Reverse ETL are common practices in a data warehouse, data lake or lakehouse. Data inconsistency, high compute cost, and stale information are the consequences. This blog post introduces a new design pattern to solve these problems: The Shift Left Architecture enables a data mesh with real-time data products to unify transactional and analytical workloads with Apache Kafka, Flink and Iceberg. Consistent information is handled with streaming processing or ingested into Snowflake, Databricks, Google BigQuery, or any other analytics / AI platform to increase flexibility, reduce cost and enable a data-driven company culture with faster time-to-market building innovative software applications.
Read More
RAG and Kafka Flink to Prevent Hallucinations in GenAI
Read More

Real-Time GenAI with RAG using Apache Kafka and Flink to Prevent Hallucinations

How do you prevent hallucinations from large language models (LLMs) in GenAI applications? LLMs need real-time, contextualized, and trustworthy data to generate the most reliable outputs. This blog post explains how RAG and a data streaming platform with Apache Kafka and Flink make that possible. A lightboard video shows how to build a context-specific real-time RAG architecture. Also, learn how the travel agency Expedia leverages data streaming with Generative AI using conversational chatbots to improve the customer experience and reduce the cost of service agents.
Read More
My Data Streaming Journey with Kafka and Flink - 7 Years at Confluent
Read More

My Data Streaming Journey with Kafka & Flink: 7 Years at Confluent

Time flies… I joined Confluent seven years ago when Apache Kafka was mainly used by a few tech giants and the company had ~100 employees. This blog post explores my data streaming journey, including Kafka becoming a de facto standard for over 100,000 organizations, Confluent doing an IPO on the NASDAQ stock exchange, 5000+ customers adopting a data streaming platform, and emerging new design approaches and technologies like data mesh, GenAI, and Apache Flink. I look at the past, present and future of my personal data streaming journey. Both, from the evolution of technology trends and the journey as a Confluent employee that started in a Silicon Valley startup and is now part of a global software and cloud company.
Read More
Apache Kafka and Snowflake Cost Efficiency and Data Governance
Read More

Apache Kafka + Flink + Snowflake: Cost Efficient Analytics and Data Governance

Snowflake is a leading cloud data warehouse and transitions into a data cloud that enables various use cases. The major drawback of this evolution is the significantly growing cost of the data processing. This blog post explores how data streaming with Apache Kafka and Apache Flink enables a “shift left architecture” where business teams can reduce cost, provide better data quality, and process data more efficiently. The real-time capabilities and unification of transactional and analytical workloads using Apache Iceberg’s open table format enable new use cases and a best of breed approach without a vendor lock-in and the choice of various analytical query engines like Dremio, Starburst, Databricks, Amazon Athena, Google BigQuery, or Apache Flink.
Read More
When NOT to use Apache Kafka
Read More

When NOT to Use Apache Kafka? (Lightboard Video)

Apache Kafka is the de facto standard for data streaming to process data in motion. With its significant adoption growth across all industries, I get a very valid question every week: When NOT to use Apache Kafka? What limitations does the event streaming platform have? When does Kafka simply not provide the needed capabilities? How to qualify Kafka out as it is not the right tool for the job? This blog post contains a lightboard video that gives you a twenty-minute explanation of the DOs and DONTs.
Read More
The Past Present and Future of Stream Processing
Read More

The Past, Present and Future of Stream Processing

Stream processing has existed for decades. The adoption grows with open source frameworks like Apache Kafka and Flink in combination with fully managed cloud services. This blog post explores the past, present and future of stream processing, including the relation of machine learning and GenAI, streaming databases, and the integration between data streaming and data lakes with Apache Iceberg.
Read More
ESG and Sustainability powered by Data Streaming with Apache Kafka and Flink
Read More

Green Data, Clean Insights: How Kafka and Flink Power ESG Transformations

This blog post explores the synergy between Environmental, Social, and Governance (ESG) principles and Kafka and Flink’s real-time data processing capabilities, unveiling a powerful alliance that transforms intentions into impactful change. Beyond just buzzwords, real-world deployments architectures across industries show the value of data streaming for better ESG ratings.
Read More
GenAI Demo with Kafka, Flink, LangChain and OpenAI
Read More

GenAI Demo with Kafka, Flink, LangChain and OpenAI

Generative AI (GenAI) enables automation and innovation across industries. This blog post explores a simple but powerful architecture and demo for the combination of Python, and LangChain with OpenAI LLM, Apache Kafka for event streaming and data integration, and Apache Flink for stream processing. The use case shows how data streaming and GenAI help to correlate data from Salesforce CRM, searching for lead information in public datasets like Google and LinkedIn, and recommending ice-breaker conversations for sales reps.
Read More
Data Streaming Landscape 2024 around Kafka Flink and Cloud
Read More

The Data Streaming Landscape 2024

The research company Forrester defines data streaming platforms as a new software category in a new Forrester Wave. Apache Kafka is the de facto standard used by over 100,000 organizations. Plenty of vendors offer Kafka platforms and cloud services. Many complementary open source stream processing frameworks like Apache Flink and related cloud offerings emerged. And competitive technologies like Pulsar, Redpanda, or WarpStream try to get market share leveraging the Kafka protocol. This blog post explores the data streaming landscape of 2024 to summarize existing solutions and market trends. The end of the article gives an outlook to potential new entrants in 2025.
Read More