data warehouse Archives

When NOT to use Apache Kafka?

By Kai Waehner
4. January 2022

Apache Kafka is the de facto standard for event streaming to process data in motion. This blog post explores when NOT to use Apache Kafka. Which use cases are not a good fit for Kafka? What limitations does Kafka have? How do you rule Kafka out when it is not the right tool for the job?

Stream Exchange for Data Sharing with Apache Kafka in a Data Mesh

Streaming Data Exchange with Kafka and a Data Mesh in Motion

By Kai Waehner
14. November 2021

Data Mesh is a new architecture paradigm that gets a lot of buzz these days. This blog post looks deeper into this principle to explore why no single technology is the perfect fit to build a Data Mesh. Examples show why an open, scalable, decentralized real-time platform like Apache Kafka is often the heart of the Data Mesh infrastructure, complemented by many other data platforms to solve business problems.

Serverless Kafka for Data in Motion as Rescue for Data at Rest in the Data Lake

Serverless Kafka in a Cloud-native Data Lake Architecture

By Kai Waehner
25. June 2021

Apache Kafka has become the de facto standard for processing data in motion. Kafka is open, flexible, and scalable. Unfortunately, its scalability makes operations a challenge for many teams. Ideally, teams can use a serverless Kafka SaaS offering and focus on business logic. However, hybrid scenarios require a cloud-native platform that provides automated and elastic tooling to reduce the operations burden. This blog post explores how to leverage cloud-native and serverless Kafka offerings in a hybrid cloud architecture. We start from the perspective of data at rest in a data lake and explore its relation to data in motion with Kafka.

By Kai Waehner
13. February 2017

Data Preparation: Comparison of Programming Languages, Frameworks and Tools for Data Preprocessing and (Inline) Data Wrangling in Machine Learning / Deep Learning Projects.

By Kai Waehner
20. October 2016

Which log analytics framework or tool is the right one for monitoring distributed microservices? A comparison of open source, SaaS, and enterprise products, plus their relation to big data components such as Apache Hadoop and Spark.

By Kai Waehner
4. February 2016

Slide deck from OOP 2016: Comparison of Frameworks and Products for Big Data Log Analytics and ITOA, e.g. Open Source ELK, TIBCO LogLogic / Unity, Splunk, Papertrail; Relation to Hadoop is also discussed.

By Kai Waehner
9. October 2015

Data Warehouses have existed for many years in almost every company. While they are still as good and relevant for the same use cases as they were 20 years ago, they cannot solve new challenges, both those that already exist and those sure to come in an ever-changing digital world. The following sections clarify when to still use a Data Warehouse and when to use a modern Live Datamart instead.

By Kai Waehner
10. September 2014

The article discusses what stream processing is, how it fits into a big data architecture with Hadoop and a data warehouse (DWH), when stream processing makes sense, and what technologies and products you can choose from. Comparison of open source and proprietary stream processing / streaming analytics alternatives: Apache Storm, Spark, IBM InfoSphere Streams, TIBCO StreamBase, Software AG’s Apama, etc.

By Kai Waehner
13. May 2014

Slides from my talk “Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about Real Time?”…

By Kai Waehner
26. June 2013

In this blog post, I will show you how to "ETL" all kinds of data to Amazon's cloud data warehouse Redshift with Talend's big data components. You need not be a cloud or DWH expert, or an expert developer, to integrate with Redshift. It is very easy with Talend's integration solutions: just drag and drop, configure, and do some graphical mappings / transformations (if necessary). That's it. Code is generated, and the job runs. With Talend, you can easily "ETL" all data from different sources to Redshift and store it there for under $1,000 per terabyte per year, even with the open source version!

data warehouse

When NOT to use Apache Kafka?

Comparison: Data Preparation vs. Inline Data Wrangling in Machine Learning and Deep Learning Projects

Difference between a Data Warehouse and a Live Datamart?

Comparison of Stream Processing and Streaming Analytics Alternatives (Apache Storm, Spark, IBM InfoSphere Streams, TIBCO StreamBase, Software AG Apama)

“Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about Real Time?” – Slides (including TIBCO Examples) from JAX 2014 Online