Hive Archives - Kai Waehner

Apache Iceberg – The Open Table Format for Lakehouse AND Data Streaming

By Kai Waehner
13. July 2024

An open table format framework like Apache Iceberg is essential in enterprise architecture to ensure reliable data management and sharing, seamless schema evolution, efficient handling of large-scale datasets, and cost-efficient storage. This blog post explores market trends, the adoption of table format frameworks like Iceberg, Hudi, Paimon, Delta Lake and XTable, and the product strategy of leading data platform vendors such as Snowflake, Databricks (Apache Spark), Confluent (Apache Kafka / Flink), Amazon Athena and Google BigQuery.

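TIBCO BusinessWorks and StreamBase for Big Data Integration and Streaming Analytics with Apache Hadoop and Impala

By Kai Waehner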
14. April 2015

Apache Hadoop is becoming increasingly relevant, not just for big data processing (e.g., MapReduce) but also for fast data processing (e.g., stream processing). Recently, I published two blog posts on the TIBCO blog showing how you can leverage TIBCO BusinessWorks 6 and TIBCO StreamBase to realize big data and fast data Hadoop use cases.

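“Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about Real Time?” – Slides (including TIBCO Examples) from JAX 2014 Online

By Kai Waehner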
13. May 2014

Slides from my talk “Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about Real Time?”…

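You are not Facebook or Google? Why you should still care about Big Data and Apache Hadoop Ecosystem (Pig, Hive, Hortonworks, Cloudera, MapR, Informatica, Talend)

By Kai Waehner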
14. March 2013

In March 2013, I attended 33rd Degree – “A Conference for Java Masters”. I gave two talks, including a new one: “You are not Facebook or Google? Why you should still care about Big Data”. It is a great talk for giving an overview of big data, especially from a business perspective (paradigm shift, business value, challenges). I also discuss big data alternatives from a technology perspective, mainly the de facto standard Apache Hadoop, its ecosystem, distributions, and tooling (i.e., big data suites).

Hive

Apache Iceberg – The Open Table Format for Lakehouse AND Data Streaming

TIBCO BusinessWorks and StreamBase for Big Data Integration and Streaming Analytics with Apache Hadoop and Impala

“Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about Real Time?” – Slides (including TIBCO Examples) from JAX 2014 Online

You are not Facebook or Google? Why you should still care about Big Data and Apache Hadoop Ecosystem (Pig, Hive, Hortonworks, Cloudera, MapR, Informatica, Talend)
