Apache Iceberg Open Table Format for Data Lake Lakehouse Streaming wtih Kafka Flink Databricks Snowflake AWS GCP Azure
Read More

Apache Iceberg – The Open Table Format for Lakehouse AND Data Streaming

An open table format framework like Apache Iceberg is essential in the enterprise architecture to ensure reliable data management and sharing, seamless schema evolution, efficient handling of large-scale datasets and cost-efficient storage. This blog post explores market trends, adoption of table format frameworks like Iceberg, Hudi, Paimon, Delta Lake and XTable, and the product strategy of leading vendors of data platforms such as Snowflake, Databricks (Apache Spark), Confluent (Apache Kafka / Flink), Amazon Athena and Google BigQuery.
Read More

TIBCO BusinessWorks and StreamBase for Big Data Integration and Streaming Analytics with Apache Hadoop and Impala

Apache Hadoop is getting more and more relevant. Not just for big data processing (e.g. MapReduce), but also in fast data processing (e.g. stream processing). Recently, I published two blog posts on the TIBCO blog to show how you can leverage TIBCO BusinessWorks 6 and TIBCO StreamBase to realize big data and fast data Hadoop use cases.
Read More

You are not Facebook or Google? Why you should still care about Big Data and Apache Hadoop Ecosystem (Pig, Hive, Hortonworks, Cloudera, MapR, Informatica, Talend)

In March 2013, I was at 33rd Degree – “A Conference for Java Masters”. I had two talks, including a new one: “You are not Facebook or Google? Why you should still care about Big Data”. It is a great talk to give an overview about big data, especially from a business perspective (paradigm shift, business value, challenges). However, I also talk about alternatives for big data from a technology perspective, mainly about the defacto standard Apache Hadoop, its ecosystem, distributions, and tooling (i.e. big data suites).
Read More