Apache Hadoop is becoming increasingly relevant, not just for Big Data processing (e.g. MapReduce) but also for Fast Data processing (e.g. stream processing). Recently, I published two blog posts on the TIBCO blog that show how you can leverage TIBCO BusinessWorks 6 and TIBCO StreamBase to realize Big Data and Fast Data Hadoop use cases.
TIBCO ActiveMatrix BusinessWorks 6 + Apache Hadoop = Big Data Integration
Apache Hadoop was built for processing complex computations on Big Data stores (that is, terabytes to petabytes) with a MapReduce distributed computation model that runs easily on cheap commodity hardware.
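To make the MapReduce model concrete, here is a minimal single-process sketch of the classic word-count job in plain Python. It is only an illustration of the map, shuffle, and reduce phases under simplifying assumptions (all data fits in memory, one process plays the role of the whole cluster); it does not use Hadoop's actual Java MapReduce API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    # Mapper (hypothetical name): emit a (word, 1) pair for every word in one input split.
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key, which Hadoop does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reducer (hypothetical name): aggregate all values that belong to one key.
    return key, sum(values)


def word_count(splits):
    # In a real cluster, each split would be mapped on a different commodity node.
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle_phase(mapped)
    # Sort keys so the result is deterministic, mimicking Hadoop's sorted reducer input.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    splits = ["big data fast data", "Hadoop processes big data"]
    print(word_count(splits))
    # → {'big': 2, 'data': 3, 'fast': 1, 'hadoop': 1, 'processes': 1}
```

The point of the model is that the map and reduce functions are independent per split and per key, so the framework can spread them across many cheap machines and handle the grouping and fault tolerance itself.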
A Hadoop distribution from vendors such as Hortonworks, Cloudera, or MapR packages different projects of the Hadoop ecosystem and ensures that the bundled versions work together smoothly. On top of the packaging, Hadoop vendors offer tooling for deploying, administering, and monitoring Hadoop clusters. Commercial support completes their offerings.
The key challenge is to integrate the input and results of Hadoop processing into the rest of the enterprise. With a Hadoop distribution alone, building these integration services requires a lot of complex coding.
Continue here for the full article: TIBCO ActiveMatrix BusinessWorks 6 + Apache Hadoop = Big Data Integration
TIBCO StreamBase + Hadoop + Impala = Fast Data Streaming Analytics
Hadoop is evolving quickly and is no longer used only for batch processing. YARN, Storm, Spark, and several other solutions introduce modern paradigms to Hadoop. However, some problems remain:
- No good, easy-to-use development tooling for Hadoop ecosystem components such as Hive, Storm, and Spark
- Missing maturity (many alpha, beta, and 0.x versions), especially in management and monitoring tools, security, connectivity, and APIs
- No “real time” (seconds, milliseconds, or microseconds), only “near real time” (several seconds or more, and much more when recovering from infrastructure faults)
- No operational analytics (human monitoring and proactive actions)
So why not combine the great benefits of Hadoop with the Fast Data streaming analytics tool TIBCO StreamBase, which offers mature, mission-critical deployments across several industries, great graphical tooling, and operational real-time analytics (via TIBCO Live Datamart on top of StreamBase)?
This post shows how to realize a Fast Data use case with TIBCO StreamBase and the Hadoop framework’s Impala analytical database quickly and easily.
Continue here for the full article: TIBCO StreamBase + Hadoop + Impala = Fast Data Streaming Analytics
For a general introduction to Stream Processing and Streaming Analytics, I recommend the InfoQ article: Real-Time Stream Processing as Game Changer in a Big Data World with Hadoop and Data Warehouse.
As always, I appreciate any feedback…
Kai Waehner
builds cloud-native event streaming infrastructures for real-time data processing and analytics