The Internet of Things, cloud and mobile are the major drivers for stream processing. Use cases include network monitoring and intelligent surveillance, but also less technical ones such as inventory management or fraud detection. The book helps a lot to get a basic understanding of the history, concepts and patterns of the stream processing paradigm.
“Fundamentals of Stream Processing: Application Design, Systems, and Analytics” (www.amazon.com/Fundamentals-Stream-Processing-Application-Analytics/dp/1107015545) is one of only a few books available about stream processing. It was published in 2014 by Cambridge University Press. The authors are Henrique C. M. Andrade (JP Morgan, New York), Bugra Gedik (Bilkent University, Turkey) and Deepak S. Turaga (IBM Thomas J. Watson Research Center, New York).
If you are not sure whether stream processing is relevant for you, take a look at the slides of a talk I gave at a conference some weeks ago: “Hadoop and Data Warehouse – Friends, Enemies or Profiteers? What about Real Time?” (https://www.kai-waehner.de/blog/2014/05/13/hadoop-and-data-warehouse-dwh-friends-enemies-or-profiteers-what-about-real-time-slides-including-tibco-examples-from-jax-2014-online/). These slides explain why a big data architecture needs stream processing alongside a Data Warehouse and Apache Hadoop to realize “fast data” use cases. In short, stream processing solves the problem of a “too late architecture”.
The book starts with an introduction to stream processing that explains the motivation and need for this paradigm. Part 1 (Fundamentals) contains my favorite chapters of the book: you learn a lot about stream processing, why different academic and commercial projects and products emerged for it, and how they differ. The end of part 1 gives an overview of academic systems such as TelegraphCQ, STREAM, Aurora and Borealis. Afterwards, commercial systems such as TIBCO BusinessEvents and Oracle CEP are discussed, before the book switches over to “real” stream processing frameworks and products: IBM InfoSphere Streams and TIBCO StreamBase as proprietary alternatives, and Twitter Storm (now Apache Storm) and Yahoo S4 as open source options.
Part 2 (application development – data flow programming) discusses some basic concepts of stream processing such as flow composition (static, dynamic, nested), flow manipulation (operators, punctuations, windowing), modularity and extensibility. Part 3 discusses the architecture of a stream processing system and its applications. Part 4 goes into more detail discussing design principles and patterns for functional / non-functional topics and data processing / transformation. Part 5 describes three use cases from different verticals (general operations monitoring, healthcare and semiconductor).
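For readers who have never seen windowing, one of the flow manipulation concepts mentioned above, here is a minimal sketch in plain Python. It is not taken from the book (which uses SPL) and not tied to any product. The function names and the sensor readings are made up for illustration, and it only shows a simple count-based tumbling window.

```python
from typing import Iterable, Iterator, List


def tumbling_windows(stream: Iterable[float], size: int) -> Iterator[List[float]]:
    """Group a stream into consecutive, non-overlapping windows of `size` items.

    Hypothetical helper for illustration only; a trailing incomplete
    window is dropped.
    """
    window: List[float] = []
    for item in stream:
        window.append(item)
        if len(window) == size:
            yield window
            window = []  # start a fresh window, no overlap with the previous one


def average(window: List[float]) -> float:
    """A trivial aggregation operator applied to each window."""
    return sum(window) / len(window)


# Made-up sensor readings standing in for an unbounded stream.
readings = [10.0, 12.0, 14.0, 20.0, 22.0, 24.0, 30.0]
averages = [average(w) for w in tumbling_windows(readings, 3)]
print(averages)
```

Real stream processing engines offer much more here, for example time-based and sliding windows, but the basic idea of cutting an endless stream into finite chunks before aggregating is the same.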
Altogether, you get a good overview of the stream processing paradigm, including history, basic concepts, design principles and use cases.
The book has a lot of great content. However, you should be aware that it is focused on IBM InfoSphere Streams (http://www-03.ibm.com/software/products/en/infosphere-streams) and its programming language SPL. All screenshots and code examples (and there are a lot!) use these. That is absolutely legitimate, as the authors were involved in creating this product and language, or their academic ancestors.
If you want to or have to use another programming language, framework (such as Apache Storm) or product (such as TIBCO StreamBase), you can skip several parts of this book (all code examples, the chapter about the IBM InfoSphere Streams architecture, the implementation of the use cases, etc.). You should also keep this focus in mind when reading the first part of the book, which compares different products.
Nevertheless, the book is still worth reading, as you learn a lot about stream processing concepts in general.
There are only a few things to criticize. What I did not like:
If you want to use a great product, you should try out TIBCO StreamBase (http://www.tibco.com/products/event-processing/complex-event-processing/streambase-complex-event-processing/default.jsp), which has much better and easier tooling than IBM InfoSphere Streams. Watch out: I work for TIBCO, so this sentence might be biased as well. Try out both and decide for yourself! Besides IBM and TIBCO, in my opinion, there are no other good stream processing products on the market at the time of this writing.
There are some nice open source frameworks that support stream processing, such as Apache Storm (https://storm.incubator.apache.org/) or Apache Spark (http://spark.apache.org/), but they lack good tool support, maturity and commercial support.
This book is not well suited for evaluating different products, as it is very IBM-biased. Nevertheless, I really enjoyed reading the chapters about history, concepts and design patterns, and I learned a lot about stream processing in general. Since there are not many books about stream processing to choose from, my recommendation is to buy this one, even if you do not care about IBM’s product and want to use another one such as TIBCO StreamBase.
Feel free to contact me with any feedback or for discussion via Twitter (@KaiWaehner), email (kontakt@kai-waehner.de) or social networks (LinkedIn, Xing).
Kai Wähner