Condition Monitoring and Predictive Maintenance with Apache Kafka

Apache Kafka for Condition Monitoring and Predictive Maintenance in Industrial IoT

The manufacturing industry is moving away from just selling machinery, devices, and other hardware. Software and services increase revenue and margins. A former cost center becomes a profit center for innovation. Equipment-as-a-Service (EaaS) even outsources the maintenance to the vendor. This paradigm shift is only possible with reliable and scalable real-time data processing leveraging an event streaming platform such as Apache Kafka. This post explores how the next generation of software for Condition Monitoring and Predictive Maintenance can help build new innovative products and improve the OEE for customers.

Condition Monitoring and Predictive Maintenance

Let’s define the two terms first, as no standard definition exists. Some literature sees condition monitoring as a major component of predictive maintenance. Others interpret predictive maintenance as the more modern approach that leverages machine learning. Both terms are sometimes used as synonyms, too.

Modern Maintenance Strategies and Goals

The main goal of modern maintenance strategies is a more efficient and optimized usage of machines and resources. Reactive maintenance and time-based or usage-based preventive measures are suboptimal. Therefore, modern condition-based maintenance strategies take over.

Industrial IoT / Industry 4.0 enable several benefits on the shop floor level:

  • Maintain instead of repair
  • No (un)planned downtime
  • Maintenance optimizations and no unnecessary work
  • No negative financial impact
  • Optimized productivity
  • Improved overall equipment effectiveness (OEE)
  • Move from an isolated to a company-wide view
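OEE is commonly calculated as the product of availability, performance, and quality. Here is a minimal sketch of that calculation. The shift numbers are invented for illustration only:

```python
def oee(planned_minutes, downtime_minutes, ideal_cycle_seconds, total_units, good_units):
    """Return (availability, performance, quality, oee) as fractions between 0 and 1."""
    run_minutes = planned_minutes - downtime_minutes
    availability = run_minutes / planned_minutes
    # Performance compares the ideal time for the produced units with the actual run time.
    performance = (ideal_cycle_seconds * total_units) / (run_minutes * 60)
    quality = good_units / total_units
    return availability, performance, quality, availability * performance * quality


# Hypothetical shift: 480 planned minutes, 60 minutes of downtime,
# a 30-second ideal cycle time, 756 units produced, 718 of them good.
a, p, q, score = oee(480, 60, 30, 756, 718)
print(f"availability={a:.3f} performance={p:.3f} quality={q:.3f} OEE={score:.3f}")
```

Less downtime raises availability directly, which is why condition-based maintenance shows up in the OEE number.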

The machine operator is interested in the following questions:

  • Is the machine running normally? (Detect anomalies, classify errors)
  • How long can the engine still run? (Remaining useful life – RUL, time to the first failure)
  • Why does the machine run abnormally? (Sensor monitoring, root cause analysis)
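The remaining useful life question can be illustrated with a deliberately simple sketch: fit a straight line to a degradation indicator and extrapolate it to a failure threshold. Real RUL models are usually far more sophisticated. The vibration readings, the threshold, and the function name are assumptions made up for this example:

```python
def estimate_rul(hours, readings, failure_threshold):
    """Extrapolate a least-squares line through (hours, readings) to the threshold.

    Returns the estimated remaining hours after the last observation,
    or None if the indicator is not trending upwards.
    """
    n = len(hours)
    mean_x = sum(hours) / n
    mean_y = sum(readings) / n
    sxx = sum((x - mean_x) ** 2 for x in hours)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, readings))
    slope = sxy / sxx
    if slope <= 0:
        return None  # no upward degradation trend, nothing to extrapolate
    intercept = mean_y - slope * mean_x
    hours_at_threshold = (failure_threshold - intercept) / slope
    return max(0.0, hours_at_threshold - hours[-1])


# Hypothetical vibration amplitude (mm/s), sampled every 100 operating hours.
hours = [0, 100, 200, 300, 400]
vibration = [2.0, 2.5, 3.0, 3.5, 4.0]
rul = estimate_rul(hours, vibration, failure_threshold=7.0)
print(f"estimated remaining useful life: {rul:.0f} hours")
```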

Condition Monitoring and Predictive Maintenance

Condition monitoring is the process of monitoring a condition parameter in machinery (vibration, temperature, etc.) to identify a significant change that indicates a developing fault. It is a major component of predictive maintenance. Condition monitoring allows scheduling maintenance or taking other actions to prevent consequential damage. It has a unique benefit: it addresses conditions that would shorten the expected lifespan before they develop into a major failure.

Predictive maintenance techniques help determine the condition of in-service equipment to estimate when maintenance is necessary. The central promise of predictive maintenance is to allow convenient scheduling of corrective maintenance and prevent unexpected equipment failures.

TL;DR: Both approaches promise cost savings over routine or time-based preventive maintenance because maintenance tasks are performed only when warranted. However, modern maintenance means digitalization. That does not come for free.

Condition monitoring and predictive maintenance only work well if the infrastructure and software are reliable, scalable, and real-time. Getting there requires a reasonable risk and cost analysis to plan the total cost of ownership (TCO) and return on investment (ROI).

Equipment as a Service (EaaS) as a New Business Model

Equipment-as-a-Service (EaaS) is a business model that involves renting out equipment to end-users and collecting periodic subscription payments for using the equipment.

This service-driven business model, also known as Machine-as-a-Service, provides a variety of benefits to both sides:

  • The EaaS provider (OEMs and machine builders) can improve the product design (R&D, digital twin, etc.), plan recurring revenue, and provide predictive maintenance services.
  • The customer (manufacturers) can optimize machine utilization and productivity (with the help of the EaaS software) and reduce the overall cost (moving Capital Expenditures (CapEx) to Operating Expenses (OpEx) and reducing operations costs).

EaaS is only a successful business model if condition monitoring and predictive maintenance are stable 24/7 and continuously collect, process, and analyze incoming data streams.

Apache Kafka for Industrial IoT / Industry 4.0

Apache Kafka is the de facto standard for event streaming. Industrial IoT / Industry 4.0 deployments across the globe use event streaming in edge and hybrid cloud deployments. Here is an example of a smart factory architecture that combines event streaming in the public cloud, factories, and at the edge:
Hybrid Edge to Cloud Architecture for Low Latency with 5G Kafka and AWS Wavelength
Kafka belongs to the information technology (IT) side. It collects data from operational technology (OT) devices and machines at the edge. Kafka is soft real-time and not suitable for embedded systems or robotics. If you wonder how the two relate, read the post “Apache Kafka is NOT Hard Real-Time BUT Used Everywhere in Automotive and Industrial IoT“.
Nevertheless, Kafka is suitable for mission-critical low-latency use cases such as condition monitoring and predictive maintenance where the end-to-end latency is a few milliseconds. Here is an example leveraging 5G together with Kafka and ksqlDB on Kubernetes:
Low Latency 5G Use Cases with AWS Wavelength based on AWS Outposts and Confluent

Data in Motion with Event Streaming and Stream Processing

Condition monitoring and predictive maintenance require an event-based architecture to collect, process, and analyze data in motion. Traditional IIoT platforms are proprietary, inflexible, often not scalable, and hard to integrate across different vendors and standards. In contrast, Kafka-native stream processing is an open, flexible, and scalable technology to implement data integration and processing across IoT interfaces.
Let’s look at two examples: Stateless condition monitoring with Kafka Streams and predictive maintenance with ksqlDB and TensorFlow. To be clear: These are just examples. Any other technology can be integrated (with its pros and cons), like Apache Flink for stream processing, cloud-based ML platforms, proprietary IoT edge platforms for the last-mile integration, etc.
Here is the basic setup to build condition monitoring and predictive maintenance with Kafka:
Sensor Events from Machines PLCs Scada IoT
On the left side, we see the Kafka log that stores and forwards events. On the right side, various machines send sensor data in real-time. This architecture works at any scale and in real-time. Some Confluent customers leverage Confluent Cloud to process 10 GB and more per second.
The IoT integration between machines, PLCs, sensors, etc., is implemented either with Kafka Connect or with other APIs for MQTT, OPC-UA, REST/HTTP, files, or any other open or proprietary interface. That’s not the topic of this post. “Kafka and PLC4x for Industrial IoT Integration” and “Kafka as a Modern Data Historian” are great resources to learn more. Let’s now explore the two examples.

Stateless Condition Monitoring with Kafka Streams

The following diagram shows Kafka-native condition monitoring analyzing temperature spikes in real-time:
Stateless Condition Monitoring with Kafka Streams
The example is implemented with Kafka Streams, a Java library that can be embedded into any Java application. The business logic continuously monitors the sensor data and processes high volumes in real-time. However, only relevant events showing temperature spikes over 100 degrees are forwarded to another Kafka topic. Any interested consumer can read them from there, for instance, a real-time alerting system or a batch report.
The application is stateless. It processes event by event. This capability alone is already compelling for streaming ETL, such as filtering or transformations. More complex business logic is also possible within the application.
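The per-event logic described above can be sketched without any Kafka dependency. The following Python snippet is not the Kafka Streams API. It only mirrors what a filter step between an input and an output topic would do. The event field names (`machine_id`, `temperature`) and the helper names are assumptions for illustration:

```python
import json

SPIKE_THRESHOLD = 100.0  # degrees, the threshold used in the example above


def is_temperature_spike(event):
    """Stateless predicate: the decision depends only on the current event."""
    return event["temperature"] > SPIKE_THRESHOLD


def forward_spikes(raw_messages):
    """Yield only the messages that represent a temperature spike.

    `raw_messages` stands in for the input topic; the yielded messages
    stand in for what would be written to the output topic.
    """
    for raw in raw_messages:
        event = json.loads(raw)
        if is_temperature_spike(event):
            yield raw


# Hypothetical sensor events; the field names are assumptions.
incoming = [
    '{"machine_id": "press-1", "temperature": 87.5}',
    '{"machine_id": "press-1", "temperature": 104.2}',
    '{"machine_id": "press-2", "temperature": 100.0}',
    '{"machine_id": "press-2", "temperature": 131.0}',
]
alerts = list(forward_spikes(incoming))
for alert in alerts:
    print(alert)
```

Because no state is carried between events, such a step can be scaled out by simply running more instances of the application.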

Stateful Predictive Maintenance with ksqlDB

While stateless stream processing is already powerful, stateful stream processing solves even more business problems. The following example shows how a Kafka-native ksqlDB microservice implements stateful stream processing to detect anomalies continuously:
Stateful Predictive Maintenance with Kafka and ksqlDB
A one-hour sliding window continuously aggregates the temperature spikes from sensors. Consumers use the data in real-time to proactively act on defined thresholds. For instance, the data science team could have analyzed historical data to determine that more than ten temperature spikes with an average of over 100 degrees significantly increase the risk of an outage. In that case, the machine operator is alerted in real-time to do maintenance.
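To make the stateful part concrete, here is a simplified, dependency-free Python sketch of the windowed rule described above. It is not ksqlDB syntax and it ignores what a real deployment gets from Kafka (fault-tolerant state, partitioning, out-of-order handling). It assumes events arrive in timestamp order, and the class and constant names are made up for this example:

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3600   # one-hour window
MAX_SPIKES = 10         # alert when there are more than ten spikes ...
AVG_THRESHOLD = 100.0   # ... whose average temperature exceeds 100 degrees


class SpikeWindow:
    """Keeps the last hour of temperature spikes per machine in memory."""

    def __init__(self):
        # machine_id -> deque of (timestamp_seconds, temperature)
        self._spikes = defaultdict(deque)

    def add(self, machine_id, timestamp, temperature):
        """Record one spike and return True if the machine should be flagged."""
        window = self._spikes[machine_id]
        window.append((timestamp, temperature))
        # Evict spikes that have fallen out of the one-hour window.
        while window and window[0][0] <= timestamp - WINDOW_SECONDS:
            window.popleft()
        average = sum(temp for _, temp in window) / len(window)
        return len(window) > MAX_SPIKES and average > AVG_THRESHOLD


# Hypothetical input: one spike every five minutes for machine "press-1".
window = SpikeWindow()
results = [window.add("press-1", ts, 105.0) for ts in range(0, 11 * 300, 300)]
print(results)

# A single spike three hours later starts from an empty window again.
later = window.add("press-1", 3 * 3600, 105.0)
print(later)
```

The difference from the stateless example is the per-machine state: the result for an event depends on the events that came before it.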

Applied Machine Learning in Real-time with Kafka and TensorFlow

Simple business logic already solves many problems and improves the OEE and maintenance processes. Machine learning adds further “magic” to make condition monitoring and predictive maintenance even better.
The great news is that the architecture does not need to change. Analytic models can be embedded into a Kafka application like any other business logic. I have covered Kafka and Artificial Intelligence (AI) / Machine Learning (ML) / Deep Learning (DL) in many earlier posts, which go into more detail.
Here is an example with ksqlDB and an embedded TensorFlow model:
Real Time Machine Learning with Kafka KSQL and TensorFlow
A ksqlDB user-defined function (UDF) embeds the model. This model uses an unsupervised autoencoder for anomaly detection in real-time within the Kafka application. Supervised algorithms are possible the same way.
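The scoring idea behind autoencoder-based anomaly detection is to flag an event when the model reconstructs it poorly. The Python sketch below shows only that idea. It is not the ksqlDB UDF API and it contains no TensorFlow model: `fake_autoencoder` is a stand-in, and the threshold and feature values are invented for illustration:

```python
def reconstruction_error(original, reconstructed):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def make_anomaly_scorer(reconstruct, threshold):
    """Wrap a model's reconstruct function into a per-event scoring function."""
    def is_anomaly(features):
        return reconstruction_error(features, reconstruct(features)) > threshold
    return is_anomaly


# Stand-in for a trained autoencoder (NOT a real model): it "reconstructs"
# every input as the normal operating point it is assumed to have learned.
NORMAL_PROFILE = [0.5, 0.5, 0.5]


def fake_autoencoder(features):
    return NORMAL_PROFILE


is_anomaly = make_anomaly_scorer(fake_autoencoder, threshold=0.05)
print(is_anomaly([0.52, 0.48, 0.50]))  # close to normal operation
print(is_anomaly([0.95, 0.10, 0.50]))  # far from normal operation
```

In a real deployment, the stand-in would be replaced by the trained model loaded inside the UDF, and the threshold would come from the data science team's analysis.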
This architecture solves the impedance mismatch between the data science team and production engineers. Data scientists use Python and a Jupyter notebook for rapid prototyping and model development. The production team deploys the ksqlDB query in a cluster for real-time scoring at scale. You can learn from an excellent GitHub project that implements this separation of concerns with a Kappa architecture for a Connected Car infrastructure to do predictive maintenance with MQTT and Kafka:
Kappa Architecture with Apache Kafka MQTT Kubernetes and Tensorflow for Streaming Machine Learning

Equipment-as-a-Service with Fully-Managed Kafka

Many manufacturers created a new business model: Equipment-as-a-Service (EaaS). Think about it: Many buyers do not want to operate machines and worry about maintenance. McKinsey published an excellent report about industry trends that shows why manufacturers want to provide machinery and devices as a service and get good margins:
McKinsey Report about Equipment as a Service
EaaS takes over this burden from the buyer. The machine vendor continuously monitors whether the engine or other components need maintenance. Late maintenance means an irreparable engine. Early maintenance means higher costs. The solution is to determine the service life of the engine and choose optimal maintenance times. Hence, the machine vendor has to provide this subscription maintenance service as well as it can, both in its own interest and for a better customer experience.
Many manufacturers use Kafka and event streaming for their next-generation software solutions that run on top of the machinery or in the cloud connecting to it. Many modern IIoT services leverage a fully-managed and truly serverless Kafka solution like Confluent Cloud. The vendors want/need to focus on the business problems, not operating the infrastructure for event streaming.
Digital twins play a vital role in this discussion, no matter if you use the buzzword or just the concepts behind it 🙂 I have written several articles about fully-managed Kafka for building machine-as-a-service offerings with digital twins.

Video Recording – Apache Kafka in Industrial IoT

A video recording on YouTube walks you through the use case of condition monitoring and predictive maintenance with the Kafka ecosystem.

Event Streaming for Next-Generation IoT Platforms and Equipment Services

This post showed how event streaming with the Kafka ecosystem enables new business models for manufacturers to sell machinery. Kafka-native stream processing allows using a single technology for different use cases such as condition monitoring or predictive maintenance. Stateless and stateful streaming analytics help make proactive and predictive decisions in real-time at scale. This architecture is possible everywhere: in one or multiple clouds and/or regions, on-premise in data centers, at the edge outside the data center, or in any combination of hybrid architectures.
Of course, other necessary use cases not covered here include integration with ERP and MES systems, such as direct connectivity between Kafka and SAP. Also, when you think about condition monitoring and predictive maintenance, not all data comes from sensors and interfaces such as OPC-UA or MQTT. Image, video, and sound processing are part of many scenarios. Kafka can handle large messages (with some trade-offs). Learn how and where this makes sense in a dedicated blog post.
How do you leverage event streaming at the shop floor level for condition monitoring and predictive maintenance? What technologies and architectures do you use? What projects have you already worked on or are you planning? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.
