Apache Kafka became the de facto standard for processing data in motion across enterprises and industries. Cybersecurity is a key success factor across all use cases. Kafka is not just used as a backbone and source of truth for data. It also monitors, correlates, and proactively acts on events from real-time and batch data sources to detect anomalies and respond to incidents. This blog series explores use cases and architectures for Kafka in the cybersecurity space, including situational awareness, threat intelligence, forensics, air-gapped and zero trust environments, and SIEM / SOAR modernization. This post is part six: SIEM / SOAR Modernization.
This blog series explores why security features such as RBAC, encryption, and audit logs are only the foundation of a secure event streaming infrastructure. Learn about use cases, architectures, and reference deployments for Kafka in the cybersecurity space:
SIEM (Security information and event management) and SOAR (security orchestration, automation and response) are terms coined by Gartner (like so often in the industry).
SIEM combines security information management (SIM) and security event management (SEM). They provide analysis of security alerts generated by applications and network hardware. Vendors sell SIEM as software, as appliances, or as managed services; these products are also used for logging security data and generating reports for compliance purposes.
SOAR tools automate security incident management investigations via a workflow automation workbook. The cyber intelligence API enables the playbook to automate research related to the ticket (lookup potential phishing URL, suspicious hash, etc.). The first responder determines the criticality of the event. At this level, it is either a normal or an escalation event. SOAR includes security incident response platforms (SIRPs), Security orchestration and automation (SOA), and threat intelligence platforms (TIPs).
In summary, SIEM and SOAR are key pieces of a modern cybersecurity infrastructure. The capabilities, use cases, and architectures are different for every company.
In practice, many products in this area will mix these functions, so there will often be some overlap. Many commercial vendors also promote their own terminology.
The leaders in Gartner’s Magic Quadrant for SIEM 2021 are Exabeam, IBM, Securonix, Splunk, Rapid, LogRhythm. Elastic is a niche player for SIEM but very prevalent in Kafka architectures for other use cases.
The Gartner Market Guide for SOAR 2020 includes Anomali, Cyware, D3 Security, DFLabs, EclecticIQ, FireEye, Fortinet (CyberSponse), Honeycomb.
These are obviously not complete lists of SIEM and SOAR vendors. Even more complex: Gartner says, “SIEM vendors are adopting and acquiring/integrating SOAR solutions in their ecosystems”. Now, if you ask another research analyst or the vendors themselves, you will get even more different opinions 🙂
Hence, as always in the software business, do your own evaluation to solve your business problems. Instead of evaluating vendors, you might first check your pain points and capabilities that solve the problems. Capabilities include data aggregation, correlation, dashboards, alerting, compliance, forensic analysis to implement log analytics, threat detection, and incident management.
If you read some of the analyst reports or vendor websites/whitepapers, it becomes clear that the Kafka ecosystem also has many overlaps regarding capabilities.
SIEM and SOAR platforms provide you with various challenges:
The consequence is a complex and expensive spaghetti architecture with different proprietary protocols:
The first post of this blog series explored why Kafka helps a central streaming backbone to avoid such a costly and complex spaghetti architecture. But let’s dig deeper into this discussion.
A modern cybersecurity architecture has the following requirements:
Kafka is NOT a SIEM or SOAR. But it enables an open, real-time, and portable data architecture:
The following diagram how the event streaming platform fits into the enterprise architecture together with SIEM/SOAR and other applications:
Here are the benefits of this approach for an enterprise architecture:
Kafka-powered enterprise architecture is open and flexible—no need to try a big bang change. Business units change solutions based on their current need. Often, a combination of different technologies and migration from legacy to modern tools is the consequence. Streaming ETL as ingestion and pre-processing layer for SIEM/SOAR is one use case for Kafka.
However, ingestion and preprocessing for SIEM/SOAR is a tiny fraction of what Kafka can do with the traffic. Other business applications, ML platforms, BI tools, etc., can also consume the data from the event-based central nervous system. At their own speed. With their own technology. That’s what makes Kafka so unique and successful.
This section shows a few reference architectures and migration scenarios for SIEM and SOAR deployments.
Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine. Elasticsearch is developed in Java and dual-licensed under the source-available Server Side Public License and the Elastic license. Other parts fall under the proprietary (source-available) Elastic License.
The open-source Elasticsearch project is NOT a complete SIEM but often a key building block. That’s why it comes up in SIEM discussions regularly.
Most scalable Elasticsearch architectures leverage Kafka as the ingestion layer. Other tools like Logstash or Beam are okay for small deployments. But Kafka provides various advantages:
The following diagram shows the reference architecture for an end-to-end integration from data sources via Kafka into Elastic:
Splunk provides proprietary software. It is a leading SIEM player in the market. Splunk makes machine data accessible across an organization by identifying data patterns, providing metrics, diagnosing problems, and providing intelligence for business operations. The biggest complaint I hear about Splunk regularly is the costly licensing model. Also, the core of Splunk (like almost every other SIEM) is based on batch processing.
The combination of Kafka and Splunk reduces costs significantly. Here is the Confluent reference architecture:
A prominent example of combining Confluent and Splunk is Intel’s Cyber Intelligence Platform (CIP), which I covered in part 3 of this blog series.
The above architecture shows an open, flexible architecture. Splunk provides several integration points. But at its core, Splunk uses the proprietary S2S (“Splunk-to-Splunk”) protocol. All universal forwarders (UF) broadcast directly to indexers or heavy forwarders (HF). The Confluent Splunk S2S Source Connector provides a way to integrate Splunk with Apache Kafka. The connector receives data from Splunk UFs.
This approach allows customers to cost-effectively & reliably read data from Splunk Universal Forwarders to Kafka. It enables users to forward data from universal forwarders into a Kafka topic to unlock the analytical capabilities of the data.
The direct S2S integration is beneficial if a company does not have Kafka out front and the data goes straight to Splunk indexers. To leverage Kafka, the connector “taps” into the Universal Forwarder infrastructure. Often, companies have 10,000s of UFs. If you are in the lucky situation of not having hundreds or thousands of Splunk UFs in place, then the regular Splunk sink connector for Kafka Connect might be sufficient for you.
The data is processed, filtered, aggregated in real-time at scale with Kafka-native tools such as Kafka Streams or ksqlDB. The processed data is ingested into Splunk; and potentially many other real-time or batch consumers that are completely decoupled from Splunk.
Most customers I talk to don’t use just one tool for solving their cybersecurity challenges. For that reason, Kafka is the perfect backbone for true decoupling and ingestion layer for different SIEM and SOAR tools:
Kafka’s commit log stores the incoming data. Each consumer consumes the data as it can. In this example, different SIEM tools consume the same events. Elasticsearch and Splunk consume the same raw data in different near real-time or batch processes. Both are completely independent in how they consume the events.
IBM QRadar cannot process high volumes of data. Hence, ksqlDB continuously preprocesses the raw data and puts it into a new Kafka topic. QRadar consumes the aggregated data.
Obviously, SOAR can consume data similarly. As discussed earlier in this article, the SOAR functionality could also be part of one of the SIEMs. However, then it would (have to) consume the data in real-time to provide true situational awareness.
In the field, I see several reasons to migrate workloads away from a deployed SIEM:
Groupon published an exciting success story: “We Replaced Splunk at 100TB Scale in 120 Days“:
The new platform leverages Kafka for high volume processing in real-time, migration, and backpressure handling. Elasticsearch provides reports, analytics, and dashboards.
The article also covers the repeating message that Logstash is not ideal for these kinds of workloads. Hence, the famous ELK stack with Elasticsearch, Logstash, and Kibana in most real-world deployments is actually an EKK stack with Elasticsearch, Kafka, and Kibana.
This story should make you aware that Logstash, FluentD, Sumo Logic, Cribl, and other log analytics platforms are built for exactly this use case. Kafka-native processing enables the same but also many other use cases.
Another key advantage of Kafka is the ability to operate as a resilient, hybrid migration pathway from on-premise to one or multiple clouds. Confluent can be deployed everywhere to coordinate log traffic across multiple data centers and cloud providers. I explored hybrid Kafka architectures in another blog post in detail.
I covered various SIEMs in this post. SOAR is a more modern concept than SIEM. Hence, the awareness and real-world deployments are still limited. I am glad that I found at least one public example I can show you in this post.
Cortex Data Lake is a Kafka-native SOAR that collects, transforms, and integrates enterprise security data at scale in real-time. Billions of messages pass through their Kafka clusters. Confluent Schema Registry enforces data governance. Palo Alto Networks (PANW) has multiple Kafka clusters in production with a size from 10 to just under 100 brokers each. Check out Palo Alto Networks’ engineering blog for more details.
Here is the architecture of Cortex Data Lake:
PANW’s design principles overlap significantly with the unique characteristics of Apache Kafka:
If you look at these design principles, it is obvious why the backbone of PANW’s product is Kafka.
SIEM and SOAR are critical for every enterprise’s cybersecurity strategy. One SIEM/SOAR is typically not good enough. It is either not cost-efficient or includes scalability/performance issues.
Kafka-native SIEM/SOAR modernization is prevalent across industries. The central event-based backbone enables the integration with different SIEM/SOAR products. Other consumers (like ML platforms, BI tools, business applications) can also access the data. Innovative Kafka integrations like Confluent’s S2S connector enable the modernization of monolithic Splunk deployments and significantly reduce costs. Many next-generation SOARs such as PANW’s Cortex Data Lake are even based on top of Kafka.
Do you use any SIEM / SOAR? How and why do you use or plan to use the Kafka ecosystem together with these tools? How does your (future) architecture look like? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.
In the age of digitization, the concept of pricing is no longer fixed or manual.…
In the rapidly evolving landscape of intelligent traffic systems, innovative software provides real-time processing capabilities,…
In the fast-paced world of finance, the ability to prevent fraud in real-time is not…
Choosing between Apache Kafka, Azure Event Hubs, and Confluent Cloud for data streaming is critical…
In today's data-driven world, understanding data at rest versus data in motion is crucial for…
If you ask your favorite large language model, Microsoft Fabric appears to be the ultimate…