Compacted Topic is a feature of Apache Kafka to persist and query the latest up-to-date event of a Kafka Topic. The log compaction and key/value search is simple, cost-efficient and scalable. This blog post shows in a success story of Intersport how some use cases store data long term in Kafka with no other database. The retailer requires accurate stock info across the supply chain, including the point of sale (POS) in all international stores.
Intersport International Corporation GmbH, commonly known as Intersport, is headquartered in Bern, Switzerland, but its roots trace back to Austria. Intersport is a global sporting goods retail group that operates a network of stores selling sports equipment, apparel, and related products. It is one of the world’s largest sporting goods retailers and has a presence in many countries around the world.
Intersport stores typically offer a wide range of products for various sports and outdoor activities, including sports clothing, footwear, equipment for sports such as soccer, tennis, skiing, cycling, and more. The company often partners with popular sports brands to offer a variety of products to its customers.
Intersport actively promotes sports and physical activity and frequently sponsors sports events and initiatives to encourage people to lead active and healthy lifestyles. The specific products and services offered by Intersport may vary from one location to another, depending on local market demand and trends.
The company automates and innovates continuously with software capabilities like fully automated replenishment, drop shipping, personalized recommendations for customers, and other applications.
Intersport presented its data streaming success story together with the system integrator DCCS at the Data in Motion Tour 2023 in Vienna, Austria.
Here is a summary about the deployment, use cases, and project lifecycle at Intersport:
No, Apache Kafka is NOT a database. Apache Kafka is a distributed streaming platform that is designed for building real-time data pipelines and streaming applications. Users frequently apply it for ingesting, processing, and storing large volumes of event data in real time.
Apache Kafka does not provide the traditional features associated with databases, such as random access to stored data or support for complex queries. If you need a database for storage and retrieval of structured data, you would typically use a database system like MySQL, PostgreSQL, MongoDB, or others with Kafka to address different aspects of your data processing needs.
However, Apache Kafka is a database if you focus on cost-efficient long-term storage and the replayability of historical data. I wrote a long article about the database characteristics of Apache Kafka. Read it to understand when (not) to use Kafka as a database. The emergence of Tiered Storage for Kafka created even more use cases.
In this blog post, I want to focus on one specific feature of Apache Kafka for long-term storage and query functionality: Compacted Topics.
Kafka is a distributed event streaming platform, and topics are the primary means of organizing and categorizing data within Kafka. “Compacted Topic” in Apache Kafka refers to a specific type of Kafka Topic configuration that is used to keep only the most recent value for each key within the topic.
In a compacted topic, Kafka ensures that, for each unique key, only the latest message (or event) associated with that key is retained. The system effectively discards older messages with the same key. A Compacted Topic is often used for scenarios where you want to maintain the latest state or record for each key. This can be useful in various applications, such as maintaining the latest user profile information, aggregating statistics, or storing configuration data.
Here are some key characteristics and use cases for compacted topics in Kafka:
Intersport stores all articles in Compacted Topics, i.e., with no retention time. Article records can change several times. Topic compaction cleans out outdated records. Only the most recent version is relevant.
A model comprises several SKUS as a nested array:
Intersport and DCCS presented their important points and benefits of leveraging Kafka. The central integration hub uses compacted topics for storing and retrieving articles:
Providing the right information at the right time is crucial across the supply chain. Data consistency matters, as not every system is real-time. This is one of the most underestimated sweet spots of Apache Kafka combining real-time messaging with a persistent event store.
Intersport is an excellent example in the retail industry for persisting information long-term in Kafka Topics leveraging Kafka’s feature “Compacted Topics“. The benefits are simple usage, cost-efficient event store of the latest up-to-date information, and fast key/value queries, and no need for another database. Hence, Kafka can replace a database for some specific scenarios, like storing and querying the inventory of each store at Intersport.
If you want to learn about other use cases and success stories for data streaming with Kafka and Flink in the retail industry, check out these articles:
How do you use data streaming with Kafka and Flink? What retail use cases did you implement? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.
Technologies like Real-Time Locating Systems (RTLS) and Digital Twin are transforming manufacturing processes in the…
As a global leader in the commercial vehicle sector, Daimler Truck is not only committed…
In the age of digitization, the concept of pricing is no longer fixed or manual.…
In the rapidly evolving landscape of intelligent traffic systems, innovative software provides real-time processing capabilities,…
In the fast-paced world of finance, the ability to prevent fraud in real-time is not…
Choosing between Apache Kafka, Azure Event Hubs, and Confluent Cloud for data streaming is critical…