An advertising platform requires real-time capabilities to provide dynamic targeting, ad personalization, ad fraud detection, budget allocation, and event-driven marketing. This blog post explores how data streaming with Apache Kafka and Apache Flink enables context-specific advertising at any scale. Real-world success stories from Pinterest, Uber, Reddit, Unity, Buzzvil, and TV-Insight show different solutions and architectures for serving ads in marketing campaigns, embedded into mobile apps, and as SaaS software products.
An advertising (ads) platform is a digital system or service that allows businesses and advertisers to create, manage, and optimize their advertising campaigns across various channels. These platforms provide tools and features to target specific audiences, allocate budgets, track performance, and measure the effectiveness of advertising efforts.
Examples of advertising platforms include Google Ads, Facebook Ads, and programmatic advertising platforms that automate ad placement across websites and apps. These platforms play a crucial role in digital marketing, enabling advertisers to reach their target audience online and achieve their marketing objectives.
Navigating these challenges requires a data-driven platform, a deep understanding of the digital advertising landscape, constant monitoring, and continuous optimization.
An advertising platform should be real-time for several important reasons.
In today’s fast-paced digital advertising landscape, where user behavior and market conditions can change rapidly, real-time capabilities are essential for advertisers to stay competitive, make data-driven decisions, and maximize the impact of their advertising campaigns. Real-time advertising platforms empower advertisers to be more agile, responsive, and effective in reaching their target audience.
Apache Kafka combines real-time messaging at any scale with true decoupling through its event store. The data streaming platform collects data, correlates real-time and historical events with stream processing, and shares the resulting information with downstream consumers.
One of the most underestimated capabilities of Apache Kafka is its out-of-the-box support for data consistency across real-time and non-real-time systems. The heart of the enterprise architecture is real-time, scalable, and reliable. But any near real-time, batch, or request-response application can produce or consume at its own pace with its own API or programming language.
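To illustrate this decoupling, here is a minimal Java sketch of a downstream consumer that reads an assumed "ad-events" topic in its own consumer group and at its own pace. The topic name, group id, and broker address are placeholder assumptions, not taken from any specific deployment.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class AdEventsConsumer {

    public static void main(String[] args) {
        // Each downstream system uses its own consumer group and reads the same
        // topic at its own pace, whether a real-time dashboard, an hourly batch
        // job, or a request-response service (names here are placeholder assumptions).
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "reporting-batch-job");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Replay history from the event store if this group has no committed offsets yet
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("ad-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                records.forEach(record ->
                        System.out.printf("offset=%d key=%s value=%s%n",
                                record.offset(), record.key(), record.value()));
            }
        }
    }
}
```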
Apache Flink is ideal for data correlation, no matter whether the task is data integration (aka streaming ETL) or advanced stateful business and application logic. Apache Kafka and Apache Flink are a match made in heaven for data streaming.
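As a simple illustration of streaming ETL with Kafka and Flink, the following sketch uses Flink SQL to read raw impressions from an assumed Kafka topic, aggregate them per ad and minute, and write the curated result back to Kafka. Topic names, the schema, and the connector settings are assumptions for this example, not a reference architecture.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class StreamingEtlSketch {

    public static void main(String[] args) throws Exception {
        TableEnvironment tableEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Source: raw ad impressions from Kafka (topic, schema, and fields are assumptions)
        tableEnv.executeSql(
                "CREATE TABLE ad_impressions (" +
                "  ad_id STRING," +
                "  user_id STRING," +
                "  ts TIMESTAMP(3)," +
                "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND" +
                ") WITH (" +
                "  'connector' = 'kafka'," +
                "  'topic' = 'ad-impressions'," +
                "  'properties.bootstrap.servers' = 'localhost:9092'," +
                "  'properties.group.id' = 'streaming-etl'," +
                "  'scan.startup.mode' = 'latest-offset'," +
                "  'format' = 'json'" +
                ")");

        // Sink: curated per-minute impression counts, shared with downstream consumers via Kafka
        tableEnv.executeSql(
                "CREATE TABLE impressions_per_minute (" +
                "  ad_id STRING," +
                "  window_start TIMESTAMP(3)," +
                "  window_end TIMESTAMP(3)," +
                "  impressions BIGINT," +
                "  PRIMARY KEY (ad_id, window_start) NOT ENFORCED" +
                ") WITH (" +
                "  'connector' = 'upsert-kafka'," +
                "  'topic' = 'impressions-per-minute'," +
                "  'properties.bootstrap.servers' = 'localhost:9092'," +
                "  'key.format' = 'json'," +
                "  'value.format' = 'json'" +
                ")");

        // Continuous streaming ETL: tumbling one-minute windows per ad
        tableEnv.executeSql(
                "INSERT INTO impressions_per_minute " +
                "SELECT ad_id, window_start, window_end, COUNT(*) AS impressions " +
                "FROM TABLE(TUMBLE(TABLE ad_impressions, DESCRIPTOR(ts), INTERVAL '1' MINUTE)) " +
                "GROUP BY ad_id, window_start, window_end").await();
    }
}
```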
Real-world success stories show how data streaming with Kafka and Flink helps build a next-generation advertising platform. These technologies solve the above-mentioned challenges to provide real-time and consistent information across all applications.
Advertising platforms are either directly embedded into customer-facing applications or built as software or SaaS products that other companies buy and leverage.
The following success stories explore ad platforms built with data streaming:
Pinterest is an American image-sharing and social media service designed to enable the saving and discovery of information (specifically “ideas”) like recipes, home, style, motivation, and inspiration on the internet.
The content of ads is very close to the organic content on the platform. Naturally, users engage with both the content and the ads:
Pinterest first talked about its Kafka-powered advertising platform at Kafka Summit 2018. The ads platform leverages Kafka for the data ingestion pipeline and Kafka Streams for stream processing to enable a real-time feedback loop. The recommendation engine (via machine learning), budgeting, and new ads exploration are some of the critical use cases.
The continuous feedback loop enables real-time updates in seconds. Stateful stream processing with Kafka Streams correlates events from users, ads, budget, and other interfaces to decide on ads serving.
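The following Kafka Streams sketch shows the general shape of such a feedback loop: engagement events are counted per ad and joined with a budget table to emit serving decisions. Topic names, data formats, and the join logic are simplified assumptions and do not reflect Pinterest's actual implementation.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class AdsFeedbackLoop {

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Engagement events keyed by ad id (topic names and keying are assumptions)
        KStream<String, String> engagements =
                builder.stream("ad-engagements", Consumed.with(Serdes.String(), Serdes.String()));

        // Remaining budget per ad, maintained as a changelog-backed table
        KTable<String, String> budgets =
                builder.table("ad-budgets", Consumed.with(Serdes.String(), Serdes.String()));

        // Stateful correlation: count engagements per ad and join with the budget
        // to emit a serving decision for the downstream ad server
        engagements
                .groupByKey()
                .count()
                .toStream()
                .join(budgets, (engagementCount, budget) ->
                        "engagements=" + engagementCount + ",budget=" + budget)
                .to("ad-serving-decisions", Produced.with(Serdes.String(), Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "ads-feedback-loop");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(builder.build(), props).start();
    }
}
```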
Real-time processing (even at an extreme scale) is critical for Pinterest. When a new ad is created, the ads platform does not know how users engage with this ad on different surfaces. The faster the ads platform learns about the performance of a newly created ad, the more value it can provide to the user.
There is a balance between exploiting good ads and exploring new ads. The solution was adding a boosting factor to new ads to increase the probability of winning an auction.
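One way to picture such a boosting factor is a score multiplier that decays as the ad ages and collects engagement history. The following Java sketch is purely illustrative; the formula, time window, and boost values are assumptions, not Pinterest's algorithm.

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative sketch of an exploration boost for new ads; the formula,
// time window, and boost values are assumptions, not Pinterest's algorithm.
public class AuctionScoring {

    private static final Duration NEW_AD_WINDOW = Duration.ofHours(24); // assumed "new ad" horizon
    private static final double MAX_BOOST = 1.5;                        // assumed maximum boost factor

    // Base auction score: bid multiplied by the predicted click-through rate
    static double baseScore(double bid, double predictedCtr) {
        return bid * predictedCtr;
    }

    // New ads with little engagement history get a boost that decays to 1.0 as
    // the ad ages, so exploration does not permanently distort the auction
    static double boostedScore(double bid, double predictedCtr, Instant adCreatedAt, Instant now) {
        double ageMillis = Duration.between(adCreatedAt, now).toMillis();
        double windowMillis = NEW_AD_WINDOW.toMillis();
        double boost = ageMillis >= windowMillis
                ? 1.0
                : 1.0 + (MAX_BOOST - 1.0) * (1.0 - ageMillis / windowMillis);
        return baseScore(bid, predictedCtr) * boost;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        // An established ad and a brand-new ad with the same bid and predicted CTR
        System.out.printf("established: %.4f, brand-new: %.4f%n",
                boostedScore(2.00, 0.05, now.minus(Duration.ofDays(30)), now),
                boostedScore(2.00, 0.05, now, now));
    }
}
```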
Listen to the talk from Pinterest for more details, best practices, and lessons learned in developing and operating a scalable, real-time advertising platform with stateful stream processing using Kafka Streams.
Buzzvil provides a lock screen advertising platform that connects partners and advertisers:
Buzzvil’s advertising platform is data-driven and built with Apache Kafka in the cloud. It optimizes ad spending through automation, behavioral analytics, audience targeting, rewards programs, and more. Data streaming enables a single source of truth for real-time ad transaction data.
Buzzvil built the ads platform with Apache Kafka on fully managed Confluent Cloud to focus on business logic and achieve a faster time-to-market.
Data streaming with Apache Kafka enables 18x faster data updates for ad bidding. Confluent Cloud saves 20-30% of the infrastructure cost.
TV-Insight developed a solution that helps Joint Industry Committees (JICs), broadcasters, and advertisers improve and evolve the data quality of existing TV measurement panels using return path data from connected devices.
The essential difference between TV-Insight and all other “panel boosting” initiatives and products is that TV-Insight uses real-time data. Therefore, it can provide live TV reach for live decisions on regular TV ad blocks.
The TV-Insight application collects data from the Smart TV or set-top box via GDPR-compliant device tracking. The live extrapolation enables advertising optimization:
The technical architecture and data pipeline look like the following: Apache Kafka is the real-time messaging platform and event store. Stateful stream processing correlates events to calculate real-time ad serving decisions in the advertising platform.
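Conceptually, the live extrapolation boils down to a continuously updated, windowed aggregation of viewing events per channel. The following Kafka Streams sketch counts viewing events per channel in one-minute windows; topic names, keying, and the window size are assumptions for illustration, not TV-Insight's actual pipeline.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.TimeWindows;

import java.time.Duration;
import java.util.Properties;

public class LiveTvReachJob {

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Return path data from Smart TVs and set-top boxes, keyed by channel id
        // (topic name and keying are assumptions for this sketch)
        builder.stream("viewing-events", Consumed.with(Serdes.String(), Serdes.String()))
                .groupByKey()
                // Tumbling one-minute windows with a short grace period for late events
                .windowedBy(TimeWindows.ofSizeAndGrace(Duration.ofMinutes(1), Duration.ofSeconds(10)))
                .count()
                .toStream()
                // Downstream ad decisioning extrapolates live TV reach from these counts;
                // a real reach metric would also de-duplicate devices per window
                .foreach((channelWindow, viewingEvents) ->
                        System.out.printf("channel=%s window=%s events=%d%n",
                                channelWindow.key(), channelWindow.window(), viewingEvents));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "live-tv-reach");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(builder.build(), props).start();
    }
}
```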
Unity is a cross-platform game engine developed by Unity Technologies. The engine has since been gradually extended to support a variety of desktop, mobile, console, and virtual reality platforms. The engine can create three-dimensional (3D) and two-dimensional (2D) games, interactive simulations, and other experiences. Industries outside video gaming have adopted the engine, such as film, automotive, architecture, engineering, and construction.
In 2019, Unity apps and content were installed 33 billion times, reaching 3 billion devices worldwide.
The 3D development platform and game engine is not the only product of Unity Technologies. Unity Ads is one of the largest monetization networks in the world:
Unity is a data-driven company:
A single data pipeline provides the foundational infrastructure for analytics, R&D, monetization, cloud services, etc., for real-time and batch processing leveraging Apache Kafka:
If you want to learn about Unity’s success story of migrating this platform from self-managed Kafka to the cloud, read the post on the Confluent Blog: “How Unity uses Confluent for real-time event streaming at scale”.
Uber’s food delivery app, Uber Eats, provides an exciting capability: embedding ads. With this ability came new challenges that needed to be solved at Uber, such as systems for ad auctions, bidding, attribution, reporting, and more.
Uber wrote an excellent article about how it leveraged open source technology to build its first near real-time, exactly-once event processing system. Uber’s advertising platform combines Kafka, Flink, and Pinot, a perfect combination of technologies for this use case.
As Uber writes: “With every ad served, there are corresponding events per user (impressions, clicks). The responsibility of the ad events processing system is to manage the flow of events, cleanse them, aggregate clicks and impressions, attribute them to orders, and provide this data in an accessible format for reporting and analytics as well as dependent clients (e.g., other ads systems).”
While speed, scale, and reliability are always crucial for such a system, I want to emphasize the part about accuracy and why exactly-once processing with Kafka and Flink was a critical piece of the architecture.
The aggregation job implemented with Apache Flink does a lot of the heavy lifting: data cleansing, persistence for order attribution, aggregation, and record UUID generation.
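The following Flink DataStream sketch illustrates the shape of such an aggregation job: consume raw click events from Kafka, drop malformed records, count clicks per ad in one-minute windows, and attach a record UUID for idempotent downstream writes. Topic names, the event format, and the parsing logic are assumptions that simplify Uber's actual job considerably.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

import java.util.UUID;

public class AdEventsAggregationJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Raw click events from Kafka; topic name and string format are assumptions
        KafkaSource<String> clicks = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics("ad-clicks")
                .setGroupId("ads-aggregation")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(clicks, WatermarkStrategy.noWatermarks(), "ad-clicks")
                // Cleansing: drop malformed or empty events
                .filter(event -> event != null && !event.isBlank())
                // Aggregation: count clicks per ad in one-minute windows
                .map(event -> Tuple2.of(extractAdId(event), 1L))
                .returns(Types.TUPLE(Types.STRING, Types.LONG))
                .keyBy(click -> click.f0)
                .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
                .sum(1)
                // Record UUID for idempotent downstream writes (Uber derives these
                // deterministically; a random UUID keeps this sketch short)
                .map(aggregate -> UUID.randomUUID() + "," + aggregate.f0 + "," + aggregate.f1)
                .print();

        env.execute("ad-events-aggregation");
    }

    // Placeholder parser; a real job would deserialize a proper event schema
    private static String extractAdId(String event) {
        return event.split(",")[0];
    }
}
```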
Exactly-once with Kafka and Flink is very important, as their blog post explains: “Uber can’t afford to overcount events. Double counting clicks results in overcharging advertisers and overreporting the success of ads. Both being poor customer experiences, this requires processing events exactly-once. Uber is the marketplace in which ads are being served, therefore our ad attribution must be 100% accurate.”
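Technically, the exactly-once guarantee relies on Flink checkpoints combined with transactional writes to Kafka. A minimal configuration sketch looks like the following; the checkpoint interval, topic name, and transactional id prefix are assumptions.

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ExactlyOnceSetup {

    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Exactly-once checkpoints: Kafka offsets, operator state, and transactional
        // output are committed atomically per checkpoint (interval is an assumption)
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

        // Transactional Kafka sink; the broker's transaction timeout must exceed
        // the checkpoint interval (topic and prefix are assumptions)
        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("aggregated-ad-events")
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
                .setTransactionalIdPrefix("ads-aggregation")
                .build();

        // The actual aggregation pipeline (env.fromSource(...)...sinkTo(sink)) is omitted here
    }
}
```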
Reddit is an American social news aggregation, content rating, and discussion website. Registered users submit content to the site such as links, text posts, images, and videos, which other members then vote up or down.
Reddit’s ads platform allows advertisers to create ad campaigns and set both daily and lifetime budgets for a campaign. Here is Reddit’s decision tree to place advertisements:
The data pipeline leverages Kafka, Flink, and Druid to analyze campaign budgets in real time. The platform combines real-time and historical user activity data to decide which ad to place, all within 30 milliseconds, to avoid over-delivery and under-delivery (budget spent too quickly or too slowly).
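A pacing decision of this kind essentially compares the campaign's real-time spend against a pacing target derived from the elapsed time of the day. The following Java sketch shows the idea with a simple linear pacing curve; the formula and values are illustrative assumptions, not Reddit's implementation.

```java
import java.time.Duration;

// Illustrative pacing sketch; the linear pacing curve and values are assumptions,
// not Reddit's implementation.
public class BudgetPacing {

    // Decide whether a campaign may enter the next auction. The real-time spend
    // comes from the streaming pipeline, the daily budget from campaign config.
    static boolean shouldServe(double spentToday, double dailyBudget, Duration elapsedToday) {
        double expectedFraction = (double) elapsedToday.toSeconds() / Duration.ofDays(1).toSeconds();
        double actualFraction = spentToday / dailyBudget;
        // Serve only if spend is not running ahead of the linear pacing target;
        // this throttles over-delivery and leaves room to catch up after under-delivery.
        return actualFraction <= expectedFraction;
    }

    public static void main(String[] args) {
        // Halfway through the day with 70% of the budget spent: pause the campaign
        System.out.println(shouldServe(700.0, 1000.0, Duration.ofHours(12))); // false
        // Halfway through the day with 40% spent: keep serving
        System.out.println(shouldServe(400.0, 1000.0, Duration.ofHours(12))); // true
    }
}
```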
Watch Reddit’s talk from Druid Summit “Low Latency Real-Time Ads Pacing Queries” to learn more about their ads platform and use cases.
Real-world success stories from Pinterest, Uber, Reddit, Unity, Buzzvil, and TV-Insight showed how to embed real-time advertising into your applications or build a dedicated marketing product.
Data streaming with Apache Kafka and Apache Flink enables context-specific advertising at scale in real time. The cloud makes it possible to focus on business logic and faster time-to-market with a fully managed data streaming platform.
How do you leverage data streaming in marketing and advertising use cases? Do you deploy at the edge, in the cloud, or both? Or do you integrate 3rd party marketing platforms into your advertising platforms? Let’s connect on LinkedIn and discuss it! Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter.