Data Streaming with Apache Kafka and Flink in the Media Industry: Disney+ Hotstar and JioCinema

Data Streaming with Apache Kafka and Flink in the Media Industry at Netflix Disney Plus Hotstar and Reliance JioCinema
The $8.5 billion merger of Disney+ Hotstar and Reliance’s JioCinema marks a transformative moment for India’s media industry, combining two of the most influential streaming platforms into a data streaming powerhouse. This blog explores how technologies like Apache Kafka and Flink power these platforms, enabling massive-scale content distribution, real-time analytics, and user engagement. With tools like MirrorMaker and Cluster Linking, the merger presents opportunities for seamless Kafka migrations, hybrid multi-cloud flexibility, and new innovations like multi-angle viewing and advanced personalization. The transparency of both platforms about their Kafka-based architectures highlights their technical leadership and the lessons they offer the data streaming community. The integration of their infrastructures sets the stage for redefining media streaming in India, offering exciting insights and benchmarks for organizations leveraging data streaming at scale.

The media industry in India has witnessed a seismic shift with the $8.5 billion merger of Disney+ Hotstar and Reliance’s JioCinema. This collaboration brings together two of the country’s most influential data streaming deployments under one umbrella, creating a powerhouse for entertainment delivery. Beyond the headlines, this merger underscores the critical role of data streaming technologies, particularly Apache Kafka and Flink, in enabling large-scale content distribution and real-time data processing. This blog post explores the existing data streaming infrastructures and use cases. It also looks at potential migrations that use Kafka tools for real-time data replication and synchronization without downtime in the production environments.

Data streaming technologies like Apache Kafka and Flink are revolutionizing the media industry by enabling real-time data processing at an unprecedented scale. Media platforms, including Over-The-Top (OTT) services operated by telcos and media companies, leverage these technologies to deliver video, audio, and other content directly to viewers over the internet. The OTT services bypass traditional cable or satellite channels.

As these platforms cater to growing audiences with diverse needs, data streaming serves as the backbone for seamless content delivery, real-time user engagement, and operational efficiency. Data streaming ensures a superior viewing experience at scale.

Event-driven Architecture with Data Streaming using Apache Kafka and Flink in the Media Industry

Netflix is a leading global media company renowned for its extensive use of Apache Kafka and Flink. The media company powers critical use cases such as real-time personalization, anomaly detection, and monitoring at extreme scale. Its data streaming architecture processes billions of events daily, ensuring seamless content delivery and exceptional viewer experiences for a global audience.

Use Cases for Data Streaming in the Media Industry

Data streaming with technologies like Apache Kafka and Flink is transforming the media industry by enabling real-time data processing for seamless content delivery, personalized experiences, and operational efficiency.

  1. Live Video Streaming: Data streaming with Apache Kafka serves as a central event hub for managing log data, metadata, and control signals associated with live video streaming. It processes real-time data related to user interactions, stream health, and session analytics to ensure ultra-low latency and a seamless experience for live events like concerts and sports. The actual video streams are handled by Content Delivery Networks (CDNs).
  2. On-Demand Content Delivery: Media platforms use Kafka to reliably manage data pipelines, delivering movies, TV shows, and other content to millions of users.
  3. Personalized Recommendations: By integrating Kafka with analytics tools, platforms provide tailored suggestions based on user behavior, increasing viewer engagement and satisfaction.
  4. Real-Time Ad Targeting: Kafka enables real-time ad insertion by processing user events and contextual data, ensuring ads are relevant and timely.
  5. Monitoring and Anomaly Detection: Media companies use Kafka to monitor backend systems in real time, detecting and resolving issues proactively to ensure a smooth user experience.
  6. Churn Prediction: By analyzing behavioral patterns in real time, platforms can predict user churn and take corrective actions, such as offering discounts or new content recommendations.

Learn more about data streaming use cases in the telco and media industry from real-world customer stories such as Dish Network, British Telecom, Globe Telecom, and Swisscom.

Business Value of Data Streaming in Media

Data streaming technologies like Apache Kafka and Flink drive transformative business value in the media industry by enabling real-time insights, efficiency, and innovation:

  • Enhanced User Experience: Real-time capabilities at any scale enable faster content delivery, personalized recommendations, and reduced buffering.
  • Cost Optimization: Streamlined pipelines improve infrastructure utilization and reduce operational costs. The Shift Left Architecture is adopted across business units.
  • Revenue Growth: Precision in ad targeting and churn reduction leads to higher revenues.
  • Competitive Edge: Real-time analytics and content delivery position companies as leaders in their field.

Disney+ Hotstar (Disney) and JioCinema (Viacom18): Streaming Giants Shaping India’s Media Landscape

Disney+ Hotstar revolutionized OTT streaming in India with a robust freemium model. Catering to a diverse audience, it provided an extensive library of movies, TV shows, and sports, including exclusive streaming rights for the Indian Premier League (IPL), the world’s most popular cricket league. By blending free content with premium subscriptions, it attracted millions of users, leveraging IPL viewership as a major growth driver.

JioCinema, part of Reliance Jio, employs a mass-market approach, offering free streaming supported by Reliance’s vast 5G network. It gained significant traction after taking over the IPL digital streaming rights in 2023, streaming matches in 4K resolution to over 32 million concurrent viewers and breaking records for live streaming.

Each platform has used IPL strategically: Hotstar with a premium model and JioCinema for mass-market penetration. Post-merger, the unified platform could combine these approaches, delivering enhanced IPL experiences powered by a consolidated Kafka-based streaming infrastructure.

Both platforms share a commitment to innovation, scalability, and user engagement, making them ideal candidates for heavy Apache Kafka usage.

Both Disney+ Hotstar and JioCinema (Viacom18) are renowned for their openness in discussing their technical data streaming architectures, similar to Netflix. They have frequently presented at conferences like Kafka Summit and other industry events, sharing insights about their data streaming strategies and implementations.

This transparency achieves several goals:

  • Showcasing Innovation: Highlighting their advanced use of Kafka and Flink establishes their leadership in tech innovation.
  • Talent Acquisition: Open discussions attract engineers who want to work on cutting-edge systems.
  • Industry Collaboration: Sharing experiences fosters collaboration within the streaming and open-source communities.

By examining their presentations and publications, we gain a deeper understanding of their use of Kafka to achieve extreme scalability and efficiency.

Data Streaming Solves the Challenges and Extreme Scale of OTT Services in the Media Industry

Running platforms of this scale comes with its share of challenges:

  • Massive Throughput: Kafka handles billions of messages daily, requiring extensive partitioning and scaling strategies.
  • Fault Tolerance: Platforms implement advanced disaster recovery and replication strategies to ensure zero downtime, even during critical events like IPL.
  • Cost vs. Performance Trade-Offs: Streaming 4K video for millions of users demands balancing high infrastructure costs with user expectations.
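
To make the partitioning point more concrete, here is a rough sizing sketch in Python. It applies a widely used rule of thumb: a topic needs at least as many partitions as the target throughput divided by the measured per-partition producer throughput, and at least as many as the target divided by the per-partition consumer throughput. The function name and all numbers are illustrative assumptions, not figures published by Disney+ Hotstar or JioCinema.

```python
import math


def min_partitions(target_mb_s: float, producer_mb_s: float, consumer_mb_s: float) -> int:
    """Lower bound on the partition count: max(target/producer, target/consumer), rounded up."""
    if min(target_mb_s, producer_mb_s, consumer_mb_s) <= 0:
        raise ValueError("throughput values must be positive")
    return max(
        math.ceil(target_mb_s / producer_mb_s),
        math.ceil(target_mb_s / consumer_mb_s),
    )


# Hypothetical workload: 2,000 MB/s peak, 20 MB/s per partition on the
# producer side, 50 MB/s per partition on the consumer side.
print(min_partitions(target_mb_s=2000, producer_mb_s=20, consumer_mb_s=50))  # prints 100
```

The result is only a starting point. Real deployments also account for key distribution, consumer group parallelism, and headroom for peak events.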

Data streaming with Apache Kafka and Flink is a key piece of the data strategy to solve these challenges.

Disney+ Hotstar: Gamification at Extreme Scale

Disney+ Hotstar’s “Watch N Play” feature transformed live sports streaming, particularly cricket, into an interactive experience. Viewers predict outcomes, answer trivia, and participate in polls, earning points for rewards or leaderboard rankings, adding a competitive and social element to the platform.

Hotstar’s presentation from Kafka Summit 2019 is still very impressive and worth watching. Here is a summary of the OTT service that serves millions of cricket fans:

Disney Plus Hotstar OTT Media Service for Cricket with Apache Kafka
Source: Disney+ Hotstar

Powered by Apache Kafka, Disney+ Hotstar’s infrastructure processed millions of real-time interactions per second. The integration of data sources via Kafka Connect enables seamless analytics and rewards. This gamified approach enhances user engagement and extends to broader applications like e-sports, interactive TV, and IoT-driven fan experiences, making Hotstar a leader in innovative streaming.

Disney+ Hotstar runs ~15 different Kafka Connect clusters with more than 2,000 connectors and auto-scaling based on traffic, as they presented in another Kafka Summit talk in 2021.

Disney Plus Hotstar Kafka Connect Integration Pipeline from Roku Apple Fire TV to Analytics
Source: Disney+ Hotstar

Single Message Transforms (SMTs) are used within the Kafka Connect integration for stateless streaming ETL. Integration use cases include masking and filtering of PII, sampling of data, and schema validation and enforcement.
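
The idea behind such stateless, per-record transforms can be sketched in a few lines of Python. This is only a conceptual illustration: real SMTs are Java classes configured on a Kafka Connect connector, and the helper names, field names, and record layout below are hypothetical. Each transform receives one record and returns either a (possibly modified) record or None to drop it.

```python
import hashlib
import zlib
from typing import Callable, Optional

Record = dict
Transform = Callable[[Record], Optional[Record]]  # returning None drops the record


def require_fields(*fields: str) -> Transform:
    """Schema check: drop records that miss a mandatory field."""
    def apply(record: Record) -> Optional[Record]:
        return record if all(f in record for f in fields) else None
    return apply


def sample(key_field: str, keep_percent: int) -> Transform:
    """Deterministic sampling based on a stable hash of the record key."""
    def apply(record: Record) -> Optional[Record]:
        bucket = zlib.crc32(str(record[key_field]).encode()) % 100
        return record if bucket < keep_percent else None
    return apply


def mask_fields(*fields: str) -> Transform:
    """Replace PII values with a truncated SHA-256 digest."""
    def apply(record: Record) -> Optional[Record]:
        masked = dict(record)  # copy, so the input record stays untouched
        for f in fields:
            if f in masked:
                masked[f] = hashlib.sha256(str(masked[f]).encode()).hexdigest()[:12]
        return masked
    return apply


def run_chain(record: Record, chain: list) -> Optional[Record]:
    """Apply transforms in order and stop as soon as one drops the record."""
    for transform in chain:
        record = transform(record)
        if record is None:
            return None
    return record


chain = [require_fields("user_id", "event"), sample("user_id", 100), mask_fields("email")]
event = {"user_id": "u-42", "event": "play", "email": "fan@example.com"}
print(run_chain(event, chain))           # email is replaced by a 12-character digest
print(run_chain({"event": "play"}, chain))  # None: the record fails the schema check
```

Because every step looks at a single record and keeps no state, the chain can run inside the integration layer without a separate stream processing job.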

JioCinema: Multiple Kafka Clusters and Deployment Strategies

JioCinema leverages a robust enterprise architecture built on Apache Kafka, Flink, and Spark. As showcased at Kafka Summit India 2024, data streaming is central to its platform, enabling real-time analytics, personalized recommendations, and seamless content delivery.

JioCinema Telco Cloud Enterprise Architecture with Apache Kafka Spark Flink
Source: JioCinema

Initially, JioCinema operated a single Kafka cluster handling 1,000+ topics and 100,000+ partitions for diverse use cases.

Over time, the platform transitioned to multiple Kafka clusters with different SLAs and architectures, optimizing uptime, performance, and costs for specific workloads, as explained by Kushal Khandelwal, Head of Data Platform.

Jio Cinema - Viacom18 - One Kafka Cluster does NOT fit All Use Cases Uptime SLAs and Cost
Source: JioCinema

This shift from a monolithic to a segmented architecture highlights the scalability and flexibility of Kafka. This approach ensures JioCinema meets the demands of high traffic and complex SLAs. Their success reflects the common journey of organizations scaling data streaming infrastructures to achieve operational excellence.
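
The trade-off behind a segmented setup can be illustrated with a small Python sketch: each workload is placed on the cheapest cluster whose uptime SLA still meets its requirement. Cluster names, SLA values, and cost factors are invented for illustration and do not describe JioCinema's actual clusters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    uptime_sla: float      # percent, e.g. 99.99
    relative_cost: float   # cost factor compared to the cheapest tier


def pick_cluster(clusters: list, required_uptime: float) -> Cluster:
    """Return the cheapest cluster that satisfies the required uptime."""
    eligible = [c for c in clusters if c.uptime_sla >= required_uptime]
    if not eligible:
        raise LookupError(f"no cluster offers {required_uptime}% uptime")
    return min(eligible, key=lambda c: c.relative_cost)


# Hypothetical tiers
clusters = [
    Cluster("critical-multi-az", uptime_sla=99.99, relative_cost=3.0),
    Cluster("standard", uptime_sla=99.9, relative_cost=1.5),
    Cluster("best-effort-analytics", uptime_sla=99.0, relative_cost=1.0),
]

print(pick_cluster(clusters, 99.95).name)  # critical-multi-az
print(pick_cluster(clusters, 99.0).name)   # best-effort-analytics
```

A single shared cluster would force every workload onto the most expensive tier, which is the cost argument for splitting by SLA.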

Use Cases for Kafka in Disney+ Hotstar and JioCinema

Disney+ Hotstar and JioCinema rely on Apache Kafka to power diverse use cases, from IPL cricket streaming to real-time personalization and ad targeting.

IPL Cricket Streaming at Massive Scale

The Indian Premier League (IPL) is the crown jewel of streaming in India, drawing millions of concurrent viewers. Here’s how Kafka and Flink support IPL’s massive scale:

  • Concurrent Viewers: During IPL 2023, JioCinema hit a record of over 32 million concurrent viewers, streaming matches in 4K resolution. Disney+ Hotstar has also scaled to tens of millions of viewers in past IPL seasons.
  • Data Throughput: JioCinema and Hotstar handle millions of messages per second with Kafka, ensuring uninterrupted video delivery.
  • Kafka Infrastructure: Reports reveal that JioCinema operates over 100 Kafka clusters, managing tens of thousands of partitions. These clusters handle not only video streaming but also ancillary tasks, like ad placement and user analytics.
  • Connectors: Both platforms rely on hundreds of Kafka Connect connectors to integrate with databases, storage systems, and real-time analytics platforms.

On-Demand Streaming and Catalog Management

Both platforms use Kafka to deliver on-demand content to millions of users, ensuring quick access to movies and TV shows. Kafka’s reliable event streaming guarantees smooth playback and dynamic scaling during peak usage.

Real-Time Personalization and Recommendations

Personalization is central to user retention. Kafka streams user behavior data to machine learning systems in real time, enabling both platforms to recommend content tailored to individual preferences. Customer loyalty and rewards platforms often leverage Kafka and Flink under the hood.

Ad Targeting and Revenue Optimization

By processing user data in real time, Kafka enables precise ad targeting with context-specific advertisements. This not only improves ad effectiveness but also enhances viewer experience by ensuring ads are contextually relevant. Many real-time advertising platforms are powered by a data streaming platform using Apache Kafka and Flink.

Content Quality Monitoring

Both platforms use Kafka for continuous real-time monitoring of video stream quality, automatically adjusting bitrate or rerouting streams during disruptions to maintain a consistent viewing experience.
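
As a rough illustration of this kind of monitoring logic, the Python sketch below keeps a rolling window of rebuffering ratios for one stream and steps down a bitrate ladder when the window average exceeds a threshold. The ladder, window size, and threshold are assumptions made for the example. In a real pipeline the samples would arrive as events from a Kafka topic and the logic would typically run in a stream processor such as Flink.

```python
from collections import deque

BITRATE_LADDER_KBPS = [800, 2500, 5000, 16000]  # hypothetical ladder, lowest to highest


class StreamMonitor:
    """Tracks recent rebuffering ratios and lowers the bitrate when quality degrades."""

    def __init__(self, window: int = 5, max_rebuffer_ratio: float = 0.05):
        self.samples = deque(maxlen=window)
        self.max_rebuffer_ratio = max_rebuffer_ratio
        self.level = len(BITRATE_LADDER_KBPS) - 1  # start at the highest bitrate

    def observe(self, rebuffer_ratio: float) -> int:
        """Record one sample and return the bitrate (kbps) to use next."""
        self.samples.append(rebuffer_ratio)
        window_full = len(self.samples) == self.samples.maxlen
        average = sum(self.samples) / len(self.samples)
        if window_full and average > self.max_rebuffer_ratio and self.level > 0:
            self.level -= 1
            self.samples.clear()  # start a fresh window after each adjustment
        return BITRATE_LADDER_KBPS[self.level]


monitor = StreamMonitor(window=3)
for ratio in (0.01, 0.20, 0.20):
    bitrate = monitor.observe(ratio)
print(bitrate)  # 5000: one step down after a full window of poor samples
```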

Data Streaming for M&A, Merger and Migrations

The merger of Disney+ Hotstar and JioCinema presents a significant opportunity to integrate their Kafka-based infrastructures, paving the way for a unified, more efficient system. Such transitions are a natural fit for Apache Kafka and its ecosystem, where migration is a core capability. Tools like MirrorMaker and Cluster Linking replicate data continuously between clusters, so workloads can be moved later in a lift-and-shift step. Using data streaming for migration projects enables zero downtime and business continuity.
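
The replicate-then-cut-over pattern can be shown with a toy model in Python. It does not use MirrorMaker or Cluster Linking APIs. The class and method names are hypothetical, and two in-memory lists stand in for the same topic on a source and a target cluster. The point is the sequence: keep replicating while producers still write to the source, and switch clients only once replication lag has dropped to an acceptable level.

```python
class MirrorLink:
    """Toy one-way replication of a single topic between two clusters."""

    def __init__(self):
        self.source = []   # records on the old cluster
        self.target = []   # replicated records on the new cluster

    def produce(self, record: str) -> None:
        """Applications keep writing to the source during the migration."""
        self.source.append(record)

    def replicate(self, max_records: int) -> int:
        """Copy the next batch in order and return how many records were copied."""
        start = len(self.target)
        batch = self.source[start:start + max_records]
        self.target.extend(batch)
        return len(batch)

    @property
    def lag(self) -> int:
        """Number of records not yet replicated to the target."""
        return len(self.source) - len(self.target)

    def ready_for_cutover(self, max_lag: int = 0) -> bool:
        return self.lag <= max_lag


link = MirrorLink()
for i in range(5):
    link.produce(f"event-{i}")

link.replicate(max_records=3)
print(link.lag, link.ready_for_cutover())  # 2 False

link.replicate(max_records=3)
print(link.lag, link.ready_for_cutover())  # 0 True
```

In practice, consumer offsets and topic configurations also have to be carried over. That is the part the replication tools automate to varying degrees.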

Here are some opportunities and benefits of data streaming for integrations and migrations:

  1. Integrated Pipelines: A combined Kafka architecture could streamline content delivery, reduce costs, and support advanced analytics, providing an optimized infrastructure for their vast user base.
  2. Expanded Use Cases: The merger might drive innovations such as multi-angle viewing, live interactive features, and more personalized experiences powered by real-time data.
  3. Hybrid and Multi-Cloud Flexibility: Transitions like these often span hybrid and multi-cloud environments, making Kafka’s flexibility essential for connecting and scaling across platforms.
  4. Multi-Organization Integration: Merging Kafka clusters across distinct organizations, as in this case, is a common use case where Kafka’s tools excel.
  5. Technical Leadership: Both platforms are transparent about their Kafka implementations, and we can anticipate new insights from their efforts to integrate and scale, highlighting lessons for the broader streaming industry.

In conclusion, Kafka and Flink are not just enablers but drivers of success for Disney+ Hotstar and JioCinema. Data streaming at scale creates new benchmarks for innovation and user experience in the media industry.
