Druid Summit 2024

Tuesday, October 22, 2024 | San Francisco Bay Area

Agenda

Tuesday, October 22nd

TIME (PDT)

SESSIONS

8:00 AM – 9:00 AM

Registration and Breakfast

9:00 AM – 9:30 AM

SEQUOIA BALLROOM

Keynote: Powering Event-Driven Data with Apache Druid

The distinction between OLTP and OLAP is becoming less relevant as data architectures shift toward entities and events. In this session, we’ll delve into how Apache Druid’s event-first approach synthesizes entities from streams, playing a crucial role in modern, decomposed databases. We’ll explore real-world examples of how Druid helps teams navigate complex queries and evolving infrastructure. This discussion aims to share how Druid’s unique perspective on event-centric data querying is shaping the way we think about modern data systems.

Eric Tschetter

Chief Officer, Emerging Solutions

Imply

9:30 AM – 10:00 AM

SEQUOIA BALLROOM

Keynote: Innovation with Druid and Imply Polaris: 2024 and Beyond

Join us as we explore a pivotal year for both Apache Druid and Imply Polaris. This keynote will review 2024’s key advancements, from enhancing Druid’s flexibility in data ingestion and establishing SQL as the standard query language to the development of the Event DB, unlocking new possibilities for data analysis. On the Imply Polaris side, we’ll showcase its evolution into a robust, cloud-native platform for running Druid. Looking ahead to 2025, we’ll explore Druid’s continued innovation: supporting hot, warm, and cold queries across real-time and historical data, pushing the boundaries of price-performance, and integrating seamlessly into the broader data ecosystem.

Gabriel Tavridis

SVP, Product

Imply

Will Xu

Product Manager – Druid

Imply

10:00 AM – 10:15 AM

Break

10:15 AM – 10:45 AM

SEQUOIA BALLROOM

Unlocking Sub-Second Query Performance on the Lakehouse: Integrating Apache Druid with Apache Iceberg

Modern data architectures often combine Apache Druid, an open-source, high-performance datastore, with Apache Iceberg, an open-source format for huge analytic tables. Ingesting Apache Iceberg tables into Apache Druid gives organizations concurrent streaming analytics and fast data exploration.

As Apache Iceberg becomes the de facto open table format for analytical datasets, there is a growing need to make specific Iceberg tables containing high-cardinality, high-dimensional event data available for rapid, open-ended exploration. This talk discusses how Druid addresses this need by extending its ingestion layer to read the Iceberg table format. This integration helps Druid power interactive dashboards and support slice-and-dice analytics within data lakehouses.

By the end of this session, participants will understand:
* How Iceberg tables can be ingested into Druid
* Real-world use cases

Atul Mohan

Software Engineer

Apple

10:15 AM – 10:45 AM

ACACIA BALLROOM

Building Natural Language Queries for Druid

In this session, we will walk through building a natural language query translation service for a SaaS distribution of Druid, and how to handle the considerations unique to Druid and to the SaaS platform.

Tom Fenwick

Software Engineer

Imply

10:45 AM – 11:15 AM

SEQUOIA BALLROOM

Seamless Ingestion of Delta Lake Tables into Apache Druid for Faster Analytics

Delta Lake is an open-source storage layer that brings reliability and performance to data lakes. In this session, we will explore the fundamentals of Delta Lake and demonstrate how to ingest Delta Lake tables into Apache Druid for sub-second query latency. Additionally, we’ll introduce an upcoming feature in Apache Druid: scheduled batch supervisors, which enable continuous data ingestion natively in Druid, keeping your Druid tables up to date with the latest data from Delta Lake or any other input source.

Abhishek Balaji Radhakrishnan

Staff Software Engineer

Imply

Venki Korukanti

Software Engineer

Databricks

10:45 AM – 11:15 AM

ACACIA BALLROOM

Panel: Real-Time Data Experiences

Analytics applications powered by Apache Druid are fun places to be. Take a seat as user experience leaders from Google, the Data Visualization Society, and Imply bring their minds together and answer your questions on what users of the future want from their data experiences.

11:15 AM – 11:30 AM

Break

11:30 AM – 12:00 PM

SEQUOIA BALLROOM

Panel: Lakehouse Analytics

Apache Iceberg and Delta Lake are coming into the big top, and Apache Druid is ready! Hear from Apache Druid PMC members and industry experts on where Apache Druid fits, how it can be used, and what’s in the crystal ball for Druid.

11:30 AM – 12:00 PM

ACACIA BALLROOM

Real-Time Analytics with Apache Druid at Branch.io

Branch.io has been leveraging Apache Druid for real-time analytics for over eight years, handling more than 600,000 queries daily. In this presentation, we will explore the diverse use cases that Druid addresses for Branch.io, detailing our data ingestion methods through both real-time and batch processes. We will discuss various optimizations implemented to enhance query performance and reduce costs, including techniques such as cardinality capping, data rollup to higher granularity, and multidimensional partitioning. Additionally, we’ll cover our packaging and deployment strategies, and conclude with insights into our approach to query debugging.

Ramesh Shanmugam

Principal Data Engineer

Branch.io

Andrei Harbunou

Senior Software Engineer

Branch.io

12:00 PM – 1:00 PM

Lunch

1:00 PM – 1:30 PM

SEQUOIA BALLROOM

Harnessing the Power of Imply Polaris: Use Cases and Architecture

Imply Polaris is a cloud-native database-as-a-service that simplifies the developer experience by providing all the advantages of Druid, plus auto-scaling for seamless data ingestion, enhanced security, exclusive features, and an easy interface to visualize your data in real time. Leading companies like Cisco ThousandEyes, Nestlé, and Zillow Group have achieved better efficiencies at scale with Polaris.

In this session, we’ll provide an overview of how Polaris serves as the easy button for Druid, including an architectural overview, product differentiators, and top technical use cases such as asynchronous queries, time series analysis, and more.

Larissa Klitzke

Senior Product Marketing Manager

Imply

1:00 PM – 2:00 PM

ACACIA BALLROOM

Optimizing Druid Configurations at Netflix through Parallel Testing and Metrics Analysis

As a data-driven company, Netflix continually seeks to enhance the performance and reliability of its data infrastructure. This talk will delve into our sophisticated approach to optimizing Apache Druid configurations through parallel runs and A/B testing methodologies. We will explore how Netflix tests various Druid setups by running them concurrently against dual systems, enabling a direct comparison of key performance metrics reported by different clusters. Attendees will gain insights into the following areas:
1. **Cluster Management and Deployment:** An overview of Netflix’s strategies for managing and deploying Druid clusters, emphasizing automation and scalability.
2. **Centralized Logging and Metrics:** Techniques for aggregating and analyzing logs and metrics to facilitate real-time monitoring and post-mortem analysis.
3. **Cluster Architecture Patterns:** Best practices and patterns employed by Netflix to architect Druid clusters for optimal performance and reliability.
4. **Parallel Testing Framework:** Detailed methodologies for executing parallel runs and conducting A/B testing to evaluate different Druid configurations, including the tools and frameworks used.

This session will provide practical knowledge and actionable insights, empowering attendees to apply similar strategies within their own organizations to optimize Druid deployments. Join us to learn how Netflix leverages advanced testing and analytical techniques to push the boundaries of what is possible with Apache Druid.

Ben Sykes

Software Engineer

Netflix

1:30 PM – 2:00 PM

SEQUOIA BALLROOM

Transforming Stadium Merchandising with Retailcloud: Harnessing Imply for Real-Time Analytics

Join us for an in-depth look at Retailcloud’s groundbreaking Data and Analytics Platform (DAP), engineered to handle and analyze millions of transactions in real time during major sporting events such as NBA and NFL games and golf tournaments. This innovative platform focuses on stadium merchandising operations, where timely insights are paramount to optimizing sales performance. By harnessing the capabilities of Confluent and Imply, we deliver real-time analytics that give our customers actionable insights into sales trends and performance metrics. This session will explore how these insights empower our clients to make swift, informed decisions, maximizing their sales potential during high-profile events. Discover how Retailcloud is transforming the landscape of real-time analytics in stadium merchandising.

Saravanan Vijayappan

VP of Engineering

Retailcloud

Jamsheer Chandranthodi

Director of Platform Engineering and Data Analytics

Retailcloud

2:00 PM – 2:15 PM

Break

2:15 PM – 3:00 PM

SEQUOIA BALLROOM

Tracing Service Dependencies at Scale with Druid and Flink

At Salesforce, we process approximately 300 million distributed spans to infer service dependencies, and we have successfully combined Druid and Flink to handle this scale with high availability. Attendees will gain insights into how to leverage Apache Druid for precise service dependency representation and high-performance querying. We also use Apache Superset to slice and dice the data persisted in the Druid backend according to user requirements, and Grafana to present time-series views of edge health metrics, such as latencies and errors, served directly from the Druid backend. Attendees will also learn how to integrate streaming technologies to manage large-scale data aggregations effectively, optimizing storage and cost efficiency. This session will provide practical knowledge and techniques for leveraging Druid in real-world applications, enabling participants to enhance their data infrastructure and analytics capabilities.

Sudeep Kumar

Principal Engineer

Salesforce

Neelesh Korade

Senior Product Manager – Availability and Infrastructure Engineering (A&IE)

Salesforce

2:15 PM – 3:00 PM

ACACIA BALLROOM

Panel: Operations and Optimization

Come along and meet people who have implemented and tuned Druid in situations small and large – very, very large! Panelists will share some of their key tips and tricks and take your questions, whether they’re about deploying to Kubernetes or optimizing your segments!

3:00 PM – 3:30 PM

SEQUOIA BALLROOM

Querying at the Speed of Light: An End-to-End Journey of Druid at Atlassian

At Atlassian, our mission is to unleash the potential of every team, and the Confluence Big Data Platform team harnesses Druid to do just that. Join us to learn how we have integrated Druid into our platform to effortlessly serve millions of queries per day at sub-second latency, all while seamlessly ingesting gigabytes of data every hour through our ever-resilient army of Kinesis streams. You’ll see firsthand how we tune our clusters to handle our platform’s scale, as well as how we architected an entirely separate Druid cluster to proactively prepare for traffic. Step into our world of big data engineering and see how we use every feature Druid has to offer to optimize our production queries, so that our users can enjoy analytics features without staring at loading animations.

Julien Calfayan

Software Engineer – Confluence AI and Big Data Platform

Atlassian

Gautam Jethwani

Software Engineer – Confluence AI and Big Data Platform

Atlassian

3:00 PM – 3:30 PM

ACACIA BALLROOM

Customer-Facing Analytics: The Tapcart Story

Andrew Li

Senior Software Engineer

Tapcart

3:30 PM – 3:45 PM

Break

3:45 PM – 4:30 PM

SEQUOIA BALLROOM

Panel: Real-Time Streaming Analytics

Jump right into the fresh, fast, and flavorsome river of real-time data analytics. Come around the campfire to discuss Apache Druid’s place in real-time architectures with experts from across the streaming ecosystem, including Apache Flink, Apache Kafka, Amazon Kinesis, and Python.

3:45 PM – 4:30 PM

ACACIA BALLROOM

Scaling Analytics for 10M Experiences

Roblox built a cost-effective analytics solution using Druid to serve millions of experiences on our platform. This system provides free analytics to all creators, handling games ranging from 1 to 100 million monthly active users. Our approach heavily utilizes approximation techniques to manage scale and cost. This talk will cover our year-long journey with this system in production, focusing on key optimizations we’ve implemented.

Willis Kennedy

Senior Software Engineer

Roblox

4:45 PM – 5:00 PM

Break

5:00 PM – 5:30 PM

SEQUOIA BALLROOM

Closing Keynote: Charting the Future of Druid

What lies ahead for Apache Druid? Join us as we explore the evolving landscape of Druid’s query and storage engines, and how they are positioned to address the biggest challenges in event data for the future.

Gian Merlino

Co-Founder and Chief Technology Officer

Imply

5:30 PM – 7:30 PM

Closing Reception