Tuesday, October 22, 2024 | San Francisco Bay Area

Agenda

Tuesday, October 22nd

TIME
SESSIONS

8:00 AM – 9:00 AM
Registration and Breakfast

9:00 AM – 9:30 AM
BALLROOM

Opening Keynote
Speaker

Fangjin Yang
Co-Founder and Chief Executive Officer
Imply

9:30 AM – 10:00 AM
BALLROOM

Product Keynote
Speaker

Gabriel Tavridis
Vice President, Product
Imply

10:00 AM – 10:30 AM
BALLROOM

Querying at the speed of light: an end-to-end journey of Druid at Atlassian
At Atlassian, our mission is to unleash the potential of every team, and the Confluence Big Data Platform team harnesses Druid to do just that. Join us to learn how we have integrated Druid into our platform to effortlessly serve millions of queries per day at sub-second latency, all while ingesting gigabytes of data every hour through our ever-resilient army of Kinesis streams. You’ll see firsthand how we tune our clusters to handle our platform’s large scale, and how we architected an entirely separate Druid cluster to prepare for traffic proactively. Step into our world of big data engineering and see how we use every feature Druid has to offer to optimize our production queries, so that our users can enjoy analytics features without staring at loading animations.
Speaker

Julien Calfayan
Software Engineer – Confluence AI and Big Data Platform
Atlassian

Speaker

Gautam Jethwani
Software Engineer
Atlassian

10:30 AM – 10:45 AM
Morning Break

10:45 AM – 11:15 AM
BALLROOM

Tracing Service Dependencies at Scale with Druid and Flink
At Salesforce, we manage approximately 300 million distributed spans to infer service dependencies, and we use a combination of Druid and Flink to handle this scale with high availability. Attendees will gain insights into how to leverage Apache Druid for precise service dependency representation and high-performance querying. We also use Apache Superset to slice and dice the data persisted in the Druid backend according to user requirements, and Grafana to present a time-series view of edge health metrics, such as latencies and errors, served directly from the Druid backend. Attendees will also learn how to integrate streaming technologies to manage large-scale data aggregations effectively, optimizing both storage and cost. This session will provide practical knowledge and techniques for using Druid in real-world applications, enabling participants to enhance their data infrastructure and analytics capabilities.
Speaker

Sudeep Kumar
Principal Engineer
Salesforce

11:15 AM – 11:45 AM
BALLROOM

Unlocking sub-second query performance on Lakehouse: Integrating Apache Druid with Apache Iceberg
Modern data architectures often combine Apache Druid, an open-source high-performance datastore, with Apache Iceberg, an open-source format for huge analytic tables. Ingesting Apache Iceberg tables into Apache Druid provides organizations with concurrent streaming analytics and fast data exploration.

As Apache Iceberg becomes the de facto open table format for analytical datasets, there is a growing need to make specific Iceberg tables containing high-cardinality, high-dimensional event data available for rapid, open-ended data exploration. This talk discusses how Druid addresses this need by extending its ingestion layer to read the Iceberg table format. This integration helps Druid power interactive dashboards and support slice-and-dice analytics within data lakehouses.

By the end of this session, participants will understand:
* How Iceberg tables can be ingested into Druid
* Real-world use cases

Speaker

Atul Mohan
Software Engineer
Apple

11:45 AM – 12:15 PM
BALLROOM

Panel: Real-time streaming analytics
Jump right into the fresh, fast, and flavorsome river of real-time data analytics. Come around the campfire to discuss Apache Druid’s place in real-time architectures with experts from across the streaming ecosystem, including Apache Flink, Apache Kafka, Amazon Kinesis, and Python.

12:15 PM – 1:15 PM
Lunch

1:15 PM – 1:45 PM
BALLROOM

Multi-level HA of Apache Druid on K8s Architecture
This session will explain in detail how to ensure multi-level high availability, from top to bottom, in the Apache Druid on K8s architecture, and describe its large-scale commercial application at Shopee.
Speaker

Benedict Jin
Expert Engineer
Shopee

1:15 PM – 1:45 PM
CUBEROOM

Harnessing the Power of Imply Polaris: Use Cases and Architecture
Speaker

Larissa Klitzke
Senior Product Marketing Manager
Imply

1:45 PM – 2:15 PM
BALLROOM

Real-Time Analytics with Apache Druid at Branch.io
Branch.io has been leveraging Apache Druid for real-time analytics for over eight years, handling more than 600,000 queries daily. In this presentation, we will explore the diverse use cases that Druid addresses for Branch.io, detailing our data ingestion methods through both real-time and batch processes. We will discuss various optimizations implemented to enhance query performance and reduce costs, including techniques such as cardinality capping, data rollup to higher granularity, and multidimensional partitioning. Additionally, we’ll cover our packaging and deployment strategies, and conclude with insights into our approach to query debugging.
Speaker

Ramesh Shanmugam
Principal Data Engineer
Branch.io

Speaker

Andrei Harbunou
Senior Software Engineer
Branch.io

2:15 PM – 2:30 PM
Afternoon Break

2:30 PM – 3:00 PM
BALLROOM

Ingesting Delta Lake Tables into Druid
Delta Lake is a popular open-source storage format. In this session, we will explore how to ingest data from Delta Lake tables into Apache Druid using the druid-deltalake-extensions. This extension enables users to read Delta tables and ingest them into Druid, where they can be queried with sub-second latency. We will also cover the new Delta input source ingestion capabilities.
Speaker

Abhishek Balaji Radhakrishnan
Staff Software Engineer
Imply

2:30 PM – 3:00 PM
CUBEROOM

Scaling Analytics for 10M Experiences
Roblox built a cost-effective analytics solution using Druid to serve millions of experiences on our platform. This system provides free analytics to all creators, handling games ranging from 1 to 100 million monthly active users. Our approach heavily utilizes approximation techniques to manage scale and cost. This talk will cover our year-long journey with this system in production, focusing on key optimizations we’ve implemented.
Speaker

Willis Kennedy
Senior Software Engineer
Roblox

3:00 PM – 3:30 PM
BALLROOM

Panel: Lakehouse analytics
Apache Iceberg and Delta Lake are coming into the big top, and Apache Druid is ready! Get ready to hear from Apache Druid PMC members and industry experts on just where Apache Druid fits, how it can be used, and what’s in the crystal ball for Druid.

3:30 PM – 3:45 PM
Afternoon Break

3:45 PM – 4:15 PM
BALLROOM

Optimizing Druid Configurations at Netflix through Parallel Testing and Metrics Analysis
As a data-driven company, Netflix continually seeks to enhance the performance and reliability of its data infrastructure. This talk will delve into our approach to optimizing Apache Druid configurations through parallel runs and A/B testing. We will explore how Netflix tests different Druid setups by running them concurrently on parallel clusters, enabling a direct comparison of the key performance metrics each cluster reports. Attendees will gain insights into the following areas:
1. **Cluster Management and Deployment:** An overview of Netflix’s strategies for managing and deploying Druid clusters, emphasizing automation and scalability.
2. **Centralized Logging and Metrics:** Techniques for aggregating and analyzing logs and metrics to facilitate real-time monitoring and post-mortem analysis.
3. **Cluster Architecture Patterns:** Best practices and patterns employed by Netflix to architect Druid clusters for optimal performance and reliability.
4. **Parallel Testing Framework:** Detailed methodologies for executing parallel runs and conducting A/B testing to evaluate different Druid configurations, including the tools and frameworks used.

This session will provide practical knowledge and actionable insights, empowering attendees to apply similar strategies within their own organizations to optimize Druid deployments. Join us to learn how Netflix leverages advanced testing and analytical techniques to push the boundaries of what is possible with Apache Druid.

Speaker

Ben Sykes
Software Engineer
Netflix

3:45 PM – 4:15 PM
CUBEROOM

Transforming Stadium Merchandising with Retailcloud: Harnessing Imply for Real-Time Analytics
Join us for an in-depth look at Retailcloud’s groundbreaking Data and Analytics Platform (DAP), engineered to handle and analyze millions of transactions in real time during major sporting events such as NBA and NFL games and golf tournaments. The platform focuses on stadium merchandising operations, where timely insights are paramount to optimizing sales performance. By harnessing the capabilities of Confluent and Imply, we deliver real-time analytics that give our customers actionable insights into sales trends and performance metrics. This session will explore how these insights empower our clients to make swift, informed decisions, maximizing their sales potential during high-profile events. Discover how Retailcloud is transforming the landscape of real-time analytics in stadium merchandising.
Speaker

Saravanan Vijayappan
VP of Engineering
Retailcloud

4:15 PM – 4:45 PM
BALLROOM

Panel: Dev, Ops, and Optimization
Come along and meet people who have implemented and tuned Druid in situations small and large (very, very large!). Panelists will share some of their key tips and tricks and take your questions, whether they’re about deploying to Kubernetes or optimizing your segments!

4:15 PM – 4:45 PM
CUBEROOM

Building Natural Language Queries for Druid
In this session, we will walk through the process of building a natural language query translation service for a SaaS distribution of Druid, and how we handled the unique considerations of both Druid and the SaaS platform.
Speaker

Tom Fenwick
Software Engineer
Imply

5:00 PM – 5:30 PM
BALLROOM

Closing Keynote
Speaker

Gian Merlino
Co-Founder and Chief Technology Officer
Imply

5:30 PM – 7:30 PM
Networking Reception