The November 2019 meetup of the Bay Area Apache Flink group was hosted at Cloudera HQ in Palo Alto. We had two talks, by Lakshmi Rao and Gyula Fora, representing Lyft and Cloudera respectively. Marton Balassi hosted the meetup and introduced the group to Cloudera’s commitment to Flink.
Talk #1: Log aggregation for Flink pipelines by Gyula Fora
Log aggregation and monitoring is a common challenge in data processing applications. We will show you how you can build your own customizable logging solution based on components that are readily available in the Cloudera platform.
We will cover the following components:
- Configuring Kafka-based logging for Flink jobs
- Implementing scalable real-time log indexing in Flink
- Log search and dashboards using Solr and Hue
Talk #2: Running Flink in production: The good, the bad and the in-between by Lakshmi Rao
The streaming platform team at Lyft has been running Flink jobs in production for more than a year now, powering critical use cases such as improving pickup ETA accuracy, dynamic pricing, generating machine learning features for fraud detection, and real-time analytics, among many others. Broadly, the jobs fall into two abstraction layers: applications (Flink jobs that run on the native platform) and analytics (jobs that leverage Dryft, Lyft’s fully managed data processing engine). This talk will give an overview of the platform architecture, deployment model and user experience. It will also dive deeper into some of the challenges encountered and lessons learnt while running Flink jobs at scale, specifically around scaling Flink connectors and dealing with event time skew (source synchronization), and will highlight common patterns of problems observed across several Flink jobs. Finally, the talk will give insights into how we are re-architecting the streaming platform at Lyft using a Kubernetes-based deployment.
This is a companion discussion topic for the original entry at https://www.youtube.com/watch?v=mVfeJpqW0qY