# Real-Time Data Infrastructure at Uber

{% file src="<https://1133441777-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MP3gpNOfmHBf90k26iY%2Fuploads%2F0x5YD5xq0HRgYlGSzbr9%2FPaper%20-%20Real%20time%20infrastructure%20at%20Uber.pdf?alt=media&token=5ef946d4-457a-4532-a16f-809574978589>" %}

#### Key Points:

1. **Architecture and Components:**
   * Uber leverages a range of open-source technologies, prominently including Apache Kafka, Apache Flink, and Apache Pinot, to build a robust real-time data infrastructure.
   * Apache Flink is used extensively for stream processing, supporting both SQL and low-level API interfaces to accommodate different user needs. This enables efficient transformation of SQL queries into Flink jobs, simplifying the deployment and scaling of streaming applications.
2. **Challenges and Solutions:**
   * **Resource Estimation and Auto-scaling:** Empirical analysis helps in configuring resources for various job types (e.g., CPU-bound vs. memory-bound jobs), and continuous monitoring aids in dynamic scaling based on load.
   * **Job Monitoring and Failure Recovery:** Automated job monitoring and failure recovery mechanisms ensure high reliability and operational efficiency​.
3. **Unified Platform:**
   * Uber has developed a unified architecture to manage stream processing pipelines, integrating various platforms to handle business logic, workflow management, and SQL compilation. This layered architecture enhances scalability and extensibility​.
4. **Lessons Learned:**
   * **Open Source Adoption:** Uber’s reliance on open-source technologies has facilitated rapid iteration and adaptation to evolving engineering needs, though it has required significant customization to fit Uber's specific requirements.
   * **Rapid System Development:** Standardizing interfaces and leveraging a monorepo approach has enabled efficient management of multiple applications, promoting rapid development and deployment​.
5. **Future Directions:**
   * **Unification of Streaming and Batch Processing:** Aiming to simplify the development process by unifying the semantics of stream and batch processing.
   * **Multi-region and Cloud Agnosticism:** Enhancing system resilience and flexibility through multi-region deployments and the ability to run infrastructure both on-premises and in the cloud​​.
