Distributed Data Processing
Batch Processing Frameworks
- Apache Hadoop: Framework for distributed storage and processing
- Apache Spark: Unified analytics engine for large-scale data processing
- Apache Hive: Data warehouse software for reading, writing, and managing data
- Presto/Trino: Distributed SQL query engine for big data
- Apache Pig: Platform for analyzing large datasets
- Databricks: Unified analytics platform built on Spark
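
As a rough illustration of the batch model these tools share (a bounded input is read in full before results are produced), here is a minimal single-process Python sketch of the map, shuffle, and reduce steps popularized by Hadoop MapReduce. It does not use the API of any framework listed above; the function names and sample data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in sequence over a bounded input."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical input; a real job would read from distributed storage.
sample_lines = ["spark and hadoop", "spark and hive"]
counts = word_count(sample_lines)
print(counts)
```

In a real cluster the map and reduce phases would run in parallel across many machines and the shuffle would move data over the network; this sketch only shows the data flow.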
Stream Processing Frameworks
- Spark Streaming: Real-time data processing with Spark
- Apache Flink: Stream and batch processing framework
- Apache Beam: Unified model for batch and streaming data processing
- Apache Storm: Distributed real-time computation system
- Apache Samza: Distributed stream processing framework
- Apache Pulsar: Distributed messaging and streaming platform
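
The stream-oriented tools above process events as they arrive rather than waiting for a complete dataset. A common building block is the tumbling window: fixed, non-overlapping time intervals over which events are aggregated. The following Python sketch shows the idea in a single process, assuming events arrive in timestamp order (real engines also handle late and out-of-order data). The function name, event format, and sample data are hypothetical and do not correspond to any listed framework's API.

```python
def tumbling_window_counts(events, window_seconds):
    """Yield (window_start, counts) each time a window closes.

    `events` is an iterable of (timestamp_seconds, key) tuples,
    assumed to be sorted by timestamp.
    """
    current_start = None
    counts = {}
    for timestamp, key in events:
        # Align the timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        if current_start is not None and window_start != current_start:
            # The previous window is complete; emit it and start a new one.
            yield current_start, counts
            counts = {}
        current_start = window_start
        counts[key] = counts.get(key, 0) + 1
    if current_start is not None:
        # Flush the final, still-open window when the input ends.
        yield current_start, counts


# Hypothetical event stream: (timestamp in seconds, event type).
sample_events = [(0, "click"), (3, "view"), (7, "click"), (12, "click"), (14, "view")]
window_results = list(tumbling_window_counts(sample_events, window_seconds=10))
print(window_results)
```

Because the function is a generator, results for a window become available as soon as the first event of the next window arrives, which is the key contrast with a batch job that reports only after all input has been read.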