Data Lakes & File Standards
Data Lake Platforms
- Amazon S3: Object storage service for data lakes
- Azure Data Lake Storage: Scalable data lake solution for big data analytics
- Google Cloud Storage: Object storage service used as the storage layer for data lakes on Google Cloud
- Databricks Delta Lake: Open-source storage layer for reliability in data lakes
- Cloudera Data Platform: Enterprise data cloud for data management
- Dremio: Data lake engine for analytics
File Formats
- Parquet: Columnar storage file format
- ORC (Optimized Row Columnar): Columnar storage format for Hadoop
- Avro: Row-based data serialization system
- CSV: Comma-separated values format
- JSON: JavaScript Object Notation format
- Protocol Buffers: Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data
- Feather: Fast on-disk format for data frames
- Arrow: Cross-language development platform for in-memory data
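
The row-based versus columnar distinction in the list above can be sketched with the Python standard library alone. This is a minimal illustration, not how Parquet, ORC, or Avro are implemented: the sample records and the helper names `to_columns` and `to_csv` are hypothetical, and real columnar formats add encoding, compression, and metadata on top of this basic layout idea.

```python
import csv
import io
import json

# Hypothetical sample records, used only for illustration.
rows = [
    {"id": 1, "city": "Oslo", "temp_c": 4.5},
    {"id": 2, "city": "Lima", "temp_c": 19.0},
    {"id": 3, "city": "Pune", "temp_c": 28.5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def to_csv(records):
    """Serialize records one row at a time as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0]), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


columns = to_columns(rows)

# Row layout: reading one field means visiting every record.
avg_from_rows = sum(r["temp_c"] for r in rows) / len(rows)

# Column layout: the same field is already one contiguous list.
avg_from_columns = sum(columns["temp_c"]) / len(columns["temp_c"])

# Two row-oriented text encodings of the same records.
csv_text = to_csv(rows)
json_lines = "\n".join(json.dumps(r) for r in rows)
```

Both averages are identical. The difference is in access pattern: an analytic query that touches one field reads a single list in the columnar layout, while the row layout favors writing or reading whole records at a time.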
Table Formats & Related Projects
- Apache Sedona: Cluster computing system for processing large-scale spatial data
- Apache Iceberg: Open table format for huge analytic datasets
- Apache Hudi: Data lake platform with record-level updates and deletes
- Delta Lake: Storage layer that brings ACID transactions to data lakes