ADR-0005: Apache Iceberg as Enforced Table Format
Status
Accepted
Context
floe needs a table format that supports:
- ACID transactions
- Time travel and versioning
- Schema evolution
- Partition evolution
- Multiple compute engines (DuckDB, Snowflake, Spark)
Decision
Enforce Apache Iceberg as the table format (non-pluggable).
Why Iceberg?
- Open standard: Apache License, vendor-neutral
- Multi-engine support: Works with DuckDB, Spark, Snowflake, BigQuery
- ACID guarantees: Snapshot isolation, serializable isolation
- Time travel: Query historical data via snapshot IDs
- Schema evolution: Add/remove/rename columns without rewriting data
- Partition evolution: Change partitioning scheme without data migration
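Time travel via snapshot IDs can be pictured with a toy snapshot log. This is a pure-Python sketch of the concept, not the PyIceberg API: every commit appends an immutable snapshot, the latest snapshot serves current reads, and any historical snapshot stays queryable by ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of the table at commit time."""
    snapshot_id: int
    rows: tuple


class ToySnapshotTable:
    """Toy model of Iceberg's snapshot log (illustration only)."""

    def __init__(self):
        self._snapshots = []

    def commit(self, rows):
        # Each commit produces a new immutable snapshot; old ones are kept.
        snap = Snapshot(snapshot_id=len(self._snapshots) + 1, rows=tuple(rows))
        self._snapshots.append(snap)
        return snap.snapshot_id

    def scan(self, snapshot_id=None):
        # Default: read the latest snapshot; pass an ID to time-travel.
        snap = (
            self._snapshots[-1]
            if snapshot_id is None
            else self._snapshots[snapshot_id - 1]
        )
        return list(snap.rows)


table = ToySnapshotTable()
v1 = table.commit([{"id": 1}])
v2 = table.commit([{"id": 1}, {"id": 2}])

assert table.scan() == [{"id": 1}, {"id": 2}]        # current state
assert table.scan(snapshot_id=v1) == [{"id": 1}]     # historical read
```

Real Iceberg snapshots reference manifest files rather than holding rows in memory, but the reader-facing contract is the same: reads are pinned to a snapshot, which is what makes snapshot isolation and time travel composable across engines.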
Why NOT pluggable?
Making the table format pluggable would fragment the ecosystem:
- Different formats have incompatible metadata structures
- Cross-engine compatibility could no longer be guaranteed
- Plugin implementations would need per-format logic
- Data sharing between teams would fail
Implementation
- All dbt models write to Iceberg tables
- PyIceberg for Python interactions
- Polaris catalog (or Glue, Unity Catalog) manages Iceberg metadata
- Storage plugins (S3, GCS, Azure) provide FileIO implementations
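To illustrate the catalog requirement, a PyIceberg client is pointed at an Iceberg REST catalog (such as Polaris) through its configuration file. The endpoint and warehouse below are placeholder assumptions for a sketch, not floe defaults:

```yaml
# ~/.pyiceberg.yaml — hypothetical REST catalog configuration
catalog:
  default:
    type: rest
    uri: http://localhost:8181            # placeholder Polaris/REST endpoint
    warehouse: s3://example-bucket/warehouse  # placeholder storage location
```

With a catalog entry like this in place, PyIceberg resolves table metadata through the catalog while the storage plugin's FileIO handles the actual data files.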
Consequences
Positive
- Guaranteed cross-engine compatibility
- Time travel available everywhere
- ACID transactions by default
- Enables data sharing and collaboration
Negative
- Teams cannot use Delta Lake or Hudi
- Requires Iceberg-compatible catalog (Polaris, Glue, Unity)
- Learning curve for Iceberg concepts