ADR-0009: dbt Owns SQL Transformation
Status
Accepted
Context
Floe needs to handle SQL transformations across multiple compute targets (DuckDB, Snowflake, BigQuery, etc.). Key challenges:
- SQL dialect differences between targets
- Dependency resolution between models
- Incremental processing logic
- Data testing
Options considered:
- Build custom SQL handling - Parse, transpile, manage dependencies ourselves
- Use dbt - Leverage existing, proven tooling
- Hybrid - Light wrapper around dbt
- Target-specific code - Different implementations per target
Decision
dbt owns SQL transformation. Floe does not parse, transpile, or manage SQL dependencies.
Consequences
Positive
- Proven tooling - dbt handles dialect translation via adapters
- Dependency resolution - ref() and source() macros work out of the box
- Incremental processing - is_incremental() macro handles complexity
- Data testing - dbt tests validate data quality
- Large ecosystem - dbt packages, community, documentation
- Simpler Floe - Less code to maintain
Negative
- dbt dependency - Users must structure transforms as dbt projects
- Learning curve - Users need dbt knowledge
- Less flexibility - Can’t support non-dbt SQL patterns
- Version coupling - Must support dbt version changes
Neutral
- CompiledArtifacts just point to the dbt project (Floe doesn’t rewrite SQL)
- Floe adds value in orchestration, data isolation, observability
- Future non-dbt transforms (Python, Flink) are handled separately
Responsibility Split
| Concern | Owner |
|---|---|
| SQL dialect translation | dbt (via adapters) |
| Dependency resolution | dbt (ref(), source()) |
| Incremental processing | dbt (is_incremental()) |
| Data tests | dbt |
| Orchestration | Dagster |
| Data isolation | Floe |
| Pipeline lifecycle | Floe |
| Observability | Floe |
Execution Runtime (Pluggable)
While the dbt framework is enforced as the SQL transformation DSL, the execution environment where dbt compiles and runs is pluggable via DBTPlugin (ADR-0043):
| Implementation | Description | Entry Point |
|---|---|---|
| LocalDBTPlugin | dbt-core via CLI subprocess | floe.dbt |
| FusionDBTPlugin | dbt Fusion (Rust-based) via CLI subprocess | floe.dbt |
| CloudDBTPlugin | dbt Cloud API (deferred to Epic 8+) | floe.dbt |
Key Distinction:
- dbt Framework (ENFORCED): SQL transformation DSL, models, tests, macros, Jinja templating
- dbt Execution Environment (PLUGGABLE): WHERE dbt compiles (local dbt-core, dbt Fusion, dbt Cloud)
Platform teams select the execution environment in manifest.yaml:
```yaml
plugins:
  dbt_compiler:
    provider: fusion  # or local, or cloud
```
Data engineers use dbt framework features (models, tests, macros) regardless of execution environment.
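One way to picture the provider selection: the value under plugins.dbt_compiler.provider is looked up in a registry of DBTPlugin implementations. The Go sketch below is illustrative only. The interface shape, the registry map, and ResolvePlugin are hypothetical names, not the contract defined in ADR-0043. The cloud provider is left unregistered here because the table above marks it as deferred.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// DBTPlugin is a hypothetical stand-in for the contract defined in ADR-0043.
type DBTPlugin interface {
	Provider() string
	Description() string
}

// plugin is a minimal value type that satisfies DBTPlugin for this sketch.
type plugin struct{ provider, description string }

func (p plugin) Provider() string    { return p.provider }
func (p plugin) Description() string { return p.description }

// registry maps the manifest.yaml provider value to an implementation.
// Descriptions are copied from the implementation table; "cloud" is omitted
// because it is deferred.
var registry = map[string]DBTPlugin{
	"local":  plugin{"local", "dbt-core via CLI subprocess"},
	"fusion": plugin{"fusion", "dbt Fusion (Rust-based) via CLI subprocess"},
}

// ResolvePlugin returns the plugin registered for provider, or an error that
// lists the known providers in sorted order.
func ResolvePlugin(provider string) (DBTPlugin, error) {
	if p, ok := registry[provider]; ok {
		return p, nil
	}
	known := make([]string, 0, len(registry))
	for name := range registry {
		known = append(known, name)
	}
	sort.Strings(known)
	return nil, fmt.Errorf("unknown dbt provider %q (known: %s)",
		provider, strings.Join(known, ", "))
}

func main() {
	p, err := ResolvePlugin("fusion")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s: %s\n", p.Provider(), p.Description())
}
```

Because callers depend only on the interface, switching the provider value changes where dbt runs without touching any model, test, or macro in the dbt project.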
CompiledArtifacts Pattern
```go
// Floe points to dbt project, doesn't rewrite SQL
type DBTConfig struct {
	ProjectDir string            // Path to dbt project
	Target     string            // dbt target (profiles.yml)
	EnvVars    map[string]string // Environment variables
	Commands   []string          // ["dbt run", "dbt test"]
	Select     string            // Model selection
	Exclude    string            // Model exclusion
}
```
What Floe Does NOT Do
- ❌ Parse SQL to understand structure
- ❌ Transpile SQL between dialects
- ❌ Manage model dependencies
- ❌ Handle incremental logic
- ❌ Run data tests directly
What Floe DOES Do
- ✅ Orchestrate dbt runs via Dagster
- ✅ Provide environment variables (connections, etc.)
- ✅ Collect observability from dbt runs
- ✅ Manage data isolation (namespace-based)
- ✅ Provision compute targets