
ADR-0018: Opinionation Boundaries

Status: Accepted

floe must balance two competing needs:

  1. Strong opinions - Make it easy to get started with proven defaults
  2. Flexibility - Allow organizations to use their existing infrastructure

Without clear boundaries, this creates confusion:

  • Which components are required vs optional?
  • What can Platform Teams customize?
  • What do Data Engineers inherit without choice?

Define clear opinionation boundaries:

  1. ENFORCED - Core platform identity, non-negotiable
  2. PLUGGABLE - Platform Team selects once, Data Engineers inherit

ENFORCED Standards (Core Identity)

These standards define floe’s core identity:

| Component | Standard | Rationale |
| --- | --- | --- |
| Table Format | Apache Iceberg | Open, multi-engine, ACID, time-travel |
| Telemetry | OpenTelemetry | Vendor-neutral industry standard |
| Data Lineage | OpenLineage | Industry standard for lineage |
| Deployment | Kubernetes-native | Portable, declarative infrastructure |
| Configuration | Declarative YAML | Explicit over implicit |
| Transformation | dbt-centric | "dbt owns SQL" - proven, target-agnostic |

Why these are enforced:

  • Iceberg: Provides the open table format foundation. Without it, multi-engine access and time-travel are not possible.
  • OpenTelemetry/OpenLineage: Provide consistent observability. Custom formats fragment the ecosystem.
  • Kubernetes: Provides the deployment abstraction. Supporting Docker Compose creates testing parity issues.
  • dbt: Provides the transformation layer. Building custom SQL handling duplicates proven tooling.
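One way to see what "non-negotiable" means in practice is a manifest check that rejects any attempt to override an ENFORCED component. This is a hypothetical sketch - the `ENFORCED` mapping and `validate_manifest` helper are illustrative, not floe's actual code:

```python
# Hypothetical sketch: rejecting overrides of ENFORCED components.
# The ENFORCED mapping and validate_manifest are illustrative only.
ENFORCED = {
    "table_format": "iceberg",
    "telemetry": "opentelemetry",
    "lineage": "openlineage",
    "transformation": "dbt",
}


def validate_manifest(manifest: dict) -> list[str]:
    """Return one error per ENFORCED standard the manifest tries to change."""
    errors = []
    for key, required in ENFORCED.items():
        value = manifest.get(key, required)  # absent means "use the standard"
        if value != required:
            errors.append(f"{key} is ENFORCED as {required!r}, got {value!r}")
    return errors
```

A manifest that says nothing about these components passes; one that asks for, say, Delta Lake as the table format is rejected.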

PLUGGABLE Components (Platform Team Choice)

Platform Team selects these ONCE in manifest.yaml. Data Engineers inherit them:

| Component | Default | Alternatives |
| --- | --- | --- |
| Storage | MinIO | S3, ADLS2, GCS |
| Compute | DuckDB | Spark, Snowflake, Databricks, BigQuery, Redshift |
| Ingestion | dlt | Airbyte (external) |
| Orchestration | Dagster | Airflow, Prefect, Argo Workflows |
| Catalog | Polaris | AWS Glue, Hive Metastore, Nessie |
| Semantic Layer | Cube | dbt Semantic Layer, None |
| Data Quality | dbt tests | Great Expectations, Soda (future) |
| Secrets | K8s Secrets | External Secrets Operator, Vault |
| Identity | Keycloak | Dex, Authentik, Zitadel, Okta, Auth0, Azure AD |
| Telemetry Backend | OTLP Collector | Datadog, Grafana Cloud |
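The "select once, inherit everywhere" rule can be sketched as a simple merge: the Platform Team's plugin block is injected into every pipeline's effective configuration. The dict shapes and the `effective_config` helper are assumptions for illustration, not floe's actual API:

```python
# Illustrative sketch of inheritance: Platform Team choices from
# manifest.yaml are merged into each pipeline's effective config.
PLATFORM_MANIFEST = {
    "plugins": {
        "compute": {"type": "duckdb"},        # default; could be snowflake
        "orchestrator": {"type": "dagster"},
        "catalog": {"type": "polaris"},
    }
}


def effective_config(floe_yaml: dict) -> dict:
    """Data Engineers define transforms; plugin choices are inherited."""
    return {**floe_yaml, "plugins": PLATFORM_MANIFEST["plugins"]}


pipeline = effective_config({"transforms": [{"type": "dbt", "path": "models/"}]})
```

Note that the Data Engineer's input never mentions compute or orchestration - those keys exist only in the platform manifest.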
Consequences

Positive:

  • Clear boundaries - Teams know what they can change
  • Consistent foundation - Iceberg + OTel + OpenLineage everywhere
  • Flexibility where it matters - Choose your compute and orchestrator
  • Batteries included - Defaults work out of the box

Negative:

  • Less flexibility - Cannot swap out Iceberg for Delta Lake
  • Learning curve - Teams must learn the enforced standards
  • Potential lock-in - Dependence on the Iceberg ecosystem

Mitigations:

  • Plugin system provides an escape hatch for most customization needs
  • Enforced standards are industry-leading choices
  • Platform Team can still customize significantly via plugins
ENFORCE a component when:

| Criteria | Example |
| --- | --- |
| Core platform identity | Iceberg table format |
| Cross-cutting concern | OpenTelemetry observability |
| Industry standard | OpenLineage lineage |
| Deployment model | Kubernetes-native |
| Significant re-architecture to swap | dbt transformation |

Make a component PLUGGABLE when:

| Criteria | Example |
| --- | --- |
| Multiple valid options exist | Compute: DuckDB vs Snowflake |
| Organization already has choice | Orchestration: existing Airflow |
| Different scale requirements | Spark vs DuckDB |
| Cloud provider preference | AWS Glue vs Polaris |
| Cost considerations | Managed vs self-hosted |

Per ADR-0037 (Composability Principle), when making something PLUGGABLE, choose between:

  1. Plugin Interface (ABC with entry points) - Preferred for extensibility
  2. Configuration Switch (if/else on config value) - Only for fixed enums
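Option 1 can be sketched as an abstract base class that every plugin must implement. The class and method names here are illustrative, not floe's actual interface:

```python
# Illustrative sketch of a plugin interface (option 1): an ABC that
# third-party implementations subclass. Names are assumptions.
from abc import ABC, abstractmethod


class ObservabilityBackend(ABC):
    """Contract for observability backend plugins."""

    @abstractmethod
    def export_spans(self, spans: list) -> None:
        """Ship finished spans to the backend."""


class ConsoleBackend(ObservabilityBackend):
    """Trivial implementation for local development."""

    def export_spans(self, spans: list) -> None:
        for span in spans:
            print(span)
```

A third-party package would then register its implementation under an entry-point group (e.g. `floe.observability`) in its `pyproject.toml`, so the core discovers it without code changes.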

Decision Tree:

┌────────────────────────────────────────────────────────┐
│ Question: How should we make this component pluggable? │
└────────────────────────────┬───────────────────────────┘
                             │
               ┌─────────────┴────────────┐
               │ Multiple implementations │
               │ exist OR may exist?      │
               └─────────────┬────────────┘
                             │
                 ┌───────────┴───────────┐
                 │                       │
                YES                      NO
                 │                       │
                 ▼                       ▼
       ┌──────────────────┐    ┌──────────────────┐
       │ User needs to    │    │ Fixed set of     │
       │ swap or extend?  │    │ options (enum)?  │
       └─────────┬────────┘    └─────────┬────────┘
                 │                       │
           ┌─────┴─────┐           ┌─────┴─────┐
          YES          NO         YES          NO
           │           │           │           │
           ▼           ▼           ▼           ▼
      ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
      │ PLUGIN  │ │ CONFIG  │ │ CONFIG  │ │ ENFORCE │
      │   ✅    │ │   ✅    │ │   ✅    │ │   ✅    │
      └─────────┘ └─────────┘ └─────────┘ └─────────┘

Examples:

| Scenario | Decision | Rationale |
| --- | --- | --- |
| Observability backends (Jaeger, Datadog) | Plugin | Multiple implementations, users swap backends (ADR-0035) |
| Storage backends (S3, GCS, Azure) | Plugin | Multiple implementations, different credentials (ADR-0036) |
| Compute engines (DuckDB, Snowflake) | Plugin | Multiple implementations, organization choice |
| Environment (dev, staging, prod) | Configuration | Fixed enum, no custom implementations |
| Log level (DEBUG, INFO, WARN) | Configuration | Fixed enum, no custom implementations |
| OpenTelemetry SDK (emission) | Enforce | No alternatives, core platform identity |

Why Plugin Interface is Preferred:

  1. Extensibility: Community can add new implementations without core changes
  2. Testing: Mock plugin interface in tests (no real services needed)
  3. Composability: Aligns with ADR-0037 principle of small, interchangeable components
  4. Decoupling: Core doesn’t know about implementation details
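Point 2 (testing) in practice: a hand-rolled fake satisfies the plugin interface, so unit tests need no real telemetry service. All names in this sketch are illustrative:

```python
# Illustrative sketch: a fake plugin stands in for a real backend in tests.
class FakeBackend:
    """Records exported spans instead of shipping them anywhere."""

    def __init__(self) -> None:
        self.exported: list = []

    def export_spans(self, spans: list) -> None:
        self.exported.extend(spans)


def run_pipeline_step(backend) -> None:
    # The unit under test only knows the plugin interface.
    backend.export_spans([{"name": "transform", "status": "ok"}])


fake = FakeBackend()
run_pipeline_step(fake)
assert fake.exported[0]["status"] == "ok"
```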

When Configuration is Acceptable:

  1. Fixed set of values: Environment (dev/staging/prod), log levels (DEBUG/INFO)
  2. No new implementations expected: Boolean flags, simple toggles
  3. Trivial logic: No complex behavior differences between options
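When the option set is a closed enum (point 1), a plain configuration value with validation is enough - no plugin machinery. A minimal sketch using `typing.Literal`; the `set_environment` helper is an assumption for illustration:

```python
# Sketch: validating a config value against a fixed enum via typing.Literal.
from typing import Literal, get_args

Environment = Literal["dev", "staging", "prod"]


def set_environment(env: str) -> str:
    """Accept only values from the fixed enum."""
    if env not in get_args(Environment):
        raise ValueError(f"unknown environment: {env!r}")
    return env
```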

Anti-Pattern: Configuration Switch for Extensible Behavior

# ❌ BAD: Configuration switch (coupling)
def get_backend(config: dict):
    if config["type"] == "jaeger":
        return JaegerBackend()
    elif config["type"] == "datadog":
        return DatadogBackend()
    # Every new backend requires core changes

# ✅ GOOD: Plugin interface (composable)
registry = PluginRegistry()
backend = registry.discover("floe.observability")[config["type"]]
┌──────────────────────────────────────────────────────────────────────┐
│ Platform Team Decision                                               │
│                                                                      │
│ "We use Snowflake for compute, existing Airflow for orchestration"   │
└──────────────────────────────────┬───────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│ manifest.yaml                                                        │
│                                                                      │
│ plugins:                                                             │
│   compute:                                                           │
│     type: snowflake       # Pluggable ✓                              │
│   orchestrator:                                                      │
│     type: airflow         # Pluggable ✓                              │
│   catalog:                                                           │
│     type: polaris         # Pluggable ✓ (example catalog selection)  │
│                                                                      │
│ # ENFORCED (cannot change):                                          │
│ #   - Iceberg table format                                           │
│ #   - OpenTelemetry observability                                    │
│ #   - OpenLineage lineage                                            │
│ #   - dbt transformation                                             │
│ #   - Kubernetes deployment                                          │
└──────────────────────────────────┬───────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│ Data Engineers Inherit                                               │
│                                                                      │
│ # floe.yaml - Data engineers ONLY define:                            │
│ transforms:                                                          │
│   - type: dbt             # Enforced ✓ (must use dbt)                │
│     path: models/                                                    │
│                                                                      │
│ # They inherit without choice:                                       │
│ #   - Snowflake compute (from platform)                              │
│ #   - Airflow orchestration (from platform)                          │
│ #   - Polaris catalog (from platform)                                │
│ #   - Iceberg tables (enforced)                                      │
│ #   - OpenTelemetry (enforced)                                       │
└──────────────────────────────────────────────────────────────────────┘
DON’T: Allow per-pipeline compute selection

floe.yaml

# BAD: Per-pipeline compute selection causes drift
transforms:
  - type: dbt
    compute: snowflake  # ❌ Should be inherited from platform

DON’T: Allow per-environment compute targets

manifest.yaml

# BAD: Different compute per environment causes drift
environments:
  development:
    compute: duckdb     # ❌ Causes "works in dev, fails in prod"
  production:
    compute: snowflake  # ❌ Environment drift
DO: Single compute target for all environments

manifest.yaml

# GOOD: Single compute target, no drift
plugins:
  compute:
    type: snowflake  # ✓ Same for dev, staging, prod
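Both anti-patterns above are mechanical enough to lint for. A hedged sketch of such a check - the function name and messages are illustrative, not part of floe:

```python
# Hypothetical lint rule: flag compute selection anywhere outside the
# platform manifest's plugins block.
def find_compute_drift(config: dict) -> list[str]:
    errors = []
    for transform in config.get("transforms", []):
        if "compute" in transform:
            errors.append("per-pipeline compute is not allowed; inherit it from the platform")
    for env, settings in config.get("environments", {}).items():
        if "compute" in settings:
            errors.append(f"per-environment compute ({env}) causes dev/prod drift")
    return errors
```

Run against the BAD examples above it returns one error each; against the GOOD example it returns nothing, since `plugins.compute` is the only place compute appears.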