
ADR-0037: Composability as Core Principle

Accepted

floe must scale gracefully across vastly different organizational structures:

  1. Single-team startup - 5 data engineers, one DuckDB instance
  2. Enterprise with multiple teams - Shared platform, standardized tooling
  3. Data Mesh organization - Federated domains, autonomous product teams

Traditional approaches fail at these extremes:

  • Monolithic configuration: Works for a single team, but Data Mesh requires hundreds of config variants
  • Per-team customization: Creates drift, breaks governance, prevents cross-team data sharing
  • Rewrite between models: Scaling from 2-tier to 3-tier (Data Mesh) becomes a migration project

The tension: We need ONE architecture that supports both extremes without rewriting.

Composability is the CORE architectural principle guiding all design decisions.

Composability = Small, interchangeable components with clean interfaces that combine to form complex systems without modification.

1. Plugin Architecture > Configuration Switches


Rule: If multiple implementations exist OR may exist in the future, use a plugin interface (NOT an if/else configuration switch).

Rationale: Plugin interfaces enable new implementations without changing the core. Configuration switches create coupling and require core changes for every variant.

Example:

# ❌ BAD: Configuration switch (coupling)
def get_observability_backend(config: dict) -> Backend:
    if config["type"] == "jaeger":
        return JaegerBackend(config)
    elif config["type"] == "datadog":
        return DatadogBackend(config)
    # Future: Add elif for every new backend = core changes

# ✅ GOOD: Plugin interface (composable)
class ObservabilityPlugin(ABC):
    @abstractmethod
    def get_otlp_exporter_config(self) -> dict:
        pass

# Future: New backends register via entry points = no core changes

Entry Points (current plugin categories are listed in the Plugin Catalog):

  • floe.computes - Compute engines (DuckDB, Snowflake, Spark, etc.)
  • floe.orchestrators - Orchestration platforms (Dagster, Airflow)
  • floe.catalogs - Catalog backends (Polaris, Glue, Hive)
  • floe.storage - Storage backends (S3, GCS, Azure, MinIO)
  • floe.telemetry_backends - Telemetry backends (Jaeger, Datadog, Grafana Cloud)
  • floe.lineage_backends - Lineage backends (Marquez, Atlan, OpenMetadata)
  • floe.dbt - dbt compilation environments (local, fusion, cloud)
  • floe.semantic_layers - Semantic layers (Cube, dbt Semantic Layer)
  • floe.ingestion - Ingestion tools (dlt, Airbyte)
  • floe.quality - Data quality tooling
  • floe.rbac - Access-control policy generators
  • floe.alert_channels - Alert delivery backends
  • floe.secrets - Secrets management (K8s Secrets, ESO, Vault)
  • floe.identity - Identity providers (OIDC, Keycloak)

Note: PolicyEnforcer and DataContract are now core modules in floe-core, not plugins.

Rule: Define abstract base classes (ABCs) representing behavior contracts, NOT concrete implementations.

Rationale: Interfaces enable multiple implementations to coexist without core knowing about them. Concrete classes create tight coupling.

Example:

# ✅ GOOD: ABC defines contract
class ComputePlugin(ABC):
    name: str
    version: str
    floe_api_version: str

    @abstractmethod
    def generate_profiles(self, artifacts: CompiledArtifacts) -> dict[str, Any]:
        """Generate dbt profiles.yml section for this compute target."""
        pass

# Multiple implementations register via entry points
class DuckDBPlugin(ComputePlugin): ...
class SnowflakePlugin(ComputePlugin): ...
class SparkPlugin(ComputePlugin): ...

# Core discovers plugins WITHOUT hard-coded imports of any implementation
registry = PluginRegistry()
plugins = registry.discover("floe.computes")

Benefit: Adding BigQueryPlugin requires ZERO changes to core code.

Rule: Point to detailed documentation instead of duplicating content. Reveal complexity only when needed.

Rationale: Prevents documentation bloat, ensures single source of truth, reduces cognitive load for beginners.

Example:

# ❌ BAD: Duplicate plugin architecture in CLAUDE.md (800 lines)
## Plugin Architecture
[Full plugin interface definitions, entry points, examples...]
# ✅ GOOD: Pointer to detailed docs
## Plugin Architecture
floe uses plugin interfaces for extensibility. The Plugin Catalog lists the current implementation categories.
**See:** docs/architecture/plugin-system/index.md

Application:

  • CLAUDE.md: High-level overview + links to details
  • Architecture docs: Complete specifications
  • Skills: Domain expertise loaded on-demand

Rule: Default configuration should be simple (2-tier). Advanced features (3-tier Data Mesh) are opt-in extensions, NOT separate systems.

Rationale: Teams start simple, add complexity only when needed, without rewriting existing configuration.

Example:

# manifest.yaml
# ✅ Simple: 2-tier configuration (single-team)
plugins:
  compute: duckdb
  catalog: polaris

# floe.yaml
transforms:
  - type: dbt
    path: models/

# ✅ Advanced: 3-tier configuration (Data Mesh)
# enterprise-manifest.yaml
scope: enterprise  # NEW FIELD - enables Data Mesh
plugins:
  compute: snowflake
  catalog: polaris

# domain-manifest.yaml
scope: domain  # NEW FIELD - inherits enterprise defaults
approved_products: [sales, marketing]

# floe.yaml
# Unchanged - same schema as 2-tier
transforms:
  - type: dbt
    path: models/

Key: Same Manifest schema, different scope field. No breaking changes when scaling.
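
One way to picture "same schema, different scope" is a single model whose `scope` field is optional, plus a merge step for inherited defaults. The sketch below uses a plain dataclass; the `Manifest` fields other than `scope`, `plugins`, and `approved_products`, the `resolve` helper, and the merge rule (child values override parent values) are assumptions made for illustration, not floe's actual resolution logic.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Manifest:
    """One schema for every tier; `scope` is optional, so 2-tier files stay valid."""
    plugins: dict[str, str] = field(default_factory=dict)
    scope: Optional[str] = None  # None (2-tier), "enterprise", or "domain"
    approved_products: list[str] = field(default_factory=list)


def resolve(manifest: Manifest, parent: Optional[Manifest] = None) -> Manifest:
    """Merge parent defaults into a manifest (assumed rule: child values win)."""
    if parent is None:
        return manifest
    return Manifest(
        plugins={**parent.plugins, **manifest.plugins},
        scope=manifest.scope,
        approved_products=manifest.approved_products,
    )


# 2-tier: a single manifest, no scope, no parent.
simple = resolve(Manifest(plugins={"compute": "duckdb", "catalog": "polaris"}))

# 3-tier: the domain manifest inherits the enterprise plugin choices.
enterprise = Manifest(
    scope="enterprise",
    plugins={"compute": "snowflake", "catalog": "polaris"},
)
domain = Manifest(scope="domain", approved_products=["sales", "marketing"])
effective = resolve(domain, parent=enterprise)
```

Because `scope` defaults to `None`, the 2-tier manifest parses with the same model and never needs to be rewritten when the extra tier is introduced.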

Positive:

  • Scales without rewriting - 2-tier to 3-tier is a configuration change, not a code migration
  • Extensibility without core changes - New plugins via entry points, no core modifications
  • Clear interfaces - ABCs document contracts explicitly
  • Ecosystem growth - Community can build plugins without forking
  • Progressive disclosure - Beginners see simple docs, experts find details
  • Testing efficiency - Mock plugins via fixtures, test interfaces not implementations

Negative:

  • Upfront design cost - Requires thinking about interfaces before implementations
  • Learning curve - Teams must understand plugin architecture
  • Abstraction overhead - More files/classes than hardcoded if/else
  • Discovery complexity - Entry points are less obvious than import statements

Neutral:

  • Trade-off: Flexibility for upfront design cost
  • Mitigated by: Clear documentation, reference plugins, testing utilities
  • Industry precedent: Python ecosystem (pytest plugins, Sphinx extensions), Jenkins, Kubernetes

| Criteria | Example | Decision |
| --- | --- | --- |
| Multiple implementations exist today | DuckDB, Snowflake, Spark, Databricks | Plugin ✅ |
| Organization may swap implementation | Jaeger → Datadog observability backend | Plugin ✅ |
| User needs to extend behavior | Add custom policy enforcement rules | Plugin ✅ |
| Industry has competing standards | ODCS v3, Protobuf data contracts | Plugin ✅ |

| Criteria | Example | Decision |
| --- | --- | --- |
| Single implementation, no alternatives | OpenTelemetry SDK (emission standard) | Configuration ✅ |
| Simple parameter tuning | Log level (DEBUG, INFO, WARN) | Configuration ✅ |
| Boolean feature flag | Enable/disable lineage collection | Configuration ✅ |
| Fixed set of options (enum) | Environment (dev, staging, prod) | Configuration ✅ |

from abc import ABC, abstractmethod
from importlib.metadata import entry_points
from typing import Any


class PluginRegistry:
    """Singleton registry for all plugin types."""

    def discover(self, group: str) -> dict[str, Any]:
        """Discover plugins by entry point group.

        Args:
            group: Entry point group (e.g., "floe.computes")

        Returns:
            Dictionary mapping plugin names to plugin instances
        """
        plugins = {}
        for ep in entry_points(group=group):
            plugin_class = ep.load()
            plugins[ep.name] = plugin_class()
        return plugins

class ObservabilityPlugin(ABC):
    """Plugin interface for observability backends.

    Responsibilities:
    - Generate OTLP Collector exporter configuration
    - Generate OpenLineage transport configuration
    - Provide Helm values for deploying backend services
    """

    name: str  # e.g., "jaeger", "datadog"
    version: str  # Plugin version
    floe_api_version: str  # Supported floe-core API version

    @abstractmethod
    def get_otlp_exporter_config(self) -> dict[str, Any]:
        """Generate OTLP Collector exporter configuration.

        Returns:
            Dictionary matching OTLP Collector config schema
        """
        pass

    @abstractmethod
    def get_lineage_config(self) -> dict[str, Any]:
        """Generate OpenLineage transport configuration.

        Returns:
            Dictionary with 'type' and backend-specific config
        """
        pass

    @abstractmethod
    def get_helm_values_override(self) -> dict[str, Any]:
        """Generate Helm values for deploying backend services.

        Returns:
            Helm values dictionary for backend chart
        """
        pass

# pyproject.toml
[project.entry-points."floe.observability"]
jaeger = "floe_observability_jaeger:JaegerPlugin"
datadog = "floe_observability_datadog:DatadogPlugin"
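
A plugin class behind the `jaeger` entry point might look roughly like the sketch below. The interface is repeated in trimmed form so the block runs on its own. Every returned value is a placeholder: the dictionary keys, version strings, and Helm values are assumptions for illustration and are not a verified OTLP Collector, OpenLineage, or Jaeger chart schema.

```python
from abc import ABC, abstractmethod
from typing import Any


class ObservabilityPlugin(ABC):
    """Trimmed copy of the interface so this sketch is self-contained."""
    name: str
    version: str
    floe_api_version: str

    @abstractmethod
    def get_otlp_exporter_config(self) -> dict[str, Any]: ...

    @abstractmethod
    def get_lineage_config(self) -> dict[str, Any]: ...

    @abstractmethod
    def get_helm_values_override(self) -> dict[str, Any]: ...


class JaegerPlugin(ObservabilityPlugin):
    name = "jaeger"
    version = "0.1.0"          # placeholder
    floe_api_version = "1.0"   # placeholder

    def __init__(self, endpoint: str = "http://jaeger:14250") -> None:
        self.endpoint = endpoint

    def get_otlp_exporter_config(self) -> dict[str, Any]:
        # Placeholder keys, not a verified Collector schema.
        return {"exporter": self.name, "endpoint": self.endpoint}

    def get_lineage_config(self) -> dict[str, Any]:
        # The interface only promises a 'type' key plus backend-specific config.
        return {"type": "placeholder"}

    def get_helm_values_override(self) -> dict[str, Any]:
        # Placeholder values, not the real Jaeger chart layout.
        return {"jaeger": {"enabled": True}}
```

Because all three methods are abstract, a subclass that omits one cannot be instantiated; Python raises `TypeError` at construction time, so an incomplete plugin fails as soon as the registry tries to create it.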

Before (Configuration Switch - Coupled):

# manifest.yaml
observability:
  backend: jaeger  # Hardcoded if/else in core
  jaeger:
    endpoint: "http://jaeger:14250"
# Future: Add datadog section, modify core parsing logic

After (Plugin Interface - Composable):

# manifest.yaml
plugins:
  observability: jaeger  # Plugin name

# floe-observability-jaeger plugin provides:
# - get_otlp_exporter_config() → OTLP Collector config
# - get_lineage_config() → OpenLineage transport
# - get_helm_values_override() → Jaeger Helm chart values

# Future: Install floe-observability-datadog plugin
# plugins:
#   observability: datadog  # Zero core changes

Benefits:

  • Adding Datadog: Install plugin, change config value (NO core changes)
  • Testing: Mock ObservabilityPlugin interface (NO real Jaeger needed)
  • Custom backends: Implement interface, register via entry point
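
The testing benefit can be sketched with a hand-written test double. `build_collector_config` is a hypothetical core function invented for this example, and the interface is trimmed to one method; the point is only that core logic can be exercised against the contract with no backend running.

```python
from abc import ABC, abstractmethod
from typing import Any


class ObservabilityPlugin(ABC):
    """Trimmed to one method for the sketch."""

    @abstractmethod
    def get_otlp_exporter_config(self) -> dict[str, Any]: ...


def build_collector_config(plugin: ObservabilityPlugin) -> dict[str, Any]:
    """Hypothetical core function: wraps whatever the plugin returns."""
    return {"exporters": {"backend": plugin.get_otlp_exporter_config()}}


class FakeObservabilityPlugin(ObservabilityPlugin):
    """Test double: counts calls and returns canned data, no network needed."""

    def __init__(self) -> None:
        self.calls = 0

    def get_otlp_exporter_config(self) -> dict[str, Any]:
        self.calls += 1
        return {"endpoint": "fake://collector"}


def test_build_collector_config_uses_plugin_output() -> None:
    fake = FakeObservabilityPlugin()
    config = build_collector_config(fake)
    assert config == {"exporters": {"backend": {"endpoint": "fake://collector"}}}
    assert fake.calls == 1
```

The same fake could be handed out by a pytest fixture so that every core test shares one stand-in backend.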

Existing Code → Composable Architecture:

  1. Identify hardcoded implementations - Search for if/else on config["type"]
  2. Extract ABC - Define interface representing contract
  3. Create entry point group - e.g., floe.observability
  4. Refactor implementations - Convert to plugin classes
  5. Update PluginRegistry - Discover via entry points
  6. Update docs - Document plugin interface in interfaces/

Timeline: Gradual (no big bang) - plugins can coexist with legacy code during migration.
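
During such a gradual migration, a lookup that prefers discovered plugins and falls back to legacy construction lets both worlds coexist. The sketch below is one way to arrange that; `resolve_backend`, `LegacyFactory`, and the shape of the two dictionaries are hypothetical and stand in for a half-migrated codebase.

```python
from typing import Any, Callable

# Hypothetical alias: a legacy constructor that takes the raw config dict.
LegacyFactory = Callable[[dict[str, Any]], Any]


def resolve_backend(
    config: dict[str, Any],
    plugins: dict[str, Any],
    legacy_factories: dict[str, LegacyFactory],
) -> Any:
    """Prefer a discovered plugin; fall back to legacy construction during migration."""
    backend_type = config["type"]
    if backend_type in plugins:
        return plugins[backend_type]
    if backend_type in legacy_factories:
        return legacy_factories[backend_type](config)
    known = sorted(set(plugins) | set(legacy_factories))
    raise KeyError(f"Unknown backend {backend_type!r}; known backends: {known}")
```

Each backend moves from `legacy_factories` to `plugins` independently, and once the legacy dictionary is empty the fallback branch can be deleted.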

DON’T: Use if/else for extensible behavior

# ❌ ANTI-PATTERN: Coupled to core
def get_backend(config: dict):
    if config["type"] == "jaeger":
        return JaegerBackend()
    elif config["type"] == "datadog":
        return DatadogBackend()
    # Every new backend requires core changes

DON’T: Hardcode implementation in configuration schema

# ❌ ANTI-PATTERN: Schema knows about implementations
class PlatformManifest(BaseModel):
    jaeger_config: Optional[JaegerConfig] = None
    datadog_config: Optional[DatadogConfig] = None
    # Every new backend adds field

DO: Use plugin interface with dynamic discovery

# ✅ PATTERN: Composable, extensible
class PlatformManifest(BaseModel):
    plugins: dict[str, str]  # {"observability": "jaeger"}

registry = PluginRegistry()
backend_plugin = registry.discover("floe.observability")["jaeger"]
config = backend_plugin.get_otlp_exporter_config()
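
A plain dictionary lookup raises a bare `KeyError` when the manifest names a plugin that is not installed. The sketch below shows one way the lookup could report that case clearly and also check the plugin's declared `floe_api_version`. The `select_plugin` and `is_compatible` helpers are hypothetical, and the compatibility rule (a "MAJOR.MINOR" string, same major, plugin minor not newer than core minor) is an assumption; the source does not define how `floe_api_version` is compared.

```python
def is_compatible(plugin_api_version: str, core_api_version: str) -> bool:
    """Assumed rule: same major version, plugin minor not newer than core minor."""
    plugin_major, plugin_minor = (int(p) for p in plugin_api_version.split(".")[:2])
    core_major, core_minor = (int(p) for p in core_api_version.split(".")[:2])
    return plugin_major == core_major and plugin_minor <= core_minor


def select_plugin(name: str, discovered: dict[str, object], core_api_version: str) -> object:
    """Look up the plugin named in the manifest and check its declared API version."""
    try:
        plugin = discovered[name]
    except KeyError:
        available = ", ".join(sorted(discovered)) or "none"
        raise LookupError(
            f"Plugin {name!r} is not installed (available: {available})"
        ) from None
    declared = getattr(plugin, "floe_api_version")
    if not is_compatible(declared, core_api_version):
        raise RuntimeError(
            f"Plugin {name!r} targets floe API {declared}, core provides {core_api_version}"
        )
    return plugin
```

With a helper like this, the manifest value `observability: datadog` either yields a usable plugin or an error that names the missing package candidates, rather than failing deep inside core.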