
ADR-0008: Standalone Repository Architecture

Status: Accepted

floe is designed as a 100% standalone open-source project. We need to decide how to organize the codebase to:

  - Enable community contribution
  - Support enterprise self-hosting
  - Allow external systems to integrate via contracts
  - Maintain clear boundaries between enforced and pluggable components

Considerations:

  - Apache 2.0 licensing throughout
  - Plugin architecture for extensibility
  - Contract-based integration for external orchestration systems
  - Independent versioning

Organize floe as a single public repository with:

  1. Core packages - enforced components (floe-core, floe-iceberg). Note: the dbt framework is enforced, but the dbt compilation environment is pluggable (ADR-0043)
  2. Plugins - pluggable components (compute, orchestrator, catalog, etc.)
  3. Contracts - well-defined interfaces for external integration

Benefits:

  - Clear licensing - Apache 2.0 for everything
  - Community contribution - open development, public roadmap
  - Enterprise option - self-host with any external orchestration
  - Extensible - plugin architecture for customization
  - Contract-based - external systems integrate via documented contracts

Trade-offs:

  - External integration complexity - external systems must implement contracts
  - No managed option - users manage their own infrastructure

Mitigations:

  - Clear interface contracts (CompiledArtifacts, Observability)
  - Semantic versioning for compatibility
  - Comprehensive documentation required
Repository layout:

```
floe/
├── floe-core/                       # Schemas, interfaces, enforcement engine
├── floe-cli/                        # CLI for Platform Team and Data Team
├── floe-dbt/                        # ENFORCED: dbt framework (compilation environment is pluggable)
├── floe-iceberg/                    # ENFORCED: Iceberg utilities (not pluggable)
├── plugins/                         # PLUGGABLE: Selected by Platform Team
│   ├── floe-compute-duckdb/         # Compute plugins
│   ├── floe-compute-spark/
│   ├── floe-compute-snowflake/
│   ├── floe-orchestrator-dagster/   # Orchestration plugins
│   ├── floe-orchestrator-airflow/
│   ├── floe-catalog-polaris/        # Catalog plugins
│   ├── floe-catalog-glue/
│   ├── floe-semantic-cube/          # Semantic layer plugins
│   ├── floe-ingestion-dlt/          # Ingestion plugins
│   └── floe-secrets-eso/            # Secrets plugins
├── charts/
│   ├── floe-platform/               # Meta-chart: assembles plugin charts
│   └── floe-jobs/                   # Base chart for pipeline jobs
└── docs/                            # Runtime documentation
```

Key Design Principle: Only ENFORCED components (Iceberg, OpenTelemetry, OpenLineage) live at the top level. All PLUGGABLE components follow the plugin pattern under plugins/. Note: dbt framework integration (floe-dbt) is core but uses pluggable DBTPlugin for execution (ADR-0043). See ADR-0016 for the platform enforcement architecture.
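To make the enforced/pluggable split concrete, a Platform Team's plugin selection might look like the following floe.yaml fragment. This is a hedged illustration: the key names are hypothetical, and the authoritative schema is defined in floe-core.

```yaml
# Hypothetical floe.yaml fragment -- key names are illustrative,
# not the authoritative schema (which is defined in floe-core).
plugins:
  compute: duckdb
  orchestrator: dagster
  catalog: polaris
  semantic_layer: cube
  ingestion: dlt
  secrets: eso
```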

| Component | Status | Rationale |
|---|---|---|
| Synthetic Data (SDV) | Deferred | Requires ML infrastructure; future plugin |
| Flink streaming | Deferred | Design for extensibility, implement later. See ADR-0014 |

The runtime executes pipelines on whatever data exists (real, synthetic, or test fixtures).

External systems can integrate with floe through well-defined contracts documented in docs/contracts/:

| Contract | Location | Purpose |
|---|---|---|
| CompiledArtifacts | floe/floe-core (Pydantic) | Runtime configuration schema |
| Observability Attributes | Documented in contracts | OpenTelemetry/OpenLineage conventions |
| floe.yaml Schema | floe/floe-core | User-facing config format |
| Helm Values | floe/charts | Runtime deployment config |

  - Runtime-owned contracts: defined as Pydantic models in floe-core, exported as JSON Schema
  - Standard contracts: OpenTelemetry and OpenLineage follow industry standards
  - Semantic versioning (MAJOR.MINOR.PATCH)
  - Breaking changes require major version bump
  - Plugin API versioning for compatibility
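As a sketch of how a runtime-owned contract becomes a JSON Schema, the following uses a minimal hypothetical stand-in for CompiledArtifacts (the real model in floe-core is richer) and Pydantic v2's `model_json_schema()`:

```python
# Sketch: exporting a runtime-owned contract as JSON Schema.
# `CompiledArtifacts` here is a minimal hypothetical stand-in for
# the real model defined in floe-core.
import json

from pydantic import BaseModel


class CompiledArtifacts(BaseModel):
    pipeline_name: str
    compute: str           # e.g., "duckdb"
    orchestrator: str      # e.g., "dagster"


# Pydantic v2: model_json_schema() returns a JSON-Schema dict
schema = CompiledArtifacts.model_json_schema()
print(json.dumps(schema, indent=2))
```

External systems can validate their integration payloads against the exported schema without importing floe-core at all.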

Each plugin is a self-contained package with:

  1. Python code (`src/`) - implements the ABC interface from floe-core
  2. Helm chart (`chart/`) - deploys the service (if applicable)
  3. Entry point - registered via `pyproject.toml`
```
plugins/floe-orchestrator-dagster/
├── src/
│   ├── __init__.py
│   └── plugin.py          # DagsterOrchestratorPlugin class
├── chart/
│   ├── Chart.yaml
│   ├── values.yaml
│   └── templates/
│       ├── webserver.yaml
│       ├── daemon.yaml
│       └── services.yaml
└── pyproject.toml         # Entry point registration
```
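The ABC that plugins implement is not reproduced in this ADR; a minimal sketch of what such an interface could look like follows. The method and class bodies here are illustrative assumptions, not the actual floe-core API.

```python
# Sketch of a plugin ABC as floe-core might define it.
# Method names and bodies are illustrative assumptions.
from abc import ABC, abstractmethod


class OrchestratorPlugin(ABC):
    """Base interface that orchestrator plugins implement."""

    @abstractmethod
    def schedule(self, pipeline: dict) -> str:
        """Register a pipeline and return its run identifier."""


class DagsterOrchestratorPlugin(OrchestratorPlugin):
    def schedule(self, pipeline: dict) -> str:
        # A real implementation would call Dagster APIs here.
        return f"dagster-run-{pipeline['name']}"


plugin = DagsterOrchestratorPlugin()
run_id = plugin.schedule({"name": "daily-load"})
```

Because the contract is an ABC, floe-core can reject a plugin at load time if it fails to implement a required method.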

Plugins register via Python entry points (standard, proven pattern used by pytest, Dagster, DataHub):

```toml
# plugins/floe-orchestrator-dagster/pyproject.toml
[project]
name = "floe-orchestrator-dagster"
version = "1.0.0"
dependencies = ["floe-core>=1.0.0", "dagster>=1.6.0"]

[project.entry-points."floe.orchestrators"]
dagster = "floe_orchestrator_dagster:DagsterOrchestratorPlugin"

[project.entry-points."floe.charts"]
dagster = "floe_orchestrator_dagster:chart"
```
```python
# floe_core/registry.py
class PluginRegistry:
    """Discovers and loads plugins via entry points."""

    def discover_all(self) -> None:
        """Scan all installed packages for floe.* entry points."""
        ...

    def get_orchestrator(self, name: str) -> OrchestratorPlugin:
        """Get orchestrator plugin by name."""
        ...

    def get_compute(self, name: str) -> ComputePlugin:
        """Get compute plugin by name."""
        ...

    def list_available(self) -> dict[str, list[str]]:
        """List all available plugins by type."""
        ...

    def validate_manifest(self, manifest: Manifest) -> list[str]:
        """Validate manifest config against available plugins."""
        ...
```

To ensure compatibility between floe-core and plugins, the plugin API is versioned explicitly:

```python
# floe_core/plugin_api.py
from dataclasses import dataclass
from typing import Final

FLOE_PLUGIN_API_VERSION: Final[str] = "1.0"
FLOE_PLUGIN_API_MIN_VERSION: Final[str] = "1.0"


@dataclass
class PluginMetadata:
    """Every plugin must declare API compatibility."""
    name: str                # e.g., "dagster"
    version: str             # Plugin version, e.g., "1.0.0"
    floe_api_version: str    # Required - checked at load time
    description: str
    author: str
```

```python
# floe_core/registry.py
def load_plugin(self, entry_point) -> Plugin:
    plugin_class = entry_point.load()
    metadata = plugin_class.metadata
    # Check API version compatibility
    if not is_compatible(metadata.floe_api_version, FLOE_PLUGIN_API_MIN_VERSION):
        raise PluginIncompatibleError(
            f"Plugin {metadata.name} requires API v{metadata.floe_api_version}, "
            f"but minimum supported is v{FLOE_PLUGIN_API_MIN_VERSION}"
        )
    return plugin_class()
```
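`load_plugin()` relies on an `is_compatible()` helper that this ADR does not define. One plausible sketch, assuming "MAJOR.MINOR" version strings and the policy that minor bumps are backward compatible while major bumps are not:

```python
# Sketch of the is_compatible() check used by load_plugin().
# Assumed policy: a plugin is compatible if it targets the same
# major API version and at least the minimum minor version.
def is_compatible(plugin_api_version: str, min_version: str) -> bool:
    plugin = tuple(int(p) for p in plugin_api_version.split("."))
    minimum = tuple(int(p) for p in min_version.split("."))
    return plugin[0] == minimum[0] and plugin >= minimum
```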
| Change Type | API Version Impact | Example |
|---|---|---|
| Add optional method | Minor (1.0 → 1.1) | Add `health_check()` with default impl |
| Add required method | Major (1.0 → 2.0) | Add abstract `validate()` method |
| Remove method | Major (1.0 → 2.0) | Remove deprecated `old_method()` |
| Change signature | Major (1.0 → 2.0) | Change `run(config)` to `run(config, context)` |
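The "add optional method" case is worth illustrating: giving the new method a default implementation on the base class means plugins built against the old API keep loading unchanged. The class and method names here are illustrative, not the actual floe-core API.

```python
# Sketch: a minor (backward-compatible) API change. Adding
# health_check() with a default implementation means a plugin
# written against API 1.0 still satisfies API 1.1.
from abc import ABC, abstractmethod


class ComputePlugin(ABC):
    @abstractmethod
    def run(self, sql: str) -> None: ...

    # New in API 1.1: optional, with a default -- existing
    # plugins inherit it without any code changes.
    def health_check(self) -> bool:
        return True


class LegacyDuckDBPlugin(ComputePlugin):
    """Written against API 1.0; knows nothing of health_check()."""
    def run(self, sql: str) -> None:
        pass


legacy = LegacyDuckDBPlugin()
ok = legacy.health_check()
```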
| Type | Entry Point | Example Plugins |
|---|---|---|
| Compute | `floe.computes` | duckdb, spark, snowflake, databricks, bigquery |
| Orchestrator | `floe.orchestrators` | dagster, airflow |
| Catalog | `floe.catalogs` | polaris, glue, hive |
| Semantic Layer | `floe.semantic_layers` | cube, none |
| Ingestion | `floe.ingestion` | dlt, airbyte |
| Secrets | `floe.secrets` | eso, vault, k8s |