floe Architecture Summary

This document summarizes the architectural redesign of floe. The redesign introduces a four-layer architecture and a platform enforcement model that:

  1. Separates platform configuration from pipeline code
  2. Enforces guardrails at compile time
  3. Uses a plugin system for flexibility
  4. Stores immutable platform artifacts in OCI registries

floe supports two organizational patterns:

| Pattern | Configuration Model | Use Case |
|---|---|---|
| Centralized | Platform → Pipeline (2-file) | Traditional centralized data team |
| Data Mesh | Enterprise → Domain → Product (3-tier) | Federated domain ownership |

For Data Mesh, the configuration hierarchy extends:

  • Enterprise Platform: Global governance, approved plugins
  • Domain Platform: Domain-specific choices, domain namespace
  • Data Products: Input/output ports, SLAs, data contracts
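
The three tiers above can be read as a chain of constraints: each tier may only choose from what the tier above allows. The following is a minimal sketch of that idea, not floe's actual schema. The class names, field names, and the `check_domain` and `qualified_name` helpers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EnterprisePlatform:
    """Top tier: global governance (hypothetical shape)."""
    approved_plugins: dict[str, set[str]]  # category -> approved plugin names


@dataclass(frozen=True)
class DomainPlatform:
    """Middle tier: a domain picks plugins from the enterprise-approved set."""
    namespace: str
    plugin_choices: dict[str, str]  # category -> chosen plugin name


@dataclass(frozen=True)
class DataProduct:
    """Bottom tier: a product declares its ports inside a domain namespace."""
    name: str
    output_ports: list[str] = field(default_factory=list)


def check_domain(enterprise: EnterprisePlatform, domain: DomainPlatform) -> list[str]:
    """Return one message per domain choice the enterprise has not approved."""
    violations = []
    for category, choice in sorted(domain.plugin_choices.items()):
        approved = enterprise.approved_plugins.get(category, set())
        if choice not in approved:
            violations.append(f"{domain.namespace}: {category}={choice} is not approved")
    return violations


def qualified_name(domain: DomainPlatform, product: DataProduct) -> str:
    """Qualify a product name with its domain namespace."""
    return f"{domain.namespace}.{product.name}"
```

In this sketch a domain that selects an unapproved compute plugin produces a violation message, while an approved choice produces an empty list.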

floe is built on composability as a core architectural principle (ADR-0037):

  • Plugin Architecture > Configuration Switches: Extensibility via entry points (floe.computes, floe.orchestrators, etc.), not if/else config
  • Interface > Implementation: Define ABCs (ComputePlugin, TelemetryBackendPlugin, LineageBackendPlugin), not concrete classes
  • Progressive Disclosure: Point to detailed docs, don’t duplicate content
  • Opt-in Complexity: Start simple (2-tier), with architecture direction toward Data Mesh-compatible (3-tier) governance. See Capability Status for the current alpha-validated state.

14 plugin categories enable flexibility while maintaining enforced standards (see Plugin Catalog for implementation truth):

  • Compute, Orchestrator, Catalog, Storage, TelemetryBackend, LineageBackend
  • DBT, Semantic Layer, Ingestion, Quality, RBAC, Alert Channel, Secrets, Identity

Note: PolicyEnforcer and DataContract are now core modules in floe-core, not plugins.

See: ADR-0037: Composability Principle

Layer 4: DATA (Ephemeral Jobs)
│ Owner: Data Engineers
│ K8s: Jobs (run-to-completion)
│ Config: floe.yaml

Layer 3: SERVICES (Long-lived)
│ Owner: Platform Engineers
│ K8s: Deployments, StatefulSets
│ Deploy: floe platform deploy

Layer 2: CONFIGURATION (Enforcement)
│ Owner: Platform Engineers
│ Storage: OCI Registry (immutable)
│ Config: manifest.yaml

Layer 1: FOUNDATION (Framework Code)
│ Owner: floe Maintainers
│ Distribution: PyPI, Helm

| File | Owner | Purpose |
|---|---|---|
| manifest.yaml | Platform Team | Define guardrails (rarely changes) |
| floe.yaml | Data Engineers | Define pipelines (changes frequently) |
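
The split between the two files is what makes compile-time enforcement possible: the pipeline file is checked against the platform file before anything runs. The sketch below shows the shape of such a check. It is not floe's enforcement engine, and the keys `plugins`, `policies`, `require_owner`, and `owner` are hypothetical stand-ins for the real `manifest.yaml` and `floe.yaml` schemas.

```python
class CompileError(ValueError):
    """Raised when a pipeline violates platform guardrails."""


def compile_pipeline(manifest: dict, pipeline: dict) -> dict:
    """Validate a pipeline against the platform manifest and return the resolved config."""
    errors = []
    # Guardrail 1: plugin selections belong to the platform file.
    for key in sorted(set(pipeline) & set(manifest["plugins"])):
        errors.append(f"{key!r} is selected by the platform and cannot be set in the pipeline")
    # Guardrail 2: a platform policy that every pipeline must satisfy.
    if manifest.get("policies", {}).get("require_owner") and not pipeline.get("owner"):
        errors.append("pipeline must declare an owner")
    if errors:
        raise CompileError("; ".join(errors))
    # Resolved output: pipeline settings plus the platform's plugin selections.
    return {**pipeline, "plugins": dict(manifest["plugins"])}
```

Collecting all violations before raising lets a data engineer fix everything in one pass instead of discovering problems one compile at a time.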

ENFORCED (Non-negotiable):

  • Apache Iceberg (table format)
  • OpenTelemetry (observability)
  • OpenLineage (data lineage)
  • dbt (transformation)
  • Kubernetes-native (deployment)

PLUGGABLE (Platform Team selects once):

  • Compute: DuckDB, Spark, Snowflake, Databricks, BigQuery
  • Orchestration: Dagster, Airflow, Prefect
  • Catalog: Polaris, AWS Glue, Hive
  • Storage: S3, GCS, Azure Blob, MinIO
  • Observability Backend: Jaeger, Datadog, Grafana Cloud, AWS X-Ray
  • Semantic Layer: Cube, dbt Semantic Layer, None
  • Ingestion: dlt, Airbyte
  • Secrets: K8s Secrets, External Secrets Operator, Vault, Infisical
  • Identity: Keycloak, Dex, Authentik, Okta, Auth0

| ADR | Title | Action |
|---|---|---|
| ADR-0008 | Repository Split | AMENDED: Added plugin architecture + API versioning |
| ADR-0010 | Target-Agnostic Compute | AMENDED: Added ComputePlugin interface |
| ADR-0012 | Data Classification Governance | AMENDED: Added quality gates section |
| ADR-0016 | Platform Enforcement Architecture | AMENDED: Added four-layer details + OCI storage |
| ADR-0017 | K8s Testing Infrastructure | Existed (created in previous session) |
| ADR-0018 | Opinionation Boundaries | AMENDED: Added plugin vs configuration decision criteria |
| ADR-0019 | Platform Services Lifecycle | NEW: Long-lived vs ephemeral |
| ADR-0020 | Ingestion Plugins | NEW: dlt + Airbyte |
| ADR-0021 | Data Architecture Patterns | NEW: Medallion, Kimball, Data Vault |
| ADR-0035 | Observability Plugin Interface | NEW: Pluggable observability backends (Jaeger, Datadog, Grafana Cloud) |
| ADR-0036 | Storage Plugin Interface | NEW: PyIceberg FileIO pattern for S3, GCS, Azure, MinIO |
| ADR-0037 | Composability Principle | NEW: Core architectural principle for plugin design |
| ADR-0038 | Data Mesh Architecture | NEW: Unified Manifest schema, 3-tier inheritance |

| Document | Purpose |
|---|---|
| four-layer-overview.md | Comprehensive layer diagram and details |
| platform-enforcement.md | How platform constraints are enforced |
| platform-services.md | Layer 3 services (orchestrator, catalog, etc.) |
| plugin-system/ | Plugin structure and discovery |
| interfaces/ | Abstract Base Classes for all plugins |
| opinionation-boundaries.md | What’s enforced vs pluggable |
| platform-artifacts.md | OCI registry storage model |

floe documents 14 plugin categories for extensibility (see plugin-system/index.md for the canonical registry and implemented ABCs):

| Plugin Type | Purpose | Entry Point | ADR |
|---|---|---|---|
| ComputePlugin | Where dbt transforms execute | floe.computes | ADR-0010 |
| OrchestratorPlugin | Job scheduling and execution | floe.orchestrators | ADR-0033 |
| CatalogPlugin | Iceberg table catalog | floe.catalogs | ADR-0008 |
| StoragePlugin | Object storage (S3, GCS, Azure, MinIO) | floe.storage | ADR-0036 |
| TelemetryBackendPlugin | OTLP telemetry backends (traces, metrics, logs) | floe.telemetry_backends | ADR-0035 |
| LineageBackendPlugin | OpenLineage backends (data lineage) | floe.lineage_backends | ADR-0035 |
| DBTPlugin | dbt compilation environment (local/fusion/cloud) | floe.dbt | ADR-0043 |
| SemanticLayerPlugin | Business intelligence API | floe.semantic_layers | ADR-0001 |
| IngestionPlugin | Data loading from sources | floe.ingestion | ADR-0020 |
| SecretsPlugin | Credential management | floe.secrets | ADR-0023/0031 |
| IdentityPlugin | User authentication (OIDC) | floe.identity | ADR-0024 |
| DataQualityPlugin | Data quality validation frameworks | floe.quality | ADR-0044 |
| RBACPlugin | Namespace and service-account isolation | floe.rbac | Epic 7B |
| AlertChannelPlugin | Contract violation alert delivery | floe.alert_channels | Epic 15 |

Note: PolicyEnforcer and DataContract are now core modules in floe-core, not plugins.

See: interfaces/ for complete ABC definitions with method signatures

from abc import ABC

class ComputePlugin(ABC):
    def generate_dbt_profile(self, config: ComputeConfig) -> dict: ...
    def get_required_dbt_packages(self) -> list[str]: ...
    def validate_connection(self, config: ComputeConfig) -> ConnectionResult: ...
    def get_resource_requirements(self, workload_size: str) -> ResourceSpec: ...
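
To make the interface concrete, here is a sketch of what a DuckDB-backed implementation might look like. It is not the code in `floe-compute-duckdb`. The shapes of `ComputeConfig`, `ConnectionResult`, and `ResourceSpec` are assumptions made so the example is self-contained, as are the resource sizes and the reading of "dbt packages" as Python adapter packages. The `@abstractmethod` decorators are also an assumption about how the ABC is declared.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ComputeConfig:  # hypothetical shape
    database_path: str = ":memory:"
    threads: int = 4


@dataclass
class ConnectionResult:  # hypothetical shape
    ok: bool
    message: str = ""


@dataclass
class ResourceSpec:  # hypothetical shape
    cpu: str
    memory: str


class ComputePlugin(ABC):
    @abstractmethod
    def generate_dbt_profile(self, config: ComputeConfig) -> dict: ...
    @abstractmethod
    def get_required_dbt_packages(self) -> list[str]: ...
    @abstractmethod
    def validate_connection(self, config: ComputeConfig) -> ConnectionResult: ...
    @abstractmethod
    def get_resource_requirements(self, workload_size: str) -> ResourceSpec: ...


class DuckDBComputePlugin(ComputePlugin):
    # Illustrative sizes only; real values would come from platform configuration.
    _SIZES = {
        "small": ResourceSpec(cpu="500m", memory="1Gi"),
        "large": ResourceSpec(cpu="2", memory="8Gi"),
    }

    def generate_dbt_profile(self, config: ComputeConfig) -> dict:
        # Keys follow the dbt-duckdb adapter's profile target fields.
        return {"type": "duckdb", "path": config.database_path, "threads": config.threads}

    def get_required_dbt_packages(self) -> list[str]:
        return ["dbt-duckdb"]

    def validate_connection(self, config: ComputeConfig) -> ConnectionResult:
        # DuckDB runs in-process, so this sketch only checks the configuration.
        if not config.database_path:
            return ConnectionResult(ok=False, message="database_path is empty")
        return ConnectionResult(ok=True, message="ok")

    def get_resource_requirements(self, workload_size: str) -> ResourceSpec:
        if workload_size not in self._SIZES:
            raise ValueError(f"unknown workload size: {workload_size!r}")
        return self._SIZES[workload_size]
```

The core only ever calls the four interface methods, so swapping DuckDB for another compute target would not require changes outside the plugin package.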

Example: TelemetryBackendPlugin and LineageBackendPlugin

Observability uses two independent plugins (ADR-0035):

from abc import ABC
from typing import Any

from pyiceberg.io import FileIO

class TelemetryBackendPlugin(ABC):
    """Configure OTLP backends for traces, metrics, logs."""
    def get_otlp_exporter_config(self) -> dict[str, Any]: ...
    def get_helm_values_override(self) -> dict[str, Any]: ...

class LineageBackendPlugin(ABC):
    """Configure OpenLineage backends for data lineage."""
    def get_transport_config(self) -> dict[str, Any]: ...
    def get_namespace_mapping(self) -> dict[str, str]: ...

class StoragePlugin(ABC):
    def get_pyiceberg_fileio(self) -> FileIO: ...
    def get_warehouse_uri(self, namespace: str) -> str: ...
    def get_dbt_profile_config(self) -> dict[str, Any]: ...
    def get_dagster_io_manager_config(self) -> dict[str, Any]: ...
    def get_helm_values_override(self) -> dict[str, Any]: ...

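
As a partial illustration of the StoragePlugin interface, the sketch below covers only the warehouse URI and the properties a FileIO would need for an S3-compatible endpoint such as MinIO. The class name `S3StorageSketch`, its fields, and the `fileio_properties` helper are hypothetical and do not reflect the `floe-storage-s3` implementation. The `s3.region` and `s3.endpoint` keys follow PyIceberg's documented S3 property names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class S3StorageSketch:
    """Hypothetical S3-compatible storage settings."""
    bucket: str
    prefix: str = "warehouse"
    region: str = "us-east-1"
    endpoint: Optional[str] = None  # set for MinIO or another S3-compatible service

    def get_warehouse_uri(self, namespace: str) -> str:
        """Build the warehouse location for one catalog namespace."""
        return f"s3://{self.bucket}/{self.prefix.strip('/')}/{namespace}"

    def fileio_properties(self) -> dict[str, str]:
        """PyIceberg-style S3 properties that get_pyiceberg_fileio could pass along."""
        props = {"s3.region": self.region}
        if self.endpoint:
            props["s3.endpoint"] = self.endpoint
        return props
```

Treating MinIO as an endpoint override rather than a separate plugin matches the layout in which one S3 plugin serves all S3-compatible stores.
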
floe/
├── floe-core/ # Schemas, interfaces, enforcement engine
├── floe-cli/ # CLI for Platform Team and Data Team
├── floe-dbt/ # ENFORCED: dbt framework; runtime PLUGGABLE (ADR-0043)
├── floe-iceberg/ # ENFORCED: Iceberg utilities
├── plugins/ # ALL PLUGGABLE COMPONENTS (see Plugin Catalog)
│ ├── floe-compute-duckdb/
│ ├── floe-compute-spark/
│ ├── floe-compute-snowflake/
│ ├── floe-orchestrator-dagster/
│ ├── floe-orchestrator-airflow/
│ ├── floe-catalog-polaris/
│ ├── floe-catalog-glue/
│ ├── floe-storage-s3/ # S3-compatible storage, including MinIO endpoints
│ ├── floe-observability-jaeger/
│ ├── floe-observability-datadog/
│ ├── floe-semantic-cube/
│ ├── floe-ingestion-dlt/
│ ├── floe-secrets-eso/
│ ├── floe-secrets-infisical/
│ └── floe-identity-keycloak/
├── charts/
│ ├── floe-platform/ # Meta-chart for platform services
│ └── floe-jobs/ # Base chart for pipeline jobs
└── docs/

This section shows the target command shape. In the current alpha, floe platform compile is implemented for platform manifests and make compile-demo is the supported Customer 360 artifact path. Root data-team lifecycle commands and floe platform test are planned/stubbed, not current alpha workflows.

floe platform compile    # Validate and build artifacts
floe platform test       # Planned/stub: run policy tests
floe platform publish    # Push to OCI registry
floe platform deploy     # Deploy services to K8s
floe platform status     # Check service health

floe init --platform=v1.2.3    # Planned: pull platform artifacts
floe compile                   # Planned: validate against platform
floe run                       # Planned: execute pipeline
floe test                      # Planned: run dbt tests

The documentation was consolidated to reduce complexity:

  • 4 existing ADRs amended (vs creating separate overlapping ADRs)
  • 4 new ADRs created (only for truly distinct decisions)
  • Reduced from 10 planned ADRs to 8 actual ADRs
  • Cross-references maintained between all documents

For Data Mesh support, floe introduces additional resource types:

| Resource | Owner | Purpose |
|---|---|---|
| EnterpriseManifest | Central Platform Team | Global governance, approved plugins |
| DomainManifest | Domain Platform Team | Domain-specific choices |
| DataProduct | Product Team | Input/output ports, SLAs |
| DataContract | Auto-generated | Cross-domain data sharing contracts |
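
One way to picture an auto-generated DataContract is as a projection of what a DataProduct already declares on its output ports, which a consumer in another domain can then be checked against. The sketch below is purely illustrative. Every class, field, and function name in it is hypothetical, and floe's actual contract model may differ substantially.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputPort:
    name: str
    columns: dict[str, str]  # column name -> type
    freshness_hours: int     # SLA: maximum staleness


@dataclass(frozen=True)
class DataContract:
    producer: str            # domain-qualified product, e.g. "sales.customer_360"
    port: str
    columns: dict[str, str]
    freshness_hours: int


def generate_contract(domain: str, product: str, port: OutputPort) -> DataContract:
    """Derive a contract from what the producer already declares."""
    return DataContract(f"{domain}.{product}", port.name, dict(port.columns), port.freshness_hours)


def check_consumer(contract: DataContract, needed: dict[str, str]) -> list[str]:
    """List columns a consumer expects that the contract lacks or types differently."""
    problems = []
    for column, expected in sorted(needed.items()):
        actual = contract.columns.get(column)
        if actual is None:
            problems.append(f"{column}: missing from {contract.producer}/{contract.port}")
        elif actual != expected:
            problems.append(f"{column}: expected {expected}, contract provides {actual}")
    return problems
```

Deriving the contract from the port declaration means the producer maintains a single source of truth, and the consumer-side check can run at compile time alongside the other guardrails.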

See ADR-0021: Data Architecture Patterns for full Data Mesh documentation.

The following documents have been updated to support the Data Mesh architecture pattern:

| Document | Changes |
|---|---|
| 04-building-blocks.md | Added Data Mesh schemas, CLI commands, three-tier config model |
| 05-runtime-view.md | Added Data Mesh workflows (product registration, contracts, lineage) |
| 06-deployment-view.md | Added Data Mesh topology, domain namespaces, multi-cluster patterns |
| 07-crosscutting.md | Updated configuration hierarchy for federated governance |
| four-layer-overview.md | Added Data Mesh layer extension diagram |
| platform-enforcement.md | Added three-tier enforcement model, data contracts |
| ADR-0021 | Comprehensive Data Mesh architecture (already existed) |
| compiled-artifacts.md | Added domain_context and data_product fields |

  1. Implement floe-core schemas (Pydantic models)
  2. Implement plugin interfaces (ABCs)
  3. Create reference plugins and implementation primitives (DuckDB, Dagster, Polaris, Cube, dlt)
  4. Create Helm charts for platform deployment
  5. Implement CLI commands
  6. Create integration tests using K8s (ADR-0017)
  7. Implement Data Mesh resource types (EnterpriseManifest, DomainManifest, DataProduct, DataContract)
  8. Implement cross-domain data contract validation
  9. Implement federated lineage with domain-qualified namespaces