floe Architecture Summary
This document summarizes the architectural redesign of floe with a platform enforcement model.
Executive Summary
floe has been redesigned with a four-layer architecture and platform enforcement model that:
- Separates platform configuration from pipeline code
- Enforces guardrails at compile time
- Uses a plugin system for flexibility
- Stores immutable platform artifacts in OCI registries
Key Architectural Decisions
Organizational Patterns
floe supports two organizational patterns:
| Pattern | Configuration Model | Use Case |
|---|---|---|
| Centralized | Platform → Pipeline (2-file) | Traditional centralized data team |
| Data Mesh | Enterprise → Domain → Product (3-tier) | Federated domain ownership |
For Data Mesh, the configuration hierarchy extends to three tiers:
- Enterprise Platform: Global governance, approved plugins
- Domain Platform: Domain-specific choices, domain namespace
- Data Products: Input/output ports, SLAs, data contracts
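The three-tier hierarchy can be pictured as each tier narrowing the choices of the tier above it. The following is a minimal sketch under stated assumptions: the class names, field names, and plugin names are hypothetical and do not reflect floe's actual manifest schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnterprisePlatform:
    """Top tier: global governance (hypothetical shape)."""
    approved_plugins: dict[str, frozenset[str]]  # category -> allowed plugin names


@dataclass(frozen=True)
class DomainPlatform:
    """Middle tier: domain-specific choices within the enterprise limits."""
    namespace: str
    selected_plugins: dict[str, str]  # category -> chosen plugin name


def validate_domain(enterprise: EnterprisePlatform, domain: DomainPlatform) -> list[str]:
    """Return violations; an empty list means the domain respects the enterprise tier."""
    violations = []
    for category, choice in sorted(domain.selected_plugins.items()):
        allowed = enterprise.approved_plugins.get(category, frozenset())
        if choice not in allowed:
            violations.append(
                f"{domain.namespace}: {category}={choice!r} is not enterprise-approved"
            )
    return violations


enterprise = EnterprisePlatform(
    approved_plugins={
        "compute": frozenset({"duckdb", "snowflake"}),
        "orchestrator": frozenset({"dagster"}),
    }
)
sales = DomainPlatform(namespace="sales", selected_plugins={"compute": "duckdb"})
finance = DomainPlatform(namespace="finance", selected_plugins={"compute": "spark"})

print(validate_domain(enterprise, sales))    # no violations
print(validate_domain(enterprise, finance))  # one violation for the compute choice
```

Returning a list of violations rather than raising on the first one lets a compile step report every problem in a single pass.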
Composability Principle
floe is built on composability as a core architectural principle (ADR-0037):
- Plugin Architecture > Configuration Switches: Extensibility via entry points (floe.computes, floe.orchestrators, etc.), not if/else config
- Interface > Implementation: Define ABCs (ComputePlugin, TelemetryBackendPlugin, LineageBackendPlugin), not concrete classes
- Composition Contracts > Cross-Plugin Coupling: Plugins declare capabilities and requirements; floe-core validates compatibility and passes typed bindings between plugins
- Progressive Disclosure: Point to detailed docs, don’t duplicate content
- Opt-in Complexity: Start simple (2-tier), with architecture direction toward Data Mesh-compatible (3-tier) governance. See Capability Status for the current alpha-validated state.
15 plugin categories enable flexibility while maintaining enforced standards (see Plugin Catalog for implementation truth):
- Compute, Orchestrator, Catalog, Storage, TelemetryBackend, LineageBackend
- DBT, Semantic Layer, Ingestion, Quality, RBAC, Alert Channel, Secrets, Identity, Network Security
Note: PolicyEnforcer and DataContract are now core modules in floe-core, not plugins.
See: ADR-0037: Composability Principle
Storage-side plugin composition is tracked in Plugin Composition Uplift Tracker. The immediate target is storage/catalog/compute/orchestrator/deployment composition for the Iceberg runtime path; broader plugin uplift is staged after that PR.
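The idea of plugins declaring capabilities and requirements, with a core validating compatibility, can be sketched as a simple set check. Everything below is an assumption made for illustration: `PluginDeclaration`, `unmet_requirements`, and the capability strings are invented and are not floe-core's composition resolver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PluginDeclaration:
    """What a plugin offers and what it needs (hypothetical shape)."""
    name: str
    provides: frozenset[str] = frozenset()
    requires: frozenset[str] = frozenset()


def unmet_requirements(plugins: list[PluginDeclaration]) -> dict[str, set[str]]:
    """Map each plugin name to the requirements that no selected plugin provides."""
    available: set[str] = set()
    for plugin in plugins:
        available |= plugin.provides
    return {
        p.name: set(p.requires - available)
        for p in plugins
        if p.requires - available
    }


# Capability strings are invented for illustration.
storage = PluginDeclaration(
    "storage-minio", provides=frozenset({"s3-compatible-object-store"})
)
catalog = PluginDeclaration(
    "catalog-polaris",
    provides=frozenset({"iceberg-rest-catalog"}),
    requires=frozenset({"s3-compatible-object-store"}),
)
compute = PluginDeclaration(
    "compute-duckdb", requires=frozenset({"iceberg-rest-catalog"})
)

print(unmet_requirements([storage, catalog, compute]))  # every requirement is met
print(unmet_requirements([compute]))                    # the catalog requirement is missing
```

Because each plugin only names abstract capabilities, no plugin has to import or know about another plugin's concrete class.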
Four-Layer Architecture

    Layer 4: DATA (Ephemeral Jobs)
      │  Owner: Data Engineers
      │  K8s: Jobs (run-to-completion)
      │  Config: floe.yaml
      ▼
    Layer 3: SERVICES (Long-lived)
      │  Owner: Platform Engineers
      │  K8s: Deployments, StatefulSets
      │  Deploy: floe platform deploy
      ▼
    Layer 2: CONFIGURATION (Enforcement)
      │  Owner: Platform Engineers
      │  Storage: OCI Registry (immutable)
      │  Config: manifest.yaml
      ▼
    Layer 1: FOUNDATION (Framework Code)
      │  Owner: floe Maintainers
      │  Distribution: PyPI, Helm

Two-File Configuration Model
| File | Owner | Purpose |
|---|---|---|
| manifest.yaml | Platform Team | Define guardrails, selected plugins, and platform-owned service/destination settings (rarely changes) |
| floe.yaml | Data Engineers | Define product transforms and declarative ingestion sources (changes frequently) |
For ingestion, this keeps the platform/data-engineer split concrete:
platform teams select and configure dlt, Polaris, and MinIO/S3 once in
manifest.yaml; product teams declare source name, file format, path,
destination raw table, write mode, and schema contract in floe.yaml.
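The split described above can be sketched as a merge that refuses product-side changes to platform-owned settings. The dictionaries stand in for parsed YAML; the key names, the `resolve_ingestion` helper, and the placeholder values are assumptions for illustration and not floe's actual schema.

```python
# Keys reserved for manifest.yaml in this sketch; the names are illustrative
# and are not floe's actual schema.
PLATFORM_OWNED_KEYS = frozenset({"ingestion_plugin", "catalog", "object_store"})


def resolve_ingestion(
    manifest: dict[str, object], product: dict[str, object]
) -> dict[str, object]:
    """Merge both files, rejecting product attempts to set platform-owned keys."""
    forbidden = sorted(PLATFORM_OWNED_KEYS.intersection(product))
    if forbidden:
        raise ValueError(f"floe.yaml may not set platform-owned keys: {forbidden}")
    return {**manifest, **product}


# Stand-in for the parsed manifest.yaml (platform team, rarely changes).
manifest = {"ingestion_plugin": "dlt", "catalog": "polaris", "object_store": "minio"}

# Stand-in for the parsed floe.yaml (product team, changes frequently).
product = {
    "sources": [
        {
            "name": "customers",
            "file_format": "csv",
            "path": "landing/customers.csv",
            "raw_table": "raw.customers",
            "write_mode": "append",       # placeholder value
            "schema_contract": "freeze",  # placeholder value
        }
    ]
}

resolved = resolve_ingestion(manifest, product)
print(resolved["ingestion_plugin"], [s["name"] for s in resolved["sources"]])
```

Rejecting the overlap outright, instead of letting one file silently win, keeps the ownership boundary visible at compile time.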
Opinionation Boundaries
ENFORCED (Non-negotiable):
- Apache Iceberg (table format)
- OpenTelemetry (observability)
- OpenLineage (data lineage)
- dbt (transformation)
- Kubernetes-native (deployment)
PLUGGABLE (Platform Team selects once):
- Compute: DuckDB, Spark, Snowflake, Databricks, BigQuery
- Orchestration: Dagster, Airflow, Prefect
- Catalog: Polaris, AWS Glue, Hive
- Storage: S3, GCS, Azure Blob, MinIO
- Observability Backend: Jaeger, Datadog, Grafana Cloud, AWS X-Ray
- Semantic Layer: Cube, dbt Semantic Layer, None
- Ingestion: dlt, Airbyte
- Secrets: K8s Secrets, External Secrets Operator, Vault, Infisical
- Identity: Keycloak, Dex, Authentik, Okta, Auth0
Documentation Structure
ADRs Created/Amended
| ADR | Title | Action |
|---|---|---|
| ADR-0008 | Repository Split | AMENDED: Added plugin architecture + API versioning |
| ADR-0010 | Target-Agnostic Compute | AMENDED: Added ComputePlugin interface |
| ADR-0012 | Data Classification Governance | AMENDED: Added quality gates section |
| ADR-0016 | Platform Enforcement Architecture | AMENDED: Added four-layer details + OCI storage |
| ADR-0017 | K8s Testing Infrastructure | Existed (created in previous session) |
| ADR-0018 | Opinionation Boundaries | AMENDED: Added plugin vs configuration decision criteria |
| ADR-0019 | Platform Services Lifecycle | NEW: Long-lived vs ephemeral |
| ADR-0020 | Ingestion Plugins | NEW: dlt + Airbyte |
| ADR-0021 | Data Architecture Patterns | NEW: Medallion, Kimball, Data Vault |
| ADR-0035 | Observability Plugin Interface | NEW: Pluggable observability backends (Jaeger, Datadog, Grafana Cloud) |
| ADR-0036 | Storage Plugin Interface | NEW: PyIceberg FileIO pattern for S3, GCS, Azure, MinIO |
| ADR-0037 | Composability Principle | NEW: Core architectural principle for plugin design |
| ADR-0038 | Data Mesh Architecture | NEW: Unified Manifest schema, 3-tier inheritance |
Architecture Documents Created
| Document | Purpose |
|---|---|
| four-layer-overview.md | Comprehensive layer diagram and details |
| platform-enforcement.md | How platform constraints are enforced |
| platform-services.md | Layer 3 services (orchestrator, catalog, etc.) |
| plugin-system/ | Plugin structure and discovery |
| plugin-composition-uplift-tracker.md | Composition resolver and typed adapter adoption plan |
| interfaces/ | Abstract Base Classes for all plugins |
| opinionation-boundaries.md | What’s enforced vs pluggable |
| platform-artifacts.md | OCI registry storage model |
Key Interfaces
floe documents 15 plugin categories for extensibility (see plugin-system/index.md for the canonical registry and implemented ABCs):
| Plugin Type | Purpose | Entry Point | ADR |
|---|---|---|---|
| ComputePlugin | Where dbt transforms execute | floe.computes | ADR-0010 |
| OrchestratorPlugin | Job scheduling and execution | floe.orchestrators | ADR-0033 |
| CatalogPlugin | Iceberg table catalog | floe.catalogs | ADR-0008 |
| StoragePlugin | Object storage (S3, GCS, Azure, MinIO) | floe.storage | ADR-0036 |
| TelemetryBackendPlugin | OTLP telemetry backends (traces, metrics, logs) | floe.telemetry_backends | ADR-0035 |
| LineageBackendPlugin | OpenLineage backends (data lineage) | floe.lineage_backends | ADR-0035 |
| DBTPlugin | dbt compilation environment (local/fusion/cloud) | floe.dbt | ADR-0043 |
| SemanticLayerPlugin | Business intelligence API | floe.semantic_layers | ADR-0001 |
| IngestionPlugin | Data loading from sources | floe.ingestion | ADR-0020 |
| SecretsPlugin | Credential management | floe.secrets | ADR-0023/0031 |
| IdentityPlugin | User authentication (OIDC) | floe.identity | ADR-0024 |
| DataQualityPlugin | Data quality validation frameworks | floe.quality | ADR-0044 |
| RBACPlugin | Namespace and service-account isolation | floe.rbac | Epic 7B |
| AlertChannelPlugin | Contract violation alert delivery | floe.alert_channels | Epic 15 |
| NetworkSecurityPlugin | Network isolation and pod security policy | floe.network_security | Epic 7C |
Note: PolicyEnforcer and DataContract are now core modules in floe-core, not plugins.
See: interfaces/ for complete ABC definitions with method signatures
Example: Product Ingestion Boundary
Customer 360 now exercises the ingestion boundary without requiring data engineers to write Dagster or dlt code:

    floe.yaml ingestion.sources
      -> CompiledArtifacts.plugins.ingestion.config.sources
      -> Dagster ingestion asset construction
      -> DltIngestionPlugin
      -> Iceberg raw tables via Polaris + MinIO/S3

The supported alpha filesystem formats are CSV, JSONL, and Parquet. E2E coverage validates the Customer 360 CSV demo path separately from the CSV/JSONL/Parquet platform matrix so the demo remains simple while the platform still guards common landed-file ingestion issues.
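A guard for declared file formats might look like the sketch below. Only the three format names come from this document; the `check_source` helper and the rule that a file extension must agree with the declared format are assumptions chosen to illustrate one kind of landed-file check.

```python
from pathlib import PurePosixPath

# Formats this document lists as supported in the alpha.
SUPPORTED_FORMATS = {"csv", "jsonl", "parquet"}


def check_source(path: str, declared_format: str) -> str:
    """Validate a declared file format and return its normalized form.

    Raises ValueError for an unsupported format, or when the path has a
    file extension that disagrees with the declaration.
    """
    fmt = declared_format.strip().lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format {declared_format!r}")
    suffix = PurePosixPath(path).suffix.lstrip(".").lower()
    if suffix and suffix != fmt:
        raise ValueError(f"{path!r} does not look like a {fmt} file")
    return fmt


print(check_source("landing/customers.csv", "CSV"))
print(check_source("landing/events/", "jsonl"))  # directory path, no extension to compare
```

Running a check like this at compile time would surface a mismatched declaration before any ingestion job is scheduled.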
Example: ComputePlugin

    from __future__ import annotations  # annotations below name types defined elsewhere
    from abc import ABC

    # Abridged signatures; see interfaces/ for the complete definitions.
    class ComputePlugin(ABC):
        def generate_dbt_profile(self, config: ComputeConfig) -> dict: ...
        def get_required_dbt_packages(self) -> list[str]: ...
        def validate_connection(self, config: ComputeConfig) -> ConnectionResult: ...
        def get_resource_requirements(self, workload_size: str) -> ResourceSpec: ...

Example: TelemetryBackendPlugin and LineageBackendPlugin
Observability uses two independent plugins (ADR-0035):

    from abc import ABC
    from typing import Any

    # Abridged signatures; see interfaces/ for the complete definitions.
    class TelemetryBackendPlugin(ABC):
        """Configure OTLP backends for traces, metrics, logs."""
        def get_otlp_exporter_config(self) -> dict[str, Any]: ...
        def get_helm_values_override(self) -> dict[str, Any]: ...

    class LineageBackendPlugin(ABC):
        """Configure OpenLineage backends for data lineage."""
        def get_transport_config(self) -> dict[str, Any]: ...
        def get_namespace_mapping(self) -> dict[str, str]: ...

Example: StoragePlugin
Target semantic surface:

    from __future__ import annotations  # annotations below name types defined elsewhere
    from abc import ABC

    # Abridged signatures; see interfaces/ for the complete definitions.
    class StoragePlugin(ABC):
        def get_deployment_binding(self) -> StorageDeploymentBinding: ...
        def get_pyiceberg_fileio(self) -> FileIO: ...

Storage plugins emit neutral, secret-free storage bindings. floe-core validates compatibility; catalog, compute, orchestrator, and Helm renderers translate the typed deployment bindings they own.
During migration, the live ABC may still require legacy helper methods for existing plugins. Those methods are compatibility surface, not the composition contract.
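One way to picture a neutral, secret-free binding is a mapping that carries a reference to a secret rather than the credential itself, which each consumer then translates into its own settings. The sketch below is hypothetical throughout: the binding keys, the `_ref` naming convention, and both helper functions are assumptions, and the output property names are only modeled on the dotted style used for S3 FileIO configuration.

```python
SECRET_MARKERS = ("password", "secret", "access_key", "token")


def inline_secret_keys(binding: dict[str, str]) -> list[str]:
    """Return keys that look like inline credentials.

    Keys ending in "_ref" are treated as references to a secret stored
    elsewhere, so they are allowed.
    """
    return sorted(
        key
        for key in binding
        if not key.lower().endswith("_ref")
        and any(marker in key.lower() for marker in SECRET_MARKERS)
    )


def to_catalog_properties(binding: dict[str, str]) -> dict[str, str]:
    """Translate the neutral binding into the properties one consumer owns."""
    leaked = inline_secret_keys(binding)
    if leaked:
        raise ValueError(f"binding carries inline secrets: {leaked}")
    return {
        "s3.endpoint": binding["endpoint"],
        "s3.region": binding["region"],
        "warehouse": f"s3://{binding['bucket']}/warehouse",  # illustrative layout
    }


binding = {
    "endpoint": "http://minio:9000",
    "bucket": "lake",
    "region": "us-east-1",
    "credentials_secret_ref": "minio-credentials",  # a name, never the secret value
}

print(to_catalog_properties(binding))
print(inline_secret_keys({**binding, "secret_access_key": "not-allowed-here"}))
```

Keeping the translation in the consumer means the storage plugin does not need to know how a catalog or Helm renderer names its settings.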
Repository Structure

    floe/
    ├── floe-core/                  # Schemas, interfaces, enforcement engine
    ├── floe-cli/                   # CLI for Platform Team and Data Team
    ├── floe-dbt/                   # ENFORCED: dbt framework; runtime PLUGGABLE (ADR-0043)
    ├── floe-iceberg/               # ENFORCED: Iceberg utilities
    │
    ├── plugins/                    # ALL PLUGGABLE COMPONENTS (see Plugin Catalog)
    │   ├── floe-compute-duckdb/
    │   ├── floe-compute-spark/
    │   ├── floe-compute-snowflake/
    │   ├── floe-orchestrator-dagster/
    │   ├── floe-orchestrator-airflow/
    │   ├── floe-catalog-polaris/
    │   ├── floe-catalog-glue/
    │   ├── floe-storage-minio/     # MinIO storage with S3-compatible protocol support
    │   ├── floe-observability-jaeger/
    │   ├── floe-observability-datadog/
    │   ├── floe-semantic-cube/
    │   ├── floe-ingestion-dlt/
    │   ├── floe-secrets-eso/
    │   ├── floe-secrets-infisical/
    │   └── floe-identity-keycloak/
    │
    ├── charts/
    │   ├── floe-platform/          # Meta-chart for platform services
    │   └── floe-jobs/              # Base chart for pipeline jobs
    │
    └── docs/

CLI Commands
This section shows the target command shape. In the current alpha, floe platform compile is implemented for platform manifests and make compile-demo is the supported Customer 360 artifact path. Root data-team lifecycle commands and floe platform test are planned/stubbed, not current alpha workflows.
Platform Team

    floe platform compile    # Validate and build artifacts
    floe platform test       # Planned/stub: run policy tests
    floe platform publish    # Push to OCI registry
    floe platform deploy     # Deploy services to K8s
    floe platform status     # Check service health

Data Team Target Lifecycle

    floe init --platform=v1.2.3    # Planned: pull platform artifacts
    floe compile                   # Planned: validate against platform
    floe run                       # Planned: execute pipeline
    floe test                      # Planned: run dbt tests

Consolidation Achieved
The documentation was consolidated to reduce complexity:
- 4 existing ADRs amended (vs creating separate overlapping ADRs)
- 4 new ADRs created (only for truly distinct decisions)
- Reduced from 10 planned ADRs to 8 actual ADRs
- Cross-references maintained between all documents
Data Mesh Resources
For Data Mesh support, floe introduces additional resource types:
| Resource | Owner | Purpose |
|---|---|---|
| EnterpriseManifest | Central Platform Team | Global governance, approved plugins |
| DomainManifest | Domain Platform Team | Domain-specific choices |
| DataProduct | Product Team | Input/output ports, SLAs |
| DataContract | Auto-generated | Cross-domain data sharing contracts |
See ADR-0021: Data Architecture Patterns for full Data Mesh documentation.
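To make the idea of an auto-generated contract concrete, the sketch below derives a contract record from a product's declared output port. The `Example*` classes, their fields, and the sample values are invented for illustration; they are not the schemas of the DataProduct or DataContract resources in the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExamplePort:
    """An output port a product exposes (hypothetical fields)."""
    name: str
    columns: tuple[str, ...]
    freshness_sla_hours: int


@dataclass(frozen=True)
class ExampleProduct:
    domain: str
    name: str
    output_ports: tuple[ExamplePort, ...]


@dataclass(frozen=True)
class ExampleContract:
    provider: str  # domain-qualified product name
    consumer_domain: str
    port: str
    columns: tuple[str, ...]
    freshness_sla_hours: int


def generate_contract(
    product: ExampleProduct, port_name: str, consumer_domain: str
) -> ExampleContract:
    """Derive a contract from a declared output port (illustrative only)."""
    for port in product.output_ports:
        if port.name == port_name:
            return ExampleContract(
                provider=f"{product.domain}.{product.name}",
                consumer_domain=consumer_domain,
                port=port.name,
                columns=port.columns,
                freshness_sla_hours=port.freshness_sla_hours,
            )
    raise KeyError(f"{product.name} has no output port named {port_name!r}")


customer_360 = ExampleProduct(
    domain="sales",
    name="customer-360",
    output_ports=(ExamplePort("customers", ("customer_id", "segment"), 24),),
)
contract = generate_contract(customer_360, "customers", consumer_domain="marketing")
print(contract)
```

Deriving the contract from the port declaration keeps the provider's published columns and SLA as the single source of the shared terms.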
Documents Updated for Data Mesh Support
The following documents have been updated to support the Data Mesh architecture pattern:
Runtime Documentation
| Document | Changes |
|---|---|
| 04-building-blocks.md | Added Data Mesh schemas, CLI commands, three-tier config model |
| 05-runtime-view.md | Added Data Mesh workflows (product registration, contracts, lineage) |
| 06-deployment-view.md | Added Data Mesh topology, domain namespaces, multi-cluster patterns |
| 07-crosscutting.md | Updated configuration hierarchy for federated governance |
Architecture Documentation
| Document | Changes |
|---|---|
| four-layer-overview.md | Added Data Mesh layer extension diagram |
| platform-enforcement.md | Added three-tier enforcement model, data contracts |
ADR-0021 | Comprehensive Data Mesh architecture (already existed) |
Contracts
| Document | Changes |
|---|---|
| compiled-artifacts.md | Added domain_context and data_product fields |
Next Steps (Implementation Phase)
- Implement floe-core schemas (Pydantic models)
- Implement plugin interfaces (ABCs)
- Create reference plugins and implementation primitives (DuckDB, Dagster, Polaris, Cube, dlt)
- Create Helm charts for platform deployment
- Implement CLI commands
- Create integration tests using K8s (ADR-0017)
- Implement Data Mesh resource types (EnterpriseManifest, DomainManifest, DataProduct, DataContract)
- Implement cross-domain data contract validation
- Implement federated lineage with domain-qualified namespaces
Related Documents
- ADR Index - All Architecture Decision Records
- Architecture Index - All architecture documentation
- Guides - Implementation guides (Arc42)
- Contracts - Interface contracts