Four-Layer Architecture Overview

This document describes floe’s four-layer architecture: what each layer contains, who owns it, and how it is deployed or distributed.

┌─────────────────────────────────────────────────────────────────────────────┐
│ LAYER 4: DATA LAYER (Ephemeral Jobs) │
│ Owner: Data Engineers │
│ K8s Resources: Jobs (run-to-completion) │
│ Config: floe.yaml │
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ • dbt run pods → Execute transformations │ │
│ │ • dlt ingestion jobs → Load external data │ │
│ │ • Quality check jobs → Validate data quality │ │
│ │ • Orchestrator workers → Scaled by orchestrator as needed │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ Lifecycle: Run-to-completion, stateless, per-execution pods │
└─────────────────────────────────────┬───────────────────────────────────────┘
│ Connects to (K8s Service Discovery)
┌─────────────────────────────────────────────────────────────────────────────┐
│ LAYER 3: SERVICES LAYER (Long-lived) │
│ Owner: Platform Engineers │
│ K8s Resources: Deployments, StatefulSets │
│ Deployment: `floe platform deploy` │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ ORCHESTRATOR    CATALOG         SEMANTIC     OBSERVABILITY │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌──────────┐ ┌───────────┐ │ │
│ │ │ Dagster     │ │ Polaris     │ │ Cube     │ │ OTLP      │ │ │
│ │ │ Webserver   │ │ Server      │ │ Server   │ │ Collector │ │ │
│ │ │ Daemon      │ │             │ │          │ │ Prometheus│ │ │
│ │ │ PostgreSQL  │ │ PostgreSQL  │ │ Redis    │ │ Grafana   │ │ │
│ │ └─────────────┘ └─────────────┘ └──────────┘ └───────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ Lifecycle: Always running, rolling updates, stateful (databases, caches) │
└─────────────────────────────────────┬───────────────────────────────────────┘
│ Configured by
┌─────────────────────────────────────────────────────────────────────────────┐
│ LAYER 2: CONFIGURATION LAYER (Enforcement) │
│ Owner: Platform Engineers │
│ Storage: OCI Registry (immutable, versioned) │
│ Config: manifest.yaml │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ • Plugin selection (compute, orchestrator, catalog, semantic) │ │
│ │ • Governance policies (classification, access control, retention) │ │
│ │ • Data architecture rules (medallion/kimball, naming conventions) │ │
│ │ • Quality gates (test coverage, required tests, block/warn/notify) │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ Lifecycle: Versioned artifacts, rarely changes, published to OCI registry │
└─────────────────────────────────────┬───────────────────────────────────────┘
│ Built on
┌─────────────────────────────────────────────────────────────────────────────┐
│ LAYER 1: FOUNDATION LAYER (Framework Code) │
│ Owner: floe Maintainers │
│ Distribution: PyPI, Helm registry │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ floe-core │ Schemas, interfaces, enforcement engine │ │
│ │ floe-cli │ CLI for Platform Team and Data Team │ │
│ │ floe-dbt │ dbt framework (ENFORCED); runtime PLUGGABLE │ │
│ │ floe-iceberg │ Iceberg utilities (ENFORCED) │ │
│ │ plugins/* │ Pluggable implementations │ │
│ │ charts/* │ Helm charts for deployment │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ENFORCED STANDARDS: Iceberg, OTel, OpenLineage, dbt framework, K8s-native │
└─────────────────────────────────────────────────────────────────────────────┘

The Foundation layer contains the core framework code that defines floe’s capabilities.

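| Package      | Purpose                                                                  | Distribution  |
|--------------|--------------------------------------------------------------------------|---------------|
| floe-core    | Schemas, interfaces, enforcement engine                                  | PyPI          |
| floe-cli     | CLI for both personas                                                    | PyPI          |
| floe-dbt     | dbt framework integration (enforced); runtime via DBTPlugin (pluggable) | PyPI          |
| floe-iceberg | Iceberg utilities (enforced)                                             | PyPI          |
| plugins/*    | Pluggable implementations                                                | PyPI          |
| charts/*     | Helm charts                                                              | Helm registry |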

Enforced Standards:

  • Apache Iceberg (table format)
  • OpenTelemetry (observability)
  • OpenLineage (data lineage)
  • dbt (transformation framework: SQL compilation is ENFORCED; the execution environment is PLUGGABLE, see ADR-0043)
  • Kubernetes-native (deployment)
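
The split between enforced standards and pluggable choices can be pictured as a simple validation step. The sketch below is illustrative only: `ENFORCED`, `PLUGGABLE`, and `validate_selection` are hypothetical names, not part of floe's actual API, and the pluggable categories are taken from the plugin selection list in the Layer 2 diagram.

```python
# Illustrative sketch only: these names are assumptions, not floe's real API.

# Standards every platform must use (from the "Enforced Standards" list).
ENFORCED = {
    "table_format": "iceberg",
    "observability": "opentelemetry",
    "lineage": "openlineage",
    "transformation": "dbt",
    "deployment": "kubernetes",
}

# Categories where the platform team is free to pick a plugin.
PLUGGABLE = frozenset({"compute", "orchestrator", "catalog", "semantic"})


def validate_selection(selection: dict[str, str]) -> list[str]:
    """Return human-readable violations; an empty list means the selection is valid."""
    errors: list[str] = []
    for category, choice in selection.items():
        if category in ENFORCED:
            if choice != ENFORCED[category]:
                errors.append(
                    f"{category}: '{choice}' is not allowed, "
                    f"the enforced standard is '{ENFORCED[category]}'"
                )
        elif category not in PLUGGABLE:
            errors.append(f"{category}: unknown category")
    return errors


if __name__ == "__main__":
    # An orchestrator choice is accepted; swapping the table format is rejected.
    print(validate_selection({"orchestrator": "dagster", "table_format": "delta"}))
```

The point of the design is that an enforced category has exactly one legal value, while a pluggable category accepts any plugin the platform team selects.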

The Configuration layer contains platform artifacts that enforce guardrails.

Artifact Storage: OCI Registry (immutable, versioned, signed)

Contents:

  • Plugin selection configuration
  • Governance policies (classification, access control)
  • Data architecture rules (naming conventions, layer constraints)
  • Quality gates (test coverage, required tests)
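
To make the quality gate idea concrete, here is a minimal sketch of a test coverage gate with the block/warn/notify outcomes mentioned in the Layer 2 diagram. The `CoverageGate` class, its fields, and the `evaluate` function are hypothetical; the actual manifest schema and enforcement engine may look quite different.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """What happens when a gate fails (names taken from the diagram above)."""
    BLOCK = "block"
    WARN = "warn"
    NOTIFY = "notify"


@dataclass(frozen=True)
class CoverageGate:
    """Hypothetical gate: a minimum share of models that must have tests."""
    min_coverage: float  # fraction between 0.0 and 1.0
    action: Action


def evaluate(gate: CoverageGate, tested: int, total: int) -> tuple[bool, str]:
    """Return (may_proceed, message) for a project with `tested` of `total` models covered."""
    coverage = tested / total if total else 1.0
    if coverage >= gate.min_coverage:
        return True, "pass"
    message = (
        f"{gate.action.value}: test coverage {coverage:.0%} "
        f"is below the required {gate.min_coverage:.0%}"
    )
    # Only a BLOCK gate stops the pipeline; WARN and NOTIFY let it continue.
    return gate.action is not Action.BLOCK, message


if __name__ == "__main__":
    print(evaluate(CoverageGate(0.8, Action.BLOCK), tested=7, total=10))
```

Under these assumptions, the same failed check either stops compilation or merely surfaces a message, depending on the action the platform team configured.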

Workflow:

floe platform compile # Build artifacts
floe platform test # Planned/stub: run policy tests
floe platform publish # Push to OCI registry

The Services layer contains long-lived services that support data operations.

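| Service Type   | Examples                   | K8s Resource |
|----------------|----------------------------|--------------|
| Orchestrator   | Dagster webserver, daemon  | Deployment   |
| Catalog        | Polaris server             | Deployment   |
| Semantic Layer | Cube server                | Deployment   |
| Observability  | OTLP Collector, Prometheus | Deployment   |
| Databases      | PostgreSQL                 | StatefulSet  |
| Caches         | Redis                      | StatefulSet  |
| Storage        | MinIO                      | StatefulSet  |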

Deployment:

floe platform deploy # Deploy all services
floe platform status # Check health
floe platform logs # View logs

The Data layer contains ephemeral jobs that execute data operations.

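| Job Type       | Trigger         | K8s Resource |
|----------------|-----------------|--------------|
| dbt run        | Schedule/manual | Job          |
| dbt test       | Post-run        | Job          |
| dlt ingestion  | Schedule        | Job          |
| Quality checks | Post-run        | Job          |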
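
As a rough illustration of what a run-to-completion dbt job could look like, the sketch below builds a Kubernetes `batch/v1` Job manifest as a plain Python dictionary. The function name, the `floe-jobs` namespace, the job naming scheme, and the image are assumptions for illustration; in floe, the orchestrator generates and submits the actual Job.

```python
import json


def build_dbt_job(run_id: str, image: str, namespace: str = "floe-jobs") -> dict:
    """Build a minimal Kubernetes Job manifest for one `dbt run` execution.

    Hypothetical helper: the name, namespace, and image are placeholders.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"dbt-run-{run_id}", "namespace": namespace},
        "spec": {
            # Do not retry failed pods; leave retries to the orchestrator.
            "backoffLimit": 0,
            # Let Kubernetes garbage-collect the finished Job after an hour.
            "ttlSecondsAfterFinished": 3600,
            "template": {
                "spec": {
                    # Jobs require Never or OnFailure; Never keeps each pod single-use.
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": "dbt", "image": image, "command": ["dbt", "run"]}
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = build_dbt_job("20240101-0001", "example.registry/dbt-project:1.0.0")
    print(json.dumps(manifest, indent=2))
```

The manifest holds no persistent state and no replica count, which is what distinguishes a Layer 4 job from the long-lived Deployments and StatefulSets of Layer 3.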

Execution:

floe init --platform=v1.2.3 # Planned: pull platform artifacts
floe compile # Planned: validate against platform
floe run # Planned: execute pipeline
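
| Aspect       | Layer 3 (Services)      | Layer 4 (Data)            |
|--------------|-------------------------|---------------------------|
| K8s Resource | Deployment, StatefulSet | Job                       |
| Lifecycle    | Long-lived, upgraded    | Run-to-completion         |
| State        | Stateful                | Stateless                 |
| Scaling      | Fixed replicas or HPA   | Per-execution             |
| Owner        | Platform Team           | Data Team (execution)     |
| Deployment   | floe platform deploy    | Triggered by orchestrator |

| Layer         | Owner            | Responsibilities                        |
|---------------|------------------|-----------------------------------------|
| Foundation    | floe maintainers | Framework code, releases                |
| Configuration | Platform Team    | Plugin selection, policies, architecture |
| Services      | Platform Team    | Deploy, upgrade, operate                |
| Data          | Data Team        | Pipeline code, transforms, schedules    |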
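
The ownership model can also be expressed as a small lookup structure, which is handy when routing questions or alerts to the right team. This is only a sketch: the `Layer` dataclass and `layers_owned_by` helper are hypothetical, and the data simply mirrors the ownership table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """One architecture layer, its owner, and what that owner is responsible for."""
    number: int
    name: str
    owner: str
    responsibilities: tuple[str, ...]


# Data copied from the ownership table; the structure itself is illustrative.
LAYERS = (
    Layer(1, "Foundation", "floe maintainers", ("framework code", "releases")),
    Layer(2, "Configuration", "Platform Team", ("plugin selection", "policies", "architecture")),
    Layer(3, "Services", "Platform Team", ("deploy", "upgrade", "operate")),
    Layer(4, "Data", "Data Team", ("pipeline code", "transforms", "schedules")),
)


def layers_owned_by(owner: str) -> list[str]:
    """Return the names of all layers owned by `owner` (case-insensitive)."""
    return [layer.name for layer in LAYERS if layer.owner.lower() == owner.lower()]


if __name__ == "__main__":
    print(layers_owned_by("Platform Team"))
```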

The four-layer architecture extends to support Data Mesh deployments with federated domain ownership:

┌─────────────────────────────────────────────────────────────────────────────┐
│ Layer 4: DATA PRODUCTS (Ephemeral) │
│ • Data product pipelines run as K8s Jobs │
│ • Each product has defined input/output ports │
│ • Cross-domain dependencies tracked via data contracts │
└────────────────────────────────────┬────────────────────────────────────────┘
┌────────────────────────────────────┴────────────────────────────────────────┐
│ Layer 3: DOMAIN SERVICES (Long-lived, per-domain) │
│ • Orchestrator per domain (domain autonomy) │
│ • Domain-specific semantic layer │
│ • Connected to shared catalog and observability │
└────────────────────────────────────┬────────────────────────────────────────┘
┌────────────────────────────────────┴────────────────────────────────────────┐
│ Layer 2: FEDERATED CONFIGURATION │
│ • Enterprise manifest: Global governance, approved plugins │
│ • Domain manifest: Domain-specific choices, namespace │
│ • Inheritance: Enterprise → Domain → Product │
└────────────────────────────────────┬────────────────────────────────────────┘
┌────────────────────────────────────┴────────────────────────────────────────┐
│ Layer 1: FOUNDATION (Same as centralized) │
│ • Additional schemas: EnterpriseManifest, DomainManifest, DataProduct │
│ • Additional CLI: floe enterprise/domain/product commands │
└─────────────────────────────────────────────────────────────────────────────┘

See ADR-0021: Data Architecture Patterns for full Data Mesh documentation.
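
The Enterprise → Domain → Product inheritance chain can be sketched as a layered merge in which lower levels add their own settings but cannot override centrally governed ones. Everything here is an assumption made for illustration: the `resolve` function, the key names, and the rule that `governance` and `approved_plugins` are locked are not taken from floe's schemas.

```python
# Hypothetical sketch of manifest inheritance; not floe's actual merge logic.

# Keys that, once set by a higher level, may not be changed further down.
LOCKED_KEYS = frozenset({"governance", "approved_plugins"})


def resolve(enterprise: dict, domain: dict, product: dict) -> dict:
    """Merge the three levels in order, rejecting overrides of locked keys."""
    effective: dict = {}
    levels = (("enterprise", enterprise), ("domain", domain), ("product", product))
    for level_name, settings in levels:
        for key, value in settings.items():
            already_set = key in effective
            if key in LOCKED_KEYS and already_set and effective[key] != value:
                raise ValueError(f"{level_name} manifest may not override '{key}'")
            # Unlocked keys follow normal inheritance: the most specific level wins.
            effective[key] = value
    return effective


if __name__ == "__main__":
    config = resolve(
        enterprise={"governance": "strict", "approved_plugins": ["dagster"]},
        domain={"namespace": "sales", "orchestrator": "dagster"},
        product={"schedule": "daily"},
    )
    print(config)
```

Under this model a domain can choose its namespace and a product its schedule, while an attempt to relax enterprise governance fails at resolution time rather than at runtime.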