# ADR-0016: Platform Enforcement Architecture
## Status

Accepted
## Context

Traditional data platform configurations mix platform concerns (compute targets, governance, service selection) with pipeline concerns (transforms, schedules) in a single configuration file. This leads to:
- Data engineers making platform decisions they shouldn’t
- Inconsistent governance across teams
- Configuration drift between environments
- No clear ownership boundaries
We need an architecture that:
- Separates platform configuration from pipeline configuration
- Enforces platform guardrails at compile time
- Provides clear ownership boundaries between Platform Team and Data Team
- Prevents environment drift (same compute in dev/staging/prod)
## Decision

Adopt a platform enforcement architecture with:
1. **Two-File Configuration Model**
   - `manifest.yaml` - Platform Team defines guardrails (immutable)
   - `floe.yaml` - Data Engineers define pipelines (inherits platform constraints)
2. **Immutable Platform Artifacts**
   - Platform configuration is compiled to versioned OCI artifacts
   - Data pipelines reference and inherit from these artifacts
   - Non-compliant pipelines fail at compile time
3. **Environment-Agnostic Compute**
   - Compute target is set ONCE at platform level
   - Same compute across dev/staging/prod (no drift)
   - DuckDB is a viable production choice
4. **Four-Layer Architecture**
   - Layer 1: Foundation (framework code, open source)
   - Layer 2: Configuration (platform enforcement, immutable)
   - Layer 3: Services (long-lived, stateful)
   - Layer 4: Data (ephemeral jobs)
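To make the two-file split concrete, a compiler can reject any pipeline configuration that tries to set a platform-owned key. The following is a hypothetical sketch, not the actual floe implementation; the top-level key names mirror the `manifest.yaml` and `floe.yaml` examples later in this ADR:

```python
# Hypothetical sketch: reject platform-owned keys in a pipeline config.
# Key names mirror the manifest.yaml / floe.yaml examples in this ADR;
# the real enforcement engine in floe-core may differ.

PLATFORM_OWNED_KEYS = {"plugins", "data_architecture", "governance"}

def validate_pipeline_config(floe_config: dict) -> list[str]:
    """Return an error message for each platform-owned key a data team set."""
    return [
        f"'{key}' is platform-owned; set it in manifest.yaml, not floe.yaml"
        for key in PLATFORM_OWNED_KEYS & floe_config.keys()
    ]

# A compliant pipeline config only defines transforms and schedules.
ok = validate_pipeline_config({"transforms": [{"type": "dbt"}], "schedule": {}})
# A non-compliant one tries to choose its own compute plugin.
bad = validate_pipeline_config({"transforms": [], "plugins": {"compute": "spark"}})
```

In this sketch `ok` is an empty list and `bad` carries a single error naming `plugins`, which is the shape of failure the compile-time model relies on.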
## Consequences

### Positive

- Clear separation of concerns - Platform Team owns infrastructure, Data Team owns transforms
- Compile-time enforcement - Non-compliant pipelines fail before runtime
- No environment drift - Same compute/policies across all environments
- Versioned platform - Platform changes are auditable via OCI registry
- Governance by default - Naming conventions, quality gates enforced automatically
### Negative

- Two workflows - Platform Team and Data Team have separate processes
- Upfront planning - Platform decisions must be made before data engineering starts
- Version coordination - Platform upgrades require data pipeline validation
### Neutral

- Platform Team uses `floe platform compile/publish/deploy`
- Data Team uses planned root lifecycle commands such as `floe init`, `floe compile`, and `floe run` once those workflows are productized
- OCI registry becomes an infrastructure requirement
## Configuration Model

### manifest.yaml (Platform Team)

```yaml
apiVersion: floe.dev/v1
kind: Manifest
metadata:
  name: acme-data-platform
  version: "1.2.3"
  scope: enterprise

plugins:
  compute:
    type: duckdb        # Set ONCE, inherited by all pipelines
  orchestrator:
    type: dagster
  catalog:
    type: polaris
  semantic_layer:
    type: cube
  ingestion:
    type: dlt

data_architecture:
  pattern: medallion
  naming:
    enforcement: strict  # off | warn | strict

governance:
  quality_gates:
    minimum_test_coverage: 80
    block_on_failure: true
```

### floe.yaml (Data Team)

```yaml
apiVersion: floe.dev/v1
kind: DataProduct
metadata:
  name: customer-analytics
  version: "1.0"

platform:
  ref: oci://registry.acme.com/floe-platform:v1.2.3

# Data engineers ONLY define transforms and schedules
# All platform concerns are inherited
transforms:
  - type: dbt
    path: models/

schedule:
  cron: "0 6 * * *"
```

## Compile-Time Enforcement
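The naming check that drives the compile failure shown in this section can be sketched in a few lines of Python. The prefixes and error wording come from the transcript in this ADR; the actual enforcement engine in floe-core is more involved:

```python
# Sketch of the medallion naming check (prefixes taken from this ADR's
# compile transcript); illustrative only, not the floe-core implementation.
MEDALLION_PREFIXES = ("bronze_", "silver_", "gold_")

def check_naming(model_names: list[str]) -> list[str]:
    """Return one error per model that lacks an approved layer prefix."""
    return [
        f"'{name}' violates naming convention; "
        f"expected bronze_*, silver_*, or gold_* prefix"
        for name in model_names
        if not name.startswith(MEDALLION_PREFIXES)
    ]

errors = check_naming(["bronze_customers", "stg_payments"])
# One error, for 'stg_payments'
```

Because the check runs against the compiled model list, a misnamed model fails the build before any job is scheduled.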
```text
$ floe compile   # planned root data-team command; not alpha-supported yet

[1/4] Loading platform artifacts from oci://registry.acme.com/floe-platform:v1.2.3
  ✓ Platform version: 1.2.3
  ✓ Compute: duckdb
  ✓ Architecture: medallion (strict enforcement)

[2/4] Validating transforms
  ✓ 12 dbt models found

[3/4] Enforcing naming conventions
  ✓ bronze_customers: valid
  ✗ ERROR: 'stg_payments' violates naming convention
    Expected: bronze_*, silver_*, or gold_* prefix

[4/4] Compilation FAILED
```

## Four-Layer Architecture (Detailed)
The platform enforcement model defines four distinct layers with clear ownership and lifecycle:
```text
┌───────────────────────────────────────────────────────────────────────────┐
│ LAYER 4: DATA (Ephemeral Jobs)                                            │
│ Owner: Data Engineers                                                     │
│ K8s Resources: Jobs (run-to-completion)                                   │
│ Config: floe.yaml                                                         │
│                                                                           │
│ • dbt run pods                                                            │
│ • Pipeline job executions                                                 │
│ • Quality check jobs                                                      │
└─────────────────────────────────────┬─────────────────────────────────────┘
                                      │ Connects to
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│ LAYER 3: SERVICES (Long-lived)                                            │
│ Owner: Platform Engineers                                                 │
│ K8s Resources: Deployments, StatefulSets                                  │
│ Deployment: `floe platform deploy`                                        │
│                                                                           │
│ • Orchestrator services (Dagster/Airflow webserver, daemon, PostgreSQL)   │
│ • Catalog services (Polaris server, PostgreSQL)                           │
│ • Semantic layer (Cube server, Redis cache)                               │
│ • Observability (OTLP Collector, Prometheus, Grafana)                     │
│ • Object storage (MinIO or cloud S3/GCS/ADLS)                             │
└─────────────────────────────────────┬─────────────────────────────────────┘
                                      │ Configured by
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│ LAYER 2: CONFIGURATION (Enforcement)                                      │
│ Owner: Platform Engineers                                                 │
│ Storage: OCI Registry (immutable, versioned)                              │
│ Config: manifest.yaml                                                     │
│                                                                           │
│ • Plugin selection (compute, orchestrator, catalog, semantic, ingestion)  │
│ • Governance policies (classification, access control, retention)         │
│ • Data architecture rules (naming conventions, layer constraints)         │
│ • Quality gates (test coverage, required tests, block/warn/notify)        │
└─────────────────────────────────────┬─────────────────────────────────────┘
                                      │ Built on
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│ LAYER 1: FOUNDATION (Framework Code)                                      │
│ Owner: floe Maintainers                                                   │
│ Distribution: PyPI, Helm registry                                         │
│                                                                           │
│ • floe-core: Schemas, interfaces, enforcement engine                      │
│ • floe-dbt: dbt integration (enforced)                                    │
│ • floe-iceberg: Iceberg utilities (enforced)                              │
│ • plugins/*: Pluggable implementations                                    │
│ • charts/*: Helm charts for deployment                                    │
└───────────────────────────────────────────────────────────────────────────┘
```

### Layer Boundaries

| Aspect | Layer 3 (Services) | Layer 4 (Data) |
|---|---|---|
| K8s Resource | Deployment, StatefulSet | Job |
| Lifecycle | Long-lived, upgraded | Run-to-completion |
| State | Stateful (databases, caches) | Stateless |
| Scaling | Fixed replicas or HPA | Per-execution |
| Owner | Platform Team | Data Team (execution) |
| Deployment | floe platform deploy | Triggered by orchestrator |
| Upgrades | Rolling updates | New job pods per run |
## Platform Artifacts: OCI Registry Storage

Platform artifacts are stored in OCI-compliant registries. This enables:
- Immutability: Once published, artifacts cannot be modified
- Versioning: Semantic versioning (v1.2.3) with tags
- Signing: Content signing via cosign for supply chain security
- Enterprise-ready: All cloud providers offer OCI registries
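Immutability in OCI registries comes from content addressing: an artifact can be referenced by the digest of its bytes rather than a mutable tag, so any change produces a different reference. A minimal illustration with `hashlib` (the artifact bytes and registry name here are made up):

```python
import hashlib

# Content addressing: the reference is derived from the bytes themselves,
# so modifying the artifact necessarily changes the reference.
artifact_bytes = b'{"platform": "acme-data-platform", "version": "1.2.3"}'
digest = "sha256:" + hashlib.sha256(artifact_bytes).hexdigest()

# A digest-pinned reference (hypothetical registry/repository names):
ref = f"registry.example.com/floe-platform@{digest}"
```

Tags like `v1.2.3` remain useful for humans, while digest pinning is what guarantees a data pipeline resolves exactly the platform artifact that was published.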
### Artifact Structure

```text
oci://registry.example.com/floe-platform:v1.2.3
├── manifest.json               # Platform metadata
├── policies/                   # Compiled governance policies
│   ├── classification.json     # Data classification rules
│   ├── access-control.json     # RBAC definitions
│   └── quality-gates.json      # Quality requirements
├── catalog/                    # Catalog configuration
│   ├── namespaces.json         # Approved namespaces
│   └── schema-registry.json    # Schema constraints
└── architecture/               # Data architecture rules
    ├── naming-rules.json       # Naming conventions
    └── layer-constraints.json  # Medallion/Kimball rules
```

## Platform Team Workflow

```sh
# 1. Edit platform configuration
vim manifest.yaml

# 2. Validate and build artifacts
floe platform compile

# 3. Run policy tests
floe platform test

# 4. Version control
git commit -m "Update platform v1.2.3" && git push

# 5. Publish to OCI registry
floe platform publish v1.2.3
# Output: oci://registry.example.com/floe-platform:v1.2.3

# 6. Deploy long-lived services to K8s
floe platform deploy
```

## Data Team Workflow
```sh
# 1. Pull platform artifacts
floe init --platform=v1.2.3   # planned root data-team command
# Pulls from oci://registry.example.com/floe-platform:v1.2.3

# 2. Edit pipeline configuration
vim floe.yaml

# 3. Validate against platform constraints
floe compile   # planned root data-team command
# Validates naming, quality gates, etc.

# 4. Execute pipeline
floe run   # planned root data-team command
```

## Why OCI Registry?
| Consideration | OCI Registry | Alternatives |
|---|---|---|
| K8s-native | ✅ ORAS, Helm 3.8+ standard | S3 requires custom tooling |
| Versioning | ✅ Built-in tags + digests | Manual version management |
| Signing | ✅ cosign integration | Varies by provider |
| Enterprise | ✅ ECR, ACR, GCR, Harbor | Additional infra needed |
| Caching | ✅ CDN-backed by registries | Custom CDN setup |
## Data Mesh Extension

For organizations adopting Data Mesh, the two-file model extends to a three-tier hierarchy:
```text
Enterprise Platform (enterprise-manifest.yaml)
  │ Global governance, approved plugins
  ▼
Domain Platform (domain-manifest.yaml)
  │ Domain-specific choices, domain namespace
  ▼
Data Products (floe.yaml)
  │ Input/output ports, SLAs, contracts
```

Each tier inherits from its parent and can add domain-specific policies. See ADR-0021: Data Architecture Patterns for full Data Mesh documentation.
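The inheritance chain can be thought of as a deep merge in which each tier overlays its parent. The sketch below is illustrative only: the field names are hypothetical, and the real floe semantics may restrict which keys a child tier is allowed to override:

```python
# Illustrative three-tier inheritance: enterprise -> domain -> data product.
# Field names are hypothetical; floe may forbid overriding guardrail keys.

def deep_merge(parent: dict, child: dict) -> dict:
    """Overlay child onto parent; nested dicts merge, child scalars win."""
    merged = dict(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

enterprise = {"governance": {"pii": "mask"}, "plugins": {"compute": "duckdb"}}
domain = {"governance": {"retention_days": 30}}
product = {"transforms": [{"type": "dbt"}]}

# Each tier inherits from its parent and adds its own policies.
effective = deep_merge(deep_merge(enterprise, domain), product)
```

Here `effective` keeps the enterprise guardrails (`pii` masking, DuckDB compute), gains the domain's retention policy, and carries the product's transforms.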
## References

- Four-Layer Overview - Four-layer architecture details
- Opinionation Boundaries - Opinionation boundaries
- ADR-0008 - Repository structure
- ORAS (OCI Registry As Storage)
- Helm OCI Support