ADR-0026: Data Contract Architecture
Status
Accepted
Context
Data products in floe require formal contracts to define expectations between producers and consumers. Without standardized contracts, organizations face:
- Schema disagreements: No formal declaration of columns, types, and constraints
- SLA ambiguity: Freshness, availability, and quality expectations are undocumented
- Runtime blind spots: No monitoring when contracts are violated
- Governance gaps: Data classification, ownership, and terms are inconsistent
The architecture design review identified several gaps:
- Data-001: Data product creation flow undocumented (Critical)
- R-006: Data Mesh contract race conditions (High)
- R-009: Schema evolution undocumented (High)
Standards Evaluated
Multiple data contract standards were evaluated:
| Standard | Maturity | Tooling | Adoption | Governance |
|---|---|---|---|---|
| ODCS v3 | Production | datacontract-cli | High | Linux Foundation (Bitol) |
| dbt Contracts | Limited | dbt native | High (dbt users) | dbt Labs |
| Soda Contracts | Beta | Soda CLI | Medium | Soda |
| Custom YAML | N/A | Built in-house | N/A | Internal |
ODCS v3 was selected for the following reasons:
- Linux Foundation backing (vendor-neutral governance)
- Comprehensive schema, SLA, and quality definitions
- datacontract-cli provides parsing, validation, and drift detection
- Growing industry adoption as the standard format
Decision
Enforce ODCS v3 (Open Data Contract Standard) as the data contract format. This is a core module, not a plugin.
Why Enforced (Not Pluggable)?
Data contracts follow the same pattern as other enforced standards:
| Enforced Standard | Rationale |
|---|---|
| Apache Iceberg (ADR-0005) | Table format cannot be swapped without fragmenting ecosystem |
| dbt (ADR-0009) | SQL transformation DSL cannot be swapped mid-project |
| OpenTelemetry (ADR-0006) | Observability format cannot vary between teams |
| ODCS v3 (this ADR) | Contract format cannot vary between data products |
Making contracts pluggable would cause:
- Interoperability failure: Different teams using different formats cannot share contracts
- Tooling fragmentation: Each format needs separate validation, drift detection, monitoring
- Governance chaos: No consistent way to enforce enterprise policies
- Little practical benefit: Organizations rarely switch contract standards once one is chosen
Architecture
```text
Core Modules (enforced, not pluggable):
├── DataContract   -> Uses ODCS v3 (this ADR)
├── PolicyEnforcer -> Compile-time validation (ADR-0015)
├── dbt            -> SQL transformation (ADR-0009)
├── Iceberg        -> Table format (ADR-0005)
└── OpenTelemetry  -> Observability (ADR-0006)

Pluggable Components (platform team selects):
├── Compute       -> DuckDB, Snowflake, Spark, etc.
├── Orchestrator  -> Dagster, Airflow 3.x
├── Catalog       -> Polaris, Glue, Hive
└── SemanticLayer -> Cube, dbt Semantic Layer
```
Integration Points
1. Compile-Time (PolicyEnforcer)
Contracts are validated during the planned root-level floe compile flow:
- Schema completeness
- SLA definition validity
- Classification requirements (PII, PHI)
- Version compatibility with previous contract
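The compile-time checks above can be sketched as a single validation pass over a parsed contract. This is a minimal illustration, not floe's PolicyEnforcer. Several things are assumptions:

- the name `validate_contract`;
- the dict shape, which mirrors the example datacontract.yaml later in this ADR;
- the classification vocabulary;
- the supported ISO-8601 duration subset;
- the rule that removing or retyping a column requires a major version bump.

```python
from __future__ import annotations

import re

# Assumption: only days/hours/minutes/seconds of ISO-8601 durations, e.g. "PT6H" or "P1D".
ISO_DURATION = re.compile(r"^P(?!$)(?:\d+D)?(?:T(?=\d)(?:\d+H)?(?:\d+M)?(?:\d+S)?)?$")
# Assumption: an illustrative enterprise classification vocabulary.
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential", "pii", "phi"}


def validate_contract(contract: dict, previous: dict | None = None) -> list[str]:
    """Return human-readable errors; an empty list means the contract passed."""
    errors: list[str] = []
    models = contract.get("models", {})

    # Schema completeness: every model has elements and every element has a type.
    for model_name, model in models.items():
        elements = model.get("elements") or {}
        if not elements:
            errors.append(f"{model_name}: no elements declared")
        for col, spec in elements.items():
            if "type" not in spec:
                errors.append(f"{model_name}.{col}: missing type")
            # Classification requirements: only known labels are accepted.
            label = spec.get("classification")
            if label is not None and label not in ALLOWED_CLASSIFICATIONS:
                errors.append(f"{model_name}.{col}: unknown classification {label!r}")

    # SLA definition validity: freshness must be a parseable ISO-8601 duration.
    freshness = contract.get("slaProperties", {}).get("freshness", {}).get("value")
    if freshness is not None and not ISO_DURATION.match(freshness):
        errors.append(f"slaProperties.freshness: invalid duration {freshness!r}")

    # Version compatibility (assumed semver rule): breaking changes need a major bump.
    if previous is not None:
        old_major = int(previous["version"].split(".")[0])
        new_major = int(contract["version"].split(".")[0])
        for model_name, model in previous.get("models", {}).items():
            new_elements = models.get(model_name, {}).get("elements") or {}
            for col, spec in (model.get("elements") or {}).items():
                new_spec = new_elements.get(col)
                breaking = new_spec is None or new_spec.get("type") != spec.get("type")
                if breaking and new_major <= old_major:
                    errors.append(f"{model_name}.{col}: breaking change needs a major version bump")
    return errors
```

Collecting every error, rather than stopping at the first, lets the enforcement level decide afterwards whether the result is a warning or a failure.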
2. Runtime (ContractMonitor)
Contracts are monitored during execution:
- Freshness SLA checks
- Schema drift detection
- Quality threshold validation
- Violations emitted as OpenLineage FAIL events
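A runtime freshness check compares the age of the latest data against the contract's ISO-8601 freshness window (for example "PT6H"). The sketch below is hypothetical:

- `parse_duration`, `check_freshness` and the violation dictionary are illustrative names.
- Only a day/hour/minute/second subset of ISO-8601 durations is parsed.
- A real ContractMonitor would wrap the record in an OpenLineage run event rather than return a plain dict.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta, timezone

# Assumption: only the day/hour/minute/second subset of ISO-8601 durations is supported.
_DURATION = re.compile(r"^P(?!$)(?:(\d+)D)?(?:T(?=\d)(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def parse_duration(value: str) -> timedelta:
    """Convert e.g. "PT6H" or "P1DT30M" into a timedelta."""
    match = _DURATION.match(value)
    if match is None:
        raise ValueError(f"unsupported ISO-8601 duration: {value!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def check_freshness(contract_name: str, sla: str, last_updated: datetime,
                    now: datetime | None = None) -> dict | None:
    """Return a FAIL-style violation record if the data is older than the SLA, else None."""
    now = now or datetime.now(timezone.utc)
    age = now - last_updated
    limit = parse_duration(sla)
    if age <= limit:
        return None
    # Hypothetical payload; a real monitor would emit an OpenLineage event with eventType FAIL.
    return {
        "eventType": "FAIL",
        "contract": contract_name,
        "check": "freshness",
        "age_seconds": int(age.total_seconds()),
        "limit_seconds": int(limit.total_seconds()),
    }
```

For Iceberg tables, `last_updated` could be derived from the commit time of the table's current snapshot. pyiceberg, for example, records this in milliseconds as `timestamp_ms` on each snapshot.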
3. dbt Integration
ODCS contracts complement dbt contracts:
- dbt contracts: Column-level constraints enforced at build time
- ODCS contracts: SLA, quality, governance enforced at runtime
```yaml
# datacontract.yaml (ODCS v3)
apiVersion: v3.0.2
kind: DataContract
name: customers
version: 2.0.0

models:
  customers:
    elements:
      customer_id:
        type: string
        required: true
        primaryKey: true
      email:
        type: string
        classification: pii

slaProperties:
  freshness:
    value: "PT6H"
```
4. Iceberg Integration
Contracts define expectations for Iceberg tables:
- Schema matches contract definition
- Freshness measured via table snapshots
- Quality checks run against table data
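Checking that an Iceberg table still matches its contract comes down to a set comparison between contract elements and table columns. This hypothetical sketch makes two assumptions:

- The table's columns have already been read from the Iceberg schema.
- Their types have been normalized to the contract's type names. Mapping engine types onto contract types is out of scope here.

```python
from __future__ import annotations


def detect_schema_drift(contract_elements: dict[str, dict],
                        table_columns: dict[str, str]) -> dict[str, list[str]]:
    """Compare contract elements against table columns (column name -> type name)."""
    expected = {name: spec["type"] for name, spec in contract_elements.items()}
    shared = expected.keys() & table_columns.keys()
    return {
        # Declared in the contract but absent from the table.
        "missing": sorted(expected.keys() - table_columns.keys()),
        # Present in the table but not declared in the contract.
        "unexpected": sorted(table_columns.keys() - expected.keys()),
        # Present in both with different types.
        "retyped": sorted(name for name in shared if expected[name] != table_columns[name]),
    }
```

Any non-empty list counts as drift. Whether drift only logs, alerts or blocks processing is left to the configured enforcement level.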
Contract Resolution Flow
```text
1. Parse floe.yaml
   -> Extract output_ports, metadata
2. Check for datacontract.yaml
   -> If exists: Validate ODCS v3 format
   -> If absent: Generate from port definitions
3. Validate contract
   -> Schema completeness
   -> SLA validity
   -> Enterprise policy compliance
4. Store in CompiledArtifacts
   -> artifacts.data_contracts[]
5. At runtime: Register with ContractMonitor
   -> Continuous SLA monitoring
   -> Drift detection
```
Enforcement Levels
Platform teams configure enforcement in manifest.yaml:
```yaml
data_contracts:
  enforcement: alert_only  # off | warn | alert_only | block
  monitoring:
    enabled: true
    freshness_check_interval: 15m
    schema_drift_check_interval: 1h
```
| Level | Compile-Time | Runtime |
|---|---|---|
| off | No validation | No monitoring |
| warn | Log warnings | Log violations |
| alert_only | Fail on critical | Emit OpenLineage FAIL |
| block | Fail on any error | Block processing |
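The compile-time column of the enforcement table maps directly onto a small decision function. This is a sketch under three assumptions:

- Violations are (severity, message) pairs.
- "critical" is the only severity that fails an alert_only compile.
- The returned strings are illustrative, not floe's actual API.

```python
from __future__ import annotations

from enum import Enum


class Enforcement(str, Enum):
    """Values accepted by data_contracts.enforcement in manifest.yaml."""
    OFF = "off"
    WARN = "warn"
    ALERT_ONLY = "alert_only"
    BLOCK = "block"


def compile_outcome(level: Enforcement, violations: list[tuple[str, str]]) -> str:
    """Return "skip", "pass", "warn" or "fail" for a list of (severity, message) pairs."""
    if level is Enforcement.OFF:
        return "skip"  # off: no compile-time validation at all
    if not violations:
        return "pass"
    if level is Enforcement.WARN:
        return "warn"  # warn: violations are logged, never fatal
    if level is Enforcement.ALERT_ONLY:
        # alert_only: only critical violations fail the compile (assumed severity label)
        return "fail" if any(sev == "critical" for sev, _ in violations) else "warn"
    return "fail"  # block: any violation fails the compile
```

Because the enum subclasses `str`, the manifest value can be converted directly, e.g. `Enforcement("alert_only")`.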
Three-Tier Contract Inheritance
Contracts follow the Data Mesh inheritance model:
```text
Enterprise Contracts (base policies)
        |
        v
Domain Contracts (domain-specific additions)
        |
        v
Data Product Contracts (specific implementations)
```
Inheritance Rules:
- Child contracts can strengthen parent contracts (for example, a stricter freshness SLA) but can never weaken them
- Classifications inherit from parent if not specified
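The inheritance rules can be checked while merging a child contract onto its parent. The sketch below is hypothetical and deliberately simplified:

- Freshness is assumed to be already parsed into a `timedelta`.
- Only the `required` flag and the `classification` of shared elements are compared.
- Rejecting any change to an inherited classification is a conservative assumption, since this ADR does not define an ordering of classifications.

```python
from __future__ import annotations

from datetime import timedelta


def merge_child(parent: dict, child: dict) -> dict:
    """Resolve `child` against `parent`, raising ValueError if the child weakens it.

    Hypothetical shape: {"freshness": timedelta | None, "elements": {name: spec}}.
    """
    parent_fresh = parent.get("freshness")
    fresh = child.get("freshness", parent_fresh)
    # A child may tighten the freshness window (shorter) but never loosen it.
    if parent_fresh is not None and (fresh is None or fresh > parent_fresh):
        raise ValueError("child freshness SLA is looser than the parent's")

    p_elements = parent.get("elements", {})
    c_elements = child.get("elements", {})
    merged: dict = {}
    for name in sorted(p_elements.keys() | c_elements.keys()):
        p, c = p_elements.get(name, {}), c_elements.get(name, {})
        if p.get("required") and c.get("required") is False:
            raise ValueError(f"{name}: child cannot make a required element optional")
        if "classification" in p and c.get("classification", p["classification"]) != p["classification"]:
            raise ValueError(f"{name}: child cannot change an inherited classification")
        # Child keys override parent keys; omitted keys (e.g. classification) are inherited.
        merged[name] = {**p, **c}
    return {"freshness": fresh, "elements": merged}


if __name__ == "__main__":
    enterprise = {"freshness": timedelta(hours=24),
                  "elements": {"email": {"required": True, "classification": "pii"}}}
    domain = {"freshness": timedelta(hours=6)}
    product = {"elements": {"email": {"type": "string"}, "tier": {"type": "string"}}}
    # Resolve the three tiers in order: enterprise -> domain -> data product.
    resolved = merge_child(merge_child(enterprise, domain), product)
    print(resolved["freshness"], resolved["elements"])
```

Resolving all three tiers takes two calls: first merge the domain contract onto the enterprise contract, then merge the data product contract onto that result.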
Consequences
Positive
- Consistent format: All data products use ODCS v3
- Unified tooling: datacontract-cli for all validation and monitoring
- Clear governance: Linux Foundation-backed standard
- dbt compatibility: Complements existing dbt contracts
- Runtime visibility: SLA violations are observable
- Enterprise policies: Inheritance model enforces governance
Negative
- No format choice: Teams cannot use alternative contract formats
- Learning curve: Teams must learn ODCS v3 syntax
- Migration effort: Existing contracts need conversion to ODCS
Neutral
- datacontract-cli dependency: Required for validation and drift detection
- Alert-only default: Processing continues on violations (configurable)