
Data Product Lifecycle Guide

This guide describes the intended lifecycle of a data product in Floe. Some lifecycle commands shown here are planned user-facing commands, not alpha-supported commands.

Alpha-supported path: use the Customer 360 docs and generated demo artifacts for the current release. The root floe validate, floe compile, and floe run commands exist as data-team stubs or planned lifecycle entry points; floe init is not implemented as a current user command.

A data product moves through five phases:

┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│    INIT     │──►│   DEVELOP   │──►│  VALIDATE   │──►│   COMPILE   │──►│     RUN     │
│             │   │             │   │             │   │             │   │             │
│ manual setup│   │ Edit models │   │  validate   │   │   compile   │   │     run     │
└─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘

Current alpha: create a data product directory manually or start from demo/customer-360. A future init command is planned for simple, centralized, and Data Mesh modes, but it is not a current executable workflow.

my-product/
├── floe.yaml           # Data product definition
├── datacontract.yaml   # Optional: explicit data contract
├── models/
│   ├── bronze/
│   │   └── .gitkeep
│   ├── silver/
│   │   └── .gitkeep
│   └── gold/
│       └── .gitkeep
├── tests/
│   └── .gitkeep
└── README.md
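
Because there is no init command in the current alpha, the manual setup could be scripted. The sketch below creates the layout shown above with the Python standard library. The helper name `scaffold_product` is hypothetical and is not a Floe command; it creates empty files only and leaves the optional datacontract.yaml out.

```python
from pathlib import Path

# Layout taken from the directory tree above; the helper name is hypothetical.
LAYER_DIRS = ("models/bronze", "models/silver", "models/gold", "tests")
TOP_LEVEL_FILES = ("floe.yaml", "README.md")


def scaffold_product(root: Path) -> list[Path]:
    """Create the empty data product skeleton and return the files it created."""
    created: list[Path] = []
    for rel in LAYER_DIRS:
        directory = root / rel
        directory.mkdir(parents=True, exist_ok=True)
        keep = directory / ".gitkeep"
        if not keep.exists():
            keep.touch()
            created.append(keep)
    for name in TOP_LEVEL_FILES:
        path = root / name
        if not path.exists():
            path.touch()
            created.append(path)
    return created
```

Calling `scaffold_product(Path("my-product"))` a second time returns an empty list, so the helper is safe to re-run on an existing directory.
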

floe.yaml
apiVersion: floe.dev/v1
kind: DataProduct
metadata:
  name: my-product
  version: "1.0.0"
  owner: data-team@example.com
  domain: sales
  repository: github.com/acme/my-product  # Required for identity
platform:
  ref: oci://registry.example.com/platform:v1.0.0
transforms:
  - type: dbt
    path: models/
schedule:
  cron: "0 6 * * *"
  timezone: UTC
# Data Mesh mode only:
output_ports:
  - name: customers
    table: gold.my_product.customers
    sla:
      freshness: "6h"
      availability: "99.9%"
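
Before any validation command exists, a quick structural check of the parsed floe.yaml can catch missing fields early. The sketch below is a standard-library illustration, not Floe's Pydantic schema. The required-field list is an assumption drawn from the example above (the source only states that `repository` is required for identity), and `check_product_spec` is a hypothetical name.

```python
from dataclasses import dataclass

# Assumption: these metadata keys are treated as required for illustration only.
REQUIRED_METADATA = ("name", "version", "owner", "domain", "repository")


@dataclass(frozen=True)
class Finding:
    path: str
    message: str


def check_product_spec(spec: dict) -> list[Finding]:
    """Return the problems found in a parsed floe.yaml mapping."""
    findings: list[Finding] = []
    if spec.get("kind") != "DataProduct":
        findings.append(Finding("kind", "expected 'DataProduct'"))
    metadata = spec.get("metadata") or {}
    for key in REQUIRED_METADATA:
        if not metadata.get(key):
            findings.append(Finding(f"metadata.{key}", "missing or empty"))
    if not (spec.get("platform") or {}).get("ref"):
        findings.append(Finding("platform.ref", "missing platform reference"))
    # Output ports are optional (Data Mesh mode only), so only check those present.
    for index, port in enumerate(spec.get("output_ports") or []):
        if not port.get("table"):
            findings.append(Finding(f"output_ports[{index}].table", "missing table"))
    return findings
```

The function takes a plain dictionary, so it works with whichever YAML loader the project already uses.
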

-- models/bronze/bronze_raw_customers.sql
{{ config(materialized='table') }}

SELECT
    id,
    email,
    created_at,
    _loaded_at
FROM {{ source('salesforce', 'accounts') }}

-- models/silver/silver_customers.sql
{{ config(
    materialized='incremental',
    unique_key='customer_id'
) }}

SELECT
    id AS customer_id,
    LOWER(TRIM(email)) AS email,
    created_at,
    -- Carried through so the incremental filter below can read it from the existing table
    _loaded_at,
    CURRENT_TIMESTAMP AS updated_at
FROM {{ ref('bronze_raw_customers') }}
WHERE email IS NOT NULL
{% if is_incremental() %}
  AND _loaded_at > (SELECT MAX(_loaded_at) FROM {{ this }})
{% endif %}

models/silver/schema.yml
version: 2
models:
  - name: silver_customers
    description: Cleaned and deduplicated customer data
    columns:
      - name: customer_id
        description: Unique customer identifier
        tests:
          - not_null
          - unique
      - name: email
        description: Customer email address
        tests:
          - not_null
        meta:
          classification: pii

datacontract.yaml
apiVersion: v3.0.2
kind: DataContract
name: my-product-customers
version: 1.0.0
owner: data-team@example.com
models:
  customers:
    elements:
      customer_id:
        type: string
        primaryKey: true
      email:
        type: string
        format: email
        classification: pii
slaProperties:
  freshness:
    value: "PT6H"
    element: updated_at
  availability:
    value: "99.9%"

Planned lifecycle behavior for the future data-team validation command:

[1/4] Validating floe.yaml
  ✓ Schema valid
  ✓ Platform reference resolved
[2/4] Validating dbt project
  ✓ Models compile
  ✓ Sources defined
  ✓ Tests discovered: 5
[3/4] Validating data contracts
  ✓ datacontract.yaml valid (ODCS v3)
  ✓ Contract matches output ports
[4/4] Checking platform compliance
  ✓ Naming conventions: bronze_, silver_, gold_
  ✓ Quality gates: test coverage 80% (required: 80%)
  ✓ Classification: PII fields marked

Validation PASSED

Current alpha: floe validate is a stub. Use make compile-demo and make demo-customer-360-validate for the supported Customer 360 path.

| Check | Description |
|---|---|
| Schema | floe.yaml matches Pydantic schema |
| Platform | Manifest reference resolves |
| dbt | Models compile without errors |
| Naming | Models follow naming conventions |
| Quality | Test coverage meets minimum |
| Classification | PII fields properly marked |
| Contract | datacontract.yaml valid ODCS format |
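
The naming check in the table above can be approximated locally while the validation command is still a stub. This is a minimal sketch that assumes each model must carry the prefix of the layer directory it lives in (bronze_, silver_, gold_). The function name `naming_violations` is hypothetical, and the real platform rules may be stricter or configurable.

```python
from pathlib import PurePosixPath

# Prefixes follow the bronze_/silver_/gold_ convention shown in the validation
# output; mapping them to layer directories is an assumption of this sketch.
LAYER_PREFIXES = {"bronze": "bronze_", "silver": "silver_", "gold": "gold_"}


def naming_violations(model_paths: list[str]) -> list[str]:
    """Return the SQL model paths whose file name lacks its layer prefix."""
    violations = []
    for raw in model_paths:
        path = PurePosixPath(raw)
        prefix = LAYER_PREFIXES.get(path.parent.name)
        if prefix is None:
            continue  # not inside a known layer directory; out of scope here
        if path.suffix == ".sql" and not path.stem.startswith(prefix):
            violations.append(raw)
    return violations
```

For example, a file at `models/silver/stg_payments.sql` would be reported, while `models/silver/schema.yml` is ignored because it is not a SQL model.
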

Planned lifecycle behavior for the future data-team compile command:

[1/7] Loading platform artifacts
  ✓ Platform: acme-platform v1.2.3
  ✓ Compute: duckdb (enforced)
[2/7] Validating product identity
  Product ID: sales.my_product
  Repository: github.com/acme/my-product
  ✓ Namespace available, registering...
  ✓ Product registered in catalog
[3/7] Resolving inheritance
  ✓ Enterprise → Domain → Product
[4/7] Compiling dbt project
  ✓ manifest.json generated
  ✓ 12 models compiled
[5/7] Processing data contracts
  ✓ Auto-generated contract from ports
  ✓ Merged with explicit datacontract.yaml
  ✓ Contract version: 1.0.0
  ✓ Contract registered: sales.my_product/customers:1.0.0
[6/7] Generating orchestration
  ✓ Dagster definitions created
[7/7] Writing artifacts
  ✓ .floe/artifacts.json

Compilation COMPLETE

Current alpha: floe compile is a stub. Customer 360 artifacts are generated through make compile-demo, which calls uv run floe platform compile with the demo spec and manifest.

During compilation, the product namespace is registered in the Iceberg catalog:

┌─────────────────────────────────────────────────────────────────────────────┐
│                        PRODUCT IDENTITY REGISTRATION                        │
│                                                                             │
│  1. Generate Product ID                                                     │
│     └── product_id = f"{domain}.{name}" → "sales.my_product"                │
│                                                                             │
│  2. Check Catalog                                                           │
│     └── catalog.validate_product_identity(product_id, repository)           │
│                                                                             │
│  3. Registration Decision                                                   │
│     ├── AVAILABLE → Register namespace with floe.product.* properties       │
│     ├── VALID     → Update product version metadata                         │
│     └── CONFLICT  → FAIL: "Namespace owned by different repository"         │
│                                                                             │
│  4. Contract Registration                                                   │
│     └── catalog.register_contract(product_id, contract, version, hash)      │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
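
The registration decision in the diagram can be illustrated with an in-memory stand-in for the catalog. Everything here is a sketch under stated assumptions: `InMemoryCatalog` and its `register` method are hypothetical, and only the name `validate_product_identity` and the three outcomes come from the diagram. The hyphen-to-underscore normalization is also an assumption, inferred from the example mapping the name my-product to the ID sales.my_product.

```python
from dataclasses import dataclass, field
from enum import Enum


class IdentityStatus(Enum):
    AVAILABLE = "available"  # namespace not registered yet
    VALID = "valid"          # namespace already registered by the same repository
    CONFLICT = "conflict"    # namespace registered by a different repository


def make_product_id(domain: str, name: str) -> str:
    # Assumption: hyphens become underscores, matching "sales.my_product".
    return f"{domain}.{name.replace('-', '_')}"


@dataclass
class InMemoryCatalog:
    """Hypothetical stand-in for the catalog: maps product ID to owner repository."""

    owners: dict[str, str] = field(default_factory=dict)

    def validate_product_identity(self, product_id: str, repository: str) -> IdentityStatus:
        owner = self.owners.get(product_id)
        if owner is None:
            return IdentityStatus.AVAILABLE
        return IdentityStatus.VALID if owner == repository else IdentityStatus.CONFLICT

    def register(self, product_id: str, repository: str) -> IdentityStatus:
        status = self.validate_product_identity(product_id, repository)
        if status is IdentityStatus.CONFLICT:
            raise PermissionError(
                f"Namespace '{product_id}' owned by different repository: "
                f"{self.owners[product_id]}"
            )
        self.owners[product_id] = repository
        return status
```

The first registration from a repository returns `AVAILABLE`, a repeat from the same repository returns `VALID`, and an attempt from any other repository raises, which corresponds to the failed compile in the conflict case.
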

Identity Conflict Error:

[planned compile]
[2/7] Validating product identity
  ✗ ERROR: Namespace 'sales.my_product' owned by different repository
    Owner: github.com/acme/other-repo
    Expected: github.com/acme/my-product
    Resolution: Choose a different product name or contact
                the namespace owner: other-team@acme.com

Compilation FAILED

The repository field in floe.yaml is used to verify ownership:

metadata:
  name: my-product
  domain: sales
  repository: github.com/acme/my-product  # Required for identity

.floe/artifacts.json
{
  "version": "0.1.0",
  "metadata": {
    "compiled_at": "2026-01-03T10:00:00Z",
    "product_name": "my-product",
    "product_version": "1.0.0"
  },
  "mode": "centralized",
  "plugins": {
    "compute": { "type": "duckdb" },
    "orchestrator": { "type": "dagster" }
  },
  "transforms": [...],
  "data_contracts": [
    {
      "name": "my-product-customers",
      "version": "1.0.0",
      "models": [...],
      "sla": {
        "freshness_hours": 6.0,
        "availability_percent": 99.9
      }
    }
  ]
}
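
The SLA appears as "6h" in floe.yaml, as the ISO 8601 duration "PT6H" in datacontract.yaml, and as the number 6.0 under `freshness_hours` in the artifacts. One way such a normalization might work is sketched below. The helper names are hypothetical, and the accepted formats are assumptions limited to the forms that appear in this guide plus simple day and minute variants.

```python
import re

# Hypothetical helpers: turn the SLA strings used in floe.yaml ("6h") and
# datacontract.yaml ("PT6H") into the numeric fields seen in artifacts.json.
_SHORT = re.compile(r"^(\d+(?:\.\d+)?)([mhd])$")
_ISO = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?)?$")
_HOURS_PER_UNIT = {"m": 1 / 60, "h": 1.0, "d": 24.0}


def freshness_hours(value: str) -> float:
    """Convert a short ("6h") or ISO 8601 ("PT6H") duration to hours."""
    short = _SHORT.match(value)
    if short:
        return float(short.group(1)) * _HOURS_PER_UNIT[short.group(2)]
    iso = _ISO.match(value)
    if iso and any(iso.groups()):
        days, hours, minutes = (int(g) if g else 0 for g in iso.groups())
        return days * 24 + hours + minutes / 60
    raise ValueError(f"unsupported freshness value: {value!r}")


def availability_percent(value: str) -> float:
    """Convert a percentage string such as "99.9%" to a float."""
    if not value.endswith("%"):
        raise ValueError(f"expected a percentage, got {value!r}")
    return float(value[:-1])
```

Both `freshness_hours("6h")` and `freshness_hours("PT6H")` evaluate to 6.0, so a port-level SLA and an explicit contract that agree produce the same artifact value.
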

Planned lifecycle behavior for the future data-team run command:

[1/4] Starting runtime
  ✓ ContractMonitor initialized
  ✓ Dagster code server started
[2/4] Executing transforms
  ▶ bronze_raw_customers (2m 15s)
    ├── Rows: 150,000
    └── OpenLineage: COMPLETE
  ▶ silver_customers (4m 30s)
    ├── Rows: 148,500
    └── OpenLineage: COMPLETE
[3/4] Running quality checks
  ✓ not_null: customer_id (pass)
  ✓ unique: customer_id (pass)
  ✓ not_null: email (pass)
[4/4] Validating contracts
  ✓ freshness: 0.1h (SLA: 6h)
  ✓ schema: no drift
  ✓ availability: 100%

Pipeline COMPLETE (6m 45s)

Current alpha: floe run is a stub. Run execution is validated through the Customer 360 platform/demo path documented in Customer 360 Golden Demo.

┌──────────────────────────────────────────────────────────────────────────────┐
│                              RUNTIME EXECUTION                               │
│                                                                              │
│  ┌─────────────┐      ┌─────────────┐      ┌─────────────┐                   │
│  │   Dagster   │──────│   dbt run   │──────│   Quality   │                   │
│  │  Scheduler  │      │  (models)   │      │    Tests    │                   │
│  └─────────────┘      └──────┬──────┘      └──────┬──────┘                   │
│                              │                    │                          │
│                              ▼                    ▼                          │
│                    ┌───────────────────────────────────────┐                 │
│                    │          OpenLineage Events           │                 │
│                    │                                       │                 │
│                    │   START ─► RUNNING ─► COMPLETE/FAIL   │                 │
│                    └───────────────────────────────────────┘                 │
│                                        │                                     │
│                                        ▼                                     │
│  ┌─────────────────────────────────────────────────────────────────────┐     │
│  │                    ContractMonitor (Continuous)                     │     │
│  │                                                                     │     │
│  │         ┌─────────────┐   ┌─────────────┐   ┌─────────────┐         │     │
│  │         │  Freshness  │   │   Schema    │   │   Quality   │         │     │
│  │         │ Check (15m) │   │ Drift (1h)  │   │ Check (6h)  │         │     │
│  │         └──────┬──────┘   └──────┬──────┘   └──────┬──────┘         │     │
│  │                │                 │                 │                │     │
│  │                └─────────────────┴─────────────────┘                │     │
│  │                                  │                                  │     │
│  │                                  ▼                                  │     │
│  │                 ┌─────────────────────────────────┐                 │     │
│  │                 │  Violations → OpenLineage FAIL  │                 │     │
│  │                 │             → Prometheus Metrics│                 │     │
│  │                 └─────────────────────────────────┘                 │     │
│  └─────────────────────────────────────────────────────────────────────┘     │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
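
The diagram gives each ContractMonitor check its own cadence: freshness every 15 minutes, schema drift every hour, quality every 6 hours. A minimal sketch of deciding which checks are due is shown below. The intervals are copied from the diagram, but the check keys, the function name `due_checks`, and the scheduling logic itself are illustrative assumptions rather than the monitor's actual implementation.

```python
from datetime import datetime, timedelta

# Intervals copied from the diagram; the dictionary keys are hypothetical names.
CHECK_INTERVALS = {
    "freshness": timedelta(minutes=15),
    "schema_drift": timedelta(hours=1),
    "quality": timedelta(hours=6),
}


def due_checks(last_run: dict[str, datetime], now: datetime) -> list[str]:
    """Return the checks whose interval has elapsed since their last run."""
    due = []
    for name, interval in CHECK_INTERVALS.items():
        previous = last_run.get(name)
        # A check that has never run is always due.
        if previous is None or now - previous >= interval:
            due.append(name)
    return due
```

A check with no recorded run is treated as due, so a freshly started monitor would run all three checks on its first pass.
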

| Event | Table Action |
|---|---|
| First product run | Iceberg tables created in catalog |
| Model change | Table updated (incremental or replace) |
| Schema change | Table altered or recreated |
| Delete model | Table retained (manual cleanup) |

┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│   planned   │──────│   Polaris   │──────│   Object    │
│   compile   │      │   Catalog   │      │   Storage   │
└─────────────┘      └─────────────┘      └─────────────┘
       │                    │                    │
       │ Register           │ Vend               │ Store
       │ namespace          │ credentials        │ data
       │                    │                    │
       ▼                    ▼                    ▼
sales.my_product      Temp S3 creds   s3://bucket/sales/my_product/

# datacontract.yaml - version bump required for breaking changes
version: 1.0.0 → 2.0.0  # If removing/changing columns

# Non-breaking (MINOR bump)
- Add optional column
- Relax nullability

# Breaking (MAJOR bump)
- Remove column
- Change type
- Add required column
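
The bump rules above can be expressed as a comparison between the old and new column sets. The sketch below is one possible reading, not Floe's contract versioning logic. The `Column` type and both function names are hypothetical, and treating a column that changes from optional to required as breaking is an added assumption that the list does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    type: str
    required: bool = False


def required_bump(old: dict[str, Column], new: dict[str, Column]) -> str:
    """Classify a schema change as 'major', 'minor', or 'none'."""
    breaking = False
    additive = False
    for name, before in old.items():
        after = new.get(name)
        if after is None or after.type != before.type:
            breaking = True   # removed column or changed type
        elif after.required and not before.required:
            breaking = True   # tightened nullability (assumed breaking)
        elif before.required and not after.required:
            additive = True   # relaxed nullability
    for name in new.keys() - old.keys():
        if new[name].required:
            breaking = True   # added required column
        else:
            additive = True   # added optional column
    return "major" if breaking else "minor" if additive else "none"


def bump(version: str, kind: str) -> str:
    """Apply a 'major' or 'minor' bump to a MAJOR.MINOR.PATCH version string."""
    major, minor, _patch = (int(part) for part in version.split("."))
    if kind == "major":
        return f"{major + 1}.0.0"
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    return version
```

Removing the email column from a 1.0.0 contract classifies as a major change, and `bump("1.0.0", "major")` yields "2.0.0", matching the version line above.
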

$ planned compile
ERROR: Model stg_payments violates naming convention

$ # Fix naming
$ mv models/silver/stg_payments.sql models/silver/silver_payments.sql
$ planned compile
✓ Compilation COMPLETE

$ planned run
ERROR: Transform silver_customers failed
  [View logs]
  sqlalchemy.exc.OperationalError: connection refused

$ # Fix connection, retry
$ planned run --retry-failed

$ planned run
WARNING: Contract violation detected
  Contract: my-product-customers
  Type: freshness_violation
  Message: Data is 8 hours old, SLA is 6 hours
  Continuing (alert_only mode)

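
The warning above reflects two decisions: whether the data is older than the SLA, and what to do about it under the configured enforcement mode. A minimal sketch of that logic follows. The mode names alert_only and block come from this guide, while `ContractViolation`, `FreshnessResult`, `check_freshness`, and `enforce` are hypothetical names, and the message formats are modeled on the sample output rather than taken from a real implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


class ContractViolation(Exception):
    """Raised when enforcement is 'block'; the class name is hypothetical."""


@dataclass(frozen=True)
class FreshnessResult:
    age_hours: float
    sla_hours: float

    @property
    def violated(self) -> bool:
        return self.age_hours > self.sla_hours


def check_freshness(last_updated: datetime, now: datetime, sla_hours: float) -> FreshnessResult:
    # Dividing two timedeltas gives the age as a float number of hours.
    age = (now - last_updated) / timedelta(hours=1)
    return FreshnessResult(age_hours=age, sla_hours=sla_hours)


def enforce(result: FreshnessResult, mode: str = "alert_only") -> str:
    """Return a log line, or raise when the mode is 'block' and the SLA is violated."""
    if not result.violated:
        return f"freshness: {result.age_hours:.1f}h (SLA: {result.sla_hours:g}h)"
    message = f"Data is {result.age_hours:g} hours old, SLA is {result.sla_hours:g} hours"
    if mode == "block":
        raise ContractViolation(message)
    return f"WARNING: freshness_violation: {message}"
```

With data last updated 8 hours ago and a 6 hour SLA, `enforce` returns a warning in alert_only mode and raises in block mode, which is why starting lenient lets a pipeline keep running while violations are being tuned.
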
  1. Version early: Set metadata.version from the start
  2. Test everything: Aim for 100% test coverage on gold models
  3. Document contracts: Use descriptions in datacontract.yaml
  4. Start lenient: Use alert_only before block enforcement
  5. Monitor from day one: Set up dashboards early