floe.yaml Schema Reference
This document provides the complete schema reference for floe.yaml configuration files.
Overview
floe.yaml is the configuration file for floe data products. It defines:
- Platform reference (enforced configuration)
- Transforms (dbt models)
- Ingestion sources
- Schedules
- Environment overrides
Minimal Example
```yaml
apiVersion: floe.dev/v1
kind: DataProduct
metadata:
  name: customer-analytics
  version: "1.0.0"
  domain: sales

platform:
  ref: oci://registry.example.com/platform:v1.0.0

transforms:
  - type: dbt
    path: models/
```
Schema Structure
```
FloeSpec
├── apiVersion: string (required)
├── kind: string (required)
├── metadata: MetadataSpec (required)
├── platform: PlatformRef (required)
├── transforms: TransformSpec[] (required)
│   ├── type: string (required)
│   ├── path: string (required)
│   ├── compute: string (optional)       ← Select from platform's approved list
│   └── profiles_dir: string (optional)
├── ingestion: IngestionSpec[] (optional)
├── schedule: ScheduleSpec (optional)
├── environments: EnvironmentOverride[] (optional)
└── quality: QualitySpec (optional)
```
Root Fields
apiVersion
Type: string
Required: Yes
Pattern: floe.dev/v[0-9]+
The API version for the floe.yaml schema.
```yaml
apiVersion: floe.dev/v1
```
kind
Type: string
Required: Yes
Enum: DataProduct
The resource kind. Currently only DataProduct is supported.
```yaml
kind: DataProduct
```
MetadataSpec
metadata.name
Type: string
Required: Yes
Pattern: ^[a-z][a-z0-9-]*$
Max Length: 63
The unique name of the data product within its domain.
```yaml
metadata:
  name: customer-360
```
metadata.version
Type: string
Required: Yes
Pattern: ^[0-9]+\.[0-9]+\.[0-9]+$
Semantic version of the data product.
```yaml
metadata:
  version: "1.2.3"
```
metadata.domain
Type: string
Required: Yes
Pattern: ^[a-z][a-z0-9-]*$
The domain that owns this data product. Used for namespace prefixing.
```yaml
metadata:
  domain: sales
```
metadata.description
Type: string
Required: No
Max Length: 1000
Human-readable description of the data product.
```yaml
metadata:
  description: "Unified customer view across all touchpoints"
```
metadata.owner
Type: string
Required: No
Format: Email
Team or person responsible for this data product.
```yaml
metadata:
  owner: sales-analytics@acme.com
```
metadata.labels
Type: map[string]string
Required: No
Key-value labels for organization and filtering.
```yaml
metadata:
  labels:
    team: analytics
    cost-center: sales
    environment: production
```
PlatformRef
platform.ref
Type: string
Required: Yes
Format: OCI URI
Reference to the platform manifest OCI artifact.
```yaml
platform:
  ref: oci://ghcr.io/acme/platform:v1.0.0
```
platform.cache
Type: boolean
Required: No
Default: true
Whether to cache the platform artifact locally.
```yaml
platform:
  ref: oci://ghcr.io/acme/platform:v1.0.0
  cache: true
```
TransformSpec
transforms[].type
Type: string
Required: Yes
Enum: dbt
The transform type. Currently only dbt is supported.
```yaml
transforms:
  - type: dbt
```
transforms[].path
Type: string
Required: Yes
Path to the transform source files, relative to floe.yaml.
```yaml
transforms:
  - type: dbt
    path: models/
```
transforms[].profiles_dir
Type: string
Required: No
Default: .floe/profiles
Path to generated dbt profiles directory.
```yaml
transforms:
  - type: dbt
    path: models/
    profiles_dir: .dbt/
```
transforms[].compute
Type: string
Required: No
Default: Platform’s default compute
Select the compute engine for this transform from the platform’s approved list. This enables multi-compute pipelines where different steps can use different compute engines.
Validation: Must be a compute name from manifest.yaml plugins.compute.approved[].
```yaml
# manifest.yaml (Platform Team)
plugins:
  compute:
    approved:
      - name: duckdb
        config: { threads: 8 }
      - name: spark
        config: { cluster: "spark-thrift.svc" }
    default: duckdb
```
```yaml
# floe.yaml (Data Engineers)
transforms:
  # Heavy processing on Spark cluster
  - type: dbt
    path: models/staging/
    compute: spark  # Select from approved list

  # Analytical metrics on DuckDB
  - type: dbt
    path: models/marts/
    compute: duckdb

  # Simple transforms use default
  - type: dbt
    path: models/seeds/
    # compute: (uses platform default → duckdb)
```
Environment Parity: Each transform uses the SAME compute across all environments (dev/staging/prod). This is NOT for per-environment compute selection (which would cause environment drift).
```
Step 1: dev=Spark,  staging=Spark,  prod=Spark   ✓ No drift
Step 2: dev=DuckDB, staging=DuckDB, prod=DuckDB  ✓ No drift
```
IngestionSpec
ingestion[].name
Type: string
Required: Yes
Pattern: ^[a-z][a-z0-9_]*$
Unique name for the ingestion pipeline.
```yaml
ingestion:
  - name: github_events
```
ingestion[].type
Type: string
Required: Yes
Enum: dlt, airbyte
The ingestion plugin type.
```yaml
ingestion:
  - name: github_events
    type: dlt
```
ingestion[].destination
Type: string
Required: Yes
Format: {namespace}.{table}
Target Iceberg table for ingested data.
```yaml
ingestion:
  - name: github_events
    type: dlt
    destination: bronze.github_events
```
ingestion[].dlt
Type: DltConfig
Required: When type: dlt
Configuration specific to dlt ingestion.
```yaml
ingestion:
  - name: github_events
    type: dlt
    destination: bronze.github_events
    dlt:
      source: dlt.sources.github.github_reactions
      resource: issues
      write_disposition: merge
      incremental:
        cursor_column: updated_at
```
dlt.source
Type: string
Required: Yes
Python import path to the dlt source.
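For example, pointing at the GitHub source used in the ingestion[].dlt example above:
```yaml
dlt:
  source: dlt.sources.github.github_reactions
```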
dlt.resource
Type: string
Required: No
Specific resource within the source.
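For example, loading only the issues resource from that source:
```yaml
dlt:
  source: dlt.sources.github.github_reactions
  resource: issues
```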
dlt.write_disposition
Type: string
Enum: append, replace, merge
Default: append
How to write data to the destination.
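For example, merging new rows into the destination table instead of appending (the default):
```yaml
dlt:
  write_disposition: merge
```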
dlt.incremental
Type: IncrementalConfig
Required: No
Configuration for incremental loading.
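The github_events example above loads incrementally using a cursor column:
```yaml
dlt:
  write_disposition: merge
  incremental:
    cursor_column: updated_at
```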
ingestion[].airbyte
Type: AirbyteConfig
Required: When type: airbyte
Configuration for external Airbyte connections.
```yaml
ingestion:
  - name: salesforce_sync
    type: airbyte
    destination: bronze.salesforce
    airbyte:
      connection_id: "abc123-def456"
```
ingestion[].secret_refs
Type: map[string]string
Required: No
References to Kubernetes secrets for credentials.
```yaml
ingestion:
  - name: github_events
    type: dlt
    secret_refs:
      github_token: github-api-token
```
ScheduleSpec
schedule.cron
Type: string
Required: No
Format: Cron expression
Cron schedule for running the pipeline.
```yaml
schedule:
  cron: "0 */6 * * *"  # Every 6 hours
```
schedule.timezone
Type: string
Required: No
Default: UTC
Timezone for the schedule.
```yaml
schedule:
  cron: "0 6 * * *"
  timezone: America/New_York
```
schedule.enabled
Type: boolean
Required: No
Default: true
Whether the schedule is active.
```yaml
schedule:
  cron: "0 6 * * *"
  enabled: false  # Disable scheduling
```
EnvironmentOverride
environments[].name
Type: string
Required: Yes
Enum: development, staging, production
Environment name to override.
```yaml
environments:
  - name: development
```
environments[].transforms
Type: TransformOverride
Required: No
Transform-specific overrides for this environment. Note: Per-environment compute selection is NOT allowed (would cause environment drift). Use transforms[].compute instead for per-transform compute selection.
```yaml
environments:
  - name: development
    transforms:
      # Per-environment overrides (e.g., reduced parallelism)
      threads: 4
```
```yaml
# ❌ FORBIDDEN: Per-environment compute (causes drift)
# environments:
#   - name: development
#     transforms:
#       compute: duckdb     # Different compute per env = drift
#   - name: production
#     transforms:
#       compute: snowflake  # "Works in dev, fails in prod"
```
environments[].schedule
Type: ScheduleOverride
Required: No
Schedule overrides for this environment.
```yaml
environments:
  - name: development
    schedule:
      enabled: false  # No scheduling in dev
```
QualitySpec
quality.minimum_coverage
Type: integer
Required: No
Default: From platform manifest
Range: 0-100
Minimum test coverage percentage.
```yaml
quality:
  minimum_coverage: 80
```
quality.required_tests
Type: string[]
Required: No
Default: From platform manifest
Tests required for all models.
```yaml
quality:
  required_tests:
    - not_null
    - unique
```
Complete Example
```yaml
apiVersion: floe.dev/v1
kind: DataProduct
metadata:
  name: customer-360
  version: "3.2.1"
  domain: sales
  description: "Unified customer view across all touchpoints"
  owner: sales-analytics@acme.com
  labels:
    team: analytics
    cost-center: sales

platform:
  ref: oci://ghcr.io/acme/platform:v1.0.0

transforms:
  # Heavy processing on Spark (large datasets)
  - type: dbt
    path: models/staging/
    compute: spark  # Select from platform's approved list

  # Analytical metrics on DuckDB (smaller result set)
  - type: dbt
    path: models/marts/
    compute: duckdb

  # Seeds use platform default (no compute specified)
  - type: dbt
    path: models/seeds/

ingestion:
  - name: salesforce_accounts
    type: dlt
    destination: bronze.salesforce_accounts
    dlt:
      source: dlt.sources.salesforce.salesforce_source
      resource: accounts
      write_disposition: merge
      incremental:
        cursor_column: last_modified_date
    secret_refs:
      salesforce_token: salesforce-api-token

  - name: zendesk_tickets
    type: dlt
    destination: bronze.zendesk_tickets
    dlt:
      source: dlt.sources.zendesk.zendesk_support
      resource: tickets
      write_disposition: append

schedule:
  cron: "0 */6 * * *"
  timezone: UTC

environments:
  - name: development
    schedule:
      enabled: false

  - name: production
    quality:
      minimum_coverage: 100

quality:
  minimum_coverage: 80
  required_tests:
    - not_null
    - unique
```
JSON Schema
The complete JSON Schema for floe.yaml is generated from Pydantic models. The public CLI does not currently expose a schema export command; use Python from the repository when you need to inspect the current schema during alpha:
```python
import json

from floe_core.schemas.floe_spec import FloeSpec

print(json.dumps(FloeSpec.model_json_schema(), indent=2))
```
Alpha status: the root floe validate command exists as a data-team stub and is not yet the supported schema-validation path for users. For the current alpha, inspect the checked-in Customer 360 floe.yaml and run the demo artifact validation path documented in Build Your First Data Product.
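If you want a quick local check of a floe.yaml against the same models while the CLI validation path is still a stub, a minimal sketch is shown below. It assumes PyYAML is available in the environment and that floe_core is importable from the repository; it is not an official workflow.
```python
import yaml  # assumes PyYAML is installed alongside the repository's dependencies

from floe_core.schemas.floe_spec import FloeSpec

# Parse the YAML document and let the Pydantic model raise a ValidationError
# describing any schema violations.
with open("floe.yaml") as fh:
    FloeSpec.model_validate(yaml.safe_load(fh))

print("floe.yaml parses against the current FloeSpec schema")
```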
JSON Schema Location
```
packages/floe-core/src/floe_core/schemas/
├── floe_spec.py            # Pydantic models
├── floe_yaml_schema.json   # Generated JSON Schema
└── __init__.py
```
IDE Integration
Configure your IDE to use the JSON Schema for validation:
VS Code (settings.json):
{ "yaml.schemas": { "https://floe.dev/schemas/floe-yaml-v1.json": ["floe.yaml", "floe.yml"] }}JetBrains IDEs:
Settings > Languages & Frameworks > Schemas and DTDs > JSON Schema Mappings
Add: https://floe.dev/schemas/floe-yaml-v1.json → floe.yaml
Validation Rules
Beyond schema validation, the following rules are enforced at compile time:
| Rule | Description | Error |
|---|---|---|
| domain_namespace_match | Domain must match catalog namespace | DomainMismatchError |
| version_semver | Version must be valid semver | InvalidVersionError |
| transform_path_exists | Transform path must exist | PathNotFoundError |
| platform_ref_resolvable | Platform OCI ref must be pullable | PlatformNotFoundError |
| secret_refs_exist | Secret refs must exist in cluster | SecretNotFoundError |
| naming_convention | Model names must match platform pattern | NamingViolationError |
| compute_in_approved_list | Transform compute must be in platform’s approved list | InvalidComputeError |
Defaults
Section titled “Defaults”| Field | Default Value | Source |
|---|---|---|
| platform.cache | true | Built-in |
| transforms[].profiles_dir | .floe/profiles | Built-in |
| transforms[].compute | plugins.compute.default | Platform manifest |
| schedule.timezone | UTC | Built-in |
| schedule.enabled | true | Built-in |
| quality.* | Platform manifest | Inherited |