Glossary
This glossary defines the core terminology used throughout floe documentation.
Documentation Keywords (RFC 2119)
This documentation uses RFC 2119 keywords to indicate requirement levels in normative contexts (ADRs, requirements documents):
| Keyword | Meaning | When to Use |
|---|---|---|
| MUST | Absolute requirement | Non-negotiable platform enforcement, security requirements |
| MUST NOT | Absolute prohibition | Forbidden patterns, security violations |
| SHOULD | Strong recommendation | Best practice, deviation requires justification |
| SHOULD NOT | Discouraged | Allowed but not recommended |
| MAY | Optional | Truly optional features, implementer’s choice |
Examples:
- “All plugins MUST implement PluginMetadata” (enforced at discovery)
- “Platform teams SHOULD use DuckDB for local development” (recommendation)
- “Plugins MAY include additional helper methods” (optional)
Note: Guides and non-normative documentation use lowercase “must/should/may” for readability.
Configuration Types
Manifest
A configuration file that defines platform-level settings. Manifests are versioned, immutable, and stored in an OCI registry.
```yaml
apiVersion: floe.dev/v1
kind: Manifest
metadata:
  name: acme-platform
  version: "1.0.0"
  scope: enterprise  # or "domain"
```
Scope:
- `enterprise` - Root-level manifest with no parent (defines global policies)
- `domain` - Inherits from a parent manifest via a `parent:` reference
DataProduct
The unit of deployment in floe. A DataProduct defines transforms, schedules, and optionally input/output ports.
```yaml
apiVersion: floe.dev/v1
kind: DataProduct
metadata:
  name: customer-analytics
  version: "1.0"
```
DataProducts can reference a `platform:` or `domain:` manifest. Current alpha guides use an explicit platform environment contract or demo fixture rather than asking Data Engineers to rely on implicit platform-wide defaults.
CompiledArtifacts
The output of Floe’s compilation pipeline. Contains resolved configuration after inheritance, validation, and compilation. In the current alpha, Customer 360 artifacts are produced through make compile-demo; the root data-team floe compile command is planned and not yet implemented.
Deployment Modes
Simple Mode
The minimal authoring mode. Uses a floe.yaml data product file and resolves it against a Platform Engineer-approved manifest, platform environment contract, or documented demo fixture.
Centralized Mode
The Platform Team defines a Manifest (scope: enterprise), and the Data Team references it in their DataProduct.
Data Mesh Mode
Three-tier hierarchy:
- Enterprise Manifest (global policies)
- Domain Manifest (domain-specific settings, inherits from enterprise)
- DataProduct (references domain)
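The three-tier resolution can be pictured as a layered merge from the enterprise root down to the data product. The sketch below is only an illustration: the child-overrides-parent merge rule and every field name (`governance`, `plugins`, `name`) are assumptions, and floe's actual inheritance rules may restrict what a child is allowed to override.

```python
from typing import Any


def merge(parent: dict[str, Any], child: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge child settings over parent settings (assumed rule)."""
    result = dict(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def resolve(chain: list[dict[str, Any]]) -> dict[str, Any]:
    """Resolve a hierarchy ordered from root (enterprise) to leaf (data product)."""
    resolved: dict[str, Any] = {}
    for layer in chain:
        resolved = merge(resolved, layer)
    return resolved


# Hypothetical settings for each tier.
enterprise = {"governance": {"classification": "required"}, "plugins": {"compute": "duckdb"}}
domain = {"plugins": {"catalog": "polaris"}}
product = {"name": "customer-analytics"}

resolved = resolve([enterprise, domain, product])
```

Here the resolved configuration keeps the enterprise governance block, combines the plugin selections from both manifests, and adds the product's own name.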
Architecture Layers
Layer 1: Foundation
Framework code distributed via PyPI and Helm. Includes floe-core, floe-dbt, floe-iceberg, and plugins.
Layer 2: Configuration
Immutable, versioned configuration stored in an OCI registry. Defines plugins, governance, and data architecture.
Layer 3: Services
Long-lived platform services deployed as Kubernetes Deployments/StatefulSets. Includes orchestrator, catalog, semantic layer, and observability stack.
Layer 4: Data
Ephemeral pipeline jobs running as Kubernetes Jobs. Executes dbt transforms, ingestion, and quality checks.
Plugin Types
Compute Plugin
Where dbt transforms execute. Floe’s alpha examples use DuckDB; Platform Engineers can approve other compute targets such as Spark, Snowflake, Databricks, or BigQuery as plugins mature.
Orchestrator Plugin
Pipeline scheduling and execution. Floe’s documented alpha path uses Dagster-centered runtime artifacts; other orchestrators remain pluggable once validated.
Catalog Plugin
Iceberg catalog for table metadata. Polaris is the reference implementation in the alpha architecture; other catalogs require Platform Engineer validation.
Semantic Layer Plugin
Analytics/BI consumption layer. Cube is the reference semantic layer integration; other semantic layer choices require platform validation.
Ingestion Plugin
Data loading (EL). The dlt plugin exists as an implementation primitive; production ingestion options are selected through platform-approved plugins.
Enforced Standards
Apache Iceberg
Open table format. All data in floe is stored as Iceberg tables. Non-negotiable.
OpenTelemetry
Vendor-neutral observability standard. All traces and metrics use OTel. Non-negotiable.
OpenLineage
Data lineage standard. All lineage events use OpenLineage format. Non-negotiable.
dbt
Transformation layer. “dbt owns SQL” - all transforms are dbt models. Non-negotiable.
Governance
Data Classification
Metadata attached to columns via dbt meta tags. Levels: public, internal, confidential, pii, phi.
Quality Gates
Compile-time checks that enforce test coverage, required tests, and naming conventions.
Naming Conventions
Layer-based prefixes enforced at compile time. Medallion pattern uses bronze_*, silver_*, gold_*.
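A prefix check of this kind can be sketched in a few lines. The regular expression below is an assumption about what a valid name looks like (a medallion prefix followed by a lowercase snake_case suffix); the real compile-time rule may be stricter or configurable.

```python
import re

# Hypothetical rule: medallion prefix plus a lowercase snake_case suffix.
MEDALLION_PATTERN = re.compile(r"^(bronze|silver|gold)_[a-z0-9_]+$")


def invalid_model_names(names: list[str]) -> list[str]:
    """Return the model names that do not follow the medallion prefix pattern."""
    return [name for name in names if not MEDALLION_PATTERN.match(name)]


violations = invalid_model_names(
    ["bronze_orders", "silver_customers", "orders_clean", "gold_revenue"]
)
```

Only `orders_clean` is reported, since it carries no layer prefix.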
Data Mesh Concepts
Output Port
A published interface from a DataProduct. Defines the table, SLA, and access controls.
Input Port
A dependency on another data source. Can reference ingestion sources or other DataProducts.
Data Contract
Agreement between a provider DataProduct and consumer DataProduct. Auto-generated when an input port references another product’s output port.
Namespace
Hierarchical identifier for lineage and catalog organization. Format: {project} or {domain}.{product}.
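The two namespace shapes can be illustrated with a pair of helpers. Both function names are hypothetical, and the sketch assumes that the domain and product parts do not themselves contain dots.

```python
from typing import Optional


def build_namespace(product: str, domain: Optional[str] = None) -> str:
    """Build '{domain}.{product}' when a domain is given, else the bare name."""
    return f"{domain}.{product}" if domain else product


def split_namespace(namespace: str) -> tuple[Optional[str], str]:
    """Split a namespace into (domain, product); domain is None for the bare form."""
    domain, sep, product = namespace.rpartition(".")
    return (domain if sep else None, product)


mesh_ns = build_namespace("customer-analytics", domain="sales")
simple_ns = build_namespace("customer-analytics")
```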
OCI Registry
Platform Artifacts
Compiled manifests stored as OCI artifacts. Versioned, immutable, and optionally signed.
Artifact Reference
OCI reference format: oci://registry.example.com/artifact-name:version
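To make the reference format concrete, here is a minimal parser for it. `ArtifactRef` and `parse_artifact_ref` are illustrative names, not part of floe, and the sketch handles only the tag form shown above (not digests).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtifactRef:
    registry: str
    name: str
    version: str


def parse_artifact_ref(ref: str) -> ArtifactRef:
    """Parse 'oci://<registry>/<artifact-name>:<version>' into its parts."""
    scheme = "oci://"
    if not ref.startswith(scheme):
        raise ValueError(f"expected an oci:// reference, got {ref!r}")
    # Split on the last colon so a registry port (host:5000) is left intact.
    path, sep, version = ref[len(scheme):].rpartition(":")
    if not sep or not version or "/" in version:
        raise ValueError(f"missing version tag in {ref!r}")
    registry, slash, name = path.partition("/")
    if not slash or not registry or not name:
        raise ValueError(f"missing registry or artifact name in {ref!r}")
    return ArtifactRef(registry, name, version)


ref = parse_artifact_ref("oci://registry.example.com/artifact-name:1.0.0")
```

Splitting on the last colon keeps a registry port such as `registry.example.com:5000` in the registry part, while a reference without a tag is rejected rather than given a default version.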
CLI Commands
floe compile (planned data-team command)
Planned data-team lifecycle command that will validate DataProduct definitions against inherited manifests and produce CompiledArtifacts. It is not a current alpha workflow.
floe run (planned data-team command)
Planned data-team lifecycle command that will execute a pipeline using CompiledArtifacts. It is not a current alpha workflow.
floe platform compile
Validates a Manifest and prepares it for publishing.
floe platform publish
Pushes the compiled manifest to the OCI registry.
floe platform deploy
Deploys platform services (Layer 3) to Kubernetes.
Core Concepts
Plugin
An extensible component that implements a plugin interface (ABC) and registers via Python entry points. The canonical implementation-truth list is Plugin Catalog.
Note: PolicyEnforcer and DataContract are now core modules in floe-core, not plugins.
Entry Point
Python packaging mechanism for plugin discovery. Plugins register in pyproject.toml under [project.entry-points."floe.<type>"] groups.
Contract
A versioned interface between packages. The primary contract is CompiledArtifacts (floe-core → floe-dagster). Uses semantic versioning (MAJOR.MINOR.PATCH).
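Under the conventional reading of semantic versioning, a consumer can accept any contract version with the same MAJOR number and a MINOR.PATCH at least as new as the one it was built against. The sketch below encodes that reading; it is an assumption about compatibility policy, since this glossary only states that semantic versioning is used.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (no pre-release or build tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(producer: str, consumer_minimum: str) -> bool:
    """Same MAJOR, and (MINOR, PATCH) at least the consumer's minimum."""
    produced = parse_version(producer)
    minimum = parse_version(consumer_minimum)
    return produced[0] == minimum[0] and produced[1:] >= minimum[1:]


newer_minor = is_compatible("1.4.0", "1.2.3")
major_bump = is_compatible("2.0.0", "1.2.3")
older_patch = is_compatible("1.2.2", "1.2.3")
```

A newer minor release is accepted, while a major bump or an older patch is not.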
Interface
An Abstract Base Class (ABC) defining methods a plugin must implement. All plugins inherit from an interface (e.g., ComputePlugin, OrchestratorPlugin).
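The effect of an ABC-based interface is that an incomplete plugin cannot even be instantiated. The class below is a stand-in that reuses the `ComputePlugin` name for illustration only; its members (`name`, `dbt_adapter`) are hypothetical and do not reflect the real floe interface.

```python
from abc import ABC, abstractmethod


class ComputePlugin(ABC):
    """Illustrative stand-in for a plugin interface; members are hypothetical."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def dbt_adapter(self) -> str: ...


class DuckDBCompute(ComputePlugin):
    """Implements every abstract member, so it can be instantiated."""

    @property
    def name(self) -> str:
        return "duckdb"

    def dbt_adapter(self) -> str:
        return "duckdb"


class IncompletePlugin(ComputePlugin):
    """Omits dbt_adapter, so instantiation raises TypeError."""

    @property
    def name(self) -> str:
        return "incomplete"


plugin = DuckDBCompute()
try:
    IncompletePlugin()
    instantiated = True
except TypeError:
    instantiated = False
```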
Teams & Roles
Platform Team
The team responsible for:
- Writing and versioning manifest.yaml
- Selecting plugins (compute, orchestrator, catalog, etc.)
- Deploying platform services (Layer 3)
- Defining governance policies
Data Team
The team responsible for:
- Writing floe.yaml (data product definitions)
- Implementing dbt models and transformations
- Scheduling pipelines
- Consuming platform services
Deployment Concepts
Service
A long-lived Kubernetes Deployment or StatefulSet (Layer 3). Examples: Dagster webserver, Polaris catalog, Cube API. Managed by Platform Team.
Job
An ephemeral Kubernetes Job (Layer 4) that runs to completion. Examples: dbt run, data quality checks, dlt ingestion. Created by orchestrator.
Governance Model
Enforcement
Compile-time validation that blocks deployment of non-compliant configurations. Example: Missing required dbt tests → compilation fails.
Validation
Runtime checks that may warn or fail execution. Example: Data contract schema mismatch → alert sent, execution continues (depending on config).
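The difference between Enforcement and Validation can be sketched as two checks with different failure behavior: one always raises, the other warns unless configured to fail. All names here (`CompilationError`, `enforce_required_tests`, `validate_schema`, `fail_on_mismatch`) are hypothetical.

```python
import warnings


class CompilationError(Exception):
    """Raised when a compile-time governance check fails."""


def enforce_required_tests(model: str, tests: set[str], required: set[str]) -> None:
    """Enforcement: always blocks by raising when required tests are missing."""
    missing = required - tests
    if missing:
        raise CompilationError(f"{model}: missing required tests {sorted(missing)}")


def validate_schema(expected: set[str], actual: set[str], fail_on_mismatch: bool = False) -> bool:
    """Validation: warn by default, or raise when configured to fail."""
    if expected == actual:
        return True
    message = f"schema mismatch on columns: {sorted(expected ^ actual)}"
    if fail_on_mismatch:
        raise RuntimeError(message)
    warnings.warn(message)
    return False
```

For example, `enforce_required_tests("gold_revenue", {"unique"}, {"unique", "not_null"})` raises `CompilationError`, while `validate_schema({"id", "email"}, {"id"})` emits a warning and returns `False` so that execution can continue.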
Compliance
Adherence to governance policies defined in manifest.yaml. Enforced at compile time, monitored at runtime.
Namespace Isolation
Security boundary separating data products within the platform. Each namespace has independent credentials, resource quotas, and access controls. Implemented via Kubernetes namespaces and Polaris catalog namespaces.
Related Documents
- CompiledArtifacts Contract - Schema definition
- Observability Attributes - Telemetry conventions
- Four-Layer Overview - Architecture