
Glossary

This glossary defines the core terminology used throughout floe documentation.

This documentation uses RFC 2119 keywords to indicate requirement levels in normative contexts (ADRs, requirements documents):

Keyword | Meaning | When to Use
MUST | Absolute requirement | Non-negotiable platform enforcement, security requirements
MUST NOT | Absolute prohibition | Forbidden patterns, security violations
SHOULD | Strong recommendation | Best practice, deviation requires justification
SHOULD NOT | Discouraged | Allowed but not recommended
MAY | Optional | Truly optional features, implementer’s choice

Examples:

  • “All plugins MUST implement PluginMetadata” (enforced at discovery)
  • “Platform teams SHOULD use DuckDB for local development” (recommendation)
  • “Plugins MAY include additional helper methods” (optional)

Note: Guides and non-normative documentation use lowercase “must/should/may” for readability.

A configuration file that defines platform-level settings. Manifests are versioned, immutable, and stored in an OCI registry.

apiVersion: floe.dev/v1
kind: Manifest
metadata:
  name: acme-platform
  version: "1.0.0"
scope: enterprise # or "domain"

Scope:

  • enterprise - Root-level manifest with no parent (defines global policies)
  • domain - Inherits from a parent manifest via parent: reference

The unit of deployment in floe. A DataProduct defines transforms, schedules, and optionally input/output ports.

apiVersion: floe.dev/v1
kind: DataProduct
metadata:
  name: customer-analytics
  version: "1.0"

DataProducts can reference a platform: or domain: manifest. Current alpha guides use an explicit platform environment contract or demo fixture rather than asking Data Engineers to rely on implicit platform-wide defaults.

The output of Floe’s compilation pipeline. Contains resolved configuration after inheritance, validation, and compilation. In the current alpha, Customer 360 artifacts are produced through make compile-demo; the root data-team floe compile command is planned and not yet implemented.

The minimal authoring mode. Uses a floe.yaml data product file and resolves it against a Platform Engineer-approved manifest, platform environment contract, or documented demo fixture.

The Platform Team defines a Manifest (scope: enterprise), and the Data Team references it in its DataProduct.

Three-tier hierarchy:

  1. Enterprise Manifest (global policies)
  2. Domain Manifest (domain-specific settings, inherits from enterprise)
  3. DataProduct (references domain)
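The scope rules and the three-tier chain can be sketched in a few lines of Python. This is an illustration only: the `ManifestSketch` class and both helper functions are hypothetical names, not part of floe, and the sketch models only the two facts stated in this glossary (an enterprise manifest has no parent, a domain manifest has one).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ManifestSketch:
    """Hypothetical stand-in for a manifest; not floe's real schema."""
    name: str
    scope: str  # "enterprise" or "domain"
    parent: Optional["ManifestSketch"] = None


def validate_scope(manifest: ManifestSketch) -> None:
    """Enterprise manifests are roots; domain manifests need a parent."""
    if manifest.scope == "enterprise" and manifest.parent is not None:
        raise ValueError(f"{manifest.name}: enterprise manifest cannot have a parent")
    if manifest.scope == "domain" and manifest.parent is None:
        raise ValueError(f"{manifest.name}: domain manifest requires a parent")


def inheritance_chain(manifest: ManifestSketch) -> list[str]:
    """Return manifest names from the root down to the given manifest."""
    chain = []
    current = manifest
    while current is not None:
        chain.append(current.name)
        current = current.parent
    return list(reversed(chain))


enterprise = ManifestSketch("acme-platform", "enterprise")
sales = ManifestSketch("sales-domain", "domain", parent=enterprise)
validate_scope(sales)
print(inheritance_chain(sales))
```

A DataProduct that references `sales-domain` would sit at the end of this chain as the third tier.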

Framework code distributed via PyPI and Helm. Includes floe-core, floe-dbt, floe-iceberg, and plugins.

Immutable, versioned configuration stored in an OCI registry. Defines plugins, governance, and data architecture.

Long-lived platform services deployed as Kubernetes Deployments/StatefulSets. Includes orchestrator, catalog, semantic layer, and observability stack.

Ephemeral pipeline jobs running as Kubernetes Jobs. Executes dbt transforms, ingestion, and quality checks.

Where dbt transforms execute. Floe’s alpha examples use DuckDB; Platform Engineers can approve other compute targets such as Spark, Snowflake, Databricks, or BigQuery as plugins mature.

Pipeline scheduling and execution. Floe’s documented alpha path uses Dagster-centered runtime artifacts; other orchestrators remain pluggable once validated.

Iceberg catalog for table metadata. Polaris is the reference implementation in the alpha architecture; other catalogs require Platform Engineer validation.

Analytics/BI consumption layer. Cube is the reference semantic layer integration; other semantic layer choices require platform validation.

Data loading (EL). The dlt plugin exists as an implementation primitive; production ingestion options are selected through platform-approved plugins.

Open table format. All data in floe is stored as Iceberg tables. Non-negotiable.

Vendor-neutral observability standard. All traces and metrics use OTel. Non-negotiable.

Data lineage standard. All lineage events use OpenLineage format. Non-negotiable.

Transformation layer. “dbt owns SQL” - all transforms are dbt models. Non-negotiable.

Metadata attached to columns via dbt meta tags. Levels: public, internal, confidential, pii, phi.
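As a rough sketch, the five levels map naturally onto an enumeration. The `classification` meta key and the `column_classification` helper below are assumptions made for illustration; the glossary does not state which key floe reads from the dbt `meta` block.

```python
from enum import Enum
from typing import Optional


class Classification(Enum):
    """The five levels listed in this glossary."""
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"
    PII = "pii"
    PHI = "phi"


def column_classification(column: dict) -> Optional[Classification]:
    """Read the level from a dbt column entry.

    Assumes a hypothetical meta key named "classification".
    Unknown values raise ValueError; a missing key returns None.
    """
    value = column.get("meta", {}).get("classification")
    return Classification(value) if value is not None else None


email_column = {"name": "email", "meta": {"classification": "pii"}}
print(column_classification(email_column))
```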

Compile-time checks that enforce test coverage, required tests, and naming conventions.

Layer-based prefixes enforced at compile time. Medallion pattern uses bronze_*, silver_*, gold_*.
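A prefix check of this kind could look like the sketch below. The regular expression and the `invalid_model_names` helper are illustrative assumptions, including the choice to allow only lowercase letters, digits, and underscores after the prefix; floe's actual rule may differ.

```python
import re

# Assumed pattern: a medallion layer prefix followed by a lowercase name.
LAYER_PREFIX = re.compile(r"^(bronze|silver|gold)_[a-z0-9_]+$")


def invalid_model_names(names: list[str]) -> list[str]:
    """Return the model names that do not follow the medallion prefixes."""
    return [name for name in names if not LAYER_PREFIX.match(name)]


print(invalid_model_names(["bronze_orders", "stg_orders", "gold_revenue"]))
```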

A published interface from a DataProduct. Defines the table, SLA, and access controls.

A dependency on another data source. Can reference ingestion sources or other DataProducts.

Agreement between a provider DataProduct and consumer DataProduct. Auto-generated when an input port references another product’s output port.

Hierarchical identifier for lineage and catalog organization. Format: {project} or {domain}.{product}.
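The two accepted shapes can be told apart by counting dot-separated segments. The `parse_namespace` function below is a hypothetical helper written for this glossary, not a floe API.

```python
def parse_namespace(identifier: str) -> dict[str, str]:
    """Split an identifier of the form {project} or {domain}.{product}."""
    parts = identifier.split(".")
    if len(parts) > 2 or any(not part for part in parts):
        raise ValueError(f"invalid namespace identifier: {identifier!r}")
    if len(parts) == 1:
        return {"project": parts[0]}
    return {"domain": parts[0], "product": parts[1]}


print(parse_namespace("sales.customer-analytics"))
```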

Compiled manifests stored as OCI artifacts. Versioned, immutable, and optionally signed.

OCI reference format: oci://registry.example.com/artifact-name:version
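For illustration, a reference in this format can be split into registry, artifact name, and version as follows. `parse_oci_reference` is a hypothetical helper; it handles only the tag form shown above and does not cover digest references.

```python
def parse_oci_reference(reference: str) -> tuple[str, str, str]:
    """Split oci://registry/artifact-name:version into its three parts."""
    prefix = "oci://"
    if not reference.startswith(prefix):
        raise ValueError(f"not an oci:// reference: {reference!r}")
    # The registry is everything up to the first slash (it may include a port).
    registry, slash, remainder = reference[len(prefix):].partition("/")
    # The version is everything after the last colon of the remainder.
    name, colon, version = remainder.rpartition(":")
    if not (registry and slash and name and colon and version):
        raise ValueError(f"malformed OCI reference: {reference!r}")
    return registry, name, version


print(parse_oci_reference("oci://registry.example.com/artifact-name:1.0.0"))
```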

Planned data-team lifecycle command that will validate DataProduct definitions against inherited manifests and produce CompiledArtifacts. It is not a current alpha workflow.

Planned data-team lifecycle command that will execute a pipeline using CompiledArtifacts. It is not a current alpha workflow.

Validates a Manifest and prepares it for publishing.

Pushes the compiled manifest to an OCI registry.

Deploys platform services (Layer 3) to Kubernetes.

An extensible component that implements a plugin interface (ABC) and registers via Python entry points. The canonical implementation-truth list is Plugin Catalog.

Note: PolicyEnforcer and DataContract are now core modules in floe-core, not plugins.

Python packaging mechanism for plugin discovery. Plugins register in pyproject.toml under [project.entry-points."floe.<type>"] groups.
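Discovery through entry points can be sketched with the standard library's `importlib.metadata`. The `discover_plugins` helper is hypothetical and only shows how a `floe.<type>` group would be queried; it is not floe's own discovery code.

```python
from importlib.metadata import entry_points


def discover_plugins(plugin_type: str) -> dict:
    """Map entry point names to entry points for the group floe.<plugin_type>."""
    group = f"floe.{plugin_type}"
    found = entry_points()
    if hasattr(found, "select"):
        # Python 3.10+ returns an object with a select() method.
        selected = found.select(group=group)
    else:
        # Older versions return a plain dict keyed by group name.
        selected = found.get(group, [])
    return {ep.name: ep for ep in selected}


# Each value can be turned into the plugin class with ep.load().
print(sorted(discover_plugins("compute")))
```

In an environment with no floe plugins installed, the printed list is empty.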

A versioned interface between packages. The primary contract is CompiledArtifacts (floe-core → floe-dagster). Uses semantic versioning (MAJOR.MINOR.PATCH).
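One common reading of semantic versioning for a contract like this is that a consumer accepts any producer version with the same MAJOR number that is at least as new as the version it was built against. The sketch below encodes that convention; whether floe applies exactly this rule is an assumption, and both function names are hypothetical.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse a MAJOR.MINOR.PATCH string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(produced: str, required: str) -> bool:
    """True when the produced version satisfies the consumer's requirement.

    Assumed rule: same MAJOR, and produced is not older than required.
    """
    produced_v = parse_version(produced)
    required_v = parse_version(required)
    return produced_v[0] == required_v[0] and produced_v >= required_v


print(is_compatible("1.4.0", "1.2.3"))
print(is_compatible("2.0.0", "1.2.3"))
```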

An Abstract Base Class (ABC) defining methods a plugin must implement. All plugins inherit from an interface (e.g., ComputePlugin, OrchestratorPlugin).
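The pattern looks roughly like the following. `ComputePluginSketch`, its `name` property, and its `profile_for` method are invented for this example and do not reflect the real ComputePlugin interface; the point is only that Python refuses to instantiate a class that leaves an abstract method unimplemented.

```python
from abc import ABC, abstractmethod


class ComputePluginSketch(ABC):
    """Hypothetical interface; the real ComputePlugin methods may differ."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Short identifier for the plugin."""

    @abstractmethod
    def profile_for(self, environment: str) -> dict:
        """Return connection settings for the given environment."""


class DuckDBSketch(ComputePluginSketch):
    """Concrete plugin that implements every abstract member."""

    @property
    def name(self) -> str:
        return "duckdb"

    def profile_for(self, environment: str) -> dict:
        return {"type": "duckdb", "path": f"{environment}.duckdb"}


plugin = DuckDBSketch()
print(plugin.name, plugin.profile_for("dev"))
```

Calling `ComputePluginSketch()` directly raises `TypeError`, which is how an incomplete plugin is caught as soon as it is instantiated.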

The team responsible for:

  • Writing and versioning manifest.yaml
  • Selecting plugins (compute, orchestrator, catalog, etc.)
  • Deploying platform services (Layer 3)
  • Defining governance policies

The team responsible for:

  • Writing floe.yaml (data product definitions)
  • Implementing dbt models and transformations
  • Scheduling pipelines
  • Consuming platform services

A long-lived Kubernetes Deployment or StatefulSet (Layer 3). Examples: Dagster webserver, Polaris catalog, Cube API. Managed by the Platform Team.

An ephemeral Kubernetes Job (Layer 4) that runs to completion. Examples: dbt run, data quality checks, dlt ingestion. Created by the orchestrator.

Compile-time validation that blocks deployment of non-compliant configurations. Example: Missing required dbt tests → compilation fails.
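The missing-tests example can be sketched as a check that raises before anything is deployed. `CompilationError` and `enforce_required_tests` are hypothetical names, and the input shape (model name mapped to its list of dbt tests) is an assumption made for the example.

```python
class CompilationError(Exception):
    """Raised when a configuration violates a compile-time policy."""


def enforce_required_tests(models: dict[str, list[str]], required_tests: list[str]) -> None:
    """Fail compilation if any model lacks one of the required dbt tests."""
    violations = {}
    for model_name, tests in models.items():
        missing = sorted(set(required_tests) - set(tests))
        if missing:
            violations[model_name] = missing
    if violations:
        details = "; ".join(
            f"{name}: missing {', '.join(missing)}"
            for name, missing in sorted(violations.items())
        )
        raise CompilationError(f"Required dbt tests missing ({details})")


models = {"gold_revenue": ["not_null"], "silver_orders": ["not_null", "unique"]}
try:
    enforce_required_tests(models, ["not_null", "unique"])
except CompilationError as error:
    print(error)
```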

Runtime checks that may warn or fail execution. Example: Data contract schema mismatch → alert sent, execution continues (depending on config).
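The warn-or-fail behaviour can be illustrated with a small check whose outcome depends on a setting. The `check_schema` function and its `on_mismatch` parameter are hypothetical; the glossary says only that the behaviour depends on configuration.

```python
import logging

logger = logging.getLogger("contract_sketch")


def check_schema(expected: dict[str, str], actual: dict[str, str], on_mismatch: str = "warn") -> bool:
    """Compare column types at runtime.

    Returns True when the schemas agree. On a mismatch, either logs a
    warning and returns False ("warn") or raises RuntimeError ("fail").
    """
    mismatched = sorted(col for col in expected if actual.get(col) != expected[col])
    if not mismatched:
        return True
    message = f"schema mismatch in columns: {', '.join(mismatched)}"
    if on_mismatch == "fail":
        raise RuntimeError(message)
    logger.warning(message)
    return False


expected = {"id": "bigint", "email": "varchar"}
actual = {"id": "bigint", "email": "integer"}
print(check_schema(expected, actual, on_mismatch="warn"))
```

With `on_mismatch="warn"` the run continues after the warning; with `"fail"` the same mismatch stops execution.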

Adherence to governance policies defined in manifest.yaml. Enforced at compile-time, monitored at runtime.

Security boundary separating data products within the platform. Each namespace has independent credentials, resource quotas, and access controls. Implemented via Kubernetes namespaces and Polaris catalog namespaces.