Observability Attributes Contract

Version: 0.1.0

This contract records the observability fields implemented for the alpha path. Floe emits OpenTelemetry traces, structured logs, Prometheus-compatible metrics, and OpenLineage events using secret-free Floe context. Backend selection stays in deployment bindings, the OpenTelemetry Collector configuration, and the lineage backend plugin; product and plugin code emit portable signals.

floe_core.telemetry.context.ObservabilityContext is the canonical runtime context for traces, logs, and metric labels. to_span_attributes() emits these fields when the value is known:

| Attribute | Required when | Description | Example |
| --- | --- | --- | --- |
| floe.product.name | Always | Data product name | customer-360 |
| floe.product.version | Always | Data product version | 0.1.0 |
| floe.environment | Always | Runtime environment | demo |
| floe.namespace | Always | Floe/catalog namespace | customer_360 |
| floe.run.id | Runtime run known | Dagster/product run ID | run-abc123 |
| floe.asset.key | Asset known | Dagster asset key | customer_360.mart_customer_360 |
| floe.stage | Stage known | Runtime stage | dbt |
| floe.table.name | Table known | Logical table name | mart_customer_360 |
| floe.plugin.type | Plugin known | Plugin category | orchestrator |
| floe.plugin.name | Plugin known | Plugin implementation | dagster |
| floe.lineage.namespace | Lineage configured | OpenLineage namespace | customer-360 |

The implemented context above is the source of truth for Customer 360 proof. Do not introduce older pipeline, mode, or Dagster-specific aliases in new alpha evidence.
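
As a rough illustration of the "emit only when known" rule, the sketch below uses a hypothetical `ContextSketch` dataclass. It is not the real `ObservabilityContext`; the constructor arguments and Python field names are assumptions, and only the attribute keys come from the table above.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for ObservabilityContext; the real class may differ.
@dataclass(frozen=True)
class ContextSketch:
    product_name: str
    product_version: str
    environment: str
    namespace: str
    run_id: Optional[str] = None
    asset_key: Optional[str] = None
    stage: Optional[str] = None
    table_name: Optional[str] = None
    plugin_type: Optional[str] = None
    plugin_name: Optional[str] = None
    lineage_namespace: Optional[str] = None

    def to_span_attributes(self) -> dict[str, str]:
        candidates = {
            "floe.product.name": self.product_name,
            "floe.product.version": self.product_version,
            "floe.environment": self.environment,
            "floe.namespace": self.namespace,
            "floe.run.id": self.run_id,
            "floe.asset.key": self.asset_key,
            "floe.stage": self.stage,
            "floe.table.name": self.table_name,
            "floe.plugin.type": self.plugin_type,
            "floe.plugin.name": self.plugin_name,
            "floe.lineage.namespace": self.lineage_namespace,
        }
        # Optional fields are omitted rather than emitted as empty values.
        return {key: value for key, value in candidates.items() if value is not None}


attrs = ContextSketch(
    product_name="customer-360",
    product_version="0.1.0",
    environment="demo",
    namespace="customer_360",
    stage="dbt",
).to_span_attributes()
```

Here `attrs` holds the four always-required fields plus `floe.stage`; run, asset, table, plugin, and lineage keys are absent because they were never set.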

Runtime envelopes add floe.status when final status is known. It is not part of the base context constructor; it is set by the asset/lifecycle wrapper that observes success or failure.
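
One way to picture the wrapper is a context manager that copies the base attributes and sets floe.status only after the wrapped work finishes. `asset_envelope` is a hypothetical name, not the actual Floe wrapper, and the sketch covers only the success and failure outcomes.

```python
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def asset_envelope(base_attributes: dict[str, str]) -> Iterator[dict[str, str]]:
    """Yield a copy of the base attributes; add floe.status once the outcome is known."""
    attributes = dict(base_attributes)  # the base context itself is never mutated
    try:
        yield attributes
    except Exception:
        attributes["floe.status"] = "failure"
        raise
    else:
        attributes["floe.status"] = "success"


base = {"floe.product.name": "customer-360"}

with asset_envelope(base) as ok_attrs:
    pass  # asset body would run here

failed_attrs: dict[str, str] = {}
try:
    with asset_envelope(base) as failed_attrs:
        raise RuntimeError("asset failed")
except RuntimeError:
    pass
```

Keeping the status out of the constructor means a context can be created before the outcome exists, which matches the split described above.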

Structured logs must use ObservabilityContext.to_log_fields(), which mirrors the canonical span attributes. Runtime logs for the alpha path include:

  • floe_asset_started
  • floe_asset_completed
  • floe_asset_failed
  • dbt_node_observed
  • plugin_lifecycle.observed

Logs must remain secret-free. Do not emit raw credentials, tokens, passwords, connection strings with userinfo, private keys, or backend secrets. Secret-like extra attributes are dropped or redacted by the shared context helpers.
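
The following is a minimal sketch of the drop-or-redact behavior, not the shared helper itself. The function names, the key pattern, and the choice to drop rather than mask are all assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed pattern; the real helpers may recognize a different set of key names.
SECRET_KEY_PATTERN = re.compile(
    r"password|passwd|secret|token|credential|private_key|api_key", re.IGNORECASE
)


def strip_userinfo(value: str) -> str:
    """Remove user:password@ from URL-like values; leave other strings untouched."""
    try:
        parts = urlsplit(value)
    except ValueError:
        return "[redacted]"  # unparseable URL-like value: redact rather than risk a leak
    if parts.scheme and "@" in parts.netloc:
        host = parts.netloc.rsplit("@", 1)[1]
        return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    return value


def sanitize_extra(extra: dict[str, str]) -> dict[str, str]:
    clean: dict[str, str] = {}
    for key, value in extra.items():
        if SECRET_KEY_PATTERN.search(key):
            continue  # secret-like key: drop the attribute entirely
        clean[key] = strip_userinfo(value)
    return clean


sanitized = sanitize_extra(
    {
        "db_password": "hunter2",
        "API_TOKEN": "abc",
        "warehouse_url": "postgresql://user:hunter2@db.internal:5432/floe",
        "floe.stage": "dbt",
    }
)
```

Key-name matching is only a heuristic; it does not catch a secret stored under an innocuous key, so callers still should not pass such values in.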

Dagster runtime asset envelopes emit the alpha product metrics:

| OpenTelemetry instrument | Prometheus series | Type | Meaning |
| --- | --- | --- | --- |
| floe.asset.materializations | floe_asset_materializations_total | Counter | Asset completed successfully |
| floe.asset.failures | floe_asset_failures_total | Counter | Asset failed |

Allowed labels are intentionally bounded:

| OTel label | Prometheus label | Included |
| --- | --- | --- |
| floe.product.name | floe_product_name | Always |
| floe.environment | floe_environment | Always |
| floe.namespace | floe_namespace | Always |
| floe.stage | floe_stage | When known |
| floe.plugin.type | floe_plugin_type | When known |
| floe.plugin.name | floe_plugin_name | When known |
| floe.status | floe_status | Success/failure/error/skipped |

floe.run.id, floe.asset.key, and floe.table.name are excluded from the canonical metric label set because they are high-cardinality runtime values. Backend-specific proof helpers may read an exported floe_asset_key label when present, but new instrumentation should not depend on per-run or per-table labels for aggregate dashboards.
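
A small sketch of how the bounded label set could be derived from span attributes. `to_metric_labels` is a hypothetical helper; the allow-list and the dot-to-underscore naming follow the label table above.

```python
ALLOWED_METRIC_LABELS = (
    "floe.product.name",
    "floe.environment",
    "floe.namespace",
    "floe.stage",
    "floe.plugin.type",
    "floe.plugin.name",
    "floe.status",
)


def to_metric_labels(attributes: dict[str, str]) -> dict[str, str]:
    """Keep only allow-listed keys and rename them to Prometheus label style."""
    return {
        key.replace(".", "_"): value
        for key, value in attributes.items()
        if key in ALLOWED_METRIC_LABELS
    }


labels = to_metric_labels(
    {
        "floe.product.name": "customer-360",
        "floe.environment": "demo",
        "floe.namespace": "customer_360",
        "floe.run.id": "run-abc123",  # high-cardinality: filtered out
        "floe.asset.key": "customer_360.mart_customer_360",  # filtered out
        "floe.status": "success",
    }
)
```

An allow-list, rather than a deny-list, means any new attribute stays out of metric labels until it is added deliberately.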

Customer 360 metric proof queries the Prometheus names by product, status, and plugin, for example:

floe_asset_materializations_total{
  floe_product_name="customer-360",
  floe_status="success",
  floe_plugin_name=~".+"
}
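
A validator could send this selector to the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The sketch below only builds the request URL and counts series in a response payload; the helper names are hypothetical and the base URL is the contributor default, not a production setting.

```python
from urllib.parse import urlencode

PROMQL = (
    "floe_asset_materializations_total{"
    'floe_product_name="customer-360",'
    'floe_status="success",'
    'floe_plugin_name=~".+"'
    "}"
)


def instant_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL with the PromQL expression percent-encoded."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def sample_count(payload: dict) -> int:
    """Count series in an instant-query response; 0 if the query did not succeed."""
    if payload.get("status") != "success":
        return 0
    return len(payload.get("data", {}).get("result", []))


url = instant_query_url("http://localhost:9090", PROMQL)
```

The URL can be fetched with any HTTP client; the decoded JSON body is what `sample_count` expects.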

Customer 360 is the alpha proof fixture for the platform operability contract. Validators should emit deterministic evidence keys under these families:

| Key family | Purpose | Example evidence |
| --- | --- | --- |
| run_control.* | Orchestrator run identity, final state, and product/job context | run_control.dagster.status=success |
| storage.* | Iceberg table data, metadata, and object-store readability | storage.customer_360_outputs=true |
| business.* | Product-level business assertions from the generated mart | business.customer_count=42 |
| observability.traces.* | Trace backend reachability, freshness, product/run context, and span depth | observability.traces.count=5 |
| observability.logs.* | Log backend readiness, freshness, product/run context, and structured runtime events | observability.logs.status=pass |
| observability.metrics.* | Prometheus-compatible metric reachability, freshness, and contract metric samples | observability.metrics.count=3 |
| observability.lineage.* | OpenLineage/Marquez namespace, jobs, runs, datasets, facets, and graph evidence | observability.lineage.status=pass |
| observability.grafana.* | Grafana datasource and curated dashboard panel query truthfulness | observability.grafana.datasource.status=pass |
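
One possible way to keep evidence keys deterministic is to flatten nested validator results into dotted keys in sorted order. `flatten_evidence` is a hypothetical helper and the nested input shape is an assumption; only the resulting key names follow the families above.

```python
from typing import Any


def flatten_evidence(tree: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested results into dotted keys, visiting keys in sorted order."""
    flat: dict[str, Any] = {}
    for key in sorted(tree):
        dotted = f"{prefix}.{key}" if prefix else key
        value = tree[key]
        if isinstance(value, dict):
            flat.update(flatten_evidence(value, dotted))
        else:
            flat[dotted] = value
    return flat


evidence = flatten_evidence(
    {
        "observability": {"traces": {"count": 5}, "logs": {"status": "pass"}},
        "business": {"customer_count": 42},
    }
)
```

Sorting at every level makes the output order independent of how a validator happened to build its result dictionary, so two runs can be compared line by line.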

Existing evidence.* keys remain compatible during alpha so older validation outputs and release notes can still be compared. New validators should use the expanded key families above and classify failures with the classes below.

| Failure class | Use when |
| --- | --- |
| product_failure | The Customer 360 run, model execution, data output, or business assertion failed. |
| platform_service_failure | A required platform service is deployed but unhealthy or returning service-level errors. |
| backend_unreachable | A backend API, service URL, tunnel, port-forward, or collector/exporter path is unavailable. |
| no_fresh_evidence | The backend is reachable but has no records for the expected product, run, table, or proof window. |
| wrong_context | Evidence exists but belongs to another product, run, namespace, table, service, or datasource. |
| stale_evidence | Evidence exists only outside the accepted freshness window. |
| dashboard_datasource_drift | Grafana panel queries are valid in a backend but fail or return empty results through the configured datasource. |
| contract_gap | The current runtime or backend cannot produce a required alpha evidence family yet. |
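
For a single backend probe, four of these classes can be told apart with an ordered check, sketched below. The `Probe` shape and the check order are assumptions; the remaining classes need signals (service health, Grafana datasource results, product run state) that this sketch does not model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Probe:
    """Hypothetical summary of one backend query made by a validator."""

    reachable: bool        # the backend API answered at all
    record_count: int      # records returned for the proof query
    context_matches: bool  # records belong to the expected product/run/table
    fresh: bool            # records fall inside the accepted freshness window


def classify(probe: Probe) -> Optional[str]:
    """Return a failure class, or None when the evidence is acceptable."""
    if not probe.reachable:
        return "backend_unreachable"
    if probe.record_count == 0:
        return "no_fresh_evidence"
    if not probe.context_matches:
        return "wrong_context"
    if not probe.fresh:
        return "stale_evidence"
    return None
```

Checking reachability first keeps an outage from being misreported as missing or stale evidence.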

Customer 360 lineage proof requires two pieces of evidence:

  1. Product run evidence in the product lineage namespace/job.
  2. Model/table run evidence linked to the product/Dagster run through OpenLineage ParentRunFacet.

Do not treat a single table event on the product run as complete lineage proof. Model/table runs must carry a parent run reference to the product run ID, and the validator classifies evidence against the product, run ID, and target table context.
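
The two-part check can be sketched over OpenLineage run events represented as plain dictionaries. The `run.facets.parent.run.runId` path follows the OpenLineage ParentRunFacet layout; the function names, the job name, and the short run IDs (real OpenLineage run IDs are UUIDs) are illustrative assumptions, not the validator's actual code.

```python
from typing import Any, Optional

Event = dict[str, Any]


def parent_run_id(event: Event) -> Optional[str]:
    """Read the parent run ID from the ParentRunFacet, if the event carries one."""
    parent = event.get("run", {}).get("facets", {}).get("parent", {})
    return parent.get("run", {}).get("runId")


def has_lineage_proof(
    events: list[Event], product_run_id: str, product_job: str, target_table: str
) -> bool:
    # 1. Product run evidence in the product job.
    product_seen = any(
        e.get("run", {}).get("runId") == product_run_id
        and e.get("job", {}).get("name") == product_job
        for e in events
    )
    # 2. A model/table run that writes the target table and points back at the product run.
    table_linked = any(
        parent_run_id(e) == product_run_id
        and any(o.get("name") == target_table for o in e.get("outputs", []))
        for e in events
    )
    return product_seen and table_linked


product_event = {
    "run": {"runId": "run-abc123"},
    "job": {"namespace": "customer-360", "name": "customer_360"},
}
table_event = {
    "run": {
        "runId": "run-child-1",
        "facets": {
            "parent": {
                "run": {"runId": "run-abc123"},
                "job": {"namespace": "customer-360", "name": "customer_360"},
            }
        },
    },
    "job": {"namespace": "customer-360", "name": "customer_360.mart_customer_360"},
    "outputs": [{"namespace": "customer-360", "name": "mart_customer_360"}],
}
complete = has_lineage_proof(
    [product_event, table_event], "run-abc123", "customer_360", "mart_customer_360"
)
```

A table event emitted directly on the product run has no parent facet, so it satisfies neither the second condition nor the proof as a whole.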

Plugin lifecycle instrumentation emits spans, logs, and bounded metrics for startup, shutdown, health checks, and similar lifecycle phases. Lifecycle fields are:

| Field | Description |
| --- | --- |
| floe.plugin.type | Plugin category |
| floe.plugin.name | Plugin implementation name |
| floe.plugin.version | Plugin package/API version |
| floe.plugin.floe_api_version | Floe plugin API compatibility version |
| floe.plugin.lifecycle.phase | Lifecycle phase |
| floe.plugin.lifecycle.status | Final lifecycle status |
| floe.error.type | Sanitized error class when a failure occurs |

Lifecycle metrics are floe.plugin.lifecycle.duration and floe.plugin.lifecycle.failures. Allowed labels are plugin type, plugin name, lifecycle phase, and lifecycle status.
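
The split between bounded metric labels and richer span/log fields might look like the sketch below. `lifecycle_record` is a hypothetical helper, the "success"/"failure" status values are assumed, and using the exception class name is one reading of "sanitized error class".

```python
from typing import Optional


def lifecycle_record(
    plugin_type: str,
    plugin_name: str,
    phase: str,
    error: Optional[BaseException] = None,
) -> tuple[dict[str, str], dict[str, str]]:
    """Return (metric_labels, span_log_fields) for one lifecycle phase."""
    labels = {
        "floe.plugin.type": plugin_type,
        "floe.plugin.name": plugin_name,
        "floe.plugin.lifecycle.phase": phase,
        "floe.plugin.lifecycle.status": "failure" if error else "success",
    }
    fields = dict(labels)
    if error is not None:
        # Only the class name is recorded; the message may contain secrets or unbounded text.
        fields["floe.error.type"] = type(error).__name__
    return labels, fields


labels, fields = lifecycle_record(
    "orchestrator",
    "dagster",
    "startup",
    error=ValueError("could not connect to postgresql://user:hunter2@db/floe"),
)
```

The error type appears only in the span/log fields, so the metric label set stays at the four allowed labels even on failure.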

Never emit:

  • raw passwords, tokens, credentials, private keys, or secret values;
  • connection URLs containing userinfo credentials;
  • full environment dumps;
  • unbounded user data, row payloads, or PII;
  • high-cardinality metric labels such as run ID, trace ID, span ID, asset key, table name, file path, object key, or raw exception message.

When more context is needed, attach sanitized span/log attributes and keep metric labels bounded.

Floe emits OpenTelemetry and OpenLineage signals. The alpha proof profile wires those signals through the OpenTelemetry Collector to trace, log, and metric backends, and through the lineage backend plugin model to Marquez.

Common contributor endpoints are:

| Signal | Backend | Default URL |
| --- | --- | --- |
| Logs | Loki-compatible API | http://localhost:3101 |
| Metrics | Prometheus-compatible API | http://localhost:9090 |
| Traces | Jaeger-compatible API | http://localhost:16686 |
| Lineage | Marquez OpenLineage backend | http://localhost:5100 |

Production backend choices are platform-owned deployment decisions. Product code must not couple directly to Loki, Prometheus, Jaeger, Tempo, or Marquez implementation details.