Customer 360 Validation

Platform Engineers and Data Engineers use this page to validate an already deployed Floe platform and Customer 360 data product. Start from the service URLs, run evidence, and platform access method provided for your environment, then run the checks described below.

Floe Contributors use make demo only in the remote release-validation workflow. In that contributor lane, run the Customer 360 evidence gate only after both make demo and make demo-customer-360-run have completed:

make demo-customer-360-validate

The command loads its default evidence plan from demo/customer-360/validation.yaml. The manifest defines service URLs, expected platform pods, and argv-list commands for Dagster, storage, Marquez, Jaeger, and business metric checks.

The current alpha business/query proof is command-based against the generated Iceberg mart. Cube is charted but disabled by default and is not part of the Customer 360 alpha gate unless your platform enables it.

Set FLOE_DEMO_VALIDATION_MANIFEST=/path/to/validation.yaml to point the validator at a manifest for a different platform shape. Individual command overrides are also available, for example FLOE_DEMO_LINEAGE_CHECK_COMMAND, FLOE_DEMO_STORAGE_CHECK_COMMAND, FLOE_DEMO_CUSTOMER_COUNT_COMMAND, and FLOE_DEMO_LIFETIME_VALUE_COMMAND.

The default manifest uses the floe-dev namespace. DevPod/Flux release validation may deploy the same platform shape to floe-test; in that case run the validator with:

FLOE_DEMO_NAMESPACE=floe-test make demo-customer-360-validate

FLOE_DEMO_NAMESPACE changes the platform readiness namespace. Commands that embed a namespace in their argv list, such as the storage and business metric kubectl exec checks in the default manifest, must also be overridden for the live namespace:

export FLOE_DEMO_NAMESPACE=floe-test
export FLOE_DEMO_STORAGE_CHECK_COMMAND='kubectl exec -n floe-test deployment/floe-platform-dagster-webserver -- python -m floe_orchestrator_dagster.validation.iceberg_outputs --artifacts-path /app/demo/customer_360/compiled_artifacts.json --expected-table mart_customer_360 --recovery-mode repair'
export FLOE_DEMO_CUSTOMER_COUNT_COMMAND='kubectl exec -n floe-test deployment/floe-platform-dagster-webserver -- python /app/demo/customer_360/scripts/customer360_metric.py --source iceberg --artifacts-path /app/demo/customer_360/compiled_artifacts.json customer-count'
export FLOE_DEMO_LIFETIME_VALUE_COMMAND='kubectl exec -n floe-test deployment/floe-platform-dagster-webserver -- python /app/demo/customer_360/scripts/customer360_metric.py --source iceberg --artifacts-path /app/demo/customer_360/compiled_artifacts.json total-lifetime-value'
make demo-customer-360-validate

Command override values are shell command strings parsed into argv by the validator. They are not JSON or YAML argv arrays.
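
Because each override is a plain shell string, the namespace-specific commands can be derived from a single variable so that the readiness namespace and the embedded -n value do not drift apart. The following is a minimal sketch that reuses the paths from the export example above; the helper variable names (ns, exec_prefix, artifacts, metric_script) are illustrative and not part of Floe.

```shell
# Illustrative helper variables; only the FLOE_DEMO_* names come from the validator.
ns="${FLOE_DEMO_NAMESPACE:-floe-test}"
exec_prefix="kubectl exec -n ${ns} deployment/floe-platform-dagster-webserver --"
artifacts="/app/demo/customer_360/compiled_artifacts.json"
metric_script="/app/demo/customer_360/scripts/customer360_metric.py"

# Each override stays a single shell command string, not a JSON or YAML array.
export FLOE_DEMO_NAMESPACE="${ns}"
export FLOE_DEMO_STORAGE_CHECK_COMMAND="${exec_prefix} python -m floe_orchestrator_dagster.validation.iceberg_outputs --artifacts-path ${artifacts} --expected-table mart_customer_360 --recovery-mode repair"
export FLOE_DEMO_CUSTOMER_COUNT_COMMAND="${exec_prefix} python ${metric_script} --source iceberg --artifacts-path ${artifacts} customer-count"
export FLOE_DEMO_LIFETIME_VALUE_COMMAND="${exec_prefix} python ${metric_script} --source iceberg --artifacts-path ${artifacts} total-lifetime-value"

# Print the resolved overrides for review before running the gate.
env | grep '^FLOE_DEMO_' | sort
```

After reviewing the printed values, run make demo-customer-360-validate in the same shell.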

Current validator output keys, including alpha compatibility keys:

  • platform.ready
  • dagster.customer_360_run
  • run_control.namespace
  • run_control.runtime_context
  • run_control.dagster.status
  • run_control.dagster.job_name
  • run_control.dagster.api_reachable
  • storage.customer_360_outputs
  • storage.iceberg.customer_360_outputs
  • observability.logs.status
  • observability.logs.count
  • observability.metrics.status
  • observability.metrics.count
  • observability.traces.status
  • observability.traces.count
  • observability.lineage.status
  • observability.lineage.count
  • observability.lineage.product_run_count
  • observability.lineage.model_table_count
  • observability.lineage.dataset_count
  • observability.lineage.lineage_graph_depth
  • observability.lineage.lineage_graph_requested_depth
  • observability.lineage.lineage_graph_count
  • observability.run_id
  • lineage.marquez_customer_360
  • tracing.jaeger_customer_360
  • business.customer_count
  • business.total_lifetime_value

New validator work should prefer the evidence key families defined in the Observability Attributes Contract.

Expected successful runner evidence:

status=PASS
dagster.run_id=<run-id>
dagster.job_name=customer_360

Expected successful validation evidence:

status=PASS
evidence.business.customer_count=<non-negative integer>
evidence.business.total_lifetime_value=<non-negative decimal>
evidence.dagster.customer_360_run=true
evidence.lineage.marquez_customer_360=true
evidence.run_control.dagster.api_reachable=true
evidence.run_control.dagster.job_name=customer_360
evidence.run_control.dagster.status=pass
evidence.run_control.namespace=floe-dev
evidence.run_control.runtime_context=local
evidence.observability.lineage.status=pass
evidence.observability.lineage.count=<positive integer>
evidence.observability.lineage.product_run_count=<positive integer>
evidence.observability.lineage.model_table_count=<positive integer>
evidence.observability.lineage.dataset_count=<positive integer>
evidence.observability.lineage.lineage_graph_depth=<connected hop count, at least 2>
evidence.observability.lineage.lineage_graph_requested_depth=3
evidence.observability.lineage.lineage_graph_count=<positive integer>
evidence.observability.logs.status=pass
evidence.observability.logs.count=<positive integer>
evidence.observability.metrics.status=pass
evidence.observability.metrics.count=<positive integer>
evidence.observability.run_id=<same run id>
evidence.observability.traces.status=pass
evidence.observability.traces.count=<positive integer>
evidence.platform.ready=true
evidence.storage.customer_360_outputs=true
evidence.storage.iceberg.customer_360_outputs=true
evidence.tracing.jaeger_customer_360=true
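
To check a captured run against the expected evidence, a small helper can look up individual keys. This is a sketch only: it assumes the validator output was saved to a file as key=value lines like those shown above, and the function names evidence_value and require_evidence are hypothetical.

```shell
# Assumption: validator output was saved as key=value lines, for example:
#   make demo-customer-360-validate > evidence.txt

# Print the value recorded for an exact key; prints nothing when the key is absent.
evidence_value() {
  awk -v key="$1" 'index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }' "$2"
}

# Return non-zero with a message unless the key has exactly the expected value.
require_evidence() {
  actual="$(evidence_value "$1" "$3")"
  if [ "$actual" != "$2" ]; then
    echo "FAIL $1: expected '$2', got '${actual}'" >&2
    return 1
  fi
}

# Example usage against a captured file:
#   require_evidence status PASS evidence.txt
#   require_evidence evidence.observability.lineage.status pass evidence.txt
```

Exact-prefix matching keeps evidence.storage.customer_360_outputs distinct from evidence.storage.iceberg.customer_360_outputs.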

The evidence maps to the release surfaces as follows:

  • Business evidence comes from querying the generated Customer 360 mart metrics.
  • Dagster evidence proves the configured customer-360 run completed.
  • Log evidence proves the log backend has structured records for the product and run ID.
  • Metric evidence proves Prometheus-compatible series exist for the product, status, and plugin.
  • Lineage evidence proves Marquez has namespace-scoped product run evidence, model/table run evidence linked to that run, materialized dataset evidence, and lineage graph depth for the Customer 360 table.
  • Storage evidence proves the expected Iceberg output table is readable.
  • Tracing evidence proves Jaeger contains Customer 360 run traces by service, product, and run ID.

| Service | Alpha classification | Check | Pass criteria |
| --- | --- | --- | --- |
| Dagster | UI and API | Open run history or query GraphQL runs | Latest Customer 360 run succeeded and carries the expected product/run context |
| MinIO | UI and API | Open object browser or query the configured object store | Customer 360 output data and Iceberg metadata objects are visible |
| Marquez | API/admin only in the current Floe chart | Query namespace, jobs, runs, datasets, and lineage API endpoints | Product run evidence exists, model/table runs exist, and ParentRunFacet linkage points at the product/Dagster run |
| Loki | API-only | Query /ready and /loki/api/v1/query_range by product and run ID | Logs include customer-360 and the current dagster.run_id |
| Prometheus | API and optional UI | Query floe_asset_materializations_total by floe_product_name, floe_status, and floe_plugin_name | Fresh samples exist for customer-360 with floe_status="success" |
| Grafana | Optional UI | Inspect only if the platform provisions curated dashboards backed by the active datasource | Panels shown in the alpha demo use validated Loki or Prometheus queries and are not empty because of datasource drift |
| Jaeger | UI and API | Search service customer-360 with tags floe.product.name and floe.run.id, then inspect model/table spans | Trace exists for the current run and includes runtime/plugin spans plus floe.table.name or dbt model evidence for mart_customer_360 |
| Polaris | API, UI when provisioned by the selected catalog profile | Query the catalog for Customer 360 tables | Customer 360 tables are registered |
| Cube | Not currently part of the default alpha proof | Validate only when the semantic layer is enabled for the platform | Semantic queries prove access to Customer 360 metrics, not only process health |

A 404 from the root path / is not a failure for API-only alpha surfaces as long as the documented health and query endpoints pass. This is expected for the Marquez deployment in the current Floe chart and for Loki. The contributor make demo lane exposes the Loki and Prometheus API endpoints directly by default; Grafana is a curated presentation surface only when the platform provisions validated dashboards.

Useful manual queries:

{service_name=~".+"} |= "customer-360" |= "<dagster.run_id>"

Loki API examples:

curl -fsS http://localhost:3101/ready
curl -fsS 'http://localhost:3101/loki/api/v1/query_range' \
--get \
--data-urlencode 'query={service_name=~".+"} |= "customer-360" |= "<dagster.run_id>"' \
--data-urlencode 'limit=20' | jq .

Prometheus query:

floe_asset_materializations_total{
  floe_product_name="customer-360",
  floe_status="success",
  floe_plugin_name=~".+"
}
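
The floe_asset_materializations_total selector can also be sent to the Prometheus HTTP API instant-query endpoint (/api/v1/query), in the same style as the Loki curl example. This is a sketch under assumptions: PROMETHEUS_URL must be set to whatever address your environment exposes, and the function names are hypothetical.

```shell
# Build the PromQL selector for a product name (hypothetical helper).
floe_materialization_selector() {
  printf 'floe_asset_materializations_total{floe_product_name="%s",floe_status="success",floe_plugin_name=~".+"}' "$1"
}

# Send the selector to the Prometheus instant-query endpoint.
# Assumption: PROMETHEUS_URL holds the Prometheus base URL for your environment.
query_floe_materializations() {
  curl -fsS "${PROMETHEUS_URL:?set PROMETHEUS_URL to the Prometheus base URL}/api/v1/query" \
    --get \
    --data-urlencode "query=$(floe_materialization_selector "$1")"
}

# Example usage:
#   PROMETHEUS_URL=<prometheus-base-url> query_floe_materializations customer-360 | jq .
```

An empty result array in the response means no matching series were found, which corresponds to missing metric evidence rather than an unreachable backend.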

Jaeger API query shape:

service=customer-360
tags={"floe.product.name":"customer-360","floe.run.id":"<dagster.run_id>"}

After finding the run trace, inspect spans for floe.table.name=mart_customer_360 or equivalent dbt model span evidence for mart_customer_360.
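
The query shape above can be sketched as a curl call against the Jaeger query service. This assumes the /api/traces endpoint that the Jaeger UI itself uses, which is not a formally stable API, and a JAEGER_URL variable that you set for your environment; the function names are hypothetical.

```shell
# Build the tags filter as a JSON object for a given Dagster run ID (hypothetical helper).
jaeger_tags() {
  printf '{"floe.product.name":"customer-360","floe.run.id":"%s"}' "$1"
}

# Search traces for the customer-360 service filtered by product and run ID.
# Assumption: JAEGER_URL holds the Jaeger query base URL for your environment.
search_customer_360_traces() {
  curl -fsS "${JAEGER_URL:?set JAEGER_URL to the Jaeger query base URL}/api/traces" \
    --get \
    --data-urlencode 'service=customer-360' \
    --data-urlencode "tags=$(jaeger_tags "$1")" \
    --data-urlencode 'limit=20'
}

# Example usage:
#   JAEGER_URL=<jaeger-base-url> search_customer_360_traces '<dagster.run_id>' | jq .
```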

Marquez evidence must include both the product job run, usually namespace=customer-360 job=customer-360, and model/table run records for mart_customer_360 whose ParentRunFacet points at the same Dagster run ID. The validator also queries namespace, dataset, and lineage graph APIs so root or UI availability is never treated as lineage proof.

Marquez API examples:

curl -fsS http://localhost:5100/api/v1/namespaces/customer-360 | jq .
curl -fsS http://localhost:5100/api/v1/namespaces/customer-360/jobs | jq .
curl -fsS http://localhost:5100/api/v1/namespaces/customer-360/jobs/customer-360/runs | jq .
curl -fsS http://localhost:5100/api/v1/namespaces/customer-360/datasets | jq .
curl -fsS 'http://localhost:5100/api/v1/lineage?nodeId=dataset:customer-360:customer_360.main.mart_customer_360&depth=3' | jq .

Current contract-gap classes are explicit follow-ups rather than softened passes:

  • marquez_model_table_run_detail: product run evidence exists, but the runtime has not emitted model/table runs linked to the product run. Tracked by #368.
  • marquez_dataset_detail: product and model/table runs exist, but Marquez does not expose the materialized Customer 360 dataset through the namespace dataset API. Tracked by #368 and #362.
  • marquez_lineage_graph_detail: product, model/table, and dataset evidence exist, but the lineage graph for mart_customer_360 has no queryable depth. Tracked by #368 and #362.

Related alpha follow-ups:

  • #368: emit model/table-linked OpenLineage and dataset graph depth for Customer 360.
  • #362: provide first-class catalog and lineage inspection paths.
  • #360: add model/table-level runtime traces for Customer 360.

Use the validator status or expanded alpha failure class to decide where to debug first. The current validator emits the existing runtime statuses; new validator work should use the full alpha class set below.

| Status | Meaning | First action |
| --- | --- | --- |
| backend_unreachable | The backend API, service URL, tunnel, or collector path is unavailable | Check service pods, URLs, and port-forwards before rerunning the product |
| no_fresh_evidence | The backend is reachable but returned no records for the expected product/run/table | Confirm the run ID and that the relevant signal exporter is enabled |
| stale_evidence | Records exist only outside the freshness window | Trigger a new Customer 360 run and validate against the new run ID |
| wrong_context | Records exist but match another product, run, or table | Check FLOE_DEMO_RUN_ID, the validation manifest, and service URLs |
| product_failure | Evidence shows the Customer 360 run or model/table execution failed | Debug Dagster/dbt/storage output before investigating observability backends |
| platform_service_failure | A required platform service is deployed but unhealthy or returning service-level errors | Check the service pod logs, readiness, and backend-specific health endpoint |
| dashboard_datasource_drift | Grafana panel queries fail or return empty results through the configured datasource | Compare the panel datasource with the backend API that returns live evidence |
| contract_gap | The current runtime or backend cannot produce a required alpha evidence family yet | Track the missing signal as an implementation gap instead of treating the product run as failed |
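
For scripted triage, the status table can be encoded as a lookup. The following is an illustrative sketch only: the function name first_action is hypothetical, and the validator does not necessarily expose statuses in a form that can be passed to it directly.

```shell
# Map an alpha failure class to the first debugging action from the table above.
first_action() {
  case "$1" in
    backend_unreachable)        echo "Check service pods, URLs, and port-forwards before rerunning the product" ;;
    no_fresh_evidence)          echo "Confirm the run ID and that the relevant signal exporter is enabled" ;;
    stale_evidence)             echo "Trigger a new Customer 360 run and validate against the new run ID" ;;
    wrong_context)              echo "Check FLOE_DEMO_RUN_ID, the validation manifest, and service URLs" ;;
    product_failure)            echo "Debug Dagster/dbt/storage output before investigating observability backends" ;;
    platform_service_failure)   echo "Check the service pod logs, readiness, and backend-specific health endpoint" ;;
    dashboard_datasource_drift) echo "Compare the panel datasource with the backend API that returns live evidence" ;;
    contract_gap)               echo "Track the missing signal as an implementation gap instead of treating the product run as failed" ;;
    *)                          echo "Unknown status: $1" >&2; return 1 ;;
  esac
}

# Example usage:
#   first_action stale_evidence
```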