ADR-0022: Security & RBAC Model
Status
Accepted
RFC 2119 Compliance: This ADR uses MUST/SHOULD/MAY keywords per RFC 2119. See glossary.
Context
floe deploys data pipeline infrastructure across multiple Kubernetes namespaces with several interconnected services (Dagster, Polaris, Cube, MinIO). Without a documented security model, organizations cannot:
- Understand the trust boundaries between components
- Apply least-privilege principles consistently
- Configure authentication for API access
- Implement network segmentation
- Meet compliance requirements (SOC2, ISO 27001)
Key security requirements:
- Isolation: Job pods MUST NOT access data outside their assigned namespace; isolation is enforced via NetworkPolicy and RBAC
- Least Privilege: Services MUST have only the minimum permissions they require
- Authentication: APIs MUST authenticate requests
- Network Segmentation: Traffic flow between layers MUST be controlled
- Auditability: All access MUST be logged to OpenTelemetry backends for audit trails
Decision
Implement a layered security model aligned with the four-layer architecture:
- Kubernetes RBAC for internal cluster access
- API Authentication for external service access
- Network Policies for traffic segmentation
- Pod Security Standards for workload hardening
Consequences
Positive
- Defense in depth: Multiple security layers
- Clear boundaries: Each layer has defined trust boundaries
- Audit trail: All access logged via K8s audit + OTel
- Compliance ready: Supports SOC2 and ISO 27001 controls for access control, network segmentation, and audit logging
Negative
- Complexity: More configuration to manage
- Debugging: Network policies can complicate troubleshooting
- Performance: mTLS adds latency (if enabled)
Neutral
- Standard Kubernetes security patterns
- Optional service mesh for enhanced security
Namespace Strategy
floe uses dedicated namespaces per architectural layer:

    ┌──────────────────────────────────────────────────────┐
    │                  KUBERNETES CLUSTER                  │
    │                                                      │
    │ ┌──────────────────────────────────────────────────┐ │
    │ │ floe-platform (Layer 3 - Platform Services)      │ │
    │ │ • Dagster (webserver, daemon)                    │ │
    │ │ • Polaris (catalog API)                          │ │
    │ │ • Cube (semantic layer)                          │ │
    │ │ • MinIO (object storage)                         │ │
    │ │ • OTel Collector, Prometheus, Grafana            │ │
    │ │ • PostgreSQL instances (for Dagster, Polaris)    │ │
    │ └──────────────────────────────────────────────────┘ │
    │ ┌──────────────────────────────────────────────────┐ │
    │ │ floe-jobs (Layer 4 - Ephemeral Job Pods)         │ │
    │ │ • dbt run jobs                                   │ │
    │ │ • dlt ingestion jobs                             │ │
    │ │ • Data quality jobs                              │ │
    │ └──────────────────────────────────────────────────┘ │
    │ ┌──────────────────────────────────────────────────┐ │
    │ │ floe-<domain>-domain (Data Mesh - per domain)    │ │
    │ │ • Domain-specific jobs                           │ │
    │ │ • Domain service accounts                        │ │
    │ └──────────────────────────────────────────────────┘ │
    └──────────────────────────────────────────────────────┘

Namespace Configuration
    # floe-platform namespace
    apiVersion: v1
    kind: Namespace
    metadata:
      name: floe-platform
      labels:
        app.kubernetes.io/part-of: floe
        floe.dev/layer: "3"
        pod-security.kubernetes.io/enforce: baseline
        pod-security.kubernetes.io/audit: restricted
        pod-security.kubernetes.io/warn: restricted
    ---
    # floe-jobs namespace
    apiVersion: v1
    kind: Namespace
    metadata:
      name: floe-jobs
      labels:
        app.kubernetes.io/part-of: floe
        floe.dev/layer: "4"
        pod-security.kubernetes.io/enforce: restricted
        pod-security.kubernetes.io/audit: restricted
        pod-security.kubernetes.io/warn: restricted

Service Accounts
Platform Service Accounts
| Service Account | Namespace | Purpose | Permissions |
|---|---|---|---|
| floe-platform-admin | floe-platform | Platform management | Full namespace admin |
| floe-dagster | floe-platform | Dagster webserver/daemon | Create jobs in floe-jobs, read secrets |
| floe-polaris | floe-platform | Polaris catalog | Read/write catalog secrets, S3 access |
| floe-cube | floe-platform | Cube semantic layer | Read catalog, read secrets |
| floe-minio | floe-platform | MinIO storage | PVC access |
Job Service Accounts
| Service Account | Namespace | Purpose | Permissions |
|---|---|---|---|
| floe-job-runner | floe-jobs | dbt/dlt jobs | Read secrets, emit telemetry |
| floe-job-<domain> | floe- | Domain jobs | Domain-scoped secrets |
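The Permissions columns above translate into Kubernetes RBAC rules. As a rough illustration of how a rule with `resourceNames` narrows "read secrets" to specific objects, the following Python sketch models rule evaluation for the job runner role. The `Rule` class and the `allows` function are hypothetical helpers, not part of floe or Kubernetes, and the model deliberately ignores the wildcards and subresources that real RBAC supports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Simplified stand-in for one entry in a Role's `rules` list."""
    api_groups: tuple
    resources: tuple
    verbs: tuple
    resource_names: tuple = ()  # empty tuple means "any object name"


def allows(rules, api_group, resource, verb, name=None):
    """Return True if any rule permits the request (RBAC rules only allow)."""
    for rule in rules:
        if api_group not in rule.api_groups:
            continue
        if resource not in rule.resources:
            continue
        if verb not in rule.verbs:
            continue
        if rule.resource_names and name not in rule.resource_names:
            continue
        return True
    return False


# Mirrors the floe-job-runner-role rules defined in this ADR.
JOB_RUNNER_RULES = [
    Rule(("",), ("secrets",), ("get",),
         ("compute-credentials", "catalog-credentials")),
    Rule(("",), ("configmaps",), ("get",), ("floe-job-config",)),
]

print(allows(JOB_RUNNER_RULES, "", "secrets", "get", "catalog-credentials"))  # True
print(allows(JOB_RUNNER_RULES, "", "secrets", "get", "cube-jwt-secret"))      # False
print(allows(JOB_RUNNER_RULES, "", "secrets", "list"))                        # False
```

Because the rules name individual secrets, a job pod can fetch its own credentials but cannot list or read any other secret in the namespace.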
Service Account Definitions
    # Dagster service account with job creation permissions
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: floe-dagster
      namespace: floe-platform
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: floe-dagster-role
      namespace: floe-jobs
    rules:
      - apiGroups: ["batch"]
        resources: ["jobs"]
        verbs: ["create", "get", "list", "watch", "delete"]
      - apiGroups: [""]
        resources: ["pods", "pods/log"]
        verbs: ["get", "list", "watch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: floe-dagster-job-creator
      namespace: floe-jobs
    subjects:
      - kind: ServiceAccount
        name: floe-dagster
        namespace: floe-platform
    roleRef:
      kind: Role
      name: floe-dagster-role
      apiGroup: rbac.authorization.k8s.io
    ---
    # Job runner service account (minimal permissions)
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: floe-job-runner
      namespace: floe-jobs
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: floe-job-runner-role
      namespace: floe-jobs
    rules:
      - apiGroups: [""]
        resources: ["secrets"]
        verbs: ["get"]
        resourceNames: ["compute-credentials", "catalog-credentials"]
      - apiGroups: [""]
        resources: ["configmaps"]
        verbs: ["get"]
        resourceNames: ["floe-job-config"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: floe-job-runner-binding
      namespace: floe-jobs
    subjects:
      - kind: ServiceAccount
        name: floe-job-runner
        namespace: floe-jobs
    roleRef:
      kind: Role
      name: floe-job-runner-role
      apiGroup: rbac.authorization.k8s.io

API Authentication
Polaris Catalog (OAuth2)
Polaris uses OAuth2 Client Credentials flow for service-to-service authentication:

    ┌─────────────────┐  1. Client Credentials   ┌─────────────────┐
    │     Job Pod     │ ────────────────────────►│     Polaris     │
    │    (dbt/dlt)    │                          │     Catalog     │
    └─────────────────┘                          └────────┬────────┘
            │                                             │
            │   2. Access Token (JWT)                     │
            │◄────────────────────────────────────────────┘
            │
            │   3. Catalog API calls with Bearer token
            ▼
    ┌─────────────────┐
    │ Iceberg Tables  │
    └─────────────────┘

Configuration:

    plugins:
      catalog:
        type: polaris
        config:
          uri: http://polaris.floe-platform.svc.cluster.local:8181
          auth:
            type: oauth2
            client_id_ref: polaris-client-credentials
            client_secret_ref: polaris-client-credentials
            token_endpoint: http://polaris.floe-platform.svc.cluster.local:8181/api/catalog/v1/oauth/tokens

Cube Semantic Layer (JWT)
Cube uses JWT with security context for row-level security:

    # Cube configuration
    security:
      jwt:
        key_ref: cube-jwt-secret
        algorithms: ["HS256"]
        claims_namespace: "https://floe.dev/"

    # Security context in JWT payload
    {
      "sub": "service-account:floe-job-runner",
      "https://floe.dev/namespace": "sales.customer-360",
      "https://floe.dev/roles": ["data_reader"],
      "exp": 1704067200
    }

Row-Level Security in Cube:
    // cube.js security context
    cube(`orders`, {
      sql: `SELECT * FROM iceberg.gold.orders`,

      dimensions: {
        namespace: {
          sql: `namespace`,
          type: `string`,
        },
      },

      // Filter by namespace from JWT
      queryRewrite: (query, { securityContext }) => {
        if (securityContext.namespace) {
          query.filters.push({
            member: `orders.namespace`,
            operator: `equals`,
            values: [securityContext.namespace],
          });
        }
        return query;
      },
    });

Cross-Service Authentication Flow
    AUTHENTICATION FLOW

    1. Dagster schedules job
       │
       ▼
    2. Job pod created with ServiceAccount (floe-job-runner)
       │
       ├─► 3a. Read secrets from K8s (mounted as env vars)
       │
       ├─► 3b. Authenticate to Polaris (OAuth2 → JWT)
       │       └─► Polaris vends short-lived S3 credentials
       │
       ├─► 3c. Execute dbt (uses Polaris-vended credentials)
       │
       └─► 3d. Emit telemetry to OTel Collector (see note below)

Telemetry Emission Security
Telemetry emission to the OTel Collector uses network-policy-based protection by default, with an optional service mesh upgrade path for organizations requiring authenticated internal traffic.
Default Configuration (Network Policy Protection)
    # Network policy restricts OTel Collector ingress to the floe-jobs and
    # floe-platform namespaces
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: otel-collector-ingress
      namespace: floe-platform
    spec:
      podSelector:
        matchLabels:
          app: otel-collector
      policyTypes:
        - Ingress
      ingress:
        # Allow from the floe-jobs namespace. The selector uses the
        # kubernetes.io/metadata.name label that Kubernetes sets on every
        # namespace, because the namespaces in this ADR carry no `name` label.
        - from:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: floe-jobs
          ports:
            - protocol: TCP
              port: 4317  # OTLP gRPC
            - protocol: TCP
              port: 4318  # OTLP HTTP
        # Allow from floe-platform (internal services)
        - from:
            - podSelector: {}
          ports:
            - protocol: TCP
              port: 4317
            - protocol: TCP
              port: 4318

Security rationale: This approach is common in Kubernetes observability deployments where:
- Network policies enforce namespace-level isolation
- Telemetry data flows are internal (pod-to-service within cluster)
- The OTel Collector is not exposed externally
- Telemetry injection would require already having cluster access
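The namespace-level isolation described above depends on label selectors matching the labels a namespace actually carries. The sketch below is a simplified Python model of that matching, written for illustration only: `selector_matches`, `ingress_allowed`, and `OTEL_RULES` are hypothetical names, and the model covers only `matchLabels` and ports, not the full NetworkPolicy semantics. It assumes namespaces are identified by the `kubernetes.io/metadata.name` label that Kubernetes sets automatically.

```python
def selector_matches(match_labels, labels):
    """A matchLabels selector matches when every key/value pair is present."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def ingress_allowed(rules, namespace_labels, port):
    """Return True if any rule admits traffic from the namespace on the port."""
    return any(
        selector_matches(rule["namespace_selector"], namespace_labels)
        and port in rule["ports"]
        for rule in rules
    )


# Hypothetical rule modelled on the OTel Collector ingress policy.
OTEL_RULES = [
    {
        "namespace_selector": {"kubernetes.io/metadata.name": "floe-jobs"},
        "ports": {4317, 4318},
    },
]

jobs_ns = {"kubernetes.io/metadata.name": "floe-jobs", "floe.dev/layer": "4"}
other_ns = {"kubernetes.io/metadata.name": "default"}

print(ingress_allowed(OTEL_RULES, jobs_ns, 4317))   # True
print(ingress_allowed(OTEL_RULES, other_ns, 4317))  # False
print(ingress_allowed(OTEL_RULES, jobs_ns, 9000))   # False
```

A selector that names a label the namespace does not have matches nothing, so a mistyped label key silently blocks traffic rather than opening it.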
Authenticated Telemetry (Service Mesh)
For organizations with strict compliance requirements (e.g., zero-trust networking, FedRAMP), enable authenticated telemetry via service mesh:

    # Istio configuration for authenticated telemetry
    apiVersion: security.istio.io/v1beta1
    kind: PeerAuthentication
    metadata:
      name: otel-collector-mtls
      namespace: floe-platform
    spec:
      selector:
        matchLabels:
          app: otel-collector
      mtls:
        mode: STRICT  # Require mTLS for all connections
    ---
    apiVersion: security.istio.io/v1beta1
    kind: AuthorizationPolicy
    metadata:
      name: otel-collector-authz
      namespace: floe-platform
    spec:
      selector:
        matchLabels:
          app: otel-collector
      action: ALLOW
      rules:
        - from:
            - source:
                # Only allow job pods and platform services
                principals:
                  - "cluster.local/ns/floe-jobs/sa/floe-job-runner"
                  - "cluster.local/ns/floe-platform/sa/*"
          to:
            - operation:
                ports: ["4317", "4318"]

Benefits of authenticated telemetry:
- mTLS encrypts telemetry in transit
- Service identity verification prevents telemetry injection
- Audit trail of which service accounts emitted telemetry
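Service identity verification relies on matching the caller's mTLS principal against the allowed list, where a trailing `*` acts as a prefix wildcard. The following Python sketch approximates that string matching (exact, prefix, suffix, and presence) for the two principals used in this ADR. The function names are hypothetical and the sketch is not Istio's implementation.

```python
def principal_matches(pattern, principal):
    """Approximate Istio-style string matching for a single principal."""
    if not principal:
        return False  # a peer without mTLS identity has no principal
    if pattern == "*":
        return True  # presence match: any authenticated principal
    if pattern.endswith("*"):
        return principal.startswith(pattern[:-1])  # prefix match
    if pattern.startswith("*"):
        return principal.endswith(pattern[1:])  # suffix match
    return principal == pattern  # exact match


# Principals taken from the AuthorizationPolicy in this ADR.
ALLOWED_PRINCIPALS = [
    "cluster.local/ns/floe-jobs/sa/floe-job-runner",
    "cluster.local/ns/floe-platform/sa/*",
]


def is_allowed(principal):
    """Return True if the principal matches any allowed pattern."""
    return any(principal_matches(p, principal) for p in ALLOWED_PRINCIPALS)


print(is_allowed("cluster.local/ns/floe-jobs/sa/floe-job-runner"))   # True
print(is_allowed("cluster.local/ns/floe-platform/sa/floe-dagster"))  # True
print(is_allowed("cluster.local/ns/default/sa/unknown"))             # False
```

The wildcard admits every service account in floe-platform, which is convenient but broader than listing each account; narrowing it is a trade-off between maintenance effort and least privilege.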
Configuration Selection
| Requirement | Solution | Configuration |
|---|---|---|
| Standard deployment | Network policies | Default (no additional config) |
| Zero-trust / FedRAMP | Service mesh + mTLS | security.service_mesh.mtls: strict |
| Air-gapped / regulated | Service mesh + mTLS + audit | Full service mesh with access logging |
    security:
      telemetry:
        # Options: network_policy (default) | service_mesh
        protection: network_policy

      # Enable service mesh for authenticated internal traffic
      service_mesh:
        enabled: false  # Set to true for mTLS on all internal traffic
        type: istio
        mtls: strict

Recommendation: Most deployments SHOULD use the default network policy protection. A service mesh (Istio/Linkerd) SHOULD be enabled only when compliance requires mTLS or advanced traffic controls.
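To make the selection concrete, the sketch below shows one way a deployment tool might resolve the effective telemetry protection from the `security` section. It is an assumption-laden illustration: `telemetry_protection` is a hypothetical helper, and the precedence it applies (an enabled service mesh overrides `telemetry.protection`) is an assumption that this ADR does not state.

```python
def telemetry_protection(security):
    """Resolve the effective telemetry protection mode from a config dict."""
    mesh = security.get("service_mesh") or {}
    if mesh.get("enabled"):
        # Assumed precedence: an enabled mesh implies authenticated telemetry.
        return "service_mesh"
    telemetry = security.get("telemetry") or {}
    # Fall back to the documented default when nothing is configured.
    return telemetry.get("protection", "network_policy")


default_config = {
    "telemetry": {"protection": "network_policy"},
    "service_mesh": {"enabled": False, "type": "istio", "mtls": "strict"},
}
strict_config = {
    "service_mesh": {"enabled": True, "type": "istio", "mtls": "strict"},
}

print(telemetry_protection(default_config))  # network_policy
print(telemetry_protection(strict_config))   # service_mesh
print(telemetry_protection({}))              # network_policy
```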
Network Policies
Default Deny Policy

    # Default deny all ingress/egress in floe-jobs
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: default-deny-all
      namespace: floe-jobs
    spec:
      podSelector: {}
      policyTypes:
        - Ingress
        - Egress

Allow Job → Platform Services
    # Allow job pods to access platform services.
    # Namespace selectors use the kubernetes.io/metadata.name label that
    # Kubernetes sets on every namespace, because the namespaces in this ADR
    # carry no `name` label.
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-jobs-to-platform
      namespace: floe-jobs
    spec:
      podSelector:
        matchLabels:
          floe.dev/job-type: data-pipeline
      policyTypes:
        - Egress
      egress:
        # Allow to Polaris catalog
        - to:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: floe-platform
              podSelector:
                matchLabels:
                  app: polaris
          ports:
            - protocol: TCP
              port: 8181

        # Allow to OTel Collector
        - to:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: floe-platform
              podSelector:
                matchLabels:
                  app: otel-collector
          ports:
            - protocol: TCP
              port: 4317  # OTLP gRPC
            - protocol: TCP
              port: 4318  # OTLP HTTP

        # Allow to MinIO
        - to:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: floe-platform
              podSelector:
                matchLabels:
                  app: minio
          ports:
            - protocol: TCP
              port: 9000

        # Allow DNS resolution
        - to:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: kube-system
              podSelector:
                matchLabels:
                  k8s-app: kube-dns
          ports:
            - protocol: UDP
              port: 53

Allow External Compute (Cloud DWH)
    # Allow jobs to connect to external data warehouses (Snowflake, BigQuery)
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-external-compute
      namespace: floe-jobs
    spec:
      podSelector:
        matchLabels:
          floe.dev/job-type: data-pipeline
      policyTypes:
        - Egress
      egress:
        # Snowflake (HTTPS)
        - to:
            - ipBlock:
                cidr: 0.0.0.0/0
          ports:
            - protocol: TCP
              port: 443

Platform Services Internal Communication
    # Allow platform services to communicate internally
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-platform-internal
      namespace: floe-platform
    spec:
      podSelector: {}
      policyTypes:
        - Ingress
        - Egress
      ingress:
        # Allow from same namespace
        - from:
            - podSelector: {}
      egress:
        # Allow to same namespace
        - to:
            - podSelector: {}
        # Allow DNS
        - to:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: kube-system
          ports:
            - protocol: UDP
              port: 53

Network Policy Summary
| Source | Destination | Ports | Status |
|---|---|---|---|
| floe-jobs | floe-platform/polaris | 8181 | ALLOW |
| floe-jobs | floe-platform/otel-collector | 4317, 4318 | ALLOW |
| floe-jobs | floe-platform/minio | 9000 | ALLOW |
| floe-jobs | external (HTTPS) | 443 | ALLOW (compute targets) |
| floe-jobs | * | * | DENY (default) |
| floe-platform | floe-platform | * | ALLOW (internal) |
| external | floe-platform | ingress | ALLOW (via Ingress) |
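The floe-jobs rows of the summary can be read as an allow-list evaluated before a default deny. The Python sketch below encodes those rows as data and looks up the status of a flow. It is purely illustrative: `ALLOWED_FLOWS` and `flow_status` are hypothetical names, and destinations are matched by the labels used in the table rather than by real pod or namespace selectors.

```python
# Allow-list for traffic leaving floe-jobs, taken from the summary table.
ALLOWED_FLOWS = [
    ("floe-jobs", "floe-platform/polaris", {8181}),
    ("floe-jobs", "floe-platform/otel-collector", {4317, 4318}),
    ("floe-jobs", "floe-platform/minio", {9000}),
    ("floe-jobs", "external", {443}),
]


def flow_status(source, destination, port):
    """Return "ALLOW" for an explicitly listed flow, otherwise "DENY"."""
    for src, dst, ports in ALLOWED_FLOWS:
        if src == source and dst == destination and port in ports:
            return "ALLOW"
    return "DENY"  # default deny: anything not listed is blocked


print(flow_status("floe-jobs", "floe-platform/polaris", 8181))   # ALLOW
print(flow_status("floe-jobs", "floe-platform/dagster", 3000))   # DENY
print(flow_status("floe-jobs", "external", 80))                  # DENY
```

A table-driven check like this can double as a test fixture when verifying that deployed policies match the documented intent.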
Pod Security Standards
Job Pods (Restricted)
Job pods run with the restricted Pod Security Standard:

    # Job pod security context
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: dbt-run-customer-360
      namespace: floe-jobs
    spec:
      template:
        spec:
          serviceAccountName: floe-job-runner
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            runAsGroup: 1000
            fsGroup: 1000
            seccompProfile:
              type: RuntimeDefault
          containers:
            - name: dbt
              image: ghcr.io/floe/dbt:1.7
              securityContext:
                allowPrivilegeEscalation: false
                readOnlyRootFilesystem: true
                capabilities:
                  drop: ["ALL"]
              volumeMounts:
                - name: tmp
                  mountPath: /tmp
                - name: dbt-home
                  mountPath: /home/dbt
          volumes:
            - name: tmp
              emptyDir: {}
            - name: dbt-home
              emptyDir: {}

Platform Services (Baseline)
Platform services run with the baseline Pod Security Standard, because some services require additional capabilities:

    # Platform pod security (example: Dagster)
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
        - name: dagster-webserver
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
              add: ["NET_BIND_SERVICE"]  # If binding to port < 1024

Service Mesh (Optional)
For enhanced security, organizations can deploy a service mesh:
Istio Configuration

    # Require mTLS for all workloads in the floe-platform namespace
    # (a PeerAuthentication is namespace-scoped; repeat it per floe namespace)
    apiVersion: security.istio.io/v1beta1
    kind: PeerAuthentication
    metadata:
      name: floe-mtls
      namespace: floe-platform
    spec:
      mtls:
        mode: STRICT
    ---
    # Authorization policy
    apiVersion: security.istio.io/v1beta1
    kind: AuthorizationPolicy
    metadata:
      name: polaris-authz
      namespace: floe-platform
    spec:
      selector:
        matchLabels:
          app: polaris
      action: ALLOW
      rules:
        - from:
            - source:
                principals:
                  - "cluster.local/ns/floe-jobs/sa/floe-job-runner"
                  - "cluster.local/ns/floe-platform/sa/floe-dagster"
          to:
            - operation:
                methods: ["GET", "POST", "PUT", "DELETE"]
                paths: ["/api/*"]

Benefits of Service Mesh
| Feature | Without Mesh | With Mesh |
|---|---|---|
| mTLS | Manual certificates | Automatic |
| Traffic observability | OTel instrumentation | Automatic |
| Retries/circuit breaking | Application code | Configuration |
| Zero-trust networking | Network policies | + identity-based |
Audit Logging
Kubernetes Audit Policy

    apiVersion: audit.k8s.io/v1
    kind: Policy
    rules:
      # Log all secret access
      - level: Metadata
        resources:
          - group: ""
            resources: ["secrets"]
        namespaces: ["floe-platform", "floe-jobs"]

      # Log all job creation
      - level: RequestResponse
        resources:
          - group: "batch"
            resources: ["jobs"]
        namespaces: ["floe-jobs"]

      # Log RBAC changes
      - level: RequestResponse
        resources:
          - group: "rbac.authorization.k8s.io"
            resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]

Application Audit Events
    # Emit audit events via OpenTelemetry
    from datetime import datetime, timezone

    from opentelemetry import trace

    tracer = trace.get_tracer("floe.audit")


    def audit_catalog_access(user: str, table: str, operation: str) -> None:
        """Record a catalog access as a span carrying audit attributes."""
        with tracer.start_as_current_span("audit.catalog_access") as span:
            span.set_attribute("audit.user", user)
            span.set_attribute("audit.table", table)
            span.set_attribute("audit.operation", operation)
            # Timestamp in UTC so audit records compare across clusters
            span.set_attribute("audit.timestamp", datetime.now(timezone.utc).isoformat())

Configuration Schema
    # manifest.yaml security section
    security:
      # Pod Security Standards enforcement
      pod_security:
        platform_level: baseline   # For floe-platform namespace
        jobs_level: restricted     # For floe-jobs namespace

      # Network policy mode
      network_policies:
        enabled: true
        default_deny: true          # Default deny in job namespace
        allow_external_https: true  # Allow jobs to reach cloud DWH

      # Service mesh (optional)
      service_mesh:
        enabled: false
        type: istio    # Options: istio | linkerd
        mtls: strict   # Options: strict | permissive

      # API authentication
      api_auth:
        polaris:
          type: oauth2
          client_id_ref: polaris-credentials
          client_secret_ref: polaris-credentials
        cube:
          type: jwt
          secret_ref: cube-jwt-secret
          algorithms: ["HS256"]

      # Audit configuration
      audit:
        enabled: true
        secret_access: true
        job_lifecycle: true

Security Checklist
Pre-Deployment
- Namespaces created with correct Pod Security labels
- Service accounts created with least-privilege roles
- Network policies deployed and tested
- Secrets created (not committed to git)
- TLS certificates configured for ingress
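The first checklist item, correct Pod Security labels on each namespace, lends itself to an automated check. The Python sketch below compares a namespace's labels against the levels this ADR assigns to floe-platform and floe-jobs. `EXPECTED_LEVELS` and `label_problems` are hypothetical names, and the labels are passed in as a plain dict, for example one parsed from `kubectl get namespace -o json` output.

```python
PSS_PREFIX = "pod-security.kubernetes.io/"

# Levels taken from the namespace definitions in this ADR.
EXPECTED_LEVELS = {
    "floe-platform": {"enforce": "baseline", "audit": "restricted", "warn": "restricted"},
    "floe-jobs": {"enforce": "restricted", "audit": "restricted", "warn": "restricted"},
}


def label_problems(namespace, labels):
    """Return human-readable findings for missing or wrong PSS labels."""
    problems = []
    for mode, expected in EXPECTED_LEVELS.get(namespace, {}).items():
        actual = labels.get(PSS_PREFIX + mode)
        if actual != expected:
            problems.append(
                f"{namespace}: {mode} is {actual!r}, expected {expected!r}"
            )
    return problems


good = {
    PSS_PREFIX + "enforce": "restricted",
    PSS_PREFIX + "audit": "restricted",
    PSS_PREFIX + "warn": "restricted",
}
bad = {PSS_PREFIX + "enforce": "baseline"}

print(label_problems("floe-jobs", good))  # []
print(label_problems("floe-jobs", bad))
```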
Post-Deployment
- Verify pods running as non-root
- Test network policy enforcement
- Confirm audit logging working
- Run security scan (Trivy, Kubescape)
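Scanners such as Trivy or Kubescape cover these checks in depth. For a quick spot check of the non-root requirement, the Python sketch below inspects a pod spec, given as a dict, for a subset of the restricted profile settings used by job pods in this ADR. `restricted_violations` is a hypothetical helper and is not a complete implementation of the Pod Security Standards.

```python
def restricted_violations(pod_spec):
    """Return findings for a subset of the `restricted` profile settings."""
    findings = []
    pod_sc = pod_spec.get("securityContext") or {}
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext") or {}
        # Container-level settings take precedence over pod-level ones.
        run_as_non_root = sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot"))
        seccomp = sc.get("seccompProfile") or pod_sc.get("seccompProfile") or {}
        drops = (sc.get("capabilities") or {}).get("drop", [])
        if run_as_non_root is not True:
            findings.append(f"{name}: runAsNonRoot must be true")
        if sc.get("allowPrivilegeEscalation") is not False:
            findings.append(f"{name}: allowPrivilegeEscalation must be false")
        if "ALL" not in drops:
            findings.append(f"{name}: capabilities must drop ALL")
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            findings.append(f"{name}: seccompProfile must be RuntimeDefault or Localhost")
    return findings


# Pod spec modelled on the dbt job example in this ADR.
JOB_POD_SPEC = {
    "securityContext": {
        "runAsNonRoot": True,
        "runAsUser": 1000,
        "seccompProfile": {"type": "RuntimeDefault"},
    },
    "containers": [
        {
            "name": "dbt",
            "securityContext": {
                "allowPrivilegeEscalation": False,
                "readOnlyRootFilesystem": True,
                "capabilities": {"drop": ["ALL"]},
            },
        }
    ],
}

print(restricted_violations(JOB_POD_SPEC))                       # []
print(restricted_violations({"containers": [{"name": "app"}]}))  # four findings
```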
Ongoing
- Regular secret rotation
- Dependency vulnerability scanning
- RBAC permission review
- Audit log review
References
- Kubernetes RBAC
- Pod Security Standards
- Network Policies
- Polaris Security - OAuth2 authentication
- Cube Security - JWT authentication
- ADR-0016: Platform Enforcement Architecture - Four-layer architecture
- ADR-0019: Platform Services Lifecycle - Service deployment
- ADR-0023: Secrets Management - Credential management