OCI Registry Requirements

This document describes the requirements and configuration for OCI registries used to store floe platform artifacts.

floe stores platform configuration as OCI artifacts in container registries:

oci://registry.example.com/
├── floe-platform:v1.2.3 # Platform configuration artifacts
├── floe-platform-chart:v1.2.3 # Helm chart for platform deployment
└── plugins/
    ├── floe-dagster-chart:v1.0.0 # Plugin charts
    └── floe-cube-chart:v1.0.0

Why OCI Registry?

  • Immutable, versioned storage (same as container images)
  • Universal availability (every cloud has a registry)
  • Native support in Helm 3.8+ for chart storage
  • Content-addressable (SHA256 digests ensure integrity)
  • Supports signing (cosign/sigstore)
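
The content-addressability point can be illustrated with a minimal sketch. It assumes the usual `sha256:<hex>` digest notation; the function names and the sample bytes are hypothetical and are not part of floe.

```python
import hashlib


def oci_digest(blob: bytes) -> str:
    """Return the digest of a blob in `sha256:<hex>` form."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def verify_blob(blob: bytes, expected_digest: str) -> bool:
    """True when the computed digest matches the digest we were told to expect."""
    return oci_digest(blob) == expected_digest


# Hypothetical artifact bytes, used only for illustration.
blob = b'{"name": "floe-platform", "version": "v1.2.3"}'
digest = oci_digest(blob)

print(verify_blob(blob, digest))         # unchanged content matches
print(verify_blob(blob + b" ", digest))  # any modification changes the digest
```

Because the digest is derived from the bytes themselves, pulling by digest gives the same content every time, whereas a tag is only a pointer that a registry may allow to move.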

| Registry | OCI Artifacts | Cosign Signing | Auth Method | Notes |
|---|---|---|---|---|
| Amazon ECR | Yes | Yes | IRSA / IAM | Recommended for AWS |
| Azure Container Registry | Yes | Yes | Managed Identity / SP | Recommended for Azure |
| Google Artifact Registry | Yes | Yes | Workload Identity / SA | Recommended for GCP |
| GitHub Container Registry | Yes | Yes | PAT / GITHUB_TOKEN | Good for open source |
| Harbor | Yes | Yes | LDAP / OIDC / Basic | Air-gapped ready |
| Docker Hub | Limited | Yes | PAT | Rate limits, not recommended |

Amazon ECR:

# manifest.yaml
artifacts:
  registry:
    uri: oci://123456789.dkr.ecr.us-east-1.amazonaws.com/floe
    auth:
      type: aws-irsa # Uses pod service account

ECR Policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:BatchCheckLayerAvailability",
        "ecr:PutImage",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload"
      ],
      "Resource": "arn:aws:ecr:us-east-1:123456789:repository/floe/*"
    },
    {
      "Effect": "Allow",
      "Action": "ecr:GetAuthorizationToken",
      "Resource": "*"
    }
  ]
}
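
When the same policy is needed for several accounts or regions, it can be generated rather than hand-edited. The sketch below is one possible way to do that; the helper name `ecr_policy` and its parameters are assumptions made for illustration. The second statement keeps `"Resource": "*"` because `ecr:GetAuthorizationToken` does not support resource-level permissions.

```python
import json

# Actions copied from the policy above: pull (first three) and push (last four).
PULL_PUSH_ACTIONS = [
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage",
    "ecr:BatchCheckLayerAvailability",
    "ecr:PutImage",
    "ecr:InitiateLayerUpload",
    "ecr:UploadLayerPart",
    "ecr:CompleteLayerUpload",
]


def ecr_policy(region: str, account_id: str, repo_prefix: str) -> dict:
    """Build the IAM policy document for repositories under `repo_prefix`."""
    repo_arn = f"arn:aws:ecr:{region}:{account_id}:repository/{repo_prefix}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": PULL_PUSH_ACTIONS, "Resource": repo_arn},
            # Token retrieval is account-wide, so it cannot be scoped to a repository.
            {"Effect": "Allow", "Action": "ecr:GetAuthorizationToken", "Resource": "*"},
        ],
    }


print(json.dumps(ecr_policy("us-east-1", "123456789", "floe"), indent=2))
```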

Azure Container Registry:

# manifest.yaml
artifacts:
  registry:
    uri: oci://myregistry.azurecr.io/floe
    auth:
      type: azure-managed-identity

Google Artifact Registry:

# manifest.yaml
artifacts:
  registry:
    uri: oci://us-central1-docker.pkg.dev/my-project/floe
    auth:
      type: gcp-workload-identity

GitHub Container Registry:

# manifest.yaml
artifacts:
  registry:
    uri: oci://ghcr.io/my-org/floe
    auth:
      type: token
      token_ref: ghcr-token # K8s Secret reference

Harbor:

# manifest.yaml
artifacts:
  registry:
    uri: oci://harbor.internal.company.com/floe
    auth:
      type: basic
      username_ref: harbor-credentials
      password_ref: harbor-credentials
    tls:
      insecure_skip_verify: false
      ca_cert_ref: harbor-ca-cert # Custom CA certificate
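
Each auth type in the examples above needs a different set of fields. A minimal validation sketch, assuming the `auth` block has already been parsed into a dict, could look like the following. The function name and error wording are hypothetical and do not describe floe's actual validator.

```python
# Required reference fields per auth type, taken from the examples above.
REQUIRED_FIELDS = {
    "aws-irsa": (),
    "azure-managed-identity": (),
    "gcp-workload-identity": (),
    "token": ("token_ref",),
    "basic": ("username_ref", "password_ref"),
}


def validate_auth(auth: dict) -> list[str]:
    """Return a list of problems found in an `auth` block (empty when valid)."""
    auth_type = auth.get("type")
    if auth_type not in REQUIRED_FIELDS:
        return [f"unknown auth type: {auth_type!r}"]
    return [
        f"auth type {auth_type!r} requires {field}"
        for field in REQUIRED_FIELDS[auth_type]
        if not auth.get(field)
    ]


print(validate_auth({"type": "token", "token_ref": "ghcr-token"}))
print(validate_auth({"type": "basic", "username_ref": "harbor-credentials"}))
```

Returning all problems at once, rather than raising on the first, lets a caller report every missing field in a single pass.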

floe supports artifact signing via cosign for supply chain security.

Signature verification is configurable per environment:

# manifest.yaml
artifacts:
  signing:
    enabled: true
    enforcement: warn | enforce | off
    # warn: Log warning but allow unsigned artifacts
    # enforce: Reject unsigned artifacts
    # off: No verification (development only)
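
The three enforcement modes amount to a small decision function. The sketch below is an illustration under stated assumptions: `signature_valid` stands for the result of a signature check performed elsewhere (for example by cosign), and the function, logger and exception names are hypothetical.

```python
import logging

logger = logging.getLogger("floe.signing")  # hypothetical logger name

VALID_MODES = ("warn", "enforce", "off")


class UnsignedArtifactError(Exception):
    """Raised when enforcement is 'enforce' and the signature check failed."""


def may_use_artifact(artifact_ref: str, signature_valid: bool, enforcement: str) -> bool:
    """Apply the enforcement mode to the outcome of a signature check."""
    if enforcement not in VALID_MODES:
        raise ValueError(f"enforcement must be one of {VALID_MODES}, got {enforcement!r}")
    if enforcement == "off" or signature_valid:
        return True
    if enforcement == "warn":
        logger.warning("artifact %s has no valid signature", artifact_ref)
        return True
    # enforcement == "enforce"
    raise UnsignedArtifactError(f"artifact {artifact_ref} has no valid signature")


print(may_use_artifact("floe-platform:v1.2.3", signature_valid=False, enforcement="warn"))
```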

Use OIDC-based keyless signing in GitHub Actions:

# .github/workflows/publish.yml
- name: Install cosign
  uses: sigstore/cosign-installer@v3
- name: Login to registry
  run: echo "${{ secrets.REGISTRY_TOKEN }}" | helm registry login ghcr.io -u ${{ github.actor }} --password-stdin
- name: Publish and sign
  env:
    COSIGN_EXPERIMENTAL: "true" # Enables keyless signing on cosign 1.x; cosign 2.x signs keyless by default
  run: |
    floe platform compile
    floe platform publish v${{ github.run_number }}
    # Automatic signing with GitHub OIDC identity

For environments without OIDC, use key-based signing:

# Generate key pair (one-time)
cosign generate-key-pair

# Sign during publish
floe platform publish v1.2.3 --sign --key cosign.key

# Verify during planned pull
floe init --platform=v1.2.3 --verify --key cosign.pub # planned target-state command

┌───────────────────┐   1. Pull artifact      ┌─────────────────┐
│ planned floe init │ ◄───────────────────────│  OCI Registry   │
└─────────┬─────────┘                         └─────────────────┘
          │ 2. Check signature (if enforcement enabled)
┌───────────────────┐   3. Verify signature   ┌─────────────────┐
│   Local verify    │ ◄───────────────────────│   Rekor (log)   │
└─────────┬─────────┘                         └─────────────────┘
          │ 4. Signature valid? Continue or reject
┌───────────────────┐
│  Deploy platform  │
└───────────────────┘

For environments without internet access, use bundle export/import:

# Export platform artifacts to tarball
floe platform export \
  --version=v1.2.3 \
  --output=platform-v1.2.3.tar \
  --include-charts \
  --include-signatures

Transfer platform-v1.2.3.tar to the air-gapped environment via approved media.

# Import to internal Harbor registry
floe platform import \
  --bundle=platform-v1.2.3.tar \
  --registry=oci://harbor.internal/floe

# Verify import
floe platform list --registry=oci://harbor.internal/floe

# floe.yaml (in air-gapped environment)
platform:
  ref: oci://harbor.internal/floe/platform:v1.2.3

# GitHub Actions workflow
name: Publish Platform
on:
  push:
    tags: ['v*']
jobs:
  publish:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
      id-token: write # For keyless signing
    steps:
      - uses: actions/checkout@v4
      - name: Install floe CLI
        run: pip install floe-cli
      - name: Login to GHCR
        run: echo "${{ secrets.GITHUB_TOKEN }}" | helm registry login ghcr.io -u ${{ github.actor }} --password-stdin
      - name: Compile and publish
        env:
          COSIGN_EXPERIMENTAL: "true" # Only needed on cosign 1.x
        run: |
          floe platform compile
          floe platform publish ${{ github.ref_name }}

# GitLab CI job (.gitlab-ci.yml)
publish:
  stage: deploy
  image: python:3.11
  script:
    - pip install floe-cli
    - echo "$CI_REGISTRY_PASSWORD" | helm registry login $CI_REGISTRY -u $CI_REGISTRY_USER --password-stdin
    - floe platform compile
    - floe platform publish $CI_COMMIT_TAG
  only:
    - tags

Registry unavailability can block platform deployments and compilations. This section defines the resilience strategy.

All OCI operations use exponential backoff with jitter:

# manifest.yaml
artifacts:
  registry:
    uri: oci://ghcr.io/my-org/floe
    resilience:
      retry:
        max_attempts: 3
        initial_delay_ms: 1000
        max_delay_ms: 30000
        backoff_multiplier: 2.0
        jitter: 0.1 # 10% jitter

Retry Behavior:

| Attempt | Delay (with jitter) | Total Wait |
|---|---|---|
| 1 | ~1s | 1s |
| 2 | ~2s | 3s |
| 3 | ~4s | 7s |
| Fail | - | 7s total |
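
The delays in the table follow directly from the retry settings. The sketch below reproduces them; it assumes jitter is applied symmetrically (plus or minus the configured fraction of each delay), which the configuration does not spell out, and the function name is hypothetical.

```python
import random
from typing import Optional


def backoff_delays(
    max_attempts: int = 3,
    initial_delay_ms: float = 1000,
    max_delay_ms: float = 30000,
    backoff_multiplier: float = 2.0,
    jitter: float = 0.1,
    rng: Optional[random.Random] = None,
) -> list[float]:
    """Return the delay in milliseconds associated with each attempt."""
    rng = rng or random.Random()
    delays = []
    delay = float(initial_delay_ms)
    for _ in range(max_attempts):
        capped = min(delay, max_delay_ms)    # never exceed max_delay_ms before jitter
        spread = capped * jitter             # assumed: symmetric +/- jitter
        delays.append(capped + rng.uniform(-spread, spread))
        delay *= backoff_multiplier          # exponential growth
    return delays


# Without jitter the schedule is 1s, 2s, 4s, for 7s in total.
print(backoff_delays(jitter=0.0))
print(sum(backoff_delays(jitter=0.0)) / 1000, "seconds total")
```

Jitter spreads retries from many clients over time so that they do not all hit a recovering registry at the same instant.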

Timeouts:

artifacts:
  registry:
    resilience:
      timeouts:
        connect_ms: 5000 # TCP connection timeout
        read_ms: 30000 # Read timeout per chunk
        total_ms: 300000 # Total operation timeout (5 min)

A circuit breaker prevents cascading failures when the registry is unavailable:

artifacts:
  registry:
    resilience:
      circuit_breaker:
        enabled: true
        failure_threshold: 5 # Open after 5 consecutive failures
        success_threshold: 2 # Close after 2 consecutive successes
        half_open_timeout_ms: 60000 # Try again after 60s

Circuit Breaker States:

┌─────────────────────────────────────────────────────────────────────────────┐
│                           CIRCUIT BREAKER STATES                            │
│                                                                             │
│  ┌──────────┐    5 failures     ┌──────────┐    60s timeout    ┌─────────┐  │
│  │  CLOSED  │ ────────────────► │   OPEN   │ ────────────────► │HALF-OPEN│  │
│  │ (normal) │                   │ (reject) │                   │ (probe) │  │
│  └────┬─────┘                   └──────────┘                   └────┬────┘  │
│       │                              ▲                              │       │
│       │                              │ failure                      │       │
│       │                              └──────────────────────────────┤       │
│       │                                                             │       │
│       │                         2 successes                         │       │
│       └◄────────────────────────────────────────────────────────────┘       │
│                                                                             │
│  CLOSED:    All requests pass through, failures counted                     │
│  OPEN:      All requests fail-fast with cached error                        │
│  HALF-OPEN: One request allowed to probe, result determines next state      │
└─────────────────────────────────────────────────────────────────────────────┘
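
The state transitions above can be sketched as a small class. This is an illustrative sketch, not floe's implementation: the class and method names are hypothetical, the clock is injectable so the timeout can be tested, and for brevity it lets every request through in the half-open state instead of limiting probes to one at a time.

```python
import time
from typing import Callable


class CircuitBreaker:
    """Closed -> open -> half-open -> closed, using the thresholds configured above."""

    def __init__(
        self,
        failure_threshold: int = 5,
        success_threshold: int = 2,
        half_open_timeout_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.state = "closed"
        self._failure_threshold = failure_threshold
        self._success_threshold = success_threshold
        self._half_open_timeout_s = half_open_timeout_s
        self._clock = clock
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    def allow_request(self) -> bool:
        """Fail fast while open; move to half-open once the timeout has elapsed."""
        if self.state == "open":
            if self._clock() - self._opened_at < self._half_open_timeout_s:
                return False
            self.state = "half-open"
            self._successes = 0
        return True

    def record_success(self) -> None:
        if self.state == "half-open":
            self._successes += 1
            if self._successes >= self._success_threshold:
                self.state = "closed"
                self._failures = 0
        else:
            self._failures = 0  # only consecutive failures count

    def record_failure(self) -> None:
        if self.state == "half-open":
            self._trip()  # a failed probe reopens the circuit
            return
        self._failures += 1
        if self._failures >= self._failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = "open"
        self._opened_at = self._clock()
        self._failures = 0
```

A caller would check `allow_request()` before each registry operation and report the outcome with `record_success()` or `record_failure()`.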

When pulling multiple artifacts (e.g., platform manifest + charts), failures are handled as follows:

┌─────────────────────────────────────────────────────────────────────────────┐
│                          MULTI-ARTIFACT PULL FLOW                           │
│                                                                             │
│  ┌────────────┐                                                             │
│  │ Pull Start │                                                             │
│  └─────┬──────┘                                                             │
│        │                                                                    │
│        ▼                                                                    │
│  ┌────────────────┐    ┌────────────────┐    ┌────────────────┐             │
│  │ 1. Pull        │───►│ 2. Pull        │───►│ 3. Pull        │             │
│  │    Manifest    │    │    Chart       │    │    Plugins     │             │
│  └───────┬────────┘    └───────┬────────┘    └───────┬────────┘             │
│          │                     │                     │                      │
│          ▼                     ▼                     ▼                      │
│      ┌───────┐             ┌───────┐             ┌───────┐                  │
│      │Success│             │ Fail  │             │  N/A  │                  │
│      └───────┘             └───┬───┘             └───────┘                  │
│                                │                                            │
│                                ▼                                            │
│                           ┌──────────┐                                      │
│                           │  Retry   │                                      │
│                           │ (3 tries)│                                      │
│                           └────┬─────┘                                      │
│                                │                                            │
│               ┌────────────────┼────────────────┐                           │
│               ▼                                 ▼                           │
│          ┌─────────┐                       ┌─────────┐                      │
│          │ Success │                       │  Fail   │                      │
│          │Continue │                       │ Rollback│                      │
│          └─────────┘                       └────┬────┘                      │
│                                                 │                           │
│                                                 ▼                           │
│                                         ┌───────────────┐                   │
│                                         │Clean up Step 1│                   │
│                                         │(remove pulled)│                   │
│                                         └───────────────┘                   │
└─────────────────────────────────────────────────────────────────────────────┘

Partial Failure Behavior:

  • If any artifact fails after retries, all previously pulled artifacts are rolled back
  • No partial state left behind
  • User sees consolidated error with all attempted operations
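
The all-or-nothing behavior described above can be sketched as follows. The `pull` and `remove` callables are placeholders for whatever actually fetches and deletes an artifact, and the names `pull_all` and `PullError` are hypothetical. Backoff between attempts is omitted to keep the example short.

```python
from typing import Callable, Sequence


class PullError(Exception):
    """Carries one consolidated message listing every attempted operation."""


def pull_all(
    refs: Sequence[str],
    pull: Callable[[str], None],
    remove: Callable[[str], None],
    max_attempts: int = 3,
) -> list[str]:
    """Pull every ref in order; on a final failure, undo the pulls already made."""
    pulled: list[str] = []
    attempted: list[str] = []
    for ref in refs:
        last_error = None
        for attempt in range(1, max_attempts + 1):
            try:
                pull(ref)
                attempted.append(f"{ref}: ok (attempt {attempt})")
                last_error = None
                break
            except Exception as exc:  # any pull failure counts as a failed attempt
                last_error = exc
                attempted.append(f"{ref}: failed (attempt {attempt}): {exc}")
        if last_error is not None:
            for done in reversed(pulled):  # roll back in reverse order
                remove(done)
            raise PullError("; ".join(attempted)) from last_error
        pulled.append(ref)
    return pulled
```

Remaining artifacts are never attempted after a failure, which matches the "N/A" branch for step 3 in the diagram.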

To reduce registry load and improve resilience:

artifacts:
  registry:
    cache:
      enabled: true
      local_path: /var/cache/floe/oci
      ttl_hours: 24 # Time-to-live for cached artifacts
      max_size_gb: 10 # Max cache size
      immutable_tags: true # v1.2.3 tags never re-fetched

Cache Behavior:

| Tag Type | Cache Behavior | Rationale |
|---|---|---|
| Semver (v1.2.3) | Immutable, never re-fetch | Semantic versions are immutable |
| Latest (latest) | TTL-based, re-fetch after expiry | Mutable tags |
| SHA (sha256:abc) | Immutable, never re-fetch | Content-addressable |
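
The cache rules in the table reduce to one decision per reference. The sketch below is an approximation under stated assumptions: the semver pattern is deliberately simplified (no pre-release or build suffixes), and the function name and parameters are hypothetical.

```python
import re
from typing import Optional

# Simplified pattern: matches 1.2.3 and v1.2.3 only.
SEMVER_TAG = re.compile(r"^v?\d+\.\d+\.\d+$")


def needs_refetch(
    reference: str,
    cached_age_hours: Optional[float],
    ttl_hours: float = 24.0,
    immutable_tags: bool = True,
) -> bool:
    """Decide whether a tag or digest must be fetched from the registry again."""
    if cached_age_hours is None:
        return True                      # nothing cached yet
    if reference.startswith("sha256:"):
        return False                     # content-addressable, cannot change
    if immutable_tags and SEMVER_TAG.match(reference):
        return False                     # release tags are treated as immutable
    return cached_age_hours >= ttl_hours  # mutable tags expire after the TTL


for ref, age in [("v1.2.3", 1000.0), ("latest", 25.0), ("latest", 1.0), ("sha256:abc", 1000.0)]:
    print(ref, age, needs_refetch(ref, age))
```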

For registries with rate limits (Docker Hub, some GCR tiers):

artifacts:
  registry:
    rate_limiting:
      respect_retry_after: true # Honor Retry-After header
      max_requests_per_minute: 50 # Self-imposed limit
      quota_buffer_percent: 20 # Reserve 20% of quota

Rate Limit Response:

| HTTP Status | Action |
|---|---|
| 429 Too Many Requests | Wait for Retry-After, then retry |
| 503 Service Unavailable | Exponential backoff retry |
| Quota exceeded | Fail with clear error, log quota state |
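
A response handler following the table might look like the sketch below. It makes several assumptions: only the delta-seconds form of `Retry-After` is parsed (the header may also carry an HTTP date), a 429 without a usable header falls back to exponential backoff, and every other non-2xx status is treated as a hard failure. The function name and return shape are hypothetical.

```python
from typing import Optional


def next_action(
    status: int,
    retry_after: Optional[str],
    attempt: int,
    initial_delay_s: float = 1.0,
    multiplier: float = 2.0,
    max_delay_s: float = 30.0,
) -> tuple[str, float]:
    """Map a registry response to (action, seconds to wait before retrying)."""
    backoff = min(initial_delay_s * multiplier ** (attempt - 1), max_delay_s)
    if 200 <= status < 300:
        return ("ok", 0.0)
    if status == 429:
        if retry_after is not None and retry_after.strip().isdigit():
            return ("wait", float(retry_after.strip()))  # honor the server's hint
        return ("backoff", backoff)
    if status == 503:
        return ("backoff", backoff)
    return ("fail", 0.0)


print(next_action(429, "12", attempt=1))
print(next_action(503, None, attempt=3))
```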

OCI registry operations emit the following metrics:

| Metric | Type | Labels | Description |
|---|---|---|---|
| floe_oci_pull_duration_seconds | Histogram | registry, artifact | Pull operation duration |
| floe_oci_pull_total | Counter | registry, status | Total pull attempts |
| floe_oci_circuit_breaker_state | Gauge | registry | 0=closed, 1=open, 2=half-open |
| floe_oci_cache_hits_total | Counter | registry | Cache hit count |
| floe_oci_cache_misses_total | Counter | registry | Cache miss count |

Alert Rules:

groups:
  - name: oci-registry
    rules:
      - alert: OCIRegistryUnavailable
        expr: floe_oci_circuit_breaker_state == 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "OCI registry circuit breaker is open"
      - alert: OCIRegistrySlowPulls
        # histogram_quantile needs the per-bucket rate, aggregated by the le label
        expr: histogram_quantile(0.99, sum by (le, registry) (rate(floe_oci_pull_duration_seconds_bucket[5m]))) > 30
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "OCI registry pulls are slow (p99 > 30s)"

Docker Hub enforces rate limits (100 pulls per 6 hours for anonymous users) and is not recommended for production.

Configure a registry mirror for rate-limited or slow registries:

# manifest.yaml (optional)
artifacts:
  registry:
    uri: oci://ghcr.io/my-org/floe
    cache:
      enabled: true
      mirror: oci://harbor.internal/cache/ghcr.io

Helm 3.8+ caches OCI artifacts locally:

# Cache location
~/.cache/helm/registry/

# Clear cache
helm registry logout ghcr.io
rm -rf ~/.cache/helm/registry/ghcr.io

# Full artifacts configuration schema
artifacts:
  registry:
    uri: string # OCI registry URI (oci://...)
    auth:
      type: aws-irsa | azure-managed-identity | gcp-workload-identity | token | basic
      # Token-based
      token_ref: string # K8s Secret reference
      # Basic auth
      username_ref: string
      password_ref: string
    tls:
      insecure_skip_verify: boolean # Skip TLS verification (not recommended)
      ca_cert_ref: string # Custom CA certificate Secret reference
    cache:
      enabled: boolean
      mirror: string # Pull-through cache registry
  signing:
    enabled: boolean
    enforcement: warn | enforce | off
    public_key_ref: string # For key-based verification