Manual recovery: Change user IDs with strict security policies
Important
Please contact ITRS Support before performing this procedure and follow their guidance.
When this applies:
- You need to change runAsUser on an existing installation
- Your cluster has Pod Security Admission (PSA) or Gatekeeper/Kyverno policies enforcing restricted security contexts
- The ownership migration requires privileged operations that violate those security policies
Limitation: The ownership migration process requires running privileged jobs (as root with elevated capabilities) to change file ownership of persistent volumes. These jobs cannot run while strict security policies are enforced.
What gets migrated:
The migration jobs handle all TimescaleDB/PostgreSQL persistent volumes:
- PGDATA: Main database data directory (mandatory)
- WALDIR (TimescaleDB only): Separate write-ahead log volume (mandatory)
- Tablespaces (TimescaleDB only): Optional additional storage volumes for timeseries data (if timescale.timeseriesDiskCount > 0)
All volumes must have their ownership changed when runAsUser changes, otherwise the database will fail to start with permission errors.
Prerequisites
Before starting, ensure you have:
- Contact with an ITRS Support engineer
- Cluster administrator access
- A maintenance window (database pods will restart)
- A note of your current and target UID/GID values (the commands below show how to check the current values)
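A quick way to check the current values is to inspect the running database pods and the deployed Helm values. The pod, container, and release names below follow the examples used elsewhere on this page; adjust them to your deployment:
# Current UID/GID as seen inside the running database pods
kubectl exec -n itrs pgplatform-0 -c postgres -- id
kubectl exec -n itrs timescale-0 -c timescale -- id

# securityContext values currently set in the Helm release
helm get values iax -n itrs --all | grep -A 7 securityContext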
Recovery procedure
Step 1: Update security context configuration
Update your Helm values file with new UIDs. Critical: Change both runAsUser and fsGroup:
# values.yaml
securityContext:
  pod:
    runAsUser: 5000              # New UID
    runAsGroup: 5000             # New GID
    supplementalGroups: [5000]
    fsGroup: 5000                # MUST change with runAsUser
    fsGroupChangePolicy: OnRootMismatch
Apply the configuration using Helm:
helm upgrade iax itrs/iax-platform \
  --namespace itrs \
  --values values.yaml \
  --wait
Note
Pods will fail to start at this point due to ownership mismatch. This is expected.
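To confirm you are in the expected state, you can check the database pods and their logs. The pod, container, and label names are the ones used in the examples on this page; adjust them to your deployment:
# Database pods are expected to be failing (e.g., CrashLoopBackOff or Error)
kubectl get pods -n itrs -l 'app in (pgplatform, timescale)'

# Logs typically show permission or ownership errors on the data directory
kubectl logs -n itrs pgplatform-0 -c postgres --tail=20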
Step 2: Temporarily relax security policies
For Pod Security Admission:
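Before changing the enforcement level, you may want to record the namespace's existing Pod Security labels (including any audit or warn labels, if present) so you can restore them exactly in step 5:
kubectl get namespace itrs --show-labels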
# Change namespace from restricted to privileged temporarily
kubectl label namespace itrs pod-security.kubernetes.io/enforce=privileged --overwrite
# Verify
kubectl get namespace itrs -o jsonpath='{.metadata.labels.pod-security\.kubernetes\.io/enforce}'
# Should show: privileged
For Gatekeeper:
# Get your constraint names
kubectl get constraints -A
# Add itrs namespace to excludedNamespaces for each constraint
# Example for K8sPSPCapabilities constraint named "drop-all":
kubectl patch k8spspcapabilities drop-all --type=json -p='[
  {
    "op": "add",
    "path": "/spec/match/excludedNamespaces/-",
    "value": "itrs"
  }
]'

# Repeat for other constraints (e.g., K8sPSPAllowedUsers named "jpmc-allowed-user-ranges")
kubectl patch k8spspallowedusers jpmc-allowed-user-ranges --type=json -p='[
  {
    "op": "add",
    "path": "/spec/match/excludedNamespaces/-",
    "value": "itrs"
  }
]'
# Verify
kubectl get constraints -o yaml | grep -A 5 excludedNamespaces
# Should show 'itrs' in the list
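For Kyverno:
If your restricted policies are enforced by Kyverno rather than Gatekeeper, one option is a temporary PolicyException scoped to the itrs namespace. This is a minimal sketch assuming Kyverno 1.12 or later with policy exceptions enabled; the policy and rule names are placeholders that you must replace with the names reported by kubectl get clusterpolicies:
# kyverno-exception.yaml - temporary exception for the migration jobs (policy/rule names are illustrative)
apiVersion: kyverno.io/v2
kind: PolicyException
metadata:
  name: itrs-uid-migration
  namespace: itrs
spec:
  exceptions:
    - policyName: require-run-as-nonroot        # replace with your actual policy names
      ruleNames:
        - run-as-non-root
        - autogen-run-as-non-root
    - policyName: disallow-capabilities-strict
      ruleNames:
        - require-drop-all
        - autogen-require-drop-all
  match:
    any:
      - resources:
          kinds:
            - Pod
            - Job
          namespaces:
            - itrs
Apply it with kubectl apply -f kyverno-exception.yaml and remove it again in step 5.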
Step 3: Run ownership migration jobs
Create migration job manifests for PostgreSQL and TimescaleDB:
postgres-migration-job-0.yaml (for single replica or replica 0 in HA):
apiVersion: batch/v1
kind: Job
metadata:
  name: postgres-ownership-migration-0
  namespace: itrs
  annotations:
    iax.itrsgroup.com/delete-after-install: "true"
spec:
  backoffLimit: 3
  activeDeadlineSeconds: 21600
  template:
    metadata:
      labels:
        iax.itrsgroup.com/pre-upgrade-job: "true"
    spec:
      restartPolicy: Never
      securityContext:
        runAsUser: 0
        runAsGroup: 0
      containers:
        - name: fix-ownership
          image: docker.itrsgroup.com/iax/postgres:<VERSION>  # Replace <VERSION> with your platform version tag
          imagePullPolicy: IfNotPresent
          securityContext:
            runAsNonRoot: false
            allowPrivilegeEscalation: false
            capabilities:
              add:
                - CHOWN
                - FOWNER
                - DAC_OVERRIDE
                - DAC_READ_SEARCH
              drop:
                - ALL
            seccompProfile:
              type: RuntimeDefault
          env:
            - name: TARGET_UID
              value: "5000"  # Your new UID
            - name: TARGET_GID
              value: "5000"  # Your new GID
            - name: REPLICA_INDEX
              value: "0"
          command:
            - /bin/bash
            - -c
            - |
              set -e
              echo "Starting ownership migration for PostgreSQL replica ${REPLICA_INDEX}"
              echo "Target ownership: ${TARGET_UID}:${TARGET_GID}"

              PGDATA="/var/lib/postgresql/data"

              # Check if data directory is empty (fresh install, skip migration)
              if [ ! -d "${PGDATA}" ] || [ -z "$(ls -A ${PGDATA} 2>/dev/null)" ]; then
                echo "Data directory is empty - skipping migration (fresh install)"
                exit 0
              fi

              # Get current ownership of PGDATA
              CURRENT_UID=$(stat -c '%u' "${PGDATA}")
              CURRENT_GID=$(stat -c '%g' "${PGDATA}")
              echo "Current PGDATA ownership: ${CURRENT_UID}:${CURRENT_GID}"

              # Check if migration is needed
              if [ "${CURRENT_UID}" = "${TARGET_UID}" ] && [ "${CURRENT_GID}" = "${TARGET_GID}" ]; then
                echo "Ownership already correct - skipping migration"
                exit 0
              else
                echo "Migration required from ${CURRENT_UID}:${CURRENT_GID} to ${TARGET_UID}:${TARGET_GID}"
                echo "Changing ownership (may take several minutes for large databases)..."
                chown -R "${TARGET_UID}:${TARGET_GID}" "${PGDATA}"
                echo "Migration completed"
              fi

              echo ""
              echo "Migration completed successfully for replica ${REPLICA_INDEX}"
              echo "Final ownership: $(stat -c '%u:%g' ${PGDATA})"
              echo "Note: Permissions will be fixed by the pod startup script"
          volumeMounts:
            - name: pgdata
              mountPath: /var/lib/postgresql
      volumes:
        - name: pgdata
          persistentVolumeClaim:
            claimName: data-pgplatform-0
timescale-migration-job-0.yaml (for single replica or replica 0 in HA):
apiVersion: batch/v1
kind: Job
metadata:
  name: timescale-ownership-migration-0
  namespace: itrs
  annotations:
    iax.itrsgroup.com/delete-after-install: "true"
spec:
  backoffLimit: 3
  activeDeadlineSeconds: 21600
  template:
    metadata:
      labels:
        iax.itrsgroup.com/pre-upgrade-job: "true"
    spec:
      restartPolicy: Never
      securityContext:
        runAsUser: 0
        runAsGroup: 0
      containers:
        - name: fix-ownership
          image: docker.itrsgroup.com/iax/timescale:<VERSION>  # Replace <VERSION> with your platform version tag
          imagePullPolicy: IfNotPresent
          securityContext:
            runAsNonRoot: false
            allowPrivilegeEscalation: false
            capabilities:
              add:
                - CHOWN
                - FOWNER
                - DAC_OVERRIDE
                - DAC_READ_SEARCH
              drop:
                - ALL
            seccompProfile:
              type: RuntimeDefault
          env:
            - name: TARGET_UID
              value: "5000"  # Your new UID
            - name: TARGET_GID
              value: "5000"  # Your new GID
            - name: REPLICA_INDEX
              value: "0"
          command:
            - /bin/bash
            - -c
            - |
              set -e
              echo "Starting ownership migration for TimescaleDB replica ${REPLICA_INDEX}"
              echo "Target ownership: ${TARGET_UID}:${TARGET_GID}"
              echo ""

              MIGRATION_NEEDED=false

              # Process data volume
              PGDATA="/var/lib/postgresql/data"
              echo "Processing data volume: ${PGDATA}"
              if [ ! -d "${PGDATA}" ] || [ -z "$(ls -A ${PGDATA} 2>/dev/null)" ]; then
                echo "Data directory is empty - skipping (fresh install)"
              else
                # Get current ownership
                CURRENT_UID=$(stat -c '%u' "${PGDATA}")
                CURRENT_GID=$(stat -c '%g' "${PGDATA}")
                echo "Current ownership: ${CURRENT_UID}:${CURRENT_GID}"
                if [ "${CURRENT_UID}" = "${TARGET_UID}" ] && [ "${CURRENT_GID}" = "${TARGET_GID}" ]; then
                  echo "Ownership already correct"
                else
                  echo "Migration required from ${CURRENT_UID}:${CURRENT_GID} to ${TARGET_UID}:${TARGET_GID}"
                  echo "Changing ownership (may take several minutes for large databases)..."
                  MIGRATION_NEEDED=true
                  chown -R "${TARGET_UID}:${TARGET_GID}" "${PGDATA}"
                  echo "Data volume migration completed"
                fi
              fi
              echo ""

              # Process WAL volume
              PGWAL="/var/lib/postgresql/wal/pg_wal"
              echo "Processing WAL volume: ${PGWAL}"
              if [ ! -d "${PGWAL}" ] || [ -z "$(ls -A ${PGWAL} 2>/dev/null)" ]; then
                echo "WAL directory is empty - skipping (fresh install)"
              else
                # Get current ownership
                CURRENT_UID=$(stat -c '%u' "${PGWAL}")
                CURRENT_GID=$(stat -c '%g' "${PGWAL}")
                echo "Current ownership: ${CURRENT_UID}:${CURRENT_GID}"
                if [ "${CURRENT_UID}" = "${TARGET_UID}" ] && [ "${CURRENT_GID}" = "${TARGET_GID}" ]; then
                  echo "Ownership already correct"
                else
                  echo "Migration required from ${CURRENT_UID}:${CURRENT_GID} to ${TARGET_UID}:${TARGET_GID}"
                  MIGRATION_NEEDED=true
                  echo "Changing WAL ownership (this is fast)..."
                  chown -R "${TARGET_UID}:${TARGET_GID}" "${PGWAL}"
                  echo "WAL volume migration completed"
                fi
              fi
              echo ""

              # Process tablespace volumes (if configured)
              # Uncomment and adjust based on your timescale.timeseriesDiskCount setting
              # Example below assumes timeseriesDiskCount=2:
              #
              # TABLESPACE="/var/lib/postgresql/tablespaces/timeseries_tablespace_1"
              # echo "Processing tablespace volume 1: ${TABLESPACE}"
              # if [ ! -d "${TABLESPACE}/data" ] || [ -z "$(ls -A ${TABLESPACE}/data 2>/dev/null)" ]; then
              #   echo "Tablespace 1 is empty - skipping"
              # else
              #   CURRENT_UID=$(stat -c '%u' "${TABLESPACE}/data")
              #   CURRENT_GID=$(stat -c '%g' "${TABLESPACE}/data")
              #   echo "Current ownership: ${CURRENT_UID}:${CURRENT_GID}"
              #   if [ "${CURRENT_UID}" != "${TARGET_UID}" ] || [ "${CURRENT_GID}" != "${TARGET_GID}" ]; then
              #     echo "Migration required from ${CURRENT_UID}:${CURRENT_GID} to ${TARGET_UID}:${TARGET_GID}"
              #     MIGRATION_NEEDED=true
              #     echo "Changing tablespace 1 ownership..."
              #     chown -R "${TARGET_UID}:${TARGET_GID}" "${TABLESPACE}"
              #     echo "Tablespace 1 migration completed"
              #   else
              #     echo "Ownership already correct"
              #   fi
              # fi
              # echo ""
              # (Repeat for tablespace 2, 3, etc. based on your timeseriesDiskCount)

              if [ "${MIGRATION_NEEDED}" = "false" ]; then
                echo "No migration needed for replica ${REPLICA_INDEX}"
              else
                echo "Replica ${REPLICA_INDEX} migrated successfully"
              fi
              echo "Note: Permissions will be fixed by the pod startup script"
          volumeMounts:
            - name: tsdata
              mountPath: /var/lib/postgresql
            - name: tswal
              mountPath: /var/lib/postgresql/wal
            # Uncomment and add tablespace volumes if configured:
            # - name: tablespace-1
            #   mountPath: /var/lib/postgresql/tablespaces/timeseries_tablespace_1
            # - name: tablespace-2
            #   mountPath: /var/lib/postgresql/tablespaces/timeseries_tablespace_2
      volumes:
        - name: tsdata
          persistentVolumeClaim:
            claimName: timescale-ha-data-timescale-0
        - name: tswal
          persistentVolumeClaim:
            claimName: timescale-ha-wal-timescale-0
        # Uncomment and add tablespace PVCs if configured:
        # - name: tablespace-1
        #   persistentVolumeClaim:
        #     claimName: timescale-ha-tablespace-data-1-timescale-0
        # - name: tablespace-2
        #   persistentVolumeClaim:
        #     claimName: timescale-ha-tablespace-data-2-timescale-0
Before applying, update the job manifests:
- Replace <VERSION> in the image field with your actual platform version tag (e.g., 2.17.0)
- Replace the TARGET_UID and TARGET_GID environment variable values with your target values (5000 in the example above)
- For TimescaleDB:
  - If you have tablespace volumes configured (timescale.timeseriesDiskCount > 0), uncomment the tablespace sections in the script and add the corresponding volumeMounts and volumes
  - Add one TABLESPACE section per tablespace (1, 2, 3, etc. based on your timeseriesDiskCount)
- For HA configurations (multiple replicas):
  - Create separate job manifests for each replica (replica 0, 1, 2, etc.)
  - Update the REPLICA_INDEX environment variable: "0", "1", "2", etc.
  - Update the job names: postgres-ownership-migration-0, postgres-ownership-migration-1, etc.
  - Update the claimName suffixes in volumes (you can confirm the exact names with the command below):
    - PostgreSQL: data-pgplatform-0, data-pgplatform-1, etc.
    - TimescaleDB data: timescale-ha-data-timescale-0, timescale-ha-data-timescale-1, etc.
    - TimescaleDB WAL: timescale-ha-wal-timescale-0, timescale-ha-wal-timescale-1, etc.
    - TimescaleDB tablespaces: timescale-ha-tablespace-data-1-timescale-0, timescale-ha-tablespace-data-1-timescale-1, etc.
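To confirm the exact PVC names to use in claimName (they can vary between installations), list the claims in the namespace; the grep pattern is only illustrative:
kubectl get pvc -n itrs | grep -E 'pgplatform|timescale'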
Apply the jobs:
# Single replica setup:
kubectl apply -f postgres-migration-job-0.yaml
kubectl apply -f timescale-migration-job-0.yaml
# For HA setups, apply jobs for all replicas:
kubectl apply -f postgres-migration-job-0.yaml
kubectl apply -f postgres-migration-job-1.yaml
# ... repeat for additional replicas
kubectl apply -f timescale-migration-job-0.yaml
kubectl apply -f timescale-migration-job-1.yaml
# ... repeat for additional replicas
# Wait for all jobs to complete (increase --timeout for very large databases; the jobs themselves allow up to 6 hours via activeDeadlineSeconds)
kubectl wait --for=condition=complete --timeout=600s job/postgres-ownership-migration-0 -n itrs
kubectl wait --for=condition=complete --timeout=600s job/timescale-ownership-migration-0 -n itrs
# ... wait for all replica jobs
# Check job logs to verify success
kubectl logs -n itrs job/postgres-ownership-migration-0
kubectl logs -n itrs job/timescale-ownership-migration-0
# ... check logs for all replica jobs
Scale down and up StatefulSets to pick up new ownership:
# Scale down (adjust replicas based on your HA configuration)
kubectl scale statefulset pgplatform -n itrs --replicas=0
kubectl scale statefulset timescale -n itrs --replicas=0
# Wait for pods to terminate
kubectl wait --for=delete pod -l app=pgplatform -n itrs --timeout=120s
kubectl wait --for=delete pod -l app=timescale -n itrs --timeout=120s
# Scale back up (use your original replica count)
kubectl scale statefulset pgplatform -n itrs --replicas=1 # or 2, 3, etc. for HA
kubectl scale statefulset timescale -n itrs --replicas=1 # or 2, 3, etc. for HA
# Wait for platform to be ready
sleep 60
kubectl wait --timeout=600s iaxplatform/iax-iax-platform --for jsonpath="{.status.status}"=DEPLOYED -n itrs
Step 4: Verify system health
# Verify all pods are running
kubectl get pods -n itrs
# All pods should be in Running state
# Verify ownership changed on volumes
kubectl exec -n itrs pgplatform-0 -c postgres -- stat -c '%u:%g' /var/lib/postgresql/data
# Should show: 5000:5000 (or your target UID:GID)
kubectl exec -n itrs timescale-0 -c timescale -- stat -c '%u:%g' /var/lib/postgresql/data
kubectl exec -n itrs timescale-0 -c timescale -- stat -c '%u:%g' /var/lib/postgresql/wal/pg_wal
# Should show: 5000:5000
# If using tablespaces, verify tablespace ownership:
# kubectl exec -n itrs timescale-0 -c timescale -- stat -c '%u:%g' /var/lib/postgresql/tablespaces/timeseries_tablespace_1/data
# Verify correct UIDs inside containers
kubectl exec -n itrs pgplatform-0 -c postgres -- id
kubectl exec -n itrs timescale-0 -c timescale -- id
# Should show: uid=5000 gid=5000 (or your target UIDs)
# Test database connectivity
kubectl exec -n itrs pgplatform-0 -c postgres -- psql -U postgres -c "SELECT 1"
kubectl exec -n itrs timescale-0 -c timescale -- psql -U postgres -c "SELECT 1"
# Should return: 1
# Verify no errors in logs
kubectl logs -n itrs pgplatform-0 -c postgres --tail=50
kubectl logs -n itrs timescale-0 -c timescale --tail=50
Step 5: Re-enable security policies
For Pod Security Admission:
kubectl label namespace itrs pod-security.kubernetes.io/enforce=restricted --overwrite
# Verify
kubectl get namespace itrs -o jsonpath='{.metadata.labels.pod-security\.kubernetes\.io/enforce}'
# Should show: restricted
For Gatekeeper:
Remove itrs from excludedNamespaces. Find the position of "itrs" in the array, then remove that entry:
# Find the position of "itrs" in the capabilities constraint
# (grep -n is 1-based; JSON Patch array indexes are 0-based, so subtract 1)
ITRS_INDEX_CAP=$(kubectl get k8spspcapabilities drop-all -o jsonpath='{.spec.match.excludedNamespaces}' | tr ',' '\n' | grep -n itrs | cut -d: -f1 | head -1)
kubectl patch k8spspcapabilities drop-all --type=json -p="[{\"op\": \"remove\", \"path\": \"/spec/match/excludedNamespaces/$((ITRS_INDEX_CAP-1))\"}]"
# Repeat for other constraints (e.g., allowed users)
ITRS_INDEX_USR=$(kubectl get k8spspallowedusers jpmc-allowed-user-ranges -o jsonpath='{.spec.match.excludedNamespaces}' | tr ',' '\n' | grep -n itrs | cut -d: -f1 | head -1)
kubectl patch k8spspallowedusers jpmc-allowed-user-ranges --type=json -p="[{\"op\": \"remove\", \"path\": \"/spec/match/excludedNamespaces/$((ITRS_INDEX_USR-1))\"}]"
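For Kyverno:
If you created the temporary PolicyException from step 2, delete it now (the file name matches the sketch shown there):
kubectl delete -f kyverno-exception.yaml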
Final verification:
# Check for constraint violations
kubectl get constraints -A
# Should show 0 violations in itrs namespace
# Verify pods still running after policy re-enablement
kubectl get pods -n itrs
Important notes
- Plan for downtime: Database pods will restart during the migration
- Large databases: Migration time increases with data size (can take 10+ minutes for large datasets)
- Security window: Minimize the time with relaxed policies - complete the procedure as quickly as possible
- Schedule appropriately: Perform during maintenance window
- Alternative approach: Deploy a fresh installation with correct UIDs if feasible
Why this complexity?
PostgreSQL, TimescaleDB, etcd, and Kafka require their data directories to be owned by the specific UID running the process. Changing UIDs requires recursively changing file ownership on potentially large persistent volumes, which:
- Requires running as root
- Requires elevated Linux capabilities (CHOWN, FOWNER, DAC_OVERRIDE, DAC_READ_SEARCH)
- Takes time proportional to data size
- Cannot be automated while strict security policies are active
Best practice: Configure the correct runAsUser and fsGroup before initial deployment to avoid this complexity.
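For reference, a minimal sketch of setting these values at initial install time (the release name, chart name, and values file follow the helm upgrade example in step 1):
# values.yaml contains the securityContext block shown in step 1
helm install iax itrs/iax-platform \
  --namespace itrs \
  --create-namespace \
  --values values.yaml \
  --wait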