Manual recovery: Change user IDs with strict security policies

Overview

Important

These procedures apply only to ITRS Analytics installations deployed using Helm.

Please contact ITRS Support for guidance before performing this procedure.

This procedure applies when you need to change the runAsUser or fsGroup of an existing ITRS Analytics installation that runs under strict security policies, such as Pod Security Admission in restricted mode or Gatekeeper constraints.

Changing the runAsUser requires updating file ownership on all persistent volumes used by the database. This operation involves running privileged jobs (as root with elevated capabilities) to modify file ownership. These jobs cannot run while strict security policies are enforced, so you must temporarily relax the relevant policies during the migration.
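
Before you begin, it helps to record the UID the database pods currently run as and the current Pod Security Admission enforcement level, so you can confirm the change afterwards. A minimal check, assuming the itrs namespace and the pod names used in the examples below:

# Current runAsUser of the PostgreSQL pod (pod and namespace names are assumptions)
kubectl get pod pgplatform-0 -n itrs -o jsonpath='{.spec.securityContext.runAsUser}'

# Current Pod Security Admission enforcement level on the namespace
kubectl get namespace itrs -o jsonpath='{.metadata.labels.pod-security\.kubernetes\.io/enforce}'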

Migration considerations

  1. Plan for downtime — database pods will restart during the migration.
  2. Large databases — migration time increases with data size and can exceed 10 minutes for large datasets (see the sizing check after this list).
  3. Security window — keep the time with relaxed policies to a minimum and complete the procedure as quickly as possible.
  4. Schedule appropriately — perform the migration during a maintenance window.
  5. Alternative approach — if feasible, deploy a fresh installation with the correct UIDs instead.
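
To gauge how long the ownership change may take, check the size of the existing data directories first. A minimal check, assuming the default pod and container names used later in this procedure and that du is available in the database images:

# Approximate size of the PostgreSQL and TimescaleDB data directories
kubectl exec -n itrs pgplatform-0 -c postgres -- du -sh /var/lib/postgresql/data
kubectl exec -n itrs timescale-0 -c timescale -- du -sh /var/lib/postgresql/data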

Why is this procedure necessary?

Database services (PostgreSQL, TimescaleDB, etcd, and Kafka) require their data directories to be owned by the UID that runs the process. When you change runAsUser, you must update file ownership on all affected persistent volumes. As described in the overview, this operation requires privileged jobs and a temporary relaxation of security policies.

As a best practice, set the correct runAsUser and fsGroup during initial deployment to avoid this complexity.
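
For reference, a minimal sketch of setting the IDs at install time rather than migrating later, assuming the same release name, chart, and namespace used in the upgrade step below:

# values.yaml already contains the desired runAsUser/fsGroup (see the example in the next section)
helm install iax itrs/iax-platform \
  --namespace itrs \
  --create-namespace \
  --values values.yaml \
  --wait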

What gets migrated?

The migration jobs update all persistent volumes associated with the PostgreSQL and TimescaleDB StatefulSets: the PostgreSQL data volume, the TimescaleDB data and WAL volumes, and any TimescaleDB tablespace volumes you have configured.

Note

All volumes must have their ownership updated when runAsUser changes. If any volume is missed, PostgreSQL or TimescaleDB will fail to start with permission errors.
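
To confirm which persistent volume claims are involved, and the exact claim names to use in the job manifests below, list the PVCs in the namespace. A hedged example, assuming the default itrs namespace and claim name prefixes:

# The claim names shown here are referenced by the migration job manifests below
kubectl get pvc -n itrs | grep -E 'pgplatform|timescale'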

Prerequisites

Before starting this procedure, ensure you have:

  • kubectl access with permission to modify namespace labels and, if applicable, Gatekeeper constraints.
  • Helm access to the ITRS Analytics release and its values file.
  • The new UID and GID values you intend to use.
  • A scheduled maintenance window, since database pods will restart.
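
As a quick check of the required access, you can ask the API server whether your account can perform the key operations. A minimal sketch, assuming the itrs namespace and the Gatekeeper constraint kinds shown later in this procedure:

# Can you relabel the namespace for Pod Security Admission?
kubectl auth can-i patch namespaces

# Can you patch Gatekeeper constraints? (Only relevant if Gatekeeper is in use.)
kubectl auth can-i patch k8spspcapabilities
kubectl auth can-i patch k8spspallowedusers

# Can you create the migration Jobs in the itrs namespace?
kubectl auth can-i create jobs -n itrs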

Recovery procedure

Update security context configuration

  1. Update your Helm values file with new UIDs. It’s critical that you update both runAsUser and fsGroup.

    # values.yaml
    securityContext:
      pod:
        runAsUser: 5000      # New UID
        runAsGroup: 5000     # New GID
        supplementalGroups: [5000]
        fsGroup: 5000        # MUST change with runAsUser
        fsGroupChangePolicy: OnRootMismatch
    
  2. Apply the configuration using Helm.

    helm upgrade iax itrs/iax-platform \
      --namespace itrs \
      --values values.yaml \
      --wait
    

Warning

At this stage, the database pods will fail to start because of the ownership mismatch. This is normal and expected.
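
To observe the expected failure state before continuing, inspect the pods. A hedged example, assuming the labels and namespace used elsewhere in this procedure:

# Pods are expected to show CrashLoopBackOff or Error until the migration completes
kubectl get pods -n itrs -l app=pgplatform
kubectl get pods -n itrs -l app=timescale

# Events and recent logs should point at permission problems on the data directory
kubectl describe pod pgplatform-0 -n itrs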

Temporarily relax security policies

  1. For Pod Security Admission, update the namespace label to temporarily permit privileged operations. If the namespace also sets the warn or audit labels, see the sketch after this list.

    # Change namespace from restricted to privileged temporarily
    kubectl label namespace itrs pod-security.kubernetes.io/enforce=privileged --overwrite
    
    # Verify
    kubectl get namespace itrs -o jsonpath='{.metadata.labels.pod-security\.kubernetes\.io/enforce}'
    # Should show: privileged
    
  2. For Gatekeeper, temporarily exclude the namespace from constraint enforcement.

    # Get your constraint names
    kubectl get constraints -A
    
    # Add itrs namespace to excludedNamespaces for each constraint
    # Example for K8sPSPCapabilities constraint named "drop-all":
    kubectl patch k8spspcapabilities drop-all --type=json -p='[
      {
        "op": "add",
        "path": "/spec/match/excludedNamespaces/-",
        "value": "itrs"
      }
    ]'
    
    # Repeat for other constraints (e.g., K8sPSPAllowedUsers named "jpmc-allowed-user-ranges")
    kubectl patch k8spspallowedusers jpmc-allowed-user-ranges --type=json -p='[
      {
        "op": "add",
        "path": "/spec/match/excludedNamespaces/-",
        "value": "itrs"
      }
    ]'
    
    # Verify
    kubectl get constraints -o yaml | grep -A 5 excludedNamespaces
    # Should show 'itrs' in the list
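
If the namespace also carries the Pod Security Admission warn or audit labels, you may want to align them temporarily as well so the migration jobs do not trigger warnings or audit records. A hedged sketch, assuming the same itrs namespace; skip this if the labels are not set, and restore them together with the enforce label when you re-enable policies later:

kubectl label namespace itrs pod-security.kubernetes.io/warn=privileged --overwrite
kubectl label namespace itrs pod-security.kubernetes.io/audit=privileged --overwrite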
    

Run ownership migration jobs

  1. Create migration job manifests for PostgreSQL and TimescaleDB.

  2. Start with postgres-migration-job-0.yaml, which should be used for a single replica setup or for replica 0 in a high-availability (HA) configuration.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: postgres-ownership-migration-0
      namespace: itrs
      annotations:
        iax.itrsgroup.com/delete-after-install: "true"
    spec:
      backoffLimit: 3
      activeDeadlineSeconds: 21600
      template:
        metadata:
          labels:
            iax.itrsgroup.com/pre-upgrade-job: "true"
        spec:
          restartPolicy: Never
          securityContext:
            runAsUser: 0
            runAsGroup: 0
          containers:
          - name: fix-ownership
            image: proxy.itrsgroup.com/proxy/itrs-analytics/docker.itrsgroup.com/iax/postgres:<VERSION>  # Replace <VERSION> with your platform version tag
            imagePullPolicy: IfNotPresent
            securityContext:
              runAsNonRoot: false
              allowPrivilegeEscalation: false
              capabilities:
                add:
                - CHOWN
                - FOWNER
                - DAC_OVERRIDE
                - DAC_READ_SEARCH
                drop:
                - ALL
              seccompProfile:
                type: RuntimeDefault
            env:
            - name: TARGET_UID
              value: "5000"  # Your new UID
            - name: TARGET_GID
              value: "5000"  # Your new GID
            - name: REPLICA_INDEX
              value: "0"
            command:
            - /bin/bash
            - -c
            - |
              set -e
    
              echo "Starting ownership migration for PostgreSQL replica ${REPLICA_INDEX}"
              echo "Target ownership: ${TARGET_UID}:${TARGET_GID}"
    
              PGDATA="/var/lib/postgresql/data"
    
              # Check if data directory is empty (fresh install, skip migration)
              if [ ! -d "${PGDATA}" ] || [ -z "$(ls -A ${PGDATA} 2>/dev/null)" ]; then
                echo "Data directory is empty - skipping migration (fresh install)"
                exit 0
              fi
    
              # Get current ownership of PGDATA
              CURRENT_UID=$(stat -c '%u' "${PGDATA}")
              CURRENT_GID=$(stat -c '%g' "${PGDATA}")
    
              echo "Current PGDATA ownership: ${CURRENT_UID}:${CURRENT_GID}"
    
              # Check if migration is needed
              if [ "${CURRENT_UID}" = "${TARGET_UID}" ] && [ "${CURRENT_GID}" = "${TARGET_GID}" ]; then
                echo "Ownership already correct - skipping migration"
                exit 0
              else
                echo "Migration required from ${CURRENT_UID}:${CURRENT_GID} to ${TARGET_UID}:${TARGET_GID}"
                echo "Changing ownership (may take several minutes for large databases)..."
                chown -R "${TARGET_UID}:${TARGET_GID}" "${PGDATA}"
                echo "Migration completed"
              fi
    
              echo ""
              echo "Migration completed successfully for replica ${REPLICA_INDEX}"
              echo "Final ownership: $(stat -c '%u:%g' ${PGDATA})"
              echo "Note: Permissions will be fixed by the pod startup script"
            volumeMounts:
            - name: pgdata
              mountPath: /var/lib/postgresql
          volumes:
          - name: pgdata
            persistentVolumeClaim:
              claimName: data-pgplatform-0
    
  3. Create the migration job manifest for TimescaleDB using timescale-migration-job-0.yaml, intended for a single replica setup or for replica 0 in a high-availability (HA) configuration.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: timescale-ownership-migration-0
      namespace: itrs
      annotations:
        iax.itrsgroup.com/delete-after-install: "true"
    spec:
      backoffLimit: 3
      activeDeadlineSeconds: 21600
      template:
        metadata:
          labels:
            iax.itrsgroup.com/pre-upgrade-job: "true"
        spec:
          restartPolicy: Never
          securityContext:
            runAsUser: 0
            runAsGroup: 0
          containers:
          - name: fix-ownership
            image: proxy.itrsgroup.com/proxy/itrs-analytics/docker.itrsgroup.com/iax/timescale:<VERSION>  # Replace <VERSION> with your platform version tag
            imagePullPolicy: IfNotPresent
            securityContext:
              runAsNonRoot: false
              allowPrivilegeEscalation: false
              capabilities:
                add:
                - CHOWN
                - FOWNER
                - DAC_OVERRIDE
                - DAC_READ_SEARCH
                drop:
                - ALL
              seccompProfile:
                type: RuntimeDefault
            env:
            - name: TARGET_UID
              value: "5000"  # Your new UID
            - name: TARGET_GID
              value: "5000"  # Your new GID
            - name: REPLICA_INDEX
              value: "0"
            command:
            - /bin/bash
            - -c
            - |
              set -e
    
              echo "Starting ownership migration for TimescaleDB replica ${REPLICA_INDEX}"
              echo "Target ownership: ${TARGET_UID}:${TARGET_GID}"
              echo ""
    
              MIGRATION_NEEDED=false
    
              # Process data volume
              PGDATA="/var/lib/postgresql/data"
              echo "Processing data volume: ${PGDATA}"
    
              if [ ! -d "${PGDATA}" ] || [ -z "$(ls -A ${PGDATA} 2>/dev/null)" ]; then
                echo "Data directory is empty - skipping (fresh install)"
              else
                # Get current ownership
                CURRENT_UID=$(stat -c '%u' "${PGDATA}")
                CURRENT_GID=$(stat -c '%g' "${PGDATA}")
                echo "Current ownership: ${CURRENT_UID}:${CURRENT_GID}"
    
                if [ "${CURRENT_UID}" = "${TARGET_UID}" ] && [ "${CURRENT_GID}" = "${TARGET_GID}" ]; then
                  echo "Ownership already correct"
                else
                  echo "Migration required from ${CURRENT_UID}:${CURRENT_GID} to ${TARGET_UID}:${TARGET_GID}"
                  echo "Changing ownership (may take several minutes for large databases)..."
                  MIGRATION_NEEDED=true
                  chown -R "${TARGET_UID}:${TARGET_GID}" "${PGDATA}"
                  echo "Data volume migration completed"
                fi
              fi
              echo ""
    
              # Process WAL volume
              PGWAL="/var/lib/postgresql/wal/pg_wal"
              echo "Processing WAL volume: ${PGWAL}"
    
              if [ ! -d "${PGWAL}" ] || [ -z "$(ls -A ${PGWAL} 2>/dev/null)" ]; then
                echo "WAL directory is empty - skipping (fresh install)"
              else
                # Get current ownership
                CURRENT_UID=$(stat -c '%u' "${PGWAL}")
                CURRENT_GID=$(stat -c '%g' "${PGWAL}")
                echo "Current ownership: ${CURRENT_UID}:${CURRENT_GID}"
    
                if [ "${CURRENT_UID}" = "${TARGET_UID}" ] && [ "${CURRENT_GID}" = "${TARGET_GID}" ]; then
                  echo "Ownership already correct"
                else
                  echo "Migration required from ${CURRENT_UID}:${CURRENT_GID} to ${TARGET_UID}:${TARGET_GID}"
                  MIGRATION_NEEDED=true
    
                  echo "Changing WAL ownership (this is fast)..."
                  chown -R "${TARGET_UID}:${TARGET_GID}" "${PGWAL}"
                  echo "WAL volume migration completed"
                fi
              fi
              echo ""
    
              # Process tablespace volumes (if configured)
              # Uncomment and adjust based on your timescale.timeseriesDiskCount setting
              # Example below assumes timeseriesDiskCount=2:
              #
              # TABLESPACE="/var/lib/postgresql/tablespaces/timeseries_tablespace_1"
              # echo "Processing tablespace volume 1: ${TABLESPACE}"
              # if [ ! -d "${TABLESPACE}/data" ] || [ -z "$(ls -A ${TABLESPACE}/data 2>/dev/null)" ]; then
              #   echo "Tablespace 1 is empty - skipping"
              # else
              #   CURRENT_UID=$(stat -c '%u' "${TABLESPACE}/data")
              #   CURRENT_GID=$(stat -c '%g' "${TABLESPACE}/data")
              #   echo "Current ownership: ${CURRENT_UID}:${CURRENT_GID}"
              #   if [ "${CURRENT_UID}" != "${TARGET_UID}" ] || [ "${CURRENT_GID}" != "${TARGET_GID}" ]; then
              #     echo "Migration required from ${CURRENT_UID}:${CURRENT_GID} to ${TARGET_UID}:${TARGET_GID}"
              #     MIGRATION_NEEDED=true
              #     echo "Changing tablespace 1 ownership..."
              #     chown -R "${TARGET_UID}:${TARGET_GID}" "${TABLESPACE}"
              #     echo "Tablespace 1 migration completed"
              #   else
              #     echo "Ownership already correct"
              #   fi
              # fi
              # echo ""
              # (Repeat for tablespace 2, 3, etc. based on your timeseriesDiskCount)
    
              if [ "${MIGRATION_NEEDED}" = "false" ]; then
                echo "No migration needed for replica ${REPLICA_INDEX}"
              else
                echo "Replica ${REPLICA_INDEX} migrated successfully"
              fi
              echo "Note: Permissions will be fixed by the pod startup script"
            volumeMounts:
            - name: tsdata
              mountPath: /var/lib/postgresql
            - name: tswal
              mountPath: /var/lib/postgresql/wal
            # Uncomment and add tablespace volumes if configured:
            # - name: tablespace-1
            #   mountPath: /var/lib/postgresql/tablespaces/timeseries_tablespace_1
            # - name: tablespace-2
            #   mountPath: /var/lib/postgresql/tablespaces/timeseries_tablespace_2
          volumes:
          - name: tsdata
            persistentVolumeClaim:
              claimName: timescale-ha-data-timescale-0
          - name: tswal
            persistentVolumeClaim:
              claimName: timescale-ha-wal-timescale-0
          # Uncomment and add tablespace PVCs if configured:
          # - name: tablespace-1
          #   persistentVolumeClaim:
          #     claimName: timescale-ha-tablespace-data-1-timescale-0
          # - name: tablespace-2
          #   persistentVolumeClaim:
          #     claimName: timescale-ha-tablespace-data-2-timescale-0
    

Update the job manifests

Before applying the migration jobs, update the job manifests to reflect your environment-specific settings, such as the namespace, image version, target UID and GID, and persistent volume claim names used by your PostgreSQL and TimescaleDB deployment.

  1. Replace <VERSION> in the image field with your actual platform version tag (for example, 2.17.0).
  2. Replace TARGET_UID and TARGET_GID environment variable values with your target values (5000 in the example above).
  3. For TimescaleDB:
    • If you have tablespace volumes configured (timescale.timeseriesDiskCount > 0), uncomment the tablespace sections in the script and add the corresponding volumeMounts and volumes.
    • Add one TABLESPACE section per tablespace (1, 2, 3, etc. based on your timeseriesDiskCount).
  4. For HA configurations (multiple replicas), as shown in the sketch after this list:
    • Create separate job manifests for each replica (replica 0, 1, 2).
    • Update REPLICA_INDEX environment variable: "0", "1", "2", and others.
    • Update job names: postgres-ownership-migration-0, postgres-ownership-migration-1, and others.
    • Update claimName suffixes in volumes:
      • PostgreSQL: data-pgplatform-0, data-pgplatform-1, and others.
      • TimescaleDB: timescale-ha-data-timescale-0, timescale-ha-data-timescale-1.
      • TimescaleDB WAL: timescale-ha-wal-timescale-0, timescale-ha-wal-timescale-1.
      • TimescaleDB tablespaces: timescale-ha-tablespace-data-1-timescale-0, timescale-ha-tablespace-data-1-timescale-1.
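
For HA setups, the per-replica manifests can be generated from the replica-0 template with a small script. This is only a sketch, and it assumes you kept the job name, REPLICA_INDEX value, and claim name exactly as shown above; review the generated files, for example with kubectl apply --dry-run=client -f <file>, before applying them.

# Hypothetical helper: derive PostgreSQL job manifests for replicas 1..N-1 from the replica-0 template
REPLICAS=3   # adjust to your HA replica count
for i in $(seq 1 $((REPLICAS - 1))); do
  sed -e "s/postgres-ownership-migration-0/postgres-ownership-migration-${i}/" \
      -e "s/data-pgplatform-0/data-pgplatform-${i}/" \
      -e "s/value: \"0\"/value: \"${i}\"/" \
      postgres-migration-job-0.yaml > "postgres-migration-job-${i}.yaml"
done
# Apply the same idea to timescale-migration-job-0.yaml (job name, REPLICA_INDEX,
# and each timescale-ha-* claimName suffix).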

Apply the jobs

Apply the migration job manifests to run the ownership changes. For single-replica setups, apply one job per database. For HA configurations, apply separate jobs for each replica.

# Single replica setup:
kubectl apply -f postgres-migration-job-0.yaml
kubectl apply -f timescale-migration-job-0.yaml

# For HA setups, apply jobs for all replicas:
kubectl apply -f postgres-migration-job-0.yaml
kubectl apply -f postgres-migration-job-1.yaml
# ... repeat for additional replicas
kubectl apply -f timescale-migration-job-0.yaml
kubectl apply -f timescale-migration-job-1.yaml
# ... repeat for additional replicas

# Wait for all jobs to complete
kubectl wait --for=condition=complete --timeout=600s job/postgres-ownership-migration-0 -n itrs
kubectl wait --for=condition=complete --timeout=600s job/timescale-ownership-migration-0 -n itrs
# ... wait for all replica jobs

# Check job logs to verify success
kubectl logs -n itrs job/postgres-ownership-migration-0
kubectl logs -n itrs job/timescale-ownership-migration-0
# ... check logs for all replica jobs
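
If a job does not complete, inspect the job and its pod; the events and logs usually show the cause, such as a policy still blocking the privileged security context or a missing PVC. A hedged example, assuming the job names above:

# Inspect a job that has not reached completion
kubectl describe job postgres-ownership-migration-0 -n itrs
kubectl get pods -n itrs -l job-name=postgres-ownership-migration-0
kubectl logs -n itrs -l job-name=postgres-ownership-migration-0 --tail=50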

Scale StatefulSets to apply new ownership

To ensure that the updated ownership takes effect, scale the relevant StatefulSets down and then back up.

# Scale down (adjust replicas based on your HA configuration)
kubectl scale statefulset pgplatform -n itrs --replicas=0
kubectl scale statefulset timescale -n itrs --replicas=0

# Wait for pods to terminate
kubectl wait --for=delete pod -l app=pgplatform -n itrs --timeout=120s
kubectl wait --for=delete pod -l app=timescale -n itrs --timeout=120s

# Scale back up (use your original replica count)
kubectl scale statefulset pgplatform -n itrs --replicas=1  # or 2, 3, etc. for HA
kubectl scale statefulset timescale -n itrs --replicas=1   # or 2, 3, etc. for HA

# Wait for platform to be ready
sleep 60
kubectl wait --timeout=600s iaxplatform/iax-iax-platform --for jsonpath="{.status.status}"=DEPLOYED -n itrs

Verify system health

After the migration jobs complete and pods restart, verify that the ownership changes were applied correctly and that all database services are functioning properly.

Run the following checks to ensure the system is healthy:

# Verify all pods are running
kubectl get pods -n itrs
# All pods should be in Running state

# Verify ownership changed on volumes
kubectl exec -n itrs pgplatform-0 -c postgres -- stat -c '%u:%g' /var/lib/postgresql/data
# Should show: 5000:5000 (or your target UID:GID)

kubectl exec -n itrs timescale-0 -c timescale -- stat -c '%u:%g' /var/lib/postgresql/data
kubectl exec -n itrs timescale-0 -c timescale -- stat -c '%u:%g' /var/lib/postgresql/wal/pg_wal
# Should show: 5000:5000

# If using tablespaces, verify tablespace ownership:
# kubectl exec -n itrs timescale-0 -c timescale -- stat -c '%u:%g' /var/lib/postgresql/tablespaces/timeseries_tablespace_1/data

# Verify correct UIDs inside containers
kubectl exec -n itrs pgplatform-0 -c postgres -- id
kubectl exec -n itrs timescale-0 -c timescale -- id
# Should show: uid=5000 gid=5000 (or your target UIDs)

# Test database connectivity
kubectl exec -n itrs pgplatform-0 -c postgres -- psql -U postgres -c "SELECT 1"
kubectl exec -n itrs timescale-0 -c timescale -- psql -U postgres -c "SELECT 1"
# Should return: 1

# Verify no errors in logs
kubectl logs -n itrs pgplatform-0 -c postgres --tail=50
kubectl logs -n itrs timescale-0 -c timescale --tail=50

Re-enable security policies

  1. For Pod Security Admission, restore the namespace to restricted mode.

    kubectl label namespace itrs pod-security.kubernetes.io/enforce=restricted --overwrite
    
    # Verify
    kubectl get namespace itrs -o jsonpath='{.metadata.labels.pod-security\.kubernetes\.io/enforce}'
    # Should show: restricted
    
  2. For Gatekeeper, remove itrs from excludedNamespaces by locating its index in the array.

    # Find the 0-based index of "itrs" in excludedNamespaces for the capabilities constraint
    # (grep -n reports 1-based line numbers, so subtract 1 to get the JSON Patch array index)
    ITRS_INDEX_CAP=$(( $(kubectl get k8spspcapabilities drop-all -o jsonpath='{.spec.match.excludedNamespaces}' | tr ',' '\n' | grep -n itrs | cut -d: -f1 | head -1) - 1 ))
    kubectl patch k8spspcapabilities drop-all --type=json -p="[{\"op\": \"remove\", \"path\": \"/spec/match/excludedNamespaces/${ITRS_INDEX_CAP}\"}]"

    # Repeat for other constraints (e.g., allowed users)
    ITRS_INDEX_USR=$(( $(kubectl get k8spspallowedusers jpmc-allowed-user-ranges -o jsonpath='{.spec.match.excludedNamespaces}' | tr ',' '\n' | grep -n itrs | cut -d: -f1 | head -1) - 1 ))
    kubectl patch k8spspallowedusers jpmc-allowed-user-ranges --type=json -p="[{\"op\": \"remove\", \"path\": \"/spec/match/excludedNamespaces/${ITRS_INDEX_USR}\"}]"
    
  3. Run the following commands to check for constraint violations and confirm that the pods are still running.

    # Check for constraint violations
    kubectl get constraints -A
    # Should show 0 violations in itrs namespace
    
    # Verify pods still running after policy re-enablement
    kubectl get pods -n itrs
    
["ITRS Analytics"] ["User Guide", "Technical Reference"]

Was this topic helpful?