Manual recovery: Change user IDs with strict security policies

Overview

Important

These procedures apply only to ITRS Analytics installations deployed using Helm.

Please contact ITRS Support for guidance before performing this procedure.

This procedure applies when you need to change the runAsUser or fsGroup of an existing ITRS Analytics installation that runs under strict security policies, such as Pod Security Admission in restricted mode or Gatekeeper constraints.

Changing the runAsUser requires updating file ownership on all persistent volumes used by the database. This operation involves running privileged jobs (as root with elevated capabilities) to modify file ownership. These jobs cannot run while strict security policies are enforced, so you must temporarily relax the relevant policies during the migration.
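
Before you begin, it helps to record the UID the database pods currently run as and the current Pod Security Admission enforcement level, so you can confirm the change afterwards. A minimal check, assuming the itrs namespace and the pod names used in the examples below:

# Current runAsUser of the PostgreSQL pod (pod and namespace names are assumptions)
kubectl get pod pgplatform-0 -n itrs -o jsonpath='{.spec.securityContext.runAsUser}'

# Current Pod Security Admission enforcement level on the namespace
kubectl get namespace itrs -o jsonpath='{.metadata.labels.pod-security\.kubernetes\.io/enforce}'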

Migration considerations

  1. Plan for downtime — database pods will restart during the migration.
  2. Large databases — migration time increases with data size and can exceed 10 minutes for large datasets (see the sizing check after this list).
  3. Security window — keep the time with relaxed policies to a minimum and complete the procedure as quickly as possible.
  4. Schedule appropriately — perform the migration during a maintenance window.
  5. Alternative approach — if feasible, deploy a fresh installation with the correct UIDs instead.
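
To gauge how long the ownership change may take, check the size of the existing data directories first. A minimal check, assuming the default pod and container names used later in this procedure and that du is available in the database images:

# Approximate size of the PostgreSQL and TimescaleDB data directories
kubectl exec -n itrs pgplatform-0 -c postgres -- du -sh /var/lib/postgresql/data
kubectl exec -n itrs timescale-0 -c timescale -- du -sh /var/lib/postgresql/data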

Why is this procedure necessary?

Database services (PostgreSQL, TimescaleDB, etcd, and Kafka) require their data directories to be owned by the UID that runs the process. When you change runAsUser, you must update file ownership on all affected persistent volumes. As described in the overview, this operation requires privileged jobs and a temporary relaxation of security policies.

As a best practice, set the correct runAsUser and fsGroup during initial deployment to avoid this complexity.
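
For reference, a minimal sketch of setting the IDs at install time rather than migrating later, assuming the same release name, chart, and namespace used in the upgrade step below:

# values.yaml already contains the desired runAsUser/fsGroup (see the example in the next section)
helm install iax itrs/iax-platform \
  --namespace itrs \
  --create-namespace \
  --values values.yaml \
  --wait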

What gets migrated?

The migration jobs update all persistent volumes associated with the PostgreSQL and TimescaleDB StatefulSets: the PostgreSQL data volume, the TimescaleDB data and WAL volumes, and any TimescaleDB tablespace volumes you have configured.

Note

All volumes must have their ownership updated when runAsUser changes. If any volume is missed, PostgreSQL or TimescaleDB will fail to start with permission errors.
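
To confirm which persistent volume claims are involved, and the exact claim names to use in the job manifests below, list the PVCs in the namespace. A hedged example, assuming the default itrs namespace and claim name prefixes:

# The claim names shown here are referenced by the migration job manifests below
kubectl get pvc -n itrs | grep -E 'pgplatform|timescale'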

Prerequisites

Before starting this procedure, ensure you have:

  • kubectl access with permission to modify namespace labels and, if applicable, Gatekeeper constraints.
  • Helm access to the ITRS Analytics release and its values file.
  • The new UID and GID values you intend to use.
  • A scheduled maintenance window, since database pods will restart.
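
As a quick check of the required access, you can ask the API server whether your account can perform the key operations. A minimal sketch, assuming the itrs namespace and the Gatekeeper constraint kinds shown later in this procedure:

# Can you relabel the namespace for Pod Security Admission?
kubectl auth can-i patch namespaces

# Can you patch Gatekeeper constraints? (Only relevant if Gatekeeper is in use.)
kubectl auth can-i patch k8spspcapabilities
kubectl auth can-i patch k8spspallowedusers

# Can you create the migration Jobs in the itrs namespace?
kubectl auth can-i create jobs -n itrs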

Recovery procedure

Update security context configuration

  1. Update your Helm values file with new UIDs. It’s critical that you update both runAsUser and fsGroup.

    # values.yaml
    securityContext:
      pod:
        runAsUser: 5000      # New UID
        runAsGroup: 5000     # New GID
        supplementalGroups: [5000]
        fsGroup: 5000        # MUST change with runAsUser
        fsGroupChangePolicy: OnRootMismatch
    
  2. Apply the configuration using Helm.

    helm upgrade iax itrs/iax-platform \
      --namespace itrs \
      --values values.yaml \
      --wait
    

Warning

At this stage, the database pods will fail to start because of the ownership mismatch. This is normal and expected.
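
To observe the expected failure state before continuing, inspect the pods. A hedged example, assuming the labels and namespace used elsewhere in this procedure:

# Pods are expected to show CrashLoopBackOff or Error until the migration completes
kubectl get pods -n itrs -l app=pgplatform
kubectl get pods -n itrs -l app=timescale

# Events and recent logs should point at permission problems on the data directory
kubectl describe pod pgplatform-0 -n itrs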

Temporarily relax security policies

  1. For Pod Security Admission, update the namespace label to temporarily permit privileged operations. If the namespace also sets the warn or audit labels, see the sketch after this list.

    # Change namespace from restricted to privileged temporarily
    kubectl label namespace itrs pod-security.kubernetes.io/enforce=privileged --overwrite
    
    # Verify
    kubectl get namespace itrs -o jsonpath='{.metadata.labels.pod-security\.kubernetes\.io/enforce}'
    # Should show: privileged
    
  2. For Gatekeeper, temporarily exclude the namespace from constraint enforcement.

    # Get your constraint names
    kubectl get constraints -A
    
    # Add itrs namespace to excludedNamespaces for each constraint
    # Example for K8sPSPCapabilities constraint named "drop-all":
    kubectl patch k8spspcapabilities drop-all --type=json -p='[
      {
        "op": "add",
        "path": "/spec/match/excludedNamespaces/-",
        "value": "itrs"
      }
    ]'
    
    # Repeat for other constraints (e.g., K8sPSPAllowedUsers named "jpmc-allowed-user-ranges")
    kubectl patch k8spspallowedusers jpmc-allowed-user-ranges --type=json -p='[
      {
        "op": "add",
        "path": "/spec/match/excludedNamespaces/-",
        "value": "itrs"
      }
    ]'
    
    # Verify
    kubectl get constraints -o yaml | grep -A 5 excludedNamespaces
    # Should show 'itrs' in the list
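
If the namespace also carries the Pod Security Admission warn or audit labels, you may want to align them temporarily as well so the migration jobs do not trigger warnings or audit records. A hedged sketch, assuming the same itrs namespace; skip this if the labels are not set, and restore them together with the enforce label when you re-enable policies later:

kubectl label namespace itrs pod-security.kubernetes.io/warn=privileged --overwrite
kubectl label namespace itrs pod-security.kubernetes.io/audit=privileged --overwrite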
    

Run ownership migration jobs

  1. Create migration job manifests for PostgreSQL and TimescaleDB.

  2. Start with postgres-migration-job-0.yaml, which should be used for a single replica setup or for replica 0 in a high-availability (HA) configuration.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: postgres-ownership-migration-0
      namespace: itrs
      annotations:
        iax.itrsgroup.com/delete-after-install: "true"
    spec:
      backoffLimit: 3
      activeDeadlineSeconds: 21600
      template:
        metadata:
          labels:
            iax.itrsgroup.com/pre-upgrade-job: "true"
        spec:
          restartPolicy: Never
          securityContext:
            runAsUser: 0
            runAsGroup: 0
          containers:
          - name: fix-ownership
            image: proxy.itrsgroup.com/proxy/itrs-analytics/docker.itrsgroup.com/iax/postgres:<VERSION>  # Replace <VERSION> with your platform version tag
            imagePullPolicy: IfNotPresent
            securityContext:
              runAsNonRoot: false
              allowPrivilegeEscalation: false
              capabilities:
                add:
                - CHOWN
                - FOWNER
                - DAC_OVERRIDE
                - DAC_READ_SEARCH
                drop:
                - ALL
              seccompProfile:
                type: RuntimeDefault
            env:
            - name: TARGET_UID
              value: "5000"  # Your new UID
            - name: TARGET_GID
              value: "5000"  # Your new GID
            - name: REPLICA_INDEX
              value: "0"
            command:
            - /bin/bash
            - -c
            - |
              set -e
    
              echo "Starting ownership migration for PostgreSQL replica ${REPLICA_INDEX}"
              echo "Target ownership: ${TARGET_UID}:${TARGET_GID}"
    
              PGDATA="/var/lib/postgresql/data"
    
              # Check if data directory is empty (fresh install, skip migration)
              if [ ! -d "${PGDATA}" ] || [ -z "$(ls -A ${PGDATA} 2>/dev/null)" ]; then
                echo "Data directory is empty - skipping migration (fresh install)"
                exit 0
              fi
    
              # Get current ownership of PGDATA
              CURRENT_UID=$(stat -c '%u' "${PGDATA}")
              CURRENT_GID=$(stat -c '%g' "${PGDATA}")
    
              echo "Current PGDATA ownership: ${CURRENT_UID}:${CURRENT_GID}"
    
              # Check if migration is needed
              if [ "${CURRENT_UID}" = "${TARGET_UID}" ] && [ "${CURRENT_GID}" = "${TARGET_GID}" ]; then
                echo "Ownership already correct - skipping migration"
                exit 0
              else
                echo "Migration required from ${CURRENT_UID}:${CURRENT_GID} to ${TARGET_UID}:${TARGET_GID}"
                echo "Changing ownership (may take several minutes for large databases)..."
                chown -R "${TARGET_UID}:${TARGET_GID}" "${PGDATA}"
                echo "Migration completed"
              fi
    
              echo ""
              echo "Migration completed successfully for replica ${REPLICA_INDEX}"
              echo "Final ownership: $(stat -c '%u:%g' ${PGDATA})"
              echo "Note: Permissions will be fixed by the pod startup script"
            volumeMounts:
            - name: pgdata
              mountPath: /var/lib/postgresql
          volumes:
          - name: pgdata
            persistentVolumeClaim:
              claimName: data-pgplatform-0
    
  3. Create the migration job manifest for TimescaleDB using timescale-migration-job-0.yaml, intended for a single replica setup or for replica 0 in a high-availability (HA) configuration.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: timescale-ownership-migration-0
      namespace: itrs
      annotations:
        iax.itrsgroup.com/delete-after-install: "true"
    spec:
      backoffLimit: 3
      activeDeadlineSeconds: 21600
      template:
        metadata:
          labels:
            iax.itrsgroup.com/pre-upgrade-job: "true"
        spec:
          restartPolicy: Never
          securityContext:
            runAsUser: 0
            runAsGroup: 0
          containers:
          - name: fix-ownership
            image: proxy.itrsgroup.com/proxy/itrs-analytics/docker.itrsgroup.com/iax/timescale:<VERSION>  # Replace <VERSION> with your platform version tag
            imagePullPolicy: IfNotPresent
            securityContext:
              runAsNonRoot: false
              allowPrivilegeEscalation: false
              capabilities:
                add:
                - CHOWN
                - FOWNER
                - DAC_OVERRIDE
                - DAC_READ_SEARCH
                drop:
                - ALL
              seccompProfile:
                type: RuntimeDefault
            env:
            - name: TARGET_UID
              value: "5000"  # Your new UID
            - name: TARGET_GID
              value: "5000"  # Your new GID
            - name: REPLICA_INDEX
              value: "0"
            command:
            - /bin/bash
            - -c
            - |
              set -e
    
              echo "Starting ownership migration for TimescaleDB replica ${REPLICA_INDEX}"
              echo "Target ownership: ${TARGET_UID}:${TARGET_GID}"
              echo ""
    
              MIGRATION_NEEDED=false
    
              # Process data volume
              PGDATA="/var/lib/postgresql/data"
              echo "Processing data volume: ${PGDATA}"
    
              if [ ! -d "${PGDATA}" ] || [ -z "$(ls -A ${PGDATA} 2>/dev/null)" ]; then
                echo "Data directory is empty - skipping (fresh install)"
              else
                # Get current ownership
                CURRENT_UID=$(stat -c '%u' "${PGDATA}")
                CURRENT_GID=$(stat -c '%g' "${PGDATA}")
                echo "Current ownership: ${CURRENT_UID}:${CURRENT_GID}"
    
                if [ "${CURRENT_UID}" = "${TARGET_UID}" ] && [ "${CURRENT_GID}" = "${TARGET_GID}" ]; then
                  echo "Ownership already correct"
                else
                  echo "Migration required from ${CURRENT_UID}:${CURRENT_GID} to ${TARGET_UID}:${TARGET_GID}"
                  echo "Changing ownership (may take several minutes for large databases)..."
                  MIGRATION_NEEDED=true
                  chown -R "${TARGET_UID}:${TARGET_GID}" "${PGDATA}"
                  echo "Data volume migration completed"
                fi
              fi
              echo ""
    
              # Process WAL volume
              PGWAL="/var/lib/postgresql/wal/pg_wal"
              echo "Processing WAL volume: ${PGWAL}"
    
              if [ ! -d "${PGWAL}" ] || [ -z "$(ls -A ${PGWAL} 2>/dev/null)" ]; then
                echo "WAL directory is empty - skipping (fresh install)"
              else
                # Get current ownership
                CURRENT_UID=$(stat -c '%u' "${PGWAL}")
                CURRENT_GID=$(stat -c '%g' "${PGWAL}")
                echo "Current ownership: ${CURRENT_UID}:${CURRENT_GID}"
    
                if [ "${CURRENT_UID}" = "${TARGET_UID}" ] && [ "${CURRENT_GID}" = "${TARGET_GID}" ]; then
                  echo "Ownership already correct"
                else
                  echo "Migration required from ${CURRENT_UID}:${CURRENT_GID} to ${TARGET_UID}:${TARGET_GID}"
                  MIGRATION_NEEDED=true
    
                  echo "Changing WAL ownership (this is fast)..."
                  chown -R "${TARGET_UID}:${TARGET_GID}" "${PGWAL}"
                  echo "WAL volume migration completed"
                fi
              fi
              echo ""
    
              # Process tablespace volumes (if configured)
              # Uncomment and adjust based on your timescale.timeseriesDiskCount setting
              # Example below assumes timeseriesDiskCount=2:
              #
              # TABLESPACE="/var/lib/postgresql/tablespaces/timeseries_tablespace_1"
              # echo "Processing tablespace volume 1: ${TABLESPACE}"
              # if [ ! -d "${TABLESPACE}/data" ] || [ -z "$(ls -A ${TABLESPACE}/data 2>/dev/null)" ]; then
              #   echo "Tablespace 1 is empty - skipping"
              # else
              #   CURRENT_UID=$(stat -c '%u' "${TABLESPACE}/data")
              #   CURRENT_GID=$(stat -c '%g' "${TABLESPACE}/data")
              #   echo "Current ownership: ${CURRENT_UID}:${CURRENT_GID}"
              #   if [ "${CURRENT_UID}" != "${TARGET_UID}" ] || [ "${CURRENT_GID}" != "${TARGET_GID}" ]; then
              #     echo "Migration required from ${CURRENT_UID}:${CURRENT_GID} to ${TARGET_UID}:${TARGET_GID}"
              #     MIGRATION_NEEDED=true
              #     echo "Changing tablespace 1 ownership..."
              #     chown -R "${TARGET_UID}:${TARGET_GID}" "${TABLESPACE}"
              #     echo "Tablespace 1 migration completed"
              #   else
              #     echo "Ownership already correct"
              #   fi
              # fi
              # echo ""
              # (Repeat for tablespace 2, 3, etc. based on your timeseriesDiskCount)
    
              if [ "${MIGRATION_NEEDED}" = "false" ]; then
                echo "No migration needed for replica ${REPLICA_INDEX}"
              else
                echo "Replica ${REPLICA_INDEX} migrated successfully"
              fi
              echo "Note: Permissions will be fixed by the pod startup script"
            volumeMounts:
            - name: tsdata
              mountPath: /var/lib/postgresql
            - name: tswal
              mountPath: /var/lib/postgresql/wal
            # Uncomment and add tablespace volumes if configured:
            # - name: tablespace-1
            #   mountPath: /var/lib/postgresql/tablespaces/timeseries_tablespace_1
            # - name: tablespace-2
            #   mountPath: /var/lib/postgresql/tablespaces/timeseries_tablespace_2
          volumes:
          - name: tsdata
            persistentVolumeClaim:
              claimName: timescale-ha-data-timescale-0
          - name: tswal
            persistentVolumeClaim:
              claimName: timescale-ha-wal-timescale-0
          # Uncomment and add tablespace PVCs if configured:
          # - name: tablespace-1
          #   persistentVolumeClaim:
          #     claimName: timescale-ha-tablespace-data-1-timescale-0
          # - name: tablespace-2
          #   persistentVolumeClaim:
          #     claimName: timescale-ha-tablespace-data-2-timescale-0
    

Update the job manifests

Before applying the migration jobs, update the job manifests to reflect your environment-specific settings, such as the namespace, image version, target UID and GID, and persistent volume claim names used by your PostgreSQL and TimescaleDB deployment.

  1. Replace <VERSION> in the image field with your actual platform version tag (for example, 2.17.0).
  2. Replace TARGET_UID and TARGET_GID environment variable values with your target values (5000 in the example above).
  3. For TimescaleDB:
    • If you have tablespace volumes configured (timescale.timeseriesDiskCount > 0), uncomment the tablespace sections in the script and add the corresponding volumeMounts and volumes.
    • Add one TABLESPACE section per tablespace (1, 2, 3, etc. based on your timeseriesDiskCount).
  4. For HA configurations (multiple replicas), as shown in the sketch after this list:
    • Create separate job manifests for each replica (replica 0, 1, 2).
    • Update REPLICA_INDEX environment variable: "0", "1", "2", and others.
    • Update job names: postgres-ownership-migration-0, postgres-ownership-migration-1, and others.
    • Update claimName suffixes in volumes:
      • PostgreSQL: data-pgplatform-0, data-pgplatform-1, and others.
      • TimescaleDB: timescale-ha-data-timescale-0, timescale-ha-data-timescale-1.
      • TimescaleDB WAL: timescale-ha-wal-timescale-0, timescale-ha-wal-timescale-1.
      • TimescaleDB tablespaces: timescale-ha-tablespace-data-1-timescale-0, timescale-ha-tablespace-data-1-timescale-1.
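
For HA setups, the per-replica manifests can be generated from the replica-0 template with a small script. This is only a sketch, and it assumes you kept the job name, REPLICA_INDEX value, and claim name exactly as shown above; review the generated files, for example with kubectl apply --dry-run=client -f <file>, before applying them.

# Hypothetical helper: derive PostgreSQL job manifests for replicas 1..N-1 from the replica-0 template
REPLICAS=3   # adjust to your HA replica count
for i in $(seq 1 $((REPLICAS - 1))); do
  sed -e "s/postgres-ownership-migration-0/postgres-ownership-migration-${i}/" \
      -e "s/data-pgplatform-0/data-pgplatform-${i}/" \
      -e "s/value: \"0\"/value: \"${i}\"/" \
      postgres-migration-job-0.yaml > "postgres-migration-job-${i}.yaml"
done
# Apply the same idea to timescale-migration-job-0.yaml (job name, REPLICA_INDEX,
# and each timescale-ha-* claimName suffix).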

Apply the jobs

Apply the migration job manifests to run the ownership changes. For single-replica setups, apply one job per database. For HA configurations, apply separate jobs for each replica.

# Single replica setup:
kubectl apply -f postgres-migration-job-0.yaml
kubectl apply -f timescale-migration-job-0.yaml

# For HA setups, apply jobs for all replicas:
kubectl apply -f postgres-migration-job-0.yaml
kubectl apply -f postgres-migration-job-1.yaml
# ... repeat for additional replicas
kubectl apply -f timescale-migration-job-0.yaml
kubectl apply -f timescale-migration-job-1.yaml
# ... repeat for additional replicas

# Wait for all jobs to complete
kubectl wait --for=condition=complete --timeout=600s job/postgres-ownership-migration-0 -n itrs
kubectl wait --for=condition=complete --timeout=600s job/timescale-ownership-migration-0 -n itrs
# ... wait for all replica jobs

# Check job logs to verify success
kubectl logs -n itrs job/postgres-ownership-migration-0
kubectl logs -n itrs job/timescale-ownership-migration-0
# ... check logs for all replica jobs
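
If a job does not complete, inspect the job and its pod; the events and logs usually show the cause, such as a policy still blocking the privileged security context or a missing PVC. A hedged example, assuming the job names above:

# Inspect a job that has not reached completion
kubectl describe job postgres-ownership-migration-0 -n itrs
kubectl get pods -n itrs -l job-name=postgres-ownership-migration-0
kubectl logs -n itrs -l job-name=postgres-ownership-migration-0 --tail=50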

Scale StatefulSets to apply new ownership

To ensure that the updated ownership takes effect, scale the relevant StatefulSets down and then back up.

# Scale down (adjust replicas based on your HA configuration)
kubectl scale statefulset pgplatform -n itrs --replicas=0
kubectl scale statefulset timescale -n itrs --replicas=0

# Wait for pods to terminate
kubectl wait --for=delete pod -l app=pgplatform -n itrs --timeout=120s
kubectl wait --for=delete pod -l app=timescale -n itrs --timeout=120s

# Scale back up (use your original replica count)
kubectl scale statefulset pgplatform -n itrs --replicas=1  # or 2, 3, etc. for HA
kubectl scale statefulset timescale -n itrs --replicas=1   # or 2, 3, etc. for HA

# Wait for platform to be ready
sleep 60
kubectl wait --timeout=600s iaxplatform/iax-iax-platform --for jsonpath="{.status.status}"=DEPLOYED -n itrs

Verify system health

After the migration jobs complete and pods restart, verify that the ownership changes were applied correctly and that all database services are functioning properly.

Run the following checks to ensure the system is healthy:

# Verify all pods are running
kubectl get pods -n itrs
# All pods should be in Running state

# Verify ownership changed on volumes
kubectl exec -n itrs pgplatform-0 -c postgres -- stat -c '%u:%g' /var/lib/postgresql/data
# Should show: 5000:5000 (or your target UID:GID)

kubectl exec -n itrs timescale-0 -c timescale -- stat -c '%u:%g' /var/lib/postgresql/data
kubectl exec -n itrs timescale-0 -c timescale -- stat -c '%u:%g' /var/lib/postgresql/wal/pg_wal
# Should show: 5000:5000

# If using tablespaces, verify tablespace ownership:
# kubectl exec -n itrs timescale-0 -c timescale -- stat -c '%u:%g' /var/lib/postgresql/tablespaces/timeseries_tablespace_1/data

# Verify correct UIDs inside containers
kubectl exec -n itrs pgplatform-0 -c postgres -- id
kubectl exec -n itrs timescale-0 -c timescale -- id
# Should show: uid=5000 gid=5000 (or your target UIDs)

# Test database connectivity
kubectl exec -n itrs pgplatform-0 -c postgres -- psql -U postgres -c "SELECT 1"
kubectl exec -n itrs timescale-0 -c timescale -- psql -U postgres -c "SELECT 1"
# Should return: 1

# Verify no errors in logs
kubectl logs -n itrs pgplatform-0 -c postgres --tail=50
kubectl logs -n itrs timescale-0 -c timescale --tail=50

Re-enable security policies

  1. For Pod Security Admission, restore the namespace to restricted mode.

    kubectl label namespace itrs pod-security.kubernetes.io/enforce=restricted --overwrite
    
    # Verify
    kubectl get namespace itrs -o jsonpath='{.metadata.labels.pod-security\.kubernetes\.io/enforce}'
    # Should show: restricted
    
  2. For Gatekeeper, remove itrs from excludedNamespaces by locating its index in the array.

    # Find the 0-based index of "itrs" in excludedNamespaces for the capabilities constraint
    # (grep -n reports 1-based line numbers, so subtract 1 to get the JSON Patch array index)
    ITRS_INDEX_CAP=$(( $(kubectl get k8spspcapabilities drop-all -o jsonpath='{.spec.match.excludedNamespaces}' | tr ',' '\n' | grep -n itrs | cut -d: -f1 | head -1) - 1 ))
    kubectl patch k8spspcapabilities drop-all --type=json -p="[{\"op\": \"remove\", \"path\": \"/spec/match/excludedNamespaces/${ITRS_INDEX_CAP}\"}]"

    # Repeat for other constraints (e.g., allowed users)
    ITRS_INDEX_USR=$(( $(kubectl get k8spspallowedusers jpmc-allowed-user-ranges -o jsonpath='{.spec.match.excludedNamespaces}' | tr ',' '\n' | grep -n itrs | cut -d: -f1 | head -1) - 1 ))
    kubectl patch k8spspallowedusers jpmc-allowed-user-ranges --type=json -p="[{\"op\": \"remove\", \"path\": \"/spec/match/excludedNamespaces/${ITRS_INDEX_USR}\"}]"
    
  3. Run the following commands to check for constraint violations and confirm that the pods are still running.

    # Check for constraint violations
    kubectl get constraints -A
    # Should show 0 violations in itrs namespace
    
    # Verify pods still running after policy re-enablement
    kubectl get pods -n itrs
    
["ITRS Analytics"] ["User Guide", "Technical Reference"]

Was this topic helpful?