Manual recovery: Change user IDs with strict security policies
Overview
Important
This procedure applies only to ITRS Analytics installations deployed using Helm.
Please contact ITRS Support for guidance before performing this procedure.
This procedure is applicable in the following scenarios:
- You need to change `runAsUser` on an existing installation.
- Your cluster enforces Pod Security Standards (PSS) or implements Gatekeeper or Kyverno policies.
- The ownership migration requires privileged operations that conflict with your security policies.
Changing `runAsUser` requires updating file ownership on all persistent volumes used by the databases. This is done by running privileged jobs (as root, with elevated capabilities) that modify file ownership. Because these jobs cannot run while strict security policies are enforced, you must temporarily relax the relevant policies for the duration of the migration.
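Before planning the change, it can help to confirm the UID the database currently runs as and the current on-disk ownership. A quick check, using the pod and container names from the examples later in this guide (adjust them to your installation):

```bash
# UID/GID the PostgreSQL container currently runs as
kubectl exec -n itrs pgplatform-0 -c postgres -- id

# Current ownership of the data directory on the persistent volume
kubectl exec -n itrs pgplatform-0 -c postgres -- stat -c '%u:%g' /var/lib/postgresql/data
```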
Migration considerations
- Plan for downtime — database pods will restart during the migration.
- Large databases — migration time increases with data size (it can take 10+ minutes for large datasets; see the sizing check after this list).
- Security window — minimize the time spent with relaxed policies; complete the procedure as quickly as possible.
- Schedule appropriately — perform during maintenance window.
- Alternative approach — deploy a fresh installation with correct UIDs if feasible.
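To gauge how long the ownership change may take, check how much data sits on the database volumes before scheduling the window. A rough sizing check, assuming the pod and container names used later in this guide and that `du` is available in the images:

```bash
# Approximate data sizes; larger volumes mean a longer chown pass
kubectl exec -n itrs pgplatform-0 -c postgres -- du -sh /var/lib/postgresql/data
kubectl exec -n itrs timescale-0 -c timescale -- du -sh /var/lib/postgresql/data
kubectl exec -n itrs timescale-0 -c timescale -- du -sh /var/lib/postgresql/wal/pg_wal
```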
Why is this procedure necessary?
Database services (PostgreSQL, TimescaleDB, etcd, and Kafka) require their data directories to be owned by the UID running the process. When changing `runAsUser`, you must update file ownership on all persistent volumes. This operation:
- Must run as root.
- Requires elevated Linux capabilities (`CHOWN`, `FOWNER`, `DAC_OVERRIDE`).
- Takes time proportional to the volume data size.
- Cannot execute while strict security policies are enforced.
As a best practice, set the correct `runAsUser` and `fsGroup` during initial deployment to avoid this complexity.
What gets migrated?
The migration jobs update all persistent volumes associated with PostgreSQL and TimescaleDB:
- PGDATA — main database data directory (mandatory).
- WAL — write-ahead log volume (TimescaleDB).
- Tablespaces — optional additional storage volumes for database data (if any are configured).
Note
All volumes must have their ownership updated when `runAsUser` changes. Otherwise, PostgreSQL will fail to start due to permission errors.
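To see which persistent volumes are in scope for your installation, list the PVCs in the namespace; the claim names referenced by the migration job manifests later in this guide (for example `data-pgplatform-0` and `timescale-ha-data-timescale-0`) should appear among them:

```bash
kubectl get pvc -n itrs
```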
Prerequisites
Before starting this procedure, ensure you have:
- Established contact with an ITRS Support engineer.
- Cluster administrator access.
- Scheduled a maintenance window (database pods will restart).
- Documented your current and target UID/GID values.
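One way to capture the current values for your records is to read them back from the Helm release and from a running pod. A sketch, assuming the release name `iax` and namespace `itrs` used elsewhere in this guide:

```bash
# Security context currently set in the release values
helm get values iax -n itrs --all -o yaml | grep -A 6 securityContext

# Effective UID/GID inside a database pod
kubectl exec -n itrs pgplatform-0 -c postgres -- id
```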
Recovery procedure
Update security context configuration
- Update your Helm values file with the new UIDs. It is critical that you update both `runAsUser` and `fsGroup`.

```yaml
# values.yaml
securityContext:
  pod:
    runAsUser: 5000              # New UID
    runAsGroup: 5000             # New GID
    supplementalGroups: [5000]
    fsGroup: 5000                # MUST change with runAsUser
    fsGroupChangePolicy: OnRootMismatch
```
- Apply the configuration using Helm.

```bash
helm upgrade iax itrs/iax-platform \
  --namespace itrs \
  --values values.yaml \
  --wait
```
Warning
At this stage, pods will fail to start because of ownership mismatch. This is normal and expected.
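If you want to confirm that the failures are the expected ownership mismatch rather than an unrelated problem, the pod status, events, and recent logs are usually enough (pod names as used elsewhere in this guide):

```bash
kubectl get pods -n itrs
kubectl describe pod pgplatform-0 -n itrs

# Ownership problems typically surface as permission or ownership errors on the data directory
kubectl logs -n itrs pgplatform-0 -c postgres --tail=20
```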
Temporarily relax security policies
- For Pod Security Admission, update the namespace label to temporarily permit privileged operations.

```bash
# Change namespace from restricted to privileged temporarily
kubectl label namespace itrs pod-security.kubernetes.io/enforce=privileged --overwrite

# Verify
kubectl get namespace itrs -o jsonpath='{.metadata.labels.pod-security\.kubernetes\.io/enforce}'
# Should show: privileged
```
- For Gatekeeper, temporarily exclude the namespace from constraint enforcement.

```bash
# Get your constraint names
kubectl get constraints -A

# Add the itrs namespace to excludedNamespaces for each constraint.
# Example for a K8sPSPCapabilities constraint named "drop-all":
kubectl patch k8spspcapabilities drop-all --type=json -p='[
  {"op": "add", "path": "/spec/match/excludedNamespaces/-", "value": "itrs"}
]'

# Repeat for other constraints (e.g., a K8sPSPAllowedUsers constraint named "jpmc-allowed-user-ranges")
kubectl patch k8spspallowedusers jpmc-allowed-user-ranges --type=json -p='[
  {"op": "add", "path": "/spec/match/excludedNamespaces/-", "value": "itrs"}
]'

# Verify
kubectl get constraints -o yaml | grep -A 5 excludedNamespaces
# Should show 'itrs' in the list
```
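- For Kyverno (also listed in the scenarios above), the equivalent step is to exempt the namespace from the blocking policies for the duration of the migration. A minimal sketch using a PolicyException; the policy and rule names are placeholders, the `apiVersion` and the namespace in which exceptions must be created vary by Kyverno installation, and exceptions must be enabled in your cluster, so check your own policies first.

```yaml
# kyverno-itrs-exception.yaml (sketch only; names below are placeholders)
apiVersion: kyverno.io/v2
kind: PolicyException
metadata:
  name: itrs-uid-migration
  namespace: itrs            # may need to be the namespace configured for exceptions
spec:
  exceptions:
    - policyName: disallow-capabilities-strict   # placeholder: the ClusterPolicy blocking the jobs
      ruleNames:
        - require-drop-all                       # placeholder: the rule(s) that reject the jobs
  match:
    any:
      - resources:
          kinds:
            - Pod
            - Job
          namespaces:
            - itrs
```

Remember to delete the exception again when you re-enable security policies at the end of this procedure.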
Run ownership migration jobs
- Create migration job manifests for PostgreSQL and TimescaleDB.
- Start with `postgres-migration-job-0.yaml`, which should be used for a single-replica setup or for replica 0 in a high-availability (HA) configuration.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: postgres-ownership-migration-0
  namespace: itrs
  annotations:
    iax.itrsgroup.com/delete-after-install: "true"
spec:
  backoffLimit: 3
  activeDeadlineSeconds: 21600
  template:
    metadata:
      labels:
        iax.itrsgroup.com/pre-upgrade-job: "true"
    spec:
      restartPolicy: Never
      securityContext:
        runAsUser: 0
        runAsGroup: 0
      containers:
        - name: fix-ownership
          image: proxy.itrsgroup.com/proxy/itrs-analytics/docker.itrsgroup.com/iax/postgres:<VERSION> # Replace <VERSION> with your platform version tag
          imagePullPolicy: IfNotPresent
          securityContext:
            runAsNonRoot: false
            allowPrivilegeEscalation: false
            capabilities:
              add:
                - CHOWN
                - FOWNER
                - DAC_OVERRIDE
                - DAC_READ_SEARCH
              drop:
                - ALL
            seccompProfile:
              type: RuntimeDefault
          env:
            - name: TARGET_UID
              value: "5000" # Your new UID
            - name: TARGET_GID
              value: "5000" # Your new GID
            - name: REPLICA_INDEX
              value: "0"
          command:
            - /bin/bash
            - -c
            - |
              set -e
              echo "Starting ownership migration for PostgreSQL replica ${REPLICA_INDEX}"
              echo "Target ownership: ${TARGET_UID}:${TARGET_GID}"

              PGDATA="/var/lib/postgresql/data"

              # Check if data directory is empty (fresh install, skip migration)
              if [ ! -d "${PGDATA}" ] || [ -z "$(ls -A ${PGDATA} 2>/dev/null)" ]; then
                echo "Data directory is empty - skipping migration (fresh install)"
                exit 0
              fi

              # Get current ownership of PGDATA
              CURRENT_UID=$(stat -c '%u' "${PGDATA}")
              CURRENT_GID=$(stat -c '%g' "${PGDATA}")
              echo "Current PGDATA ownership: ${CURRENT_UID}:${CURRENT_GID}"

              # Check if migration is needed
              if [ "${CURRENT_UID}" = "${TARGET_UID}" ] && [ "${CURRENT_GID}" = "${TARGET_GID}" ]; then
                echo "Ownership already correct - skipping migration"
                exit 0
              else
                echo "Migration required from ${CURRENT_UID}:${CURRENT_GID} to ${TARGET_UID}:${TARGET_GID}"
                echo "Changing ownership (may take several minutes for large databases)..."
                chown -R "${TARGET_UID}:${TARGET_GID}" "${PGDATA}"
                echo "Migration completed"
              fi

              echo ""
              echo "Migration completed successfully for replica ${REPLICA_INDEX}"
              echo "Final ownership: $(stat -c '%u:%g' ${PGDATA})"
              echo "Note: Permissions will be fixed by the pod startup script"
          volumeMounts:
            - name: pgdata
              mountPath: /var/lib/postgresql
      volumes:
        - name: pgdata
          persistentVolumeClaim:
            claimName: data-pgplatform-0
```
- Create the migration job manifest for TimescaleDB using `timescale-migration-job-0.yaml`, also intended for a single-replica setup or for replica 0 in an HA configuration.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: timescale-ownership-migration-0
  namespace: itrs
  annotations:
    iax.itrsgroup.com/delete-after-install: "true"
spec:
  backoffLimit: 3
  activeDeadlineSeconds: 21600
  template:
    metadata:
      labels:
        iax.itrsgroup.com/pre-upgrade-job: "true"
    spec:
      restartPolicy: Never
      securityContext:
        runAsUser: 0
        runAsGroup: 0
      containers:
        - name: fix-ownership
          image: proxy.itrsgroup.com/proxy/itrs-analytics/docker.itrsgroup.com/iax/timescale:<VERSION> # Replace <VERSION> with your platform version tag
          imagePullPolicy: IfNotPresent
          securityContext:
            runAsNonRoot: false
            allowPrivilegeEscalation: false
            capabilities:
              add:
                - CHOWN
                - FOWNER
                - DAC_OVERRIDE
                - DAC_READ_SEARCH
              drop:
                - ALL
            seccompProfile:
              type: RuntimeDefault
          env:
            - name: TARGET_UID
              value: "5000" # Your new UID
            - name: TARGET_GID
              value: "5000" # Your new GID
            - name: REPLICA_INDEX
              value: "0"
          command:
            - /bin/bash
            - -c
            - |
              set -e
              echo "Starting ownership migration for TimescaleDB replica ${REPLICA_INDEX}"
              echo "Target ownership: ${TARGET_UID}:${TARGET_GID}"
              echo ""

              MIGRATION_NEEDED=false

              # Process data volume
              PGDATA="/var/lib/postgresql/data"
              echo "Processing data volume: ${PGDATA}"
              if [ ! -d "${PGDATA}" ] || [ -z "$(ls -A ${PGDATA} 2>/dev/null)" ]; then
                echo "Data directory is empty - skipping (fresh install)"
              else
                # Get current ownership
                CURRENT_UID=$(stat -c '%u' "${PGDATA}")
                CURRENT_GID=$(stat -c '%g' "${PGDATA}")
                echo "Current ownership: ${CURRENT_UID}:${CURRENT_GID}"
                if [ "${CURRENT_UID}" = "${TARGET_UID}" ] && [ "${CURRENT_GID}" = "${TARGET_GID}" ]; then
                  echo "Ownership already correct"
                else
                  echo "Migration required from ${CURRENT_UID}:${CURRENT_GID} to ${TARGET_UID}:${TARGET_GID}"
                  echo "Changing ownership (may take several minutes for large databases)..."
                  MIGRATION_NEEDED=true
                  chown -R "${TARGET_UID}:${TARGET_GID}" "${PGDATA}"
                  echo "Data volume migration completed"
                fi
              fi
              echo ""

              # Process WAL volume
              PGWAL="/var/lib/postgresql/wal/pg_wal"
              echo "Processing WAL volume: ${PGWAL}"
              if [ ! -d "${PGWAL}" ] || [ -z "$(ls -A ${PGWAL} 2>/dev/null)" ]; then
                echo "WAL directory is empty - skipping (fresh install)"
              else
                # Get current ownership
                CURRENT_UID=$(stat -c '%u' "${PGWAL}")
                CURRENT_GID=$(stat -c '%g' "${PGWAL}")
                echo "Current ownership: ${CURRENT_UID}:${CURRENT_GID}"
                if [ "${CURRENT_UID}" = "${TARGET_UID}" ] && [ "${CURRENT_GID}" = "${TARGET_GID}" ]; then
                  echo "Ownership already correct"
                else
                  echo "Migration required from ${CURRENT_UID}:${CURRENT_GID} to ${TARGET_UID}:${TARGET_GID}"
                  MIGRATION_NEEDED=true
                  echo "Changing WAL ownership (this is fast)..."
                  chown -R "${TARGET_UID}:${TARGET_GID}" "${PGWAL}"
                  echo "WAL volume migration completed"
                fi
              fi
              echo ""

              # Process tablespace volumes (if configured)
              # Uncomment and adjust based on your timescale.timeseriesDiskCount setting
              # Example below assumes timeseriesDiskCount=2:
              #
              # TABLESPACE="/var/lib/postgresql/tablespaces/timeseries_tablespace_1"
              # echo "Processing tablespace volume 1: ${TABLESPACE}"
              # if [ ! -d "${TABLESPACE}/data" ] || [ -z "$(ls -A ${TABLESPACE}/data 2>/dev/null)" ]; then
              #   echo "Tablespace 1 is empty - skipping"
              # else
              #   CURRENT_UID=$(stat -c '%u' "${TABLESPACE}/data")
              #   CURRENT_GID=$(stat -c '%g' "${TABLESPACE}/data")
              #   echo "Current ownership: ${CURRENT_UID}:${CURRENT_GID}"
              #   if [ "${CURRENT_UID}" != "${TARGET_UID}" ] || [ "${CURRENT_GID}" != "${TARGET_GID}" ]; then
              #     echo "Migration required from ${CURRENT_UID}:${CURRENT_GID} to ${TARGET_UID}:${TARGET_GID}"
              #     MIGRATION_NEEDED=true
              #     echo "Changing tablespace 1 ownership..."
              #     chown -R "${TARGET_UID}:${TARGET_GID}" "${TABLESPACE}"
              #     echo "Tablespace 1 migration completed"
              #   else
              #     echo "Ownership already correct"
              #   fi
              # fi
              # echo ""
              # (Repeat for tablespace 2, 3, etc. based on your timeseriesDiskCount)

              if [ "${MIGRATION_NEEDED}" = "false" ]; then
                echo "No migration needed for replica ${REPLICA_INDEX}"
              else
                echo "Replica ${REPLICA_INDEX} migrated successfully"
              fi
              echo "Note: Permissions will be fixed by the pod startup script"
          volumeMounts:
            - name: tsdata
              mountPath: /var/lib/postgresql
            - name: tswal
              mountPath: /var/lib/postgresql/wal
            # Uncomment and add tablespace volumes if configured:
            # - name: tablespace-1
            #   mountPath: /var/lib/postgresql/tablespaces/timeseries_tablespace_1
            # - name: tablespace-2
            #   mountPath: /var/lib/postgresql/tablespaces/timeseries_tablespace_2
      volumes:
        - name: tsdata
          persistentVolumeClaim:
            claimName: timescale-ha-data-timescale-0
        - name: tswal
          persistentVolumeClaim:
            claimName: timescale-ha-wal-timescale-0
        # Uncomment and add tablespace PVCs if configured:
        # - name: tablespace-1
        #   persistentVolumeClaim:
        #     claimName: timescale-ha-tablespace-data-1-timescale-0
        # - name: tablespace-2
        #   persistentVolumeClaim:
        #     claimName: timescale-ha-tablespace-data-2-timescale-0
```
Update the job manifests
Before applying the migration jobs, make sure to update the job manifests to reflect your environment-specific settings, such as database names, credentials, namespaces, and any custom configuration required for your PostgreSQL or TimescaleDB deployment.
- Replace `<VERSION>` in the image field with your actual platform version tag (for example, `2.17.0`).
- Replace the `TARGET_UID` and `TARGET_GID` environment variable values with your target values (`5000` in the examples above).
- For TimescaleDB:
  - If you have tablespace volumes configured (`timescale.timeseriesDiskCount > 0`), uncomment the tablespace sections in the script and add the corresponding volumeMounts and volumes.
  - Add one TABLESPACE section per tablespace (1, 2, 3, etc. based on your `timeseriesDiskCount`).
- For HA configurations (multiple replicas):
  - Create separate job manifests for each replica (replica 0, 1, 2); see the sketch after this list for one way to generate them.
  - Update the `REPLICA_INDEX` environment variable: `"0"`, `"1"`, `"2"`, and so on.
  - Update the job names: `postgres-ownership-migration-0`, `postgres-ownership-migration-1`, and so on.
  - Update the `claimName` suffixes in volumes:
    - PostgreSQL: `data-pgplatform-0`, `data-pgplatform-1`, and so on.
    - TimescaleDB data: `timescale-ha-data-timescale-0`, `timescale-ha-data-timescale-1`.
    - TimescaleDB WAL: `timescale-ha-wal-timescale-0`, `timescale-ha-wal-timescale-1`.
    - TimescaleDB tablespaces: `timescale-ha-tablespace-data-1-timescale-0`, `timescale-ha-tablespace-data-1-timescale-1`.
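For HA setups, one way to produce the per-replica manifests is to treat the replica-0 files as templates and substitute the trailing index. A small sketch that assumes the per-replica fields differ only by that index, as in the lists above; review the generated files before applying them:

```bash
# Generate replica-1 and replica-2 manifests from the replica-0 templates
for i in 1 2; do
  for f in postgres-migration-job timescale-migration-job; do
    sed -e "s/migration-0/migration-${i}/" \
        -e "s/value: \"0\"/value: \"${i}\"/" \
        -e "s/-pgplatform-0/-pgplatform-${i}/" \
        -e "s/-timescale-0/-timescale-${i}/" \
        "${f}-0.yaml" > "${f}-${i}.yaml"
  done
done
```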
Apply the jobs
Apply the migration job manifests to run the ownership changes. For single-replica setups, apply one job per database. For HA configurations, apply separate jobs for each replica.
```bash
# Single replica setup:
kubectl apply -f postgres-migration-job-0.yaml
kubectl apply -f timescale-migration-job-0.yaml

# For HA setups, apply jobs for all replicas:
kubectl apply -f postgres-migration-job-0.yaml
kubectl apply -f postgres-migration-job-1.yaml
# ... repeat for additional replicas
kubectl apply -f timescale-migration-job-0.yaml
kubectl apply -f timescale-migration-job-1.yaml
# ... repeat for additional replicas

# Wait for all jobs to complete
kubectl wait --for=condition=complete --timeout=600s job/postgres-ownership-migration-0 -n itrs
kubectl wait --for=condition=complete --timeout=600s job/timescale-ownership-migration-0 -n itrs
# ... wait for all replica jobs

# Check job logs to verify success
kubectl logs -n itrs job/postgres-ownership-migration-0
kubectl logs -n itrs job/timescale-ownership-migration-0
# ... check logs for all replica jobs
```
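If a job does not complete within the timeout, the usual causes are admission denials (policies not fully relaxed, so the job pod is never created) or a slow `chown` on a large volume. A few commands that help narrow it down, using the job names above:

```bash
# Job status and events (look for admission/webhook denials or image pull errors)
kubectl describe job postgres-ownership-migration-0 -n itrs

# Pods created by the job and their logs
kubectl get pods -n itrs -l job-name=postgres-ownership-migration-0
kubectl logs -n itrs -l job-name=postgres-ownership-migration-0 --tail=50
```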
Scaling StatefulSets to apply new ownership
To ensure that the updated ownership and security context take effect, you need to scale down and then scale up the relevant StatefulSets.
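If you are not sure of the original replica counts, record them before scaling down so you can restore them exactly afterwards (a quick sketch; pass the recorded values to the `--replicas` flags below):

```bash
PG_REPLICAS=$(kubectl get statefulset pgplatform -n itrs -o jsonpath='{.spec.replicas}')
TS_REPLICAS=$(kubectl get statefulset timescale -n itrs -o jsonpath='{.spec.replicas}')
echo "pgplatform=${PG_REPLICAS} timescale=${TS_REPLICAS}"
```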
```bash
# Scale down (adjust replicas based on your HA configuration)
kubectl scale statefulset pgplatform -n itrs --replicas=0
kubectl scale statefulset timescale -n itrs --replicas=0

# Wait for pods to terminate
kubectl wait --for=delete pod -l app=pgplatform -n itrs --timeout=120s
kubectl wait --for=delete pod -l app=timescale -n itrs --timeout=120s

# Scale back up (use your original replica count)
kubectl scale statefulset pgplatform -n itrs --replicas=1   # or 2, 3, etc. for HA
kubectl scale statefulset timescale -n itrs --replicas=1    # or 2, 3, etc. for HA

# Wait for platform to be ready
sleep 60
kubectl wait --timeout=600s iaxplatform/iax-iax-platform --for jsonpath="{.status.status}"=DEPLOYED -n itrs
```
Verify system health
After the migration jobs complete and pods restart, verify that the ownership changes were applied correctly and that all database services are functioning properly.
Run the following checks to ensure the system is healthy:
```bash
# Verify all pods are running
kubectl get pods -n itrs
# All pods should be in Running state

# Verify ownership changed on volumes
kubectl exec -n itrs pgplatform-0 -c postgres -- stat -c '%u:%g' /var/lib/postgresql/data
# Should show: 5000:5000 (or your target UID:GID)

kubectl exec -n itrs timescale-0 -c timescale -- stat -c '%u:%g' /var/lib/postgresql/data
kubectl exec -n itrs timescale-0 -c timescale -- stat -c '%u:%g' /var/lib/postgresql/wal/pg_wal
# Should show: 5000:5000

# If using tablespaces, verify tablespace ownership:
# kubectl exec -n itrs timescale-0 -c timescale -- stat -c '%u:%g' /var/lib/postgresql/tablespaces/timeseries_tablespace_1/data

# Verify correct UIDs inside containers
kubectl exec -n itrs pgplatform-0 -c postgres -- id
kubectl exec -n itrs timescale-0 -c timescale -- id
# Should show: uid=5000 gid=5000 (or your target UIDs)

# Test database connectivity
kubectl exec -n itrs pgplatform-0 -c postgres -- psql -U postgres -c "SELECT 1"
kubectl exec -n itrs timescale-0 -c timescale -- psql -U postgres -c "SELECT 1"
# Should return: 1

# Verify no errors in logs
kubectl logs -n itrs pgplatform-0 -c postgres --tail=50
kubectl logs -n itrs timescale-0 -c timescale --tail=50
```
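You can also confirm that the pods picked up the new pod-level security context from your Helm values (a quick check; the exact fields present depend on how the chart renders the pod spec):

```bash
kubectl get pod pgplatform-0 -n itrs -o jsonpath='{.spec.securityContext}{"\n"}'
kubectl get pod timescale-0 -n itrs -o jsonpath='{.spec.securityContext}{"\n"}'
# Expect runAsUser and fsGroup to show your target values (5000 in the examples above)
```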
Re-enable security policies
- For Pod Security Admission, restore the namespace to restricted mode.

```bash
kubectl label namespace itrs pod-security.kubernetes.io/enforce=restricted --overwrite

# Verify
kubectl get namespace itrs -o jsonpath='{.metadata.labels.pod-security\.kubernetes\.io/enforce}'
# Should show: restricted
```
- For Gatekeeper, remove `itrs` from `excludedNamespaces` by locating its index in the array.

```bash
# Find the 0-based array index of "itrs" in the capabilities constraint.
# grep -n reports a 1-based line number, so subtract 1 to get the array index.
ITRS_INDEX_CAP=$(kubectl get k8spspcapabilities drop-all \
  -o jsonpath='{.spec.match.excludedNamespaces}' | tr ',' '\n' | grep -n itrs | cut -d: -f1 | head -1)
kubectl patch k8spspcapabilities drop-all --type=json \
  -p="[{\"op\": \"remove\", \"path\": \"/spec/match/excludedNamespaces/$((ITRS_INDEX_CAP-1))\"}]"

# Repeat for other constraints (e.g., the allowed-users constraint)
ITRS_INDEX_USR=$(kubectl get k8spspallowedusers jpmc-allowed-user-ranges \
  -o jsonpath='{.spec.match.excludedNamespaces}' | tr ',' '\n' | grep -n itrs | cut -d: -f1 | head -1)
kubectl patch k8spspallowedusers jpmc-allowed-user-ranges --type=json \
  -p="[{\"op\": \"remove\", \"path\": \"/spec/match/excludedNamespaces/$((ITRS_INDEX_USR-1))\"}]"
```
- Run the following commands to check for any constraint violations across all namespaces.

```bash
# Check for constraint violations
kubectl get constraints -A
# Should show 0 violations in the itrs namespace

# Verify pods are still running after policy re-enablement
kubectl get pods -n itrs
```
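As an optional final spot-check, you can ask the API server, via a server-side dry run, whether a clearly non-compliant pod would now be rejected again; nothing is created. A sketch (the exact error text depends on which policy engine rejects it first):

```bash
kubectl run policy-check --image=busybox --restart=Never -n itrs --dry-run=server \
  --overrides='{"spec":{"containers":[{"name":"policy-check","image":"busybox","securityContext":{"runAsUser":0,"runAsNonRoot":false}}]}}'
# Expect an admission error; with --dry-run=server nothing is actually created
```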