Internal documentation only
This page has been marked as draft.
Support bundles and sbctl
How to upload a support bundle from any customer environment so that you can analyze it visually and get insights about the server, the network and your application:
- Go to the Troubleshoot tab. The content should look like the following:
- Click “Add support bundle”, as highlighted in the snippet below, and then choose the option to upload a support bundle:
- You can then drag and drop your bundle(s) or choose a file to upload. Any Replicated support bundle can be uploaded. You can also link the bundle to an open support issue, as shown below:
- Once the file is uploaded, the analysis is shown, followed by the bundle details, instance state and instance information:
- For non-air-gapped installations, a support bundle generated by the customer should appear under Troubleshoot automatically, e.g.:
Once the support bundle has been downloaded to your Linux server, you can use sbctl to inspect it further.
To install sbctl:
wget https://github.com/replicatedhq/sbctl/releases/latest/download/sbctl_linux_amd64.tar.gz
tar -xvf sbctl_linux_amd64.tar.gz
sudo mv sbctl /usr/local/bin/
Afterwards, verify the installation with which sbctl; it should print:
/usr/local/bin/sbctl
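To inspect the support bundle, you also need kubectl installed on the Linux server.
sbctl serves the contents of the bundle through a local Kubernetes API endpoint, so you can then run kubectl commands against the bundle, as in the scenarios below.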
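Before running any checks, you may want to confirm that both tools are on the PATH. The shell function below is a minimal sketch for that; check_tools is a hypothetical helper name, not part of sbctl or kubectl.

```shell
# check_tools (hypothetical helper): prints where each named command is found on the
# PATH and returns 1 if any of them is missing, 0 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    # command -v prints the resolved location and fails if the tool is not found
    if location=$(command -v "$tool"); then
      echo "found:   $tool -> $location"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_tools sbctl kubectl should report both tools as found before you continue.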
Typical Troubleshooting Scenarios
Check Restarting Pods
First, list the pods in all namespaces:
kubectl get pods -A
If you notice a pod that is restarting frequently, use the File Inspector in the Replicated interface to look at its logs.
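On a busy cluster the pod list can be long. As a minimal sketch, the filter below reads kubectl get pods -A output on standard input and prints only the pods whose restart count is at or above a threshold. It assumes the default column layout shown in the example below (NAMESPACE NAME READY STATUS RESTARTS AGE), where the restart count is the fifth whitespace-separated field even when kubectl appends a suffix such as (28m ago). The function name and the default threshold of 5 are hypothetical choices.

```shell
# list_restarting_pods (hypothetical helper): reads "kubectl get pods -A" output on stdin
# and prints namespace, pod name and restart count for pods at or above a threshold.
# Assumes the default columns NAMESPACE NAME READY STATUS RESTARTS AGE.
list_restarting_pods() {
  threshold="${1:-5}"  # hypothetical default; pass your own value as the first argument
  awk -v min="$threshold" '
    NR == 1 { next }                                          # skip the header row
    $5 ~ /^[0-9]+$/ && $5 + 0 >= min + 0 { print $1, $2, $5 }
  '
}
```

Usage: kubectl get pods -A | list_restarting_pods 10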
Example
[root@myhost analytics]# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
...
kotsadm platform-metrics-6bfb4d5dcd-6xctp 1/1 Running 57 (28m ago) 7d
This shows the platform-metrics pod has restarted 57 times in 7 days.
To check the logs for this pod, open the File Inspector in the Replicated interface, click Cluster-Resources > Pods > Logs and select the pod's log file.
This file contains the collection agent logs.
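Opening the file shows errors: repeated Metaspace "Resource Exhausted" events, an unhealthy-daemon warning and finally a java.lang.OutOfMemoryError. These need further investigation, but they could explain why the pod is restarting.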
2025-02-04 05:10:09.482 [kafka-producer-network-thread | producer-1] INFO org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Node -1 disconnected.
2025-02-04 05:20:08.769 [kafka-producer-network-thread | producer-2] INFO org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-2] Node -1 disconnected.
[6548.950s][error][jvmti] Posting Resource Exhausted event: Metaspace
[27189.022s][error][jvmti] Posting Resource Exhausted event: Metaspace
[27189.857s][error][jvmti] Posting Resource Exhausted event: Metaspace
[27192.140s][error][jvmti] Posting Resource Exhausted event: Metaspace
[27192.822s][error][jvmti] Posting Resource Exhausted event: Metaspace
2025-02-04 12:33:59.377 [AgentMonitoringService-0] WARN com.itrsgroup.collection.ca.monitoring.AgentMonitoringService - Daemon is now in an unhealthy state
[27214.820s][error][jvmti] Posting Resource Exhausted event: Metaspace
[27240.051s][error][jvmti] Posting Resource Exhausted event: Metaspace
OpenJDK 64-Bit Server VM warning: Exception java.lang.OutOfMemoryError occurred dispatching signal SIGTERM to handler- the VM may need to be forcibly terminated
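Instead of reading the whole log, you can count the error patterns shown above. The awk-based sketch below is a hypothetical helper, not an ITRS or Replicated tool. It reads a log on standard input and counts Metaspace resource-exhaustion events, java.lang.OutOfMemoryError messages and unhealthy-daemon warnings, matching on the exact strings in this example log; other failure modes would need their own patterns.

```shell
# summarize_jvm_errors (hypothetical helper): counts the JVM memory errors seen in the
# example log above. The patterns match the exact strings in that log.
summarize_jvm_errors() {
  awk '
    /Posting Resource Exhausted event: Metaspace/ { metaspace++ }
    /java\.lang\.OutOfMemoryError/               { oom++ }
    / WARN .*unhealthy state/                     { unhealthy++ }
    END {
      # unset counters print as 0
      printf "metaspace_events=%d out_of_memory=%d unhealthy_warnings=%d\n", metaspace, oom, unhealthy
    }
  '
}
```

For example, kubectl logs platform-metrics-6bfb4d5dcd-6xctp -n kotsadm | summarize_jvm_errors, or feed it a log file from the extracted support bundle.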
Check Final-Entity-Stream Logs
These logs are important because they list the entities with the most children. An entity with hundreds or thousands of children indicates a very large dataview that should be reviewed.
Example command
kubectl get pods -A
[root@myhost analytics]# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
...
kotsadm final-entity-stream-59b8d55bc9-6kpgr 1/1 Running 1 (4d16h ago) 7d
kubectl logs final-entity-stream-59b8d55bc9-6kpgr -n kotsadm
The output of this command is verbose, but look for a statistics report like the one below. Any large dataview it lists should be reviewed and optimised.
2025-02-05 06:31:29.381 [statistics-reporter-0] INFO com.itrsgroup.obcerv.platform.stream.FinalEntityStream - Entities with the most children (top 20) (processor 18):
- (1,850 children) {probe=CMS Reporting, managedEntity=CMS Reporting Batch, sampler=CMSR Query Monitoring, dataview=CMSR Historical Long Running Queries}
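To pick out only the large entries from the verbose output, the sketch below filters the statistics-report lines by their child count. It assumes each entry looks like the line above, starting with "- (N children)" where N may contain a thousands separator. The name large_dataviews and its default threshold of 500 are hypothetical.

```shell
# large_dataviews (hypothetical helper): reads final-entity-stream log output on stdin
# and prints report entries whose child count is at or above a threshold.
large_dataviews() {
  min="${1:-500}"  # hypothetical default threshold
  awk -v min="$min" '
    /^[ \t]*- \([0-9,]+ children\)/ {
      count = $2                  # e.g. "(1,850"
      gsub(/[(,]/, "", count)     # strip the parenthesis and separators -> 1850
      if (count + 0 >= min + 0) print
    }
  '
}
```

Usage: kubectl logs final-entity-stream-59b8d55bc9-6kpgr -n kotsadm | large_dataviews 1000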
Check That the Storage PVC Uses SSD
A support bundle may report high disk latency, like the example below.
This most likely means the disk storage is not SSD, which is a prerequisite for ITRS Analytics. To get more information, you can also run:
kubectl get pvc -n <namespace>
kubectl describe storageclass
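To see which storage class each PVC uses and what backs that class, here is a sketch using kubectl's custom-columns output. The column paths are standard PVC and StorageClass fields; the show_pvc_storage wrapper is a hypothetical name. Whether a class is SSD-backed usually depends on its provisioner and parameters (for example a cloud disk type parameter), so treat this output as a starting point for the kubectl describe storageclass check rather than a definitive answer.

```shell
# show_pvc_storage (hypothetical wrapper): lists each PVC with its storage class and size,
# then each storage class with its provisioner and parameters.
show_pvc_storage() {
  kubectl get pvc -A \
    -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,STORAGECLASS:.spec.storageClassName,CAPACITY:.status.capacity.storage'
  echo
  kubectl get storageclass \
    -o custom-columns='NAME:.metadata.name,PROVISIONER:.provisioner,PARAMETERS:.parameters'
}
```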