
Gotcha: argocd-repo-server CrashLoopBackOff (copyutil "Already exists")

Ravi Singh
Learning ArgoCD - This article is part of a series.
Part 1: This Article


Symptom

ArgoCD UI shows:

Unable to load data: connection error: desc = "transport: Error while dialing:
dial tcp <cluster-ip>:8081: connect: connection refused"

And on any Application detail page:

Failed to load target state: failed to generate manifest for source 1 of 2:
rpc error: code = Unavailable desc = connection error: ...
dial tcp <cluster-ip>:8081: connect: connection refused

What Port 8081 Is

Port 8081 is the gRPC port of argocd-repo-server, the component that renders manifests (Kustomize, Helm, plain YAML). If it's down, ArgoCD can't diff or sync any app.

Root Cause

The argocd-repo-server Deployment includes a copyutil init container that copies the ArgoCD binary and creates symlinks into a shared emptyDir volume (/var/run/argocd).

When the main container exits (e.g. after receiving SIGTERM during a Rancher Desktop restart) and Kubernetes restarts the containers in place without deleting the pod, the emptyDir volume is preserved. On the next restart attempt, copyutil tries to ln files that already exist and fails with:

/bin/ln: Already exists

…and exits with code 1. This puts copyutil into CrashLoopBackOff, the main container never starts, and port 8081 is never opened.
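The failure mode can be reproduced without a cluster. In this sketch, a temp directory stands in for the preserved emptyDir and the file names are hypothetical (not the real ArgoCD image layout); the point is that a plain ln -s fails the second time it runs over the same volume, exactly the way copyutil does:

```shell
# Simulate copyutil re-running against a preserved emptyDir.
# $dir stands in for /var/run/argocd; file names are illustrative.
dir=$(mktemp -d)
touch "$dir/argocd"                                # "binary" copied on first init

ln -s "$dir/argocd" "$dir/argocd-repo-server"      # first init run: succeeds
ln -s "$dir/argocd" "$dir/argocd-repo-server" 2>/dev/null \
  || echo "restart: ln fails, target already exists"   # second run: exits non-zero
```

The second ln returns a non-zero exit code, which is what puts the init container into CrashLoopBackOff.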

How to Diagnose

# Layer 1 - check pod status
kubectl get pods -n argocd
# argocd-repo-server shows 0/1 Completed or Init:CrashLoopBackOff

# Layer 2 - confirm init container is the culprit
kubectl describe pod -n argocd -l app.kubernetes.io/name=argocd-repo-server \
  | grep -A 10 "Init Containers:"
# copyutil: State: Waiting / Reason: CrashLoopBackOff, Restart Count: N

# Layer 3 - read copyutil logs
kubectl logs -n argocd -l app.kubernetes.io/name=argocd-repo-server -c copyutil
# Output: /bin/ln: Already exists

# Confirm endpoints are empty (explains connection refused)
kubectl get endpoints argocd-repo-server -n argocd
# ENDPOINTS column is blank

Fix

Delete the pod. The Deployment recreates it with a fresh emptyDir, so copyutil succeeds and the main server starts normally.

kubectl delete pod -n argocd -l app.kubernetes.io/name=argocd-repo-server

# Wait for it to come back
kubectl rollout status deployment/argocd-repo-server -n argocd --timeout=60s

# Verify
kubectl get pods -n argocd
# argocd-repo-server should show 1/1 Running

Why This Happens on Rancher Desktop
#

Rancher Desktop suspends/resumes the underlying VM. On resume, the k3s node may send SIGTERM to pods. If the container restarts in-place (without pod deletion), the emptyDir survives and copyutil hits the stale symlinks on the next init run.

Key Takeaway

emptyDir volumes persist across container restarts within the same pod, but are cleared when the pod is deleted. An init container that is not idempotent (i.e. one that fails if it runs twice over the same volume) can wedge the pod in this state. The fix is to delete the pod, not just restart the container.
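For comparison, an idempotent version of the same step (a sketch with illustrative paths, not the actual copyutil source) forces replacement of an existing link, so a re-run over a preserved volume still succeeds:

```shell
# Idempotent variant: -f replaces an existing link, -n treats an existing
# symlink-to-directory as a file instead of descending into it.
dir=$(mktemp -d)
touch "$dir/argocd"

ln -sfn "$dir/argocd" "$dir/argocd-repo-server"    # first run: creates the link
ln -sfn "$dir/argocd" "$dir/argocd-repo-server"    # re-run: replaces it, exit 0
echo "both runs succeeded"
```

This is the general pattern for init containers that populate shared volumes: write them so a second run over existing state is a no-op, not an error.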
