Fix manager CT startup in local Docker setup #338

Open
ArshSSandhu wants to merge 1 commit into mieweb:main from ArshSSandhu:fix/manager-ct-local-startup

@ArshSSandhu

Summary

This PR fixes a local startup issue where the proxmox service can become unhealthy because Manager CT 100 does not boot correctly in Docker Desktop + WSL2 environments.

During debugging, Manager CT 100 was created with cores: 4, which caused LXC to generate an invalid empty cpuset line:

lxc.cgroup.cpuset.cpus =

That prevented CT 100 from starting. After removing the cores setting, CT 100 could start, but it still had a temporary emergency boot entrypoint configured:

entrypoint: /sbin/init systemd.unit=emergency.target

This PR makes the core limit optional and removes the temporary emergency entrypoint before the final Manager CT startup.
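The empty cpuset line described above can be checked for directly. The following is a minimal sketch, not part of this PR: `has_empty_cpuset` is a hypothetical helper name, and the pattern also accepts an `lxc.cgroup2.` prefix on the assumption that the key may be written that way on cgroup v2 hosts.

```shell
# Hypothetical helper (not from this PR): succeeds when the given LXC
# config file contains a cpuset.cpus key whose value is empty.
has_empty_cpuset() {
  config_file="$1"
  # Match "lxc.cgroup.cpuset.cpus =" (or the assumed cgroup2 variant)
  # with nothing but optional whitespace after the equals sign.
  grep -Eq '^[[:space:]]*lxc\.cgroup2?\.cpuset\.cpus[[:space:]]*=[[:space:]]*$' "$config_file"
}
```

Inside the Proxmox container this could be run as `has_empty_cpuset /var/lib/lxc/100/config && echo "invalid empty cpuset line"`.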

Changes

  • Makes the Manager CT core limit optional using MANAGER_CORES.
  • Removes the hardcoded --cores=4 from the default local setup.
  • Removes the temporary emergency entrypoint before the final CT startup so Manager boots normally with networking and services enabled.
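One way to express the optional core limit is sketched below. This is an illustration of the idea rather than the code in this PR: `manager_core_args` is a hypothetical function name, and the validation rule (positive integer, no leading zero) is an assumption added for the sketch.

```shell
# Hypothetical helper (not from this PR): prints "--cores=N" only when
# MANAGER_CORES is set to a positive integer, and prints nothing when it
# is unset or empty so that no cores limit is passed at all.
manager_core_args() {
  cores="${MANAGER_CORES:-}"
  if [ -z "$cores" ]; then
    return 0
  fi
  case "$cores" in
    *[!0-9]*|0*)
      # Reject non-numeric values, zero, and leading zeros.
      echo "MANAGER_CORES must be a positive integer, got: $cores" >&2
      return 1
      ;;
  esac
  printf '%s\n' "--cores=$cores"
}
```

Because the output is validated to contain only digits after `--cores=`, it could be appended to the CT creation command through an unquoted `$(manager_core_args)` without word-splitting surprises.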

Testing

Tested locally on Windows + Ubuntu WSL2 + Docker Desktop.

Started from a clean Docker state:

docker compose down -v
docker compose build proxmox pull-image
docker compose up -d

Verified the Proxmox healthcheck became healthy:

docker inspect opensource-server-proxmox-1 --format '{{.State.Health.Status}}'

Result:

healthy

Also verified inside the Proxmox container:

lxc-info -n 100
grep -n "cpuset" /var/lib/lxc/100/config || echo "no cpuset line"
grep -nE "cores|entrypoint" /etc/pve/lxc/100.conf || echo "no cores or entrypoint in CT config"
pct exec 100 -- ip a
pct exec 100 -- ss -lntp
curl -kI --connect-timeout 5 https://10.254.0.2

Confirmed:

  • CT 100 is running.
  • No invalid empty cpuset line is generated.
  • Manager receives IP 10.254.0.2.
  • nginx listens on 80/443.
  • Manager app listens on 3000.
  • curl returns HTTP/2 200.

Notes

On the first startup, the healthcheck may briefly fail while Manager finishes booting, but it later becomes healthy once nginx is listening on 443.
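Since the healthcheck can fail briefly on first boot, a local test script may want to poll rather than inspect once. The following is a rough sketch under that assumption: `wait_for_healthy` is a hypothetical helper, and the attempt count and delay defaults are arbitrary values, not measured boot times.

```shell
# Hypothetical helper (not from this PR): polls the Docker health status
# of a container until it reports "healthy" or the attempts run out.
wait_for_healthy() {
  container="$1"
  max_attempts="${2:-30}"   # assumed default
  delay_seconds="${3:-5}"   # assumed default
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    status="$(docker inspect "$container" --format '{{.State.Health.Status}}' 2>/dev/null || true)"
    if [ "$status" = "healthy" ]; then
      return 0
    fi
    # Only sleep when another attempt will follow.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay_seconds"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

For the setup tested here, a call might look like `wait_for_healthy opensource-server-proxmox-1 || echo "proxmox did not become healthy"`.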

# forcing a cores value can cause LXC to generate an invalid empty
# lxc.cgroup.cpuset.cpus line. Set MANAGER_CORES=4 to opt in.
MANAGER_CORES="${MANAGER_CORES:-}"
MANAGER_CORE_ARGS=()

Collaborator

We don't need to support "MANAGER_CORE_ARGS". I'm fine with no CPU pinning by default for manager.


# Remove the temporary emergency entrypoint before the final start so the
# Manager CT boots to the default target with networking and services enabled.
pct set 100 --delete entrypoint || true

Collaborator

If the entrypoint fails to delete, we need to assume the container is in a bad state and not start it. Let this fail rather than swallowing the error with || true.
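A fail-fast version of that step might look like the sketch below. It is an assumption about how the suggestion could be applied, not code from the PR: `start_manager_ct` is a hypothetical wrapper, and the use of `pct start` for the final start is assumed, since the diff excerpt does not show that line.

```shell
# Hypothetical wrapper (not from this PR): refuses to start the CT when
# the temporary emergency entrypoint cannot be removed.
start_manager_ct() {
  ctid="$1"
  if ! pct set "$ctid" --delete entrypoint; then
    echo "failed to delete entrypoint on CT $ctid; not starting it" >&2
    return 1
  fi
  # Assumed final start command; reached only when the delete succeeded.
  pct start "$ctid"
}
```

In a script that already runs under `set -e`, dropping the `|| true` from the original line would give the same stop-on-failure behavior without a wrapper.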
