Proposal for worker pool sizing and actor scheduling.#249

Closed
Julian Gutierrez Oschmann (juli4n) wants to merge 1 commit into agent-substrate:main from juli4n:scheduling

Conversation

@juli4n

Collaborator

We have a few issues that are closely related to each other.

  • ActorTemplates point to worker pools directly. This makes the system rigid and forces operators to create a large number of templates, one for each "variant".
    Ideally, there should be a single actor template, plus a way to schedule its actors to different worker pods based on the actor / template requirements.

This topic is discussed in issue #47 and worker-pool-selection.md has a proposal for how to address it.

  • Worker pods have no resource requirements. This makes their QoS class BestEffort, which puts them first in line for eviction. It also hurts Substrate reliability,
    because the k8s scheduler has no information about the real requirements of each actor.

This topic is discussed in issue #212 and worker-pool-sizing.md has a proposal for how to address it.

  • There is no way to influence how worker pods are scheduled on k8s nodes.

This topic is discussed in issue #212 and worker-pool-selection.md has a proposal for how to address it.
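
For context on the BestEffort point above, here is a minimal sketch of how Kubernetes derives a pod's QoS class from its containers' requests and limits. It is a simplification, not Substrate code: the `Container` dataclass and `qos_class` function are hypothetical names used only for illustration, quantities are compared as literal strings (so "1" and "1000m" would not be treated as equal), and init containers are ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Only CPU and memory are considered when assigning a QoS class.
RESOURCES = ("cpu", "memory")


@dataclass
class Container:
    """Hypothetical stand-in for a container's resource stanza."""
    requests: Dict[str, str] = field(default_factory=dict)
    limits: Dict[str, str] = field(default_factory=dict)


def qos_class(containers: List[Container]) -> str:
    """Approximate the Kubernetes QoS classification rules."""
    # BestEffort: no container sets any CPU or memory request or limit.
    has_any = any(
        r in c.requests or r in c.limits
        for c in containers
        for r in RESOURCES
    )
    if not has_any:
        return "BestEffort"

    # Guaranteed: every container has CPU and memory limits, and each
    # request equals its limit. An omitted request defaults to the limit.
    guaranteed = all(
        r in c.limits and c.requests.get(r, c.limits[r]) == c.limits[r]
        for c in containers
        for r in RESOURCES
    )
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    # A worker pod with an empty resource stanza, as described above.
    print(qos_class([Container()]))
    # A worker pod with equal requests and limits.
    sized = Container(
        requests={"cpu": "2", "memory": "4Gi"},
        limits={"cpu": "2", "memory": "4Gi"},
    )
    print(qos_class([sized]))
```

Under these rules, a worker pod that declares equal CPU and memory requests and limits would move from BestEffort to Guaranteed, and one that declares only requests would be Burstable.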


- **Customer-dedicated pools.** One tenant's actors must never share workers with another tenant's actors. The pool is provisioned per-tenant and must be unreachable to all other tenants.

- **Hardware requirements.** An actor template requires a specific node class (high-memory nodes for large in-memory state, SSD-backed nodes for I/O-intensive workloads). All actors from the template must land on workers with that hardware regardless of who creates them.

Collaborator

KVM / bare metal or nested virt support. I'm running into this with #239

@juli4n

Collaborator Author
