Skip to content

Add WorkerPool scheduling fields#247

Open
han2ni3bal-pixel wants to merge 1 commit into
agent-substrate:mainfrom
han2ni3bal-pixel:feat-workerpool-scheduling-fields
Open

Add WorkerPool scheduling fields#247
han2ni3bal-pixel wants to merge 1 commit into
agent-substrate:mainfrom
han2ni3bal-pixel:feat-workerpool-scheduling-fields

Conversation

@han2ni3bal-pixel

Copy link
Copy Markdown

Summary

This PR adds a small, controlled set of scheduling/resource fields to WorkerPool so WorkerPool-managed Pods can be placed onto appropriate Kubernetes nodes and get proper resource guarantees.

Added fields:

  • resources
  • nodeSelector
  • tolerations
  • priorityClassName
  • nodeAffinity

Scope

This intentionally does not expose a full PodTemplateSpec, and does not add support for full affinity, podAffinity, or podAntiAffinity.

The goal is to keep the first version small while addressing the WorkerPool-to-Kubernetes-Node scheduling gap discussed in #212.

Related issue

Part of #212.
Related to #47.

@google-cla

google-cla Bot commented Jun 14, 2026

Copy link
Copy Markdown

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

// WorkerPoolPodTemplate defines optional scheduling and resource settings for
// worker pods. Resources apply to the fixed ateom container; nodeAffinity is
// mapped to spec.affinity.nodeAffinity on the pod.
type WorkerPoolPodTemplate struct {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we scope this down this CL to NodeSelector and Tolerations? Do you have use cases for node affinity and priorityClass?

I would leave resources out of scope as that is being discussed in https://docs.google.com/document/d/1_i__JEdmRexKOGP4VJ7bOWrgdS9vPd-BQSR0sxg7GQ4/edit?resourcekey=0-jtYHhxfAGZCNcoknXn2GbA&tab=t.0

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Julian Gutierrez Oschmann (@juli4n) Thanks for the feedback!

That makes sense to me regarding resources. We can leave resource requests/limits out of scope for this CL while the WorkerPool resource model is still being discussed separately.

For nodeAffinity, the use case I had in mind is supporting WorkerPools across heterogeneous node pools where nodeSelector is too limited. For example, a WorkerPool may want to require or prefer one of several equivalent accelerator/local-SSD/cache node pools, or express a soft preference during a node pool migration. In those cases, nodeSelector only supports exact key/value matching, while nodeAffinity gives us required/preferred terms and set-based matching.

For priorityClassName, the use case is preserving warm WorkerPool capacity under cluster pressure. Since WorkerPool pods are the slots available for actor resume, giving certain pools a higher priority can help protect latency-sensitive or interactive actor workloads from being displaced by lower-priority batch workloads. It also allows different WorkerPools to represent different service classes, e.g. interactive vs. batch.

That said, I’m happy to scope this PR down if you still prefer to keep the initial API surface smaller. Would you prefer that I keep nodeAffinity and priorityClassName in this CL with the above use cases, or should I remove them and follow up separately after nodeSelector/tolerations land?

Thanks again for taking a look.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve removed the resources field from this CL, btw.

@han2ni3bal-pixel han2ni3bal-pixel force-pushed the feat-workerpool-scheduling-fields branch 2 times, most recently from ef4ff52 to bb06b0c Compare June 16, 2026 02:45
@han2ni3bal-pixel han2ni3bal-pixel force-pushed the feat-workerpool-scheduling-fields branch from bb06b0c to 674aca3 Compare June 16, 2026 03:01
WithPath(ateompath.BasePath).
WithType(corev1.HostPathDirectoryOrCreate)))

applyWorkerPoolPodTemplate(podSpecAC, containerAC, wp.Spec.Template)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I understand that with the introduction of the template now creating the DeploymentApplyConfiguration is more complex / dynamic (it cannot be represented with a single struct literal), I find the approach of creating the static part here and then calling a method (in another file) to add the dynamic part a bit convoluted.

How about we move buildDeploymentApplyConfig to a new file (workerpool_apply.go?) and implement the whole generation of *appsv1ac.DeploymentApplyConfiguration in that method? Then we can add some table driven tests that exercise the buildDeploymentApplyConfig method e2e that take WorkerPool as input and assert expected *appsv1ac.DeploymentApplyConfiguration as output.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants