Add WorkerPool scheduling fields#247
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
| // WorkerPoolPodTemplate defines optional scheduling and resource settings for | ||
| // worker pods. Resources apply to the fixed ateom container; nodeAffinity is | ||
| // mapped to spec.affinity.nodeAffinity on the pod. | ||
| type WorkerPoolPodTemplate struct { |
There was a problem hiding this comment.
Can we scope this down this CL to NodeSelector and Tolerations? Do you have use cases for node affinity and priorityClass?
I would leave resources out of scope as that is being discussed in https://docs.google.com/document/d/1_i__JEdmRexKOGP4VJ7bOWrgdS9vPd-BQSR0sxg7GQ4/edit?resourcekey=0-jtYHhxfAGZCNcoknXn2GbA&tab=t.0
There was a problem hiding this comment.
Julian Gutierrez Oschmann (@juli4n) Thanks for the feedback!
That makes sense to me regarding resources. We can leave resource requests/limits out of scope for this CL while the WorkerPool resource model is still being discussed separately.
For nodeAffinity, the use case I had in mind is supporting WorkerPools across heterogeneous node pools where nodeSelector is too limited. For example, a WorkerPool may want to require or prefer one of several equivalent accelerator/local-SSD/cache node pools, or express a soft preference during a node pool migration. In those cases, nodeSelector only supports exact key/value matching, while nodeAffinity gives us required/preferred terms and set-based matching.
For priorityClassName, the use case is preserving warm WorkerPool capacity under cluster pressure. Since WorkerPool pods are the slots available for actor resume, giving certain pools a higher priority can help protect latency-sensitive or interactive actor workloads from being displaced by lower-priority batch workloads. It also allows different WorkerPools to represent different service classes, e.g. interactive vs. batch.
That said, I’m happy to scope this PR down if you still prefer to keep the initial API surface smaller. Would you prefer that I keep nodeAffinity and priorityClassName in this CL with the above use cases, or should I remove them and follow up separately after nodeSelector/tolerations land?
Thanks again for taking a look.
There was a problem hiding this comment.
I’ve removed the resources field from this CL, btw.
ef4ff52 to
bb06b0c
Compare
bb06b0c to
674aca3
Compare
| WithPath(ateompath.BasePath). | ||
| WithType(corev1.HostPathDirectoryOrCreate))) | ||
|
|
||
| applyWorkerPoolPodTemplate(podSpecAC, containerAC, wp.Spec.Template) |
There was a problem hiding this comment.
While I understand that with the introduction of the template now creating the DeploymentApplyConfiguration is more complex / dynamic (it cannot be represented with a single struct literal), I find the approach of creating the static part here and then calling a method (in another file) to add the dynamic part a bit convoluted.
How about we move buildDeploymentApplyConfig to a new file (workerpool_apply.go?) and implement the whole generation of *appsv1ac.DeploymentApplyConfiguration in that method? Then we can add some table driven tests that exercise the buildDeploymentApplyConfig method e2e that take WorkerPool as input and assert expected *appsv1ac.DeploymentApplyConfiguration as output.
Summary
This PR adds a small, controlled set of scheduling/resource fields to
WorkerPoolso WorkerPool-managed Pods can be placed onto appropriate Kubernetes nodes and get proper resource guarantees.Added fields:
resourcesnodeSelectortolerationspriorityClassNamenodeAffinityScope
This intentionally does not expose a full
PodTemplateSpec, and does not add support for fullaffinity,podAffinity, orpodAntiAffinity.The goal is to keep the first version small while addressing the WorkerPool-to-Kubernetes-Node scheduling gap discussed in #212.
Related issue
Part of #212.
Related to #47.