Skip to content

feat(spider-scheduler): Add gRPC binary.#369

Open
sitaowang1998 wants to merge 18 commits into
y-scope:mainfrom
sitaowang1998:scheduler-grpc-server
Open

feat(spider-scheduler): Add gRPC binary.#369
sitaowang1998 wants to merge 18 commits into
y-scope:mainfrom
sitaowang1998:scheduler-grpc-server

Conversation

@sitaowang1998

@sitaowang1998 sitaowang1998 commented Jul 1, 2026

Copy link
Copy Markdown
Collaborator

Description

This PR:

  • Adds scheduler requests unpack.
  • Adds scheduler gRPC service handlers and error mapping to gRPC status.
  • Adds scheduler gRPC server binary.

Note

This PR does not add new tests.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • GitHub workflows pass.

Summary by CodeRabbit

  • New Features
    • Added a scheduler gRPC server command-line entrypoint that starts from a YAML config file.
    • Added gRPC support for next task, heartbeat, and shutdown requests.
    • Expanded scheduler configuration to include storage endpoint and pool sizing.
  • Bug Fixes
    • Improved task assignment protobuf-to-core conversion and request handling.
    • Enhanced scheduler service error mapping and shutdown/cancellation behavior.
  • Tests
    • Added a unit test to verify assignment record conversions.

@sitaowang1998 sitaowang1998 requested a review from a team as a code owner July 1, 2026 03:13
@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 7fa84b49-cbc7-4273-b647-5900c6aac7b3

📥 Commits

Reviewing files that changed from the base of the PR and between 41f0a80 and 4f97796.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • components/spider-scheduler/src/grpc.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/spider-scheduler/src/grpc.rs

Walkthrough

This PR adds protobuf-to-core conversion for task assignment records, scheduler request unpacking, a tonic gRPC adapter, server configuration types, and a new scheduler gRPC server binary with Cargo wiring.

Changes

Scheduler gRPC server

Layer / File(s) Summary
Proto-to-core assignment conversion
components/spider-proto-rust/src/assignment.rs
Adds From<ProtoTaskAssignmentRecord> for TaskAssignmentRecord and a unit test for the conversion.
Scheduler request unpacking
components/spider-proto-rust/src/unpack/mod.rs, components/spider-proto-rust/src/unpack/scheduler.rs
Registers the scheduler unpack module and implements RequestUnpack for NextTaskRequest, HeartbeatRequest, and ShutdownRequest.
GrpcSchedulerService adapter
components/spider-scheduler/src/grpc.rs, components/spider-scheduler/src/lib.rs
Adds GrpcSchedulerService, maps scheduler errors to tonic Status, converts next_task responses, and exports the new gRPC module.
Server configuration and derives
components/spider-scheduler/src/config.rs, components/spider-scheduler/src/runtime.rs, components/spider-scheduler/src/execution_manager_registry.rs, components/spider-scheduler/src/lib.rs
Adds ServerConfig, updates related derives, and re-exports ServerConfig from the crate.
gRPC server binary and Cargo wiring
components/spider-scheduler/src/bin/grpc_server.rs, components/spider-scheduler/Cargo.toml
Adds the spider_scheduler_grpc_server binary, config loading, storage connection, shutdown handling, and dependency/bin target updates.

Estimated code review effort: 3 (Moderate) | ~30 minutes

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant GrpcSchedulerService
  participant SchedulerServiceState
  participant RequestUnpack

  Client->>GrpcSchedulerService: next_task / heartbeat / shutdown
  GrpcSchedulerService->>RequestUnpack: unpack protobuf request
  RequestUnpack-->>GrpcSchedulerService: domain inputs
  GrpcSchedulerService->>SchedulerServiceState: next_task / heartbeat / shutdown
  SchedulerServiceState-->>GrpcSchedulerService: result or SchedulerServiceError
  GrpcSchedulerService-->>Client: protobuf response or tonic Status
Loading
sequenceDiagram
  participant Main
  participant ServerConfig
  participant GrpcSchedulerStorageClient
  participant GrpcSchedulerService
  participant TonicServer

  Main->>ServerConfig: load YAML config
  Main->>GrpcSchedulerStorageClient: connect(storage_endpoint)
  Main->>GrpcSchedulerService: new(state, cancellation_token)
  Main->>TonicServer: serve_with_shutdown(grpc_service)
  TonicServer-->>Main: shutdown on Ctrl-C or cancellation
  Main->>Main: runtime.stop().await
Loading

Possibly related PRs

  • y-scope/spider#342: Adds related protobuf-to-core task assignment handling used by the scheduler RPC flow.
  • y-scope/spider#365: Introduces scheduler service state and error types consumed by the new gRPC adapter.
  • y-scope/spider#368: Adds runtime wiring that the new scheduler server entrypoint uses.

Suggested reviewers: LinZhihao-723

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely captures the main change: adding a gRPC binary for spider-scheduler.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@sitaowang1998

Copy link
Copy Markdown
Collaborator Author

@CodeRabbit review

@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (4)
components/spider-proto-rust/src/unpack/scheduler.rs (1)

22-32: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Unbounded client-supplied wait_time_ms can hold a request open indefinitely.

Duration::from_millis(self.wait_time_ms) accepts any client-supplied u64 unmodified and is later passed straight to dispatch_source.dequeue(wait_time). A malicious or buggy client can request an effectively unbounded wait (e.g., u64::MAX ms), tying up server resources for that RPC until an assignment appears or the connection drops. Consider clamping wait_time_ms to a sane maximum before converting to Duration.

🛡️ Proposed clamp
+const MAX_WAIT_TIME_MS: u64 = 60_000;
+
 impl RequestUnpack for NextTaskRequest {
     type Unpacked = (ExecutionManagerId, Option<TaskAssignmentRecord>, Duration);

     fn unpack(self) -> Result<Self::Unpacked, UnpackError> {
         Ok((
             ExecutionManagerId::from(self.execution_manager_id),
             self.prev_assignment.map(ProtoTaskAssignmentRecord::into),
-            Duration::from_millis(self.wait_time_ms),
+            Duration::from_millis(self.wait_time_ms.min(MAX_WAIT_TIME_MS)),
         ))
     }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/spider-proto-rust/src/unpack/scheduler.rs` around lines 22 - 32,
The NextTaskRequest::unpack path currently converts client-controlled
wait_time_ms directly into a Duration, allowing an effectively unbounded wait.
Update NextTaskRequest::unpack to clamp self.wait_time_ms to a sensible maximum
before calling Duration::from_millis, and keep the rest of the unpacked tuple
unchanged. Make the limit explicit near the Duration::from_millis conversion so
the behavior is easy to find and maintain.
components/spider-scheduler/src/grpc.rs (1)

161-212: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Minor inconsistency: TAG constant only used in next_task.

next_task defines a local const TAG while heartbeat/shutdown pass string literals directly to service_error_handler. Purely cosmetic, no functional issue.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/spider-scheduler/src/grpc.rs` around lines 161 - 212, The local
TAG constant in next_task is inconsistent with heartbeat and shutdown, which
pass string literals directly to service_error_handler. Clean this up by either
reusing a shared tag symbol across next_task, heartbeat, and shutdown, or by
removing TAG and passing the same literal style everywhere so the error handling
calls are uniform in grpc.rs.
components/spider-scheduler/src/bin/grpc_server.rs (2)

44-59: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

No SIGTERM handling for graceful shutdown.

Shutdown is only wired to the cancellation token and Ctrl-C (SIGINT). In containerized/orchestrated deployments (e.g., Kubernetes, systemd), shutdown is typically signalled via SIGTERM, which this server would not currently handle gracefully, likely resulting in a hard kill instead of a clean drain/stop.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/spider-scheduler/src/bin/grpc_server.rs` around lines 44 - 59, The
gRPC server shutdown path in serve_with_shutdown only reacts to the cancellation
token and tokio::signal::ctrl_c, so SIGTERM is not handled. Update the shutdown
future in grpc_server.rs to also listen for SIGTERM alongside Ctrl-C and
cancellation_token, and trigger cancellation when SIGTERM is received. Use the
existing serve_with_shutdown, cancellation_token, and select! shutdown block so
the server can drain cleanly in orchestrated environments.

61-64: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Stop error is silently swallowed when serve also fails.

stop_result is computed unconditionally, but if serve_result is Err, the ? on line 62 returns before line 63 is evaluated, so any error from runtime.stop() is dropped without ever being surfaced (not even logged).

Suggested fix
     let stop_result = runtime.stop().await;
-    serve_result?;
-    stop_result?;
+    if let Err(ref error) = stop_result {
+        tracing::error!(error = %error, "Failed to stop scheduler runtime cleanly.");
+    }
+    serve_result?;
+    stop_result?;
     Ok(())
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/spider-scheduler/src/bin/grpc_server.rs` around lines 61 - 64, The
grpc_server main flow is dropping a runtime.stop() failure whenever serve_result
is Err because the serve_result? return short-circuits before stop_result? is
reached. Update the grpc_server logic to evaluate and report both outcomes
explicitly in this block around runtime.stop(), serve_result, and stop_result,
so a stop error is not silently lost even when serving also fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@components/spider-proto-rust/src/unpack/scheduler.rs`:
- Around line 22-32: The NextTaskRequest::unpack path currently converts
client-controlled wait_time_ms directly into a Duration, allowing an effectively
unbounded wait. Update NextTaskRequest::unpack to clamp self.wait_time_ms to a
sensible maximum before calling Duration::from_millis, and keep the rest of the
unpacked tuple unchanged. Make the limit explicit near the Duration::from_millis
conversion so the behavior is easy to find and maintain.

In `@components/spider-scheduler/src/bin/grpc_server.rs`:
- Around line 44-59: The gRPC server shutdown path in serve_with_shutdown only
reacts to the cancellation token and tokio::signal::ctrl_c, so SIGTERM is not
handled. Update the shutdown future in grpc_server.rs to also listen for SIGTERM
alongside Ctrl-C and cancellation_token, and trigger cancellation when SIGTERM
is received. Use the existing serve_with_shutdown, cancellation_token, and
select! shutdown block so the server can drain cleanly in orchestrated
environments.
- Around line 61-64: The grpc_server main flow is dropping a runtime.stop()
failure whenever serve_result is Err because the serve_result? return
short-circuits before stop_result? is reached. Update the grpc_server logic to
evaluate and report both outcomes explicitly in this block around
runtime.stop(), serve_result, and stop_result, so a stop error is not silently
lost even when serving also fails.

In `@components/spider-scheduler/src/grpc.rs`:
- Around line 161-212: The local TAG constant in next_task is inconsistent with
heartbeat and shutdown, which pass string literals directly to
service_error_handler. Clean this up by either reusing a shared tag symbol
across next_task, heartbeat, and shutdown, or by removing TAG and passing the
same literal style everywhere so the error handling calls are uniform in
grpc.rs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 2e12ec90-561a-4584-889b-6dfd122af107

📥 Commits

Reviewing files that changed from the base of the PR and between dca2b8a and 41f0a80.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • components/spider-proto-rust/src/assignment.rs
  • components/spider-proto-rust/src/unpack/mod.rs
  • components/spider-proto-rust/src/unpack/scheduler.rs
  • components/spider-scheduler/Cargo.toml
  • components/spider-scheduler/src/bin/grpc_server.rs
  • components/spider-scheduler/src/config.rs
  • components/spider-scheduler/src/execution_manager_registry.rs
  • components/spider-scheduler/src/grpc.rs
  • components/spider-scheduler/src/lib.rs
  • components/spider-scheduler/src/runtime.rs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants