Merged
5 changes: 1 addition & 4 deletions .github/workflows/rust-ci.yaml
@@ -36,8 +36,5 @@ jobs:
with:
toolchain: stable

- name: Run Unit tests
- name: Run tests
run: make test

- name: Run E2E tests
run: make test-e2e
6 changes: 3 additions & 3 deletions Makefile
@@ -64,12 +64,12 @@ test-e2e: $(PYTEST) dev
docker-build:
docker compose -f docker-compose.e2e.yml build

docker-up:
docker compose -f docker-compose.e2e.yml up -d

docker-down:
docker compose -f docker-compose.e2e.yml down

test-all: test test-e2e
@echo "All tests completed successfully"

.PHONY: lint
lint: $(RUFF)
$(RUFF) check .
146 changes: 61 additions & 85 deletions README.md
@@ -2,52 +2,49 @@

# SandD

**A Lightweight Sandbox Daemon for Secure Agent Execution in Isolated Environments.**
**Sandbox Daemon for Agent Command Execution**

[![Rust](https://img.shields.io/badge/rust-1.70+-orange.svg)](https://www.rust-lang.org/)
[![Python](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

Rust-powered WebSocket server with Python API for secure command execution in isolated environments.

[Features](#features) • [Quick Start](#quick-start) • [Architecture](#architecture) • [Documentation](./docs)
Rust-powered WebSocket server with Python API for remote command execution and interactive sessions.

</div>

---

## Features

- ✅ **Command Execution**: Execute shell commands remotely with timeout support
- ✅ **Interactive Shell (PTY)**: Full terminal sessions for debugging and manual work
- ✅ **File Transfer**: Upload/download files between agent and daemons
- ✅ **High Performance**: Rust-powered WebSocket server handles 200+ concurrent connections
- ✅ **Auto Reconnection**: Daemons automatically reconnect if connection drops
- ✅ **Heartbeat Monitoring**: Automatic stale connection cleanup
- ✅ **Cross-Platform**: Works on Linux, macOS, Windows
- **Command Execution** - Run shell commands on remote machines with timeout control
- **Interactive Sessions** - Full PTY sessions with bash for manual work
- **File Transfer** - Upload/download files between controller and workers
- **High Performance** - Rust async runtime handles high-concurrency workloads
- **Auto Reconnection** - Workers reconnect automatically on network failures
- **Cross-Platform** - Linux, macOS, Windows support
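
The auto-reconnection item above can be pictured as a retry loop with exponential backoff. This is a minimal Python sketch under stated assumptions, not SandD's actual worker logic (the worker is a Rust binary); `backoff_delays`, `reconnect`, `connect`, `base_delay`, and `max_delay` are hypothetical names used only for illustration.

```python
import time
from typing import Callable, Iterator


def backoff_delays(base_delay: float = 1.0, max_delay: float = 30.0) -> Iterator[float]:
    """Yield delays that double on each attempt, capped at max_delay."""
    delay = base_delay
    while True:
        yield delay
        delay = min(delay * 2, max_delay)


def reconnect(connect: Callable[[], bool], max_attempts: int = 5,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Call connect() until it succeeds and return the number of attempts used.

    Raises ConnectionError if every attempt fails.
    """
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        if connect():
            return attempt
        if attempt < max_attempts:
            # Wait before the next attempt; no wait after the final failure.
            sleep(next(delays))
    raise ConnectionError(f"gave up after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real waiting; a production loop would likely also add jitter so that many workers do not reconnect at the same instant.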

## Architecture

```
┌─────────────────────────────────────────┐
│ Python Agent Application │
│ ┌────────────────────────────────────┐ │
│ │ from sandd import Server │ │
│ │ │ │
│ │ server = Server("0.0.0.0", 8765) │ │
│ │ result = server.execute_command(
│ │ "daemon-1", "ls -la" │ │
│ │ ) │ │
│ └────────────────────────────────────┘ │
│ ▲ │
│ │ Python bindings (PyO3) │
│ ▼ │
│ ┌────────────────────────────────────┐ │
│ │ Rust WebSocket Server (tokio) │ │
│ │ • Command routing │ │
│ │ • Session management │ │
│ └────────────────────────────────────┘ │
└─────────────────────────────────────────┘
┌─────────────────────────────────────────
│ Python Agent Application
│ ┌────────────────────────────────────┐
│ │ from sandd import Server │
│ │ │
│ │ server = Server("0.0.0.0", 8765) │
│ │ result = server.exec(
│ │ "daemon-1", "ls -la" │
│ │ ) │
│ └────────────────────────────────────┘
│ ▲
│ │ Python bindings (PyO3)
│ ▼
│ ┌────────────────────────────────────┐
│ │ Rust WebSocket Server (tokio) │
│ │ • Command routing │
│ │ • Session management │
│ └────────────────────────────────────┘
└─────────────────────────────────────────
│ WebSocket (WSS)
│ (Daemon initiates connection)
@@ -64,86 +61,65 @@ Rust-powered WebSocket server with Python API for secure command execution in is

## Quick Start

### 1. Build the System

```bash
# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Build Python package
make install

# Build daemon binary
make daemon-release
# Build
make install # Python package
make daemon-release # Worker binary
```

### 2. Start the Agent (Python)
**Start controller:**

```python
from sandd import Server

# Start server
server = Server(host="0.0.0.0", port=8765)
print(f"Server listening on {server.address}")

# Wait for daemons
server = Server("0.0.0.0", 8765)
server.wait_for_daemon("worker-1", timeout=30)

# Execute command
result = server.execute_command("worker-1", "hostname")
print(f"Output: {result.stdout}")
result = server.exec("worker-1", "hostname")
print(result.stdout)
```

### 3. Start Daemons (Remote Machines)
**Start worker:**

```bash
# On remote machine 1
./target/release/sandd \
--server-url ws://agent-host:8765/ws \
--server-url ws://controller:8765/ws \
--daemon-id worker-1

# On remote machine 2
./target/release/sandd \
--server-url ws://agent-host:8765/ws \
--daemon-id worker-2

# Or let it auto-generate a UUID
./target/release/sandd \
--server-url ws://agent-host:8765/ws

# ... repeat for n+ machines
```

## Examples
## Documentation

See the [examples/](./examples) directory for common use cases.
- [Quick Start Guide](./docs/QUICKSTART.md)
- [Architecture Details](./docs/ARCHITECTURE.md)
- [Protocol Specification](./docs/PROTOCOL.md)
- [Development Guide](./docs/DEVELOP.md)
- [Examples](./examples)

## Development
## Security

See [DEVELOP.md](./docs/DEVELOP.md) for the complete developer guide including build commands, testing, and troubleshooting.
⚠️ **Add security layers for production use:**

## Security Considerations
- Use `wss://` (TLS) instead of plain `ws://`
- Add authentication (tokens, mTLS)
- Run workers in containers
- Validate commands before execution
- Audit log all commands

1. **No exposed daemon ports**: Daemons only make outbound connections to the agent
2. **Authentication**: Add token-based auth in production (not included in MVP)
3. **TLS/WSS**: Use `wss://` in production for encrypted connections
4. **Sandboxing**: Consider running daemon in containers or VMs
5. **Command validation**: Validate/sanitize commands in your application
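
The advice to validate commands before execution could look like the following in the controlling application. This is a hedged sketch only: SandD does not ship this helper, and `ALLOWED_PROGRAMS`, `SHELL_METACHARACTERS`, and `validate_command` are hypothetical names with an example policy that a real deployment would replace.

```python
import shlex
from typing import List

# Hypothetical allowlist; a real deployment would define its own policy.
ALLOWED_PROGRAMS = {"ls", "hostname", "cat", "uptime"}
# Characters that let a shell chain, redirect, or substitute commands.
SHELL_METACHARACTERS = set(";&|`$<>\n")


def validate_command(command: str) -> List[str]:
    """Return the parsed argv if the command passes the policy, else raise ValueError."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        raise ValueError("shell metacharacters are not allowed")
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # for example, unbalanced quotes
        raise ValueError(f"cannot parse command: {exc}") from exc
    if not argv:
        raise ValueError("empty command")
    if argv[0] not in ALLOWED_PROGRAMS:
        raise ValueError(f"program not allowed: {argv[0]}")
    return argv
```

An application would call `validate_command(cmd)` first and only pass `cmd` on to the worker when no exception is raised. An allowlist is deliberately stricter than trying to blocklist dangerous commands.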
## Roadmap

## Future Enhancements
- [ ] **Authentication** - Token-based auth for daemon connections
- [ ] **TLS Support** - Built-in WSS with certificate management
- [ ] **Audit Logging** - Track all commands, sessions, and file transfers
- [ ] **Metrics** - Prometheus-compatible metrics for monitoring
- [ ] **Resource Limits** - CPU/memory/timeout controls per daemon
- [ ] **Multi-tenancy** - Isolated workspaces with access control
- [ ] **Rate Limiting** - Prevent abuse and resource exhaustion
- [ ] **Command Allowlist** - Restrict allowed commands per daemon

## Contributing

- [ ] SSH protocol tunneling (for IDE remote development)
- [ ] Token-based authentication
- [ ] Command audit logging
- [ ] Resource limits per daemon
- [ ] Metrics/monitoring integration (Prometheus)
- [ ] Multi-tenancy support
- [ ] Command history and replay
We welcome contributions, feedback, and suggestions of any kind! See [DEVELOP.md](./docs/DEVELOP.md) for development setup and guidelines.

## License

MIT

## Contributing

Issues and PRs welcome! This is a production-ready foundation for remote command execution.
12 changes: 6 additions & 6 deletions docs/ARCHITECTURE.md
@@ -20,16 +20,16 @@

### Why Rust Server?

At 200+ connections, Python asyncio:
- Uses 10GB+ memory (vs 2GB Rust)
- 80%+ CPU idle (vs 5% Rust)
- 500ms+ p99 latency (vs 20ms Rust)
- GIL contention kills performance
For high-concurrency workloads, Python asyncio:
- Uses significantly more memory than Rust
- Has higher CPU usage and suffers from GIL contention
- Shows higher p99 latency

Rust provides better performance and resource efficiency.

### Why WebSocket?

- Persistent bidirectional connection
- Efficient for streaming (shell output)
- Efficient for streaming (session output)
- Well-supported libraries
- Can multiplex multiple sessions over one connection
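
To illustrate the multiplexing point above, here is a minimal Python sketch of tagging each message with a session id so that several sessions can share one connection. The frame layout (a 4-byte big-endian session id followed by the payload) and the names `encode_frame`, `decode_frame`, and `Demultiplexer` are assumptions made for illustration; the actual wire format is defined in the protocol specification and may differ.

```python
import struct
from collections import defaultdict
from typing import DefaultDict, Tuple

# Hypothetical frame layout: 4-byte big-endian session id, then the payload.
HEADER = struct.Struct(">I")


def encode_frame(session_id: int, payload: bytes) -> bytes:
    """Prefix the payload with its session id."""
    return HEADER.pack(session_id) + payload


def decode_frame(frame: bytes) -> Tuple[int, bytes]:
    """Split a frame into (session_id, payload)."""
    if len(frame) < HEADER.size:
        raise ValueError("frame shorter than header")
    (session_id,) = HEADER.unpack_from(frame)
    return session_id, frame[HEADER.size:]


class Demultiplexer:
    """Collects output per session from frames arriving on one connection."""

    def __init__(self) -> None:
        self.buffers: DefaultDict[int, bytearray] = defaultdict(bytearray)

    def feed(self, frame: bytes) -> None:
        """Append the frame's payload to the buffer of its session."""
        session_id, payload = decode_frame(frame)
        self.buffers[session_id] += payload
```

Frames from different sessions can arrive interleaved; because each one carries its own id, the receiver can route the bytes without opening a second connection.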

16 changes: 6 additions & 10 deletions docs/DEVELOP.md
@@ -24,7 +24,7 @@ SandD/
│ ├── src/
│ │ ├── main.rs # Daemon entry point
│ │ ├── executor.rs # Command execution
│ │ ├── shell.rs # Shell (not implemented)
│ │ ├── session.rs # Interactive sessions (PTY)
│ │ └── protocol.rs # Message protocol
│ └── Cargo.toml
@@ -124,7 +124,7 @@ command_tx: mpsc::UnboundedSender<Message> // Stored in registry
**Incoming (Daemon → Python):**
```rust
pending_commands: oneshot::Sender<Result> // Request/Response
shell_sessions: mpsc::Sender<Vec<u8>> // Streaming
sessions: mpsc::Sender<Vec<u8>> // Streaming
file_transfers: Vec<Vec<u8>> // Chunked buffering
```
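
For readers more at home on the Python side, the request/response pattern behind `pending_commands` can be approximated with `asyncio` futures: each request registers a one-shot slot, and the matching response fills it exactly once. This is an analogy only, not the server's code (which is Rust); `PendingCommands`, `register`, `wait`, `resolve`, and `demo` are hypothetical names.

```python
import asyncio
import itertools
from typing import Dict


class PendingCommands:
    """Maps a request id to a future that the matching response resolves once."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._pending: Dict[int, asyncio.Future] = {}

    def register(self) -> int:
        """Create a one-shot slot for a new request (must run inside an event loop)."""
        request_id = next(self._ids)
        self._pending[request_id] = asyncio.get_running_loop().create_future()
        return request_id

    async def wait(self, request_id: int, timeout: float) -> str:
        """Wait for the response; the slot is removed whether or not it arrives."""
        try:
            return await asyncio.wait_for(self._pending[request_id], timeout)
        finally:
            self._pending.pop(request_id, None)

    def resolve(self, request_id: int, result: str) -> bool:
        """Deliver a response; return False for unknown or already-resolved ids."""
        future = self._pending.get(request_id)
        if future is None or future.done():
            return False
        future.set_result(result)
        return True


async def demo() -> str:
    pending = PendingCommands()
    request_id = pending.register()
    # Simulate the daemon's reply arriving shortly afterwards.
    asyncio.get_running_loop().call_later(0.01, pending.resolve, request_id, "ok")
    return await pending.wait(request_id, timeout=1.0)
```

Streaming session output does not fit this shape because it produces many messages per session, which is why the listing above uses a multi-message channel for sessions instead.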

@@ -211,11 +211,7 @@ RUST_LOG=server=debug python3 examples/simple_test.py

### Not Implemented

1. **Interactive Shell**: Infrastructure exists, daemon returns "not implemented"
- Reason: `PtySystem` Sync issues
- Fix: Refactor shell manager to avoid Sync constraints

2. **File Transfer**: Protocol defined, daemon just logs
1. **File Transfer**: Protocol defined, daemon just logs
- Reason: Deferred for MVP
- Fix: Implement actual file I/O in daemon

@@ -278,14 +274,14 @@ Include motivation and context.
- Check daemon logs: `RUST_LOG=info ./target/release/sandd ...`

**Commands timing out:**
- Increase `timeout` parameter in `execute_command()` (in seconds)
- Increase `timeout` parameter in `exec()` (in seconds)
- Check daemon system resources: `top`, `free -h`
- Verify command actually completes when run manually
- Check daemon logs for errors
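
One way to check that a command completes when run manually is to run it directly on the worker machine with the same time limit, outside SandD. The sketch below uses only the Python standard library; `run_locally` is a hypothetical helper, not part of the SandD API.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_locally(argv: Sequence[str], timeout: float) -> Optional[int]:
    """Run argv directly; return its exit code, or None if it exceeds timeout seconds."""
    try:
        completed = subprocess.run(argv, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return None
    return completed.returncode


if __name__ == "__main__":
    # A command that finishes quickly, then one that exceeds its limit.
    print(run_locally([sys.executable, "-c", "pass"], timeout=10))
    print(run_locally([sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5))
```

If the command also times out here, the limit is too short for that command or the command itself hangs, and the problem is not in the daemon connection.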

**High memory usage:**
- Monitor active shell sessions (they hold state)
- Close unused shell sessions
- Monitor active sessions (they hold state)
- Close unused sessions with `session.close()`
- Check number of connected daemons: `server.daemon_count()`

### Development Issues