Problem Statement
Users need to use Google Cloud SDKs (Drive, Sheets, BigQuery, etc.) from inside sandboxes. Google SDKs use Application Default Credentials (ADC), which resolves credentials through a fixed chain: GOOGLE_APPLICATION_CREDENTIALS file → well-known gcloud ADC file → GCE metadata server. The first two require raw secrets on disk inside the sandbox. The metadata server only exists on GCP compute. None work in an OpenShell sandbox today.
The proposed solution is to emulate the GCE metadata server via a loopback HTTP server inside the sandbox network namespace. The gateway already manages service account keys and generates short-lived tokens for the Vertex AI provider — this generalizes that capability.
Technical Context
ADC Resolution Chain
Google SDKs resolve credentials in this order (docs):
1. GOOGLE_APPLICATION_CREDENTIALS env var → reads JSON key file (raw private key on disk)
2. ~/.config/gcloud/application_default_credentials.json → has refresh token (raw secret)
3. GCE metadata server at http://{GCE_METADATA_HOST}/computeMetadata/v1/... → short-lived tokens (no secrets on disk)
4. Fail
Options 1 & 2 violate OpenShell's security model (no raw secrets in sandbox). Option 3 only works on GCP compute — unless we emulate it.
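The resolution order above can be sketched as a small decision function. This is an illustration only, not any SDK's real code: `AdcSource`, `resolve_adc`, and the two boolean inputs are hypothetical names standing in for the file and network probes the SDKs perform.

```rust
use std::collections::HashMap;

/// Where ADC would find credentials, in the order listed above.
#[derive(Debug, PartialEq)]
enum AdcSource {
    /// Option 1: JSON key file named by GOOGLE_APPLICATION_CREDENTIALS.
    KeyFile(String),
    /// Option 2: well-known gcloud ADC file holding a refresh token.
    GcloudAdcFile,
    /// Option 3: metadata server base URL (short-lived tokens only).
    MetadataServer(String),
}

/// Hypothetical model of the chain; returns None for the final "Fail" step.
fn resolve_adc(
    env: &HashMap<String, String>,
    gcloud_file_exists: bool,
    metadata_reachable: bool,
) -> Option<AdcSource> {
    if let Some(path) = env.get("GOOGLE_APPLICATION_CREDENTIALS") {
        return Some(AdcSource::KeyFile(path.clone()));
    }
    if gcloud_file_exists {
        return Some(AdcSource::GcloudAdcFile);
    }
    if metadata_reachable {
        // Default host taken from the Python notes in this issue.
        let host = env
            .get("GCE_METADATA_HOST")
            .cloned()
            .unwrap_or_else(|| "metadata.google.internal".to_string());
        return Some(AdcSource::MetadataServer(format!(
            "http://{}/computeMetadata/v1/",
            host
        )));
    }
    None
}
```

In a sandbox with no key file and no gcloud file, only the third branch can succeed, which is why the emulator has to answer at whatever address GCE_METADATA_HOST names.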
Critical Finding: SDKs Bypass HTTP_PROXY for Metadata
Deep investigation of the SDK source code revealed that neither Go nor Python honors HTTP_PROXY for metadata requests. This rules out a proxy-interception-only approach.
| SDK | Detection | Data Fetches | Transport | Proxy? |
|---|---|---|---|---|
| Go cloud.google.com/go | GCE_METADATA_HOST set → OnGCE() returns true immediately | Custom http.Transport{Proxy: nil} | Direct TCP | Never |
| Python google-auth | ping() uses GCE_METADATA_IP (NOT GCE_METADATA_HOST) via raw http.client.HTTPConnection | _http_client.Request for init, requests.Session for refresh | Direct TCP (init), proxy-aware (refresh) | Not for detection or init |
| Node.js gcp-metadata | BIOS probe + metadata ping | Standard Node HTTP | Direct TCP | Honors proxy but doesn't need it |
Go: Creates &http.Transport{} with Proxy: nil — explicitly skips ProxyFromEnvironment. Not configurable. OnGCE() returns true immediately when GCE_METADATA_HOST is set ("The user explicitly said they're on GCE, so trust them").
Python: Two separate env vars: GCE_METADATA_IP (detection ping, default 169.254.169.254) and GCE_METADATA_HOST (data fetches, default metadata.google.internal). Detection uses http.client.HTTPConnection with zero proxy support. Not configurable — the source comment says: "This is only acceptable because the metadata server doesn't do SSL and never requires proxies." Both env vars must be set.
Node.js: METADATA_SERVER_DETECTION=assume-present bypasses BIOS detection. GCE_METADATA_HOST directs all requests to the loopback server.
Why Loopback Server, Not Proxy Interception
Since Go and Python hardcode direct TCP with no proxy support, the metadata emulator must be reachable via direct TCP inside the sandbox. A loopback server on 127.0.0.1:8174 handles all three SDKs uniformly.
Affected Components
Existing on main (to build upon)
| Component | Key Files | Relevance |
|---|---|---|
| Vertex AI provider profile | providers/google-vertex-ai.yaml | Pattern for SA key bootstrap + token refresh credentials |
| SA JWT token minting | crates/openshell-server/src/provider_refresh.rs:501 | mint_google_service_account_jwt(): reuse for google-cloud provider |
| Provider env resolution | crates/openshell-server/src/grpc/provider.rs:425-534 | resolve_provider_environment(): extend with GCP env injection |
| Credential state | crates/openshell-sandbox/src/provider_credentials.rs | ProviderCredentialState: extend with child_env_resolved() |
| Policy local handler | crates/openshell-sandbox/src/policy_local.rs | Pattern reference for synthetic HTTP serving |
| SSH netns connect | crates/openshell-sandbox/src/ssh.rs | Pattern reference for setns() on dedicated thread |
| Provider registry | crates/openshell-providers/src/lib.rs | Extend with google-cloud + vertex provider plugins |
| Bypass rules / nftables | crates/openshell-sandbox/src/sandbox/linux/netns.rs | Network namespace setup: loopback server binds inside netns |
New files to create
| Component | File | Purpose |
|---|---|---|
| Metadata server | crates/openshell-sandbox/src/metadata_server.rs | Generic loopback HTTP server with MetadataHandler trait |
| GCE metadata handler | crates/openshell-sandbox/src/gcp_metadata.rs | GCE metadata API implementation, OCSF logging |
| GCP constants | crates/openshell-core/src/gcp.rs | Shared constants: env var aliases, config keys, loopback address |
| GCP provider plugin | crates/openshell-providers/src/providers/gcp.rs | google-cloud provider type env injection |
| Vertex provider plugin | crates/openshell-providers/src/providers/vertex.rs | Extracted Vertex AI provider logic (from inline code in provider.rs) |
| GCP provider profile | providers/google-cloud.yaml | Credential definitions: service_account_token, adc_token |
| Docs | docs/sandboxes/gcp-credentials.mdx | User-facing GCP credentials documentation |
Files to modify
| File | Change |
|---|---|
| crates/openshell-sandbox/src/lib.rs | Add mod gcp_metadata, mod metadata_server; spawn loopback server in netns |
| crates/openshell-sandbox/src/provider_credentials.rs | Add child_env_resolved(): triple-layer env var injection |
| crates/openshell-sandbox/src/secrets.rs | Add placeholder_for_env_key() helper |
| crates/openshell-server/src/grpc/provider.rs | Replace inline Vertex AI config injection with registry.inject_env() |
| crates/openshell-providers/src/lib.rs | Register google-cloud and vertex provider plugins |
| crates/openshell-providers/src/profiles.rs | Add google-cloud.yaml to embedded profiles |
| crates/openshell-providers/src/providers/mod.rs | Add gcp and vertex modules |
| crates/openshell-core/src/lib.rs | Add pub mod gcp |
Technical Investigation
Architecture: Loopback Metadata Server
Google SDK in sandbox (Go, Python, Node.js)
│
│ GCE_METADATA_HOST=127.0.0.1:8174
│ GCE_METADATA_IP=127.0.0.1:8174 (Python detection)
│ METADATA_SERVER_DETECTION=assume-present (Node.js)
│
├─ Direct TCP to 127.0.0.1:8174
│
▼
Loopback Metadata Server (127.0.0.1:8174)
bound inside sandbox netns via setns()
MetadataHandler trait → GCE MetadataContext
│
├─ Validates Metadata-Flavor: Google header
├─ Rejects X-Forwarded-For (SSRF defense)
├─ Reads token from ProviderCredentialState
▼
Returns: {"access_token":"<placeholder>","expires_in":N,"token_type":"Bearer"}
│
▼
SDK uses token in Authorization header on outbound *.googleapis.com requests
(routed through proxy with normal egress policy, placeholder resolved to real token)
Loopback Server Design
- Namespace entry: Dedicated OS thread calls setns(netns_fd, CLONE_NEWNET) then TcpListener::bind("127.0.0.1:8174"). Thread exits after bind, so there is no tokio thread pool contamination. Same pattern as ssh.rs::connect_in_netns.
- Accept loop: Runs on tokio runtime. 32 max concurrent connections, 4096 byte request cap.
- Handler trait: MetadataHandler is generic, so future cloud providers (AWS IMDS, Azure IMDS) can reuse the same bind_in_netns + accept loop infrastructure.
- Port choice: 8174 (unprivileged, avoids CAP_NET_BIND_SERVICE).
- Readiness signal: oneshot::Sender<SocketAddr> signals when bound, with 5s timeout.
- Conditional startup: Only starts if GCE_METADATA_HOST is present in provider env (i.e., a google-cloud provider is attached).
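The bind-on-a-dedicated-thread and readiness-signal steps can be sketched with the standard library alone. This is a simplified sketch under stated assumptions: the setns() call is left out (it needs libc or nix and root), a std mpsc channel stands in for the tokio oneshot, and `bind_on_dedicated_thread` is a hypothetical name, not the planned bind_in_netns signature.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener};
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Bind a listener on a short-lived OS thread and hand it back over a channel.
/// In the design above the thread would first call setns(netns_fd, CLONE_NEWNET);
/// that step is omitted here so the sketch stays std-only.
fn bind_on_dedicated_thread(
    addr: &str,
    timeout: Duration,
) -> io::Result<(TcpListener, SocketAddr)> {
    let (tx, rx) = mpsc::channel::<io::Result<(TcpListener, SocketAddr)>>();
    let addr = addr.to_string();
    thread::spawn(move || {
        // (setns would run here; it affects only this thread.)
        let result = TcpListener::bind(&addr).and_then(|listener| {
            let local = listener.local_addr()?;
            Ok((listener, local))
        });
        // The receiver may already have given up; ignore a failed send.
        let _ = tx.send(result);
        // The thread ends here, so its namespace never reaches a worker pool.
    });
    // Readiness signal with a deadline, mirroring the 5s timeout above.
    match rx.recv_timeout(timeout) {
        Ok(result) => result,
        Err(_) => Err(io::Error::new(
            io::ErrorKind::TimedOut,
            "bind thread did not report within the timeout",
        )),
    }
}
```

The returned listener is an ordinary socket that keeps pointing at the namespace it was bound in, so the accept loop can run on any thread afterwards.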
SDK Compatibility: Three-Layer Env Var Injection
A new child_env_resolved() method on ProviderCredentialState handles all three SDKs:
| Env Var | Value | Purpose |
|---|---|---|
| GCE_METADATA_HOST | 127.0.0.1:8174 | Go: instant OnGCE()=true + data fetch target. Python/Node.js: data fetch target |
| GCE_METADATA_IP | 127.0.0.1:8174 | Python: detection ping target (separate from GCE_METADATA_HOST) |
| METADATA_SERVER_DETECTION | assume-present | Node.js: skip BIOS probe that fails in sandboxes |
| GCP_PROJECT_ID, GOOGLE_CLOUD_PROJECT | Resolved from provider config | Non-secret config, un-placeholderized for SDK startup reads |
| CLOUD_ML_REGION, GCP_LOCATION | Resolved from provider config | Region aliases |
| GCP_SERVICE_ACCOUNT_EMAIL | Resolved from provider config | SA email for metadata /email endpoint |
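A minimal sketch of how the three detection variables from the table might be merged into a child environment. `inject_metadata_env` and `LOOPBACK_METADATA_ADDR` are hypothetical names, not the real child_env_resolved() API. The rule that values the user already set are left alone is taken from the "non-overwrite of user values" item under Test Considerations.

```rust
use std::collections::BTreeMap;

/// Assumed constant; the issue places the real one in openshell-core/src/gcp.rs.
const LOOPBACK_METADATA_ADDR: &str = "127.0.0.1:8174";

/// Add the SDK detection variables without clobbering user-supplied values.
fn inject_metadata_env(env: &mut BTreeMap<String, String>) {
    let defaults = [
        // Go: OnGCE() shortcut and fetch target; Python/Node.js: fetch target.
        ("GCE_METADATA_HOST", LOOPBACK_METADATA_ADDR),
        // Python: detection ping target.
        ("GCE_METADATA_IP", LOOPBACK_METADATA_ADDR),
        // Node.js: skip the BIOS probe.
        ("METADATA_SERVER_DETECTION", "assume-present"),
    ];
    for (key, value) in defaults {
        env.entry(key.to_string())
            .or_insert_with(|| value.to_string());
    }
}
```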
GCE Metadata API Surface
| Endpoint | Response | Content-Type |
|---|---|---|
| GET / | computeMetadata/\n | text/plain |
| GET /computeMetadata/v1/instance/service-accounts/default/token | {"access_token":"<placeholder>","expires_in":N,"token_type":"Bearer"} | application/json |
| GET /computeMetadata/v1/instance/service-accounts/default/email | SA email (real value) | text/plain |
| GET /computeMetadata/v1/instance/service-accounts/default/scopes | https://www.googleapis.com/auth/cloud-platform | text/plain |
| GET /computeMetadata/v1/instance/service-accounts/default/aliases | default\n | text/plain |
| GET /computeMetadata/v1/instance/service-accounts/default?recursive=true | JSON with aliases, email, scopes | application/json |
| GET /computeMetadata/v1/instance/service-accounts/ | default/\n | text/plain |
| GET /computeMetadata/v1/project/project-id | Project ID (real value) | text/plain |
All responses include Metadata-Flavor: Google header. Requests without Metadata-Flavor: Google → 403. Requests with X-Forwarded-For → 403.
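The endpoint table and header rules can be expressed as one pure routing function, which is also easy to unit test. This is a sketch, not the planned gcp_metadata.rs: `MetadataView`, `Reply`, and `route` are hypothetical names. Checking the method before the headers is an assumption, and so are the recursive JSON shape and the exact-match handling of the query string.

```rust
/// Non-secret values plus the current token placeholder, if one exists.
struct MetadataView<'a> {
    email: &'a str,
    project_id: &'a str,
    token_placeholder: Option<&'a str>,
    expires_in: u64,
}

/// (status, content type, body). The Metadata-Flavor response header is
/// assumed to be added by the caller on every reply.
type Reply = (u16, &'static str, String);

fn route(method: &str, path: &str, headers: &[(&str, &str)], view: &MetadataView) -> Reply {
    const SA: &str = "/computeMetadata/v1/instance/service-accounts";
    const SCOPE: &str = "https://www.googleapis.com/auth/cloud-platform";
    let empty = |status: u16| -> Reply { (status, "text/plain", String::new()) };
    let text = |body: &str| -> Reply { (200, "text/plain", body.to_string()) };

    // Header names are case-insensitive; the flavor value must match exactly.
    let flavor_ok = headers
        .iter()
        .any(|(k, v)| k.eq_ignore_ascii_case("Metadata-Flavor") && *v == "Google");
    let forwarded = headers
        .iter()
        .any(|(k, _)| k.eq_ignore_ascii_case("X-Forwarded-For"));

    if method != "GET" {
        return empty(405);
    }
    if !flavor_ok || forwarded {
        return empty(403);
    }
    if path == "/" {
        return text("computeMetadata/\n");
    }
    if path == "/computeMetadata/v1/project/project-id" {
        return text(view.project_id);
    }
    match path.strip_prefix(SA) {
        Some("/") => text("default/\n"),
        Some("/default/email") => text(view.email),
        Some("/default/scopes") => text(SCOPE),
        Some("/default/aliases") => text("default\n"),
        Some("/default?recursive=true") => (
            200,
            "application/json",
            format!(
                r#"{{"aliases":["default"],"email":"{}","scopes":["{}"]}}"#,
                view.email, SCOPE
            ),
        ),
        Some("/default/token") => match view.token_placeholder {
            Some(token) => (
                200,
                "application/json",
                format!(
                    r#"{{"access_token":"{}","expires_in":{},"token_type":"Bearer"}}"#,
                    token, view.expires_in
                ),
            ),
            // No credential available yet.
            None => empty(503),
        },
        _ => empty(404),
    }
}
```

Keeping the router free of I/O means the 403, 404, 405, and 503 cases listed under Test Considerations need no socket to exercise.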
Token Security: Placeholder-Based Resolution
The metadata /token endpoint serves placeholders (openshell:resolve:env:v{revision}_GCP_SA_ACCESS_TOKEN), not real token values. Real values are only resolved at the proxy layer when the token is used in outbound API requests via Authorization: Bearer headers. Non-secret values (project ID, SA email) are served as real values, matching real GCE metadata server behavior.
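To make the two halves of this flow concrete, here is a minimal sketch of placeholder construction and of the substitution the proxy would perform on an outbound Authorization header. Only the placeholder format comes from this issue. `placeholder_for`, `resolve_authorization`, and the `lookup` callback are hypothetical; the planned placeholder_for_env_key() helper may look different.

```rust
const PLACEHOLDER_PREFIX: &str = "openshell:resolve:env:";

/// Build a placeholder in the format quoted above.
fn placeholder_for(revision: u64, env_key: &str) -> String {
    format!("{}v{}_{}", PLACEHOLDER_PREFIX, revision, env_key)
}

/// Rewrite an outbound Authorization header value.
/// Non-placeholder values pass through unchanged. A placeholder that cannot
/// be resolved yields None, so the caller can reject the request instead of
/// forwarding the placeholder upstream (an assumed policy).
fn resolve_authorization(
    value: &str,
    lookup: impl Fn(&str) -> Option<String>,
) -> Option<String> {
    match value.strip_prefix("Bearer ") {
        Some(token) if token.starts_with(PLACEHOLDER_PREFIX) => {
            lookup(token).map(|real| format!("Bearer {}", real))
        }
        _ => Some(value.to_string()),
    }
}
```

The sandboxed process only ever sees the output of `placeholder_for`. The real token exists only inside `lookup`, which lives on the proxy side.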
Provider Architecture
New google-cloud provider type alongside existing google-vertex-ai:
- service_account_token: Gateway signs JWT with SA key, exchanges at oauth2.googleapis.com/token for access token. cloud-platform scope covers all Google APIs. Reuses existing mint_google_service_account_jwt().
- adc_token: Gateway exchanges gcloud ADC refresh token for access token. Same OAuth2 refresh flow as existing Vertex AI ADC credential.
- inject_env(): Generalized via ProviderRegistry, replacing the inline Vertex AI config injection in resolve_provider_environment(). Both google-cloud and google-vertex-ai providers inject type-specific env vars through the same interface.
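One way the generalized inject_env() dispatch could look is sketched below. Everything here is an assumption for illustration: the `ProviderPlugin` trait shape, the `project_id` config key, and the `bool` return value are not taken from the existing openshell-providers code.

```rust
use std::collections::HashMap;

type Env = HashMap<String, String>;

/// Hypothetical plugin interface: each provider type contributes its own env vars.
trait ProviderPlugin {
    fn provider_type(&self) -> &'static str;
    fn inject_env(&self, config: &Env, env: &mut Env);
}

struct GoogleCloud;

impl ProviderPlugin for GoogleCloud {
    fn provider_type(&self) -> &'static str {
        "google-cloud"
    }

    fn inject_env(&self, config: &Env, env: &mut Env) {
        // "project_id" is an assumed config key name.
        if let Some(project) = config.get("project_id") {
            for alias in ["GCP_PROJECT_ID", "GOOGLE_CLOUD_PROJECT"] {
                env.entry(alias.to_string())
                    .or_insert_with(|| project.clone());
            }
        }
    }
}

struct ProviderRegistry {
    plugins: Vec<Box<dyn ProviderPlugin>>,
}

impl ProviderRegistry {
    /// Dispatch to the plugin registered for `provider_type`.
    /// Returns false when no plugin handles that type.
    fn inject_env(&self, provider_type: &str, config: &Env, env: &mut Env) -> bool {
        match self
            .plugins
            .iter()
            .find(|plugin| plugin.provider_type() == provider_type)
        {
            Some(plugin) => {
                plugin.inject_env(config, env);
                true
            }
            None => false,
        }
    }
}
```

With this shape, resolve_provider_environment() would no longer need a Vertex-specific branch; a vertex plugin would register next to the google-cloud one.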
Existing Patterns to Follow
| Pattern | Location on main | How it applies |
|---|---|---|
| Synthetic HTTP serving | crates/openshell-sandbox/src/policy_local.rs | PolicyLocalContext route dispatch, response format, OCSF audit |
| setns() on dedicated thread | crates/openshell-sandbox/src/ssh.rs | connect_in_netns pattern: OS thread + setns to avoid tokio contamination |
| SA JWT token minting | crates/openshell-server/src/provider_refresh.rs:501-544 | mint_google_service_account_jwt(): RS256 JWT → Google access token |
| Provider env var injection | crates/openshell-server/src/grpc/provider.rs:488-527 | Inline Vertex AI config injection, to be generalized |
| Credential snapshot | crates/openshell-sandbox/src/provider_credentials.rs | ProviderCredentialState atomic snapshot for credential access |
| Provider profile YAML | providers/google-vertex-ai.yaml | Credential definitions, refresh strategies, scopes |
Proposed Approach
Loopback HTTP server on 127.0.0.1:8174 inside the sandbox netns, reachable by all three SDKs via direct TCP. Three env vars (GCE_METADATA_HOST, GCE_METADATA_IP, METADATA_SERVER_DETECTION) ensure detection succeeds across Go, Python, and Node.js. MetadataHandler trait enables future cloud provider emulators (AWS IMDS, Azure). New google-cloud provider type with SA JWT + ADC OAuth2 refresh. SA keys never enter the sandbox.
Scope Assessment
- Complexity: Medium
- Confidence: High (clear implementation path; all building blocks exist on main)
- New files: 7
- Modified files: 8
- Issue type: feat
Risks & Open Questions
- Loopback server lifecycle: Server runs for the sandbox lifetime. If credential state is stale (refresh delay), tokens served may be near expiry. This is the same latency as the current env var injection, which is acceptable.
- Port 8174 collision: Unlikely but possible if a sandbox process binds the same port. Could be made configurable.
- setns safety: Uses a dedicated OS thread that exits after bind, so there is no namespace contamination of the tokio thread pool.
- Placeholder tokens: SDKs receive placeholder strings, not real tokens. SDKs that validate token format (e.g., check for a ya29. prefix) may fail. The proxy resolves placeholders on outbound requests.
- Opt-in activation: Metadata server only starts if GCE_METADATA_HOST is present in provider env (conditional on having a google-cloud provider attached).
- Egress policy: Sandboxes still need an egress policy allowing *.googleapis.com for actual API calls.
- /universe/universe_domain endpoint: Go SDK fetches this during ADC init. Should return 404 (triggers default googleapis.com fallback) or serve the value.
- Gateway config docs: No new gateway TOML fields; activation is purely provider-driven.
Test Considerations
- Unit tests: Metadata handler endpoint routing, Metadata-Flavor: Google header enforcement, X-Forwarded-For rejection, response format, expiry computation, missing credential handling (503), unknown paths (404)
- Unit tests: STATIC_CONFIG_KEYS consistency with alias arrays, config_key() resolution, token env key ordering
- Unit tests: Provider injection (metadata host set, project ID/region/email propagation, non-overwrite of user values)
- Integration tests (requires root): Loopback server bind-in-netns, verify TCP reachability from sandbox namespace
- E2E tests: Sandbox with google-cloud provider, verify Python/Go/Node.js ADC detection succeeds and token fetch returns valid placeholder
- Negative tests: Missing Metadata-Flavor header → 403, X-Forwarded-For → 403, unknown paths → 404, no credentials → 503, POST method → 405
Created by spike investigation. Use build-from-issue to plan and implement.