Skip to content

Envoy router verifies ate apiserver's serving cert#237

Open
Zoe Zhao (zoez7) wants to merge 1 commit into
agent-substrate:mainfrom
zoez7:mtls
Open

Envoy router verifies ate apiserver's serving cert#237
Zoe Zhao (zoez7) wants to merge 1 commit into
agent-substrate:mainfrom
zoez7:mtls

Conversation

@zoez7

@zoez7 Zoe Zhao (zoez7) commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Part of #170
Establishes mutual TLS between the atenet router and ate-apiserver, and updates the certificate plumbing it depends on.

Main changes:

  1. The atenet router now verifies ate apiserver' serving certificate, and presents its client cert to ate apiserver. Previously the connection used InsecureSkipVerify.

Minor bug fixes:

  1. Prevent servicednssigner from signing a cert with no DNS SANs. Also updated valkey cluster's cert configuration, because it was relying on the cert with empty DNS.
  • Tests pass
  • Appropriate changes to documentation are included in the PR

@ahmedtd

Copy link
Copy Markdown
Collaborator

Can you talk a little bit about the background of splitting out separate CAs / signers for the workerpools and ate-system components?

If at all possible, I would prefer to restrict us to two signers:

  • One that issues certs with DNS SANs based on the K8s services that a pod is part of (though opt-in with an annotation or label is fine).
  • One that issues SPIFFE certs that encode the k8s service account identity of the pod (and potentially, the pod ID, node ID etc in an additional extension).

The reason is that we can be reasonably confident that signers like these will land in upstream K8s in the near-ish future. The certificates here aren't special or unique to substrate, and they aren't part of the value proposition of the project. So I think we should plan to rebase on native K8s capabilities as they become available. That's easiest if we don't get very custom with the signers.

@zoez7

Copy link
Copy Markdown
Collaborator Author

I think the main motivations for splitting out separate CAs / signers was, if I mount the "podidentity" CA bundle as trusted client CAs in ate-system services, workerpool pods (therefore actors) would also be able to connect to ate-system servers like the Valkey cluster and atelet server.

But after typing it out loud - it seems this needs to be solved anyway (probably in a later milestone) by authorization inside servers such as valkey, instead of using separate CAs. Does that make sense to you?

@ahmedtd

Copy link
Copy Markdown
Collaborator

I think the main motivations for splitting out separate CAs / signers was, if I mount the "podidentity" CA bundle as trusted client CAs in ate-system services, workerpool pods (therefore actors) would also be able to connect to ate-system servers like the Valkey cluster and atelet server.

But after typing it out loud - it seems this needs to be solved anyway (probably in a later milestone) by authorization inside servers such as valkey, instead of using separate CAs. Does that make sense to you?

Yeah --- we should always draw a bright line between authentication and authorization. Assuming that a given client is authorized just because they were able to get a certificate from a particular CA has caused a lot of security problems in the past.

Unfortunately, for the specific case of Valkey we might be limited by its capabilities, in which case it might require a separate signer. Let me check the docs.

@ahmedtd

Copy link
Copy Markdown
Collaborator

It looks like valkey can do it: https://valkey.io/topics/tls/#:~:text=Client%20certificate%20authentication

@zoez7 Zoe Zhao (zoez7) force-pushed the mtls branch 2 times, most recently from c80f243 to 683d085 Compare June 16, 2026 00:51
Namespace string
Kubeconfig string
AteapiAddr string
AteapiClientCertPath string

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we group the cert and auth related config into its own struct? Makes it easier to read:

type AuthConfig struct {...}

}

func fileExists(path string) bool {
if path == "" {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this case actually? Empty file will get a does not exist error anyway.

return false
}
_, err := os.Stat(path)
return err == nil

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not clear we should eat the error if it's not a ENOENT error

@@ -62,22 +58,24 @@ func init() {

// RouterConfig holds deployment setup and endpoint options for the router node instance.
type RouterConfig struct {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest:

  • Move RouterConfig to its own file (config.go)
  • Make it not exported (rename to routerConfig)
  • make ateAPITransportCreds() a member function
  • rename ateAPITransportCreds to apiTransportCredentials

}

func ateapiTLSConfig(cfg RouterConfig) (*tls.Config, error) {
haveCA := fileExists(cfg.AteapiCACertsPath)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest: just call stat and return the error. If the file exists, it should stat() with no error anyway.

@bowei Bowei Du (bowei) self-assigned this Jun 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants