diff --git a/docs/adr/34797-fast-path-yaml-bypass-for-permission-scope-validation.md b/docs/adr/34797-fast-path-yaml-bypass-for-permission-scope-validation.md new file mode 100644 index 00000000000..f0f396aeae3 --- /dev/null +++ b/docs/adr/34797-fast-path-yaml-bypass-for-permission-scope-validation.md @@ -0,0 +1,91 @@ +# ADR-34797: Fast-Path YAML Bypass for Permission Scope-Name Validation + +**Date**: 2026-05-26 +**Status**: Draft +**Deciders**: pelikhan (copilot-swe-agent) + +--- + +## Part 1 — Narrative (Human-Friendly) + +### Context + +A validation microbenchmark regressed to ~29.7µs/op (+17.9% vs historical baseline). Profiling identified `ValidatePermissionScopeNames` on the hot path of workflow compilation, dominated by two costs: (a) per-call construction of the full known-scope list (`GetAllPermissionScopes` + `GetAllGitHubAppOnlyScopes` + meta-key map + scope preview string), and (b) `yaml.Unmarshal` on the permissions block, which allocates a generic `map[string]any` even when the input is the trivial common case — a flat block-style mapping of scope → access level. Because permissions blocks in real workflows are nearly always written in block style with a small number of keys, the YAML decoder is doing far more work than the validation actually needs. + +### Decision + +We will short-circuit `ValidatePermissionScopeNames` with a fast extraction path that reads top-level scope keys directly from block-style YAML without invoking `yaml.Unmarshal`, falling back to the existing YAML-based parsing only when the content uses flow mappings, sequences, or other non-trivial YAML structures (detected by presence of `{`, `}`, `[`, `]`). Reusable validation state — the full scope list, the meta-key set (`all`, `read-all`, `write-all`, `none`), and the preformatted scope preview string used in error messages — is hoisted to package-level `var` initialization so it is built once at startup rather than on every call. Single-key validation logic is centralized in `validateSinglePermissionScopeKey` so the fast and fallback paths share identical error semantics. + +### Alternatives Considered + +#### Alternative 1: Keep `yaml.Unmarshal` and cache only the package-level state + +Hoisting the scope list, meta-key set, and preview string to `var` initialization alone would eliminate the per-call list construction (a meaningful chunk of the regression) without touching the YAML parsing path. This was considered because it is a strictly smaller change with no behavioral risk: every input continues through the same `yaml.Unmarshal` code path. It was rejected because profiling showed `yaml.Unmarshal` of the permissions block was itself a material cost (allocating `map[string]any` and decoder state on every call), and the common-case input is so structurally simple — a flat list of `key: level` lines — that parsing it as full YAML is disproportionate work. + +#### Alternative 2: Replace YAML parsing entirely with a hand-rolled parser + +Removing `yaml.Unmarshal` from this function altogether and parsing every input — including flow mappings and quoted keys with embedded colons — with custom code would maximize the speed-up. This was rejected because the GitHub Actions YAML spec allows several non-block forms (flow mappings like `{contents: read, issues: write}`, anchors, multi-line scalars) that are tedious and error-prone to reimplement. The hybrid approach — fast path for the easy common case, real YAML decoder for everything else — captures most of the speed-up while preserving exact behavior for edge cases that already work today. + +#### Alternative 3: Cache parsed results across calls + +A keyed cache from permissions-YAML string to validation result would skip both the YAML parse and the validation work on repeat inputs. This was not chosen because the validator is called from a compile pipeline that already memoizes higher-level work (see ADR-28560), and the inputs vary enough per workflow that a cache hit rate would not justify the cache plumbing, eviction policy, and concurrent-access surface area. + +### Consequences + +#### Positive +- The hot common case (block-style permissions with a handful of keys) avoids `yaml.Unmarshal` entirely, removing the dominant per-call allocation and decoder cost. +- Package-level cached scope list, meta-key set, and preview string remove the repeated `GetAllPermissionScopes`/`GetAllGitHubAppOnlyScopes` traversals and slice allocations from every validation call. +- `validateSinglePermissionScopeKey` centralizes the meta-key check, exact-match check, case-only suggestion, and fuzzy-match error message — both the fast path and the YAML fallback emit identical errors, removing a class of drift bugs. +- Flow-mapping inputs (`{contents: read, issues: write}`) and quoted-key inputs (`"contents": read`) are covered by added regression tests, so the dual-path design has explicit behavioral pinning. + +#### Negative +- There are now two code paths through `ValidatePermissionScopeNames` (hand-rolled fast extraction + YAML fallback). Future changes to validation semantics must be made in both `validateSinglePermissionScopeKey` callers and the fast path's structural assumptions. +- The fast-path detector keys off the literal presence of `{`, `}`, `[`, `]` anywhere in the content. A block-style permissions value containing these characters in a comment (e.g., `# example: {contents: read}`) will unnecessarily fall back to YAML parsing. This is a correctness-preserving false positive, not a behavior change, but it means the speed-up does not apply to inputs with such comments. +- Package-level initialization runs at import time, including `safeAllocationCapacity` and the join over the preview slice. Any future panic in those builders would now manifest at package init rather than first call. + +#### Neutral +- The function continues to return `nil` for inputs that are not maps (shorthand strings like `read-all`), matching prior behavior. +- The fast path explicitly accepts quoted keys by trimming `"` and `'` from extracted keys; quoted-key inputs that previously went through `yaml.Unmarshal` are now handled by the fast path with no observable difference. +- `countLeadingSpacesAndTabs` and `extractTopLevelPermissionScopeKeys` are new package-private helpers; their scope is intentionally narrow to this one validator and they are not exported. + +--- + +## Part 2 — Normative Specification (RFC 2119) + +> The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** in this section are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119). + +### Fast-Path Extraction + +1. `ValidatePermissionScopeNames` **MUST** attempt fast-path extraction of top-level scope keys via `extractTopLevelPermissionScopeKeys` before invoking `yaml.Unmarshal`. +2. The fast path **MUST** decline (return `parsedFast == false`) when the content contains any of the characters `{`, `}`, `[`, or `]`, so flow mappings and sequences are routed to the YAML fallback. +3. The fast path **MUST** decline when any non-empty, non-comment line does not contain a `:` character at column > 0, or begins with `-` after trimming, so non-mapping inputs are routed to the YAML fallback. +4. When the fast path succeeds, it **MUST** trim surrounding `"` and `'` from each extracted key before validation, so quoted keys are accepted as equivalent to their unquoted form. +5. When the fast path succeeds and all extracted keys validate, `ValidatePermissionScopeNames` **MUST** return `nil` without invoking `yaml.Unmarshal`. + +### YAML Fallback Path + +1. When the fast path declines, `ValidatePermissionScopeNames` **MUST** fall back to `yaml.Unmarshal` of the content into a `map[string]any`. +2. If `yaml.Unmarshal` returns an error, the function **MUST** return `nil` (treating the input as a non-map shorthand), preserving prior behavior. +3. The fallback **MUST** invoke `validateSinglePermissionScopeKey` for each key in the resulting map, so error messages and suggestions are identical to the fast path. + +### Centralized Single-Key Validation + +1. Both the fast path and the YAML fallback **MUST** route per-key validation through `validateSinglePermissionScopeKey`; neither path may inline its own meta-key, exact-match, or fuzzy-match logic. +2. `validateSinglePermissionScopeKey` **MUST** accept any key present in `permissionScopeValidationMeta` (currently `all`, `read-all`, `write-all`, `none`) without error. +3. `validateSinglePermissionScopeKey` **MUST** accept any key present in `validPermissionScopes` without error. +4. For unknown keys that differ from a valid scope only by letter case, `validateSinglePermissionScopeKey` **MUST** return an error suggesting the lowercase form. +5. For unknown keys with no case-only match, `validateSinglePermissionScopeKey` **MUST** consult `stringutil.FindClosestMatches` against `permissionScopeValidationAllScopes` with a limit of 3 suggestions, and **MUST** return `nil` (silent accept) when no suggestions are found. + +### Package-Level Cached State + +1. `permissionScopeValidationAllScopes`, `permissionScopeValidationMeta`, and `permissionScopeValidationValidScopesPreview` **MUST** be initialized once at package load time as package-level `var`s. +2. `ValidatePermissionScopeNames` and `validateSinglePermissionScopeKey` **MUST NOT** rebuild the full scope list, meta-key map, or preview string on each call. +3. The preview string **MUST** contain at most the first 10 entries of `permissionScopeValidationAllScopes` joined by `", "`, followed by `"..."`. + +### Conformance + +An implementation is considered conformant with this ADR if it satisfies all **MUST** and **MUST NOT** requirements above. Failure to meet any **MUST** or **MUST NOT** requirement constitutes non-conformance. + +--- + +*This is a DRAFT ADR generated by the [Design Decision Gate](https://github.com/github/gh-aw/actions/runs/26425418484) workflow. The PR author must review, complete, and finalize this document before the PR can merge.* diff --git a/pkg/workflow/permissions_scope_validation_test.go b/pkg/workflow/permissions_scope_validation_test.go index 8a70cd4c14f..61daf62b809 100644 --- a/pkg/workflow/permissions_scope_validation_test.go +++ b/pkg/workflow/permissions_scope_validation_test.go @@ -40,6 +40,17 @@ pull-requests: read`, issues: write`, wantErr: false, }, + { + name: "quoted scope keys are valid", + yaml: `"contents": read +"pull-requests": write`, + wantErr: false, + }, + { + name: "flow mapping shorthand remains valid", + yaml: `{contents: read, issues: write}`, + wantErr: false, + }, { name: "typo in scope name suggests correction", yaml: `contnts: read`, diff --git a/pkg/workflow/permissions_validation.go b/pkg/workflow/permissions_validation.go index ccccfd64101..12e9da9556d 100644 --- a/pkg/workflow/permissions_validation.go +++ b/pkg/workflow/permissions_validation.go @@ -14,6 +14,20 @@ import ( "github.com/goccy/go-yaml" ) +var ( + permissionScopeValidationAllScopes = buildPermissionScopeValidationAllScopes() + permissionScopeValidationMeta = map[string]struct{}{ + "all": {}, + "read-all": {}, + "write-all": {}, + "none": {}, + } + permissionScopeValidationValidScopesPreview = strings.Join( + permissionScopeValidationAllScopes[:min(10, len(permissionScopeValidationAllScopes))], + ", ", + ) + "..." +) + // PermissionsValidationResult contains the result of permissions validation type PermissionsValidationResult struct { MissingPermissions map[PermissionScope]PermissionLevel // Permissions required but not granted @@ -402,27 +416,6 @@ func ValidatePermissionScopeNames(permissionsYAML string) error { permissionsValidationLog.Print("Validating permission scope names") - // Collect all valid scope names for fuzzy matching - ghTokenScopes := GetAllPermissionScopes() - appOnlyScopes := GetAllGitHubAppOnlyScopes() - // +1 for copilot-requests which is not in GetAllPermissionScopes - allScopes := make([]string, 0, safeAllocationCapacity(len(ghTokenScopes), len(appOnlyScopes), 1)) - for _, scope := range ghTokenScopes { - allScopes = append(allScopes, string(scope)) - } - for _, scope := range appOnlyScopes { - allScopes = append(allScopes, string(scope)) - } - // copilot-requests is valid even though not in GetAllPermissionScopes - allScopes = append(allScopes, string(PermissionCopilotRequests)) - // "all" is a meta-key that is always valid in shorthand contexts - validMeta := map[string]bool{ - "all": true, - "read-all": true, - "write-all": true, - "none": true, - } - // Strip optional "permissions:" prefix so we can parse just the map content content := strings.TrimSpace(permissionsYAML) if strings.HasPrefix(content, "permissions:") { @@ -435,7 +428,17 @@ func ValidatePermissionScopeNames(permissionsYAML string) error { } } - // Try to parse the content as a YAML map of scope → level + scopeKeys, parsedFast := extractTopLevelPermissionScopeKeys(content) + if parsedFast { + for _, scopeKey := range scopeKeys { + if err := validateSinglePermissionScopeKey(scopeKey); err != nil { + return err + } + } + return nil + } + + // Fallback to YAML map parsing for non-standard formats (flow mappings, etc.). var permsMap map[string]any if err := yaml.Unmarshal([]byte(content), &permsMap); err != nil { // Not a map (e.g., a shorthand like "read-all"); nothing to validate @@ -443,42 +446,116 @@ func ValidatePermissionScopeNames(permissionsYAML string) error { } for scopeKey := range permsMap { - if validMeta[scopeKey] { - continue + if err := validateSinglePermissionScopeKey(scopeKey); err != nil { + return err + } + } + + return nil +} + +func validateSinglePermissionScopeKey(scopeKey string) error { + if _, ok := permissionScopeValidationMeta[scopeKey]; ok { + return nil + } + if _, ok := validPermissionScopes[scopeKey]; ok { + return nil + } + + // Unknown scope key — check for a case-only difference first (e.g. "Contents" → "contents") + lowerScopeKey := strings.ToLower(scopeKey) + if lowerScopeKey != scopeKey { + if _, ok := validPermissionScopes[lowerScopeKey]; ok { + return fmt.Errorf( + "unknown permission scope %q.\n\nDid you mean: %s?\n\nValid permission scopes include: %s\n\nSee: %s", + scopeKey, + lowerScopeKey, + permissionScopeValidationValidScopesPreview, + constants.DocsPermissionsURL, + ) } - if _, ok := validPermissionScopes[scopeKey]; ok { + } + + // Check for a close fuzzy match + permissionsValidationLog.Printf("Unknown permission scope key: %q", scopeKey) + suggestions := stringutil.FindClosestMatches(scopeKey, permissionScopeValidationAllScopes, 3) + if len(suggestions) == 0 { + return nil // too different to be a typo, ignore silently + } + + return fmt.Errorf( + "unknown permission scope %q.\n\nDid you mean: %s?\n\nValid permission scopes include: %s\n\nSee: %s", + scopeKey, + strings.Join(suggestions, ", "), + permissionScopeValidationValidScopesPreview, + constants.DocsPermissionsURL, + ) +} + +func extractTopLevelPermissionScopeKeys(content string) ([]string, bool) { + if content == "" { + return nil, false + } + // Flow-style mappings ("{contents: read}") and sequence values are uncommon here. + // Let the YAML decoder handle those to preserve behavior. + if strings.Contains(content, "{") || strings.Contains(content, "}") || strings.Contains(content, "[") || strings.Contains(content, "]") { + return nil, false + } + + lines := strings.Split(content, "\n") + keys := make([]string, 0, 4) + baseIndent := -1 + + for _, line := range lines { + trimmed := strings.TrimSpace(line) + if trimmed == "" || strings.HasPrefix(trimmed, "#") { continue } + colonIdx := strings.IndexByte(trimmed, ':') + if colonIdx <= 0 { + return nil, false + } - // Unknown scope key — check for a case-only difference first (e.g. "Contents" → "contents") - lowerScopeKey := strings.ToLower(scopeKey) - if lowerScopeKey != scopeKey { - if _, ok := validPermissionScopes[lowerScopeKey]; ok { - return fmt.Errorf( - "unknown permission scope %q.\n\nDid you mean: %s?\n\nValid permission scopes include: %s\n\nSee: %s", - scopeKey, - lowerScopeKey, - strings.Join(allScopes[:min(10, len(allScopes))], ", ")+"...", - constants.DocsPermissionsURL, - ) - } + indent := countLeadingSpacesAndTabs(line) + if baseIndent == -1 { + baseIndent = indent + } + if indent != baseIndent { + // Nested block content (value continuation), not a top-level scope key. + continue } - // Check for a close fuzzy match - permissionsValidationLog.Printf("Unknown permission scope key: %q", scopeKey) - suggestions := stringutil.FindClosestMatches(scopeKey, allScopes, 3) - if len(suggestions) == 0 { - continue // too different to be a typo, ignore silently + key := strings.TrimSpace(trimmed[:colonIdx]) + if key == "" || strings.HasPrefix(key, "-") { + return nil, false } + key = strings.Trim(key, `"'`) - return fmt.Errorf( - "unknown permission scope %q.\n\nDid you mean: %s?\n\nValid permission scopes include: %s\n\nSee: %s", - scopeKey, - strings.Join(suggestions, ", "), - strings.Join(allScopes[:min(10, len(allScopes))], ", ")+"...", - constants.DocsPermissionsURL, - ) + keys = append(keys, key) } - return nil + return keys, len(keys) > 0 +} + +func countLeadingSpacesAndTabs(s string) int { + i := 0 + for i < len(s) && (s[i] == ' ' || s[i] == '\t') { + i++ + } + return i +} + +func buildPermissionScopeValidationAllScopes() []string { + ghTokenScopes := GetAllPermissionScopes() + appOnlyScopes := GetAllGitHubAppOnlyScopes() + allScopes := make([]string, 0, safeAllocationCapacity(len(ghTokenScopes), len(appOnlyScopes), 1)) + for _, scope := range ghTokenScopes { + allScopes = append(allScopes, string(scope)) + } + for _, scope := range appOnlyScopes { + allScopes = append(allScopes, string(scope)) + } + // copilot-requests is valid even though not in GetAllPermissionScopes + allScopes = append(allScopes, string(PermissionCopilotRequests)) + return allScopes }