Python: Add repr() sanitizer to py/log-injection query by kiro-agent[bot] · Pull Request #2 · mrigankpawagi/codeql

kiro-agent · 2026-06-10T07:34:10Z

This pull request was created by @kiro-agent on behalf of @mrigankpawagi 👻

Comment with /kiro fix to address specific feedback or /kiro all to address everything.
_{Learn about Kiro Web}

Summary

This PR improves the py/log-injection query by recognizing repr() and the %r format specifier as sanitizers for log injection.

Problem

MRVA (Multi-Repository Variant Analysis) on the top-100 Python repositories revealed many false positives where user-controlled values are logged safely via repr() or %r. Since repr() escapes special characters like newlines (converting \n to the literal string \\n), it effectively prevents log injection attacks. The query was flagging these safe patterns unnecessarily.

Changes

New ReprCallSanitizer class - Recognizes calls to the built-in repr() function as sanitizers, since repr() escapes special characters such as newlines.
New isReprFormattedLoggingArg predicate - Excludes arguments that are formatted with %r in the logging format string from being considered sinks. When a logging call like logging.warning('User: %r', name) is used, the %r format specifier applies repr() internally, which escapes newlines.
Test cases - Added good_repr1() (using repr(name) in string concatenation) and good_repr2() (using %r format specifier) to the existing test file.
Change notes - Added under both python/ql/lib/change-notes/ and python/ql/src/change-notes/.

MRVA Validation

Before: 335 alerts across top-100 Python repositories
After: 196 alerts (41% reduction in false positives)
No true positives were lost - all eliminated alerts were confirmed false positives where repr() or %r was used to sanitize the input.

Why this is correct

Python's repr() function returns a string representation that escapes special characters:

>>> repr("hello\nworld")
"'hello\\nworld'"

Since log injection relies on injecting newline characters to forge log entries, repr() is an effective sanitizer. The %r format specifier in logging calls applies repr() automatically.

Co-authored-by: Mrigank Pawagi <25179158+mrigankpawagi@users.noreply.github.com>

When CODEQL_EXTRACTOR_GO_OPTION_EXTRACT_TESTS=true is set, the Go extractor was incorrectly skipping internal test files (package foo) at repository roots when the project contains nested test packages. Root Cause: The extractor selected package variants by longest ID string, but this heuristic fails when nested packages have tests. For a package like "github.com/go-git/go-git/v6", packages.Load returns multiple variants: 1. "github.com/go-git/go-git/v6" (19 files, production only) 2. "github.com/go-git/go-git/v6 [github.com/go-git/go-git/v6.test]" (39 files, production + 20 root tests) ← Should select this 3. "github.com/go-git/go-git/v6 [github.com/go-git/go-git/v6/plumbing/format/packfile.test]" (19 files, test dependency) ← Was incorrectly selected (longest string) The old logic selected variant #3 (76 chars) over #2 (68 chars), causing 20 root test files to be missing from the database. Fix: Replace string length comparison with a better heuristic that prefers: 1. Exact test packages (e.g., "pkg [pkg.test]") over nested dependencies 2. Packages with more Syntax nodes (more files to extract) 3. String length as a tiebreaker This ensures the extractor selects the variant with the most complete test coverage, particularly for root-level internal tests. Testing: - Added comprehensive unit tests covering the selection logic - Tests simulate the real-world go-git scenario - All tests pass Impact: Root-level external tests (package foo_test) were already extracted correctly. This fix ensures internal tests (package foo) at the root are now also extracted when they exist alongside nested test packages. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

…ainer-steps Python: Remove imprecise container steps #2

mrigankpawagi · 2026-06-23T10:45:46Z

Good, opened as PR 22038 on github/codeql.

Python: Add repr() sanitizer to py/log-injection query

b4278b4

Co-authored-by: Mrigank Pawagi <25179158+mrigankpawagi@users.noreply.github.com>

github-actions Bot added documentation Improvements or additions to documentation Python labels Jun 10, 2026

mrigankpawagi pushed a commit that referenced this pull request Jun 20, 2026

Merge pull request github#21888 from owen-mc/py/remove-imprecise-cont…

1f91f91

…ainer-steps Python: Remove imprecise container steps #2

mrigankpawagi closed this Jun 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python: Add repr() sanitizer to py/log-injection query#2

Python: Add repr() sanitizer to py/log-injection query#2
kiro-agent[bot] wants to merge 1 commit into
mainfrom
improve-py-log-injection-repr-sanitizer-v2

kiro-agent Bot commented Jun 10, 2026

Uh oh!

mrigankpawagi commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

kiro-agent Bot commented Jun 10, 2026

Summary

Problem

Changes

MRVA Validation

Why this is correct

Uh oh!

mrigankpawagi commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants