Skip to content

Python: Add repr() sanitizer to py/log-injection query#2

Closed
kiro-agent[bot] wants to merge 1 commit into
mainfrom
improve-py-log-injection-repr-sanitizer-v2
Closed

Python: Add repr() sanitizer to py/log-injection query#2
kiro-agent[bot] wants to merge 1 commit into
mainfrom
improve-py-log-injection-repr-sanitizer-v2

Conversation

@kiro-agent

@kiro-agent kiro-agent Bot commented Jun 10, 2026

Copy link
Copy Markdown

This pull request was created by @kiro-agent on behalf of @mrigankpawagi 👻

Comment with /kiro fix to address specific feedback or /kiro all to address everything.
Learn about Kiro Web


Summary

This PR improves the py/log-injection query by recognizing repr() and the %r format specifier as sanitizers for log injection.

Problem

MRVA (Multi-Repository Variant Analysis) on the top-100 Python repositories revealed many false positives where user-controlled values are logged safely via repr() or %r. Since repr() escapes special characters like newlines (converting \n to the literal string \\n), it effectively prevents log injection attacks. The query was flagging these safe patterns unnecessarily.

Changes

  1. New ReprCallSanitizer class - Recognizes calls to the built-in repr() function as sanitizers, since repr() escapes special characters such as newlines.

  2. New isReprFormattedLoggingArg predicate - Excludes arguments that are formatted with %r in the logging format string from being considered sinks. When a logging call like logging.warning('User: %r', name) is used, the %r format specifier applies repr() internally, which escapes newlines.

  3. Test cases - Added good_repr1() (using repr(name) in string concatenation) and good_repr2() (using %r format specifier) to the existing test file.

  4. Change notes - Added under both python/ql/lib/change-notes/ and python/ql/src/change-notes/.

MRVA Validation

  • Before: 335 alerts across top-100 Python repositories
  • After: 196 alerts (41% reduction in false positives)
  • No true positives were lost - all eliminated alerts were confirmed false positives where repr() or %r was used to sanitize the input.

Why this is correct

Python's repr() function returns a string representation that escapes special characters:

>>> repr("hello\nworld")
"'hello\\nworld'"

Since log injection relies on injecting newline characters to forge log entries, repr() is an effective sanitizer. The %r format specifier in logging calls applies repr() automatically.

Co-authored-by: Mrigank Pawagi <25179158+mrigankpawagi@users.noreply.github.com>
@github-actions github-actions Bot added documentation Improvements or additions to documentation Python labels Jun 10, 2026
mrigankpawagi pushed a commit that referenced this pull request Jun 20, 2026
When CODEQL_EXTRACTOR_GO_OPTION_EXTRACT_TESTS=true is set, the Go
extractor was incorrectly skipping internal test files (package foo)
at repository roots when the project contains nested test packages.

Root Cause:
The extractor selected package variants by longest ID string, but this
heuristic fails when nested packages have tests. For a package like
"github.com/go-git/go-git/v6", packages.Load returns multiple variants:

1. "github.com/go-git/go-git/v6" (19 files, production only)
2. "github.com/go-git/go-git/v6 [github.com/go-git/go-git/v6.test]"
   (39 files, production + 20 root tests) ← Should select this
3. "github.com/go-git/go-git/v6 [github.com/go-git/go-git/v6/plumbing/format/packfile.test]"
   (19 files, test dependency) ← Was incorrectly selected (longest string)

The old logic selected variant #3 (76 chars) over #2 (68 chars),
causing 20 root test files to be missing from the database.

Fix:
Replace string length comparison with a better heuristic that prefers:
1. Exact test packages (e.g., "pkg [pkg.test]") over nested dependencies
2. Packages with more Syntax nodes (more files to extract)
3. String length as a tiebreaker

This ensures the extractor selects the variant with the most complete
test coverage, particularly for root-level internal tests.

Testing:
- Added comprehensive unit tests covering the selection logic
- Tests simulate the real-world go-git scenario
- All tests pass

Impact:
Root-level external tests (package foo_test) were already extracted
correctly. This fix ensures internal tests (package foo) at the root
are now also extracted when they exist alongside nested test packages.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
mrigankpawagi pushed a commit that referenced this pull request Jun 20, 2026
…ainer-steps

Python: Remove imprecise container steps #2
@mrigankpawagi

Copy link
Copy Markdown
Owner

Good, opened as PR 22038 on github/codeql.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation Python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants