Python: Add repr() sanitizer to py/log-injection query #2
Closed
kiro-agent[bot] wants to merge 1 commit into
Co-authored-by: Mrigank Pawagi <25179158+mrigankpawagi@users.noreply.github.com>
mrigankpawagi pushed a commit that referenced this pull request Jun 20, 2026
When CODEQL_EXTRACTOR_GO_OPTION_EXTRACT_TESTS=true is set, the Go extractor was incorrectly skipping internal test files (package foo) at repository roots when the project contains nested test packages.

Root Cause: The extractor selected package variants by longest ID string, but this heuristic fails when nested packages have tests. For a package like "github.com/go-git/go-git/v6", packages.Load returns multiple variants:

1. "github.com/go-git/go-git/v6" (19 files, production only)
2. "github.com/go-git/go-git/v6 [github.com/go-git/go-git/v6.test]" (39 files, production + 20 root tests) ← Should select this
3. "github.com/go-git/go-git/v6 [github.com/go-git/go-git/v6/plumbing/format/packfile.test]" (19 files, test dependency) ← Was incorrectly selected (longest string)

The old logic selected variant #3 (76 chars) over #2 (68 chars), causing 20 root test files to be missing from the database.

Fix: Replace string length comparison with a better heuristic that prefers:

1. Exact test packages (e.g., "pkg [pkg.test]") over nested dependencies
2. Packages with more Syntax nodes (more files to extract)
3. String length as a tiebreaker

This ensures the extractor selects the variant with the most complete test coverage, particularly for root-level internal tests.

Testing:
- Added comprehensive unit tests covering the selection logic
- Tests simulate the real-world go-git scenario
- All tests pass

Impact: Root-level external tests (package foo_test) were already extracted correctly. This fix ensures internal tests (package foo) at the root are now also extracted when they exist alongside nested test packages.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
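The three-level preference described in the commit message can be illustrated as a sort key. The following is only a Python sketch of the idea, not the extractor's Go code: the PackageVariant class and its field names (id, pkg_path, syntax_count) are hypothetical stand-ins for what packages.Load returns, and the file counts are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass
class PackageVariant:
    """Hypothetical stand-in for one package variant (not the real Go type)."""
    id: str            # full variant ID, e.g. "pkg [pkg.test]"
    pkg_path: str      # import path without the bracketed suffix
    syntax_count: int  # number of parsed files (Syntax nodes)


def is_exact_test_variant(v: PackageVariant) -> bool:
    # True only for "pkg [pkg.test]", i.e. the package's own test build.
    return v.id == f"{v.pkg_path} [{v.pkg_path}.test]"


def selection_key(v: PackageVariant):
    # Tuples compare left to right: exact test variant first,
    # then more files, then ID length as the final tiebreaker.
    return (is_exact_test_variant(v), v.syntax_count, len(v.id))


def select_variant(variants):
    return max(variants, key=selection_key)


root = "github.com/go-git/go-git/v6"
variants = [
    PackageVariant(root, root, 19),
    PackageVariant(f"{root} [{root}.test]", root, 39),
    PackageVariant(f"{root} [{root}/plumbing/format/packfile.test]", root, 19),
]

chosen = select_variant(variants)
# The old heuristic, for comparison: longest ID string wins.
old_choice = max(variants, key=lambda v: len(v.id))

print("new heuristic picks:", chosen.id)
print("old heuristic picks:", old_choice.id)
```

With this ordering the root test variant wins on the first criterion alone, so the file count and ID length only matter when no exact test variant exists or when several candidates tie.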
mrigankpawagi pushed a commit that referenced this pull request Jun 20, 2026
…ainer-steps Python: Remove imprecise container steps #2
Owner
Good, opened as PR 22038 on github/codeql.
This pull request was created by @kiro-agent on behalf of @mrigankpawagi 👻
Summary
This PR improves the py/log-injection query by recognizing repr() and the %r format specifier as sanitizers for log injection.

Problem
MRVA (Multi-Repository Variant Analysis) on the top-100 Python repositories revealed many false positives where user-controlled values are logged safely via repr() or %r. Since repr() escapes special characters like newlines (converting \n to the literal string \\n), it effectively prevents log injection attacks. The query was flagging these safe patterns unnecessarily.

Changes
- New ReprCallSanitizer class - Recognizes calls to the built-in repr() function as sanitizers, since repr() escapes special characters such as newlines.
- New isReprFormattedLoggingArg predicate - Excludes arguments that are formatted with %r in the logging format string from being considered sinks. When a logging call like logging.warning('User: %r', name) is used, the %r format specifier applies repr() internally, which escapes newlines.
- Test cases - Added good_repr1() (using repr(name) in string concatenation) and good_repr2() (using the %r format specifier) to the existing test file.
- Change notes - Added under both python/ql/lib/change-notes/ and python/ql/src/change-notes/.

MRVA Validation
repr() or %r was used to sanitize the input.

Why this is correct
Python's repr() function returns a string representation that escapes special characters. Since log injection relies on injecting newline characters to forge log entries, repr() is an effective sanitizer. The %r format specifier in logging calls applies repr() automatically.
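The behaviour can be checked with the standard logging module alone. The snippet below is a minimal sketch rather than the PR's test code: the logger name repr_demo, the make_logger helper, and the sample payload are made up for illustration. It logs the same attacker-controlled value three ways and counts the resulting log lines.

```python
import io
import logging


def make_logger(stream):
    """Build an isolated logger that writes 'LEVEL:message' lines to stream."""
    logger = logging.getLogger("repr_demo")  # hypothetical logger name
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
    logger.addHandler(handler)
    return logger


stream = io.StringIO()
logger = make_logger(stream)

# Attacker-controlled value containing a newline and a fake log entry.
name = "alice\nINFO:forged entry"

logger.info("User: " + name)        # unsanitized: the newline reaches the log
logger.info("User: " + repr(name))  # repr() escapes the newline
logger.info("User: %r", name)       # %r applies repr() during formatting

lines = stream.getvalue().splitlines()
for line in lines:
    print(line)
```

The unsanitized call produces two log lines, the second of which looks like a genuine INFO entry. The repr() and %r calls each produce a single line in which the newline appears as the two characters \n inside a quoted string.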