Pauldorsch/reduce linux scan time#1826
Pull request overview
This PR aims to reduce Linux container scanning time by deduplicating expensive Docker/Syft operations across concurrent callers, so the same image/scope work isn’t repeated unnecessarily within a single process run.
Changes:
- Added a static cache in `LinuxScanner` to share in-flight (and completed) Syft runs between callers with identical `(source, scope, binds)`.
- Added a static cache in `DockerService` to share in-flight (and completed) image pulls between callers for the same image.
- Added extra debug logging around base image resolution and Docker operations, and reset the Syft cache in tests for isolation.
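The in-flight sharing described in the list above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the PR's C# code: `run_syft`, `get_or_start_scan`, and the sample bind string are hypothetical names chosen for illustration. The only idea carried over from the PR is that callers with an identical `(source, scope, binds)` key share one run.

```python
import asyncio

calls = 0  # counts how many times the expensive work actually runs


async def run_syft(source, scope, binds):
    """Hypothetical stand-in for an expensive Syft container run."""
    global calls
    calls += 1
    await asyncio.sleep(0.01)  # simulate slow work
    return f"sbom for {source} ({scope})"


# Cache of in-flight and completed runs, keyed by (source, scope, binds).
_runs = {}


def get_or_start_scan(source, scope, binds):
    """Return the shared task for this key, starting it only if absent."""
    key = (source, scope, tuple(binds))
    task = _runs.get(key)
    if task is None:
        # The first caller starts the work; later callers await the same task.
        task = asyncio.create_task(run_syft(source, scope, binds))
        _runs[key] = task
    return task


async def main():
    # Two concurrent callers ask for the same image, scope, and binds.
    first = get_or_start_scan("alpine:3.18", "all-layers", ["/example:/example"])
    second = get_or_start_scan("alpine:3.18", "all-layers", ["/example:/example"])
    return await asyncio.gather(first, second)


results = asyncio.run(main())
```

Both callers receive the same result while the underlying work runs once. One design point a cache like this has to settle is what happens when a run fails: in this sketch the failed task stays cached, so a real implementation would likely evict it to allow a retry.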
Show a summary per file
| File | Description |
|---|---|
| test/Microsoft.ComponentDetection.Detectors.Tests/LinuxScannerTests.cs | Resets the new Syft run cache for test isolation. |
| src/Microsoft.ComponentDetection.Detectors/linux/LinuxScanner.cs | Adds Syft run deduplication cache and exposes a test-only reset method. |
| src/Microsoft.ComponentDetection.Detectors/linux/LinuxContainerDetector.cs | Adds debug logging for resolved base image details. |
| src/Microsoft.ComponentDetection.Common/DockerService.cs | Adds image pull deduplication cache and adjusts logging/binds. |
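The test-isolation row in the table reflects a general consequence of process-wide caches: state left by one test leaks into the next unless it is cleared. The sketch below shows the pattern in Python under stated assumptions. `Scanner`, `scan`, and `reset_cache_for_tests` are hypothetical names, not the PR's actual API.

```python
import threading


class Scanner:
    """Hypothetical scanner with a cache shared by all instances."""

    _lock = threading.Lock()
    _runs = {}      # image name -> cached result, process-wide
    run_count = 0   # how many times the expensive work really ran

    def scan(self, image):
        with Scanner._lock:
            if image not in Scanner._runs:
                Scanner.run_count += 1
                Scanner._runs[image] = f"components of {image}"
            return Scanner._runs[image]

    @classmethod
    def reset_cache_for_tests(cls):
        """Clear the shared cache so each test starts from a clean state."""
        with cls._lock:
            cls._runs.clear()


# "Test 1": two instances share the cache, so the work runs once.
Scanner().scan("alpine:3.18")
Scanner().scan("alpine:3.18")
count_before_reset = Scanner.run_count

# "Test 2": after a reset, the same image is scanned again from scratch.
Scanner.reset_cache_for_tests()
Scanner().scan("alpine:3.18")
count_after_reset = Scanner.run_count
```

For simplicity this sketch holds the lock while doing the work, which serializes all scans. A cache that must let different images scan in parallel would store a per-key future or task instead.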
Copilot's findings
- Files reviewed: 4/4 changed files
- Comments generated: 5
👋 Hi! It looks like you modified some files in the
If none of the above scenarios apply, feel free to ignore this comment 🙂
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
None of these apply; the detectors return exactly the same results.
3 improvements to Linux scanning:
- … `Temp` directory binding from the Syft scans
- … `Syft` with the same parameters one time

Testing the de-duplication:
`dotnet run --project src/Microsoft.ComponentDetection/Microsoft.ComponentDetection.csproj scan --SourceDirectory C:\src\temp --DockerImagesToScan "alpine:3.18,ubuntu:22.04,test-with-base:latest,node:18-bullseye" --LogLevel Debug`

With many images, and since there are 2 Linux scanners running at the same time, we hit the logic that keeps us from trying to pull the same image more than once.
We can also confirm that the Syft scan happens only once per image SHA:

Testing the performance:
`dotnet run --project src/Microsoft.ComponentDetection/Microsoft.ComponentDetection.csproj scan --SourceDirectory C:\src\temp --DockerImagesToScan "node:18-bullseye"`

Before, ran in over `400s` for the `node:18-bullseye` image (and we also see 2 containers starting up to Syft scan this single image):

After, ran in under `60s` for the `node:18-bullseye` image (and we only see a single container start up to run the Syft scan):
