Sp7 stats fixes#5
Conversation
Comment out specific volume mounts for SSL certificates and add new volume mounts for Let's Encrypt and access logs.
📝 WalkthroughWalkthroughThis PR updates the sp7-stats server for Ubuntu 24: ChangesInfrastructure: TLS Automation and Nginx Routing
PHP Application: Path Fixes and Stats Logic
🚥 Pre-merge checks | ✅ 3 | ❌ 3❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docker-compose.yml`:
- Around line 38-46: The certbot container in docker-compose only runs the
certbot renew loop, so it cannot create the first certificate on a fresh host.
Update the certbot service entrypoint to include an initial bootstrap step in
addition to the existing renew behavior, using certbot certonly with the webroot
path and the required domains before nginx depends on the live certificate
files. Keep the existing renew/deploy-hook logic in the certbot service, and
reference the certbot service and its entrypoint command when making the change.
- Around line 44-46: The certbot service is still mounting the Docker socket and
using a hard-coded nginx container name in the deploy hook, which should be
removed. Update the certbot entrypoint so certificate renewal triggers a
host-managed reload, a restricted socket proxy, or a dedicated reload sidecar
instead of calling docker exec directly. Keep the change localized to the
certbot service configuration and ensure any reload mechanism does not depend on
sp7-stats-nginx-1 or unrestricted socket access.
In `@sp7-stats/config/nginx.conf`:
- Line 55: The nginx access_log configuration is too broad because it mirrors
all HTTPS traffic into the external host-mounted log instead of only the stats
capture endpoint. Update the server/location block around access_log so that the
`/capture`-scoped logging remains enabled in the `capture` handling path, and
remove or disable the generic external access_log entry from the main HTTPS
scope. Use the existing nginx config sections and the `/capture` location as the
unique anchors when making the change.
In `@sp7-stats/index.php`:
- Around line 188-223: The report rendering in the
institution/disciplines/collections list is outputting log-derived values
directly into HTML text and href attributes, which can allow stored XSS. Update
the echo paths in this block (and the matching render block later in the file)
to HTML-escape visible content like institution, discipline, collection,
version, domain, ISA number, and IP values, and use attribute-safe encoding for
query-string parameters before inserting them into the LINK-based URLs.
- Around line 402-409: The PHP-to-JavaScript value injection in the stats script
is unsafe because $search_query is written directly into a quoted JS string,
which can break out of the context. Update the variable assignments in the same
block that sets count, search_query, first_day, last_day, and view to emit PHP
values with json_encode so the JavaScript receives properly escaped literals.
Keep the fix localized around the existing JS variable declarations in the stats
page.
- Around line 20-32: Validate the parsed date before setting the day bounds in
the date-handling block around the $_GET['date'] logic. In index.php, check the
result from date_parse_from_format and only assign $first_day and $last_day
after confirming there are no parse errors or warnings and strtotime(...)
returned a valid timestamp. If parsing fails, leave the existing bounds
unchanged instead of letting the range fall back to day 0.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: c92949e2-5cbb-4bf1-8980-2ee4fa893d2f
📒 Files selected for processing (6)
docker-compose.ymlsp7-stats/components/institutions.phpsp7-stats/config/nginx.confsp7-stats/cron/refresh_data.phpsp7-stats/index.phpsp7-stats/refresh_data/index.php
| certbot: | ||
| image: certbot/certbot | ||
| container_name: certbot | ||
| volumes: | ||
| - '/etc/letsencrypt:/etc/letsencrypt' | ||
| - './sp7-stats/:/var/www/certbot/' | ||
| - '/var/run/docker.sock:/var/run/docker.sock' | ||
| entrypoint: > | ||
| sh -c "trap exit TERM; while :; do certbot renew --webroot -w /var/www/certbot --deploy-hook 'docker exec sp7-stats-nginx-1 nginx -s reload'; sleep 12h; done" |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift
Add an initial certificate bootstrap path.
Line 46 only runs certbot renew, which requires an existing cert. On a fresh Ubuntu 24 host with empty /etc/letsencrypt, nginx will try to load the live cert files before certbot can create them, so startup can fail. Add a one-time certbot certonly --webroot ... -d stats.specifycloud.org -d sp7-stats.specifycloud.org bootstrap or a documented temp-cert flow before nginx starts.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docker-compose.yml` around lines 38 - 46, The certbot container in
docker-compose only runs the certbot renew loop, so it cannot create the first
certificate on a fresh host. Update the certbot service entrypoint to include an
initial bootstrap step in addition to the existing renew behavior, using certbot
certonly with the webroot path and the required domains before nginx depends on
the live certificate files. Keep the existing renew/deploy-hook logic in the
certbot service, and reference the certbot service and its entrypoint command
when making the change.
| - '/var/run/docker.sock:/var/run/docker.sock' | ||
| entrypoint: > | ||
| sh -c "trap exit TERM; while :; do certbot renew --webroot -w /var/www/certbot --deploy-hook 'docker exec sp7-stats-nginx-1 nginx -s reload'; sleep 12h; done" |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the hard-coded deploy-hook target used by certbot.
# Expect: the running nginx container name exactly matches sp7-stats-nginx-1.
docker compose ps nginx
docker container inspect sp7-stats-nginx-1 --format '{{.Name}} {{.State.Running}}'Repository: specify/sp7-stats
Length of output: 239
🏁 Script executed:
#!/bin/bash
set -euo pipefail
git ls-files docker-compose.yml
wc -l docker-compose.yml
sed -n '1,140p' docker-compose.yml
printf '\n-- references --\n'
rg -n "sp7-stats-nginx-1|docker.sock|certbot renew|nginx -s reload|certbot" -S .Repository: specify/sp7-stats
Length of output: 2290
🏁 Script executed:
#!/bin/bash
set -euo pipefail
wc -l docker-compose.yml
sed -n '1,140p' docker-compose.yml
printf '\n-- references --\n'
rg -n "sp7-stats-nginx-1|docker.sock|certbot renew|nginx -s reload|certbot" -S .Repository: specify/sp7-stats
Length of output: 2271
Remove Docker socket access from certbot. The deploy hook also hard-codes sp7-stats-nginx-1; use a host-managed reload, a restricted socket proxy, or a dedicated reload sidecar instead.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docker-compose.yml` around lines 44 - 46, The certbot service is still
mounting the Docker socket and using a hard-coded nginx container name in the
deploy hook, which should be removed. Update the certbot entrypoint so
certificate renewal triggers a host-managed reload, a restricted socket proxy,
or a dedicated reload sidecar instead of calling docker exec directly. Keep the
change localized to the certbot service configuration and ensure any reload
mechanism does not depend on sp7-stats-nginx-1 or unrestricted socket access.
|
|
||
| include auth.conf; | ||
| include nginx-with-github-auth/server.conf; | ||
| access_log /var/log/nginx/external/access.log combined; |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Keep the external stats log scoped to /capture.
Line 55 mirrors every HTTPS request into the host-mounted external log, while raw_data.php only consumes /capture? entries. This persists unrelated app/auth URLs and query strings and can bloat the stats input. Keep the external log only on Line 69.
Proposed change
- access_log /var/log/nginx/external/access.log combined;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| access_log /var/log/nginx/external/access.log combined; |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@sp7-stats/config/nginx.conf` at line 55, The nginx access_log configuration
is too broad because it mirrors all HTTPS traffic into the external host-mounted
log instead of only the stats capture endpoint. Update the server/location block
around access_log so that the `/capture`-scoped logging remains enabled in the
`capture` handling path, and remove or disable the generic external access_log
entry from the main HTTPS scope. Use the existing nginx config sections and the
`/capture` location as the unique anchors when making the change.
| if(array_key_exists('date',$_GET)){ | ||
| $_GET['date'] = urldecode($_GET['date']); | ||
| $_GET['date'] = urldecode($_GET['date']); | ||
|
|
||
| $date_info = date_parse_from_format(YEAR_FORMATTER.' '.MONTH_FORMATTER.' '.DAY_FORMATTER,$_GET['date']); | ||
| $date_info = date_parse_from_format(YEAR_FORMATTER.' '.MONTH_FORMATTER.' '.DAY_FORMATTER,$_GET['date']); | ||
|
|
||
| if(strlen($date_info['month'])==1) | ||
| $date_info['month'] = '0'.$date_info['month']; | ||
| if(strlen($date_info['month'])==1) | ||
| $date_info['month'] = '0'.$date_info['month']; | ||
|
|
||
| if(strlen($date_info['day'])==1) | ||
| $date_info['day'] = '0'.$date_info['day']; | ||
| if(strlen($date_info['day'])==1) | ||
| $date_info['day'] = '0'.$date_info['day']; | ||
|
|
||
| $unix_time = strtotime($date_info['month'].'/'.$date_info['day'].'/'.$date_info['year']); | ||
| $first_day = $last_day = intval($unix_time/86400); | ||
| $unix_time = strtotime($date_info['month'].'/'.$date_info['day'].'/'.$date_info['year']); | ||
| $first_day = $last_day = intval($unix_time/86400); |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Validate parsed date before assigning day bounds.
If parsing fails, strtotime(...) can evaluate to false and Line 32 silently sets the range to day 0. Keep existing bounds when parse errors/warnings exist.
Proposed fix
if(array_key_exists('date',$_GET)){
$_GET['date'] = urldecode($_GET['date']);
$date_info = date_parse_from_format(YEAR_FORMATTER.' '.MONTH_FORMATTER.' '.DAY_FORMATTER,$_GET['date']);
+ if (
+ $date_info['error_count'] === 0 &&
+ $date_info['warning_count'] === 0 &&
+ checkdate((int)$date_info['month'], (int)$date_info['day'], (int)$date_info['year'])
+ ) {
+ $unix_time = strtotime($date_info['month'].'/'.$date_info['day'].'/'.$date_info['year']);
+ $first_day = $last_day = intval($unix_time/86400);
+ }
-
- if(strlen($date_info['month'])==1)
- $date_info['month'] = '0'.$date_info['month'];
-
- if(strlen($date_info['day'])==1)
- $date_info['day'] = '0'.$date_info['day'];
-
- $unix_time = strtotime($date_info['month'].'/'.$date_info['day'].'/'.$date_info['year']);
- $first_day = $last_day = intval($unix_time/86400);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if(array_key_exists('date',$_GET)){ | |
| $_GET['date'] = urldecode($_GET['date']); | |
| $_GET['date'] = urldecode($_GET['date']); | |
| $date_info = date_parse_from_format(YEAR_FORMATTER.' '.MONTH_FORMATTER.' '.DAY_FORMATTER,$_GET['date']); | |
| $date_info = date_parse_from_format(YEAR_FORMATTER.' '.MONTH_FORMATTER.' '.DAY_FORMATTER,$_GET['date']); | |
| if(strlen($date_info['month'])==1) | |
| $date_info['month'] = '0'.$date_info['month']; | |
| if(strlen($date_info['month'])==1) | |
| $date_info['month'] = '0'.$date_info['month']; | |
| if(strlen($date_info['day'])==1) | |
| $date_info['day'] = '0'.$date_info['day']; | |
| if(strlen($date_info['day'])==1) | |
| $date_info['day'] = '0'.$date_info['day']; | |
| $unix_time = strtotime($date_info['month'].'/'.$date_info['day'].'/'.$date_info['year']); | |
| $first_day = $last_day = intval($unix_time/86400); | |
| $unix_time = strtotime($date_info['month'].'/'.$date_info['day'].'/'.$date_info['year']); | |
| $first_day = $last_day = intval($unix_time/86400); | |
| if(array_key_exists('date',$_GET)){ | |
| $_GET['date'] = urldecode($_GET['date']); | |
| $date_info = date_parse_from_format(YEAR_FORMATTER.' '.MONTH_FORMATTER.' '.DAY_FORMATTER,$_GET['date']); | |
| if ( | |
| $date_info['error_count'] === 0 && | |
| $date_info['warning_count'] === 0 && | |
| checkdate((int)$date_info['month'], (int)$date_info['day'], (int)$date_info['year']) | |
| ) { | |
| $unix_time = strtotime($date_info['month'].'/'.$date_info['day'].'/'.$date_info['year']); | |
| $first_day = $last_day = intval($unix_time/86400); | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@sp7-stats/index.php` around lines 20 - 32, Validate the parsed date before
setting the day bounds in the date-handling block around the $_GET['date']
logic. In index.php, check the result from date_parse_from_format and only
assign $first_day and $last_day after confirming there are no parse errors or
warnings and strtotime(...) returned a valid timestamp. If parsing fails, leave
the existing bounds unchanged instead of letting the range fall back to day 0.
| echo '<li>' . urldecode($institution) . '<ul>'; | ||
|
|
||
| foreach($disciplines as $discipline => $collections){ | ||
| foreach($disciplines as $discipline => $collections){ | ||
|
|
||
| echo '<li>' . urldecode($discipline) . '<ul>'; | ||
| echo '<li>' . urldecode($discipline) . '<ul>'; | ||
|
|
||
| foreach($collections as $collection => $data){ | ||
| foreach($collections as $collection => $data){ | ||
|
|
||
| echo '<li data-reports_count="'.$data['count'].'"> | ||
| <a href="'.LINK.'institution/?institution='.$institution.'&discipline='.$discipline.'&collection='.$collection.'&year='.$year.'&month='.$month.'">'.urldecode($collection).'</a> ['.$data['count'].'] | ||
| <br>Specify 7 versions:<ul>'; | ||
| echo '<li data-reports_count="'.$data['count'].'"> | ||
| <a href="'.LINK.'institution/?institution='.$institution.'&discipline='.$discipline.'&collection='.$collection.'&year='.$year.'&month='.$month.'">'.urldecode($collection).'</a> ['.$data['count'].'] | ||
| <br>Specify 7 versions:<ul>'; | ||
|
|
||
| foreach($data['sp7_version'] as $version) | ||
| echo '<li>' . $version . '</li>'; | ||
| foreach($data['sp7_version'] as $version) | ||
| echo '<li>' . $version . '</li>'; | ||
|
|
||
| echo '</ul>Specify 6 versions:<ul>'; | ||
| echo '</ul>Specify 6 versions:<ul>'; | ||
|
|
||
| foreach($data['sp6_version'] as $version) | ||
| echo '<li>' . $version . '</li>'; | ||
| foreach($data['sp6_version'] as $version) | ||
| echo '<li>' . $version . '</li>'; | ||
|
|
||
|
|
||
| echo '</ul>Domains:<ul>'; | ||
| echo '</ul>Domains:<ul>'; | ||
|
|
||
| foreach($data['domain'] as $domain) | ||
| echo '<li><a href="https://' . $domain . '" target="_blank">' . $domain . '</a></li>'; | ||
| foreach($data['domain'] as $domain) | ||
| echo '<li><a href="https://' . $domain . '" target="_blank">' . $domain . '</a></li>'; | ||
|
|
||
| echo '</ul>'; | ||
| echo '</ul>'; | ||
|
|
||
| if(count($data['isa_number']) > 0) | ||
| echo 'ISA Numbers: ' . implode(', ', $data['isa_number']).'<br>'; | ||
| if(count($data['isa_number']) > 0) | ||
| echo 'ISA Numbers: ' . implode(', ', $data['isa_number']).'<br>'; | ||
|
|
||
| echo 'IP Addresses:<ul>'; | ||
| echo 'IP Addresses:<ul>'; | ||
|
|
||
| foreach($data['ip_address'] as $ip_address) | ||
| echo '<li><a href="' . LINK . 'ip_info/?ip=' . $ip_address . '" target="_blank">' . $ip_address . '</a></li>'; | ||
| foreach($data['ip_address'] as $ip_address) | ||
| echo '<li><a href="' . LINK . 'ip_info/?ip=' . $ip_address . '" target="_blank">' . $ip_address . '</a></li>'; | ||
|
|
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Escape log-derived values in HTML text and attributes.
Values from capture logs are rendered raw in text and href attributes here. This enables stored XSS when malicious values are ingested.
Proposed fix (pattern)
+<?php
+function h($v){ return htmlspecialchars((string)$v, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'); }
+function u($v){ return rawurlencode((string)$v); }
+?>
-echo '<li>' . urldecode($institution) . '<ul>';
+echo '<li>' . h(urldecode($institution)) . '<ul>';
-echo '<a href="'.LINK.'institution/?institution='.$institution.'&discipline='.$discipline.'&collection='.$collection.'...">'
+echo '<a href="'.LINK.'institution/?institution='.u($institution).'&discipline='.u($discipline).'&collection='.u($collection).'...">'
-echo '<li><a href="https://' . $domain . '" target="_blank">' . $domain . '</a></li>';
+echo '<li><a href="https://' . h($domain) . '" target="_blank">' . h($domain) . '</a></li>';Also applies to: 325-345
🧰 Tools
🪛 PHPStan (2.2.2)
[error] 197-197: Constant LINK not found.
Learn more at https://phpstan.org/user-guide/discovering-symbols
(constant.notFound)
[error] 222-222: Constant LINK not found.
Learn more at https://phpstan.org/user-guide/discovering-symbols
(constant.notFound)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@sp7-stats/index.php` around lines 188 - 223, The report rendering in the
institution/disciplines/collections list is outputting log-derived values
directly into HTML text and href attributes, which can allow stored XSS. Update
the echo paths in this block (and the matching render block later in the file)
to HTML-escape visible content like institution, discipline, collection,
version, domain, ISA number, and IP values, and use attribute-safe encoding for
query-string parameters before inserting them into the LINK-based URLs.
| let count = '<?=$times?>'; | ||
| let search_query = '<?=$search_query?>'; | ||
|
|
||
| const first_day = '<?=$first_day?>'; | ||
| const last_day = '<?=$last_day?>'; | ||
| const view = '<?=$view?>'; | ||
| const first_day = '<?=$first_day?>'; | ||
| const last_day = '<?=$last_day?>'; | ||
| const view = '<?=$view?>'; | ||
|
|
||
| const link = '<?=LINK?>?'; <?php | ||
| const link = '<?=LINK?>?'; <?php |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Use json_encode for PHP-to-JS value injection.
Line 403 inserts $search_query directly into a quoted JS string. A crafted value can break script context and execute code.
Proposed fix
- let count = '<?=$times?>';
- let search_query = '<?=$search_query?>';
-
- const first_day = '<?=$first_day?>';
- const last_day = '<?=$last_day?>';
- const view = '<?=$view?>';
-
- const link = '<?=LINK?>?';
+ let count = <?=json_encode((int)$times)?>;
+ let search_query = <?=json_encode((string)$search_query, JSON_HEX_TAG|JSON_HEX_AMP|JSON_HEX_APOS|JSON_HEX_QUOT)?>;
+
+ const first_day = <?=json_encode((int)$first_day)?>;
+ const last_day = <?=json_encode((int)$last_day)?>;
+ const view = <?=json_encode((string)$view)?>;
+
+ const link = <?=json_encode(LINK . '?')?>;🧰 Tools
🪛 PHPStan (2.2.2)
[error] 409-409: Constant LINK not found.
Learn more at https://phpstan.org/user-guide/discovering-symbols
(constant.notFound)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@sp7-stats/index.php` around lines 402 - 409, The PHP-to-JavaScript value
injection in the stats script is unsafe because $search_query is written
directly into a quoted JS string, which can break out of the context. Update the
variable assignments in the same block that sets count, search_query, first_day,
last_day, and view to emit PHP values with json_encode so the JavaScript
receives properly escaped literals. Keep the fix localized around the existing
JS variable declarations in the stats page.
Fixes #6
Get sp7-stats server working with new changes to nginx-with-github-auth.
Testing:
TBD
Summary by CodeRabbit
New Features
Bug Fixes