feat(admin): curator job run report and run log download by TahaKhan998 · Pull Request #801 · CERNDocumentServer/cds-rdm

TahaKhan998 · 2026-05-13T10:57:20Z

Closes #789

when an inspire harvester job finishes we email people. that email again has a “harvester actions” section with two links: one to grab a text log for that run (“download error log”) and one to open harvester reports filtered to that run (“view list of changes”)
removed the old “view full details” button from that email — it pointed at the system jobs admin page and a lot of people on the mailing list don’t have access, so it was confusing
new page: /administration/harvester-reports/<run_id>/report (curators only). shows the same kind of run summary + log lines as the admin job ui, with colours by level so it’s easier to read than raw text
the existing download endpoint still returns a .log file; both that file and the html page use the real invenio jobs log search for the run, not a random audit log export

kpsherva · 2026-06-16T11:49:08Z

-}
+}
+
+.harvester-report-actions {


in general, in css we try to avoid custom styling (with dedicated classes or assigned to html elements), and to rely on the semantic approach (class names related to behaviour and outcome, rather than specific dedicated element or class). For example in this case, the display flex class which you could assign to your element already exists in the theme and could be reused.

This approach helps us keeping the CSS maintenance to minimum and reduces the amount of stylesheets we write. Could you check how you can adapt your changes to our general approach?

kpsherva · 2026-06-16T12:12:58Z

+  React.useEffect(() => {
+    const q = currentQueryState.queryString || "";
+    const fromQuery = extractRunIdFromQuery(q, runs);
+    if (fromQuery) {
+      setActiveRunId(fromQuery);
+      return;
+    }
+    if (!q && defaultRun) {
+      setActiveRunId(defaultRun.id);
+      executeSearch(defaultRun, "");
+    }
+  }, []);


Suggested change

React.useEffect(() => {

const q = currentQueryState.queryString || "";

const fromQuery = extractRunIdFromQuery(q, runs);

if (fromQuery) {

setActiveRunId(fromQuery);

return;

}

if (!q && defaultRun) {

setActiveRunId(defaultRun.id);

executeSearch(defaultRun, "");

}

}, []);

React.useEffect(() => {

const q = currentQueryState.queryString || "";

const runId = extractRunIdFromQuery(q, runs) || defaultRun.id;

setActiveRunId(runId);

}

if (!q) {

executeSearch(defaultRun, "");

}

}, []);

to improve readability. Why do we need to call execute search if q is not present?

kpsherva · 2026-06-16T12:14:49Z

+    if isinstance(value, datetime):
+        dt = (
+            value.replace(tzinfo=timezone.utc)
+            if value.tzinfo is None
+            else value


why is this part needed? we handle tz globally now if I remember correctly, no need to account for it in the code

kpsherva · 2026-06-16T12:15:07Z

+        )
+    else:
+        try:
+            dt = datetime.fromisoformat(str(value).replace("Z", "+00:00"))


kpsherva · 2026-06-16T12:16:16Z

+            dt = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
+        except (ValueError, TypeError):
+            return str(value)
+    return format_datetime(dt, "yyyy-MM-dd HH:mm", rebase=True)


what does rebase do? Normally we use other functions to serialize the time, could you check and adapt for consistency?

kpsherva · 2026-06-16T12:16:33Z

+        dt = value
+    else:
+        try:
+            dt = datetime.fromisoformat(str(value).replace("Z", "+00:00"))


same comment about timezone

kpsherva · 2026-06-16T12:17:01Z

+    return format_datetime(dt, "yyyy-MM-dd HH:mm", rebase=True)
+
+
+def _format_log_hit_timestamp(value):


these two utility functions look very similar, why do we need both?

kpsherva · 2026-06-16T12:19:08Z

+        run_id = (run_id or "").strip()
        if not run_id:
-            return {"message": "Missing run_id"}, 400
+            return None, ({"message": "Missing run_id"}, 400)


there are specific flask or invenio adapted methods to serialize the REST responses in the correct way, could you check and use them?

kpsherva · 2026-06-16T12:21:15Z

+    def download(self):
+        """Download a harvester run's logs as a plain-text ``.log`` file."""
+        if not curators_permission.can():
+            return {"message": "Permission denied"}, 403


this response also could use a response serializer

kpsherva · 2026-06-16T12:25:20Z

+            return jsonify(body), code
+        return render_template(
+            "cds_rdm/administration/harvester_run_report.html",
+            **ctx,


there is a bit of a mix of concepts here. In case of an error, you return a json, but in case everything is goes well, then we have a jinja template returned. Since this is the same view, an the UI side (not api) please use an error handler to return an error page - to keep the consistency of the returned formats

kpsherva reviewed May 13, 2026

View reviewed changes

Comment thread site/cds_rdm/assets/semantic-ui/js/cds_rdm/administration/harvesterReports/DownloadButton.js Outdated

kpsherva reviewed May 13, 2026

View reviewed changes

Comment thread site/cds_rdm/assets/semantic-ui/js/cds_rdm/administration/harvesterReports/DownloadButton.js Outdated

TahaKhan998 force-pushed the feat/harvester-reports-curator branch from 9de61f4 to 280f3d1 Compare May 14, 2026 21:12

TahaKhan998 force-pushed the feat/harvester-reports-curator branch from 280f3d1 to 5ab7c0e Compare June 8, 2026 09:14

feat(admin): curator job run report and run log download

7d750c2

TahaKhan998 force-pushed the feat/harvester-reports-curator branch from fa180d4 to 7d750c2 Compare June 9, 2026 11:23

kpsherva reviewed Jun 16, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(admin): curator job run report and run log download#801

feat(admin): curator job run report and run log download#801
TahaKhan998 wants to merge 1 commit into
CERNDocumentServer:masterfrom
TahaKhan998:feat/harvester-reports-curator

TahaKhan998 commented May 13, 2026

Uh oh!

Uh oh!

Uh oh!

kpsherva Jun 16, 2026 •

edited

Loading

Uh oh!

kpsherva Jun 16, 2026

Uh oh!

kpsherva Jun 16, 2026

Uh oh!

kpsherva Jun 16, 2026

Uh oh!

kpsherva Jun 16, 2026

Uh oh!

kpsherva Jun 16, 2026

Uh oh!

kpsherva Jun 16, 2026

Uh oh!

kpsherva Jun 16, 2026

Uh oh!

kpsherva Jun 16, 2026

Uh oh!

kpsherva Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		return format_datetime(dt, "yyyy-MM-dd HH:mm", rebase=True)


		def _format_log_hit_timestamp(value):

Conversation

TahaKhan998 commented May 13, 2026

Uh oh!

Uh oh!

Uh oh!

kpsherva Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kpsherva Jun 16, 2026 •

edited

Loading