Skip to content

feat(admin): curator job run report and run log download#801

Open
TahaKhan998 wants to merge 1 commit into
CERNDocumentServer:masterfrom
TahaKhan998:feat/harvester-reports-curator
Open

feat(admin): curator job run report and run log download#801
TahaKhan998 wants to merge 1 commit into
CERNDocumentServer:masterfrom
TahaKhan998:feat/harvester-reports-curator

Conversation

@TahaKhan998

Copy link
Copy Markdown

Closes #789

  • when an inspire harvester job finishes we email people. that email again has a “harvester actions” section with two links: one to grab a text log for that run (“download error log”) and one to open harvester reports filtered to that run (“view list of changes”)

  • removed the old “view full details” button from that email — it pointed at the system jobs admin page and a lot of people on the mailing list don’t have access, so it was confusing

  • new page: /administration/harvester-reports/<run_id>/report (curators only). shows the same kind of run summary + log lines as the admin job ui, with colours by level so it’s easier to read than raw text

  • the existing download endpoint still returns a .log file; both that file and the html page use the real invenio jobs log search for the run, not a random audit log export

image image

@TahaKhan998 TahaKhan998 force-pushed the feat/harvester-reports-curator branch from 9de61f4 to 280f3d1 Compare May 14, 2026 21:12
@TahaKhan998 TahaKhan998 force-pushed the feat/harvester-reports-curator branch from 280f3d1 to 5ab7c0e Compare June 8, 2026 09:14
@TahaKhan998 TahaKhan998 force-pushed the feat/harvester-reports-curator branch from fa180d4 to 7d750c2 Compare June 9, 2026 11:23
} No newline at end of file
}

.harvester-report-actions {

@kpsherva kpsherva Jun 16, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general, in css we try to avoid custom styling (with dedicated classes or assigned to html elements), and to rely on the semantic approach (class names related to behaviour and outcome, rather than specific dedicated element or class). For example in this case, the display flex class which you could assign to your element already exists in the theme and could be reused.

This approach helps us keeping the CSS maintenance to minimum and reduces the amount of stylesheets we write. Could you check how you can adapt your changes to our general approach?

Comment on lines +57 to +68
React.useEffect(() => {
const q = currentQueryState.queryString || "";
const fromQuery = extractRunIdFromQuery(q, runs);
if (fromQuery) {
setActiveRunId(fromQuery);
return;
}
if (!q && defaultRun) {
setActiveRunId(defaultRun.id);
executeSearch(defaultRun, "");
}
}, []);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
React.useEffect(() => {
const q = currentQueryState.queryString || "";
const fromQuery = extractRunIdFromQuery(q, runs);
if (fromQuery) {
setActiveRunId(fromQuery);
return;
}
if (!q && defaultRun) {
setActiveRunId(defaultRun.id);
executeSearch(defaultRun, "");
}
}, []);
React.useEffect(() => {
const q = currentQueryState.queryString || "";
const runId = extractRunIdFromQuery(q, runs) || defaultRun.id;
setActiveRunId(runId);
}
if (!q) {
executeSearch(defaultRun, "");
}
}, []);

to improve readability. Why do we need to call execute search if q is not present?

Comment on lines +31 to +35
if isinstance(value, datetime):
dt = (
value.replace(tzinfo=timezone.utc)
if value.tzinfo is None
else value

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this part needed? we handle tz globally now if I remember correctly, no need to account for it in the code

)
else:
try:
dt = datetime.fromisoformat(str(value).replace("Z", "+00:00"))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

dt = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
except (ValueError, TypeError):
return str(value)
return format_datetime(dt, "yyyy-MM-dd HH:mm", rebase=True)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does rebase do? Normally we use other functions to serialize the time, could you check and adapt for consistency?

dt = value
else:
try:
dt = datetime.fromisoformat(str(value).replace("Z", "+00:00"))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about timezone

return format_datetime(dt, "yyyy-MM-dd HH:mm", rebase=True)


def _format_log_hit_timestamp(value):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two utility functions look very similar, why do we need both?

run_id = (run_id or "").strip()
if not run_id:
return {"message": "Missing run_id"}, 400
return None, ({"message": "Missing run_id"}, 400)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are specific flask or invenio adapted methods to serialize the REST responses in the correct way, could you check and use them?

def download(self):
"""Download a harvester run's logs as a plain-text ``.log`` file."""
if not curators_permission.can():
return {"message": "Permission denied"}, 403

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this response also could use a response serializer

Comment thread site/cds_rdm/views.py
Comment on lines +62 to +65
return jsonify(body), code
return render_template(
"cds_rdm/administration/harvester_run_report.html",
**ctx,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a bit of a mix of concepts here. In case of an error, you return a json, but in case everything is goes well, then we have a jinja template returned. Since this is the same view, an the UI side (not api) please use an error handler to return an error page - to keep the consistency of the returned formats

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feedback: harvester e-mail

2 participants