Skip to content

Add stdarch-gen-common: shared check/bless harness for generators#2157

Open
xonx4l wants to merge 5 commits into
rust-lang:mainfrom
xonx4l:stdarch-gen-common
Open

Add stdarch-gen-common: shared check/bless harness for generators#2157
xonx4l wants to merge 5 commits into
rust-lang:mainfrom
xonx4l:stdarch-gen-common

Conversation

@xonx4l

@xonx4l xonx4l commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

This PR adds a library crate stdarch-gen-common that gives every generator three modes -:

  1. Write - Generator writes files straight into the committed location.
  2. Check - Generator write into a temporary directory . The crate diffs it against committed and fails on any mismatch without touching the working copy .
  3. Bless - Generator writes into a temp directory the crate then mirrors temp over committed. Adding, overwriting, and deleting files as needed.

Mode is picked via the STDARCH_GEN_MODE env var. The harness only touches files the generator explicitly lists as its own so hand-written files in the same directory are safe.

This PR also hooks stdarch-gen-hexagon to use it as the first example.

r? @folkertdev

@Kobzol

Kobzol commented Jun 9, 2026

Copy link
Copy Markdown
Member

For a bit more context, this is done for the stdarch GSoC project, where we try to integrate stdarch's tests in rust-lang/rust's CI. We want to move the blessing logic out of the CI YAML files and into Rust, to make it easier to reuse, and also share code amongst the various generators.

I'll take a look, but also CC @folkertdev , in case you have in mind who owns this code amongst the stdarch maintainers and who could also chime in.

@rustbot rustbot assigned folkertdev and unassigned Kobzol Jun 9, 2026

@Kobzol Kobzol left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for creating the crate! I left a few comments.

View changes since this review

Comment thread crates/stdarch-gen-common/src/lib.rs
Comment thread crates/stdarch-gen-common/src/lib.rs
Comment thread crates/stdarch-gen-common/src/lib.rs Outdated
Comment thread crates/stdarch-gen-common/src/lib.rs Outdated

@Kobzol Kobzol left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll have to examine how the generator behaves a bit deeper (I'll try to do that on Monday), because I'm still quite surprised from the behavior here 😅 I think that we might wanna do some targeted changes to the generator and the directory layout to make blessing simpler, if the complexity here is caused by the current directory layout.

View changes since this review

Comment thread crates/stdarch-gen-common/src/lib.rs
Comment thread crates/stdarch-gen-common/src/lib.rs Outdated
@xonx4l

xonx4l commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

I think I'll have to examine how the generator behaves a bit deeper (I'll try to do that on Monday), because I'm still quite surprised from the behavior here 😅 I think that we might wanna do some targeted changes to the generator and the directory layout to make blessing simpler, if the complexity here is caused by the current directory layout.

View changes since this review

Yes targeted changes to the generator will likely make this crate much simpler. Btw we are in a fixing loop :)

@folkertdev folkertdev left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this seems fine as a design. I'd probably just do the other generators in a separate PR using this as a foundation.

View changes since this review

Comment thread crates/stdarch-gen-common/src/lib.rs Outdated
Comment thread crates/stdarch-gen-common/src/lib.rs Outdated
@Kobzol

Kobzol commented Jun 14, 2026

Copy link
Copy Markdown
Member

I looked at the code more and now I understand the issue with blessing; as you said, the mod.rs file is manually written, and we sometimes generate more than one file (e.g. sve/generated.rs and sve/ld_st_tests_aarch64.rs), so we cannot depend on the generated filename to be just generated.rs, and we have to deal with multiple files.

I would suggest the following changes and simplifications:

When an output directory is passed to the generator, it will load all files from that directory, and keep a map of files that are automatically generated. These files will be detected by reading their first line and checking if it begins with // This code is automatically generated. DO NOT MODIFY.. Note that this line should be stored in a string constant and be automatically added to the start of all generated files; this should be done by stdarch-gen-common, but that's for a later PR.

Keep only two modes:

  • Check will generate files into a temporary directory (or even just into a string, it seems like the existing code works with Box<dyn Write>, so it shouldn't be difficult to write to a string instead of having to deal with temporary directories and files) , and compare their contents with the map. If there are any extra or missing files in the map, error out. If the contents do not match, error out.
  • Bless will write the generated output directly into the target directory. Any files from the map that were not (over)written will be deleted.

This assumes that two generators won't ever write into the same directory, but I think that assumption should be fine. In fact I think that it would be easiest to simply move all the generators into a single crate, at which point we could generate everything at once, but let's not complicate it further for now.

Does that make sense to you?

@xonx4l

xonx4l commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

@Kobzol Yes , this makes sense to me.

I will refactor as per the idea you suggested as I think is quite good than per file version( especially the header one is a great way to diff) . The arm and loongarch generated files already carry the header // This code is automatically generated. DO NOT MODIFY . So only for the files generated by hexagon generators needs the addition of the header .

And yeah alright two modes look's fine will drop Write mode and keep only Check and Bless as described in the idea.

Thanks for taking the time to dig into this, it's a huge help.

@xonx4l xonx4l requested a review from Kobzol June 16, 2026 14:13
@xonx4l

xonx4l commented Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

@Kobzol made all the changes that were suggested and discussed . Thank you!

@Kobzol Kobzol left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, looks great! Left a few minor comments.

On CI, you should modify the check-stdarch-gen job to add the environment variable STDARCH_GEN_MODE=check.

The output could be improved a bit, e.g. the "Output: " path could be normalized and the filename of the mismatched file could be made absolute, to make it easier to find where should the file be located:

Step 3: Generating v64.rs and v128.rs (mode: Check)...
  Output: /projects/personal/rust/stdarch/crates/stdarch-gen-hexagon/../core_arch/src/hexagon/v64.rs
  Output: /projects/personal/rust/stdarch/crates/stdarch-gen-hexagon/../core_arch/src/hexagon/v128.rs
Error: "v128.rs: contents differ"

but those are minor things and I don't think we need to deal with that in this PR.

View changes since this review

Comment thread crates/stdarch-gen-common/src/lib.rs Outdated
Comment thread crates/stdarch-gen-common/src/lib.rs Outdated
Comment thread crates/stdarch-gen-common/src/lib.rs

@Kobzol Kobzol left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, looks great! I don't have approval rights on this repo, so I'll ask Folkert to take another look.

Edit: nevermind, apparently I do have merge rights 😆 I'll still wait for Folkert to chime in though.

View changes since this review

@Kobzol Kobzol requested a review from folkertdev June 19, 2026 11:53
@xonx4l

xonx4l commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

Damn so fast was just going to ping :) , Yeah we need those Folkert opinions and feedbacks.

@folkertdev folkertdev left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, some final small things.

View changes since this review

Comment on lines +155 to +157
let Ok(contents) = fs::read_to_string(entry.path()) else {
continue;
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is perhaps a bit wasteful? you can just read the first GENERATED_MARKER.len() bytes and just check those?

Comment on lines +178 to +179
if entry.file_type()?.is_file() {
if let Some(name) = entry.file_name().to_str() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if entry.file_type()?.is_file() {
if let Some(name) = entry.file_name().to_str() {
if entry.file_type()?.is_file()
&& let Some(name) = entry.file_name().to_str() {

you can use let in that position these days.

}),
(false, false) => Ok(()),
(true, true) => {
if fs::read(&gen_path)? != fs::read(&comm_path)? {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is probably fine for now, but a more robust and efficient approach is to check the file sizes first, and if they are the same, read the files in chunks and compare those.

Ok(())
}

#[cfg(all(test, not(target_os = "ios")))]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a comment on why ios does not work?

println!(" Output: {}", v128_path.display());
let hexagon_dir = crate_dir.join("../core_arch/src/hexagon");

let mode = Mode::from_env();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a comment here on what the mode means, e.g.

Suggested change
let mode = Mode::from_env();
// Either "check" to check the output versus the committed output, or "bless" to update the output.
let mode = Mode::from_env();

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants