Skip to content

Hermit: Fix readdir for hermit >= v0.12#8

Open
jounathaen wants to merge 5 commits into
mainfrom
hermit-readdir
Open

Hermit: Fix readdir for hermit >= v0.12#8
jounathaen wants to merge 5 commits into
mainfrom
hermit-readdir

Conversation

@jounathaen

Copy link
Copy Markdown
Member

On Hermit, the sys_getdents implementation was flawed, as it wasn't stateful. So the caller had to provide a buffer large enough to hold all dirents in advance. Else it returns EINVAL. The current workaround used in Rust is to grow the Buffer until it is sufficiently large.

hermit-os/kernel#1738 fixes the behavior of sys_getdents. Hermit now keeps track of the directory position and each new call to the sys_getdentsprovides the next entries.

But this results in the current implementation in Rust Std being flawed. Because Hermit does now return a proper readcount, even if not all dirent64s are read, read_dir will only ever iterate over the first 512 bytes of directory entries.

This PR fixes this with a proper implementation of the getdents64 algorithm.

Implementation remark:

  • The previous separation of ReadDir and ReadDirInner is unnecessary, as unlike stated in a comment, DirEntry does not contain a reference to ReadDirInner.

Compatibility:

  • This implementation contains a compatibility hack, so that to Hermit versions < 0.12 don't break. It's debatable if that could be removed, since Hermit as a research OS does not really have software out in the field that depends on compatibility.

@mkroening mkroening self-requested a review June 26, 2026 11:37

@mkroening mkroening left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this! :)

My comments are a bit larger than expected, so if you prefer fixing non-invasively in std first and me taking a look at a bigger refactor I described in the future, that would be fine with me. :)

Linux's NAME_MAX (maximum number of bytes in a file name) is 255. For Hermit, we define NAME_MAX with the same value, although it is not in use in the kernel at the moment. Any divergence from Linux could cause issues moving files from one OS to the other. Many filesystems have the same inherent file name length limit. I think it makes sense for Hermit to have that same limit of 255 bytes and for applications to be able to rely on that.

This means that if we choose a sufficiently large buffer, we don't have to deal with reallocations, which makes the code simpler and faster. This is also what is done by newlib, musl, and glibc. They use different buffer sizes, however:

  • newlib: 512 B
  • musl: 2048 B
  • glibc: st_blksize, though at least 32 KiB and at most 1 MiB

I would suggest using std's DEFAULT_BUF_SIZE, which is usually used for these buffers. It is 8 KiB on nearly all platforms.

In general, it might be useful to model this struct after BufReader, since that is essentially what we are doing. That would decrease the maintenance burden further.

Comment thread library/std/src/sys/fs/hermit.rs Outdated
Comment thread library/std/src/sys/fs/hermit.rs Outdated
// the length of the curent dirent64 buffer.
bytes_read: usize,
fd: FileDesc,
buf: Vec<u8>,

@mkroening mkroening Jun 26, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to use a fixed-size buffer as described above and also not zero the memory before using it:

Suggested change
buf: Vec<u8>,
buf: Box<[MaybeUninit<u8>]>,

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered an array here, but then it'd be stack allocated, which I feel isn't the best choice for an 8 KiB buffer. But MaybeUninit is a good idea.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am suggesting a boxed slice.

@jounathaen jounathaen Jun 26, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A boxed slice requires a decision on the compatibility fix. In the meantime, I've pushed a version using Vec<MaybeUninit<u8>>. (I hope it's correct; I find MaybeUninit always challenging to use correctly 😅 )

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f5b3ea0 Would implement this change.

Comment thread library/std/src/sys/fs/hermit.rs Outdated
Comment on lines +218 to +225
// Workaround to keep compatibility with Hermit <= v0.11
if self.bytes_read > 0 {
let first = unsafe { &*(self.buf.as_ptr() as *const dirent64) };
// d_ino is always 0 on hermit pre v0.12, and always non-zero afterwards, so we can
// use it as a hack here to prevent infinite loops
if first.d_ino == 0 {
return None;
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just drop this compat hack and only support the latest Hermit version.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I'm hesitating is that this does not crash on older hermit versions but loops infinitely. So maybe keeping the hack for a while is worth a consideration? In principle I'm also fine with dropping it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have never provided support for older Hermit versions and always tell people to try the latest version or even the Git version whenever they have an issue. I also think one year is enough time.

If you want to keep it, when would you remove it?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no clue. I just want to avoid people running, let's say, an old benchmark or anything with new Rust, and then they end up with their PC running just forever. And after three hours of pointless debugging, they then find out that it was the Stdlib that caused that bug.
And, the hack is definitely not really affecting the performance of a critical path.

I'm still undecided.

@mkroening mkroening Jun 26, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I think about it the stronger I think, we should drop the hack.

When you old an old benchmark, you should either not change std and Hermit, or you should change both. We have made breaking changes often, sometimes big, sometimes small. So we have the situation regarding old benchmarks anyway.

Any other answer to this fundamental question would hold Hermit's evolution possibilities a lot. I don't think we are in a situation right now where we are confident enough in our API that we can make stability guarantees and think about publishing a stable 1.0.0 version of Hermit. Currently, every feature release of Hermit can be breaking.

While the performance impact might be small, I think one more free register and more readable code is nice.

@jounathaen jounathaen Jun 26, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then I'll squash the commits
Anything else?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm currently on the phone. I can do another review next week. :)

Comment on lines -33 to -34
// all DirEntry's will have a reference to this struct
struct InnerReadDir {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having DirEntry have an Arc<InnerReadDir> like on the other platforms would actually be useful because it would allow us to save an allocation per path retrieval. DirEntry::path returns the full path, joining the file name to the root. Currently, we eagerly clone the root path for each new entry, which we can avoid by having these Arcs.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I'd suggest keeping that for a separate PR. This is not changing the API, and this PR does not change the current behavior.
(I'm also scared of the ownership & lifetime implications when having an Arc of a buffer that might be changing.)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The buffer would not be behind the Arc, only the root PathBuf, I think. 🤔

Anyways, it might make sense to not remove the structure with inner in this PR then, maybe. Both PRs would be smaller that way.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping the split for some potential future change to save a few bytes of git diff does not convince me 😜

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not about saving bytes but about reducing the cognitive load when looking at diffs and keeping git-blame nicer.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't want to do it it's fine. I can take another look on Monday.

@mkroening

mkroening commented Jun 26, 2026

Copy link
Copy Markdown
Member

I just renamed this repo's master branch to main since CI complained.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants