Hermit: Fix readdir for hermit >= v0.12#8
Conversation
ab78f24 to
6f6ed3f
Compare
mkroening
left a comment
There was a problem hiding this comment.
Thanks for looking into this! :)
My comments are a bit larger than expected, so if you prefer fixing non-invasively in std first and me taking a look at a bigger refactor I described in the future, that would be fine with me. :)
Linux's NAME_MAX (maximum number of bytes in a file name) is 255. For Hermit, we define NAME_MAX with the same value, although it is not in use in the kernel at the moment. Any divergence from Linux could cause issues moving files from one OS to the other. Many filesystems have the same inherent file name length limit. I think it makes sense for Hermit to have that same limit of 255 bytes and for applications to be able to rely on that.
This means that if we choose a sufficiently large buffer, we don't have to deal with reallocations, which makes the code simpler and faster. This is also what is done by newlib, musl, and glibc. They use different buffer sizes, however:
- newlib: 512 B
- musl: 2048 B
- glibc:
st_blksize, though at least 32 KiB and at most 1 MiB
I would suggest using std's DEFAULT_BUF_SIZE, which is usually used for these buffers. It is 8 KiB on nearly all platforms.
In general, it might be useful to model this struct after BufReader, since that is essentially what we are doing. That would decrease the maintenance burden further.
| // the length of the curent dirent64 buffer. | ||
| bytes_read: usize, | ||
| fd: FileDesc, | ||
| buf: Vec<u8>, |
There was a problem hiding this comment.
It would be nice to use a fixed-size buffer as described above and also not zero the memory before using it:
| buf: Vec<u8>, | |
| buf: Box<[MaybeUninit<u8>]>, |
There was a problem hiding this comment.
I considered an array here, but then it'd be stack allocated, which I feel isn't the best choice for an 8 KiB buffer. But MaybeUninit is a good idea.
There was a problem hiding this comment.
I am suggesting a boxed slice.
There was a problem hiding this comment.
A boxed slice requires a decision on the compatibility fix. In the meantime, I've pushed a version using Vec<MaybeUninit<u8>>. (I hope it's correct; I find MaybeUninit always challenging to use correctly 😅 )
| // Workaround to keep compatibility with Hermit <= v0.11 | ||
| if self.bytes_read > 0 { | ||
| let first = unsafe { &*(self.buf.as_ptr() as *const dirent64) }; | ||
| // d_ino is always 0 on hermit pre v0.12, and always non-zero afterwards, so we can | ||
| // use it as a hack here to prevent infinite loops | ||
| if first.d_ino == 0 { | ||
| return None; | ||
| } |
There was a problem hiding this comment.
I think we should just drop this compat hack and only support the latest Hermit version.
There was a problem hiding this comment.
The reason I'm hesitating is that this does not crash on older hermit versions but loops infinitely. So maybe keeping the hack for a while is worth a consideration? In principle I'm also fine with dropping it.
There was a problem hiding this comment.
We have never provided support for older Hermit versions and always tell people to try the latest version or even the Git version whenever they have an issue. I also think one year is enough time.
If you want to keep it, when would you remove it?
There was a problem hiding this comment.
I have no clue. I just want to avoid people running, let's say, an old benchmark or anything with new Rust, and then they end up with their PC running just forever. And after three hours of pointless debugging, they then find out that it was the Stdlib that caused that bug.
And, the hack is definitely not really affecting the performance of a critical path.
I'm still undecided.
There was a problem hiding this comment.
The more I think about it the stronger I think, we should drop the hack.
When you old an old benchmark, you should either not change std and Hermit, or you should change both. We have made breaking changes often, sometimes big, sometimes small. So we have the situation regarding old benchmarks anyway.
Any other answer to this fundamental question would hold Hermit's evolution possibilities a lot. I don't think we are in a situation right now where we are confident enough in our API that we can make stability guarantees and think about publishing a stable 1.0.0 version of Hermit. Currently, every feature release of Hermit can be breaking.
While the performance impact might be small, I think one more free register and more readable code is nice.
There was a problem hiding this comment.
Ok, then I'll squash the commits
Anything else?
There was a problem hiding this comment.
I'm currently on the phone. I can do another review next week. :)
| // all DirEntry's will have a reference to this struct | ||
| struct InnerReadDir { |
There was a problem hiding this comment.
Having DirEntry have an Arc<InnerReadDir> like on the other platforms would actually be useful because it would allow us to save an allocation per path retrieval. DirEntry::path returns the full path, joining the file name to the root. Currently, we eagerly clone the root path for each new entry, which we can avoid by having these Arcs.
There was a problem hiding this comment.
Yes, but I'd suggest keeping that for a separate PR. This is not changing the API, and this PR does not change the current behavior.
(I'm also scared of the ownership & lifetime implications when having an Arc of a buffer that might be changing.)
There was a problem hiding this comment.
The buffer would not be behind the Arc, only the root PathBuf, I think. 🤔
Anyways, it might make sense to not remove the structure with inner in this PR then, maybe. Both PRs would be smaller that way.
There was a problem hiding this comment.
Keeping the split for some potential future change to save a few bytes of git diff does not convince me 😜
There was a problem hiding this comment.
This is not about saving bytes but about reducing the cognitive load when looking at diffs and keeping git-blame nicer.
There was a problem hiding this comment.
If you don't want to do it it's fine. I can take another look on Monday.
|
I just renamed this repo's master branch to main since CI complained. |
0522144 to
3f5f7d5
Compare
…op compatibility hack with hermit 0.11
3f5f7d5 to
f5b3ea0
Compare
On Hermit, the
sys_getdentsimplementation was flawed, as it wasn't stateful. So the caller had to provide a buffer large enough to hold alldirentsin advance. Else it returnsEINVAL. The current workaround used in Rust is to grow the Buffer until it is sufficiently large.hermit-os/kernel#1738 fixes the behavior of
sys_getdents. Hermit now keeps track of the directory position and each new call to thesys_getdentsprovides the next entries.But this results in the current implementation in Rust Std being flawed. Because Hermit does now return a proper readcount, even if not all
dirent64s are read,read_dirwill only ever iterate over the first 512 bytes of directory entries.This PR fixes this with a proper implementation of the
getdents64algorithm.Implementation remark:
ReadDirandReadDirInneris unnecessary, as unlike stated in a comment,DirEntrydoes not contain a reference toReadDirInner.Compatibility: