Realm deployment script#1652
Conversation
45a5a61 to
0197d0c
Compare
elliottslaughter
left a comment
There was a problem hiding this comment.
@elliottslaughter reviewed 59 files and all commit messages, and made 4 comments.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on lockshaw).
deploy.sh line 10 at r1 (raw file):
} while [[ $# -gt 0 ]]; do
I wanted to make a couple of notes here because it looks like you're taking this in a different direction than what I intended.
This is NOT intended to be a one-size-fits-all, reusable deployment script. It is specifically intended to work on Sapling, and then I hacked it to also support CI so that we don't build something without test coverage. It is intended to be an example of how you could deploy on other machines without necessarily directly supporting any of them, because inevitably other HPC machines will look different, and it was a non-goal to account for every possible variation. I've worked on systems that have that as a goal, and it's doable, but it's also inherently brittle because essentially no HPC machines support CI and you will always be relying on human effort to keep any of it working.
The changes to make this more generic do appear to interfere with it working out of the box on Sapling, which was also a goal of the original.
We can decide to target a different set of goals but I want to be clear on what we're doing so that we build the right infrastructure for the job.
.github/workflows/deploy.yml line 16 at r1 (raw file):
run: | apt-get update -qq apt-get install -y build-essential cmake curl gcc-${{ matrix.gcc }} g++-${{ matrix.gcc }} git libibverbs-dev mpich libmpich-dev python3 zlib1g-dev python3-venv
Not a huge deal but I was keeping this list sorted.
bin/run-model/src/run-model/main.cc line 110 at r1 (raw file):
perform_all_passes_for_pcg_instance( /*instance=*/pcg_instance, /*profiling_settings=*/ProfilingSettings{0, 0},
Note to self that we need to run 1 real iteration or this won't do anything.
lib/local-execution/CMakeLists.txt line 16 at r1 (raw file):
task-spec pcg deps::spdlog
Should we re-sort these dependency lines to be either alphabetical, or put dependencies last? The new prefix breaks the previous sort order.
|
Previously, elliottslaughter (Elliott Slaughter) wrote…
Yeah, I thought you had changed your mind on this because the script didn't having "sapling" in the name so I was just going along with it. put sapling in the name, and then go ahead and re-add in the module loads (I'd do it but it's probably easier for you to test). I think the require_cmd's are still good for at least moving the errors up front, and moving over to pip is still good for reducing the base, and the flags are just for making the usage a bit cleaner (though if it would be better to go back to GCC_VERSION instead of just pulling from CC/CXX, go for it). I think that's all the changes I made, but if there's anything else lmk |
Adds deployment script and capability for a generic, Realm-based model runner executable.
Miscellaneous other fixes:
This change is