From 7b3c80f0ce381e296da90873001c2ec469e89ea1 Mon Sep 17 00:00:00 2001
From: Bharat Kathi
Date: Fri, 5 Jun 2026 11:51:58 -0700
Subject: [PATCH] fix(mqtt): force instance replace on user_data change
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The AWS provider's default for aws_instance.user_data is
user_data_replace_on_change = false. Terraform applies a user_data diff
via ModifyInstanceAttribute, which *stores* the new user-data but
doesn't re-execute cloud-init; the new file lands only on the next
stop+start. So the apply for PR #70 (which added the mapache user)
updated state in place and never touched the running
/etc/nanomq_pwd.conf.

Setting user_data_replace_on_change = true makes user-data a
force-replace attribute again, which is what the previous PR's removal
of user_data from lifecycle.ignore_changes was actually trying to
achieve.

This change won't trigger anything on its own (state and code match
the running box now that the running box was manually patched via
SSM). The next legitimate user-data edit will trigger a clean instance
replacement instead of the silent no-op trap.
---
 infra/modules/mqtt-ec2/main.tf | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/infra/modules/mqtt-ec2/main.tf b/infra/modules/mqtt-ec2/main.tf
index 0f4b4a4..c192436 100644
--- a/infra/modules/mqtt-ec2/main.tf
+++ b/infra/modules/mqtt-ec2/main.tf
@@ -125,6 +125,14 @@ resource "aws_instance" "this" {
     mqtt_password_mapache = random_password.mqtt_mapache.result
   })
 
+  # Force instance replacement when user_data changes. Without this, the
+  # AWS provider's default is to call ModifyInstanceAttribute, which
+  # *stores* the new user_data but doesn't re-execute it; the file lands
+  # only on the next stop+start. We learned this the hard way: a normal
+  # apply that added a third nanomq user updated state in place but left
+  # the running broker with the old /etc/nanomq_pwd.conf.
+  user_data_replace_on_change = true
+
   # user_data is intentionally NOT in ignore_changes: nanomq carries no
   # persistent state, so legitimate config edits (new user, ACL change)
   # should flow through a normal `terraform apply` and trigger the ~90s