feat(mqtt): add mapache user + drop user_data from ignore_changes#70
Merged
Conversation
Two coupled changes: 1. New mqtt_user_mapache (default "mapache") with a 32-char random_password, written as a third line in /etc/nanomq_pwd.conf via user-data. Intended for the mapache services fleet beyond gr26 — query, foreman, any future MQTT publishers. Keeps the credential distinct from gr26 so it can rotate independently of CAN ingest. 2. Drop user_data from lifecycle.ignore_changes (keep ami). The old guard meant adding users, ACL tweaks, or any user-data edit required an explicit `terraform taint` / `-replace` to actually deploy — easy to forget, and we've already hit the trap twice. nanomq carries no persistent state, so a ~90s replacement on legitimate config changes is acceptable; paho clients auto-reconnect. Postgres + ClickHouse modules keep user_data ignored because they own data volumes. Side effect: the apply that lands this PR will replace the running gr-mqtt instance (new user-data → instance recreate). EIP, SG, and all three random_password values survive in state.
Contributor
Terraform plan:
|
| step | result |
|---|---|
| fmt | success |
| init | success |
| validate | success |
| plan | success |
plan output
module.origin_cert.tls_private_key.this: Refreshing state... [id=4c5b5e0a4a4f13723ba2aebe888a7cb50529fcc0]
module.mqtt.random_password.mqtt_tcm26: Refreshing state... [id=none]
module.postgres.random_password.postgres: Refreshing state... [id=none]
module.mqtt.random_password.mqtt: Refreshing state... [id=none]
module.clickhouse.random_password.admin: Refreshing state... [id=none]
data.cloudflare_zone.gauchoracing: Reading...
module.origin_cert.tls_cert_request.this: Refreshing state... [id=9398eb1d26e54eb59b18d476ced10810e80172e5]
module.origin_cert.cloudflare_origin_ca_certificate.this: Refreshing state... [id=307819530070461722629184968406649377122389744032]
module.eks.module.eks.module.kms.data.aws_caller_identity.current[0]: Reading...
module.clickhouse.data.aws_ami.al2023_arm64: Reading...
module.eks.module.eks.data.aws_iam_policy_document.assume_role_policy[0]: Reading...
module.mqtt.data.aws_ami.al2023_arm64: Reading...
module.clickhouse.aws_ebs_volume.data: Refreshing state... [id=vol-0e312e8d71875ec89]
module.eks.module.eks.aws_cloudwatch_log_group.this[0]: Refreshing state... [id=/aws/eks/gr-prod/cluster]
module.eks.module.eks.module.kms.data.aws_partition.current[0]: Reading...
module.postgres.aws_ebs_volume.data: Refreshing state... [id=vol-02b203218b57629b4]
module.eks.module.eks.module.kms.data.aws_partition.current[0]: Read complete after 0s [id=aws]
module.eks.module.eks.data.aws_iam_policy_document.assume_role_policy[0]: Read complete after 0s [id=2830595799]
module.eks.module.eks.data.aws_iam_policy_document.node_assume_role_policy[0]: Reading...
module.vpc.module.vpc.aws_vpc.this[0]: Refreshing state... [id=vpc-06e13a97395396a3b]
module.eks.module.eks.data.aws_iam_policy_document.node_assume_role_policy[0]: Read complete after 0s [id=3518401652]
module.eks.module.eks.data.aws_caller_identity.current[0]: Reading...
module.eks.module.eks.module.kms.data.aws_caller_identity.current[0]: Read complete after 0s [id=211125506628]
module.eks.module.eks.data.aws_partition.current[0]: Reading...
module.eks.module.eks.data.aws_partition.current[0]: Read complete after 0s [id=aws]
module.postgres.data.aws_ami.al2023_arm64: Reading...
data.cloudflare_zone.gauchoracing: Read complete after 0s [id=5ac5ae9c6086e4b55c5e1b21ca963d94]
module.eks.module.eks.data.aws_caller_identity.current[0]: Read complete after 0s [id=211125506628]
module.eks.module.eks.aws_iam_role.this[0]: Refreshing state... [id=gr-prod-cluster-20260601094833481300000002]
module.eks.module.eks.aws_iam_role.eks_auto[0]: Refreshing state... [id=gr-prod-eks-auto-20260601094833482500000004]
cloudflare_ruleset.ssl_overrides: Refreshing state... [id=5a5ed8237d6f48418c172979ffe5da81]
module.eks.module.eks.data.aws_iam_session_context.current[0]: Reading...
module.eks.module.eks.data.aws_iam_policy_document.custom[0]: Reading...
module.eks.module.eks.data.aws_iam_policy_document.custom[0]: Read complete after 0s [id=513122117]
module.eks.module.eks.aws_iam_policy.custom[0]: Refreshing state... [id=arn:aws:iam::211125506628:policy/gr-prod-cluster-20260601094833480900000001]
module.eks.module.eks.data.aws_iam_session_context.current[0]: Read complete after 0s [id=arn:aws:sts::211125506628:assumed-role/github-actions-terraform/GitHubActions]
module.eks.module.eks.aws_iam_role_policy_attachment.this["AmazonEKSBlockStoragePolicy"]: Refreshing state... [id=gr-prod-cluster-20260601094833481300000002/arn:aws:iam::aws:policy/AmazonEKSBlockStoragePolicy]
module.eks.module.eks.aws_iam_role_policy_attachment.this["AmazonEKSComputePolicy"]: Refreshing state... [id=gr-prod-cluster-20260601094833481300000002/arn:aws:iam::aws:policy/AmazonEKSComputePolicy]
module.eks.module.eks.aws_iam_role_policy_attachment.this["AmazonEKSClusterPolicy"]: Refreshing state... [id=gr-prod-cluster-20260601094833481300000002/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.postgres.data.aws_ami.al2023_arm64: Read complete after 1s [id=ami-0a2a049c945b84826]
module.eks.module.eks.aws_iam_role_policy_attachment.this["AmazonEKSLoadBalancingPolicy"]: Refreshing state... [id=gr-prod-cluster-20260601094833481300000002/arn:aws:iam::aws:policy/AmazonEKSLoadBalancingPolicy]
module.clickhouse.data.aws_ami.al2023_arm64: Read complete after 1s [id=ami-0a2a049c945b84826]
module.eks.module.eks.aws_iam_role_policy_attachment.this["AmazonEKSNetworkingPolicy"]: Refreshing state... [id=gr-prod-cluster-20260601094833481300000002/arn:aws:iam::aws:policy/AmazonEKSNetworkingPolicy]
module.mqtt.data.aws_ami.al2023_arm64: Read complete after 1s [id=ami-0a2a049c945b84826]
module.eks.module.eks.aws_iam_role_policy_attachment.eks_auto["AmazonEC2ContainerRegistryPullOnly"]: Refreshing state... [id=gr-prod-eks-auto-20260601094833482500000004/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPullOnly]
module.eks.module.eks.aws_iam_role_policy_attachment.eks_auto["AmazonEKSWorkerNodeMinimalPolicy"]: Refreshing state... [id=gr-prod-eks-auto-20260601094833482500000004/arn:aws:iam::aws:policy/AmazonEKSWorkerNodeMinimalPolicy]
module.eks.module.eks.module.kms.data.aws_iam_policy_document.this[0]: Reading...
module.eks.module.eks.module.kms.data.aws_iam_policy_document.this[0]: Read complete after 0s [id=922405470]
module.eks.module.eks.aws_iam_role_policy_attachment.custom[0]: Refreshing state... [id=gr-prod-cluster-20260601094833481300000002/arn:aws:iam::211125506628:policy/gr-prod-cluster-20260601094833480900000001]
module.eks.module.eks.module.kms.aws_kms_key.this[0]: Refreshing state... [id=7768801a-b38a-4c26-8bc3-7bf6fe2aac86]
module.vpc.module.vpc.aws_default_security_group.this[0]: Refreshing state... [id=sg-0a592b2169fd42df8]
module.vpc.module.vpc.aws_default_route_table.default[0]: Refreshing state... [id=rtb-08f817bde5f65eb92]
module.mqtt.aws_security_group.this: Refreshing state... [id=sg-0f5a2dc492283dafe]
module.eks.module.eks.aws_security_group.cluster[0]: Refreshing state... [id=sg-0cac44db03a686436]
module.vpc.module.vpc.aws_default_network_acl.this[0]: Refreshing state... [id=acl-0fd92b5b8eb95b2f9]
module.vpc.module.vpc.aws_internet_gateway.this[0]: Refreshing state... [id=igw-0ace83040de106603]
module.clickhouse.aws_security_group.this: Refreshing state... [id=sg-0dee964416e7aaeb5]
module.eks.module.eks.aws_security_group.node[0]: Refreshing state... [id=sg-0b19db83dbe18cbf1]
module.vpc.module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-0c18db918f54ad033]
module.vpc.module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-09fbaccd0b3aaab85]
module.vpc.module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-022e58c410c24d794]
module.vpc.module.vpc.aws_subnet.private[2]: Refreshing state... [id=subnet-06540460bfd7d06a2]
module.vpc.module.vpc.aws_route_table.public[0]: Refreshing state... [id=rtb-0815845194166b58b]
module.vpc.module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-0182a0562244a6fac]
module.vpc.module.vpc.aws_subnet.public[2]: Refreshing state... [id=subnet-0a5a8299f6da9bc59]
module.vpc.module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-0264aaaa19faa70f5]
module.postgres.aws_security_group.this: Refreshing state... [id=sg-08a5b2e02e0540520]
module.eks.module.eks.module.kms.aws_kms_alias.this["cluster"]: Refreshing state... [id=alias/eks/gr-prod]
module.origin_cert.aws_acm_certificate.this: Refreshing state... [id=arn:aws:acm:us-west-2:211125506628:certificate/d10d5205-6d4b-4798-a152-293c69174660]
module.vpc.module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-015b9b6ae09534761]
module.mqtt.aws_security_group_rule.ingress_cidr[0]: Refreshing state... [id=sgrule-1294033340]
module.clickhouse.aws_security_group_rule.ingress_cidr["8123"]: Refreshing state... [id=sgrule-467233960]
module.clickhouse.aws_security_group_rule.ingress_cidr["9000"]: Refreshing state... [id=sgrule-2026118668]
module.eks.module.eks.aws_iam_policy.cluster_encryption[0]: Refreshing state... [id=arn:aws:iam::211125506628:policy/gr-prod-cluster-ClusterEncryption20260601094855072000000006]
module.eks.module.eks.aws_security_group_rule.node["ingress_cluster_10251_webhook"]: Refreshing state... [id=sgrule-1337801498]
module.eks.module.eks.aws_security_group_rule.node["ingress_cluster_9443_webhook"]: Refreshing state... [id=sgrule-3115694346]
module.eks.module.eks.aws_security_group_rule.node["ingress_cluster_kubelet"]: Refreshing state... [id=sgrule-2079615841]
module.eks.module.eks.aws_security_group_rule.node["ingress_nodes_ephemeral"]: Refreshing state... [id=sgrule-504933240]
module.eks.module.eks.aws_security_group_rule.node["ingress_self_coredns_tcp"]: Refreshing state... [id=sgrule-1577514230]
module.eks.module.eks.aws_security_group_rule.node["ingress_cluster_4443_webhook"]: Refreshing state... [id=sgrule-118149494]
module.eks.module.eks.aws_security_group_rule.node["ingress_cluster_6443_webhook"]: Refreshing state... [id=sgrule-3460871904]
module.eks.module.eks.aws_security_group_rule.node["ingress_cluster_443"]: Refreshing state... [id=sgrule-500562133]
module.eks.module.eks.aws_security_group_rule.node["ingress_cluster_8443_webhook"]: Refreshing state... [id=sgrule-3709110977]
module.eks.module.eks.aws_security_group_rule.node["ingress_self_coredns_udp"]: Refreshing state... [id=sgrule-4200159001]
module.eks.module.eks.aws_security_group_rule.node["egress_all"]: Refreshing state... [id=sgrule-4004824215]
module.vpc.module.vpc.aws_route.public_internet_gateway[0]: Refreshing state... [id=r-rtb-0815845194166b58b1080289494]
module.vpc.module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0c836864d0eccc7fc]
module.vpc.module.vpc.aws_route_table_association.private[2]: Refreshing state... [id=rtbassoc-0b736817e30c38444]
module.vpc.module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-07ea61af9805b07cb]
module.eks.module.eks.aws_security_group_rule.cluster["ingress_nodes_443"]: Refreshing state... [id=sgrule-535925259]
module.vpc.module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-041ffe3137ec2ff7a]
module.vpc.module.vpc.aws_route_table_association.public[2]: Refreshing state... [id=rtbassoc-01d455080d37c3108]
module.vpc.module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-09913d53f1ff40442]
module.postgres.aws_security_group_rule.ingress_cidr[0]: Refreshing state... [id=sgrule-3721507580]
module.eks.module.eks.aws_iam_role_policy_attachment.cluster_encryption[0]: Refreshing state... [id=gr-prod-cluster-20260601094833481300000002/arn:aws:iam::211125506628:policy/gr-prod-cluster-ClusterEncryption20260601094855072000000006]
module.vpc.module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-019992cce709b8681]
module.postgres.aws_security_group_rule.ingress_sg["sg-0b19db83dbe18cbf1"]: Refreshing state... [id=sgrule-1130224857]
module.mqtt.aws_security_group_rule.ingress_sg["sg-0b19db83dbe18cbf1"]: Refreshing state... [id=sgrule-1062873908]
module.clickhouse.aws_security_group_rule.ingress_sg["sg-0b19db83dbe18cbf1-8123"]: Refreshing state... [id=sgrule-4177740833]
module.clickhouse.aws_security_group_rule.ingress_sg["sg-0b19db83dbe18cbf1-9000"]: Refreshing state... [id=sgrule-2466486093]
module.mqtt.aws_instance.this: Refreshing state... [id=i-0bf98528bc8e9dab0]
module.clickhouse.aws_instance.this: Refreshing state... [id=i-0f862cc6460b5d98a]
module.vpc.module.vpc.aws_route.private_nat_gateway[0]: Refreshing state... [id=r-rtb-0c18db918f54ad0331080289494]
module.postgres.aws_instance.this: Refreshing state... [id=i-013aab40e28a0b6b7]
module.eks.module.eks.aws_eks_cluster.this[0]: Refreshing state... [id=gr-prod]
module.eks.module.eks.aws_eks_access_entry.this["arn-aws-iam--211125506628-role-github-actions-terraform"]: Refreshing state... [id=gr-prod:arn:aws:iam::211125506628:role/github-actions-terraform]
module.eks.module.eks.aws_eks_access_entry.this["arn-aws-iam--211125506628-user-admin-cli"]: Refreshing state... [id=gr-prod:arn:aws:iam::211125506628:user/admin-cli]
module.eks.module.eks.time_sleep.this[0]: Refreshing state... [id=2026-06-01T10:46:10Z]
module.eks.module.eks.data.tls_certificate.this[0]: Reading...
module.eks.module.eks.data.tls_certificate.this[0]: Read complete after 0s [id=f97f646c2cd14cc0db0f757f0fccc96abbbe2af5]
module.eks.module.eks.aws_iam_openid_connect_provider.oidc_provider[0]: Refreshing state... [id=arn:aws:iam::211125506628:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/21512EE80634956C7C9D0B9647C70224]
module.eks.module.eks.aws_eks_access_policy_association.this["arn-aws-iam--211125506628-user-admin-cli_admin"]: Refreshing state... [id=gr-prod#arn:aws:iam::211125506628:user/admin-cli#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.eks.module.eks.aws_eks_access_policy_association.this["arn-aws-iam--211125506628-role-github-actions-terraform_admin"]: Refreshing state... [id=gr-prod#arn:aws:iam::211125506628:role/github-actions-terraform#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.argocd.helm_release.argocd: Refreshing state... [id=argocd]
module.mqtt.aws_eip.this[0]: Refreshing state... [id=eipalloc-000796d397533c6d8]
cloudflare_dns_record.gr_mqtt: Refreshing state... [id=53badd7d1ab83280dc57c671ee486f90]
module.clickhouse.aws_volume_attachment.data: Refreshing state... [id=vai-843739595]
module.clickhouse.aws_eip.this[0]: Refreshing state... [id=eipalloc-0970fced638b7d8d9]
cloudflare_dns_record.gr_clickhouse: Refreshing state... [id=d6f3d3c09db5c56703a7b107d6ac3f29]
module.postgres.aws_eip.this[0]: Refreshing state... [id=eipalloc-06d6c59b1e0a49482]
module.postgres.aws_volume_attachment.data: Refreshing state... [id=vai-3584522434]
cloudflare_dns_record.gr_postgres: Refreshing state... [id=a69d68353c27a536e84d7448af48e3f0]
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
~ update in-place
Terraform will perform the following actions:
# module.mqtt.aws_instance.this will be updated in-place
~ resource "aws_instance" "this" {
id = "i-0bf98528bc8e9dab0"
~ public_dns = "ec2-184-33-5-10.us-west-2.compute.amazonaws.com" -> (known after apply)
~ public_ip = "184.33.5.10" -> (known after apply)
tags = {
"Name" = "gr-mqtt"
"Role" = "nanomq"
}
~ user_data = (sensitive value)
# (38 unchanged attributes hidden)
# (9 unchanged blocks hidden)
}
# module.mqtt.random_password.mqtt_mapache will be created
+ resource "random_password" "mqtt_mapache" {
+ bcrypt_hash = (sensitive value)
+ id = (known after apply)
+ length = 32
+ lower = true
+ min_lower = 0
+ min_numeric = 0
+ min_special = 0
+ min_upper = 0
+ number = true
+ numeric = true
+ result = (sensitive value)
+ special = false
+ upper = true
}
Plan: 1 to add, 1 to change, 0 to destroy.
Changes to Outputs:
+ mqtt_password_mapache = (sensitive value)
─────────────────────────────────────────────────────────────────────────────
Note: You didn't use the -out option to save this plan, so Terraform can't
guarantee to take exactly these actions if you run "terraform apply" now.
3 tasks
BK1031
added a commit
that referenced
this pull request
Jun 5, 2026
The AWS provider's default for aws_instance.user_data is user_data_replace_on_change = false. terraform applies a user_data diff via ModifyInstanceAttribute, which *stores* the new user-data but doesn't re-execute cloud-init — the new file lands only on the next stop+start. So the apply for PR #70 (which added the mapache user) updated state in place and never touched the running /etc/nanomq_pwd.conf. Setting user_data_replace_on_change = true makes user-data a force- replace attribute again, which is what the previous PR's removal of user_data from lifecycle.ignore_changes was actually trying to achieve. This change won't trigger anything on its own (state and code match the running box now that the running box was manually patched via SSM). The next legitimate user-data edit will trigger a clean instance replacement instead of the silent no-op trap.
4 tasks
BK1031
added a commit
that referenced
this pull request
Jun 5, 2026
The dedicated `gr26` MQTT user (PR #45) and the `mapache` fleet user (PR #70) both exist on the gr-mqtt broker. Standardizing the in-cluster gr26 service onto the fleet credential so all mapache services share a single broker identity that rotates together — keeps the `gr26` user free for the on-vehicle ingest path if we ever want it there. - MQTT_USER: gr26 → mapache - MQTT_PASSWORD secretKeyRef.key: MQTT_PASSWORD → MQTT_MAPACHE_PASSWORD - mapache-secrets already has both keys populated (MQTT_PASSWORD untouched, MQTT_MAPACHE_PASSWORD added manually via `kubectl patch` before this PR) Rollout: ArgoCD sync → new pods come up with the new env, paho connects fresh under the mapache identity. Old gr26-user MQTT sessions hold their TCP socket until pod termination then drop cleanly.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Two coupled changes:
1. New `mapache` MQTT user
Adds `mqtt_user_mapache` (default `"mapache"`) with a 32-char `random_password`, written as a third line in `/etc/nanomq_pwd.conf` via user-data. Intended for the mapache services fleet beyond gr26 — query, foreman, any future MQTT publishers. Distinct from `gr26` so the service-fleet credential can rotate independently of the CAN-ingest pipeline.
Read via:
```bash
terraform -chdir=infra/environments/prod output -raw mqtt_password_mapache
```
2. Drop `user_data` from `lifecycle.ignore_changes`
Old guard:
```hcl
lifecycle {
ignore_changes = [user_data, ami]
}
```
…meant adding users, ACL tweaks, or any user-data edit required an explicit `terraform taint` / `-replace=...` to actually take effect. Easy to forget, and we've already hit the trap twice (tcm26 user, clickhouse chown). nanomq carries no persistent state, so a ~90s replacement on legitimate config changes is acceptable — paho clients auto-reconnect, no data is lost.
`ami` stays in `ignore_changes` so unrelated AL2023 base-image churn doesn't silently roll the instance.
Postgres + ClickHouse modules keep `user_data` ignored — they own data volumes and stop/start downtime is a bigger ask there. Justification is symmetric to why `prevent_destroy = true` exists on those EBS volumes but not here.
Apply consequence
`terraform plan` will show `module.mqtt.aws_instance.this` must be replaced because user-data changed (new mapache user line). EIP, SG, all three `random_password` values survive in state. Expect ~60–120s of broker downtime during the EC2 swap; paho clients in gr26 reconnect automatically.
Test plan