fix(clickhouse-ec2): disable text_log and lower log level in provisioning#79
Merged
Merged
Conversation
…ning Stock ClickHouse ships at `trace` log level with system.text_log enabled and no TTL. On the hosted instance text_log grew ~155 GiB in ~2 days and filled the data volume, after which every INSERT failed with NOT_ENOUGH_SPACE (Code 243) — query_token/query_log writes from the query service were blocked; reads were unaffected. The running server was fixed by hand, but that fix lived only in the container's writable layer because config.d was never bind-mounted, so it would not survive an image bump or instance replacement. Bake it into user-data instead, mirroring the existing users.d pattern: - Write /etc/clickhouse-server/config.d/quiet-logs.xml setting the logger level to `information` and removing text_log. - Bind-mount config.d (:ro) into the container so the override is applied. - chown 101:101 so the container's clickhouse uid can read it (no secrets, so no chmod 600 like admin.xml). System-log table TTLs (system.query_log etc.) are applied separately via SQL; they persist on the preserved EBS data volume. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Contributor
Terraform plan:
|
| step | result |
|---|---|
| fmt | success |
| init | success |
| validate | success |
| plan | success |
plan output
module.mqtt.random_password.mqtt_mapache: Refreshing state... [id=none]
module.mqtt.random_password.mqtt: Refreshing state... [id=none]
module.mqtt.random_password.mqtt_tcm26: Refreshing state... [id=none]
module.postgres.random_password.postgres: Refreshing state... [id=none]
module.clickhouse.random_password.admin: Refreshing state... [id=none]
module.origin_cert.tls_private_key.this: Refreshing state... [id=4c5b5e0a4a4f13723ba2aebe888a7cb50529fcc0]
module.origin_cert.tls_cert_request.this: Refreshing state... [id=9398eb1d26e54eb59b18d476ced10810e80172e5]
data.cloudflare_zone.gauchoracing: Reading...
module.origin_cert.cloudflare_origin_ca_certificate.this: Refreshing state... [id=307819530070461722629184968406649377122389744032]
module.clickhouse.data.aws_ami.al2023_arm64: Reading...
module.mqtt.data.aws_ami.al2023_arm64: Reading...
module.eks.module.eks.data.aws_iam_policy_document.assume_role_policy[0]: Reading...
module.eks.module.eks.data.aws_iam_policy_document.node_assume_role_policy[0]: Reading...
module.eks.module.eks.module.kms.data.aws_partition.current[0]: Reading...
module.eks.module.eks.aws_cloudwatch_log_group.this[0]: Refreshing state... [id=/aws/eks/gr-prod/cluster]
module.eks.module.eks.module.kms.data.aws_caller_identity.current[0]: Reading...
module.clickhouse.aws_ebs_volume.data: Refreshing state... [id=vol-0e312e8d71875ec89]
module.vpc.module.vpc.aws_vpc.this[0]: Refreshing state... [id=vpc-06e13a97395396a3b]
module.eks.module.eks.data.aws_iam_policy_document.node_assume_role_policy[0]: Read complete after 0s [id=3518401652]
module.eks.module.eks.module.kms.data.aws_partition.current[0]: Read complete after 0s [id=aws]
module.eks.module.eks.data.aws_iam_policy_document.assume_role_policy[0]: Read complete after 0s [id=2830595799]
module.eks.module.eks.data.aws_caller_identity.current[0]: Reading...
module.postgres.aws_ebs_volume.data: Refreshing state... [id=vol-02b203218b57629b4]
module.origin_cert.aws_acm_certificate.this: Refreshing state... [id=arn:aws:acm:us-west-2:211125506628:certificate/d10d5205-6d4b-4798-a152-293c69174660]
module.eks.module.eks.module.kms.data.aws_caller_identity.current[0]: Read complete after 0s [id=211125506628]
module.postgres.data.aws_ami.al2023_arm64: Reading...
data.cloudflare_zone.gauchoracing: Read complete after 0s [id=5ac5ae9c6086e4b55c5e1b21ca963d94]
module.eks.module.eks.data.aws_partition.current[0]: Reading...
module.eks.module.eks.data.aws_partition.current[0]: Read complete after 0s [id=aws]
module.eks.module.eks.aws_iam_role.eks_auto[0]: Refreshing state... [id=gr-prod-eks-auto-20260601094833482500000004]
module.eks.module.eks.data.aws_caller_identity.current[0]: Read complete after 0s [id=211125506628]
module.eks.module.eks.aws_iam_role.this[0]: Refreshing state... [id=gr-prod-cluster-20260601094833481300000002]
module.eks.module.eks.data.aws_iam_session_context.current[0]: Reading...
cloudflare_ruleset.ssl_overrides: Refreshing state... [id=5a5ed8237d6f48418c172979ffe5da81]
module.eks.module.eks.data.aws_iam_session_context.current[0]: Read complete after 0s [id=arn:aws:sts::211125506628:assumed-role/github-actions-terraform/GitHubActions]
module.eks.module.eks.data.aws_iam_policy_document.custom[0]: Reading...
module.eks.module.eks.data.aws_iam_policy_document.custom[0]: Read complete after 0s [id=513122117]
module.eks.module.eks.aws_iam_role_policy_attachment.eks_auto["AmazonEC2ContainerRegistryPullOnly"]: Refreshing state... [id=gr-prod-eks-auto-20260601094833482500000004/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPullOnly]
module.eks.module.eks.aws_iam_role_policy_attachment.eks_auto["AmazonEKSWorkerNodeMinimalPolicy"]: Refreshing state... [id=gr-prod-eks-auto-20260601094833482500000004/arn:aws:iam::aws:policy/AmazonEKSWorkerNodeMinimalPolicy]
module.eks.module.eks.aws_iam_policy.custom[0]: Refreshing state... [id=arn:aws:iam::211125506628:policy/gr-prod-cluster-20260601094833480900000001]
module.eks.module.eks.module.kms.data.aws_iam_policy_document.this[0]: Reading...
module.eks.module.eks.module.kms.data.aws_iam_policy_document.this[0]: Read complete after 0s [id=922405470]
module.eks.module.eks.aws_iam_role_policy_attachment.this["AmazonEKSLoadBalancingPolicy"]: Refreshing state... [id=gr-prod-cluster-20260601094833481300000002/arn:aws:iam::aws:policy/AmazonEKSLoadBalancingPolicy]
module.eks.module.eks.aws_iam_role_policy_attachment.this["AmazonEKSNetworkingPolicy"]: Refreshing state... [id=gr-prod-cluster-20260601094833481300000002/arn:aws:iam::aws:policy/AmazonEKSNetworkingPolicy]
module.eks.module.eks.aws_iam_role_policy_attachment.this["AmazonEKSBlockStoragePolicy"]: Refreshing state... [id=gr-prod-cluster-20260601094833481300000002/arn:aws:iam::aws:policy/AmazonEKSBlockStoragePolicy]
module.eks.module.eks.aws_iam_role_policy_attachment.this["AmazonEKSClusterPolicy"]: Refreshing state... [id=gr-prod-cluster-20260601094833481300000002/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.eks.module.eks.aws_iam_role_policy_attachment.this["AmazonEKSComputePolicy"]: Refreshing state... [id=gr-prod-cluster-20260601094833481300000002/arn:aws:iam::aws:policy/AmazonEKSComputePolicy]
module.eks.module.eks.module.kms.aws_kms_key.this[0]: Refreshing state... [id=7768801a-b38a-4c26-8bc3-7bf6fe2aac86]
module.eks.module.eks.aws_iam_role_policy_attachment.custom[0]: Refreshing state... [id=gr-prod-cluster-20260601094833481300000002/arn:aws:iam::211125506628:policy/gr-prod-cluster-20260601094833480900000001]
module.mqtt.data.aws_ami.al2023_arm64: Read complete after 1s [id=ami-0a2a049c945b84826]
module.clickhouse.data.aws_ami.al2023_arm64: Read complete after 1s [id=ami-0a2a049c945b84826]
module.postgres.data.aws_ami.al2023_arm64: Read complete after 1s [id=ami-0a2a049c945b84826]
module.eks.module.eks.module.kms.aws_kms_alias.this["cluster"]: Refreshing state... [id=alias/eks/gr-prod]
module.eks.module.eks.aws_iam_policy.cluster_encryption[0]: Refreshing state... [id=arn:aws:iam::211125506628:policy/gr-prod-cluster-ClusterEncryption20260601094855072000000006]
module.eks.module.eks.aws_iam_role_policy_attachment.cluster_encryption[0]: Refreshing state... [id=gr-prod-cluster-20260601094833481300000002/arn:aws:iam::211125506628:policy/gr-prod-cluster-ClusterEncryption20260601094855072000000006]
module.vpc.module.vpc.aws_default_route_table.default[0]: Refreshing state... [id=rtb-08f817bde5f65eb92]
module.vpc.module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-0c18db918f54ad033]
module.mqtt.aws_security_group.this: Refreshing state... [id=sg-0f5a2dc492283dafe]
module.vpc.module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-0182a0562244a6fac]
module.clickhouse.aws_security_group.this: Refreshing state... [id=sg-0dee964416e7aaeb5]
module.vpc.module.vpc.aws_subnet.public[2]: Refreshing state... [id=subnet-0a5a8299f6da9bc59]
module.vpc.module.vpc.aws_default_security_group.this[0]: Refreshing state... [id=sg-0a592b2169fd42df8]
module.vpc.module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-0264aaaa19faa70f5]
module.postgres.aws_security_group.this: Refreshing state... [id=sg-08a5b2e02e0540520]
module.vpc.module.vpc.aws_default_network_acl.this[0]: Refreshing state... [id=acl-0fd92b5b8eb95b2f9]
module.vpc.module.vpc.aws_internet_gateway.this[0]: Refreshing state... [id=igw-0ace83040de106603]
module.vpc.module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-09fbaccd0b3aaab85]
module.vpc.module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-022e58c410c24d794]
module.vpc.module.vpc.aws_subnet.private[2]: Refreshing state... [id=subnet-06540460bfd7d06a2]
module.eks.module.eks.aws_security_group.cluster[0]: Refreshing state... [id=sg-0cac44db03a686436]
module.vpc.module.vpc.aws_route_table.public[0]: Refreshing state... [id=rtb-0815845194166b58b]
module.eks.module.eks.aws_security_group.node[0]: Refreshing state... [id=sg-0b19db83dbe18cbf1]
module.vpc.module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-015b9b6ae09534761]
module.mqtt.aws_security_group_rule.ingress_cidr[0]: Refreshing state... [id=sgrule-1294033340]
module.postgres.aws_security_group_rule.ingress_cidr[0]: Refreshing state... [id=sgrule-3721507580]
module.clickhouse.aws_security_group_rule.ingress_cidr["8123"]: Refreshing state... [id=sgrule-467233960]
module.clickhouse.aws_security_group_rule.ingress_cidr["9000"]: Refreshing state... [id=sgrule-2026118668]
module.clickhouse.aws_instance.this: Refreshing state... [id=i-0f862cc6460b5d98a]
module.postgres.aws_instance.this: Refreshing state... [id=i-013aab40e28a0b6b7]
module.mqtt.aws_instance.this: Refreshing state... [id=i-0bf98528bc8e9dab0]
module.vpc.module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0c836864d0eccc7fc]
module.vpc.module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-07ea61af9805b07cb]
module.vpc.module.vpc.aws_route_table_association.private[2]: Refreshing state... [id=rtbassoc-0b736817e30c38444]
module.vpc.module.vpc.aws_route_table_association.public[2]: Refreshing state... [id=rtbassoc-01d455080d37c3108]
module.vpc.module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-09913d53f1ff40442]
module.vpc.module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-041ffe3137ec2ff7a]
module.vpc.module.vpc.aws_route.public_internet_gateway[0]: Refreshing state... [id=r-rtb-0815845194166b58b1080289494]
module.eks.module.eks.aws_security_group_rule.node["ingress_cluster_9443_webhook"]: Refreshing state... [id=sgrule-3115694346]
module.eks.module.eks.aws_security_group_rule.node["ingress_cluster_kubelet"]: Refreshing state... [id=sgrule-2079615841]
module.eks.module.eks.aws_security_group_rule.node["ingress_nodes_ephemeral"]: Refreshing state... [id=sgrule-504933240]
module.eks.module.eks.aws_security_group_rule.node["ingress_self_coredns_udp"]: Refreshing state... [id=sgrule-4200159001]
module.eks.module.eks.aws_security_group_rule.node["ingress_cluster_4443_webhook"]: Refreshing state... [id=sgrule-118149494]
module.eks.module.eks.aws_security_group_rule.node["egress_all"]: Refreshing state... [id=sgrule-4004824215]
module.eks.module.eks.aws_security_group_rule.node["ingress_cluster_10251_webhook"]: Refreshing state... [id=sgrule-1337801498]
module.eks.module.eks.aws_security_group_rule.node["ingress_cluster_6443_webhook"]: Refreshing state... [id=sgrule-3460871904]
module.eks.module.eks.aws_security_group_rule.node["ingress_cluster_8443_webhook"]: Refreshing state... [id=sgrule-3709110977]
module.eks.module.eks.aws_security_group_rule.node["ingress_cluster_443"]: Refreshing state... [id=sgrule-500562133]
module.eks.module.eks.aws_security_group_rule.node["ingress_self_coredns_tcp"]: Refreshing state... [id=sgrule-1577514230]
module.eks.module.eks.aws_security_group_rule.cluster["ingress_nodes_443"]: Refreshing state... [id=sgrule-535925259]
module.vpc.module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-019992cce709b8681]
module.postgres.aws_security_group_rule.ingress_sg["sg-0b19db83dbe18cbf1"]: Refreshing state... [id=sgrule-1130224857]
module.mqtt.aws_security_group_rule.ingress_sg["sg-0b19db83dbe18cbf1"]: Refreshing state... [id=sgrule-1062873908]
module.clickhouse.aws_security_group_rule.ingress_sg["sg-0b19db83dbe18cbf1-8123"]: Refreshing state... [id=sgrule-4177740833]
module.clickhouse.aws_security_group_rule.ingress_sg["sg-0b19db83dbe18cbf1-9000"]: Refreshing state... [id=sgrule-2466486093]
module.vpc.module.vpc.aws_route.private_nat_gateway[0]: Refreshing state... [id=r-rtb-0c18db918f54ad0331080289494]
module.eks.module.eks.aws_eks_cluster.this[0]: Refreshing state... [id=gr-prod]
module.eks.module.eks.aws_eks_access_entry.this["arn-aws-iam--211125506628-role-github-actions-terraform"]: Refreshing state... [id=gr-prod:arn:aws:iam::211125506628:role/github-actions-terraform]
module.eks.module.eks.data.tls_certificate.this[0]: Reading...
module.eks.module.eks.aws_eks_access_entry.this["arn-aws-iam--211125506628-user-admin-cli"]: Refreshing state... [id=gr-prod:arn:aws:iam::211125506628:user/admin-cli]
module.eks.module.eks.time_sleep.this[0]: Refreshing state... [id=2026-06-01T10:46:10Z]
module.eks.module.eks.data.tls_certificate.this[0]: Read complete after 0s [id=f97f646c2cd14cc0db0f757f0fccc96abbbe2af5]
module.eks.module.eks.aws_iam_openid_connect_provider.oidc_provider[0]: Refreshing state... [id=arn:aws:iam::211125506628:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/21512EE80634956C7C9D0B9647C70224]
module.eks.module.eks.aws_eks_access_policy_association.this["arn-aws-iam--211125506628-user-admin-cli_admin"]: Refreshing state... [id=gr-prod#arn:aws:iam::211125506628:user/admin-cli#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.eks.module.eks.aws_eks_access_policy_association.this["arn-aws-iam--211125506628-role-github-actions-terraform_admin"]: Refreshing state... [id=gr-prod#arn:aws:iam::211125506628:role/github-actions-terraform#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.mqtt.aws_eip.this[0]: Refreshing state... [id=eipalloc-000796d397533c6d8]
module.argocd.helm_release.argocd: Refreshing state... [id=argocd]
cloudflare_dns_record.gr_mqtt: Refreshing state... [id=53badd7d1ab83280dc57c671ee486f90]
module.clickhouse.aws_volume_attachment.data: Refreshing state... [id=vai-843739595]
module.clickhouse.aws_eip.this[0]: Refreshing state... [id=eipalloc-0970fced638b7d8d9]
module.postgres.aws_volume_attachment.data: Refreshing state... [id=vai-3584522434]
module.postgres.aws_eip.this[0]: Refreshing state... [id=eipalloc-06d6c59b1e0a49482]
cloudflare_dns_record.gr_clickhouse: Refreshing state... [id=d6f3d3c09db5c56703a7b107d6ac3f29]
cloudflare_dns_record.gr_postgres: Refreshing state... [id=a69d68353c27a536e84d7448af48e3f0]
No changes. Your infrastructure matches the configuration.
Terraform has compared your real infrastructure against your configuration
and found no differences, so no changes are needed.
BK1031
approved these changes
Jun 7, 2026
Contributor
|
You'll probably need to manually make the edit on the ec2 |
This was referenced Jun 17, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The hosted ClickHouse EC2 instance was provisioned with stock image defaults:
tracelog level,system.text_logenabled, and no TTL on the system log tables. On this boxtext_logalone grew ~155 GiB in ~2 days and filled the data volume, after which every INSERT failed withCode: 243 NOT_ENOUGH_SPACE— blocking the query service's writes (query_tokenonPOST /query/token,query_logon everyGET /query/signals). Reads were unaffected. Actual telemetry (signal/gr26_can/ping) was only ~2.2 GiB; ~99% of the disk was the database logging about itself.The running server has been fixed by hand, but that fix lived only in the container's writable layer —
config.dwas never bind-mounted (onlyusers.dand the data volume are), so it would silently revert on the next image bump (clickhouse_version) or instance replacement. This bakes the fix into provisioning so it can't recur on a rebuild.Changes
infra/modules/clickhouse-ec2/user-data.sh.tftpl, mirroring the existingusers.dpattern:/etc/clickhouse-server/config.d/quiet-logs.xml:<logger><level>information</level>— drops from the noisy stocktrace(the root cause of the volume).<text_log remove="1"/>— stop materializing the server log into a table entirely (the on-diskclickhouse-server.logstill exists).config.d(:ro) into the container so the override is applied.chown 101:101so the container's clickhouse uid can read it (no secrets here, so unlikeadmin.xmlit stays world-readable rather thanchmod 600).Out of scope / follow-ups
system.query_log7d,asynchronous_insert_log3d, withttl_only_drop_parts=1) are applied separately via SQL. They persist on the preserved EBS data volume, so they don't belong in user-data.Testing
Validated equivalent config on the live container:
config.d/quiet-logs.xmlmerged correctly (preprocessed config shows<level>information</level>and notext_logblock), the server restarted clean, andtext_logstopped being written.🤖 Generated with Claude Code