HDDS-15312. Improve VolumeInfoMetrics to include MinFreeSpace and Non-Ozone used space #10304

Open
priyeshkaratha wants to merge 3 commits into apache:master from priyeshkaratha:HDDS-15312

Conversation

@priyeshkaratha
Contributor

@priyeshkaratha priyeshkaratha commented May 19, 2026

What changes were proposed in this pull request?

The Min Free Space and Non-Ozone Used Space metrics are not currently available in JMX. In addition, Total Capacity and Filesystem Capacity are always computed to the same value, so there is no need to expose both fields separately. This PR therefore removes the Total Capacity field and adds the Min Free Space and Non-Ozone Used Space metrics.

What is the link to the Apache JIRA?

HDDS-15312

How was this patch tested?

Updated the unit test case.
Also tested manually:

bash-5.1$ curl -XGET http://ozone-datanode-1:9882/jmx?qry=Hadoop:service=HddsDatanode,name=VolumeInfoMetrics-/data/hdds
{
  "beans" : [ {
    "name" : "Hadoop:service=HddsDatanode,name=VolumeInfoMetrics-/data/hdds",
    "modelerType" : "VolumeInfoMetrics-/data/hdds",
    "tag.Context" : "ozone",
    "tag.StorageType" : "DISK",
    "tag.DatanodeUuid" : "96b844d8-283e-449b-a6dc-26dd1daf2569",
    "tag.VolumeType" : "DATA_VOLUME",
    "tag.StorageDirectory" : "/data/hdds/hdds",
    "tag.VolumeState" : "NORMAL",
    "tag.Hostname" : "3552d2e7b2dd",
    "AvailableSpaceInsufficient" : 0,
    "DbCompactLatencyNumOps" : 0,
    "DbCompactLatencyAvgTime" : 0.0,
    "NumContainerCreateRequestsInSoftBandMinFreeSpace" : 0,
    "NumContainerCreateRequestsRejectedHardMinFreeSpace" : 0,
    "NumScans" : 1,
    "NumScansSkipped" : 0,
    "NumWriteRequestsInSoftBandMinFreeSpace" : 0,
    "NumWriteRequestsRejectedHardMinFreeSpace" : 0,
    "ReservedCrossesLimit" : 1,
    "LayoutVersion" : 1,
    "Containers" : 0,
    "Committed" : 0,
    "OzoneCapacity" : 105076655700,
    "OzoneAvailable" : 88517070848,
    "OzoneUsed" : 4382720,
    "Reserved" : 10508716,
    "FilesystemCapacity" : 105087164416,
    "FilesystemAvailable" : 88517070848,
    "FilesystemUsed" : 16570093568,
    "MinFreeSpace" : 104857600,
    "NonOzoneUsed" : 16565710848
  } ]
}
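
The sample values above can be cross-checked with a few lines of arithmetic. The sketch below is not taken from the patch: the class and method names are hypothetical, and the relation NonOzoneUsed = FilesystemUsed - OzoneUsed is an assumption inferred from the sample numbers, while FilesystemUsed = FilesystemCapacity - FilesystemAvailable follows the change summary in the review.

```java
public class VolumeMetricsCheck {
  // Values copied from the sample JMX output in the PR description.
  static final long FILESYSTEM_CAPACITY = 105_087_164_416L;
  static final long FILESYSTEM_AVAILABLE = 88_517_070_848L;
  static final long OZONE_USED = 4_382_720L;

  /** Filesystem-level used space: capacity minus available. */
  static long filesystemUsed(long capacity, long available) {
    return capacity - available;
  }

  /**
   * Assumed relation (inferred from the sample, not confirmed by the patch):
   * space used on the filesystem that Ozone does not account for.
   */
  static long nonOzoneUsed(long filesystemUsed, long ozoneUsed) {
    return filesystemUsed - ozoneUsed;
  }

  public static void main(String[] args) {
    long fsUsed = filesystemUsed(FILESYSTEM_CAPACITY, FILESYSTEM_AVAILABLE);
    System.out.println("FilesystemUsed = " + fsUsed);
    System.out.println("NonOzoneUsed   = " + nonOzoneUsed(fsUsed, OZONE_USED));
  }
}
```

Both computed values can be compared against the FilesystemUsed and NonOzoneUsed fields reported by the bean.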

@priyeshkaratha priyeshkaratha marked this pull request as ready for review May 19, 2026 05:18
Contributor

Copilot AI left a comment

Pull request overview

This PR updates DataNode per-volume JMX metrics (VolumeInfoMetrics) to better reflect filesystem usage by (1) exposing the min-free-space threshold and (2) exposing non-Ozone-used space, while also removing a redundant total-capacity metric.

Changes:

  • Add new JMX gauges: MinFreeSpace (soft limit reported to SCM) and NonOzoneUsed (filesystem usage not attributable to Ozone/HDDS).
  • Change FilesystemUsed to be computed as FilesystemCapacity - FilesystemAvailable (instead of reporting the DU/HDDS-used value).
  • Remove the TotalCapacity gauge (previously duplicative with FilesystemCapacity).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/VolumeInfoMetrics.java | Adds MinFreeSpace / NonOzoneUsed, changes filesystem-used calculation, removes redundant TotalCapacity metric. |
| hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/volume/TestVolumeInfoMetrics.java | Updates the unit test assertions/mocks to cover the new/changed metrics. |

@sarvekshayr sarvekshayr self-requested a review May 27, 2026 08:04
@devmadhuu devmadhuu self-requested a review May 27, 2026 12:16
@devmadhuu
Contributor

Thanks @priyeshkaratha for the patch. Some observations:

  1. dn-overview.html and dn.js still reference TotalCapacity:
     • Remove the Total Capacity column (or replace it with Min Free Space).
     • Add columns for MinFreeSpace and NonOzoneUsed with transform() in dn.js.

  2. The PR description says: "Total Capacity and Filesystem Capacity represent the same value". I don't think they are the same value: they are mathematically equal, but they mean different things (OzoneCapacity + Reserved vs. raw FS capacity). Can you recheck whether they really refer to the same underlying value?

  3. If we are removing TotalCapacity, did you verify whether any Grafana dashboards or alerts scrape that JMX metric name? Just make sure it doesn't cause a regression.

  4. JMX exposes the soft MinFreeSpace (getReportedFreeSpaceToSpare). Write-path rejection uses the hard getFreeSpaceToSpare(). Admins or developers debugging numWriteRequestsRejectedHardMinFreeSpace may want both. You may want to do this in this PR or the next one; your call.
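
The point about TotalCapacity and FilesystemCapacity being numerically equal but conceptually different can be checked against the sample JMX output in the PR description. This is a minimal sketch with hypothetical class and method names; it only assumes the definition given in the comment above (TotalCapacity as OzoneCapacity + Reserved) and says nothing about how the patch computes either value.

```java
public class CapacityIdentityCheck {
  // Values copied from the sample JMX output in the PR description.
  static final long OZONE_CAPACITY = 105_076_655_700L;
  static final long RESERVED = 10_508_716L;
  static final long FILESYSTEM_CAPACITY = 105_087_164_416L;

  /** TotalCapacity as described in the review comment: Ozone capacity plus reserved space. */
  static long totalCapacity(long ozoneCapacity, long reserved) {
    return ozoneCapacity + reserved;
  }

  public static void main(String[] args) {
    long total = totalCapacity(OZONE_CAPACITY, RESERVED);
    // The two numbers match for this sample, even though one is derived
    // from Ozone accounting and the other is the raw filesystem capacity.
    System.out.println("OzoneCapacity + Reserved = " + total);
    System.out.println("FilesystemCapacity       = " + FILESYSTEM_CAPACITY);
    System.out.println("equal: " + (total == FILESYSTEM_CAPACITY));
  }
}
```

A match on one sample does not prove the two always coincide, which is why the comment asks for the definitions to be rechecked.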

Contributor

@yandrey321 yandrey321 left a comment

Are these metrics available from the /prom endpoint as well?

@priyeshkaratha
Contributor Author

priyeshkaratha commented Jun 1, 2026

@devmadhuu Thanks for the review. I have addressed all three points in the latest changes.

> If we are removing TotalCapacity, did you verify whether any Grafana dashboards or alerts scrape that JMX metric name? Just make sure it doesn't cause a regression.

Yes, verified this.

> JMX exposes the soft MinFreeSpace (getReportedFreeSpaceToSpare). Write-path rejection uses the hard getFreeSpaceToSpare(). Admins or developers debugging numWriteRequestsRejectedHardMinFreeSpace may want both. You may want to do this in this PR or the next one; your call.

I will handle this in a follow-up Jira.

> Are these metrics available from the /prom endpoint as well?

@yandrey321 Yes, they are reflected in /prom as well.

@priyeshkaratha priyeshkaratha requested a review from yandrey321 June 1, 2026 08:16
Contributor

@devmadhuu devmadhuu left a comment

Thanks @priyeshkaratha for improving the patch. Changes LGTM. There is just one minor nit below; please take care of it. I have approved the PR regardless.

when(volume.getCommittedBytes()).thenReturn(10L);
when(volume.getContainers()).thenReturn(3L);
when(volume.getReportedFreeSpaceToSpare(anyLong())).thenReturn(20L);
when(volume.getFreeSpaceToSpare(anyLong())).thenReturn(10L);
Contributor

This stub may not be necessary. The code only calls getReportedFreeSpaceToSpare (the soft limit); nothing reads getFreeSpaceToSpare.


5 participants