Below is a detailed overview of how to collect metrics at different layers in a Kubernetes cluster—covering container-level, pod-level, node-level, and overall cluster-level metrics. We’ll focus on the most common, open-source approaches, although there are many commercial or cloud-specific variants that work similarly.
1. Collecting Container and Pod Metrics
cAdvisor (Container Advisor)
- What It Is: A daemon that collects resource usage and performance characteristics of running containers.
- Where It Runs: Typically embedded inside the Kubernetes kubelet process on each node.
- Metrics Collected: CPU usage, memory usage, network I/O, filesystem I/O per container and pod.
- Accessing Metrics: Exposed on the kubelet endpoint, e.g. `https://<node-ip>:10250/metrics/cadvisor` (secure port) or `http://<node-ip>:10255/metrics/cadvisor` (the legacy read-only port, disabled by default on recent clusters).
Kubernetes uses cAdvisor under the hood, so you usually don’t install it separately—it’s already built into the kubelet. These cAdvisor metrics are then scraped by a metrics collector (e.g., Prometheus).
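For a sense of what these look like, here are a few representative cAdvisor metric families in Prometheus exposition format (the metric names are real; the label values are illustrative):

```text
container_cpu_usage_seconds_total{namespace="default",pod="web-0",container="app"} 1234.56
container_memory_working_set_bytes{namespace="default",pod="web-0",container="app"} 134217728
container_network_receive_bytes_total{namespace="default",pod="web-0"} 987654
```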
Prometheus Scraping
- Prometheus is commonly used to scrape cAdvisor metrics.
- How It Works:
- You install the Prometheus Operator or a standalone Prometheus instance in the cluster.
- You configure Prometheus to scrape the kubelet’s cAdvisor metrics endpoint (and other endpoints).
- Collected Data: CPU, memory, disk, network usage for containers/pods.
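As a sketch, a hand-written Prometheus scrape configuration for the kubelet's cAdvisor endpoint might look like the following. It assumes Prometheus runs in-cluster with a service account allowed to reach node metrics; the Prometheus Operator (covered below) generates an equivalent config for you:

```yaml
scrape_configs:
  - job_name: kubelet-cadvisor
    scheme: https
    metrics_path: /metrics/cadvisor
    kubernetes_sd_configs:
      - role: node                 # discover every node's kubelet (port 10250 by default)
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true   # assumption: kubelet serving certs are not signed by the cluster CA
```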
Metrics Server (For HPA)
- What It Is: A lightweight, cluster-wide aggregator of resource usage data.
- Primary Use Case: Used by Kubernetes’ Horizontal Pod Autoscaler (HPA) to scale workloads based on CPU/memory usage.
- Data Source: Fetches metrics from the kubelets (cAdvisor), then makes them available via the `metrics.k8s.io` API; this is also what powers `kubectl top nodes` and `kubectl top pods`.
- Limitations: Designed for autoscaling, not for long-term storage or advanced analytics.
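To make the HPA use case concrete, here is a minimal autoscaler that relies on Metrics Server data (the Deployment name `web` is hypothetical):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU crosses 70%
```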
2. Collecting Node Metrics
Kubelet’s /metrics Endpoint
- What It Is: The kubelet itself exposes node metrics (e.g., CPU/memory usage of the node, runtime stats).
- Where to Find:
  - `https://<node-ip>:10250/metrics` (secure endpoint)
  - `http://<node-ip>:10255/metrics` (legacy read-only endpoint, disabled by default on recent clusters)
  - Also reachable through the API server proxy, e.g. `kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics"`
- Collected Data: Node-wide CPU usage, memory usage, runtime container stats (via cAdvisor integration).
Node Exporter (Prometheus)
- What It Is: A Prometheus exporter that collects Linux system-level metrics.
- How to Deploy: Typically deployed as a DaemonSet so that every node runs a Node Exporter container.
- Collected Data: CPU, memory, disk usage, file system stats, network, etc., at the node level.
- Scraping: Prometheus scrapes the Node Exporter endpoints, adding those metrics to the time-series database.
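In practice most people install Node Exporter via a Helm chart (it is bundled in kube-prometheus-stack, covered below), but a stripped-down DaemonSet sketch looks roughly like this (the image tag is an assumption; pin whatever version you actually run):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true            # expose port 9100 on the node's own IP
      hostPID: true
      containers:
        - name: node-exporter
          image: quay.io/prometheus/node-exporter:v1.8.1   # assumed version tag
          args:
            - --path.rootfs=/host  # read host filesystem stats via the mount below
          ports:
            - containerPort: 9100
          volumeMounts:
            - name: rootfs
              mountPath: /host
              readOnly: true
      volumes:
        - name: rootfs
          hostPath:
            path: /
```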
3. Collecting Cluster Metrics & State
kube-state-metrics (KSM)
- What It Is: A component that listens to the Kubernetes API and generates metrics about cluster objects.
- Examples of Metrics:
- Number of desired/available replicas in Deployments, DaemonSets, StatefulSets
- Pod status, job status, node status
- Resource quotas, limits, requests
- How to Deploy: Install via Helm chart or YAML manifest. Usually deployed as a single Deployment, which listens to the API server.
- Scraping: Prometheus scrapes the `/metrics` endpoint of kube-state-metrics to retrieve cluster-level metrics.
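A few representative kube-state-metrics series (the metric names are real families; the label values are illustrative):

```text
kube_deployment_status_replicas_available{namespace="default",deployment="web"} 3
kube_pod_status_phase{namespace="default",pod="web-0",phase="Running"} 1
kube_node_status_condition{node="node-1",condition="Ready",status="true"} 1
```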
Control Plane Metrics (API Server, Scheduler, Controller Manager)
- API Server: Exposes metrics on the secure port, `:6443/metrics`.
- Scheduler: Exposes metrics on its own port (`:10259/metrics` on current versions; the old insecure `:10251` has been removed).
- Controller Manager: Exposes metrics on `:10257/metrics` on current versions (formerly the insecure `:10252`).
- Scraping: Configure Prometheus to scrape these endpoints; this often requires RBAC and service discovery settings to allow secure scraping.
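For example, the API server is commonly scraped through the `kubernetes` Service in the `default` namespace. A hand-written stanza might look like this (the Operator's ServiceMonitors replace this kind of config):

```yaml
scrape_configs:
  - job_name: kube-apiserver
    scheme: https
    kubernetes_sd_configs:
      - role: endpoints
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    relabel_configs:
      # keep only the default/kubernetes Service's https endpoints
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
```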
4. Putting It All Together with Prometheus
A common and recommended way to gather Kubernetes metrics at all levels (containers, pods, nodes, and cluster objects) is:
- Prometheus Operator:
- Manages Prometheus and Alertmanager instances through CRDs (Prometheus, Alertmanager, ServiceMonitor, PodMonitor).
- Automatically discovers Kubernetes services (including kubelet, cAdvisor, kube-state-metrics, Node Exporter) based on labels or annotations.
- Components to Install:
- Prometheus (for scraping all metrics).
- Node Exporter (usually as a DaemonSet).
- kube-state-metrics (as a Deployment).
- (Optional) Metrics Server (for HPA functionality).
- Scrape Configurations:
  - ServiceMonitor and PodMonitor CRDs tell Prometheus which endpoints to scrape and on which ports.
  - For example, a ServiceMonitor might point at the kubelet's port 10250 for cAdvisor data (see the sketch after this list).
- Storage & Retention:
- Prometheus has an internal time-series database.
- For longer-term storage or large-scale clusters, use Thanos, Cortex, or Mimir to extend Prometheus’ capabilities.
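Here is a ServiceMonitor sketch for the kubelet's cAdvisor endpoint. kube-prometheus-stack ships a more complete version; the label selector below assumes the kubelet Service that the Prometheus Operator creates in `kube-system`:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet-cadvisor
  namespace: monitoring
spec:
  endpoints:
    - port: https-metrics          # named port on the kubelet Service (10250)
      scheme: https
      path: /metrics/cadvisor
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        insecureSkipVerify: true   # assumption: kubelet certs not signed by the cluster CA
  namespaceSelector:
    matchNames: [kube-system]
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet   # assumes the Operator-managed kubelet Service
```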
5. Visualization & Dashboards
Once you have Prometheus collecting container, pod, node, and cluster metrics, you can visualize them:
- Grafana:
- Very common with Prometheus.
- Community dashboards for Kubernetes are available out of the box (cluster overview, node metrics, pod resource usage, etc.).
- Additional dashboards available for kube-state-metrics, cAdvisor, Node Exporter, etc.
- Splunk Observability, Elastic Stack, Datadog, etc.:
- You can forward Prometheus data (via OpenTelemetry Collector or Prometheus Remote Write) to these platforms.
- Each platform typically provides dashboards and alerting for Kubernetes metrics.
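As an example of forwarding, a Prometheus `remote_write` stanza looks like this (the endpoint URL and credentials are placeholders; each vendor documents its own):

```yaml
remote_write:
  - url: https://metrics.example.com/api/v1/write   # placeholder endpoint
    basic_auth:
      username: prom-forwarder                      # placeholder credentials
      password_file: /etc/prometheus/secrets/remote-write-password
```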
6. Example Deployment Steps (Prometheus Stack)
Here’s a simplified example workflow using Helm:
- Add Repo & Install Prometheus Stack:
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install the kube-prometheus-stack (includes Prometheus, Alertmanager, Grafana,
# Node Exporter, kube-state-metrics, etc.)
helm install my-prom-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace
```
- Confirm Pods:
```bash
kubectl get pods -n monitoring
```

You should see pods like:
- Prometheus server
- Alertmanager
- Node Exporter (DaemonSet)
- kube-state-metrics
- Grafana
- Access Grafana:
  - By default, the Helm chart creates a `Service` for Grafana.
  - You can port-forward to it and log in, or expose it through an Ingress:

```bash
kubectl port-forward svc/my-prom-stack-grafana 3000:80 -n monitoring
```

Then open http://localhost:3000.
- Dashboards:
- Grafana has built-in “Kubernetes / Compute Resources” dashboards when using the kube-prometheus-stack.
- You can also import community dashboards from Grafana.com.
7. Additional Best Practices
- RBAC & Security:
  - Secure access to kubelet metrics (`/metrics/cadvisor`).
  - Use TLS where needed, along with appropriate certificates.
  - Restrict who can query your metrics endpoints.
- Limit Over-Collection:
- High-frequency scraping can lead to large data volumes and performance overhead.
- Consider adjusting scrape intervals or sampling strategies.
- Resource Requests & Limits:
- Ensure the Prometheus server has enough CPU/memory to handle the ingestion load.
- Tune retention time, storage volume size, and any remote-write configuration (see the Prometheus CRD sketch after this list).
- High Availability:
- Run multiple Prometheus replicas if you need HA.
- Tools like Thanos or Cortex can replicate data across multiple Prometheus instances.
- Extend with Logs & Tracing:
- For a full observability stack, add log aggregation (e.g., Fluentd, Loki, Splunk) and distributed tracing (e.g., Jaeger, OpenTelemetry).
- This helps correlate metrics with logs and traces for faster root cause analysis.
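Several of these knobs (scrape interval, retention, resources, replicas) live on the Prometheus Operator's `Prometheus` CRD. A sketch with assumed values:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: my-prom
  namespace: monitoring
spec:
  replicas: 2             # HA pair; deduplicate downstream with Thanos or similar
  scrapeInterval: 30s     # lower frequency = less data volume
  retention: 15d          # local TSDB retention before data is dropped
  resources:
    requests:
      cpu: "1"
      memory: 4Gi
```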
In Summary
- cAdvisor (built into the kubelet) collects container-level CPU, memory, network, and filesystem metrics.
- Node Exporter provides OS-level metrics from each node.
- kube-state-metrics exposes cluster resource and object metrics.
- Metrics Server is essential for the Horizontal Pod Autoscaler.
- Prometheus (scraping) + Grafana (dashboards) is the most common open-source solution.
By installing these components (often packaged together with the Prometheus Operator or the kube-prometheus-stack Helm chart), you’ll have a comprehensive view of container, pod, node, and cluster-level metrics in Kubernetes, all accessible for visualization and alerting.