How Prometheus scraping works, and how to find the target node and get the metrics files

Below is an overview of how Prometheus scraping works, how Prometheus discovers (“finds”) targets in Kubernetes or other environments, and how it retrieves metrics from those targets.

1. Prometheus Scraping Fundamentals

Pull-Based Model
- Prometheus uses a pull model: it periodically sends HTTP GET requests (scrapes) to endpoints (targets) that expose metrics in the Prometheus text exposition format or the OpenMetrics format.
- By default, Prometheus requests the /metrics path, so a typical scrape URL looks like http://<host>:<port>/metrics.
Prometheus Configuration (prometheus.yml)
- Prometheus’ behavior is controlled by a YAML config file (often named prometheus.yml).
- This config includes one or more scrape_configs sections. Each scrape_config defines how Prometheus discovers targets and where it scrapes them from.

Example snippet of a scrape_config:

scrape_configs:
  - job_name: 'example-service'
    kubernetes_sd_configs:
      - role: service
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        regex: (.+):(?:\d+);(\d+)
        replacement: $1:$2
        target_label: __address__

job_name: The name of the scrape job.
kubernetes_sd_configs: Uses Kubernetes service discovery to find services.
relabel_configs: Filters or transforms discovered targets to the correct address/port/path for scraping.
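
To make the relabeling step concrete, here is a minimal Python sketch that mimics how the three rules above would transform one discovered Service. It is an illustration, not Prometheus' actual implementation: apply_relabel and the sample label values (such as my-app.default.svc:80) are hypothetical, and only the keep and replace actions are modelled. It relies on two documented behaviors: source label values are joined with ";" and the regex must match the whole joined string.

```python
import re

# The three rules from the scrape_config above, written as plain dicts.
PREFIX = "__meta_kubernetes_service_annotation_prometheus_io_"
RULES = [
    {"source_labels": [PREFIX + "scrape"], "action": "keep", "regex": "true"},
    {"source_labels": [PREFIX + "path"], "action": "replace",
     "target_label": "__metrics_path__", "regex": "(.+)"},
    {"source_labels": ["__address__", PREFIX + "port"], "action": "replace",
     "regex": r"(.+):(?:\d+);(\d+)", "replacement": "$1:$2",
     "target_label": "__address__"},
]


def apply_relabel(labels, rules):
    """Return the relabeled label set, or None if a keep rule drops the target."""
    labels = dict(labels)
    for rule in rules:
        # Missing labels count as empty strings; values are joined with ";".
        value = ";".join(labels.get(name, "") for name in rule["source_labels"])
        match = re.fullmatch(rule.get("regex", "(.*)"), value)
        action = rule.get("action", "replace")
        if action == "keep":
            if match is None:
                return None
        elif action == "replace" and match is not None:
            # Translate Prometheus-style $1 references into Python's \g<1>.
            template = re.sub(r"\$(\d+)", r"\\g<\1>", rule.get("replacement", "$1"))
            labels[rule["target_label"]] = match.expand(template)
    return labels


# Hypothetical labels for one discovered Service (values are made up).
discovered = {
    "__address__": "my-app.default.svc:80",
    PREFIX + "scrape": "true",
    PREFIX + "path": "/metrics",
    PREFIX + "port": "8080",
}

result = apply_relabel(discovered, RULES)
print(result["__address__"], result["__metrics_path__"])
```

A Service without the prometheus.io/scrape annotation produces an empty value for the first rule, fails the keep match, and is dropped before the later rules run.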

Scrape Interval
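- Each job has a scrape_interval, inherited from the global setting unless the job overrides it. The built-in global default is 1 minute; the sample prometheus.yml shipped with Prometheus sets it to 15 seconds. Prometheus queries each discovered target at that interval.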
No “Metrics Files”
- Prometheus doesn’t fetch “metrics files” in the sense of logs on disk. It sends HTTP GET requests to the target’s /metrics (or another path) endpoint, which returns the metrics in text format (the Prometheus exposition format).
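
The point that metrics are generated on request rather than read from a file can be seen with a small, self-contained Python sketch. It starts a toy HTTP server on localhost (a stand-in for an exporter, not a real one) and requests /metrics twice, as Prometheus would on two consecutive scrapes. The metric name demo_scrapes_total and the helper names are made up for this example.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class MetricsHandler(BaseHTTPRequestHandler):
    scrapes = 0  # in-memory state only; nothing is written to disk

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        MetricsHandler.scrapes += 1
        # The payload is rendered at request time from the current state.
        body = (
            "# HELP demo_scrapes_total Number of times /metrics was requested.\n"
            "# TYPE demo_scrapes_total counter\n"
            f"demo_scrapes_total {MetricsHandler.scrapes}\n"
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def scrape(url):
    """Perform one scrape: a plain HTTP GET that returns the metrics text."""
    # An empty ProxyHandler makes sure the localhost request bypasses any proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        return response.read().decode("utf-8")


def demo():
    """Start the toy exporter on a free port and scrape it twice."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/metrics"
    try:
        return scrape(url), scrape(url)
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns two payloads whose demo_scrapes_total values differ by one, because each response is built when the request arrives.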

2. How Prometheus Finds Targets

A. Static Configuration (Non-Kubernetes)

For basic setups (e.g., dev or PoC), you can hardcode targets:

scrape_configs:
  - job_name: 'static_example'
    static_configs:
      - targets: ['192.168.1.10:9100', '192.168.1.11:9100']

Prometheus will scrape each of those targets at the given host:port. Because this job sets neither scheme nor metrics_path, it uses the defaults: http and /metrics.
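
How a static target turns into a scrape URL can be sketched in a few lines of Python. The function name scrape_urls is hypothetical, and the job is represented as a plain dict rather than parsed from YAML. The only facts it relies on are the documented defaults: scheme is http and metrics_path is /metrics unless the job overrides them.

```python
def scrape_urls(job):
    """Build the URL Prometheus would request for each static target of a job."""
    scheme = job.get("scheme", "http")            # documented default
    path = job.get("metrics_path", "/metrics")    # documented default
    return [
        f"{scheme}://{target}{path}"
        for group in job["static_configs"]
        for target in group["targets"]
    ]


# The static_example job from above, as a dict.
job = {
    "job_name": "static_example",
    "static_configs": [{"targets": ["192.168.1.10:9100", "192.168.1.11:9100"]}],
}

for url in scrape_urls(job):
    print(url)
```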

B. Service Discovery (Kubernetes, EC2, Consul, etc.)

Kubernetes Service Discovery
- In a Kubernetes cluster, Prometheus can use the Kubernetes API to dynamically discover pods/services/endpoints.
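- Common approaches:
  - role: service: Discover Services (one target per Service port).
  - role: endpoints: Discover the individual pod addresses behind each Service.
  - role: pod: Discover pods directly.
  - role: node: Discover cluster nodes (one target per node).
  - ServiceMonitor / PodMonitor objects if using the Prometheus Operator.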
Annotations in Kubernetes
- A common pattern is to annotate Services or Pods:
  - prometheus.io/scrape: "true"
  - prometheus.io/port: "8080"
  - prometheus.io/path: "/metrics"
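- These annotations are a convention, not something Prometheus interprets on its own. They only take effect because relabel_configs keep the targets whose prometheus.io/scrape annotation is "true" and copy the annotated port and path into the scrape address and metrics path.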
Other Service Discovery
- Prometheus also supports EC2, Azure, GCE, Consul, etc.
- Each discovery mechanism has its own config block (ec2_sd_configs:, consul_sd_configs:, etc.).

3. Scraping Flow in Kubernetes

Prometheus Queries the K8s API
- Using the credentials provided (often via in-cluster config if you run Prometheus inside K8s), Prometheus queries the Kubernetes API to list Pods, Services, or Endpoints.
Relabeling
- The discovered targets have metadata (like labels, annotations).
- Via relabel_configs, Prometheus transforms or filters this metadata to determine the final scrape endpoint (i.e., IP:port and path).
HTTP GET to /metrics
- On each scrape interval, Prometheus sends an HTTP GET request to each valid target.
- The target responds with a plaintext payload containing one line per sample, for example node_cpu_seconds_total{cpu="0",mode="idle"} 1000.
Prometheus Ingests & Stores
- Prometheus parses the returned data and stores the time series in its internal TSDB (time-series database).
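
To show what "parsing the returned data" involves, here is a deliberately simplified Python sketch that turns a text-format payload into (name, labels, value) tuples. It is not Prometheus' parser (which is written in Go and handles far more): parse_samples is a hypothetical helper that ignores timestamps, skips comment lines, and does not handle escaped quotes or "}" inside label values.

```python
import re

# metric_name{label="value",...} value   (labels are optional)
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(payload):
    """Extract (name, labels, value) from each sample line of a scrape payload."""
    samples = []
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and # HELP / # TYPE comments carry no samples
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything this simplified grammar does not cover
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


payload = (
    "# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.\n"
    "# TYPE node_cpu_seconds_total counter\n"
    'node_cpu_seconds_total{cpu="0",mode="idle"} 1000\n'
    "up 1\n"
)

for name, labels, value in parse_samples(payload):
    print(name, labels, value)
```

Each tuple corresponds to one time series sample: the metric name plus its label set identifies the series, and the value is what gets appended at the scrape timestamp.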

4. Verifying Which Targets Are Scraped

Prometheus Web UI
- Access the Prometheus web UI (e.g., http://<prometheus-host>:9090).
- Go to Status -> Targets.
- You’ll see a list of all targets Prometheus is currently scraping, their job name, last scrape time, and scrape status.
Debugging Discovery
- In the web UI, go to Status -> Service Discovery.
- This shows the labels each target was discovered with (for example, from the Kubernetes API) before relabeling, alongside the labels it ends up with after relabeling.
- You can see which pods/services are being discovered and how they are labeled.
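
The information on the Targets page is also available from Prometheus' HTTP API at /api/v1/targets, which is handy for scripting. The Python sketch below is one way to summarize that response. The function names are hypothetical, the base URL is whatever your Prometheus listens on, and the sample response is hand-written for illustration (trimmed to the fields used here), not captured from a real server.

```python
import json
import urllib.request


def fetch_targets(base_url):
    """GET <base_url>/api/v1/targets and return the decoded JSON document."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


def summarize_targets(response):
    """Return (job, scrape URL, health, last error) for every active target."""
    return [
        (
            target["labels"].get("job", ""),
            target["scrapeUrl"],
            target["health"],
            target.get("lastError", ""),
        )
        for target in response["data"]["activeTargets"]
    ]


# Hand-written sample in the shape of the API response (illustrative values).
sample_response = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "example-service", "instance": "10.0.0.5:8080"},
                "scrapeUrl": "http://10.0.0.5:8080/metrics",
                "health": "up",
                "lastError": "",
            }
        ],
        "droppedTargets": [],
    },
}

for row in summarize_targets(sample_response):
    print(row)

# Against a live server, assuming it is reachable at this address:
#   summarize_targets(fetch_targets("http://<prometheus-host>:9090"))
```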

5. How to “Find the Target Node and Get the Metrics”

Kubernetes (Node)
- If you want node metrics, you often run Node Exporter as a DaemonSet.
  - This exporter runs on each node (so each node is a target).
  - The node exporter typically exposes metrics on port 9100 at the /metrics path.
- Alternatively, you can scrape kubelet’s cAdvisor endpoint to get container-level metrics.
Pods and Services
- If your microservice is instrumented with a Prometheus client library and you expose /metrics, Prometheus can discover and scrape that endpoint.
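- You don’t need to locate the node yourself: Kubernetes service discovery hands Prometheus the Pod IP (or the node IP, for node-level targets) to scrape. With role: pod, the node a pod runs on is exposed as the __meta_kubernetes_pod_node_name label, which relabeling can copy onto the scraped series.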
- You can see in the Prometheus “Targets” page exactly which IP and port it’s scraping.
Raw “Metrics File”
- You can fetch the raw metrics text from any target yourself by running curl http://<target-ip>:<port>/metrics.
- This is not stored as a file on the node by default. It’s generated dynamically when you hit the /metrics endpoint.

6. Example (Kubernetes Service Annotation)

Let’s say you have a Service manifest like:

apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: default
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
spec:
  selector:
    app: my-app
  ports:
    - port: 8080
      targetPort: 8080

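Prometheus sees the annotation prometheus.io/scrape: "true" and keeps the target.
With role: service (as in the scrape_config in section 1), it scrapes the Service’s cluster DNS address on port 8080 at the /metrics path, so each scrape reaches whichever pod the Service routes it to. To scrape every pod behind the Service individually, use role: endpoints with the same annotation-based relabeling.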
You can verify this in the Prometheus web UI under Status -> Targets (look for my-app).

7. Key Takeaways

The scrape_configs in prometheus.yml define how targets are found and how often they are scraped.
Kubernetes service discovery (with optional annotations) automates target discovery in a cluster.
Prometheus doesn’t pull metrics files from the file system; it makes an HTTP GET request to each target’s /metrics endpoint every scrape interval.
Check the Prometheus UI under Status -> Targets or Status -> Service Discovery to see what endpoints are being scraped and how they’re labeled.

Once scrape_configs is set up properly in prometheus.yml (or ServiceMonitor/PodMonitor objects are defined for the Prometheus Operator), Prometheus automatically finds the node, pod, and service endpoints in Kubernetes and scrapes the metrics they expose.