Hey,

While people generally know (and agree) that cAdvisor is the guy in the room to keep track of container metrics, there’s a sort of hidden feature of the docker daemon that people don’t take into account: the daemon by itself can be monitored too - see Collect Docker Metrics with Prometheus.

Grafana Dashboard configured with Docker Daemon metrics

You indeed can determine whether the daemon is running by checking the systemd metrics via cAdvisor (if you’re running the Docker daemon as a systemd service), you can’t know much more than that.

Using the native Docker daemon metrics, you can get much more.

In this post we go through:

How to turn Docker daemon metrics on

To turn the feature on, change your daemon.json (usually at /etc/docker/daemon.json) file to allow experimental features and tell it an address and path to expose the metrics.

For instance:

{
        "expermental": true,
        "metrics-addr": "0.0.0.0:9100"
}

With that done, check out whether it’s indeed exporting the metrics:

curl localhost:9100/metrics

        # HELP builder_builds_failed_total Number of failed image builds
        # TYPE builder_builds_failed_total counter
        builder_builds_failed_total{reason="build_canceled"} 0
        builder_builds_failed_total{reason="build_target_not_reachable_error"} 0
        ...

With the endpoint working, you’re now able to have Prometheus scraping this endpoint.

Illustration of Prometheus scraping Docker targets

Gathering Docker daemon metrics on MacOS

If you’re a Docker-for-mac user, then there’s a little difference.

First, you’ll not have the /etc/docker/daemon.json file - this is all managed by the Docker mac app, which doesn’t prevent you from tweaking the daemon configurations though.

Head over to the preferences pannel:

Location of the Docker-for-mac preferences menu

Once there, head to the Daemon tab and switch to the advanced mode.

Advanced Docker Daemon configuration in Docker for Mac

With that configuration set, you’re now able to properly reach the docker-for-mac daemon metrics: curl localhost:9100 from your mac terminal.

Tailoring a Docker metrics Grafana Dashboard

Once our metric collection is set up, we can now start graphing data from our Prometheus instance.

To demonstrate which kind of metrics we can retrieve, I set up a sample repository (cirocosta/sample-collect-docker-metrics) with some preconfigured dashboards that you can use right away.

There, I configured Prometheus to look for the Docker daemon at host.docker.internal, a DNS name that in a Docker for Mac installation resolves to the IP of the Xhyve VM that Docker is running on (see Networking features in Docker for Mac).

global:
  scrape_interval: '5s'
  evaluation_interval: '5s'

scrape_configs:

  - job_name: 'docker'
    static_configs:
      - targets:
        - 'host.docker.internal:9100'
        labels:
          instance: 'instance-1'

With Prometheus now scraping my Docker for Mac daemon, we can jump to the Grafana Web UI and start creating some dashboards and graphs.

Docker Daemon CPU usage

The one I started with was CPU usage to answer the question “How much CPU usage is the Docker daemon process taking”.

Given that all Prometheus Golang clients report their process CPU usage, we’re able to graph that (just like any other process level metric that are also revealed by Prometheus clients).

From the Prometheus documentation, these are:

Metric Description
process_cpu_seconds_total Total user and system CPU time spent in seconds
process_open_fds Number of open file descriptors
process_max_fds Maximum number of open file descriptors
process_virtual_memory_bytes Virtual memory size in bytes
process_resident_memory_bytes Resident memory size in bytes
process_heap_bytes Process heap size in bytes
process_start_time_seconds Start time of the process since unix epoch in seconds

Knowing that, we can then graph the CPU usage of the dockerd process:

Grafana Dashboard displaying the CPU usage

What the query does is:

  1. it grabs the total CPU seconds spent by the Daemon process, then
  2. calculates the per-second instantaneous rate from the five-minutes vector of CPU seconds.

Docker container start times heatmap

A handy type of metric that we can graph from the daemon metrics is heatmaps. With them, we can spot unusual spikes in actions that the Docker daemon performs.

From the Grafana documentation:

The Heatmap panel allows you to view histograms over time

Since Grafana v5.1, we’ve been able to make use of Heatmaps when gathering metrics from Prometheus instances (see this commit).

Grafana Dashboard displaying container startup times in a Heatmap

All that the query, in this case, is doing is:

  1. gathering all the Prometheus histogram buckets’ values for the container action seconds metric labeled with the “start” action; then
  2. calculating the rate in a one-minute vector.

The second step is necessary to make the data non-commulative, such that we can see at which point the most significant start time happened.

Number of running Docker containers

Another interesting type of metric that we can gather is gauge-like metrics that tell us about how many containers are being supervised by the daemon and in what states they are.

This can indicate how loaded the machine is. Differently from cAdvisor, here we can determine how many containers are indeed running, are still in create state, or even in a pause state.

Grafana Dashboard configured to show the number of containers in each state

Closing thoughts

By having the Docker daemon exporting its own metrics, we’re able to enhance even more the observability of the system when used in conjunction with cAdvisor.

While cAdvisor gives us the in-depth look of how the containers consume the machine resources, the Daemon metrics tells us how the daemon itself is consuming the machine resources and how well it’s performing what it should do - manage containers.

There are certainly more interesting metrics that you can gather, but I hope these ones will give you an idea of what’s possible.

If you enjoyed the article, please share it! I’d appreciate a lot!

You can also get related content by signing up to the newsletter below.

I’m also available for questions and feedback - hit me at @cirowrc any time.

Have a good one!

Ciro