Docker Registry blob upload latency heatmap


Today I was looking at the internal struct that gets filled when the Docker Registry configuration is parsed, and while doing so I found that the master branch of the repository already supports metrics scraping by Prometheus (see configuration.go), something that used to be available only in OpenShift (see the openshift/origin issue).

What surprised me is that this addition isn't all that recent:

commit e3c37a46e2529305ad6f5648abd6ab68c777820a
Author: tifayuki <>
Date:   Thu Nov 16 16:43:38 2017 -0800

    Add Prometheus Metrics
    at the first iteration, only the following metrics are collected:
      - HTTP metrics of each API endpoint
      - cache counter for request/hit/miss
      - histogram of storage actions, including:
        GetContent, PutContent, Stat, List, Move, and Delete
    Signed-off-by: tifayuki <>

What about giving it a try?

So, first, start by building an image right from the master branch:

# Clone the Docker registry repository
git clone https://github.com/docker/distribution.git
cd distribution

# Build the registry image using the provided Dockerfile
# that lives right in the root of the project.
# Here I tag it as `local` just to make sure that we
# don't confuse it with `registry:latest`. 
docker build --tag registry:local .

With the image built using the latest code, tailor a configuration that enables the exporter:

version: 0.1
log:
  level: "debug"
  formatter: "json"
  fields:
    service: "registry"
storage:
  cache:
    blobdescriptor: "inmemory"
  filesystem:
    rootdirectory: "/var/lib/registry"
http:
  addr: ":5000"
  debug:
    addr: ":5001"
    prometheus:
      enabled: true
      path: "/metrics"
  headers:
    X-Content-Type-Options: [ "nosniff" ]

Run the registry with this configuration and note how :5001/metrics will provide you with the metrics you expect.
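As a sketch of that step, assuming the configuration above was saved as ./config.yml (the filename is my choice, not from the original post), running the freshly built image could look like this:

```shell
# Run the registry built from master, mounting our configuration
# over the image's default config path and publishing both the
# API port (5000) and the debug/metrics port (5001).
docker run \
  --detach \
  --name registry \
  --publish 5000:5000 \
  --publish 5001:5001 \
  --volume "$(pwd)/config.yml:/etc/docker/registry/config.yml" \
  registry:local
```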

# Make a request to the metrics endpoint and filter the
# output so we can see what the metrics descriptions are.
# ps.: each of those metrics end up expanding to multiple
# dimensions via labels.
curl localhost:5001/metrics --silent | ag registry_ | ag HELP

# HELP registry_http_in_flight_requests The in-flight HTTP requests
# HELP registry_http_request_duration_seconds The HTTP request latencies in seconds.
# HELP registry_http_request_size_bytes The HTTP request sizes in bytes.
# HELP registry_http_requests_total Total number of HTTP requests made.
# HELP registry_http_response_size_bytes The HTTP response sizes in bytes.
# HELP registry_storage_action_seconds The number of seconds that the storage action takes
# HELP registry_storage_cache_total The number of cache request received
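As an aside, those `# HELP` lines are part of the Prometheus text exposition format, where every metric family is preceded by `# HELP <name> <description>` and `# TYPE <name> <type>` lines. A minimal Python sketch (mine, not from the registry codebase) that extracts the same descriptions without curl/ag:

```python
# Extract "# HELP" descriptions from Prometheus text exposition output.
sample = """\
# HELP registry_http_requests_total Total number of HTTP requests made.
# TYPE registry_http_requests_total counter
registry_http_requests_total{code="200",handler="base",method="get"} 5
# HELP registry_storage_action_seconds The number of seconds that the storage action takes
# TYPE registry_storage_action_seconds histogram
registry_storage_action_seconds_bucket{action="GetContent",le="0.01"} 3
"""

def help_lines(exposition: str) -> dict:
    """Map metric name -> description, taken from `# HELP` lines."""
    out = {}
    for line in exposition.splitlines():
        if line.startswith("# HELP "):
            name, _, desc = line[len("# HELP "):].partition(" ")
            out[name] = desc
    return out

for name, desc in help_lines(sample).items():
    print(f"{name}: {desc}")
```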

Wanting to see how useful these metrics can be, I set up an environment in AWS with an EC2 instance running the registry, backed by an S3 bucket as the storage tier (you can read more about how to achieve that here: How to set up a private docker registry using AWS S3).

With the registry up, it was now a matter of having Prometheus and Grafana running locally so I could start making some queries:

version: '3.3'

services:
  # Create a "pod-like" container that serves both as the network
  # entrypoint for the other containers and as common ground for
  # them to communicate over localhost (given that they'll share
  # the same network namespace).
  pod:
    container_name: 'pod'
    ports:
      - '9090:9090'
      - '3000:3000'
    image: 'alpine'
    tty: true

  grafana:
    container_name: 'grafana'
    depends_on: [ 'pod' ]
    network_mode: 'service:pod'
    image: 'grafana/grafana:5.2.0-beta3'
    restart: 'always'

  prometheus:
    container_name: 'prometheus'
    depends_on: [ 'pod' ]
    network_mode: 'service:pod'
    image: 'prom/prometheus'
    restart: 'always'
    volumes:
      - './prometheus.yml:/etc/prometheus/prometheus.yml'
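Assuming the file above is saved as docker-compose.yml next to the prometheus.yml it mounts, bringing the pair up is just:

```shell
# Start pod, grafana, and prometheus in the background.
# Prometheus becomes reachable at localhost:9090 and Grafana
# at localhost:3000, both through the `pod` container's ports.
docker-compose up -d
```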

Given that Prometheus mounts a local configuration file from ./prometheus.yml, that file looked like the following:

global:
  scrape_interval: '15s'
  evaluation_interval: '15s'
  scrape_timeout: '10s'

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: [ 'localhost:9090' ]

  - job_name: 'registry'
    # Here I just specified the public IP address of the
    # EC2 instance that I had holding the registry at that
    # time.
    # Naturally, you wouldn't keep it open like this.
    static_configs:
      - targets: [ '' ]
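Before pointing containers at it, the file can be sanity-checked with promtool (which ships with the Prometheus release tarball; I'm assuming it's installed locally):

```shell
# Validate the Prometheus configuration file; exits non-zero
# if the YAML or any scrape_config is malformed.
promtool check config prometheus.yml
```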

Now, to reproduce the panel shown at the beginning, head over to the Grafana dashboard and create a heatmap panel.
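The exact query from the original panel isn't preserved here; as an assumption on my part (in particular, the `handler` label value), a query along these lines produces a blob-upload latency heatmap out of the request-duration histogram:

```
sum(rate(registry_http_request_duration_seconds_bucket{handler="blob_upload"}[5m])) by (le)
```

In the panel settings, set the data format to "Time series buckets" so Grafana treats the `le` label as the buckets' upper bounds.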


That’s it!

If you have any questions or found something odd, please let me know! I’m cirowrc on Twitter.

Have a good one!