Hey,
Today I was looking at the internal struct that ends up being filled as the result of parsing the Docker Registry configuration, and while doing so I found that the master branch of the repository already supports metrics scraping by Prometheus (see configuration.go), something that used to be available only in OpenShift (see the openshift/origin issue).
What surprised me is that this addition isn't even all that recent:
commit e3c37a46e2529305ad6f5648abd6ab68c777820a
Author: tifayuki <tifayuki@gmail.com>
Date:   Thu Nov 16 16:43:38 2017 -0800

    Add Prometheus Metrics

    at the first iteration, only the following metrics are collected:

      - HTTP metrics of each API endpoint
      - cache counter for request/hit/miss
      - histogram of storage actions, including:
        GetContent, PutContent, Stat, List, Move, and Delete

    Signed-off-by: tifayuki <tifayuki@gmail.com>
How about giving it a try?
So, start by building an image right from the master branch:
# Clone the Docker registry repository
git clone https://github.com/docker/distribution
# Build the registry image using the provided Dockerfile
# that lives right in the root of the project.
#
# Here I tag it as `local` just to make sure that we
# don't confuse it with `registry:latest`.
docker build --tag registry:local .
With the image built using the latest code, tailor a configuration that enables the exporter:
version: 0.1
log:
  level: "debug"
  formatter: "json"
  fields:
    service: "registry"
storage:
  cache:
    blobdescriptor: "inmemory"
  filesystem:
    rootdirectory: "/var/lib/registry"
http:
  addr: ":5000"
  debug:
    addr: ":5001"
    prometheus:
      enabled: true
      path: "/metrics"
  headers:
    X-Content-Type-Options: [ "nosniff" ]
Run the registry with this configuration and note how :5001/metrics will provide you with the metrics you expect.
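To get it running, something like the following should do (assuming the configuration above was saved as ./config.yml; the official registry image reads its configuration from /etc/docker/registry/config.yml):

# Run the image we've just built, overriding the default
# configuration with ours and exposing both the API port
# (5000) and the debug port (5001).
docker run \
  --detach \
  --name registry \
  --publish 5000:5000 \
  --publish 5001:5001 \
  --volume $PWD/config.yml:/etc/docker/registry/config.yml \
  registry:local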
# Make a request to the metrics endpoint and filter the
# output so we can see what the metrics descriptions are.
#
# ps.: each of those metrics ends up expanding to multiple
# dimensions via labels.
curl localhost:5001/metrics --silent | ag registry_ | ag HELP
# HELP registry_http_in_flight_requests The in-flight HTTP requests
# HELP registry_http_request_duration_seconds The HTTP request latencies in seconds.
# HELP registry_http_request_size_bytes The HTTP request sizes in bytes.
# HELP registry_http_requests_total Total number of HTTP requests made.
# HELP registry_http_response_size_bytes The HTTP response sizes in bytes.
# HELP registry_storage_action_seconds The number of seconds that the storage action takes
# HELP registry_storage_cache_total The number of cache request received
Wanting to see how useful these metrics can be, I set up an environment in AWS: an EC2 instance running a registry backed by an S3 bucket as the storage tier (you can read more about how to achieve that here: How to set up a private docker registry using AWS S3).
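For reference, pointing the registry at S3 instead of the local filesystem comes down to swapping the storage driver in the configuration. A minimal sketch (the bucket name and region below are placeholders; credentials may also come from an IAM instance profile, in which case accesskey and secretkey can be omitted):

storage:
  s3:
    region: "us-east-1"
    bucket: "my-registry-bucket"
  cache:
    blobdescriptor: "inmemory"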
With the registry up, it was now a matter of having Prometheus and Grafana running locally so I could start making some queries:
version: '3.3'
services:
  # Create a "pod-like" container that will serve as the
  # network entrypoint for both of the other containers, as
  # well as provide a common ground for them to communicate
  # over localhost (given that they'll share the same
  # network namespace).
  pod:
    container_name: 'pod'
    ports:
      - '9090:9090'
      - '3000:3000'
    image: 'alpine'
    tty: true

  grafana:
    container_name: 'grafana'
    depends_on: [ 'pod' ]
    network_mode: 'service:pod'
    image: 'grafana/grafana:5.2.0-beta3'
    restart: 'always'

  prometheus:
    container_name: 'prometheus'
    depends_on: [ 'pod' ]
    network_mode: 'service:pod'
    image: 'prom/prometheus'
    restart: 'always'
    volumes:
      - './prometheus.yml:/etc/prometheus/prometheus.yml'
Given that prometheus makes use of a local configuration file under ./prometheus.yml, this one looked like the following:
global:
  scrape_interval: '15s'
  evaluation_interval: '15s'
  scrape_timeout: '10s'

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: [ 'localhost:9090' ]

  - job_name: 'registry'
    # Here I just specified the public IP address of the
    # EC2 instance that I had holding the registry at that
    # time.
    #
    # Naturally, you wouldn't keep it open like this.
    static_configs:
      - targets: [ '52.67.104.105:5001' ]
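With both files in place (docker-compose.yml and prometheus.yml side by side), bringing the stack up is the usual:

# Bring Prometheus and Grafana up in the background.
# Once running, Prometheus answers at localhost:9090 and
# Grafana at localhost:3000 (both exposed via the `pod`
# container).
docker-compose up --detach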
Now, to reproduce the panel shown in the beginning, head over to your Grafana dashboard and create a heatmap panel with the following query:
rate(registry_http_request_duration_seconds_bucket{handler="blob_upload"}[10m])
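Given that registry_http_request_duration_seconds is a histogram, the same buckets can also be collapsed into a single latency estimate. As a sketch, a query like the one below should approximate the 95th percentile of blob upload latencies:

histogram_quantile(0.95,
  sum by (le) (
    rate(registry_http_request_duration_seconds_bucket{handler="blob_upload"}[10m])))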
That’s it!
If you have any questions or found something odd, please let me know! I’m cirowrc on Twitter.
Have a good one!