Inspecting Docker images without pulling them

Hey,

Depending on what you’re building, you may need to inspect a Docker image that lives in a registry without being able to afford pulling it.

It turns out that there’s an API that allows exactly that - be it on DockerHub or on a private registry.

The Docker Registry HTTP API is the protocol to facilitate distribution of images to the docker engine. It interacts with instances of the docker registry, which is a service to manage information about docker images and enable their distribution.

Testing it locally

The first step to test it locally is bringing up a registry from the library/registry image.

# Run a container named `registry` in the background.
# Map the internal port 5000 to the host on port 5000.
docker run \
        --detach \
        --name registry \
        --publish '5000:5000' \
        registry
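
Before pushing anything, it can be worth confirming that the container answers on the API’s base endpoint. The registry API specification describes `GET /v2/` as a version check: a `200` means the V2 API is available, a `401` means it is available but wants authentication. Below is a minimal sketch of such a probe; the function names `describe_probe` and `probe_registry` are made up for this example, and plain HTTP is assumed because that is what the local test registry serves.

```shell
#!/bin/bash

# Hypothetical helper: translate the status code of `GET /v2/`
# into a human-readable message.
describe_probe() {
  local status=$1

  case "$status" in
    200) echo "registry speaks the V2 API" ;;
    401) echo "registry speaks the V2 API but requires authentication" ;;
    404) echo "no V2 API at this address" ;;
    *)   echo "unexpected status: $status" ;;
  esac
}

# Hypothetical helper: perform the probe against a registry
# address such as `localhost:5000` (plain HTTP is an assumption
# that matches the local test registry).
probe_registry() {
  local address=$1
  local status

  # Discard the body and keep only the HTTP status code.
  status=$(curl --silent --output /dev/null \
    --write-out '%{http_code}' "http://$address/v2/")

  describe_probe "$status"
}
```

Calling `probe_registry localhost:5000` once the container is up is a quick way to tell a registry that is not listening apart from a push that failed for some other reason.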

Check that it’s definitely working:

# Fetch an image from an external registry.
# Because we're not explicitly specifying a
# registry address it's going to use dockerhub's
# address.
# For instance, we could use registry-1.docker.io/library/busybox
# instead of just `busybox` (the `library` repository is
# another default value).
docker pull busybox

# Tag the image we just fetched to now be named
# `localhost:5000/test`.
docker tag busybox localhost:5000/test

# Push the image. 
# Because we have explicitly specified a registry
# in the name of the image, it'll be pushed to the
# registry at the given address (localhost:5000)
docker push localhost:5000/test

The push refers to repository [localhost:5000/busybox]
0271b8eebde3: Pushed 
latest: digest: sha256:91ef6c1c52b166be02645b8efee30d1ee65362024f7da41c404681561734c465 size: 527
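
The push can also be confirmed through the API itself. The specification documents a `GET /v2/<name>/tags/list` endpoint that lists the tags of a repository. Here is a small sketch of how that could look; `tags_url` and `list_tags` are names invented here, and plain HTTP is again assumed for the local registry.

```shell
#!/bin/bash

# Hypothetical helper: build the URL of the tag listing endpoint
# for a repository (e.g. `test`) on a registry (e.g. `localhost:5000`).
tags_url() {
  local address=$1
  local image=$2

  echo "http://$address/v2/$image/tags/list"
}

# Hypothetical helper: print the tags of a repository, one per line.
# `--fail` makes curl return a non-zero status on HTTP errors
# instead of handing an error document to jq.
list_tags() {
  local address=$1
  local image=$2

  curl --silent --fail "$(tags_url "$address" "$image")" |
    jq -r '.tags[]'
}
```

Running `list_tags localhost:5000 test` against the registry from the previous step should list the tag that was just pushed.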

Having the local registry working, we can move to the script that inspects images right from the registry metadata. The following script contains all that it takes to retrieve it and, besides bash, relies only on curl and jq.

#!/bin/bash

set -o errexit

# Address of the registry that we'll be 
# performing the inspections against.
# This is necessary as the arguments we
# supply to the API calls don't include 
# such address (the address is used in the
# url itself).
readonly REGISTRY_ADDRESS="${REGISTRY_ADDRESS:-localhost:5000}"


# Entry point of the script.
# It makes sure that the user supplied the right
# amount of arguments (image_name and image_tag)
# and then performs the main workflow:
#       1.      retrieve the image digest
#       2.      retrieve the configuration for
#               that digest.
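main() {
  check_args "$@"

  local image=$1
  local tag=$2
  local digest

  # Assign separately from `local` so that a failure in
  # get_digest is not masked and `errexit` can act on it.
  digest=$(get_digest "$image" "$tag")

  get_image_configuration "$image" "$digest"
}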


# Makes sure that we provided (from the cli) 
# enough arguments.
check_args() {
  if (($# != 2)); then
    echo "Error:
    Two arguments must be provided - $# provided.

    Usage:
      ./get-image-config.sh <image> <tag>

Aborting."
    exit 1
  fi
}


# Retrieves the digest of the image configuration
# for a specific image tag, that is, the content
# address of the configuration blob that the tag's
# manifest points to (see more at
# https://docs.docker.com/registry/spec/api/#content-digests).
#
# You can know more about the endpoint used at
# https://docs.docker.com/registry/spec/api/#pulling-an-image-manifest
get_digest() {
  local image=$1
  local tag=$2

  echo "Retrieving image digest.
    IMAGE:  $image
    TAG:    $tag
  " >&2

  curl \
    --silent \
    --header "Accept: application/vnd.docker.distribution.manifest.v2+json" \
    "http://$REGISTRY_ADDRESS/v2/$image/manifests/$tag" |
    jq -r '.config.digest'
}


# Retrieves the image configuration from a given
# digest.
# See more about the endpoint at:
# https://docs.docker.com/registry/spec/api/#pulling-a-layer
get_image_configuration() {
  local image=$1
  local digest=$2

  echo "Retrieving Image Configuration.
    IMAGE:  $image
    DIGEST: $digest
  " >&2

  curl \
    --silent \
    --location \
    "http://$REGISTRY_ADDRESS/v2/$image/blobs/$digest" |
    jq -r '.container_config'
}


# Run the entry point with the CLI arguments
# as a list of words as supplied.
main "$@"

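ps.: It’s important to note that the manifest request needs to state, in its Accept header, the type of content that it accepts (application/vnd.docker.distribution.manifest.v2+json). Without it the registry may answer with the older schema 1 manifest, which has no .config.digest field.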
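
One way to see what the registry actually sent back is to look at the `Content-Type` of the manifest response. The sketch below maps the Docker media types I’m aware of to a short label. Both function names are hypothetical, and the list is not meant to be exhaustive (OCI media types, for instance, are left out).

```shell
#!/bin/bash

# Hypothetical helper: classify a manifest media type.
# The trailing `*` tolerates suffixes such as `; charset=utf-8`.
manifest_kind() {
  local content_type=$1

  case "$content_type" in
    application/vnd.docker.distribution.manifest.v2+json*)
      echo "schema2" ;;
    application/vnd.docker.distribution.manifest.list.v2+json*)
      echo "manifest-list" ;;
    application/vnd.docker.distribution.manifest.v1+json* | \
    application/vnd.docker.distribution.manifest.v1+prettyjws*)
      echo "schema1" ;;
    *)
      echo "unknown" ;;
  esac
}

# Hypothetical helper: ask for a schema 2 manifest and report
# which kind the registry answered with. Plain HTTP is assumed,
# as with the local test registry.
manifest_kind_of() {
  local address=$1
  local image=$2
  local tag=$3
  local content_type

  content_type=$(curl --silent --output /dev/null \
    --header "Accept: application/vnd.docker.distribution.manifest.v2+json" \
    --write-out '%{content_type}' \
    "http://$address/v2/$image/manifests/$tag")

  manifest_kind "$content_type"
}
```

Dropping the `--header` line from `manifest_kind_of` and comparing the two answers is a simple way to observe the effect of the Accept header on a given registry.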

Let’s now check if it’s working for real:

chmod +x ./get-image-config.sh
./get-image-config.sh test latest

{
  "Hostname": "3fbce8bb8947",
  "Domainname": "",
...
  "Env": [
    "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
  ],
  "Cmd": [
    "/bin/sh",
    "-c",
    "#(nop) ",
...
  "OnBuild": null,
  "Labels": {}
}

Cool, it indeed works!

While that does the job for registries that require no authentication, it’s not suitable for DockerHub. When checking a public image there, we need to perform an extra step before getting the digest of an image - retrieve a token. With such a token in hand, we can then inspect either public or private images.

Going back to scripting, let’s create a different script to deal with this case - call it get-public-image-config.sh (this is for brevity’s sake; in some other programming language you could add some conditionals and detect each case).

The additional code can be placed in a function called get_token, which takes only the image as an argument:

# Retrieves a token that grants pull access to
# a specific image on registry.docker.io (the
# dockerhub registry).
#
# note.:        the token that we retrieve is valid only
#               for that image.
# note.:        we get the token from `auth.docker.io` and
#               not `registry.docker.io`.
# usage.:       get_token "library/nginx"
#
get_token() {
  local image=$1

  echo "Retrieving Docker Hub token.
    IMAGE: $image
  " >&2

  curl \
    --silent \
    "https://auth.docker.io/token?scope=repository:$image:pull&service=registry.docker.io" \
    | jq -r '.token'
}

With the token in hand it’s just a matter of making use of it on the other calls.

If we were targeting private images we’d modify get_token a little bit: the call that gets the token from auth.docker.io would need to carry a DockerHub username and password pair (without authentication we can only have access to public images). To do so, add an authorization header to that call (the --user flag in curl):

# Retrieves a token that grants access to
# a private image named `image` on registry.docker.io.
# note.:        the user identified by `DOCKER_USERNAME`
#               and `DOCKER_PASSWORD` must have access
#               to the image.
get_token() {
  local image=$1

  echo "Retrieving Docker Hub token.
    IMAGE: $image
  " >&2

  curl \
    --silent \
    --user "$DOCKER_USERNAME:$DOCKER_PASSWORD" \
    "https://auth.docker.io/token?scope=repository:$image:pull&service=registry.docker.io" \
    | jq -r '.token'
}

With the token in hand we can now retrieve the digest of a given image and tag (pay attention to the extra Authorization: Bearer $token header that we added):

# Retrieve the digest, now specifying in the header
# the token we just received.
# note.:        $token corresponds to the token
#               that we received from the `get_token`
#               call.
# note.:        $image must be the full image name without
#               the registry part, e.g.: `nginx` should
#               be named `library/nginx`.
get_digest() {
  local image=$1
  local tag=$2
  local token=$3

  echo "Retrieving image digest.
    IMAGE:  $image
    TAG:    $tag
    TOKEN:  $token
  " >&2

  curl \
    --silent \
    --header "Accept: application/vnd.docker.distribution.manifest.v2+json" \
    --header "Authorization: Bearer $token" \
    "https://registry-1.docker.io/v2/$image/manifests/$tag" \
    | jq -r '.config.digest'
}

With that we can have a full script that inspects public images on DockerHub (check how in main we first retrieve a token and then pass that token to the following functions):

#!/bin/bash

set -o errexit

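main() {
  check_args "$@"

  local image=$1
  local tag=$2
  local token
  local digest

  # Assign separately from `local` so that failures
  # are not masked and `errexit` can act on them.
  token=$(get_token "$image")
  digest=$(get_digest "$image" "$tag" "$token")

  get_image_configuration "$image" "$token" "$digest"
}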

get_image_configuration() {
  local image=$1
  local token=$2
  local digest=$3

  echo "Retrieving Image Configuration.
    IMAGE:  $image
    TOKEN:  $token
    DIGEST: $digest
  " >&2

  curl \
    --silent \
    --location \
    --header "Authorization: Bearer $token" \
    "https://registry-1.docker.io/v2/$image/blobs/$digest" \
    | jq -r '.container_config'
}

get_token() {
  local image=$1

  echo "Retrieving Docker Hub token.
    IMAGE: $image
  " >&2

  curl \
    --silent \
    "https://auth.docker.io/token?scope=repository:$image:pull&service=registry.docker.io" \
    | jq -r '.token'
}

# Retrieve the digest, now specifying in the header
# that we have a token (so we can pe...
get_digest() {
  local image=$1
  local tag=$2
  local token=$3

  echo "Retrieving image digest.
    IMAGE:  $image
    TAG:    $tag
    TOKEN:  $token
  " >&2

  curl \
    --silent \
    --header "Accept: application/vnd.docker.distribution.manifest.v2+json" \
    --header "Authorization: Bearer $token" \
    "https://registry-1.docker.io/v2/$image/manifests/$tag" \
    | jq -r '.config.digest'
}

check_args() {
  if (($# != 2)); then
    echo "Error:
    Two arguments must be provided - $# provided.
  
    Usage:
      ./get-public-image-config.sh <image> <tag>
      
Aborting."
    exit 1
  fi
}

main "$@"

Note.: again, you must use the full name of the image (official images use the library repository, so nginx should be referred to as library/nginx).

To make sure that it works, run it against an image like nginx:

# because `nginx` comes from the set of
# official images we must prepend `library/`.
./get-public-image-config.sh library/nginx latest

Retrieving Docker Hub token.
    IMAGE: library/nginx
  
Retrieving image digest.
    IMAGE:  library/nginx
    TAG:    latest
    TOKEN:  eyJhbGciVFER...ZsNw
  
Retrieving Image Configuration.
    IMAGE:  library/nginx
    TOKEN:  eyJhb...ZsNw
    DIGEST: sha256:9e7424e5dbaeb9b28fea44d8c75b41ac6104989b49b2464b7cbbed16ceeccfc3
  
{
  "Hostname": "22e255475f18",
  "Domainname": "",
  ...
  "Labels": {
    "maintainer": "NGINX Docker Maintainers <docker-maint@nginx.com>"
  },
  "StopSignal": "SIGTERM"
}
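
The `library/` prefix is easy to forget, so a script could add it on its own. Here is a minimal sketch, assuming that any name without a slash refers to an official image; `hub_repository` is a name invented for this example.

```shell
#!/bin/bash

# Hypothetical helper: turn a short Docker Hub image name into
# the repository name that the registry API expects.
#   nginx         -> library/nginx
#   library/nginx -> library/nginx (unchanged)
#   user/app      -> user/app      (unchanged)
hub_repository() {
  local image=$1

  case "$image" in
    */*) echo "$image" ;;
    *)   echo "library/$image" ;;
  esac
}
```

In `main`, the first argument could then be normalized with `image=$(hub_repository "$1")` before the token is requested, since the token scope needs the full repository name too.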

Extra - old images

If you try to inspect an image that is not very new (say, one pushed some two years ago) you’ll notice that the script above might not work.

The reason is that images pushed to a registry a long time ago don’t use schema version 2 of the image manifest. They still carry the image configuration, though, embedded in the manifest as a plain JSON string.

Below is a script that deals with that case:

#!/bin/bash

set -o errexit

main() {
  check_args "$@"

  local image=$1
  local tag=$2
  local token
  local old_config

  token=$(get_token "$image")
  old_config=$(get_old_config "$image" "$tag" "$token")

  get_image_configuration "$old_config"
}

get_image_configuration () {
  local old_config=$1

  echo "$old_config" | jq -r '.history[0].v1Compatibility' | jq '.container_config'
}

get_token() {
  local image=$1

  echo "Retrieving Docker Hub token.
    IMAGE: $image
  " >&2

  curl \
    --silent \
    "https://auth.docker.io/token?scope=repository:$image:pull&service=registry.docker.io" \
    | jq -r '.token'
}

get_old_config() {
  local image=$1
  local tag=$2
  local token=$3

  echo "Retrieving image manifest.
    IMAGE:  $image
    TAG:    $tag
    TOKEN:  $token
  " >&2

  curl \
    --silent \
    --header "Accept: application/vnd.docker.distribution.manifest.v2+json" \
    --header "Authorization: Bearer $token" \
    "https://registry-1.docker.io/v2/$image/manifests/$tag" \
    | jq -r '.'
}

check_args() {
  if (($# != 2)); then
    echo "Error:
    Two arguments must be provided - $# provided.
  
    Usage:
      ./get-image-config.sh <image> <tag>
      
Aborting."
    exit 1
  fi
}

main "$@"

If you’re looking for the difference, look at main. Essentially we abandon the idea of retrieving a digest and simply fetch the whole manifest (the “old config”). From it, we look at the first entry of the history list, which represents the uppermost layer - the one that contains all the info altogether. That entry holds JSON serialized as a plain string, so we parse it and then get the config.
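
The two scripts could be merged by looking at the manifest before deciding which path to take. The sketch below is one way to do that with jq alone, assuming the manifest JSON arrives on standard input. Both function names are hypothetical, and the only signal used is whether `.config.digest` is present.

```shell
#!/bin/bash

# Hypothetical helper: read a manifest from stdin and print
# either `digest <sha256:...>` (the configuration lives in a
# separate blob) or `inline` (the configuration is embedded in
# `.history[0].v1Compatibility`).
config_source() {
  jq -r 'if .config.digest then "digest \(.config.digest)" else "inline" end'
}

# Hypothetical helper: extract the container configuration from
# an old-style manifest read from stdin. `fromjson` parses the
# embedded JSON string, so a second `jq` invocation is not needed.
inline_container_config() {
  jq '.history[0].v1Compatibility | fromjson | .container_config'
}
```

A combined `main` could pipe the manifest through `config_source` first and only request the blob when the answer starts with `digest`.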

Update

From the feedback the article received, two alternatives (that you can use right now) came up constantly:

github.com/GoogleCloudPlatform/container-diff - Diff your Docker containers - it looks pretty interesting but even though you can specify remote:// to an image when analyzing it, it looks like it always pulls the entire image. Maybe I got something wrong?
github.com/projectatomic/skopeo - “Work with remote images registries - retrieving information, images, signing content” - well, does exactly what it says! For the scope of retrieving image / repositories configuration before pulling them, totally worth it.

By the way, if you want to try Skopeo, building it from source is very easy on macOS:

# install gpgme
brew install gpgme

# clone the repository to the golang-specific directory
# (make sure you have Go installed and GOPATH set 
# before).
git clone \
        https://github.com/projectatomic/skopeo \
        $GOPATH/src/github.com/projectatomic/skopeo

# get into the repository you just cloned
cd $GOPATH/src/github.com/projectatomic/skopeo

# run the binary-local target
# (this is a regular `go build` with some
# tags and flags set).
make binary-local

# now you should have `skopeo` in the repository
# directory
./skopeo --help

NAME:
   skopeo - Various operations with container images and container image registries

USAGE:
   skopeo [global options] command [command options] [arguments...]
   
VERSION:
   0.1.28-dev commit: 78b29a5c2f05b4026876728e7651bad31193216c

Now inspecting an image or a repository from Dockerhub is one command away:

./skopeo \
        --override-os=linux \
        inspect docker://docker.io/fedora
{
    "Name": "docker.io/library/fedora",
    ...
    "Architecture": "amd64",
    "Os": "linux",
    "Layers": [
        "sha256:a8ee583972c2295bb76704d4defe5116d5e4dd7ba3767aaa2cc8fcf71088ee06"
    ]
}

Note that here I’m specifying the --override-os flag to the command. The reason is that otherwise it’ll try to inspect the image or repository filtering by digests marked with OS=darwin. If you’re using Linux you’d not need to use that flag.

Closing thoughts

Interacting with DockerHub or a private registry is not all that hard; it’s just not very well documented. With these scripts in hand it becomes pretty easy to get it working in any language you want - just add some checks, parse the image names and you should be good to go.
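
As an example of such a check: `jq -r` prints the literal string `null` when a key is missing, so a failed token or digest lookup can go unnoticed and end up in the next URL. Here is a minimal sketch of a guard against that, with `require_value` being a name invented here.

```shell
#!/bin/bash

# Hypothetical helper: fail loudly when a value extracted with
# `jq -r` is empty or the literal string `null`.
#   $1 - a label used in the error message
#   $2 - the value to validate
# On success the value is echoed back unchanged.
require_value() {
  local label=$1
  local value=${2:-}

  if [ -z "$value" ] || [ "$value" = "null" ]; then
    echo "Error: could not retrieve $label." >&2
    return 1
  fi

  echo "$value"
}
```

In `main` this could wrap the existing calls, e.g. `digest=$(require_value digest "$(get_digest "$image" "$tag")")`. As long as the assignment is not combined with `local` on the same line, `errexit` stops the script as soon as the guard fails.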

Please let me know if I got anything wrong and/or if there’s an easier way to do it.

Have a good one!
