Hey,

I’ve been trying to improve the Hugo template that I created for this blog, and one of the things I wanted to do was inject the contents of an index.css file into an index.amp.html layout file - Accelerated Mobile Pages (AMP) requires us to have our custom styles inlined in the HTML (see Custom Amp Styles).

Illustration of the AWK templating

Here I go through one way of achieving that (using awk) and one way of not achieving it (using sed naively).

If you ever need to template a file with the contents of another, this gist is for you.

ps.: yes, with Hugo you could use partials, but let’s say you want to just have some sort of poor man’s templating engine.

If you want to skip the introduction, jump to the awk section.

Getting started with the naive way - sed without escaping

Since sed is all about performing substitutions, it was the first tool I tried.

Unfortunately, it didn’t work as I expected.

# Create the file that contains the contents that we want
# to insert into another file at a specific location.
echo 'xxxxxxx
yyyyyyy
zzzzzzz' > ./content.txt

# Create the template that we want to have the variable
# substitution.
echo '11111111
22222222
__REPLACE_ME__
33333333' > ./template.txt


# Replace `__REPLACE_ME__` with a single
# string just to test the `sed` syntax.
sed 's/__REPLACE_ME__/SOMETHING_NEW/g' ./template.txt
11111111
22222222
SOMETHING_NEW
33333333


# Well, if it worked for that simple case, why not just
# replace `SOMETHING_NEW` with the whole contents of the
# file? We can get that via `cat` anyway:
CONTENT=$(cat ./content.txt)
sed "s/__REPLACE_ME__/$CONTENT/g" ./template.txt

sed: 1: "s/__REPLACE_ME__/xxxxxxx
 ...": unescaped newline inside substitute pattern

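As we can see, that doesn’t work. (That error comes from BSD/macOS sed; GNU sed fails too, complaining about an unterminated `s' command.)

The main reason is that the resulting sed command is malformed: it contains newlines that are not escaped.

That happens because bash performs a plain text substitution when expanding $CONTENT: within the double quotes, the newlines from content.txt are kept as-is and end up in the middle of the s command.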

To see this happening, use the xtrace option by invoking bash directly:

# By invoking bash with the `-x` flag, we enable the 
# `xtrace` option (which is usually added to scripts
# when debugging by specifying `set -o xtrace` or
# `set -x`).
#
# This option makes bash print each command, with its
# variables already expanded, before executing it.
#
# This way, we're able to see what bash used as the full
# command after substituting the $CONTENT variable.
/bin/bash -x -c \
        "sed \"s/__REPLACE_ME__/$CONTENT/g\" ./template.txt"

+ sed 's/__REPLACE_ME__/xxxxxxx
yyyyyyy
zzzzzzz/g' ./template.txt
sed: 1: "s/__REPLACE_ME__/xxxxxxx
 ...": unescaped newline inside substitute pattern

Knowing that, we can either:

  1. escape the newlines before supplying $CONTENT to sed; or
  2. make use of something that has greater flexibility (like awk).

Option 1 means getting into regular expressions and some fancy sed syntax (see this stackoverflow answer).
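For reference, here’s a rough sketch of what option 1 could look like, assuming a POSIX sed (GNU or BSD). The `escape_for_sed` helper is a hypothetical name: it prefixes `&`, `/` and `\` with a backslash (they’re special in the replacement, `/` because it’s our delimiter) and ends every line but the last with a backslash, which sed reads as an escaped, literal newline.

```shell
#!/usr/bin/env bash
# Sketch of option 1: escape the replacement text before handing it
# to sed. `escape_for_sed` is a hypothetical helper, not a standard tool.

# escape_for_sed FILE
# Prints FILE escaped for the replacement part of `s/.../.../`:
#   - `&`, `/` (the delimiter) and `\` get a leading backslash;
#   - every line except the last gets a trailing backslash, so the
#     newline after it becomes an escaped (literal) newline for sed.
escape_for_sed() {
        sed -e 's#[&/\]#\\&#g' -e '$!s#$#\\#' "$1"
}

# Recreate the sample files from the beginning of the post.
printf 'xxxxxxx\nyyyyyyy\nzzzzzzz\n' > ./content.txt
printf '11111111\n22222222\n__REPLACE_ME__\n33333333\n' > ./template.txt

CONTENT=$(escape_for_sed ./content.txt)
sed "s/__REPLACE_ME__/$CONTENT/g" ./template.txt
```

It works, but the escaping rules are exactly the kind of fancy sed syntax that is easy to get wrong.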

Option 2 lets us write a small program that is very explicit about what it’s doing, which makes the whole process easy to follow. I’ll go with this one.

Moving on - using AWK to template the file

AWK (just like sed) is a pretty ubiquitous tool on Unix systems, meaning that you can find it pretty much everywhere.

It allows you to create data-driven programs: given a set of rules, each made of a pattern and an action, awk performs the action whenever the pattern matches.

Taking data from files (either stdin or regular files supplied as arguments), awk reads the input one record (by default, one line) at a time, checks each of your patterns against it and, if a pattern matches, proceeds with the corresponding action.
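To make that concrete, here’s a tiny sketch with two rules run over the template from the sed section (`show_rules` is just an illustrative name). Note that line 3 triggers both rules: awk keeps trying the remaining rules for a record unless an action tells it otherwise, which is what `next` is for in the program further down.

```shell
#!/usr/bin/env bash
# show_rules: an illustrative two-rule awk program. Every input line
# (record) is tested against each rule, top to bottom.
show_rules() {
        awk '
                # Rule 1: pattern + action, runs only on lines
                # containing the tag.
                /__REPLACE_ME__/ { print "tag found at line " NR }

                # Rule 2: no pattern, so the action runs on every line.
                { print "line " NR ": " $0 }
        '
}

printf '11111111\n22222222\n__REPLACE_ME__\n33333333\n' | show_rules
# line 1: 11111111
# line 2: 22222222
# tag found at line 3
# line 3: __REPLACE_ME__
# line 4: 33333333
```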

Illustration of the AWK processing pipeline

“The awk utility interprets a special-purpose programming language that makes it easy to handle simple data-reformatting jobs.”

“The basic function of awk is to search files for lines (or other units of text) that contain certain patterns.”

— The GNU Awk User’s Guide

Given that, we can start modelling a solution for our problem in terms of matches:

  1. we definitely want to match __REPLACE_ME__ at a given point; and
  2. we certainly want to match those lines that are not meant to be replaced and should just be printed to stdout.

We can also model it in terms of actions:

  1. we want to replace the __REPLACE_ME__ tag; and
  2. we want to just print to the standard output what doesn’t match the __REPLACE_ME__ tag.

Now, it’s a matter of adhering to the awk specifics and implementing it.

note.: because awk gives us a fully featured programming language, I’ve also included parameter validation and a way for the user to specify the pattern to replace through an environment variable (PATTERN).

#!/usr/bin/awk -f

# templater - takes a file and replaces a variable
# in a given template with the contents of another file.

# err() - Prints a supplied `text` to standard error.
# @text: Text to be printed to stderr.
function err (text) {
        print text > "/dev/stderr"
}


# The BEGIN matcher is a special type of matcher whose
# action runs once, when the AWK program starts and
# before any records have been read.
BEGIN {
        if (ARGC != 3) {
                err("Error: expected exactly two arguments.")
                err("")

                err("Usage: ./templater <content_file> <template_file>")
                err("Aborting.")
                exit 1
        }

        if (length(ENVIRON["PATTERN"]) == 0) {
                err("Error: no pattern specified.")
                err("")

                err("Specify a pattern via the `PATTERN` environment variable.")
                err("For example: ")
                err("  PATTERN=__CONTENT__ templater contents.txt template.txt")
                err("Aborting.")
                exit 1
        }
}

# By using the `NR==FNR` pattern we're able to specify
# an action that we want to perform only on the first
# file that we supply via the command line.
#
# FNR is a counter that keeps track of the current line
# in the current file that is being processed.
#
# NR is a counter that keeps track of the total number
# of lines that have been processed so far.
#
# By matching `NR==FNR` we can perform an action only on
# the very first file. To visualize that, we can set
# up an experiment:
#
#       $ cat file1
#       a
#       b
#       c
#
#       $ cat file2
#       d
#       e
#
#       $ awk '{print FILENAME, NR, FNR, $0}' file1 file2
#       file1 1 1 a
#       file1 2 2 b
#       file1 3 3 c
#       file2 4 1 d -> not equal -> starts the second one
#       file2 5 2 e -> not equal
#
# In the action we can then store all the lines from
# the first file in memory so that we can use them later
# when we find the string to replace.
#
# By specifying the `next` statement, no further matching
# is performed for this record (line).
#
# ps.: we could also check `FILENAME`, like:
#       FILENAME==ARGV[1]
NR==FNR {
        content_lines[n++]=$0;
        next;
}

# Once we find the string to replace, we iterate over
# all the lines that we stored (from the first file).
# Once we're done, `next` forces AWK to immediately stop
# processing the current record, so that it neither prints
# the line containing the pattern (e.g. `__CONTENT__`) nor
# tries further matches for this record (line).
#
# ps.: if you'd rather replace a fixed pattern than take
# it from a variable, you could simply use a regex literal,
# for instance `/__REPLACE_ME__/ { ... }`.
$0 ~ ENVIRON["PATTERN"] {
        for (i = 0; i < n; i++) {
                print content_lines[i];
        }
        next
}

# Given that 1 always evaluates to `true`, this is a match
# that will always occur.
#
# As we can omit either the action or the pattern (but
# not both!), we can use a catch-all match (1) and let
# awk use the default action (print the current line).
#
# This has the effect of printing every line that wasn't
# already handled (and skipped via `next`) by the rules above.
1
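One caveat with the `NR==FNR` rule: if the first file is empty, NR and FNR are still equal while the template is being read, so every template line gets stored instead of printed (an empty index.css would produce empty output). The sketch below reproduces it, along with the `FILENAME==ARGV[1]` alternative mentioned in the comments, which has its own edge case: it misfires if the same file name is passed twice.

```shell
#!/usr/bin/env bash
# Reproduces the empty-first-file pitfall of `NR == FNR`.
printf '11111111\n22222222\n__REPLACE_ME__\n33333333\n' > ./template.txt
: > ./empty.txt  # an empty "content" file

# Nothing was read before template.txt, so NR == FNR holds for all
# of its lines and `next` swallows every one of them: no output.
awk 'NR == FNR { next } 1' ./empty.txt ./template.txt

# Comparing the current file name with the first argument only
# matches records of empty.txt (there are none), so template.txt
# is printed unchanged.
awk 'FILENAME == ARGV[1] { next } 1' ./empty.txt ./template.txt
```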

As we’re already specifying the interpreter directive (#!/usr/bin/awk -f), we can execute the program by making it executable and then feeding some data to it:

# Make it executable
chmod +x ./templater.awk

# Execute it passing the necessary arguments
PATTERN=__REPLACE_ME__ \
        ./templater.awk \
        content.txt \
        template.txt
11111111
22222222
xxxxxxx
yyyyyyy
zzzzzzz
33333333


# Using `-` as the `template.txt` argument and feeding
# the template through `stdin` also works:
export PATTERN=__REPLACE_ME__
echo "this is
very cool
__REPLACE_ME__
right?" | ./templater.awk \
        content.txt \
        -
this is
very cool
xxxxxxx
yyyyyyy
zzzzzzz
right?
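Two things to keep in mind about how the pattern is matched. First, the tag line is replaced as a whole, so anything else on that line (say, `<style amp-custom>__CSS__</style>` written on a single line) is dropped; keep the tag on its own line. Second, `$0 ~ ENVIRON["PATTERN"]` treats PATTERN as a regular expression, so characters like `.` or `*` are special. If you want a literal match, awk’s built-in `index(s, t)` (position of t in s, or 0 if absent) is an option. Here’s a sketch comparing both, with hypothetical helper names `match_regex` and `match_literal`:

```shell
#!/usr/bin/env bash
# match_regex / match_literal are illustrative names; both mimic the
# templater's pattern rule, printing "replaced: ..." instead of the
# stored content.
match_regex() {
        # Same test as the templater: PATTERN is a (dynamic) regex.
        awk '$0 ~ ENVIRON["PATTERN"] { print "replaced: " $0; next } 1'
}

match_literal() {
        # index() returns 0 when PATTERN does not occur literally.
        awk 'index($0, ENVIRON["PATTERN"]) > 0 { print "replaced: " $0; next } 1'
}

export PATTERN='__CSS.__'   # the `.` is a regex metacharacter

printf '__CSSX__\n__CSS.__\n' | match_regex
# replaced: __CSSX__
# replaced: __CSS.__

printf '__CSSX__\n__CSS.__\n' | match_literal
# __CSSX__
# replaced: __CSS.__
```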

Closing thoughts

Although I’ve copied and pasted AWK snippets now and then, I’d never really understood their intricacies.

Now, going through its documentation, it feels like I have so much more power when it comes to processing text in the terminal.

I hope you feel that way too! If you want to learn more, make sure you check out the gawk manual: it’s very detailed and full of examples.

If you have any questions or just want to make contact, feel free to reach me at @cirowrc on Twitter.

Have a good one!

finis