Hey,
I’ve been trying to improve the Hugo template that I created for this blog, and one of the things I wanted to do was inject the contents of an index.css file into an index.amp.html layout file. Accelerated Mobile Pages (AMP) requires custom styles to be inlined in the HTML (see Custom Amp Styles).
Here I go through a way of achieving that (using awk) and a way of not achieving that (using sed, naively).
If you ever need to template a file with contents of another, this gist is for you.
ps.: yes, with Hugo you could use partials, but let’s say you want to just have some sort of poor man’s templating engine.
If you want to skip the introduction, jump to the awk section.
Getting started with the naive way - sed without escaping
Since sed is all about performing substitutions, it was the first tool I tried.
Unfortunately, it didn’t work as I expected.
# Create the file that contains the contents that we want
# to insert into another file at a specific location.
echo 'xxxxxxx
yyyyyyy
zzzzzzz' > ./content.txt
# Create the template that we want to have the variable
# substitution.
echo '11111111
22222222
__REPLACE_ME__
33333333' > ./template.txt
# Perform the substitution of `__REPLACE_ME__` by a single
# string just to test the `sed` syntax.
sed 's/__REPLACE_ME__/SOMETHING_NEW/g' ./template.txt
11111111
22222222
SOMETHING_NEW
33333333
# Well, if it worked for that simple case, why not just
# replace `SOMETHING_NEW` by all the contents of the
# file? We can get that via `cat` anyway:
CONTENT=$(cat ./content.txt)
sed "s/__REPLACE_ME__/$CONTENT/g" ./template.txt
sed: 1: "s/__REPLACE_ME__/xxxxxxx
...": unescaped newline inside substitute pattern
As we can see, that doesn’t work.
The main reason is that the sed script ends up malformed: the content contains newlines, and they are not escaped when we perform the substitution.
That happens because bash performs a plain text substitution of $CONTENT before sed ever runs.
To see that in action, invoke bash directly with the xtrace option:
# By invoking bash with the `-x` flag, we enable the
# `xtrace` option (which is usually added to scripts
# when debugging by specifying `set -o xtrace` or
# `set -x`).
#
# The option has the effect of printing every command
# after the variables supplied to it have been expanded.
#
# This way, we're able to see what bash used as the full
# command after substituting the $CONTENT variable.
/bin/bash -x -c \
"sed \"s/__REPLACE_ME__/$CONTENT/g\" ./template.txt"
+ sed 's/__REPLACE_ME__/xxxxxxx
yyyyyyy
zzzzzzz/g' ./template.txt
sed: 1: "s/__REPLACE_ME__/xxxxxx ...":
unescaped newline inside substitute pattern
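If you'd rather not spawn a second bash, the same effect can be observed by building the sed script in a variable and printing it. This is only a sketch: the `script` and `script_lines` variable names are mine, not part of the original commands.

```shell
# Recreate the multi-line content in a variable.
CONTENT='xxxxxxx
yyyyyyy
zzzzzzz'

# The shell expands $CONTENT before sed ever runs, so the
# script that would be handed to sed spans three lines.
script="s/__REPLACE_ME__/$CONTENT/g"
printf '%s\n' "$script"

# Count the lines in the script: anything above 1 means
# there are raw newlines inside the `s` command.
script_lines=$(printf '%s\n' "$script" | wc -l | tr -d ' ')
echo "lines in sed script: $script_lines"
```

A script with more than one line here is exactly what sed complains about.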
Knowing that, we can either:
- escape the newlines before supplying $CONTENT to sed; or
- make use of something that has greater flexibility (like awk).
The first option means getting into regular expressions and some fancy sed syntax (see this stackoverflow answer).
With the second, we can write a little program that is verbose about what it’s doing, making the whole process pretty explicit. I’ll go with this one.
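For completeness, here is a minimal sketch of what the escaping route could look like. It assumes the same content.txt and template.txt as before (recreated in a scratch directory), and the `escaped`, `result` and `workdir` names are mine. Besides newlines, it also escapes `\`, `/` and `&`, since those are special in the replacement part of the `s` command.

```shell
# Work in a scratch directory so nothing is overwritten.
workdir=$(mktemp -d)
printf 'xxxxxxx\nyyyyyyy\nzzzzzzz\n' > "$workdir/content.txt"
printf '11111111\n22222222\n__REPLACE_ME__\n33333333\n' > "$workdir/template.txt"

# 1st expression: put a backslash in front of every `\`, `/`
#                 and `&` (using `|` as the delimiter here
#                 keeps the bracket expression simple).
# 2nd expression: append a backslash to every line except
#                 the last one, so that each newline is escaped.
escaped=$(sed -e 's|[\\/&]|\\&|g' -e '$!s/$/\\/' "$workdir/content.txt")

# Now the multi-line replacement is something sed accepts.
result=$(sed "s/__REPLACE_ME__/$escaped/g" "$workdir/template.txt")
printf '%s\n' "$result"

rm -rf "$workdir"
```

It works, but it is easy to see how the quoting gets hard to read, which is part of why the awk route is attractive.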
Moving on - using AWK to template the file
AWK (just like sed) is a pretty ubiquitous tool in Unix systems, meaning that you can find it almost everywhere.
It allows you to create data-driven programs where, given a set of rules, actions are performed whenever a match happens.
Taking data from stdin or from regular files supplied as arguments, awk checks each record against your matchers and, whenever one matches, proceeds with the action specified.
“The awk utility interprets a special-purpose programming language that makes it easy to handle simple data-reformatting jobs.”
“The basic function of awk is to search files for lines (or other units of text) that contain certain patterns.”
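To make the "pattern, then action" idea concrete, here is a tiny sketch unrelated to templating. The fruit data and the `result` variable are made up for illustration.

```shell
# Each rule has the shape `pattern { action }`: awk runs the
# action for every input line for which the pattern is true.
# Here the pattern is "second field greater than 5" and the
# action is "print the first field".
result=$(printf 'apple 3\nbanana 12\ncherry 7\n' |
  awk '$2 > 5 { print $1 }')

printf '%s\n' "$result"
```

Only the lines whose second field is above 5 trigger the action, so just `banana` and `cherry` get printed.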
Given that, we can start modelling a solution for our problem in terms of matches:
- we definitely want to match __REPLACE_ME__ at a given point; and
- we certainly want to match those lines that are not meant to be replaced, which are just printed to stdout.
We can also model it in terms of actions:
- we want to replace the __REPLACE_ME__ tag; and
- we want to just print to the standard output what doesn’t match the __REPLACE_ME__ tag.
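Those two matches and two actions can be sketched as a two-rule awk program. This is a simplified version that replaces the tag with a fixed string instead of a file's contents; the `result` variable is mine.

```shell
result=$(printf '11111111\n22222222\n__REPLACE_ME__\n33333333\n' |
  awk '
    # Match: the line contains the tag.
    # Action: print the replacement and skip the remaining rules.
    /__REPLACE_ME__/ { print "SOMETHING_NEW"; next }

    # Match: every other line (no pattern means "always").
    # Action: print the line unchanged.
    { print }
  ')

printf '%s\n' "$result"
```

What is still missing is loading the replacement from another file, which is what the full program deals with.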
Now, it’s a matter of adhering to the awk specifics and implementing it.
note.: because awk gives us a fully featured programming language, here I include parameter validation and the ability for the user to specify the pattern to replace by using an environment variable (PATTERN).
#!/usr/bin/awk -f
# templater - takes a file and replaces a variable
# in a given template with the contents of another file.
# err() - Prints a supplied `text` to standard error.
# @text: Text to be printed to stderr.
function err (text) {
  print text > "/dev/stderr"
}
# The BEGIN matcher is a special type of matcher that
# gets executed whenever the AWK program is starting
# and no records have been matched yet.
BEGIN {
  # ARGV[0] is `awk` itself, so two files mean ARGC == 3.
  if (ARGC != 3) {
    err("Error: expected exactly 2 arguments.")
    err("")
    err("Usage: ./templater.awk <content_file> <template_file>")
    err("Aborting.")
    exit 1
  }
  if (length(ENVIRON["PATTERN"]) == 0) {
    err("Error: no pattern specified.")
    err("")
    err("Specify a pattern via the `PATTERN` environment variable.")
    err("For example: ")
    err("  PATTERN=__CONTENT__ ./templater.awk content.txt template.txt")
    err("Aborting.")
    exit 1
  }
}
# By using the `NR==FNR` pattern we're able to specify
# an action that we want to perform only on the first
# file that we supply via the command line.
#
# FNR is a counter that keeps track of the current line
# in the current file that is being processed.
#
# NR is a counter that keeps track of the total number
# of lines that have been processed so far.
#
# By trying to match `NR==FNR` we can perform an action
# in the very first file. To visualize that, we can set
# up an experiment:
#
# $ cat file1
# a
# b
# c
#
# $ cat file2
# d
# e
#
# $ awk '{print FILENAME, NR, FNR, $0}' file1 file2
# file1 1 1 a
# file1 2 2 b
# file1 3 3 c
# file2 4 1 d -> not equal -> starts the second one
# file2 5 2 e -> not equal
#
# In the action we can then store all the lines from
# the first file in memory so that we can use it later
# when we find the string to replace.
#
# By specifying the `next` statement, no further matching
# is performed for this record (line).
#
# ps.: we could also check `FILENAME`, like:
# FILENAME==ARGV[1]
NR==FNR {
  content_lines[n++] = $0
  next
}
# Once we find the string to replace, we iterate over
# all the lines that we stored (from the first file)
# and print them. Once we're done, we force AWK to
# immediately stop processing the current record so that
# it neither prints the line holding the pattern nor
# performs further matches for this record (line).
#
# ps.: if you didn't want to take a variable here, for
# instance, have a fixed pattern to replace, you could
# simply use `/PATTERN/ { ... }`.
$0 ~ ENVIRON["PATTERN"] {
  for (i = 0; i < n; i++) {
    print content_lines[i]
  }
  next
}
# Given that 1 always evaluates to `true`, this is a match
# that will always occur.
#
# As we can either omit an action or a match (not both!),
# we can use a catch-all match (1) and let awk use the
# default action (print current line).
#
# This has the effect of printing all lines that didn't
# match the other matches that we specified above.
1
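One detail worth knowing about the `$0 ~ ENVIRON["PATTERN"]` rule: `~` treats the value as a regular expression, so a placeholder containing characters such as `$`, `(` or `.` may not match as literal text. A possible alternative, sketched below, is `index()`, which does a plain substring search. The `$(CONTENT)` placeholder and the `result` variable are hypothetical, chosen only to show the difference.

```shell
result=$(printf 'before\n$(CONTENT)\nafter\n' |
  PATTERN='$(CONTENT)' awk '
    # index() returns the 1-based position of the substring,
    # or 0 when it is absent, so it works as a literal test.
    index($0, ENVIRON["PATTERN"]) > 0 { print "REPLACED"; next }

    # Catch-all: print every other line unchanged.
    1
  ')

printf '%s\n' "$result"
```

For a tag like `__REPLACE_ME__` both approaches behave the same, so this only matters if you pick fancier placeholders.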
As we’re already specifying the interpreter directive (#!/usr/bin/awk -f), we can execute the program by making it executable and then feeding some data to it:
# Make it executable
chmod +x ./templater.awk
# Execute it passing the necessary arguments
PATTERN=__REPLACE_ME__ \
./templater.awk \
content.txt \
template.txt
11111111
22222222
xxxxxxx
yyyyyyy
zzzzzzz
33333333
# Using `-` in place of the `template.txt` argument
# and feeding the template via `stdin` also works:
export PATTERN=__REPLACE_ME__
echo "this is
very cool
__REPLACE_ME__
right?" | ./templater.awk \
content.txt \
-
this is
very cool
xxxxxxx
yyyyyyy
zzzzzzz
right?
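A caveat of the `NR==FNR` trick: if the content file is empty, NR and FNR are still equal while the template is being read, so the template lines get stored instead of printed. The `FILENAME==ARGV[1]` check mentioned in the script's comments avoids that. Below is a minimal sketch comparing the two, using a fixed `/__REPLACE_ME__/` pattern for brevity; the file names and the `with_nr` / `with_filename` variables are made up for the experiment.

```shell
workdir=$(mktemp -d)
: > "$workdir/empty.txt"   # content file with no lines at all
printf '11111111\n__REPLACE_ME__\n33333333\n' > "$workdir/template.txt"

# With an empty first file, NR and FNR stay equal while the
# template is read, so every template line is swallowed.
with_nr=$(awk '
  NR == FNR { lines[n++] = $0; next }
  /__REPLACE_ME__/ { for (i = 0; i < n; i++) print lines[i]; next }
  1
' "$workdir/empty.txt" "$workdir/template.txt")

# Comparing FILENAME against the first argument does not
# depend on the first file having any lines.
with_filename=$(awk '
  FILENAME == ARGV[1] { lines[n++] = $0; next }
  /__REPLACE_ME__/ { for (i = 0; i < n; i++) print lines[i]; next }
  1
' "$workdir/empty.txt" "$workdir/template.txt")

printf 'NR==FNR output: [%s]\n' "$with_nr"
printf 'FILENAME output:\n%s\n' "$with_filename"

rm -rf "$workdir"
```

The first variant prints nothing, while the second prints the template with the tag line dropped. Keep in mind that the FILENAME check assumes the two arguments are different paths.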
Closing thoughts
Although I’ve copied and pasted AWK snippets now and then, I’ve never really understood their intricacies.
Now, going through its documentation, it feels like I have so much more power when it comes to processing text in the terminal.
I hope you feel that way too! If you want to know more, make sure you check out the gawk manual. It has plenty of examples and is also very detailed.
If you have any questions or just want to make contact, feel free to reach me at @cirowrc on Twitter.
Have a good one!