Hey,

Being used to chaining commands together with the pipe operator in bash, I’ve sometimes found myself unable to do so easily when a command expected its input to come from a file instead of stdin.

Figure: piping to dirname fails, given that `dirname` doesn't expect its input to come from `stdin`.

Using process substitution, we can hand the output of a command to pretty much any program that expects to read its input from a file instead of stdin.

tl;dr: program <(command) lets program read the output of command through a path such as /dev/fd/63.

Rationale

Although it’s common for programs like jq (the command-line JSON processor) to read from standard input, process what they receive, and write the result to standard output, not every program is made like that.

Figure: jq receiving JSON on stdin and producing filtered JSON on stdout.

Some programs, like dirname, expect their input as a positional argument (others expect a file path passed through a flag).

While that might not look like a big deal, when you want to stream data between multiple processes it can feel like there’s no easy way forward once a program expects a file.

Reading from stdin has to be designed in: whoever wrote the program’s command-line interface needs to anticipate that someone might pipe data into it, and then write the code to support that.
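As an illustration of a program that wasn’t written with stdin in mind, here is a sketch of a hypothetical `mydirname` clone in Go (not how coreutils implements dirname). Its input only ever comes from `os.Args`, so anything written to its stdin is simply ignored.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// dirOf returns all but the last element of path, which is roughly
// what dirname(1) prints (trailing slashes are handled differently).
// It works purely on the string: nothing is opened or read.
func dirOf(path string) string {
	return filepath.Dir(path)
}

func main() {
	// The input comes from a positional argument and stdin is never
	// read, which is why `echo /var/lib/my/file.txt | mydirname`
	// can't work.
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "Usage: mydirname <path>")
		os.Exit(1)
	}
	fmt.Println(dirOf(os.Args[1]))
}
```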

Example: consuming from stdin using Go

For instance, if we were writing a byte counter in Go, this is how we could read everything from stdin:

package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

// maxBufferSize defines the maximum size of
// the buffer that we're creating for temporarily
// holding the contents of what's coming from
// stdin.
const maxBufferSize = 4096

func main() {
	var totalRead uint64

	buffer := make([]byte, maxBufferSize)
	reader := bufio.NewReader(os.Stdin)

	// Keep reading in a buffered fashion
	// from `stdin` until an EOF is returned
	// in the form of an error.
	//
	// When that happens, output to `stdout`
	// the total count and exit.
	for {
		n, err := reader.Read(buffer)

		// Count what was read before looking at
		// the error: an io.Reader is allowed to
		// return data together with an error.
		totalRead += uint64(n)

		if err != nil {
			if err == io.EOF {
				break
			}

			fmt.Fprintln(os.Stderr, err.Error())
			os.Exit(1)
		}
	}

	// Write to `stdout` the total number of
	// bytes that were read.
	fmt.Println(totalRead)
}

Compile the code with a regular go build (producing a binary named byte-counter) and check that it works as expected:

# Create a 1MB file named `./1M`
dd if=/dev/zero of=./1M bs=1024 count=1024


# Confirm that we sent 1024*1024 bytes to the file
ls -lah | grep 1M
-rw-r--r--  1 cirocosta  wheel   1.0M  4 Sep 21:13 1M


# Check how many bytes are there in this file by
# running our go code by sending the whole contents
# of the file to the standard input of our command
cat ./1M | ./byte-counter
1048576

Example: consuming from a file using Go

If we wanted to change this code to read from a file instead of stdin, the changes wouldn’t be significant: take the filename from either a flag or a positional argument and then open it.

diff --git a/tmp/before b/tmp/after
index fe7f39c..e4102af 100644
--- a/tmp/before
+++ b/tmp/after
@@ -16,8 +16,24 @@ const maxBufferSize = 4096
 func main() {
        var totalRead uint64

+       if len(os.Args) < 2 {
+               fmt.Fprintln(os.Stderr, "error: not enough arguments")
+               fmt.Fprintln(os.Stderr, "Usage: byte-counter <filename>")
+               os.Exit(1)
+       }
+
+       fileName := os.Args[1]
+
+       // Open the file whose path the user
+       // provided us with.
+       // Given that `os.Open` returns an *os.File, which
+       // implements the `io.Reader` interface, we don't
+       // need to change the rest of the code.
+       reader, err := os.Open(fileName)
+       if err != nil {
+               fmt.Fprintln(os.Stderr, err)
+               os.Exit(1)
+       }
+
+       defer reader.Close()
+
        buffer := make([]byte, maxBufferSize)
@@ -41,8 +57,5 @@ func main() {

Even better, because the rest of the code doesn’t care what the reader is (an interface only cares about the methods its implementor provides, here `Read`), the code barely has to change.

Example: consuming from both a file and stdin using Go

We can go even further and simplify our reading logic so that it handles both cases: reading from a file and reading from the standard input.

package main

import (
	"fmt"
	"io"
	"os"
)

func main() {
	var (
		err    error
		reader io.Reader
		// io.Discard (Go 1.16+) replaces the
		// deprecated ioutil.Discard.
		writer = io.Discard
	)

	// Make sure that the user always supplies
	// a positional argument, to be explicit about
	// whether they want to consume stdin (`-`)
	// or a file.
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "error: not enough arguments")
		fmt.Fprintln(os.Stderr, "Usage: byte-counter (filename|-)")
		os.Exit(1)
	}

	if os.Args[1] == "-" {
		reader = os.Stdin
	} else {
		file, err := os.Open(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}

		// Keep the concrete *os.File around so that
		// it can be closed without a type assertion.
		defer file.Close()
		reader = file
	}

	// Use the `io.Copy` function to read the contents
	// of the input into a temporary buffer and then
	// write those to the discard writer (the Go
	// equivalent of /dev/null).
	//
	// Naturally, there's no real need for the write
	// part, but `io.Copy` conveniently counts the
	// bytes for us.
	n, err := io.Copy(writer, reader)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Write to `stdout` the total number of
	// bytes that were read.
	fmt.Println(n)
}

Given that all io.Copy needs is something that implements the io.Reader interface, we can pass it os.Stdin and a file opened with os.Open interchangeably. In fact, os.Stdin is itself an *os.File wrapping file descriptor 0, so on Linux both are just file descriptors under the hood. Neat!

Using process substitution to pass a temporary file consuming the output of a command

While it’s well known that bash can interpolate the output of a command (command substitution: echo $(date) becomes something like echo Sun 9 Sep 2018 20:53:41 EDT), it is also capable of substituting a command with a pipe connected to that command’s output (see the article Introduction to Named Pipes for some great info about what named pipes are).

This is where process substitution comes into play.

Whenever bash spots the <(cmd) syntax, it runs cmd with its standard output connected to a pipe and exposes the reading end of that pipe through a path such as /dev/fd/63 (on systems without /dev/fd, bash falls back to a temporary named pipe, a FIFO).

dirname <(echo "/var/lib/my/file.txt")
                |
                *-----> 1. connects echo's output to a pipe exposed as /dev/fd/<number>
                *-----> 2. substitutes the execution by the file path

                        dirname /dev/fd/<number>

                        *---> executes the `dirname` command

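The interpolation then replaces the <(cmd) syntax with the path to that pipe, so whatever program you’re running can open it and read from it as if it were a file. Keep in mind that this only helps programs that actually read the contents of the path they receive: dirname works purely on the path string, so the example above prints the directory of the pipe itself (e.g. /dev/fd) rather than /var/lib/my. For programs like that, command substitution is the right tool: dirname "$(echo "/var/lib/my/file.txt")".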

ps: the resulting path does not point to a regular file, but to a pipe.

If the program you hand it to checks whether it received a regular file, tries to seek in it, or reads it more than once, the entire operation might fail (a pipe can only be read once, from start to end).

Closing thoughts

Having a little bit of extra knowledge about some capabilities of bash is definitely handy.

I really hope you enjoyed this quick bash tip, tied together with a bit of Go as well.

Please let me know if you have any questions or additions! I’m cirowrc on Twitter and would love your feedback!

Cheers,

Ciro