Pipes and combining commands
A pipe is a connection between two commands. The pipe receives the output from the first command and provides it as input to the second command. Pipes let you combine the Linux commands you already know to perform complicated jobs that no single command can handle.
Combining existing programs in new ways is a key part of the Unix programming philosophy, named for the Unix operating system where it became popular.
The Unix Programming Philosophy
"... at its heart is an idea that the power of a system comes more from the relationships among programs than from the programs themselves. Many UNIX programs do quite trivial things in isolation but, combined with other programs, become general and useful tools."
Linux is a descendant of Unix and inherits this philosophy - along with many features and design choices that support it.
Combining commands with pipes
You tell bash to create a pipe between two commands with the pipe symbol ("|"), usually found on the same key as the backslash.
Creating a pipe between two commands
$ command1 [option]... [argument]... | command2 [option]... [argument]...
Data flows left-to-right; the standard output from the first command passes to the second command's standard input. The first command's standard error is not redirected by the pipe, and appears in your terminal (along with all of the second command's output).
You are not limited to a single pipe, either. You can chain together as many commands as necessary (even different invocations of the same command) to accomplish your goal.
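For example, here is a sketch of a three-command chain that counts how many files in the current directory have names ending in .txt (the grep and wc commands are covered later on this page):
$ ls | grep '\.txt$' | wc -l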
Shortly after adding pipes to Unix, programmers Ken Thompson and Dennis Ritchie "upgraded every command on the system in a single night" to take advantage of them. This mainly required allowing each to read data from stdin when no other source was specified.
Pipes also led to the invention of the stderr stream in order to keep error messages separate from piped data.
References
- Kernighan, Brian. UNIX: A History and Memoir. Kindle Direct Publishing, 2020, p. 69.
Combining command line tools
A common description of Linux commands is as tools for operating on data. These tools receive data from standard input; modify or process that data; and send the results to standard output.
Linux pipes let you connect these tools like steps in an assembly line to do complicated work over multiple steps without writing a new program to do the work. As you learn more Linux commands, you add tools to your toolbox that you can combine in new ways.
Tools to manage terminal output
Some commands can print a lot of output very quickly, flooding your terminal and making the information you care about harder to find. Piping that output to another command lets you limit what's printed or examine it in a more controlled way.
The head and tail commands limit output by printing only the first (or last) few lines they receive as input. A terminal pager like less displays only one screen of data at a time and gives you keyboard controls (similar to those used by man pages) to navigate the text.
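For example, this sketch uses head to peek at just the beginning of a long directory listing; replacing head with tail would show the last five lines instead:
$ ls -l /usr/bin | head -n 5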
less - file perusal filter (opposite of more)
$ less [option]... [file]...
Try it yourself: Managing the output of ls -l in a really big directory
(Part 1) What happens when you run the following command on cs-class? (Hint: connect to cs-class using ssh and try it)
$ ls -l /usr/bin
You just flooded your terminal with output! The /usr/bin directory has over 1800 items in it, and ls -l prints one item per line.
(Part 2) Can you think of a "piped command" that would let you view this output in a more manageable way? (Hint: try the less command above).
Use a pipe to connect ls -l to less:
$ ls -l /usr/bin | less
You'll now see only one screen's worth of output, with keyboard controls to navigate the rest.
Tools to copy terminal output
Another useful tool is tee, which works like a tee junction in plumbing.
tee - read from standard input and write to standard output and files
$ tee [option]... [file]...
You can use tee to capture a program's output in a file while still displaying it in your terminal, or passing it on to another program through a pipe.
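For example, this sketch (using the hypothetical file name listing.txt) saves a copy of a directory listing while also paging through it:
$ ls -l /usr/bin | tee listing.txt | less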
Example: Using tee to capture user input and program output
Suppose you wrote an interactive program in C++ (compiled as a.out) that asks the user questions and processes their responses. The tee command can be used (twice) to capture both the user's inputs and the program's responses in a single log file.
$ tee -a log.txt | ./a.out | tee -a log.txt
The pipeline above has three parts:
- The first tee receives standard input (from the user) and sends it both to the file log.txt and to standard output, used in the next step.
- The a.out program receives standard input (from tee) containing the user's inputs; processes them; and prints the responses to standard output for the last step.
- The second tee receives the output of a.out and sends it both to log.txt and to standard output, displayed in the terminal for the user to read.
The option -a tells tee to append to the log file rather than overwriting it - so the two copies of tee don't take turns overwriting each other's work.
Yet another useful program is xargs, whose job is to transform standard input into command-line arguments for another program.
xargs - build and execute command lines from standard input
$ xargs [option]... [command [initial-args]]
A classic use-case for xargs is with programs which take a list of filenames as command-line arguments. A command like ls can be used to print a list of filenames to standard output, which xargs can transform into an argument list. (See two different examples further down this page!)
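For example, this sketch hands every filename in the current directory to wc as command-line arguments (assuming none of the names contain spaces):
$ ls | xargs wc -l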
Tools to compare data
You can compare a command's output to "known" output line-by-line using diff. A dash ("-") as a filename tells diff to compare standard input to the named file.
diff - compare files line by line
$ diff [option]... files
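For example, this sketch (with a hypothetical program a.out and a hypothetical reference file expected.txt) checks a program's output against known-good output:
$ ./a.out | diff - expected.txt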
If you need to identify byte-by-byte differences (especially if the data isn't plain text), you can use cmp to do the job.
cmp - compare two files byte by byte
$ cmp [option]... file1 [file2]
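cmp accepts the same dash ("-") convention, so a similar sketch (again with hypothetical names) compares a program's output to a reference file byte by byte:
$ ./a.out | cmp - expected.bin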
Tools to filter and transform data
Many Linux commands behave like filters (a la Instagram, though the term comes from signal processing) which transform input data into a modified form for later use. Some simple transformations are handled by the commands below.
The sort command outputs all input lines in sorted order.
sort - sort lines of text files
$ sort [option]... [file]...
The uniq command compares adjacent lines of text and outputs only one copy of each "unique" occurrence. (Because it only detects adjacent duplicates, its input is usually sorted first.)
uniq - report or omit repeated lines
$ uniq [option]... [input]
The word count command wc counts not only words of input but also lines and individual characters. Command-line options are used to choose which counts it outputs.
wc - print newline, word, and byte counts for each file
$ wc [option]... [file]...
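These three filters combine naturally in a pipeline. For example, this sketch (with a hypothetical file names.txt) counts how many distinct lines the file contains:
$ sort names.txt | uniq | wc -l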
Tools to process text streams
Since so many Linux files contain plain text, tools that can search, format, or process text are particularly useful to learn about. The tools below are explored in greater detail on other pages.
- grep is a pattern-matching tool. It searches each line of input for a particular pattern, and prints the lines that match.
- sed is a stream editor. It can use a rule to edit or transform each line of input provided and print the results.
- awk is both a command and a programming language. It specializes in operations common in data extraction, manipulation, and report-generation.
In particular, grep is often given as the quintessential example of a Linux command-line tool, and its pattern-matching behavior proves useful in many different situations.
A few examples of piped commands using grep, sed, and xargs are given below.
Example: Counting the number of long lines in C++ source code
Programmers agree that long lines ("wide code") are harder to read, though they argue about how many characters is "too wide." The Google C++ Style Guide mandates an 80-character line limit, but admits this is a controversial choice.
Suppose you are a COSC 051 TA (or an attentive student) who wants to check for overlong lines in your main.cpp. You can use grep (and a suitable regular expression) to search for the bad lines.
$ grep -n -e '.\{81\}' main.cpp
The pattern '.\{81\}' ("81 characters of any kind") will only match lines which contain at least 81 characters (lines that are "too wide"), so grep will print those "bad" lines while filtering the rest. The -n option prints the line number of each match (so you can quickly find and split the long lines).
What if your programming project contains more than one source code file? You could type out each filename by hand, but you know a command for listing filenames: the ls command! Let's use a pipe to solve the problem instead.
$ ls | xargs grep -n -e '.\{81\}'
Here ls provides all the filenames, and xargs uses them as arguments to grep so it will search each file.
What if your project directory contains files that aren't C++ source code (like a .pdf specification)? ls names all the files, but we can filter them with grep.
$ ls | grep -e '\.cpp$' -e '\.h$' | xargs grep -n -e '.\{81\}'
The new grep checks the filenames provided by ls and filters out any that don't end in either .cpp or .h (that is, any that aren't C++ source code). Only the remaining files are passed to xargs and searched for long lines.
Example: Staging files for a git commit
Git is a version control program that creates snapshots (called commits) of your coding project as you work. You tell git which changes to save by adding them to a draft version of the next commit. This is done with the git add command.
$ git add file1 [file2]...
Suppose I'm writing a custom class, order, with an associated header and implementation file as well as client code in main.cpp.
I can ask git to print information about files that have changed since my last commit.
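The usual command for this is git status, whose output (abbreviated here, with filenames from the hypothetical project above) lists each changed file on its own line:
$ git status
...
	modified:   order.h
	modified:   order.cpp
	modified:   main.cpp
...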
This output is human-readable, but I can't use it with git add as-is. First, let's use grep to pick out lines with names of modified files. These all start with modified:.
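A sketch of that first step:
$ git status | grep 'modified:'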
Now I've got the filenames I want to add, but they aren't "clean" - the modified: shouldn't be there! Next, let's use sed to trim off that unwanted bit by replacing it with nothing.
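One possible sed rule strips the label and any whitespace around it (the exact spacing in git's output may vary, so treat this pattern as a sketch):
$ git status | grep 'modified:' | sed -e 's/.*modified:[[:space:]]*//'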
Great! Now we've got a list of files to add with one file per line. Last, let's use xargs to make that list of filenames into arguments for git add to stage.
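Putting it all together (using the same sketched sed rule, and assuming the filenames contain no spaces):
$ git status | grep 'modified:' | sed -e 's/.*modified:[[:space:]]*//' | xargs git add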
Voila! The modified files are now staged for your commit!
The nice thing about this pipeline is that it works equally well when there are three, thirty, or three hundred modified files - more than is practical to type by hand!