Home Flight manual Ramblings Tags RSS

Scope sed commands to specific lines

Table of contents

I really like sed. If you merely use it sporadically to search and replace text from the CLI, and have no interest in reaching for this trusty tool for anything else, you may want to know of a very popular alternative that somewhat simplifies that specific task: sd.

The finer folks will find boundless treasures in the venerable manual pages, and perhaps a few tips and tricks scattered online, such as those I plan to publish on this very platform.

The POSIX man pages of sed are more complete than that of its GNU implementation: you may run man 1p sed to access the POSIX version for a more faithful (and universal) reference; however GNU offers a significantly more exhaustive version through either info sed or its on-line version.

Obligatory TL;DR

You can limit the scope over which sed commands operate by preceding them with one or two addresses. An address may be a line number, a regular expression, or a few other special forms.

sed '/match/       s/before/after/'  file.txt   # substitute only on lines matching 'match'
sed '5,20          s/before/after/'  file.txt   # substitute only on lines 5 through 20
sed '/if/,  /fi/   d'                script.sh  # delete if-blocks (not production-ready...)
sed '/!important/  !d'               style.css  # extract css !important rules

The anatomy of a sed command

A sed command takes the following form:

[address[,address]]function

In this syntax, function represents a single-character command verb in sed, followed by any applicable arguments. The most common function is s, which stands for substitute, but you'll find numerous others, such a d for delete, p for print, a for append...1

To keep things readable (we wield regular expressions but we're not animals), the addresses and function can be preceded and/or followed by blank characters. The command in its entirety, [address[,address]]function, may be preceded and/or followed by blank and/or semicolon characters.

One last, simple yet quite useful trick regarding line selection: you can negate the selection over which a function shall operate by preceding it with an exclamation mark (!), which will cause the command to be executed on all lines except those addressed.

Each function you can execute may take up to either zero, one, or two addresses.

The forms of addresses

An address may take one of the following forms:

To these POSIX standards, GNU sed adds the following option:

Address ranges

In sed, most commands can be given with:

Using two addresses allows you to target ranges of lines where the two addresses serve as flip-flop markers: when the first address is matched, if we're currently outside a match range, we start processing the input lines for that command, until the second address is matched or the end of the input is reached.

Some GNU sed-specific pseudo-addresses

The GNU version of sed introduces three pseudo-addresses that may only be used in the context of address ranges, and aren't valid as stand-alone addresses:

Well, that's a lot of theory, but nothing quite scary in practice, though I suppose some demonstrations are in order.

Exempli gratia

Here are some examples of addresses and address ranges in action, using the delete command:

seq 1 10 | sed '1d'      # delete the first line
seq 1 10 | sed '$d'      # delete the last line
seq 1 10 | sed '2,4d'    # delete lines 2 through 4, inclusive
seq 1 10 | sed '2,$d'    # delete from line 2 to the end

# The following are GNU sed extensions:
seq 1 10 | sed '1~2d'    # delete odd-numbered lines
seq 1 10 | sed '2~2d'    # delete even-numbered lines
seq 1 10 | sed '3,+2d'   # delete line 3 and the two following it (3, 4, and 5)
seq 1 10 | sed '3,~7d'   # delete line 3, up to the next multiple of 7 (3, 4, 5, 6, and 7)

It is also possible to use regular expressions as addresses:

seq 1 20 | sed '/3/d'       # delete lines matching '3' (lines 3 and 13)
seq 1 20 | sed '/[357]/d'   # delete lines matching '3', '5', or '7' (lines 3, 5, 7, 13, 15, and 17)
seq 1 20 | sed '1,/[357]/d' # delete from line 1 to the first matching '3', '5', or '7' (lines 1, 2, 3)

You may target some function from a script:

# Delete the function "handleanything" by specifying lines ranging from:
# - its declaration: /function handleanything/, to:
# - its closing brace: /^}$/
sed '/function handleanything/, /^}$/ d' <<'EOF'
answer=42
function handleanything() {
    echo "We will look into it next sprint, please create a Jira ticket"
}
function why() {
    echo $answer
}
EOF
answer=42
function why() {
    echo $answer
}
you can use white space in the [address[,address]]function pattern for better readability

You can also retain only that function, one of two ways:

# Using the `-n` (or `--quiet` or `--silent`) option to suppress automatic
# printing of the processed stream, and the `p` command to print only the lines
# in the specified range:
sed -n '/function handleanything/, /^}$/  p' script.sh

# Using the negation operator (!) to operate on lines *outside* the specified range:
sed    '/function handleanything/, /^}$/ !d' script.sh
don't hesitate to use white-space, /function handleanything/, /^}$/ !d is also legal

For instance, you may refer to some of your favourite manual entries in this way:

man strftime | sed '/%D/,/^$/!d'
man strftime | sed '/%D/,  /^$/  !d'
       %D     Equivalent to %m/%d/%y.  (Yecch—for Americans only.  Americans should note that in
              other countries %d/%m/%y is rather common.  This means that in international context
              this format is ambiguous and should not be used.) (SU)

A real-life scenario

Another quick tip: if you find yourself wanting to substitute some text only on certain lines, you don't necessarily have to complicate your regular expression.

Let's take the example of some nginx configuration:

# handles https traffic
server {
    listen 443 ssl http2;
    server_name example.com;

    location /app1 {
        proxy_pass      http://backend1;
        proxy_redirect  http://backend1  /app1;
    }

    location /app2 {
        proxy_pass      http://backend2;
        proxy_redirect  http://backend2  /app2;
    }

    location /app3 {
        proxy_pass      http://backend3;
        proxy_redirect  http://backend3  /app3;
    }
}

Suppose that you want to avoid performing SSL termination in your reverse proxy: you'll need to change occurrences of http to https.

Here's the diff you'd obtain after running a naive sed 's/http/https/' nginx.conf:

-# handles https traffic
+# handles httpss traffic
 server {
-    listen 443 ssl http2;
+    listen 443 ssl https2;
     server_name example.com;

     location /app1 {
-        proxy_pass      http://backend1;
-        proxy_redirect  http://backend1  /app1;
+        proxy_pass      https://backend1;
+        proxy_redirect  https://backend1  /app1;
     }

     location /app2 {
-        proxy_pass      http://backend2;
-        proxy_redirect  http://backend2  /app2;
+        proxy_pass      https://backend2;
+        proxy_redirect  https://backend2  /app2;
     }

     location /app3 {
-        proxy_pass      http://backend3;
-        proxy_redirect  http://backend3  /app3;
+        proxy_pass      https://backend3;
+        proxy_redirect  https://backend3  /app3;
     }
 }

Oops, we don't actually want to change the comment line nor the protocol (there's no https2), but only the proxy_pass directives.

You could revise your regular expression to something like:

s/\(proxy_pass *\|proxy_redirect *\) http/\1 https

That would work, but... Good grief.

Even if you're comfortable with regular expressions in the first place, and with the BRE flavour whenever necessary, in addition to the PCRE dialect I'm sure you'll often favour, that's still a troublesome incantation not only to read, but even to write: you may very well, for instance, forget about the pesky multiple spaces for horizontal alignment in your first attempt!

A more idiomatic sed way would be to scope your substitution to the lines containing proxy_:

sed '/proxy_/s/http/https/' nginx.conf
sed '/proxy_/  s/http/https/' nginx.conf
the second version is also entirely legal syntax, it merely adds white-space for readability

How real-life is that, though? Wouldn't you just spin up your favourite text editor at this point?

Certainly. But you'll be happy to know that your favourite text editor offers the same function!

:g/proxy_/s/http/https
:g  /proxy_/  s/http/https
Vim is more lenient and tolerates omitting the final / at the end of the replacement pattern

You could also have selected the location blocks:

sed '/location.*{/,/^}/s/http/https/' nginx.conf
sed '/location.*{/,  /^}/  s/http/https/' nginx.conf

Or maybe all lines following /app1:

sed '#/app1#,$s/http/https/' nginx.conf
sed '#/app1#,  $  s/http/https/' nginx.conf
sed '\#/app1#,  $  s/http/https/' nginx.conf  # GNU sed only
for illustration, I used hashes (#) to delineate the RegExp here, to not have to escape the literal /

Or lines 5 and onwards:

sed '5,$s/http/https/' nginx.conf
sed '5,  $  s/http/https/' nginx.conf

The world is your oyster! Know your options, start using them here and there, you'll find what works for you and maybe best accommodate your tools to your taste.

One savvy application I use all the time

I'll share here a part of my personal prepare-commit-msg Git hook, which runs every time I commit anything anywhere and prepares the commit template to my liking.

I use the --verbose flag when committing, so that Git shows me:

[...] a unified diff between the HEAD commit and what would be committed at the bottom of the commit message template to help the user describe the commit by reminding what changes the commit has.

In summary, instead of being presented with:

git commit
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# On branch master
# Changes to be committed:
#	modified:   test
#
# Changes not staged for commit:
#	modified:   test
#
# Untracked files:
#	other

You would instead have:

git commit --verbose
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# On branch master
# Changes to be committed:
#	modified:   test
#
# Changes not staged for commit:
#	modified:   test
#
# Untracked files:
#	other
#
# ------------------------ >8 ------------------------
# Do not modify or remove the line above.
# Everything below it will be ignored.
diff --git a/test b/test
index 84ef876..4c1d036 100644
--- a/test
+++ b/test
@@ -1 +1 @@
-Let me tell you about something
+Let me tell you about something cool
I like having a recap of my changes below the scissors line

Quite handy, yet too noisy. My prepare-commit-msg hook prunes the content that I find too busy, using sed:

# Prune unstaged/untracked content listing and hand-holding guidance
sed -i.bak "$commit_msg_file" -e '
    /^# Please enter the commit message/, /^#$/d                ;
    /^# Do not modify or remove/,         /^[^#]/ { /^[^#]/!d } ;
    /^# Changes not staged/,              /^#$/d                ;
    /^# Untracked files/,                 /^#$/d                ;'
I use -i.bak to keep a backup of the original file, just in case

There's some more to unpack in this snippet, but the part I want to go over in this article here is the selection of chunks to delete using the /pattern/,/pattern/ address range syntax with sed, leaving me with a much cleaner commit message template:

git commit --verbose
# On branch master
# Changes to be committed:
#	modified:   test
# ------------------------ >8 ------------------------
diff --git a/test b/test
index 84ef876..4c1d036 100644
--- a/test
+++ b/test
@@ -1 +1 @@
-Let me tell you about something
+Let me tell you about something cool
in reality, I have configured git-commit to always use --verbose and needn't specify it

Much tidier, isn't it?

In this article, I only talked about selecting lines in sed, but I am barely scratching the surface of what you can do with the tool altogether, and will be sure to post more articles about it in the future.

Have fun!

  1. They all were inherited from the traditional ed line-oriented editor, and this legacy still lives on to this day, stronger than ever, through even the Vim (and Neovim) commands that use precisely these same single-letter verbs. Oh yeah, the CLI-only crew has it all figured out!

  2. The "hash" or "number sign" (#), is also routinely referred to by the Americanisms "octothorpe", or more absurdly "pound sign"

Referenced tools