I really like sed. If you merely use it sporadically to search and replace
text from the CLI, and have no interest in reaching for this trusty tool for
anything else, you may want to know of a very popular alternative that somewhat
simplifies that specific task: sd.
The finer folks will find boundless treasures in the venerable manual pages,
and perhaps a few tips and tricks scattered online, such as those I plan to
publish on this very platform.
The POSIX man pages of sed are more complete than those of its GNU implementation: you may run man 1p sed to access the POSIX version for a more faithful (and universal) reference; however, GNU offers a significantly more exhaustive version through either info sed or its on-line version.
⚓ Obligatory TL;DR
You can limit the scope over which sed commands operate by preceding them
with one or two addresses. An address may be a line number, a regular
expression, or a few other special forms.
⚓ The anatomy of a sed command
A sed command takes the following form:
[address[,address]]function
In this syntax, function represents a single-character command verb in
sed, followed by any applicable arguments. The most common function is s,
which stands for substitute, but you'll find numerous others, such as d for
delete, p for print, a for append... [1]
To keep things readable (we wield regular expressions but we're not animals), the
addresses and function can be preceded and/or followed by blank characters. The command in its entirety, [address[,address]]function, may be preceded and/or followed by blank and/or semicolon characters.
One last, simple yet quite useful trick regarding line selection: you can
negate the selection over which a function shall operate by preceding it
with an exclamation mark (!), which will cause the command to be executed on
all lines except those addressed.
Depending on the function, a command accepts at most zero, one, or two addresses.
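To make the syntax a little more tangible, here is a minimal sketch. The sample, spaced and negated helpers are hypothetical names of mine, standing in for real input and real commands.

```shell
# Hypothetical sample input: the numbers 1 to 5, one per line.
sample() { printf '%s\n' 1 2 3 4 5; }

# Blanks may separate the addresses from the function, and a semicolon
# separates two commands: delete lines 2 to 3, then delete the last line.
spaced() { sample | sed '2,3 d; $ d'; }

# Negation: delete every line *except* lines 2 to 3.
negated() { sample | sed '2,3 !d'; }

spaced    # prints 1 and 4
negated   # prints 2 and 3
```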
⚓ The forms of addresses
An address may take one of the following forms:
- a decimal number that counts input lines cumulatively across files (1-indexed),
- a dollar sign ($) character, which references the last line of input, or
- a context address, which consists of a BRE as described in the complete GNU sed manual on-line, preceded and followed by a delimiter, usually a forward slash (/).

  Using a different delimiter in a context address requires a backslash (\) before the opening delimiter. For example, to use a hash [2] as the delimiter, you'd write \#regexp# instead of #regexp#.

  In addition, GNU's implementation introduces the /re/I and /re/M syntaxes for case-insensitive and multi-line matching, respectively. Most often in other contexts these flags are lower-case, but GNU sed offers a would-be conflicting i function verb for insert that couldn't be interpreted adequately, had the flags been i and m instead of I and M.
To these POSIX forms, GNU sed adds the following one:
- the first~step form, where first and step are both decimal numbers.

  This matches every step-th line, starting with line first. For example, 1~2 matches all odd-numbered lines, while 2~2 matches the even ones.
⚓ Address ranges
In sed, most commands can be given with:
- no addresses, in which case the command will be executed for all input lines,
- one address, in which case only the input lines which match that address will be considered, or
- two addresses, in which case the command will operate over all input lines matching the inclusive range of lines starting from the first address and continuing to the second address.

  The syntax is address1,address2 (that is, the addresses are separated by a comma); the line which address1 matched will always be accepted, even if address2 selects an earlier line; and if address2 is a regular expression, it will not be tested against the line that address1 matched.
Using two addresses allows you to target ranges of lines where the two addresses serve as flip-flop markers: when the first address is matched, if we're currently outside a match range, we start processing the input lines for that command, until the second address is matched or the end of the input is reached.
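The edge cases of ranges are easier to see on toy input. The following is only a sketch: the function names and the sample lines are made up for illustration.

```shell
# Only one line is selected when the second address is a line number at or
# before the line matched by the first address.
earlier_end() { printf '%s\n' 1 2 3 4 5 | sed -n '4,2p'; }

# A regular expression used as second address is not tested against the
# line that the first address matched, only against the following ones.
same_pattern() { printf '%s\n' x y x z | sed -n '/x/,/x/p'; }

# The range re-opens each time the first address matches again, and runs
# to the end of input when the second address never matches.
flip_flop() { printf '%s\n' a BEGIN b END c BEGIN d | sed -n '/BEGIN/,/END/p'; }

earlier_end    # prints 4
same_pattern   # prints x, y, x
flip_flop      # prints BEGIN, b, END, BEGIN, d
```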
⚓ Some GNU sed-specific pseudo-addresses
The GNU version of sed introduces three pseudo-addresses that may only
be used in the context of address ranges, and aren't valid as stand-alone
addresses:
- +N and ~N, where N is a decimal number, only available as the second of an address pair: +N matches the line that is N lines after that of the first address, while ~N matches the next line whose number is a multiple of N.
- 0, which may only be used as the first of an address pair. It is most esoteric and only included here for completeness. The special 0,address2 syntax is used to:

  Start out in "matched first address" state, until address2 is found. This is similar to 1,address2, except that if address2 matches the very first line of input the 0,address2 form will be at the end of its range, whereas the 1,address2 form will still be at the beginning of its range. This works only when address2 is a regular expression.
Well, that's a lot of theory, but nothing quite scary in practice, though I suppose some demonstrations are in order.
⚓ Exempli gratia
Here are some examples of addresses and address ranges in action, using
the delete command:
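Here is a small sketch of what such examples can look like, using only POSIX address forms. The sample helper and its five lines of input are placeholders of mine.

```shell
# Hypothetical sample input: five lines, one word each.
sample() { printf '%s\n' one two three four five; }

sample | sed '2d'      # delete line 2: prints one, three, four, five
sample | sed '$d'      # delete the last line: prints one, two, three, four
sample | sed '2,4d'    # delete lines 2 to 4: prints one, five
sample | sed '3,$d'    # delete line 3 to the end: prints one, two
sample | sed '2,4!d'   # delete all but lines 2 to 4: prints two, three, four
```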
The following are GNU sed extensions:
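As a rough sketch, and assuming GNU sed is the sed on your PATH (other implementations may reject these forms), the GNU-only addresses behave like this on some made-up input:

```shell
# Hypothetical sample input: five lines, one word each.
sample() { printf '%s\n' one two three four five; }

# These address forms require GNU sed.
sample | sed '1~2d'    # first~step: delete odd-numbered lines, prints two, four
sample | sed '2,+1d'   # addr,+N: delete line 2 and the next one, prints one, four, five
sample | sed '2,~4d'   # addr,~N: delete line 2 up to line 4, prints one, five

# 0,/re/ may end its range on the very first line; 1,/re/ cannot.
printf '%s\n' foo bar foo | sed '0,/foo/d'   # prints bar, foo
printf '%s\n' foo bar foo | sed '1,/foo/d'   # prints nothing
```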
It is also possible to use regular expressions as addresses:
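A few illustrative context addresses, again as a sketch on invented input. Only the last command depends on GNU sed.

```shell
# Hypothetical sample input.
sample() { printf '%s\n' apple Banana /usr/bin cherry; }

sample | sed '/an/d'        # lines containing "an": prints apple, /usr/bin, cherry
sample | sed '/^B/,/^c/d'   # from a line starting with B to one starting with c: prints apple
sample | sed '\#/usr#d'     # hash as delimiter, no need to escape the /: prints apple, Banana, cherry
sample | sed '/banana/Id'   # GNU only, case-insensitive: prints apple, /usr/bin, cherry
```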
You may target some function from a script:
# Delete the function "handleanything" by specifying lines ranging from:
# - its declaration: /function handleanything/, to:
# - its closing brace: /^}$/
answer=42
function why() {
echo $answer
}
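The deletion described by the comments above might be sketched as follows. The body of handleanything, its position in the script and the delete_handler name are assumptions of mine; only the two addresses come from the article.

```shell
# Hypothetical script: the body of handleanything is invented for the demo.
script='answer=42
function handleanything() {
  echo "handling $*"
}
function why() {
  echo $answer
}'

# Delete from the declaration of handleanything to its closing brace.
delete_handler() {
  sed '/function handleanything/,/^}$/d'
}

printf '%s\n' "$script" | delete_handler
```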
Note: blanks may be added within the [address[,address]]function pattern for better readability.
You can also retain only that function, one of two ways:
# Using the `-n` (or `--quiet` or `--silent`) option to suppress automatic
# printing of the processed stream, and the `p` command to print only the lines
# in the specified range:
# Using the negation operator (!) to operate on lines *outside* the specified range:
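Both approaches can be sketched like so. As before, the sample script and the helper names are hypothetical stand-ins.

```shell
# Hypothetical script: the body of handleanything is invented for the demo.
script='answer=42
function handleanything() {
  echo "handling $*"
}
function why() {
  echo $answer
}'

# Way 1: suppress automatic printing with -n, print only the range with p.
keep_with_print() {
  sed -n '/function handleanything/,/^}$/p'
}

# Way 2: delete every line outside the range with the negation operator.
keep_with_negation() {
  sed '/function handleanything/,/^}$/!d'
}

printf '%s\n' "$script" | keep_with_print
printf '%s\n' "$script" | keep_with_negation
```

Either command prints only the three lines of handleanything.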
Note: /function handleanything/, /^}$/ !d is also legal.
For instance, you may refer to some of your favourite manual entries in this way:
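One possible way to do this, as a sketch: the man_entry helper is hypothetical, and it assumes that entries in the rendered manual page are separated by blank lines and that man emits plain text when piped.

```shell
# Hypothetical helper: print one entry of a manual page read from stdin.
# $1 is a BRE matching the start of the entry; the entry is assumed to
# end at the next blank line.
man_entry() {
  sed -n "/^ *$1/,/^ *\$/p"
}

# Possible usage, assuming the strftime manual page is installed:
#   man strftime | man_entry '%D'
```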
%D Equivalent to %m/%d/%y. (Yecch—for Americans only. Americans should note that in
other countries %d/%m/%y is rather common. This means that in international context
this format is ambiguous and should not be used.) (SU)
⚓ A real-life scenario
Another quick tip: if you find yourself wanting to substitute some text
only on certain lines, you don't necessarily have to complicate your regular
expression.
Let's take the example of some nginx configuration:
# handles https traffic
server {
listen 443 ssl http2;
server_name example.com;
location /app1 {
proxy_pass http://backend1;
proxy_redirect http://backend1 /app1;
}
location /app2 {
proxy_pass http://backend2;
proxy_redirect http://backend2 /app2;
}
location /app3 {
proxy_pass http://backend3;
proxy_redirect http://backend3 /app3;
}
}
Suppose that you want to avoid performing SSL termination in your reverse proxy: you'll need to change occurrences of http to https.
Here's the diff you'd obtain after running a naive sed 's/http/https/' nginx.conf:
-# handles https traffic
+# handles httpss traffic
server {
- listen 443 ssl http2;
+ listen 443 ssl https2;
server_name example.com;
location /app1 {
- proxy_pass http://backend1;
- proxy_redirect http://backend1 /app1;
+ proxy_pass https://backend1;
+ proxy_redirect https://backend1 /app1;
}
location /app2 {
- proxy_pass http://backend2;
- proxy_redirect http://backend2 /app2;
+ proxy_pass https://backend2;
+ proxy_redirect https://backend2 /app2;
}
location /app3 {
- proxy_pass http://backend3;
- proxy_redirect http://backend3 /app3;
+ proxy_pass https://backend3;
+ proxy_redirect https://backend3 /app3;
}
}
Oops, we actually want to change neither the comment line nor the protocol
(there's no https2), only the proxy_pass and proxy_redirect directives.
You could revise your regular expression to something like:
s/\(proxy_pass *\|proxy_redirect *\) http/\1 https/
That would work, but... Good grief.
Even if you're comfortable with regular expressions in the first place,
and with the BRE flavour whenever necessary, in addition to the PCRE dialect I'm sure
you'll often favour, that's still a troublesome incantation not only to read,
but even to write: you may very well, for instance, forget about the pesky
multiple spaces for horizontal alignment in your first attempt!
A more idiomatic sed way would be to scope your substitution to the lines containing
proxy_:
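A minimal sketch of that scoped substitution follows. The scope_to_proxy wrapper is a name I made up so that the command can be tried on any input.

```shell
# Rewrite http to https, but only on lines containing "proxy_".
# With no file argument, sed reads standard input.
scope_to_proxy() {
  sed '/proxy_/s/http/https/' "$@"
}

# Possible usage, with the configuration above saved as nginx.conf:
#   scope_to_proxy nginx.conf
```

The comment line and the listen directive are left alone simply because they never match the address.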
How real-life is that, though? Wouldn't you just spin up your favourite text editor at this point?
Certainly. But you'll be happy to know that your favourite text editor offers the same function!
:g/proxy_/s/http/https
:g /proxy_/ s/http/https
Note: in Vim, you may omit the final / at the end of the replacement pattern.
You could also have selected the location blocks:
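Selecting the location blocks could look like this sketch, where in_location_blocks is a hypothetical wrapper and each block is assumed to end at the first line containing a closing brace:

```shell
# Substitute only between a line containing "location" and the next
# line containing "}".
in_location_blocks() {
  sed '/location/,/}/s/http/https/' "$@"
}

# Possible usage: in_location_blocks nginx.conf
```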
Or maybe all lines following /app1:
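One way to write that range, as a sketch with a hypothetical from_app1_onwards wrapper:

```shell
# Substitute from the first line matching /app1 down to the last line ($).
from_app1_onwards() {
  sed '\#/app1#,$s/http/https/' "$@"
}

# Possible usage: from_app1_onwards nginx.conf
```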
Note: a hash (#) is used to delineate the RegExp here, to not have to escape the literal /.
Or lines 5 and onwards:
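With plain line numbers, the same idea might be sketched as follows (from_line_5 is a hypothetical wrapper):

```shell
# Substitute from line 5 down to the last line ($).
from_line_5() {
  sed '5,$s/http/https/' "$@"
}

# Possible usage: from_line_5 nginx.conf
```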
The world is your oyster! Know your options, start using them here and there, and you'll find what works for you and how best to adapt your tools to your taste.
⚓ One savvy application I use all the time
I'll share here a part of my personal prepare-commit-msg Git hook, which runs every time I commit anything anywhere and prepares the commit template to my liking.
I use the --verbose flag when committing, so that Git shows me:
[...] a unified diff between the HEAD commit and what would be committed at the bottom of the commit message template to help the user describe the commit by reminding what changes the commit has.
In summary, instead of being presented with:
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# On branch master
# Changes to be committed:
# modified: test
#
# Changes not staged for commit:
# modified: test
#
# Untracked files:
# other
You would instead have:
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# On branch master
# Changes to be committed:
# modified: test
#
# Changes not staged for commit:
# modified: test
#
# Untracked files:
# other
#
# ------------------------ >8 ------------------------
# Do not modify or remove the line above.
# Everything below it will be ignored.
diff --git a/test b/test
index 84ef876..4c1d036 100644
--- a/test
+++ b/test
@@ -1 +1 @@
-Let me tell you about something
+Let me tell you about something cool
Quite handy, yet too noisy. My prepare-commit-msg hook prunes the content
that I find too busy, using sed:
# Prune unstaged/untracked content listing and hand-holding guidance
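What follows is not my actual hook, only a hedged sketch of how such pruning can be done with address ranges. The prune_commit_template name and the exact patterns are assumptions written against the template shown above; Git passes the path of the message file to the hook as its first argument.

```shell
# Hypothetical sketch: prune the commit template in place, keeping a backup.
# $1 is the path to the commit message file.
prune_commit_template() {
  sed -i.bak \
    -e '/^# Please enter the commit message/,/^#$/d' \
    -e '/^# Changes not staged for commit:/,/^#$/d' \
    -e '/^# Untracked files:/,/^#$/d' \
    -e '/^# Do not modify or remove the line above/,/^# Everything below it will be ignored/d' \
    -e '/^#$/d' \
    "$1"
}

# Possible usage inside a prepare-commit-msg hook:
#   prune_commit_template "$1"
```

Each range deletes one chunk, and the final /^#$/d removes the empty comment lines that no range consumed. The scissors line itself matches none of the addresses, so it stays put.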
Note: -i.bak keeps a backup of the original file, just in case.
There's some more to unpack in this snippet, but the part I want to go over
in this article is the selection of chunks to delete using the
/pattern/,/pattern/ address range syntax with sed, leaving me with a much
cleaner commit message template:
# On branch master
# Changes to be committed:
# modified: test
# ------------------------ >8 ------------------------
diff --git a/test b/test
index 84ef876..4c1d036 100644
--- a/test
+++ b/test
@@ -1 +1 @@
-Let me tell you about something
+Let me tell you about something cool
Note: git-commit can be configured to always use --verbose, so that it needn't be specified.
Much tidier, isn't it?
In this article, I only talked about selecting lines in sed, but I am
barely scratching the surface of what you can do with the tool altogether,
and will be sure to post more articles about it in the future.
Have fun!
[1] They all were inherited from the traditional ed line-oriented editor, and this legacy still lives on to this day, stronger than ever, through even the Vim (and Neovim) commands that use precisely these same single-letter verbs. Oh yeah, the CLI-only crew has it all figured out!
[2] The "hash" or "number sign" (#) is also routinely referred to by the Americanisms "octothorpe", or more absurdly "pound sign".
⚓ Referenced tools
- sed, from core/sed
- sd, from extra/sd
- vim, from extra/vim