3.3. Addressing and address ranges
Sed commands may have an optional "address" or "address range"
prefix. If there is no address or address range given, then the
command is applied to all the lines of the input file or text
stream. Three commands cannot take an address prefix:
- labels, used to branch or jump within the script
- the close brace, '}', which ends the '{' "command"
- the '#' comment character, also technically a "command"
An address can be a line number (such as 1, 5, 37, etc.), a regular
expression (written in the form /RE/ or \xREx, where 'x' is any
character other than '\' and RE is the regular expression), or the
dollar sign ($), representing the last line of the file. Some seds
do not support the \xREx form. An exclamation mark (!) after an
address or address range applies the command to every line EXCEPT
the ones named by the address. An empty regex ("//") stands for
the last regex that was used.
    5d                # delete line 5 only
    5!d               # delete every line except line 5
    /RE/s/LHS/RHS/g   # substitute only if RE occurs on the line
    /^$/b label       # if the line is blank, branch to ':label'
    /./!b label       # ... another way to write the same command
    \%.%!b label      # ... yet another way to write this command
    $!N               # on all lines but the last, get the Next line
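The address forms above can be tried on a small input. This is a minimal sketch; the four-line sample and the variable names are made up for illustration and are not part of the FAQ.

```shell
# Hypothetical four-line sample input.
input='alpha
beta
gamma
delta'

# Line-number address: delete line 3 only.
without_third=$(printf '%s\n' "$input" | sed '3d')

# Negated address: delete every line except line 3.
only_third=$(printf '%s\n' "$input" | sed '3!d')

# '$' address: delete the last line of the input.
without_last=$(printf '%s\n' "$input" | sed '$d')

# Regex address: substitute only on lines that begin with "g".
regex_only=$(printf '%s\n' "$input" | sed '/^g/s/a/A/g')

# \%RE% delimiters: handy when the regex itself contains slashes.
# (As noted above, some seds may not accept this form.)
usr_paths=$(printf '/usr/bin\n/etc/passwd\n' | sed -n '\%^/usr/%p')

printf '%s\n' "$only_third" "$usr_paths"
```

With the sample above, only "gamma" survives the '3!d' command, and only "/usr/bin" is printed by the last command.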
Note that an embedded newline can be represented in an address by
the symbol \n, but this syntax is needed only if the script puts 2
or more lines into the pattern space via the N, G, or other
commands. The \n symbol does not match the newline at an
end-of-line because when sed reads each line into the pattern space
for processing, it strips off the trailing newline, processes the
line, and adds a newline back when printing the line to standard
output. To match the end-of-line, use the '$' metacharacter, as
follows:
    /tape$/        # matches the word 'tape' at the end of a line
    /tape$deck/    # matches the word 'tape$deck' with a literal '$'
    /tape\ndeck/   # matches 'tape' and 'deck' with a newline between
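As a quick check of the point about \n, the sketch below runs the same regex with and without the N command. The two-line inputs and variable names are illustrative assumptions.

```shell
# Without N, the pattern space holds one line at a time and never
# contains a newline, so this prints nothing.
single=$(printf 'tape\ndeck\n' | sed -n '/tape\ndeck/p')

# With N, two lines share the pattern space, so \n can match.
joined=$(printf 'tape\ndeck\n' | sed -n '$!N;/tape\ndeck/p')

# '$' anchors the match to the end of the line.
at_end=$(printf 'red tape\ntape deck\n' | sed -n '/tape$/p')

printf 'single=[%s]\njoined=[%s]\nat_end=[%s]\n' "$single" "$joined" "$at_end"
```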
The following sed commands usually accept only a single address.
All other commands (except labels, '}', and '#') accept both single
addresses and address ranges.
    =        print to stdout the line number of the current line
    a        after printing the current line, append "text" to stdout
    i        before printing the current line, insert "text" to stdout
    q        quit after the current line is matched
    r file   prints contents of "file" to stdout after line is matched
Note that we said "usually." If you need to apply the '=', 'a',
'i', or 'r' commands to each and every line within an address
range, this behavior can be coerced by the use of braces. Thus,
"1,9=" is an invalid command, but "1,9{=;}" will print each line
number followed by its line for the first 9 lines (and then print
the rest of the file normally).
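A shortened version of the brace technique, using the range 1,2 on a three-line input chosen only for illustration:

```shell
# Braces let '=' run on every line of the range 1,2.
# Line 3 is outside the range and is printed without a number.
numbered=$(printf 'a\nb\nc\n' | sed '1,2{=;}')

printf '%s\n' "$numbered"
```

The output is "1", "a", "2", "b", "c", each on its own line.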
Address ranges occur in the form
<address1>,<address2> or <address1>,<address2>!
where each address can be a line number or a standard /regex/.
<address2> can also be a dollar sign, indicating the end of file.
Under GNU sed 3.02+, ssed, and sed15+, <address2> may also be a
notation of the form +num, indicating the next num lines after
<address1> is matched.
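The range forms above can be sketched as follows. The five-line input and variable names are illustrative; the last command uses the +num form and therefore assumes GNU sed (or another sed that supports it).

```shell
# Hypothetical five-line sample input.
input='a
b
c
d
e'

# Two line numbers: print lines 2 through 3.
by_number=$(printf '%s\n' "$input" | sed -n '2,3p')

# Regex to '$': print from the first line matching /c/ to the end.
to_end=$(printf '%s\n' "$input" | sed -n '/c/,$p')

# Negated range: delete everything EXCEPT lines 2 through 4.
negated=$(printf '%s\n' "$input" | sed '2,4!d')

# +num (assumes GNU sed): the line matching /b/ plus the next 2 lines.
plus_two=$(printf '%s\n' "$input" | sed -n '/b/,+2p')

printf '%s\n' "$plus_two"
```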
Address ranges are:
(1) Inclusive. The range "/From here/,/eternity/" matches all the
lines containing "From here" up to and including the line
containing "eternity". It will not stop on the line just prior to
"eternity". (If you don't like this, see section 4.24.)
(2) Plenary. They always match full lines, not just parts of lines.
In other words, a command to change or delete an address range will
change or delete whole lines; it won't stop in the middle of a
line.
(3) Multi-linear. Address ranges normally match 2 lines or more.
The second address will never match the same line the first address
did; therefore a valid address range always spans at least two
lines, with these exceptions which match only one line:
- if the first address matches the last line of the file
- if using the syntax "/RE/,3" and /RE/ occurs only once in the
file, on line 3 or later
- if using HHsed v1.5. See section 3.4.
(4) Minimalist. In address ranges with /regex/ as <address2>, the
range "/foo/,/bar/" will stop at the first "bar" it finds, provided
that "bar" occurs on a line below "foo". If the word "bar" occurs
on several lines below the word "foo", the range will match all the
lines from the first "foo" up to the first "bar". It will not
continue hopping ahead to find more "bar"s. In other words, address
ranges are not "greedy," like regular expressions.
(5) Repeating. An address range will try to match more than one
block of lines in a file. However, the blocks cannot nest. In
addition, a second match will not "take" the last line of the
previous block. For example, given the following text,
    start
    stop start
    stop
the sed command '/start/,/stop/d' will only delete the first two
lines. It will not delete all 3 lines.
(6) Relentless. If the address range finds a "start" match but
doesn't find a "stop", it will match every line from "start" to the
end of the file. Thus, beware of the following behaviors:
    /RE1/,/RE2/   # If /RE2/ is not found, matches from /RE1/ to the
                  # end-of-file.
    20,/RE/       # If /RE/ is not found, matches from line 20 to the
                  # end-of-file.
    /RE/,30       # If /RE/ occurs any time after line 30, each
                  # occurrence will be matched in sed15+, sedmod, and
                  # GNU sed v3.02+. GNU sed v2.05 and 1.18 will match
                  # from the 2nd occurrence of /RE/ to the end-of-file.
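Several of the range properties listed above can be observed on tiny inputs. The following is a sketch only; the sample texts and variable names are invented for illustration.

```shell
# (1) Inclusive: the closing line is part of the range.
inclusive=$(printf 'x\nFrom here\ny\neternity\nz\n' |
  sed -n '/From here/,/eternity/p')

# (3) The second address is not tested on the line that matched the
# first, so the "bar" on line 1 does not close the range.
same_line=$(printf 'foo bar\nx\nbar\ny\n' | sed -n '/foo/,/bar/p')

# (4) Minimalist: the range stops at the first "bar" below "foo".
first_stop=$(printf 'foo\nx\nbar\ny\nbar\n' | sed -n '/foo/,/bar/p')

# (5) Repeating: "stop start" closes the first block and does not
# open a second one, so the final "stop" line survives.
leftover=$(printf 'start\nstop start\nstop\n' | sed '/start/,/stop/d')

# (6) Relentless: with no /RE2/ in the input, the range runs to
# the end of the file.
unclosed=$(printf 'a\nRE1\nb\nc\n' | sed -n '/RE1/,/RE2/p')

printf '%s\n' "$leftover"
```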
If these behaviors seem strange, remember that they occur because
sed does not look "ahead" in the file. Doing so would stop sed from
being a stream editor and have adverse effects on its efficiency.
If these behaviors are undesirable, they can be circumvented or
corrected by the use of nested testing within braces. The following
scripts work under GNU sed 3.02:
    # Execute your_commands on range "/RE1/,/RE2/", but if /RE2/ is
    # not found, do nothing.
    /RE1/{:a;N;/RE2/!ba;your_commands;}
    # Execute your_commands on range "20,/RE/", but if /RE/ is not
    # found, do nothing.
    20{:a;N;/RE/!ba;your_commands;}
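Here is a small trial of the first script, assuming GNU sed. BEGIN and END stand in for RE1 and RE2, and 's/\n/ /g' stands in for your_commands; all of these, like the variable names, are placeholders chosen for illustration.

```shell
# Placeholder script: BEGIN = RE1, END = RE2, s/\n/ /g = your_commands.
script='/BEGIN/{:a;N;/END/!ba;s/\n/ /g;}'

# END is present: the block is gathered and the command runs on it.
closed=$(printf 'x\nBEGIN\na\nEND\ny\n' | sed "$script")

# END is missing: N runs out of input. GNU sed then prints the
# pattern space and exits, so the command never runs.
unclosed=$(printf 'x\nBEGIN\na\nb\n' | sed "$script")

printf '%s\n---\n%s\n' "$closed" "$unclosed"
```

In the first case the three lines of the block come out joined as "BEGIN a END"; in the second the input passes through unchanged.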
As a side note, once we've used N to "slurp" lines together to test
for the ending expression, the pattern space will have gathered
many lines (possibly thousands) together and concatenated them as a
single expression, with the \n sequence marking line breaks. The
REs within the pattern space may have to be modified (e.g., you
must write '/\nStart/' instead of '/^Start/' and '/[^\n]*/' instead
of '/.*/') and other standard sed commands will be unavailable or
difficult to use.
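The effect on regexes can be seen with a two-line pattern space. This sketch assumes GNU sed, which reads \n inside a bracket expression as a newline; the inputs and variable names are illustrative.

```shell
# After N, "Start" begins the second line of the pattern space,
# so '^' (start of the pattern space) does not match it.
caret=$(printf 'intro\nStart here\n' | sed -n 'N;/^Start/p')

# '\nStart' matches "Start" at the beginning of an embedded line.
newline=$(printf 'intro\nStart here\n' | sed -n 'N;/\nStart/p')

# '.*' runs across the embedded newline and swallows both lines.
dot_star=$(printf 'one\ntwo\n' | sed 'N;s/.*/X/')

# '[^\n]*' stops at the newline, so only the first line changes.
no_newline=$(printf 'one\ntwo\n' | sed 'N;s/[^\n]*/X/')

printf 'caret=[%s]\n%s\n%s\n%s\n' "$caret" "$newline" "$dot_star" "$no_newline"
```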
    # Execute your_commands on range "/RE/,30", but if /RE/ occurs
    # on line 31 or later, do not match it.
    1,30{/RE/,$ your_commands;}
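The same idea scaled down to a limit of line 3, with 'd' standing in for your_commands. The six-line input and variable names are assumptions made for this sketch, and the first result reflects the GNU sed 3.02+ behavior described above.

```shell
# Hypothetical input: RE occurs on line 2 and again on line 5.
input='a
RE
c
d
RE
f'

# Plain range: the RE on line 5 (past line 3) is still matched
# as a one-line range and deleted.
plain=$(printf '%s\n' "$input" | sed '/RE/,3d')

# Guarded range: the inner range is only tested on lines 1-3,
# so the RE on line 5 is left alone.
guarded=$(printf '%s\n' "$input" | sed '1,3{/RE/,$d;}')

printf '%s\n---\n%s\n' "$plain" "$guarded"
```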
For related suggestions on using address ranges, see sections 4.2,
4.15, and 4.19 of this FAQ. Also, note the following section.