4.23. How do I match a block of specific consecutive lines?
There are three ways to approach this problem:
(1) Try to use a "/range/, /expression/"
(2) Try to use a "/multi-line\nexpression/"
(3) Try to use a block of "literal strings"
We describe each approach in the following sections.
4.23.1. Try to use a "/range/, /expression/"
If the lines of the block always appear in the same order
and if the top line never occurs outside the block, like this:
Abel
Baker
Charlie
Delta
then these solutions will work for deleting the block:
sed '/^Abel$/{N;N;N;d;}' files # for blocks with few lines
sed '/^Abel$/,/^Zebra$/d' files # for blocks with many lines
sed '/^Abel$/,+25d' files # HHsed, sedmod, ssed, gsed 3.02.80
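A quick way to check these one-liners is to run them on a small sample. The sketch below assumes a POSIX shell and a hypothetical file demo.txt that holds the Abel..Delta block between two other lines. Because this sample block ends at Delta, /^Delta$/ stands in for /^Zebra$/.

```shell
#!/bin/sh
# Hypothetical sample input: the four-line block between two other lines.
printf '%s\n' Start Abel Baker Charlie Delta End > demo.txt

# Few lines: pull the next three lines into the pattern space, then delete all four.
few=$(sed '/^Abel$/{N;N;N;d;}' demo.txt)

# Many lines: a range address from the top line to the last line of the block.
many=$(sed '/^Abel$/,/^Delta$/d' demo.txt)

# Both deletions should leave only the Start and End lines.
printf '%s\n' "$few" "$many"
```

In the seds that support it, /^Abel$/,+3d gives the same result on this sample, but the +N form is not POSIX.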
To replace the block, use the 'c' (change) command instead of 'd'.
To print only that block, use the -n switch and 'p' (print) instead
of 'd'. To change some things inside the block, try this:
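/^Abel$/ {
# Start at the top line only: N consumes the Delta line, so a
# /^Abel$/,/^Delta$/ range would stay open past the block.
:ack
N;
/\nDelta$/! b ack
# At this point, all the lines in the block are collected
s/ubstitute /somethin/g;
}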
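The sketch below tries the 'p', 'c' and collect-then-edit variants on the same hypothetical demo.txt. The replacement line "(block removed)" and the substitution s/\n/ /g, which joins the collected block onto one line, are arbitrary examples chosen for illustration.

```shell
#!/bin/sh
# Hypothetical sample input: the four-line block between two other lines.
printf '%s\n' Start Abel Baker Charlie Delta End > demo.txt

# Print only the block: -n suppresses automatic printing, p prints the range.
only=$(sed -n '/^Abel$/,/^Delta$/p' demo.txt)

# Replace the block: with a range, c prints its text once, at the end of the range.
changed=$(sed '/^Abel$/,/^Delta$/c\
(block removed)' demo.txt)

# Edit inside the block: gather Abel..Delta into the pattern space, then
# substitute across the embedded newlines (here, join the block onto one line).
joined=$(sed '/^Abel$/{
:ack
N
/\nDelta$/!b ack
s/\n/ /g
}' demo.txt)

printf '%s\n' "$only" "$changed" "$joined"
```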
4.23.2. Try to use a "/multi-line\nexpression/"
If the top line of the block sometimes appears alone or is
sometimes followed by other lines, or if a partial block may occur
somewhere in the file, a multi-line expression may be required.
In these examples, we give solutions for matching an N-line block.
The expression "/^RE1\nRE2\nRE3...$/" represents a properly formed
regular expression where \n indicates a newline between lines. Note
that the 'N' command followed by 'P;D' forms a "sliding window": the
pattern space holds a window of N lines. If the multi-line pattern
matches, the block is handled. If not, the top line is printed and
then deleted from the pattern space, and we try to match again at
the next line.
# sed script to delete 2 consecutive lines: /^RE1\nRE2$/
$b
/^RE1$/ {
$!N
/^RE1\nRE2$/d
P;D
}
#---end of script---
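To see the window at work, the sketch below fills in RE1 and RE2 with the hypothetical patterns foo and bar. In the sample, the first foo is followed by another foo, so it is printed and kept; the foo/bar pair is deleted; and the final foo, which has no partner, passes through the $b line.

```shell
#!/bin/sh
# Hypothetical sample: only the second "foo" is followed by "bar".
printf '%s\n' foo foo bar x foo > pairs.txt

out=$(sed '
$b
/^foo$/ {
  $!N
  /^foo\nbar$/d
  P;D
}' pairs.txt)

printf '%s\n' "$out"
```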
# sed script to delete 3 consecutive lines. (This script
# fails under GNU sed v2.05 and earlier because of the 't'
# bug when s///n is used; see section 7.5(1) of the FAQ.)
: more
$!N
s/\n/&/2;
t enough
$!b more
: enough
/^RE1\nRE2\nRE3$/d
P;D
#---end of script---
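Here is the same 3-line script with RE1, RE2 and RE3 filled in by the hypothetical lines a, b and c. The first a/b pair is followed by another a, so it survives; the window then slides down one line at a time until it holds a, b, c, which is deleted.

```shell
#!/bin/sh
# Hypothetical sample: a partial block (a, b) followed by a full one (a, b, c).
printf '%s\n' a b a b c d > triples.txt

out=$(sed '
:more
$!N
s/\n/&/2;
t enough
$!b more
:enough
/^a\nb\nc$/d
P;D' triples.txt)

printf '%s\n' "$out"
```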
For example, to delete a block of 5 consecutive lines, the previous
script must be altered in only two places:
(1) Change the 2 in "s/\n/&/2;" to a 4 (the trailing semicolon is
needed to work around a bug in HHsed v1.5).
(2) Change the regex line to "/^RE1\nRE2\nRE3\nRE4\nRE5$/d",
modifying the expression as needed.
Suppose we want to delete a block of two blank lines followed by
the word "foo" followed by another blank line (4 lines in all).
Other blank lines and other instances of "foo" should be left
alone. After changing the '2' to a '3' (always one number less than
the total number of lines), the regex line would look like this:
"/^\n\nfoo\n$/d". (Thanks to Greg Ubben for this script.)
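Put together, that example reads as follows. The sample file is hypothetical: it contains a lone blank line and one blank/blank/foo/blank block, and only the block should disappear.

```shell
#!/bin/sh
# Hypothetical sample: a lone blank line after "a", then the 4-line block
# (blank, blank, foo, blank) between "b" and "c".
printf '%s\n' a '' b '' '' foo '' c > blanks.txt

out=$(sed '
:more
$!N
s/\n/&/3;
t enough
$!b more
:enough
/^\n\nfoo\n$/d
P;D' blanks.txt)

printf '%s\n' "$out"
```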
As an alternative to work around the 't' bug in older versions of
GNU sed, the following script will delete 4 consecutive lines:
# sed script to delete 4 consecutive lines. Use this if you
# must use GNU sed 2.05 or earlier.
/^RE1$/!b
$!N
$!N
:a
$b
N
/^RE1\nRE2\nRE3\nRE4$/d
P
s/^.*\n\(.*\n.*\n.*\)$/\1/
ba
#---end of script---
Its drawback is that it must be modified in 3 places instead of 2
to adapt it for more lines, and as additional lines are added, the
's' command is forced to work harder to match the regexes. On the
other hand, it avoids the 't' bug in gsed 2.05 and illustrates
another way to solve the problem of deleting consecutive lines.
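Filled in with the hypothetical lines one, two, three and four, the alternative script behaves as sketched below. Lines that do not match RE1 leave through the first command untouched; from the first "one" onward, the script keeps a 4-line window and uses the 's' command to drop its top line after each 'P'.

```shell
#!/bin/sh
# Hypothetical sample: a false start ("one" alone) right before the real block.
printf '%s\n' zero one one two three four five > quads.txt

out=$(sed '
/^one$/!b
$!N
$!N
:a
$b
N
/^one\ntwo\nthree\nfour$/d
P
s/^.*\n\(.*\n.*\n.*\)$/\1/
ba' quads.txt)

printf '%s\n' "$out"
```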
4.23.3. Try to use a block of "literal strings"
If you need to match a static block of text (which may occur any
number of times throughout a file), where the contents of the block
are known in advance, then this script is easy to use. It requires
an intermediate file, which we will call "findrep.txt" (below):
A block of several consecutive lines to
be matched literally should be placed on
top. Regular expressions like .* or [a-z]
will lose their special meaning and be
interpreted literally in this block.
----
Four hyphens separate the two sections. Put
the replacement text in the lower section.
As above, sed symbols like &, \n, or \1 will
lose their special meaning.
This is a 3-step process. A generic script called "blockrep.sed"
will read "findrep.txt" (above) and generate a custom script, which
is then used on the actual input file. In other words,
"findrep.txt" is a simplified description of the editing that you
want to do on the block, and "blockrep.sed" turns it into actual
sed commands.
Use this process from a Unix shell or from a DOS prompt (under Unix,
use 'rm' in place of 'erase'):
sed -nf blockrep.sed findrep.txt >custom.sed
sed -f custom.sed input.file >output.file
erase custom.sed
The generic script "blockrep.sed" follows below. It's fairly long.
Examining its output might help you understand how to use the
sliding window technique.
# filename: blockrep.sed
# author: Paolo Bonzini
# Requires:
# (1) blocks to find and replace, e.g., findrep.txt
# (2) an input file to be changed, input.file
#
# blockrep.sed creates a second sed script, custom.sed,
# to find the lines above the row of 4 hyphens, globally
# replacing them with the lower block of text. GNU sed
# is recommended but not required for this script.
#
# Loop on the first part, accumulating the `from' text
# into the hold space.
:a
/^----$/! {
# Escape slashes, backslashes and the metacharacters
# [ . *, then end the line with a backslash to escape
# the newline that H inserts before the next line.
s,[/\[.*],\\&,g
s/$/\\/
H
#
# On line 1, turn the leading newline into "s/"; on later
# lines, prepend one N command to widen the sliding window.
x
1 s,^.,s/,
1! s/^/N\
/
x
n
ba
}
#
# Change the final backslash to a slash to separate the
# two sides of the s command.
x
s,\\$,/,
x
#
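# Until EOF, gather the substitution into hold space,
# escaping slashes, backslashes and ampersands so that
# they are taken literally in the replacement.
:b
n
s,[/\&],\\&,g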
$! s/$/\\/
H
$! bb
#
# Start the RHS of the s command without a leading
# newline, add the P/D pair for the sliding window, and
# print the script.
g
s,/\n,/,
s,$,/\
P\
D,p
#---end of script---
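As an illustration of that output, suppose findrep.txt has the two lines foo and bar above the hyphens and the single line baz below them. By our reading of blockrep.sed, it should emit the custom.sed written out by hand below: one N command (one fewer than the lines in the find block), the s command with an escaped newline in its pattern, and the P/D pair. The sample input file is hypothetical, and the multi-line pattern relies on GNU sed accepting a backslash-newline in the regex.

```shell
#!/bin/sh
# Hand-written copy of what blockrep.sed is expected to generate for a
# findrep.txt with "foo" and "bar" above the hyphens and "baz" below.
cat > custom.sed <<'EOF'
N
s/foo\
bar/baz/
P
D
EOF

# Hypothetical input with two copies of the foo/bar block.
printf '%s\n' x foo bar y foo bar > input.file

out=$(sed -f custom.sed input.file)
printf '%s\n' "$out"
```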