The sed (Stream Editor) FAQ - 4.23. How do I match a block of specific consecutive lines?

4.23. How do I match a block of specific consecutive lines?

There are three ways to approach this problem:

       (1) Try to use a "/range/, /expression/"
       (2) Try to use a "/multi-line\nexpression/"
       (3) Try to use a block of "literal strings"

We describe each approach in the following sections.

4.23.1. Try to use a "/range/, /expression/"

If the block consists of fixed strings that always appear in the same order, and the top line never occurs outside the block, like this:

       Abel
       Baker
       Charlie
       Delta

then these solutions will work for deleting the block:

     sed '/^Abel$/{N;N;N;d;}' files     # for blocks with few lines
     sed '/^Abel$/, /^Zebra$/d' files   # for blocks with many lines
     sed '/^Abel$/,+25d' files          # HHsed, sedmod, ssed, gsed 3.02.80
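
As a quick check, here is a minimal sketch that assumes GNU sed and a made-up six-line sample held in a shell variable (the names "sample", "by_count" and "by_range" are only illustrative). The counting form and the range form should remove the same four lines:

```shell
# Hypothetical sample: the Abel..Delta block with one line before and after.
sample=$(printf 'top\nAbel\nBaker\nCharlie\nDelta\nbottom')

# Counting form: on the top line, append the next three lines to the
# pattern space, then delete all four at once.
by_count=$(printf '%s\n' "$sample" | sed '/^Abel$/{N;N;N;d;}')

# Range form: delete every line from the top line through the last line.
by_range=$(printf '%s\n' "$sample" | sed '/^Abel$/,/^Delta$/d')

printf '%s\n' "$by_count"    # top, bottom
printf '%s\n' "$by_range"    # top, bottom
```

The counting form needs the exact number of lines in the block; the range form only needs a reliable last line.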

To change the block, use the 'c' (change) command instead of 'd'. To print that block only, use the -n switch and 'p' (print) instead of 'd'. To change some things inside the block, try this:

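     /^Abel$/ {
         # Start at the top line only. The loop below reads the rest
         # of the block itself, so a range address would never see
         # its closing line and would stay open past the block.
         :ack
         N;
         /\nDelta$/! b ack
         # At this point, all the lines in the block are collected
         s/ubstitute/somethin/g;
     }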
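
The three variations just described (print the block, change it, edit inside it) can be sketched as follows. This assumes GNU sed and the same made-up sample as before; the variable names are illustrative. The third command is keyed on the top line alone, which is an assumption of this sketch: the loop consumes the block's last line itself, so a range address is not needed there.

```shell
# Hypothetical sample: the Abel..Delta block with one line before and after.
sample=$(printf 'top\nAbel\nBaker\nCharlie\nDelta\nbottom')

# Print only the block: -n turns off the default output, p prints the range.
printed=$(printf '%s\n' "$sample" | sed -n '/^Abel$/,/^Delta$/p')

# Change the block: with a range, c emits its text once, at the range end.
changed=$(printf '%s\n' "$sample" | sed '/^Abel$/,/^Delta$/c\
REPLACED')

# Edit inside the block: collect it in the pattern space, then substitute.
# Here the embedded newlines are turned into commas to show that all four
# lines are available to a single s command.
joined=$(printf '%s\n' "$sample" | sed '/^Abel$/{
:ack
N
/\nDelta$/!b ack
s/\n/,/g
}')

printf '%s\n' "$printed"    # Abel, Baker, Charlie, Delta
printf '%s\n' "$changed"    # top, REPLACED, bottom
printf '%s\n' "$joined"     # top, then Abel,Baker,Charlie,Delta, then bottom
```

If the block is incomplete (no "Delta" before end of file), the N loop runs off the end of the input. GNU sed prints what it has collected and exits; other versions of sed may behave differently.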

4.23.2. Try to use a "/multi-line\nexpression/"

If the top line of the block sometimes appears alone or is sometimes followed by other lines, or if a partial block may occur somewhere in the file, a multi-line expression may be required.

In these examples, we give solutions for matching an N-line block. The expression "/^RE1\nRE2\nRE3...$/" represents a properly formed regular expression where \n indicates a newline between lines. Note that the 'N' followed by the 'P;D;' commands forms a "sliding window" technique. A window of N lines is formed. If the multi-line pattern matches, the block is handled. If not, the top line is printed and then deleted from the pattern space, and we try to match at the next line.

     # sed script to delete 2 consecutive lines: /^RE1\nRE2$/
     $b
     /^RE1$/ {
       $!N
       /^RE1\nRE2$/d
       P;D
     }
     #---end of script---
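
To see the sliding window at work, here is a small sketch of the 2-line script with RE1 and RE2 replaced by the made-up strings "foo" and "bar". It assumes GNU sed. The input contains a lone "foo", a "foo/bar" pair, and a trailing "foo" on the last line; only the pair should disappear.

```shell
# Hypothetical input: foo, foo, bar, baz, foo (one per line).
result=$(printf 'foo\nfoo\nbar\nbaz\nfoo\n' | sed '$b
/^foo$/ {
  $!N
  /^foo\nbar$/d
  P;D
}')

printf '%s\n' "$result"    # foo, baz, foo
```

The first "foo" is followed by another "foo", so the window does not match: P prints the top line, D drops it, and the cycle restarts with the second "foo" as the new top line. That second attempt picks up "bar" and the pair is deleted.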

     # sed script to delete 3 consecutive lines. (This script
     # fails under GNU sed v2.05 and earlier because of the 't'
     # bug when s///n is used; see section 7.5(1) of the FAQ.)
     : more
     $!N
     s/\n/&/2;
     t enough
     $!b more
     : enough
     /^RE1\nRE2\nRE3$/d
     P;D
     #---end of script---

For example, to delete a block of 5 consecutive lines, the previous script must be altered in only two places:

(1) Change the 2 in "s/\n/&/2;" to a 4 (the trailing semicolon is needed to work around a bug in HHsed v1.5).

(2) Change the regex line to "/^RE1\nRE2\nRE3\nRE4\nRE5$/d", modifying the expression as needed.

Suppose we want to delete a block of two blank lines followed by the word "foo" followed by another blank line (4 lines in all). Other blank lines and other instances of "foo" should be left alone. After changing the '2' to a '3' (always one number less than the total number of lines), the regex line would look like this: "/^\n\nfoo\n$/d". (Thanks to Greg Ubben for this script.)
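
Written out in full, that example might look like the sketch below. It assumes GNU sed (or any sed without the 't' bug) and a made-up input that contains one stray blank line, the 4-line block, and one stray "foo".

```shell
# Hypothetical input, one item per line:
#   a, (blank), b, (blank), (blank), foo, (blank), c, foo
result=$(printf 'a\n\nb\n\n\nfoo\n\nc\nfoo\n' | sed ':more
$!N
s/\n/&/3
t enough
$!b more
:enough
/^\n\nfoo\n$/d
P;D')

printf '%s\n' "$result"    # a, (blank), b, c, foo
```

The blank line after "a" and the final "foo" survive because they never line up with the full 4-line pattern; only the block of two blank lines, "foo", and one blank line is removed.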

As an alternative to work around the 't' bug in older versions of GNU sed, the following script will delete 4 consecutive lines:

     # sed script to delete 4 consecutive lines. Use this if you
     # require GNU sed 2.05 and below.
     /^RE1$/!b
     $!N
     $!N
     :a
     $b
     N
     /^RE1\nRE2\nRE3\nRE4$/d
     P
     s/^.*\n\(.*\n.*\n.*\)$/\1/
     ba
     #---end of script---
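
A minimal sketch of this alternative, with RE1 through RE4 replaced by the made-up strings "one", "two", "three" and "four", and assuming GNU sed. The input starts with a partial block ("one", "two", "x") followed by a complete one.

```shell
# Hypothetical input: one, two, x, one, two, three, four, end.
result=$(printf 'one\ntwo\nx\none\ntwo\nthree\nfour\nend\n' | sed '/^one$/!b
$!N
$!N
:a
$b
N
/^one\ntwo\nthree\nfour$/d
P
s/^.*\n\(.*\n.*\n.*\)$/\1/
ba')

printf '%s\n' "$result"    # one, two, x, end
```

The partial block is printed line by line as the window slides forward (P prints the top line, the s command drops it), and the complete block is deleted once all four lines are in the pattern space.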

Its drawback is that it must be modified in 3 places instead of 2 to adapt it for more lines, and as additional lines are added, the 's' command is forced to work harder to match the regexes. On the other hand, it avoids a bug with gsed-2.05 and illustrates another way to solve the problem of deleting consecutive lines.

4.23.3. Try to use a block of "literal strings"

If you need to match a static block of text (which may occur any number of times throughout a file), where the contents of the block are known in advance, then this script is easy to use. It requires an intermediate file, which we will call "findrep.txt" (below):

       A block of several consecutive lines to
       be matched literally should be placed on
       top. Regular expressions like .*  or [a-z]
       will lose their special meaning and be
       interpreted literally in this block.
       ----
       Four hyphens separate the two sections. Put
       the replacement text in the lower section.
       As above, sed symbols like &, \n, or \1 will
       lose their special meaning.

This is a 3-step process. A generic script called "blockrep.sed" will read "findrep.txt" (above) and generate a custom script, which is then used on the actual input file. In other words, "findrep.txt" is a simplified description of the editing that you want to do on the block, and "blockrep.sed" turns it into actual sed commands.

Use this process from a Unix shell or from a DOS prompt:

     sed -nf blockrep.sed findrep.txt >custom.sed
     sed -f custom.sed input.file >output.file
     erase custom.sed        # DOS; under Unix use: rm custom.sed

The generic script "blockrep.sed" follows below. It's fairly long. Examining its output might help you understand how to use the sliding window technique.

     # filename: blockrep.sed
     #   author: Paolo Bonzini
     # Requires:
     #    (1) blocks to find and replace, e.g., findrep.txt
     #    (2) an input file to be changed, input.file
     #
     # blockrep.sed creates a second sed script, custom.sed,
     # to find the lines above the row of 4 hyphens, globally
     # replacing them with the lower block of text. GNU sed
     # is recommended but not required for this script.
     #
     # Loop on the first part, accumulating the `from' text
     # into the hold space.
     :a
     /^----$/! {
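        # Escape slashes, backslashes, the final newline and
        # regular expression metacharacters. The '[' goes last
        # in the bracket list so that '[.' is not taken as the
        # start of a collating symbol.
        s,[/\.*[],\\&,g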
        s/$/\\/
        H
        #
        # Append N cmds needed to maintain the sliding window.
        x
        1 s,^.,s/,
        1! s/^/N\
     /
        x
        n
        ba
     }
     #
     # Change the final backslash to a slash to separate the
     # two sides of the s command.
     x
     s,\\$,/,
     x
     #
     # Until EOF, gather the substitution into hold space.
     :b
     n
     # Escape slashes, backslashes and ampersands.
     s,[/\&],\\&,g
     $! s/$/\\/
     H
     $! bb
     #
     # Start the RHS of the s command without a leading
     # newline, add the P/D pair for the sliding window, and
     # print the script.
     g
     s,/\n,/,
     s,$,/\
     P\
     D,p
     #---end of script---
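
The whole 3-step process can be tried end to end with the sketch below. It assumes GNU sed and a POSIX shell with mktemp. The file contents are made-up samples, and the embedded script is a condensed copy of blockrep.sed with the comments removed. Two details of that copy are assumptions of this sketch: the '[' is placed last in the first bracket list, and '&' is escaped in the replacement section so that it stays literal.

```shell
# Work in a scratch directory (hypothetical file names throughout).
workdir=$(mktemp -d)

# Condensed copy of blockrep.sed, comments removed.
cat > "$workdir/blockrep.sed" <<'EOF'
:a
/^----$/! {
s,[/\.*[],\\&,g
s/$/\\/
H
x
1 s,^.,s/,
1! s/^/N\
/
x
n
ba
}
x
s,\\$,/,
x
:b
n
s,[/\&],\\&,g
$! s/$/\\/
H
$! bb
g
s,/\n,/,
s,$,/\
P\
D,p
EOF

# Two lines to find, one line (containing a literal &) to put in their place.
printf 'foo\nbar\n----\nbaz & qux\n' > "$workdir/findrep.txt"
printf 'x\nfoo\nbar\ny\n' > "$workdir/input.file"

# Step 1: generate the custom script.  Step 2: apply it to the input.
sed -nf "$workdir/blockrep.sed" "$workdir/findrep.txt" > "$workdir/custom.sed"
custom=$(cat "$workdir/custom.sed")
output=$(sed -f "$workdir/custom.sed" "$workdir/input.file")

printf '%s\n' "$custom"    # the generated sliding-window script
printf '%s\n' "$output"    # x, baz & qux, y

# Step 3: clean up.
rm -rf "$workdir"
```

For a two-line block the generated script should contain a single N (one fewer than the number of lines to match), one s command whose pattern spans two lines, and the closing P and D that keep the window sliding.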
