The sed (Stream Editor) FAQ - 6.7.4. Word boundaries

6.7.4. Word boundaries

GNU sed, ssed, sed16, sed15 and sedmod use certain symbols to define the boundary between a "word character" and a nonword character. A word character fits the regex "[A-Za-z0-9_]". Note: a word character includes the underscore "_" but not the hyphen, probably because the underscore is permissible as a label in sed and in other scripting languages. (In gsed103, a word character did NOT include the underscore; it included alphanumerics only.)
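The effect of counting the underscore, but not the hyphen, as a word character can be checked with a short command. This is a minimal sketch that assumes GNU sed is installed as "sed"; the sample text and the variable name "out" are made up for illustration.

```shell
# Assumes GNU sed is installed as "sed".
# "foo" followed by "_" is still inside a word, so it is left alone;
# "foo" followed by "-" is a complete word, so it is replaced.
out=$(printf '%s\n' 'foo_bar foo-bar' | sed 's/\<foo\>/X/g')
printf '%s\n' "$out"
# prints: foo_bar X-bar
```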

These symbols are '\<' and '\>' (gsed, ssed, sed15, sed16, sedmod) and '\b' and '\B' (gsed only). A boundary symbol does not represent a character but a position on the line. Word boundaries are combined with literal characters or character sets to let you match (and delete or alter) whole words without affecting the spaces or punctuation marks outside those words. They can be used only in a "/pattern/" address or in the LHS of an 's/LHS/RHS/' command. The following table shows how these symbols may be used in HHsed and GNU sed; the caret '^' marks the position being matched. Sedmod matches the syntax of HHsed.

      Match position    Possible word boundaries        HHsed   GNU sed
      ------------------------------------------------------------------
      start of word     [nonword char]^[word char]      \<      \< or \b
      end of word       [word char]^[nonword char]      \>      \> or \b
      middle of word    [word char]^[word char]         none    \B
      outside of word   [nonword char]^[nonword char]   none    \B
      ------------------------------------------------------------------
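The rows of the table can be tried out in GNU sed with a few one-liners. This is only a sketch: it assumes GNU sed is installed as "sed" (the '\b' and '\B' forms will not work in HHsed or sedmod), and the sample line and variable names are invented for the example.

```shell
# Assumes GNU sed is installed as "sed"; \b and \B are GNU extensions.
line='cat concat catalog'

# \<cat\> : "cat" as a whole word only
whole=$(printf '%s\n' "$line" | sed 's/\<cat\>/dog/g')
# \bcat   : "cat" at the start of a word
start=$(printf '%s\n' "$line" | sed 's/\bcat/dog/g')
# \Bcat   : "cat" preceded by another word character (middle of a word)
inner=$(printf '%s\n' "$line" | sed 's/\Bcat/dog/g')
# Boundaries also work in a /pattern/ address: print lines containing the word "cat"
addr=$(printf '%s\n' 'concat' 'a cat' | sed -n '/\<cat\>/p')

printf '%s\n' "$whole" "$start" "$inner" "$addr"
# prints:
#   dog concat catalog
#   dog concat dogalog
#   cat condog catalog
#   a cat
```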

In ssed, the symbols '\<' and '\>' lose their special meaning when the -R switch is used to invoke Perl-style expressions. However, the same effect can be obtained with these zero-width lookaround assertions, which test the neighboring characters without consuming them:

       (?<!\w)(?=\w)  for '\<'   and   (?<=\w)(?!\w)  for '\>'
