6.7.4. Word boundaries
GNU sed, ssed, sed16, sed15 and sedmod use certain symbols to mark
the boundary between a "word character" and a "nonword character". A
word character matches the regex "[A-Za-z0-9_]". Note: a word
character includes the underscore "_" but not the hyphen, probably
because the underscore is permissible in a label in sed and in other
scripting languages. (In gsed103, a word character did NOT include
the underscore; it included alphanumerics only.)

These symbols include '\<' and '\>' (gsed, ssed, sed15, sed16,
sedmod) and '\b' and '\B' (gsed only). Note that the boundary
symbols do not match a character, but a position on the line.

Word boundaries are used with literal characters or character sets
to let you match (and delete or alter) whole words without
affecting the spaces or punctuation marks outside of those words.
They can be used only in a "/pattern/" address or in the LHS of a
's/LHS/RHS/' command. The following table shows how these symbols
may be used in HHsed and GNU sed; the caret "^" in the second column
marks the position that is matched. Sedmod matches the syntax of
HHsed.
Match position    Possible word boundaries        HHsed   GNU sed
------------------------------------------------------------------
start of word     [nonword char]^[word char]      \<      \< or \b
end of word       [word char]^[nonword char]      \>      \> or \b
middle of word    [word char]^[word char]         none    \B
outside of word   [nonword char]^[nonword char]   none    \B
------------------------------------------------------------------
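
The boundary symbols in the table can be tried out with a short
sketch like the one below. It assumes that GNU sed is the "sed" found
on the PATH; the sample line and the variable names are made up for
illustration only.

```shell
# Assumption: "sed" is GNU sed. The sample text is invented.
line='cat concatenate cat_1 cat-2'

# \< and \> : replace "cat" only where it stands as a whole word.
# "cat_1" is left alone because "_" is a word character;
# "cat-2" is changed because "-" is a nonword character.
whole=$(printf '%s\n' "$line" | sed 's/\<cat\>/dog/g')
echo "$whole"

# \b matches at either edge of a word, so in GNU sed this gives
# the same result as the \< ... \> form above.
edge=$(printf '%s\n' "$line" | sed 's/\bcat\b/dog/g')
echo "$edge"

# \B : replace "cat" only where both ends fall inside a word.
inner=$(printf '%s\n' "$line" | sed 's/\Bcat\B/CAT/g')
echo "$inner"
```

The first two commands should print "dog concatenate cat_1 dog-2",
and the third should print "cat conCATenate cat_1 cat-2". With HHsed
or sedmod only the first command applies, since '\b' and '\B' are
not supported there.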
In ssed, the symbols '\<' and '\>' lose their special meaning when
the -R switch is used to invoke Perl-style expressions. However,
the same effect can be obtained through zero-width assertions, which
test a position without consuming any characters:

    (?<!\w)(?=\w)    in place of '\<' (start of word)
    (?<=\w)(?!\w)    in place of '\>' (end of word)