3.7. GNU/POSIX extensions to regular expressions
GNU sed supports "character classes" in addition to regular
character sets, such as [0-9A-F]. Like regular character sets,
character classes represent any single character within a set.
"Character classes are a new feature introduced in the POSIX
standard. A character class is a special notation for describing
lists of characters that have a specific attribute, but where the
actual characters themselves can vary from country to country
and/or from character set to character set. For example, the notion
of what is an alphabetic character differs in the USA and in
France." [quoted from the docs for GNU awk v3.1.0.]
Though character classes don't generally conserve space on the
line, they help make scripts portable for international use. The
equivalent character sets for U.S. users follow:
[[:alnum:]]  - [A-Za-z0-9]     Alphanumeric characters
[[:alpha:]]  - [A-Za-z]        Alphabetic characters
[[:blank:]]  - [ \x09]         Space or tab characters only
[[:cntrl:]]  - [\x00-\x1F\x7F] Control characters
[[:digit:]]  - [0-9]           Numeric characters
[[:graph:]]  - [!-~]           Printable and visible characters
[[:lower:]]  - [a-z]           Lower-case alphabetic characters
[[:print:]]  - [ -~]           Printable (non-Control) characters
[[:punct:]]  - [!-/:-@[-`{-~]  Punctuation characters
[[:space:]]  - [ \t\n\v\f\r]   All whitespace characters
[[:upper:]]  - [A-Z]           Upper-case alphabetic characters
[[:xdigit:]] - [0-9a-fA-F]     Hexadecimal digit characters
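
As a brief illustration, the class name is written inside a bracket
expression, so it can be negated or combined with other characters
just like an ordinary set. The sample strings and variable names
below are invented for this sketch, and the results assume a sed
that supports POSIX character classes (GNU sed does):

```shell
# Delete every digit from the line.
no_digits=$(echo 'abc 123 def' | sed 's/[[:digit:]]//g')
echo "$no_digits"

# Squeeze each run of spaces and/or tabs into one space.
squeezed=$(printf 'a \t  b\n' | sed 's/[[:blank:]][[:blank:]]*/ /g')
echo "$squeezed"

# Negated class: keep only the hexadecimal digits.
hex_only=$(echo '0x1F-ab:ZZ' | sed 's/[^[:xdigit:]]//g')
echo "$hex_only"

# Class combined with a literal character in the same set:
# anything that is neither alphanumeric nor "_" becomes a space.
words=$(echo 'foo_bar-baz' | sed 's/[^[:alnum:]_]/ /g')
echo "$words"
```

Note that a bare [:digit:] without the outer brackets is read as an
ordinary set containing the characters ":", "d", "i", "g" and "t".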
Note that [[:graph:]] does not match the space " ", but [[:print:]]
does. Some character classes may (or may not) match characters in
the 8-bit range (128-255 or 0x80-0xFF, often loosely called "high
ASCII"), depending on the current locale and on which C library was
used to compile sed. In non-English locales, [[:alpha:]] and other
classes may also match these characters.
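
A minimal sketch of this behavior for one known setting follows. It
assumes that the sed in use honors the LC_ALL environment variable
(GNU sed does; other builds may differ). The test string and the
variable name are invented for the example:

```shell
# \351 is the single byte 0xE9 (e-acute in ISO-8859-1).
# In the C locale this byte is not alphabetic, so deleting every
# non-alphabetic character removes it and leaves only "caf".
c_result=$(printf 'caf\351\n' | LC_ALL=C sed 's/[^[:alpha:]]//g')
echo "$c_result"
```

Under an ISO-8859-1 locale, if one is installed on the system, the
same command may keep the 0xE9 byte, since that locale can classify
it as a letter.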