6.1 The Theory Behind It All
Regular expressions are a concept borrowed from automata theory.
Regular expressions provide a way to describe a "language" of
strings.
The term "language", when used in the sense borrowed from automata
theory, can be a bit confusing. A language in automata theory is
simply some (possibly infinite) set of strings. Each string (which may
be empty) is a sequence of characters drawn from a fixed, finite set,
called the alphabet. In our case, this set will be all the possible
ASCII characters(10).
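To make the idea of a language as a set of strings concrete, here is a
minimal sketch in Python. The two-character alphabet, the helper names
strings_up_to and in_language, and the rule "ends in b" are all
assumptions chosen for illustration; they do not come from the text,
which uses the full ASCII character set as its alphabet.

```python
from itertools import product

# Toy alphabet, assumed for illustration only.
ALPHABET = "ab"

def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0 through max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def in_language(s):
    """Hypothetical language: strings over ALPHABET that end in 'b'."""
    return all(c in ALPHABET for c in s) and s.endswith("b")

# The language itself is infinite; we can only list a finite slice of it.
sample = [s for s in strings_up_to(ALPHABET, 2) if in_language(s)]
print(sample)
```

The language here is infinite (b, ab, bb, aab, ...), so the code can
only enumerate the members up to a chosen length. The membership test,
however, works for a string of any length.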
When we write a regular expression, we are writing a description of some
set of possible strings. For the regular expression to have meaning,
this set of possible strings that we are defining should have some
meaning to us.
Regular expressions give us a great deal of power to do pattern matching
on text documents. We can use the regular expression syntax to write a
succinct description of the entire (possibly infinite) class of strings that fit our
specification. In addition, anyone else who understands the description
language of regular expressions can easily read our description and
determine what set of strings we want to match. Regular expressions are a
common notation for describing the sets of strings we want to match.
When we discuss regular expressions, we discuss "matching". If a
regular expression "matches" a given string, then that string is in
the class we described with the regular expression. If it does not
match, then the string is not in the desired class.
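The idea of matching as set membership can be sketched with Python's
standard re module. The pattern a+b and the helper name matches are
assumptions made for this example. The sketch uses re.fullmatch so that
the whole string must fit the description, which mirrors the "in the
class or not" view given above.

```python
import re

# Hypothetical pattern: one or more 'a' characters followed by a single 'b'.
PATTERN = re.compile(r"a+b")

def matches(s):
    """Return True if the whole string s is in the set PATTERN describes."""
    return PATTERN.fullmatch(s) is not None

for candidate in ["ab", "aaab", "b", "abb"]:
    print(candidate, matches(candidate))
```

Here "ab" and "aaab" are in the described class, while "b" and "abb"
are not. Many tools instead report a match when any substring fits the
pattern (re.search behaves this way in Python), so "abb" would match
under that looser rule because it contains "ab".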