12. Matching

Matching involves the use of patterns called "regular expressions". This, as you will see, leads to Perl Paradox Number Four: Regular expressions aren't. See sections 13 and 14 of the Quick Reference.

The =~ operator performs pattern matching and substitution. For example:

    $s = 'One if by land and two if by sea';
    if ($s =~ /if by la/) {print "YES"}
    else {print "NO"}
prints "YES", because the string $s matches the simple constant pattern "if by la".
    if ($s =~ /one/) {print "YES"}
    else {print "NO"}
prints "NO", because the string does not match the pattern. However, by adding the "i" option to ignore case, we would get a "YES" from the following:
    if ($s =~ /one/i) {print "YES"}
    else {print "NO"}

Patterns can contain a mind-boggling variety of special directions that facilitate very general matching. See Perl Reference Guide section 13, Regular Expressions. For example, a period matches any character (except the "newline" \n character).

    if ($x =~ /l.mp/) {print "YES"}
would print "YES" for $x = "lamp", "lump", "slumped", but not for $x = "lmp" or "less amperes".

Parentheses () group pattern elements. An asterisk * means that the preceding character, element, or group of elements may occur zero times, one time, or many times. Similarly, a plus + means that the preceding element or group of elements must occur at least once. A question mark ? matches zero or one times. So:

    /fr.*nd/  matches "frnd", "friend", "front and back"
    /fr.+nd/  matches "frond", "friend", "front and back",
                but not "frnd".
    /10*1/    matches "11", "101", "1001", "100000001".
    /b(an)*a/ matches "ba", "bana", "banana", "banananana"
    /flo?at/  matches "flat" and "float",
                but not "flooat"

Square brackets [ ] match a class of single characters.

    [0123456789] matches any single digit
    [0-9]        matches any single digit
    [0-9]+       matches any sequence of one or more digits
    [a-z]+       matches any lowercase word
    [A-Z]+       matches any uppercase word
    [ab n]*      matches the null string "", "b",
                    any number of blanks, "nab a banana"

[^...] matches any single character that is not one of "...":

    [^0-9]       matches any non-digit character.
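The character classes above, and their [^...] complements, are most often used to pull pieces out of a string. A minimal sketch with a made-up input line: in list context, a match with the /g modifier returns every non-overlapping hit.

```perl
use strict;
use warnings;

my $text = "call 555 1234 NOW or email us";

# With /g in list context, a match returns every non-overlapping hit.
my @numbers = $text =~ /[0-9]+/g;   # runs of digits
my @lower   = $text =~ /[a-z]+/g;   # runs of lowercase letters
my @upper   = $text =~ /[A-Z]+/g;   # runs of uppercase letters

# [^0-9] is the complement: delete every non-digit character.
(my $digits = $text) =~ s/[^0-9]//g;

print "@numbers | @lower | @upper | $digits\n";
# 555 1234 | call or email us | NOW | 5551234
```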

Curly braces allow more precise specification of repeated fields. For example [0-9]{6} matches any sequence of 6 digits, and [0-9]{6,10} matches any sequence of 6 to 10 digits.
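A small sketch of both brace forms, using parentheses to capture what actually matched (the sample strings are invented). Note that {n,m} is greedy: it takes as many repetitions as it can, up to m.

```perl
use strict;
use warnings;

# {n} requires exactly n repetitions.
my ($six) = "zip 9021034" =~ /([0-9]{6})/;          # first six digits

# {n,m} allows n to m repetitions and grabs as many as possible.
my ($range) = "id 123456789012" =~ /([0-9]{6,10})/;  # ten of the twelve

# Fewer than six digits anywhere: no match at all.
my $short = ("tel 12345" =~ /[0-9]{6}/) ? "match" : "no match";

print "$six $range $short\n";   # 902103 1234567890 no match
```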

Patterns float, unless anchored. The caret ^ (outside [ ]) anchors a pattern to the beginning, and dollar-sign $ anchors a pattern at the end, so:

    /at/         matches "at", "attention", "flat", & "flatter"
    /^at/        matches "at" & "attention" but not "flat"
    /at$/        matches "at" & "flat", but not "attention"
    /^at$/       matches "at" and nothing else.
    /^at$/i      matches "at", "At", "aT", and "AT".
    /^[ \t]*$/   matches a "blank line", one that contains nothing
                          or any combination of blanks and tabs.
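The anchor table can be reproduced with a short script. The helper name matching is a hypothetical convenience. One caution is worth knowing: $ also matches just before a final newline, so "at\n" (a line read from a file with its newline still attached) passes /^at$/ as well.

```perl
use strict;
use warnings;

my @words = ("at", "attention", "flat", "flatter", "At");

# Hypothetical helper: the members of @list that match $re.
sub matching {
    my ($re, @list) = @_;
    return grep { $_ =~ $re } @list;
}

my @anywhere = matching(qr/at/,    @words);   # at attention flat flatter
my @start    = matching(qr/^at/,   @words);   # at attention
my @end      = matching(qr/at$/,   @words);   # at flat
my @exact    = matching(qr/^at$/,  @words);   # at
my @nocase   = matching(qr/^at$/i, @words);   # at At

# Blank-line test: nothing, or only blanks and tabs.
my @blank = grep { /^[ \t]*$/ } ("", "   ", "\t \t", " x ");   # 3 of the 4

# $ also matches just before a final "\n".
my $with_newline = ("at\n" =~ /^at$/) ? 1 : 0;   # 1

print "@anywhere | @start | @end | @exact | @nocase\n";
print scalar(@blank), " $with_newline\n";
```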

The Backslash. Other characters simply match themselves, but the characters +?.*^$()[]{}|\ and usually / must be escaped with a backslash \ to be taken literally. Thus:

    /10.2/       matches "10Q2", "1052", and "10.2"
    /10\.2/      matches "10.2" but not "10Q2" or "1052"
    /\*+/        matches one or more asterisks
    /A:\\DIR/    matches "A:\DIR"
    /\/usr\/bin/ matches "/usr/bin"
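The sketch below checks the escaping examples and shows two standard Perl conveniences the text hints at with "usually /": choosing another delimiter (m{...}) so slashes need no backslash, and \Q...\E (the in-pattern form of quotemeta) to escape every metacharacter in a variable's value. The sample strings are assumptions for illustration.

```perl
use strict;
use warnings;

# An unescaped "." is a wildcard; "\." is a literal period.
my @loose  = grep { /10.2/ }  ("10Q2", "1052", "10.2");   # all three
my @strict = grep { /10\.2/ } ("10Q2", "1052", "10.2");   # only "10.2"

# With m{...} as the delimiter, "/" is an ordinary character.
my $path_ok = ("/usr/bin/perl" =~ m{/usr/bin}) ? 1 : 0;

# \Q...\E escapes all metacharacters in the interpolated text.
my $literal = "A:\\DIR";              # the six characters A:\DIR
my $found   = ("see A:\\DIR here" =~ /\Q$literal\E/) ? 1 : 0;

print scalar(@loose), " ", scalar(@strict), " $path_ok $found\n";   # 3 1 1 1
```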
If a backslash precedes an alphanumeric character, the sequence takes on a special meaning, typically a short form of a [ ] character class. For example, \d is the same as the [0-9] digits character class.
    /[-+]?\d*\.?\d*/      is the same as
    /[-+]?[0-9]*\.?[0-9]*/

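Either of the above matches decimal numbers: "-150", "-4.13", "3.1415", "+0000.00", etc. Because every part of the pattern is optional, though, it also matches the empty string, and so succeeds somewhere in any string at all; anchor it as /^[-+]?\d*\.?\d*$/ when the whole string must be a number.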
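Because every piece of the decimal-number pattern is optional, it can match the empty string, so unanchored it succeeds on any input. The sketch below, with made-up sample strings, compares the unanchored pattern, an anchored version, and the anchored [0-9] spelling (which behaves like \d for plain ASCII input).

```perl
use strict;
use warnings;

my @samples = ("-150", "-4.13", "3.1415", "+0000.00", "abc", "1.2.3");
my (%loose, %anchored, %spelled_out);

for my $str (@samples) {
    # Unanchored: every part is optional, so it can match "" anywhere.
    $loose{$str}       = ($str =~ /[-+]?\d*\.?\d*/)         ? 1 : 0;
    # Anchored at both ends: the whole string must fit the pattern.
    $anchored{$str}    = ($str =~ /^[-+]?\d*\.?\d*$/)       ? 1 : 0;
    # Same anchored test with [0-9] spelled out instead of \d.
    $spelled_out{$str} = ($str =~ /^[-+]?[0-9]*\.?[0-9]*$/) ? 1 : 0;
    print "$str: unanchored=$loose{$str} anchored=$anchored{$str}\n";
}
```

Even anchored, the pattern still accepts degenerate strings such as "", "." or "-", since each piece may occur zero times.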

A simple \s specifies "white space", the same as the character class [ \t\n\r\f] (blank, tab, newline, carriage return, form-feed); since Perl 5.18 it also matches the vertical tab. A character may be specified in hexadecimal as a \x followed by two hexadecimal digits; \x1b is the ESC character.
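Two common uses, sketched with invented input: splitting a line on runs of white space, and using \x1b to strip terminal color codes. The escape-sequence format assumed here (ESC, "[", digits and semicolons, "m") is the usual ANSI color form and is an illustration, not something this section defines.

```perl
use strict;
use warnings;

# \s+ matches any run of blanks, tabs, CR/LF, etc.; split drops trailing empty fields.
my $line   = "name\t  Alice \r\n";
my @fields = split /\s+/, $line;

# \x1b is ESC; remove sequences like "\x1b[31m" and "\x1b[0m".
my $colored = "\x1b[31mred\x1b[0m text";
(my $plain = $colored) =~ s/\x1b\[[0-9;]*m//g;

print "@fields | $plain\n";   # name Alice | red text
```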

A vertical bar | specifies "or".

    if ($answer =~ /^y|^yes|^yeah/i ) {
         print "Affirmative!";
    }

prints "Affirmative!" for $answer equal to "y" or "yes" or "yeah" (or "Y", "YeS", or "yessireebob, that's right").
