Bash supports a surprising number of string manipulation
operations. Unfortunately, these tools lack
a unified focus. Some are a subset of parameter substitution, and
others fall under the functionality of the UNIX expr command. This results in
inconsistent command syntax and overlap of functionality,
not to mention confusion.
Example 9-10. Inserting a blank line between paragraphs in a text file
#!/bin/bash
# paragraph-space.sh
# Inserts a blank line between paragraphs of a single-spaced text file.
# Usage: $0 <FILENAME
MINLEN=45 # May need to change this value.
# Assume lines shorter than $MINLEN characters
#+ terminate a paragraph.
while IFS= read -r line  # For as many lines as the input file has...
do                       #+ (IFS= and -r keep whitespace and backslashes intact).
  echo "$line"           # Output the line itself.
  len=${#line}
  if [ "$len" -lt "$MINLEN" ]
  then echo              # Add a blank line after a short line.
  fi
done
exit 0
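The script above relies on ${#line} to measure each line. As a minimal sketch, the snippet below sets that expansion beside the equivalent expr form; the variable names len_builtin and len_expr are illustrative only.

```bash
stringZ=abcABC123ABCabc

len_builtin=${#stringZ}               # Parameter expansion; no subprocess needed.
len_expr=$(expr "$stringZ" : '.*')    # expr reports the length of the regex match.

echo "$len_builtin"                   # 15
echo "$len_expr"                      # 15
```

The built-in form is usually preferable inside loops, since it avoids launching an external command for every line.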
Length of Matching Substring at Beginning of String
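expr match "$string" '$substring'
expr "$string" : '$substring'
Returns the length of the match of $substring at the beginning of $string, where $substring is a regular expression.
Index
expr index $string $substring
Numerical position (counting from 1) in $string of the first character in
$substring that matches.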
stringZ=abcABC123ABCabc
echo `expr index "$stringZ" C12` # 6
# C position.
echo `expr index "$stringZ" 1c` # 3
# 'c' (in #3 position) matches before '1'.
With a single-character $substring, this is the near equivalent of
strchr() in C; with several characters, it behaves more like strpbrk().
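The same lookup can be approximated without calling expr. The sketch below uses a hypothetical helper, str_index, and assumes the character set contains nothing special inside a bracket expression (such as ']' or '^').

```bash
# Hypothetical helper: print the 1-based position of the first character
#+ of $1 that belongs to the set $2, or 0 if none does.
str_index() {
  local string=$1 chars=$2
  local prefix=${string%%["$chars"]*}   # Everything before the first hit.
  if [ "$prefix" = "$string" ]          # Nothing was stripped: no match.
  then
    echo 0
  else
    echo $(( ${#prefix} + 1 ))
  fi
}

str_index abcABC123ABCabc C12           # 6
str_index abcABC123ABCabc 1c            # 3
str_index abcABC123ABCabc z             # 0
```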
Substring Extraction
${string:position}
Extracts substring from $string at $position.
If the $string parameter is "*" or "@", then this extracts the
positional parameters, starting at $position.
${string:position:length}
Extracts $length characters
of substring from $string at
$position.
stringZ=abcABC123ABCabc
# 0123456789.....
# 0-based indexing.
echo ${stringZ:0} # abcABC123ABCabc
echo ${stringZ:1} # bcABC123ABCabc
echo ${stringZ:7} # 23ABCabc
echo ${stringZ:7:3} # 23A
# Three characters of substring.
# Is it possible to index from the right end of the string?
echo ${stringZ:-4} # abcABC123ABCabc
# Defaults to full string, as in ${parameter:-default}.
# However . . .
echo ${stringZ:(-4)} # Cabc
echo ${stringZ: -4} # Cabc
# Now, it works.
# Parentheses or added space "escape" the position parameter.
# Thank you, Dan Jacobson, for pointing this out.
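Bash evaluates both position and length as arithmetic expressions, which gives another way to count from the right end of a string. A minimal sketch, with n as an illustrative variable:

```bash
stringZ=abcABC123ABCabc
n=4

last_n=${stringZ:${#stringZ}-n}               # Offset is 15-4 = 11.
all_but_last_n=${stringZ:0:${#stringZ}-n}     # Length is 15-4 = 11.

echo "$last_n"                                # Cabc
echo "$all_but_last_n"                        # abcABC123AB
```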
If the $string parameter is "*" or "@", then this extracts a maximum
of $length positional parameters, starting at $position.
echo ${*:2} # Echoes second and following positional parameters.
echo ${@:2} # Same as above.
echo ${*:2:3} # Echoes three positional parameters, starting at second.
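To see these slices with concrete arguments, here is a small sketch. The function show_slices and its arguments are hypothetical; inside a function, the positional parameters are the function's own arguments.

```bash
show_slices() {
  echo "${*:2}"             # Second and following parameters.
  echo "${*:2:3}"           # At most three parameters, starting at the second.
  local arg
  for arg in "${@:4}"       # Quoted "${@:n}" keeps each parameter a separate word.
  do
    echo "[$arg]"
  done
}

show_slices alpha beta "gamma delta" epsilon zeta
# beta gamma delta epsilon zeta
# beta gamma delta epsilon
# [epsilon]
# [zeta]
```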
expr substr $string $position $length
Extracts $length characters from $string starting at $position.
Note that expr substr counts positions from 1,
whereas ${string:position:length} counts from 0.
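A short sketch of expr substr next to the built-in form. This assumes GNU expr, since substr is an extension that other expr implementations may not provide.

```bash
stringZ=abcABC123ABCabc

first_two=$(expr substr "$stringZ" 1 2)    # Positions 1-2.
middle=$(expr substr "$stringZ" 4 3)       # Positions 4-6.
same_middle=${stringZ:3:3}                 # Same characters, 0-based offset.

echo "$first_two"                          # ab
echo "$middle"                             # ABC
echo "$same_middle"                        # ABC
```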
expr match "$string" '\($substring\)'
Extracts $substring
at beginning of $string,
where $substring is a regular expression.
expr "$string" : '\($substring\)'
Extracts $substring
at beginning of $string,
where $substring is a regular
expression.
stringZ=abcABC123ABCabc
# =======
echo `expr match "$stringZ" '\(.[b-c]*[A-Z]..[0-9]\)'` # abcABC1
echo `expr "$stringZ" : '\(.[b-c]*[A-Z]..[0-9]\)'` # abcABC1
echo `expr "$stringZ" : '\(.......\)'` # abcABC1
# All of the above forms give an identical result.
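Bash can also perform this kind of extraction itself through the [[ ... =~ ... ]] operator and the BASH_REMATCH array. The sketch below is an alternative, not one of the expr forms above; note that it uses extended regular expression syntax, so the group is written with plain parentheses and the anchor is explicit.

```bash
stringZ=abcABC123ABCabc
re='^(.[b-c]*[A-Z]..[0-9])'     # '^' anchors the match, as expr does implicitly.

prefix=
if [[ $stringZ =~ $re ]]
then
  prefix=${BASH_REMATCH[1]}     # Text captured by the first group.
fi

echo "$prefix"                  # abcABC1
```

Keeping the pattern in a variable (here re) sidesteps quoting differences in how Bash parses a literal regex on the right-hand side.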
expr match "$string" '.*\($substring\)'
Extracts $substring
at end of
$string, where
$substring is a regular
expression.
expr "$string" : '.*\($substring\)'
Extracts $substring
at end of $string,
where $substring is a regular
expression.
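As a minimal sketch of extraction from the end of a string, the snippet below uses the expr form with a sample pattern, followed by a parameter-expansion alternative that happens to give the same result for this particular string. The variable names are illustrative.

```bash
stringZ=abcABC123ABCabc

tail_expr=$(expr "$stringZ" : '.*\([A-C][A-C][A-C][a-c]*\)')
tail_builtin=${stringZ##*[0-9]}     # Strip everything through the last digit.

echo "$tail_expr"                   # ABCabc
echo "$tail_builtin"                # ABCabc
```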
Substring Removal
${string#substring}
Strips shortest match of
$substring from
front of
$string.
${string##substring}
Strips longest match of
$substring from
front of
$string.
stringZ=abcABC123ABCabc
#       |----|          shortest
#       |----------|    longest
echo ${stringZ#a*C} # 123ABCabc
# Strip out shortest match between 'a' and 'C'.
echo ${stringZ##a*C} # abc
# Strip out longest match between 'a' and 'C'.
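A common practical use of front-end stripping is taking a pathname apart. The sketch below uses a hypothetical path.

```bash
path=/home/user/archive.tar.gz     # Hypothetical path.

file=${path##*/}                   # Longest match of '*/' removes the directory part.
after_first_dot=${file#*.}         # Shortest match of '*.' removes 'archive.'.
after_last_dot=${file##*.}         # Longest match of '*.' removes 'archive.tar.'.

echo "$file"                       # archive.tar.gz
echo "$after_first_dot"            # tar.gz
echo "$after_last_dot"             # gz
```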
${string%substring}
Strips shortest match of
$substring from
back of
$string.
${string%%substring}
Strips longest match of
$substring from
back of
$string.
stringZ=abcABC123ABCabc
#                    ||     shortest
#        |------------|     longest
echo ${stringZ%b*c} # abcABC123ABCa
# Strip out shortest match between 'b' and 'c', from back of $stringZ.
echo ${stringZ%%b*c} # a
# Strip out longest match between 'b' and 'c', from back of $stringZ.
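Back-end stripping is just as useful on filenames. A minimal sketch, again with hypothetical names:

```bash
path=/home/user/archive.tar.gz     # Hypothetical path.
file=archive.tar.gz

dir=${path%/*}                     # Shortest match of '/*' removes the filename.
without_last_ext=${file%.*}        # Shortest match of '.*' removes '.gz'.
without_any_ext=${file%%.*}        # Longest match of '.*' removes '.tar.gz'.

echo "$dir"                        # /home/user
echo "$without_last_ext"           # archive.tar
echo "$without_any_ext"            # archive
```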
Example 9-11. Converting graphic file formats, with filename change
#!/bin/bash
# cvt.sh:
# Converts all the MacPaint image files in a directory to "pbm" format.
# Uses the "macptopbm" binary from the "netpbm" package,
#+ which is maintained by Brian Henderson.
# Netpbm is a standard part of most Linux distros.
OPERATION=macptopbm
SUFFIX=pbm # New filename suffix.
if [ -n "$1" ]
then
directory=$1 # If directory name given as a script argument...
else
directory=$PWD # Otherwise use current working directory.
fi
# Assumes all files in the target directory are MacPaint image files,
#+ with a ".mac" filename suffix.
for file in "$directory"/*    # Filename globbing.
do
  filename=${file%.*c}        #  Strip ".mac" suffix off filename
                              #+ ('.*c' matches everything
                              #+ between '.' and 'c', inclusive).
  $OPERATION "$file" > "$filename.$SUFFIX"
                              # Redirect conversion to new filename.
  rm -f "$file"               # Delete original files after converting.
  echo "$filename.$SUFFIX"    # Log what is happening to stdout.
done
exit 0
# Exercise:
# --------
# As it stands, this script converts *all* the files in the current
#+ working directory.
# Modify it to work *only* on files with a ".mac" suffix.
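One possible approach to the exercise, sketched under the assumption that only the filename logic matters: the hypothetical function plan_conversions prints the target name for each ".mac" file and neither calls macptopbm nor deletes anything.

```bash
# Hypothetical helper: list the ".pbm" names for the *.mac files in a directory.
plan_conversions() {
  local directory=${1:-$PWD} file
  for file in "$directory"/*.mac    # Glob restricted to the ".mac" suffix.
  do
    [ -e "$file" ] || continue      # The glob matched nothing; skip the literal pattern.
    echo "${file%.mac}.pbm"         # Strip ".mac", then append the new suffix.
  done
}
```

Called as plan_conversions some-directory, it prints one line per ".mac" file and ignores everything else.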
The following example is a simple emulation of getopt
using substring-extraction constructs.
Example 9-12. Emulating getopt
#!/bin/bash
# getopt-simple.sh
# Author: Chris Morgan
# Used in the ABS Guide with permission.
getopt_simple()
{
  echo "getopt_simple()"
  echo "Parameters are '$*'"
  until [ -z "$1" ]
  do
    echo "Processing parameter of: '$1'"
    if [ "${1:0:1}" = '/' ]
    then
      tmp=${1:1}               # Strip off leading '/' . . .
      parameter=${tmp%%=*}     # Extract name.
      value=${tmp##*=}         # Extract value.
      echo "Parameter: '$parameter', value: '$value'"
      eval "$parameter=\$value"   # Escaped '$' keeps the value from being re-parsed.
    fi
    shift
  done
}
# Pass all options to getopt_simple().
getopt_simple "$@"
echo "test is '$test'"
echo "test2 is '$test2'"
exit 0
---
bash getopt-simple.sh /test=value1 /test2=value2
getopt_simple()
Parameters are '/test=value1 /test2=value2'
Processing parameter of: '/test=value1'
Parameter: 'test', value: 'value1'
Processing parameter of: '/test2=value2'
Parameter: 'test2', value: 'value2'
test is 'value1'
test2 is 'value2'
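The same idea can be sketched without eval, which may be worth considering when the arguments come from an untrusted source. The function parse_slash_options below is hypothetical, not part of the original script. It assigns with printf -v, rejects names that are not valid identifiers, and takes the value as everything after the first '=' (the original takes what follows the last '='). Option names that collide with the function's own local variables (arg, name, value, re) would not reach the caller.

```bash
parse_slash_options() {
  local arg name value
  local re='^[A-Za-z_][A-Za-z0-9_]*$'   # Shape of a valid variable name.
  for arg in "$@"
  do
    [[ $arg == /*=* ]] || continue      # Skip anything not of the form /name=value.
    arg=${arg:1}                        # Strip off leading '/'.
    name=${arg%%=*}                     # Text before the first '='.
    value=${arg#*=}                     # Text after the first '='.
    [[ $name =~ $re ]] || continue      # Refuse names that are not identifiers.
    printf -v "$name" '%s' "$value"     # Assign without eval.
  done
}

parse_slash_options /test=value1 /test2=a=b
echo "test is '$test'"                  # test is 'value1'
echo "test2 is '$test2'"                # test2 is 'a=b'
```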
Substring Replacement
${string/substring/replacement}
Replace first match of
$substring with
$replacement.
${string//substring/replacement}
Replace all matches of
$substring with
$replacement.
stringZ=abcABC123ABCabc
echo ${stringZ/abc/xyz} # xyzABC123ABCabc
# Replaces first match of 'abc' with 'xyz'.
echo ${stringZ//abc/xyz} # xyzABC123ABCxyz
# Replaces all matches of 'abc' with 'xyz'.
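The $substring here is a glob-style pattern rather than only a literal, and an empty replacement deletes the match. A brief sketch, with illustrative variable names:

```bash
stringZ=abcABC123ABCabc

no_digits=${stringZ//[0-9]/}            # Delete every digit.
first_ABC_gone=${stringZ/ABC/}          # Delete only the first 'ABC'.
masked=${stringZ//[[:upper:]]/_}        # Replace each uppercase letter with '_'.

echo "$no_digits"                       # abcABCABCabc
echo "$first_ABC_gone"                  # abc123ABCabc
echo "$masked"                          # abc___123___abc
```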
${string/#substring/replacement}
If $substring matches
front end of
$string, substitute
$replacement for
$substring.
${string/%substring/replacement}
If $substring matches
back end of
$string, substitute
$replacement for
$substring.
stringZ=abcABC123ABCabc
echo ${stringZ/#abc/XYZ} # XYZABC123ABCabc
# Replaces front-end match of 'abc' with 'XYZ'.
echo ${stringZ/%abc/XYZ} # abcABC123ABCXYZ
# Replaces back-end match of 'abc' with 'XYZ'.
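Anchored replacement is handy for changing a filename prefix or suffix, because a match elsewhere in the string is ignored. A minimal sketch with a hypothetical filename:

```bash
file=report.txt                         # Hypothetical filename.

renamed=${file/%.txt/.md}               # Suffix matches at the back end.
retitled=${file/#report/summary}        # Prefix matches at the front end.
unchanged=${file/#txt/md}               # 'txt' is not at the front: no change.

echo "$renamed"                         # report.md
echo "$retitled"                        # summary.txt
echo "$unchanged"                       # report.txt
```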
9.2.1. Manipulating strings using awk
A Bash script may invoke the string manipulation facilities of
awk as an alternative to using its
built-in operations.
Example 9-13. Alternate ways of extracting substrings
#!/bin/bash
# substring-extraction.sh
String=23skidoo1
# 012345678 Bash
# 123456789 awk
# Note different string indexing system:
# Bash numbers first character of string as '0'.
# Awk numbers first character of string as '1'.
echo ${String:2:4} # position 3 (0-1-2), 4 characters long
# skid
# The awk equivalent of ${string:pos:length} is substr(string,pos,length).
echo | awk '
{ print substr("'"${String}"'",3,4) # skid
}
'
# Piping an empty "echo" to awk gives it dummy input,
#+ and thus makes it unnecessary to supply a filename.
exit 0
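An alternative sketch of the awk call passes the shell variable with -v and does the work in a BEGIN block, so neither quote splicing nor dummy input is needed. One caveat: awk processes backslash escape sequences in -v values, so this assumes the string contains no backslashes.

```bash
String=23skidoo1

# awk reads no input here: BEGIN runs before any input is consumed.
sub=$(awk -v s="$String" 'BEGIN { print substr(s, 3, 4) }')

echo "$sub"                             # skid
```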
9.2.2. Further Discussion
For more on string manipulation in scripts, refer to Section 9.3 and the
relevant section of the expr command listing. For script examples,
see: