8.6.17 Extract data from text file table
Let's consider a text file called DPL in which the names of all past Debian
project leaders and the month and year they took office are listed in a
space-separated format.
Ian Murdock August 1993
Bruce Perens April 1996
Ian Jackson January 1998
Wichert Akkerman January 1999
Ben Collins April 2001
Bdale Garbee April 2002
Martin Michlmayr March 2003
Awk is frequently used to extract data from these types of files.
$ awk '{ print $3 }' <DPL # month started
August
April
January
January
April
April
March
$ awk '($1=="Ian") { print }' <DPL # DPL called Ian
Ian Murdock August 1993
Ian Jackson January 1998
$ awk '($2=="Perens") { print $3,$4 }' <DPL # When Perens started
April 1996
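Awk compares fields numerically when they look like numbers, so the year column can be filtered with a plain `>=` test. The sketch below assumes Bash and any POSIX awk. To stay self-contained, it feeds a few rows copied from the table above through a here-document instead of reading the DPL file:

```shell
#!/bin/bash
# Print leaders whose start year (field 4) is 2000 or later.
# Rows are copied from the DPL table above; the here-document replaces <DPL.
awk '$4 >= 2000 { print $1, $2 }' <<'EOF'
Ian Murdock August 1993
Wichert Akkerman January 1999
Ben Collins April 2001
Bdale Garbee April 2002
Martin Michlmayr March 2003
EOF
```

With these rows it prints Ben Collins, Bdale Garbee, and Martin Michlmayr, one per line.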
Shells such as Bash can also be used to parse this kind of file:
$ while read -r first last month year; do
    echo "$month"
done <DPL
... same output as the first Awk example
Here, the read built-in command uses the characters in $IFS (the internal
field separators) to split each line into words.
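When a line contains more words than read has variable names, the last name receives all the remaining words. A small sketch, assuming Bash; a here-document again stands in for the DPL file:

```shell
#!/bin/bash
# Two variable names: "first" gets word 1, "rest" gets everything after it.
while read -r first rest; do
    echo "first=$first rest=$rest"
done <<'EOF'
Ian Murdock August 1993
Bruce Perens April 1996
EOF
```

This prints `first=Ian rest=Murdock August 1993` followed by `first=Bruce rest=Perens April 1996`.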
If you change IFS to ":", you can parse /etc/passwd nicely with the shell:
$ oldIFS="$IFS" # save old value
$ IFS=":"
$ while read user password uid gid rest_of_line; do
if [ "$user" = "osamu" ]; then
echo "$user's ID is $uid"
fi
done < /etc/passwd
osamu's ID is 1001
$ IFS="$oldIFS" # restore old value
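Saving and restoring IFS can be avoided by placing the assignment directly in front of read; Bash then applies it to that one command only. The sketch below wraps the same lookup in a hypothetical helper function, uid_of, which takes the user name as its argument. root is used in the usage line only because it is assumed to exist on practically every system:

```shell
#!/bin/bash
# uid_of NAME: print the UID of NAME from /etc/passwd (hypothetical helper).
# "IFS=: read" changes IFS only while read runs; the shell's own IFS is untouched.
uid_of() {
    local user password uid gid rest_of_line
    while IFS=: read -r user password uid gid rest_of_line; do
        if [ "$user" = "$1" ]; then
            echo "$uid"
            return 0
        fi
    done < /etc/passwd
    return 1   # no matching user
}

uid_of root
```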
(To do the equivalent in Awk, set the field separator with FS=":".)
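A sketch of that Awk equivalent, setting FS in a BEGIN block (the -F: command-line option has the same effect). As above, root is only an example user assumed to exist:

```shell
#!/bin/bash
# Split /etc/passwd on ":"; field 1 is the user name, field 3 the UID.
awk 'BEGIN { FS = ":" } $1 == "root" { print $1, $3 }' /etc/passwd
```

On a typical system this prints `root 0`.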
IFS is also used by the shell to split the results of parameter expansion,
command substitution, and arithmetic expansion into words. This word splitting
does not occur within double-quoted or single-quoted words. The default value
of IFS is the combination of <space>, <tab>, and <newline>.
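The difference between an unquoted and a double-quoted expansion can be seen with a short sketch, assuming Bash; the variable name var is only illustrative:

```shell
#!/bin/bash
# With IFS=":", unquoted $var is split into words; "$var" is not.
var="a:b:c"
IFS=":"
printf '[%s]\n' $var     # three words: [a], [b], [c] on separate lines
printf '[%s]\n' "$var"   # one word: [a:b:c]
unset IFS                # back to the default splitting behaviour
```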
Be careful when using these shell IFS tricks. Strange things may happen when
the shell interprets some parts of the script as its input.
$ IFS=":," # use ":" and "," as IFS
$ echo IFS=$IFS, IFS="$IFS" # echo is a Bash built-in
IFS= , IFS=:,
$ date -R # just a command output
Sat, 23 Aug 2003 08:30:15 +0200
$ echo $(date -R) # sub shell --> input to main shell
Sat 23 Aug 2003 08 30 36 +0200
$ unset IFS # reset IFS to the default
$ echo $(date -R)
Sat, 23 Aug 2003 08:30:50 +0200
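One way to limit such surprises is to change IFS only inside a ( ... ) subshell, so the change disappears when the subshell exits. A minimal sketch, assuming Bash; time_str is a hypothetical sample value in the style of the time field printed by date -R:

```shell
#!/bin/bash
# Confine an IFS change to a ( ... ) subshell so it cannot leak into the caller.
time_str="08:30:15"

( IFS=":"; echo $time_str )   # subshell: unquoted expansion split on ":"
echo $time_str                # parent: default IFS, so the string stays whole
```

This prints `08 30 15` and then `08:30:15`.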