33.2.3 Etags Regexps
The ‘--regex’ option provides a general way of recognizing tags
based on regexp matching. You can freely intermix it with file names.
If you specify multiple ‘--regex’ options, all of them are used
in parallel, but each one applies only to the source files that follow
it. The syntax is:
--regex=[{language}]/tagregexp/[nameregexp/]modifiers
The essential part of the option value is tagregexp, the
regexp for matching tags. It is always used anchored, that is, it
only matches at the beginning of a line. If you want to allow
indented tags, use a regexp that matches initial whitespace; start it
with ‘[ \t]*’.
In these regular expressions, ‘\’ quotes the next character, and
all the GCC character escape sequences are supported (‘\a’ for
bell, ‘\b’ for backspace, ‘\d’ for delete, ‘\e’ for
escape, ‘\f’ for formfeed, ‘\n’ for newline, ‘\r’ for
carriage return, ‘\t’ for tab, and ‘\v’ for vertical tab).
Ideally, tagregexp should not match more characters than are
needed to recognize what you want to tag. If the syntax requires you
to write tagregexp so it matches more characters beyond the tag
itself, you should add a nameregexp, to pick out just the tag.
This will enable Emacs to find tags more accurately and to do
completion on tag names more reliably. You can find some examples
below.
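As a minimal sketch of the point about nameregexp, the shell commands
below tag a made-up ‘widget NAME’ construct in a scratch file. The
file name sample.cfg, the widget syntax, and the demo_dir variable are
hypothetical, and the commands assume that the etags on your PATH is
the one distributed with GNU Emacs. The tagregexp has to match the
leading whitespace and the ‘widget’ keyword, so a nameregexp of ‘\1’
picks out just the name.

```shell
# Scratch input: a hypothetical "widget NAME" syntax, second line indented.
demo_dir=$(mktemp -d)
cat > "$demo_dir/sample.cfg" <<'EOF'
widget alpha {
    widget beta {
EOF

# Run only when etags is the GNU Emacs one (assumption: its --version
# output mentions "GNU Emacs"); other programs named etags may take
# different options.
if etags --version 2>/dev/null | grep -q 'GNU Emacs'; then
  # [ \t]* allows indentation; \1 names the tag after the first group.
  etags -o "$demo_dir/TAGS" --language=none \
        --regex='/[ \t]*widget[ \t]+\([a-z_]+\)/\1/' \
        "$demo_dir/sample.cfg"
fi
```

Afterwards, $demo_dir/TAGS should hold one entry for each of the two
widget lines.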
The modifiers are a sequence of zero or more characters that
modify the way etags does the matching. A regexp with no
modifiers is applied sequentially to each line of the input file, in a
case-sensitive way. The modifiers and their meanings are:
- ‘i’
- Ignore case when matching this regexp.
- ‘m’
- Match this regular expression against the whole file, so that
multi-line matches are possible.
- ‘s’
- Match this regular expression against the whole file, and allow
‘.’ in tagregexp to match newlines.
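To illustrate the ‘i’ modifier, here is a small sketch that runs the
same regexp twice over a scratch file, once with no modifiers and once
with ‘i’. The file contents, the output names TAGS.exact and
TAGS.nocase, and the demo_dir variable are hypothetical; the commands
assume the etags distributed with GNU Emacs.

```shell
# Three hypothetical declarations that differ only in letter case.
demo_dir=$(mktemp -d)
printf 'Widget alpha\nWIDGET beta\nwidget gamma\n' > "$demo_dir/sample.cfg"

# Run only when etags is the GNU Emacs one (assumption: its --version
# output mentions "GNU Emacs").
if etags --version 2>/dev/null | grep -q 'GNU Emacs'; then
  # No modifiers: matching is case-sensitive, so only the
  # lowercase "widget gamma" line is tagged.
  etags -o "$demo_dir/TAGS.exact" --language=none \
        --regex='/widget[ \t]+\([a-z]+\)/\1/' "$demo_dir/sample.cfg"

  # With the "i" modifier, all three lines are tagged.
  etags -o "$demo_dir/TAGS.nocase" --language=none \
        --regex='/widget[ \t]+\([a-z]+\)/\1/i' "$demo_dir/sample.cfg"
fi
```

Comparing the two output files shows which lines each run tagged.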
The ‘-R’ option cancels all the regexps defined by preceding
‘--regex’ options. It applies to the file names following it, as
you can see from the following example:
etags --regex=/reg1/i voo.doo --regex=/reg2/m \
bar.ber -R --lang=lisp los.er
Here etags chooses the parsing language for voo.doo and
bar.ber according to their contents. etags also uses
reg1 to recognize additional tags in voo.doo, and both
reg1 and reg2 to recognize additional tags in
bar.ber. reg1 is checked against each line of
voo.doo and bar.ber, in a case-insensitive way, while
reg2 is checked against the whole bar.ber file,
permitting multi-line matches, in a case-sensitive way. etags
uses only the Lisp tags rules, with no user-specified regexp matching,
to recognize tags in los.er.
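The effect of ‘-R’ can be checked with a small sketch like the one
below. The files first.cfg and second.cfg, their contents, and the
demo_dir variable are hypothetical, and the commands assume the etags
distributed with GNU Emacs. Because ‘--language=none’ disables the
built-in parsers, any tag in the output must come from the regexp.

```shell
# Two hypothetical input files with one declaration each.
demo_dir=$(mktemp -d)
printf 'widget alpha\n' > "$demo_dir/first.cfg"
printf 'widget beta\n'  > "$demo_dir/second.cfg"

# Run only when etags is the GNU Emacs one (assumption: its --version
# output mentions "GNU Emacs").
if etags --version 2>/dev/null | grep -q 'GNU Emacs'; then
  # The regexp applies to first.cfg.  -R cancels it before second.cfg,
  # which is then read with no regexps at all.
  etags -o "$demo_dir/TAGS" --language=none \
        --regex='/widget[ \t]+\([a-z]+\)/\1/' "$demo_dir/first.cfg" \
        -R "$demo_dir/second.cfg"
fi
```

The resulting tags file should contain a tag for ‘alpha’ but none for
‘beta’.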
You can restrict a ‘--regex’ option to match only files of a
given language by using the optional prefix {language}.
(‘etags --help’ prints the list of languages recognized by
etags.) This is particularly useful when storing many
predefined regular expressions for etags in a file. The
following example tags the DEFVAR macros in the Emacs source
files, for the C language only:
--regex='{c}/[ \t]*DEFVAR_[A-Z_ \t(]+"\([^"]+\)"/'
When you have complex regular expressions, you can store the list of
them in a file. The following option syntax instructs etags to
read two files of regular expressions. The regular expressions
contained in the second file are matched without regard to case.
--regex=@case-sensitive-file --ignore-case-regex=@ignore-case-file
A regex file for etags contains one regular expression per
line. Empty lines, and lines beginning with space or tab are ignored.
When the first character in a line is ‘@’, etags assumes
that the rest of the line is the name of another file of regular
expressions; thus, one such file can include another file. All the
other lines are taken to be regular expressions. If the first
non-whitespace text on the line is ‘--’, that line is a comment.
For example, we can create a file called ‘emacs.tags’ with the
following contents:
-- This is for GNU Emacs C source files
{c}/[ \t]*DEFVAR_[A-Z_ \t(]+"\([^"]+\)"/\1/
and then use it like this:
etags --regex=@emacs.tags *.[ch] */*.[ch]
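The same workflow can be tried end to end on scratch files. This is
only a sketch: the regex file widget.tags, the ‘widget NAME’ syntax,
and the demo_dir variable are hypothetical, and the commands assume the
etags distributed with GNU Emacs. The comment line in the regex file
is indented, so it falls under the rule that lines beginning with
space or tab are ignored.

```shell
# Hypothetical source file to be tagged.
demo_dir=$(mktemp -d)
printf 'widget alpha\n' > "$demo_dir/sample.cfg"

# Regex file: an indented comment, an empty line (both ignored),
# and one regexp with a nameregexp.
cat > "$demo_dir/widget.tags" <<'EOF'
  -- Tag hypothetical "widget NAME" declarations

/[ \t]*widget[ \t]+\([a-z_]+\)/\1/
EOF

# Run only when etags is the GNU Emacs one (assumption: its --version
# output mentions "GNU Emacs").
if etags --version 2>/dev/null | grep -q 'GNU Emacs'; then
  # "@" tells --regex to read regexps from the named file.
  etags -o "$demo_dir/TAGS" --language=none \
        --regex=@"$demo_dir/widget.tags" "$demo_dir/sample.cfg"
fi
```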
Here are some more examples. The regexps are quoted to protect them
from shell interpretation.
- Tag Octave files:
etags --language=none \
--regex='/[ \t]*function.*=[ \t]*\([^ \t]*\)[ \t]*(/\1/' \
--regex='/###key \(.*\)/\1/' \
--regex='/[ \t]*global[ \t].*/' \
*.m
Note that tags are not generated for scripts, so if you want to jump
to a script you must add a line of the form ‘###key scriptname’ to it
yourself.
- Tag Tcl files:
etags --language=none --regex='/proc[ \t]+\([^ \t]+\)/\1/' *.tcl
- Tag VHDL files:
etags --language=none \
--regex='/[ \t]*\(ARCHITECTURE\|CONFIGURATION\) +[^ ]* +OF/' \
--regex='/[ \t]*\(ATTRIBUTE\|ENTITY\|FUNCTION\|PACKAGE\
\( BODY\)?\|PROCEDURE\|PROCESS\|TYPE\)[ \t]+\([^ \t(]+\)/\3/'