GLOB(7) Linux Programmer's Manual GLOB(7)
NAME
glob - Globbing pathnames
DESCRIPTION
Long ago, in Unix V6, there was a program /etc/glob that
would expand wildcard patterns. Soon afterwards this
became a shell built-in.
These days there is also a library routine glob(3) that
will perform this function for a user program.
The rules are as follows (POSIX 1003.2, 3.13).
WILDCARD MATCHING
A string is a wildcard pattern if it contains one of the
characters `?', `*' or `['. Globbing is the operation that
expands a wildcard pattern into the list of pathnames
matching the pattern. Matching is defined by:
A `?' (not between brackets) matches any single character.
A `*' (not between brackets) matches any string, including
the empty string.
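The two rules above can be tried out with a short sketch. It assumes Python's fnmatch module, whose fnmatchcase() does plain string matching with a syntax close to (but not identical with) the one described here; the helper name matches() and the sample names are made up for illustration. Because this is string matching only, the pathname rules described under PATHNAMES are not applied.

```python
from fnmatch import fnmatchcase

def matches(pattern, names):
    """Return the names that the wildcard pattern matches in full."""
    return [name for name in names if fnmatchcase(name, pattern)]

# Hypothetical sample names, used only for illustration.
names = ["a.c", "ab.c", ".c", "main.c", "main.h"]

# `?' stands for exactly one character, so only "a.c" qualifies.
single = matches("?.c", names)

# `*' stands for any string, including the empty one in front of ".c".
any_run = matches("*.c", names)
```

Note that `*.c' accepts the name `.c' here only because the empty string is a valid match for `*' in pure string matching.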
Character classes
An expression `[...]' where the first character after the
leading `[' is not an `!' matches a single character,
namely any of the characters enclosed by the brackets.
The string enclosed by the brackets cannot be empty;
therefore `]' can be allowed between the brackets,
provided that it is the first character. (Thus, `[][!]'
matches the three characters `[', `]' and `!'.)
Ranges
There is one special convention: two characters separated
by `-' denote a range. (Thus, `[A-Fa-f0-9]' is equivalent
to `[ABCDEFabcdef0123456789]'.) One may include `-' in
its literal meaning by making it the first or last
character between the brackets. (Thus, `[]-]' matches just the
two characters `]' and `-', and `[--/]' matches the three
characters `-', `.', `/'.)
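The bracket examples given so far can be checked mechanically. The following is a minimal sketch, again assuming Python's fnmatch module as a stand-in matcher with ASCII (code point) ranges; the helper accepted() is hypothetical. It tests every printable ASCII character against a pattern and collects those that match.

```python
from fnmatch import fnmatchcase

def accepted(pattern, candidates):
    """Return, as a string, the single characters the bracket pattern matches."""
    return "".join(ch for ch in candidates if fnmatchcase(ch, pattern))

# All printable ASCII characters, in code order.
printable = "".join(chr(code) for code in range(0x20, 0x7F))

brackets = accepted("[][!]", printable)         # `]' first, so it is literal
hexdigits = accepted("[A-Fa-f0-9]", printable)  # three ranges
closing_or_dash = accepted("[]-]", printable)   # `-' last, so it is literal
dash_to_slash = accepted("[--/]", printable)    # range from `-' to `/'
```

The results are listed in ASCII order, which is why `!' comes before `[' in the first one.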
Complementation
An expression `[!...]' matches a single character, namely
any character that is not matched by the expression
obtained by removing the first `!' from it. (Thus,
`[!]a-]' matches any single character except `]', `a' and
`-'.)
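A small sketch of complementation, under the same assumption that Python's fnmatchcase() is an acceptable stand-in for the matching rules; the probe characters are arbitrary.

```python
from fnmatch import fnmatchcase

pattern = "[!]a-]"
probe = "]a-bz[!"   # arbitrary sample characters

# The three characters listed after `!' are the only ones rejected.
rejected = [ch for ch in probe if not fnmatchcase(ch, pattern)]
passed = [ch for ch in probe if fnmatchcase(ch, pattern)]
```

A `!' that is not the first character after the `[' is an ordinary member of the set, which is why `!' itself passes here.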
One can remove the special meaning of `?', `*' and `[' by
preceding them by a backslash, or, in case this is part of
a shell command line, enclosing them in quotes. Between
brackets these characters stand for themselves. Thus,
`[[?*\]' matches the four characters `[', `?', `*' and
`\'.
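The following sketch illustrates both points. It assumes Python's fnmatch and glob modules. Be aware that Python's matcher does not treat a backslash as an escape character outside brackets; its glob.escape() quotes `?', `*' and `[' by enclosing each in brackets instead, which has the same effect as the backslash or shell quoting described above. The file name used is made up.

```python
import glob
from fnmatch import fnmatchcase

# Inside brackets `[', `?', `*' and `\' stand for themselves.
specials = [ch for ch in "[?*\\]x" if fnmatchcase(ch, "[[?*\\]")]

# A hypothetical file name that happens to contain wildcard characters.
literal_name = "what?[draft]*.txt"

# Unquoted, the name acts as a pattern and matches other names too.
unquoted_hits_other = fnmatchcase("whatXd.txt", literal_name)

# Quoted, it matches only itself.
quoted = glob.escape(literal_name)
quoted_hits_self = fnmatchcase(literal_name, quoted)
quoted_hits_other = fnmatchcase("whatXd.txt", quoted)
```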
PATHNAMES
Globbing is applied to each of the components of a
pathname separately. A `/' in a pathname cannot be matched by
a `?' or `*' wildcard, or by a range like `[.-0]'. A range
cannot contain an explicit `/' character; this would lead
to a syntax error.
If a filename starts with a `.', this character must be
matched explicitly. (Thus, `rm *' will not remove
.profile, and `tar c *' will not archive all your files;
`tar c .' is better.)
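The pathname rules can be observed on a real directory tree. This is a minimal sketch assuming Python's glob module, which follows the same two conventions (no wildcard matches `/', and a leading `.' must be matched explicitly); the directory layout and the helper expand() are invented for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: .profile, notes.txt and src/main.c
    os.mkdir(os.path.join(root, "src"))
    for rel in (".profile", "notes.txt", os.path.join("src", "main.c")):
        open(os.path.join(root, rel), "w").close()

    def expand(pattern):
        """Expand pattern relative to root; return sorted relative paths."""
        hits = glob.glob(os.path.join(glob.escape(root), pattern))
        return sorted(os.path.relpath(hit, root) for hit in hits)

    top_level = expand("*")         # no .profile, nothing from below src/
    dotfiles = expand(".*")         # the leading `.' is matched explicitly
    one_level_down = expand("*/*")  # each component is globbed on its own
    star_c = expand("*.c")          # `*' does not reach across the `/'
```

Unlike a shell, Python's glob does not report the `.' and `..' entries for the pattern `.*'.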
EMPTY LISTS
The nice and simple rule given above: `expand a wildcard
pattern into the list of matching pathnames' was the
original Unix definition. It allowed one to have patterns that
expand into an empty list, as in
xv -wait 0 *.gif *.jpg
where perhaps no *.gif files are present (and this is not
an error). However, POSIX requires that a wildcard pattern
is left unchanged when it is syntactically incorrect,
or the list of matching pathnames is empty. With bash one
can force the classical behaviour by setting
allow_null_glob_expansion=true (bash 1.x) or, from bash 2.0
onwards, by running `shopt -s nullglob'.
(Similar problems occur elsewhere. E.g., where old scripts
have
rm `find . -name "*~"`
new scripts require
rm -f nosuchfile `find . -name "*~"`
to avoid error messages from rm called with an empty
argument list.)
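The difference between the classical and the POSIX rule for unmatched patterns can be sketched as follows. Python's glob.glob() happens to follow the classical rule and returns an empty list; the two helper functions are hypothetical and exist only to put the behaviours side by side.

```python
import glob
import os
import tempfile

def classical_expand(pattern):
    """Expand like the original Unix glob: no match gives an empty list."""
    return sorted(glob.glob(pattern))

def posix_expand(pattern):
    """Expand like a POSIX shell: an unmatched pattern is returned as-is."""
    return sorted(glob.glob(pattern)) or [pattern]

with tempfile.TemporaryDirectory() as root:
    # One .jpg file and no .gif files, as in the xv example.
    open(os.path.join(root, "photo.jpg"), "w").close()
    gif = os.path.join(glob.escape(root), "*.gif")
    jpg = os.path.join(glob.escape(root), "*.jpg")

    classical_args = classical_expand(gif) + classical_expand(jpg)
    posix_args = posix_expand(gif) + posix_expand(jpg)
```

Under the POSIX rule the program receives the literal string ending in `*.gif' as an argument and must cope with a file of that name not existing.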
NOTES
Regular expressions
Note that wildcard patterns are not regular expressions,
although they are a bit similar. First of all, they match
filenames, rather than text, and secondly, the conventions
are not the same: e.g., in a regular expression `*' means
zero or more copies of the preceding thing.
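The different meaning of `*' is easy to demonstrate. The sketch below assumes Python's fnmatch module for the wildcard reading and its re module for the regular expression reading of the same two-character string; the sample strings are arbitrary.

```python
import re
from fnmatch import fnmatchcase

pattern = "a*"
samples = ["", "a", "aaa", "abc"]

# As a wildcard pattern: an `a' followed by any string.
as_wildcard = [s for s in samples if fnmatchcase(s, pattern)]

# As a regular expression: zero or more copies of `a', and nothing else.
as_regex = [s for s in samples if re.fullmatch(pattern, s)]
```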
Now that regular expressions have bracket expressions
where the negation is indicated by a `^', POSIX has
declared the effect of a wildcard pattern `[^...]' to be
undefined.
Character classes and Internationalization
Of course ranges were originally meant to be ASCII ranges,
so that `[ -%]' stands for `[ !"#$%]' and `[a-z]' stands
for "any lowercase letter". Some Unix implementations
generalized this so that a range X-Y stands for the set of
characters with code between the codes for X and for Y.
However, this requires the user to know the character
coding in use on the local system, and moreover, is not
convenient if the collating sequence for the local alphabet
differs from the ordering of the character codes.
Therefore, POSIX extended the bracket notation greatly, both
for wildcard patterns and for regular expressions. In the
above we saw three types of item that can occur in a
bracket expression: namely (i) the negation, (ii) explicit
single characters, and (iii) ranges. POSIX specifies
ranges in an internationally more useful way and adds
three more types:
(iii) Ranges X-Y comprise all characters that fall between
X and Y (inclusive) in the current collating sequence as
defined by the LC_COLLATE category in the current locale.
(iv) Named character classes, like
[:alnum:] [:alpha:] [:blank:] [:cntrl:]
[:digit:] [:graph:] [:lower:] [:print:]
[:punct:] [:space:] [:upper:] [:xdigit:]
so that one can say `[[:lower:]]' instead of `[a-z]', and
have things work in Denmark, too, where there are three
letters past `z' in the alphabet. These character classes
are defined by the LC_CTYPE category in the current
locale.
(v) Collating symbols, like `[.ch.]' or `[.a-acute.]',
where the string between `[.' and `.]' is a collating
element defined for the current locale. Note that this may be
a multi-character element.
(vi) Equivalence class expressions, like `[=a=]', where
the string between `[=' and `=]' is any collating element
from its equivalence class, as defined for the current
locale. For example, `[[=a=]]' might be equivalent to
`[aáàäâ]' (warning: Latin-1 here), that is, to
`[a[.a-acute.][.a-grave.][.a-umlaut.][.a-circumflex.]]'.
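The motivation for the named classes in item (iv) can be seen in a short sketch. It assumes Python's fnmatch module, which interprets ranges by character code and does not implement `[:lower:]' or the other POSIX extensions listed above; str.islower() is used here only as a rough stand-in for a locale-aware test and consults the Unicode database rather than LC_CTYPE.

```python
from fnmatch import fnmatchcase

# The three letters that follow `z' in the Danish alphabet.
danish_tail = "æøå"

# A code-based range `[a-z]' misses all three of them.
in_ascii_range = [ch for ch in danish_tail if fnmatchcase(ch, "[a-z]")]

# A class-based test for "lowercase letter" accepts all three.
are_lowercase = [ch for ch in danish_tail if ch.islower()]
```

A matcher that implements the POSIX bracket extensions, such as fnmatch(3) in a suitable locale, would accept these letters for `[[:lower:]]'.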
SEE ALSO
sh(1), glob(3), fnmatch(3), locale(7), regex(7)
Unix 12 June 1998 1