Computer Science


GLOB(7)             Linux Programmer's Manual             GLOB(7)

NAME
       glob - Globbing pathnames

DESCRIPTION
       Long  ago,  in Unix V6, there was a program /etc/glob that
       would expand  wildcard  patterns.   Soon  afterwards  this
       became a shell built-in.

       These  days  there  is also a library routine glob(3) that
       will perform this function for a user program.

       The rules are as follows (POSIX 1003.2, 3.13).

WILDCARD MATCHING
       A string is a wildcard pattern if it contains one  of  the
       characters `?', `*' or `['. Globbing is the operation that
       expands a wildcard pattern  into  the  list  of  pathnames
       matching the pattern. Matching is defined by:

       A `?' (not between brackets) matches any single character.

       A `*' (not between brackets) matches any string, including
       the empty string.

   Character classes
       An  expression `[...]' where the first character after the
       leading `[' is not an  `!'  matches  a  single  character,
       namely  any  of  the  characters enclosed by the brackets.
       The string enclosed  by  the  brackets  cannot  be  empty;
       therefore  `]'  can  be allowed between the brackets, pro-
       vided that it  is  the  first  character.  (Thus,  `[][!]'
       matches the three characters `[', `]' and `!'.)

   Ranges
       There  is one special convention: two characters separated
       by `-' denote a range.  (Thus, `[A-Fa-f0-9]' is equivalent
       to  `[ABCDEFabcdef0123456789]'.)   One  may include `-' in
       its literal meaning by making it the first or last charac-
       ter  between the brackets.  (Thus, `[]-]' matches just the
       two characters `]' and `-', and `[--/]' matches the  three
       characters `-', `.', `/'.)

   Complementation
       An  expression `[!...]' matches a single character, namely
       any character  that  is  not  matched  by  the  expression
       obtained  by  removing  the  first  `!'  from  it.  (Thus,
       `[!]a-]' matches any single character except `]', `a'  and
       `-'.)

       One  can remove the special meaning of `?', `*' and `[' by
       preceding them by a backslash, or, in case this is part of
       a  shell  command line, enclosing them in quotes.  Between
       brackets these characters  stand  for  themselves.   Thus,
       `[[?*\]'  matches  the  four  characters `[', `?', `*' and
       `\'.

PATHNAMES
       Globbing is applied on each of the components of  a  path-
       name  separately. A `/' in a pathname cannot be matched by
       a `?' or `*' wildcard, or by a range like `[.-0]'. A range
       cannot  contain an explicit `/' character; this would lead
       to a syntax error.

       If a filename starts with a `.', this  character  must  be
       matched  explicitly.   (Thus, `rm *' will not remove .pro-
       file, and `tar c *' will not archive all your files;  `tar
       c .' is better.)

EMPTY LISTS
       The  nice  and simple rule given above: `expand a wildcard
       pattern into the list of matching pathnames' was the orig-
       inal Unix definition. It allowed one to have patterns that
       expand into an empty list, as in
            xv -wait 0 *.gif *.jpg
       where perhaps no *.gif files are present (and this is  not
       an  error).   However, POSIX requires that a wildcard pat-
       tern is left unchanged when it is syntactically incorrect,
       or the list of matching pathnames is empty.  With bash one
       can   force   the   classical   behaviour    by    setting
       allow_null_glob_expansion=true.

       (Similar problems occur elsewhere. E.g., where old scripts
       have
            rm `find . -name "*~"`
       new scripts require
            rm -f nosuchfile `find . -name "*~"`
       to avoid error messages from rm called with an empty argu-
       ment list.)

NOTES
   Regular expressions
       Note  that  wildcard patterns are not regular expressions,
       although they are a bit similar. First of all, they  match
       filenames, rather than text, and secondly, the conventions
       are not the same: e.g., in a regular expression `*'  means
       zero or more copies of the preceding thing.

       Now  that  regular  expressions  have  bracket expressions
       where the negation  is  indicated  by  a  `^',  POSIX  has
       declared  the  effect of a wildcard pattern `[^...]' to be
       undefined.

   Character classes and Internationalization
       Of course ranges were originally meant to be ASCII ranges,
       so  that  `[ -%]' stands for `[ !"#$%]' and `[a-z]' stands
       for "any lowercase  letter".   Some  Unix  implementations
       generalized this so that a range X-Y stands for the set of
       characters with code between the codes for X  and  for  Y.
       However, this requires the user to know the character cod-
       ing in use on the local system, and moreover, is not  con-
       venient  if  the collating sequence for the local alphabet
       differs from the ordering of the character codes.   There-
       fore,  POSIX  extended  the bracket notation greatly, both
       for wildcard patterns and for regular expressions.  In the
       above  we  saw  three  types  of  item that can occur in a
       bracket expression: namely (i) the negation, (ii) explicit
       single  characters,  and  (iii)  ranges.  POSIX  specifies
       ranges in an internationally  more  useful  way  and  adds
       three more types:

       (iii) Ranges X-Y comprise all characters that fall between
       X and Y (inclusive) in the currect collating  sequence  as
       defined  by the LC_COLLATE category in the current locale.

       (iv) Named character classes, like
       [:alnum:]  [:alpha:]  [:blank:]  [:cntrl:]
       [:digit:]  [:graph:]  [:lower:]  [:print:]
       [:punct:]  [:space:]  [:upper:]  [:xdigit:]
       so that one can say `[[:lower:]]' instead of `[a-z]',  and
       have  things  work  in Denmark, too, where there are three
       letters past `z' in the alphabet.  These character classes
       are  defined  by  the  LC_CTYPE  category  in  the current
       locale.

       (v) Collating symbols,  like  `[.ch.]'  or  `[.a-acute.]',
       where the string between `[.' and `.]' is a collating ele-
       ment defined for the current locale. Note that this may be
       a multi-character element.

       (vi)  Equivalence  class  expressions, like `[=a=]', where
       the string between `[=' and `=]' is any collating  element
       from  its  equivalence  class,  as defined for the current
       locale. For example,  `[[=a=]]'  might  be  equivalent  to
       `[a]'   (warning:  Latin-1  here),  that  is,  to  `[a[.a-
       acute.][.a-grave.][.a-umlaut.][.a-circumflex.]]'.

SEE ALSO
       sh(1), glob(3), fnmatch(3), locale(7), regex(7)

Unix                       12 June 1998                         1

Back to the index


Apply now!


Handbook

Postgraduate study options

Computer Science Blog



Please give us your feedback or ask us a question

This message is...


My feedback or question is...


My email address is...

(Only if you need a reply)

A to Z Directory | Site map | Accessibility | Copyright | Privacy | Disclaimer | Feedback on this page