Incompatibilities with lex and POSIX

flex is a rewrite of the AT&T Unix lex tool (the two implementations
do not share any code, though), with some extensions and
incompatibilities, both of which are of concern to those who wish to
write scanners acceptable to either implementation. flex is fully
compliant with the POSIX lex specification, except that when using
`%pointer' (the default), a call to `unput()' destroys the contents of
yytext, which is counter to the POSIX specification.

In this section we discuss all of the known areas of incompatibility
between flex, AT&T lex, and the POSIX specification.

flex's `-l' option turns on maximum compatibility with the original
AT&T lex implementation, at the cost of a major loss in the generated
scanner's performance. We note below which incompatibilities can be
overcome using the `-l' option.

flex is fully compatible with lex with the following exceptions:

The lex scanner internal variable yylineno is not supported unless
`-l' or `%option yylineno' is used. yylineno should be maintained on a
per-buffer basis, rather than a per-scanner (single global variable)
basis. yylineno is not part of the POSIX specification.

The `input()' routine is not redefinable, though it may be called to
read characters following whatever has been matched by a rule. If
`input()' encounters an end-of-file, the normal `yywrap()' processing
is done. A "real" end-of-file is returned by `input()' as EOF. Input
is instead controlled by defining the YY_INPUT macro. The flex
restriction that `input()' cannot be redefined is in accordance with
the POSIX specification, which simply does not specify any way of
controlling the scanner's input other than by making an initial
assignment to yyin.
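
As a rough illustration of controlling input through YY_INPUT, the sketch below defines the macro so that it reads from an in-memory string. The macro takes a buffer, an lvalue that receives the number of characters read (0 signalling end of input), and a maximum size. The helper names `set_source' and `read_source' are hypothetical and not part of flex; in a real scanner the definitions would sit in the definitions section of the flex input file, and the generated scanner, not user code, would invoke the macro.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory input source; these names are illustrative only. */
static const char *src_text = "";
static size_t src_pos = 0;

/* Point the source at a NUL-terminated string and rewind it. */
void set_source(const char *text)
{
    assert(text != NULL);
    src_text = text;
    src_pos = 0;
}

/* Copy up to max_size bytes into buf; return the count, 0 at end of input. */
size_t read_source(char *buf, size_t max_size)
{
    size_t left = strlen(src_text) - src_pos;
    size_t n = left < max_size ? left : max_size;

    memcpy(buf, src_text + src_pos, n);
    src_pos += n;
    return n;
}

/* Sketch of a YY_INPUT definition: result receives the number of
   characters placed in buf, or 0 when no input remains. */
#define YY_INPUT(buf, result, max_size) \
    ((result) = (int) read_source((buf), (size_t) (max_size)))
```

With a definition along these lines, the scanner's input is redirected without touching `input()' itself, which stays consistent with the restriction described above.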

flex scanners are not as reentrant as lex scanners. In particular, if
you have an interactive scanner and an interrupt handler which
long-jumps out of the scanner, and the scanner is subsequently called
again, you may get the following message:

    fatal flex scanner internal error--end of buffer missed

To reenter the scanner, first use

    yyrestart( yyin );

Note that this call will throw away any buffered input; usually this
isn't a problem with an interactive scanner. Also note that flex C++
scanner classes are reentrant, so if using C++ is an option for you,
you should use them instead. See "Generating C++ Scanners" above for
details.

`output()' is not supported. Output from the `ECHO' macro is done to
the file-pointer yyout (default stdout). `output()' is not part of the
POSIX specification.

lex does not support exclusive start conditions (%x), though they are
in the POSIX specification.

When definitions are expanded, flex encloses them in parentheses. With
lex, the following:

    NAME    [A-Z][A-Z0-9]*
    %%
    foo{NAME}?      printf( "Found it\n" );
    %%

will not match the string "foo" because when the macro is expanded the
rule is equivalent to "foo[A-Z][A-Z0-9]*?" and the precedence is such
that the '?' is associated with "[A-Z0-9]*". With flex, the rule will
be expanded to "foo([A-Z][A-Z0-9]*)?" and so the string "foo" will
match.

Note that if the definition begins with `^' or ends with `$' then it
is not expanded with parentheses, to allow these operators to appear
in definitions without losing their special meanings. But the `<s>',
`/', and `<<EOF>>' operators cannot be used in a flex definition.

Using `-l' results in the lex behavior of no parentheses around the
definition.

The POSIX specification is that the definition be enclosed in
parentheses.
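
The expansion rule just described can be sketched as a small C helper. This is only an illustration of the rule as stated in the text, not flex's actual source: the function name `expand_definition' and its parameters are hypothetical, and it does not try to handle corner cases such as an escaped `\$' at the end of a definition.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write into out the text that a {NAME} reference is replaced with.
   flex_style != 0: wrap the definition in parentheses, unless it begins
                    with '^' or ends with '$' (the exception noted above).
   flex_style == 0: paste the definition in verbatim, as lex does and as
                    flex does under `-l'. */
void expand_definition(const char *def, int flex_style,
                       char *out, size_t out_size)
{
    size_t len;
    int anchored;

    assert(def != NULL && out != NULL);
    len = strlen(def);
    anchored = len > 0 && (def[0] == '^' || def[len - 1] == '$');

    if (flex_style && !anchored)
        snprintf(out, out_size, "(%s)", def);
    else
        snprintf(out, out_size, "%s", def);
}
```

Applied to the NAME definition above, the flex-style expansion yields "([A-Z][A-Z0-9]*)", so a trailing `?' in the rule applies to the whole definition, while the lex-style expansion leaves the `?' bound to the final "[A-Z0-9]*" only.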

Some implementations of lex allow a rule's action to begin on a
separate line, if the rule's pattern has trailing whitespace:

    %%
    foo|bar<space here>
      { foobar_action(); }

flex does not support this feature.

The lex `%r' (generate a Ratfor scanner) option is not supported. It
is not part of the POSIX specification.

After a call to `unput()', yytext is undefined until the next token is
matched, unless the scanner was built using `%array'. This is not the
case with lex or the POSIX specification. The `-l' option does away
with this incompatibility.

The precedence of the `{}' (numeric range) operator is different. lex
interprets "abc{1,3}" as "match one, two, or three occurrences of
'abc'", whereas flex interprets it as "match 'ab' followed by one,
two, or three occurrences of 'c'". The latter is in agreement with the
POSIX specification.

The precedence of the `^' operator is different. lex interprets
"^foo|bar" as "match either 'foo' at the beginning of a line, or 'bar'
anywhere", whereas flex interprets it as "match either 'foo' or 'bar'
if they come at the beginning of a line". The latter is in agreement
with the POSIX specification.
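
To make the two precedence differences concrete, the sketch below spells out each reading with explicit parentheses and tests it with the POSIX regcomp()/regexec() functions. This is only a stand-in: neither lex nor flex uses <regex.h>, and the helper name `ere_found' and the four pattern constants are hypothetical. The range patterns are anchored with `^' and `$' so that only whole-string matches count.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if the POSIX extended regular expression `pattern` matches
   somewhere in `text`; return 0 if it does not match or fails to compile. */
int ere_found(const char *pattern, const char *text)
{
    regex_t re;
    int found;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    found = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}

/* "abc{1,3}": the range applies to 'abc' (lex) or only to 'c' (flex). */
const char *const LEX_RANGE  = "^(abc){1,3}$";
const char *const FLEX_RANGE = "^ab(c){1,3}$";

/* "^foo|bar": '^' binds to 'foo' alone (lex) or to the whole
   alternation (flex). */
const char *const LEX_CARET  = "(^foo)|(bar)";
const char *const FLEX_CARET = "^(foo|bar)";
```

Under these spellings, "abcabc" is accepted only by the lex reading of the range and "abcc" only by the flex reading, while the input "x bar" is accepted by the lex reading of the caret pattern but not by the flex reading.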
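
The special table-size declarations such as `%a' supported by lex are
not required by flex scanners; flex ignores them.

The name FLEX_SCANNER is #define'd so scanners may be written for use
with either flex or lex. Scanners also include YY_FLEX_MAJOR_VERSION
and YY_FLEX_MINOR_VERSION indicating which version of flex generated
the scanner (for example, for the 2.5 release, these defines would be
2 and 5 respectively).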
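
One way these defines might be used is sketched below. flex-generated scanners define FLEX_SCANNER, so user code in the scanner source can branch on it with the preprocessor. The function name `describe_scanner' and the SCANNER_GENERATOR macro are hypothetical, and treating the absence of FLEX_SCANNER as "lex" is an assumption made for the example.

```c
#include <assert.h>
#include <stdio.h>

/* FLEX_SCANNER is defined in flex-generated scanners; assume lex otherwise. */
#ifdef FLEX_SCANNER
#define SCANNER_GENERATOR "flex"
#else
#define SCANNER_GENERATOR "lex"
#endif

/* Write a short description of the generating tool into out, including
   the flex version numbers when they are available. */
void describe_scanner(char *out, size_t out_size)
{
    assert(out != NULL);
#if defined(FLEX_SCANNER) && defined(YY_FLEX_MAJOR_VERSION) \
    && defined(YY_FLEX_MINOR_VERSION)
    snprintf(out, out_size, "%s %d.%d", SCANNER_GENERATOR,
             YY_FLEX_MAJOR_VERSION, YY_FLEX_MINOR_VERSION);
#else
    snprintf(out, out_size, "%s", SCANNER_GENERATOR);
#endif
}
```

Compiled outside a flex-generated scanner, none of the macros is defined and the function reports "lex"; placed in the user code section of a flex input file, it would report "flex" followed by the version numbers.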

The following flex features are not included in lex or the POSIX
specification:

    C++ scanners
    %option
    start condition scopes
    start condition stacks
    interactive/non-interactive scanners
    yy_scan_string() and friends
    yyterminate()
    yy_set_interactive()
    yy_set_bol()
    YY_AT_BOL()
    <<EOF>>
    <*>
    YY_DECL
    YY_START
    YY_USER_ACTION
    YY_USER_INIT
    #line directives
    %{}'s around actions
    multiple actions on a line

plus almost all of the flex flags. The last feature in the list refers
to the fact that with flex you can put multiple actions on the same
line, separated with semicolons, while with lex, the following

    foo    handle_foo(); ++num_foos_seen;

is (rather surprisingly) truncated to

    foo    handle_foo();

flex does not truncate the action. Actions that are not enclosed in
braces are simply terminated at the end of the line.