The Bison parser is actually a C function named yyparse. Here we describe the interface conventions of yyparse and the other functions that it needs to use.
Keep in mind that the parser uses many C identifiers starting with `yy' and `YY' for internal purposes. If you use such an identifier (aside from those in this manual) in an action or in additional C code in the grammar file, you are likely to run into trouble.
The Parser Function yyparse
You call the function yyparse to cause parsing to occur. This function reads tokens, executes actions, and ultimately returns when it encounters end-of-input or an unrecoverable syntax error. You can also write an action which directs yyparse to return immediately without reading further.
The value returned by yyparse is 0 if parsing was successful (return is due to end-of-input).
The value is 1 if parsing failed (return is due to a syntax error).
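In an action, you can cause immediate return from yyparse by using these macros:
YYACCEPT
    Return immediately with value 0 (to report success).
YYABORT
    Return immediately with value 1 (to report failure).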
The Lexical Analyzer Function yylex
The lexical analyzer function, yylex, recognizes tokens from the input stream and returns them to the parser. Bison does not create this function automatically; you must write it so that yyparse can call it. The function is sometimes referred to as a lexical scanner.
In simple programs, yylex is often defined at the end of the Bison grammar file. If yylex is defined in a separate source file, you need to arrange for the token-type macro definitions to be available there. To do this, use the `-d' option when you run Bison, so that it will write these macro definitions into a separate header file `name.tab.h' which you can include in the other source files that need it. See section Invoking Bison.
Calling Convention for yylex
The value that yylex returns must be the numeric code for the type of token it has just found, or 0 for end-of-input.
When a token is referred to in the grammar rules by a name, that name in the parser file becomes a C macro whose definition is the proper numeric code for that token type. So yylex can use the name to indicate that type. See section Symbols, Terminal and Nonterminal.
When a token is referred to in the grammar rules by a character literal, the numeric code for that character is also the code for the token type. So yylex can simply return that character code. The null character must not be used this way, because its code is zero and that is what signifies end-of-input.
Here is an example showing these things:
int
yylex (void)
{
  ...
  if (c == EOF)     /* Detect end of file.  */
    return 0;
  ...
  if (c == '+' || c == '-')
    return c;       /* Assume token type for `+' is '+'.  */
  ...
  return INT;       /* Return the type of the token.  */
  ...
}
This interface has been designed so that the output from the lex utility can be used without change as the definition of yylex.
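The return conventions described above can be sketched as a small hand-written scanner. This is only an illustration under stated assumptions: the value given to INT, the set_input helper and the string-based input source are hypothetical stand-ins, and in a real parser file Bison supplies the INT macro and the declaration of yylval.

```c
#include <ctype.h>

/* Hypothetical stand-ins: in a real project the INT macro and the
   declaration of yylval come from the Bison-generated parser file.  */
enum { INT = 258 };
int yylval;

/* Hypothetical input source: a NUL-terminated string chosen by the caller.  */
static const char *input_cursor = "";

void
set_input (const char *text)
{
  input_cursor = text;
}

int
yylex (void)
{
  /* Skip blanks between tokens.  */
  while (*input_cursor == ' ' || *input_cursor == '\t')
    input_cursor++;

  if (*input_cursor == '\0')
    return 0;                     /* 0 signifies end-of-input.  */

  if (isdigit ((unsigned char) *input_cursor))
    {
      int value = 0;
      while (isdigit ((unsigned char) *input_cursor))
        value = value * 10 + (*input_cursor++ - '0');
      yylval = value;             /* Semantic value of the token.  */
      return INT;                 /* Named token type.  */
    }

  /* Any other character is returned as its own token type code.  */
  return (unsigned char) *input_cursor++;
}
```

After calling set_input with the text "12 + 3", successive calls to this yylex would return INT, then the character code for `+', then INT again, and finally 0.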
If the grammar uses literal string tokens, there are two ways that yylex can determine the token type codes for them:
* If the grammar defines symbolic token names as aliases for the literal string tokens, yylex can use these symbolic names like all others. In this case, the use of the literal string tokens in the grammar file has no effect on yylex.
* yylex can find the multi-character token in the yytname table. The index of the token in the table is the token type's code. The name of a multi-character token is recorded in yytname with a double-quote, the token's characters, and another double-quote. The token's characters are not escaped in any way; they appear verbatim in the contents of the string in the table.
Here's code for looking up a token in yytname, assuming that the characters of the token are stored in token_buffer.
for (i = 0; i < YYNTOKENS; i++)
  {
    if (yytname[i] != 0
        && yytname[i][0] == '"'
        /* strncmp returns 0 when the characters match.  */
        && strncmp (yytname[i] + 1, token_buffer, strlen (token_buffer)) == 0
        && yytname[i][strlen (token_buffer) + 1] == '"'
        && yytname[i][strlen (token_buffer) + 2] == 0)
      break;
  }
The yytname table is generated only if you use the %token_table declaration. See section Bison Declaration Summary.
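To make the lookup easier to try out in isolation, here is a minimal sketch that wraps the same test in a function. The table contents, the value of YYNTOKENS and the name find_string_token are all invented for illustration; a real yytname is written by Bison and will have different entries.

```c
#include <string.h>

/* Hypothetical stand-in for the table Bison writes when %token_table is
   used.  The entries and their order are invented for this sketch.  */
#define YYNTOKENS 5
static const char *const yytname[] =
  { "$", "error", "INT", "\"<=\"", "\"<\"", 0 };

/* Return the index in yytname of the literal string token whose
   characters are TEXT, or -1 if no entry matches.  */
int
find_string_token (const char *text)
{
  size_t len = strlen (text);
  int i;

  for (i = 0; i < YYNTOKENS; i++)
    {
      const char *name = yytname[i];
      if (name != 0
          && name[0] == '"'                       /* Opening double-quote.  */
          && strncmp (name + 1, text, len) == 0   /* Characters match.  */
          && name[len + 1] == '"'                 /* Closing double-quote.  */
          && name[len + 2] == '\0')               /* Nothing follows it.  */
        return i;
    }
  return -1;
}
```

The two checks after strncmp matter: without them, looking up `<' could stop at the entry for `<=', which merely starts with the same character.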
In an ordinary (nonreentrant) parser, the semantic value of the token must be stored into the global variable yylval. When you are using just one data type for semantic values, yylval has that type. Thus, if the type is int (the default), you might write this in yylex:
  ...
  yylval = value;  /* Put value onto Bison stack.  */
  return INT;      /* Return the type of the token.  */
  ...
When you are using multiple data types, yylval's type is a union made from the %union declaration (see section The Collection of Value Types). So when you store a token's value, you must use the proper member of the union. If the %union declaration looks like this:
%union {
  int intval;
  double val;
  symrec *tptr;
}
then the code in yylex might look like this:
  ...
  yylval.intval = value;  /* Put value onto Bison stack.  */
  return INT;             /* Return the type of the token.  */
  ...
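As a rough sketch of choosing the proper union member, the helper below scans one number and stores it in either intval or val. Everything here is an assumption made for illustration: the helper name scan_number, the second token name NUM, the numeric token codes, and a YYSTYPE that omits the tptr member so that the example does not need a symrec definition.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical stand-ins for what Bison would generate from a %union
   declaration and two token declarations.  */
typedef union { int intval; double val; } YYSTYPE;
enum { INT = 258, NUM = 259 };
YYSTYPE yylval;

/* Scan one number starting at *CURSOR and advance *CURSOR past it.
   Return INT or NUM, or -1 if no number starts there.  */
int
scan_number (const char **cursor)
{
  const char *start = *cursor;
  const char *p = start;
  char *end;

  while (isdigit ((unsigned char) *p))
    p++;

  if (*p == '.')
    {
      double d = strtod (start, &end);
      if (end == start)
        return -1;                /* A lone `.' is not a number.  */
      yylval.val = d;             /* Real value: use the double member.  */
      *cursor = end;
      return NUM;
    }

  if (p == start)
    return -1;                    /* No digits at all.  */

  yylval.intval = (int) strtol (start, &end, 10);  /* Integer member.  */
  *cursor = end;
  return INT;
}
```

The token type returned and the member written always go together, which is what lets the grammar's type declarations pick the matching member when an action reads the value.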
If you are using the `@n' feature (see section Special Features for Use in Actions) in actions to keep track of the textual locations of tokens and groupings, then you must provide this information in yylex. The function yyparse expects to find the textual location of a token just parsed in the global variable yylloc. So yylex must store the proper data in that variable. The value of yylloc is a structure and you need only initialize the members that are going to be used by the actions. The four members are called first_line, first_column, last_line and last_column. Note that the use of this feature makes the parser noticeably slower.
The data type of yylloc has the name YYLTYPE.
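One way a scanner might fill in yylloc is sketched below. The YYLTYPE definition, the WORD token, the set_input helper and the counting convention (lines and columns start at 1, and last_column names the final character of the token) are assumptions of this sketch, not something the manual prescribes.

```c
/* Hypothetical stand-in for the location type that the parser file
   would normally define.  */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;
YYLTYPE yylloc;

enum { WORD = 258 };            /* Hypothetical token type code.  */

static const char *cursor = "";
static int line = 1, column = 1;

void
set_input (const char *text)
{
  cursor = text;
  line = 1;
  column = 1;
}

int
yylex (void)
{
  /* Skip white space, keeping the line and column counters current.  */
  while (*cursor == ' ' || *cursor == '\n')
    {
      if (*cursor == '\n')
        {
          line++;
          column = 1;
        }
      else
        column++;
      cursor++;
    }

  if (*cursor == '\0')
    return 0;

  /* Record where the token starts, consume it, then record its end.  */
  yylloc.first_line = line;
  yylloc.first_column = column;
  while (*cursor != '\0' && *cursor != ' ' && *cursor != '\n')
    {
      cursor++;
      column++;
    }
  yylloc.last_line = line;
  yylloc.last_column = column - 1;
  return WORD;
}
```

If the actions only ever read first_line, the assignments to the other three members could be dropped.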
When you use the Bison declaration %pure_parser to request a pure, reentrant parser, the global communication variables yylval and yylloc cannot be used. (See section A Pure (Reentrant) Parser.) In such parsers the two global variables are replaced by pointers passed as arguments to yylex. You must declare them as shown here, and pass the information back by storing it through those pointers.
int
yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
{
  ...
  *lvalp = value;  /* Put value onto Bison stack.  */
  return INT;      /* Return the type of the token.  */
  ...
}
If the grammar file does not use the `@' constructs to refer to textual positions, then the type YYLTYPE will not be defined. In this case, omit the second argument; yylex will be called with only one argument.
If you use a reentrant parser, you can optionally pass additional parameter information to it in a reentrant way. To do so, define the macro YYPARSE_PARAM as a variable name. This modifies the yyparse function to accept one argument, of type void *, with that name.
When you call yyparse, pass the address of an object, casting the address to void *. The grammar actions can refer to the contents of the object by casting the pointer value back to its proper type and then dereferencing it. Here's an example. Write this in the parser:
%{
struct parser_control
{
  int nastiness;
  int randomness;
};

#define YYPARSE_PARAM parm
%}
Then call the parser like this:
struct parser_control
{
  int nastiness;
  int randomness;
};

...

{
  struct parser_control foo;
  ...  /* Store proper data in foo.  */
  value = yyparse ((void *) &foo);
  ...
}
In the grammar actions, use expressions like this to refer to the data:
((struct parser_control *) parm)->randomness
If you wish to pass the additional parameter data to yylex, define the macro YYLEX_PARAM just like YYPARSE_PARAM, as shown here:
%{
struct parser_control
{
  int nastiness;
  int randomness;
};

#define YYPARSE_PARAM parm
#define YYLEX_PARAM parm
%}
You should then define yylex to accept one additional argument, the value of parm. (This makes either two or three arguments in total, depending on whether an argument of type YYLTYPE is passed.) You can declare the argument as a pointer to the proper object type, or you can declare it as void * and access the contents as shown above.
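A two-argument yylex of this kind might look like the sketch below, which assumes that the grammar does not use locations (so no YYLTYPE argument is passed) and that the semantic value type is int. The struct lexer_state type and the numeric value of INT are hypothetical; the point is only that all scanner state is reached through the extra argument rather than through global variables.

```c
#include <ctype.h>

/* Hypothetical stand-ins for definitions from the parser file.  */
typedef int YYSTYPE;
enum { INT = 258 };

/* Hypothetical per-parse state that the caller would hand to yyparse
   and that reaches yylex as the extra argument.  */
struct lexer_state
{
  const char *cursor;           /* Next unread character.  */
};

int
yylex (YYSTYPE *lvalp, void *parm)
{
  /* Cast the pointer back to its proper type before using it.  */
  struct lexer_state *state = (struct lexer_state *) parm;

  while (*state->cursor == ' ')
    state->cursor++;

  if (*state->cursor == '\0')
    return 0;

  if (isdigit ((unsigned char) *state->cursor))
    {
      int value = 0;
      while (isdigit ((unsigned char) *state->cursor))
        value = value * 10 + (*state->cursor++ - '0');
      *lvalp = value;           /* Store through the pointer, not a global.  */
      return INT;
    }

  return (unsigned char) *state->cursor++;
}
```

Because nothing is kept in static storage, calls made on behalf of two different lexer_state objects can be interleaved without disturbing each other.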
You can use `%pure_parser' to request a reentrant parser without also using YYPARSE_PARAM. Then you should call yyparse with no arguments, as usual.
The Error Reporting Function yyerror
The Bison parser detects a parse error or syntax error whenever it reads a token which cannot satisfy any syntax rule. An action in the grammar can also explicitly proclaim an error, using the macro YYERROR (see section Special Features for Use in Actions).
The Bison parser expects to report the error by calling an error reporting function named yyerror, which you must supply. It is called by yyparse whenever a syntax error is found, and it receives one argument. For a parse error, the string is normally "parse error".
If you define the macro YYERROR_VERBOSE in the Bison declarations section (see section The Bison Declarations Section), then Bison provides a more verbose and specific error message string instead of just plain "parse error". It doesn't matter what definition you use for YYERROR_VERBOSE, just whether you define it.
The parser can detect one other kind of error: stack overflow. This happens when the input contains constructions that are very deeply nested. It isn't likely you will encounter this, since the Bison parser extends its stack automatically up to a very large limit. But if overflow happens, yyparse calls yyerror in the usual fashion, except that the argument string is "parser stack overflow".
The following definition suffices in simple programs:
int
yyerror (char *s)
{
  fprintf (stderr, "%s\n", s);
  return 0;
}
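A slightly more informative variant is sketched here. It assumes a hypothetical line counter, current_line, that the lexical analyzer keeps up to date, and it keeps its own tally of calls in reported_errors; neither name comes from Bison. The parameter is declared const char * in this sketch so that string literals can be passed without warnings.

```c
#include <stdio.h>

/* Hypothetical line counter that yylex would keep up to date.  */
int current_line = 1;

/* Hypothetical tally of how many times yyerror has been called.  */
int reported_errors = 0;

int
yyerror (const char *s)
{
  reported_errors++;
  /* Prefix the message from the parser with the current input line.  */
  fprintf (stderr, "line %d: %s\n", current_line, s);
  return 0;
}
```

The tally counts every call, including any that the grammar actions make explicitly, so it is not necessarily equal to the number of syntax errors that the parser itself has detected.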
After yyerror returns to yyparse, the latter will attempt error recovery if you have written suitable error recovery grammar rules (see section Error Recovery). If recovery is impossible, yyparse will immediately return 1.
The variable yynerrs contains the number of syntax errors encountered so far. Normally this variable is global; but if you request a pure parser (see section A Pure (Reentrant) Parser) then it is a local variable which only the actions can access.
Here is a table of Bison constructs, variables and macros that are useful in actions.
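`$<typealt>$'
    Like $$ but specifies alternative typealt in the union specified by the %union declaration. See section Data Types of Values in Actions.
`$<typealt>n'
    Like $n but specifies alternative typealt in the union specified by the %union declaration. See section Data Types of Values in Actions.
`YYABORT;'
    Return immediately from yyparse, indicating failure. See section The Parser Function yyparse.
`YYACCEPT;'
    Return immediately from yyparse, indicating success. See section The Parser Function yyparse.
`YYEMPTY'
    Value stored in yychar when there is no look-ahead token.
`YYERROR;'
    Cause an immediate syntax error. This statement initiates error recovery just as if the parser itself had detected an error; however, it does not call yyerror, and does not print any message. If you want to print an error message, call yyerror explicitly before the `YYERROR;' statement. See section Error Recovery.
`yychar'
    Variable containing the current look-ahead token. (In a pure parser, this is actually a local variable within yyparse.) When there is no look-ahead token, the value YYEMPTY is stored in the variable. See section Look-Ahead Tokens.
`@n'
    Acts like a structure variable containing information on the line numbers and column numbers of the nth component of the current rule. The structure has four members, like this:
    struct
    {
      int first_line, last_line;
      int first_column, last_column;
    };
    Thus, to get the starting line number of the third component, use `@3.first_line'. In order for the members of this structure to contain valid information, you must make yylex supply this information about each token. If you need only certain members, then yylex need only fill in those members. The use of this feature makes the parser noticeably slower.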