yacc(1)
NAME
yacc - Generates an LR(1) parsing program from input
consisting of a context-free grammar specification
SYNOPSIS
yacc [-vltds] [-b prefix] [-N number] [-p symbol_prefix]
[-P pathname] grammar
The yacc command converts a context-free grammar specifi-
cation into a set of tables for a simple automaton that
executes an LR(1) parsing algorithm.
STANDARDS
Interfaces documented on this reference page conform to
industry standards as follows:
yacc: XPG4, XPG4-UNIX
Refer to the standards(5) reference page for more informa-
tion about industry standards and associated tags.
FLAGS
-b prefix
    Uses prefix instead of y as the prefix for all output
    filenames (prefix.tab.c, prefix.tab.h, and
    prefix.output).
-d
    Produces the y.tab.h file, which contains the #define
    statements that associate the yacc-assigned token codes
    with your token names. This allows source files other
    than y.tab.c to access the token codes by including
    this header file.
-l
    Includes no #line constructs in y.tab.c. Use this only
    after the grammar and associated actions are fully
    debugged.
-N number
    [Digital] Provides yacc with extra storage for building
    its LALR tables, which may be necessary when compiling
    very large grammars. The number should be larger than
    40,000 when you use this flag.
-p symbol_prefix
    Allows multiple yacc parsers to be linked together. Use
    symbol_prefix instead of yy to prefix global symbols.
-P pathname
    [Digital] Specifies an alternative parser (instead of
    /usr/ccs/lib/yaccpar). The pathname specifies the
    filename of the skeleton to be used in place of
    yaccpar.
-s
    [Digital] Breaks the yyparse() function into several
    smaller functions. Because its size is somewhat
    proportional to that of the grammar, it is possible for
    yyparse() to become too large to compile, optimize, or
    execute efficiently.
-t
    Compiles run-time debugging code. By default, this code
    is not included when y.tab.c is compiled. If YYDEBUG
    has a nonzero value, the C compiler (cc) includes the
    debugging code, whether or not the -t flag was used.
    Without compiling this code, yyparse() will run more
    quickly.
-v
    Produces the y.output file, which contains a readable
    description of the parsing tables and a report on
    conflicts generated by grammar ambiguities.
PARAMETERS
grammar
    The grammar file to process. See Syntax for yacc Input
    under the DESCRIPTION.
DESCRIPTION
The yacc grammar can be ambiguous; specified precedence
rules are used to break ambiguities.
You must compile the y.tab.c output file with a C language
compiler to produce the yyparse() function. This function
must be loaded with a yylex lexical analyzer function, as
well as main() and yyerror(), an error-handling routine
(you must provide these routines). The lex command is
useful for creating lexical analyzers usable by yacc.
The yacc program reads its skeleton parser from the file
/usr/ccs/lib/yaccpar. Use the environment variable YACCPAR
to specify another location for the yacc program to read
from. If you use this environment variable, the -P option
is ignored, if specified.
Syntax for yacc Input
This section contains a formal description of the yacc
input file (or grammar file), which is normally named with
a .y suffix. The section provides a listing of the spe-
cial values, macros, and functions recognized by yacc.
The general format of the yacc input file is:

    [ definitions ]
    %%
    [ rules ]
    [ %%
    [ user functions ] ]

where:

definitions
    Is the section where you define the variables to be
    used later in the grammar, such as in the rules
    section. It is also where files are included (#include)
    and processing conditions are defined. This section is
    optional.
rules
    Is the section that contains grammar rules for the
    parser. A yacc input file must have a rules section.
user functions
    Is the section that contains user-supplied functions
    that can be used by the actions in the rules section.
    This section is optional.
The NULL character must not be used in grammar rules or
literals. Each line in the definitions section can be:

%{
%}
    When placed on lines by themselves, these enclose C
    code to be passed into the global definitions of the
    output file. Such lines commonly include preprocessor
    directives and declarations of external variables and
    functions.
%token [<type>] symbol [number] ...
    Lists tokens or terminal symbols to be used in the rest
    of the input file. This line is needed for tokens that
    do not appear in other % definitions. If type is
    present, the C type for all tokens on this line is
    declared to be the type referenced by type. If a
    positive integer number follows a token, that value is
    assigned to the token.
%left token ...
    Indicates that each token is an operator, that all
    tokens in this definition have equal precedence, and
    that a succession of the operators listed in this
    definition are evaluated left to right.
%right token ...
    Indicates that each token is an operator, that all
    tokens in this definition have equal precedence, and
    that a succession of the operators listed in this
    definition are evaluated right to left.
%nonassoc token ...
    Indicates that each token is an operator, and that the
    operators listed in this definition cannot appear in
    succession; that is, the token cannot be used
    associatively.
%start symbol
    Indicates the highest-level production rule to be
    reduced; in other words, the rule where the parser can
    consider its work done and terminate. If this
    definition is not included, the parser uses the first
    production rule. The symbol must be non-terminal (not a
    token).
%type <type> symbol ...
    Defines each symbol as data type type, to resolve
    ambiguities. If this construct is present, yacc
    performs type checking; otherwise, it assumes all
    symbols to be of type integer.
%union union-def
    Defines the yylval global variable as a union, where
    union-def is a standard C definition in the format:

        { type member ; [ type member ; ... ] }

    At least one member should be an int. Any valid C data
    type can be defined, including structures. When you run
    yacc with the -d option, the definition of yylval is
    placed in the y.tab.h file and can be referred to in a
    lex input file.
Every token (terminal symbol) must be listed in one of the
preceding % definitions. Multiple tokens can be separated
by white space or commas. All the tokens in %left, %right,
and %nonassoc definitions are assigned a precedence, with
tokens in later definitions having precedence over those
in earlier definitions.
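
The way declared precedence and associativity settle a
choice between shifting and reducing can be sketched in C.
This is a minimal illustration of the conventional rule,
not code taken from yacc itself; the names resolve, assoc,
and action are hypothetical, and a larger precedence number
stands for a token declared in a later definition.

```c
/* Associativity declared by %left, %right, or %nonassoc. */
enum assoc { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONE };

/* Possible outcomes when a shift/reduce choice is resolved. */
enum action { ACT_SHIFT, ACT_REDUCE, ACT_ERROR };

/*
 * Hypothetical helper: decide between reducing a rule and shifting
 * the lookahead token.  A higher number means the token was declared
 * in a later %left, %right, or %nonassoc definition.
 */
enum action resolve(int rule_prec, int token_prec, enum assoc token_assoc)
{
    if (rule_prec > token_prec)
        return ACT_REDUCE;          /* the rule binds more tightly */
    if (rule_prec < token_prec)
        return ACT_SHIFT;           /* the lookahead binds more tightly */

    switch (token_assoc) {          /* equal precedence: associativity decides */
    case ASSOC_LEFT:
        return ACT_REDUCE;          /* group left to right */
    case ASSOC_RIGHT:
        return ACT_SHIFT;           /* group right to left */
    default:
        return ACT_ERROR;           /* %nonassoc: operators may not chain */
    }
}
```

For example, with '+' at level 1 and '*' at level 2, the
input 1 + 2 * 3 gives resolve(1, 2, ASSOC_LEFT), a shift,
so the multiplication is grouped first.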
In addition to symbols, a token can be a literal character
enclosed in single quotes. (Multibyte characters are
recognized by the lexical analyzer and returned as tokens.)
The following special characters can be used, just as in C
programs:

\a      Alert
\n      Newline
\t      Tab
\v      Vertical tab
\r      Carriage Return
\b      Backspace
\f      Form Feed
\\      Backslash
\'      Single Quote
\?      Question mark
\nnn    One or more octal digits specifying the integer
        value of the character
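
Because a literal token is identified by the numeric value
of its character, the escapes listed above map to token
numbers exactly as C character constants do. The following
sketch shows the idea; literal_token is a hypothetical name
used only for illustration.

```c
/*
 * Hypothetical helper: return the token number a lexical analyzer
 * would report for a single-character literal such as '+' or '\n'.
 * The token number is simply the character's own numeric value in
 * the current character set.
 */
int literal_token(char c)
{
    return (unsigned char)c;    /* avoid negative values for 8-bit characters */
}
```

In an ASCII character set, literal_token('\n') is 10 and
literal_token('\101') is the same value as 'A'.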
The rules section consists of a series of production rules
that the parser tries to reduce. The format of each pro-
duction rule is:
symbol : symbol-sequence [ action ] [ | symbol-sequence [ action
] ... ] ;
where symbol-sequence consists of zero or more symbols
separated by white space. The first symbol must be the
first character of the line, but newlines and other white
space can appear anywhere else in the rule. All terminal
symbols must be declared in %token definitions.
A symbol can be used recursively in its own rule. Always
use left recursion (where the recursive symbol appears
before the terminating case in symbol-sequence).
The specific sequence: %prec token indicates that the cur-
rent sequence of symbols is to be preferred over others,
at the level of precedence assigned to token in the defi-
nitions section.
The specially defined token error matches any unrecognized
sequence of input. This token causes the parser to invoke
the yyerror function. By default, the parser tries to
synchronize with the input and continue processing it by
reading and discarding all input up to the symbol follow-
ing error. (You can override this behavior through the
yyerrok action.) If no error token appears in the yacc
input file, the parser exits with an error message upon
encountering unrecognized input.
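
The discard step of the default recovery can be pictured
with a small C sketch. It only shows tokens being thrown
away until a synchronization token is seen; a real parser
also has to unwind its own state, which is omitted here.
The function skip_to_sync and its parameters are
hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper: starting at tokens[pos], discard tokens until
 * the synchronization token sync has been consumed.  Returns the
 * index of the first token after sync, or n if sync never appears.
 */
size_t skip_to_sync(const int *tokens, size_t n, size_t pos, int sync)
{
    while (pos < n && tokens[pos] != sync)
        pos++;                      /* discard an unrecognized token */
    return pos < n ? pos + 1 : n;   /* step past sync if it was found */
}
```

With a rule such as list error '\n', the synchronization
token would be '\n', so parsing resumes at the start of the
next input line.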
The parser always executes action after encountering the
symbol that precedes it. Thus, an action can appear in
the middle of a symbol-sequence, after each symbol-
sequence, or after multiple instances of symbol-sequence.
In the last case, action is executed when the parser
matches any of the sequences.
The action consists of standard C code within braces and
can also take the following values, variables, and
keywords.

yylval
    If the token returned by the yylex function is
    associated with a significant value, yylex should place
    the value in this global variable. By default, yylval
    is of type long. The definitions section can include a
    %union definition to associate yylval with other data
    types, including structures. If you run yacc with the
    -d option, the full yylval definition is passed into
    the y.tab.h file for access by lex.
yyerrok
    Causes the parser to start parsing tokens immediately
    after an erroneous sequence, instead of performing the
    default action of reading and discarding tokens up to a
    synchronization token. The yyerrok action should appear
    immediately after the error token.
$[<type>]n
    Refers to symbol n, a token index in the production,
    counting from the beginning of the production rule,
    where the first symbol after the colon is $1. The type
    variable is the name of one of the union members listed
    in the %union directive in the declaration section. The
    <type> syntax (nonstandard) allows the value to be cast
    to a specific data type. Note that you will rarely need
    to use the type syntax.
$[<type>]$
    Refers to the value returned by the matched
    symbol-sequence and used for the matched symbol when
    reducing other rules. The symbol-sequence generally
    assigns a value to $$. The type variable is the name of
    one of the union members listed in the %union directive
    in the declaration section. The <type> syntax
    (nonstandard) allows the value to be cast to a specific
    data type. Note that you will rarely need to use the
    type syntax.
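
One way to picture $$ and $n is as entries on a stack of
semantic values that grows as symbols are recognized. The
following C sketch assumes such a stack; value_t, vstack,
push_value, and reduce_add are hypothetical names, and the
code is an illustration rather than the parser that yacc
generates.

```c
#include <assert.h>

/* Hypothetical semantic value type, as a %union might declare it. */
typedef union {
    int   ival;   /* numeric value of an expression */
    char *sval;   /* text of an identifier */
} value_t;

#define STACK_MAX 64

static value_t vstack[STACK_MAX];  /* one value per recognized symbol */
static int     vtop = 0;           /* number of values currently stacked */

/* Recognizing a token pushes the value the lexer left in yylval. */
void push_value(value_t v)
{
    assert(vtop < STACK_MAX);
    vstack[vtop++] = v;
}

/*
 * Reduce by "expr : expr '+' expr" with the action { $$ = $1 + $3; }.
 * $1, $2, and $3 are the top three stacked values; $$ replaces them.
 */
void reduce_add(void)
{
    value_t result;

    assert(vtop >= 3);
    result.ival = vstack[vtop - 3].ival + vstack[vtop - 1].ival; /* $1 + $3 */
    vtop -= 3;                      /* pop $1, $2, and $3 */
    vstack[vtop++] = result;        /* push $$ */
}
```

Pushing the values for 2, '+', and 3 and then calling
reduce_add() leaves a single value, 5, on the stack.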
The user functions section contains user-supplied pro-
grams. If you supply a lexical analyzer (yylex) to the
parser, it must be contained in the user functions sec-
tion.
The following functions, which are contained in the user
functions section, are invoked within the yyparse function
generated by yacc.

yylex()
    The lexical analyzer called by yyparse to recognize
    each token of input. Usually this function is created
    by lex. yylex reads input, recognizes expressions
    within the input, and returns a token number
    representing the kind of token read. The function
    returns an int value. A return value of 0 (zero) means
    the end of input.

    If the parser and yylex do not agree on these token
    numbers, reliable communication between them cannot
    occur. For (one character) literals, the token is
    simply the numeric value of the character in the
    current character set. The numbers for other tokens
    can be chosen either by yacc or by the user. In either
    case, the #define construct of C is used to allow
    yylex() to return these numbers symbolically. The
    #define statements are put into the code file, and the
    header file if that file is requested. The set of
    characters permitted by yacc in an identifier is larger
    than that permitted by C. Token names found to contain
    such characters will not be included in the #define
    declarations.

    If the token numbers are chosen by yacc, the tokens
    other than literals are assigned numbers greater than
    256, although no order is implied. A token can be
    explicitly assigned a number by following its first
    appearance in the declaration section with a number.
    Names and literals not defined this way retain their
    default definition. All assigned token numbers are
    unique and distinct from the token numbers used for
    literals. If duplicate token numbers cause conflicts
    in parser generation, yacc reports an error;
    otherwise, it is unspecified whether the token
    assignment is accepted or an error is reported.

    The end of the input is marked by a special token
    called the endmarker that has a token number that is
    zero or negative. All lexical analyzers return zero or
    negative as a token number upon reaching the end of
    their input. If the tokens up to, but not including,
    the endmarker form a structure that matches the start
    symbol, the parser accepts the input. If the endmarker
    is seen in any other context, it is considered an
    error.

yyerror(string)
    The function that the parser calls upon encountering
    an input error. The default function, defined in
    liby.a, simply prints string to the standard error.
    The user can redefine the function. The function's
    type is void.
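
The yylex conventions described above can be illustrated
with a hand-written analyzer. This is a minimal sketch, not
the output of lex: the token codes NUMBER and NAME, their
values, and the set_input function are assumptions made for
the example. In practice yacc chooses the codes and places
them in y.tab.h.

```c
#include <ctype.h>

/* Hypothetical token codes; yacc would normally supply these. */
#define NUMBER 257
#define NAME   258

long yylval;                         /* value of the most recent token */
static const char *input = "";       /* text still to be scanned */

/* Point the sketch analyzer at a new NUL-terminated input string. */
void set_input(const char *s)
{
    input = s;
}

/* Return the next token number, or 0 at the end of input. */
int yylex(void)
{
    while (*input == ' ' || *input == '\t')
        input++;                     /* skip blanks */
    if (*input == '\0')
        return 0;                    /* endmarker */
    if (isdigit((unsigned char)*input)) {
        yylval = 0;
        while (isdigit((unsigned char)*input))
            yylval = yylval * 10 + (*input++ - '0');
        return NUMBER;               /* value is in yylval */
    }
    if (islower((unsigned char)*input)) {
        yylval = *input++ - 'a';     /* index of a single-letter name */
        return NAME;
    }
    return (unsigned char)*input++;  /* literal: its own character value */
}
```

After set_input("m = 42"), successive calls return NAME,
the literal '=', NUMBER with yylval set to 42, and then 0.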
The liby.a library contains default main() and yyerror()
functions. These look like the following, respectively:
    main() {
        setlocale(LC_ALL, "");
        (void) yyparse();
        return(0);
    }

    int yyerror(s)
        char *s;
    {
        fprintf(stderr,"%s\n",s);
        return (0);
    }
Comments, in C syntax, can appear anywhere in the user
functions or definitions sections. In the rules section,
comments can appear wherever a symbol is allowed. Blank
lines or lines consisting of white space can be inserted
anywhere in the file, and are ignored.
ENVIRONMENT VARIABLES
The following environment variables affect the execution
of yacc:

LANG
    Provides a default value for the internationalization
    variables that are unset or null. If LANG is unset or
    null, the corresponding value from the default locale
    is used. If any of the internationalization variables
    contain an invalid setting, the utility behaves as if
    none of the variables had been defined.
LC_ALL
    If set to a non-empty string value, overrides the
    values of all the other internationalization
    variables.
LC_CTYPE
    Determines the locale for the interpretation of
    sequences of bytes of text data as characters (for
    example, single-byte as opposed to multi-byte
    characters in arguments and input files).
LC_MESSAGES
    Determines the locale for the format and contents of
    diagnostic messages written to standard error.
NLSPATH
    Determines the location of message catalogues for the
    processing of LC_MESSAGES.
NOTES
The LANG and LC_* variables affect the execution of the
yacc command as stated. The main() function defined by
yacc calls setlocale(LC_ALL, ""); thus, the program
generated by yacc will also be affected by the contents of
these variables at runtime.
EXAMPLES
This section describes the example programs for the lex
and yacc commands, which together create a simple desk
calculator program. The program also allows you to assign
values to variables (each designated by a single lowercase
ASCII letter), and then use the variables in calculations.
The files that contain the program are as follows:

calc.l
    The lex specification file that defines the lexical
    analysis rules.
calc.y
    The yacc grammar file that defines the parsing rules
    and calls the yylex() function created by lex to
    provide input.
The remaining text expects that the current directory is
the directory that contains the lex and yacc example pro-
gram files.
Compiling the Example Program
Perform the following steps to create the example program
using lex and yacc:

1.  Process the yacc grammar file using the -d flag. The
    -d flag tells yacc to create a file that defines the
    tokens it uses in addition to the C language source
    code.

        yacc -d calc.y

    The following files are created:

    y.tab.c   The C language source file that yacc created
              for the parser.
    y.tab.h   A header file containing #define statements
              for the tokens used by the parser.

2.  Process the lex specification file:

        lex calc.l

    The following file is created:

    lex.yy.c  The C language source file that lex created
              for the lexical analyzer.

3.  Compile and link the two C language source files:

        cc -o calc y.tab.c lex.yy.c

    The following files are created (the *.o files are
    created temporarily and then removed):

    y.tab.o   The object file for y.tab.c.
    lex.yy.o  The object file for lex.yy.c.
    calc      The executable program file.
You can then run the program directly by entering:
calc
Then enter numbers and operators in calculator
fashion. After you press <Return>, the program
displays the result of the operation. If you
assign a value to a variable as follows, the cursor
moves to the next line: m=4 <Return> _
You can then use the variable in calculations and
it will have the value assigned to it: m+5 <Return>
9
The Parser Source Code
The text that follows shows the contents of the file
calc.y. This file has entries in all three of the sections
of a yacc grammar file: declarations, rules, and programs.

%{
#include <stdio.h>

int regs[26];
int base;
%}

%start list

%token DIGIT LETTER

%left '|'
%left '&'
%left '+' '-'
%left '*' '/' '%'
%left UMINUS      /* supplies precedence for unary minus */

%%      /* beginning of rules section */

list :  /* empty */
     |  list stat '\n'
     |  list error '\n'        { yyerrok; }
     ;
stat :  expr                   { printf("%d\n",$1); }
     |  LETTER '=' expr        { regs[$1] = $3; }
     ;
expr :  '(' expr ')'           { $$ = $2; }
     |  expr '*' expr          { $$ = $1 * $3; }
     |  expr '/' expr          { $$ = $1 / $3; }
     |  expr '%' expr          { $$ = $1 % $3; }
     |  expr '+' expr          { $$ = $1 + $3; }
     |  expr '-' expr          { $$ = $1 - $3; }
     |  expr '&' expr          { $$ = $1 & $3; }
     |  expr '|' expr          { $$ = $1 | $3; }
     |  '-' expr %prec UMINUS  { $$ = -$2; }
     |  LETTER                 { $$ = regs[$1]; }
     |  number
     ;
number : DIGIT                 { $$ = $1; base = ($1==0) ? 8 : 10; }
     |  number DIGIT           { $$ = base * $1 + $2; }
     ;

%%
main() { return(yyparse()); }
yyerror(s) char *s; { fprintf(stderr,"%s\n",s); }
yywrap() { return(1); }
Declarations Section
This section contains entries that perform the following
functions:

    Includes standard I/O header file.
    Defines global variables.
    Defines the list rule as the place to start processing.
    Defines the tokens used by the parser.
    Defines the operators and their precedence.
Rules Section
The rules section defines the rules that parse the input
stream.
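
The two actions of the number rule in calc.y build a value
one digit at a time, with a leading 0 selecting base 8. The
same arithmetic can be written as an ordinary C loop. The
function number_value is a hypothetical name used only to
illustrate the rule; it is not part of the example program.

```c
/*
 * Hypothetical helper mirroring the actions of the number rule:
 *
 *   number : DIGIT          { $$ = $1; base = ($1==0) ? 8 : 10; }
 *          | number DIGIT   { $$ = base * $1 + $2; }
 *
 * digits[] holds the DIGIT values in input order; n must be at
 * least 1.
 */
int number_value(const int *digits, int n)
{
    int base  = (digits[0] == 0) ? 8 : 10;   /* a leading 0 selects octal */
    int value = digits[0];                   /* first alternative */
    int i;

    for (i = 1; i < n; i++)
        value = base * value + digits[i];    /* second alternative, once per digit */
    return value;
}
```

For the digits 0, 1, 7 the result is 15 (octal 017), and
for the digits 4, 2 the result is 42.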
Programs Section
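The programs section contains the following routines.
Because these routines are included in this file, you do
not need to use the yacc library when processing this file.

main()
    The required main program that calls yyparse() to
    start the program.
yyerror(s)
    This error handling routine only prints a syntax error
    message.
yywrap()
    The wrap-up routine that returns a value of 1 when the
    end of input occurs.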
The Lexical Analyzer Source Code
This shows the contents of the file calc.l. This file
contains include statements for standard input and output,
as well as for the y.tab.h file. The yacc program gener-
ates that file from the yacc grammar file information, if
you use the -d flag with the yacc command. The file
y.tab.h contains definitions for the tokens that the
parser program uses. In addition, calc.l contains the
rules used to generate the tokens from the input stream.
%{
#include <stdio.h>
#include "y.tab.h"
int c;
#if !defined(YYSTYPE)
#define YYSTYPE long
#endif
extern YYSTYPE yylval;
%}
%%
" "       ;
[a-z]     { c = yytext[0]; yylval = c - 'a'; return(LETTER); }
[0-9]     { c = yytext[0]; yylval = c - '0'; return(DIGIT); }
[^a-z0-9] { c = yytext[0]; return(c); }
FILES
y.output
    A readable description of parsing tables and a report
    on conflicts generated by grammar ambiguities.
y.tab.c
    Output file.
y.tab.h
    Definitions for token names.
/usr/ccs/lib/yaccpar
    Default skeleton parser for C programs.
liby.a
    yacc library.

The yacc command also uses three temporary files.
EXIT VALUES
The following exit values are returned:
0     Successful completion.
>0    An error occurred.
RELATED INFORMATION
Commands: lex(1)
Standards: standards(5)
Documents: Programming Support Tools