These notational conventions are used for presenting syntax:
[pattern] | optional |
{pattern} | zero or more repetitions |
(pattern) | grouping |
pat1 | pat2 | choice |
pat<pat'> | difference: elements generated by pat except those generated by pat' |
fibonacci | terminal syntax in typewriter font |
BNF-like syntax is used throughout, with productions having the form:
nonterm | -> | alt1 | alt2 | ... | altn |
There are some families of nonterminals indexed by precedence levels (written as a superscript). Similarly, the nonterminals op, varop, and conop may have a double index: a letter l, r, or n for left-, right- or nonassociativity and a precedence level. A precedence-level variable i ranges from 0 to 9; an associativity variable a varies over {l, r, n}. Thus, for example
aexp | -> | ( expi+1 qop(a,i) ) |
In both the lexical and the context-free syntax, there are some ambiguities that are to be resolved by making grammatical phrases as long as possible, proceeding from left to right (in shift-reduce parsing, resolving shift/reduce conflicts by shifting). In the lexical syntax, this is the "consume longest lexeme" rule. In the context-free syntax, this means that conditionals, let-expressions, and lambda abstractions extend to the right as far as possible.
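The "as long as possible" rule can be seen in a few small definitions. The sketch below is illustrative only; the names iffy, incr, pick, and letBody are hypothetical and do not come from the report.

```haskell
-- Longest lexeme: `iffy` is lexed as one identifier, not as the
-- keyword `if` followed by `fy`.
iffy :: Int
iffy = 3

-- The lambda body extends as far right as possible, so this reads as
-- \x -> (x + 1) rather than (\x -> x) + 1.
incr :: Int -> Int
incr = \x -> x + 1

-- The else branch takes the trailing "+ 10" with it:
-- if b then 1 else (2 + 10), not (if b then 1 else 2) + 10.
pick :: Bool -> Int
pick b = if b then 1 else 2 + 10

-- The let body also extends to the right:
-- 2 * (let y = 3 in (y + 1)), not (2 * (let y = 3 in y)) + 1.
letBody :: Int
letBody = 2 * let y = 3 in y + 1
```

Loading these definitions into an interpreter and evaluating pick True is a quick way to check which grouping was chosen.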
program | -> | {lexeme | whitespace } |
lexeme | -> | varid | conid | varsym | consym | literal | special | reservedop | reservedid |
literal | -> | integer | float | char | string |
special | -> | ( | ) | , | ; | [ | ] | _ | ` | { | } |
whitespace | -> | whitestuff {whitestuff} |
whitestuff | -> | whitechar | comment | ncomment |
whitechar | -> | newline | return | linefeed | vertab | formfeed |
| | space | tab | UNIwhite | |
newline | -> | a newline (system dependent) |
return | -> | a carriage return |
linefeed | -> | a line feed |
space | -> | a space |
tab | -> | a horizontal tab |
vertab | -> | a vertical tab |
formfeed | -> | a form feed |
UNIwhite | -> | any Unicode character defined as whitespace |
comment | -> | -- {any}newline |
ncomment | -> | {- ANYseq {ncomment ANYseq}-} |
ANYseq | -> | {ANY}<{ANY}( {- | -} ) {ANY}> |
ANY | -> | any | newline | vertab | formfeed |
any | -> | graphic | space | tab | nonbrkspc |
graphic | -> | large | small | digit | symbol | special | : | " | ' |
small | -> | ASCsmall | UNIsmall |
ASCsmall | -> | a | b | ... | z |
UNIsmall | -> | any Unicode lowercase letter |
large | -> | ASClarge | UNIlarge |
ASClarge | -> | A | B | ... | Z |
UNIlarge | -> | any uppercase or titlecase Unicode letter |
symbol | -> | ASCsymbol | UNIsymbol |
ASCsymbol | -> | ! | # | $ | % | & | * | + | . | / | < | = | > | ? | @ |
| | \ | ^ | | | - | ~ | |
UNIsymbol | -> | Any Unicode symbol or punctuation |
digit | -> | 0 | 1 | ... | 9 |
udigit | -> | digit | UNIdigit |
UNIdigit | -> | a Unicode numeric character |
octit | -> | 0 | 1 | ... | 7 |
hexit | -> | digit | A | ... | F | a | ... | f |
varid | -> | (small {small | large | udigit | ' | _})<reservedid> | |
conid | -> | large {small | large | udigit | ' | _} | |
reservedid | -> | case | class | data | default | deriving | do | else | |
| | if | import | in | infix | infixl | infixr | instance | ||
| | let | module | newtype | of | then | type | where | ||
specialid | -> | as | qualified | hiding | |
varsym | -> | ( symbol {symbol | :})<reservedop> | |
consym | -> | (: {symbol | :})<reservedop> | |
reservedop | -> | .. | :: | = | \ | | | <- | -> | @ | ~ | => | |
specialop | -> | - | ! | |
varid | (variables) | ||
conid | (constructors) | ||
tyvar | -> | varid | (type variables) |
tycon | -> | conid | (type constructors) |
tycls | -> | conid | (type classes) |
modid | -> | conid | (modules) |
qvarid | -> | [ modid . ] varid | |
qconid | -> | [ modid . ] conid | |
qtycon | -> | [ modid . ] tycon | |
qtycls | -> | [ modid . ] tycls | |
qvarsym | -> | [ modid . ] varsym | |
qconsym | -> | [ modid . ] consym | |
decimal | -> | digit{digit} | |
octal | -> | octit{octit} | |
hexadecimal | -> | hexit{hexit} | |
integer | -> | decimal | |
| | 0o octal | 0O octal | ||
| | 0x hexadecimal | 0X hexadecimal | ||
float | -> | decimal . decimal[(e | E)[- | +]decimal] | |
char | -> | ' (graphic<' | \> | space | escape<\&>) ' | |
string | -> | " {graphic<" | \> | space | escape | gap}" | |
escape | -> | \ ( charesc | ascii | decimal | o octal | x hexadecimal ) | |
charesc | -> | a | b | f | n | r | t | v | \ | " | ' | & | |
ascii | -> | ^cntrl | NUL | SOH | STX | ETX | EOT | ENQ | ACK | |
| | BEL | BS | HT | LF | VT | FF | CR | SO | SI | DLE | ||
| | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | ||
| | EM | SUB | ESC | FS | GS | RS | US | SP | DEL | ||
cntrl | -> | ASClarge | @ | [ | \ | ] | ^ | _ | |
gap | -> | \ whitechar {whitechar}\ |
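As an illustration of the lexical productions above, the following sketch exercises nested comments, the three integer notations, a float, character escapes, a string gap, and the empty escape \&. All names are hypothetical and chosen only for this example.

```haskell
{- Nested comments nest: {- this inner comment closes first -}
   and this text is still inside the outer comment. -}

-- Integer literals in decimal, octal, and hexadecimal notation.
decimalLit, octalLit, hexLit :: Int
decimalLit = 255
octalLit   = 0o377
hexLit     = 0xFF

-- A float literal has digits on both sides of the point.
floatLit :: Double
floatLit = 1.5e3

-- One character written five ways: charesc, ascii, decimal, octal, hex.
newlineChars :: [Char]
newlineChars = ['\n', '\LF', '\10', '\o12', '\x0A']

-- A gap (backslash, whitespace, backslash) is ignored, so a string
-- literal can continue on the next line.
gapped :: String
gapped = "abc\
         \def"

-- "\SOH" is lexed as the single escape SOH (longest match); the empty
-- escape \& separates SO from H in the second string.
oneChar, twoChars :: String
oneChar  = "\SOH"
twoChars = "\SO\&H"
```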
Definitions: The indentation of a lexeme is the column number indicating the start of that lexeme; the indentation of a line is the indentation of its leftmost lexeme. To determine the column number, assume a fixed-width font with this tab convention: tab stops are 8 characters apart, and a tab character causes the insertion of enough spaces to align the current position with the next tab stop.
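The tab convention can be sketched as a small column calculator. This is a simplified illustration, not the report's algorithm: it assumes columns are numbered from 1, treats only spaces and tabs as leading whitespace (comments are ignored), and the names advance and lineIndent are hypothetical.

```haskell
-- | Column reached after reading one character at column `col`
--   (1-based). Tab stops are 8 characters apart, so a tab moves to
--   the next column of the form 8k + 1.
advance :: Int -> Char -> Int
advance col '\t' = ((col - 1) `div` 8 + 1) * 8 + 1
advance col _    = col + 1

-- | Indentation of a line: the column of its first character that is
--   neither a space nor a tab, or Nothing for a blank line.
lineIndent :: String -> Maybe Int
lineIndent = go 1
  where
    go _ [] = Nothing
    go col (c : cs)
      | c == ' ' || c == '\t' = go (advance col c) cs
      | otherwise             = Just col
```

Under these assumptions a line that starts with two spaces and a tab has the same indentation as one that starts with a single tab, because both reach the same tab stop.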
In the syntax given in the rest of the report, layout lists are always preceded by the keyword where, let, do, or of, and are enclosed within curly braces ({ }) with the individual declarations separated by semicolons (;). Layout lists usually contain declarations, but do and case introduce lists of other sorts. For example, the syntax of a let expression is:
let { decl1 ; decl2 ; ... ; decln [;] } in exp
Haskell permits the omission of the braces and semicolons by using layout to convey the same information. This allows both layout-sensitive and -insensitive styles of coding, which can be freely mixed within one program. Because layout is not required, Haskell programs can be straightforwardly produced by other programs.
The layout (or "off-side") rule takes effect whenever the open brace is omitted after the keyword where, let, do, or of. When this happens, the indentation of the next lexeme (whether or not on a new line) is remembered and the omitted open brace is inserted (the whitespace preceding the lexeme may include comments). For each subsequent line, if it contains only whitespace or is indented more, then the previous item is continued (nothing is inserted); if it is indented the same amount, then a new item begins (a semicolon is inserted); and if it is indented less, then the layout list ends (a close brace is inserted). A close brace is also inserted whenever the syntactic category containing the layout list ends; that is, if an illegal lexeme is encountered at a point where a close brace would be legal, a close brace is inserted. The layout rule matches only those open braces that it has inserted; an explicit open brace must be matched by an explicit close brace. Within these explicit open braces, no layout processing is performed for constructs outside the braces, even if a line is indented to the left of an earlier implicit open brace.
Given these rules, a single newline may actually terminate several layout lists. Also, these rules permit:
f x = let a = 1; b = 2
          g y = exp2
      in exp1
making a, b and g all part of the same layout list.
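A concrete version of this example may help. In the sketch below, the placeholders exp1 and exp2 are replaced by hypothetical expressions, and the same definition is written once with layout and once with explicit braces and semicolons; the names layoutStyle and braceStyle are assumptions made for illustration.

```haskell
-- Layout style: g starts in the same column as a, so a semicolon is
-- inserted before it; `in` is indented less, which closes the list.
layoutStyle :: Int -> Int
layoutStyle x = let a = 1; b = 2
                    g y = y + a + b
                in g x

-- Explicit style: braces and semicolons delimit the same three
-- declarations, so indentation inside the braces carries no meaning.
braceStyle :: Int -> Int
braceStyle x = let { a = 1; b = 2;
  g y = y + a + b } in g x
```

Both functions should compute the same result for every argument, which is the sense in which layout "conveys the same information" as the explicit delimiters.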
To facilitate the use of layout at the top level of a module (an implementation may allow several modules to reside in one file), the keyword module and the end-of-file token are assumed to occur in column 0 (whereas normally the first column is 1). Otherwise, all top-level declarations would have to be indented.
Section 1.5 gives an example that uses the layout rule.
module | -> | module modid [exports] where body | |
| | body | ||
body | -> | { [impdecls ;] [[fixdecls ;] topdecls [;]] } | |
| | { impdecls [;] } | ||
impdecls | -> | impdecl1 ; ... ; impdecln | (n>=1) |
exports | -> | ( export1 , ... , exportn [ , ] ) | (n>=0) |
export | -> | qvar | |
| | qtycon [(..) | ( qcname1 , ... , qcnamen )] | (n>=0) | |
| | qtycls [(..) | ( qvar1 , ... , qvarn )] | (n>=0) | |
| | module modid | ||
qcname | -> | qvar | qcon |
impdecl | -> | import [qualified] modid [as modid] [impspec] | |
impspec | -> | ( import1 , ... , importn [ , ] ) | (n>=0) |
| | hiding ( import1 , ... , importn [ , ] ) | (n>=0) | |
import | -> | var | |
| | tycon [ (..) | ( cname1 , ... , cnamen )] | (n>=1) | |
| | tycls [(..) | ( var1 , ... , varn )] | (n>=0) | |
cname | -> | var | con |
fixdecls | -> | fix1 ; ... ; fixn | (n>=1) |
fix | -> | infixl [digit] ops | |
| | infixr [digit] ops | ||
| | infix [digit] ops | ||
ops | -> | op1 , ... , opn | (n>=1) |
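The effect of the fix productions can be illustrated with two operators that both perform subtraction but differ in associativity. This is a sketch only; the operators |-| and |~| and the function times are hypothetical, and the fixity declarations are placed first to follow the order used in the body production.

```haskell
infixl 6 |-|
infixr 6 |~|
infixl 7 `times`

-- Left-associative subtraction.
(|-|) :: Int -> Int -> Int
a |-| b = a - b

-- Right-associative subtraction.
(|~|) :: Int -> Int -> Int
a |~| b = a - b

-- Multiplication under an ordinary name, used in backquotes below.
times :: Int -> Int -> Int
times = (*)

leftAssoc :: Int
leftAssoc = 10 |-| 3 |-| 2      -- groups as (10 - 3) - 2

rightAssoc :: Int
rightAssoc = 10 |~| 3 |~| 2     -- groups as 10 - (3 - 2)

tighter :: Int
tighter = 10 |-| 2 `times` 3    -- level 7 binds tighter: 10 - (2 * 3)
```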
topdecls | -> | topdecl1 ; ... ; topdecln | (n>=0) |
topdecl | -> | type simpletype = type | |
| | data [context =>] simpletype = constrs [deriving] | ||
| | newtype [context =>] simpletype = con atype [deriving] | ||
| | class [context =>] simpleclass [where { cbody [;] }] | ||
| | instance [context =>] qtycls inst [where { valdefs [;] }] | ||
| | default (type1 , ... , typen) | (n>=0) | |
| | decl | ||
decls | -> | decl1 ; ... ; decln | (n>=0) |
decl | -> | signdecl | |
| | valdef | ||
decllist | -> | { decls [;] } | |
signdecl | -> | vars :: [context =>] type | |
vars | -> | var1 , ..., varn | (n>=1) |
type | -> | btype [-> type] | (function type) |
btype | -> | [btype] atype | (type application) |
atype | -> | gtycon | |
| | tyvar | ||
| | ( type1 , ... , typek ) | (tuple type, k>=2) | |
| | [ type ] | (list type) | |
| | ( type ) | (parenthesized constructor) | |
gtycon | -> | qtycon | |
| | () | (unit type) | |
| | [] | (list constructor) | |
| | (->) | (function constructor) | |
| | (,{,}) | (tupling constructors) | |
context | -> | class | |
| | ( class1 , ... , classn ) | (n>=1) | |
class | -> | qtycls tyvar |
simpletype | -> | tycon tyvar1 ... tyvark | (k>=0) |
constrs | -> | constr1 | ... | constrn | (n>=1) |
constr | -> | con [!] atype1 ... [!] atypek | (arity con = k, k>=0) |
| | (btype | ! atype) conop (btype | ! atype) | (infix conop) | |
| | con { fielddecl1 , ... , fielddecln } | (n>=1) | |
fielddecl | -> | vars :: (type | ! atype) | |
deriving | -> | deriving (dclass | (dclass1, ... , dclassn)) | (n>=0) |
dclass | -> | qtycls |
simpleclass | -> | tycls tyvar | |
cbody | -> | [ cmethods [ ; cdefaults ] ] | |
cmethods | -> | signdecl1 ; ... ; signdecln | (n >= 1) |
cdefaults | -> | valdef1 ; ... ; valdefn | (n >= 1) |
inst | -> | gtycon | |
| | ( gtycon tyvar1 ... tyvark ) | (k>=0, tyvars distinct) | |
| | ( tyvar1 , ... , tyvark ) | (k>=2, tyvars distinct) | |
| | [ tyvar ] | ||
| | ( tyvar1 -> tyvar2 ) | tyvar1 and tyvar2 distinct | |
valdefs | -> | valdef1 ; ... ; valdefn | (n>=0) |
valdef | -> | lhs = exp [where decllist] |
| | lhs gdrhs [where decllist] | |
lhs | -> | pat0 |
| | funlhs | |
funlhs | -> | var apat {apat } |
| | pati+1 varop(a,i) pati+1 | |
| | lpati varop(l,i) pati+1 | |
| | pati+1 varop(r,i) rpati | |
gdrhs | -> | gd = exp [gdrhs] |
gd | -> | | exp0 |
exp | -> | exp0 :: [context =>] type | (expression type signature) |
| | exp0 | ||
expi | -> | expi+1 [qop(n,i) expi+1] | |
| | lexpi | ||
| | rexpi | ||
lexpi | -> | (lexpi | expi+1) qop(l,i) expi+1 | |
lexp6 | -> | - exp7 | |
rexpi | -> | expi+1 qop(r,i) (rexpi | expi+1) | |
exp10 | -> | \ apat1 ... apatn -> exp | (lambda abstraction, n>=1) |
| | let decllist in exp | (let expression) | |
| | if exp then exp else exp | (conditional) | |
| | case exp of { alts [;] } | (case expression) | |
| | do { stmts [;] } | (do expression) | |
| | fexp | ||
fexp | -> | [fexp] aexp | (function application) |
aexp | -> | qvar | (variable) |
| | gcon | (general constructor) | |
| | literal | ||
| | ( exp ) | (parenthesized expression) | |
| | ( exp1 , ... , expk ) | (tuple, k>=2) | |
| | [ exp1 , ... , expk ] | (list, k>=1) | |
| | [ exp1 [, exp2] .. [exp3] ] | (arithmetic sequence) | |
| | [ exp | qual1 , ... , qualn ] | (list comprehension, n>=1) | |
| | ( expi+1 qop(a,i) ) | (left section) | |
| | ( qop(a,i) expi+1 ) | (right section) | |
| | qcon { fbind1 , ... , fbindn } | (labeled construction, n>=0) | |
| | aexp<qcon> { fbind1 , ... , fbindn } | (labeled update, n >= 1) |
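Several of the aexp alternatives can be seen side by side in a short sketch. The type Point and every name below are hypothetical and serve only to give each form something to operate on.

```haskell
-- A record type with one strict field, for the labeled forms below.
data Point = Point { px :: !Int, py :: Int }
  deriving (Eq, Show)

pair :: (Int, Bool)
pair = (1, True)                        -- tuple

odds :: [Int]
odds = [1, 3 .. 9]                      -- arithmetic sequence

oddSquares :: [Int]
oddSquares = [x * x | x <- [1 .. 5], odd x]   -- list comprehension

powersOfTwo :: [Int]
powersOfTwo = map (2 ^) [0, 1, 2, 3]    -- left section: \y -> 2 ^ y

squares :: [Int]
squares = map (^ 2) [0, 1, 2, 3]        -- right section: \x -> x ^ 2

origin :: Point
origin = Point { px = 0, py = 0 }       -- labeled construction

shifted :: Point
shifted = origin { px = 5 }             -- labeled update
```

Note that in the update the expression before the braces is a variable rather than a constructor, which is how it is told apart from a construction.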
qual | -> | pat <- exp | |
| | let decllist | ||
| | exp | ||
alts | -> | alt1 ; ... ; altn | (n>=1) |
alt | -> | pat -> exp [where decllist] | |
| | pat gdpat [where decllist] | ||
gdpat | -> | gd -> exp [ gdpat ] | |
stmts | -> | exp [; stmts] | |
| | pat <- exp ; stmts | ||
| | let decllist ; stmts | ||
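The alts and stmts productions can be illustrated with explicit braces and semicolons, which makes the list structure visible. The following is a minimal sketch; safeDiv, calc, and classify are hypothetical names, and the Maybe monad is used only because it needs no imports.

```haskell
-- Division that fails on a zero divisor instead of raising an error.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv a b = Just (a `div` b)

-- A do expression with the three statement forms: a generator
-- (pat <- exp), a let statement, and a final expression.
calc :: Int -> Int -> Maybe Int
calc a b = do { q <- safeDiv a b ; let { r = q + 1 } ; return (r * 2) }

-- A case expression whose second alternative uses guards (gdpat).
classify :: Int -> String
classify n = case n of
  { 0 -> "zero"
  ; k | k < 0     -> "negative"
      | otherwise -> "positive"
  }
```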
fbinds | -> | { fbind1 , ... , fbindn } | (n>=0) |
fbind | -> | var | qvar = exp | |
pat | -> | var + integer | (successor pattern) |
| | pat0 | ||
pati | -> | pati+1 [qconop(n,i) pati+1] | |
| | lpati | ||
| | rpati | ||
lpati | -> | (lpati | pati+1) qconop(l,i) pati+1 | |
lpat6 | -> | - (integer | float) | (negative literal) |
rpati | -> | pati+1 qconop(r,i) (rpati | pati+1) | |
pat10 | -> | apat | ||
| | gcon apat1 ... apatk | (arity gcon = k, k>=1) |
apat | -> | var [@ apat] | (as pattern) |
| | gcon | (arity gcon = 0) | |
| | qcon { fpat1 , ... , fpatk } | (labeled pattern, k>=0) | |
| | literal | ||
| | _ | (wildcard) | |
| | ( pat ) | (parenthesized pattern) | |
| | ( pat1 , ... , patk ) | (tuple pattern, k>=2) | |
| | [ pat1 , ... , patk ] | (list pattern, k>=1) | |
| | ~ apat | (irrefutable pattern) | |
fpat | -> | var = pat | |
| | var |
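A few of the pattern forms are shown in the sketch below. The successor pattern (var + integer) is left out because current compilers may not accept it without an extension. The type Size and all function names are hypothetical.

```haskell
-- A record type for the labeled pattern below.
data Size = Size { width :: Int, height :: Int }

-- As pattern: `whole` names the entire list while (x : _) takes it apart.
dupHead :: [a] -> [a]
dupHead whole@(x : _) = x : whole
dupHead []            = []

-- Irrefutable pattern: the match always succeeds and the argument is
-- not evaluated unless one of its components is used.
lazyZero :: (Int, Int) -> Int
lazyZero ~(_, _) = 0

-- Negative literal pattern; the parentheses are needed in an argument.
isMinusOne :: Int -> Bool
isMinusOne (-1) = True
isMinusOne _    = False

-- Labeled pattern: only the listed field is inspected.
isFlat :: Size -> Bool
isFlat (Size { height = 0 }) = True
isFlat _                     = False

-- List pattern containing a tuple pattern: matches a one-element list.
singlePairSum :: [(Int, Int)] -> Int
singlePairSum [(a, b)] = a + b
singlePairSum _        = 0
```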
gcon | -> | () | |
| | [] | ||
| | (,{,}) | ||
| | qcon | ||
var | -> | varid | ( varsym ) | (variable) |
qvar | -> | qvarid | ( qvarsym ) | (qualified variable) |
con | -> | conid | ( consym ) | (constructor) |
qcon | -> | qconid | ( qconsym ) | (qualified constructor) |
varop | -> | varsym | `varid` | (variable operator) |
qvarop | -> | qvarsym | `qvarid` | (qualified variable operator) |
conop | -> | consym | `conid` | (constructor operator) |
qconop | -> | qconsym | `qconid` | (qualified constructor operator) |
op | -> | varop | conop | (operator) |
qop | -> | qvarop | qconop | (qualified operator) |
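The productions for variables, constructors, and operators describe how a symbol can be used as an ordinary name and how a name can be used as an operator. A short sketch, using the hypothetical types Frac and Pair:

```haskell
-- A constructor symbol (consym) and an ordinary constructor (conid).
data Frac = Int :/ Int
  deriving (Eq, Show)

data Pair = Pair Int Int
  deriving (Eq, Show)

prefixPlus :: Int
prefixPlus = (+) 2 3             -- var   -> ( varsym )

infixDiv :: Int
infixDiv = 7 `div` 2             -- varop -> `varid`

qualifiedPlus :: Int
qualifiedPlus = 2 Prelude.+ 3    -- qvarop -> qvarsym

half :: Frac
half = 1 :/ 2                    -- conop -> consym

halfPrefix :: Frac
halfPrefix = (:/) 1 2            -- con   -> ( consym )

pairInfix :: Pair
pairInfix = 1 `Pair` 2           -- conop -> `conid`
```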