The Haskell 1.4 Report

B  Syntax

B.1  Notational Conventions

These notational conventions are used for presenting syntax:

[pattern] optional
{pattern} zero or more repetitions
(pattern) grouping
pat1 | pat2 choice
pat<pat'> difference: elements generated by pat except those generated by pat'
fibonacci terminal syntax in typewriter font

BNF-like syntax is used throughout, with productions having the form:
nonterm -> alt1 | alt2 | ... | altn

There are some families of nonterminals indexed by precedence levels (written as a superscript). Similarly, the nonterminals op, varop, and conop may have a double index: a letter l, r, or n for left-, right- or nonassociativity and a precedence level. A precedence-level variable i ranges from 0 to 9; an associativity variable a varies over {l, r, n}. Thus, for example
aexp -> ( expi+1 qop(a,i) )
actually stands for 30 productions, with 10 substitutions for i and 3 for a.
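
The count of 30 can be checked mechanically. The following is a minimal sketch that enumerates every instance of the production above; the names Assoc, sectionProduction, and sectionProductions, and the use of ^ to render a superscript, are illustrative assumptions and not part of the report.

```haskell
-- Associativity index: left, right, or non-associative.
data Assoc = L | R | N deriving (Show, Eq, Enum, Bounded)

-- Render one instance of  aexp -> ( exp^(i+1) qop^(a,i) )
-- for a given associativity a and precedence level i.
sectionProduction :: Assoc -> Int -> String
sectionProduction a i =
  "aexp -> ( exp^" ++ show (i + 1) ++ " qop^(" ++ letter a ++ "," ++ show i ++ ") )"
  where
    letter L = "l"
    letter R = "r"
    letter N = "n"

-- All instances: 3 associativities times 10 precedence levels.
sectionProductions :: [String]
sectionProductions =
  [ sectionProduction a i | a <- [minBound .. maxBound], i <- [0 .. 9] ]
```

Evaluating length sectionProductions gives the number of productions that the single schematic rule abbreviates.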

In both the lexical and the context-free syntax, there are some ambiguities that are to be resolved by making grammatical phrases as long as possible, proceeding from left to right (in shift-reduce parsing, resolving shift/reduce conflicts by shifting). In the lexical syntax, this is the "consume longest lexeme" rule. In the context-free syntax, this means that conditionals, let-expressions, and lambda abstractions extend to the right as far as possible.
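
A few small definitions illustrate both halves of this rule. This is only a sketch; the names greedyIf, boundedIf, greedyLambda, and longestLexeme are hypothetical.

```haskell
-- The else branch extends as far right as possible, so this is
-- if True then 1 else (2 + 3).
greedyIf :: Int
greedyIf = if True then 1 else 2 + 3

-- Explicit parentheses are needed to end the conditional early.
boundedIf :: Int
boundedIf = (if True then 1 else 2) + 3

-- Likewise a lambda body extends to the right, so the abstraction
-- must be parenthesized before it can be applied to an argument.
greedyLambda :: Int
greedyLambda = (\x -> x + 1) 10

-- Longest lexeme: "letter" is one identifier, not the keyword let
-- followed by "ter", and "<=" is one operator, not "<" then "=".
longestLexeme :: Bool
longestLexeme = let letter = 3 in letter <= (4 :: Int)
```

Here greedyIf and boundedIf differ only in the parentheses, yet they denote different values because the unparenthesized conditional absorbs the trailing + 3.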

B.2  Lexical Syntax

program -> {lexeme | whitespace }
lexeme -> varid | conid | varsym | consym | literal | special | reservedop | reservedid
literal -> integer | float | char | string
special -> ( | ) | , | ; | [ | ] | _ | ` | { | }
whitespace -> whitestuff {whitestuff}
whitestuff -> whitechar | comment | ncomment
whitechar -> newline | return | linefeed | vertab | formfeed
| space | tab | UNIwhite
newline -> a newline (system dependent)
return -> a carriage return
linefeed -> a line feed
space -> a space
tab -> a horizontal tab
vertab -> a vertical tab
formfeed -> a form feed
UNIwhite -> any Unicode character defined as whitespace
comment -> -- {any}newline
ncomment -> {- ANYseq {ncomment ANYseq}-}
ANYseq -> {ANY}<{ANY}( {- | -} ) {ANY}>
ANY -> any | newline | vertab | formfeed
any -> graphic | space | tab | nonbrkspc
graphic -> large | small | digit | symbol | special | : | " | '
small -> ASCsmall | UNIsmall
ASCsmall -> a | b | ... | z
UNIsmall -> any Unicode lowercase letter
large -> ASClarge | UNIlarge
ASClarge -> A | B | ... | Z
UNIlarge -> any uppercase or titlecase Unicode letter
symbol -> ASCsymbol | UNIsymbol
ASCsymbol -> ! | # | $ | % | & | * | + | . | / | < | = | > | ? | @
| \ | ^ | | | - | ~
UNIsymbol -> any Unicode symbol or punctuation
digit -> 0 | 1 | ... | 9
udigit -> digit | UNIdigit
UNIdigit -> a Unicode numeric
octit -> 0 | 1 | ... | 7
hexit -> digit | A | ... | F | a | ... | f

varid -> (small {small | large | udigit | ' | _})<reservedid>
conid -> large {small | large | udigit | ' | _}
reservedid -> case | class | data | default | deriving | do | else
| if | import | in | infix | infixl | infixr | instance
| let | module | newtype | of | then | type | where
specialid -> as | qualified | hiding
varsym -> ( symbol {symbol | :})<reservedop>
consym -> (: {symbol | :})<reservedop>
reservedop -> .. | :: | = | \ | | | <- | -> | @ | ~ | =>
specialop -> - | !
varid (variables)
conid (constructors)
tyvar -> varid (type variables)
tycon -> conid (type constructors)
tycls -> conid (type classes)
modid -> conid (modules)
qvarid -> [ modid . ] varid
qconid -> [ modid . ] conid
qtycon -> [ modid . ] tycon
qtycls -> [ modid . ] tycls
qvarsym -> [ modid . ] varsym
qconsym -> [ modid . ] consym
decimal -> digit{digit}
octal -> octit{octit}
hexadecimal -> hexit{hexit}
integer -> decimal
| 0o octal | 0O octal
| 0x hexadecimal | 0X hexadecimal
float -> decimal . decimal[(e | E)[- | +]decimal]
char -> ' (graphic<' | \> | space | escape<\&>) '
string -> " {graphic<" | \> | space | escape | gap}"
escape -> \ ( charesc | ascii | decimal | o octal | x hexadecimal )
charesc -> a | b | f | n | r | t | v | \ | " | ' | &
ascii -> ^cntrl | NUL | SOH | STX | ETX | EOT | ENQ | ACK
| BEL | BS | HT | LF | VT | FF | CR | SO | SI | DLE
| DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN
| EM | SUB | ESC | FS | GS | RS | US | SP | DEL
cntrl -> ASClarge | @ | [ | \ | ] | ^ | _
gap -> \ whitechar {whitechar}\
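
The literal productions above can be exercised directly. The following sketch writes the same value in several lexical forms; the names intForms, floatForms, charForms, gapString, and soThenH are hypothetical, and the code is checked against a present-day compiler rather than a Haskell 1.4 implementation.

```haskell
-- Integer literals: decimal, octal (0o / 0O), hexadecimal (0x / 0X).
intForms :: [Integer]
intForms = [255, 0o377, 0O377, 0xff, 0XFF]

-- A float literal needs digits on both sides of the point;
-- the exponent part is optional.
floatForms :: [Double]
floatForms = [1500.0, 1.5e3, 1.5E+3, 15000.0e-1]

-- One character written with each kind of escape:
-- charesc, ascii name, control, decimal, octal, hexadecimal.
charForms :: [Char]
charForms = ['\n', '\LF', '\^J', '\10', '\o12', '\xA']

-- A gap (backslash, whitespace, backslash) contributes nothing
-- to the string.
gapString :: String
gapString = "abc\   \def"

-- \& is an empty escape; here it keeps "\SO" followed by "H" from
-- being read as the single longer escape "\SOH".
soThenH :: String
soThenH = "\SO\&H"
```

Every element of each list denotes the same value, and soThenH has two characters where "\SOH" has one, which is the longest-lexeme rule at work inside string literals.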

B.3  Layout

Definitions: The indentation of a lexeme is the column number indicating the start of that lexeme; the indentation of a line is the indentation of its leftmost lexeme. To determine the column number, assume a fixed-width font with this tab convention: tab stops are 8 characters apart, and a tab character causes the insertion of enough spaces to align the current position with the next tab stop.
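
The tab convention can be sketched as a small column calculation. The function names advance and indentation are hypothetical, and the sketch assumes that columns are numbered from 1, so that tab stops fall at columns 1, 9, 17, and so on; it also ignores comments.

```haskell
-- Distance between tab stops, as stated in the text.
tabWidth :: Int
tabWidth = 8

-- Column reached after reading one character that sits at column c.
-- Assumption: columns start at 1, so tab stops are 1, 9, 17, ...
advance :: Int -> Char -> Int
advance c '\t' = ((c - 1) `div` tabWidth + 1) * tabWidth + 1
advance c _    = c + 1

-- Indentation of a line: the column of its first character that is
-- neither a space nor a tab. Returns Nothing for a blank line.
indentation :: String -> Maybe Int
indentation = go 1
  where
    go _ [] = Nothing
    go c (ch : rest)
      | ch == ' ' || ch == '\t' = go (advance c ch) rest
      | otherwise               = Just c
```

Under these assumptions a line that starts with one tab and a line that starts with eight spaces have the same indentation, which is why mixing the two is safe only when the editor uses the same 8-column convention.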

In the syntax given in the rest of the report, layout lists are always preceded by the keyword where, let, do, or of, and are enclosed within curly braces ({ }) with the individual declarations separated by semicolons (;). Layout lists usually contain declarations, but do and case introduce lists of other sorts. For example, the syntax of a let expression is:

let { decl1 ; decl2 ; ... ; decln [;] } in exp

Haskell permits the omission of the braces and semicolons by using layout to convey the same information. This allows both layout-sensitive and -insensitive styles of coding, which can be freely mixed within one program. Because layout is not required, Haskell programs can be straightforwardly produced by other programs.

The layout (or "off-side") rule takes effect whenever the open brace is omitted after the keyword where, let, do, or of. When this happens, the indentation of the next lexeme (whether or not on a new line) is remembered and the omitted open brace is inserted (the whitespace preceding the lexeme may include comments). For each subsequent line, if it contains only whitespace or is indented more, then the previous item is continued (nothing is inserted); if it is indented the same amount, then a new item begins (a semicolon is inserted); and if it is indented less, then the layout list ends (a close brace is inserted). A close brace is also inserted whenever the syntactic category containing the layout list ends; that is, if an illegal lexeme is encountered at a point where a close brace would be legal, a close brace is inserted. The layout rule matches only those open braces that it has inserted; an explicit open brace must be matched by an explicit close brace. Within these explicit open braces, no layout processing is performed for constructs outside the braces, even if a line is indented to the left of an earlier implicit open brace.
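
As an illustration, the two definitions below are intended to be equivalent: the first relies on the layout rule and the second writes the braces and semicolons explicitly. The function names are hypothetical.

```haskell
-- Layout style: the first lexeme after "let" (here a) fixes the
-- indentation; b at the same column starts a new item, and "in" at
-- a smaller indentation closes the list.
sumSquaresLayout :: Int -> Int -> Int
sumSquaresLayout x y =
  let a = x * x
      b = y * y
  in a + b

-- Explicit style: the same declarations with the braces and
-- semicolons written out, so indentation carries no meaning.
sumSquaresBraces :: Int -> Int -> Int
sumSquaresBraces x y = let { a = x * x ; b = y * y } in a + b
```

The second form is what the layout rule effectively produces from the first, which is why program generators can emit it without tracking columns.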

Given these rules, a single newline may actually terminate several layout lists. Also, these rules permit:

f x = let a = 1; b = 2 
          g y = exp2
       in exp1

making a, b and g all part of the same layout list.
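
Both remarks can be tried out in a short sketch. The first definition is a concrete instance of the example above, with hypothetical bodies substituted for exp1 and exp2; the second shows one line break closing two nested layout lists. The names sign and signBraces are hypothetical.

```haskell
-- a, b and g share one layout list: b follows a semicolon on the
-- same line, and g starts in the same column as a.
f :: Int -> Int
f x = let a = 1; b = 2
          g y = y + a + b
       in g x

-- The where list is nested inside the list of case alternatives.
sign :: Int -> String
sign n = case compare n 0 of
  LT -> "negative"
  EQ -> "zero"
  GT -> word
    where word = "positive"

-- The signature below starts in the first column, so the preceding
-- line break closes both the where list and the alternatives of sign.
-- Written with explicit braces, sign reads:
signBraces :: Int -> String
signBraces n = case compare n 0 of { LT -> "negative" ; EQ -> "zero" ; GT -> word where { word = "positive" } }
```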

To facilitate the use of layout at the top level of a module (an implementation may allow several modules to reside in one file), the keyword module and the end-of-file token are assumed to occur in column 0 (whereas normally the first column is 1). Otherwise, all top-level declarations would have to be indented.

Section 1.5 gives an example that uses the layout rule.

B.4  Context-Free Syntax

module -> module modid [exports] where body
| body
body -> { [impdecls ;] [[fixdecls ;] topdecls [;]] }
| { impdecls [;] }
impdecls -> impdecl1 ; ... ; impdecln   (n>=1)

exports -> ( export1 , ... , exportn [ , ] )   (n>=0)
export -> qvar
| qtycon [(..) | ( qcname1 , ... , qcnamen )]   (n>=0)
| qtycls [(..) | ( qvar1 , ... , qvarn )]   (n>=0)
| module modid
qcname -> qvar | qcon

impdecl -> import [qualified] modid [as modid] [impspec]
impspec -> ( import1 , ... , importn [ , ] )   (n>=0)
| hiding ( import1 , ... , importn [ , ] )   (n>=0)
import -> var
| tycon [ (..) | ( cname1 , ... , cnamen )]   (n>=1)
| tycls [(..) | ( var1 , ... , varn )]   (n>=0)
cname -> var | con
fixdecls -> fix1 ; ... ; fixn   (n>=1)
fix -> infixl [digit] ops
| infixr [digit] ops
| infix  [digit] ops
ops -> op1 , ... , opn   (n>=1)
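
The import and fixity productions can be illustrated with a short fragment. This is a sketch only: the operators +++ and |> and the local lookup are hypothetical, and Prelude is imported explicitly simply because it is a module name that fits modid -> conid.

```haskell
-- impdecl with a hiding list, and a qualified import with "as"
-- and an explicit import list.
import Prelude hiding (lookup)
import qualified Prelude as P (lookup)

-- fixdecls: associativity keyword, optional precedence digit, ops.
infixr 5 +++
infixl 1 |>

-- Right associative: xs +++ ys +++ zs is xs +++ (ys +++ zs).
(+++) :: [a] -> [a] -> [a]
xs +++ ys = foldr (:) ys xs

-- Left associative: x |> f |> g is (x |> f) |> g.
(|>) :: a -> (a -> b) -> b
x |> f = f x

-- A local lookup that does not clash with the hidden Prelude name;
-- the original is still reachable through the qualifier P.
lookup :: Int -> [(Int, String)] -> String
lookup k table = maybe "?" id (P.lookup k table)
```

The fixity declarations are placed directly after the imports, matching the order [impdecls ;] [[fixdecls ;] topdecls [;]] in the body production.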

topdecls -> topdecl1 ; ... ; topdecln   (n>=0)
topdecl -> type simpletype = type
| data [context =>] simpletype = constrs [deriving]
| newtype [context =>] simpletype = con atype [deriving]
| class [context =>] simpleclass [where { cbody [;] }]
| instance [context =>] qtycls inst [where { valdefs [;] }]
| default (type1 , ... , typen)   (n>=0)
| decl
decls -> decl1 ; ... ; decln   (n>=0)
decl -> signdecl
| valdef
decllist -> { decls [;] }
signdecl -> vars :: [context =>] type
vars -> var1 , ..., varn (n>=1)

type -> btype [-> type] (function type)
btype -> [btype] atype (type application)
atype -> gtycon
| tyvar
| ( type1 , ... , typek ) (tuple type, k>=2)
| [ type ] (list type)
| ( type ) (parenthesized constructor)
gtycon -> qtycon
| () (unit type)
| [] (list constructor)
| (->) (function constructor)
| (,{,}) (tupling constructors)
context -> class
| ( class1 , ... , classn ) (n>=1)
class -> qtycls tyvar
simpletype -> tycon tyvar1 ... tyvark (k>=0)
constrs -> constr1 | ... | constrn (n>=1)
constr -> con [!] atype1 ... [!] atypek (arity con = k, k>=0)
| (btype | ! atype) conop (btype | ! atype) (infix conop)
| con { fielddecl1 , ... , fielddecln } (n>=1)
fielddecl -> vars :: (type | ! atype)
deriving -> deriving (dclass | (dclass1, ... , dclassn)) (n>=0)
dclass -> qtycls
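
The declaration forms above correspond to source text such as the following. The type and constructor names are hypothetical and chosen only to exercise one production each.

```haskell
-- type simpletype = type
type Name = String

-- newtype: exactly one constructor applied to one atype
newtype Age = Age Int deriving (Eq, Show)

-- constr -> con [!] atype1 ... ; the ! marks a strict field
data Shape = Circle !Double | Rect Double Double deriving (Eq, Show)

-- infix constructor operator (a consym, so it begins with a colon)
data Pair a b = a :*: b deriving (Eq, Show)

-- labeled fields; deriving a single class needs no parentheses
data Person = Person { name :: Name, age :: !Age } deriving Show

-- In a type, -> associates to the right, so the parentheses around
-- the first argument are required here.
applyTwice :: (a -> a) -> a -> a
applyTwice g = g . g
```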

simpleclass -> tycls tyvar
cbody -> [ cmethods [ ; cdefaults ] ]
cmethods -> signdecl1 ; ... ; signdecln (n >= 1)
cdefaults -> valdef1 ; ... ; valdefn (n >= 1)

inst -> gtycon
| ( gtycon tyvar1 ... tyvark ) (k>=0, tyvars distinct)
| ( tyvar1 , ... , tyvark ) (k>=2, tyvars distinct)
| [ tyvar ]
| ( tyvar1 -> tyvar2 ) (tyvar1 and tyvar2 distinct)
valdefs -> valdef1 ; ... ; valdefn (n>=0)
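
A class body lists its method signatures first and then any default definitions, and an instance head takes one of the inst forms. A minimal sketch, with hypothetical class and type names:

```haskell
-- cbody: cmethods (signatures) followed by cdefaults (definitions).
class Shape a where
  area     :: a -> Double
  describe :: a -> String
  describe x = "shape with area " ++ show (area x)

newtype Square = Square Double

-- inst -> gtycon: the instance type is a bare type constructor.
instance Shape Square where
  area (Square s) = s * s

-- inst -> [ tyvar ], with a context on the type variable.
instance Shape a => Shape [a] where
  area = sum . map area
```

Neither instance defines describe, so both fall back to the default method given in the class body.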

valdef -> lhs = exp [where decllist]
| lhs gdrhs [where decllist]
lhs -> pat0
| funlhs
funlhs -> var apat {apat }
| pati+1 varop(a,i) pati+1
| lpati varop(l,i) pati+1
| pati+1 varop(r,i) rpati
gdrhs -> gd = exp [gdrhs]
gd -> | exp0
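
Each alternative for valdef and lhs has a familiar source form. The definitions below are a sketch with hypothetical names, one per alternative.

```haskell
-- lhs = exp [where decllist], with funlhs -> var apat {apat}
hypot :: Double -> Double -> Double
hypot a b = sqrt (sq a + sq b)
  where sq v = v * v

-- lhs gdrhs: each guard has the form  | exp0 = exp
classify :: Int -> String
classify n
  | n < 0     = "negative"
  | n == 0    = "zero"
  | otherwise = "positive"

-- infix funlhs: pattern, variable operator, pattern
(+++) :: [a] -> [a] -> [a]
[]       +++ ys = ys
(x : xs) +++ ys = x : (xs +++ ys)

-- lhs -> pat0: a pattern binding defines several variables at once
quotient, remainder :: Int
(quotient, remainder) = 17 `divMod` 5
```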

exp -> exp0 :: [context =>] type (expression type signature)
| exp0
expi -> expi+1 [qop(n,i) expi+1]
| lexpi
| rexpi
lexpi -> (lexpi | expi+1) qop(l,i) expi+1
lexp6 -> - exp7
rexpi -> expi+1 qop(r,i) (rexpi | expi+1)
exp10 -> \ apat1 ... apatn -> exp (lambda abstraction, n>=1)
| let decllist in exp (let expression)
| if exp then exp else exp (conditional)
| case exp of { alts [;] } (case expression)
| do { stmts [;] } (do expression)
| fexp
fexp -> [fexp] aexp (function application)

aexp -> qvar (variable)
| gcon (general constructor)
| literal
| ( exp ) (parenthesized expression)
| ( exp1 , ... , expk ) (tuple, k>=2)
| [ exp1 , ... , expk ] (list, k>=1)
| [ exp1 [, exp2] .. [exp3] ] (arithmetic sequence)
| [ exp | qual1 , ... , qualn ] (list comprehension, n>=1)
| ( expi+1 qop(a,i) ) (left section)
| ( qop(a,i) expi+1 ) (right section)
| qcon { fbind1 , ... , fbindn } (labeled construction, n>=0)
| aexp<qcon> { fbind1 , ... , fbindn } (labeled update, n >= 1)
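
Several of the aexp alternatives, together with the prefix-minus production lexp6 -> - exp7, are shown in the following sketch. All names are hypothetical.

```haskell
data Point = Point { px :: Int, py :: Int } deriving (Eq, Show)

origin :: Point
origin = Point { px = 0, py = 0 }        -- labeled construction

moved :: Point
moved = origin { px = 10 }               -- labeled update

halves, powersOfTwo :: [Int]
halves      = map (`div` 2) [10, 20, 30] -- right section
powersOfTwo = map (2 ^) [0, 1, 2, 3]     -- left section

odds, oddSquares :: [Int]
odds       = [1, 3 .. 9]                 -- arithmetic sequence
oddSquares = [x * x | x <- odds, x > 3]  -- list comprehension

-- Prefix minus sits at precedence level 6, so it binds less tightly
-- than ^ (level 8): this is negate (2 ^ 2), not (negate 2) ^ 2.
negSquare :: Int
negSquare = - 2 ^ 2
```

The labeled update applies to the expression origin, which is not a constructor name; an expression of the form Point { px = 10 } would instead be a labeled construction.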

qual -> pat <- exp
| let decllist
| exp
alts -> alt1 ; ... ; altn (n>=1)
alt -> pat -> exp [where decllist]
| pat gdpat [where decllist]
gdpat -> gd -> exp [ gdpat ]
stmts -> exp [; stmts]
| pat <- exp ; stmts
| let decllist ; stmts
fbinds -> { fbind1 , ... , fbindn } (n>=0)
fbind -> var | qvar = exp
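
The statement, alternative, and qualifier productions look like this in practice. This is a sketch with hypothetical names; the do expression runs in the list monad only to keep the example free of input and output.

```haskell
-- stmts: a generator, a let statement, and a final expression.
pairsUpTo :: Int -> [(Int, Int)]
pairsUpTo n = do
  x <- [1 .. n]
  let y = x * x
  return (x, y)

-- alts with guards: gdpat uses -> where the guards of a valdef use =.
-- The where clause scopes over the guards of one alternative.
describe :: Maybe Int -> String
describe m = case m of
  Nothing -> "none"
  Just k
    | k < limit -> "small"
    | otherwise -> "large"
    where limit = 10

-- quals: a generator, a local let, and a boolean filter.
evenSquares :: [Int]
evenSquares = [sq | x <- [1 .. 6], let sq = x * x, even sq]
```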

pat -> var + integer (successor pattern)
| pat0
pati -> pati+1 [qconop(n,i) pati+1]
| lpati
| rpati
lpati -> (lpati | pati+1) qconop(l,i) pati+1
lpat6 -> - (integer | float) (negative literal)
rpati -> pati+1 qconop(r,i) (rpati | pati+1)
pat10 -> apat
| gcon apat1 ... apatk (arity gcon = k, k>=1)

apat -> var [@ apat] (as pattern)
| gcon (arity gcon = 0)
| qcon { fpat1 , ... , fpatk } (labeled pattern, k>=0)
| literal
| _ (wildcard)
| ( pat ) (parenthesized pattern)
| ( pat1 , ... , patk ) (tuple pattern, k>=2)
| [ pat1 , ... , patk ] (list pattern, k>=1)
| ~ apat (irrefutable pattern)
fpat -> var = pat
| var
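
Most of the pattern forms can be seen in a few short functions. The names below are hypothetical. The successor pattern var + integer is left out of this sketch because Haskell 2010 dropped it from the language, and GHC accepts it only with the NPlusKPatterns extension.

```haskell
data Point = Point { px :: Int, py :: Int }

-- as pattern: name the whole list while also matching its parts
dupHead :: [a] -> [a]
dupHead whole@(x : _) = x : whole
dupHead []            = []

-- labeled pattern containing a literal pattern, then a wildcard
onYAxis :: Point -> Bool
onYAxis Point { px = 0 } = True
onYAxis _                = False

-- negative literal pattern; as an argument it needs parentheses
isMinusOne :: Int -> Bool
isMinusOne (-1) = True
isMinusOne _    = False

-- irrefutable pattern: the match succeeds without evaluating the
-- argument, so even an undefined pair is accepted
constZero :: (Int, Int) -> Int
constZero ~(_, _) = 0

-- tuple pattern with a nested list pattern
firstTwoSum :: ([Int], Int) -> Int
firstTwoSum ([a, b], c) = a + b + c
firstTwoSum (_, c)      = c
```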

gcon -> ()
| []
| (,{,})
| qcon
var -> varid | ( varsym ) (variable)
qvar -> qvarid | ( qvarsym ) (qualified variable)
con -> conid | ( consym ) (constructor)
qcon -> qconid | ( qconsym ) (qualified constructor)
varop -> varsym | `varid` (variable operator)
qvarop -> qvarsym | `qvarid` (qualified variable operator)
conop -> consym | `conid` (constructor operator)
qconop -> qconsym | `qconid` (qualified constructor operator)
op -> varop | conop (operator)
qop -> qvarop | qconop (qualified operator)
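
The last group of productions converts between identifier and operator forms. A brief sketch, with a hypothetical Pair type:

```haskell
-- var -> ( varsym ): a symbol in parentheses is used like an identifier
three :: Int
three = (+) 1 2

-- varop -> `varid`: an identifier in backquotes is used like an operator
quot3 :: Int
quot3 = 10 `div` 3

-- con -> ( consym ) and conop -> `conid` do the same for constructors
data Pair = Int :# Int | Pair Int Int deriving (Eq, Show)

p1, p2 :: Pair
p1 = (:#) 1 2
p2 = 1 `Pair` 2

-- gcon also covers the special constructors (), [], and (,{,})
triple :: (Int, Char, Bool)
triple = (,,) 1 'x' True
```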


March 27, 1997