The Haskell 1.4 Report: Expressions

3 Expressions

In this section, we describe the syntax and informal semantics of Haskell expressions, including their translations into the Haskell kernel, where appropriate. Except in the case of let expressions, these translations preserve both the static and dynamic semantics. Some of the names and symbols used in the syntax are not reserved. These are indicated by the `special' productions in the lexical syntax. Examples include ! (used only in data declarations) and as (used in import declarations).

Free variables and constructors used in these translations refer to entities defined by the Prelude. To avoid clutter, we use True instead of Prelude.True or map instead of Prelude.map. (Prelude.True is a qualified name as described in Section 5.1.2.)

In the syntax that follows, there are some families of nonterminals indexed by precedence levels (written as a superscript). Similarly, the nonterminals op, varop, and conop may have a double index: a letter l, r, or n for left-, right- or non-associativity and a precedence level. A precedence-level variable i ranges from 0 to 9; an associativity variable a varies over {l, r, n}. Thus, for example

aexp -> ( exp_i+1 qop_(a,i) )
actually stands for 30 productions, with 10 substitutions for i and 3 for a.

exp -> exp₀ :: [context =>] type (expression type signature)
| exp₀
exp_i -> exp_i+1 [qop_(n,i) exp_i+1]
| lexp_i
| rexp_i
lexp_i -> (lexp_i | exp_i+1) qop_(l,i) exp_i+1
lexp₆ -> - exp₇
rexp_i -> exp_i+1 qop_(r,i) (rexp_i | exp_i+1)
exp₁₀ -> \ apat₁ ... apat_n -> exp (lambda abstraction, n>=1)
| let decllist in exp (let expression)
| if exp then exp else exp (conditional)
| case exp of { alts [;] } (case expression)
| do { stmts [;] } (do expression)
| fexp
fexp -> [fexp] aexp (function application)
aexp -> qvar (variable)
| gcon (general constructor)
| literal
| ( exp ) (parenthesized expression)
| ( exp₁ , ... , exp_k ) (tuple, k>=2)
| [ exp₁ , ... , exp_k ] (list, k>=1)
| [ exp₁ [, exp₂] .. [exp₃] ] (arithmetic sequence)
| [ exp | qual₁ , ... , qual_n ] (list comprehension, n>=1)
| ( exp_i+1 qop_(a,i) ) (left section)
| ( qop_(a,i) exp_i+1 ) (right section)
| qcon { fbind₁ , ... , fbind_n } (labeled construction, n>=0)
| aexp_<qcon> { fbind₁ , ... , fbind_n } (labeled update, n>=1)

As an aid to understanding this grammar, Table 1 shows the relative precedence of expressions, patterns and definitions, plus an extended associativity. An entry of -- indicates that the item is non-associative.

Item Associativity

simple terms, parenthesized terms --
irrefutable patterns (~) --
as-patterns (@) right
function application left
do, if, let, lambda(\), case (leftwards) right
case (rightwards) right

infix operators, prec. 9 as defined
... ...
infix operators, prec. 0 as defined

function types (->) right
contexts (=>) --
type constraints (::) --
do, if, let, lambda(\) (rightwards) right
sequences (..) --
generators (<-) --
grouping (,) n-ary
guards (|) --
case alternatives (->) --
definitions (=) --
separation (;) n-ary

Table 1

Precedence of expressions, patterns, definitions (highest to lowest)

The grammar is ambiguous regarding the extent of lambda abstractions, let expressions, and conditionals. The ambiguity is resolved by the metarule that each of these constructs extends as far to the right as possible. As a consequence, each of these constructs has two precedences, one to its left, which is the precedence used in the grammar; and one to its right, which is obtained via the metarule. See the sample parses below.

Expressions involving infix operators are disambiguated by the operator's fixity (see Section 5.6). Consecutive unparenthesized operators with the same precedence must both be either left or right associative to avoid a syntax error. Given an unparenthesized expression "x qop_(a,i) y qop_(b,j) z", parentheses must be added around either "x qop_(a,i) y" or "y qop_(b,j) z" when i=j unless a=b=l or a=b=r.
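
The rule for consecutive operators of equal precedence can be sketched as follows. The operators |-| and |^| are hypothetical names introduced only for this illustration; both simply perform subtraction, so that the grouping chosen by the parser is visible in the result.

```haskell
-- Hypothetical operators, not part of the Prelude.
infixl 6 |-|   -- left-associative, precedence 6
infixr 6 |^|   -- right-associative, precedence 6

(|-|) :: Int -> Int -> Int
x |-| y = x - y

(|^|) :: Int -> Int -> Int
x |^| y = x - y

leftGrouped :: Int
leftGrouped = 10 |-| 3 |-| 2        -- parses as (10 |-| 3) |-| 2

rightGrouped :: Int
rightGrouped = 10 |^| 3 |^| 2       -- parses as 10 |^| (3 |^| 2)

-- Mixing the two at the same precedence requires explicit parentheses;
-- the unparenthesized form 10 |-| 3 |^| 2 is a syntax error.
mixed :: Int
mixed = (10 |-| 3) |^| 2
```

Because a=l for one operator and a=r for the other, neither "a=b=l" nor "a=b=r" holds, which is why the last definition needs parentheses.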

Negation is the only prefix operator in Haskell; it has the same precedence as the infix - operator defined in the Prelude (see Figure 2).

The separation of function arrows from case alternatives solves the ambiguity that otherwise arises when an unparenthesized function type is used in an expression, such as the guard in a case expression.

Sample parses are shown below.

This                                Parses as
f x + g y                           (f x) + (g y)
- f x + y                           (- (f x)) + y
let { ... } in x + y                let { ... } in (x + y)
z + let { ... } in x + y            z + (let { ... } in (x + y))
f x y :: Int                        (f x y) :: Int
\ x -> a+b :: Int                   \ x -> ((a+b) :: Int)
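
These parses can be checked informally by pairing each expression with its fully parenthesized form and comparing values. This is only a sketch: f, g, and the concrete numbers are arbitrary choices made for the illustration, and equal values are evidence for, not proof of, the stated parse.

```haskell
f, g :: Int -> Int
f = (* 2)
g = (+ 1)

-- Each pair holds an expression and its explicitly parenthesized parse.
sampleParses :: [(Int, Int)]
sampleParses =
  [ (f 3 + g 4,                  (f 3) + (g 4))
  , (- f 3 + 1,                  (- (f 3)) + 1)
  , (let { a = 1 } in a + 2,     let { a = 1 } in (a + 2))
  , (5 + let { a = 1 } in a + 2, 5 + (let { a = 1 } in (a + 2)))
  , ((\ x -> x + 1 :: Int) 2,    (\ x -> ((x + 1) :: Int)) 2)
  ]

-- True when every expression agrees with its parenthesized form.
parsesAgree :: Bool
parsesAgree = all (uncurry (==)) sampleParses
```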

For the sake of clarity, the rest of this section shows the syntax of expressions without their precedences.

3.1 Errors

Errors during expression evaluation, denoted by _|_, are indistinguishable from non-termination. Since Haskell is a lazy language, all Haskell types include _|_. That is, a value of any type may be bound to a computation that, when demanded, results in an error. When evaluated, errors cause immediate program termination and cannot be caught by the user. The Prelude provides two functions to directly cause such errors:

error :: String -> a
undefined :: a

A call to error terminates execution of the program and returns an appropriate error indication to the operating system. It should also display the string in some system-dependent manner. When undefined is used, the error message is created by the compiler.

Translations of Haskell expressions use error and undefined to explicitly indicate where execution time errors may occur. The actual program behavior when an error occurs is up to the implementation. The messages passed to the error function in these translations are only suggestions; implementations may choose to display more or less information when an error occurs.
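
The point that a value may be bound to an erroring computation without that error being raised can be illustrated with a small sketch. The names below are hypothetical; the example relies only on error, undefined, fst, and length from the Prelude.

```haskell
-- A pair whose second component is an error; building it evaluates nothing.
lazyPair :: (Int, Int)
lazyPair = (1, error "second component demanded")

-- Only the first component is demanded, so the error is never raised.
firstOnly :: Int
firstOnly = fst lazyPair

-- length inspects the list structure but not the elements.
spineOnly :: Int
spineOnly = length [undefined, undefined :: Int]
```

Demanding snd lazyPair, by contrast, would terminate the program with the given message.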

3.2 Variables, Constructors, and Operators

aexp	`->`	qvar	(variable)
	`\|`	gcon	(general constructor)
	`\|`	literal


gcon	`->`	`()`
	`\|`	`[]`
	`\|`	`(,`{`,`}`)`
	`\|`	qcon
qvar	`->`	qvarid \| `(` qvarsym `)`	(qualified variable)
qcon	`->`	qconid \| `(` qconsym `)`	(qualified constructor)

Alphanumeric operators are formed by enclosing an identifier between grave accents (backquotes). Any variable or constructor may be used as an operator in this way. If fun is an identifier (either variable or constructor), then an expression of the form fun x y is equivalent to x `fun` y. If no fixity declaration is given for `fun` then it defaults to highest precedence and left associativity (see Section 5.6).

Similarly, any symbolic operator may be used as a (curried) variable or constructor by enclosing it in parentheses. If op is an infix operator, then an expression or pattern of the form x op y is equivalent to (op) x y.
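
A brief sketch of both conversions follows. Pair and minus are hypothetical names introduced for the example; minus has no fixity declaration, so the default (highest precedence, left associative) applies to it.

```haskell
-- Hypothetical constructor used to show that constructors work the same way.
data Pair = Pair Int Int deriving (Eq, Show)

backquoted :: Bool
backquoted = (7 `div` 2) == div 7 (2 :: Int)

parenthesized :: Bool
parenthesized = (3 + 4) == (+) 3 (4 :: Int)

infixConstructor :: Bool
infixConstructor = (1 `Pair` 2) == Pair 1 2

-- No fixity declaration is given for minus.
minus :: Int -> Int -> Int
minus = (-)

-- Parses as ((10 `minus` 3) `minus` 2) * 2, since `minus` binds tighter than *.
defaultFixity :: Int
defaultFixity = 10 `minus` 3 `minus` 2 * 2
```

With the ordinary - operator, the same tokens group differently: 10 - 3 - 2 * 2 parses as (10 - 3) - (2 * 2).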

Qualified names may only be used to reference an imported variable or constructor (see Section 5.1.2) but not in the definition of a new variable or constructor. Thus

let F.x = 1 in F.x -- invalid

incorrectly uses a qualifier in the definition of x, regardless of the module containing this definition. Qualification does not affect the nature of an operator: F.+ is an infix operator just as + is.

Special syntax is used to name some constructors for some of the built-in types, as found in the production for gcon and literal. These are described in Section 6.1.

An integer literal represents the application of the function fromInteger to the appropriate value of type Integer. Similarly, a floating point literal stands for an application of fromRational to a value of type Rational (that is, Ratio Integer).

Translation:
The integer literal i is equivalent to fromInteger i, where fromInteger is a method in class Num (see Section 6.3.1).
The floating point literal f is equivalent to fromRational (n Ratio.% d), where fromRational is a method in class Fractional and Ratio.% constructs a rational from two integers, as defined in the Ratio library. The integers n and d are chosen so that n/d = f.
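
The literal translation can be sketched as below. One assumption is made loudly here: the example imports % from Data.Ratio, the name under which current implementations provide the Ratio library; the remaining names are hypothetical.

```haskell
import Data.Ratio ((%))   -- assumed modern home of the Ratio library's %

-- The literal 3 at type Double agrees with fromInteger applied to an Integer.
intLit :: Bool
intLit = (3 :: Double) == fromInteger (3 :: Integer)

-- The literal 2.5 agrees with fromRational (5 % 2), since 5/2 = 2.5.
floatLit :: Bool
floatLit = (2.5 :: Double) == fromRational (5 % 2)

-- The same literal takes whatever numeric type its context requires.
threeWays :: (Int, Double, Rational)
threeWays = (3, 3, 3)
```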

3.3 Curried Applications and Lambda Abstractions

fexp	`->`	[fexp] aexp	(function application)
exp	`->`	`\` apat₁ ... apat_n `->` exp

Function application is written e₁ e₂. Application associates to the left, so the parentheses may be omitted in (f x) y. Because e₁ could be a data constructor, partial applications of data constructors are allowed.

Lambda abstractions are written \ p₁ ... p_n -> e, where the p_i are patterns. An expression such as \x:xs->x is syntactically incorrect, and must be rewritten as \(x:xs)->x.

The set of patterns must be linear---no variable may appear more than once in the set.

Translation:
The lambda abstraction \ p₁ ... p_n -> e is equivalent to
\ x₁ ... x_n -> case (x₁, ..., x_n) of (p₁, ..., p_n) -> e
where the x_i are new identifiers. Given this translation combined with the semantics of case expressions and pattern matching described in Section 3.17.3, if the pattern fails to match, then the result is _|_.
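
As a sketch of this translation, the two definitions below are intended to behave identically. The names direct, translated, v1, and v2 are chosen for the illustration only.

```haskell
-- A lambda abstraction with two (refutable) argument patterns.
direct :: (Int, Int) -> [Int] -> Int
direct = \ (a, b) (x : _) -> a + b + x

-- Its translation: fresh variables, then one case over a tuple of them.
translated :: (Int, Int) -> [Int] -> Int
translated = \ v1 v2 -> case (v1, v2) of ((a, b), (x : _)) -> a + b + x
```

Applying either function to an empty list as the second argument makes the pattern fail to match, so the result is _|_.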

3.4 Operator Applications

exp	`->`	exp₁ qop exp₂
	`\|`	`-` exp	(prefix negation)

The form e₁ qop e₂ is the infix application of binary operator qop to expressions e₁ and e₂.

The special form -e denotes prefix negation, the only prefix operator in Haskell, and is syntax for negate (e). The binary - operator does not necessarily refer to the definition of - in the Prelude; it may be rebound by the module system. However, unary - will always refer to the negate function defined in the Prelude. There is no link between the local meaning of the - operator and unary negation.

Prefix negation has the same precedence as the infix operator - defined in the Prelude (see Table 2). Because e1-e2 parses as an infix application of the binary operator -, one must write e1(-e2) for the alternative parsing. Similarly, (-) is syntax for (\ x y -> x-y), as with any infix operator, and does not denote (\ x -> -x)---one must use negate for that.

Translation:
e₁ op e₂ is equivalent to (op) e₁ e₂. -e is equivalent to negate (e).
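
The distinction between prefix negation and the binary operator can be sketched with a few definitions (all names hypothetical):

```haskell
-- Prefix negation shares the precedence of infix -, so this is (negate 5) + 2.
negated :: Int
negated = - 5 + 2

-- (-) denotes the binary operator, here applied to 10 and 3.
binaryMinus :: Int
binaryMinus = (-) 10 3

-- A negated argument must be parenthesized; abs -3 would parse as abs - 3.
applied :: Int
applied = abs (-3)

-- negate is the function to use when unary negation is needed as a value.
unary :: [Int]
unary = map negate [1, 2, 3]
```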

3.5 Sections

aexp	`->`	`(` exp qop `)`
	`\|`	`(` qop exp `)`

Sections are written as ( op e ) or ( e op ), where op is a binary operator and e is an expression. Sections are a convenient syntax for partial application of binary operators.

The normal rules of syntactic precedence apply to sections; for example, (*a+b) is syntactically invalid, but (+a*b) and (*(a+b)) are valid. Syntactic associativity, however, is not taken into account in sections; thus, (a+b+) must be written ((a+b)+).

Because - is treated specially in the grammar, (- exp) is not a section, but an application of prefix negation, as described in the preceding section. However, there is a subtract function defined in the Prelude such that (subtract exp) is equivalent to the disallowed section. The expression (+ (- exp)) can serve the same purpose.

Translation:
For binary operator op and expression e, if x is a variable that does not occur free in e, the section (op e) is equivalent to \ x -> x op e, and the section (e op) is equivalent to (op) e.
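
The following sketch collects the cases discussed in this section; the names are hypothetical and the numeric choices arbitrary.

```haskell
-- Right section: \ x -> x / 2
halve :: Double -> Double
halve = (/ 2)

-- Left section: (/) 2
twoOver :: Double -> Double
twoOver = (2 /)

-- Normal precedence applies: this is \ x -> x + (2 * 3).
plusProduct :: Int -> Int
plusProduct = (+ 2 * 3)

-- (- 1) is prefix negation, not a section.
minusOne :: Int
minusOne = (- 1)

-- Two ways to express the disallowed section "subtract one".
decrement, decrement' :: Int -> Int
decrement  = subtract 1
decrement' = (+ (- 1))
```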

3.6 Conditionals

exp -> if exp₁ then exp₂ else exp₃

A conditional expression has the form if e₁ then e₂ else e₃ and returns the value of e₂ if the value of e₁ is True, e₃ if e₁ is False, and _|_ otherwise.

Translation:
if e₁ then e₂ else e₃ is equivalent to:
case e₁ of { True -> e₂ ; False -> e₃ }
where True and False are the two nullary constructors from the type Bool, as defined in the Prelude.

3.7 Lists

aexp -> [ exp₁ , ... , exp_k ] (k>=1)

Lists are written [e₁, ..., e_k], where k>=1; the empty list is written []. Standard operations on lists are given in the Prelude (see Appendix A, notably Section A.1).

Translation:
[e₁, ..., e_k] is equivalent to
e₁ : (e₂ : ( ... (e_k : [])))
where : and [] are constructors for lists, as defined in the Prelude (see Section 6.1.3). The types of e₁ through e_k must all be the same (call it t), and the type of the overall expression is [t] (see Section 4.1.1).

3.8 Tuples

aexp -> ( exp₁ , ... , exp_k ) (k>=2)

Tuples are written (e₁, ..., e_k), and may be of arbitrary length k>=2. Standard operations on tuples are given in the Prelude (see Appendix A).

Translation:
(e₁, ..., e_k) for k>=2 is an instance of a k-tuple as defined in the Prelude, and requires no translation. If t₁ through t_k are the types of e₁ through e_k, respectively, then the type of the resulting tuple is (t₁, ..., t_k) (see Section 4.1.1).

3.9 Unit Expressions and Parenthesized Expressions

aexp	`->`	`()`
	`\|`	`(` exp `)`

The form (e) is simply a parenthesized expression, and is equivalent to e. The unit expression () has type () (see Section 4.1.1); it is the only member of that type apart from _|_ (it can be thought of as the "nullary tuple")---see Section 6.1.5.

Translation:

(e) is equivalent to e.

3.10 Arithmetic Sequences

aexp -> [ exp₁ [, exp₂] .. [exp₃] ]

The form [e₁, e₂ .. e₃] denotes an arithmetic sequence from e₁ in increments of e₂-e₁ of values not greater than e₃ (if the increment is nonnegative) or not less than e₃ (if the increment is negative). Thus, the resulting list is empty if the increment is nonnegative and e₃ is less than e₁ or if the increment is negative and e₃ is greater than e₁. If the increment is zero, an infinite list of e₁s results if e₃ is not less than e₁. If e₃ is omitted, the result is an infinite list, unless the element type is finite, in which case the implied limit is the greatest value of the type if the increment is nonnegative, or the least value, otherwise.

The forms [e₁.. e₃] and [e₁..] are similar to those above, but with an implied increment of one.

Arithmetic sequences may be defined over any type in class Enum, including Char, Int, and Integer (see Figure 5 and Section 4.3.3). For example, ['a'..'z'] denotes the list of lowercase letters in alphabetical order.

Translation:
Arithmetic sequences satisfy these identities:

[ e₁.. ] = enumFrom e₁
[ e₁,e₂.. ] = enumFromThen e₁ e₂
[ e₁..e₃ ] = enumFromTo e₁ e₃
[ e₁,e₂..e₃ ] = enumFromThenTo e₁ e₂ e₃

where enumFrom, enumFromThen, enumFromTo, and enumFromThenTo are class methods in the class Enum as defined in the Prelude (see Figure 5).
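
These identities, together with a few of the boundary cases described above, can be checked with a short sketch at type Int and Char. The names are hypothetical; infinite sequences are truncated with take so that the comparisons terminate.

```haskell
-- The four identities, instantiated at Int.
identities :: Bool
identities = and
  [ take 3 [1 ..]    == take 3 (enumFrom (1 :: Int))
  , take 3 [1, 3 ..] == take 3 (enumFromThen 1 (3 :: Int))
  , [1 .. 5]         == enumFromTo 1 (5 :: Int)
  , [1, 3 .. 9]      == enumFromThenTo 1 3 (9 :: Int)
  ]

-- Negative increment: values not less than the limit.
descending :: [Int]
descending = [5, 4 .. 1]

-- Nonnegative (implied) increment with a limit below the start: empty.
emptyRange :: [Int]
emptyRange = [5 .. 1]

-- Sequences over another Enum type.
letters :: String
letters = ['a' .. 'e']
```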

3.11 List Comprehensions

aexp	`->`	`[` exp `\|` qual₁ `,` ... `,` qual_n `]`	(list comprehension, n>=1)
qual	`->`	pat `<-` exp
	`\|`	`let` decllist
	`\|`	exp

A list comprehension has the form [ e | q₁, ..., q_n ], n>=1, where the q_i qualifiers are either

generators of the form p <- e, where p is a pattern (see Section 3.17) of type t and e is an expression of type Monad m => m t
guards, which are arbitrary expressions of type Bool
local bindings that provide new definitions for use in the generated expression e or subsequent guards and generators.

While list comprehensions are commonly used to generate lists, the definition of list comprehensions uses monadic operations that can be used with other types besides lists. This syntax provides a slightly more concise way of expressing some forms of do expressions (see Section 3.14). For simplicity, we will describe this construct only as it applies to lists.

Such a list comprehension returns the list of elements produced by evaluating e in the successive environments created by the nested, depth-first evaluation of the generators in the qualifier list. Binding of variables occurs according to the normal pattern matching rules (see Section 3.17), and if a match fails then that element of the list is simply skipped over. Thus:

[ x | xs <- [ [(1,2),(3,4)], [(5,4),(3,2)] ], (3,x) <- xs ]

yields the list [4,2]. If a qualifier is a guard, it must evaluate to True for the previous pattern match to succeed. As usual, bindings in list comprehensions can shadow those in outer scopes; for example:

[ x | x <- x, x <- x ] = [ z | y <- x, z <- y]

Translation:
List comprehensions satisfy these identities, which may be used as a translation into the kernel:

[ e | q₁ ...q_n ] = do {T(q₁);...;T(q_n); return e}
T(b) = guard b
T(let decllist) = let decllist
T(p <- l) = p <- l

where e ranges over expressions, p ranges over patterns, l ranges over list-valued expressions, and b ranges over boolean expressions. The return and guard functions are as defined in the Prelude.
As indicated by the translation of list comprehensions, variables bound by let have fully polymorphic types while those defined by <- are lambda bound and are thus monomorphic (see Section 4.5.4).
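
The translation can be sketched for lists as follows. Two assumptions apply: the example imports guard from Control.Monad, where current implementations place it, rather than relying on the Prelude; and a failed pattern match in a do expression over lists is assumed to yield the empty list, as in current compilers. All other names are hypothetical.

```haskell
import Control.Monad (guard)   -- assumed modern location of guard

pairLists :: [[(Int, Int)]]
pairLists = [ [(1, 2), (3, 4)], [(5, 4), (3, 2)] ]

-- The example from the text: elements not matching (3,x) are skipped.
viaComprehension :: [Int]
viaComprehension = [ x | xs <- pairLists, (3, x) <- xs ]

-- The same computation written as a do expression.
viaDo :: [Int]
viaDo = do { xs <- pairLists; (3, x) <- xs; return x }

-- A guard and a local binding as qualifiers ...
evensSquared :: [Int]
evensSquared = [ sq | n <- [1 .. 6], even n, let sq = n * n ]

-- ... and their translation via T(b) = guard b and T(let ...) = let ...
evensSquaredDo :: [Int]
evensSquaredDo =
  do { n <- [1 .. 6]; guard (even n); let { sq = n * n }; return sq }
```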

3.12 Let Expressions

exp -> let decllist in exp

Let expressions have the general form let { d₁ ; ... ; d_n } in e, and introduce a nested, lexically-scoped, mutually-recursive list of declarations (let is often called letrec in other languages). The scope of the declarations is the expression e and the right hand side of the declarations. Declarations are described in Section 4. Pattern bindings are matched lazily; an implicit ~ makes these patterns irrefutable. For example, let (x,y) = undefined in e does not cause an execution-time error until x or y is evaluated.

Translation:
The dynamic semantics of the expression let { d₁ ; ... ; d_n } in e₀ are captured by this translation: After removing all type signatures, each declaration d_i is translated into an equation of the form p_i = e_i, where p_i and e_i are patterns and expressions respectively, using the translation in Section 4.4.2. Once done, these identities hold, which may be used as a translation into the kernel:

let {p₁ = e₁; ...; p_n = e_n} in e₀ = let (~p₁,...,~p_n) = (e₁,...,e_n) in e₀
let p = e₁ in e₀ = case e₁ of ~p -> e₀
where no variable in p appears free in e₁
let p = e₁ in e₀ = let p = fix ( \ ~p -> e₁) in e₀

where fix is the least fixpoint operator. Note the use of the irrefutable patterns in the second and third rules. This translation does not preserve the static semantics because the use of case precludes a fully polymorphic typing of the bound variables. The static semantics of the bindings in a let expression are described in Section 4.4.2.
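
A sketch of lazy pattern bindings, mutual recursion, and the fixpoint rule follows. It assumes fix as exported by Data.Function in current implementations, standing in for the least fixpoint operator of the translation; the other names are hypothetical.

```haskell
import Data.Function (fix)   -- assumed stand-in for the fixpoint operator

-- The pattern binding is never forced, so no error occurs.
lazyBinding :: Int
lazyBinding = let (x, y) = undefined :: (Int, Int) in 0

-- Declarations in one let are mutually recursive.
mutual :: Bool
mutual =
  let { isEven n = n == 0 || isOdd (n - 1)
      ; isOdd  n = n /= 0 && isEven (n - 1) }
  in isEven (10 :: Int)

-- A recursive binding ...
onesLet :: [Int]
onesLet = let { xs = 1 : xs } in take 3 xs

-- ... and the shape given by the third rule of the translation.
onesFix :: [Int]
onesFix = let { xs = fix (\ ~ys -> 1 : ys) } in take 3 xs
```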

3.13 Case Expressions


exp	`->`	`case` exp `of` `{` alts [`;`] `}`
alts	`->`	alt₁ `;` ... `;` alt_n	(n>=1)
alt	`->`	pat `->` exp [`where` decllist]
	`\|`	pat gdpat [`where` decllist]
gdpat	`->`	gd `->` exp [ gdpat ]
gd	`->`	`\|` exp₀

A case expression has the general form

case e of { p₁ match₁ ; ... ; p_n match_n }

where each match_i is of the general form

| g_i1 -> e_i1
...
| g_{im_i} -> e_{im_i}
where decllist_i

Each alternative p_i match_i consists of a pattern p_i and its matches, match_i, which consists of pairs of guards g_ij and bodies e_ij (expressions), as well as optional bindings (decllist_i) that scope over all of the guards and expressions of the alternative. An alternative of the form

pat -> exp where decllist

is treated as shorthand for:

pat | True -> exp
where decllist

A case expression must have at least one alternative and each alternative must have at least one body. Each body must have the same type, and the type of the whole expression is that type.

A case expression is evaluated by pattern matching the expression e against the individual alternatives. The matches are tried sequentially, from top to bottom. The first successful match causes evaluation of the corresponding alternative body, in the environment of the case expression extended by the bindings created during the matching of that alternative and by the decllist_i associated with that alternative. If no match succeeds, the result is _|_. Pattern matching is described in Section 3.17, with the formal semantics of case expressions in Section 3.17.3.
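
The evaluation order described above can be sketched with two small functions (hypothetical names). The first shows guards sharing the bindings of a where clause attached to one alternative; the second shows matching falling through to the next alternative when a guard fails.

```haskell
describe :: Int -> String
describe n = case n of
  { 0 -> "zero"
  ; k | k < 0     -> "negative"
      | small     -> "small"
      | otherwise -> "large"
      -- These bindings scope over all guards and bodies of this alternative.
      where { small = k < limit ; limit = 10 }
  }

firstPositive :: [Int] -> Int
firstPositive xs = case xs of
  { (x : _) | x > 0 -> x                  -- guard fails for non-positive heads,
  ; (_ : rest)      -> firstPositive rest -- so the next alternative is tried
  ; []              -> 0
  }
```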

3.14 Do Expressions

exp	`->`	`do` `{` stmts [`;`]`}`	(do expression)
stmts	`->`	exp [`;` stmts]
	`\|`	pat `<-` exp `;` stmts
	`\|`	`let` decllist `;` stmts

A do expression provides a more readable syntax for monadic programming.

Translation:

Do expressions satisfy these identities, which may be used as a translation into the kernel:

`do {`e`}`	=	e
`do {`e`;`stmts`}`	=	e `>> do {`stmts`}`
`do {`p `<-` e`;` stmts`}`	=	e `>>= \`p `-> do {`stmts`}`
		where p is failure-free
`do {`p `<-` e`;` stmts`}`	=	`let ok` p `= do {`stmts`}`
		`ok _ = zero`
		`in` e `>>= ok`
		where p is not failure-free
`do {let` decllist`;` stmts`}`	=	`let` decllist `in do {`stmts`}`

>>, >>=, and zero are operations in the classes Monad and MonadZero, as defined in the Prelude, and ok is a new identifier not appearing in p.

A failure-free pattern is one that can only be refuted by _|_. Failure-free patterns are defined as follows:

All irrefutable patterns are failure-free (irrefutable patterns are described in Section 3.17.1).
If C is the only constructor in its type, then C p₁ ... p_n is failure-free when each of the p_i is failure-free.
If pattern p is failure-free, then the pattern v@p is failure-free.

This translation requires a monad in class MonadZero if any pattern bound by <- is not failure-free. Otherwise, only class methods from Monad are generated. Type errors resulting from patterns that are not failure-free can be corrected by using ~ to force the pattern to be failure-free.

As indicated by the translation of do, variables bound by let have fully polymorphic types while those defined by <- are lambda bound and are thus monomorphic.
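
The identities can be sketched in the Maybe monad. One caveat is stated loudly: current implementations do not provide MonadZero, so in the hand-translated version below Nothing is written where the translation uses zero, and the do form relies on the compiler's own handling of a failed match (assumed to give Nothing for Maybe). All names are hypothetical.

```haskell
-- Failure-free patterns: only >>= and let are needed.
sumDo :: Maybe Int -> Maybe Int -> Maybe Int
sumDo ma mb = do { a <- ma; b <- mb; let { s = a + b }; return s }

sumBind :: Maybe Int -> Maybe Int -> Maybe Int
sumBind ma mb = ma >>= \ a -> mb >>= \ b -> let { s = a + b } in return s

-- (x : _) is not failure-free: the list type has a second constructor.
headDo :: Maybe [Int] -> Maybe Int
headDo m = do { (x : _) <- m; return x }

-- Hand translation with an auxiliary function ok; Nothing plays the role of zero.
headOk :: Maybe [Int] -> Maybe Int
headOk m = let { ok (x : _) = return x ; ok _ = Nothing } in m >>= ok
```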

3.15 Datatypes with Field Labels

A datatype declaration may optionally include field labels for some or all of the components of the type (see Section 4.2.1). Readers unfamiliar with datatype declarations in Haskell may wish to read Section 4.2.1 first. These field labels can be used to construct, select from, and update fields in a manner that is independent of the overall structure of the datatype.

Different datatypes cannot share common field labels in the same scope. A field label can be used at most once in a constructor. Within a datatype, however, a field name can be used in more than one constructor provided the field has the same typing in all constructors.

3.15.1 Field Selection

aexp -> qvar

Field names are used as selector functions. When used as a variable, a field name serves as a function that extracts the field from an object. Selectors are top level bindings and so they may be shadowed by local variables but cannot conflict with other top level bindings of the same name. This shadowing only affects selector functions; in other record constructs, field labels cannot be confused with ordinary variables.

Translation:
A field label f introduces a selector function defined as:

f x = case x of { C₁ p₁₁ ...p_1k -> e₁ ; ... ; C_n p_n1 ...p_nk -> e_n }

where C₁ ...C_n are all the constructors of the datatype containing a field labeled with f, p_ij is y when f labels the jth component of C_i or _ otherwise, and e_i is y when some field in C_i has a label of f or undefined otherwise.
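
As an illustration, consider a hypothetical datatype Shape in which the label width belongs to only one constructor. The function widthByHand follows the shape of the translation above; it is a sketch, and an implementation may report a more specific error than undefined for the missing case.

```haskell
-- Hypothetical datatype; width and height label fields of Rect only.
data Shape
  = Circle { radius :: Double }
  | Rect   { width :: Double, height :: Double }

-- Hand-written counterpart of the selector function introduced by width.
widthByHand :: Shape -> Double
widthByHand x = case x of
  { Rect y _ -> y            -- width labels the first component of Rect
  ; Circle _ -> undefined    -- Circle has no field labeled width
  }
```

For a value built with Rect, width and widthByHand return the same component.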

3.15.2 Construction Using Field Labels

aexp	`->`	qcon `{` fbind₁ `,` ... `,` fbind_n `}`	(labeled construction, n>=0)
fbind	`->`	var \| qvar `=` exp

A constructor with labeled fields may be used to construct a value in which the components are specified by name rather than by position. Unlike the braces used in declaration lists, these are not subject to layout; the { and } characters must be explicit. (This is also true of field updates and field patterns.) Construction using field names is subject to the following constraints:

Only field labels declared with the specified constructor may be mentioned.
A field name may not be mentioned more than once.
Fields not mentioned are initialized to _|_.
When the = exp is omitted and there is a variable with the same name as the field label in scope, the field is initialized to the value of that variable.
A compile-time error occurs when any strict fields (fields whose declared types are prefixed by !) are omitted during construction. Strict fields are discussed in Section 4.2.1.

Translation:
In the binding f = v, the field f labels v. Any binding f that omits the = v is expanded to f = f.

C { bs } = C (pick_C₁ bs undefined) ...(pick_C_k bs undefined)

k is the arity of C.
The auxiliary function pick_C_i bs d is defined as follows:
If the ith component of a constructor C has the field name f, and if f=v appears in the binding list bs, then pick_C_i bs d is v. Otherwise, pick_C_i bs d is the default value d.

3.15.3 Updates Using Field Labels

aexp -> aexp_<qcon> { fbind₁ , ... , fbind_n } (labeled update, n>=1)

Values belonging to a datatype with field names may be non-destructively updated. This creates a new value in which the specified field values replace those in the existing value. Updates are restricted in the following ways:

All labels must be taken from the same datatype.
At least one constructor must define all of the labels mentioned in the update.
No label may be mentioned more than once.
An execution error occurs when the value being updated does not contain all of the specified labels.
When the = exp is omitted, the field is updated to the value of the variable in scope with the same name as the field label.

Translation:

Using the prior definition of pick,

e `{` bs `}`	=	`case` e `of`
		C₁ v₁ ... v_k₁ `->` C₁ (pick_C₁ bs v₁) ... (pick_C_k bs v_k₁)
		...
		C_j v₁ ... v_{k_j} `->` C_j (pick_C₁ bs v₁) ... (pick_C_k bs v_{k_j})
		`_ -> error "Update error"`

where {C₁,...,C_j} is the set of constructors containing all labels in bs, and k_i is the arity of C_i.

Here are some examples using labeled fields:

data T = C1 {f1,f2 :: Int}
       | C2 {f1 :: Int, f3,f4 :: Char}

Expression                          Translation
C1 {f1 = 3}                         C1' 3 undefined
C2 {f1 = 1, f4 = 'A', f3 = 'B'}     C2' 1 'B' 'A'
x {f1 = 1}                          case x of C1' _ f2    -> C1' 1 f2
                                              C2' _ f3 f4 -> C2' 1 f3 f4

The field f1 is common to both constructors in T. The constructors C1' and C2' are `hidden constructors', see the translation in Section 4.2.1. A compile-time error will result if no single constructor defines the set of field names used in an update, such as x {f2 = 1, f3 = 'x'}.
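
The datatype T can be exercised directly, as in the following sketch. The deriving clause and the names built, partial, updated1, updated2, and updateByHand are additions made for the illustration; updateByHand mirrors the translation of x {f1 = 1} using the ordinary constructors in place of the hidden ones.

```haskell
data T = C1 { f1, f2 :: Int }
       | C2 { f1 :: Int, f3, f4 :: Char }
       deriving (Eq, Show)   -- added here so results can be compared

-- Fields may be given in any order.
built :: T
built = C2 { f1 = 1, f4 = 'A', f3 = 'B' }

-- f2 is not mentioned, so it is initialized to an error value.
partial :: T
partial = C1 { f1 = 3 }

-- f1 is common to both constructors, so either kind of value can be updated.
updated1, updated2 :: T
updated1 = (C1 10 20) { f1 = 1 }
updated2 = built { f1 = 7 }

-- The update x {f1 = 1} written out as a case expression.
updateByHand :: T -> T
updateByHand x = case x of
  { C1 _ b   -> C1 1 b
  ; C2 _ c d -> C2 1 c d
  }
```

Selecting f1 from partial is safe, whereas demanding f2 from it results in an error.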

3.16 Expression Type-Signatures

exp -> exp :: [context =>] type

Expression type-signatures have the form e :: t, where e is an expression and t is a type (Section 4.1.1); they are used to type an expression explicitly and may be used to resolve ambiguous typings due to overloading (see Section 4.3.4). The value of the expression is just that of exp. As with normal type signatures (see Section 4.4.1), the declared type may be more specific than the principal type derivable from exp, but it is an error to give a type that is more general than, or not comparable to, the principal type.
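
Two typical uses are sketched below with hypothetical names: resolving an ambiguity that arises from overloading, and declaring a type more specific than the principal one.

```haskell
-- Without the signature, the result type of read is ambiguous under show.
roundTrip :: String
roundTrip = show (read "3.5" :: Double)

-- The signature selects which overloading of read is used.
parsed :: Int
parsed = read "42" :: Int

-- [] has principal type [a]; the signature gives the more specific type [Int].
specialized :: [Int]
specialized = map (+ 1) ([] :: [Int])
```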

3.17 Pattern Matching

Patterns appear in lambda abstractions, function definitions, pattern bindings, list comprehensions, do expressions, and case expressions. However, the first five of these ultimately translate into case expressions, so defining the semantics of pattern matching for case expressions is sufficient.

3.17.1 Patterns

Patterns have this syntax:

pat -> var + integer (successor pattern)
| pat₀
pat_i -> pat_i+1 [qconop_(n,i) pat_i+1]
| lpat_i
| rpat_i
lpat_i -> (lpat_i | pat_i+1) qconop_(l,i) pat_i+1
lpat₆ -> - (integer | float) (negative literal)
rpat_i -> pat_i+1 qconop_(r,i) (rpat_i | pat_i+1)
pat₁₀ -> apat
| gcon apat₁ ... apat_k (arity gcon = k, k>=1)
apat -> var [@ apat] (as pattern)
| gcon (arity gcon = 0)
| qcon { fpat₁ , ... , fpat_k } (labeled pattern, k>=0)
| literal
| _ (wildcard)
| ( pat ) (parenthesized pattern)
| ( pat₁ , ... , pat_k ) (tuple pattern, k>=2)
| [ pat₁ , ... , pat_k ] (list pattern, k>=1)
| ~ apat (irrefutable pattern)
fpat -> var = pat
| var

The arity of a constructor must match the number of sub-patterns associated with it; one cannot match against a partially-applied constructor.

All patterns must be linear---no variable may appear more than once.

Patterns of the form var@pat are called as-patterns, and allow one to use var as a name for the value being matched by pat. For example,

case e of { xs@(x:rest) -> if x==0 then rest else xs }

is equivalent to:

let { xs = e } in
  case xs of { (x:rest) -> if x==0 then rest else xs }
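
This equivalence can be made runnable as a sketch. The function names are hypothetical, and an alternative for the empty list has been added to both versions so that they are total; it is not part of the equivalence itself.

```haskell
-- Using an as-pattern to name the whole list.
dropLeadingZero :: [Int] -> [Int]
dropLeadingZero e = case e of
  { xs@(x : rest) -> if x == 0 then rest else xs
  ; []            -> []       -- added for totality
  }

-- The same function with the name introduced by let instead.
dropLeadingZero' :: [Int] -> [Int]
dropLeadingZero' e = let { xs = e } in case xs of
  { (x : rest) -> if x == 0 then rest else xs
  ; []         -> []          -- added for totality
  }
```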

Patterns of the form _ are wildcards and are useful when some part of a pattern is not referenced on the right-hand-side. It is as if an identifier not used elsewhere were put in its place. For example,

case e of { [x,_,_] -> if x==0 then True else False }

is equivalent to:

case e of { [x,y,z] -> if x==0 then True else False }

In the pattern matching rules given below we distinguish two kinds of patterns: an irrefutable pattern is: a variable, a wildcard, N apat where N is a constructor defined by newtype and apat is irrefutable (see Section 4.2.3), var@apat where apat is irrefutable, or of the form ~apat (whether or not apat is irrefutable). All other patterns are refutable.

3.17.2 Informal Semantics of Pattern Matching

Patterns are matched against values. Attempting to match a pattern can have one of three results: it may fail; it may succeed, returning a binding for each variable in the pattern; or it may diverge (i.e. return _|_). Pattern matching proceeds from left to right, and outside to inside, according to these rules:

Matching a value v against the irrefutable pattern var always succeeds and binds var to v. Similarly, matching v against the irrefutable pattern ~apat always succeeds. The free variables in apat are bound to the appropriate values if matching v against apat would otherwise succeed, and to _|_ if matching v against apat fails or diverges. (Binding does not imply evaluation.)
Matching any value against the wildcard pattern _ always succeeds and no binding is done.
Operationally, this means that no matching is done on an irrefutable pattern until one of the variables in the pattern is used. At that point the entire pattern is matched against the value, and if the match fails or diverges, so does the overall computation.
Matching a value con v against the pattern con pat, where con is a constructor defined by newtype, is equivalent to matching v against the pattern pat. That is, constructors associated with newtype serve only to change the type of a value.
Matching _|_ against a refutable pattern always diverges.
Matching a non-_|_ value can occur against three kinds of refutable patterns:
1. Matching a non-_|_ value against a pattern whose outermost component is a constructor defined by data fails if the value being matched was created by a different constructor. If the constructors are the same, the result of the match is the result of matching the sub-patterns left-to-right against the components of the data value: if all matches succeed, the overall match succeeds; the first to fail or diverge causes the overall match to fail or diverge, respectively.
2. Numeric literals are matched using the overloaded == function. The behavior of numeric patterns depends entirely on the definition of == for the type of object being matched.
3. Matching a non-_|_ value x against a pattern of the form n+k (where n is a variable and k is a positive integer literal) succeeds if x>=k, resulting in the binding of n to x-k, and fails if x<k. The behavior of n+k patterns depends entirely on the underlying definitions of >=, fromInteger, and - for the type of the object being matched.
Matching against a constructor using labeled fields is the same as matching ordinary constructor patterns except that the fields are matched in the order they are named in the field list. All fields listed must be declared by the constructor; fields may not be named more than once. Fields not named by the pattern are ignored (matched against _).
The result of matching a value v against an as-pattern var@apat is the result of matching v against apat augmented with the binding of var to v. If the match of v against apat fails or diverges, then so does the overall match.

Aside from the obvious static type constraints (for example, it is a static error to match a character against a boolean), these static class constraints hold: an integer literal pattern can only be matched against a value in the class Num and a floating literal pattern can only be matched against a value in the class Fractional. A n+k pattern can only be matched against a value in the class Integral.

Many people feel that n+k patterns should not be used. These patterns may be removed or changed in future versions of Haskell. Compilers should support a flag that disables the use of these patterns.

Here are some examples:

If the pattern [1,2] is matched against [0,_|_], then 1 fails to match against 0, and the result is a failed match. But if [1,2] is matched against [_|_,0], then attempting to match 1 against _|_ causes the match to diverge.
These examples demonstrate refutable vs. irrefutable matching:
(\ ~(x,y) -> 0) _|_    =>    0
(\  (x,y) -> 0) _|_    =>    _|_
(\ ~[x] -> 0) []    =>    0
(\ ~[x] -> x) []    =>    _|_
(\ ~[x,~(a,b)] -> x) [(0,1),_|_]    =>    (0,1)
(\ ~[x, (a,b)] -> x) [(0,1),_|_]    =>    _|_
(\  (x:xs) -> x:x:xs) _|_    =>    _|_
(\ ~(x:xs) -> x:x:xs) _|_    =>    _|_:_|_:_|_
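Some of the refutable and irrefutable matches described here can be tried out with the sketch below. It uses undefined as a stand-in for _|_, which is an assumption: undefined raises an exception that can be caught, whereas genuine non-termination cannot be observed this way. The helper diverges and the function names are hypothetical.

```haskell
import Control.Exception (SomeException (..), evaluate, try)

-- Irrefutable pattern: the argument is never inspected.
lazyPair :: (Int, Int) -> Int
lazyPair ~(_, _) = 0

-- Ordinary pattern: the argument is evaluated to find the tuple.
strictPair :: (Int, Int) -> Int
strictPair (_, _) = 0

-- The irrefutable match itself succeeds; the failure only shows
-- up when x is demanded.
lazySingleton :: [Int] -> Int
lazySingleton ~[x] = x

-- Hypothetical helper: True if forcing the value to weak head
-- normal form raises an exception.
diverges :: a -> IO Bool
diverges x = do
  r <- try (evaluate x)
  case r of
    Left (SomeException _) -> return True
    Right _                -> return False

main :: IO ()
main = do
  print (lazyPair undefined)
  diverges (strictPair undefined) >>= print
  diverges (lazySingleton []) >>= print
```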

Additional examples illustrating some of the subtleties of pattern matching may be found in Section 4.2.3.

Top level patterns in case expressions and the set of top level patterns in function or pattern bindings may have zero or more associated guards. A guard is a boolean expression that is evaluated only after all of the arguments have been successfully matched, and it must be true for the overall pattern match to succeed. The environment of the guard is the same as the right-hand-side of the case-expression alternative, function definition, or pattern binding to which it is attached.

The guard semantics have an obvious influence on the strictness characteristics of a function or case expression. In particular, an otherwise irrefutable pattern may be evaluated because of a guard. For example, in
f ~(x,y,z) [a] | a==y = 1
both a and y will be evaluated by a standard definition of ==.
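The effect of a guard on an irrefutable pattern can be sketched as follows. The second clause of f is an addition made only so that the function is total; it is not part of the example in the text. The variables x and z are renamed with a leading underscore because they are unused.

```haskell
f :: (Int, Int, Int) -> [Int] -> Int
f ~(_x, y, _z) [a] | a == y = 1
f _ _ = 0  -- added for illustration only

main :: IO ()
main = do
  -- The guard forces a and y, but the first and third components
  -- of the tuple are never evaluated.
  print (f (undefined, 2, undefined) [2])
  -- The list pattern fails before any guard runs, so the tuple is
  -- not evaluated at all.
  print (f undefined [])
  -- By contrast, f undefined [1] would raise an exception: the
  -- guard a == y forces y, and hence the tuple.
```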

(a) case e of { alts } = (\v -> case v of { alts }) e
where v is a completely new variable
(b) case v of { p₁ match₁;  ... ; p_n match_n }
=  case v of { p₁ match₁ ;
                _  -> ... case v of {
                           p_n match_n
                           _  -> error "No match" }...}
where each match_i has the form:
  | g_i,1 -> e_i,1 ; ... ; | g_{i,m_i} -> e_{i,m_i} where { decls_i }
(c) case v of { p | g₁ -> e₁ ; ...
             | g_n -> e_n where { decls }
            _      -> e' }
= case e' of
  {y -> (where y is a completely new variable)
   case v of {
         p -> let { decls } in
                if g₁ then e₁ ... else if g_n then e_n else y
         _ -> y }}
(d) case v of { ~p -> e; _ -> e' }
= (\x'₁ ... x'_n -> e₁ ) (case v of { p-> x₁ }) ... (case v of { p -> x_n})
where e₁ = e [x'₁/x₁, ..., x'_n/x_n]
x₁, ..., x_n are all the variables in p; x'₁, ..., x'_n are completely new variables
(e) case v of { x@p -> e; _ -> e' }
=  case v of { p -> ( \ x -> e ) v ; _ -> e' }
(f) case v of { _ -> e; _ -> e' } = e

Figure 3: Semantics of Case Expressions, Part 1

3.17.3 Formal Semantics of Pattern Matching

The semantics of all pattern matching constructs other than case expressions are defined by giving identities that relate those constructs to case expressions. The semantics of case expressions themselves are in turn given as a series of identities, in Figures 3--4. Any implementation should behave so that these identities hold; it is not expected that it will use them directly, since that would generate rather inefficient code.
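As an illustration of what it means for an identity to hold, the sketch below writes one expression in source form and again in the form given by rule (d) for irrefutable patterns, so the two can be compared. The function names are hypothetical, and the wildcard in each projection replaces the variable that is unused there.

```haskell
-- Source form: an irrefutable pattern in a case alternative.
viaLazyPattern :: Bool -> (Int, Int) -> Int
viaLazyPattern flag v =
  case v of
    ~(x, y) -> if flag then x + y else 0

-- Hand translation following rule (d): each variable of the
-- pattern becomes a lambda parameter whose argument is a separate
-- projection out of v.
viaTranslation :: Bool -> (Int, Int) -> Int
viaTranslation flag v =
  (\x' y' -> if flag then x' + y' else 0)
    (case v of (x, _) -> x)
    (case v of (_, y) -> y)

main :: IO ()
main = do
  print (viaLazyPattern True (1, 2), viaTranslation True (1, 2))
  -- Neither form inspects v when the body does not need x or y.
  print (viaLazyPattern False undefined, viaTranslation False undefined)
```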

(g) case v of { K p₁ ...p_n -> e; _ -> e' }
= case v of {
     K x₁ ...x_n -> case x₁ of {
                    p₁ -> ... case x_n of { p_n -> e ; _ -> e' } ...
                    _  -> e' }
     _ -> e' }
at least one of p₁, ..., p_n is not a variable; x₁, ..., x_n are new variables
(h) case v of { k -> e; _ -> e' } = if (v==k) then e else e'
(i) case v of { x -> e; _ -> e' } = case v of { x -> e }
(j) case v of { x -> e } = ( \ x -> e ) v
(k) case N v of { N p -> e; _ -> e' }
= case v of { p -> e; _ -> e' }
where N is a newtype constructor
(l) case _|_ of { N p -> e; _ -> e' } = case _|_ of { p -> e }
where N is a newtype constructor
(m) case v of { K { f₁ = p₁ , f₂ = p₂ , ... } -> e ; _ -> e' }
= case e' of {
   y ->
    case v of {
      K { f₁ = p₁ } ->
            case v of { K { f₂ = p₂ , ... } -> e ; _ -> y };
            _ -> y }}
where f₁, f₂, ... are fields of constructor K; y is a new variable
(n) case v of { K { f = p } -> e ; _ -> e' }
= case v of {
     K p₁ ... p_n -> e ; _ -> e' }
where p_i is p if f labels the ith component of K, _ otherwise
(o) case v of { K {} -> e ; _ -> e' }
= case v of {
     K _ ... _ -> e ; _ -> e' }
(p) case (K' e₁ ... e_m) of { K x₁ ... x_n -> e; _ -> e' } = e'
where K and K' are distinct data constructors of arity n and m, respectively
(q) case (K e₁ ... e_n) of { K x₁ ... x_n -> e; _ -> e' }
=  case e₁ of { x'₁ -> ...  case e_n of { x'_n -> e[x'₁/x₁ ...x'_n/x_n] }...}
where K is a constructor of arity n; x'₁ ...x'_n are completely new variables
(r) case e₀ of { x+k -> e; _ -> e' }
= if e₀ >= k then let {x' = e₀-k} in e[x'/x] else e' (x' is a new variable)

Figure 4: Semantics of Case Expressions, Part 2

In Figures 3--4: e, e' and e_i are expressions; g and g_i are boolean-valued expressions; p and p_i are patterns; v, x, and x_i are variables; K and K' are algebraic datatype (data) constructors (including tuple constructors); N is a newtype constructor; and k is a character, string, or numeric literal.

Rule (b) matches a general source-language case expression, regardless of whether it actually includes guards---if no guards are written, then True is substituted for the guards g_i,j in the match_i forms. Subsequent identities manipulate the resulting case expression into simpler and simpler forms.
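The treatment of guards can be sketched with a small case expression and a hand translation that follows rule (c). The function names and the strings are hypothetical. Note that when every guard on an alternative fails, matching falls through to the next alternative.

```haskell
-- Source form: one guarded alternative followed by a default.
describe :: Maybe Int -> String
describe m =
  case m of
    Just x | x < 0  -> "negative"
           | x == 0 -> "zero"
    _ -> "other"

-- Hand translation following rule (c): the default expression is
-- bound to a new variable y, and the guards become a chain of
-- conditionals ending in y.
describeTranslated :: Maybe Int -> String
describeTranslated m =
  case "other" of
    y ->
      case m of
        Just x -> if x < 0 then "negative"
                  else if x == 0 then "zero"
                  else y
        _ -> y

main :: IO ()
main = mapM_ (\m -> print (describe m, describeTranslated m))
             [Just (-1), Just 0, Just 5, Nothing]
```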

Rule (h) in Figure 4 involves the overloaded operator ==; it is this rule that defines the meaning of pattern matching against overloaded constants.
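The dependence of literal patterns on == can be made visible with a type whose equality is deliberately unusual. The type Mod3 and both functions are hypothetical, and the instances are written for a present-day compiler, where Num has no superclasses; this is an assumption that differs from the Haskell 1.4 class hierarchy.

```haskell
-- Hypothetical type: integers compared modulo 3.
newtype Mod3 = Mod3 Integer

instance Eq Mod3 where
  Mod3 a == Mod3 b = a `mod` 3 == b `mod` 3

instance Num Mod3 where
  Mod3 a + Mod3 b = Mod3 (a + b)
  Mod3 a * Mod3 b = Mod3 (a * b)
  negate (Mod3 a) = Mod3 (negate a)
  abs             = id
  signum (Mod3 a) = Mod3 (signum (a `mod` 3))
  fromInteger     = Mod3

-- The literal pattern 0 is matched using the == defined above.
isZero :: Mod3 -> Bool
isZero 0 = True
isZero _ = False

-- The same function written in the form given by rule (h).
isZeroByRule :: Mod3 -> Bool
isZeroByRule v = if v == 0 then True else False

main :: IO ()
main = do
  print (isZero (Mod3 3), isZeroByRule (Mod3 3))
  print (isZero (Mod3 4), isZeroByRule (Mod3 4))
```

Here Mod3 3 matches the pattern 0 only because the user-defined == says the two are equal.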

These identities all preserve the static semantics. Rules (d), (e), and (j) use a lambda rather than a let; this indicates that variables bound by case are monomorphically typed (Section 4.1.3).
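The difference between lambda-bound and let-bound variables can be sketched as follows; the names are hypothetical. The rejected definition is left in a comment so that the block still compiles.

```haskell
-- A let-bound function is generalized, so it can be used at two
-- different types in the body.
viaLet :: (Int, Char)
viaLet = let f x = x in (f 1, f 'a')

-- A case-bound variable behaves like a lambda-bound one (rule
-- (j)), so f has a single monomorphic type within the alternative.
viaCase :: (Int, Int)
viaCase = case (\x -> x) of f -> (f 1, f 2)

-- Expected to be rejected by the type checker, because f would
-- need two different types:
--   bad = case (\x -> x) of f -> (f (1 :: Int), f 'a')

main :: IO ()
main = do
  print viaLet
  print viaCase
```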

March 27, 1997


exp	`->`	exp₀ `::` [context `=>`] type	(expression type signature)
	`\|`	exp₀
exp_i	`->`	exp_i+1 [qop_(n,i) exp_i+1]
	`\|`	lexp_i
	`\|`	rexp_i
lexp_i	`->`	(lexp_i \| exp_i+1) qop_(l,i) exp_i+1
lexp₆	`->`	`-` exp₇
rexp_i	`->`	exp_i+1 qop_(r,i) (rexp_i \| exp_i+1)
exp₁₀	`->`	`\` apat₁ ... apat_n `->` exp	(lambda abstraction, n>=1)
	`\|`	`let` decllist `in` exp	(let expression)
	`\|`	`if` exp `then` exp `else` exp	(conditional)
	`\|`	`case` exp `of` `{` alts [`;`] `}`	(case expression)
	`\|`	`do` `{` stmts [`;`] `}`	(do expression)
	`\|`	fexp
fexp	`->`	[fexp] aexp	(function application)
aexp	`->`	qvar	(variable)
	`\|`	gcon	(general constructor)
	`\|`	literal
	`\|`	`(` exp `)`	(parenthesized expression)
	`\|`	`(` exp₁ `,` ... `,` exp_k `)`	(tuple, k>=2)
	`\|`	`[` exp₁ `,` ... `,` exp_k `]`	(list, k>=1)
	`\|`	`[` exp₁ [`,` exp₂] `..` [exp₃] `]`	(arithmetic sequence)
	`\|`	`[` exp `\|` qual₁ `,` ... `,` qual_n `]`	(list comprehension, n>=1)
	`\|`	`(` exp_i+1 qop_(a,i) `)`	(left section)
	`\|`	`(` qop_(a,i) exp_i+1 `)`	(right section)
	`\|`	qcon `{` fbind₁ `,` ... `,` fbind_n `}`	(labeled construction, n>=0)
	`\|`	aexp_{qcon} `{` fbind₁ `,` ... `,` fbind_n `}`	(labeled update, n >= 1)

Item	Associativity

simple terms, parenthesized terms	--
irrefutable patterns (`~`)	--
as-patterns (`@`)	right
function application	left
`do`, `if`, `let`, lambda(`\`), `case` (leftwards)	right
`case` (rightwards)	right

infix operators, prec. 9	as defined
...	...
infix operators, prec. 0	as defined

function types (`->`)	right
contexts (`=>`)	--
type constraints (`::`)	--
`do`, `if`, `let`, lambda(`\`) (rightwards)	right
sequences (`..`)	--
generators (`<-`)	--
grouping (`,`)	n-ary
guards (`\|`)	--
case alternatives (`->`)	--
definitions (`=`)	--
separation (`;`)	n-ary

This	Parses as
`f x + g y`	`(f x) + (g y)`
`- f x + y`	`(- (f x)) + y`
`let { ... } in x + y`	`let { ... } in (x + y)`
`z + let { ... } in x + y`	`z + (let { ... } in (x + y))`
`f x y :: Int`	`(f x y) :: Int`
`\ x -> a+b :: Int`	`\ x -> ((a+b) :: Int)`

	`\|` g_i1	`->` e_i1
	...
	`\|` g_{im_i}	`->` e_{im_i}
	`where` decllist_i

Expression	Translation
`C1 {f1 = 3}`	`C1' 3 undefined`
`C2 {f1 = 1, f4 = 'A', f3 = 'B'}`	`C2' 1 'B' 'A'`
`x {f1 = 1}`	`case x of C1' _ f2 -> C1' 1 f2`
	`C2' _ f3 f4 -> C2' 1 f3 f4`


pat	`->`	var `+` integer	(successor pattern)
	`\|`	pat₀
pat_i	`->`	pat_i+1 [qconop_(n,i) pat_i+1]
	`\|`	lpat_i
	`\|`	rpat_i
lpat_i	`->`	(lpat_i \| pat_i+1) qconop_(l,i) pat_i+1
lpat₆	`->`	`-` (integer \| float)	(negative literal)
rpat_i	`->`	pat_i+1 qconop_(r,i) (rpat_i \| pat_i+1)
pat₁₀	`->`	apat
	`\|`	gcon apat₁ ... apat_k	(arity gcon = k, k>=1)
apat	`->`	var [`@` apat]	(as pattern)
	`\|`	gcon	(arity gcon = 0)
	`\|`	qcon `{` fpat₁ `,` ... `,` fpat_k `}`	(labeled pattern, k>=0)
	`\|`	literal
	`\|`	`_`	(wildcard)
	`\|`	`(` pat `)`	(parenthesized pattern)
	`\|`	`(` pat₁ `,` ... `,` pat_k `)`	(tuple pattern, k>=2)
	`\|`	`[` pat₁ `,` ... `,` pat_k `]`	(list pattern, k>=1)
	`\|`	`~` apat	(irrefutable pattern)
fpat	`->`	var `=` pat
	`\|`	var