In this section, we describe the syntax and informal semantics of Haskell declarations.
module | -> | module modid [exports] where body | |
| | body | ||
body | -> | { [impdecls ;] [[fixdecls ;] topdecls [;]] } | |
| | { impdecls [;] } | ||
topdecls | -> | topdecl1 ; ... ; topdecln | (n>=0) |
topdecl | -> | type simpletype = type | |
| | data [context =>] simpletype = constrs [deriving] | ||
| | newtype [context =>] simpletype = con atype [deriving] | ||
| | class [context =>] simpleclass [where { cbody [;] }] | ||
| | instance [context =>] qtycls inst [where { valdefs [;] }] | ||
| | default (type1 , ... , typen) | (n>=0) | |
| | decl | ||
decls | -> | decl1 ; ... ; decln | (n>=0) |
decl | -> | signdecl | |
| | valdef | ||
decllist | -> | { decls [;] } | |
signdecl | -> | vars :: [context =>] type | |
vars | -> | var1 , ..., varn | (n>=1) |
The declarations in the syntactic category topdecls are only allowed at the top level of a Haskell module (see Section 5), whereas decls may be used either at the top level or in nested scopes (i.e. those within a let or where construct).
For exposition, we divide the declarations into three groups: user-defined datatypes, consisting of type, newtype, and data declarations (Section 4.2); type classes and overloading, consisting of class, instance, and default declarations (Section 4.3); and nested declarations, consisting of value bindings and type signatures (Section 4.4).
Haskell has several primitive datatypes that are "hard-wired" (such as integers and floating-point numbers), but most "built-in" datatypes are defined with normal Haskell code, using normal type and data declarations. These "built-in" datatypes are described in detail in Section 6.1.
Haskell uses a traditional Hindley-Milner polymorphic type system to provide a static type semantics [3, 4], but the type system has been extended with type and constructor classes (or just classes) that provide a structured way to introduce overloaded functions.
A class declaration (Section 4.3.1) introduces a new type class and the overloaded operations that must be supported by any type that is an instance of that class. An instance declaration (Section 4.3.2) declares that a type is an instance of a class and includes the definitions of the overloaded operations---called class methods---instantiated on the named type.
For example, suppose we wish to overload the operations (+) and
negate on types Int and Float. We introduce a new
type class called Num:
class Num a where -- simplified class declaration for Num
(+) :: a -> a -> a
negate :: a -> a
This declaration may be read "a type a is an instance of the class
Num if there are (overloaded) class methods (+) and negate, of the
appropriate types, defined on it."
We may then declare Int and Float to be instances of this class:
instance Num Int where -- simplified instance of Num Int
x + y = addInt x y
negate x = negateInt x
instance Num Float where -- simplified instance of Num Float
x + y = addFloat x y
negate x = negateFloat x
where addInt, negateInt, addFloat, and negateFloat are assumed
in this case to be primitive functions, but in general could be any
user-defined function. The first declaration above may be read
"Int is an instance of the class Num as witnessed by these
definitions (i.e. class methods) for (+) and negate."
More examples of type and constructor classes can be found in the papers by Jones [6] or Wadler and Blott [11]. The term `type class' was used to describe the original Haskell 1.0 type system; `constructor class' was used to describe an extension to the original type classes. There is no longer any reason to use two different terms: in this report, `type class' includes both the original Haskell type classes and the constructor classes introduced by Jones.
type | -> | btype [-> type] | (function type) |
btype | -> | [btype] atype | (type application) |
atype | -> | gtycon | |
| | tyvar | ||
| | ( type1 , ... , typek ) | (tuple type, k>=2) | |
| | [ type ] | (list type) | |
| | ( type ) | (parenthesised constructor) | |
gtycon | -> | qtycon | |
| | () | (unit type) | |
| | [] | (list constructor) | |
| | (->) | (function constructor) | |
| | (,{,}) | (tupling constructors) |
The syntax for Haskell type expressions is given above. Just as data values are built using data constructors, type values are built from type constructors. As with data constructors, the names of type constructors start with uppercase letters.
To ensure that they are valid, type expressions are classified into different kinds, which take one of two possible forms:
Special syntax is provided to allow certain type expressions to be written in a more traditional style:
Expressions and types have a consistent syntax. If ti is the type of expression or pattern ei, then the expressions (\ e1 -> e2), [e1], and (e1,e2) have the types (t1 -> t2), [t1], and (t1,t2), respectively.
With one exception, the type variables in a Haskell type expression are all assumed to be universally quantified; there is no explicit syntax for universal quantification [3]. For example, the type expression a -> a denotes the type forall a. a ->a. For clarity, however, we often write quantification explicitly when discussing the types of Haskell programs.
The exception referred to is that of the distinguished type variable in a class declaration (Section 4.3.1).
context | -> | class | |
| | ( class1 , ... , classn ) | (n>=1) | |
class | -> | qtycls tyvar | |
simpleclass | -> | tycls tyvar | |
tycls | -> | conid | |
tyvar | -> | varid |
A context consists of one or more class assertions, and has the general form
( C1 u1, ..., Cn un )
where C1, ..., Cn are class identifiers, and u1, ..., un are type variables; the parentheses may be omitted when n=1. In general, we use c to denote a context and we write c => t to indicate the type t restricted by the context c. The context c must only contain type variables referenced in t. For convenience, we write c => t even if the context c is empty, although in this case the concrete syntax contains no =>.
In this subsection, we provide informal details of the type system. (Wadler and Blott [11] and Jones [6] discuss type and constructor classes, respectively, in more detail.)
The Haskell type system attributes a type to each expression in the program. In general, a type is of the form forall u. c =>t, where u is a set of type variables u1, ..., un. In any such type, any of the universally-quantified type variables ui that are free in c must also be free in t. Furthermore, the context c must be of the form given above in Section 4.1.2; that is, it must have the form (C1 u1, ..., Cn un) where C1, ..., Cn are class identifiers, and u1, ..., un are type variables.
The type of an expression e depends on a type environment that gives types for the free variables in e, and a class environment that declares which types are instances of which classes (a type becomes an instance of a class only via the presence of an instance declaration or a deriving clause).
Types are related by a generalization order (specified below); the most general type that can be assigned to a particular expression (in a given environment) is called its principal type. Haskell 's extended Hindley-Milner type system can infer the principal type of all expressions, including the proper use of overloaded class methods (although certain ambiguous overloadings could arise, as described in Section 4.3.4). Therefore, explicit typings (called type signatures) are usually optional (see Sections 3.16 and 4.4.1).
The type forall u. c1 =>t1 is more general than the type forall w. c2 =>t2 if and only if there is a substitution S whose domain is u such that:
The main point about contexts above is that, given the type
forall u. c =>t,
the presence of C ui in the context c expresses the
constraint that the type variable ui may be instantiated as t'
within the type expression t only if t' is a member of the class
C. For example, consider the function double:
double x = x + x
The most general type of double is
forall a. Num a =>a ->a.
double may be applied to values of type Int (instantiating a to
Int), since Int is an instance of the class Num. However,
double may not be applied to values of type Char, because Char
is not an instance of class Num.
In this section, we describe algebraic datatypes (data declarations), renamed datatypes (newtype declarations), and type synonyms (type declarations). These declarations may only appear at the top level of a module.
topdecl | -> | data [context =>] simpletype = constrs [deriving] | |
simpletype | -> | tycon tyvar1 ... tyvark | (k>=0) |
constrs | -> | constr1 | ... | constrn | (n>=1) |
constr | -> | con [!] atype1 ... [!] atypek | (arity con = k, k>=0) |
| | (btype | ! atype) conop (btype | ! atype) | (infix conop) | |
| | con { fielddecl1 , ... , fielddecln } | (n>=1) | |
fielddecl | -> | vars :: (type | ! atype) | |
deriving | -> | deriving (dclass | (dclass1, ... , dclassn)) | (n>=0) |
dclass | -> | qtycls |
An algebraic datatype declaration introduces a new type and constructors over that type and has the form:
data c => T u1 ... uk = K1 t11 ... t1k1 | ...| Kn tn1 ... tnkn
where c is a context. This declaration introduces a new type constructor T with constituent data constructors K1, ..., Kn whose types are given by:
Ki :: forall u1 ... uk. ci =>ti1 ->...->tiki ->(T u1 ... uk)
where ci is the largest subset of c that constrains only those type variables free in the types ti1, ..., tiki. The type variables u1 through uk must be distinct and may appear in c and the tij; it is a static error for any other type variable to appear in c or on the right-hand-side. The new type constant T has a kind of the form k1->...->kk->* where the kinds ki of the argument variables ui are determined by kind inference as described in Section 4.6. This means that T may be used in type expressions with anywhere between 0 and k arguments.
For example, the declaration
data Eq a => Set a = NilSet | ConsSet a (Set a)
introduces a type constructor Set of kind *->*, and constructors NilSet and
ConsSet with types
NilSet | :: forall a. Set a |
ConsSet | :: forall a. Eq a =>a ->Set a ->Set a |
In the example given, the overloaded type for ConsSet ensures that ConsSet can only be applied to values whose type is an instance of the class Eq. The context in the data declaration has no other effect whatsoever.
The visibility of a datatype's constructors (i.e. the "abstractness" of the datatype) outside of the module in which the datatype is defined is controlled by the form of the datatype's name in the export list as described in Section 5.5.
The optional deriving part of a data declaration has to do with derived instances, and is described in Section 4.3.3.
A constructor definition in a data declaration using the { }
syntax assigns labels to the components of the constructor.
Constructors using field labels may be freely mixed with constructors
without them.
A constructor with associated field labels may still be used as an
ordinary constructor; features using labels are
simply a shorthand for operations using an underlying positional
constructor. The arguments to the positional constructor occur in the
same order as the labeled fields. For example, the declaration
data C = F { f1,f2 :: Int, f3 :: Bool}
defines a type and constructor identical to the one produced by
data C = F Int Int Bool
Operations using field labels are described in Section 3.15.
A data declaration may use the same field label in multiple
constructors as long as the typing of the field is the same in all
cases after type synonym expansion. A label cannot be shared by
more than one type in scope. Field names share the top level namespace
with ordinary variables and class methods and must not conflict with
other top level names in scope.
Translation:A declaration of the formdata c => T u1 ... uk = ... | K s1 ... sn | ... where each si is either of the form ! ti or ti, replaces every occurance of K in an expression by (\ x1 ... xn -> ( ((K op1 x1) op2 x2) ... ) opn xn) where opi is the lazy apply function $ if si is of the form ti, and opi is the strict apply function `strict` (see Section 6.2.7) if si is of the form ! ti. Pattern matching on K is not affected by strictness flags. |
Strictness flags may require the explicit inclusion of an Eval context in a
data declaration (see Section 6.2.7).
This occurs precisely when the context of a strict function used in the
above translation propagates to a type variable. For example, in
data (Eval a) => Pair a b = MakePair !a b
the class assertion (Eval a) is required by the use of strict in
the translation of the constructor MakePair. This context must be
explicitly supplied by the programmer. The Eval context may be
implied by a more general one; for example, the Num class includes
Eval as a superclass to avoid mentioning Eval in the following:
data (Integral a) => Rational a = !a :% !a -- Rational library
For some types, the Eval context may not be expressible (see
Section 4.5.3. For example, in
data T a b = T !(a b)
the context Eval (a b) would be required. Since this context is not
legal, the strictness flag cannot be used in this situation.
topdecl | -> | type simpletype = type | |
simpletype | -> | tycon tyvar1 ... tyvark | (k>=0) |
type T u1 ... uk = t
which introduces a new type constructor, T. The type (T t1 ...
tk) is equivalent to the type t[t1/u1, ..., tk/uk]. The type
variables u1 through uk must be distinct and are scoped only
over t; it is a static error for any other type variable to appear
in t. The kind of the new type constructor T is of the form
k1->...->kk->k where
the kinds ki of the arguments ui and k of the right hand
side t are determined by kind inference as described in
Section 4.6.
For example, the following definition can be used to provide an alternative
way of writing the list type constructor:
type List = []
Type constructor symbols T introduced by type synonym declarations cannot
be partially applied; it is a static error to use T without the full number
of arguments.
Although recursive and mutually recursive datatypes are allowed,
this is not so for type synonyms, unless an algebraic datatype
intervenes. For example,
type Rec a = [Circ a]
data Circ a = Tag [Rec a]
is allowed, whereas
type Rec a = [Circ a] -- invalid
type Circ a = [Rec a] --
is not. Similarly, type Rec a = [Rec a] is not allowed.
Type synonyms are a strictly syntactic mechanism to make type signatures more readable. A synonym and its definition are completely interchangeable.
topdecl | -> | newtype [context =>] simpletype = con atype [deriving] | |
simpletype | -> | tycon tyvar1 ... tyvark | (k>=0) |
A declaration of the form
newtype c => T u1 ... uk = N t
introduces a new type whose representation is the same as an existing type. The type ( T u1 ... uk ) renames the datatype t. It differs from a type synonym in that it creates a distinct type that must be explicitly coerced to or from the original type. Also, unlike type synonyms, newtype may be used to define recursive types. The constructor N in an expression coerces a value from type t to type ( T u1 ... uk ). Using N in a pattern coerces a value from type ( T u1 ... uk ) to type t. These coercions may be implemented without execution time overhead; newtype does not change the underlying representation of an object.
New instances (see Section 4.3.2) can be defined for a type defined by newtype but may not be defined for a type synonym. A type created by newtype differs from an algebraic datatype in that the representation of an algebraic datatype has an extra level of indirection. This difference makes access to the representation less efficient. The difference is reflected in different rules for pattern matching (see Section 3.17). Unlike algebraic datatypes, the newtype constructor N is unlifted, so that N _|_ is the same as _|_.
The following examples clarify the differences between data (algebraic
datatypes), type (type synonyms), and newtype (renaming types.)
Given the declarations
data D1 = D1 Int
data D2 = D2 !Int
type S = Int
newtype N = N Int
d1 (D1 i) = 42
d2 (D2 i) = 42
s i = 42
n (N i) = 42
the expressions ( d1 _|_), ( d2 _|_) and
(d2 (D2 _|_) ) are all
equivalent to _|_, whereas ( n _|_), ( n ( N
_|_) ), ( d1 ( D1 _|_) ) and ( s _|_)
are all equivalent to 42. In particular, ( N _|_) is equivalent to
_|_ while ( D1 _|_) is not equivalent to _|_.
The optional deriving part of a newtype declaration is treated in the same way as the deriving component of a data declaration; see Section 4.3.3.
Every type, both those declared by data and newtype, is made an
instance of the Eval class by an implicit derived instance
declaration for Eval (see Section 6.2.7). It is as if
there was an implicit
"deriving(Eval)" on every type declaration.
For newtype, the instance declaration has the form
instance CEval => Eval (T u1 ... uk) where
(N x) `seq` y = x `seq` y
where CEval is the context obtained by simplifying Eval t.
For example, the declaration
newtype Age = MkAge Int
gives rise to the instance declaration
instance Eval Age where
(MkAge x) `seq` y = x `seq` y
since simplifying Eval Int yields the empty context.
On the other hand,
newtype Id a = MkId a
gives rise to
instance Eval a => Eval (Id a) where
(MkId a) `seq` b = a `seq` b
This derived instance may lead to a context reduction error (see
Section 4.5.3). A static error occurs when it is
not possible to find CEval
for a newtype declaration (just as with other derived instances).
For example
newtype T a = MkT (a Int)
is illegal, because one cannot reduce the context Eval (a Int).
The derived Eval instance for data declarations has an empty
context and thus will never generate static errors. Types
that cannot be renamed by newtype due to this context problem are
the same as those that cannot be marked as strict in a data
declaration (see Section 4.2.1).
topdecl | -> | class [context =>] simpleclass [where { cbody [;] }] | |
cbody | -> | [ cmethods [ ; cdefaults ] ] | |
cmethods | -> | signdecl1 ; ... ; signdecln | (n >= 1) |
cdefaults | -> | valdef1 ; ... ; valdefn | (n >= 1) |
A class declaration introduces a new class and the operations (class methods) on it. A class declaration has the general form:
class c => C u where { | v1 :: c1 => t1 ; ... ; vn :: cn => tn ; |
valdef1 ; ... ; valdefm } |
This introduces a new class name C; the type variable u is scoped only over the class method signatures in the class body. The context c specifies the superclasses of C, as described below; the only type variable that may be referred to in c is u. The class declaration introduces new class methods v1, ..., vn, whose scope extends outside the class declaration, with types:
vi :: forall u,w. (C u, ci) =>ti
The ti must mention u; they may mention type variables
w other than u, and the type of vi is
polymorphic in both u and w.
The ci may constrain only w; in particular,
the ci may not constrain u.
For example:
class Foo a where
op :: Num b => a -> b -> a
Here the type of op is
forall a, b. (Foo a, Num b) =>a ->b ->a.
Default class methods for any of the vi may be included in the class declaration as a normal valdef; no other definitions are permitted. The default class method for vi is used if no binding for it is given in a particular instance declaration (see Section 4.3.2).
Class methods share the top level namespace with variable bindings and field names; they must not conflict with other top level bindings in scope. That is, a class method can not have the same name as a top level definition, a field name, or another class method.
A class
declaration with no where part
may be useful for combining a
collection of classes into a larger one that inherits all of the
class methods in the original ones. For example:
class (Read a, Show a) => Textual a
In such a case, if a type is an instance of all
superclasses, it is
not automatically an instance of the subclass, even though the
subclass has no immediate class methods. The instance declaration must be
given explicitly with no where part.
The superclass relation must not be cyclic; i.e. it must form a directed acyclic graph.
topdecl | -> | instance [context =>] qtycls inst [where { valdefs [;] }] | |
inst | -> | gtycon | |
| | ( gtycon tyvar1 ... tyvark ) | (k>=0, tyvars distinct) | |
| | ( tyvar1 , ... , tyvark ) | (k>=2, tyvars distinct) | |
| | [ tyvar ] | ||
| | ( tyvar1 -> tyvar2 ) | tyvar1 and tyvar2 distinct | |
valdefs | -> | valdef1 ; ... ; valdefn | (n>=0) |
class c => C u where { cbody }
be a class declaration. The general form of the corresponding instance declaration is:
instance c' => C (T u1 ... uk) where { d }
where k>=0 and T is not a type synonym.
The constructor being instanced, (T u1 ... uk), is
a type constructor applied to simple type variables u1, ... uk,
which must be distinct. This prohibits instance declarations
such as:
instance C (a,a) where ...
instance C (Int,a) where ...
instance C [[a]] where ...
The constructor (T u1 ... uk) must have an appropriate kind for
the class C; this can be determined using kind inference as described
in Section 4.6.
The declarations d may contain bindings only for the class
methods of C. The declarations may not contain any type
signatures
since the class method signatures have already been given in the class
declaration.
If no binding is given for some class method then the corresponding default class method in the class declaration is used (if present); if such a default does not exist then the class method of this instance is bound to undefined and no compile-time error results.
An instance declaration that makes the type T to be an instance of class C is called a C-T instance declaration and is subject to these static restrictions:
In fact, except in pathological cases it is possible to infer from the instance declaration the most general instance context c' satisfying the above two constraints, but it is nevertheless mandatory to write an explicit instance context.
If the two instance declarations instead read like this:
instance Num a => Foo [a] where ...
instance (Eq a, Show a) => Bar [a] where ...
then the program would be invalid. The second instance declaration is
valid only if [a] is an instance of Foo under the assumptions
(Eq a, Show a). But this does not hold, since [a] is only an
instance of Foo under the stronger assumption Num a.
Further examples of instance declarations may be found in Appendix A.
As mentioned in Section 4.2.1, data and newtype declarations contain an optional deriving form. If the form is included, then derived instance declarations are automatically generated for the datatype in each of the named classes. These instances are subject to the same restrictions as user-defined instances. When deriving a class C for a type T, instances for all superclasses of C must exist for T, either via an explicit instance declaration or by including the superclass in the deriving clause.
Derived instances provide convenient commonly-used operations for user-defined datatypes. For example, derived instances for datatypes in the class Eq define the operations == and /=, freeing the programmer from the need to define them.
The only classes in the Prelude for which derived instances are allowed are Eq, Ord, Enum, Bounded, Show, and Read, all defined in Figure 5, page . The precise details of how the derived instances are generated for each of these classes are provided in Appendix D, including a specification of when such derived instances are possible. Instances of class Eval are always implicitly derived for algebraic datatypes. The class Eval may not be explicitly listed in a deriving form or defined by an explicit instance declaration. Classes defined by the standard libraries may also be derivable.
A static error results if it is not possible to derive an instance declaration over a class named in a deriving form. For example, not all datatypes can properly support class methods in Enum. It is also a static error to give an explicit instance declaration for a class that is also derived.
If the deriving form is omitted from a data or newtype declaration, then no instance declarations (except for Eval) are derived for that datatype; that is, omitting a deriving form is equivalent to including an empty deriving form: deriving ().
topdecl | -> | default (type1 , ... , typen) | (n>=0) |
A problem inherent with Haskell -style overloading is the
possibility of an ambiguous type.
For example, using the
read and show functions defined in Appendix D,
and supposing that just Int and Bool are members of Read and
Show, then the expression
let x = read "..." in show x -- invalid
is ambiguous, because the types for show and read,
show | :: forall a. Show a =>a ->String |
read | :: forall a. Read a =>String ->a |
could be satisfied by instantiating a as either Int in both cases, or Bool. Such expressions are considered ill-typed, a static error.
We say that an expression e is ambiguously overloaded if, in its type forall u. c =>t, there is a type variable u in u that occurs in c but not in t. Such types are invalid.
For example, the earlier expression involving show and read is ambiguously overloaded since its type is forall a. Show a, Read a =>String.
Overloading ambiguity can only be circumvented by
input from the user. One way is through the use of expression
type-signatures
as described in Section 3.16.
For example, for the ambiguous expression given earlier, one could
write:
let x = read "..." in show (x::Bool)
which disambiguates the type.
Occasionally, an otherwise ambiguous expression needs to be made
the same type as some variable, rather than being given a fixed
type with an expression type-signature. This is the purpose
of the function asTypeOf (Appendix A):
x `asTypeOf` y has the value of x, but x and y are
forced to have the same type. For example,
approxSqrt x = encodeFloat 1 (exponent x `div` 2) `asTypeOf` x
(See Section 6.3.6.)
Ambiguities in the class Num are most common, so Haskell provides another way to resolve them---with a default declaration:
default (t1 , ... , tn)
where n>=0, and each ti must be a monotype for which Num ti holds. In situations where an ambiguous type is discovered, an ambiguous type variable is defaultable if at least one of its classes is a numeric class (that is, Num or a subclass of Num) and if all of its classes are defined in the Prelude or a standard library (Figures 6--7, pages -- show the numeric classes, and Figure 5, page , shows the classes defined in the Prelude.) Each defaultable variable is replaced by the first type in the default list that is an instance of all the ambiguous variable's classes. It is a static error if no such type is found.
Only one default declaration is permitted per module, and its effect
is limited to that module. If no default declaration is given in a
module then it assumed to be:
default (Int, Double)
The empty default declaration default () must be given to turn off
all defaults in a module.
The following declarations may be used in any declaration list, including the top level of a module.
signdecl | -> | vars :: [context =>] type |
v1, ..., vn :: c => t
which is equivalent to asserting vi :: c => t for each i from 1 to n. Each vi must have a value binding in the same declaration list that contains the type signature; i.e. it is invalid to give a type signature for a variable bound in an outer scope. Moreover, it is invalid to give more than one type signature for one variable.
As mentioned in Section 4.1.1,
every type variable appearing in a signature
is universally quantified over that signature, and hence
the scope of a type variable is limited to the type
signature that contains it. For example, in the following
declarations
f :: a -> a
f x = x :: a -- invalid
the a's in the two type signatures are quite distinct. Indeed,
these declarations contain a static error, since x does not have
type forall a. a. (The type of x is dependent on the type of
f; there is currently no way in Haskell to specify a signature
for a variable with a dependent type; this is explained in Section
4.5.4.)
If a given program includes a signature for a variable f, then each use of f is treated as having the declared type. It is a static error if the same type cannot also be inferred for the defining occurrence of f.
If a variable f is defined without providing a corresponding type signature declaration, then each use of f outside its own declaration group (see Section 4.5) is treated as having the corresponding inferred, or principal type . However, to ensure that type inference is still possible, the defining occurrence, and all uses of f within its declaration group must have the same monomorphic type (from which the principal type is obtained by generalization, as described in Section 4.5.2).
For example, if we define
sqr x = x*x
then the principal type is
sqr :: forall a. Num a =>a ->a,
which allows
applications such as sqr 5 or sqr 0.1. It is also valid to declare
a more specific type, such as
sqr :: Int -> Int
but now applications such as sqr 0.1 are invalid. Type signatures such as
sqr :: (Num a, Num b) => a -> b -- invalid
sqr :: a -> a -- invalid
are invalid, as they are more general than the principal type of sqr.
Type signatures can also be used to support
polymorphic recursion.
The following definition is pathological, but illustrates how a type
signature can be used to specify a type more general than the one that
would be inferred:
data T a = K (T Int) (T a)
f :: T a -> a
f (K x y) = if f x == 1 then f y else undefined
If we remove the signature declaration, the type of f will be
inferred as T Int -> Int due to the first recursive call for which
the argument to f is T Int. Polymorphic recursion allows the user
to supply the more general type signature, T a -> a.
decl | -> | valdef |
valdef | -> | lhs = exp [where decllist] |
| | lhs gdrhs [where decllist] | |
lhs | -> | pat0 |
| | funlhs | |
funlhs | -> | var apat {apat } |
| | pati+1 varop(a,i) pati+1 | |
| | lpati varop(l,i) pati+1 | |
| | pati+1 varop(r,i) rpati | |
gdrhs | -> | gd = exp [gdrhs] |
gd | -> | | exp0 |
x | p11 ... p1k | match1 |
... | ||
x | pn1 ... pnk | matchn |
where each pij is a pattern, and where each matchi is of the general form:
= ei where { declsi } |
or
| gi1 | = ei1 |
... | |
| gimi | = eimi |
where { declsi } |
and where n>=1, 1<=i<=n, mi>=1. The former is treated as shorthand for a particular case of the latter, namely:
| True = ei where { declsi } |
Note that all clauses defining a function must be contiguous, and the number of patterns in each clause must be the same. The set of patterns corresponding to each match must be linear---no variable is allowed to appear more than once in the entire set.
Alternative syntax is provided for binding functional values to infix
operators. For example, these two function
definitions are equivalent:
plus x y z = x+y+z
x `plus` y = \ z -> x+y+z
Translation:The general binding form for functions is semantically equivalent to the equation (i.e. simple pattern binding):x x1 x2 ... xk = case (x1, ..., xk) of
where the xi are new identifiers. |
The general form of a pattern binding is p match, where a match is the same structure as for function bindings above; in other words, a pattern binding is:
p | | g1 | = e1 |
| g2 | = e2 | |
... | ||
| gm | = em | |
where { decls } |
Translation:The pattern binding above is semantically equivalent to this simple pattern binding:
|
The static semantics of the function and pattern bindings of a let expression or where clause are discussed in this section.
In general the static semantics are given by the normal Hindley-Milner inference rules. A dependency analysis transformation is first performed to enhance polymorphism. Two variables bound by value declarations are in the same declaration group if either
The Hindley-Milner type system assigns types to a let-expression in two stages. First, the right-hand side of the declaration is typed, giving a type with no universal quantification. Second, all type variables that occur in this type are universally quantified unless they are associated with bound variables in the type environment; this is called generalization. Finally, the body of the let-expression is typed.
For example, consider the declaration
f x = let g y = (y,y)
in ...
The type of g's definition is
a ->(a,a). The generalization step
attributes to g the polymorphic type
forall a. a ->(a,a),
after which the typing of the "..." part can proceed.
When typing overloaded definitions, all the overloading
constraints from a single declaration group are collected together,
to form the context for the type of each variable declared in the group.
For example, in the definition:
f x = let g1 x y = if x>y then show x else g2 y x
g2 p q = g1 q p
in ...
The types of the definitions of g1 and g2 are both
a ->a ->String, and the accumulated constraints are
Ord a (arising from the use of >), and Show a (arising from the
use of show).
The type variables appearing in this collection of constraints are
called the constrained type variables.
The generalization step attributes to both g1 and g2 the type
forall a. (Ord a, Show a) =>a ->a ->String
Notice that g2 is overloaded in the same way as g1 even though the occurrences of > and show are in the definition of g1.
If the programmer supplies explicit type signatures for more than one variable in a declaration group, the contexts of these signatures must be identical up to renaming of the type variables.
The context may also fail to simplify, leading to a static error, because it
contains a constraint of the form C (m t) where m is one of the the type
variable being generalized. That is, the class C applies to a type
expression that is not a type variable or a type constructor.
For example:, the
f x = show (return x)
The type of return is Monad m => a -> m a; the type of show is
Show a => a -> String. The type of f should be
(Monad m, Show (m a)) => a -> String. The context to be simplified
will therefore be (Monad m, Show (m a)), which cannot be further
reduced, resulting in a static error.
Code generated by derived instance functions (see
Section 4.3.3) may lead to generalization errors. For
example, in the type
data Apply a b = App (a b) deriving Show
the derived Show instance will produce a context Show (a b), which
cannot be reduced and thus results in a static error. Context
reduction error may also arise from strictness flags in data
declarations (see Section 4.2.1) and the implicitly
derived Eval instance in newtype declarations (see
Section 4.2.3).
The effect of such monomorphism is that the first argument of all
applications of g must be of a single type.
For example, it would be valid for
the "..." to be
(g True, g False)
(which would, incidentally, force x to have type Bool) but invalid
for it to be
(g True, g 'c')
In general, a type forall u. c =>t
is said to be monomorphic
in the type variable a if a is free in
forall u. c =>t.
It is worth noting that the explicit type signatures provided by Haskell
are not powerful enough to express types that include monomorphic type
variables. For example, we cannot write
f x = let
g :: a -> b -> ([a],b)
g y z = ([x,y], z)
in ...
because that would claim that g was polymorphic in both a and b
(Section 4.4.1). In this program, g can only be given
a type signature if its first argument is restricted to a type not involving
type variables; for example
g :: Int -> b -> ([Int],b)
This signature would also cause x to have type Int.
Haskell places certain extra restrictions on the generalization step, beyond the standard Hindley-Milner restriction described above, which further reduces polymorphism in particular cases.
The monomorphism restriction depends on the binding syntax of a variable. Recall that a variable is bound by either a function binding or a pattern binding, and that a simple pattern binding is a pattern binding in which the pattern consists of only a single variable (Section 4.4.2).
Two rules define the monomorphism restriction:
Rule 1 is required for two reasons, both of which are fairly subtle.
First, it prevents computations from being unexpectedly repeated.
For example, genericLength is a standard function (in library List) whose
type is given by
genericLength :: Num a => [b] -> a
Now consider the following expression:
let { len = genericLength xs } in (len, len)
It looks as if len should be computed only once, but without Rule 1 it might
be computed twice, once at each of two different overloadings. If the
programmer does actually wish the computation to be repeated, an explicit
type signature may be added:
let { len :: Num a => a; len = genericLength xs } in (len, len)
When non-simple pattern bindings are used, the types inferred are
always monomorphic in their constrained type variables, irrespective of whether
a type signature is provided. For example, in
(f,g) = ((+),(-))
both f and g are monomorphic regardless of any type
signatures supplied for f or g.
Rule 1 also prevents ambiguity. For example, consider the declaration
group
[(n,s)] = reads t
Recall that reads is a standard function whose type is given by the
signature
reads :: (Read a) => String -> [(a,String)]
Without Rule 1, n would be assigned the
type forall a. Read a =>a
and s the type forall a. Read a =>String.
The latter is an invalid type, because it is inherently ambiguous.
It is not possible to determine at what overloading to use s.
Rule 1 makes n and s monomorphic in a.
Lastly, Rule 2 is required because there is no way to enforce monomorphic use
of an exported binding, except by performing type inference on modules
outside the current module. Exported variables are handled in the
same way as non-exported ones even though their usage outside the
module could theoreticly be used to determine monomorphic type.
For example, in the program
module M(x) where
x = 1
the monomorphism restriction prevents the type of x from being
generalized to Num a => a. Since references to x outside module
M cannot be used to determine the type of x, the defaulting rule
(see Section 4.3.4) assigns the type Int to x.
The monomorphism rule has a number of consequences for the programmer.
Anything defined with function syntax usually
generalizes as a function is expected to. Thus in
f x y = x+y
the function f may be used at any overloading in class Num.
There is no danger of recomputation here. However, the same function
defined with pattern syntax:
f = \x -> \y -> x+y
requires a type signature if f is to be fully overloaded.
Many functions are most naturally defined using simple pattern
bindings; the user must be careful to affix these with type signatures
to retain full overloading. The standard prelude contains many
examples of this:
sum :: (Num a) => [a] -> a
sum = foldl (+) 0
This section describes the rules that are used to perform kind inference, i.e. to calculate a suitable kind for each type constructor and class appearing in a given program.
The first step in the kind inference process is to arrange the set of
datatype, synonym, and class definitions into dependency groups. This can
be achieved in much the same way as the dependency analysis for value
declarations that was described in Section 4.5.
For example, the following program fragment includes the definition
of a datatype constructor D, a synonym S and a class C, all of
which would be included in the same dependency group:
data C a => D a = Foo (S a)
type S a = [D a]
class C a where
bar :: a -> D a -> Bool
The kinds of variables, constructors, and classes within each group
are determined using standard techniques of type inference and
kind-preserving unification [6]. For example, in the
definitions above, the parameter a appears as an argument of the
function constructor (->) in the type of bar and hence must
have kind *. It follows that both D and S must have
kind *->* and that every instance of class C must
have kind *.
It is possible that some parts of an inferred kind may not be fully
determined by the corresponding definitions; in such cases, a default
of * is assumed. For example, we could assume an arbitrary kind
k for the a parameter in each of the following examples:
data App f a = A (f a)
data Tree a = Leaf | Fork (Tree a) (Tree a)
This would give kinds
(k->*)->k->* and
k->* for App and Tree, respectively, for any
kind k, and would require an extension to allow polymorphic
kinds. Instead, using the default binding k=*, the
actual kinds for these two constructors are
(*->*)->*->* and
*->*, respectively.
Defaults are applied to each dependency group without consideration of
the ways in which particular type constructor constants or classes are
used in later dependency groups or elsewhere in the program. For example,
adding the following definition to those above do not influence the
kind inferred for Tree (by changing it to
(*->*)->*, for instance), and instead
generates a static error because the kind of [], *->*,
does not match the kind * that is expected for an argument of Tree:
type FunnyTree = Tree [] -- invalid
This is important because it ensures that each constructor and class are
used consistently with the same kind whenever they are in scope.