2.4 The GNU Prolog compiler

2.4.1 Different kinds of codes
One of the main advantages of GNU Prolog is its ability to produce stand-alone executables. A Prolog program can be compiled to native code with the GNU Prolog compiler, giving rise to a machine-dependent executable. However, native-code predicates can be neither listed nor fully debugged, so there is an alternative to native-code compilation: byte-code compilation. By default the GNU Prolog compiler produces native code, but with a command-line option it can produce a file ready for byte-code loading. This is exactly what consult/1 does, as explained above (section 2.2.3). GNU Prolog also manages interpreted code using a Prolog interpreter written in Prolog. Interpreted code is obviously slower than byte-code, but it does not require invoking the GNU Prolog compiler. This interpreter is used each time a meta-call is needed, as by call/1 (section 5.2.3), and also for dynamically asserted clauses. The following table summarizes these three kinds of code:

Type              Speed   Debug?  Used for
interpreted-code  slow    yes     meta-calls and dynamically asserted clauses
byte-code         medium  yes     consulted predicates
native-code       fast    no      compiled predicates

2.4.2 Compilation scheme
Native-code compilation: a Prolog source is compiled in several stages to produce an object file that is linked with the GNU Prolog libraries to produce an executable. The Prolog source is first compiled to obtain a WAM [8] file. For a detailed study of the WAM, the interested reader can refer to ``Warren's Abstract Machine: A Tutorial Reconstruction'' [1]. The WAM file is then translated to a machine-independent language specifically designed for GNU Prolog. This language is close to a (universal) assembly language and is based on a very reduced instruction set; for this reason it is called mini-assembly (MA). The mini-assembly file is then mapped to the assembly language of the target machine. This assembly file is assembled to give rise to an object file, which is then linked with the GNU Prolog libraries to provide an executable. The compiler also handles Finite Domain constraint definition files: it translates them to C and invokes the C compiler to obtain object files. The following figure presents this compilation scheme:


Obviously, all intermediate stages are hidden from the user, who simply invokes the compiler on the Prolog file(s) (plus other files: C, ...) and obtains an executable. However, it is also possible to stop the compiler at any given stage. This can be useful, for instance, to see the WAM code produced (perhaps when learning the WAM). Finally, any kind of file can be given to the compiler, which inserts it into the compilation chain at the stage corresponding to its type. The type of a file is determined from the suffix of its name. The following table presents all recognized types/suffixes:

Suffix of the file                  Type of the file                        Handled by
.pl, .pro                           Prolog source file                      pl2wam
.wam                                WAM source file                         wam2ma
.ma                                 Mini-assembly source file               ma2asm
.s                                  Assembly source file                    the assembler
.c, .C, .CC, .cc, .cxx, .c++, .cpp  C or C++ source file                    the C compiler
.fd                                 Finite Domain constraint source file    fd2c
any other suffix (.o, .a, ...)      any other type (object, library, ...)   the linker (C linker)
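To make the suffix-based dispatch concrete, here is a minimal C sketch of the lookup that the table describes. It is not taken from the gplc sources: the Stage enum and stage_for_file() are hypothetical names, suffixes are compared case-sensitively because the table lists .c and .C separately, and only a dot in the last path component is treated as a suffix (an assumption).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names for the stages listed in the table above. */
typedef enum { PL2WAM, WAM2MA, MA2ASM, ASSEMBLER, C_COMPILER, FD2C, LINKER } Stage;

static const struct {
    const char *suffix;
    Stage stage;
} suffix_table[] = {
    { ".pl", PL2WAM },     { ".pro", PL2WAM },
    { ".wam", WAM2MA },    { ".ma", MA2ASM },
    { ".s", ASSEMBLER },   { ".fd", FD2C },
    { ".c", C_COMPILER },  { ".C", C_COMPILER },   { ".CC", C_COMPILER },
    { ".cc", C_COMPILER }, { ".cxx", C_COMPILER }, { ".c++", C_COMPILER },
    { ".cpp", C_COMPILER },
};

/* Returns the stage that first handles file, judged only by its suffix.
   The comparison is case-sensitive (".c" and ".C" are separate entries). */
Stage stage_for_file(const char *file)
{
    const char *base, *dot;
    size_t i;

    assert(file != NULL);
    base = strrchr(file, '/');
    base = (base != NULL) ? base + 1 : file;  /* ignore dots in directory names */
    dot = strrchr(base, '.');
    if (dot != NULL)
        for (i = 0; i < sizeof suffix_table / sizeof suffix_table[0]; i++)
            if (strcmp(dot, suffix_table[i].suffix) == 0)
                return suffix_table[i].stage;
    return LINKER;  /* any other suffix (.o, .a, ...) goes to the linker */
}
```

For example, stage_for_file("utils.c") yields C_COMPILER, while stage_for_file("lib.a") falls through to LINKER, as in the last row of the table.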

Byte-code compilation: the same compiler can be used to compile a Prolog source file for byte-code. In that case the Prolog to WAM compiler is invoked with a specific option (--wam-for-byte-code, section 2.4.3) and produces a WAM for byte-code source file (suffixed .wbc) that can later be loaded using load/1 (section 6.23.2). Note that this is exactly what consult/1 (section 6.23.1) does, as explained above (section 2.2.3).

2.4.3 Using the compiler
The GNU Prolog compiler is a command-line compiler similar in spirit to a Unix C compiler like gcc. To invoke the compiler use the gplc command as follows:

% gplc [OPTION]... FILE...    (the % symbol is the operating system shell prompt)
The arguments of gplc are file names that are dispatched in the compilation scheme according to the type determined from their suffix, as explained previously (section 2.4.2). All object files are then linked to produce an executable. Note, however, that GNU Prolog has no module facility (since there is not yet an ISO reference for Prolog modules), so a predicate defined in one Prolog file is visible from any predicate defined in any other file. GNU Prolog allows the user to split a large Prolog source into several files but offers no way to hide a predicate from the others.

The simplest way to obtain an executable from a Prolog source file prog.pl is to use:

% gplc prog.pl
This will produce a native executable called prog, which can be executed as follows:

% prog
However, there are several options that can be used to control the compilation:

General options:

-o FILE, --output FILE    use FILE as the name of the output file
-W, --wam-for-native      stop after producing WAM file(s)
-w, --wam-for-byte-code   stop after producing WAM for byte-code file(s) (force --no-call-c)
-M, --mini-assembly       stop after producing mini-assembly file(s)
-S, --assembly            stop after producing assembly file(s)
-F, --fd-to-c             stop after producing C file(s) from FD constraint definition file(s)
-c, --object              stop after producing object file(s)
--temp-dir PATH           use PATH as directory for temporary files
--no-del-temp             do not delete temporary files
--no-decode-hexa          do not decode hexadecimal predicate names
-v, --verbose             print executed commands
-h, --help                print help and exit
--version                 print version number and exit

Prolog to WAM compiler options:

--pl-state FILE           read FILE to set the initial Prolog state
--no-inline               do not inline predicates
--no-reorder              do not reorder predicate arguments
--no-reg-opt              do not optimize registers
--min-reg-opt             minimally optimize registers
--no-opt-last-subterm     do not optimize last subterm compilation
--fast-math               use fast mathematical mode (assume integer arithmetic)
--keep-void-inst          keep void WAM instructions in the output file
--no-susp-warn            do not show warnings for suspicious predicates
--no-singl-warn           do not show warnings for named singleton variables
--no-redef-error          do not show errors for built-in predicate redefinitions
--no-call-c               do not allow the use of fd_tell, '$call_c', ...
--compile-msg             print a compile message
--statistics              print statistics information

WAM to mini-assembly translator options:

--comment                 include comments in the output file

Mini-assembly to assembly translator options:

--comment                 include comments in the output file

C compiler options:

--c-compiler FILE         use FILE as C compiler
-C OPTION                 pass OPTION to the C compiler

Assembler options:

-A OPTION                 pass OPTION to the assembler

Linker options:

--local-size N            set default local stack size to N Kb
--global-size N           set default global stack size to N Kb
--trail-size N            set default trail stack size to N Kb
--cstr-size N             set default constraint stack size to N Kb
--fixed-sizes             do not consult environment variables at run-time (use default sizes)
--no-top-level            do not link the top-level (force --no-debugger)
--no-debugger             do not link the Prolog/WAM debugger
--min-pl-bips             link only used Prolog built-in predicates
--min-fd-bips             link only used FD solver built-in predicates
--min-bips                shorthand for: --no-top-level --min-pl-bips --min-fd-bips
--min-size                shorthand for: --min-bips --strip
--no-fd-lib               do not look for the FD library (maintenance only)
-s, --strip               strip the executable
-L OPTION                 pass OPTION to the linker

Any option can be abbreviated to a prefix, provided the prefix is not ambiguous.
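The prefix rule can be sketched in C as follows. This is a hypothetical helper, not gplc's actual option parser. It additionally accepts an exact match even when that option is also a prefix of a longer one (the convention of GNU getopt_long, assumed here), which matters for hexgplc, where --aux-father is a prefix of --aux-father2.

```c
#include <assert.h>
#include <string.h>

/* Returns the index in opts[0..n-1] of the option selected by arg,
   -1 if no option starts with arg, or -2 if arg is an ambiguous prefix.
   An exact match always wins (assumption borrowed from getopt_long). */
int match_option(const char *arg, const char *const opts[], int n)
{
    size_t len;
    int first = -1, candidates = 0, i;

    assert(arg != NULL && opts != NULL);
    len = strlen(arg);
    for (i = 0; i < n; i++) {
        if (strncmp(opts[i], arg, len) != 0)
            continue;              /* arg is not a prefix of opts[i] */
        if (opts[i][len] == '\0')
            return i;              /* exact match: never ambiguous */
        if (candidates++ == 0)
            first = i;             /* remember the first candidate */
    }
    return candidates > 1 ? -2 : first;
}
```

With the options --verbose and --version, "--verb" selects --verbose while "--ver" is reported as ambiguous (-2).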

The name of the output file is controlled via the -o FILE option. If present, the output file is named FILE. If not specified, the output file name depends on the last stage reached by the compiler. If the link is not done, the output file name(s) are the input file name(s) with the suffix associated with the last stage. If the link is done, the name of the executable is the name (without suffix) of the first file name encountered on the command-line. Note that if the link is not done, -o should only be used when a single file name is given as argument.
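The default naming rule can be sketched as a small C helper. The function name is hypothetical, and keeping the directory part of the input name unchanged is an assumption that this section does not address.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes into out the default output name for input,
   i.e. input without its suffix followed by new_suffix. Pass "" for the
   executable produced by a full link, or the suffix of the last stage
   reached (".wam", ".ma", ".s", ".o", ...) when the compiler stops early.
   Only a '.' in the last path component counts as a suffix. */
void default_output_name(const char *input, const char *new_suffix,
                         char *out, size_t size)
{
    const char *base, *dot;
    size_t stem_len;

    assert(input != NULL && new_suffix != NULL && out != NULL);
    base = strrchr(input, '/');
    base = (base != NULL) ? base + 1 : input;  /* start of the file name */
    dot = strrchr(base, '.');
    stem_len = (dot != NULL) ? (size_t) (dot - input) : strlen(input);
    /* snprintf truncates safely if out is too small */
    snprintf(out, size, "%.*s%s", (int) stem_len, input, new_suffix);
}
```

For gplc prog1.pl prog2.pl only the first file name matters: default_output_name("prog1.pl", "", buf, sizeof buf) gives prog1, matching the first example below.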

By default the compiler runs in the native-code compilation scheme. To generate a WAM file for byte-code use the --wam-for-byte-code option. The resulting file can then be loaded using load/1 (section 6.23.2).

To execute the Prolog to WAM compiler in a given read environment (operator definitions, character conversion table,...) use --pl-state FILE. The state file should be produced by write_pl_state_file/1 (section 6.22.5).

By default the Prolog to WAM compiler inlines calls to some deterministic built-in predicates (e.g. arg/3 and functor/3): a call to such a predicate does not yield a classical predicate call but a simple C function call, which is obviously faster. Inlining can be disabled using --no-inline.

Another optimization performed by the Prolog to WAM compiler is unification reordering. The arguments of a predicate are reordered to optimize unification. This can be deactivated using --no-reorder. The compiler also optimizes the unification/loading of nested compound terms. More precisely, the compiler emits optimized instructions when the last subterm of a compound term is itself a compound term (e.g. lists). This can be deactivated using --no-opt-last-subterm.

By default the Prolog to WAM compiler fully optimizes the allocation of registers to decrease both the number of instructions produced and the number of registers used. A good allocation generates many void instructions, which are removed from the produced file unless --keep-void-inst is specified. To prevent any register optimization use --no-reg-opt, while --min-reg-opt forces the compiler to perform only simple register optimizations.

The Prolog to WAM compiler emits an error when a control construct or a built-in predicate is redefined. This can be avoided using --no-redef-error. The compiler also emits warnings for suspicious predicate definitions like -/2, since this often corresponds to an earlier syntax error (e.g. - instead of _). This can be deactivated by specifying --no-susp-warn. Finally, the compiler warns when a singleton variable has a name (i.e. is not the generic anonymous variable _). This can be deactivated by specifying --no-singl-warn.

Predicate names are encoded using a hexadecimal representation, which is explained in more detail later (section 2.4.6). By default the error messages from the linker (e.g. multiple definitions of a given predicate, reference to an undefined predicate, ...) are filtered to replace any hexadecimal representation by the real predicate name. Specifying --no-decode-hexa prevents gplc from filtering linker output messages, so hexadecimal representations are then shown as is.

When producing an executable, it is possible to specify default stack sizes (using --STACK_NAME-size) and to prevent it from consulting environment variables (using --fixed-sizes), as explained above (section 2.3). By default the produced executable includes the top-level, the Prolog/WAM debugger and all Prolog and FD built-in predicates. It is possible to avoid linking the top-level (section 2.2) by specifying --no-top-level. In this case, at least one initialization/1 directive (section 5.1.13) should be defined. The option --no-debugger does not link the debugger. To include only the built-in predicates that are actually used, specify --min-pl-bips and/or --min-fd-bips. For the smallest executable all these options should be specified; this can be abbreviated using the shorthand option --min-bips. By default, executables are not stripped, i.e. their symbol table is not removed. This table is only useful for the C debugger (e.g. when interfacing Prolog and C). To remove the symbol table (and thus reduce the size of the final executable) use --strip. Finally, --min-size is a shorthand for --min-bips and --strip, i.e. the produced executable is as small as possible.

Example: compile and link two Prolog sources prog1.pl and prog2.pl. The resulting executable will be named prog1 (since -o is not specified):

% gplc prog1.pl prog2.pl
Example: compile the Prolog file prog.pl to study basic WAM code. The resulting file will be named prog.wam:

% gplc -W --no-inline --no-reorder --keep-void-inst prog.pl
Example: compile the Prolog file prog.pl and its C interface file utils.c to provide an autonomous executable called mycommand. The executable is not stripped to allow the use of the C debugger:

% gplc -o mycommand prog.pl utils.c
Example: detail all steps to compile the Prolog file prog.pl (the resulting executable is stripped). All intermediate files are produced (prog.wam, prog.ma, prog.s, prog.o and the executable prog):

% gplc -W prog.pl
% gplc -M --comment prog.wam
% gplc -S --comment prog.ma
% gplc -c prog.s
% gplc -o prog -s prog.o
2.4.4 Running an executable
In this section we explain what happens when an executable produced by the GNU Prolog native-code compiler is run. The default main function first starts the Prolog engine, which collects all linked objects (issued from the compilation of Prolog files) and initializes them. Initializing a Prolog object file consists of adding its new atoms and new predicates to the appropriate tables and executing its system directives. A system directive is generated by the Prolog to WAM compiler to reflect a (user) directive executed at compile-time, such as op/3 (section 5.1.10): when the compiler encounters such a directive, it executes it immediately and also generates a system directive to execute it at the start of the executable. When all system directives have been executed, the Prolog engine executes all initialization directives defined with initialization/1 (section 5.1.13). Initialization directives appearing in the same file are executed in their order of appearance. The order in which initialization directives from different files are executed is machine-dependent; however, on most machines it is the reverse of the order in which the associated files have been linked (this is not true under native win32). When all initialization directives have been executed, the default main function looks for the GNU Prolog top-level. If it is present (i.e. it has been linked) it is called; otherwise the program simply ends. Note that if the top-level is not linked and there is no initialization directive, the program is useless since it simply ends without doing any work. The default main function detects this situation and emits a warning message.

Example: compile an empty file prog.pl without linking the top-level and execute it:

% gplc --no-top-level prog.pl
% prog
Warning: no initial goal executed
   use a directive :- initialization(Goal)
   or remove the link option --no-top-level (or --min-bips or --min-size)
2.4.5 Generating a new interactive interpreter
In this section we show how to define a new top-level extending the GNU Prolog interactive interpreter with new predicate definitions. The obtained top-level can then be considered as an enriched version of the basic GNU Prolog top-level (section 2.2). Indeed, each added predicate can be viewed as a predefined predicate just like any other built-in predicate. This can be achieved by compiling these predicates and including the top-level at link-time.

The real question is: why would we include some predicates in a new top-level instead of simply consulting them under the GNU Prolog top-level? There are two reasons for this:

To define a new top-level, simply compile the set of desired predicates and link them with the GNU Prolog top-level (this is the default) using gplc (section 2.4.3).

Example: let us define a new top-level called my_top_level including all predicates defined in prog.pl:

% gplc -o my_top_level prog.pl
Note that if prog.pl is an empty Prolog file, the previous command simply creates a new interactive interpreter similar to the GNU Prolog top-level.

Example: as before, but where some predicates of prog.pl call C functions defined in utils.c:

% gplc -o my_top_level prog.pl utils.c
In conclusion, defining a particular top-level is just a particular case of native-code compilation. It is simple to do and very useful in practice.

2.4.6 The hexadecimal predicate name encoding
When the GNU Prolog compiler compiles a Prolog source to an object file, it has to associate a symbol with each predicate name. However, the syntax of symbols is restricted to identifiers: strings containing only letters, digits or underscore characters. On the other hand, predicate names (i.e. atoms) can contain any character, with quotes if necessary (e.g. 'x+y=z' is a valid predicate name). The compiler therefore has to encode predicate names so that they respect the syntax of identifiers. To achieve this, GNU Prolog uses a hexadecimal representation where each predicate name is translated to a symbol beginning with an X followed by the hexadecimal notation of the code of each character of the name.

Example: 'x+y=z' will be encoded as X782B793D7A since 78 is the hexadecimal representation of the code of x, 2B of the code of +, etc.

Since Prolog allows the user to define several predicates with the same name but different arities, GNU Prolog encodes predicate indicators (predicate name followed by the arity). The symbol associated with the predicate name is then followed by an underscore and the decimal notation of the arity.

Example: 'x+y=z'/3 will be encoded as X782B793D7A_3.
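The encoding can be reproduced with a few lines of C. This is a minimal sketch of the scheme as described in this section, not GNU Prolog's own code: encode_pred_indicator() is a hypothetical name, a negative arity is used here to request a bare predicate name, and character codes are assumed to fit in one byte.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes 'X', then two uppercase hexadecimal digits per character code of
   name, then (if arity >= 0) '_' and the decimal arity. out must hold at
   least 2 * strlen(name) + 14 bytes. Assumes one-byte character codes. */
void encode_pred_indicator(const char *name, int arity, char *out)
{
    const unsigned char *s;
    char *p = out;

    assert(name != NULL && out != NULL);
    *p++ = 'X';
    for (s = (const unsigned char *) name; *s != '\0'; s++)
        p += sprintf(p, "%02X", (unsigned) *s);  /* e.g. '+' -> "2B" */
    if (arity >= 0)
        sprintf(p, "_%d", arity);                /* predicate indicator */
    else
        *p = '\0';                               /* bare predicate name */
}
```

encode_pred_indicator("x+y=z", 3, buf) produces X782B793D7A_3 and encode_pred_indicator("x+y=z", -1, buf) produces X782B793D7A, matching the examples above.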

So, from the mini-assembly stage on, each predicate indicator is replaced by its hexadecimal encoding. This encoding is normally of no interest to the user, i.e. the Prolog programmer, so the GNU Prolog compiler hides it. When an error occurs on a predicate (undefined predicate, predicate with multiple definitions, ...) the compiler has to decode the symbol associated with the predicate indicator. For this, gplc filters each message emitted by the linker to locate and decode any predicate indicators. This filtering can be deactivated by specifying --no-decode-hexa when invoking gplc (section 2.4.3).

This filter is provided as a utility that can be invoked using the hexgplc command as follows:

% hexgplc [OPTION]... FILE...    (the % symbol is the operating system shell prompt)
Options:

--encode                  encoding mode (default mode is decoding)
--relax                   also decode predicate names (not only predicate indicators)
--printf FORMAT           pass encoded/decoded string to C printf(3) with FORMAT
--aux-father              decode an auxiliary predicate as its father
--aux-father2             decode an auxiliary predicate as its father + auxiliary number
--cmd-line                encode/decode each argument of the command-line
-H                        same as: --cmd-line --encode
-P                        same as: --cmd-line --relax
--help                    print help and exit
--version                 print version number and exit

Any option can be abbreviated to a prefix, provided the prefix is not ambiguous.

Without arguments, hexgplc runs in decoding mode, reading its standard input and decoding each symbol corresponding to a predicate indicator. To use hexgplc in encoding mode, the --encode option must be specified. By default hexgplc only decodes predicate indicators; this can be relaxed using --relax to also take into account simple predicate names (where the arity is omitted). The output of each encoded/decoded string can be formatted using --printf FORMAT: each string S is then passed to the C printf(3) function as printf(FORMAT,S).
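The decoding mode can be approximated by a line filter written in C. This sketch is not hexgplc's implementation: it splits each line into identifier tokens (letters, digits, underscores), replaces every token of the form X<hex pairs>_<arity> by Name/Arity and, when relax is set, also every token of the form X<hex pairs> by Name. It assumes uppercase hexadecimal digits (as in the examples of this section) and one-byte character codes, prints names unquoted (the real tool's output format may differ), and ignores --printf and the auxiliary-predicate options.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Value of an uppercase hexadecimal digit, or -1. */
static int hex_digit(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Symbols are identifiers: letters, digits and underscores. */
static int is_ident_char(int c)
{
    return isalnum((unsigned char) c) || c == '_';
}

/* Decodes tok[0..len-1] into dst if it is X<hex pairs>_<arity> (or
   X<hex pairs> when relax is set); otherwise copies it unchanged.
   Returns the number of characters written, never more than len. */
static size_t decode_token(const char *tok, size_t len, int relax, char *dst)
{
    size_t i = 1, n = 0, k;

    if (len >= 3 && tok[0] == 'X') {
        while (i + 1 < len && hex_digit(tok[i]) >= 0 && hex_digit(tok[i + 1]) >= 0) {
            int code = hex_digit(tok[i]) * 16 + hex_digit(tok[i + 1]);
            if (code == 0)
                break;                       /* never emit a NUL byte */
            dst[n++] = (char) code;
            i += 2;
        }
        if (n > 0 && i == len && relax)
            return n;                        /* bare predicate name */
        if (n > 0 && i + 1 < len && tok[i] == '_') {
            for (k = i + 1; k < len && isdigit((unsigned char) tok[k]); k++)
                ;
            if (k == len) {                  /* '_' followed by digits only */
                dst[n++] = '/';
                memcpy(dst + n, tok + i + 1, len - i - 1);
                return n + (len - i - 1);
            }
        }
    }
    memcpy(dst, tok, len);                   /* not an encoded symbol */
    return len;
}

/* Copies line to out, decoding every encoded token. out must hold at
   least strlen(line) + 1 bytes since decoding never lengthens a token. */
void filter_line(const char *line, int relax, char *out)
{
    const char *p = line;
    char *q = out;

    assert(line != NULL && out != NULL);
    while (*p != '\0') {
        if (is_ident_char(*p)) {
            const char *start = p;
            while (is_ident_char(*p))
                p++;
            q += decode_token(start, (size_t) (p - start), relax, q);
        } else {
            *q++ = *p++;
        }
    }
    *q = '\0';
}
```

For instance, filter_line("X782B793D7A_3 foo", 0, out) leaves foo untouched and turns the first token into x+y=z/3; with relax set to 1, a bare X782B793D7A also becomes x+y=z.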

Auxiliary predicates are generated by the Prolog to WAM compiler when simplifying some control constructs like ';'/2 present in the body of a clause. They are of the form '$NAME/ARITY_$auxN' where NAME/ARITY is the predicate indicator of the simplified (i.e. father) predicate and N is a sequential number (a predicate can give rise to several auxiliary predicates). It is possible to force hexgplc to decode an auxiliary predicate as its father predicate indicator using --aux-father or as its father predicate indicator followed by the sequential number using --aux-father2.
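Recovering the father of an auxiliary predicate amounts to splitting its decoded name. The C helper below is hypothetical: it takes the name part only (without the auxiliary predicate's own arity), returns the father indicator NAME/ARITY together with N, and searches for the last occurrence of _$aux so that a father name containing that text is still handled. It shows the information that --aux-father and --aux-father2 report, not their exact output format.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Splits a decoded auxiliary predicate name "$NAME/ARITY_$auxN" into its
   father indicator "NAME/ARITY" (stored in father) and N (stored in *n).
   Returns 0 on success, -1 if name does not have that shape.
   father must hold at least strlen(name) bytes. */
int split_aux_name(const char *name, char *father, long *n)
{
    const char *mark = NULL, *next, *digits;
    char *end;
    size_t len;

    assert(name != NULL && father != NULL && n != NULL);
    if (name[0] != '$')
        return -1;
    for (next = strstr(name, "_$aux"); next != NULL; next = strstr(next + 1, "_$aux"))
        mark = next;                       /* keep the last occurrence */
    if (mark == NULL || mark == name + 1)
        return -1;                         /* no marker or empty father */
    digits = mark + strlen("_$aux");
    if (*digits < '0' || *digits > '9')
        return -1;                         /* N must be present */
    *n = strtol(digits, &end, 10);
    if (*end != '\0')
        return -1;                         /* trailing garbage after N */
    len = (size_t) (mark - (name + 1));
    memcpy(father, name + 1, len);
    father[len] = '\0';
    return 0;
}
```

For example, split_aux_name("$p/2_$aux1", father, &n) stores p/2 in father and 1 in n.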

If no file is specified, hexgplc processes its standard input; otherwise each file is processed sequentially. Specifying the --cmd-line option tells hexgplc that each argument is not a file name but a string that must be encoded (or decoded). This is useful to encode/decode a particular string. For this purpose the options -H (encode to hexadecimal) and -P (decode to Prolog) are provided as shorthands. Then, to obtain the hexadecimal representation of a predicate P use:

% hexgplc -H P
Example:

% hexgplc -H 'x+y=z'
X782B793D7A

Copyright (C) 1999,2000 Daniel Diaz

Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.
