This chapter describes what parallel Prolog execution is about, and how Muse can be used as a "Prolog accelerator" to make programs run faster in parallel than they do sequentially.
Logic programs offer many opportunities for the exploitation of parallelism, the main ones being OR-parallelism and AND-parallelism [Conery 83]. Each kind has its own advantages and disadvantages, but we have chosen to focus on OR-parallelism as a first step for a number of reasons. From our perspective, the wide range of potential applications and the very slight execution overhead have been the main reasons. See [Lusk et al. 90] for a fuller discussion about the advantages of OR-parallelism in the context of the Aurora system.
The goal with Muse, as with Aurora, has been to allow multiple processors to exploit the implicit OR-parallelism of Prolog programs with a minimum of overhead, preserving the full Prolog language with its standard semantics.
In this section we illustrate the parallelism in the well-known "8-queens" problem. This code is reproduced from [Sterling & Shapiro 86] with permission from the authors:
% file: queens.pl

queens(N, Qs) :-
        range(1, N, Ns),
        queens(Ns, [], Qs).

queens([], Qs, Qs).
queens(UnplacedQs, SafeQs, Qs) :-
        select(UnplacedQs, UnplacedQs1, Q),
        not_attack(SafeQs, Q),
        queens(UnplacedQs1, [Q|SafeQs], Qs).

not_attack(Xs, X) :- not_attack(Xs, X, 1).

not_attack([], _, _) :- !.
not_attack([Y|Ys], X, N) :-
        X =\= Y+N, X =\= Y-N,
        N1 is N+1,
        not_attack(Ys, X, N1).

select([X|Xs], Xs, X).
select([Y|Ys], [Y|Zs], X) :- select(Ys, Zs, X).

range(N, N, [N]) :- !.
range(M, N, [M|Ns]) :-
        M < N,
        M1 is M+1,
        range(M1, N, Ns).

run :-
        statistics(walltime, [Start,_]),
        findall(X, queens(10, X), L),
        statistics(walltime, [End,_]),
        length(L, N),
        Time is End - Start,
        format('~d solutions in ~3d seconds.~n', [N,Time]).
Note the use of the walltime key (i.e. real elapsed time) in the call to statistics/2. The runtime key measures CPU time on the worker that happens to be executing, and so is normally not useful.
Below we show the result of running the queens problem first with 5 workers and then with a single worker. In this case the speed-up is 3.8.
% sicstus -P
SICStus -P 3 #0: Mon Oct 3 12:44:31 MET 1994
| ?- compile(queens).
{compiling /src/sicstus/muse/V3/examples/queens.pl...}
yes
| ?- muse_flag(num_workers,_,5).
| ?- run.
724 solutions in 2.760 seconds.
yes
| ?- muse_flag(num_workers,_,1).
| ?- run.
724 solutions in 10.400 seconds.
yes
| ?- Speedup is 10400/2760.
Speedup = 3.7681159420289854 ?
yes
| ?- halt.
%
The Muse model assumes a multiprocessor system with a number of processors or processes, called workers, with identical local address spaces, and some global address space shared by all workers. We use the term worker to denote either a process or a processor. The Muse model implements each worker as a sequential Prolog engine with its own WAM stacks in private memory. The Prolog program itself is stored in shared memory.
During execution, the actual work is to explore a search tree implicitly defined by the Prolog program. The available work is represented as choicepoints containing outstanding execution alternatives on one of the WAM stacks.
The work is performed concurrently by the workers. A distributed scheduler is responsible for dynamic load balancing of available work among available workers. Workers try to maximize the time they spend working and minimize the time they spend scheduling. When a worker is working, it adopts a depth-first left-to-right search strategy.
In the Muse model each worker keeps its own WAM stacks to avoid conflicting variable bindings in common parts of the search tree. Whenever a worker runs Prolog it can therefore use (more or less) the same machinery as the sequential SICStus Prolog implementation. The exceptions are (1) bookkeeping of local work each time a choicepoint is created, (2) testing for signals at predicate calls, and (3) synchronizing with other workers when taking work from a shared (public) choicepoint. The overhead of doing this is a few percent, making Muse nearly as efficient as sequential SICStus Prolog.
When a worker (Q) runs out of work, it must interrupt some busy worker (P) to obtain work. All of P's private choicepoints are then made public, and the difference between P's and Q's states is copied from P to Q. Q can then simply simulate failure in order to backtrack into the new work.
The two main functions of the scheduler are to maintain the sequential semantics of Prolog and to match idle workers with the available work with minimal overhead. The sources of overhead in the Muse model include (1) copying a part of worker state, (2) making local choicepoints sharable, and (3) grabbing a piece of work (a task) from a shared choicepoint. Other important scheduling issues are to maximize the grain size of computations, and to prefer "promising" work over less promising work (see section Cut, Side Effects and Suspension).
Cut has a semantics strictly compatible with sequential Prolog. Sometimes a cut may be (partly) invalid because the current branch may itself be pruned by another cut of smaller scope [Hausman 90]. In this case the worker executes as much as possible of the cut operation, saves information in the public search tree on how to continue the cut operation later, and then proceeds with its current work.
Calls to predicates that produce side-effects are required to synchronize, i.e. to proceed only if they are on the leftmost branch of the entire tree. If a side-effect predicate is reached on a non-leftmost branch, that branch will have to suspend.
Suspending and resuming execution branches can be costly, as these actions involve copying state. The overhead can even outweigh the speedup gained from running in parallel. Therefore, the use of side-effects in parallel computations is discouraged. Also see section Programming Considerations.
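The guideline above can often be followed by moving side-effects out of the parallel search altogether: collect all solutions first, then produce the output sequentially. The sketch below illustrates this for the queens program; the predicate names show_all_slow/1 and show_all/1 are our own, not part of Muse.

```prolog
% Discouraged: the side-effect write/1 occurs inside the parallel
% search, so every branch must synchronize (and possibly suspend)
% before printing its solution.
show_all_slow(N) :-
        queens(N, Qs),
        write(Qs), nl,
        fail.
show_all_slow(_).

% Preferred: let the OR-parallel search run freely inside findall/3
% and perform all output sequentially afterwards.
show_all(N) :-
        findall(Qs, queens(N, Qs), Solutions),
        print_solutions(Solutions).

print_solutions([]).
print_solutions([Qs|Rest]) :-
        write(Qs), nl,
        print_solutions(Rest).
```

In the second version the only synchronization point is the single (deterministic) call to print_solutions/1 after the search has finished.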
The current work may be endangered (i.e. possibly pruned) by some other worker executing cut or raise_exception/1. Such work is called speculative work. If there exists untaken work that is less speculative, then it might be a good idea to suspend the current work and move to the less speculative work [Ali & Karlsson 92]. This functionality (called voluntary suspension) is implemented in Muse, and it improves the execution of "search for one solution" problems dramatically, possibly from a speed-up of 1 to nearly linear speed-up. Unfortunately, the moving of workers introduces some overhead (typically below 10 percent). It is therefore possible to turn voluntary suspension off for programs searching for all solutions.
The Muse distribution provides two tools for obtaining insight into the parallel behavior of a Prolog program. These tools can help in understanding "performance bugs": situations in which a program executes correctly but not as fast as anticipated, due to an unforeseen restriction of the degree of parallelism in the program. Once such restrictions are understood, they can often be removed by changing the program, resulting in greatly improved performance.
Both of these tools use log files, that is, they do not display graphics while the program is running, but rather display the contents of a file that is created during program execution. This method has several advantages.
In this manual we will not discuss the various graphics tracers in detail, but rather we will leave the details of each to its own manual. We will, however, describe how to collect the log files each one needs.
The Must graphics tracing facility [Karlsson 92] shows the Prolog search tree as it is expanded and contracted in parallel by Muse workers.
Having created a version of Muse with the Must tracing option, the call muse_trace(Query) is used for running the query Query with tracing. After the query is run, there will be a file (with extension `.mt') containing a trace of the last query execution. The distribution only contains the Sun-4 executable. It can be used both under SunOS 4.x (Solaris 1.x) and SunOS 5.x (Solaris 2.x). The program itself has a dialog box to request the filename of the trace file, or you can just specify the file on the command line.
The Visandor graphics tracing program was originally developed for the &-Prolog system [Hermenegildo and Greene 90]. It is supplied in binary executable form for Sun-4 computers. More information about that tool and related work is available via WWW from `http://www.clip.dia.fi.upm.es', or via email from <clip@clip.dia.fi.upm.es>.
The mt2vt tool may be used for converting a Must file to a Visandor file (with extension `.vt').
The Must and Visandor tools can be combined into the Vimust tool. If the Must file (with extension `.mt') and its corresponding Visandor file (with extension `.vt') both exist, they can then be viewed together with the vimust tool.
It is possible to control the parallelism of a program by means of parallel and sequential declarations (see section Declarations). Sequential declarations may be necessary to prevent infinite loops in certain programming constructs, such as failure-driven loops.
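As an illustration (the exact declaration syntax is defined in section Declarations; the form used below is an assumption, as are the predicate names), consider a failure-driven loop driven by a user-defined generator with unbounded alternatives:

```prolog
:- sequential my_repeat/0.    % assumed declaration syntax; see section Declarations

% A user-defined repeat: an unbounded supply of alternatives.
my_repeat.
my_repeat :- my_repeat.

% A failure-driven loop that echoes input terms until end_of_file.
% Without the sequential declaration, idle workers could keep taking
% fresh alternatives of my_repeat/0 in parallel, and the loop might
% never terminate; declared sequential, its choicepoints stay on one
% branch and it behaves exactly as in sequential Prolog.
echo_terms :-
        my_repeat,
        read(Term),
        (   Term == end_of_file -> true
        ;   write(Term), nl, fail
        ).
```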
Calls of interpreted predicates are synchronized and all interpreted predicates are sequential. The debugger also sequentializes the execution.
All nondeterministic built-in predicates are sequential. However, if a parallel predicate is called by call/1, its execution is still parallel.
We have taken pains to preserve the observable "sequential" semantics during parallel execution. This is achieved by synchronizing all built-in predicates with side-effects. As foreign predicates may involve side-effects, in principle they should be synchronized, too. We decided not to synchronize foreign predicates, as this often is not necessary. (The user can call the muse_sync/0 built-in predicate immediately before the foreign predicate to achieve synchronization; see section Muse.)
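For example, suppose c_log_event/1 is a foreign predicate with side-effects (the name is hypothetical, used only for illustration). Sequential ordering of its effects can be enforced by a wrapper:

```prolog
% c_log_event/1 is assumed to be a foreign predicate that writes a
% record to a log file.  muse_sync/0 waits until the current branch
% is leftmost in the search tree, so the log records appear in the
% same order as they would in a sequential execution.
log_event(Event) :-
        muse_sync,
        c_log_event(Event).
```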
There are application areas where the principle of preserving sequential semantics seriously limits the available parallelism, since communication between parallel execution branches becomes virtually impossible. The principle may even slow down the execution, since suspending and resuming execution branches are costly operations. One way to overcome these limitations is to use the foreign language interface. As an experimental language extension, we have furthermore provided a set of primitives which can be used for non-synchronized communication. These primitives maintain a blackboard, i.e. a mapping from keys to terms, that can be read or written simultaneously by parallel execution branches. See section Blackboard Primitives.
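As a sketch of such non-synchronized communication (assuming blackboard primitives of the form bb_put(Key, Term) and bb_get(Key, Term); the actual set of primitives is defined in section Blackboard Primitives), parallel branches of a branch-and-bound search could share the best cost found so far:

```prolog
% Record an initial bound on the blackboard before the search starts.
init_bound(Bound) :-
        bb_put(best, Bound).

% Called from parallel branches: succeed only if Cost improves on the
% current best, and record the improvement.  Because the blackboard
% is read and written without synchronization, another branch may
% have stored a better bound in the meantime -- the test is a
% heuristic filter for pruning, not an exact one.
update_bound(Cost) :-
        bb_get(best, Best),
        Cost < Best,
        bb_put(best, Cost).
```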
Warning. The blackboard predicates, as well as predicates loaded via the foreign language interface, can be executed even if the current work is speculative. These predicates are cavalier, i.e. a call might be executed in a parallel execution, even if that call would never be reached in a sequential execution, if the code contains red cuts (see section The Cut Symbol) or calls to raise_exception/1.
This problem can be partly cured by avoiding red cuts while searching for solutions, or by inserting extra validity tests in the program before calling a cavalier primitive, tests which would be redundant in a sequential execution. Remember that the validity tests may themselves succeed out of order if they contain parallelism, so they should also be written without red cuts and exceptions, or be explicitly sequentialized.
Ideally, the cavalier predicates should be asynchronous rather than cavalier. This means that they should suspend until they are no longer endangered by cuts or calls to raise_exception/1; that functionality is, however, not provided in the current release.
Other limitations in Muse concern certain built-in predicates and the foreign language interface. The limitations are mentioned when describing these functions. They can be summarized as follows.
stream_select/3, stream_interrupt/3, undo/1, and break/0 are not available.

save/1, save/2, save_program/1, restore/1, and load_foreign_files/2 require that Muse be adjusted to one worker.

The statistics/2 option runtime is not meaningful.

SP_read_hook is not available.
Muse is still an experimental system and is not formally supported.