A shared memory multiprocessor experiences various types of delays, ranging from primary cache misses to page faults. If locks are used, care must be taken that the system is not delayed as a whole when a processor is holding a lock is delayed. For non-blocking algorithms, however, no such care is necessary.
In this thesis, I investigate the performance of non-blocking algorithms. The investigation consists of a series of simulation experiences on non-blocking algorithms. To properly evaluate the performances of these algorithms, the simulator is accurate down to the bus-cycle level. The design and implementation of the simulator is based on a study of current shared-memory multiprocessors and their cache-coherence protocols.
For these experiments, I use non-blocking algorithms developed from a survey of recent techniques for deriving non-blocking algorithms based on primitive instructions such as COMPARE&SWAP, and LOAD-LINKED/STORE-CONDITIONAL. A consequence of this survey was that I developed an efficience non-blocking Queue implementation.
The main conclusion of my investigation is that, even in the absence of long delays, non-blocking algorithms typically outperform traditional lock-based algorithms, and, in particular, a priority-based software protocol developed by me matches the performance of hardware-based implementations.