SGI STL Allocator Design

This document consists of 2 sections. The first discusses some of the decisions made in the implementation of default allocators in the SGI STL. These decisions influence both the performance of the default allocator, as well as its behavior with leak detectors.

The second section describes the original design of SGI STL allocators. The current SGI STL also supports the allocator interface in the C++ standard. The interface described here is specific to the SGI STL and cannot be used in code that must be portable to other STL implementations. Thus its use is now discouraged. The discussion is nonetheless preserved both for historical reasons, and because it exposes some of the issues surrounding the standard interface.

The SGI Default Allocator

The default allocator alloc maintains its own free lists for small objects. It uses an underlying system-provided allocator both to satisfy larger requests, and to obtain larger chunks of memory which can be subdivided into small objects. Two of the detailed design decisions have proven to be controversial, and are explained here.

`Alloc` obtains memory from `malloc`

Malloc is used as the underlying system-provided allocator. This is a minor design decision. ::operator new could also have been used. Malloc has the advantage that it allows for predictable and simple failure detection. ::operator new would have made this more complicated given the portability and thread-safety constraints, and the possibility of user provided new handlers.

`Alloc` does not `free` all memory

Memory allocated for blocks of small objects is not returned to malloc. It can only be reused by subsequent alloc::allocate requests of (approximately) the same size. Thus programs that use alloc may appear to leak memory when monitored by some simple leak detectors. This is intentional. Such "leaks" do not accumulate over time. Such "leaks" are not reported by garbage-collector-like leak detectors.

The primary design criterion for alloc was to make it no slower than the HP STL per-class allocators, but potentially thread-safe, and significantly less prone to fragmentation. Like the HP allocators, it does not maintain the necessary data structures to free entire chunks of small objects when none of the contained small objects are in use. This is an intentional choice of execution time over space use. It may not be appropriate for all programs. On many systems malloc_alloc may be more space efficient, and can be used when that is crucial.

The HP allocator design returned entire memory pools when the entire allocator was no longer needed. To allow this, it maintains a count of containers using a particular allocator. With the SGI design, this would only happen when the last container disappears, which is typically just before program exit. In most environments, this would be highly counterproductive; free would typically have to touch many long unreferenced pages just before the operating system reclaims them anyway. It would often introduce a significant delay on program exit, and would possibly page out large portions of other applications. There is nothing to be gained by this action, since the OS normally reclaims memory on program exit, and it should do so without touching that memory.

In general, we recommend that leak detection tests be run with malloc_alloc instead of alloc. This yields more precise results with GC-based detectors (e.g. Pure Atria's Purify), and it provides useful results with detectors that simply count allocations and deallocations.

No Attempt to sort free lists

The default allocator makes no special attempt to ensure that consecutively allocated objects are "close" to each other, i.e. share a cache line or a page. A deallocation request adds an object to the head of a free list, and allocations remove the last deallocated object of the appropriate size. Thus allocation and deallocation each require a minimum number of instructions.

This appears to perform very well for small applications which fit into cache. It also performs well for regular applications that deallocate consecutively allocated objects consecutively. For such applications, free lists tend to remain approximately in address order. But there are no doubt applications for which some effort invested in approximate sorting of free lists would be repayed in improved cache performance.

The Original SGI STL allocator interface

The SGI specific allocator interface is much simpler than either the HP STL one or the interface in the C++ standard. Here we outline the considerations that led to this design.

An SGI STL allocator consists of a class with 3 required member functions, all of which are static:

void * allocate(size_t n): Allocates an object with the indicated size (in bytes). The object must be sufficiently aligned for any object of the requested size.
void deallocate(void *p, size_t n): Deallocates the object p, which must have been allocated with the same allocator and a requested size of n.
void * reallocate(void *p, size_t old_sz, size_t new_sz): Resize object p, previously allocated to contain old_sz bytes, so that it now contains new_sz bytes. Return a pointer to the resulting object of size new_sz. The functions either returns p, after suitably adjusting the object in-place, or it copies min(old_sz, new_sz) bytes from the old object into a newly allocated object, deallocates the old object, and returns a pointer to the new object. Note that the copy is performed bit-wise; this is usually inappropriate if the object has a user defined copy constructor or destructor. Fully generic container implementations do not normally use reallocate; however it can be a performance enhancement for certain container specializations.

A discussion of the design decisions behind this rather simple design follows:

Allocators are not templates

Our allocators operate on raw, untyped memory in the same way that C's malloc does. They know nothing about the eventual type of the object. This means that the implementor of an allocator to worry about types; allocators can deal with just bytes. We provide the simple_alloc adaptor to turn a byte-based allocator into one that allocates n objects of type T. Type-oblivious allocators have the advantage that containers do not require either template template arguments or the "rebind" mechanism in the standard.

The cost is that allocators that really do need type information (e.g. for garbage collection) become somewhat harder to implement. This can be handled in a limited way by specializing simple_alloc.

(The STL standard allocators are in fact implemented with the aid of templates. But this is done mostly so that they can be distributed easily as .h files.)

Allocators do not encapsulate pointer types

(Largely shared with SGI standard allocators. The standard allows allocators to encapsulate pointer types, but does not guarantee that standard containers are functional with allocators using nonstandard pointer types.) Unlike the HP STL, our allocators do not attempt to allow use of different pointer types. They traffic only in standard void * pointers. There were several reasons for abandoning the HP approach:

It is not really possible to define an alternate notion of pointer within C++ without explicit compiler support. Doing so would also require the definition of a corresponding "reference" type. But there is no class that behaves like a reference. The "." field selection operation can only be applied to a reference. A template function (e.g. max) expecting a T& will usually not work when instantiated with a proxy reference type. Even proxy pointer types are problematic. Constructors require a real this pointer, not a proxy.
Existing STL data structures should usually not be used in environments which require very different notions of pointers, e.g. for disk-based data structures. A disk-bases set or map would require a data structure that is much more aware of locality issues. The implementation would probably also have to deal with two different pointer types: real pointers to memory allocated temporaries and references to disk based objects. Thus even the HP STL notion of encapsulated pointer types would probably be insufficient.
This leaves compiler supported pointers of different sizes (e.g. DOS/win16 "far" pointers). These no longer seem to be of much interest in a general purpose library. Win32 no longer makes such distinctions. Neither do any other modern (i.e. 32- or 64-bit pointer) environments of which we are aware. The issue will probably continue to matter for low end embedded systems, but those typically require somewhat nonstandard programming environments in any case. Furthermore, the same template instantiation problems as above will apply.

There are no allocator objects

An allocator's behavior is completely determined by its type. All data members of an allocator are static.

This means that containers do not need allocator members in order to allocate memory from the proper source. This avoids unneeded space overhead and/or complexity in the container code.

It also avoids a number of tricky questions about memory allocation in operations involving multiple containers. For example, it would otherwise be unclear whether concatenation of ropes built with two different allocators should be acceptable and if so, which allocator should be used for the result.

This is occasionally a significant restriction. For example, it is not possible to arrange for different containers to allocate memory mapped to different files by passing different allocator instances to the container constructors. Instead one must use one of the following alternatives:

The container classes must be instantiated with different allocators, one for each file. This results in different container types. This forces containers that may be mapped to different files to have distinct type, which may be a troublesome restriction, though it also results in compile-time detection of errors that might otherwise be difficult to diagnose.
The containers can be instantiated with a single allocator, which can be redirected to different files by calling additional member functions. The allocator must be suitably redirected before container calls that may allocate.

Allocators allocate individual objects

(Shared with standard allocators.) Some C++ programming texts have suggested that individual classes keep a free lists of frequently allocated kinds of objects, so that most allocation requests can be satisfied from the per-class free list. When an allocation request encounters an empty free list, a potentially slower allocator (e.g. new or malloc) is called to allocate several of the required objects at once, which are then used to replenish the free list.

The HP STL was designed along these lines. Allocators were intended to be used as the slower backup allocator. Containers like list were presumed to maintain their own free list.

Based on some small experiments, this has no performance advantage over directly calling the allocate function for individual objects. In fact, the generated code is essentially the same. The default allocator we provide is easily inlined. Hence any calling overhead is eliminated. If the object size is statically known (the case in which per-class free lists may be presumed to help), the address of the free list header can also be determined by the compiler.

Per-class free lists do however have many disadvantages:

They introduce fragmentation. Memory in the free list for class A cannot be reused by another class B, even if only class B objects are allocated for the remainder of program execution. This is particularly unfortunate if instead of a single class A there are many instances of a template class each of which has its own free list.
They make it much more difficult to construct thread-safe containers. A class that maintains its own free list must ensure that allocations from different threads on behalf of different containers cannot interfere with each other. This typically means that every class must be aware of the underlying synchronization primitives. If allocators allocate individual objects, then only allocators themselves need to address this issue, and most container implementations can be independent of the threading mechanism.
Garbage collecting allocators are difficult to accommodate. The garbage collector will see per-class free lists as accessible memory. If pointers in these free objects are not explicitly cleared, anything they reference will also be retained by the collector, reducing the effectiveness of the collector, sometimes seriously so.

Deallocate requires size argument

We chose to require an explicit size argument, both for deallocate, and to describe the original size of the object in the reallocate call. This means that no hidden per-object space overhead is required; small objects use only the space explicitly requested by the client. Thus, even in the absence of fragmentation, space usage is the same as for per-class allocators.

This choice also removes some time overhead in the important special case of fixed-size allocation. In this case, the size of the object, and thus the address of the free-list header becomes a clearly recognizable compile-time constant. Thus the generated code should be identical to that needed by a per-class free-list allocator, even if the class requires objects of only a single size.

We include realloc-style reallocation in the interface

This is probably the most questionable design decision. It would have probably been a bit more useful to provide a version of reallocate that either changed the size of the existing object without copying or returned NULL. This would have made it directly useful for objects with copy constructors. It would also have avoided unnecessary copying in cases in which the original object had not been completely filled in.

Unfortunately, this would have prohibited use of realloc from the C library. This in turn would have added complexity to many allocator implementations, and would have made interaction with memory-debugging tools more difficult. Thus we decided against this alternative.

The actual version of reallocate is still quite useful for container specializations to specific element types. The STL implementation is starting to take advantage of that.