The library contains only header files, and does not require any
other libraries except the standard C++ library . All classes are
defined in namespace __gnu_pbds
. The library internally
uses macros beginning with PB_DS
, but
#undef
s anything it #define
s (except for
header guards). Compiling the library in an environment where macros
beginning in PB_DS
are defined, may yield unpredictable
results in compilation, execution, or both.
Further dependencies are necessary to create the visual output for the performance tests. To create these graphs, an additional package is needed: pychart.
The various data structures are organized as follows.
Branch-Based
basic_branch
is an abstract base class for branched-based
associative-containers
tree
is a concrete base class for tree-based
associative-containers
trie
is a concrete base class trie-based
associative-containers
Hash-Based
basic_hash_table
is an abstract base class for hash-based
associative-containers
cc_hash_table
is a concrete collision-chaining hash-based
associative-containers
gp_hash_table
is a concrete (general) probing hash-based
associative-containers
List-Based
list_update
list-based update-policy associative container
Heap-Based
priority_queue
A priority queue.
The hierarchy is composed naturally so that commonality is
captured by base classes. Thus operator[]
is defined at the base of any hierarchy, since all derived
containers support it. Conversely split
is
defined in basic_branch
, since only
tree-like containers support it.
In addition, there are the following diagnostics classes, used to report errors specific to this library's data structures.
For the most part, the policy-based containers containers in
namespace __gnu_pbds
have the same interface as
the equivalent containers in the standard C++ library, except for
the names used for the container classes themselves. For example,
this shows basic operations on a collision-chaining hash-based
container:
#include <ext/pb_ds/assoc_container.h> int main() { __gnu_pbds::cc_hash_table<int, char> c; c[2] = 'b'; assert(c.find(1) == c.end()); };
The container is called
__gnu_pbds::cc_hash_table
instead of
std::unordered_map
, since “unordered
map” does not necessarily mean a hash-based map as implied by
the C++ library (C++11 or TR1). For example, list-based associative
containers, which are very useful for the construction of
"multimaps," are also unordered.
This snippet shows a red-black tree based container:
#include <ext/pb_ds/assoc_container.h> int main() { __gnu_pbds::tree<int, char> c; c[2] = 'b'; assert(c.find(2) != c.end()); };
The container is called tree
instead of
map
since the underlying data structures are
being named with specificity.
The member function naming convention is to strive to be the same as
the equivalent member functions in other C++ standard library
containers. The familiar methods are unchanged:
begin
, end
,
size
, empty
, and
clear
.
This isn't to say that things are exactly as one would expect, given the container requirments and interfaces in the C++ standard.
The names of containers' policies and policy accessors are different then the usual. For example, if hash_type is some type of hash-based container, then
hash_type::hash_fn
gives the type of its hash functor, and if obj
is
some hash-based container object, then
obj.get_hash_fn()
will return a reference to its hash-functor object.
Similarly, if tree_type is some type of tree-based container, then
tree_type::cmp_fn
gives the type of its comparison functor, and if
obj
is some tree-based container object,
then
obj.get_cmp_fn()
will return a reference to its comparison-functor object.
It would be nice to give names consistent with those in the existing
C++ standard (inclusive of TR1). Unfortunately, these standard
containers don't consistently name types and methods. For example,
std::tr1::unordered_map
uses
hasher for the hash functor, but
std::map
uses key_compare for
the comparison functor. Also, we could not find an accessor for
std::tr1::unordered_map
's hash functor, but
std::map
uses compare
for accessing the comparison functor.
Instead, __gnu_pbds
attempts to be internally
consistent, and uses standard-derived terminology if possible.
Another source of difference is in scope:
__gnu_pbds
contains more types of associative
containers than the standard C++ library, and more opportunities
to configure these new containers, since different types of
associative containers are useful in different settings.
Namespace __gnu_pbds
contains different classes for
hash-based containers, tree-based containers, trie-based containers,
and list-based containers.
Since associative containers share parts of their interface, they are organized as a class hierarchy.
Each type or method is defined in the most-common ancestor in which it makes sense.
For example, all associative containers support iteration expressed in the following form:
const_iterator begin() const; iterator begin(); const_iterator end() const; iterator end();
But not all containers contain or use hash functors. Yet, both
collision-chaining and (general) probing hash-based associative
containers have a hash functor, so
basic_hash_table
contains the interface:
const hash_fn& get_hash_fn() const; hash_fn& get_hash_fn();
so all hash-based associative containers inherit the same hash-functor accessor methods.
In general, each of this library's containers is parametrized by more policies than those of the standard library. For example, the standard hash-based container is parametrized as follows:
template<typename Key, typename Mapped, typename Hash, typename Pred, typename Allocator, bool Cache_Hashe_Code> class unordered_map;
and so can be configured by key type, mapped type, a functor that translates keys to unsigned integral types, an equivalence predicate, an allocator, and an indicator whether to store hash values with each entry. this library's collision-chaining hash-based container is parametrized as
template<typename Key, typename Mapped, typename Hash_Fn, typename Eq_Fn, typename Comb_Hash_Fn, typename Resize_Policy, bool Store_Hash typename Allocator> class cc_hash_table;
and so can be configured by the first four types of
std::tr1::unordered_map
, then a
policy for translating the key-hash result into a position
within the table, then a policy by which the table resizes,
an indicator whether to store hash values with each entry,
and an allocator (which is typically the last template
parameter in standard containers).
Nearly all policy parameters have default values, so this need not be considered for casual use. It is important to note, however, that hash-based containers' policies can dramatically alter their performance in different settings, and that tree-based containers' policies can make them useful for other purposes than just look-up.
As opposed to associative containers, priority queues have relatively few configuration options. The priority queue is parametrized as follows:
template<typename Value_Type, typename Cmp_Fn,typename Tag, typename Allocator> class priority_queue;
The Value_Type
, Cmp_Fn
, and
Allocator
parameters are the container's value type,
comparison-functor type, and allocator type, respectively;
these are very similar to the standard's priority queue. The
Tag
parameter is different: there are a number of
pre-defined tag types corresponding to binary heaps, binomial
heaps, etc., and Tag
should be instantiated
by one of them.
Note that as opposed to the
std::priority_queue
,
__gnu_pbds::priority_queue
is not a
sequence-adapter; it is a regular container.
A containers underlying data structure affect their performance; Unfortunately, they can also affect their interface. When manipulating generically associative containers, it is often useful to be able to statically determine what they can support and what the cannot.
Happily, the standard provides a good solution to a similar
problem - that of the different behavior of iterators. If
It
is an iterator, then
typename std::iterator_traits<It>::iterator_category
is one of a small number of pre-defined tag classes, and
typename std::iterator_traits<It>::value_type
is the value type to which the iterator "points".
Similarly, in this library, if C is a
container, then container_traits
is a
trait class that stores information about the kind of
container that is implemented.
typename container_traits<C>::container_category
is one of a small number of predefined tag structures that uniquely identifies the type of underlying data structure.
In most cases, however, the exact underlying data structure is not really important, but what is important is one of its other attributes: whether it guarantees storing elements by key order, for example. For this one can use
typename container_traits<C>::order_preserving
Also,
typename container_traits<C>::invalidation_guarantee
is the container's invalidation guarantee. Invalidation guarantees are especially important regarding priority queues, since in this library's design, iterators are practically the only way to manipulate them.
This library differentiates between two types of methods
and iterators: point-type, and range-type. For example,
find
and insert
are point-type methods, since
they each deal with a specific element; their returned
iterators are point-type iterators. begin
and
end
are range-type methods, since they are not used to
find a specific element, but rather to go over all elements in
a container object; their returned iterators are range-type
iterators.
Most containers store elements in an order that is determined by their interface. Correspondingly, it is fine that their point-type iterators are synonymous with their range-type iterators. For example, in the following snippet
std::for_each(c.find(1), c.find(5), foo);
two point-type iterators (returned by find
) are used
for a range-type purpose - going over all elements whose key is
between 1 and 5.
Conversely, the above snippet makes no sense for self-organizing containers - ones that order (and reorder) their elements by implementation. It would be nice to have a uniform iterator system that would allow the above snippet to compile only if it made sense.
This could trivially be done by specializing
std::for_each
for the case of iterators returned by
std::tr1::unordered_map
, but this would only solve the
problem for one algorithm and one container. Fundamentally, the
problem is that one can loop using a self-organizing
container's point-type iterators.
This library's containers define two families of iterators: point_const_iterator and point_iterator are the iterator types returned by point-type methods; const_iterator and iterator are the iterator types returned by range-type methods.
class <- some container -> { public: ... typedef <- something -> const_iterator; typedef <- something -> iterator; typedef <- something -> point_const_iterator; typedef <- something -> point_iterator; ... public: ... const_iterator begin () const; iterator begin(); point_const_iterator find(...) const; point_iterator find(...); };
For containers whose interface defines sequence order , it is very simple: point-type and range-type iterators are exactly the same, which means that the above snippet will compile if it is used for an order-preserving associative container.
For self-organizing containers, however, (hash-based
containers as a special example), the preceding snippet will
not compile, because their point-type iterators do not support
operator++
.
In any case, both for order-preserving and self-organizing containers, the following snippet will compile:
typename Cntnr::point_iterator it = c.find(2);
because a range-type iterator can always be converted to a point-type iterator.
Distingushing between iterator types also
raises the point that a container's iterators might have
different invalidation rules concerning their de-referencing
abilities and movement abilities. This now corresponds exactly
to the question of whether point-type and range-type iterators
are valid. As explained above, container_traits
allows
querying a container for its data structure attributes. The
iterator-invalidation guarantees are certainly a property of
the underlying data structure, and so
container_traits<C>::invalidation_guarantee
gives one of three pre-determined types that answer this query.
Additional code examples are provided in the source distribution, as part of the regression and performance testsuite.
Basic use of maps:
basic_map.cc
Basic use of sets:
basic_set.cc
Conditionally erasing values from an associative container object:
erase_if.cc
Basic use of multimaps:
basic_multimap.cc
Basic use of multisets:
basic_multiset.cc
Basic use of priority queues:
basic_priority_queue.cc
Splitting and joining priority queues:
priority_queue_split_join.cc
Conditionally erasing values from a priority queue:
priority_queue_erase_if.cc
Using container_traits
to query
about underlying data structure behavior:
assoc_container_traits.cc
A non-compiling example showing wrong use of finding keys in
hash-based containers: hash_find_neg.cc
Using container_traits
to query about underlying data structure behavior:
priority_queue_container_traits.cc
Setting the initial size of a hash-based container
object:
hash_initial_size.cc
A non-compiling example showing how not to resize a
hash-based container object:
hash_resize_neg.cc
Resizing the size of a hash-based container object:
hash_resize.cc
Showing an illegal resize of a hash-based container
object:
hash_illegal_resize.cc
Changing the load factors of a hash-based container
object: hash_load_set_change.cc
Using a modulo range-hashing function for the case of an
unknown skewed key distribution:
hash_mod.cc
Writing a range-hashing functor for the case of a known
skewed key distribution:
shift_mask.cc
Storing the hash value along with each key:
store_hash.cc
Writing a ranged-hash functor:
ranged_hash.cc
Joining two tree-based container objects:
tree_join.cc
Splitting a PATRICIA trie container object:
trie_split.cc
Order statistics while joining two tree-based container
objects:
tree_order_statistics_join.cc
Using trees for order statistics:
tree_order_statistics.cc
Augmenting trees to support operations on line
intervals:
tree_intervals.cc