Container classes are the
solution to a specific kind of code reuse problem. They are building blocks used
to create object-oriented programs – they make the internals of a program
much easier to construct.
A container class describes an object
that holds other objects. Container classes are so important that they were
considered fundamental to early object-oriented languages. In Smalltalk, for
example, programmers think of the language as the program translator together
with the class library, and a critical part of that library is the container
classes. So it became natural that C++ compiler vendors also include a container
class library. You’ll note that the vector was so useful that it
was introduced in its simplest form very early in this book.
Like many other early C++ libraries,
early container class libraries followed Smalltalk’s object-based
hierarchy, which worked well for Smalltalk, but turned out to be awkward and
difficult to use in C++. Another approach was required.
This chapter attempts to slowly work you
into the concepts of the C++ Standard Template Library (STL), which is a
powerful library of containers (as well as algorithms, but these are
covered in the following chapter). In the past, I have taught that there is a
relatively small subset of elements and ideas that you need to understand in
order to get much of the usefulness from the STL. Although this can be true it
turns out that understanding the STL more deeply is important to gain the full
power of the library. This chapter and the next probe into the STL containers
and
algorithms.
If you don’t know how many objects
you’re going to need to solve a particular problem, or how long they will
last, you also don’t know how to store those objects. How can you know how
much space to create? You can’t, since that information isn’t known
until run time.
The solution to most problems in
object-oriented design seems flippant: you create another type of object. For
the storage problem, the new type of object holds other objects, or pointers to
objects. Of course, you can do the same thing with an array, but there’s
more. This new type of object, which is typically referred to in C++ as a
container (also called a collection in some languages), will
expand itself whenever necessary to accommodate everything you place inside it.
So you don’t need to know how many objects you’re going to hold in a
collection. You just create a collection object and let it take care of the
details.
Fortunately, a good OOP language comes
with a set of containers as part of the package. In C++, it’s the Standard
Template Library (STL). In some libraries, a generic container is considered
good enough for all needs, and in others (C++ in particular) the library has
different types of containers for different needs: a vector for consistent
access to all elements, and a linked list for consistent insertion at all
elements, for example, so you can choose the particular type that fits your
needs. These may include sets, queues, hash tables, trees, stacks,
etc.
All containers have some way to put
things in and get things out. The way that you place something into a container
is fairly obvious. There’s a function called “push” or
“add” or a similar name. Fetching things out of a container is not
always as apparent; if it’s an array-like entity such as a vector, you
might be able to use an indexing operator or function. But in many situations
this doesn’t make sense. Also, a single-selection function is restrictive.
What if you want to manipulate or compare a group of elements in the
container?
The solution is an iterator, which
is an object whose job is to select the elements within a container and present
them to the user of the iterator. As a class, it also provides a level of
abstraction. This abstraction can be used to separate the details of the
container from the code that’s accessing that container. The container,
via the iterator, is abstracted to be simply a sequence. The iterator allows you
to traverse that sequence without worrying about the underlying structure
– that is, whether it’s a vector, a linked list, a stack or
something else. This gives you the flexibility to easily change the underlying
data structure without disturbing the code in your program.
From the design standpoint, all you
really want is a sequence that can be manipulated to solve your problem. If a
single type of sequence satisfied all of your needs, there’d be no reason
to have different kinds. There are two reasons that you need a choice of
containers. First, containers provide different types of interfaces and external
behavior. A stack has a different interface and behavior than that of a queue,
which is different than that of a set or a list. One of these might provide a
more flexible solution to your problem than the other. Second, different
containers have different efficiencies for certain operations. The best example
is a vector and a list. Both are simple sequences that can have identical
interfaces and external behaviors. But certain operations can have radically
different costs. Randomly accessing elements in a vector is a constant-time
operation; it takes the same amount of time regardless of the element you
select. However, in a linked list it is expensive to move through the list to
randomly select an element, and it takes longer to find an element if it is
further down the list. On the other hand, if you want to insert an element in
the middle of a sequence, it’s much cheaper in a list than in a vector.
These and other operations have different efficiencies depending upon the
underlying structure of the sequence. In the design phase, you might start with
a list and, when tuning for performance, change to a vector. Because of the
abstraction via iterators, you can change from one to the other with minimal
impact on your code.
In the end, remember that a container is
only a storage cabinet to put objects in. If that cabinet solves all of your
needs, it doesn’t really matter how it is implemented (a basic
concept with most types of objects). If you’re working in a programming
environment that has built-in overhead due to other factors, then the cost
difference between a vector and a linked list might not matter. You might need
only one type of sequence. You can even imagine the “perfect”
container abstraction, which can automatically change its underlying
implementation according to the way it is
used.
You will notice that this chapter does
not contain exhaustive documentation describing each of the member functions in
each STL container. Although I describe the member functions that I use,
I’ve left the full descriptions to others: there are at least two very
good on-line sources of STL documentation in HTML format that you can keep
resident on your computer and view with a Web browser whenever you need to look
something up. The first is the Dinkumware library (which covers the entire
Standard C and C++ library) mentioned at the beginning of this book section
(page XXX). The second is the freely-downloadable SGI STL and documentation,
freely downloadable at http://www.sgi.com/Technology/STL/. These should provide
complete references when you’re writing code. In addition, the STL books
listed in Appendix XX will provide you with other
resources.
The C++
STL[19]
is a powerful library intended to satisfy the vast bulk of your needs for
containers and algorithms, but in a completely portable fashion. This means that
not only are your programs easier to port to other platforms, but that your
knowledge itself does not depend on the libraries provided by a particular
compiler vendor (and the STL is likely to be more tested and scrutinized than a
particular vendor’s library). Thus, it will benefit you greatly to look
first to the STL for containers and algorithms, before looking at
vendor-specific solutions.
A fundamental principle of software
design is that all problems can be simplified by introducing an extra level
of indirection. This simplicity is achieved in the STL using
iterators to perform operations on a data structure while knowing as
little as possible about that structure, thus producing data structure
independence. With the STL, this means that any operation that can be performed
on an array of objects can also be performed on an STL container of objects and
vice versa. The STL containers work just as easily with built-in types as they
do with user-defined types. If you learn the library, it will work on
everything.
The drawback to this independence is that
you’ll have to take a little time at first getting used to the way things
are done in the STL. However, the STL uses a consistent pattern, so once you fit
your mind around it, it doesn’t change from one STL tool to
another.
Consider an example using the STL
set class. A set will allow
only one of each object value to be inserted into itself. Here is a simple
set created to work with ints by providing int as the
template argument to set:
//: C07:Intset.cpp // Simple use of STL set //{L} ../TestSuite/Test #include <set> #include <iostream> using namespace std; int main() { set<int> intset; for(int i = 0; i < 25; i++) for(int j = 0; j < 10; j++) // Try to insert multiple copies: intset.insert(j); // Print to output: copy(intset.begin(), intset.end(), ostream_iterator<int>(cout, "\n")); } ///:~
The insert( ) member does all
the work: it tries putting the new element in and rejects it if it’s
already there. Very often the activities involved in using a set are simply
insertion and a test to see whether it contains the element. You can also form a
union, intersection, or difference of sets, and test to see if one set is a
subset of another.
In this example, the values 0 - 9 are
inserted into the set 25 times, and the results are printed out to show that
only one of each of the values is actually retained in the set.
The copy( ) function is
actually the instantiation of an STL template function, of which there are many.
These template functions are generally referred to as “the STL
Algorithms” and will be the subject of the following chapter. However,
several of the algorithms are so useful that they will be introduced in this
chapter. Here, copy( ) shows the use of iterators. The set
member functions begin( ) and end( ) produce iterators
as their return values. These are used by copy( ) as beginning and
ending points for its operation, which is simply to move between the boundaries
established by the iterators and copy the elements to the third argument, which
is also an iterator, but in this case, a special type created for iostreams.
This places int objects on cout and separates them with a
newline.
Because of its genericity,
copy( ) is certainly not restricted to printing on a stream. It can
be used in virtually any situation: it needs only three iterators to talk to.
All of the algorithms follow the form of copy( ) and simply
manipulate iterators (the use of iterators is the “extra level of
indirection”).
Now consider taking the form of
Intset.cpp and reshaping it to display a list of the words used in a
document. The solution becomes remarkably simple.
//: C07:WordSet.cpp //{L} ../TestSuite/Test #include "../require.h" #include <string> #include <fstream> #include <iostream> #include <set> using namespace std; void wordSet(char* fileName) { ifstream source(fileName); assure(source, fileName); string word; set<string> words; while(source >> word) words.insert(word); copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n")); cout << "Number of unique words:" << words.size() << endl; } int main(int argc, char* argv[]) { if(argc > 1) wordSet(argv[1]); else wordSet("WordSet.cpp"); } ///:~
The only substantive difference here is
that string is used instead of int. The words are pulled from a
file, but everything else is the same as in Intset.cpp. The
operator>> returns a whitespace-separated group of characters each
time it is called, until there’s no more input from the file. So it
approximately breaks an input stream up into words. Each string is placed
in the set using insert( ), and the copy( )
function is used to display the results. Because of the way set is
implemented (as a tree), the words are automatically sorted.
Consider how much effort it would be to
accomplish the same task in C, or even in C++ without the
STL.
The primary idea in the STL is the
container (also known as a collection), which is just what it
sounds like: a place to hold things. You need containers because objects are
constantly marching in and out of your program and there must be someplace to
put them while they’re around. You can’t make named local objects
because in a typical program you don’t know how many, or what type, or the
lifetime of the objects you’re working with. So you need a container that
will expand whenever necessary to fill your needs.
All the containers in the STL hold
objects and expand themselves. In addition, they hold your objects in a
particular way. The difference between one container and another is the way the
objects are held and how the sequence is created. Let’s start by looking
at the simplest containers.
A vector is a linear sequence that
allows rapid random access to its elements. However, it’s expensive to
insert an element in the middle of the sequence, and is also expensive when it
allocates additional storage. A deque is also a linear sequence, and it
allows random access that’s nearly as fast as vector, but
it’s significantly faster when it needs to allocate new storage, and you
can easily add new elements at either end (vector only allows the
addition of elements at its tail). A list the third type of basic linear
sequence, but it’s expensive to move around randomly and cheap to insert
an element in the middle. Thus list, deque and vector are
very similar in their basic functionality (they all hold linear sequences), but
different in the cost of their activities. So for your first shot at a program,
you could choose any one, and only experiment with the others if you’re
tuning for efficiency.
Many of the problems you set out to solve
will only require a simple linear sequence like a vector, deque or
list. All three have a member function push_back( ) which you
use to insert a new element at the back of the sequence (deque and
list also have push_front( )).
But now how do you retrieve those
elements? With a vector or deque, it is possible to use the
indexing operator[ ], but that doesn’t work with list. Since
it would be nicest to learn a single interface, we’ll often use the one
defined for all STL containers: the iterator.
An iterator is a class that abstracts the
process of moving through a sequence. It allows you to select each element of a
sequence without knowing the underlying structure of that sequence. This
is a powerful feature, partly because it allows us to learn a single interface
that works with all containers, and partly because it allows containers to be
used interchangeably.
One more observation and you’re
ready for another example. Even though the STL containers hold objects by value
(that is, they hold the whole object inside themselves) that’s probably
not the way you’ll generally use them if you’re doing
object-oriented programming. That’s because in OOP, most of the time
you’ll create objects on the heap with new and then upcast
the address to the base-class type, later manipulating it as a pointer to the
base class. The beauty of this is that you don’t worry about the specific
type of object you’re dealing with, which greatly reduces the complexity
of your code and increases the maintainability of your program. This process of
upcasting is what you try to do in OOP with polymorphism, so you’ll
usually be using containers of pointers.
Consider the classic “shape”
example where shapes have a set of common operations, and you have different
types of shapes. Here’s what it looks like using the STL vector to
hold pointers to various types of Shape created on the
heap:
//: C07:Stlshape.cpp // Simple shapes w/ STL //{L} ../TestSuite/Test #include <vector> #include <iostream> using namespace std; class Shape { public: virtual void draw() = 0; virtual ~Shape() {}; }; class Circle : public Shape { public: void draw() { cout << "Circle::draw\n"; } ~Circle() { cout << "~Circle\n"; } }; class Triangle : public Shape { public: void draw() { cout << "Triangle::draw\n"; } ~Triangle() { cout << "~Triangle\n"; } }; class Square : public Shape { public: void draw() { cout << "Square::draw\n"; } ~Square() { cout << "~Square\n"; } }; typedef std::vector<Shape*> Container; typedef Container::iterator Iter; int main() { Container shapes; shapes.push_back(new Circle); shapes.push_back(new Square); shapes.push_back(new Triangle); for(Iter i = shapes.begin(); i != shapes.end(); i++) (*i)->draw(); // ... Sometime later: for(Iter j = shapes.begin(); j != shapes.end(); j++) delete *j; } ///:~
The creation of Shape,
Circle, Square and Triangle should be fairly familiar.
Shape is a pure abstract base class (because of the pure specifier
=0) that defines the interface for all types of shapes. The
derived classes redefine the virtual function draw( ) to
perform the appropriate operation. Now we’d like to create a bunch of
different types of Shape object, but where to put them? In an STL
container, of course. For convenience, this typedef:
typedef std::vector<Shape*> Container;
creates
an alias for a vector of Shape*, and this
typedef:
typedef Container::iterator Iter;
uses
that alias to create another one, for vector<Shape*>::iterator.
Notice that the container type name must be used to produce the
appropriate iterator, which is defined as a nested class. Although there are
different types of iterators (forward, bidirectional, reverse, etc., which will
be explained later) they all have the same basic interface: you can increment
them with ++, you can dereference them to produce the object
they’re currently selecting, and you can test them to see if they’re
at the end of the sequence. That’s what you’ll want to do 90% of the
time. And that’s what is done in the above example: after creating a
container, it’s filled with different types of Shape*. Notice that
the upcast happens as the Circle, Square or Rectangle
pointer is added to the shapes container, which doesn’t know about
those specific types but instead holds only Shape*. So as soon as the
pointer is added to the container it loses its specific identity and becomes an
anonymous Shape*. This is exactly what we want: toss them all in and let
polymorphism sort it out.
The first for loop creates an
iterator and sets it to the beginning of the sequence by calling the
begin( ) member function for the container. All containers have
begin( ) and end( ) member functions that produce an
iterator selecting, respectively, the beginning of the sequence and one past the
end of the sequence. To test to see if you’re done, you make sure
you’re != to the iterator produced by end( ). Not
< or <=. The only test that works is !=. So
it’s very common to write a loop like:
for(Iter i = shapes.begin(); i != shapes.end(); i++)
This says: “take me through every
element in the sequence.”
What do you do with the iterator to
produce the element it’s selecting? You dereference it using (what else)
the ‘*’ (which is actually an overloaded operator). What you
get back is whatever the container is holding. This container holds
Shape*, so that’s what *i produces. If you want to send a
message to the Shape, you must select that message with ->, so
you write the line:
(*i)->draw();
This calls the draw( )
function for the Shape* the iterator is currently selecting. The
parentheses are ugly but necessary to produce the proper order of evaluation. As
an alternative, operator-> is defined so that you can
say:
i->draw();
As they are destroyed or in other cases
where the pointers are removed, the STL containers do not call
delete for the pointers they contain. If you create an object on the heap
with new and place its pointer in a container, the container can’t
tell if that pointer is also placed inside another container. So the STL just
doesn’t do anything about it, and puts the responsibility squarely in your
lap. The last lines in the program move through and delete every object in the
container so proper cleanup occurs.
It’s very interesting to note that
you can change the type of container that this program uses with two lines.
Instead of including <vector>, you include <list>, and
in the first typedef you say:
typedef std::list<Shape*> Container;
instead of using a vector.
Everything else goes untouched. This is possible not because of an interface
enforced by inheritance (there isn’t any inheritance in the STL, which
comes as a surprise when you first see it), but because the interface is
enforced by a convention adopted by the designers of the STL, precisely so you
could perform this kind of interchange. Now you can easily switch between
vector and list and see which one works fastest for your
needs.
In the prior example, at the end of
main( ), it was necessary to move through the whole list and
delete all the Shape pointers.
for(Iter j = shapes.begin(); j != shapes.end(); j++) delete *j;
This highlights what could be seen as a
flaw in the STL: there’s no facility in any of the STL containers to
automatically delete the pointers they contain, so you must do it by
hand. It’s as if the assumption of the STL designers was that containers
of pointers weren’t an interesting problem, although I assert that it is
one of the more common things you’ll want to do.
Automatically deleting a pointer turns
out to be a rather aggressive thing to do because of the multiple
membership problem. If a container holds a pointer to an object, it’s
not unlikely that pointer could also be in another container. A pointer to an
Aluminum object in a list of Trash pointers could also reside in a
list of Aluminum pointers. If that happens, which list is responsible for
cleaning up that object – that is, which list “owns” the
object?
This question is virtually eliminated if
the object rather than a pointer resides in the list. Then it seems clear that
when the list is destroyed, the objects it contains must also be destroyed.
Here, the STL shines, as you can see when creating a container of string
objects. The following example stores each incoming line as a string in a
vector<string>:
//: C07:StringVector.cpp // A vector of strings //{L} ../TestSuite/Test #include "../require.h" #include <string> #include <vector> #include <fstream> #include <iostream> #include <iterator> #include <sstream> using namespace std; int main(int argc, char* argv[]) { char* fname = "StringVector.cpp"; if(argc > 1) fname = argv[1]; ifstream in(fname); assure(in, fname); vector<string> strings; string line; while(getline(in, line)) strings.push_back(line); // Do something to the strings... int i = 1; vector<string>::iterator w; for(w = strings.begin(); w != strings.end(); w++) { ostringstream ss; ss << i++; *w = ss.str() + ": " + *w; } // Now send them out: copy(strings.begin(), strings.end(), ostream_iterator<string>(cout, "\n")); // Since they aren't pointers, string // objects clean themselves up! } ///:~
Once the vector<string>
called strings is created, each line in the file is read into a
string and put in the vector:
while(getline(in, line)) strings.push_back(line);
The operation that’s being
performed on this file is to add line numbers. A stringstream provides
easy conversion from an int to a string of characters representing
that int.
Assembling string objects is quite
easy, since operator+ is overloaded. Sensibly enough, the iterator
w can be dereferenced to produce a string that can be used as both an
rvalue and an lvalue:
*w = ss.str() + ": " + *w;
The fact that you can assign back into
the container via the iterator may seem a bit surprising at first, but
it’s a tribute to the careful design of the STL.
Because the vector<string>
contains the objects themselves, a number of interesting things take place.
First, no cleanup is necessary. Even if you were to put addresses of the
string objects as pointers into other containers, it’s clear
that strings is the “master list” and maintains ownership of
the objects.
Second, you are effectively using dynamic
object creation, and yet you never use new or delete! That’s
because, somehow, it’s all taken care of for you by the vector
(this is non-trivial. You can try to figure it out by looking at the header
files for the STL – all the code is there – but it’s quite an
exercise). Thus your coding is significantly cleaned up.
The limitation of holding objects instead
of pointers inside containers is quite severe: you can’t upcast from
derived types, thus you can’t use polymorphism. The problem with upcasting
objects by value is that they get sliced and converted until their type is
completely changed into the base type, and there’s no remnant of the
derived type left. It’s pretty safe to say that you never want to
do this.
The power of instantly creating a
sequence of elements is amazing, and it makes you realize how much time
you’ve spent (or rather, wasted) in the past solving this particular
problem. For example, many utility programs involve reading a file into memory,
modifying the file and writing it back out to disk. One might as well take the
functionality in StringVector.cpp and package it into a class for later
reuse.
Now the question is: do you create a
member object of type vector, or do you inherit? A general guideline is
to always prefer composition (member objects) over inheritance, but with the STL
this is often not true, because there are so many existing algorithms that work
with the STL types that you may want your new type to be an STL type. So
the list of strings should also be a vector, thus
inheritance is desired.
//: C07:FileEditor.h // File editor tool #ifndef FILEEDITOR_H #define FILEEDITOR_H #include <string> #include <vector> #include <iostream> class FileEditor : public std::vector<std::string> { public: void open(char* filename); FileEditor(char* filename) { open(filename); } FileEditor() {}; void write(std::ostream& out = std::cout); }; #endif // FILEEDITOR_H ///:~
Note the careful avoidance of a global
using namespace std statement here, to prevent the opening of the
std namespace to every file that includes this header.
The constructor opens the file and reads
it into the FileEditor, and write( ) puts the vector
of string onto any ostream. Notice in write( ) that
you can have a default argument for a reference.
The implementation is quite
simple:
//: C07:FileEditor.cpp {O} #include "FileEditor.h" #include "../require.h" #include <fstream> using namespace std; void FileEditor::open(char* filename) { ifstream in(filename); assure(in, filename); string line; while(getline(in, line)) push_back(line); } // Could also use copy() here: void FileEditor::write(ostream& out) { for(iterator w = begin(); w != end(); w++) out << *w << endl; } ///:~
The functions from
StringVector.cpp are simply repackaged. Often this is the way classes
evolve – you start by creating a program to solve a particular
application, then discover some commonly-used functionality within the program
that can be turned into a class.
The line numbering program can now be
rewritten using FileEditor:
//: C07:FEditTest.cpp //{L} FileEditor ../TestSuite/Test // Test the FileEditor tool #include "FileEditor.h" #include "../require.h" #include <sstream> using namespace std; int main(int argc, char* argv[]) { FileEditor file; if(argc > 1) { file.open(argv[1]); } else { file.open("FEditTest.cpp"); } // Do something to the lines... int i = 1; FileEditor::iterator w = file.begin(); while(w != file.end()) { ostringstream ss; ss << i++; *w = ss.str() + ": " + *w; w++; } // Now send them to cout: file.write(); } ///:~
Now the operation of reading
the file is in the constructor:
FileEditor file(argv[1]);
(or in the open( ) method) and
writing happens in the single line (which defaults to sending the output to
cout):
file.write();
As mentioned earlier, the iterator is the
abstraction that allows a piece of code to be generic, and to work with
different types of containers without knowing the underlying structure of those
containers. Every container produces iterators. You must always be able
to say:
ContainerType::iterator ContainerType::const_iterator
to produce the types of the iterators
produced by that container. Every container has a begin( ) method
that produces an iterator indicating the beginning of the elements in the
container, and an end( ) method that produces an iterator which is
the as the past-the-end value of the container. If the container is
const¸ begin( ) and end( ) produce
const iterators.
Every iterator can be moved forward to
the next element using the operator++ (an iterator may be able to do more
than this, as you shall see, but it must at least support forward movement with
operator++).
The basic iterator is only guaranteed to
be able to perform == and != comparisons. Thus, to move an
iterator it forward without running it off the end you say something
like:
while(it != pastEnd) { // Do something it++; }
Where pastEnd is the past-the-end
value produced by the container’s end( ) member
function.
An iterator can be used to produce the
element that it is currently selecting within a container by dereferencing the
iterator. This can take two forms. If it is an iterator and f( )
is a member function of the objects held in the container that the iterator
is pointing within, then you can say either:
(*it).f();
or
it->f();
Knowing this, you can create a template
that works with any container. Here, the apply( ) function template
calls a member function for every object in the container, using a pointer to
member that is passed as an argument:
//: C07:Apply.cpp // Using basic iterators //{L} ../TestSuite/Test //{-g++3} #include <iostream> #include <vector> #include <iterator> using namespace std; template<class Cont, class PtrMemFun> void apply(Cont& c, PtrMemFun f) { typename Cont::iterator it = c.begin(); while(it != c.end()) { (it->*f)(); // Compact form ((*it).*f)(); // Alternate form it++; } } class Z { int i; public: Z(int ii) : i(ii) {} void g() { i++; } friend ostream& operator<<(ostream& os, const Z& z) { return os << z.i; } }; int main() { ostream_iterator<Z> out(cout, " "); vector<Z> vz; for(int i = 0; i < 10; i++) vz.push_back(Z(i)); copy(vz.begin(), vz.end(), out); cout << endl; apply(vz, &Z::g); copy(vz.begin(), vz.end(), out); } ///:~
Because operator-> is defined
for STL iterators, it can be used for pointer-to-member dereferencing (in the
following chapter you’ll learn a more elegant way to handle the problem of
applying a member function or ordinary function to every object in a
container).
Much of the time, this is all you need to
know about iterators – that they are produced by begin( ) and
end( ), and that you can use them to move through a container and
select elements. Many of the problems that you solve, and the STL algorithms
(covered in the next chapter) will allow you to just flail away with the basics
of iterators. However, things can at times become more subtle, and in those
cases you need to know more about iterators. The rest of this section gives you
the details.
All containers must produce the basic
iterator. A container may also be reversible, which means that it
can produce iterators that move backwards from the end, as well as the iterators
that move forward from the beginning.
A reversible container has the methods
rbegin( ) (to produce a reverse_iterator selecting the end)
and rend( ) (to produce a reverse_iterator indicating
“one past the beginning”). If the container is const then
rbegin( ) and rend( ) will produce
const_reverse_iterators.
All the basic sequence containers
vector, deque and list are reversible containers. The
following example uses vector, but will work with deque and
list as well:
//: C07:Reversible.cpp // Using reversible containers //{L} ../TestSuite/Test #include "../require.h" #include <vector> #include <iostream> #include <fstream> #include <string> using namespace std; int main() { ifstream in("Reversible.cpp"); assure(in, "Reversible.cpp"); string line; vector<string> lines; while(getline(in, line)) lines.push_back(line); vector<string>::reverse_iterator r; for(r = lines.rbegin(); r != lines.rend(); r++) cout << *r << endl; } ///:~
You move backward through the container
using the same syntax as moving forward through a container with an ordinary
iterator.
The associative containers set,
multiset, map and multimap are also reversible. Using
iterators with associative containers is a bit different, however, and will be
delayed until those containers are more fully
introduced.
The iterators are classified into
different “categories” which describe what they are capable of
doing. The order in which they are generally described moves from the categories
with the most restricted behavior to those with the most powerful
behavior.
The only predefined implementations of
input iterators are istream_iterator and istreambuf_iterator, to
read from an istream. As you can imagine, an input iterator can only be
dereferenced once for each element that’s selected, just as you can only
read a particular portion of an input stream once. They can only move forward.
There is a special constructor to define the past-the-end value. In summary, you
can dereference it for reading (once only for each value), and move it
forward.
This is the complement of an input
iterator, but for writing rather than reading. The only predefined
implementations of output iterators are ostream_iterator and
ostreambuf_iterator, to write to an ostream, and the
less-commonly-used raw_storage_iterator. Again, these can only be
dereferenced once for each written value, and they can only move forward. There
is no concept of a terminal past-the-end value for an output iterator.
Summarizing, you can dereference it for writing (once only for each value) and
move it forward.
The forward iterator contains all the
functionality of both the input iterator and the output iterator, plus you can
dereference an iterator location multiple times, so you can read and write to a
value multiple times. As the name implies, you can only move forward. There are
no predefined iterators that are only forward iterators.
The bidirectional iterator has all the
functionality of the forward iterator, and in addition it can be moved backwards
one location at a time using operator--.
Finally, the random-access iterator has
all the functionality of the bidirectional iterator plus all the functionality
of a pointer (a pointer is a random-access iterator). Basically, anything
you can do with a pointer you can do with a random-access iterator, including
indexing with operator[ ], adding integral values to a pointer to move it
forward or backward by a number of locations, and comparing one iterator to
another with <, >=, etc.
Why do you care about this
categorization? When you’re just using containers in a straightforward way
(for example, just hand-coding all the operations you want to perform on the
objects in the container) it usually doesn’t impact you too much. Things
either work or they don’t. The iterator categories become important
when:
The STL has a predefined set of iterator
classes that can be quite handy. For example, you’ve already seen
reverse_iterator (produced by calling rbegin( ) and
rend( ) for all the basic containers).
The insertion iterators are
necessary because some of the STL algorithms – copy( ) for
example – use the assignment operator= in order to place objects in
the destination container. This is a problem when you’re using the
algorithm to fill the container rather than to overwrite items that are
already in the destination container. That is, when the space isn’t
already there. What the insert iterators do is change the implementation of the
operator= so that instead of doing an assignment, it calls a
“push” or “insert” function for that container, thus
causing it to allocate new space. The constructors for both
back_insert_iterator and front_insert_iterator take a basic
sequence container object (vector, deque or list) as their
argument and produce an iterator that calls push_back( ) or
push_front( ), respectively, to perform assignment. The shorthand
functions back_inserter( ) and front_inserter( ) produce
the same objects with a little less typing. Since all the basic sequence
containers support push_back( ), you will probably find yourself
using back_inserter( ) with some regularity.
The insert_iterator allows you to
insert elements in the middle of the sequence, again replacing the meaning of
operator=, but this time with insert( ) instead of one of the
“push” functions. The insert( ) member function requires
an iterator indicating the place to insert before, so the insert_iterator
requires this iterator in addition to the container object. The shorthand
function inserter( ) produces the same object.
The following example shows the use of
the different types of inserters:
//: C07:Inserters.cpp // Different types of iterator inserters //{L} ../TestSuite/Test #include <iostream> #include <vector> #include <deque> #include <list> #include <iterator> using namespace std; int a[] = { 1, 3, 5, 7, 11, 13, 17, 19, 23 }; template<class Cont> void frontInsertion(Cont& ci) { copy(a, a + sizeof(a)/sizeof(int), front_inserter(ci)); copy(ci.begin(), ci.end(), ostream_iterator<int>(cout, " ")); cout << endl; } template<class Cont> void backInsertion(Cont& ci) { copy(a, a + sizeof(a)/sizeof(int), back_inserter(ci)); copy(ci.begin(), ci.end(), ostream_iterator<int>(cout, " ")); cout << endl; } template<class Cont> void midInsertion(Cont& ci) { typename Cont::iterator it = ci.begin(); it++; it++; it++; copy(a, a + sizeof(a)/(sizeof(int) * 2), inserter(ci, it)); copy(ci.begin(), ci.end(), ostream_iterator<int>(cout, " ")); cout << endl; } int main() { deque<int> di; list<int> li; vector<int> vi; // Can't use a front_inserter() with vector frontInsertion(di); frontInsertion(li); di.clear(); li.clear(); backInsertion(vi); backInsertion(di); backInsertion(li); midInsertion(vi); midInsertion(di); midInsertion(li); } ///:~
Since vector does not support
push_front( ), it cannot produce a front_insertion_iterator.
However, you can see that vector does support the other two types of
insertion (even though, as you shall see later, insert( ) is not a
very efficient operation for vector).
You’ve already seen some use of the
ostream_iterator (an output iterator) in conjunction with
copy( ) to place the contents of a container on an output stream.
There is a corresponding istream_iterator (an input iterator) which
allows you to “iterate” a set of objects of a specified type from an
input stream. An important difference between ostream_iterator and
istream_iterator comes from the fact that an output stream doesn’t
have any concept of an “end,” since you can always just keep writing
more elements. However, an input stream eventually terminates (for example, when
you reach the end of a file) so there needs to be a way to represent that. An
istream_iterator has two constructors, one that takes an istream
and produces the iterator you actually read from, and the other which is the
default constructor and produces an object which is the past-the-end sentinel.
In the following program this object is named end:
//: C07:StreamIt.cpp // Iterators for istreams and ostreams //{L} ../TestSuite/Test //{-msc} #include "../require.h" #include <iostream> #include <fstream> #include <vector> #include <string> using namespace std; int main() { ifstream in("StreamIt.cpp"); assure(in, "StreamIt.cpp"); istream_iterator<string> init(in), end; ostream_iterator<string> out(cout, "\n"); vector<string> vs; copy(init, end, back_inserter(vs)); copy(vs.begin(), vs.end(), out); *out++ = vs[0]; *out++ = "That's all, folks!"; } ///:~
When in runs out of input (in this
case when the end of the file is reached) then init becomes equivalent to
end and the copy( ) terminates.
Because out is an
ostream_iterator<string>, you can simply assign any string
object to the dereferenced iterator using operator= and that
string will be placed on the output stream, as seen in the two
assignments to out. Because out is defined with a newline as its
second argument, these assignments also cause a newline to be inserted along
with each assignment.
While it is possible to create an
istream_iterator<char> and ostream_iterator<char>,
these actually parse the input and thus will for example automatically
eat whitespace (spaces, tabs and newlines), which is not desirable if you want
to manipulate an exact representation of an istream. Instead, you can use
the special iterators istreambuf_iterator and ostreambuf_iterator,
which are designed strictly to move
characters[20].
Although these are templates, the only template arguments they will accept are
either char or wchar_t (for wide characters). The following
example allows you to compare the behavior of the stream iterators vs. the
streambuf iterators:
//: C07:StreambufIterator.cpp // istreambuf_iterator & ostreambuf_iterator //{L} ../TestSuite/Test //{-g++295} #include "../require.h" #include <iostream> #include <fstream> #include <iterator> #include <algorithm> using namespace std; int main() { ifstream in("StreambufIterator.cpp"); assure(in, "StreambufIterator.cpp"); // Exact representation of stream: istreambuf_iterator<char> isb(in), end; ostreambuf_iterator<char> osb(cout); while(isb != end) *osb++ = *isb++; // Copy 'in' to cout cout << endl; ifstream in2("StreambufIterator.cpp"); // Strips white space: istream_iterator<char> is(in2), end2; ostream_iterator<char> os(cout); while(is != end2) *os++ = *is++; cout << endl; } ///:~
The stream iterators use the parsing
defined by istream::operator>>, which is probably
not
what you want if you are parsing characters
directly – it’s fairly rare that you would want all the whitespace
stripped out of your character stream. You’ll virtually always want to use
a streambuf iterator when using characters and streams, rather than a stream
iterator. In addition, istream::operator>> adds significant
overhead for each operation, so it is only appropriate for higher-level
operations such as parsing floating-point
numbers.[21]
This is a little more esoteric and is
generally used in the implementation of other Standard Library functions, but it
is nonetheless interesting. The raw_storage_iterator is defined in
<algorithm> and is an output iterator. It is provided to enable
algorithms to store their results into uninitialized memory. The interface is
quite simple: the constructor takes an output iterator that is pointing to the
raw memory (thus it is typically a pointer) and the operator= assigns an
object into that raw memory. The template parameters are the type of the output
iterator pointing to the raw storage, and the type of object that will be
stored. Here’s an example which creates Noisy objects (you’ll
be introduced to the Noisy class shortly; it’s not necessary to
know its details for this example):
//: C07:RawStorageIterator.cpp // Demonstrate the raw_storage_iterator //{L} ../TestSuite/Test //{-g++295} #include "Noisy.h" #include <iostream> #include <iterator> #include <algorithm> using namespace std; int main() { const int quantity = 10; // Create raw storage and cast to desired type: Noisy* np = (Noisy*)new char[quantity * sizeof(Noisy)]; raw_storage_iterator<Noisy*, Noisy> rsi(np); for(int i = 0; i < quantity; i++) *rsi++ = Noisy(); // Place objects in storage cout << endl; copy(np, np + quantity, ostream_iterator<Noisy>(cout, " ")); cout << endl; // Explicit destructor call for cleanup: for(int j = 0; j < quantity; j++) (&np[j])->~Noisy(); // Release raw storage: delete (char*)np; } ///:~
To make the raw_storage_iterator
template happy, the raw storage must be of the same type as the objects
you’re creating. That’s why the pointer from the new array of
char is cast to a Noisy*. The assignment operator forces the
objects into the raw storage using the copy-constructor. Note that the explicit
destructor call must be made for proper cleanup, and this also allows the
objects to be deleted one at a time during container
manipulation.
If you take a step back from the STL
containers you’ll see that there are really only two types of container:
sequences (including vector, list, deque,
stack, queue, and priority_queue) and
associations (including set, multiset, map and
multimap). The sequences keep the objects in whatever sequence that you
establish (either by pushing the objects on the end or inserting them in the
middle).
Since all the sequence containers have
the same basic goal (to maintain your order) they seem relatively
interchangeable. However, they differ in the efficiency of their operations, so
if you are going to manipulate a sequence in a particular fashion you can choose
the appropriate container for those types of manipulations. The
“basic” sequence containers are vector, list and
deque – these actually have fleshed-out implementations, while
stack, queue and priority_queue are built on top of the
basic sequences, and represent more specialized uses rather than differences in
underlying structure (stack, for example, can be implemented using a
deque, vector or list).
So far in this book I have been using
vector as a catch-all container. This was acceptable because I’ve
only used the simplest and safest operations, primarily push_back( )
and operator[ ]. However, when you start making more sophisticated uses
of containers it becomes important to know more about their underlying
implementations and behavior, so you can make the right choices (and, as
you’ll see, stay out of
trouble).
Using a template, the following example
shows the operations that all the basic sequences (vector, deque
or list) support. As you shall learn in the sections on the specific
sequence containers, not all of these operations make sense for each basic
sequence, but they are supported.
//: C07:BasicSequenceOperations.cpp // The operations available for all the // basic sequence Containers. //{L} ../TestSuite/Test //{-msc} #include <iostream> #include <vector> #include <deque> #include <list> using namespace std; template<typename Container> void print(Container& c, char* s = "") { cout << s << ":" << endl; if(c.empty()) { cout << "(empty)" << endl; return; } typename Container::iterator it; for(it = c.begin(); it != c.end(); it++) cout << *it << " "; cout << endl; cout << "size() " << c.size() << " max_size() "<< c.max_size() << " front() " << c.front() << " back() " << c.back() << endl; } template<typename ContainerOfInt> void basicOps(char* s) { cout << "------- " << s << " -------" << endl; typedef ContainerOfInt Ci; Ci c; print(c, "c after default constructor"); Ci c2(10, 1); // 10 elements, values all 1 print(c2, "c2 after constructor(10,1)"); int ia[] = { 1, 3, 5, 7, 9 }; const int iasz = sizeof(ia)/sizeof(*ia); // Initialize with begin & end iterators: Ci c3(ia, ia + iasz); print(c3, "c3 after constructor(iter,iter)"); Ci c4(c2); // Copy-constructor print(c4, "c4 after copy-constructor(c2)"); c = c2; // Assignment operator print(c, "c after operator=c2"); c.assign(10, 2); // 10 elements, values all 2 print(c, "c after assign(10, 2)"); // Assign with begin & end iterators: c.assign(ia, ia + iasz); print(c, "c after assign(iter, iter)"); cout << "c using reverse iterators:" << endl; typename Ci::reverse_iterator rit = c.rbegin(); while(rit != c.rend()) cout << *rit++ << " "; cout << endl; c.resize(4); print(c, "c after resize(4)"); c.push_back(47); print(c, "c after push_back(47)"); c.pop_back(); print(c, "c after pop_back()"); typename Ci::iterator it = c.begin(); it++; it++; c.insert(it, 74); print(c, "c after insert(it, 74)"); it = c.begin(); it++; c.insert(it, 3, 96); print(c, "c after insert(it, 3, 96)"); it = c.begin(); it++; c.insert(it, c3.begin(), c3.end()); print(c, "c after insert(" "it, c3.begin(), c3.end())"); it = c.begin(); it++; c.erase(it); print(c, "c after erase(it)"); typename Ci::iterator it2 = it = c.begin(); it++; it2++; it2++; it2++; it2++; it2++; c.erase(it, it2); print(c, "c after erase(it, it2)"); c.swap(c2); print(c, "c after swap(c2)"); c.clear(); print(c, "c after clear()"); } int main() { basicOps<vector<int> >("vector"); basicOps<deque<int> >("deque"); basicOps<list<int> >("list"); } ///:~
The first function template,
print( ), demonstrates the basic information you can get from any
sequence container: whether it’s empty, its current size, the size of the
largest possible container, the element at the beginning and the element at the
end. You can also see that every container has begin( ) and
end( ) methods that return iterators.
The basicOps( ) function
tests everything else (and in turn calls print( )), including a
variety of constructors: default, copy-constructor, quantity and initial value,
and beginning and ending iterators. There’s an assignment operator=
and two kinds of assign( ) member functions, one which takes a
quantity and initial value and the other which take a beginning and ending
iterator.
All the basic sequence containers are
reversible containers, as shown by the use of the rbegin( ) and
rend( ) member functions. A sequence container can be resized, and
the entire contents of the container can be removed with
clear( ).
Using an iterator to indicate where you
want to start inserting into any sequence container, you can
insert( ) a single element, a number of elements that all have the
same value, and a group of elements from another container using the beginning
and ending iterators of that group.
To erase( ) a single element
from the middle, use an iterator; to erase( ) a range of elements,
use a pair of iterators. Notice that since a list only supports
bidirectional iterators, all the iterator motion must be performed with
increments and decrements (if the containers were limited to vector and
deque, which produce random-access iterators, then operator+ and
operator- could have been used to move the iterators in big
jumps).
Although both list and
deque support push_front( ) and pop_front( ),
vector does not, so the only member functions that work with all three
are push_back( ) and pop_back( ).
The naming of the member function
swap( ) is a little confusing, since there’s also a non-member
swap( ) algorithm that switches two elements of a container. The
member swap( ), however, swaps everything in one container
for another (if the containers hold the same type), effectively swapping the
containers themselves. There’s also a non-member version of this
function.
The following sections on the sequence
containers discuss the particulars of each type of
container.
The vector is intentionally made
to look like a souped-up array, since it has array-style indexing but also can
expand dynamically. vector is so fundamentally useful that it was
introduced in a very primitive way early in this book, and used quite regularly
in previous examples. This section will give a more in-depth look at
vector.
To achieve maximally-fast indexing and
iteration, the vector maintains its storage as a single contiguous array
of objects. This is a critical point to observe in understanding the behavior of
vector. It means that indexing and iteration are lighting-fast, being
basically the same as indexing and iterating over an array of objects. But it
also means that inserting an object anywhere but at the end (that is, appending)
is not really an acceptable operation for a vector. It also means that
when a vector runs out of pre-allocated storage, in order to maintain its
contiguous array it must allocate a whole new (larger) chunk of storage
elsewhere and copy the objects to the new storage. This has a number of
unpleasant side effects.
A vector starts by grabbing a
block of storage, as if it’s taking a guess at how many objects you plan
to put in it. As long as you don’t try to put in more objects than can be
held in the initial block of storage, everything is very rapid and efficient
(note that if you do know how many objects to expect, you can
pre-allocate storage using reserve( )). But eventually you will put
in one too many objects and, unbeknownst to you, the vector responds
by:
For complex
objects, this copy-construction and destruction can end up being very expensive
if you overfill your vector a lot. To see what happens when you’re filling
a vector, here is a class that prints out information about its
creations, destructions, assignments and copy-constructions:
//: C07:Noisy.h // A class to track various object activities #ifndef NOISY_H #define NOISY_H #include <iostream> class Noisy { static long create, assign, copycons, destroy; long id; public: Noisy() : id(create++) { std::cout << "d[" << id << "]"; } Noisy(const Noisy& rv) : id(rv.id) { std::cout << "c[" << id << "]"; copycons++; } Noisy& operator=(const Noisy& rv) { std::cout << "(" << id << ")=[" << rv.id << "]"; id = rv.id; assign++; return *this; } friend bool operator<(const Noisy& lv, const Noisy& rv) { return lv.id < rv.id; } friend bool operator==(const Noisy& lv, const Noisy& rv) { return lv.id == rv.id; } ~Noisy() { std::cout << "~[" << id << "]"; destroy++; } friend std::ostream& operator<<(std::ostream& os, const Noisy& n) { return os << n.id; } friend class NoisyReport; }; struct NoisyGen { Noisy operator()() { return Noisy(); } }; // A singleton. Will automatically report the // statistics as the program terminates: class NoisyReport { static NoisyReport nr; NoisyReport() {} // Private constructor public: ~NoisyReport() { std::cout << "\n-------------------\n" << "Noisy creations: " << Noisy::create << "\nCopy-Constructions: " << Noisy::copycons << "\nAssignments: " << Noisy::assign << "\nDestructions: " << Noisy::destroy << std::endl; } }; // Because of these this file can only be used // in simple test situations. Move them to a // .cpp file for more complex programs: long Noisy::create = 0, Noisy::assign = 0, Noisy::copycons = 0, Noisy::destroy = 0; NoisyReport NoisyReport::nr; #endif // NOISY_H ///:~
Each Noisy object has its own
identifier, and there are static variables to keep track of all the
creations, assignments (using operator=), copy-constructions and
destructions. The id is initialized using the create counter
inside the default constructor; the copy-constructor and assignment operator
take their id values from the rvalue. Of course, with operator=
the lvalue is already an initialized object so the old value of id is
printed before it is overwritten with the id from the
rvalue.
In order to support certain operations
like sorting and searching (which are used implicitly by some of the
containers), Noisy must have an operator< and
operator==. These simply compare the id values. The
operator<< for ostream follows the standard form and simply
prints the id.
NoisyGen produces a function
object (since it has an operator( )) that is used to automatically
generate Noisy objects during testing.
NoisyReport is a type of class
called a singleton, which is a “design pattern” (these are
covered more fully in Chapter XX). Here, the goal is to make sure there is one
and only one NoisyReport object, because it is responsible for printing
out the results at program termination. It has a private constructor so
no one else can make a NoisyReport object, and a single static instance
of NoisyReport called nr. The only executable statements are in
the destructor, which is called as the program exits and the static destructors
are called; this destructor prints out the statistics captured by the
static variables in Noisy.
The one snag to this header file is the
inclusion of the definitions for the statics at the end. If you include
this header in more than one place in your project, you’ll get
multiple-definition errors at link time. Of course, you can put the
static definitions in a separate cpp file and link it in, but that
is less convenient, and since Noisy is just intended for quick-and-dirty
experiments the header file should be reasonable for most
situations.
Using Noisy.h, the following
program will show the behaviors that occur when a vector overflows its
currently allocated storage:
//: C07:VectorOverflow.cpp // Shows the copy-construction and destruction // That occurs when a vector must reallocate // (It maintains a linear array of elements) //{L} ../TestSuite/Test #include "Noisy.h" #include <vector> #include <iostream> #include <string> #include <cstdlib> using namespace std; int main(int argc, char* argv[]) { int size = 1000; if(argc >= 2) size = atoi(argv[1]); vector<Noisy> vn; Noisy n; for(int i = 0; i < size; i++) vn.push_back(n); cout << "\n cleaning up \n"; } ///:~
You can either use the default value of
1000, or use your own value by putting it on the command-line.
When you run this program, you’ll
see a single default constructor call (for n), then a lot of
copy-constructor calls, then some destructor calls, then some more
copy-constructor calls, and so on. When the vector runs out of space in the
linear array of bytes it has allocated, it must (to maintain all the objects in
a linear array, which is an essential part of its job) get a bigger piece of
storage and move everything over, copying first and then destroying the old
objects. You can imagine that if you store a lot of large and complex objects,
this process could rapidly become prohibitive.
There are two solutions to this problem.
The nicest one requires that you know beforehand how many objects you’re
going to make. In that case you can use reserve( ) to tell the
vector how much storage to pre-allocate, thus eliminating all the copies and
destructions and making everything very fast (especially random access to the
objects with operator[ ]). Note that the use of reserve( ) is
different from using the vector constructor with an integral first
argument; the latter initializes each element using the default
copy-constructor.
However, in the more general case you
won’t know how many objects you’ll need. If vector
reallocations are slowing things down, you can change sequence containers. You
could use a list, but as you’ll see, the deque allows
speedy insertions at either end of the sequence, and never needs to copy or
destroy objects as it expands its storage. The deque also allows random
access with operator[ ], but it’s not quite as fast as
vector’s operator[ ]. So in the case where
you’re creating all your objects in one part of the program and randomly
accessing them in another, you may find yourself filling a deque, then
creating a vector from the deque and using the vector for
rapid indexing. Of course, you don’t want to program this way habitually,
just be aware of these issues (avoid premature optimization).
There is a darker side to
vector’s reallocation of memory, however. Because vector
keeps its objects in a nice, neat array (allowing, for one thing, maximally-fast
random access), the iterators used by vector are generally just pointers.
This is a good thing – of all the sequence containers, these pointers
allow the fastest selection and manipulation. However, consider what happens
when you’re holding onto an iterator (i.e. a pointer) and then you add the
one additional object that causes the vector to reallocate storage and
move it elsewhere. Your pointer is now pointing off into
nowhere:
//: C07:VectorCoreDump.cpp // How to break a program using a vector //{-msc} //{-bor} //{-g++3} #include <vector> #include <iostream> using namespace std; int main() { vector<int> vi(10, 0); ostream_iterator<int> out(cout, " "); copy(vi.begin(), vi.end(), out); vector<int>::iterator i = vi.begin(); cout << "\n i: " << long(i) << endl; *i = 47; copy(vi.begin(), vi.end(), out); // Force it to move memory (could also just add // enough objects): vi.resize(vi.capacity() + 1); // Now i points to wrong memory: cout << "\n i: " << long(i) << endl; cout << "vi.begin(): " << long(vi.begin()); *i = 48; // Access violation } ///:~
If your program is breaking mysteriously,
look for places where you hold onto an iterator while adding more objects to a
vector. You’ll need to get a new iterator after adding elements, or
use operator[ ] instead for element selections. If you combine the above
observation with the awareness of the potential expense of adding new objects to
a vector, you may conclude that the safest way to use one is to fill it
up all at once (ideally, knowing first how many objects you’ll need) and
then just use it (without adding more objects) elsewhere in the program. This is
the way vector has been used in the book up to this
point.
You may observe that using vector
as the “basic” container in the earlier chapters of this book
may not be the best choice in all cases. This is a fundamental issue in
containers, and in data structures in general: the “best” choice
varies according to the way the container is used. The reason vector has
been the “best” choice up until now is that it looks a lot like an
array, and was thus familiar and easy for you to adopt. But from now on
it’s also worth thinking about other issues when choosing
containers.
The vector is most efficient
if:
It is possible to insert
and erase elements from the middle of a vector using an iterator, but the
following program demonstrates what a bad idea it is:
//: C07:VectorInsertAndErase.cpp // Erasing an element from a vector //{L} ../TestSuite/Test #include "Noisy.h" #include <iostream> #include <vector> #include <algorithm> using namespace std; int main() { vector<Noisy> v; v.reserve(11); cout << "11 spaces have been reserved" << endl; generate_n(back_inserter(v), 10, NoisyGen()); ostream_iterator<Noisy> out(cout, " "); cout << endl; copy(v.begin(), v.end(), out); cout << "Inserting an element:" << endl; vector<Noisy>::iterator it = v.begin() + v.size() / 2; // Middle v.insert(it, Noisy()); cout << endl; copy(v.begin(), v.end(), out); cout << "\nErasing an element:" << endl; // Cannot use the previous value of it: it = v.begin() + v.size() / 2; v.erase(it); cout << endl; copy(v.begin(), v.end(), out); cout << endl; } ///:~
When you run the program you’ll see
that the call to reserve( ) really does only allocate storage
– no constructors are called. The generate_n( ) call is pretty
busy: each call to NoisyGen::operator( ) results in a construction,
a copy-construction (into the vector) and a destruction of the temporary.
But when an object is inserted into the vector in the middle, it must
shove everything down to maintain the linear array and – since there is
enough space – it does this with the assignment operator (if the argument
of reserve( ) is 10 instead of eleven then it would have to
reallocate storage). When an object is erased from the vector, the
assignment operator is once again used to move everything up to cover the place
that is being erased (notice that this requires that the assignment operator
properly cleans up the lvalue). Lastly, the object on the end of the array is
deleted.
You can imagine how enormous the overhead
can become if objects are inserted and removed from the middle of a
vector if the number of elements is large and the objects are
complicated. It’s obviously a practice to
avoid.
The deque (double-ended-queue,
pronounced “deck”) is the basic sequence container optimized for
adding and removing elements from either end. It also allows for reasonably fast
random access – it has an operator[ ] like vector. However,
it does not have vector’s constraint of keeping everything in a
single sequential block of memory. Instead, deque uses multiple blocks of
sequential storage (keeping track of all the blocks and their order in a mapping
structure). For this reason the overhead for a deque to add or remove
elements at either end is very low. In addition, it never needs to copy and
destroy contained objects during a new storage allocation (like vector
does) so it is far more efficient than vector if you are adding an
unknown quantity of objects. This means that vector is the best choice
only if you have a pretty good idea of how many objects you need. In addition,
many of the programs shown earlier in this book that use vector and
push_back( ) might be more efficient with a deque. The
interface to deque is only slightly different from a vector (deque
has a push_front( ) and pop_front( ) while vector
does not, for example) so converting code from using vector to using
deque is almost trivial. Consider StringVector.cpp, which can be
changed to use deque by replacing the word “vector” with
“deque” everywhere. The following program adds parallel deque
operations to the vector operations in StringVector.cpp, and
performs timing comparisons:
//: C07:StringDeque.cpp // Converted from StringVector.cpp //{L} ../TestSuite/Test #include "../require.h" #include <string> #include <deque> #include <vector> #include <fstream> #include <iostream> #include <iterator> #include <sstream> #include <ctime> using namespace std; int main(int argc, char* argv[]) { char* fname = "StringDeque.cpp"; if(argc > 1) fname = argv[1]; ifstream in(fname); assure(in, fname); vector<string> vstrings; deque<string> dstrings; string line; // Time reading into vector: clock_t ticks = clock(); while(getline(in, line)) vstrings.push_back(line); ticks = clock() - ticks; cout << "Read into vector: " << ticks << endl; // Repeat for deque: ifstream in2(fname); assure(in2, fname); ticks = clock(); while(getline(in2, line)) dstrings.push_back(line); ticks = clock() - ticks; cout << "Read into deque: " << ticks << endl; // Now compare indexing: ticks = clock(); for(int i = 0; i < vstrings.size(); i++) { ostringstream ss; ss << i; vstrings[i] = ss.str() + ": " + vstrings[i]; } ticks = clock() - ticks; cout << "Indexing vector: " << ticks << endl; ticks = clock(); for(int j = 0; j < dstrings.size(); j++) { ostringstream ss; ss << j; dstrings[j] = ss.str() + ": " + dstrings[j]; } ticks = clock() - ticks; cout << "Indexing deqeue: " << ticks << endl; // Compare iteration ofstream tmp1("tmp1.tmp"), tmp2("tmp2.tmp"); ticks = clock(); copy(vstrings.begin(), vstrings.end(), ostream_iterator<string>(tmp1, "\n")); ticks = clock() - ticks; cout << "Iterating vector: " << ticks << endl; ticks = clock(); copy(dstrings.begin(), dstrings.end(), ostream_iterator<string>(tmp2, "\n")); ticks = clock() - ticks; cout << "Iterating deqeue: " << ticks << endl; } ///:~
Knowing now what you do about the
inefficiency of adding things to vector because of storage reallocation,
you may expect dramatic differences between the two. However, on a 1.7 Megabyte
text file one compiler’s program produced the following (measured in
platform/compiler specific clock ticks, not seconds):
Read into vector: 8350 Read into deque: 7690 Indexing vector: 2360 Indexing deqeue: 2480 Iterating vector: 2470 Iterating deqeue: 2410
A different compiler and platform roughly
agreed with this. It’s not so dramatic, is it? This points out some
important issues:
Of course, this
doesn’t mean you shouldn’t use a deque rather than a
vector when you know that an uncertain number of objects will be pushed
onto the end of the container. On the contrary, you should – when
you’re tuning for performance. But you should also be aware that
performance issues are usually not where you think they are, and the only way to
know for sure where your bottlenecks are is by testing. Later in this chapter
there will be a more “pure” comparison of performance between
vector, deque and
list.
Sometimes you need the behavior or
efficiency of one kind of container for one part of your program, and a
different container’s behavior or efficiency in another part of the
program. For example, you may need the efficiency of a deque when adding
objects to the container but the efficiency of a vector when indexing
them. Each of the basic sequence containers (vector, deque and
list) has a two-iterator constructor (indicating the beginning and ending
of the sequence to read from when creating a new object) and an
assign( ) member function to read into an existing container, so you
can easily move objects from one sequence container to another.
The following example reads objects into
a deque and then converts to a vector:
//: C07:DequeConversion.cpp // Reading into a Deque, converting to a vector //{L} ../TestSuite/Test //{-msc} #include "Noisy.h" #include <deque> #include <vector> #include <iostream> #include <algorithm> #include <cstdlib> using namespace std; int main(int argc, char* argv[]) { int size = 25; if(argc >= 2) size = atoi(argv[1]); deque<Noisy> d; generate_n(back_inserter(d), size, NoisyGen()); cout << "\n Converting to a vector(1)" << endl; vector<Noisy> v1(d.begin(), d.end()); cout << "\n Converting to a vector(2)" << endl; vector<Noisy> v2; v2.reserve(d.size()); v2.assign(d.begin(), d.end()); cout << "\n Cleanup" << endl; } ///:~
You can try various sizes, but you should
see that it makes no difference – the objects are simply copy-constructed
into the new vectors. What’s interesting is that v1 does not
cause multiple allocations while building the vector, no matter how many
elements you use. You might initially think that you must follow the process
used for v2 and preallocate the storage to prevent messy reallocations,
but the constructor used for v1 determines the memory need ahead of time
so this is unnecessary.
It’s illuminating to see what
happens with a deque when it overflows a block of storage, in contrast
with VectorOverflow.cpp:
//: C07:DequeOverflow.cpp // A deque is much more efficient than a vector // when pushing back a lot of elements, since it // doesn't require copying and destroying. //{L} ../TestSuite/Test #include "Noisy.h" #include <deque> #include <cstdlib> using namespace std; int main(int argc, char* argv[]) { int size = 1000; if(argc >= 2) size = atoi(argv[1]); deque<Noisy> dn; Noisy n; for(int i = 0; i < size; i++) dn.push_back(n); cout << "\n cleaning up \n"; } ///:~
Here you will never see any destructors
before the words “cleaning up” appear. Since the deque
allocates all its storage in blocks instead of a contiguous array like
vector, it never needs to move existing storage (thus no additional
copy-constructions and destructions occur). It simply allocates a new block. For
the same reason, the deque can just as efficiently add elements to the
beginning of the sequence, since if it runs out of storage it (again)
just allocates a new block for the beginning. Insertions in the middle of a
deque, however, could be even messier than for vector (but not as
costly).
Because a deque never moves its
storage, a held iterator never becomes invalid when you add new things to either
end of a deque, as it was demonstrated to do with vector (in
VectorCoreDump.cpp). However, it’s still possible (albeit harder)
to do bad things:
//: C07:DequeCoreDump.cpp // How to break a program using a deque #include <queue> #include <iostream> using namespace std; int main() { deque<int> di(100, 0); // No problem iterating from beginning to end, // even though it spans multiple blocks: copy(di.begin(), di.end(), ostream_iterator<int>(cout, " ")); deque<int>::iterator i = // In the middle: di.begin() + di.size() / 2;; // Walk the iterator forward as you perform // a lot of insertions in the middle: for(int j = 0; j < 1000; j++) { cout << j << endl; di.insert(i++, 1); // Eventually breaks } } ///:~
Of course, there are two things here that
you wouldn’t normally do with a deque: first, elements are inserted
in the middle, which deque allows but isn’t designed for. Second,
calling insert( ) repeatedly with the same iterator would not
ordinarily cause an access violation, but the iterator is walked forward after
each insertion. I’m guessing it eventually walks off the end of a block,
but I’m not sure what actually causes the problem.
If you stick to what deque is best
at – insertions and removals from either end, reasonably rapid traversals
and fairly fast random-access using operator[ ] – you’ll be
in good shape.
Both vector and deque
provide two ways to perform random access of their elements: the operator[
], which you’ve seen already, and at( ), which checks the
boundaries of the container that’s being indexed and throws an exception
if you go out of bounds. It does cost more to use
at( ):
//: C07:IndexingVsAt.cpp // Comparing "at()" to operator[] //{L} ../TestSuite/Test //{-g++295} #include "../require.h" #include <vector> #include <deque> #include <iostream> #include <ctime> using namespace std; int main(int argc, char* argv[]) { long count = 1000; int sz = 1000; if(argc >= 2) count = atoi(argv[1]); if(argc >= 3) sz = atoi(argv[2]); vector<int> vi(sz); clock_t ticks = clock(); for(int i1 = 0; i1 < count; i1++) for(int j = 0; j < sz; j++) vi[j]; cout << "vector[] " << clock() - ticks << endl; ticks = clock(); for(int i2 = 0; i2 < count; i2++) for(int j = 0; j < sz; j++) vi.at(j); cout << "vector::at() " << clock()-ticks <<endl; deque<int> di(sz); ticks = clock(); for(int i3 = 0; i3 < count; i3++) for(int j = 0; j < sz; j++) di[j]; cout << "deque[] " << clock() - ticks << endl; ticks = clock(); for(int i4 = 0; i4 < count; i4++) for(int j = 0; j < sz; j++) di.at(j); cout << "deque::at() " << clock()-ticks <<endl; // Demonstrate at() when you go out of bounds: try { di.at(vi.size() + 1); } catch(...) { cerr << "Exception thrown" << endl; } } ///:~
As you’ll learn in the
exception-handling chapter, different systems may handle the uncaught exception
in different ways, but you’ll know one way or another that something went
wrong with the program when using at( ), whereas it’s possible
to go blundering ahead using operator[
].
A list is implemented as a
doubly-linked list and is thus designed for rapid insertion and removal of
elements in the middle of the sequence (whereas for vector and deque
this is a much more costly operation). A list is so slow when randomly
accessing elements that it does not have an operator[ ]. It’s best
used when you’re traversing a sequence, in order, from beginning to end
(or end to beginning) rather than choosing elements randomly from the middle.
Even then the traversal is significantly slower than either a vector or a
deque, but if you aren’t doing a lot of traversals that won’t
be your bottleneck.
Another thing to be aware of with a
list is the memory overhead of each link, which requires a forward and
backward pointer on top of the storage for the actual object. Thus a list
is a better choice when you have larger objects that you’ll be inserting
and removing from the middle of the list. It’s better not to use a
list if you think you might be traversing it a lot, looking for objects,
since the amount of time it takes to get from the beginning of the list
– which is the only place you can start unless you’ve already got an
iterator to somewhere you know is closer to your destination – to the
object of interest is proportional to the number of objects between the
beginning and that object.
The objects in a list never move
after they are created; “moving” a list element means changing the
links, but never copying or assigning the actual objects. This means that a held
iterator never moves when you add new things to a list as it was demonstrated to
do in vector. Here’s an example using the Noisy
class:
//: C07:ListStability.cpp // Things don't move around in lists //{L} ../TestSuite/Test #include "Noisy.h" #include <list> #include <iostream> #include <algorithm> using namespace std; int main() { list<Noisy> l; ostream_iterator<Noisy> out(cout, " "); generate_n(back_inserter(l), 25, NoisyGen()); cout << "\n Printing the list:" << endl; copy(l.begin(), l.end(), out); cout << "\n Reversing the list:" << endl; l.reverse(); copy(l.begin(), l.end(), out); cout << "\n Sorting the list:" << endl; l.sort(); copy(l.begin(), l.end(), out); cout << "\n Swapping two elements:" << endl; list<Noisy>::iterator it1, it2; it1 = it2 = l.begin(); it2++; swap(*it1, *it2); cout << endl; copy(l.begin(), l.end(), out); cout << "\n Using generic reverse(): " << endl; reverse(l.begin(), l.end()); cout << endl; copy(l.begin(), l.end(), out); cout << "\n Cleanup" << endl; } ///:~
Operations as seemingly radical as
reversing and sorting the list require no copying of objects, because instead of
moving the objects, the links are simply changed. However, notice that
sort( ) and reverse( ) are member functions of
list, so they have special knowledge of the internals of list and
can perform the pointer movement instead of copying. On the other hand, the
swap( ) function is a generic algorithm, and doesn’t know
about list in particular and so it uses the copying approach for swapping
two elements. There are also generic algorithms for sort( ) and
reverse( ), but if you try to use these you’ll discover that
the generic reverse( ) performs lots of copying and destruction (so
you should never use it with a list) and the generic sort( )
simply doesn’t work because it requires random-access iterators that
list doesn’t provide (a definite benefit, since this would
certainly be an expensive way to sort compared to list’s own
sort( )). The generic sort( ) and reverse( )
should only be used with arrays, vectors and
deques.
If you have large and complex objects you
may want to choose a list first, especially if construction, destruction,
copy-construction and assignment are expensive and if you are doing things like
sorting the objects or otherwise reordering them a
lot.
The list has some special
operations that are built-in to make the best use of the structure of the
list. You’ve already seen reverse( ) and
sort( ), and here are some of the others in use:
//: C07:ListSpecialFunctions.cpp //{L} ../TestSuite/Test #include "Noisy.h" #include <list> #include <iostream> #include <algorithm> using namespace std; ostream_iterator<Noisy> out(cout, " "); void print(list<Noisy>& ln, char* comment = "") { cout << "\n" << comment << ":\n"; copy(ln.begin(), ln.end(), out); cout << endl; } int main() { typedef list<Noisy> LN; LN l1, l2, l3, l4; generate_n(back_inserter(l1), 6, NoisyGen()); generate_n(back_inserter(l2), 6, NoisyGen()); generate_n(back_inserter(l3), 6, NoisyGen()); generate_n(back_inserter(l4), 6, NoisyGen()); print(l1, "l1"); print(l2, "l2"); print(l3, "l3"); print(l4, "l4"); LN::iterator it1 = l1.begin(); it1++; it1++; it1++; l1.splice(it1, l2); print(l1, "l1 after splice(it1, l2)"); print(l2, "l2 after splice(it1, l2)"); LN::iterator it2 = l3.begin(); it2++; it2++; it2++; l1.splice(it1, l3, it2); print(l1, "l1 after splice(it1, l3, it2)"); LN::iterator it3 = l4.begin(), it4 = l4.end(); it3++; it4--; l1.splice(it1, l4, it3, it4); print(l1, "l1 after splice(it1,l4,it3,it4)"); Noisy n; LN l5(3, n); generate_n(back_inserter(l5), 4, NoisyGen()); l5.push_back(n); print(l5, "l5 before remove()"); l5.remove(l5.front()); print(l5, "l5 after remove()"); l1.sort(); l5.sort(); l5.merge(l1); print(l5, "l5 after l5.merge(l1)"); cout << "\n Cleanup" << endl; } ///:~
The print( ) function is used
to display results. After filling four lists with Noisy objects,
one list is spliced into another in three different ways. In the first, the
entire list l2 is spliced into l1 at the iterator it1.
Notice that after the splice, l2 is empty – splicing means removing
the elements from the source list. The second splice inserts elements from l3
starting at it2 into l1 starting at it1. The third
splice starts at it1 and uses elements from l4 starting at
it3 and ending at it4 (the seemingly-redundant mention of the
source list is because the elements must be erased from the source list as part
of the transfer to the destination list).
The output from the code that
demonstrates remove( ) shows that the list does not have to be
sorted in order for all the elements of a particular value to be
removed.
Finally, if you merge( ) one
list with another, the merge only works sensibly if the lists have been sorted.
What you end up with in that case is a sorted list containing all the elements
from both lists (the source list is erased – that is, the elements are
moved to the destination list).
There’s also a
unique( ) member function that removes all duplicates, but only if
the list has been sorted first:
//: C07:UniqueList.cpp // Testing list's unique() function //{L} ../TestSuite/Test #include <list> #include <iostream> using namespace std; int a[] = { 1, 3, 1, 4, 1, 5, 1, 6, 1 }; const int asz = sizeof a / sizeof *a; int main() { // For output: ostream_iterator<int> out(cout, " "); list<int> li(a, a + asz); li.unique(); // Oops! No duplicates removed: copy(li.begin(), li.end(), out); cout << endl; // Must sort it first: li.sort(); copy(li.begin(), li.end(), out); cout << endl; // Now unique() will have an effect: li.unique(); copy(li.begin(), li.end(), out); cout << endl; } ///:~
The list constructor used here
takes the starting and past-the-end iterator from another container, and it
copies all the elements from that container into itself (a similar constructor
is available for all the containers). Here, the “container” is just
an array, and the “iterators” are pointers into that array, but
because of the design of the STL it works with arrays just as easily as any
other container.
If you run this program, you’ll see
that unique( ) will only remove adjacent duplicate elements,
and thus sorting is necessary before calling
unique( ).
There are four additional list
member functions that are not demonstrated here: a remove_if( )
that takes a predicate which is used to decide whether an object should be
removed, a unique( ) that takes a binary predicate to perform
uniqueness comparisons, a merge( ) that takes an additional argument
which performs comparisons, and a sort( ) that takes a comparator
(to provide a comparison or override the existing one).
Looking at the previous example you may
note that if you want a sorted list with no duplicates, a set can give
you that, right? It’s interesting to compare the performance of
the two containers:
//: C07:ListVsSet.cpp // Comparing list and set performance //{L} ../TestSuite/Test #include <iostream> #include <list> #include <set> #include <algorithm> #include <ctime> #include <cstdlib> using namespace std; class Obj { int a[20]; // To take up extra space int val; public: Obj() : val(rand() % 500) {} friend bool operator<(const Obj& a, const Obj& b) { return a.val < b.val; } friend bool operator==(const Obj& a, const Obj& b) { return a.val == b.val; } friend ostream& operator<<(ostream& os, const Obj& a) { return os << a.val; } }; template<class Container> void print(Container& c) { typename Container::iterator it; for(it = c.begin(); it != c.end(); it++) cout << *it << " "; cout << endl; } struct ObjGen { Obj operator()() { return Obj(); } }; int main() { const int sz = 5000; srand(time(0)); list<Obj> lo; clock_t ticks = clock(); generate_n(back_inserter(lo), sz, ObjGen()); lo.sort(); lo.unique(); cout << "list:" << clock() - ticks << endl; set<Obj> so; ticks = clock(); generate_n(inserter(so, so.begin()), sz, ObjGen()); cout << "set:" << clock() - ticks << endl; print(lo); print(so); } ///:~
When you run the program, you should
discover that set is much faster than list. This is reassuring
– after all, it is set’s primary job
description!
It turns out that all basic sequences
have a member function swap( ) that’s designed to switch one
sequence with another (however, this swap( ) is only defined for
sequences of the same type). The member swap( ) makes use of its
knowledge of the internal structure of the particular container in order to be
efficient:
//: C07:Swapping.cpp // All basic sequence containers can be swapped //{L} ../TestSuite/Test #include "Noisy.h" #include <list> #include <vector> #include <deque> #include <iostream> #include <algorithm> using namespace std; ostream_iterator<Noisy> out(cout, " "); template<class Cont> void print(Cont& c, char* comment = "") { cout << "\n" << comment << ": "; copy(c.begin(), c.end(), out); cout << endl; } template<class Cont> void testSwap(char* cname) { Cont c1, c2; generate_n(back_inserter(c1), 10, NoisyGen()); generate_n(back_inserter(c2), 5, NoisyGen()); cout << "\n" << cname << ":" << endl; print(c1, "c1"); print(c2, "c2"); cout << "\n Swapping the " << cname << ":" << endl; c1.swap(c2); print(c1, "c1"); print(c2, "c2"); } int main() { testSwap<vector<Noisy> >("vector"); testSwap<deque<Noisy> >("deque"); testSwap<list<Noisy> >("list"); } ///:~
When you run this, you’ll discover
that each type of sequence container is able to swap one sequence for another
without any copying or assignments, even if the sequences are of different
sizes. In effect, you’re completely swapping the memory of one object for
another.
The STL algorithms also contain a
swap( ), and when this function is applied to two containers of the
same type, it will use the member swap( ) to achieve fast
performance. Consequently, if you apply the sort( ) algorithm to a
container of containers, you will find that the performance is very fast –
it turns out that fast sorting of a container of containers was a design goal of
the STL.
To break a list, you have to work
pretty hard:
//: C07:ListRobustness.cpp // lists are harder to break //{L} ../TestSuite/Test #include <list> #include <iostream> using namespace std; int main() { list<int> li(100, 0); list<int>::iterator i = li.begin(); for(int j = 0; j < li.size() / 2; j++) i++; // Walk the iterator forward as you perform // a lot of insertions in the middle: for(int k = 0; k < 1000; k++) li.insert(i++, 1); // No problem li.erase(i); i++; //! *i = 2; // Oops! It's invalid } ///:~
When the link that the iterator i
was pointing to was erased, it was unlinked from the list and thus became
invalid. Trying to move forward to the “next link” from an invalid
link is poorly-formed code. Notice that the operation that broke deque in
DequeCoreDump.cpp is perfectly fine with a
list.
To get a better feel for the differences
between the sequence containers, it’s illuminating to race them against
each other while performing various operations.
//: C07:SequencePerformance.cpp // Comparing the performance of the basic // sequence containers for various operations //{L} ../TestSuite/Test #include <vector> #include <queue> #include <list> #include <iostream> #include <string> #include <typeinfo> #include <ctime> #include <cstdlib> using namespace std; class FixedSize { int x[20]; // Automatic generation of default constructor, // copy-constructor and operator= } fs; template<class Cont> struct InsertBack { void operator()(Cont& c, long count) { for(long i = 0; i < count; i++) c.push_back(fs); } char* testName() { return "InsertBack"; } }; template<class Cont> struct InsertFront { void operator()(Cont& c, long count) { long cnt = count * 10; for(long i = 0; i < cnt; i++) c.push_front(fs); } char* testName() { return "InsertFront"; } }; template<class Cont> struct InsertMiddle { void operator()(Cont& c, long count) { typename Cont::iterator it; long cnt = count / 10; for(long i = 0; i < cnt; i++) { // Must get the iterator every time to keep // from causing an access violation with // vector. Increment it to put it in the // middle of the container: it = c.begin(); it++; c.insert(it, fs); } } char* testName() { return "InsertMiddle"; } }; template<class Cont> struct RandomAccess { // Not for list void operator()(Cont& c, long count) { int sz = c.size(); long cnt = count * 100; for(long i = 0; i < cnt; i++) c[rand() % sz]; } char* testName() { return "RandomAccess"; } }; template<class Cont> struct Traversal { void operator()(Cont& c, long count) { long cnt = count / 100; for(long i = 0; i < cnt; i++) { typename Cont::iterator it = c.begin(), end = c.end(); while(it != end) it++; } } char* testName() { return "Traversal"; } }; template<class Cont> struct Swap { void operator()(Cont& c, long count) { int middle = c.size() / 2; typename Cont::iterator it = c.begin(), mid = c.begin(); it++; // Put it in the middle for(int x = 0; x < middle + 1; x++) mid++; long cnt = count * 10; for(long i = 0; i < cnt; i++) swap(*it, *mid); } char* testName() { return "Swap"; } }; template<class Cont> struct RemoveMiddle { void operator()(Cont& c, long count) { long cnt = count / 10; if(cnt > c.size()) { cout << "RemoveMiddle: not enough elements" << endl; return; } for(long i = 0; i < cnt; i++) { typename Cont::iterator it = c.begin(); it++; c.erase(it); } } char* testName() { return "RemoveMiddle"; } }; template<class Cont> struct RemoveBack { void operator()(Cont& c, long count) { long cnt = count * 10; if(cnt > c.size()) { cout << "RemoveBack: not enough elements" << endl; return; } for(long i = 0; i < cnt; i++) c.pop_back(); } char* testName() { return "RemoveBack"; } }; template<class Op, class Container> void measureTime(Op f, Container& c, long count){ string id(typeid(f).name()); bool Deque = id.find("deque") != string::npos; bool List = id.find("list") != string::npos; bool Vector = id.find("vector") !=string::npos; string cont = Deque ? "deque" : List ? "list" : Vector? "vector" : "unknown"; cout << f.testName() << " for " << cont << ": "; // Standard C library CPU ticks: clock_t ticks = clock(); f(c, count); // Run the test ticks = clock() - ticks; cout << ticks << endl; } typedef deque<FixedSize> DF; typedef list<FixedSize> LF; typedef vector<FixedSize> VF; int main(int argc, char* argv[]) { srand(time(0)); long count = 1000; if(argc >= 2) count = atoi(argv[1]); DF deq; LF lst; VF vec, vecres; vecres.reserve(count); // Preallocate storage measureTime(InsertBack<VF>(), vec, count); measureTime(InsertBack<VF>(), vecres, count); measureTime(InsertBack<DF>(), deq, count); measureTime(InsertBack<LF>(), lst, count); // Can't push_front() with a vector: //! measureTime(InsertFront<VF>(), vec, count); measureTime(InsertFront<DF>(), deq, count); measureTime(InsertFront<LF>(), lst, count); measureTime(InsertMiddle<VF>(), vec, count); measureTime(InsertMiddle<DF>(), deq, count); measureTime(InsertMiddle<LF>(), lst, count); measureTime(RandomAccess<VF>(), vec, count); measureTime(RandomAccess<DF>(), deq, count); // Can't operator[] with a list: //! measureTime(RandomAccess<LF>(), lst, count); measureTime(Traversal<VF>(), vec, count); measureTime(Traversal<DF>(), deq, count); measureTime(Traversal<LF>(), lst, count); measureTime(Swap<VF>(), vec, count); measureTime(Swap<DF>(), deq, count); measureTime(Swap<LF>(), lst, count); measureTime(RemoveMiddle<VF>(), vec, count); measureTime(RemoveMiddle<DF>(), deq, count); measureTime(RemoveMiddle<LF>(), lst, count); vec.resize(vec.size() * 10); // Make it bigger measureTime(RemoveBack<VF>(), vec, count); measureTime(RemoveBack<DF>(), deq, count); measureTime(RemoveBack<LF>(), lst, count); } ///:~
This example makes heavy use of templates
to eliminate redundancy, save space, guarantee identical code and improve
clarity. Each test is represented by a class that is templatized on the
container it will operate on. The test itself is inside the
operator( ) which, in each case, takes a reference to the container
and a repeat count – this count is not always used exactly as it is, but
sometimes increased or decreased to prevent the test from being too short or too
long. The repeat count is just a factor, and all tests are compared using the
same value.
Each test class also has a member
function that returns its name, so that it can easily be printed. You might
think that this should be accomplished using run-time type identification, but
since the actual name of the class involves a template expansion, this turns out
to be the more direct approach.
The measureTime( ) function
template takes as its first template argument the operation that it’s
going to test – which is itself a class template selected from the group
defined previously in the listing. The template argument Op will not only
contain the name of the class, but also (decorated into it) the type of the
container it’s working with. The RTTI typeid( ) operation
allows the name of the class to be extracted as a char*, which can then
be used to create a string called id. This string can be
searched using string::find( ) to look for deque, list
or vector. The bool variable that corresponds to the matching
string becomes true, and this is used to properly initialize the
string cont so the container name can be accurately printed, along
with the test name.
Once the type of test and the container
being tested has been printed out, the actual test is quite simple. The Standard
C library function clock( ) is used to capture the starting and
ending CPU ticks (this is typically more fine-grained than trying to measure
seconds). Since f is an object of type Op, which is a class that
has an operator( ), the line:
f(c, count);
is actually calling the
operator( ) for the object f.
In main( ), you can see that
each different type of test is run on each type of container, except for the
containers that don’t support the particular operation being tested (these
are commented out).
When you run the program, you’ll
get comparative performance numbers for your particular compiler and your
particular operating system and platform. Although this is only intended to give
you a feel for the various performance features relative to the other sequences,
it is not a bad way to get a quick-and-dirty idea of the behavior of your
library, and also to compare one library with
another.
The set produces a container that
will accept only one of each thing you place in it; it also sorts the elements
(sorting isn’t intrinsic to the conceptual definition of a set, but the
STL set stores its elements in a balanced binary tree to provide rapid
lookups, thus producing sorted results when you traverse it). The first two
examples in this chapter used sets.
Consider the problem of creating an index
for a book. You might like to start with all the words in the book, but you only
want one instance of each word and you want them sorted. Of course, a set
is perfect for this, and solves the problem effortlessly. However, there’s
also the problem of punctuation and any other non-alpha characters, which must
be stripped off to generate proper words. One solution to this problem is to use
the Standard C library function strtok( ), which produces tokens (in
our case, words) given a set of delimiters to strip out:
//: C07:WordList.cpp // Display a list of words used in a document //{L} ../TestSuite/Test #include "../require.h" #include <string> #include <cstring> #include <set> #include <iostream> #include <fstream> using namespace std; const char* delimiters = " \t;()\"<>:{}[]+-=&*#.,/\\~"; int main(int argc, char* argv[]) { char* fname = "WordList.cpp"; if(argc > 1) fname = argv[1]; ifstream in(fname); assure(in, fname); set<string> wordlist; string line; while(getline(in, line)) { // Capture individual words: char* s = // Cast probably won’t crash: strtok((char*)line.c_str(), delimiters); while(s) { // Automatic type conversion: wordlist.insert(s); s = strtok(0, delimiters); } } // Output results: copy(wordlist.begin(), wordlist.end(), ostream_iterator<string>(cout, "\n")); } ///:~
strtok( )
takes the starting address of
a character buffer (the first argument) and looks for delimiters (the second
argument). It replaces the delimiter with a zero, and returns the address of the
beginning of the token. If you call it subsequent times with a first argument of
zero it will continue extracting tokens from the rest of the string until it
finds the end. In this case, the delimiters are those that delimit the keywords
and identifiers of C++, so it extracts these keywords and identifiers. Each word
is turned into a string and placed into the wordlist vector, which
eventually contains the whole file, broken up into words.
You don’t have to use a set
just to get a sorted sequence. You can use the sort( ) function
(along with a multitude of other functions in the STL) on different STL
containers. However, it’s likely that set will be
faster.
Some programmers consider
strtok( ) to be the poorest design in the Standard C library because
it uses a static buffer to hold its data between function calls. This
means:
For all these
reasons it seems like a good idea to find an alternative for
strtok( ). The following example will use an
istreambuf_iterator (introduced earlier) to move the characters from one
place (which happens to be an istream) to another (which happens to be a
string), depending on whether the Standard C library function
isalpha( ) is true:
//: C07:WordList2.cpp // Eliminating strtok() from Wordlist.cpp //{L} ../TestSuite/Test //{-g++295} #include "../require.h" #include <string> #include <cstring> #include <set> #include <iostream> #include <fstream> #include <iterator> using namespace std; int main(int argc, char* argv[]) { char* fname = "WordList2.cpp"; if(argc > 1) fname = argv[1]; ifstream in(fname); assure(in, fname); istreambuf_iterator<char> p(in), end; set<string> wordlist; while (p != end) { string word; insert_iterator<string> ii(word, word.begin()); // Find the first alpha character: while(!isalpha(*p) && p != end) p++; // Copy until the first non-alpha character: while (isalpha(*p) && p != end) *ii++ = *p++; if (word.size() != 0) wordlist.insert(word); } // Output results: copy(wordlist.begin(), wordlist.end(), ostream_iterator<string>(cout, "\n")); } ///:~
This example was suggested by Nathan
Myers, who invented the istreambuf_iterator and its relatives. This
iterator extracts information character-by-character from a stream. Although the
istreambuf_iterator template argument might suggest to you that you could
extract, for example, ints instead of char, that’s not the
case. The argument must be of some character type – a regular char
or a wide character.
After the file is open, an
istreambuf_iterator called p is attached to the istream so
characters can be extracted from it. The set<string> called
wordlist will be used to hold the resulting words.
The while loop reads words until
the end of the input stream is found. This is detected using the default
constructor for istreambuf_iterator which produces the past-the-end
iterator object end. Thus, if you want to test to make sure you’re
not at the end of the stream, you simply say p != end.
The second type of iterator that’s
used here is the insert_iterator, which creates an iterator that knows
how to insert objects into a container. Here, the “container” is the
string called word which, for the purposes of
insert_iterator, behaves like a container. The constructor for
insert_iterator requires the container and an iterator indicating where
it should start inserting the characters. You could also use a
back_insert_iterator, which requires that the container have a
push_back( ) (string does).
After the while loop sets
everything up, it begins by looking for the first alpha character, incrementing
start until that character is found. Then it copies characters from one
iterator to the other, stopping when a non-alpha character is found. Each
word, assuming it is non-empty, is added to
wordlist.
The above program parses its input into
strings of words containing only alpha characters, but that’s still a
special case compared to the generality of strtok( ). What
we’d like now is an actual replacement for strtok( ) so
we’re never tempted to use it. WordList2.cpp can be modified to
create a class called StreamTokenizer that delivers a new token as a
string whenever you call next( ), according to the delimiters
you give it upon construction (very similar to
strtok( )):
//: C07:StreamTokenizer.h // C++ Replacement for Standard C strtok() #ifndef STREAMTOKENIZER_H #define STREAMTOKENIZER_H #include <string> #include <iostream> #include <iterator> class StreamTokenizer { typedef std::istreambuf_iterator<char> It; It p, end; std::string delimiters; bool isDelimiter(char c) { return delimiters.find(c) != std::string::npos; } public: StreamTokenizer(std::istream& is, std::string delim = " \t\n;()\"<>:{}[]+-=&*#" ".,/\\~!0123456789") : p(is), end(It()), delimiters(delim) {} std::string next(); // Get next token }; #endif STREAMTOKENIZER_H ///:~
The default delimiters for the
StreamTokenizer constructor extract words with only alpha characters, as
before, but now you can choose different delimiters to parse different tokens.
The implementation of next( ) looks similar to
Wordlist2.cpp:
//: C07:StreamTokenizer.cpp {O} //{-g++295} #include "StreamTokenizer.h" using namespace std; string StreamTokenizer::next() { string result; if(p != end) { insert_iterator<string> ii(result, result.begin()); while(isDelimiter(*p) && p != end) p++; while (!isDelimiter(*p) && p != end) *ii++ = *p++; } return result; } ///:~
The first non-delimiter is found, then
characters are copied until a delimiter is found, and the resulting
string is returned. Here’s a test:
//: C07:TokenizeTest.cpp // Test StreamTokenizer //{L} StreamTokenizer ../TestSuite/Test //{-g++295} #include "StreamTokenizer.h" #include "../require.h" #include <iostream> #include <fstream> #include <set> using namespace std; int main(int argc, char* argv[]) { char* fname = "TokenizeTest.cpp"; if(argc > 1) fname = argv[1]; ifstream in(fname); assure(in, fname); StreamTokenizer words(in); set<string> wordlist; string word; while((word = words.next()).size() != 0) wordlist.insert(word); // Output results: copy(wordlist.begin(), wordlist.end(), ostream_iterator<string>(cout, "\n")); } ///:~
Now the tool is more reusable than
before, but it’s still inflexible, because it can only work with an
istream. This isn’t as bad as it first seems, since a string
can be turned into an istream via an istringstream. But in the
next section we’ll come up with the most general, reusable tokenizing
tool, and this should give you a feeling of what “reusable” really
means, and the effort necessary to create truly reusable
code.
Since the STL containers and algorithms
all revolve around iterators, the most flexible solution will itself be an
iterator. You could think of the TokenIterator as an iterator that wraps
itself around any other iterator that can produce characters. Because it is
designed as an input iterator (the most primitive type of iterator) it can be
used with any STL algorithm. Not only is it a useful tool in itself, the
TokenIterator is also a good example of how you can design your own
iterators.[22]
The TokenIterator is doubly
flexible: first, you can choose the type of iterator that will produce the
char input. Second, instead of just saying what characters represent the
delimiters, TokenIterator will use a predicate which is a function object
whose operator( ) takes a char and decides if it should be in
the token or not. Although the two examples given here have a static concept of
what characters belong in a token, you could easily design your own function
object to change its state as the characters are read, producing a more
sophisticated parser.
The following header file contains the
two basic predicates Isalpha and Delimiters, along with the
template for TokenIterator:
//: C07:TokenIterator.h #ifndef TOKENITERATOR_H #define TOKENITERATOR_H #include <string> #include <iterator> #include <algorithm> #include <cctype> struct Isalpha { bool operator()(char c) { using namespace std; //[[For a compiler bug]] return isalpha(c); } }; class Delimiters { std::string exclude; public: Delimiters() {} Delimiters(const std::string& excl) : exclude(excl) {} bool operator()(char c) { return exclude.find(c) == std::string::npos; } }; template <class InputIter, class Pred = Isalpha> class TokenIterator: public std::iterator< std::input_iterator_tag,std::string,ptrdiff_t>{ InputIter first; InputIter last; std::string word; Pred predicate; public: TokenIterator(InputIter begin, InputIter end, Pred pred = Pred()) : first(begin), last(end), predicate(pred) { ++*this; } TokenIterator() {} // End sentinel // Prefix increment: TokenIterator& operator++() { word.resize(0); first = std::find_if(first, last, predicate); while (first != last && predicate(*first)) word += *first++; return *this; } // Postfix increment class Proxy { std::string word; public: Proxy(const std::string& w) : word(w) {} std::string operator*() { return word; } }; Proxy operator++(int) { Proxy d(word); ++*this; return d; } // Produce the actual value: std::string operator*() const { return word; } std::string* operator->() const { return &(operator*()); } // Compare iterators: bool operator==(const TokenIterator&) { return word.size() == 0 && first == last; } bool operator!=(const TokenIterator& rv) { return !(*this == rv); } }; #endif // TOKENITERATOR_H ///:~
TokenIterator is inherited from
the std::iterator template. It might appear that there’s some kind
of functionality that comes with std::iterator, but it is purely a way of
tagging an iterator so that a container that uses it knows what it’s
capable of. Here, you can see input_iterator_tag as a template argument
– this tells anyone who asks that a TokenIterator only has the
capabilities of an input iterator, and cannot be used with algorithms requiring
more sophisticated iterators. Apart from the tagging, std::iterator
doesn’t do anything else, which means you must design all the other
functionality in yourself.
TokenIterator may look a little
strange at first, because the first constructor requires both a
“begin” and “end” iterator as arguments, along with the
predicate. Remember that this is a “wrapper” iterator that has no
idea of how to tell whether it’s at the end of its input source, so the
ending iterator is necessary in the first constructor. The reason for the second
(default) constructor is that the STL algorithms (and any algorithms you write)
need a TokenIterator sentinel to be the past-the-end value. Since all the
information necessary to see if the TokenIterator has reached the end of
its input is collected in the first constructor, this second constructor creates
a TokenIterator that is merely used as a placeholder in
algorithms.
The core of the behavior happens in
operator++. This erases the current value of word using
string::resize( ), then finds the first character that satisfies the
predicate (thus discovering the beginning of the new token) using
find_if( ) (from the STL algorithms, discussed in the following
chapter). The resulting iterator is assigned to first, thus moving
first forward to the beginning of the token. Then, as long as the end of
the input is not reached and the predicate is satisfied, characters are copied
into the word from the input. Finally, the TokenIterator object is returned, and
must be dereferenced to access the new token.
The postfix increment requires a proxy
object to hold the value before the increment, so it can be returned (see the
operator overloading chapter for more details of this). Producing the actual
value is a straightforward operator*. The only other functions that must
be defined for an output iterator are the operator== and
operator!= to indicate whether the TokenIterator has reached the
end of its input. You can see that the argument for operator== is ignored
– it only cares about whether it has reached its internal last
iterator. Notice that operator!= is defined in terms of
operator==.
A good test of TokenIterator
includes a number of different sources of input characters including a
streambuf_iterator, a char*, and a
deque<char>::iterator. Finally, the original Wordlist.cpp
problem is solved:
//: C07:TokenIteratorTest.cpp //{L} ../TestSuite/Test //{-g++295} //{-g++3} #include "TokenIterator.h" #include "../require.h" #include <fstream> #include <iostream> #include <vector> #include <deque> #include <set> using namespace std; int main(int argc, char* argv[]) { char* fname = "TokenIteratorTest.cpp"; if(argc > 1) fname = argv[1]; ifstream in(fname); assure(in, fname); ostream_iterator<string> out(cout, "\n"); typedef istreambuf_iterator<char> IsbIt; IsbIt begin(in), isbEnd; Delimiters delimiters(" \t\n~;()\"<>:{}[]+-=&*#.,/\\"); TokenIterator<IsbIt, Delimiters> wordIter(begin, isbEnd, delimiters), end; vector<string> wordlist; copy(wordIter, end, back_inserter(wordlist)); // Output results: copy(wordlist.begin(), wordlist.end(), out); *out++ = "-----------------------------------"; // Use a char array as the source: char* cp = "typedef std::istreambuf_iterator<char> It"; TokenIterator<char*, Delimiters> charIter(cp, cp + strlen(cp), delimiters), end2; vector<string> wordlist2; copy(charIter, end2, back_inserter(wordlist2)); copy(wordlist2.begin(), wordlist2.end(), out); *out++ = "-----------------------------------"; // Use a deque<char> as the source: ifstream in2("TokenIteratorTest.cpp"); deque<char> dc; copy(IsbIt(in2), IsbIt(), back_inserter(dc)); TokenIterator<deque<char>::iterator,Delimiters> dcIter(dc.begin(), dc.end(), delimiters), end3; vector<string> wordlist3; copy(dcIter, end3, back_inserter(wordlist3)); copy(wordlist3.begin(), wordlist3.end(), out); *out++ = "-----------------------------------"; // Reproduce the Wordlist.cpp example: ifstream in3("TokenIteratorTest.cpp"); TokenIterator<IsbIt, Delimiters> wordIter2(IsbIt(in3), isbEnd, delimiters); set<string> wordlist4; while(wordIter2 != end) wordlist4.insert(*wordIter2++); copy(wordlist4.begin(), wordlist4.end(), out); } ///:~
When using an istreambuf_iterator,
you create one to attach to the istream object, and one with the default
constructor as the past-the-end marker. Both of these are used to create the
TokenIterator that will actually produce the tokens; the default
constructor produces the faux TokenIterator past-the-end sentinel (this
is just a placeholder, and as mentioned previously is actually ignored). The
TokenIterator produces strings that are inserted into a container
which must, naturally, be a container of string – here a
vector<string> is used in all cases except the last (you could also
concatenate the results onto a string). Other than that, a
TokenIterator works like any other input
iterator.
The stack, along with the
queue and priority_queue, are classified as adapters, which
means they are implemented using one of the basic sequence containers:
vector, list or deque. This, in my opinion, is an
unfortunate case of confusing what something does with the details of its
underlying implementation – the fact that these are called
“adapters” is of primary value only to the creator of the library.
When you use them, you generally don’t care that they’re adapters,
but instead that they solve your problem. Admittedly there are times when
it’s useful to know that you can choose an alternate implementation or
build an adapter from an existing container object, but that’s generally
one level removed from the adapter’s behavior. So, while you may see it
emphasized elsewhere that a particular container is an adapter, I shall only
point out that fact when it’s useful. Note that each type of adapter has a
default container that it’s built upon, and this default is the most
sensible implementation, so in most cases you won’t need to concern
yourself with the underlying implementation.
The following example shows
stack<string> implemented in the three possible ways: the default
(which uses deque), with a vector and with a
list:
//: C07:Stack1.cpp // Demonstrates the STL stack //{L} ../TestSuite/Test #include <iostream> #include <fstream> #include <stack> #include <list> #include <vector> #include <string> using namespace std; // Default: deque<string>: typedef stack<string> Stack1; // Use a vector<string>: typedef stack<string, vector<string> > Stack2; // Use a list<string>: typedef stack<string, list<string> > Stack3; int main() { ifstream in("Stack1.cpp"); Stack1 textlines; // Try the different versions // Read file and store lines in the stack: string line; while(getline(in, line)) textlines.push(line + "\n"); // Print lines from the stack and pop them: while(!textlines.empty()) { cout << textlines.top(); textlines.pop(); } } ///:~
The top( ) and
pop( ) operations will probably seem non-intuitive if you’ve
used other stack classes. When you call pop( ) it returns
void rather than the top element that you might have expected. If you want the
top element, you get a reference to it with top( ). It turns out
this is more efficient, since a traditional pop( ) would have to
return a value rather than a reference, and thus invoke the copy-constructor.
When you’re using a stack (or a priority_queue, described
later) you can efficiently refer to top( ) as many times as you
want, then discard the top element explicitly using pop( ) (perhaps
if some other term than the familiar “pop” had been used, this would
have been a bit clearer).
The stack template has a very
simple interface, essentially the member functions you see above. It
doesn’t have sophisticated forms of initialization or access, but if you
need that you can use the underlying container that the stack is
implemented upon. For example, suppose you have a function that expects a
stack interface but in the rest of your program you need the objects
stored in a list. The following program stores each line of a file along
with the leading number of spaces in that line (you might imagine it as a
starting point for performing some kinds of source-code
reformatting):
//: C07:Stack2.cpp // Converting a list to a stack //{L} ../TestSuite/Test //{-msc} #include <iostream> #include <fstream> #include <stack> #include <list> #include <string> using namespace std; // Expects a stack: template<class Stk> void stackOut(Stk& s, ostream& os = cout) { while(!s.empty()) { os << s.top() << "\n"; s.pop(); } } class Line { string line; // Without leading spaces int lspaces; // Number of leading spaces public: Line(string s) : line(s) { lspaces = line.find_first_not_of(' '); if(lspaces == string::npos) lspaces = 0; line = line.substr(lspaces); } friend ostream& operator<<(ostream& os, const Line& l) { for(int i = 0; i < l.lspaces; i++) os << ' '; return os << l.line; } // Other functions here... }; int main() { ifstream in("Stack2.cpp"); list<Line> lines; // Read file and store lines in the list: string s; while(getline(in, s)) lines.push_front(s); // Turn the list into a stack for printing: stack<Line, list<Line> > stk(lines); stackOut(stk); } ///:~
The function that requires the
stack interface just sends each top( ) object to an
ostream and then removes it by calling pop( ). The
Line class determines the number of leading spaces, then stores the
contents of the line without the leading spaces. The ostream
operator<< re-inserts the leading spaces so the line prints
properly, but you can easily change the number of spaces by changing the value
of lspaces (the member functions to do this are not shown
here).
In main( ), the input file is
read into a list<Line>, then a stack is wrapped around this
list so it can be sent to stackOut( ).
You cannot iterate through a
stack; this emphasizes that you only want to perform stack
operations when you create a stack. You can get equivalent
“stack” functionality using a vector and its
back( ), push_back( ) and pop_back( )
methods, and then you have all the additional functionality of the
vector. Stack1.cpp can be rewritten to show this:
//: C07:Stack3.cpp // Using a vector as a stack; modified Stack1.cpp //{L} ../TestSuite/Test #include <iostream> #include <fstream> #include <vector> #include <string> using namespace std; int main() { ifstream in("Stack3.cpp"); vector<string> textlines; string line; while(getline(in, line)) textlines.push_back(line + "\n"); while(!textlines.empty()) { cout << textlines.back(); textlines.pop_back(); } } ///:~
You’ll see this produces the same
output as Stack1.cpp, but you can now perform vector operations as
well. Of course, list has the additional ability to push things at the
front, but it’s generally less efficient than using
push_back( ) with vector. (In addition, deque is
usually more efficient than list for pushing things at the
front).
The queue is a restricted form of
a deque – you can only enter elements at one end, and pull them off
the other end. Functionally, you could use a deque anywhere you need a
queue, and you would then also have the additional functionality of the
deque. The only reason you need to use a queue rather than a
deque, then, is if you want to emphasize that you will only be performing
queue-like behavior.
The queue is an adapter class like
stack, in that it is built on top of another sequence container. As you
might guess, the ideal implementation for a queue is a deque, and
that is the default template argument for the queue; you’ll rarely
need a different implementation.
Queues are often used when modeling
systems where some elements of the system are waiting to be served by other
elements in the system. A classic example of this is the “bank-teller
problem,” where you have customers arriving at random intervals, getting
into a line, and then being served by a set of tellers. Since the customers
arrive randomly and each take a random amount of time to be served,
there’s no way to deterministically know how long the line will be at any
time. However, it’s possible to simulate the situation and see what
happens.
A problem in performing this simulation
is the fact that, in effect, each customer and teller should be run by a
separate process. What we’d like is a multithreaded environment, then each
customer or teller would have their own thread. However, Standard C++ has no
model for multithreading so there is no standard solution to this problem. On
the other hand, with a little adjustment to the code it’s possible to
simulate enough multithreading to provide a satisfactory solution to our
problem.
Multithreading means you have multiple
threads of control running at once, in the same address space (this differs from
multitasking, where you have different processes each running in their
own address space). The trick is that you have fewer CPUs than you do threads
(and very often only one CPU) so to give the illusion that each thread has its
own CPU there is a time-slicing mechanism that says “OK, current
thread – you’ve had enough time. I’m going to stop you and go
give time to some other thread.” This automatic stopping and starting of
threads is called pre-emptive and it means you don’t need to manage
the threading process at all.
An alternative approach is for each
thread to voluntarily yield the CPU to the scheduler, which then goes and finds
another thread that needs running. This is easier to synthesize, but it still
requires a method of “swapping” out one thread and swapping in
another (this usually involves saving the stack frame and using the standard C
library functions setjmp( ) and longjmp( ); see my
article in the (XX) issue of Computer Language magazine for an example). So
instead, we’ll build the time-slicing into the classes in the system. In
this case, it will be the tellers that represent the “threads,” (the
customers will be passive) so each teller will have an infinite-looping
run( ) method that will execute for a certain number of “time
units,” and then simply return. By using the ordinary return mechanism, we
eliminate the need for any swapping. The resulting program, although small,
provides a remarkably reasonable simulation:
//: C07:BankTeller.cpp // Using a queue and simulated multithreading // To model a bank teller system //{L} ../TestSuite/Test #include <iostream> #include <queue> #include <list> #include <cstdlib> #include <ctime> using namespace std; class Customer { int serviceTime; public: Customer() : serviceTime(0) {} Customer(int tm) : serviceTime(tm) {} int getTime() { return serviceTime; } void setTime(int newtime) { serviceTime = newtime; } friend ostream& operator<<(ostream& os, const Customer& c) { return os << '[' << c.serviceTime << ']'; } }; class Teller { queue<Customer>& customers; Customer current; enum { slice = 5 }; int ttime; // Time left in slice bool busy; // Is teller serving a customer? public: Teller(queue<Customer>& cq) : customers(cq), ttime(0), busy(false) {} Teller& operator=(const Teller& rv) { customers = rv.customers; current = rv.current; ttime = rv.ttime; busy = rv.busy; return *this; } bool isBusy() { return busy; } void run(bool recursion = false) { if(!recursion) ttime = slice; int servtime = current.getTime(); if(servtime > ttime) { servtime -= ttime; current.setTime(servtime); busy = true; // Still working on current return; } if(servtime < ttime) { ttime -= servtime; if(!customers.empty()) { current = customers.front(); customers.pop(); // Remove it busy = true; run(true); // Recurse } return; } if(servtime == ttime) { // Done with current, set to empty: current = Customer(0); busy = false; return; // No more time in this slice } } }; // Inherit to access protected implementation: class CustomerQ : public queue<Customer> { public: friend ostream& operator<<(ostream& os, const CustomerQ& cd) { copy(cd.c.begin(), cd.c.end(), ostream_iterator<Customer>(os, "")); return os; } }; int main() { CustomerQ customers; list<Teller> tellers; typedef list<Teller>::iterator TellIt; tellers.push_back(Teller(customers)); srand(time(0)); // Seed random number generator clock_t ticks = clock(); // Run simulation for at least 5 seconds: while(clock() < ticks + 5 * CLK_TCK) { // Add a random number of customers to the // queue, with random service times: for(int i = 0; i < rand() % 5; i++) customers.push(Customer(rand() % 15 + 1)); cout << '{' << tellers.size() << '}' << customers << endl; // Have the tellers service the queue: for(TellIt i = tellers.begin(); i != tellers.end(); i++) (*i).run(); cout << '{' << tellers.size() << '}' << customers << endl; // If line is too long, add another teller: if(customers.size() / tellers.size() > 2) tellers.push_back(Teller(customers)); // If line is short enough, remove a teller: if(tellers.size() > 1 && customers.size() / tellers.size() < 2) for(TellIt i = tellers.begin(); i != tellers.end(); i++) if(!(*i).isBusy()) { tellers.erase(i); break; // Out of for loop } } } ///:~
Each customer requires a certain amount
of service time, which is the number of time units that a teller must spend on
the customer in order to serve that customer’s needs. Of course, the
amount of service time will be different for each customer, and will be
determined randomly. In addition, you won’t know how many customers will
be arriving in each interval, so this will also be determined randomly.
The Customer objects are kept in a
queue<Customer>, and each Teller object keeps a reference to
that queue. When a Teller object is finished with its current
Customer object, that Teller will get another Customer from
the queue and begin working on the new Customer, reducing the
Customer’s service time during each time slice that the
Teller is allotted. All this logic is in the run( ) member
function, which is basically a three-way if statement based on whether
the amount of time necessary to serve the customer is less than, greater than or
equal to the amount of time left in the teller’s current time slice.
Notice that if the Teller has more time after finishing with a
Customer, it gets a new customer and recurses into
itself.
Just as with a stack, when you use
a queue, it’s only a queue and doesn’t have any of the
other functionality of the basic sequence containers. This includes the ability
to get an iterator in order to step through the stack. However, the
underlying sequence container (that the queue is built upon) is held as a
protected member inside the queue, and the identifier for this
member is specified in the C++ Standard as ‘c’, which means
that you can inherit from queue in order to access the underlying
implementation. The CustomerQ class does exactly that, for the sole
purpose of defining an ostream operator<< that can iterate
through the queue and print out its members.
The driver for the simulation is the
while loop in main( ), which uses processor ticks (defined in
<ctime>) to determine if the simulation has run for at least 5
seconds. At the beginning of each pass through the loop, a random number of
customers are added, with random service times. Both the number of tellers and
the queue contents are displayed so you can see the state of the system. After
running each teller, the display is repeated. At this point, the system adapts
by comparing the number of customers and the number of tellers; if the line is
too long another teller is added and if it is short enough a teller can be
removed. It is in this adaptation section of the program that you can experiment
with policies regarding the optimal addition and removal of tellers. If this is
the only section that you’re modifying, you may want to encapsulate
policies inside of different
objects.
When you push( ) an object
onto a priority_queue, that object is sorted into the queue according to
a function or function object (you can allow the default less template to
supply this, or provide one of your own). The priority_queue ensures that
when you look at the top( ) element, it will be the one with the
highest priority. When you’re done with it, you call pop( ) to
remove it and bring the next one into place. Thus, the priority_queue has
nearly the same interface as a stack, but it behaves
differently.
Like stack and queue,
priority_queue is an adapter which is built on top of one of the basic
sequences – the default is vector.
It’s trivial to make a
priority_queue that works with ints:
//: C07:PriorityQueue1.cpp //{L} ../TestSuite/Test #include <iostream> #include <queue> #include <cstdlib> #include <ctime> using namespace std; int main() { priority_queue<int> pqi; srand(time(0)); // Seed random number generator for(int i = 0; i < 100; i++) pqi.push(rand() % 25); while(!pqi.empty()) { cout << pqi.top() << ' '; pqi.pop(); } } ///:~
This pushes into the priority_queue
100 random values from 0 to 24. When you run this program you’ll see
that duplicates are allowed, and the highest values appear first. To show how
you can change the ordering by providing your own function or function object,
the following program gives lower-valued numbers the highest
priority:
//: C07:PriorityQueue2.cpp // Changing the priority //{L} ../TestSuite/Test #include <iostream> #include <queue> #include <cstdlib> #include <ctime> using namespace std; struct Reverse { bool operator()(int x, int y) { return y < x; } }; int main() { priority_queue<int, vector<int>, Reverse> pqi; // Could also say: // priority_queue<int, vector<int>, // greater<int> > pqi; srand(time(0)); for(int i = 0; i < 100; i++) pqi.push(rand() % 25); while(!pqi.empty()) { cout << pqi.top() << ' '; pqi.pop(); } } ///:~
Although you can easily use the Standard
Library greater template to produce the predicate, I went to the trouble
of creating Reverse so you could see how to do it in case you have a more
complex scheme for ordering your objects.
If you look at the description for
priority_queue, you see that the constructor can be handed a
“Compare” object, as shown above. If you don’t use your own
“Compare” object, the default template behavior is the less
template function. You might think (as I did) that it would make sense to
leave the template instantiation as priority_queue<int>, thus using
the default template arguments of vector<int> and
less<int>. Then you could inherit a new class from
less<int>, redefine operator( ) and hand an object of
that type to the priority_queue constructor. I tried this, and got it to
compile, but the resulting program produced the same old less<int>
behavior. The answer lies in the less< > template:
template <class T> struct less : binary_function<T, T, bool> { // Other stuff... bool operator()(const T& x, const T& y) const { return x < y; } };
The operator( ) is not
virtual, so even though the constructor takes your subclass of
less<int> by reference (thus it doesn’t slice it down to a
plain less<int>), when operator( ) is called, it is the
base-class version that is used. While it is generally reasonable to expect
ordinary classes to behave polymorphically, you cannot make this assumption when
using the STL.
Of course, a priority_queue of
int is trivial. A more interesting problem is a to-do list, where each
object contains a string and a primary and secondary priority
value:
//: C07:PriorityQueue3.cpp // A more complex use of priority_queue //{L} ../TestSuite/Test #include <iostream> #include <queue> #include <string> using namespace std; class ToDoItem { char primary; int secondary; string item; public: ToDoItem(string td, char pri ='A', int sec =1) : item(td), primary(pri), secondary(sec) {} friend bool operator<( const ToDoItem& x, const ToDoItem& y) { if(x.primary > y.primary) return true; if(x.primary == y.primary) if(x.secondary > y.secondary) return true; return false; } friend ostream& operator<<(ostream& os, const ToDoItem& td) { return os << td.primary << td.secondary << ": " << td.item; } }; int main() { priority_queue<ToDoItem> toDoList; toDoList.push(ToDoItem("Empty trash", 'C', 4)); toDoList.push(ToDoItem("Feed dog", 'A', 2)); toDoList.push(ToDoItem("Feed bird", 'B', 7)); toDoList.push(ToDoItem("Mow lawn", 'C', 3)); toDoList.push(ToDoItem("Water lawn", 'A', 1)); toDoList.push(ToDoItem("Feed cat", 'B', 1)); while(!toDoList.empty()) { cout << toDoList.top() << endl; toDoList.pop(); } } ///:~
ToDoItem’s
operator< must be a non-member function for it to work with
less< >. Other than that, everything happens automatically. The
output is:
A1: Water lawn A2: Feed dog B1: Feed cat B7: Feed bird C3: Mow lawn C4: Empty trash
Note that you cannot iterate through a
priority_queue. However, it is possible to emulate the behavior of a
priority_queue using a vector, thus allowing you access to that
vector. You can do this by looking at the implementation of
priority_queue, which uses make_heap( ),
push_heap( ) and pop_heap( ) (they are the soul of the
priority_queue; in fact you could say that the heap is the
priority queue and priority_queue is just a wrapper around it). This
turns out to be reasonably straightforward, but you might think that a shortcut
is possible. Since the container used by priority_queue is
protected (and has the identifier, according to the Standard C++
specification, named c) you can inherit a new class which provides access
to the underlying implementation:
//: C07:PriorityQueue4.cpp // Manipulating the underlying implementation //{L} ../TestSuite/Test #include <iostream> #include <queue> #include <cstdlib> #include <ctime> using namespace std; class PQI : public priority_queue<int> { public: vector<int>& impl() { return c; } }; int main() { PQI pqi; srand(time(0)); for(int i = 0; i < 100; i++) pqi.push(rand() % 25); copy(pqi.impl().begin(), pqi.impl().end(), ostream_iterator<int>(cout, " ")); cout << endl; while(!pqi.empty()) { cout << pqi.top() << ' '; pqi.pop(); } } ///:~
However, if you run this program
you’ll discover that the vector doesn’t contain the items in
the descending order that you get when you call pop( ), the order
that you want from the priority queue. It would seem that if you want to create
a vector that is a priority queue, you have to do it by hand, like
this:
//: C07:PriorityQueue5.cpp // Building your own priority queue //{L} ../TestSuite/Test #include <iostream> #include <queue> #include <cstdlib> #include <ctime> using namespace std; template<class T, class Compare> class PQV : public vector<T> { Compare comp; public: PQV(Compare cmp = Compare()) : comp(cmp) { make_heap(begin(), end(), comp); } const T& top() { return front(); } void push(const T& x) { push_back(x); push_heap(begin(), end(), comp); } void pop() { pop_heap(begin(), end(), comp); pop_back(); } }; int main() { PQV<int, less<int> > pqi; srand(time(0)); for(int i = 0; i < 100; i++) pqi.push(rand() % 25); copy(pqi.begin(), pqi.end(), ostream_iterator<int>(cout, " ")); cout << endl; while(!pqi.empty()) { cout << pqi.top() << ' '; pqi.pop(); } } ///:~
But this program behaves in the same way
as the previous one! What you are seeing in the underlying vector is
called a heap. This heap represents the tree of the priority queue
(stored in the linear structure of the vector), but when you iterate
through it you do not get a linear priority-queue order. You might think that
you can simply call sort_heap( ), but that only works once, and then
you don’t have a heap anymore, but instead a sorted list. This means that
to go back to using it as a heap the user must remember to call
make_heap( ) first. This can be encapsulated into your custom
priority queue:
//: C07:PriorityQueue6.cpp //{L} ../TestSuite/Test #include <iostream> #include <queue> #include <algorithm> #include <cstdlib> #include <ctime> using namespace std; template<class T, class Compare> class PQV : public vector<T> { Compare comp; bool sorted; void assureHeap() { if(sorted) { // Turn it back into a heap: make_heap(begin(), end(), comp); sorted = false; } } public: PQV(Compare cmp = Compare()) : comp(cmp) { make_heap(begin(), end(), comp); sorted = false; } const T& top() { assureHeap(); return front(); } void push(const T& x) { assureHeap(); // Put it at the end: push_back(x); // Re-adjust the heap: push_heap(begin(), end(), comp); } void pop() { assureHeap(); // Move the top element to the last position: pop_heap(begin(), end(), comp); // Remove that element: pop_back(); } void sort() { if(!sorted) { sort_heap(begin(), end(), comp); reverse(begin(), end()); sorted = true; } } }; int main() { PQV<int, less<int> > pqi; srand(time(0)); for(int i = 0; i < 100; i++) { pqi.push(rand() % 25); copy(pqi.begin(), pqi.end(), ostream_iterator<int>(cout, " ")); cout << "\n-----\n"; } pqi.sort(); copy(pqi.begin(), pqi.end(), ostream_iterator<int>(cout, " ")); cout << "\n-----\n"; while(!pqi.empty()) { cout << pqi.top() << ' '; pqi.pop(); } } ///:~
If sorted is true, then the
vector is not organized as a heap, but instead as a sorted sequence.
assureHeap( ) guarantees that it’s put back into heap form
before performing any heap operations on it.
The first for loop in
main( ) now has the additional quality that it displays the heap as
it’s being built.
The only drawback to this solution is
that the user must remember to call sort( ) before viewing it as a
sorted sequence (although one could conceivably override all the methods that
produce iterators so that they guarantee sorting). Another solution is to build
a priority queue that is not a vector, but will build you a vector
whenever you want one:
//: C07:PriorityQueue7.cpp // A priority queue that will hand you a vector //{L} ../TestSuite/Test #include <iostream> #include <queue> #include <algorithm> #include <cstdlib> #include <ctime> using namespace std; template<class T, class Compare> class PQV { vector<T> v; Compare comp; public: // Don't need to call make_heap(); it's empty: PQV(Compare cmp = Compare()) : comp(cmp) {} void push(const T& x) { // Put it at the end: v.push_back(x); // Re-adjust the heap: push_heap(v.begin(), v.end(), comp); } void pop() { // Move the top element to the last position: pop_heap(v.begin(), v.end(), comp); // Remove that element: v.pop_back(); } const T& top() { return v.front(); } bool empty() const { return v.empty(); } int size() const { return v.size(); } typedef vector<T> TVec; TVec vector() { TVec r(v.begin(), v.end()); // It’s already a heap sort_heap(r.begin(), r.end(), comp); // Put it into priority-queue order: reverse(r.begin(), r.end()); return r; } }; int main() { PQV<int, less<int> > pqi; srand(time(0)); for(int i = 0; i < 100; i++) pqi.push(rand() % 25); const vector<int>& v = pqi.vector(); copy(v.begin(), v.end(), ostream_iterator<int>(cout, " ")); cout << "\n-----------\n"; while(!pqi.empty()) { cout << pqi.top() << ' '; pqi.pop(); } } ///:~
PQV follows the same form as the
STL’s priority_queue, but has the additional member
vector( ), which creates a new vector that’s a copy of
the one in PQV (which means that it’s already a heap), then sorts
it (thus it leave’s PQV’s vector untouched), then
reverses the order so that traversing the new vector produces the same
effect as popping the elements from the priority queue.
You may observe that the approach of
inheriting from priority_queue used in PriorityQueue4.cpp could be
used with the above technique to produce more succinct code:
//: C07:PriorityQueue8.cpp // A more compact version of PriorityQueue7.cpp //{L} ../TestSuite/Test #include <iostream> #include <queue> #include <algorithm> #include <cstdlib> #include <ctime> using namespace std; template<class T> class PQV : public priority_queue<T> { public: typedef vector<T> TVec; TVec vector() { TVec r(c.begin(), c.end()); // c is already a heap sort_heap(r.begin(), r.end(), comp); // Put it into priority-queue order: reverse(r.begin(), r.end()); return r; } }; int main() { PQV<int> pqi; srand(time(0)); for(int i = 0; i < 100; i++) pqi.push(rand() % 25); const vector<int>& v = pqi.vector(); copy(v.begin(), v.end(), ostream_iterator<int>(cout, " ")); cout << "\n-----------\n"; while(!pqi.empty()) { cout << pqi.top() << ' '; pqi.pop(); } } ///:~
The brevity of this solution makes it the
simplest and most desirable, plus it’s guaranteed that the user will not
have a vector in the unsorted state. The only potential problem is that
the vector( ) member function returns the vector<T> by
value, which might cause some overhead issues with complex values of the
parameter type T.
Most of my computer education was in
hardware-level design and programming, and I spent my first few years doing
embedded systems development. Because C was a language that purported to be
“close to the hardware,” I have always found it dismaying that there
was no native binary representation for numbers. Decimal, of course, and
hexadecimal (tolerable only because it’s easier to group the bits in your
mind), but octal? Ugh. Whenever you read specs for chips you’re trying to
program, they don’t describe the chip registers in octal, or even
hexadecimal – they use binary. And yet C won’t let you say
0b0101101, which is the obvious solution for a language close to the
hardware.
Although there’s still no native
binary representation in C++, things have improved with the addition of two
classes: bitset and vector<bool>, both of which are designed
to manipulate a group of on-off values. The primary differences between these
types are:
There is no
trivial conversion between a bitset and a vector<bool>,
which implies that the two are for very different
purposes.
The template for bitset accepts an
integral template argument which is the number of bits to represent. Thus,
bitset<10> is a different type than bitset<20>, and
you cannot perform comparisons, assignments, etc. between the
two.
A bitset provides virtually any
bit operation that you could ask for, in a very efficient form. However, each
bitset is made up of an integral number of longs (typically 32
bits), so even though it uses no more space than it needs, it always uses at
least the size of a long. This means you’ll use space most
efficiently if you increase the size of your bitsets in chunks of the
number of bits in a long. In addition, the only conversion from a
bitset to a numerical value is to an unsigned long, which means
that 32 bits (if your long is the typical size) is the most flexible form
of a bitset.
The following example tests almost all
the functionality of the bitset (the missing operations are redundant or
trivial). You’ll see the description of each of the bitset outputs
to the right of the output so that the bits all line up and you can compare them
to the source values. If you still don’t understand bitwise operations,
running this program should help.
//: C07:BitSet.cpp // Exercising the bitset class //{L} ../TestSuite/Test //{-bor} //{-g++295} //{-g++3} #include <iostream> #include <bitset> #include <cstdlib> #include <ctime> #include <climits> #include <string> using namespace std; const int sz = 32; typedef bitset<sz> BS; template<int bits> bitset<bits> randBitset() { bitset<bits> r(rand()); for(int i = 0; i < bits/16 - 1; i++) { r <<= 16; // "OR" together with a new lower 16 bits: r |= bitset<bits>(rand()); } return r; } int main() { srand(time(0)); cout << "sizeof(bitset<16>) = " << sizeof(bitset<16>) << endl; cout << "sizeof(bitset<32>) = " << sizeof(bitset<32>) << endl; cout << "sizeof(bitset<48>) = " << sizeof(bitset<48>) << endl; cout << "sizeof(bitset<64>) = " << sizeof(bitset<64>) << endl; cout << "sizeof(bitset<65>) = " << sizeof(bitset<65>) << endl; BS a(randBitset<sz>()), b(randBitset<sz>()); // Converting from a bitset: unsigned long ul = a.to_ulong(); string s = b.to_string(); // Converting a string to a bitset: char* cbits = "111011010110111"; cout << "char* cbits = " << cbits <<endl; cout << BS(cbits) << " [BS(cbits)]" << endl; cout << BS(cbits, 2) << " [BS(cbits, 2)]" << endl; cout << BS(cbits, 2, 11) << " [BS(cbits, 2, 11)]" << endl; cout << a << " [a]" << endl; cout << b << " [b]"<< endl; // Bitwise AND: cout << (a & b) << " [a & b]" << endl; cout << (BS(a) &= b) << " [a &= b]" << endl; // Bitwise OR: cout << (a | b) << " [a | b]" << endl; cout << (BS(a) |= b) << " [a |= b]" << endl; // Exclusive OR: cout << (a ^ b) << " [a ^ b]" << endl; cout << (BS(a) ^= b) << " [a ^= b]" << endl; cout << a << " [a]" << endl; // For reference // Logical left shift (fill with zeros): cout << (BS(a) <<= sz/2) << " [a <<= (sz/2)]" << endl; cout << (a << sz/2) << endl; cout << a << " [a]" << endl; // For reference // Logical right shift (fill with zeros): cout << (BS(a) >>= sz/2) << " [a >>= (sz/2)]" << endl; cout << (a >> sz/2) << endl; cout << a << " [a]" << endl; // For reference cout << BS(a).set() << " [a.set()]" << endl; for(int i = 0; i < sz; i++) if(!a.test(i)) { cout << BS(a).set(i) << " [a.set(" << i <<")]" << endl; break; // Just do one example of this } cout << BS(a).reset() << " [a.reset()]"<< endl; for(int j = 0; j < sz; j++) if(a.test(j)) { cout << BS(a).reset(j) << " [a.reset(" << j <<")]" << endl; break; // Just do one example of this } cout << BS(a).flip() << " [a.flip()]" << endl; cout << ~a << " [~a]" << endl; cout << a << " [a]" << endl; // For reference cout << BS(a).flip(1) << " [a.flip(1)]"<< endl; BS c; cout << c << " [c]" << endl; cout << "c.count() = " << c.count() << endl; cout << "c.any() = " << (c.any() ? "true" : "false") << endl; cout << "c.none() = " << (c.none() ? "true" : "false") << endl; c[1].flip(); c[2].flip(); cout << c << " [c]" << endl; cout << "c.count() = " << c.count() << endl; cout << "c.any() = " << (c.any() ? "true" : "false") << endl; cout << "c.none() = " << (c.none() ? "true" : "false") << endl; // Array indexing operations: c.reset(); for(int k = 0; k < c.size(); k++) if(k % 2 == 0) c[k].flip(); cout << c << " [c]" << endl; c.reset(); // Assignment to bool: for(int ii = 0; ii < c.size(); ii++) c[ii] = (rand() % 100) < 25; cout << c << " [c]" << endl; // bool test: if(c[1] == true) cout << "c[1] == true"; else cout << "c[1] == false" << endl; } ///:~
To generate interesting random
bitsets, the randBitset( ) function is created. The Standard
C rand( ) function only generates an int, so this function
demonstrates operator<<= by shifting each 16 random bits to the
left until the bitset (which is templatized in this function for size) is
full. The generated number and each new 16 bits is combined using the
operator|=.
The first thing demonstrated in
main( ) is the unit size of a bitset. If it is less than 32
bits, sizeof produces 4 (4 bytes = 32 bits), which is the size of a
single long on most implementations. If it’s between 32 and 64, it
requires two longs, greater than 64 requires 3 longs, etc. Thus
you make the best use of space if you use a bit quantity that fits in an
integral number of longs. However, notice there’s no extra overhead
for the object – it’s as if you were hand-coding to use a
long.
Another clue that bitset is
optimized for longs is that there is a to_ulong( ) member
function that produces the value of the bitset as an unsigned long. There
are no other numerical conversions from bitset, but there is a
to_string( ) conversion that produces a string containing
ones and zeros, and this can be as long as the actual bitset. However,
using bitset<32> may make your life simpler because of
to_ulong( ).
There’s still no primitive format
for binary values, but the next best thing is supported by bitset: a
string of ones and zeros with the least-significant bit (lsb) on the
right. The three constructors demonstrated show taking the entire string
(the char array is automatically converted to a string), the
string starting at character 2, and the string from character 2 through
11. You can write to an ostream from a bitset using
operator<< and it comes out as ones and zeros. You can also read
from an istream using operator>> (not shown
here).
You’ll notice that bitset
only has three non-member operators: and (&), or
(|) and exclusive-or (^). Each of these create a new
bitset as their return value. All of the member operators opt for the
more efficient &=, |=, etc. form where a temporary is not
created. However, these forms actually change their lvalue (which is a in
most of the tests in the above example). To prevent this, I created a temporary
to be used as the lvalue by invoking the copy-constructor on a; this is
why you see the form BS(a). The result of each test is printed out, and
occasionally a is reprinted so you can easily look at it for
reference.
The rest of the example should be
self-explanatory when you run it; if not you can find the details in your
compiler’s documentation or the other documentation mentioned earlier in
this chapter.
vector<bool> is a
specialization of the vector template. A normal bool variable
requires at least one byte, but since a bool only has two states the
ideal implementation of vector<bool> is such that each bool
value only requires one bit. This means the iterator must be specially-defined,
and cannot be a bool*.
The bit-manipulation functions for
vector<bool> are much more limited than those of bitset. The
only member function that was added to those already in vector is
flip( ), to invert all the bits; there is no set( ) or
reset( ) as in bitset. When you use operator[ ], you
get back an object of type vector<bool>::reference, which also has
a flip( ) to invert that individual bit.
//: C07:VectorOfBool.cpp // Demonstrate the vector<bool> specialization //{L} ../TestSuite/Test //{-msc} //{-g++295} #include <iostream> #include <sstream> #include <vector> #include <bitset> #include <iterator> using namespace std; int main() { vector<bool> vb(10, true); vector<bool>::iterator it; for(it = vb.begin(); it != vb.end(); it++) cout << *it; cout << endl; vb.push_back(false); ostream_iterator<bool> out(cout, ""); copy(vb.begin(), vb.end(), out); cout << endl; bool ab[] = { true, false, false, true, true, true, true, false, false, true }; // There's a similar constructor: vb.assign(ab, ab + sizeof(ab)/sizeof(bool)); copy(vb.begin(), vb.end(), out); cout << endl; vb.flip(); // Flip all bits copy(vb.begin(), vb.end(), out); cout << endl; for(int i = 0; i < vb.size(); i++) vb[i] = 0; // (Equivalent to "false") vb[4] = true; vb[5] = 1; vb[7].flip(); // Invert one bit copy(vb.begin(), vb.end(), out); cout << endl; // Convert to a bitset: ostringstream os; copy(vb.begin(), vb.end(), ostream_iterator<bool>(os, "")); bitset<10> bs(os.str()); cout << "Bitset:\n" << bs << endl; } ///:~
The last part of this example takes a
vector<bool> and converts it to a bitset by first turning it
into a string of ones and zeros. Of course, you must know the size of the
bitset at compile-time. You can see that this conversion is not the kind
of operation you’ll want to do on a regular
basis.
The set, map,
multiset and multimap are called associative containers
because they associate keys with values. Well, at least
maps and multimaps associate keys to values, but you can look at a
set as a map that has no values, only keys (and they can in fact
be implemented this way), and the same for the relationship between
multiset and multimap. So, because of the structural similarity
sets and multisets are lumped in with associative
containers.
The most important basic operations with
associative containers are putting things in, and in the case of a set,
seeing if something is in the set. In the case of a map, you want to
first see if a key is in the map, and if it exists you want the
associated value for that key to be returned. Of course, there are many
variations on this theme but that’s the fundamental concept. The following
example shows these basics:
//: C07:AssociativeBasics.cpp // Basic operations with sets and maps //{L} ../TestSuite/Test #include "Noisy.h" #include <iostream> #include <set> #include <map> using namespace std; int main() { Noisy na[] = { Noisy(), Noisy(), Noisy(), Noisy(), Noisy(), Noisy(), Noisy() }; // Add elements via constructor: set<Noisy> ns(na, na+ sizeof na/sizeof(Noisy)); // Ordinary insertion: Noisy n; ns.insert(n); cout << endl; // Check for set membership: cout << "ns.count(n)= " << ns.count(n) << endl; if(ns.find(n) != ns.end()) cout << "n(" << n << ") found in ns" << endl; // Print elements: copy(ns.begin(), ns.end(), ostream_iterator<Noisy>(cout, " ")); cout << endl; cout << "\n-----------\n"; map<int, Noisy> nm; for(int i = 0; i < 10; i++) nm[i]; // Automatically makes pairs cout << "\n-----------\n"; for(int j = 0; j < nm.size(); j++) cout << "nm[" << j <<"] = " << nm[j] << endl; cout << "\n-----------\n"; nm[10] = n; cout << "\n-----------\n"; nm.insert(make_pair(47, n)); cout << "\n-----------\n"; cout << "\n nm.count(10)= " << nm.count(10) << endl; cout << "nm.count(11)= " << nm.count(11) << endl; map<int, Noisy>::iterator it = nm.find(6); if(it != nm.end()) cout << "value:" << (*it).second << " found in nm at location 6" << endl; for(it = nm.begin(); it != nm.end(); it++) cout << (*it).first << ":" << (*it).second << ", "; cout << "\n-----------\n"; } ///:~
The set<Noisy> object
ns is created using two iterators into an array of Noisy objects,
but there is also a default constructor and a copy-constructor, and you can pass
in an object that provides an alternate scheme for doing comparisons. Both
sets and maps have an insert( ) member function to put
things in, and there are a couple of different ways to check to see if an object
is already in an associative container: count( ), when given a key,
will tell you how many times that key occurs (this can only be zero or one in a
set or map, but it can be more than one with a multiset or
multimap). The find( ) member function will produce an
iterator indicating the first occurrence (with set and map, the
only occurrence) of the key that you give it, or the past-the-end
iterator if it can’t find the key. The count( ) and
find( ) member functions exist for all the associative containers,
which makes sense. The associative containers also have member functions
lower_bound( ), upper_bound( ) and
equal_range( ), which actually only make sense for multiset
and multimap, as you shall see (but don’t try to figure out how
they would be useful for set and map, since they are designed for
dealing with a range of duplicate keys, which those containers don’t
allow).
Designing an operator[ ] always
produces a little bit of a dilemma because it’s intended to be treated as
an array-indexing operation, so people don’t tend to think about
performing a test before they use it. But what happens if you decide to index
out of the bounds of the array? One option, of course, is to throw an exception,
but with a map “indexing out of the array” could mean that
you want an entry there, and that’s the way the STL map treats it.
The first for loop after the creation of the map<int, Noisy>
nm just “looks up” objects using the operator[ ], but
this is actually creating new Noisy objects! The map creates a new
key-value pair (using the default constructor for the value) if you look up a
value with operator[ ] and it isn’t there. This means that if you
really just want to look something up and not create a new entry, you must use
count( ) (to see if it’s there) or find( ) (to get
an iterator to it).
The for loop that prints out the
values of the container using operator[ ] has a number of problems.
First, it requires integral keys (which we happen to have in this case). Next
and worse, if all the keys are not sequential, you’ll end up counting from
0 to the size of the container, and if there are some spots which don’t
have key-value pairs you’ll automatically create them, and miss some of
the higher values of the keys. Finally, if you look at the output from the
for loop you’ll see that things are very busy, and
it’s quite puzzling at first why there are so many constructions and
destructions for what appears to be a simple lookup. The answer only becomes
clear when you look at the code in the map template for operator[
], which will be something like this:
mapped_type& operator[] (const key_type& k) { value_type tmp(k,T()); return (*((insert(tmp)).first)).second; }
Following the trail, you’ll find
that map::value_type is:
typedef pair<const Key, T> value_type;
Now you need to know what a pair
is, which can be found in <utility>:
template <class T1, class T2> struct pair { typedef T1 first_type; typedef T2 second_type; T1 first; T2 second; pair(); pair(const T1& x, const T2& y) : first(x), second(y) {} // Templatized copy-constructor: template<class U, class V> pair(const pair<U, V> &p); };
It turns out this is a very important
(albeit simple) struct which is used quite a bit in the STL. All it
really does it package together two objects, but it’s very useful,
especially when you want to return two objects from a function (since a
return statement only takes one object). There’s even a shorthand
for creating a pair called make_pair( ), which is used in
AssociativeBasics.cpp.
So to retrace the steps,
map::value_type is a pair of the key and the value of the map
– actually, it’s a single entry for the map. But notice that
pair packages its objects by value, which means that copy-constructions
are necessary to get the objects into the pair. Thus, the creation of
tmp in map::operator[ ] will involve at least a copy-constructor
call and destructor call for each object in the pair. Here, we’re
getting off easy because the key is an int. But if you want to really see
what kind of activity can result from map::operator[ ], try running
this:
//: C07:NoisyMap.cpp // Mapping Noisy to Noisy //{L} ../TestSuite/Test #include "Noisy.h" #include <map> using namespace std; int main() { map<Noisy, Noisy> mnn; Noisy n1, n2; cout << "\n--------\n"; mnn[n1] = n2; cout << "\n--------\n"; cout << mnn[n1] << endl; cout << "\n--------\n"; } ///:~
You’ll see that both the insertion
and lookup generate a lot of extra objects, and that’s because of the
creation of the tmp object. If you look back up at map::operator[
] you’ll see that the second line calls insert( ) passing
it tmp – that is, operator[ ] does an insertion every time.
The return value of insert( ) is a different kind of pair,
where first is an iterator pointing to the key-value pair that was
just inserted, and second is a bool indicating whether the
insertion took place. You can see that operator[ ] grabs first
(the iterator), dereferences it to produce the pair, and then returns the
second which is the value at that location.
So on the upside, map has this
fancy “make a new entry if one isn’t there” behavior, but the
downside is that you always get a lot of extra object creations and
destructions when you use map::operator[ ]. Fortunately,
AssociativeBasics.cpp also demonstrates how to reduce the overhead of
insertions and deletions, by not using operator[ ] if you don’t
have to. The insert( ) member function is slightly more efficient
than operator[ ]. With a set you only hold one object, but with a
map you hold key-value pairs, so insert( ) requires a
pair as its argument. Here’s where make_pair( ) comes
in handy, as you can see.
For looking objects up in a map,
you can use count( ) to see whether a key is in the map, or you can
use find( ) to produce an iterator pointing directly at the
key-value pair. Again, since the map contains pairs that’s
what the iterator produces when you dereference it, so you have to select
first and second. When you run AssociativeBasics.cpp
you’ll notice that the iterator approach involves no extra object
creations or destructions at all. It’s not as easy to write or read,
though.
If you use a map with large,
complex objects and discover there’s too much overhead when doing lookups
and insertions (don’t assume this from the beginning – take the easy
approach first and use a profiler to discover bottlenecks), then you can use the
counted-handle approach shown in Chapter XX so that you are only passing around
small, lightweight objects.
Of course, you can also iterate through a
set or map and operate on each of its objects. This will be
demonstrated in later examples.
You’ve seen how useful the
fill( ), fill_n( ), generate( ) and
generate_n( ) function templates in <algorithm> have
been for filling the sequential containers (vector, list and
deque) with data. However, these are implemented by using operator=
to assign values into the sequential containers, and the way that you add
objects to associative containers is with their respective insert( )
member functions. Thus the default “assignment” behavior causes a
problem when trying to use the “fill” and “generate”
functions with associative containers.
One solution is to duplicate the
“fill” and “generate” functions, creating new ones that
can be used with associative containers. It turns out that only the
fill_n( ) and generate_n( ) functions can be duplicated
(fill( ) and generate( ) copy in between two iterators,
which doesn’t make sense with associative containers), but the job is
fairly easy, since you have the <algorithm> header file to work
from (and since it contains templates, all the source code is
there):
//: C07:assocGen.h // The fill_n() and generate_n() equivalents // for associative containers. #ifndef ASSOCGEN_H #define ASSOCGEN_H template<class Assoc, class Count, class T> void assocFill_n(Assoc& a, Count n, const T& val) { while(n-- > 0) a.insert(val); } template<class Assoc, class Count, class Gen> void assocGen_n(Assoc& a, Count n, Gen g) { while(n-- > 0) a.insert(g()); } #endif // ASSOCGEN_H ///:~
You can see that instead of using
iterators, the container class itself is passed (by reference, of course, since
you wouldn’t want to make a local copy, fill it, and then have it
discarded at the end of the scope).
This code demonstrates two valuable
lessons. The first lesson is that if the algorithms don’t do what you
want, copy the nearest thing and modify it. You have the example at hand in the
STL header, so most of the work has already been done.
The second lesson is more pointed: if you
look long enough, there’s probably a way to do it in the STL
without inventing anything new. The present problem can instead be solved
by using an insert_iterator (produced by a call to
inserter( )), which calls insert( ) to place items in
the container instead of operator=. This is not simply a variation
of front_insert_iterator (produced by a call to
front_inserter( )) or back_insert_iterator (produced by a
call to back_inserter( )), since those iterators use
push_front( ) and push_back( ), respectively. Each of
the insert iterators is different by virtue of the member function it uses for
insertion, and insert( ) is the one we need. Here’s a
demonstration that shows filling and generating both a map and a
set (of course, it can also be used with multimap and
multiset). First, some templatized, simple generators are created (this
may seem like overkill, but you never know when you’ll need them; for that
reason they’re placed in a header file):
//: C07:SimpleGenerators.h // Generic generators, including // one that creates pairs #include <iostream> #include <utility> // A generator that increments its value: template<typename T> class IncrGen { T i; public: IncrGen(T ii) : i (ii) {} T operator()() { return i++; } }; // A generator that produces an STL pair<>: template<typename T1, typename T2> class PairGen { T1 i; T2 j; public: PairGen(T1 ii, T2 jj) : i(ii), j(jj) {} std::pair<T1,T2> operator()() { return std::pair<T1,T2>(i++, j++); } }; // A generic global operator<< // for printing any STL pair<>: template<typename Pair> std::ostream& operator<<(std::ostream& os, const Pair& p) { return os << p.first << "\t" << p.second << std::endl; } ///:~
Both generators expect that T can
be incremented, and they simply use operator++ to generate new values
from whatever you used for initialization. PairGen creates an STL
pair object as its return value, and that’s what can be placed into
a map or multimap using insert( ).
The last function is a generalization of
operator<< for ostreams, so that any pair can be
printed, assuming each element of the pair supports a stream
operator<<. As you can see below, this allows the use of
copy( ) to output the map:
//: C07:AssocInserter.cpp // Using an insert_iterator so fill_n() and // generate_n() can be used with associative // containers //{L} ../TestSuite/Test //{-bor} //{-msc} //{-g++3} #include "SimpleGenerators.h" #include <iterator> #include <iostream> #include <algorithm> #include <set> #include <map> using namespace std; int main() { set<int> s; fill_n(inserter(s, s.begin()), 10, 47); generate_n(inserter(s, s.begin()), 10, IncrGen<int>(12)); copy(s.begin(), s.end(), ostream_iterator<int>(cout, "\n")); map<int, int> m; fill_n(inserter(m, m.begin()), 10, make_pair(90,120)); generate_n(inserter(m, m.begin()), 10, PairGen<int, int>(3, 9)); copy(m.begin(), m.end(), ostream_iterator<pair<int,int> >(cout,"\n")); } ///:~
The second argument to inserter is
an iterator, which actually isn’t used in the case of associative
containers since they maintain their order internally, rather than allowing you
to tell them where the element should be inserted. However, an
insert_iterator can be used with many different types of containers so
you must provide the iterator.
Note how the ostream_iterator is
created to output a pair; this wouldn’t have worked if the
operator<< hadn’t been created, and since it’s a
template it is automatically instantiated for pair<int,
int>.
An ordinary array uses an integral value
to index into a sequential set of elements of some type. A map is an
associative array, which means you associate one object with another in
an array-like fashion, but instead of selecting an array element with a number
as you do with an ordinary array, you look it up with an object! The example
which follows counts the words in a text file, so the index is the string
object representing the word, and the value being looked up is the object that
keeps count of the strings.
In a single-item container like a
vector or list, there’s only one thing being held. But in a
map, you’ve got two things: the key (what you look up by, as
in mapname[key]) and the value that results from the lookup with
the key. If you simply want to move through the entire map and list each
key-value pair, you use an iterator, which when dereferenced produces a
pair object containing both the key and the value. You access the members
of a pair by selecting first or second.
This same philosophy of packaging two
items together is also used to insert elements into the map, but the pair
is created as part of the instantiated map and is called
value_type, containing the key and the value. So one option for inserting
a new element is to create a value_type object, loading it with the
appropriate objects and then calling the insert( ) member function
for the map. Instead, the following example makes use of the
aforementioned special feature of map: if you’re trying to find an
object by passing in a key to operator[ ] and that object doesn’t
exist, operator[ ] will automatically insert a new key-value pair for
you, using the default constructor for the value object. With that in mind,
consider an implementation of a word counting program:
//: C07:WordCount.cpp //{L} StreamTokenizer ../TestSuite/Test //{-g++295} // Count occurrences of words using a map #include "StreamTokenizer.h" #include "../require.h" #include <string> #include <map> #include <iostream> #include <fstream> using namespace std; class Count { int i; public: Count() : i(0) {} void operator++(int) { i++; } // Post-increment int& val() { return i; } }; typedef map<string, Count> WordMap; typedef WordMap::iterator WMIter; int main(int argc, char* argv[]) { char* fname = "WordCount.cpp"; if(argc > 1) fname = argv[1]; ifstream in(fname); assure(in, fname); StreamTokenizer words(in); WordMap wordmap; string word; while((word = words.next()).size() != 0) wordmap[word]++; for(WMIter w = wordmap.begin(); w != wordmap.end(); w++) cout << (*w).first << ": " << (*w).second.val() << endl; } ///:~
The need for the Count class is to
contain an int that’s automatically initialized to zero. This is
necessary because of the crucial line:
wordmap[word]++;
This finds the word that has been
produced by StreamTokenizer and increments the Count object
associated with that word, which is fine as long as there is a key-value
pair for that string. If there isn’t, the map automatically
inserts a key for the word you’re looking up, and a Count object,
which is initialized to zero by the default constructor. Thus, when it’s
incremented the Count becomes 1.
Printing the entire list requires
traversing it with an iterator (there’s no copy( ) shortcut
for a map unless you want to write an operator<< for the
pair in the map). As previously mentioned, dereferencing this iterator
produces a pair object, with the first member the key and the
second member the value. In this case second is a Count
object, so its val( ) member must be called to produce the actual
word count.
If you want to find the count for a
particular word, you can use the array index operator, like
this:
cout << "the: " << wordmap["the"].val() << endl;
You can see that one of the great
advantages of the map is the clarity of the syntax; an associative array
makes intuitive sense to the reader (note, however, that if “the”
isn’t already in the wordmap a new entry will be
created!).
A problem that often comes up in
programming is the management of program arguments that you can specify on the
command line. Usually you’d like to have a set of defaults that can be
changed via the command line. The following tool expects the command line
arguments to be in the form flag1=value1 with no spaces around the
‘=‘ (so it will be treated as a single argument). The
ProgVal class simply inherits from map<string,
string>:
//: C07:ProgVals.h // Program values can be changed by command line #ifndef PROGVALS_H #define PROGVALS_H #include <map> #include <iostream> #include <string> class ProgVals : public std::map<std::string, std::string> { public: ProgVals(std::string defaults[][2], int sz); void parse(int argc, char* argv[], std::string usage, int offset = 1); void print(std::ostream& out = std::cout); }; #endif // PROGVALS_H ///:~
The constructor expects an array of
string pairs (as you’ll see, this allows you to initialize it with
an array of char*) and the size of that array. The parse( )
member function is handed the command-line arguments along with a
“usage” string to print if the command line is given incorrectly,
and the “offset” which tells it which command-line argument to start
with (so you can have non-flag arguments at the beginning of the command line).
Finally, print( ) displays the values. Here is the
implementation:
//: C07:ProgVals.cpp {O} #include "ProgVals.h" using namespace std; ProgVals::ProgVals( std::string defaults[][2], int sz) { for(int i = 0; i < sz; i++) insert(make_pair( defaults[i][0], defaults[i][1])); } void ProgVals::parse(int argc, char* argv[], string usage, int offset) { // Parse and apply additional // command-line arguments: for(int i = offset; i < argc; i++) { string flag(argv[i]); int equal = flag.find('='); if(equal == string::npos) { cerr << "Command line error: " << argv[i] << endl << usage << endl; continue; // Next argument } string name = flag.substr(0, equal); string value = flag.substr(equal + 1); if(find(name) == end()) { cerr << name << endl << usage << endl; continue; // Next argument } operator[](name) = value; } } void ProgVals::print(ostream& out) { out << "Program values:" << endl; for(iterator it = begin(); it != end(); it++) out << (*it).first << " = " << (*it).second << endl; } ///:~
The constructor uses the STL
make_pair( ) helper function to convert each pair of char*
into a pair object that can be inserted into the map. In
parse( ), each command-line argument is checked for the existence of
the telltale ‘=‘ sign (reporting an error if it isn’t
there), and then is broken into two strings, the name which appears
before the ‘=‘, and the value which appears after. The
operator[ ] is then used to change the existing value to the new
one.
Here’s an example to test the
tool:
//: C07:ProgValTest.cpp //{L} ProgVals #include "ProgVals.h" using namespace std; string defaults[][2] = { { "color", "red" }, { "size", "medium" }, { "shape", "rectangular" }, { "action", "hopping"}, }; const char* usage = "usage:\n" "ProgValTest [flag1=val1 flag2=val2 ...]\n" "(Note no space around '=')\n" "Where the flags can be any of: \n" "color, size, shape, action \n"; // So it can be used globally: ProgVals pvals(defaults, sizeof defaults / sizeof *defaults); class Animal { string color, size, shape, action; public: Animal(string col, string sz, string shp, string act) :color(col),size(sz),shape(shp),action(act){} // Default constructor uses program default // values, possibly change on command line: Animal() : color(pvals["color"]), size(pvals["size"]), shape(pvals["shape"]), action(pvals["action"]) {} void print() { cout << "color = " << color << endl << "size = " << size << endl << "shape = " << shape << endl << "action = " << action << endl; } // And of course pvals can be used anywhere // else you'd like. }; int main(int argc, char* argv[]) { // Initialize and parse command line values // before any code that uses pvals is called: pvals.parse(argc, argv, usage); pvals.print(); Animal a; cout << "Animal a values:" << endl; a.print(); } ///:~
This program can create Animal
objects with different characteristics, and those characteristics can be
established with the command line. The default characteristics are given in the
two-dimensional array of char* called defaults and, after the
usage string you can see a global instance of ProgVals called
pvals is created; this is important because it allows the rest of the
code in the program to access the values.
Note that Animal’s default
constructor uses the values in pvals inside its constructor initializer
list. When you run the program you can try creating different animal
characteristics.
Many command-line programs also use a
style of beginning a flag with a hyphen, and sometimes they use single-character
flags.
A multimap is a map that
can contain duplicate keys. At first this may seem like a strange idea, but it
can occur surprisingly often. A phone book, for example, can have many entries
with the same name.
Suppose you are monitoring wildlife, and
you want to keep track of where and when each type of animal is spotted. Thus,
you may see many animals of the same kind, all in different locations and at
different times. So if the type of animal is the key, you’ll need a
multimap. Here’s what it looks like:
//: C07:WildLifeMonitor.cpp //{L} ../TestSuite/Test //{-msc} #include <vector> #include <map> #include <string> #include <algorithm> #include <iostream> #include <sstream> #include <ctime> #include <cstdlib> using namespace std; class DataPoint { int x, y; // Location coordinates time_t time; // Time of Sighting public: DataPoint() : x(0), y(0), time(0) {} DataPoint(int xx, int yy, time_t tm) : x(xx), y(yy), time(tm) {} // Synthesized operator=, copy-constructor OK int getX() const { return x; } int getY() const { return y; } const time_t* getTime() const { return &time; } }; string animal[] = { "chipmunk", "beaver", "marmot", "weasel", "squirrel", "ptarmigan", "bear", "eagle", "hawk", "vole", "deer", "otter", "hummingbird", }; const int asz = sizeof animal/sizeof *animal; vector<string> animals(animal, animal + asz); // All the information is contained in a // "Sighting," which can be sent to an ostream: typedef pair<string, DataPoint> Sighting; ostream& operator<<(ostream& os, const Sighting& s) { return os << s.first << " sighted at x= " << s.second.getX() << ", y= " << s.second.getY() << ", time = " << ctime(s.second.getTime()); } // A generator for Sightings: class SightingGen { vector<string>& animals; enum { d = 100 }; public: SightingGen(vector<string>& an) : animals(an) { srand(time(0)); } Sighting operator()() { Sighting result; int select = rand() % animals.size(); result.first = animals[select]; result.second = DataPoint( rand() % d, rand() % d, time(0)); return result; } }; // Display a menu of animals, allow the user to // select one, return the index value: int menu() { cout << "select an animal or 'q' to quit: "; for(int i = 0; i < animals.size(); i++) cout <<'['<< i <<']'<< animals[i] << ' '; cout << endl; string reply; cin >> reply; if(reply.at(0) == 'q') return 0; istringstream r(reply); int i; r >> i; // Converts to int i %= animals.size(); return i; } typedef multimap<string, DataPoint> DataMap; typedef DataMap::iterator DMIter; int main() { DataMap sightings; generate_n( inserter(sightings, sightings.begin()), 50, SightingGen(animals)); // Print everything: copy(sightings.begin(), sightings.end(), ostream_iterator<Sighting>(cout, "")); // Print sightings for selected animal: for(int count = 1; count < 10; count++) { // Use menu to get selection: // int i = menu(); // Generate randomly (for automated testing): int i = rand() % animals.size(); // Iterators in "range" denote begin, one // past end of matching range: pair<DMIter, DMIter> range = sightings.equal_range(animals[i]); copy(range.first, range.second, ostream_iterator<Sighting>(cout, "")); } } ///:~
All the data about a sighting is
encapsulated into the class DataPoint, which is simple enough that it can
rely on the synthesized assignment and copy-constructor. It uses the Standard C
library time functions to record the time of the sighting.
In the array of string
animal, notice that the char* constructor is automatically used
during initialization, which makes initializing an array of string quite
convenient. Since it’s easier to use the animal names in a vector,
the length of the array is calculated and a vector<string> is
initialized using the vector(iterator, iterator)
constructor.
The key-value pairs that make up a
Sighting are the string which names the type of animal, and the
DataPoint that says where and when it was sighted. The standard
pair template combines these two types and is typedefed to produce the
Sighting type. Then an ostream operator<< is created
for Sighting; this will allow you to iterate through a map or multimap of
Sightings and print it out.
SightingGen generates random
sightings at random data points to use for testing. It has the usual
operator( ) necessary for a function object, but it also has a
constructor to capture and store a reference to a vector<string>,
which is where the aforementioned animal names are stored.
A DataMap is a multimap of
string-DataPoint pairs, which means it stores Sightings. It
is filled with 50 Sightings using generate_n( ), and printed
out (notice that because there is an operator<< that takes a
Sighting, an ostream_iterator can be created). At this point the
user is asked to select the animal that they want to see all the sightings for.
If you press ‘q’ the program will quit, but if you select an
animal number, then the equal_range( ) member function is invoked.
This returns an iterator (DMIter) to the beginning of the set of matching
pairs, and one indicating past-the-end of the set. Since only one object can be
returned from a function, equal_range( ) makes use of pair.
Since the range pair has the beginning and ending iterators of the
matching set, those iterators can be used in copy( ) to print out
all the sightings for a particular type of
animal.
You’ve seen the set, which
only allows one object of each value to be inserted. The multiset is odd
by comparison since it allows more than one object of each value to be inserted.
This seems to go against the whole idea of “setness,” where you can
ask “is ‘it’ in this set?” If there can be more than one
of ‘it’, then what does that question mean?
With some thought, you can see that it
makes no sense to have more than one object of the same value in a set if those
duplicate objects are exactly the same (with the possible exception of
counting occurrences of objects, but as seen earlier in this chapter that can be
handled in an alternative, more elegant fashion). Thus each duplicate object
will have something that makes it unique from the other duplicates – most
likely different state information that is not used in the calculation of the
value during the comparison. That is, to the comparison operation, the objects
look the same but they actually contain some differing internal
state.
Like any STL container that must order
its elements, the multiset template uses the less template by
default to determine element ordering. This uses the contained classes’
operator<, but you may of course substitute your own comparison
function.
Consider a simple class that contains one
element that is used in the comparison, and another that is
not:
//: C07:MultiSet1.cpp // Demonstration of multiset behavior //{L} ../TestSuite/Test //{-msc} #include <iostream> #include <set> #include <algorithm> #include <ctime> using namespace std; class X { char c; // Used in comparison int i; // Not used in comparison // Don't need default constructor and operator= X(); X& operator=(const X&); // Usually need a copy-constructor (but the // synthesized version works here) public: X(char cc, int ii) : c(cc), i(ii) {} // Notice no operator== is required friend bool operator<(const X& x, const X& y) { return x.c < y.c; } friend ostream& operator<<(ostream& os, X x) { return os << x.c << ":" << x.i; } }; class Xgen { static int i; // Number of characters to select from: enum { span = 6 }; public: Xgen() { srand(time(0)); } X operator()() { char c = 'A' + rand() % span; return X(c, i++); } }; int Xgen::i = 0; typedef multiset<X> Xmset; typedef Xmset::const_iterator Xmit; int main() { Xmset mset; // Fill it with X's: generate_n(inserter(mset, mset.begin()), 25, Xgen()); // Initialize a regular set from mset: set<X> unique(mset.begin(), mset.end()); copy(unique.begin(), unique.end(), ostream_iterator<X>(cout, " ")); cout << "\n----\n"; // Iterate over the unique values: for(set<X>::iterator i = unique.begin(); i != unique.end(); i++) { pair<Xmit, Xmit> p = mset.equal_range(*i); copy(p.first, p.second, ostream_iterator<X>(cout, " ")); cout << endl; } } ///:~
In X, all the comparisons are made
with the char c. The comparison is performed with operator<,
which is all that is necessary for the multiset, since in this example
the default less comparison object is used. The class Xgen is used
to randomly generate X objects, but the comparison value is restricted to
the span from ‘A’ to ‘E’. In
main( ), a multiset<X> is created and filled with 25
X objects using Xgen, guaranteeing that there will be duplicate
keys. So that we know what the unique values are, a regular set<X>
is created from the multiset (using the iterator, iterator
constructor). These values are displayed, then each one is used to produce the
equal_range( ) in the multiset (equal_range( )
has the same meaning here as it does with multimap: all the elements with
matching keys). Each set of matching keys is then printed.
As a second example, a (possibly) more
elegant version of WordCount.cpp can be created using
multiset:
//: C07:MultiSetWordCount.cpp //{L} StreamTokenizer ../TestSuite/Test //{-g++295} // Count occurrences of words using a multiset #include "StreamTokenizer.h" #include "../require.h" #include <string> #include <set> #include <fstream> #include <iterator> using namespace std; int main(int argc, char* argv[]) { char* fname = "MultiSetWordCount.cpp"; if(argc > 1) fname = argv[1]; ifstream in(fname); assure(in, fname); StreamTokenizer words(in); multiset<string> wordmset; string word; while((word = words.next()).size() != 0) wordmset.insert(word); typedef multiset<string>::iterator MSit; MSit it = wordmset.begin(); while(it != wordmset.end()) { pair<MSit, MSit> p=wordmset.equal_range(*it); int count = distance(p.first, p.second); cout << *it << ": " << count << endl; it = p.second; // Move to the next word } } ///:~
The setup in main( ) is
identical to WordCount.cpp, but then each word is simply inserted into
the multiset<string>. An iterator is created and initialized to the
beginning of the multiset; dereferencing this iterator produces the
current word. equal_range( ) produces the starting and ending
iterators of the word that’s currently selected, and the STL algorithm
distance( ) (which is in <iterator>) is used to count
the number of elements in that range. Then the iterator it is moved
forward to the end of the range, which puts it at the next word. Although if
you’re unfamiliar with the multiset this code can seem more
complex, the density of it and the lack of need for supporting classes like
Count has a lot of appeal.
In the end, is this really a
“set,” or should it be called something else? An alternative is the
generic “bag” that has been defined in some container libraries,
since a bag holds anything at all without discrimination – including
duplicate objects. This is close, but it doesn’t quite fit since a bag has
no specification about how elements should be ordered, while a multiset
(which requires that all duplicate elements be adjacent to each other) is even
more restrictive than the concept of a set, which could use a hashing function
to order its elements, in which case they would not be in sorted order. Besides,
if you wanted to store a bunch of objects without any special criterions,
you’d probably just use a vector, deque or list.
When using a thesaurus, you have a word
and you want to know all the words that are similar. When you look up a word,
then, you want a list of words as the result. Here, the “multi”
containers (multimap or multiset) are not appropriate. The
solution is to combine containers, which is easily done using the STL. Here, we
need a tool that turns out to be a powerful general concept, which is a
map of vector:
//: C07:Thesaurus.cpp // A map of vectors //{L} ../TestSuite/Test //{-msc} //{-g++3} #include <map> #include <vector> #include <string> #include <iostream> #include <algorithm> #include <ctime> #include <cstdlib> using namespace std; typedef map<string, vector<string> > Thesaurus; typedef pair<string, vector<string> > TEntry; typedef Thesaurus::iterator TIter; ostream& operator<<(ostream& os,const TEntry& t){ os << t.first << ": "; copy(t.second.begin(), t.second.end(), ostream_iterator<string>(os, " ")); return os; } // A generator for thesaurus test entries: class ThesaurusGen { static const string letters; static int count; public: int maxSize() { return letters.size(); } ThesaurusGen() { srand(time(0)); } TEntry operator()() { TEntry result; if(count >= maxSize()) count = 0; result.first = letters[count++]; int entries = (rand() % 5) + 2; for(int i = 0; i < entries; i++) { int choice = rand() % maxSize(); char cbuf[2] = { 0 }; cbuf[0] = letters[choice]; result.second.push_back(cbuf); } return result; } }; int ThesaurusGen::count = 0; const string ThesaurusGen::letters("ABCDEFGHIJKL" "MNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"); // Ask for a "word" to look up: string menu(Thesaurus& thesaurus) { while(true) { cout << "Select a \"word\", 0 to quit: "; for(TIter it = thesaurus.begin(); it != thesaurus.end(); it++) cout << (*it).first << ' '; cout << endl; string reply; cin >> reply; if(reply.at(0) == '0') exit(0); // Quit if(thesaurus.find(reply) == thesaurus.end()) continue; // Not in list, try again return reply; } } int main() { Thesaurus thesaurus; // Fill with 10 entries: generate_n( inserter(thesaurus, thesaurus.begin()), 10, ThesaurusGen()); // Print everything: copy(thesaurus.begin(), thesaurus.end(), ostream_iterator<TEntry>(cout, "\n")); // Create a list of the keys: string keys[10]; int i = 0; for(TIter it = thesaurus.begin(); it != thesaurus.end(); it++) keys[i++] = (*it).first; for(int count = 0; count < 10; count++) { // Enter from the console: // string reply = menu(thesaurus); // Generate randomly (for automated testing): string reply = keys[rand() % 10]; vector<string>& v = thesaurus[reply]; copy(v.begin(), v.end(), ostream_iterator<string>(cout, " ")); cout << endl; } } ///:~
A Thesaurus maps a string
(the word) to a vector<string> (the synonyms). A TEntry is a
single entry in a Thesaurus. By creating an ostream
operator<< for a TEntry, a single entry from the
Thesaurus can easily be printed (and the whole Thesaurus can
easily be printed with copy( )). The ThesaurusGen creates
“words” (which are just single letters) and “synonyms”
for those words (which are just other randomly-chosen single letters) to be used
as thesaurus entries. It randomly chooses the number of synonym entries to make,
but there must be at least two. All the letters are chosen by indexing into a
static string that is part of ThesaurusGen.
In main( ), a
Thesaurus is created, filled with 10 entries and printed using the
copy( ) algorithm. The menu( ) function ask user to choose a
“word” to look up by typing the letter of that word. The
find( ) member function is used to find whether the entry exists in
the map (remember, you don’t want to use operator[ ] or it
will automatically make a new entry if it doesn’t find a match!). If so,
operator[ ] is used to fetch out the vector<string> which is
displayed.
In the above code, the selection of the
reply string is generated randomly, to allow
automated testing.
Because templates make the expression of
powerful concepts easy, you can take this concept much further, creating a
map of vectors containing maps, etc. For that matter, you
can combine any of the STL containers this
way.
In Stlshape.cpp, the pointers did
not clean themselves up automatically. It would be convenient to be able to do
this easily, rather than writing out the code each time. Here is a function
template that will clean up the pointers in any sequence container; note that it
is placed in the book’s root directory for easy access:
//: :purge.h // Delete pointers in an STL sequence container #ifndef PURGE_H #define PURGE_H #include <algorithm> template<class Seq> void purge(Seq& c) { typename Seq::iterator i; for(i = c.begin(); i != c.end(); i++) { delete *i; *i = 0; } } // Iterator version: template<class InpIt> void purge(InpIt begin, InpIt end) { while(begin != end) { delete *begin; *begin = 0; begin++; } } #endif // PURGE_H ///:~
In the first version of
purge( ), note that typename is absolutely necessary; indeed
this is exactly the case that the keyword was added for: Seq is a
template argument, and iterator is something that is nested within that
template. So what does Seq::iterator refer to? The typename
keyword specifies that it refers to a type, and not something
else.
While the container version of purge must
work with an STL-style container, the iterator version of purge( )
will work with any range, including an array.
Here is Stlshape.cpp, modified to
use the purge( ) function:
//: C07:Stlshape2.cpp // Stlshape.cpp with the purge() function //{L} ../TestSuite/Test #include "../purge.h" #include <vector> #include <iostream> using namespace std; class Shape { public: virtual void draw() = 0; virtual ~Shape() {}; }; class Circle : public Shape { public: void draw() { cout << "Circle::draw\n"; } ~Circle() { cout << "~Circle\n"; } }; class Triangle : public Shape { public: void draw() { cout << "Triangle::draw\n"; } ~Triangle() { cout << "~Triangle\n"; } }; class Square : public Shape { public: void draw() { cout << "Square::draw\n"; } ~Square() { cout << "~Square\n"; } }; typedef std::vector<Shape*> Container; typedef Container::iterator Iter; int main() { Container shapes; shapes.push_back(new Circle); shapes.push_back(new Square); shapes.push_back(new Triangle); for(Iter i = shapes.begin(); i != shapes.end(); i++) (*i)->draw(); purge(shapes); } ///:~
When using purge( ), you must
be careful to consider ownership issues – if an object pointer is held in
more than one container, then you must be sure not to delete it twice, and you
don’t want to destroy the object in the first container before the second
one is finished with it. Purging the same container twice is not a problem,
because purge( ) sets the pointer to zero once it deletes that
pointer, and calling delete for a zero pointer is a safe
operation.
With the STL as a foundation, it’s
possible to create your own containers. Assuming you follow the same model of
providing iterators, your new container will behave as if it were a built-in STL
container.
Consider the “ring” data
structure, which is a circular sequence container. If you reach the end, it just
wraps around to the beginning. This can be implemented on top of a list
as follows:
//: C07:Ring.cpp // Making a "ring" data structure from the STL //{L} ../TestSuite/Test //{-g++295} #include <iostream> #include <list> #include <string> using namespace std; template<class T> class Ring { list<T> lst; public: // Declaration necessary so the following // 'friend' statement sees this 'iterator' // instead of std::iterator: class iterator; friend class iterator; class iterator : public std::iterator< std::bidirectional_iterator_tag,T,ptrdiff_t>{ list<T>::iterator it; list<T>* r; public: // "typename" necessary to resolve nesting: iterator(list<T>& lst, const typename list<T>::iterator& i) : r(&lst), it(i) {} bool operator==(const iterator& x) const { return it == x.it; } bool operator!=(const iterator& x) const { return !(*this == x); } list<T>::reference operator*() const { return *it; } iterator& operator++() { ++it; if(it == r->end()) it = r->begin(); return *this; } iterator operator++(int) { iterator tmp = *this; ++*this; return tmp; } iterator& operator--() { if(it == r->begin()) it = r->end(); --it; return *this; } iterator operator--(int) { iterator tmp = *this; --*this; return tmp; } iterator insert(const T& x){ return iterator(*r, r->insert(it, x)); } iterator erase() { return iterator(*r, r->erase(it)); } }; void push_back(const T& x) { lst.push_back(x); } iterator begin() { return iterator(lst, lst.begin()); } int size() { return lst.size(); } }; int main() { Ring<string> rs; rs.push_back("one"); rs.push_back("two"); rs.push_back("three"); rs.push_back("four"); rs.push_back("five"); Ring<string>::iterator it = rs.begin(); it++; it++; it.insert("six"); it = rs.begin(); // Twice around the ring: for(int i = 0; i < rs.size() * 2; i++) cout << *it++ << endl; } ///:~
You can see that the iterator is where
most of the coding is done. The Ring iterator must know how to
loop back to the beginning, so it must keep a reference to the list of
its “parent” Ring object in order to know if it’s
at the end and how to get back to the beginning.
You’ll notice that the interface
for Ring is quite limited; in particular there is no end( ),
since a ring just keeps looping. This means that you won’t be able to use
a Ring in any STL algorithms that require a past-the-end iterator –
which is many of them. (It turns out that adding this feature is a non-trivial
exercise). Although this can seem limiting, consider stack, queue
and priority_queue, which don’t produce any iterators at
all!
Although the STL containers may provide
all the functionality you’ll ever need, they are not complete. For
example, the standard implementations of set and map use trees,
and although these are reasonably fast they may not be fast enough for your
needs. In the C++ Standards Committee it was generally agreed that hashed
implementations of set and map should have been included in
Standard C++, however there was not considered to be enough time to add these
components, and thus they were left out.
Fortunately, there are freely-available
alternatives. One of the nice things about the STL is that it establishes a
basic model for creating STL-like classes, so anything built using the same
model is easy to understand if you are already familiar with the
STL.
The SGI STL (freely available at
http://www.sgi.com/Technology/STL/) is one of the most robust implementations of
the STL, and can be used to replace your compiler’s STL if that is found
wanting. In addition they’ve added a number of extensions including
hash_set, hash_multiset, hash_map, hash_multimap,
slist (a singly-linked list) and rope (a variant of string
optimized for very large strings and fast concatenation and substring
operations).
Let’s consider a performance
comparison between a tree-based map and the SGI hash_map. To keep
things simple, the mappings will be from int to
int:
//: C07:MapVsHashMap.cpp // The hash_map header is not part of the // Standard C++ STL. It is an extension that // is only available as part of the SGI STL // (It is included with the g++ distribution) //{L} ../TestSuite/Test //{-bor} You can add the header by hand //{-msc} You can add the header by hand //{-g++3} #include <hash_map> #include <iostream> #include <map> #include <ctime> using namespace std; int main(){ hash_map<int, int> hm; map<int, int> m; clock_t ticks = clock(); for(int i = 0; i < 100; i++) for(int j = 0; j < 1000; j++) m.insert(make_pair(j,j)); cout << "map insertions: " << clock() - ticks << endl; ticks = clock(); for(int i = 0; i < 100; i++) for(int j = 0; j < 1000; j++) hm.insert(make_pair(j,j)); cout << "hash_map insertions: " << clock() - ticks << endl; ticks = clock(); for(int i = 0; i < 100; i++) for(int j = 0; j < 1000; j++) m[j]; cout << "map::operator[] lookups: " << clock() - ticks << endl; ticks = clock(); for(int i = 0; i < 100; i++) for(int j = 0; j < 1000; j++) hm[j]; cout << "hash_map::operator[] lookups: " << clock() - ticks << endl; ticks = clock(); for(int i = 0; i < 100; i++) for(int j = 0; j < 1000; j++) m.find(j); cout << "map::find() lookups: " << clock() - ticks << endl; ticks = clock(); for(int i = 0; i < 100; i++) for(int j = 0; j < 1000; j++) hm.find(j); cout << "hash_map::find() lookups: " << clock() - ticks << endl; } ///:~
The performance test I ran showed a speed
improvement of roughly 4:1 for the hash_map over the map in all
operations (and as expected, find( ) is slightly faster than
operator[ ] for lookups for both types of map). If a profiler shows a
bottleneck in your map, you should consider a
hash_map.
The goal of this chapter was not just to
introduce the STL containers in some considerable depth (of course, not every
detail could be covered here, but you should have enough now that you can look
up further information in the other resources). My higher hope is that this
chapter has made you grasp the incredible power available in the STL, and shown
you how much faster and more efficient your programming activities can be by
using and understanding the STL.
The fact that I could not escape from
introducing some of the STL algorithms in this chapter suggests how useful they
can be. In the next chapter you’ll get a much more focused look at the
algorithms.
[19]
Contributed to the C++ Standard by Alexander Stepanov and Meng Lee at
Hewlett-Packard.
[20]
These were actually created to abstract the “locale” facets away
from iostreams, so that locale facets could operate on any sequence of
characters, not only iostreams. Locales allow iostreams to easily handle
culturally-different formatting (such as representation of money), and are
beyond the scope of this book.
[21]
I am indebted to Nathan Myers for explaining this to me.
[22]
This is another example coached by Nathan Myers.