Andre's Blog • STL matchmaking gone bad

STL matchmaking gone bad

Posted Thu, 23 Apr 2009 20:00:00 GMT in Programming by Andre

Some STL containers, such as std::map, are designed to use std::pair to maintain keys and values. A special convenience function, std::make_pair, is often used to create pairs without having to specify template argument types. Typical code that makes use of an STL map looks like this:

std::map<std::string, std::string> m;		// map<key,value>
m.insert(std::make_pair("1", "one"));
m.insert(std::make_pair("2", "two"));
printf("%s", m.find("1")->second.c_str());	// prints "one"

Without using std::make_pair, which is declared as follows:

template <class T1, class T2> pair<T1, T2> make_pair(T1 x, T2 y);

, an instance of std::pair would have to be constructed with a template argument list:

m.insert(std::pair<std::string, std::string>("1", "one"));

Those who are familiar with STL will notice that even though the map in this example is defined to use STL strings as keys and values, all calls above pass C-strings, which means that std::string temporaries would be constructed as necessary. Naturally, one has to wonder if using STL strings instead would result in faster code. Let's see if that's the case.

Character Pointers

At the first glance, using character pointers to create a pair seems not very efficient because key and the value strings would have to be traversed in order to determine their length, in addition to the cost of creating a pair within the map. That is, given this source:

std::string key = "1";
std::string value = "one";
m.insert(std::make_pair(key.c_str(), value.c_str()));

, the compiler will generate code to create one temporary pair containing key and value string pointers (p1), one temporary pair containing copies of the original key and value as STL strings (p2) and one more copy of the latter within a map node:

p1 = std::make_pair<char*,char*>(key.c_str(), value.c_str());
     └► return std::pair<char*,char*>(key.c_str(), value.c_str());
               └► p1.first(key.c_str()), p1.second(value.c_str()); 
p2 = std::pair<std::string,std::string>(p1);
     └► p2.first(p1.first), p2.second(p1.second);
m.insert(p2);
     └► std::pair<std::string,std::string>(p2);
std::pair::~pair(p2);
std::pair::~pair(p1);

Overall, three instances of std::pair and four instances of std::string were constructed and one std::pair and two instances of std::string were kept in the map. In this case, calling std::make_pair cost us two instances of std::pair and two instances of std::string (p2's data members).

STL strings

Using instances of std::string to construct STL pairs seems to be most popular. Probably because the resulting code is easier to read:

std::string key = "2";
std::string value = "two";
m.insert(std::make_pair(key, value));

In this case, the compiler will generate code to create two temporary STL strings, which will be used as key and value parameters (k1 and v1), one temporary pair containing copies of k1 and v1 (p1) and one std::pair containing a copy of p1 within a map node:

std::string k1(key);
std::string v1(value);
p1 = std::make_pair<std::string,std::string>(k1, v1);
     └► return std::pair<std::string,std::string>(k1, v1);
               └► first(k1), second(v1);
m.insert(p1);
     └► std::pair<std::string,std::string>(p1);
std::pair::~pair(p1);
std::string::~string(v1);
std::string::~string(k1);

Overall, two instances of std::pair and six instances of std::string were constructed and only one pair and two STL strings were kept in the map. This method turns out to be the most expensive, constructing one std::pair and four instances of std::string (p1's data members) just to make code more readable.

STL pair

Creating an instance of std::pair explicitly is probably the most straightforward method, which surprisingly isn't used very much. Probably because the result doesn't look as compact as in two other cases above. Consider this source (note the use of const):

std::string key = "3";
std::string value = "three";
m.insert(std::pair<const std::string, std::string>(key, value));

In this case, only one instance of std::pair is created and passed to the map::insert method:

p1 = std::pair<const std::string,std::string>(key, value);
     └► first(key), second(value);
m.insert(p1);
std::pair::~pair(p1);

Overall, two instances of std::pair and four instances of std::string were created. The overhead of one std::pair and two STL strings (p1's data members) make this method the least expensive of all three.

Putting it to the test

Let's put these conclusions to a test. A simple loop below inserts the same key into a map 5 million times. Each time, except the first one, the map rejects the new key, which allows us to time just the construction of the input parameters.

key = "1";
value = "one";
stime = GetTickCount();
for(i = 0; i < 5000000; i++)
   m.insert(...);
printf("%d\n", GetTickCount()-stime);

Two columns below show loop running time, in milliseconds, for short and long keys and values. That is, strings shorter than 16 characters are placed in the std::string's internal buffer, while longer strings are placed in dynamically allocated memory, which dramatically affects the performance of this loop.

              short key   long key
char*      :      1141       2531
STL string :      2625       6687
STL pair   :      1078       2350

Conclusion

Of course, authors writing desktop applications may consider these expenses as negligible. After all, a few extra microseconds won't visibly affect experience of a user who's trying to browse a list of fonts. However, these microseconds add up in high-performance utilities and servers, as well as in embedded applications and games and sometimes one has to choose between code readability and its speed.

Comments:

▢ Posted Wed, 08 Oct 2014 07:07:24 GMT by Brolock

Hi,
as lolohammer said, now, with C++11 you can do pretty good things:

m.insert(make_pair("key", "value"));

auto p = make_pair("key", "value");

m.insert(std::move(p));

Of course the idea of your exemple is to show that we shouldn't declare a variable if we just want to pass it to a container / wrapper afterward (which should be by move and not copy construct if you don't plan to reuse your variable afterward).

▢ Posted Sat, 22 Feb 2014 23:00:11 GMT by lolohammer

you should maybe look a bit more, there are ways to remove the extra copy in c++ even with make_pair with optimization and also the move constructor.

▢ Posted Mon, 09 May 2011 11:45:46 GMT by goldfishka

Cool:)