Andre's Blog
Perfection is when there is nothing left to take away
You #include'd!

The C++ #include directive is probably the most commonly misused construct of the language. The generally accepted and rarely disputed practice today is to use angle brackets for standard headers and double quotes for internal header files and, sometimes, for 3rd-party libraries. At first glance this is well aligned with the C++ standard, which suggests the following use of the #include directive:

Note: Although an implementation may provide a mechanism for making arbitrary source files available to the < > search, in general programmers should use the < > form for headers provided with the implementation, and the " " form for sources outside the control of the implementation.

The example that follows this note in the C++ standard makes sense and works just fine for simple projects that have a limited number of dependencies on internal and 3rd-party headers, assuming that usefullib.h is a 3rd-party library header and myprog.h is a project header:

#include <stdio.h>
#include <unistd.h>
#include "usefullib.h"
#include "myprog.h"

Let's now dive deeper into how the C++ standard prescribes each form of #include to be implemented and how inclusion is actually implemented in most common modern C++ compilers.

First, the angle bracket inclusion is completely implementation-defined, so if a compiler were designed to pull included headers from a relational database, its authors would be completely within their rights to do so.

A preprocessing directive of the form

    # include <h-char-sequence> new-line

searches a sequence of implementation-defined places for a header identified uniquely by the specified sequence between the < and > delimiters, and causes the replacement of that directive by the entire contents of the header. How the places are specified or the header identified is implementation-defined.

The double quote form is similar to the angle bracket form in that its search is also implementation-defined, but it carries one very important provision: if the search fails, the directive is reprocessed as if it had been written in the angle bracket form. This fallback is the key to good source inclusion patterns.

A preprocessing directive of the form

    # include "q-char-sequence" new-line

causes the replacement of that directive by the entire contents of the source file identified by the specified sequence between the " delimiters. The named source file is searched for in an implementation-defined manner. If this search is not supported, or if the search fails, the directive is reprocessed as if it read

    # include <h-char-sequence> new-line

with the identical contained sequence (including > characters, if any) from the original directive.
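
To see this fallback in action, here is a minimal sketch that includes standard headers with double quotes. It assumes that no files named cassert, cstdio or string sit in the directory of the source file, so the quote-form search fails and each directive is reprocessed in the angle bracket form. The function name format_answer is made up for this illustration, and the sketch only demonstrates the mechanism; it is not a recommended style.

```cpp
// Assumption: no files named "cassert", "cstdio" or "string" exist next to
// this source file, so each quote-form search fails and the directive is
// reprocessed as if it were written with angle brackets.
#include "cassert"
#include "cstdio"
#include "string"

// Hypothetical helper that uses declarations from the headers above, which
// shows that they were found through the angle bracket fallback.
std::string format_answer(int value) {
    char buffer[32];
    const int written =
        std::snprintf(buffer, sizeof buffer, "answer: %d", value);
    assert(written > 0 && written < static_cast<int>(sizeof buffer));
    (void)written;  // keeps release builds (NDEBUG) free of unused warnings
    return std::string(buffer);
}
```

If a file named cstdio did exist next to the source file, the quote form would pick it up instead, which is exactly the override behavior that the fallback makes possible.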

Most modern compilers implement both forms of the #include directive in much the same way. For the angle bracket form, they search for the specified header in a list of directories commonly called a search path. For the double quote form, they first look for the header in the directory of the source file containing the #include directive and then fall back to the search path. Some compilers extend the latter by allowing a separate search path for double quote includes.
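
The lookup order can be modeled with a small sketch. This is a simplified simulation under stated assumptions, not any particular compiler's implementation: the "file system" is just a set of path strings, and the names FileSet, Lookup and resolve_include are hypothetical. The quote form probes the including file's directory first and then, following the fallback the standard requires, walks the same search path as the angle bracket form.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a file system: the set of header paths that exist.
using FileSet = std::set<std::string>;

// Result of one simulated lookup: the path that was found (empty when the
// lookup failed) and every location probed along the way, in order.
struct Lookup {
    std::string found;
    std::vector<std::string> probed;
};

// The quote form probes the including file's directory first; both forms
// then walk the search path from front to back.
Lookup resolve_include(const FileSet& files,
                       const std::string& including_dir,
                       const std::vector<std::string>& search_path,
                       const std::string& name,
                       bool quoted) {
    assert(!name.empty());

    std::vector<std::string> dirs;
    if (quoted) {
        dirs.push_back(including_dir);
    }
    dirs.insert(dirs.end(), search_path.begin(), search_path.end());

    Lookup result;
    for (const std::string& dir : dirs) {
        const std::string candidate = dir + "/" + name;
        result.probed.push_back(candidate);
        if (files.count(candidate) != 0) {
            result.found = candidate;
            break;
        }
    }
    return result;
}
```

The probed list makes the cost of each directive visible: every entry before the last one is a failed file look-up.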

Let's consider a hypothetical database library project that groups library source in directories, such as i18n for internationalization code and database for various types of database clients:

~/mydblib/
    i18n/
        ustring.h
        cpconv.h
    database/
        mysql.h
        berkeleydb.h
    mydb.h

This is where it starts getting messy. Many developers try to avoid using directories in the double quote form of the #include directive, so their way of fixing the resulting inclusion failures is to add ~/mydblib/i18n and ~/mydblib/database to the project's search path and write mydb.h, in part, as follows:

// mydb.h
#include "ustring.h"
#include "cpconv.h"
#include "mysql.h"
#include "berkeleydb.h"

The compiler is then forced to search ~/mydblib/ for each header in this example and then to search through the directories in the project's search path. While this approach works, it causes wasteful file name look-ups in ~/mydblib. The extra look-ups are not limited to the directory where mydb.h is located: more are made in each search path directory that precedes the one where the included header actually lives, which in a large project will likely add another half a dozen failed file look-ups.

As bad as unnecessary file look-ups sound, they are not the worst side effect of adding everything to the search path. The worst part is that forcing the compiler to walk the search path increases the probability of a file name collision: if a directory earlier in the search path contains headers named ustring.h, mysql.h, etc., they will be picked up instead, leading to all sorts of problems that are hard to troubleshoot.
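
The collision can be reproduced with a small simulation of the ~/mydblib layout. This is a sketch under assumptions: the directory /opt/vendor/include and its mysql.h are invented for the example, find_quoted and demonstrate_collision are hypothetical names, and the lookup model is the simplified "including directory first, then the search path" order rather than a real compiler.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Simplified model of a quote-form lookup: probe the including file's
// directory first, then each search path directory in order. Returns the
// first existing path, or an empty string when nothing is found.
std::string find_quoted(const std::set<std::string>& files,
                        const std::string& including_dir,
                        const std::vector<std::string>& search_path,
                        const std::string& name) {
    std::vector<std::string> dirs(1, including_dir);
    dirs.insert(dirs.end(), search_path.begin(), search_path.end());
    for (const std::string& dir : dirs) {
        const std::string candidate = dir + "/" + name;
        if (files.count(candidate) != 0) {
            return candidate;
        }
    }
    return std::string();
}

// Walks through the collision scenario; /opt/vendor/include is hypothetical.
void demonstrate_collision() {
    const std::set<std::string> files = {
        "~/mydblib/mydb.h",
        "~/mydblib/database/mysql.h",
        "/opt/vendor/include/mysql.h",  // unrelated header with the same name
    };
    const std::vector<std::string> search_path = {
        "/opt/vendor/include", "~/mydblib/i18n", "~/mydblib/database"};

    // Flat name: the unrelated header earlier in the search path wins.
    assert(find_quoted(files, "~/mydblib", search_path, "mysql.h") ==
           "/opt/vendor/include/mysql.h");

    // Relative name: resolved next to mydb.h before the search path is used.
    assert(find_quoted(files, "~/mydblib", search_path, "database/mysql.h") ==
           "~/mydblib/database/mysql.h");
}
```

In this model, writing the directory in the directive resolves the header relative to mydb.h on the first probe, so the order of the search path no longer matters for the library's own headers.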

A few years ago I ran into this problem with CLucene, an open-source C++ implementation of the full-text search engine called Lucene. CLucene used a specific version of Boost that shipped with its source, and CLucene header files included Boost headers this way:

#include "CLucene/StdHeader.h"
#include "CLucene/util/Equators.h"
#include <boost/shared_ptr.hpp>
#include <boost/ptr_container/ptr_vector.hpp>
#include "CLucene/index/Term.h"
#include "CLucene/store/Directory.h"

Because CLucene used the angle bracket form of the include directive, the compiler went straight to the include search path of the project that made use of CLucene. That project happened to use its own version of Boost, which was found instead of the one CLucene expected, and the differences between the two versions of Boost broke the CLucene build.

Speaking of Boost, despite all the goodness of this wonderful library, its practice of using angle brackets to include internal headers makes it impossible for some hypothetical library to use its own version of Boost, because any Boost header included in a public header of that library may be resolved against the project's version of Boost instead.

Even the big guys, like the VC++ and GCC teams, get source inclusion wrong. Imagine you developed a library that has header files called features.h and yvals.h. You distribute this library, and its users include features.h or yvals.h in angle brackets because the library may be installed into different directories on different systems:

#include <features.h>
#include <yvals.h>
#include <cstdio>

int main(int argc, char **argv)
{
    my_feature f(123, my_val());
    printf("int: %d\n", f.m());
    return 0;
}

This code will not compile with GCC on a typical Linux system, because the toolchain already comes with a header named features.h (it is part of glibc) and some of the standard headers include it in angle brackets, so the features.h shipped with the hypothetical library will not be the one that is found. Adding the directory containing the library's features.h to the search path makes things worse and breaks the build badly, because then the system's features.h will not be found by the standard headers. Similarly, this code will not compile in VC++ because Microsoft does the same thing with yvals.h and other internal headers.

I don't know what the original rationale in the C++ standard was for using the angle bracket form of #include as a fallback for double quote inclusion failures. It sounds like it was meant to provide a way to override headers that would otherwise be found in the search path, but I fail to see a widespread practical application for such an override mechanism. However, considering the way most C++ compilers implement the double quote form, it is safe to say that if you use double quotes to include a header that you know cannot be found relative to the directory of the source file containing the #include directive, and you are not implementing the override mentioned above, you are doing it wrong. Similarly, including a library's internal headers in angle brackets from its public headers is just as bad and will cause grief for the library's users.

As a rule of thumb, any source used by your package that can be installed independently of your package should be included in angle brackets, because that 3rd-party source will be located in different places from computer to computer and from one C++ implementation to another. Any source that comes with your library should be included using double quotes, to ensure that it builds against known headers and to avoid picking up unintended headers from the search path.
