Modules Build System Scenarios
Introduction
This paper presents several scenarios intended to be familiar to those
with experience creating, maintaining, and using build systems within the C++
ecosystem.
Discussion focuses on problems and possible solutions for supporting C++20
modules within each scenario.
Topics discussed include:
- How to resolve a module import to either:
  - A pre-built binary module interface (BMI) that is compatible with
    the importing tool chain and build configuration.
  - One or more module interface unit source files that define the
    required module and from which a BMI can be constructed.
- How to build a BMI for a module interface unit source file when a
compatible pre-existing BMI is not available.
Tangential topics include:
- How to determine if a pre-existing BMI is or is not compatible with the
importing tool chain.
- How to determine if a pre-existing BMI is or is not compatible with the
build configuration (including compatibility with code to be compiled
and pre-compiled code linked via an existing object file, static library,
or dynamic library).
- How to determine a suitable order in which to build BMIs for module
interface unit source files.
This paper focuses on scenarios involving module interface units due to the
imposed requirement that they be built before source files that import the
modules they define can be successfully translated.
Module implementation units are discussed only with regard to pre-compiled
code and BMI compatibility.
Within this document, "compile" refers to the act of producing object code
for a given source file, and "build" refers to both
1) construction of a BMI for a given module interface unit, and
2) the suite of compiler or tool invocations initiated to construct all
derived artifacts for a given project.
"Translation" refers to the act of parsing and semantically analyzing a
source file.
For the purposes of this document, please note that a BMI need not be a file
persisted on a file system.
Rather, a BMI is a semantic representation of a module interface that may be
constructed eagerly or lazily, in memory or on disk, locally or remotely.
However, it is often convenient to think of a BMI as a file that holds the
cached result of a previous build of a module interface unit.
The presented scenarios are intended to apply to all build systems.
This includes traditional build systems that invoke compilers to produce
executable files as well as the processes that are used by IDEs,
static analyzers, or code review tools to find and translate C++ source
code for presentation or analytical purposes.
C++20 modules will require all tools that purport to accurately parse and
semantically analyze C++ code to be able to dynamically navigate a translation
unit dependency graph since C++ code that imports a module can no longer be
processed as a single translation unit.
Terminology
- Build System
A tool or set of tools that translate libraries into desired outputs.
This may include production of executable files.
Single-purpose tools that produce other kinds of artifacts, such as
API documentation or source analysis results, also qualify as build
systems for the purposes of this document.
- Library
A collection of source files that, together, provide a cohesive set of
features.
A library might or might not include source files that must be separately
compiled (e.g., a traditional “header-only” library would not include
such files).
A library might or might not be dependent on one or more other
libraries.
- Package
A collection of source files (header files, module interface units),
derived artifacts (BMIs, object files, static/dynamic libraries, etc…)
pre-built for one or more targets, and configuration files that describe
the package, its dependencies, and requirements for consumption.
A package usually corresponds to a library and includes a subset of its
files; generally those that comprise the library interface
(e.g., header files, module interface units).
- Project
A combination of a build system and one or more libraries for which the
build system has direct knowledge of how to build for one or more
targets or configurations.
Scenario 1: In-project dependent modules
In this scenario, a build system is tasked with translating a collection of
source files, some of which import a module M that is defined by
module interface unit source files that are known to the build system.
Problems
- The build system may lack knowledge of which source file(s) define
module M.
- The build system may lack knowledge of which modules a source file
imports.
- The build system may lack knowledge of module dependencies and implied
module build order requirements.
Solution
- Scan all source files within the project to
  1) identify source files that define modules,
  2) identify which modules each source file imports, and
  3) construct a DAG of module dependencies to be used to execute a build
     plan that ensures that, for each module M, a BMI is built for
     the module interface unit source file(s) that define M before
     any source file that imports M is translated.
See the clang-scan-deps presentation by Alex Lorenz and Michael Spencer from
the April 2019 LLVM Developers' Meeting.
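
The scan-then-order approach above can be sketched as follows. This is a minimal illustration rather than a description of how clang-scan-deps works: it assumes module declarations can be found with regular expressions on unpreprocessed text (a real scanner must run the preprocessor), it ignores partitions and header units, and the names `scan`, `build_plan`, and the file names are hypothetical.

```python
import re
from graphlib import TopologicalSorter  # Python 3.9+

# Assumption: declarations sit on their own line and are not hidden behind
# macros or conditional compilation. Real scanners must preprocess first.
EXPORT_MODULE = re.compile(r"^\s*export\s+module\s+([\w.]+)\s*;", re.MULTILINE)
IMPORT = re.compile(r"^\s*(?:export\s+)?import\s+([\w.]+)\s*;", re.MULTILINE)


def scan(sources):
    """sources maps a file path to its text.

    Returns (defines, imports): module name -> interface unit path, and
    file path -> set of imported module names.
    """
    defines = {}
    imports = {}
    for path, text in sources.items():
        match = EXPORT_MODULE.search(text)
        if match:
            defines[match.group(1)] = path
        imports[path] = set(IMPORT.findall(text))
    return defines, imports


def build_plan(sources):
    """Return file paths ordered so each interface unit precedes its importers."""
    defines, imports = scan(sources)
    graph = {}  # file path -> set of files that must be built first
    for path, needed in imports.items():
        missing = needed - set(defines)
        if missing:
            raise LookupError(f"{path}: unresolved modules {sorted(missing)}")
        graph[path] = {defines[name] for name in needed}
    # static_order() raises graphlib.CycleError if the imports form a cycle.
    return list(TopologicalSorter(graph).static_order())


sources = {
    "main.cpp": "import app;\nint main() { return run(); }\n",
    "app.cppm": "export module app;\nimport util;\nexport int run();\n",
    "util.cppm": "export module util;\nexport int helper();\n",
}
print(build_plan(sources))  # ['util.cppm', 'app.cppm', 'main.cpp']
```

In this sketch an unresolved import is reported immediately; a build system that also supports scenarios 2 and 3 would instead defer that error until external libraries have been searched.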
Scenario 2: External dependent source-only modules
In this scenario, a build system is tasked with translating a collection of
source files, some of which import a module M that is not defined by
a module interface unit source file that is known to the build system.
In this case, the build system must follow a protocol that facilitates
discovery of such source files and requirements for building a BMI.
Problems
- The build system lacks a mechanism to:
  - Discover the external source file(s) that define module M.
  - Discover the requirements for building a BMI for module M
    (language dialect, include paths, macros, etc…).
Solution
- Allow for a sequence of library identifiers to be provided to the build
system such that each library identifier has an associated path
(either well-known to the build system or user-provided) where the source
files and other artifacts of the library are located.
- Specify a name and format for a library description file that cooperating
libraries provide.
The file supplies the following
(note that these configurations may need to vary by the consuming tool
chain, target, or build configuration):
  - Paths relative to the library description file that contain public
    header files.
  - Paths relative to the library description file that contain private
    header files (to be used when building module interface units).
  - Paths relative to the library description file that contain module
    interface unit source files (to be used when searching for source
    files that define modules).
  - Language dialect requirements for building module interface
    units.
  - Macro requirements for building module interface units.
  - Libraries directly required by the library.
- Search for library description files in each of the library search paths.
For each library description file found, scan all source files within
the relative module interface unit source path(s) as is done in
scenario 1.
- Once scanning is complete, if any imported modules remain unresolved,
issue an error.
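
As an illustration of what a library description file might carry, the sketch below parses one and derives the options needed to build the library's module interface units. Everything about the format is an assumption made for this example: the paper does not specify a file name, a syntax, or field names, so the JSON layout, the keys, the library name `examplelib`, and the path `/opt/libs/examplelib/library.json` are all hypothetical. The emitted options use the GCC/Clang spelling (`-std=`, `-I`, `-D`).

```python
import json
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class LibraryDescription:
    name: str
    public_headers: list   # directories, resolved against the description file
    private_headers: list  # used only when building module interface units
    module_sources: list   # directories to scan for module interface units
    dialect: str           # e.g. "c++20"
    macros: dict           # macro name -> value
    requires: list         # identifiers of directly required libraries


def parse_description(text, description_path):
    """Parse a (hypothetical) JSON description located at description_path."""
    raw = json.loads(text)
    base = PurePosixPath(description_path).parent

    def resolved(key):
        # Paths in the file are relative to the description file itself.
        return [str(base / entry) for entry in raw.get(key, [])]

    return LibraryDescription(
        name=raw["name"],
        public_headers=resolved("public_headers"),
        private_headers=resolved("private_headers"),
        module_sources=resolved("module_sources"),
        dialect=raw.get("dialect", ""),
        macros=raw.get("macros", {}),
        requires=raw.get("requires", []),
    )


def interface_unit_flags(desc):
    """GCC/Clang-style options for building the library's interface units."""
    flags = [f"-std={desc.dialect}"] if desc.dialect else []
    flags += [f"-I{path}" for path in desc.public_headers + desc.private_headers]
    flags += [f"-D{name}={value}" for name, value in desc.macros.items()]
    return flags


example = """{
  "name": "examplelib",
  "public_headers": ["include"],
  "private_headers": ["src"],
  "module_sources": ["modules"],
  "dialect": "c++20",
  "macros": {"EXAMPLELIB_MODULES": "1"},
  "requires": []
}"""
desc = parse_description(example, "/opt/libs/examplelib/library.json")
print(desc.module_sources)
print(interface_unit_flags(desc))
```

The directories in `module_sources` are what the build system would scan, as in scenario 1, to find the interface unit that defines an otherwise unresolved module.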
Scenario 3: External dependent packaged modules
This scenario is similar to scenario 2, but differs in that the external
package might include compatible pre-built BMIs that can be used to
1) obviate the need to build BMIs for module interface unit source files
included in the package, and
2) ensure that BMIs are used that are compatible with any object files or
static/dynamic libraries linked from the package.
Problems
- The package might provide BMIs for numerous tool chains, targets, and
build configurations and the build system will have to select the right
one.
- The package might not provide BMIs that are compatible with the consuming
tool chain, target, or build configuration.
- The build system and tool chain must collaborate to select BMIs that are
compatible with both the tool chain and build configuration
(e.g., any object files or static/dynamic libraries to be linked in).
- There is no obvious best solution to determine if a BMI is compatible at
a build configuration level.
Two BMIs that reflect minor declaration differences indicate the presence
of ODR differences, but do not necessarily indicate an ABI difference
that would be problematic in practice.
There is a practical benefit to not being overly strict.
Solution
- Extend the library description file from scenario 2 to include additional
tool chain, target, and build configuration dependent information
including:
  - Paths relative to the library description file that contain BMIs,
    object files, and static/dynamic libraries.
  - Maps from module name to the name of a BMI.
  - Linker options.
- Issue warnings/errors if the same module name is mapped to multiple
distinct BMIs
(e.g., to ones that do not reflect a consistent interface).
- Embed either a BMI or BMI signature for each imported module M
when building an object file.
At link time, compare BMIs or BMI signatures across link units.
Issue warnings/errors if mismatches are found among BMIs that are
compatible with the tool chain (ignore those that are not).
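
The selection and link-time checks described in this scenario can be sketched as follows. This is one possible reading under stated assumptions, not a prescribed design: compatibility is modeled as exact equality of tool chain, target, and build configuration strings (stricter than the text above suggests may be desirable), a BMI signature is treated as an opaque string, and all names (`BmiEntry`, `select_bmi`, `signature_mismatches`, the tool chain labels, and the file names) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BmiEntry:
    module: str
    toolchain: str  # opaque identifier of the tool chain that produced the BMI
    target: str
    config: str
    path: str


def select_bmi(entries, module, toolchain, target, config):
    """Return the path of the packaged BMI matching the request, or None.

    Raises ValueError when the same module maps to several distinct BMIs
    for one tool chain, target, and configuration.
    """
    wanted = (module, toolchain, target, config)
    paths = {e.path for e in entries
             if (e.module, e.toolchain, e.target, e.config) == wanted}
    if len(paths) > 1:
        raise ValueError(f"module {module}: conflicting BMIs {sorted(paths)}")
    return next(iter(paths), None)


def signature_mismatches(link_units, toolchain):
    """Find modules whose embedded BMI signatures differ across link units.

    link_units maps a link unit name to {module: (toolchain, signature)}.
    Signatures recorded by other tool chains are ignored.
    Returns {module: {signature: [link units]}} for mismatched modules only.
    """
    seen = {}
    for unit, embedded in link_units.items():
        for module, (unit_toolchain, signature) in embedded.items():
            if unit_toolchain != toolchain:
                continue
            seen.setdefault(module, {}).setdefault(signature, []).append(unit)
    return {m: by_sig for m, by_sig in seen.items() if len(by_sig) > 1}


entries = [
    BmiEntry("M", "toolchain-a", "x86_64-linux", "release", "bmi/M.a.bmi"),
    BmiEntry("M", "toolchain-b", "x86_64-linux", "release", "bmi/M.b.bmi"),
]
print(select_bmi(entries, "M", "toolchain-b", "x86_64-linux", "release"))

units = {
    "app.o": {"M": ("toolchain-a", "sig-1")},
    "lib.o": {"M": ("toolchain-a", "sig-2")},
    "other.o": {"M": ("toolchain-b", "sig-3")},
}
print(signature_mismatches(units, "toolchain-a"))
```

When `select_bmi` returns `None`, the build system would fall back to building a BMI from the packaged module interface unit source files, as in scenario 2.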