File Mapping

Because many of the data files used by this library are very large (the FASTA indexes in particular are typically around 13 GB), we use file mapping rather than loading them into memory.

Boost.Interprocess provides several facilities for file mapping. They also allow two processes to share access to the same file without each keeping its own copy of the data.

On modern hardware, we should avoid reading files in the traditional manner (copying them into buffers with read calls) unless we are processing the data sequentially. Databases and other random-access files are usually mapped into the process address space, so the operating system's disk cache is visible directly to the process. Because mapped pages are backed by the file itself rather than by the swap file, this also lets us work with far more data than the size of the swap file would otherwise allow.

Because of this, most classes come with mapped_* versions that use an external memory image to represent their data. We construct the data with the non-mapped version and write its binary image to disk. The disk image can then be mapped any number of times with the mapped_* classes, or, for example, stored in a managed object in Python.