Alternative compilation model for C

Thiago Adams

2024-05-20 20:03:33 UTC

This is an old post of mine

https://groups.google.com/g/comp.lang.c/c/-mv5UOpTM2U/m/ewJZtnIjBgAJ?pli=1

I wrote:
"
I was thinking of a new compilation model for C (just to experiment)
without changing the language syntax.

It may break some code, so this compilation mode is optional.

For #include, instead of text inclusion, it does something like "symbol import".

Today in C if we do

#define X
int f(){}
#include "file.h"

The macro X and function f can be used inside file.h because the text is
expanded and everything is in the same scope.

Although this is possible, a common practice is for headers to include
the other headers that define what they need and not rely on the external
scope. The exception is global macros, like DEBUG for instance.

So one difference of this compilation model is that #include is NOT text
inclusion. It is "import symbols" and the external context is not
accessible.

Another common practice in C is to use include guards to avoid including
the same symbols twice. In this model this is the default, the symbols
are included just once.

Instead of compiling one source at a time, this model can compile many
sources, and the parsed header files can be kept in memory to be
included again without re-parsing. This works because, unlike normal C
text inclusion, the header file does not change depending on where
it is included.

To solve the global macro usage, like DEBUG, the compiler settings work
as if they were the first include, but this is implicit. We can also
have our own config.h that is included in each file. Actually, some C
projects already do this.

I forgot to say something: my idea is also to move all the preprocessor
phases into the compile phase, again in a way that does not break code (at
least keeping a big common subset that works in both models).

This process of moving the preprocessor phases has a lot of details, and
each problem deserves its own topic. Macro expansion is the difficult part.

How could this work? Back to #include: include is now handled in the parser phase.

When the compiler finds #include "file.h" it checks whether the file is
already loaded (parsed). If yes, the symbols are injected into the current
context.

If not, the file is loaded (parsed) first with an empty context (as in the
initial sample, X and f are not present) and then its symbols are injected.

Files included inside included files will also inject symbols into the
current context. Something like a private include could also be considered,
but then the source could not be used in the old model.

One way to implement this is a for-loop injecting each symbol into the
external scope. If the symbol already exists then it is an error.

Another way I was thinking of is to inject the whole new scope at once,
making the global scope a collection of scopes. But I need to check whether
each symbol already exists anyway, so that does not help much.
"
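The include step described in that old post could be sketched roughly as below. This is only an illustration under my own assumptions, not an existing implementation: the names (struct header, struct scope, scope_import) and the fixed array sizes are made up, and symbols are modeled as plain strings instead of real declarations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS  64
#define MAX_HEADERS  16
#define MAX_PER_FILE 8

/* Hypothetical model of a header already parsed with an empty context.
   Symbols are plain names here; a real compiler would store declarations. */
struct header {
    const char *name;
    const char *symbols[MAX_PER_FILE];
    size_t symbol_count;
    const struct header *includes[MAX_PER_FILE]; /* headers it includes itself */
    size_t include_count;
};

/* Hypothetical scope of the source file currently being compiled. */
struct scope {
    const char *symbols[MAX_SYMBOLS];
    size_t symbol_count;
    const struct header *imported[MAX_HEADERS];
    size_t imported_count;
};

static bool scope_has_symbol(const struct scope *s, const char *name)
{
    for (size_t i = 0; i < s->symbol_count; i++)
        if (strcmp(s->symbols[i], name) == 0)
            return true;
    return false;
}

static bool scope_has_header(const struct scope *s, const struct header *h)
{
    for (size_t i = 0; i < s->imported_count; i++)
        if (s->imported[i] == h)
            return true;
    return false;
}

/* Handles one #include. Returns false when a symbol already exists. */
bool scope_import(struct scope *s, const struct header *h)
{
    if (scope_has_header(s, h))
        return true;                      /* imported before: nothing to do */

    assert(s->imported_count < MAX_HEADERS);
    s->imported[s->imported_count++] = h; /* mark first, so cycles terminate */

    /* Nested includes inject into the same (current) scope. */
    for (size_t i = 0; i < h->include_count; i++)
        if (!scope_import(s, h->includes[i]))
            return false;

    /* The for-loop from the post: inject each symbol, error on duplicates. */
    for (size_t i = 0; i < h->symbol_count; i++) {
        if (scope_has_symbol(s, h->symbols[i]))
            return false;
        assert(s->symbol_count < MAX_SYMBOLS);
        s->symbols[s->symbol_count++] = h->symbols[i];
    }
    return true;
}
```

Importing the same header twice is a no-op, which gives the include-guard behaviour by default, while two different headers declaring the same name make scope_import fail. Real C allows compatible redeclarations, so an actual compiler would probably need a softer check than this plain name comparison.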

I just discovered that C++20 has something called Header Units.

"Header units were introduced in C++20 as a way to temporarily bridge
the gap between header files and modules. They provide some of the speed
and robustness benefits of modules, while you migrate your code to use
modules."

See
https://learn.microsoft.com/en-us/cpp/build/compare-inclusion-methods?view=msvc-170

"Header units are the recommended alternative to precompiled header
files (PCH). Header units are easier to set up and use, are
significantly smaller on disk, provide similar performance benefits, and
are more flexible than a shared PCH."

https://learn.microsoft.com/en-us/cpp/build/walkthrough-header-units?view=msvc-170

"What is a header unit

A header unit is a binary representation of a header file. A header unit
ends with an .ifc extension. The same format is used for named modules.

An important difference between a header unit and a header file is that
a header unit isn't affected by macro definitions outside of the header
unit. That is, you can't define a preprocessor symbol that causes the
header unit to behave differently. By the time you import the header
unit, the header unit is already compiled. That's different from how an
#include file is treated. An included file can be affected by a macro
definition outside of the header file because the header file goes
through the preprocessor when you compile the source file that includes it.

Header units can be imported in any order, which isn't true of header
files. Header file order matters because macro definitions defined in
one header file might affect a subsequent header file. Macro definitions
in one header unit can't affect another header unit."

So it is very similar to what I was suggesting.
The only difference is that it is a binary file. The binary format only
makes a difference in the speed of the first load: if the compiler is
compiling many files in one run, the extra time to parse a text header is
paid just once for all of them.
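To make the "paid just once" point concrete, here is a small sketch of an in-memory header cache shared by several sources compiled in the same run. Everything in it is hypothetical (header_cache, cache_get, compile_source), and "parsing" is reduced to counting, since the only point is how many times each header gets parsed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_CAPACITY 32

/* Hypothetical cache entry: one header parsed with an empty context. */
struct parsed_header {
    const char *path;
};

/* Hypothetical cache that lives for the whole compiler run. */
struct header_cache {
    struct parsed_header entries[CACHE_CAPACITY];
    size_t count;
    int total_parses; /* how many times a header text was really parsed */
};

/* Returns the parsed header, parsing it only on the first request. */
const struct parsed_header *cache_get(struct header_cache *c, const char *path)
{
    for (size_t i = 0; i < c->count; i++)
        if (strcmp(c->entries[i].path, path) == 0)
            return &c->entries[i];        /* reuse, no re-parsing */

    assert(c->count < CACHE_CAPACITY);
    struct parsed_header *h = &c->entries[c->count++];
    h->path = path;
    c->total_parses++;                    /* stand-in for the real parse */
    return h;
}

/* "Compiles" one source: here it only resolves the headers it includes. */
void compile_source(struct header_cache *c, const char **includes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        (void)cache_get(c, includes[i]);
}
```

If a.c includes list.h and config.h, and b.c includes config.h, list.h and map.h, compiling both against the same cache parses three headers in total instead of five. This is valid only because, in this model, a header does not depend on the context where it is included.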

So something much simpler than modules, and simpler than binary files,
is plain header reuse, just as I was suggesting.
I would also keep #include and add a pragma to select the behaviour,
"old mode" or "new mode", instead of using import.

Also:
Progress Report: Adopting Header Units in Microsoft Word