Post by Kaz Kylheku
Post by David Brown
You should read the footnotes to 5.1.1.2 "Translation phases".
Footnotes are not normative, but they are helpful in explaining the
meaning of the text. They note that compilers don't have to follow the
details of the translation phases, and that source files, translation
units, and translated translation units don't have to have one-to-one
correspondences.
Yes, I'm aware of that. For instance preprocessing can all be jumbled
into one process. But it has to produce that result.
Even if translation phases 7 and 8 are combined, the semantic analysis
of the individual translation unit has to appear to be settled before
linkage. So for instance a translation unit could incrementally emerge
from the semantic analysis steps, and those parts of it already analyzed
(phase 7) could start to be linked to other translation units (phase 8).
Again, you are inferring far too much here. The standard is /not/
limiting like this.
Compilers can make use of all sorts of additional information. They
have always been able to do so. They can use extra information provided
by compiler extensions - such as gcc attributes. They can use
information from profiling to optimise based on real-world usage. They
can analyse source code files and use that analysis for optimisation
(and hopefully also static error checking).
Consider this:
A compiler can happily analyse each source code file in all kinds of
ways, completely independently of what the C standards describe (or perhaps, by
happy coincidence, using the same types of pre-processing and
interpretation). This analysis can be stored in files or some other
storage place. Do you agree that this is allowed, or do you think the C
standards somehow ban it? Note that we are calling this "analysis" -
not C compilation.
Now the compiler starts the "real" compilation, passing through the
translation phases one by one. When it gets to phase 7, it reads all
this stored analysis information. (Nothing in the standards says the
compiler can't pull in extra information - it is quite normal, for
example, to pull in code snippets as part of the compilation process.)
For each translation unit, it produces two outputs (in one "fat" object
file) - one part is a relatively dumb translation that does not make use
of the analysis, the other uses the analysis information to generate
more optimal code. Both parts make up the "translator output" for the
translation unit. Again, can you point to anything in the C standards
that would forbid this?
Then we come to phase 8. The compiler (or linker) reads all the
"translator output" files needed for the complete program. It checks
that it has the same set of input files as were used during the
pre-compilation analysis. If they are all the same, then the analysis
information about the different units is valid, and thus the
optimisations using that extra information are valid. The "dumb
translation" versions can be used as a fallback if the analysis was not
valid - otherwise they are thrown out, and the more optimised versions
are linked together.
There is nothing in the description of the translation phases that
hinders this. All the compiler has to do is ensure that the final
program - not any individual translation units - has correct observable
behaviour.
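The scheme described above can be modelled in a few lines of C. This is only a sketch under stated assumptions: the struct, the fingerprint value and every function name here are hypothetical, and no real toolchain is claimed to work this way. It shows a "fat" translator output holding two translations, with the phase 8 step choosing the optimised one only when the stored analysis still matches the inputs being linked.

```c
#include <assert.h>

typedef int (*unit_fn)(int);

/* Hypothetical model of one "fat" translator output: two translations of
   the same function plus a fingerprint of the inputs the analysis saw. */
struct fat_unit {
    unit_fn baseline;              /* translated without cross-unit knowledge */
    unit_fn optimised;             /* translated using the stored analysis */
    unsigned long analysed_inputs; /* fingerprint recorded during analysis */
};

/* Stand-in for phase 8: use the optimised translation only if the analysis
   still describes the set of inputs being linked. */
unit_fn select_translation(const struct fat_unit *u, unsigned long linked_inputs)
{
    return (u->analysed_inputs == linked_inputs) ? u->optimised : u->baseline;
}

/* Two translations of "return helper(x) + 1;", where helper(x) is x * 2. */
static int helper(int x)      { return x * 2; }
static int f_baseline(int x)  { return helper(x) + 1; } /* models keeping the call */
static int f_optimised(int x) { return x * 2 + 1; }     /* models folding it away */

/* Link against a given input fingerprint and run the selected translation. */
int demo_select(unsigned long linked_inputs, int arg)
{
    const struct fat_unit u = { f_baseline, f_optimised, 0xC0FFEEul };
    return select_translation(&u, linked_inputs)(arg);
}
```

Whichever translation is selected, demo_select returns the same value for the same argument, which is the only property the final program has to preserve.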
I would also refer you to section 1 of the C standards - "Scope". In
particular, note that "This document does /not/ specify the mechanism by
which C programs are transformed for use by a data-processing system".
(Emphasis mine.) The workings of the compiler are not part of the standard.
Post by Kaz Kylheku
I'm just saying that certain information leakage is clearly permitted,
regardless of how the phases are integrated.
Post by David Brown
The standard also does not say what the output of "translation" is - it
does not have to be assembly or machine code. It can happily be an
internal format, as used by gcc and clang/llvm. It does not define what
"linking" is, or how the translated translation units are "collected
into a program image" - combining the partially compiled units,
optimising, and then generating a program image is well within that
definition.
Post by Kaz Kylheku
(That can be inferred
from the rules which forbid semantic analysis across translation
units, only linkage.)
The rules do not forbid semantic analysis across translation units -
they merely do not /require/ it. You are making an inference without
any justification that I can see.
Translation phase 7 is clearly about a single translation unit:
"The resulting tokens are syntactically and semantically analyzed
and translated as a translation unit."
Not: "as a combination of multiple translation units".
The point is that many things are local to a translation unit, such as
statics, type definitions, and so on. These are valid within the
translation unit (within their scope, of course), and independent of
identically named items in other translation units. It is about
defining a kind of "unit of compilation" for the language semantics - it
is /not/ restricting the behaviour of a compiler.
LTO does not change the language semantics in any way. The language
semantics determine the observable behaviour of the program, and we have
already established that this must be unchanged. Generated instructions
for a target are not part of the language semantics.
Post by Kaz Kylheku
5.1.1.1 clearly refers to "[t]he separate translation units of a
program".
It does so all in terms of what a compiler /may/ do.
And there is never any specification of the result of a "translation".
It can happily be byte-code, or internal toolchain-specific formats.
Post by Kaz Kylheku
LTO pretends that the program is still divided into the same translation
units, while mingling them together in ways contrary to all those
chapter 5 descriptions.
No.
Post by Kaz Kylheku
The conforming way to obtain LTO is to actually combine multiple
preprocessing translation units into one.
You could do that if you like (after manipulating things to handle
statics, type definitions, etc.).
And you would then find that if "foo()" in "foo.c" called "bar()" in
"bar.c", the call to "bar()" might be inlined, or omitted, or otherwise
optimised, just as it could be if they were both defined in the same
translation unit.
The result would be the same kind of object code as you get with LTO -
one in which the observable behaviour is as expected, but you might get
different details in the generated code.
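A minimal sketch of the single-translation-unit case follows. The thread only names foo() and bar(); the bodies below are hypothetical. Because bar() does no I/O and touches no volatile object, a compiler is free to emit a real call, inline the body, or fold the expression, and the caller cannot tell the difference from the result.

```c
#include <assert.h>

/* Hypothetical body for bar(): no I/O and no volatile access, so calling it
   contributes nothing to observable behaviour beyond its return value. */
static int bar(int x)
{
    return x + 1;
}

/* foo() calls bar() in the source. The generated code may contain a call,
   an inlined copy of bar(), or simply the equivalent of "x * 2 + 2". */
int foo(int x)
{
    return bar(x) * 2;
}
```

With gcc or clang, comparing the assembly produced at -O0 and -O2 for this file is one way to see the call appear and disappear while the results stay identical.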
I don't know why you would think that this kind of combination of units
is conforming, but LTO is not. It's all the same thing in principle -
the only difference is that real-world implementations of LTO are
designed to be scalable, do as much as possible in parallel, and avoid
re-doing work for files that don't change.
Some link-time optimisation or "whole program optimisation" toolchains
are aimed at small code bases (such as might fit into a small
microcontroller) and combine all the source code together then handle it
all at once. Again, the principles and the semantics are not any
different from gcc LTO - it's just a different way of splitting up the work.
Post by Kaz Kylheku
Post by David Brown
Post by Kaz Kylheku
That's why we can have a real world security issue caused by zeroing
being optimized away.
No, it is not. We have real-world security issues for all sorts of
reasons, including people mistakenly thinking they can force particular
types of code generation by calling functions in different source files.
In fact, that code generation is forced, when people do not use LTO,
which is not enabled by default.
No, it is not.
The C standards don't talk about LTO, or whether or not it is enabled,
or what is "default", or even what kind of code generation you get.
If the compiler knows that a function call will not have or affect
observable behaviour, it can omit that call. It does not matter how it
knows this. LTO is a very practical way to get this information, but it
might not be the only way. Profile-guided optimisation information may
provide the same information. So could attributes given in the function
declaration (and a future C standard will likely support such attributes).
But if the compiler doesn't know for sure that it is safe to omit the
call, then it must generate it. Correctness trumps optimisation!
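As an illustration of the attribute route, here is a sketch using the gcc/clang "const" function attribute. The names square, use_square and ATTR_CONST are hypothetical, and the macro falls back to nothing on compilers without the extension. With only the declaration visible, the attribute already tells the compiler that the call has no side effects and depends only on its argument.

```c
#include <assert.h>

#if defined(__GNUC__)
#define ATTR_CONST __attribute__((const))
#else
#define ATTR_CONST /* no-op on compilers without the extension */
#endif

/* Declaration as a caller would see it in a header: the definition could
   live in another source file that the compiler never looks at. */
ATTR_CONST int square(int x);

int use_square(int x)
{
    (void)square(x);              /* result unused: the call may be omitted */
    return square(x) + square(x); /* may be computed with a single call */
}

/* Definition, placed here only to keep the example self-contained. */
int square(int x)
{
    return x * x;
}
```

Without the attribute, and without any other knowledge of square(), the compiler would have to emit all three calls, because it could not rule out side effects.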
Post by Kaz Kylheku
Post by David Brown
Post by Kaz Kylheku
The rules spelled out in ISO C allow us to unit test a translation
unit by linking it to some harness, and be sure it has exactly the
same behaviors when linked to the production program.
No, they don't.
If the unit you are testing calls something outside that unit, you may
get different behaviours when testing and when used in production.
Yes; if you do nonconforming things.
No one is suggesting doing "nonconforming things".
To give a simple example, suppose your unit is intended to perform some
calculations and then call a callback with the result. In a test
harness, you would provide a callback that checks the result against the
expected value, and provides a pass/fail log message. In production
use, you would provide a callback that pops up a window with the value,
or sends it in an email to the user. The observable behaviour of the
production program and the test program is very different.
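The callback arrangement can be sketched as follows. All names are hypothetical, and the harness callback records a pass/fail flag rather than writing a log message, to keep the example free of I/O. The unit under test is compute_sum; a production build would pass it a different callback, which is exactly why the two programs have different observable behaviour.

```c
#include <assert.h>

/* Callback type: receives the computed result plus caller context. */
typedef void (*result_cb)(int result, void *ctx);

/* The unit under test: performs a calculation, then reports the result. */
void compute_sum(const int *values, int count, result_cb report, void *ctx)
{
    int sum = 0;
    for (int i = 0; i < count; i++)
        sum += values[i];
    report(sum, ctx);
}

/* Test-harness side: compare the result against an expected value. */
struct expectation {
    int expected;
    int passed;
};

void check_result(int result, void *ctx)
{
    struct expectation *e = ctx;
    e->passed = (result == e->expected);
}

/* Runs one test case; returns 1 on pass, 0 on fail. */
int run_sum_test(void)
{
    int data[] = { 1, 2, 3, 4 };
    struct expectation e = { 10, 0 };
    compute_sum(data, 4, check_result, &e);
    return e.passed;
}
```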
In fact, unless you are testing the production version, or you are
producing a test harness, you would normally expect very different
observable behaviours from any unit testing and real usage of the code.
Post by Kaz Kylheku
Post by David Brown
The only thing you can be sure of from testing is that if you find a bug
during testing, you have a bug in the code. You can never use testing
to be sure that the code works (with the exception of exhaustive testing
of all possible inputs, which is rarely practical).
LTO will break translation units that are simple enough to be trivially
proven to have a certain behavior.
Again, claiming this will not make it true. You need to update your
ideas about what observable behaviour actually is.
Post by Kaz Kylheku
Post by David Brown
Post by Kaz Kylheku
If I have some translation unit in which there is a function foo, such
that when I call foo, it then calls an external function bar, that's
observable.
5.1.2.3p6 lists the three things that C defines as "observable
behaviour". Function calls - internal or external - are not amongst these.
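The three items on the standard's list are volatile accesses, the data written to files by program termination, and the input and output dynamics of interactive devices. The sketch below uses the first of these for the zeroing case raised earlier in the thread. It assumes, as mainstream compilers do, that a store through a volatile-qualified lvalue is treated as a volatile access; the names wipe and wipe_demo are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Zero a buffer through a volatile-qualified pointer. Each store is a
   volatile access, so the compiler may not discard the stores as dead even
   if the buffer is never read again. A plain memset() carries no such
   guarantee, however many translation units separate it from the caller. */
void wipe(void *buf, size_t len)
{
    volatile unsigned char *p = buf;
    while (len--)
        *p++ = 0;
}

/* Returns 1 if the buffer is all zero after wiping, 0 otherwise. */
int wipe_demo(void)
{
    unsigned char key[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    wipe(key, sizeof key);
    for (size_t i = 0; i < sizeof key; i++)
        if (key[i] != 0)
            return 0;
    return 1;
}
```

The point of the design is that the guarantee comes from something the standard does list as observable, not from where the function happens to be defined.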
External calls are de facto observable,
The phrase "de facto" is an admission that you understand that none of
this is part of the /actual/ standards. You have dropped from "the
official standards make this clear" down to "I think this".
Post by Kaz Kylheku
because we have it for granted
when we have a translation unit that calls a certain function, we can
supply another translation unit which supplies that function. In
that function we can communicate with the host environment to confirm
that it was called.
All such boundaries are lost in the link stage, before observable
behaviour becomes relevant.
Post by Kaz Kylheku
Post by David Brown
Post by Kaz Kylheku
I can link that unit to a program which supplies bar,
containing a printf call, then call foo and verify that the printf call
is executed.
Yes, you can. The printf call - or, more exactly, the "input and output
dynamics" - are observable behaviour. The call to "bar", however, is not.
If bar does not call the function, then the observable behavior of
printf doesn't occur either; they are linked by logic / cause-and-effect.
Nonsense.
The compiler-generated code must produce the correct observable
behaviour. It can do that however it likes. It can put a call to
"printf" directly in "foo". It can replace the "printf" with a "puts"
or a series of target-specific "write_a_char" calls if the results are
the same.
C is defined in terms of behaviour, not particular instruction
sequences. If you write "x = y * 4;", the compiler can generate
instructions that look like "x = y + y + y + y;", or "x = y * 2; x = x +
y + y;", or "x = (y << 3) - (2 * y + 3 * y - y);", or anything it likes
as long as the result is correct (and obviously avoiding any extra
overflows).
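The equivalence can be checked directly. In this sketch (function names are hypothetical) each function computes y * 4 a different way; the shift variant is written as (y << 3) minus 4 * y and uses unsigned arithmetic so that it is well defined for every input.

```c
#include <assert.h>

/* Four ways to compute y * 4. For inputs where no intermediate result
   overflows, all of them return the same value, so a compiler may pick
   whichever is cheapest on the target. */
int times4_mul(int y)   { return y * 4; }
int times4_add(int y)   { return y + y + y + y; }
int times4_steps(int y) { int x = y * 2; x = x + y + y; return x; }

/* 8y - (2y + 3y - y) = 4y; unsigned wraparound keeps this well defined. */
unsigned times4_shift(unsigned y)
{
    return (y << 3) - (2 * y + 3 * y - y);
}
```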
Post by Kaz Kylheku
A behavior that is not itself formally classified as observable can be
discovered by logical linkage to be necessary for the production of
observable behavior. It can be an "if, and only if" linkage.
If an observable behavior B occurs if, and only if, some behavior A
occurs, then the fact of whether A occurs or not is de facto observable.
Calling it "de facto observable behaviour" is just confusing your
understanding here. But you can well say that if B is observed, that
means A must have happened.
However, you have not in any way shown that A (in this case,
instructions to call the function "bar") is the only way to result in
the observable behaviour.
Post by Kaz Kylheku
Post by David Brown
The compiler, when compiling the source of "foo", will include a call to
"bar" when it does not have the source code (or other detailed semantic
information) for "bar" available at the time.
Translation phases 1 to 7 forbid processing material from another
translation unit.
Nope.
Post by Kaz Kylheku
Conforming semantic analysis of a translation unit has
nothing but that translation unit.
Nope.
Post by Kaz Kylheku
Post by David Brown
But you are mistaken to
think it does so because the call is "observable" or required by the C
standard.
Sure; let's say that the call can be tied to observable behavior
elsewhere such that the call occurs if and only if the observable
behavior occurs.
That would be a better way to put it. But it is still not the case here.
Post by Kaz Kylheku
Post by David Brown
It does so because it cannot prove that /running/ the
function "bar" contains no observable behaviour, or otherwise affects
the observable behaviour of the program. The compiler cannot skip the
call unless it can be sure it is safe to do so - and if it knows nothing
about the implementation of "bar", it must assume the worst.
The compiler cannot do any of this if it is in a conforming mode.
The compiler can omit the call to "bar" if it is sure that it results in
no observable behaviour. It cannot omit it if it is not sure of this.
It is /that/ simple.
Post by Kaz Kylheku
But sure, in the nonconforming LTO paradigm, which does have to adhere
to sane rules, that more or less follow what would have to happen if
multiple preprocessing translation units were merged at the token level
and thus analyzed together.
Post by David Brown
Sometimes the compiler may have additional information - such as if the
function is declared with the gcc "const" or "pure" attributes (or the
standardised "unsequenced" and "reproducible" attributes in C23).
If the declarations are available only in another translation unit,
they cannot be taken into account when analyzing this translation unit.
Wrong.
This is really the crux of your misunderstandings. You have read
between the lines of the standard and imagined rules that don't exist.
Once you realise that they are imaginary, I expect the rest to fall into
place.
Post by Kaz Kylheku
Post by David Brown
Post by Kaz Kylheku
Since ISO C says that the semantic analysis has been done (that
unit having gone through phase 7), we can take it for granted as a
done-and-dusted property of that translation unit that it calls bar
whenever its foo is invoked.
No, we can't - see above. Nothing in the C standards forbids any
additional analysis, or using other information in code generation.
Any semantic analysis performed must be that which is stated in translation
phase 7, which happens for one translation unit, before considering
linkage to other translation units.
What forbids it is that no semantic analysis activity is described as
taking place in translation phase 8, other than linkage.
The C standards also don't describe drinking coffee while waiting for
the compiler. Just because something is not mentioned, does not mean it
is forbidden!
Post by Kaz Kylheku
Post by David Brown
Post by Kaz Kylheku
Post by Keith Thompson
Say I have a call to foo in main, and the definition of foo is in
another translation unit. In the absence of LTO, the compiler will have
to generate a call to foo. If LTO is able to determine that foo doesn't
do anything, it can remove the code for the function call, and the
resulting behavior of the linked program is unchanged.
There are always situations in which optimizations that have been forbidden
don't cause a problem, and are even desirable.
Can you give examples?
You already mentioned "-ffast-math" (and by implication, its various
subflags in gcc, clang and icc). These are clearly documented as
allowing some violations of the C standards (and not least, the IEEE
floating point standards, which are stricter than those of C).
Yes, and some people want that, learn how it works, and get their
programs working with it, all the while knowing that it's
nonconforming to IEEE and ISO C.
Indeed. I am "some people" in this context.
Post by Kaz Kylheku
Another tool in the box.
Agreed.
But "-ffast-math" was already covered, and is irrelevant precisely
because it is entirely clear that it is potentially standards-violating.
(But it is not "forbidden". I have yet to see any ISO C police
enforcers at my office door, waving a warrant.)
I wanted to know if you had other examples of what you see as
standards-violating optimisations that are not documented as such.
Post by Kaz Kylheku
Post by David Brown
(While I don't much like an "appeal to authority" argument, I think it's
worth noting that the major C / C++ compilers, gcc, clang/llvm and MSVC,
all support link-time optimisation. They also all work together with
both the C and C++ standards committees. It would be quite the scandal
if there were any truth in your claims and these compiler vendors were
all breaking the rules of the languages they help to specify!)
Why would it be?
It would run counter to the whole point of having a standard.
Post by Kaz Kylheku
In the first place, all the implementations you mention have to be
explicitly put into a nondefault configuration in order to resemble
conforming ISO C implementations.
Yes, but they are clear about that. (At least, gcc is - I haven't read
the documentation for clang as thoroughly, and have barely touched MSVC.)
It is absolutely fine for a compiler to have conforming and
non-conforming modes. But it is /not/ fine for it to have a major part
of its optimisation that is as critically non-conforming as you seem to
believe, and not even mention this fact.
Post by Kaz Kylheku
LTO is not even enabled by default (for good reasons).
The good reasons are that not all setups support it (it needs particular
linkers), it can significantly increase build times, it makes some kinds
of debugging nearly impossible, it plays badly with other tools such as
profilers and code coverage analysis, and you can have trouble if you
are doing weird things with compiler and linker file interaction or some
other kinds of non-standard C coding.
And like many optimisations, it can change the behaviour of incorrect
code that happens to work by luck with different choices of optimisation
settings.
Those are all very good reasons for not enabling it by default, when
the results are often only a few percent improvement in efficiency (for
some code, it can be a lot more helpful).
Most compilers don't enable /any/ significant optimisation by default.
Post by Kaz Kylheku
A few goofballs who maintain GNU/Linux distros are turning on LTO for
compiling upstream packages whose development they know nothing about
beyond ./configure && make. (Luckily, the projects themselves can take
countermeasures to defend against this.)
I think the fact that LTO is almost certainly nonconforming deserves
more attention, but not panic or anything like that.
If it /were/ nonconforming, I think that would deserve huge attention.
But it is not.
Post by Kaz Kylheku
LTO should be made into a conforming feature that is optional.
Translation phase 8 can be split into 8 and 9. In 8, translation units
would be optionally partitioned into subsets. Each subset containing
two or more translation units would be subjected to further semantic
analysis, as a group, and turned into a subset translation unit.
Phase 9 would be the same as the former phase 8.
Whether an implementation supports subsetting and the manner in which
units are indicated for subsetting would be implementation-defined, but
it would be clear that there is a semantic difference, and that each
implementation must support a translation mode in which the subsetting
isn't performed.