Discussion:
C Macros Badly Defined?
bartc
2017-05-15 10:46:04 UTC
Take a look at this macro invocation:

#include <stdio.h>

#define M(a,b,c) printf("%d %d %d\n",(a),(b),(c));
#define C 2

int main(void) {

#if C>=3
M(10,20,
#else
M(100,200,
#endif
30)
}

It either calls printf with arguments 10,20,30 or 100,200,30. But it
splits the macro invocation with conditionals, which causes compile
errors with pelles c, lccwin, dmc, and MSVC. It compiles with gcc and
(surprisingly) tiny C. Also (using online compilers) with clang.

The thing is that every so often you come across troublesome macros like
this that only work on some compilers. But why should that be the case?
Are C's preprocessor and macro expansion rules really so poorly defined
that so many compilers get it wrong? (I certainly thought so when I
tried to implement a preprocessor earlier this year.)

Maybe you can get more consistent behaviour by doing multiple passes,
so that all the #ifs are done first for example, then the macro expansion.

But then maybe someone will have a macro expansion that generates
#-directives, or other macro invocations created from parts joined
together with ## or that use #-stringifying, that gcc will somehow
manage to compile as expected! Then that becomes the benchmark for what
is expected to work.

So, does anyone actually know EXACTLY what the capabilities of the C
macro system are? Or do compilers just make them up as they go along?
With gcc in the lead. (I don't intend to make this work in my own
implementation. I believe there should be clearly-defined limits to what
is possible and what is considered reasonable.)

(This is not a made-up example; the following was posted in
comp.lang.python today in "How to install Python package from source on
If you're using 3.6, you'll have to build from source. The package has
a single C extension without external dependencies, so it should be a
straight-forward build if you have Visual Studio 2015+ installed with
the C/C++ compiler for x86. Ideally it should work straight from pip.
But I tried and it failed in 3.6.1 due to the new PySlice_GetIndicesEx
macro. Apparently MSVC doesn't like preprocessor code like this in
#if PY_MAJOR_VERSION >= 3
if (PySlice_GetIndicesEx(item, Py_SIZE(self),
#else
if (PySlice_GetIndicesEx((PySliceObject*)item, Py_SIZE(self),
#endif
&start, &stop, &step, &slicelength) < 0) {
It fails with a C1057 error (unexpected end of file in macro
expansion). The build will succeed if you copy the common line with
`&start` to each case and comment out the original line, such that the
macro invocation isn't split across an #if / #endif. This is an ugly
consequence of making PySlice_GetIndicesEx a macro. I wonder if it
could be written differently to avoid this problem.
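A minimal sketch of that workaround, applied to the toy macro from the start of this post rather than to the CPython code. SELECT, the extra buffer parameters of M, and format_triple are illustrative names that do not appear in the original; M writes into a buffer here only so that the result can be checked.

```c
#include <stdio.h>

/* SELECT plays the role of C in the post; the name is an assumption. */
#define SELECT 2

/* Same shape as the post's M, but formatting into a caller-supplied buffer. */
#define M(buf, n, a, b, c) snprintf((buf), (n), "%d %d %d", (a), (b), (c))

int format_triple(char *buf, size_t n)
{
    /* Each branch holds a complete invocation, so no directive falls
       between the macro's opening and closing parentheses. */
#if SELECT >= 3
    return M(buf, n, 10, 20, 30);
#else
    return M(buf, n, 100, 200, 30);
#endif
}
```

The cost is that the common argument is repeated in both branches, which is the duplication the quoted message calls ugly.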
--
bartc
Ben Bacarisse
2017-05-15 11:32:30 UTC
Post by bartc
#include <stdio.h>
#define M(a,b,c) printf("%d %d %d\n",(a),(b),(c));
#define C 2
int main(void) {
#if C>=3
M(10,20,
#else
M(100,200,
#endif
30)
}
It either calls printf with arguments 10,20,30 or 100,200,30. But it
splits the macro invocation with conditionals, that causes compile
errors with pelles c, lccwin, dmc, and MSVC. It compiles with gcc and
(surprisingly) tiny C. Also (using online compilers) with clang.
Why is it surprising that tcc compiles it?
Post by bartc
The thing is that every so often you come across troublesome macros
like this that only work on some compilers. But why should that be the
case?
Compilers have bugs.
Post by bartc
Are C's preprocessor and macro expansion rules really so poorly
defined that so many compilers get it wrong?
Not, I think, in this case. It seems very clear. Do you think the
wording of the standard needs to be improved and, if so, how?

<snip>
Post by bartc
But then maybe someone will have a macro expansion that generates
#-directives
Macros can't (validly) expand to directives.

<snip>
--
Ben.
Tim Rentsch
2017-05-15 13:48:26 UTC
Post by Ben Bacarisse
Post by bartc
#include <stdio.h>
#define M(a,b,c) printf("%d %d %d\n",(a),(b),(c));
#define C 2
int main(void) {
#if C>=3
M(10,20,
#else
M(100,200,
#endif
30)
}
It either calls printf with arguments 10,20,30 or 100,200,30. But it
splits the macro invocation with conditionals, that causes compile
errors with pelles c, lccwin, dmc, and MSVC. It compiles with gcc and
(surprisingly) tiny C. Also (using online compilers) with clang.
Why is it surprising that tcc compiles it?
Post by bartc
The thing is that every so often you come across troublesome macros
like this that only work on some compilers. But why should that be the
case?
Compilers have bugs.
Post by bartc
Are C's preprocessor and macro expansion rules really so poorly
defined that so many compilers get it wrong?
Not, I think, in this case. It seems very clear. [...]
Does that mean you think the behavior is defined rather than
undefined? To me it looks like undefined behavior.
Ben Bacarisse
2017-05-15 14:18:09 UTC
Post by Tim Rentsch
Post by Ben Bacarisse
Post by bartc
#include <stdio.h>
#define M(a,b,c) printf("%d %d %d\n",(a),(b),(c));
#define C 2
int main(void) {
#if C>=3
M(10,20,
#else
M(100,200,
#endif
30)
}
<snip>
Post by Tim Rentsch
Post by Ben Bacarisse
Post by bartc
Are C's preprocessor and macro expansion rules really so poorly
defined that so many compilers get it wrong?
Not, I think, in this case. It seems very clear. [...]
Does that mean you think the behavior is defined rather than
undefined? To me it looks like undefined behavior.
I thought it was defined. Is it not?
--
Ben.
Tim Rentsch
2017-05-15 18:22:23 UTC
Post by Ben Bacarisse
Post by Tim Rentsch
Post by Ben Bacarisse
Post by bartc
#include <stdio.h>
#define M(a,b,c) printf("%d %d %d\n",(a),(b),(c));
#define C 2
int main(void) {
#if C>=3
M(10,20,
#else
M(100,200,
#endif
30)
}
<snip>
Post by Tim Rentsch
Post by Ben Bacarisse
Post by bartc
Are C's preprocessor and macro expansion rules really so poorly
defined that so many compilers get it wrong?
Not, I think, in this case. It seems very clear. [...]
Does that mean you think the behavior is defined rather than
undefined? To me it looks like undefined behavior.
I thought it was defined. Is it not?
Section 6.10.3 paragraph 11 (discussing arguments for function-like
macro calls) says this:

[...] If there are sequences of preprocessing tokens within the
list of arguments that would otherwise act as preprocessing
directives, the behavior is undefined.

I believe this sentence applies in the code shown above, which
therefore would mean undefined behavior.
Ben Bacarisse
2017-05-15 19:40:44 UTC
Post by Tim Rentsch
Post by Ben Bacarisse
Post by Tim Rentsch
Post by Ben Bacarisse
Post by bartc
#include <stdio.h>
#define M(a,b,c) printf("%d %d %d\n",(a),(b),(c));
#define C 2
int main(void) {
#if C>=3
M(10,20,
#else
M(100,200,
#endif
30)
}
<snip>
Post by Tim Rentsch
Post by Ben Bacarisse
Post by bartc
Are C's preprocessor and macro expansion rules really so poorly
defined that so many compilers get it wrong?
Not, I think, in this case. It seems very clear. [...]
Does that mean you think the behavior is defined rather than
undefined? To me it looks like undefined behavior.
I thought it was defined. Is it not?
Section 6.10.3 paragraph 11 (discussing arguments for function-like
[...] If there are sequences of preprocessing tokens within the
list of arguments that would otherwise act as preprocessing
directives, the behavior is undefined.
I believe this sentence applies in the code shown above, which
therefore would mean undefined behavior.
I'm not sure. It definitely applies to

M(
#if C>=3
10,20,
#else
100,200,
#endif
30)

but in the example above I am not sure the #else or the #endif are there
when the arguments are being collected. The wording in the standard
talks about the "group" (shown in the syntax as between the #else and
the #endif) being "processed". I took that to mean that the bounding
directives are no longer there when expansion happens, but I can see
(now) that that's not the only way to take it.
--
Ben.
Tim Rentsch
2017-05-16 08:58:38 UTC
Post by Ben Bacarisse
Post by Tim Rentsch
Post by Ben Bacarisse
Post by Tim Rentsch
Post by Ben Bacarisse
Post by bartc
#include <stdio.h>
#define M(a,b,c) printf("%d %d %d\n",(a),(b),(c));
#define C 2
int main(void) {
#if C>=3
M(10,20,
#else
M(100,200,
#endif
30)
}
<snip>
Post by Tim Rentsch
Post by Ben Bacarisse
Post by bartc
Are C's preprocessor and macro expansion rules really so poorly
defined that so many compilers get it wrong?
Not, I think, in this case. It seems very clear. [...]
Does that mean you think the behavior is defined rather than
undefined? To me it looks like undefined behavior.
I thought it was defined. Is it not?
Section 6.10.3 paragraph 11 (discussing arguments for function-like
[...] If there are sequences of preprocessing tokens within the
list of arguments that would otherwise act as preprocessing
directives, the behavior is undefined.
I believe this sentence applies in the code shown above, which
therefore would mean undefined behavior.
I'm not sure. It definitely applies to
M(
#if C>=3
10,20,
#else
100,200,
#endif
30)
but in the example above I am not sure the #else or the #endif are there
when the arguments are being collected. The wording in the standard
talks about processing the "group" (shown in the syntax as between the
#else and the #endif) being "processed". I took that to mean that the
bounding directive are no longer there when expansion happens, but I can
see (now) that that's not the only way to take it.
I see. The idea that the subsequent directives might have
disappeared (ie, by the time the macro arguments are collected)
had not occurred to me. I don't have an airtight argument that
it's wrong but it does seem highly unlikely (at least it does to
me) that this is what was intended. Do you have a different
take on that?

FWIW, gcc -pedantic gives a diagnostic on the original program
(and with gcc -pedantic-errors causing the translation to fail).
Ben Bacarisse
2017-05-16 12:49:58 UTC
Post by Tim Rentsch
Post by Ben Bacarisse
Post by Tim Rentsch
Post by Ben Bacarisse
Post by Tim Rentsch
Post by Ben Bacarisse
Post by bartc
#include <stdio.h>
#define M(a,b,c) printf("%d %d %d\n",(a),(b),(c));
#define C 2
int main(void) {
#if C>=3
M(10,20,
#else
M(100,200,
#endif
30)
}
<snip>
Post by Tim Rentsch
Post by Ben Bacarisse
Post by bartc
Are C's preprocessor and macro expansion rules really so poorly
defined that so many compilers get it wrong?
Not, I think, in this case. It seems very clear. [...]
Does that mean you think the behavior is defined rather than
undefined? To me it looks like undefined behavior.
I thought it was defined. Is it not?
Section 6.10.3 paragraph 11 (discussing arguments for function-like
[...] If there are sequences of preprocessing tokens within the
list of arguments that would otherwise act as preprocessing
directives, the behavior is undefined.
I believe this sentence applies in the code shown above, which
therefore would mean undefined behavior.
I'm not sure. It definitely applies to
M(
#if C>=3
10,20,
#else
100,200,
#endif
30)
but in the example above I am not sure the #else or the #endif are there
when the arguments are being collected. The wording in the standard
talks about processing the "group" (shown in the syntax as between the
#else and the #endif) being "processed". I took that to mean that the
bounding directive are no longer there when expansion happens, but I can
see (now) that that's not the only way to take it.
I see. The idea that the subsequent directives might have
disappeared (ie, by the time the macro arguments are collected)
had not occurred to me. I don't have an airtight argument that
it's wrong but it does seem highly unlikely (at least it does to
me) that this is what was intended. Do you have a different
take on that?
Nothing more than what I said: that the wording seems to suggest that
only the group is "processed", but that verb is not entirely clear. The
description seems to imply two phases -- a sort of parsing of the whole
conditional inclusion and then processing the selected group. It's not
even really an argument so much as an impression formed on first reading.

<snip>
--
Ben.
Tim Rentsch
2017-05-20 04:47:08 UTC
Post by Tim Rentsch
Post by Ben Bacarisse
Post by Tim Rentsch
Post by Ben Bacarisse
Post by Tim Rentsch
Post by Ben Bacarisse
Post by bartc
#include <stdio.h>
#define M(a,b,c) printf("%d %d %d\n",(a),(b),(c));
#define C 2
int main(void) {
#if C>=3
M(10,20,
#else
M(100,200,
#endif
30)
}
<snip>
Post by Tim Rentsch
Post by Ben Bacarisse
Post by bartc
Are C's preprocessor and macro expansion rules really so poorly
defined that so many compilers get it wrong?
Not, I think, in this case. It seems very clear. [...]
Does that mean you think the behavior is defined rather than
undefined? To me it looks like undefined behavior.
I thought it was defined. Is it not?
Section 6.10.3 paragraph 11 (discussing arguments for function-like
[...] If there are sequences of preprocessing tokens within the
list of arguments that would otherwise act as preprocessing
directives, the behavior is undefined.
I believe this sentence applies in the code shown above, which
therefore would mean undefined behavior.
I'm not sure. It definitely applies to
M(
#if C>=3
10,20,
#else
100,200,
#endif
30)
but in the example above I am not sure the #else or the #endif are there
when the arguments are being collected. The wording in the standard
talks about processing the "group" (shown in the syntax as between the
#else and the #endif) being "processed". I took that to mean that the
bounding directive are no longer there when expansion happens, but I can
see (now) that that's not the only way to take it.
I see. The idea that the subsequent directives might have
disappeared (ie, by the time the macro arguments are collected)
had not occurred to me. I don't have an airtight argument that
it's wrong but it does seem highly unlikely (at least it does to
me) that this is what was intended. Do you have a different
take on that?
Nothing more than what I said: that the wording seems to suggest that
only the group is "processed", but that verb is not entirely clear. The
description seems to imply two phases -- a sort of parsing of the whole
conditional inclusion and then processing the selected group. It's not
even really an argument so much an impression formed on first reading.
First let me explicitly acknowledge the last sentence there, and
say I don't consider the current discussion to be an argument.

To satisfy my own curiosity though, and me being the person I am,
I wondered if the question could be answered more resolutely, and
went back to dig into the Standard again. I believe the passage
below (section 5.1.1.2, p1, subparagraph 4) does that for us:

4. Preprocessing directives are executed, macro invocations
are expanded, and _Pragma unary operator expressions are
executed. If a character sequence that matches the syntax
of a universal character name is produced by token
concatenation (6.10.3.3), the behavior is undefined. A
#include preprocessing directive causes the named header
or source file to be processed from phase 1 through phase
4, recursively. All preprocessing directives are then
deleted.

Note the last sentence. Preprocessing directives are not deleted
until macro calls have finished being expanded. Thus collecting
arguments will have encountered a preprocessing directive (if
nothing else then the #endif, since #endif by itself is not a
group), which gives an answer that looks (to me) pretty airtight.

And now I'm sure you will be glad to return to more interesting
topics. :)
bartc
2017-05-15 13:52:45 UTC
Post by Ben Bacarisse
Post by bartc
#include <stdio.h>
#define M(a,b,c) printf("%d %d %d\n",(a),(b),(c));
#define C 2
int main(void) {
#if C>=3
M(10,20,
#else
M(100,200,
#endif
30)
}
It either calls printf with arguments 10,20,30 or 100,200,30. But it
splits the macro invocation with conditionals, that causes compile
errors with pelles c, lccwin, dmc, and MSVC. It compiles with gcc and
(surprisingly) tiny C. Also (using online compilers) with clang.
Why is it surprising that tcc compiles it?
Because the multi-pass approach that would be the only way I could think
of to get over some of these problems, is not compatible with very fast
compilation. Also tcc has bugs with other examples (so is not just
lifting the same preprocessor code from gcc).
Post by Ben Bacarisse
Post by bartc
The thing is that every so often you come across troublesome macros
like this that only work on some compilers. But why should that be the
case?
Compilers have bugs.
Yes, you might expect variability with compilers built with small teams
(or one-man efforts) such as Pelles C, lccwin, DMC. But then there is also
MSVC (apparently even the version with VS2015), which I doubt was
created with a small team.
Post by Ben Bacarisse
Post by bartc
Are C's preprocessor and macro expansion rules really so poorly
defined that so many compilers get it wrong?
Not, I think, in this case. It seems very clear. Do you think the
wording of the standard needs to be improved and, if so, how?
I would prefer that the possibilities were purposely kept simple. As it
is now, a macro call such as:

M(A,B)

where A and B are arbitrary expressions, could conceivably be split up
like this:

M .. ( .. A .. , .. B .. )

where I use .. to indicate points of discontinuity (and .. can occur
within A and B of course). ".." can include:

* Any combination of spaces, tabs and newlines
* Any // or /*..*/ comments
* The end of one include file and/or the start of another
* The end of one #if branch and/or the start of another
* Any combination of the above.

Since M, with no args, and M(...) are treated differently, you need to
check whether "(" follows M. But that is not easy when you need to
consider all the above that can occur between M and (.

That's not all, as 'M' and elements of A and B could be synthesised from
other macro expansions. Although not, oddly, "(", which must be an
actual parenthesis.
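A small sketch of the "is M followed by (" rule described above. A function-like macro name is only an invocation when the next preprocessing token is "("; comments and newlines in between do not prevent that, while any other token does. The names twice, via_macro and via_function are illustrative, and the macro deliberately computes something different from the function so that the two cases can be told apart.

```c
/* A real function, defined before the macro of the same name exists. */
static int twice(int x)
{
    return x + x;
}

/* Function-like macro with the same name but a distinguishable result. */
#define twice(x) ((x) * 2 + 1000)

int via_macro(void)
{
    /* The comment and the newline count as white space, so the next
       preprocessing token after the name is "(" and the macro is used. */
    return twice /* a comment */
        (3);
}

int via_function(void)
{
    /* Here the name is followed by ")" rather than "(", so it is not a
       macro invocation and the real function is called. */
    return (twice)(3);
}
```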

(Doing a quick test, splitting M(A,B) across include files doesn't work
with gcc. Why not? Where does it say in the Standard that this is not
possible? Or is it the case that if gcc can't manage it, no other
compiler needs to bother?)
Post by Ben Bacarisse
<snip>
Post by bartc
But then maybe someone will have a macro expansion that generates
#-directives
Macros can't (validly) expand to directives.
I think #pragmas can be generated. However, what's to stop gcc from
allowing directives to be generated? Then others will have to follow.
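For reference, the standard route to a pragma from a macro is the C99 _Pragma operator rather than a generated #pragma line. A minimal sketch follows; DO_PRAGMA and fused_or_not are hypothetical names, and a compiler that does not implement this particular STDC pragma may ignore it or warn about it.

```c
/* DO_PRAGMA is an illustrative helper name: it stringifies its argument
   and hands the string to the _Pragma operator. */
#define DO_PRAGMA(x) _Pragma(#x)

/* Expands to _Pragma("STDC FP_CONTRACT OFF"), which has the same effect
   as writing "#pragma STDC FP_CONTRACT OFF" at this point in the file. */
DO_PRAGMA(STDC FP_CONTRACT OFF)

double fused_or_not(double a, double b, double c)
{
    /* With contraction off, this should not be fused into a single
       multiply-add where the implementation honours the pragma. */
    return a * b + c;
}
```

The macro still does not expand to a directive; _Pragma is an operator that is executed during the same translation phase as macro expansion.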

Here's a test of some old macro examples, and the results with various
compilers. They're all different! This is rather scary actually; what
other differences could there be in real code, which don't manifest
themselves so obviously?

People like you keep saying that the macro system is perfectly defined,
yet the experts writing the actual compilers seem to have a bit of
trouble! (Perhaps you would like to help them out...)

This is the C file being tested (the numeric labels are included but
omitted from outputs):

//---------------------------------
#define a(x) mac_a(x)
#define b
#define d(x) (x)
#define e(x, y, z) x y ## z
#define f a d d (x)
#define g a b (c)
#define h(x, y) x y
#define hash #
#define i j j
#define j i

#define F(a) a*G
#define G(a) F(a)
#define A(x,y) if(x==y)
#define B A(

1: a b (c)
2: a d (c)
3: d(a b (c))
4: d(a d (c))
5: f
6: e(a,,a)
7: e(a, c d, e)
8: h(hash, zz)
9: i
10: d(i)
11: F(2)(9)
12: B 10,20);
//---------------------------------

-E output of various compilers:

     gcc           Pelles C      lccwin64      tcc           msvc008

1:   a (c)         a (c)         a b (c)       a (c)         a (c)
2:   a (c)         a(c)          a d (c)       a(c)          a(c)
3:   (mac_a(c))    (mac_a(c))    (a b (c))     (mac_a(c))    (mac_a(c))
4:   (mac_a(c))    (mac_a(c))    (a d (c))     (mac_a(c))    (mac_a(c))
5:   a d (x)       a d(x)        a d(x)        a d(x)        a (x)
6:   a a           a a           aa            a a           a a
7:   a c de        a c de        a c de        a c de        a c de
8:   # zz          # zz          # zz          # zz          # zz
9:   i i           i i           i i           i i           i i
10:  (i i)         (i i)         (i i)         (i i)         (i i)
11:  2*9*G         2 * F(9)      2 *F(9)       2*9*G         2*9*G
12:  if(10==20);   A( 10,20);    A( 10,20);    error         if(==)if(==) 10,20;

mcc (my compiler)

1: a ( c )
2: a ( c )
3: ( mac_a ( c ) )
4: ( mac_a ( c ) )
5: mac_a ( x )
6: a a
7: a c de
8: # zz
9: i i
10: ( i i )
11: 2 * 9 * G
12: error

Other errors and warnings that some compilers gave are not shown.

DMC couldn't be easily tested as its -e option doesn't work properly.
But there were a few errors shown.
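Cases 9 and 10, where every compiler in the table agrees, rest on the rule that a macro name found while that same macro is being replaced is not replaced again. A sketch of the same rule in a form that can be checked numerically; ping, pong and the two functions are illustrative names, not part of the test file.

```c
/* Ordinary variables, declared before the macros of the same names. */
static int ping = 100;
static int pong = 1000;

/* Mutually referential macros, analogous to i and j in the test file. */
#define ping (pong + 1)
#define pong (ping + 10)

int expand_ping(void)
{
    /* ping -> (pong + 1) -> ((ping + 10) + 1); the inner ping is left
       alone and names the variable, so the value is (100 + 10) + 1. */
    return ping;
}

int expand_pong(void)
{
    /* pong -> (ping + 10) -> ((pong + 1) + 10); the inner pong names
       the variable, so the value is (1000 + 1) + 10. */
    return pong;
}
```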
--
bartc
Thiago Adams
2017-05-15 14:11:51 UTC
Post by bartc
Post by Ben Bacarisse
Post by bartc
#include <stdio.h>
#define M(a,b,c) printf("%d %d %d\n",(a),(b),(c));
#define C 2
int main(void) {
#if C>=3
M(10,20,
#else
M(100,200,
#endif
30)
}
It either calls printf with arguments 10,20,30 or 100,200,30. But it
splits the macro invocation with conditionals, that causes compile
errors with pelles c, lccwin, dmc, and MSVC. It compiles with gcc and
(surprisingly) tiny C. Also (using online compilers) with clang.
Why is it surprising that tcc compiles it?
Because the multi-pass approach that would be the only way I could think
of to get over some of these problems, is not compatible with very fast
compilation. Also tcc has bugs with other examples (so is not just
lifting the same preprocessor code from gcc).
I do it separately from the "macro expansion core", in one pass.
The "macro expansion core" itself is multi-pass. (I am using the
algorithm I posted here.) When the scanner reads M, it checks whether
it is a macro. If so, it collects the macro call arguments, reading only
tokens that are included.

The resulting macro call is M(10,20,30). This "M(10,20,30)" is sent to
the "macro expansion core".
The result of the expansion is pushed back as input (similar to
#include, but using a string instead of a file).
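A toy sketch of the "collect only included tokens" step described in this post. It is not the poster's code: collect_call is a hypothetical function, input is modelled as whole lines, and only non-nested conditionals with a literal 0 or 1 condition are understood. It shows that directives can be consumed by the scanner before the text of a call reaches the argument collector.

```c
#include <stddef.h>
#include <string.h>

/* Copies the included text of a macro call into out, stopping when the
   call's parentheses balance. Returns the number of characters stored. */
size_t collect_call(const char *const *lines, size_t nlines,
                    char *out, size_t outsz)
{
    int including = 1;  /* is the current group selected? */
    int depth = 0;      /* parenthesis nesting seen so far */
    int seen_open = 0;  /* has the call's "(" been seen yet? */
    size_t len = 0;

    if (outsz == 0)
        return 0;
    out[0] = '\0';

    for (size_t i = 0; i < nlines; i++) {
        const char *s = lines[i];

        if (s[0] == '#') {
            /* Directives update the state and never reach the output. */
            if (strncmp(s, "#if ", 4) == 0)
                including = (s[4] != '0');
            else if (strcmp(s, "#else") == 0)
                including = !including;
            else if (strcmp(s, "#endif") == 0)
                including = 1;
            continue;
        }
        if (!including)
            continue;   /* text in a skipped group is dropped */

        for (; *s != '\0'; s++) {
            if (len + 1 < outsz) {
                out[len++] = *s;
                out[len] = '\0';
            }
            if (*s == '(') {
                depth++;
                seen_open = 1;
            } else if (*s == ')') {
                depth--;
            }
            if (seen_open && depth == 0)
                return len;   /* closing parenthesis of the call */
        }
    }
    return len;
}
```

Fed the six lines of the original example with the condition reduced to 0, this yields the text M(100,200,30), which is what would then be handed to the expansion step. Whether a preprocessor is allowed to behave this way is the point under discussion elsewhere in the thread.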
bartc
2017-05-15 14:29:32 UTC
Post by Thiago Adams
Post by bartc
Post by Ben Bacarisse
Post by bartc
#include <stdio.h>
#define M(a,b,c) printf("%d %d %d\n",(a),(b),(c));
#define C 2
int main(void) {
#if C>=3
M(10,20,
#else
M(100,200,
#endif
30)
}
It either calls printf with arguments 10,20,30 or 100,200,30. But it
splits the macro invocation with conditionals, that causes compile
errors with pelles c, lccwin, dmc, and MSVC. It compiles with gcc and
(surprisingly) tiny C. Also (using online compilers) with clang.
Why is it surprising that tcc compiles it?
Because the multi-pass approach that would be the only way I could think
of to get over some of these problems, is not compatible with very fast
compilation. Also tcc has bugs with other examples (so is not just
lifting the same preprocessor code from gcc).
I do it separately from the "macro expansion core" in one pass.
The "macro expansion core" does multi pass. (I am using that algorithm I put here)
When the scanner reads M it checks if it is a macro. Then it collect the macro call arguments reading only tokens that are included.
The result of macro call is M(10,20,30). This "M(10,20,30)" is send to
the "macro expansion core".
The result of expansion is pushed "similar of #include , but using string instead of file)
I can compile the above with a one-line change. (But I can't make it
permanent as it would probably fail with everything else!)

I use three levels of tokenising function. Level 1 is the lowest, only
level 2 looks at #-directives. The code that assembles macro arguments
calls level 1. If I make it call level 2, that works, but I already know
it will screw up if I keep it like that. Maybe if I duplicate the
detection of #-directives within that argument handling code...

(I promised myself I wouldn't touch the preprocessing module unless it
was essential. Horrible language to have to implement which as far as I
know only works by chance. Fortunately I don't have to compile that
Python-related code where this macro crops up.)
--
bartc
Ben Bacarisse
2017-05-15 15:41:26 UTC
Post by bartc
Post by Ben Bacarisse
Post by bartc
#include <stdio.h>
#define M(a,b,c) printf("%d %d %d\n",(a),(b),(c));
#define C 2
int main(void) {
#if C>=3
M(10,20,
#else
M(100,200,
#endif
30)
}
It either calls printf with arguments 10,20,30 or 100,200,30. But it
splits the macro invocation with conditionals, that causes compile
errors with pelles c, lccwin, dmc, and MSVC. It compiles with gcc and
(surprisingly) tiny C. Also (using online compilers) with clang.
Why is it surprising that tcc compiles it?
Because the multi-pass approach that would be the only way I could
think of to get over some of these problems, is not compatible with
very fast compilation. Also tcc has bugs with other examples (so is
not just lifting the same preprocessor code from gcc).
OK. I don't see any obvious need for anything multi-pass in this case.

<snip>
Post by bartc
Post by Ben Bacarisse
Post by bartc
Are C's preprocessor and macro expansion rules really so poorly
defined that so many compilers get it wrong?
Not, I think, in this case. It seems very clear. Do you think the
wording of the standard needs to be improved and, if so, how?
I would prefer that the possibilities were purposely kept simple.
Ah, that's not what I meant. I was not asking you to invent a macro
language you'd prefer, I was asking if you can suggest a way in which
the wording of (or, for that matter, any other changes to) the standard
would make the current intended semantics clearer.

This is a good case to consider since there is clearly some
disagreement. (There's a high probability that I'm wrong about it being
defined but it's certain that it's not as clear as I thought it was.)

<snip>
Post by bartc
Post by Ben Bacarisse
Post by bartc
But then maybe someone will have a macro expansion that generates
#-directives
Macros can't (validly) expand to directives.
I think #pragmas can be generated.
In some cases, the pp-tokens that follow 'pragma' may be expanded and in
some cases that expansion is forbidden, but that's not really got much
to do with the question of macros expanding to directives.
Post by bartc
However, what's to stop gcc from allowing directives to be generated?
Then others will have to follow.
Has every compiler implemented every gcc extension? I don't think so.
Post by bartc
Here's a test of some old macro examples, and the results with various
compilers. They're all different! This is rather scary actually; what
other differences could there be in real code, which don't manifest
themselves so obviously?
People like you keep saying that the macro system is perfectly
defined,
No, I don't. In fact I remember saying that macro expansion is
notoriously hard to specify, didn't I?

It's true that I don't think C's macro language is quite as hard as you
like to make out, but then that's a general remark about you and C -- it
all seems way more complicated to you than it does to me. You also keep
taking that to mean I think it's all simple. I don't.
Post by bartc
yet the experts writing the actual compilers seem to have a
bit of trouble! (Perhaps you would like to help them out...)
I have, in fact, supplied a patch to tcc and submitted bug reports to
lccwin32 but that's by the by because I am just as likely to make
mistakes as anyone else.
Post by bartc
This is the C file being tested (the numeric labels are included but
//---------------------------------
#define a(x) mac_a(x)
#define b
#define d(x) (x)
#define e(x, y, z) x y ## z
#define f a d d (x)
#define g a b (c)
#define h(x, y) x y
#define hash #
#define i j j
#define j i
#define F(a) a*G
#define G(a) F(a)
#define A(x,y) if(x==y)
#define B A(
1: a b (c)
2: a d (c)
3: d(a b (c))
4: d(a d (c))
5: f
6: e(a,,a)
7: e(a, c d, e)
8: h(hash, zz)
9: i
10: d(i)
11: F(2)(9)
12: B 10,20);
//---------------------------------
Please give the flags you pass. It may not make any difference (I don't
know all of these compilers) but some may default to peculiar
non-conforming modes or to old standards. (I think I've said this
before. Showing what gcc does in its default mode is almost pointless
when trying to work out what should happen in standard C.)
Post by bartc
     gcc           Pelles C      lccwin64      tcc           msvc008
1:   a (c)         a (c)         a b (c)       a (c)         a (c)
2:   a (c)         a(c)          a d (c)       a(c)          a(c)
3:   (mac_a(c))    (mac_a(c))    (a b (c))     (mac_a(c))    (mac_a(c))
4:   (mac_a(c))    (mac_a(c))    (a d (c))     (mac_a(c))    (mac_a(c))
5:   a d (x)       a d(x)        a d(x)        a d(x)        a (x)
6:   a a           a a           aa            a a           a a
7:   a c de        a c de        a c de        a c de        a c de
8:   # zz          # zz          # zz          # zz          # zz
9:   i i           i i           i i           i i           i i
10:  (i i)         (i i)         (i i)         (i i)         (i i)
11:  2*9*G         2 * F(9)      2 *F(9)       2*9*G         2*9*G
12:  if(10==20);   A( 10,20);    A( 10,20);    error         if(==)if(==) 10,20;
It would be much more helpful if you said which ones are contentious or
badly defined by the standard. And why show 7, 8, 9 and 10? They all
seem to give the same output. Is there anything to debate about those
cases?

Here is the table edited to show the interesting things:

     gcc           Pelles C      lccwin64      tcc           msvc008

1:                               a b (c)
2:                               a d (c)
3:                               (a b (c))
4:                               (a d (c))
5:                                                           a (x)
6:                               aa
12:                A( 10,20);    A( 10,20);    error         if(==)if(==) 10,20;

You've found some bugs in lccwin64, one in each of Pelles C and tcc and
two in an old version of MSVC. (In line 11 both outputs are
acceptable). Are there any contentious results here? I.e. do you think
there is anything other than some bugs?
Post by bartc
mcc (my compiler)
5: mac_a ( x )
12: error
(Again I've left only what appears to be significant). Do you think
these are bugs or is there some debate to be had about what 5 and 12
should be?

<snip>
--
Ben.
s***@casperkitty.com
2017-05-15 15:53:16 UTC
Post by Ben Bacarisse
Ah, that's not what I meant. I was not asking you to invent a macro
language you'd prefer, I was asking if you can suggest a way in which
the wording of (or, for that matter, any other changes to) the standard
would make the current intended semantics clearer.
IMHO, many things could be made much clearer if the authors of the Standard
were to explicitly recognize places which different implementations would
be allowed to interpret differently at their leisure--not saying that the
behavior was Undefined, but rather allowing implementations to choose in
Unspecified fashion from among a few discrete possibilities, one of which
may be refusal to process ambiguous code.

It's useful to have a category of programs which will be processed
identically by all implementations, but requiring that implementations jump
through hoops to ensure uniform handling of corner-case constructs which
would only arise in compiler-test scenarios doesn't seem very helpful.
Tim Rentsch
2017-05-24 15:16:24 UTC
Post by s***@casperkitty.com
Post by Ben Bacarisse
Ah, that's not what I meant. I was not asking you to invent a macro
language you'd prefer, I was asking if you can suggest a way in which
the wording of (or, for that matter, any other changes to) the standard
would make the current intended semantics clearer.
IMHO, many things could be made much clearer if the authors of the
Standard [...]
Someone with your track record of horrifically obfuscated writing
is in no position to give advice on how to make things clearer.
s***@casperkitty.com
2017-05-24 17:43:34 UTC
Post by Tim Rentsch
Post by s***@casperkitty.com
IMHO, many things could be made much clearer if the authors of the
Standard [...]
Someone with your track record of horrifically obfuscated writing
is in no position to give advice on how to make things clearer.
Sometimes sentences to get away from me, especially when I'm in a rush. A
formal language standard, however, should be held to a higher standard than
a 5-minute Usenet post.

It is hard to write a standard which classifies every program as violating
constraints or not violating constraints. It is impossible to do so while
simultaneously guaranteeing that:

1. Any program which any existing implementations would reject is regarded
as violating constraints.

2. No program which is in production use will be regarded as violating
constraints.

It would be much easier, *and* more useful, to recognize three categories
of programs:

1. Those which should be unambiguously regarded as violating constraints.

2. Those which should be unambiguously regarded as not violating
constraints.

3. Those which may or may not be regarded as violating constraints.

A good Standard should endeavor to minimize the fraction of programs in
the third class when practical. If a construct is simultaneously rejected
by some compilers but usefully employed in production code, however, that
would suggest that it cannot "practically" be classified as #1 or #2.
Ike Naar
2017-05-24 19:06:55 UTC
Post by s***@casperkitty.com
Sometimes sentences to get away from me, especially when I'm in a rush.
Rushing again?
Tim Rentsch
2017-05-27 13:10:15 UTC
Post by s***@casperkitty.com
Post by Tim Rentsch
Post by s***@casperkitty.com
IMHO, many things could be made much clearer if the authors of the
Standard [...]
Someone with your track record of horrifically obfuscated writing
is in no position to give advice on how to make things clearer.
Sometimes sentences to get away from me, especially when I'm in a
rush. [...]
What, you're saying you can do better? Prove it. After you have
demonstrated an ability to produce reasonable quality prose, not
just in isolated sentences but over scores of entire postings,
then we can talk about comparing your ideas to writing in the
Standard.
David Kleinecke
2017-05-27 19:02:34 UTC
Post by Tim Rentsch
Post by s***@casperkitty.com
Post by Tim Rentsch
Post by s***@casperkitty.com
IMHO, many things could be made much clearer if the authors of the
Standard [...]
Someone with your track record of horrifically obfuscated writing
is in no position to give advice on how to make things clearer.
Sometimes sentences to get away from me, especially when I'm in a
rush. [...]
What, you're saying you can do better? Prove it. After you have
demonstrated an ability to produce reasonable quality prose, not
just in isolated sentences but over scores of entire postings,
then we can talk about comparing your ideas to writing in the
Standard.
The Standard is not written in what would ordinarily be called
quality prose. Its style is a legalistic style. Legal writing
has a long long history and is not always admired. But it is a
serious attempt to be precise and thorough.

I once suggested describing the distribution of assets in a
trust agreement with COBOL code. The lawyers were appalled.
"Why not", they said, "describe it in general in the agreement
and work out the details later." The agreement was never
finalized.

I am of the opinion that the standard could be clearer but is
generally a good piece of work. All the versions.
s***@casperkitty.com
2017-05-27 20:42:22 UTC
Post by David Kleinecke
I am of the opinion that the standard could be clearer but is
generally a good piece of work. All the versions.
They're mostly pretty good, except for one major failing: it fails to
make clear that a decision to characterize an action as invoking Undefined
Behavior merely indicates a finding that some implementations *might*
benefit from being allowed to behave in arbitrary fashion. It does not in
any way imply that compilers should not attempt to define a useful behavior,
or at least offer useful behavioral guarantees, when targeting platforms
and application fields where doing so would make sense.
David Kleinecke
2017-05-27 22:48:48 UTC
Post by s***@casperkitty.com
Post by David Kleinecke
I am of the opinion that the standard could be clearer but is
generally a good piece of work. All the versions.
They're mostly pretty good, except for one major failing: it fails to
make clear that a decision to characterize an action as invoking Undefined
Behavior merely indicates a finding that some implementations *might*
benefit from being allowed to behave in arbitrary fashion. It does not in
any way imply that compilers should not attempt to define a useful behavior,
or at least offer useful behavioral guarantees, when targeting platforms
and application fields where doing so would make sense.
I think the standard intended situations where a compiler
could react positively to be "implementation-defined". But
it says "Permissible undefined behavior ranges from ignoring
the situation completely with unpredictable results to
behaving during translation or program execution
in a documented manner characteristic of the environment
(with or without the issuance of a diagnostic message) to
terminating a translation or execution (with the issuance
of a diagnostic message)"

As nearly as I can grasp that it means the compiler can do
anything it pleases provided:
(1) what it does is documented
(2) if it terminates it must issue a diagnostic.
As noted it could be clearer.
s***@casperkitty.com
2017-05-28 21:40:05 UTC
Post by David Kleinecke
Post by s***@casperkitty.com
Post by David Kleinecke
I am of the opinion that the standard could be clearer but is
generally a good piece of work. All the versions.
They're mostly pretty good, except for one major failing: it fails to
make clear that a decision to characterize an action as invoking Undefined
Behavior merely indicates a finding that some implementations *might*
benefit from being allowed to behave in arbitrary fashion. It does not in
any way imply that compilers should not attempt to define a useful behavior,
or at least offer useful behavioral guarantees, when targeting platforms
and application fields where doing so would make sense.
I think the standard intended situations where a compiler
could react positively to be "implementation-defined".
The term "Implementation-Defined" is mostly applied only to cases where a
compiler is required to choose from among some specific possibilities (e.g.
the storage format for an "int"). In cases where an action is said to invoke
Implementation-Defined behavior, implementations are required to specify a
behavior whether or not *any* behavior would be useful given the target
platform and application field. It's unclear what degree of precision the
authors of the Standard sought to require with the phrase "Implementation-
Defined Behavior", but the Standard seems to have avoided applying it to
actions which might sometimes trap and sometimes not, based upon unpredictable
criteria.

If an action should behave in consistent usable fashion on 99% of
implementations, but there might plausibly be some implementations that
would benefit from being allowed to behave in not-necessarily-consistent
fashion, the Standard will describe such action as invoking Undefined
Behavior. A prime example of that would be left-shifting a negative
number. Consider...

1. On two's-complement machines there is only one sensible way to define the
behavior of a signed left shift, and C89 defined left-shift in that fashion.

2. On machines other than two's-complement machines, it may be more helpful
to have a signed left-shift work in other ways (e.g. on a ones'-complement
machine, multiply-by-two would probably be more useful than zero fill), or
performance may be enhanced by letting a compiler use whichever approach
would be faster in any given situation.

3. It may be helpful to have an option to trap cases where the choice between
zero fill and multiplication by two might affect a program's output. That
need not imply trapping all such shifts, however. If a compiler in-lines
a function and observes that code ignores the result of a left-shift
operation, code to check whether the value being shifted was negative may
not serve much purpose.

Because of the three points above, it would make sense for C99 to make
left-shifting of a negative number Undefined Behavior if there were any
likelihood of anyone ever writing a C99 implementation for something other
than two's-complement machines. That should not be taken to imply, however,
that any general-purpose implementation for a two's-complement machine should
do anything other than behave in the fashion that had been mandated by C89.
Post by David Kleinecke
But
it says "Permissible undefined behavior ranges from ignoring
the situation completely with unpredictable results to
behaving during translation or program execution
in a documented manner characteristic of the environment
(with or without the issuance of a diagnostic message) to
terminating a translation or execution (with the issuance
of a diagnostic message)"
As nearly as I can grasp that it means the compiler can do
anything it pleases provided:
(1) what it does is documented
(2) if it terminates it must issue a diagnostic.
As noted it could be clearer.
An implementation can be conforming without documenting anything about what
happens in cases where the Standard imposes no requirements. That does not
imply, however, that an implementation which documents useful behavior would
not be more suitable for some purposes than one which does not.

If one were to interpret UB as an invitation for an implementation to do
anything that would be even remotely sensible given its target platform and
application field, but also as an invitation for programmers to exploit
cases where every remotely-sensible behavior by an implementation would meet
the programmer's needs, that would seem like a far more useful balancing
of implementation and programmer needs than saying that implementations
must be presumed to act in arbitrarily nonsensical fashion.
Tim Rentsch
2017-05-29 15:33:41 UTC
Post by David Kleinecke
Post by s***@casperkitty.com
Post by David Kleinecke
I am of the opinion that the standard could be clearer but is
generally a good piece of work. All the versions.
They're mostly pretty good, except for one major failing: it fails to
make clear that a decision to characterize an action as invoking Undefined
Behavior merely indicates a finding that some implementations *might*
benefit from being allowed to behave in arbitrary fashion. It does not in
any way imply that compilers should not attempt to define a useful behavior,
or at least offer useful behavioral guarantees, when targeting platforms
and application fields where doing so would make sense.
I think the standard intended situations where a compiler
could react positively to be "implementation-defined". But
it says "Permissible undefined behavior ranges from ignoring
the situation completely with unpredictable results to
behaving during translation or program execution
in a documented manner characteristic of the environment
(with or without the issuance of a diagnostic message) to
terminating a translation or execution (with the issuance
of a diagnostic message)"
As nearly as I can grasp that it means the compiler can do
anything it pleases provided:
(1) what it does is documented
(2) if it terminates it must issue a diagnostic.
As noted it could be clearer.
If you're talking about undefined behavior, neither of those
conditions has to hold. Any circumstances that fall into
the category of undefined behavior place no restrictions on
what an implementation must do, except that if there is a
syntax error or a constraint violation then a diagnostic
must be given.

(Incidentally, the passage you have quoted is a "NOTE", which
means the text is informative, not normative. That is, it might
be helpful to know, but it is not part of what formally defines
the language.)

The attribute of "implementation-defined" applies only to those
behaviors explicitly identified as such in the Standard. An
implementation can document anything it wants (i.e., as long as
those items that are required to be documented have been), but
doing that has no effect on what is implementation-defined.
Tim Rentsch
2017-05-29 15:17:53 UTC
Post by David Kleinecke
Post by Tim Rentsch
Post by s***@casperkitty.com
Post by Tim Rentsch
Post by s***@casperkitty.com
IMHO, many things could be made much clearer if the authors of the
Standard [...]
Someone with your track record of horrifically obfuscated writing
is in no position to give advice on how to make things clearer.
Sometimes sentences to get away from me, especially when I'm in a
rush. [...]
What, you're saying you can do better? Prove it. After you have
demonstrated an ability to produce reasonable quality prose, not
just in isolated sentences but over scores of entire postings,
then we can talk about comparing your ideas to writing in the
Standard.
The Standard is not written in what would ordinarily be called
quality prose. Its style is a legalistic style. Legal writing
has a long long history and is not always admired. But it is a
serious attempt to be precise and thorough.
Text in the Standard is prose, as opposed to poetry, and is
written in formal English. The style resembles the style
of legal documents, which isn't too surprising since both
use formal English, and as you point out precise language,
but in my very much non-expert opinion the Standard doesn't
really cut it as a legal document.

All that is sort of by-the-way, because the prose I was referring
to above relates to writing from supercat. Whatever else might be
said of it, the quality of the writing in the Standard is pretty
high. The quality of supercat's writing is pretty low. If he
wants to make suggestions about how the Standard could be more
clear, he first should demonstrate some ability to produce clear
writing himself.
bartc
2017-05-29 15:26:19 UTC
Post by Tim Rentsch
All that is sort of by-the-way, because the prose I was referring
to above relates to writing from supercat. Whatever else might be
said of it, the quality of the writing in the Standard is pretty
high. The quality of supercat's writing is pretty low.
The C Standard is a document that has been refined over decades. And
versions of it presumably have a team to read, check and hone the
writing to perfection.

And you want to compare that with a few comments on a usenet post? (A
medium that doesn't even let you fix a typo once something is posted.)
--
bartc
Tim Rentsch
2017-05-30 13:13:15 UTC
Post by Tim Rentsch
All that is sort of by-the-way, because the prose I was referring
to above relates to writing from supercat. Whatever else might be
said of it, the quality of the writing in the Standard is pretty
high. The quality of supercat's writing is pretty low.
The C Standard is a document that has been refined over decades. And
versions of it presumably have a team to read, check and hone the
writing to perfection.
And you want to compare that with a few comments on a usenet post?
If that post calls out the C standard for not being clear enough,
and purports to give advice on how to improve it, Yes I do. As
would I hope any sensible person.
Malcolm McLean
2017-05-30 16:57:23 UTC
Post by Tim Rentsch
If that post calls out the C standard for not being clear enough,
and purports to give advice on how to improve it, Yes I do. As
would I hope any sensible person. [expect SuperCat to write
with more clarity]
It's very common to find programmers who are technically proficient
but poor at communicating. It's said that English Literature graduates
in fact make some of the best programmers.
GOTHIER Nathan
2017-05-30 17:12:33 UTC
On Tue, 30 May 2017 09:57:23 -0700 (PDT)
Post by Malcolm McLean
It's very common to find programmers who are technically proficient
but poor at communicating. It's said that English Literature graduates
in fact make some of the best programmers.
Actually there's no "best programmer", only good and bad code. Even the
so-called best programmers wrote some utter sh!t, but legends tend to mask
this part of their humanity. Humanity is buggy, and its most wonderful bug
is its creativity. People with trisomy can be creative, yet are socially
excluded for falling outside the common standard of humanity. In some way we
need both sides of humanity, scientists and artists, to build a new world.
David Kleinecke
2017-05-30 21:55:43 UTC
Post by GOTHIER Nathan
On Tue, 30 May 2017 09:57:23 -0700 (PDT)
Post by Malcolm McLean
It's very common to find programmers who are technically proficient
but poor at communicating. It's said that English Literature graduates
in fact make some of the best programmers.
Actually there's no "best programmer", only good and bad code. Even the
so-called best programmers wrote some utter sh!t, but legends tend to mask
this part of their humanity. Humanity is buggy, and its most wonderful bug
is its creativity. People with trisomy can be creative, yet are socially
excluded for falling outside the common standard of humanity. In some way we
need both sides of humanity, scientists and artists, to build a new world.
My biggest bitch with the Standards is that they use the
modal verb "shall" in an un-English way (which might be a
legalism). In fact, the modal "shall" has, these days,
disappeared from all but the most old-fashioned pedantic
English.
s***@casperkitty.com
2017-05-30 22:39:03 UTC
Post by David Kleinecke
My biggest bitch with the Standards is that they use the
modal verb "shall" in an un-English way (which might be a
legalism). In fact, the modal "shall" has, these days,
disappeared from all but the most old-fashioned pedantic
English.
That is a pretty severe problem. In most cases where a standard says that
a conforming widget MUST do X, that implies that anything which does not do
X is, *by definition*, not a conforming widget. The C Standard, however,
uses the term "shall" in some cases where it means nothing beyond the fact
that:

1. A program that fails to do what is specified is not strictly conforming.

2. The Standard does not require that conforming implementations behave
in any particular fashion if programs fail to do what is specified.

Note that if it would make sense for most implementations to process some
action the same way (e.g. the way it was defined in C89) but there might
possibly be some platforms where defining a consistent behavior would be
expensive (e.g. ones'-complement machines), leaving the behavior Undefined
should allow implementations to process such actions sensibly. All that's
necessary is that implementers recognize that (1) there is significant
value in following precedent absent a compelling reason to do otherwise,
and (2) permission to do otherwise does not, in and of itself, constitute
a compelling reason.
Keith Thompson
2017-05-30 23:03:40 UTC
Post by s***@casperkitty.com
Post by David Kleinecke
My biggest bitch with the Standards is that they use the
modal verb "shall" in an un-English way (which might be a
legalism). In fact, the modal "shall" has, these days,
disappeared from all but the most old-fashioned pedantic
English.
That is a pretty severe problem.
No, it isn't.
Post by s***@casperkitty.com
In most cases where a standard says that
a conforming widget MUST do X, that implies that anything which does not do
X is, *by definition*, not a conforming widget. The C Standard, however,
uses the term "shall" in some cases where it means nothing beyond the fact
that:
1. A program that fails to do what is specified is not strictly conforming.
2. The Standard does not require that conforming implementations behave
in any particular fashion if programs fail to do what is specified.
The term, as you know, is "undefined behavior". Hiding it behind extra
wording is not helpful.

The C standard defines clearly and unambiguously what it means by
"shall". The meaning depends on the context; it means one thing in a
constraint, and something else outside a constraint.
Post by s***@casperkitty.com
Note that if it would make sense for most implementations to process some
action the same way (e.g. the way it was defined in C89) but there might
possibly be some platforms where defining a consistent behavior would be
expensive (e.g. ones'-complement machines), leaving the behavior Undefined
should allow implementations to process such actions sensibly.
Leaving the behavior undefined quite literally allows implementations to
behave in any way they like, including whatever you think is sensible.
Post by s***@casperkitty.com
All that's
necessary is that implementers recognize that (1) there is significant
value in following precedent absent a compelling reason to do otherwise,
and (2) permission to do otherwise does not, in and of itself, constitute
a compelling reason.
Do you really have a problem with the way the standard uses the word "shall"?
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2017-05-30 23:50:20 UTC
Post by Keith Thompson
The term, as you know, is "undefined behavior". Hiding it behind extra
wording is not helpful.
Many useful tasks cannot be performed efficiently, or at all, without using
behaviors which are defined by some implementations but are not mandated by
the Standard. What terminology would you use to distinguish those behaviors
from behaviors which are not defined by anything whatsoever?

The fact that programs which invoke not-defined-by-anything behaviors
are broken does not imply that programs which invoke not-defined-by-the-
Standard-but-instead-defined-by-other-things behaviors are broken. Some
compiler writers seem unable to distinguish those categories, however.
Post by Keith Thompson
The C standard defines clearly and unambiguously what it means by
"shall". The meaning depends on the context; it means one thing in a
constraint, and something else outside a constraint.
Most standards specify what conforming entities have to "do", and are quite
specific about what entities are responsible for ensuring what. The Standard
sometimes talks about obligations of programs or implementations, but
sometimes uses "shall be" to impose obligations upon grammatical constructs.
Post by Keith Thompson
Post by s***@casperkitty.com
Note that if it would make sense for most implementations to process some
action the same way (e.g. the way it was defined in C89) but there might
possibly be some platforms where defining a consistent behavior would be
expensive (e.g. ones'-complement machines), leaving the behavior Undefined
should allow implementations to process such actions sensibly.
Leaving the behavior undefined quite literally allows implementations to
behave in any way they like, including whatever you think is sensible.
Indeed, but some programmers seem to think that permission to behave
nonsensically should be viewed, in and of itself, as a compelling reason
to behave nonsensically. Additionally, some people seem to think that
the failure of the Standard to define a behavior represents a judgment by
the Committee that any programs that would rely upon that behavior should
be considered "defective".
Post by Keith Thompson
Post by s***@casperkitty.com
All that's
necessary is that implementers recognize that (1) there is significant
value in following precedent absent a compelling reason to do otherwise,
and (2) permission to do otherwise does not, in and of itself, constitute
a compelling reason.
Do you really have a problem with the way the standard uses the word "shall"?
I believe it has contributed toward some implementers' belief that programs
relying upon certain behaviors should be viewed as "defective", rather than
merely being unsuitable for implementations which are designed for other
kinds of applications.
Keith Thompson
2017-05-31 00:12:01 UTC
Post by s***@casperkitty.com
Post by Keith Thompson
The term, as you know, is "undefined behavior". Hiding it behind extra
wording is not helpful.
Many useful tasks cannot be performed efficiently, or at all, without using
behaviors which are defined by some implementations but are not mandated by
the Standard. What terminology would you use to distinguish those behaviors
from behaviors which are not defined by anything whatsoever?
"Non-portable", I suppose.
Post by s***@casperkitty.com
The fact that programs which invoke not-defined-by-anything behaviors
are broken does not imply that programs which invoke not-defined-by-the-
Standard-but-instead-defined-by-other-things behaviors are broken. Some
compiler writers seem unable to distinguish those categories, however.
Feel free to complain to them.
Post by s***@casperkitty.com
Post by Keith Thompson
The C standard defines clearly and unambiguously what it means by
"shall". The meaning depends on the context; it means one thing in a
constraint, and something else outside a constraint.
Most standards specify what conforming entities have to "do", and are quite
specific about what entities are responsible for ensuring what. The Standard
sometimes talks about obligations of programs or implementations, but
sometimes uses "shall be" to impose obligations upon grammatical constructs.
For example?

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2017-05-31 13:30:30 UTC
Post by Keith Thompson
Post by s***@casperkitty.com
Most standards specify what conforming entities have to "do", and are quite
specific about what entities are responsible for ensuring what. The Standard
sometimes talks about obligations of programs or implementations, but
sometimes uses "shall be" to impose obligations upon grammatical constructs.
For example?
Compare 6.2.5p3:

If any other character is stored in a char object, the resulting value
is implementation-defined but shall be within the range of values that
can be represented in that type.

with 6.8.4.2p2:

If a switch statement has an associated case or default label within
the scope of an identifier with a variably modified type, the entire
switch statement shall be within the scope of that identifier.

The former could be interpreted to suggest that if an implementation
defined CHAR_MIN as -127 but specified that values get two's-complement
reduced, a program which stored 128 to a "char" and read it back would
violate a "shall" constraint and thus have Undefined Behavior. The
latter could be read as suggesting that an implementation shall treat
the scopes of identifiers in such fashion as to abide by the second.

Both constructs can be worked out, of course, but especially with the second
I think it would be clearer to say that no "case" label shall be placed
within the scope of a variably-modified type *unless* the entire switch
statement is likewise within that scope.
Keith Thompson
2017-05-31 16:36:56 UTC
Post by s***@casperkitty.com
Post by Keith Thompson
Post by s***@casperkitty.com
Most standards specify what conforming entities have to "do", and
are quite specific about what entities are responsible for ensuring
what. The Standard sometimes talks about obligations of programs
or implementations, but sometimes uses "shall be" to impose
obligations upon grammatical constructs.
For example?
If any other character is stored in a char object, the resulting value
is implementation-defined but shall be within the range of values that
can be represented in that type.
(which is outside a constraint)
Post by s***@casperkitty.com
If a switch statement has an associated case or default label within
the scope of an identifier with a variably modified type, the entire
switch statement shall be within the scope of that identifier.
(which is within a constraint)
Post by s***@casperkitty.com
The former could be interpreted to suggest that if an implementation
defined CHAR_MIN as -127 but specified that values get two's-complement
reduced, a program which stored 128 to a "char" and read it back would
violate a "shall" constraint and thus have Undefined Behavior.
I think it's a misuse of the word "shall". Replacing "shall" by "is
guaranteed to" would be an improvement. (BTW, your use of the word
"constraint" here is inconsistent with the way the standard uses it.)
Post by s***@casperkitty.com
The
latter could be read as suggesting that an implementation shall treat
the scopes of identifiers in such fashion as to abide by the second.
Except that that interpretation would contradict other requirements
about scopes of identifiers. I think it's clear enough that a
constraint is being imposed on programs. This program:

int main(void) {
    int n = 1;
    switch (n) {
        case 1: ;
            int vla[n];
        default: ;
    }
}

violates that constraint, requiring a diagnostic.
Post by s***@casperkitty.com
Both constructs can be worked out, of course, but especially with the
second I think it would be clearer to say that no "case" label shall
be placed within the scope of a variably-modified type *unless* the
entire switch statement is likewise within that scope.
Sure, that's another way to express the same requirement.

(And yet again, please use shorter lines so I don't have to reformat
quoted text.)
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
David Kleinecke
2017-05-31 19:08:25 UTC
Post by Keith Thompson
Post by s***@casperkitty.com
Post by Keith Thompson
Post by s***@casperkitty.com
Most standards specify what conforming entities have to "do", and
are quite specific about what entities are responsible for ensuring
what. The Standard sometimes talks about obligations of programs
or implementations, but sometimes uses "shall be" to impose
obligations upon grammatical constructs.
For example?
If any other character is stored in a char object, the resulting value
is implementation-defined but shall be within the range of values that
can be represented in that type.
(which is outside a constraint)
Post by s***@casperkitty.com
If a switch statement has an associated case or default label within
the scope of an identifier with a variably modified type, the entire
switch statement shall be within the scope of that identifier.
(which is within a constraint)
Post by s***@casperkitty.com
The former could be interpreted to suggest that if an implementation
defined CHAR_MIN as -127 but specified that values get two's-complement
reduced, a program which stored 128 to a "char" and read it back would
violate a "shall" constraint and thus have Undefined Behavior.
I think it's a misuse of the word "shall". Replacing "shall" by "is
guaranteed to" would be an improvement. (BTW, your use of the word
"constraint" here is inconsistent with the way the standard uses it.)
Post by s***@casperkitty.com
The
latter could be read as suggesting that an implementation shall treat
the scopes of identifiers in such fashion as to abide by the second.
Except that that interpretation would contradict other requirements
about scopes of identifiers. I think it's clear enough that a
constraint is being imposed on programs. This program:
int main(void) {
    int n = 1;
    switch (n) {
        case 1: ;
            int vla[n];
        default: ;
    }
}
violates that constraint, requiring a diagnostic.
Post by s***@casperkitty.com
Both constructs can be worked out, of course, but especially with the
second I think it would be clearer to say that no "case" label shall
be placed within the scope of a variably-modified type *unless* the
entire switch statement is likewise within that scope.
Sure, that's another way to express the same requirement.
(And yet again, please use shorter lines so I don't have to reformat
quoted text.)
Lots of action on a minor point. Oh Well.

My objection to the Standard's use of "shall" was an
esthetic objection to the writing in the Standard - not
a comment on undefined behavior.

As nearly as I can tell this use of "shall" in a technical
sense is confined to constraints (the definition of which I
don't think is clear - but everybody knows what they mean).
Is there a use of "shall" anywhere outside a constraint?
Scott Lurndal
2017-05-31 19:16:11 UTC
Post by David Kleinecke
Post by Keith Thompson
Post by s***@casperkitty.com
Post by Keith Thompson
Post by s***@casperkitty.com
Most standards specify what conforming entities have to "do", and
are quite specific about what entities are responsible for ensuring
what. The Standard sometimes talks about obligations of programs
or implementations, but sometimes uses "shall be" to impose
obligations upon grammatical constructs.
For example?
If any other character is stored in a char object, the resulting value
is implementation-defined but shall be within the range of values that
can be represented in that type.
(which is outside a constraint)
Post by s***@casperkitty.com
If a switch statement has an associated case or default label within
the scope of an identifier with a variably modified type, the entire
switch statement shall be within the scope of that identifier.
(which is within a constraint)
Post by s***@casperkitty.com
The former could be interpreted to suggest that if an implementation
defined CHAR_MIN as -127 but specified that values get two's-complement
reduced, a program which stored 128 to a "char" and read it back would
violate a "shall" constraint and thus have Undefined Behavior.
I think it's a misuse of the word "shall". Replacing "shall" by "is
guaranteed to" would be an improvement. (BTW, your use of the word
"constraint" here is inconsistent with the way the standard uses it.)
Post by s***@casperkitty.com
The
latter could be read as suggesting that an implementation shall treat
the scopes of identifiers in such fashion as to abide by the second.
Except that that interpretation would contradict other requirements
about scopes of identifiers. I think it's clear enough that a
int main(void) {
int n = 1;
switch (n) {
case 1: ;
int vla[n];
default: ;
}
}
violates that constraint, requiring a diagnostic.
Post by s***@casperkitty.com
Both constructs can be worked out, of course, but especially with the
second I think it would be clearer to say that no "case" label shall
be placed within the scope of a variably-modified type *unless* the
entire switch statement is likewise within that scope.
Sure, that's another way to express the same requirement.
(And yet again, please use shorter lines so I don't have to reformat
quoted text.)
Lots of action on a minor point. Oh Well.
My objection to the Standard's use of "shall" was an
esthetic objection to the writing in the Standard - not
a comment on undefined behavior.
As nearly as I can tell this use of "shall" in a technical
sense is confined to constraints (the definition of which I
don't think is clear - but everybody knows what they mean).
Is there a use of "shall" anywhere outside a constraint?
While I don't have the C spec handy, POSIX defines 'shall' as follows:

shall

For an implementation that conforms to POSIX.1-2008, describes a
feature or behavior that is mandatory. An application can rely
on the existence of the feature or behavior.

For an application or user, describes a behavior that is mandatory.


http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap01.html
David Kleinecke
2017-05-31 19:35:56 UTC
Post by Scott Lurndal
Post by David Kleinecke
Post by Keith Thompson
Post by s***@casperkitty.com
Post by Keith Thompson
Post by s***@casperkitty.com
Most standards specify what conforming entities have to "do", and
are quite specific about what entities are responsible for ensuring
what. The Standard sometimes talks about obligations of programs
or implementations, but sometimes uses "shall be" to impose
obligations upon grammatical constructs.
For example?
If any other character is stored in a char object, the resulting value
is implementation-defined but shall be within the range of values that
can be represented in that type.
(which is outside a constraint)
Post by s***@casperkitty.com
If a switch statement has an associated case or default label within
the scope of an identifier with a variably modified type, the entire
switch statement shall be within the scope of that identifier.
(which is within a constraint)
Post by s***@casperkitty.com
The former could be interpreted to suggest that if an implementation
defined CHAR_MIN as -127 but specified that values get two's-complement
reduced, a program which stored 128 to a "char" and read it back would
violate a "shall" constraint and thus have Undefined Behavior.
I think it's a misuse of the word "shall". Replacing "shall" by "is
guaranteed to" would be an improvement. (BTW, your use of the word
"constraint" here is inconsistent with the way the standard uses it.)
Post by s***@casperkitty.com
The
latter could be read as suggesting that an implementation shall treat
the scopes of identifiers in such fashion as to abide by the second.
Except that that interpretation would contradict other requirements
about scopes of identifiers. I think it's clear enough that a
int main(void) {
int n = 1;
switch (n) {
case 1: ;
int vla[n];
default: ;
}
}
violates that constraint, requiring a diagnostic.
Post by s***@casperkitty.com
Both constructs can be worked out, of course, but especially with the
second I think it would be clearer to say that no "case" label shall
be placed within the scope of a variably-modified type *unless* the
entire switch statement is likewise within that scope.
Sure, that's another way to express the same requirement.
(And yet again, please use shorter lines so I don't have to reformat
quoted text.)
Lots of action on a minor point. Oh Well.
My objection to the Standard's use of "shall" was an
esthetic objection to the writing in the Standard - not
a comment on undefined behavior.
As nearly as I can tell this use of "shall" in a technical
sense is confined to constraints (the definition of which I
don't think is clear - but everybody knows what they mean).
Is there a use of "shall" anywhere outside a constraint?
shall
For an implementation that conforms to POSIX.1-2008, describes a
feature or behavior that is mandatory. An application can rely
on the existence of the feature or behavior.
For an application or user, describes a behavior that is mandatory.
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap01.html
There isn't any problem connected with what "shall" is being
used to say. But it isn't being used in its ordinary English
way. So I called it a technical term. I consider it unfortunate
and wonder why "must" was not used instead.
Tim Rentsch
2017-06-01 11:06:52 UTC
[..use of "shall" in the Standard..]
There isn't any problem connected with what "shall" is being
used to say. But it isn't being used in its ordinary English
way. [...]
This assertion is in conflict with all of the ordinary English
dictionaries I consulted.
David Kleinecke
2017-06-01 17:33:51 UTC
Post by Tim Rentsch
[..use of "shall" in the Standard..]
There isn't any problem connected with what "shall" is being
used to say. But it isn't being used in its ordinary English
way. [...]
This assertion is in conflict with all of the ordinary English
dictionaries I consulted.
Dictionaries are not a good guide to current usage. "Shall" is
essentially obsolete in modern English. The modern English
modal to use would have been "must".
Keith Thompson
2017-06-01 19:00:37 UTC
Post by David Kleinecke
Post by Tim Rentsch
[..use of "shall" in the Standard..]
There isn't any problem connected with what "shall" is being
used to say. But it isn't being used in its ordinary English
way. [...]
This assertion is in conflict with all of the ordinary English
dictionaries I consulted.
Dictionaries are not a good guide to current usage. "Shall" is
essentially obsolete in modern English. The modern English
modal to use would have been "must".
I disagree that "shall" is obsolete. Shall I provide examples?

In any case, the use of "shall" in this sense is quite common in
standards, particularly ISO standards.

And if it were obsolete, that would be an argument in favor of using it,
since the standard can define what it means in that context with less
risk of conflicting with common usage.

You just need to understand how the standard uses the word, and that's
something you only have to do once. After that, it shouldn't be a
problem.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith Thompson
2017-05-31 19:43:09 UTC
[...]
Post by Scott Lurndal
Post by David Kleinecke
As nearly as I can tell this use of "shall" in a technical
sense is confined to constraints (the definition of which I
don't think is clear - but everybody knows what they mean).
Is there a use of "shall" anywhere outside a constraint?
(Perhaps David has forgotten that I killfiled him.)
Post by Scott Lurndal
shall
For an implementation that conforms to POSIX.1-2008, describes a
feature or behavior that is mandatory. An application can rely
on the existence of the feature or behavior.
For an application or user, describes a behavior that is mandatory.
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap01.html
The C standard defines its use of "shall" in section 4, paragraphs
1 and 2:

In this International Standard, "shall" is to be interpreted as
a requirement on an implementation or on a program; conversely,
"shall not" is to be interpreted as a prohibition.

If a "shall" or "shall not" requirement that appears outside
of a constraint or runtime constraint is violated, the behavior
is undefined. Undefined behavior is otherwise indicated in this
International Standard by the words "undefined behavior" or by
the omission of any explicit definition of behavior. There is
no difference in emphasis among these three; they all describe
"behavior that is undefined".

5.1.1.3p1 says that violations of constraints (and of syntax rules) must
be diagnosed.

There are numerous occurrences of "shall" both inside and outside
constraints.
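
As an illustration of a "shall" inside a constraint, here is a sketch of
how the switch/VLA rule quoted earlier in the thread can be satisfied.
The function name sum_first is hypothetical, and the code assumes the
implementation supports VLAs (they are optional since C11). Wrapping the
declaration in its own block ends the VLA's scope before the next label,
so no case or default label lies within that scope.

```c
/* Hypothetical example: sums 1..n using a VLA inside a switch.
   Assumes VLA support (optional since C11). */
int sum_first(int n)
{
    int total = 0;

    switch (n > 0) {
    case 1: {
        /* The braces end the scope of vla before the default label,
           so no case or default label is within that scope. */
        int vla[n];               /* n > 0 is guaranteed on this path */
        for (int i = 0; i < n; i++)
            vla[i] = i + 1;
        for (int i = 0; i < n; i++)
            total += vla[i];
        break;
    }
    default:
        break;                    /* n <= 0: nothing to sum */
    }
    return total;
}
```

Dropping the inner braces would put the default label inside the scope
of vla, which is the constraint violation discussed above.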
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
David Kleinecke
2017-06-01 04:50:32 UTC
Post by Keith Thompson
[...]
Post by Scott Lurndal
Post by David Kleinecke
As nearly as I can tell this use of "shall" in a technical
sense is confined to constraints (the definition of which I
don't think is clear - but everybody knows what they mean).
Is there a use of "shall" anywhere outside a constraint?
(Perhaps David has forgotten that I killfiled him.)
Post by Scott Lurndal
shall
For an implementation that conforms to POSIX.1-2008, describes a
feature or behavior that is mandatory. An application can rely
on the existence of the feature or behavior.
For an application or user, describes a behavior that is mandatory.
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap01.html
The C standard defines its use of "shall" in section 4, paragraphs
1 and 2:
In this International Standard, "shall" is to be interpreted as
a requirement on an implementation or on a program; conversely,
"shall not" is to be interpreted as a prohibition.
If a "shall" or "shall not" requirement that appears outside
of a constraint or runtime constraint is violated, the behavior
is undefined. Undefined behavior is otherwise indicated in this
International Standard by the words "undefined behavior" or by
the omission of any explicit definition of behavior. There is
no difference in emphasis among these three; they all describe
"behavior that is undefined".
5.1.1.3p1 says that violations of constraints (and of syntax rules) must
be diagnosed.
There are numerous occurrences of "shall" both inside and outside
constraints.
I'm not going to check that against anything except the C89
standard. The paragraph about constraints is part of 3.16.
The other paragraph is missing. But this changes nothing.
My problem is that the standard is using "shall" technically
in places where "must" would be real English.

Perhaps the later standards use "shall" outside of constraints
more freely than the C89 standard. Perhaps you can give me a
reference to where "shall" occurs outside a constraint in a
passage that is unchanged from C89.

And, of course, you killfiled me because I don't have enough
respect for the standards. To the tiger in the zoo Madeline
said pooh, pooh
Noob
2017-06-01 11:24:34 UTC
While I don't have the C spec handy [...]
Great resource:

http://port70.net/~nsz/c/
j***@gmail.com
2017-05-31 20:02:10 UTC
Post by David Kleinecke
Post by Keith Thompson
Post by s***@casperkitty.com
Post by Keith Thompson
Post by s***@casperkitty.com
Most standards specify what conforming entities have to "do", and
are quite specific about what entities are responsible for ensuring
what. The Standard sometimes talks about obligations of programs
or implementations, but sometimes uses "shall be" to impose
obligations upon grammatical constructs.
For example?
If any other character is stored in a char object, the resulting value
is implementation-defined but shall be within the range of values that
can be represented in that type.
(which is outside a constraint)
Post by s***@casperkitty.com
If a switch statement has an associated case or default label within
the scope of an identifier with a variably modified type, the entire
switch statement shall be within the scope of that identifier.
(which is within a constraint)
Post by s***@casperkitty.com
The former could be interpreted to suggest that if an implementation
defined CHAR_MIN as -127 but specified that values get two's-complement
reduced, a program which stored 128 to a "char" and read it back would
violate a "shall" constraint and thus have Undefined Behavior.
I think it's a misuse of the word "shall". Replacing "shall" by "is
guaranteed to" would be an improvement. (BTW, your use of the word
"constraint" here is inconsistent with the way the standard uses it.)
Post by s***@casperkitty.com
The
latter could be read as suggesting that an implementation shall treat
the scopes of identifiers in such fashion as to abide by the second.
Except that that interpretation would contradict other requirements
about scopes of identifiers. I think it's clear enough that a
int main(void) {
int n = 1;
switch (n) {
case 1: ;
int vla[n];
default: ;
}
}
violates that constraint, requiring a diagnostic.
Post by s***@casperkitty.com
Both constructs can be worked out, of course, but especially with the
second I think it would be clearer to say that no "case" label shall
be placed within the scope of a variably-modified type *unless* the
entire switch statement is likewise within that scope.
Sure, that's another way to express the same requirement.
(And yet again, please use shorter lines so I don't have to reformat
quoted text.)
Lots of action on a minor point. Oh Well.
My objection to the Standard's use of "shall" was an
esthetic objection to the writing in the Standard - not
a comment on undefined behavior.
As nearly as I can tell this use of "shall" in a technical
sense is confined to constraints (the definition of which I
don't think is clear - but everybody knows what they mean).
Is there a use of "shall" anywhere outside a constraint?
The use of "shall" as a keyword is often indicative of a structured
requirements document generated according to some process defined standard,
like CMMI. "shall" has a very particular meaning in requirement statements
in contrast with "will" in terms of traceability between requirements,
design, code, and test.

Oftentimes these documents are defined in a program like Rational DOORS, which
has support for this kind of traceability. Based on appearance, the C
standard looks similar in format to the requirement documents I see exported
from DOORS.

Best regards,
John D.
Malcolm McLean
2017-05-31 05:06:23 UTC
Post by s***@casperkitty.com
Indeed, but some programmers seem to think that permission to behave
nonsensically should be viewed, in and of itself, as a compelling reason
to behave nonsensically. Additionally, some people seem to think that
the failure of the Standard to define a behavior represents a judgment by
the Committee that any programs that would rely upon that behavior should
be considered "defective".
You've posted about this at length.
The idea is that a construct which creates something like arithmetical
overflow, which is technically UB, is "impossible" and can be optimised
to nothing. I've got to say that these just aren't the problems I have
with C code. Programs fail for other reasons.
David Brown
2017-05-31 10:51:32 UTC
Post by s***@casperkitty.com
Post by Keith Thompson
The term, as you know, is "undefined behavior". Hiding it behind extra
wording is not helpful.
Many useful tasks cannot be performed efficiently, or at all, without using
behaviors which are defined by some implementations but are not mandated by
the Standard. What terminology would you use to distinguish those behaviors
from behaviors which are not defined by anything whatsoever?
Implementation-defined behaviour, or target-specific behaviour. The C
standards specifically refer to some behaviours as "implementation
defined", which means the implementation must define (and document) the
behaviour. Implementations can freely add other defined behaviour (as
long as it does not contradict standards-defined behaviour).
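
A minimal sketch of what "implementation-defined" looks like in practice.
The two helper names are made up for illustration; both functions probe
properties that the Standard requires each implementation to document,
so the answers may differ between compilers, but neither involves
undefined behaviour.

```c
#include <limits.h>

/* Whether plain char is signed is implementation-defined. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* The result of right-shifting a negative value is implementation-defined.
   Returns 1 if the shift keeps the sign bit (arithmetic shift). */
int right_shift_is_arithmetic(void)
{
    return (-1 >> 1) == -1;
}
```

A program may rely on either answer, but only for the implementations
whose documentation gives that answer.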
Post by s***@casperkitty.com
The fact that programs which invoke not-defined-by-anything behaviors
are broken does not imply that programs which invoke not-defined-by-the-
Standard-but-instead-defined-by-other-things behaviors are broken. Some
compiler writers seem unable to distinguish those categories, however.
A compiler will follow "defined by the standards" /and/ "defined by the
implementation" behaviours. This second part includes extensions,
implementation-defined behaviour, and any cases where the implementation
specifically provides definitions for things that the standards leave
undefined.

A compiler has /no/ obligation to follow behaviour that is defined
elsewhere. That includes behaviour that may seem "natural" for the
target platform, behaviour that other compilers support, behaviour that
existed in older C versions or standards, or behaviour that is in the
imagination of the user.

You can imagine that every compiler is written using only the C
standards (whichever versions they support) and their own reference
manual as the requirements. /Nothing/ else matters - no other behaviour
is defined.

So where are the definitions for your
"not-defined-by-the-Standard-but-instead-defined-by-other-things"
behaviours? And why do you think that compiler writers need to obey
definitions from unrelated places?
Post by s***@casperkitty.com
Post by Keith Thompson
The C standard defines clearly and unambiguously what it means by
"shall". The meaning depends on the context; it means one thing in a
constraint, and something else outside a constraint.
Most standards specify what conforming entities have to "do", and are quite
specific about what entities are responsible for ensuring what. The Standard
sometimes talks about obligations of programs or implementations, but
sometimes uses "shall be" to impose obligations upon grammatical constructs.
Read chapter 4 of the standards. It tells you what "shall" and "shall
not" mean.
Post by s***@casperkitty.com
If a "shall" or "shall not" requirement that appears outside of a
constraint or runtime-constraint is violated, the behavior is undefined.
Undefined behavior is otherwise indicated in this International Standard
by the words "undefined behavior" or by the omission of any explicit
definition of behavior. There is no difference in emphasis among these
three; they all describe "behavior that is undefined".
Note the final sentence here. This should, I hope, put an end to your
endless claims about what you think the standards authors actually meant.
Post by s***@casperkitty.com
Post by Keith Thompson
Post by s***@casperkitty.com
Note that if it would make sense for most implementations to process some
action the same way (e.g. the way it was defined in C89) but there might
possibly be some platforms where defining a consistent behavior would be
expensive (e.g. ones'-complement machines), leaving the behavior Undefined
should allow implementations to process such actions sensibly.
Leaving the behavior undefined quite literally allows implementations to
behave in any way they like, including whatever you think is sensible.
Indeed, but some programmers seem to think that permission to behave
nonsensically should be viewed, in and of itself, as a compelling reason
to behave nonsensically.
Name /one/ such programmer. Give even /one/ example where you can
clearly demonstrate that the reason a compiler behaves in a
"nonsensical" manner is purely because it is allowed to behave
"nonsensically". Of course, you can't demonstrate this from a single
program - you have to show that there are /no/ correct programs that
don't benefit in some way from the compiler feature.

Remember, of course, that when there is no definition of the correct
behaviour, it does not make sense to say that something is
"nonsensical", no matter what it does.


Now, if you were claiming that some compilers are less helpful than they
might be in how they take advantage of undefined behaviour, I would
agree. When a compiler sees a chance to skip some code because it could
never be activated in a correct run of the program, that is a /good/
thing - but it would often be even better if compiler messages informed
you of that fact.
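
A sketch of the kind of code in question, using only standard C; the
function names are hypothetical. The first function depends on signed
overflow wrapping around, which the Standard leaves undefined, so a
compiler is allowed to assume the overflow never happens and reduce the
function to "return 0". The second asks the same question without ever
overflowing.

```c
#include <limits.h>

/* Fragile: for x == INT_MAX, x + 1 overflows (undefined behaviour).
   A compiler may assume that never happens and fold this to 0. */
int next_wraps_fragile(int x)
{
    return x + 1 < x;
}

/* Portable: the comparison is defined for every int value. */
int next_wraps_portable(int x)
{
    return x == INT_MAX;
}
```

Whether a given compiler actually folds the first function depends on
the compiler and its options; a message when it does so would be the
helpful part.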

But when you claim compiler writers do that sort of thing out of an evil
sense of humour to annoy programmers and "create" bugs in code that used
to work, you are talking about something else entirely.
Post by s***@casperkitty.com
Additionally, some people seem to think that
the failure of the Standard to define a behavior represents a judgment by
the Committee that any programs that would rely upon that behavior should
be considered "defective".
I think everyone who understands the way C works would think that
programs which rely on undefined behaviour are defective - both compiler
writers and programmers. They would all agree that this refers to
behaviour that is undefined by the implementation, rather than just
undefined by the Standard - C is intended to be usable in a
non-portable, implementation-specific manner as well as for portable code.
Post by s***@casperkitty.com
Post by Keith Thompson
Post by s***@casperkitty.com
All that's
necessary is that implementers recognize that (1) there is significant
value in following precedent absent a compelling reason to do otherwise,
and (2) permission to do otherwise does not, in and of itself, constitute
a compelling reason.
Do you really have a problem with the way the standard uses the word "shall"?
I believe it has contributed toward some implementers' belief that programs
relying upon certain behaviors should be viewed as "defective", rather than
merely being unsuitable for implementations which are designed for other
kinds of applications.
Read chapter 4 of the standards again. The word "shall" apparently
confuses /you/, not other people.
s***@casperkitty.com
2017-05-31 13:21:40 UTC
Post by David Brown
Post by s***@casperkitty.com
Post by Keith Thompson
The term, as you know, is "undefined behavior". Hiding it behind extra
wording is not helpful.
Many useful tasks cannot be performed efficiently, or at all, without using
behaviors which are defined by some implementations but are not mandated by
the Standard. What terminology would you use to distinguish those behaviors
from behaviors which are not defined by anything whatsoever?
Implementation-defined behaviour, or target-specific behaviour. The C
standards specifically refer to some behaviours as "implementation
defined", which means the implementation must define (and document) the
behaviour. Implementations can freely add other defined behaviour (as
long as it does not contradict standards-defined behaviour).
The phrase "Implementation-Defined" behavior, as used by the Standard, refers
to situations where ALL implementations are REQUIRED to document something
about the behavior. "Platform-defined behavior" may be a good term.
Post by David Brown
Post by s***@casperkitty.com
The fact that programs which invoke not-defined-by-anything behaviors
are broken does not imply that programs which invoke not-defined-by-the-
Standard-but-instead-defined-by-other-things behaviors are broken. Some
compiler writers seem unable to distinguish those categories, however.
A compiler will follow "defined by the standards" /and/ "defined by the
implementation" behaviours. This second part includes extensions,
implementation-defined behaviour, and any cases where the implementation
specifically provides definitions for things that the standards leave
undefined.
And if all general-purpose implementations for a platform have processed a
certain behavior a certain way, quality general-purpose implementations should
continue to do likewise unless they document a compelling reason to do
otherwise.
Post by David Brown
A compiler has /no/ obligation to follow behaviour that is defined
elsewhere. That includes behaviour that may seem "natural" for the
target platform, behaviour that other compilers support, behaviour that
existed in older C versions or standards, or behaviour that is in the
imagination of the user.
True, but a C implementation whose behavior deviates from those of existing
general-purpose compilers for similar platforms should not call itself a
quality general-purpose compiler, since general-purpose compilers should be
suitable for processing code written for pre-existing general-purpose compilers
for similar platforms.
Post by David Brown
You can imagine that every compiler is written using only the C
standards (whichever versions they support) and their own reference
manual as the requirements. /Nothing/ else matters - no other behaviour
is defined.
Compiler writers used to recognize that an ability to run code designed for
other compilers was a useful feature, and would thus take into account the
behavior of any compilers they might be competing with. No law requires
that any compiler be compatible with non-mandated features of any other, but
a language where compiler writers try to do so will be more useful than one
where they don't.
Post by David Brown
So where are the definitions for your
"not-defined-by-the-Standard-but-instead-defined-by-other-things"
behaviours? And why do you think that compiler writers need to obey
definitions from unrelated places?
Among other things, in the corpus of programs that will work just fine on a
wide range of older compilers, but get tripped up by modern ones. If one of
the purposes of a compiler is to be suitable for use with a corpus of existing
code, then the corpus of code will, essentially by definition, establish
what would be needed to make a compiler suitable for the purpose of using it.

I would further suggest that on most platforms a compiler that was tasked
with generating code in "mindless-translator" fashion would in many cases
not be able to avoid exposing useful behaviors which are documented by the
environment without having to generate extra code for that purpose. In such
cases, a compiler which claims to be suitable for low-level programming on
that platform should expose such behaviors likewise. Doing so may mean that
the compiler can only achieve 50-90% of the optimizations that would otherwise
be possible, but a compiler that can process a large corpus of code and
achieve 50-90% of the possible optimizations may be much more useful for many
purposes than one which can achieve more optimizations on a few programs but
can't be trusted to yield more than 0% on the rest.
Post by David Brown
Post by s***@casperkitty.com
Post by Keith Thompson
The C standard defines clearly and unambiguously what it means by
"shall". The meaning depends on the context; it means one thing in a
constraint, and something else outside a constraint.
Most standards specify what conforming entities have to "do", and are quite
specific about what entities are responsible for ensuring what. The Standard
sometimes talks about obligations of programs or implementations, but
sometimes uses "shall be" to impose obligations upon grammatical constructs.
Read chapter 4 of the standards. It tells you what "shall" and "shall
not" mean.
Post by s***@casperkitty.com
If a "shall" or "shall not" requirement that appears outside of a
constraint or runtime-constraint is violated, the behavior is undefined.
Undefined behavior is otherwise indicated in this International Standard
by the words "undefined behavior" or by the omission of any explicit
definition of behavior. There is no difference in emphasis among these
three; they all describe "behavior that is undefined".
Note the final sentence here. This should, I hope, put an end to your
endless claims about what you think the standards authors actually meant.
Undefined by the standard, which is not the same as behavior which would
not be necessary to make a compiler suitable for purposes like low-level
programming.
Post by David Brown
Post by s***@casperkitty.com
Indeed, but some programmers seem to think that permission to behave
nonsensically should be viewed, in and of itself, as a compelling reason
to behave nonsensically.
Name /one/ such programmer. Give even /one/ example where you can
clearly demonstrate that the reason a compiler behaves in a
"nonsensical" manner is purely because it is allowed to behave
"nonsensically". Of course, you can't demonstrate this from a single
program - you have to show that there are /no/ correct programs that
don't benefit in some way from the compiler feature.
The whole concept behind UB-based dead-branch elimination is that all forms
of UB are equivalent. As I've demonstrated, gcc will use the fact that an
overflow may occur while evaluating two "unsigned short" values as a basis
for making inferences about those values, even if the result is truncated
mod 65536. Can you suggest any *other* basis for gcc's behavior?
Post by David Brown
Remember, of course, that when there is no definition of the correct
behaviour, it does not make sense to say that something is
"nonsensical", no matter what it does.
There are a number of actions for which some compilers offer behavioral
guarantees and some don't, but for which a single behavior would satisfy the
behavioral guarantees of all general-purpose compilers for similar platforms.
Has any general-purpose (non-sanitizing) compiler for a two's-complement
silent-wraparound hardware *ever* defined a behavior for

(ushort1*ushort2) & 65535u

which was not consistent with performing an arithmetical computation and
mod-65536-reducing the result? If processing such code in such fashion
would make a compiler compatible with a wider range of code than would
any other treatment, and would not impede performance, I'd say that such
behavior would be desirable in a compiler that is intended to be suitable
for use with the maximum corpus of existing code.
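
For reference, a sketch of how the expression above can be written so
that it does not depend on any compiler's treatment of overflow. The
helper name mul_u16 is made up for illustration. Where int is wider than
unsigned short, both operands of ushort1*ushort2 promote to signed int,
and the product can exceed INT_MAX; converting one operand to unsigned
first forces the multiplication into unsigned arithmetic, which wraps
and is fully defined.

```c
/* Hypothetical helper: low 16 bits of the product of two unsigned shorts.
   The cast makes the multiplication unsigned, so it cannot overflow in
   the signed-int sense; unsigned arithmetic wraps modulo UINT_MAX + 1,
   which is a multiple of 65536, so the masked result is the true product
   reduced mod 65536. */
unsigned mul_u16(unsigned short a, unsigned short b)
{
    return ((unsigned)a * b) & 65535u;
}
```

For example, mul_u16(65535, 65535) yields 1, since 65535*65535 is
congruent to 1 mod 65536.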
David Brown
2017-05-31 15:16:24 UTC
Post by s***@casperkitty.com
Post by David Brown
Post by s***@casperkitty.com
Post by Keith Thompson
The term, as you know, is "undefined behavior". Hiding it behind extra
wording is not helpful.
Many useful tasks cannot be performed efficiently, or at all, without using
behaviors which are defined by some implementations but are not mandated by
the Standard. What terminology would you use to distinguish those behaviors
from behaviors which are not defined by anything whatsoever?
Implementation-defined behaviour, or target-specific behaviour. The C
standards specifically refer to some behaviours as "implementation
defined", which means the implementation must define (and document) the
behaviour. Implementations can freely add other defined behaviour (as
long as it does not contradict standards-defined behaviour).
The phrase "Implementation-Defined" behavior, as used by the Standard, refers
to situations where ALL implementations are REQUIRED to document something
about the behavior. Platform-defined behavior may be good.
That seems a reasonable choice to me. But I'd wait for others such as
Keith to express an opinion.
Post by s***@casperkitty.com
Post by David Brown
Post by s***@casperkitty.com
The fact that programs which invoke not-defined-by-anything behaviors
are broken does not imply that programs which invoke not-defined-by-the-
Standard-but-instead-defined-by-other-things behaviors are broken. Some
compiler writers seem unable to distinguish those categories, however.
A compiler will follow "defined by the standards" /and/ "defined by the
implementation" behaviours. This second part includes extensions,
implementation-defined behaviour, and any cases where the implementation
specifically provides definitions for things that the standards leave
undefined.
And if all general-purpose implementations for a platform have processed a
certain behavior a certain way, quality general-purpose implementations should
continue to do likewise unless they document a compelling reason to do
otherwise.
No. If an implementation wants to give you behaviour that you can
rely on, it should document it. Otherwise you are on your own - you
/can/ write code, compile it, and see that it works as you expect, but
you should not expect it to remain working if you change compiler flags,
update to a new version, or make other changes.

What you are suggesting would be a recipe for stagnation - compilers
could not change and improve because they would have to try to emulate
the unwritten and unspecified behaviour of other tools.

It would also make users' life a lottery - how is the user supposed to
know what the compiler writer sees as a "compelling reason" ?
Post by s***@casperkitty.com
Post by David Brown
A compiler has /no/ obligation to follow behaviour that is defined
elsewhere. That includes behaviour that may seem "natural" for the
target platform, behaviour that other compilers support, behaviour that
existed in older C versions or standards, or behaviour that is in the
imagination of the user.
True, but a C implementation whose behavior deviates from those of existing
general-purpose compilers for similar platforms should not call itself a
quality general-purpose compiler, since general-purpose compilers should be
suitable for processing code written for pre-existing general-purpose compilers
for similar platforms.
The trick here is very simple - write code that relies on
standards-defined behaviour, and perhaps basic implementation-defined
behaviour (such as the size of an "int") if that is helpful to your code
and you don't need wide portability. Some target architectures have
specified ABIs that compilers will stick to, giving you a nice selection
of implementation-specific behaviour you can rely on across different tools.
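
As a hedged illustration of relying on implementation-defined behaviour deliberately: a C11 _Static_assert can record the assumption, so that a compiler which does not meet it rejects the code at compile time. The helper name pack_rgb and the 32-bit requirement are assumptions invented for this sketch; they do not come from the thread.

```c
#include <limits.h>

/* Assumption for illustration: this code needs int to hold at least
   31 value bits. On an implementation with a 16-bit int, compilation
   stops here with the message below. */
_Static_assert(INT_MAX >= 2147483647,
               "this code assumes int is at least 32 bits wide");

/* Hypothetical helper that depends on the assumption above.
   Each argument is expected to be in the range 0..255, so the largest
   intermediate value is 255 << 16, which fits in a 32-bit int. */
int pack_rgb(int r, int g, int b)
{
    return (r << 16) | (g << 8) | b;
}
```

The point of the design is that the non-portable assumption is written down once, next to the code that needs it, rather than discovered later through testing.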

If you need something beyond that, your code is tied tightly to the
implementation (possibly even the specific version and flags). That's
fine too - C is designed to allow that kind of coding. Your mistake is
in thinking that /other/ compilers somehow have an obligation to support
/your/ non-portable code.
Post by s***@casperkitty.com
Post by David Brown
You can imagine that every compiler is written using only the C
standards (whichever versions they support) and their own reference
manual as the requirements. /Nothing/ else matters - no other behaviour
is defined.
Compiler writers used to recognize that an ability to run code designed for
other compilers was a useful feature, and would thus take into account the
behavior of any compilers they might be competing with.
Compiler writers will often support extensions that exist on other
compilers. They might take inspiration from others in how they specify
implementation-defined behaviour.

But I have never heard of a compiler trying to emulate another
compiler's treatment of undefined behaviour. Can you give real-life
examples?

Even if there is, I don't see that as being a useful feature, except
perhaps to propagate bugs and poor coding. And since it might hinder
new features or optimisations, it could be a /bad/ thing. If I write
code that relies on unwritten details of an implementation (and that
sometimes happens), then the code is specific to the compiler and flags
- I would not even bother trying to compile it with another tool without
extensive testing, checking and qualification.

It is a different thing entirely if the compiler /documents/ the
behaviour, perhaps using a compiler switch. For example, it would be a
bad idea for gcc to emulate old compilers' behaviour of wrapping signed
integer overflows, because it hinders optimisations. But it is a fine
idea to provide a documented "-fwrapv" switch which enables such
wrapping behaviour. /That/ is how you deal with compatibility with old
code that relies on specific undefined behaviour.
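
To make the point concrete, here is a minimal sketch of an overflow check that does not depend on wrapping at all, so it behaves the same with or without "-fwrapv". Older code often tests for overflow with something like "a + b < a", which is only meaningful if signed overflow wraps. The function name add_would_overflow is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if a + b would overflow int. The overflowing addition
   itself is never performed, so no undefined behaviour is involved. */
bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* INT_MAX - b cannot overflow for b > 0 */
    if (b < 0)
        return a < INT_MIN - b;   /* INT_MIN - b cannot overflow for b < 0 */
    return false;                 /* adding zero never overflows */
}
```

Code written this way needs no compatibility switch; the switch is only needed for code that performs the overflowing operation and inspects the result afterwards.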
Post by s***@casperkitty.com
No law requires
that any compiler be compatible with non-mandated features of any other, but
a language where compiler writers try to do so will be more useful than one
where they don't.
Post by David Brown
So where are the definitions for your
"not-defined-by-the-Standard-but-instead-defined-by-other-things"
behaviours? And why do you think that compiler writers need to obey
definitions from unrelated places?
Among other things, in the corpus of programs that will work just fine on a
wide range of older compilers, but get tripped up by modern ones. If one of
the purposes of a compiler is to be suitable for use with a corpus of existing
code, then the corpus of code will, essentially by definition, establish
what would be needed to make a compiler suitable for the purpose of using it.
The main target for a compiler is correctly written code. If it does
not work well with broken code that happened to work on other compilers,
that's fine. /I/ don't want a compiler that is hobbled so that it works
with /your/ old broken code. I want a compiler that does the best job
for /correct/ code. I also want it to be helpful in telling me about
broken code - if I make a mistake, I would prefer to be informed about
it, rather than for the compiler to try to guess what I meant based on
what somebody might have meant in code long ago.

People with old code that is badly written should stick to old compilers
that they have tested with that code. Since the behaviour of their code
is by definition undefined, there is no way for a new compiler to be
sure it supports these unwritten rules. Only in some specific cases,
such as wrapping behaviour for integer overflow, is it even possible for
a new compiler to support the old broken code.
Post by s***@casperkitty.com
I would further suggest that on most platforms a compiler that was tasked
with generating code in "mindless-translator" fashion would in many cases
not be able to avoid exposing useful behaviors which are documented by the
environment without having to generate extra code for that purpose.
If they are documented by the tools, that's fine. If by "documented by
the environment", you mean "behaviour of certain instructions on the
cpu", then that is not C - C is not an assembler. If your code relies
on such behaviour in old tools, then stick to the old tools - your code
is only suitable for use on the specific implementations you have tested
it with.

I have worked with code written for "mindless translator" compilers.
And when I have moved such code over to better compilers, I have gone
through all the code carefully, "porting" it over to standard C (or
implementation-dependent C, as necessary). I don't expect my new tools
to work like mindless translators just because the old ones did. The
alternative, which I also do, is simply to continue to use the mindless
translator tools for that code. When I have to dig out and modify 20
year old code, I use the same 20 year old compiler as I did originally.
Post by s***@casperkitty.com
In such
cases, a compiler which claims to be suitable for low-level programming on
that platform should expose such behaviors likewise. Doing so may mean that
the compiler can only achieve 50-90% of the optimizations that would otherwise
be possible, but a compiler that can process a large corpus of code and
achieve 50-90% of the possible optimizations may be much more useful for many
purposes than one which can achieve more optimizations on a few programs but
can't be trusted to yield more than 0% on the rest.
I think perhaps I put a lot more emphasis on writing good code than you
do. I just don't see it as a problem. For the types of targets where
you typically need to write "special" code that relies on weird
behaviour in order to get something of acceptable efficiency, you rarely
want to run that code on anything else anyway.
Post by s***@casperkitty.com
Post by David Brown
Post by s***@casperkitty.com
Post by Keith Thompson
The C standard defines clearly and unambiguously what it means by
"shall". The meaning depends on the context; it means one thing in a
constraint, and something else outside a constraint.
Most standards specify what conforming entities have to "do", and are quite
specific about what entities are responsible for ensuring what. The Standard
sometimes talks about obligations of programs or implementations, but
sometimes uses "shall be" to impose obligations upon grammatical constructs.
Read chapter 4 of the standards. It tells you what "shall" and "shall
not" mean.
Post by s***@casperkitty.com
If a ‘‘shall’’ or ‘‘shall not’’ requirement that appears outside of a constraint or runtime-
constraint is violated, the behavior is undefined. Undefined behavior is otherwise
indicated in this International Standard by the words ‘‘undefined behavior’’ or by the
omission of any explicit definition of behavior. There is no difference in emphasis among
these three; they all describe ‘‘behavior that is undefined’’.
Note the final sentence here. This should, I hope, put an end to your
endless claims about what you think the standards authors actually meant.
Undefined by the standard, which is not the same as behavior which would
not be necessary to make a compiler suitable for purposes like low-level
programming.
As I have said many times, low-level programming is what I do for a
living - on a wide range of targets, with a wide range of tools. In
almost all cases, you can do it fine with standard defined behaviour,
possibly with some implementation defined behaviour. Occasionally you
need "platform defined behaviour" (the term from earlier in this post).
Very occasionally, you need something that is not really defined at
all, and known to work by inspection of the generated code or by
testing. What you /don't/ need, ever, is guesses about what behaviour
you think should have been defined, or would have been defined, or that
somebody meant to define.
Post by s***@casperkitty.com
Post by David Brown
Post by s***@casperkitty.com
Indeed, but some programmers seem to think that permission to behave
nonsensically should be viewed, in and of itself, as a compelling reason
to behave nonsensically.
Name /one/ such programmer. Give even /one/ example where you can
clearly demonstrate that the reason a compiler behaves in a
"nonsensical" manner is purely because it is allowed to behave
"nonsensically". Of course, you can't demonstrate this from a single
program - you have to show that there are /no/ correct programs that
don't benefit in some way from the compiler feature.
The whole concept behind UB-based dead-branch elimination is that all forms
of UB are equivalent. As I've demonstrated, gcc will use the fact that an
overflow may occur while evaluating two "unsigned short" values as a basis
for making inferences about those values, even if the result is truncated
mod 65536. Can you suggest any *other* basis for gcc's behavior?
gcc's behaviour is to follow the rules of C about integer promotion.
There was a time in C's history when some tools used "value preserving
promotion" while others used "signedness preserving promotion". After
due consideration and debate, it was decided to use "value preserving
promotion". Whether you like that or not (personally, I would be
happier with /no/ promotion to int), that's the rules of C - and
consistency is vital here. gcc follows these rules, and the implication
of them. There are a few situations where the results can then seem
strange.
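
A short sketch of the promotion issue under discussion, assuming a 16-bit "unsigned short" and a 32-bit "int" (the function names are made up for illustration). In the first function both operands are promoted to signed int before the multiplication, so a product larger than INT_MAX is undefined behaviour even though the result is masked afterwards. Converting one operand to unsigned first makes the multiplication unsigned, which is defined to wrap.

```c
/* Risky form, shown for comparison only. With 16-bit unsigned short and
   32-bit int, x and y are promoted to int, and 65535 * 65535 overflows
   int. Only safe to call when the product fits in an int. */
unsigned mul_mod65536_promoted(unsigned short x, unsigned short y)
{
    return (x * y) & 65535u;
}

/* Defined form: the cast forces the multiplication to be carried out in
   unsigned arithmetic, which wraps modulo UINT_MAX + 1, so the masked
   result is the product reduced mod 65536 for all inputs. */
unsigned mul_mod65536(unsigned short x, unsigned short y)
{
    return ((unsigned)x * y) & 65535u;
}
```

With the cast in place, no optimisation level or compiler version can change the result, because the standard defines it.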

But you are basically accusing the gcc authors of specifically looking
at code like your beloved uint16_t multiplication, and planning exciting
ways to confuse programmers by generating "nonsense" code just to prove
that /they/ have read the C standards.

In reality, it is nothing but a side-effect of optimisation passes that
are used to generate slightly more efficient object code from correct
source code.

Incidentally, how have the gcc authors responded to your bug report on
this? What about when you asked for improved warnings when such dead
code was eliminated? I presume that since you have told us in c.l.c.
about this a few hundred times over the past few years, you have asked
the gcc developers about it.

Oh, and of course gcc already provides options that let you get the
behaviour you presumably want here, even though there is no written
specification for it. gcc has a wide range of flags to control
optimisation - you can figure out the details yourself, or simply
disable optimisation (which is, incidentally, the default - you have to
/ask/ gcc to do the dead branch optimisation).
Post by s***@casperkitty.com
Post by David Brown
Remember, of course, that when there is no definition of the correct
behaviour, it does not make sense to say that something is
"nonsensical", no matter what it does.
There are a number of actions for which some compilers offer behavioral
guarantees and some don't, but for which a single behavior would satisfy the
behavioral guarantees of all general-purpose compilers for similar platforms.
Has any general-purpose (non-sanitizing) compiler for a two's-complement
silent-wraparound hardware *ever* defined a behavior for
(ushort1*ushort2) & 65535u
which was not consistent with performing an arithmetical computation and
mod-65536-reducing the result?
(I assume you meant to add a requirement of "int" being longer than
"short" to your list.)

That is not the question you should be asking. The question is whether
any compiler has ever defined a behaviour for such code? The answer, to
my knowledge, is no. It doesn't matter that none have defined behaviour
that is inconsistent with your expectations - none have defined it in a
way that /is/ consistent with your expectations. Some might have
happened to generate code that you like - equally, some generate code
that you /don't/ like. /None/ specify what they should generate.

(Feel free to provide references to compiler manuals if you have
examples that prove me wrong here.)
Post by s***@casperkitty.com
If processing such code in such fashion
would make a compiler compatible with a wider range of code than would
any other treatment, and would not impede performance, I'd say that such
behavior would be desirable in a compiler that is intended to be suitable
with the maximum corpus of existing code.
I know you'd say that - you have done so many, many times. Your logic
is still invalid, as are your premises.
GOTHIER Nathan
2017-05-31 15:24:49 UTC
On Wed, 31 May 2017 17:16:24 +0200
Post by David Brown
That seems a reasonable choice to me. But I'd wait for others such as
Keith to express an opinion.
I prefer "undefined behaviour", which emphasizes the C standard's point of
view, since every C implementor defines the implementation's behaviour in its
own way. In my view, any more specific wording is uninformative as far as the
C implementation is concerned.
Pascal J. Bourguignon
2017-05-31 20:18:25 UTC
Post by GOTHIER Nathan
On Wed, 31 May 2017 17:16:24 +0200
Post by David Brown
That seems a reasonable choice to me. But I'd wait for others such as
Keith to express an opinion.
I prefer "undefined behaviour", which emphasizes the C standard's point of
view, since every C implementor defines the implementation's behaviour in its
own way. In my view, any more specific wording is uninformative as far as the
C implementation is concerned.
The problem is that it's very easy for programmers to hit undefined
behavior (and there is no requirement that a warning be issued
when undefined behavior is reached).

Some other programming languages make it more difficult or more explicit
to attain undefined behavior (eg. using UNSAFE modules, or some other
language construct).

The problem is that programmers with decades of experience programming
in C still don't know they're filling their programs with undefined
behavior.
--
__Pascal J. Bourguignon
http://www.informatimago.com
GOTHIER Nathan
2017-05-31 21:09:48 UTC
On Wed, 31 May 2017 22:18:25 +0200
Post by Pascal J. Bourguignon
The problem is that programmers with decades of experience programming
in C still don't know they're filling their programs with undefined
behavior.
If it works as expected why should they fear undefined behaviors? It gives some
room to manage the implementation according to the programming environment. The
C programming language doesn't intend to force all implementations to conform
to a specific programming environment (kernel, processor, etc).
Jerry Stuckle
2017-05-31 21:32:06 UTC
Post by GOTHIER Nathan
On Wed, 31 May 2017 22:18:25 +0200
Post by Pascal J. Bourguignon
The problem is that programmers with decades of experience programming
in C still don't know they're filling their programs with undefined
behavior.
If it works as expected why should they fear undefined behaviors? It gives some
room to manage the implementation according to the programming environment. The
C programming language doesn't intend to force all implementations to conform
to a specific programming environment (kernel, processor, etc).
Because a change in compilers, different versions of the same compiler
or even different compile options can change the behavior. That's what
happens when the behavior is undefined.
--
==================
Remove the "x" from my email address
Jerry Stuckle
***@attglobal.net
==================
GOTHIER Nathan
2017-05-31 21:50:01 UTC
On Wed, 31 May 2017 17:32:06 -0400
Post by Jerry Stuckle
Because a change in compilers, different versions of the same compiler
or even different compile options can change the behavior. That's what
happens when the behavior is undefined.
If undefined behaviors are so bad, why isn't there much feedback in bug
reports about this? That's likely because undefined behaviors concern corner
cases where a specific behavior isn't required and would be counterproductive
for the implementation of C on a multitude of environments.
Jerry Stuckle
2017-05-31 22:01:53 UTC
Post by GOTHIER Nathan
On Wed, 31 May 2017 17:32:06 -0400
Post by Jerry Stuckle
Because a change in compilers, different versions of the same compiler
or even different compile options can change the behavior. That's what
happens when the behavior is undefined.
If undefined behaviors are so bad, why isn't there much feedback in bug
reports about this? That's likely because undefined behaviors concern corner
cases where a specific behavior isn't required and would be counterproductive
for the implementation of C on a multitude of environments.
Because they aren't bugs. They are by definition undefined behavior.
Experienced programmers know how to stay away from them.
--
==================
Remove the "x" from my email address
Jerry Stuckle
***@attglobal.net
==================
GOTHIER Nathan
2017-05-31 22:33:57 UTC
On Wed, 31 May 2017 18:01:53 -0400
Post by Jerry Stuckle
Because they aren't bugs. They are by definition undefined behavior.
Experienced programmers know how to stay away from them.
I have to disagree. Some experienced C programmers know how to write good C
programs but not all. If an experienced programmer avoids strcpy() on strings
that he knows perfectly well don't overlap, it doesn't matter how long he has
been writing C code: he's definitely a bad C programmer.
Jerry Stuckle
2017-06-01 02:36:07 UTC
Post by GOTHIER Nathan
On Wed, 31 May 2017 18:01:53 -0400
Post by Jerry Stuckle
Because they aren't bugs. They are by definition undefined behavior.
Experienced programmers know how to stay away from them.
I have to disagree. Some experienced C programmers know how to write good C
programs but not all. If an experienced programmer avoids strcpy() on strings
that he knows perfectly well don't overlap, it doesn't matter how long he has
been writing C code: he's definitely a bad C programmer.
Experienced C programmers know how to write good C programs. Idiots
like you can't write good programs in ANY language.
--
==================
Remove the "x" from my email address
Jerry Stuckle
***@attglobal.net
==================
GOTHIER Nathan
2017-06-01 04:08:20 UTC
On Wed, 31 May 2017 22:36:07 -0400
Post by Jerry Stuckle
Experienced C programmers know how to write good C programs. Idiots
like you can't write good programs in ANY language.
I'm very impressed by your undisputable argument... I'm afraid you wasted your
entire life trying to write good C programs but never realized you failed.
Jerry Stuckle
2017-06-01 15:08:08 UTC
Post by GOTHIER Nathan
On Wed, 31 May 2017 22:36:07 -0400
Post by Jerry Stuckle
Experienced C programmers know how to write good C programs. Idiots
like you can't write good programs in ANY language.
I'm very impressed by your undisputable argument... I'm afraid you wasted your
entire life trying to write good C programs but never realized you failed.
Sorry, Nathan, I write more good C programs in a month than you could
write in your entire lifetime.
--
==================
Remove the "x" from my email address
Jerry Stuckle
***@attglobal.net
==================
Malcolm McLean
2017-06-01 08:30:18 UTC
Post by GOTHIER Nathan
On Wed, 31 May 2017 18:01:53 -0400
Post by Jerry Stuckle
Because they aren't bugs. They are by definition undefined behavior.
Experienced programmers know how to stay away from them.
I have to disagree. Some experienced C programmers know how to write good C
programs but not all. If an experienced programmer avoids strcpy() on strings
that he knows perfectly well don't overlap, it doesn't matter how long he has
been writing C code: he's definitely a bad C programmer.
Microsoft pay their programmers plenty and I'd guess that a lot of people here
would work for Microsoft if they could. The terms are very attractive.

Microsoft deprecated strcpy() because the problem with C is buffer overruns leading
to security holes. It wasn't a perfect answer, but it's an answer of sorts.
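
For readers unfamiliar with the overrun problem mentioned here, a minimal sketch of a bounded copy follows. It is not Microsoft's replacement API; copy_string is a hypothetical helper written for this illustration, and it assumes that the source and destination do not overlap.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst, where dst_size is the
   capacity of dst in bytes including the terminating '\0'.
   Returns 0 on success. Returns -1 if src does not fit or dst_size is 0;
   when dst_size is nonzero, dst is set to the empty string on failure. */
int copy_string(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    if (dst_size == 0)
        return -1;

    len = strlen(src);
    if (len >= dst_size) {
        dst[0] = '\0';            /* never leave dst unterminated */
        return -1;
    }

    memcpy(dst, src, len + 1);    /* len + 1 copies the terminator too */
    return 0;
}
```

Unlike plain strcpy(), the caller has to state the destination size, and a string that is too long is reported as an error instead of being written past the end of the buffer.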
GOTHIER Nathan
2017-06-01 09:38:29 UTC
On Thu, 1 Jun 2017 01:30:18 -0700 (PDT)
Post by Malcolm McLean
Microsoft deprecated strcpy() because the problem with C is buffer overruns leading
to security holes. It wasn't a perfect answer, but it's an answer of sorts.
Don't expect a bad programmer to recruit good ones when he doesn't know what
good code is. Removing the steering wheel won't prevent bad drivers from
hitting the wall.
Malcolm McLean
2017-06-01 10:13:12 UTC
Post by GOTHIER Nathan
On Thu, 1 Jun 2017 01:30:18 -0700 (PDT)
Post by Malcolm McLean
Microsoft deprecated strcpy() because the problem with C is buffer overruns leading
to security holes. It wasn't a perfect answer, but it's an answer of sorts.
Don't expect a bad programmer to recruit good ones when he doesn't know what
good code is. Removing the steering wheel won't prevent bad drivers from
hitting the wall.
So we're expected to believe that not only are Microsoft bad programmers, they're
also such bad programmers that they can't recognise good programmers when they
see one. Which explains why you don't have a job with them.

Now writing operating systems isn't easy and Microsoft have largely been driven
from the consumer / leisure computing market. They remain a very strong and profitable
business in the corporate IT market. So how did they do that, if no-one working for them
is any good?
GOTHIER Nathan
2017-06-01 10:25:26 UTC
On Thu, 1 Jun 2017 03:13:12 -0700 (PDT)
Post by Malcolm McLean
So we're expected to believe that not only are Microsoft bad programmers, they're
also such bad programmers that they can't recognise good programmers when they
see one. Which explains why you don't have a job with them.
You're wrongly assuming that good programmers would like to work for
Microsoft... that explains why Linus TORVALDS works for this company.
Post by Malcolm McLean
Now writing operating systems isn't easy and Microsoft have largely been driven
from the consumer / leisure computing market. They remain a very strong and profitable
business in the corporate IT market. So how did they do that, if no-one working for them
is any good?
Does Microsoft sell its operating system only to computer experts? Should I
remind you of the forced sale (aka the Windows tax) of its operating system with
most computers on the market?
Malcolm McLean
2017-06-01 11:48:37 UTC
Post by GOTHIER Nathan
On Thu, 1 Jun 2017 03:13:12 -0700 (PDT)
Post by Malcolm McLean
So we're expected to believe that not only are Microsoft bad programmers, they're
also such bad programmers that they can't recognise good programmers when they
see one. Which explains why you don't have a job with them.
You're wrongly assuming that good programmers would like to work for
Microsoft... that explains why Linus TORVALDS works for this company.
Microsoft offer an extremely attractive compensation package. So most people will
work for them rather than accept less elsewhere. Not everybody, some people are
with a competitor which also pays very attractively, some people have other reasons.
But most programmers would accept a job offer from Microsoft given the option.
Post by GOTHIER Nathan
Post by Malcolm McLean
Now writing operating systems isn't easy and Microsoft have largely been driven
from the consumer / leisure computing market. They remain a very strong and profitable
business in the corporate IT market. So how did they do that, if no-one working for them
is any good?
Does Microsoft sell its operating system only to computer experts? Should I
remind you of the forced sale (aka the Windows tax) of its operating system with
most computers on the market?
Increasingly yes. A few years ago, many people had a Windows machine for general-
purpose computing, web browsing, games, a bit of work-related stuff. Now they are
increasingly likely to have an Apple notebook or a tablet / smartphone. PC sales
have fallen as a result. However you can't run most business software on a phone,
PCs are still very common in the office, where they are purchased by IT professionals.
GOTHIER Nathan
2017-06-01 12:27:11 UTC
On Thu, 1 Jun 2017 04:48:37 -0700 (PDT)
Post by Malcolm McLean
Microsoft offer an extremely attractive compensation package. So most people will
work for them rather than accept less elsewhere. Not everybody, some people are
with a competitor which also pays very attractively, some people have other reasons.
But most programmers would accept a job offer from Microsoft given the option.
It looks like Microsoft doesn't need a marketing department... since there are
followers working for free.
Post by Malcolm McLean
Increasingly yes. A few years ago, many people had a Windows machine for general-
purpose computing, web browsing, games, a bit of work-related stuff. Now they are
increasingly likely to have an Apple notebook or a tablet / smartphone. PC sales
have fallen as a result. However you can't run most business software on a phone,
PCs are still very common in the office, where they are purchased by IT professionals.
So now only IT pros buy computers... that's wonderful news for Microsoft.
Windows shouldn't be preinstalled and bound to new machines to be sold like
baguettes. This potentially would allow the company to make better margins
selling only overpriced pro edition licenses to local distributors.
Malcolm McLean
2017-06-01 12:55:31 UTC
Post by GOTHIER Nathan
On Thu, 1 Jun 2017 04:48:37 -0700 (PDT)
Post by Malcolm McLean
Microsoft offer an extremely attractive compensation package. So most people will
work for them rather than accept less elsewhere. Not everybody, some people are
with a competitor which also pays very attractively, some people have other reasons.
But most programmers would accept a job offer from Microsoft given the option.
It looks like Microsoft doesn't need a marketing department... since there are
followers working for free.
Just stating the obvious. If you can get in at Microsoft then it's a good place to be,
the company is very profitable and that is reflected in the compensation packages
on offer. But of course you have to be very good to pass the interviews.
Post by GOTHIER Nathan
Post by Malcolm McLean
Increasingly yes. A few years ago, many people had a Windows machine for general-
purpose computing, web browsing, games, a bit of work-related stuff. Now they are
increasingly likely to have an Apple notebook or a tablet / smartphone. PC sales
have fallen as a result. However you can't run most business software on a phone,
PCs are still very common in the office, where they are purchased by IT professionals.
So now only IT pros buy computers... that's wonderful news for Microsoft.
Windows shouldn't be preinstalled and bound to new machines to be sold like
baguettes. This potentially would allow the company to make better margins
selling only overpriced pro edition licenses to local distributors.
Apple also sells operating systems pre-installed. It just also manufactures the hardware.
So Microsoft have the more open model. If you want Linux, you have to either build
your own PC from components, or, as we used to do when I was working with Linux,
buy a Windows machine and delete the OS. If manufacturers had to sell a Linux version
alongside a Windows version, that would badly damage Microsoft's business model.
But not by so much now as it would have done had that policy been in place ten years
ago, as I said, they are gradually losing the domestic and consumer markets.
Jerry Stuckle
2017-06-01 15:11:54 UTC
Post by Malcolm McLean
Post by GOTHIER Nathan
On Thu, 1 Jun 2017 04:48:37 -0700 (PDT)
Post by Malcolm McLean
Microsoft offer an extremely attractive compensation package. So most people will
work for them rather than accept less elsewhere. Not everybody, some people are
with a competitor which also pays very attractively, some people have other reasons.
But most programmers would accept a job offer from Microsoft given the option.
It looks like Microsoft doesn't need a marketing department... since there are
followers working for free.
Just stating the obvious. If you can get in at Microsoft then it's a good place to be,
the company is very profitable and that is reflected in the compensation packages
on offer. But of course you have to be very good to pass the interviews.
And therein lies Nathan's problem. He couldn't get a job as a
programmer at Microsoft or any other decent company. Maybe a
fly-by-night outfit, at least until they figure out how stoopid he is.
Post by Malcolm McLean
Post by GOTHIER Nathan
Post by Malcolm McLean
Increasingly yes. A few years ago, many people had a Windows machine for general-
purpose computing, web browsing, games, a bit of work-related stuff. Now they are
increasingly likely to have an Apple notebook or a tablet / smartphone. PC sales
have fallen as a result. However you can't run most business software on a phone,
PCs are still very common in the office, where they are purchased by IT professionals.
So now only IT pro buy computers... that a wonderful news for Microsoft.
Windows shouldn't be preinstalled and bound to new machines to be sold like
baguettes. This potentially would allow the company to make better margins
selling only overpriced pro edition licenses to local distributors.
Apple also sells operating systems pre-installed. It just also manufactures the hardware.
So Microsoft have the more open model. If you want Linux, you have to either build
your own PC from components, or, as we used to do when I was working with Linux,
buy a Windows machine and delete the OS. If manufacturers had to sell a Linux version
alongside a Windows version, that would badly damage Microsoft's business model.
But not by so much now as it would have done had that policy been in place ten years
ago, as I said, they are gradually losing the domestic and consumer markets.
You can buy computers with no OS installed, and install the one of your
choice.
--
==================
Remove the "x" from my email address
Jerry Stuckle
***@attglobal.net
==================
GOTHIER Nathan
2017-06-01 15:40:58 UTC
On Thu, 1 Jun 2017 11:11:54 -0400
Post by Jerry Stuckle
You can buy computers with no OS installed, and install the one of your
choice.
Even a laptop... too smart the old monkey. :o)
s***@casperkitty.com
2017-06-01 15:56:36 UTC
Post by David Brown
Post by s***@casperkitty.com
And if all general-purpose implementations for a platform have processed a
certain behavior a certain way, quality general-purpose implementations should
continue to do likewise unless they document a compelling reason to do
otherwise.
No. If an implementation wants to give you behaviour that you can
rely on, it should document it. Otherwise you are on your own - you
/can/ write code, compile it, and see that it works as you expect, but
you should not expect it to remain working if you change compiler flags,
update to a new version, or make other changes.
What you are suggesting would be a recipe for stagnation - compilers
could not change and improve because they would have to try to emulate
the unwritten and unspecified behaviour of other tools.
Many compilers in the 1990s included switches which would enable certain
optimizations, but documented that use of those switches would break
certain kinds of programs. Since nothing in the Standard prohibited
conforming implementations from providing optional non-conforming modes of
operation, such switches could even enable optimizations which would be
impossible in any conforming mode.

If a programmer explicitly invites a compiler to behave in a fashion
contrary to what would otherwise be commonplace behavior, I would regard
such explicit invitation as fair basis for treating as "compelling" even
the slightest reason for unusual behavior. Further, I would suggest that
the ability of switches to invite optimizations beyond those permitted by
the Standard would help avoid stagnation.

The difference between that and the present state of affairs is that
if compilers favor compatibility over performance by default, then feeding
an implementation a program that was written for another on a similar
platform would be unlikely to yield the fastest possible executable, but
would be likely to yield one that would run correctly even if the program
used platform-provided features. It would also avoid the need for
programmers to avoid using platform-provided features that would be useful
on the present implementation for fear that other implementations might
break them.
Post by David Brown
It would also make users' life a lottery - how is the user supposed to
know what the compiler writer sees as a "compelling reason" ?
Read the compiler documentation. If e.g. a system has a 32-bit accumulator
but defines "int" as 16 bits because a 32-bit store requires using separate
store-lower and store-upper operations, that would be a compelling reason
for it to process something like "int1+int2 > int3" using 32-bit arithmetic
without even offering an option to do otherwise, *provided the implementation
documents its deviation from commonplace behavior*. Otherwise, in most cases
where an implementation could practically support commonplace behaviors, it
should do so *unless switches or directives waive such behaviors*.
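
A programmer who wants the wider comparison regardless of how a given implementation evaluates "int1+int2 > int3" can also ask for it explicitly. A minimal sketch, assuming long long is wide enough to hold the sum of any two int values (true on most current platforms, not guaranteed by the Standard); the function name is hypothetical:

```c
#include <limits.h>

/* Assumption: long long can hold the sum of any two int values.
   Compilation stops here on a platform where that is not the case. */
_Static_assert(LLONG_MAX / 2 >= INT_MAX,
               "sketch assumes long long is wider than int");

/* Returns 1 if a + b exceeds c, evaluating the sum in long long so
   that it cannot overflow int. */
static int sum_exceeds(int a, int b, int c)
{
    return (long long)a + b > c;
}
```

With this, sum_exceeds(INT_MAX, 1, INT_MAX) is 1 on any implementation that accepts the code, whatever the implementation would have done with the plain int expression.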
Post by David Brown
Post by s***@casperkitty.com
True, but a C implementation whose behavior deviates from those of existing
general-purpose compilers for similar platforms should not call itself a
quality general-purpose compiler, since general-purpose compilers should be
suitable for processing code written for pre-existing general-purpose compilers
for similar platforms.
The trick here is very simple - write code that relies on
standards-defined behaviour, and perhaps basic implementation-defined
behaviour (such as the size of an "int") if that is helpful to your code
and you don't need wide portability. Some target architectures have
specified ABIs that compilers will stick to, giving you a nice selection
of implementation-specific behaviour you can rely on across different tools.
The Standard does not require implementations to honor guarantees that would
seem to be implied by the ABI. If an ABI defines "long long" and "int64_t"
as having identical 64-bit representations, for example, and one needs to
integrate some code that uses arrays of "long long" with code that uses
arrays of "int64_t", writing a function that can operate on both types
interchangeably should not be appreciably more difficult than writing a
function to process just one or the other, but the Standard doesn't specify
any means of doing so (since such a requirement would be meaningless on any
implementation where the types might have different representations). I
would think the authors of the Standard would have seen obvious benefit to
having implementations allow programmers to write such functions in cases
where the representations match, but neglected to mandate that implementations
do so *precisely because it was so obvious*. Nonetheless, "modern" compiler
writers seem to think that it's better to have two alias-incompatible
types with the same representation than to e.g. provide a means of declaring
an arbitrary number of types with the same representation which are alias-
incompatible with each other, but are all alias compatible with int64_t.
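
For what it's worth, one workaround that stays inside the Standard is to read the elements with memcpy instead of through a pointer of the "wrong" type. This is only a sketch under the assumption that long long and int64_t have the same size and representation; all function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: both types are 64 bits with identical representation. */
_Static_assert(sizeof(long long) == sizeof(int64_t),
               "sketch assumes long long and int64_t have the same size");

/* Sums count 64-bit signed values stored at data, whatever their
   declared type. Each element is copied out with memcpy, so no object
   is ever accessed through an lvalue of an incompatible type. */
static int64_t sum64_bytes(const void *data, size_t count)
{
    const unsigned char *bytes = data;
    int64_t total = 0;

    assert(data != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        int64_t value;
        memcpy(&value, bytes + i * sizeof value, sizeof value);
        total += value;
    }
    return total;
}

/* Thin typed wrappers so callers keep their own element type. */
static int64_t sum_long_long(const long long *values, size_t count)
{
    return sum64_bytes(values, count);
}

static int64_t sum_int64(const int64_t *values, size_t count)
{
    return sum64_bytes(values, count);
}
```

Compilers commonly turn a fixed-size memcpy like this into a plain load, but that is an optimization, not something the Standard promises.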
Post by David Brown
If you need something beyond that, your code is tied tightly to the
implementation (possibly even the specific version and flags). That's
fine too - C is designed to allow that kind of coding. Your mistake is
in thinking that /other/ compilers somehow have an obligation to support
/your/ non-portable code.
They're free to write whatever they want, but compilers that can run code
that uses a wide range of features that other compilers commonly provide
will be suitable for more purposes than those which cannot.
Post by David Brown
But I have never heard of a compiler trying to emulate another
compiler's treatment of undefined behaviour. Can you give real-life
examples?
Well, gcc has an option called -fwrapv which exists specifically for that
purpose and turns what would otherwise be UB into defined behavior. It
also has -fno-strict-aliasing which almost does likewise, except that gcc's
documentation, last I checked, failed to actually define the behavior
associated with that flag [it says the flag disables certain optimizations,
but did not explicitly preclude the possibility that the compiler might
decide to generate code that launches "rogue" if it detects pointer aliasing,
not because doing so would "optimize" anything, but because nothing would
forbid it].
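
For code that must wrap on overflow whether or not -fwrapv is given, the addition can be done in unsigned arithmetic, which the Standard defines to wrap. A minimal sketch, assuming a two's complement target where int has no padding bits; the function name is hypothetical:

```c
#include <limits.h>
#include <string.h>

/* Adds two ints with wraparound, without relying on -fwrapv.
   Unsigned addition wraps modulo UINT_MAX + 1 by definition. The bit
   pattern is copied back into an int with memcpy, which gives the
   usual wrapped value on a two's complement target (an assumption). */
static int wrapping_add(int a, int b)
{
    unsigned int sum = (unsigned int)a + (unsigned int)b;
    int result;

    memcpy(&result, &sum, sizeof result);
    return result;
}
```

Under those assumptions wrapping_add(INT_MAX, 1) yields INT_MIN with no undefined behavior involved, so the result does not depend on compiler switches.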
Post by David Brown
It is a different thing entirely if the compiler /documents/ the
behaviour, perhaps using a compiler switch. For example, it would be a
bad idea for gcc to emulate old compiler's behaviour of wrapping signed
integer overflows, because it hinders optimisations. But it is a fine
idea to provide a documented "-fwrapv" switch which enables such
wrapping behaviour. /That/ is how you deal with compatibility with old
code that relies on specific undefined behaviour.
What advantage is there to that, versus having the compiler default to the
old behavior but have command-line switches to loosen the guarantees
associated with it? Suppose I have some code which left-shifts negative
numbers and I want to ensure that it will work with present and future
versions of gcc. Note that the *present* version of gcc documents that it
does not exploit the freedom offered by C99, but that wouldn't preclude the
possibility of future versions doing so. If future versions could be
relied upon not to change the behavior in the absence of an -fneg-lshift
flag, present build files' lack of such a flag would be sufficient to
ensure correct behavior with those future versions. But if future
versions require -fno-neg-lshift flag to achieve present behavior, that
would make it impossible to write code and makefile today that would
ensure that a future gcc version would continue to process the code in
compatible fashion.
Keith Thompson
2017-05-31 16:11:41 UTC
[...]
Post by David Brown
Post by s***@casperkitty.com
Indeed, but some programmers seem to think that permission to behave
nonsensically should be viewed, in and of itself, as a compelling reason
to behave nonsensically.
Name /one/ such programmer. Give even /one/ example where you can
clearly demonstrate that the reason a compiler behaves in a
"nonsensical" manner is purely because it is allowed to behave
"nonsensically". Of course, you can't demonstrate this from a single
program - you have to show that there are /no/ correct programs that
don't benefit in some way from the compiler feature.
I'm aware of exactly one such example. A #pragma directive "causes
the implementation to behave in an implementation-defined manner".
When this feature was added to the language, gcc 1.17 implemented
it by invoking a game at compile time (hack, or rogue, or Towers of
Hanoi in emacs). This was of course a joke, and has no particular
bearing on this discussion. (I think the feature was later removed).

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
bartc
2017-05-15 17:31:07 UTC
Post by Ben Bacarisse
Post by bartc
I would prefer that the possibilities were purposely kept simple.
Ah, that's not what I meant. I was not asking you to invent a macro
language you'd prefer,
I don't need to invent, it already exists. But I'm saying it is
ill-defined and badly constrained.
Post by Ben Bacarisse
Post by bartc
However, what's to stop gcc from allowing directives to be generated?
Then others will have to follow.
Has every compiler implemented every gcc extension? I don't think so.
There will be pressure to do so if it wants to compile code developed
with gcc and using those extensions.
Post by Ben Bacarisse
Post by bartc
12: if(10==20); A( 10,20); A( 10,20); error if(==)if(==) 10,20;
(I've now managed to test DMC. Results are the same as gcc except for
#12 which makes it crash.)
Post by Ben Bacarisse
It would be much more helpful if you said which ones are contentious or
badly defined by the standard. And why show 7, 8, 9 and 10? They all
seem to give the same output. Is there any thing to debate about those
cases?
You're assuming I know which ones are correct! But in the absence of
that knowledge, I'm taking gcc to give definitive versions.

Most examples were first posted by anti-spam, and I think all were
intended to be tricky, especially testing the first version of a
preprocessor (that is, before you have to rewrite it then hack it around
before most of the above work).
Post by Ben Bacarisse
gcc Pelles C lccwin64 tcc msvc008
1: a b (c)
2: a d (c)
3: (a b (c))
4: (a d (c))
5: a (x)
6: aa
12: A( 10,20); A( 10,20); error if(==)if(==) 10,20;
You've found some bugs in lccwin64, one in each of Pelles C and tcc and
two in and old version of MSVC. (In line 11 both outputs are
acceptable). Are there any contentious results here? I.e. do you think
there is anything other than some bugs?
Yes, that a preprocessor is difficult to get right. And I think it's
because it's poorly specified. (I expect most PPs are either based on an
existing, working one, or gradually evolve when it's found they won't
compile some existing program that makes creative use of the PP.)

If my examples are taken further and stringified as in the following:

#define str2(x) #x
#define str(x) str2(x)

puts(str(a b (c)));

then I get yet another assorted bunch of results. More bugs presumably.
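
For reference, the two-level str/str2 idiom used above behaves as in the following sketch. LIMIT is a hypothetical macro chosen for illustration, not one of the thread's test cases:

```c
#include <string.h>

#define str2(x) #x
#define str(x) str2(x)

#define LIMIT 42   /* hypothetical macro, not from the thread */

/* #x stringifies the argument exactly as written, so LIMIT stays a name. */
static int direct_keeps_name(void)
{
    return strcmp(str2(LIMIT), "LIMIT") == 0;
}

/* str(x) passes x through one more macro call. The argument is fully
   expanded before str2 sees it, so the replacement text is stringified. */
static int indirect_expands_first(void)
{
    return strcmp(str(LIMIT), "42") == 0;
}
```

Both functions return 1 on a conforming preprocessor, which is why str(...) rather than str2(...) is the one to use for inspecting expansions.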

I think that whoever dreamt up the preprocessor should also have
provided a reference implementation. Then at least it'll be easier to
test. But it's still going to be a matter of trial and error.
Post by Ben Bacarisse
Post by bartc
mcc (my compiler)
5: mac_a ( x )
12: error
(Again I've left only what appears to be significant). Do you think
these are bugs or is there some debate to be had about what 5 and 12
should be?
The majority verdict on #5 is that it should be 'a d (x)'. So my code
should really be changed (I thought I'd fixed 1-11 actually).

#12 I'm not too worried about, as it's a made-up example and most
compilers seem to go wrong. Yet, I don't really know whether it should
have worked or not. If you're allowed to have #-directives in the middle
of a macro-call, then why shouldn't #12 work too? After all gcc managed
to generate what might have been expected.
--
bartc
Keith Thompson
2017-05-15 18:56:21 UTC
bartc <***@freeuk.com> writes:
[...]
Post by bartc
You're assuming I know which ones are correct! But in the absence of
that knowledge, I'm taking gcc to give definitive versions.
[...]

I don't know why you would make that assumption.

In particular, in cases where the behavior is undefined, the particular
behavior shown by gcc doesn't tell you anything.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Ben Bacarisse
2017-05-15 18:57:50 UTC
Post by bartc
Post by Ben Bacarisse
Post by bartc
I would prefer that the possibilities were purposely kept simple.
Ah, that's not what I meant. I was not asking you to invent a macro
language you'd prefer,
I don't need to invent, it already exists. But I'm saying it is
ill-defined and badly constrained.
You've said that many times already. I was inviting you to be
productive and to say what parts of the standard could be made clearer
and/or more explicit. Even if no changes are ever made to it, it would
make for a useful thread.

<snip>
Post by bartc
Post by Ben Bacarisse
It would be much more helpful if you said which ones are contentious or
badly defined by the standard. And why show 7, 8, 9 and 10? They all
seem to give the same output. Is there any thing to debate about those
cases?
You're assuming I know which ones are correct!
Not at all. I am asking you to say which ones are contentious and why. If
you don't know the correct output for any of these, say so -- that they are
all up for debate and we could have a productive discussion about the
rules and what they mean.
Post by bartc
But in the absence of that knowledge, I'm taking gcc to give
definitive versions.
That's a reasonable starting point, but it breaks down for anything
undefined, implementation specific or where the implementation is given
a free hand (as in number 11 for example).

Anyway, I see you have not posted the flags being used in your tests so
I'm not sure there's any value in this discussion -- at least not in the
sense of clarifying what the standard intends.

<snip>
Post by bartc
Post by Ben Bacarisse
gcc Pelles C lccwin64 tcc msvc008
1: a b (c)
2: a d (c)
3: (a b (c))
4: (a d (c))
5: a (x)
6: aa
12: A( 10,20); A( 10,20); error if(==)if(==) 10,20;
You've found some bugs in lccwin64, one in each of Pelles C and tcc and
two in and old version of MSVC. (In line 11 both outputs are
acceptable). Are there any contentious results here? I.e. do you think
there is anything other than some bugs?
Yes, that a preprocessor is difficult to get right.
I think you mean "no, my only point is that this is hard". You
obviously don't want to discuss the actual rules.

<snip>
Post by bartc
#define str2(x) #x
#define str(x) str2(x)
puts(str(a b (c)));
then I get yet another assorted bunch of results. More bugs
presumably.
I doubt there will be any fewer; that's almost certain. But with no idea
what flags you are using, none of these may be bugs.
Post by bartc
I think that whoever dreamt up the preprocessor should also have
provided a reference implementation.
Yes, that is one way to specify these things. The reference
implementation is probably better off being very abstract. Haskell
anyone?
Post by bartc
Post by Ben Bacarisse
Post by bartc
mcc (my compiler)
5: mac_a ( x )
12: error
(Again I've left only what appears to be significant). Do you think
these are bugs or is there some debate to be had about what 5 and 12
should be?
The majority verdict on #5 is that it should be 'a d (x)'. So my code
should really be changed (I thought I'd fixed 1-11 actually).
#5 is just "f" with these defines:

#define a(x) mac_a(x)
#define d(x) (x)
#define f a d d (x)

So, as far as I can see, this is what happens. I'll write <x> for the
token x and <_> for the space token (I don't think newline tokens
feature here).

<f> is replaced by <a><_><d><_><d><_><(><x><)>. This replacement is
scanned for macros to expand along with any tokens that follow <f> (but
let's say there are none).

<a> not followed by <_>*<(> has no expansion so I believe we can now
output the first tokens (i.e. no repeated scanning happens)

result: <a><_>

<d> (again not followed by <_>*<(>) has no expansion either, so it too is output unchanged:

result: <a><_><d><_>

<d><_><(> is a function-like macro invocation, so we must collect the
arguments. This involves scanning (with no expansion) for a matching
<)> treating <,> at the outer level as a separator. That's easy. The
argument list has one token list: <x>.

Before substituting <x> we expand any macros, but there are none. This
round of expansion is done in isolation as if it were a short, separate
source file. There are no # or ## tokens in sight so we can simply
replace <x> in the replacement list with <x> from the macro arguments.

replacement list: <(><x><)>

Again, this list is scanned for macros to expand (including, this time, any
tokens that might follow in the input stream), but there is nothing to
do. We now know the final result:

result: <a><_><d><_><(><x><)>

Note that the output of -E need not reflect the actual spacing provided
it makes no difference to the final program.
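
The walkthrough above can be checked mechanically by stringifying the expansion of f. This is only a sketch: the macros a, d and f are the ones from the thread, while expanded_f and same_tokens are hypothetical helpers. The comparison ignores spaces, since the exact spacing is not significant here:

```c
#define a(x) mac_a(x)
#define d(x) (x)
#define f a d d (x)

#define str2(x) #x
#define str(x) str2(x)

/* The fully expanded form of f, captured as a string at compile time. */
static const char *expanded_f(void)
{
    return str(f);
}

/* The test macros are no longer needed once the string is captured. */
#undef f
#undef d
#undef a

/* Compares two strings while ignoring spaces. */
static int same_tokens(const char *lhs, const char *rhs)
{
    for (;;) {
        while (*lhs == ' ') lhs++;
        while (*rhs == ' ') rhs++;
        if (*lhs != *rhs) return 0;
        if (*lhs == '\0') return 1;
        lhs++;
        rhs++;
    }
}
```

On a preprocessor that follows the steps described, same_tokens(expanded_f(), "a d (x)") returns 1. A preprocessor that wrongly rescans and calls a would produce something starting with mac_a instead.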
Post by bartc
#12 I'm not too worried about, as it's a made-up example and most
compilers seem to go wrong. Yet, I don't really know whether it should
have worked or not.
I think it should. I can't see any justification for it not working,
but I'm far from infallible.
Post by bartc
If you're allowed to have #-directives in the middle of a macro-call,
then why shouldn't #12 work too? After all gcc managed to generate
what might have been expected.
No, you are not allowed pp directives in the middle of a macro call[1]
but I don't see the connection with #12. #12 relies on the fact that the
replacement list is scanned along with the remaining tokens:

#define A(x,y) if(x==y)
#define B A(

so "B 1,2)" expands to "A(| 1,2)" (using | to mark where the replacement
list ends). This replacement list is scanned along with the remaining
tokens, so a valid call of A is seen.
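
A compilable version of #12, as a sketch: it only builds if the preprocessor completes the call of A from tokens that follow the replacement list of B. The function name and its parameters are hypothetical:

```c
#define A(x,y) if(x==y)
#define B A(

/* "B lhs, rhs)" first becomes "A( lhs, rhs)". The rescan then sees a
   complete call of A, which expands to "if(lhs==rhs)". */
static int equal_via_B(int lhs, int rhs)
{
    B lhs, rhs)
        return 1;
    return 0;
}

#undef B
#undef A
```

A preprocessor that refuses to look past the end of the replacement list leaves "A( lhs, rhs)" in the output, and the compiler then rejects the function body.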

[1] 6.10.3 p11:

"The sequence of preprocessing tokens bounded by the outside-most
matching parentheses forms the list of arguments for the function-like
macro. The individual arguments within the list are separated by comma
preprocessing tokens, but comma preprocessing tokens between matching
inner parentheses do not separate arguments. If there are sequences of
preprocessing tokens within the list of arguments that would otherwise
act as preprocessing directives, the behavior is undefined."
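
Given that last sentence, the portable way to get the effect of the example that started this thread is to keep the conditionals outside the argument list. A sketch, not the original poster's code: FIRST, SECOND and formatted_equals are hypothetical names, and M takes a buffer here so that the result can be inspected:

```c
#include <stdio.h>
#include <string.h>

#define M(out, n, a, b, c) snprintf((out), (n), "%d %d %d\n", (a), (b), (c))
#define C 2

/* Choose the varying arguments before the call, not in the middle of it. */
#if C >= 3
#define FIRST  10
#define SECOND 20
#else
#define FIRST  100
#define SECOND 200
#endif

/* Returns 1 if the formatted line matches expected. */
static int formatted_equals(const char *expected)
{
    char line[32];

    M(line, sizeof line, FIRST, SECOND, 30);   /* no directive inside the call */
    return strcmp(line, expected) == 0;
}
```

With C defined as 2 the line is "100 200 30", and no compiler has to guess what a directive inside an argument list means.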
--
Ben.
bartc
2017-05-15 19:31:41 UTC
Post by Ben Bacarisse
Post by bartc
then I get yet another assorted bunch of results. More bugs
presumably.
I doubt there will be any fewer; that's almost certain. But with no idea
what flags you are using, none of these may be bugs.
I just don't think that flags have much to do with it. Except perhaps
for gcc (and clang which uses the same flags), where apparently
they can be used to make gcc do anything.

And if they do, then they shouldn't.
Post by Ben Bacarisse
Post by bartc
I think that whoever dreamt up the preprocessor should also have
provided a reference implementation.
Yes, that is one way to specify these things. The reference
implementation is probably better off being very abstract. Haskell
anyone?
Perhaps, after all, the C standard version is better!
Post by Ben Bacarisse
Post by bartc
The majority verdict on #5 is that it should be 'a d (x)'. So my code
should really be changed (I thought I'd fixed 1-11 actually).
#define a(x) mac_a(x)
#define d(x) (x)
#define f a d d (x)
So, as far as I can see, this is what happens. I'll write <x> for the
token x and <_> for the space token (I don't think newline tokens
feature here).
<f> is replaced by <a><_><d><_><d><_><(><x><)>. This replacement is
scanned for macros to expand along with any tokens that follow <f> (but
let's say there are none).
<a> not followed by <_>*<(> has no expansion so I believe we can now
output the first tokens (i.e. no repeated scanning happens)
result: <a><_>
result: <a><_><d><_>
<d><_><(> is a function-like macro invocation, so we must collect the
arguments. This involves scanning (with no expansion) for a matching
<)> treating <,> at the outer level as a separator. That's easy. The
argument list has one token list: <x>.
Before substituting <x> we expand any macros, but there are none. This
round of expansion is done in isolation as if it were a short, separate
source file. There are no # or ## tokens in sight so we can simply
replace <x> in the replacement list with <x> from the macro arguments.
replacement list: <(><x><)>
Again, this list is scanned for macros to expand (including, this time, any
tokens that might follow in the input stream but there is nothing to
result: <a><_><d><_><(><x><)>
Note that the output of -E need not reflect the actual spacing provided
it makes no difference to the final program.
OK, I'll have a closer look later on. An older version of my
preprocessor expanded this properly. A fix to solve another problem,
that involved repeatedly rescanning any expanded sequence, has
introduced a bug. That suggests a rewrite is in order, but I don't have
the inclination to do that (and the problem hasn't come up in real code
yet).
Post by Ben Bacarisse
No, you are not allowed pp directives in the middle of a macro call[1]
but I don't see the connection with #12. #12 relies on the fact that the
#define A(x,y) if(x==y)
#define B A(
so "B 1,2)" expands to "A(| 1,2)" (using | to mark where the replacement
list ends). This replacement list is scanned along with the remaining
tokens, so a valid call of A is seen.
"The sequence of preprocessing tokens bounded by the outside-most
matching parentheses forms the list of arguments for the function-like
macro. The individual arguments within the list are separated by comma
preprocessing tokens, but comma preprocessing tokens between matching
inner parentheses do not separate arguments. If there are sequences of
preprocessing tokens within the list of arguments that would otherwise
act as preprocessing directives, the behavior is undefined."
If it's so perfectly clear, why do so many compilers have trouble?
--
bartc
Ben Bacarisse
2017-05-15 21:08:57 UTC
Post by bartc
Post by Ben Bacarisse
Post by bartc
then I get yet another assorted bunch of results. More bugs
presumably.
I doubt there will be any fewer; that's almost certain. But with no idea
what flags you are using, none of these may be bugs.
I just don't think that flags have much to do with it. Except for
perhaps for gcc (and clang which uses the same flags), where
apparently they can be used to make gcc do anything.
I really don't know why you can't just tell us. Your remark about it
not mattering suggests that you are using no flags at all (save for -E)
for any of the compilers you listed. Is that correct?
Post by bartc
And if they do, then they shouldn't.
Nonsense. It would be entirely reasonable for a compiler to support
some legacy behaviour in either its default mode or with some special
flags.

<snip>
Post by bartc
Post by Ben Bacarisse
No, you are not allowed pp directives in the middle of a macro call[1]
but I don't see the connection with #12. #12 relies on the fact that the
#define A(x,y) if(x==y)
#define B A(
so "B 1,2)" expands to "A(| 1,2)" (using | to mark where the replacement
list ends). This replacement list is scanned along with the remaining
tokens, so a valid call of A is seen.
"The sequence of preprocessing tokens bounded by the outside-most
matching parentheses forms the list of arguments for the function-like
macro. The individual arguments within the list are separated by comma
preprocessing tokens, but comma preprocessing tokens between matching
inner parentheses do not separate arguments. If there are sequences of
preprocessing tokens within the list of arguments that would otherwise
act as preprocessing directives, the behavior is undefined."
If it's so perfectly clear, why do so many compilers have trouble?
That's rhetoric. Instead, can you say what you think is unclear about

#define A(x,y) if(x==y)
#define B A(
B 10,20)

? It's entirely possible that the erroneous results you report are not
due to misunderstanding but are the consequences of incorrect fixes
elsewhere. Or they may simply be due to incorrect implementation
(i.e. despite understanding the words in the standard).

But I found lccwin64 and PellesC (8.0) get example 12 right.

Using no flags and

Pelles Compiler Driver, Version 8.00.0
Copyright (c) Pelle Orinius 2002-2015

Logiciels/Informatique lcc-win (64 bits) version 4.1.
Compilation date: Oct 27 2016 16:34:50

but even a very old version of lccwin32 gets 12 right.
--
Ben.
bartc
2017-05-15 22:50:27 UTC
Post by Ben Bacarisse
I really don't know why you can't just tell us. Your remark about it
not mattering suggest that you are using no flags at all (save for -E)
Actually, I did use -E or equivalent to get the output, and nothing
else. Would source be preprocessed differently when it was actually
compiled? I was anyway /only/ looking at preprocessing.
Post by Ben Bacarisse
Post by bartc
If it's so perfectly clear, why do so many compilers have trouble?
That's rhetoric. Instead, can you say what you think is unclear about
#define A(x,y) if(x==y)
#define B A(
B 10,20)
Why do you think I thought up the example? I wanted one where a macro
call was synthesised from elements at different levels of macro
expansion including level zero (direct from original source). Just to
see what would happen.
Post by Ben Bacarisse
? It's entirely possible that the erroneous results you report are not
due to misunderstanding but are the consequences of incorrect fixes
elsewhere. Or they may simply be due to incorrect implementation
(i.e. despite understanding the words in the standard).
But I found lccwin64 and PellesC (8.0) get example 12 right.
Using no flags and
Pelles Compiler Driver, Version 8.00.0
Copyright (c) Pelle Orinius 2002-2015
Logiciels/Informatique lcc-win (64 bits) version 4.1.
Compilation date: Oct 27 2016 16:34:50
but even a very old version of lccwin32 gets 12 right.
I've tried PellesC and lccwin again, and results are variable. In the
original code, the macros for #11 and #12 were positioned just before
each invocation. Also, part of #11 was commented out which I hadn't
realised (I redid all tests for an uncommented #11, but only looked at
and updated the #11 results).

Anyway, the upshot is that processing:

#define F(a) a*G
#define G(a) F(a)

11: F(2)//(9)

#define A(x,y) if(x==y)
#define B A(
12: B 10,20);

with Pelles C 64-bit 8.0.170 gives this -E output (some blanks and
#lines removed):

11: 2 *G

#define A(x,y) if(x==y)

12: A( 10,20);

But this input:

#define F(a) a*G
#define G(a) F(a)

11: F(2)(9)

#define A(x,y) if(x==y)
#define B A(
12: B 10,20);

produces:

11: 2 * F(9)

12: if(10 == 20);

So something funny is going on, judging from that #define being sent to
the output. I've observed something similar with lccwin64.
--
bartc
Keith Thompson
2017-05-15 23:13:45 UTC
Post by bartc
Post by Ben Bacarisse
I really don't know why you can't just tell us. Your remark about it
not mattering suggest that you are using no flags at all (save for -E)
Actually, I did use -E or equivalent to get the output, and nothing
else. Would source be preprocessed differently when it was actually
compiled? I was anyway /only/ looking at preprocessing.
gcc does not fully conform to any edition of the C standard by
default. I don't know how or whether the various "-std=..." options
affect the behavior of the preprocessor, but I would suggest using
"-std=c11 -pedantic-errors -E" if you're looking at the behavior
of the preprocessor in the context of the current C standard.
(But even that can be less than entirely useful for code whose
behavior is undefined.)

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Ben Bacarisse
2017-05-16 00:36:42 UTC
Post by bartc
Post by Ben Bacarisse
I really don't know why you can't just tell us. Your remark about it
not mattering suggest that you are using no flags at all (save for -E)
Actually, I did use -E or equivalent to get the output, and nothing
else.
Finally.
Post by bartc
Would source be preprocessed differently when it was actually
compiled? I was anyway /only/ looking at preprocessing.
Eh? I just want to know what flags you used. It's really not that odd
to ask when you posted compiler results.
Post by bartc
Post by Ben Bacarisse
Post by bartc
If it's so perfectly clear, why do so many compilers have trouble?
That's rhetoric. Instead, can you say what you think is unclear about
#define A(x,y) if(x==y)
#define B A(
B 10,20)
Why do you think I thought up the example? I wanted one where a macro
call was synthesised from elements at different levels of macro
expansion including level zero (direct from original source). Just to
see what would happen.
But what is unclear about it? Presumably you've read what the language
standard says about macro expansion. It's not many paragraphs. Did I
get it right in my explanation? Do you see a murky area where the words
could mean something else? I want to talk technical. You just seem to
want to bang the "ooh, it's all so complicated" drum.
Post by bartc
Post by Ben Bacarisse
? It's entirely possible that the erroneous results you report are not
due to misunderstanding but are the consequences of incorrect fixes
elsewhere. Or they may simply be due to incorrect implementation
(i.e. despite understanding the words in the standard).
But I found lccwin64 and PellesC (8.0) get example 12 right.
Using no flags and
Pelles Compiler Driver, Version 8.00.0
Copyright (c) Pelle Orinius 2002-2015
Logiciels/Informatique lcc-win (64 bits) version 4.1.
Compilation date: Oct 27 2016 16:34:50
but even a very old version of lccwin32 gets 12 right.
I've tried PellesC and lccwin again, and results are variable. In the
original code, the macros for #11 and #12 were positioned just before
each invocation. Also, part of #11 was commented out which I hadn't
realised (I redid all tests for an uncommented #11, but only looked at
and updated the #11 results).
Always post the code that gives the results you are reporting, please!

<snip cases>
Post by bartc
So something funny is going on, judging from that #define being sent
to the output.
It's just a bug. Here is a small test case if you want to report it.

#define F()
F
#define X
X

There is obviously a bug in the code that looks ahead for '(' after
seeing a function-like macro name. The following #define is not
processed as a directive but it does persuade the scanner that no '(' is
coming.
Post by bartc
I've observed something similar with lccwin64.
Yes, lccwin64 gives the same results when given an input that actually
trips it up. Both PellesC and lccwin64 are based on Dave Hanson's lcc,
so I would bet the bug dates back to lcc's preprocessor. Both have been
extensively developed so they have obviously diverged from the original
lcc, but the coincidence suggests a common cause.

(Quick search... Source of lcc's cpp is available and, yes, it
generated the same faulty output.)
--
Ben.
bartc
2017-05-16 09:15:44 UTC
Post by Ben Bacarisse
Post by bartc
Would source be preprocessed differently when it was actually
compiled? I was anyway /only/ looking at preprocessing.
Eh? I just want to know what flags you used. It's really not that odd
to ask when you posted compiler results.
You know I avoid using compiler flags except for ones such as -E, -c or -O3.
Post by Ben Bacarisse
But what is unclear about it? Presumably you've read what the language
standard says about macro expansion. It's not many paragraphs. Did I
get it right in my explanation? Do you see a murky area where the words
could mean something else? I want to talk technical. You just seem to
want to bang the "ooh, it's all so complicated" drum.
It's not just me that thinks it's complicated. In the original #12
macro, 5 compilers out of 6 got it wrong. Now it turns out two of those
results were erroneous. But that still leaves half getting it wrong
(with those two turning out to have an unrelated bug).

I think (this is going back several months) I tried such an example to
see how much effort I should put into getting it right. So if the mighty
MSVC got it wrong (even with the 2008 version, but 2008 is still
comparatively recent), then I probably didn't need to bother (as there
were plenty of more pressing matters to get on with).

What it would mean in practice is that if it turns out I can't compile a
particular source, then neither can MSVC.

In the long term, that might even lead to people avoiding using those
dodgier macros. Whereas if all compilers accepted them, then the bar
would be raised even higher.
Post by Ben Bacarisse
There is obviously a bug in the code that looks ahead for '(' after
seeing a function-like macro name.
I don't have that bug because I deliberately limit the possibilities, so
that a macro call must look like one of these two:

M(A)
M (A)

That means there will be a set of source codes that my preprocessor will
fail on because they will do one of these, or worse:

M (A)
M
(A)
M /*comment*/ (A)

etc. But I don't care. If the C standard had such a restriction (and in
some places single spaces /are/ significant), then that bug in LCC
wouldn't have existed.
--
bartc
Philip Lantz
2017-05-17 02:17:49 UTC
Post by bartc
Post by Ben Bacarisse
Post by bartc
Would source be preprocessed differently when it was actually
compiled? I was anyway /only/ looking at preprocessing.
Eh? I just want to know what flags you used. It's really not that odd
to ask when you posted compiler results.
You know I avoid using compiler flags except for ones such as -E, -c or -O3.
If you're going to use gcc to help you determine what the standard requires,
I strongly suggest that you use a flag that tells it to compile some standard
flavor of C, rather than its own privately-defined C-like language.
Philip Lantz
2017-05-17 02:12:38 UTC
Post by bartc
Post by Ben Bacarisse
I really don't know why you can't just tell us. Your remark about it
not mattering suggests that you are using no flags at all (save for -E)
Actually, I did use -E or equivalent to get the output, and nothing
else. Would source be preprocessed differently when it was actually
compiled? I was anyway /only/ looking at preprocessing.
You seem to have completely missed his point. A compiler may treat its
input as C11, C99, C89, K&R, or some private C-like language, depending
on the flags.* Do you expect the preprocessor rules to be identical for
all these possibilities? (I don't know whether they are or not, but I
certainly wouldn't expect them to be without checking.)

* And of course gcc might treat it as Fortran, but I think we can ignore
that for our current purposes.
s***@casperkitty.com
2017-05-17 15:48:12 UTC
Post by Philip Lantz
* And of course gcc might treat it as Fortran, but I think we can ignore
that for our current purposes.
If there exists a conforming C compiler that would interpret a file that
began with __FORTRAN as a FORTRAN-77 program (allowable behavior given
the presence of an implementation-reserved identifier) then if such a
file would be processed as a usable FORTRAN program it would also be a
"conforming C program" because there would be at least one conforming C
implementation which could process it.
Robert Wessel
2017-05-17 16:19:53 UTC
Post by s***@casperkitty.com
Post by Philip Lantz
* And of course gcc might treat it as Fortran, but I think we can ignore
that for our current purposes.
If there exists a conforming C compiler that would interpret a file that
began with __FORTRAN as a FORTRAN-77 program (allowable behavior given
the presence of an implementation-reserved identifier) then if such a
file would be processed as a usable FORTRAN program it would also be a
"conforming C program" because there would be at least one conforming C
implementation which could process it.
Piffle! There's no need for an ugly language declaration like that,
it's perfectly possible to write programs that can be compiled as C or
Fortran, or even run as a shell script.

http://www.ioccc.org/1986/applin/applin.c

As to whether or not such an ability reflects the original intention
of C (later ruined by the ANSI committee), I'll leave for others to
debate. ;-)
s***@casperkitty.com
2017-05-17 17:22:59 UTC
Post by Robert Wessel
Post by s***@casperkitty.com
If there exists a conforming C compiler that would interpret a file that
began with __FORTRAN as a FORTRAN-77 program (allowable behavior given
the presence of an implementation-reserved identifier) then if such a
file would be processed as a usable FORTRAN program it would also be a
"conforming C program" because there would be at least one conforming C
implementation which could process it.
Piffle! There's no need for an ugly language declaration like that,
it's perfectly possible to write programs that can be compiled as C or
Fortran, or even run as a shell script.
http://www.ioccc.org/1986/applin/applin.c
My point was that the appearance of an identifier starting with __ would
give a C compiler latitude to treat everything else in arbitrary fashion,
so the only requirement for such a file to be conforming would be the
existence of a conforming compiler that would process it usefully. The
example might have been improved, however, by having the magic line be

const __FORTRAN=77;

which would then allow the program to be a valid FORTRAN file as well as a
valid C file, at least if a lowercase "C" is a valid comment indicator (the
system on which I programmed FORTRAN didn't *have* lowercase letters, so
I don't know).
Hans-Peter Diettrich
2017-05-18 02:17:56 UTC
Post by s***@casperkitty.com
Post by Philip Lantz
* And of course gcc might treat it as Fortran, but I think we can ignore
that for our current purposes.
If there exists a conforming C compiler that would interpret a file that
began with __FORTRAN as a FORTRAN-77 program (allowable behavior given
the presence of an implementation-reserved identifier) then if such a
file would be processed as a usable FORTRAN program it would also be a
"conforming C program" because there would be at least one conforming C
implementation which could process it.
IMO it's a matter of the compiler front-ends, which languages are
accepted. But in most cases distinct compilers for different languages
are built, even if these use the gcc infrastructure and code generation
back-ends.

DoDi
Philip Lantz
2017-05-19 01:40:56 UTC
Post by Hans-Peter Diettrich
Post by s***@casperkitty.com
Post by Philip Lantz
* And of course gcc might treat it as Fortran, but I think we can ignore
that for our current purposes.
If there exists a conforming C compiler that would interpret a file that
began with __FORTRAN as a FORTRAN-77 program (allowable behavior given
the presence of an implementation-reserved identifier) then if such a
file would be processed as a usable FORTRAN program it would also be a
"conforming C program" because there would be at least one conforming C
implementation which could process it.
IMO it's a matter of the compiler front-ends, which languages are
accepted. But in most cases distinct compilers for different languages
are built, even if these use the gcc infrastructure and code generation
back-ends.
Regardless of that, if you don't know what options the compiler was invoked
with, you have /no/ idea what it is going to do, and you can't draw any
conclusions from what messages it prints.
Philip Lantz
2017-05-19 01:36:58 UTC
Post by s***@casperkitty.com
Post by Philip Lantz
* And of course gcc might treat it as Fortran, but I think we can ignore
that for our current purposes.
If there exists a conforming C compiler that would interpret a file that
began with __FORTRAN as a FORTRAN-77 program (allowable behavior given
the presence of an implementation-reserved identifier) then if such a
file would be processed as a usable FORTRAN program it would also be a
"conforming C program" because there would be at least one conforming C
implementation which could process it.
You completely missed my point. I wasn't talking about any of that. Bart
thinks that command line options don't matter. Consider this, which shows
that command line options matter quite a bit:

$ cat > f.c
C This is Fortran
print *,"This is Fortran."
end

$ gcc -x f77 -c f.c

$ cat c.f
/* this is C */
main() { printf("this is C\n"); }

$ gcc -x c -c c.f
c.f: In function 'main':
c.f:2:10: warning: incompatible implicit declaration of built-in function 'printf' [enabled by default]
main() { printf("this is C\n"); }
^

$ gcc c.f
c.f:1.1:

/* this is C */
1
Error: Non-numeric character in statement label at (1)
c.f:1.2:

/* this is C */
1
Error: Invalid character in name at (1)
c.f:2.1:

main() { printf("this is C\n"); }
1
Error: Non-numeric character in statement label at (1)
c.f:2.1:

main() { printf("this is C\n"); }
1
Error: Unclassifiable statement at (1)
c.f:2.33:

main() { printf("this is C\n"); }
1
Error: Invalid character in name at (1)
bartc
2017-05-16 09:57:41 UTC
Post by bartc
Post by bartc
The majority verdict on #5 is that it should be 'a d (x)'. So my code
should really be changed (I thought I'd fixed 1-11 actually).
#define a(x) mac_a(x)
#define d(x) (x)
#define f a d d (x)
So, as far I can see, this is what happens. I'll write <x> for the
token x and <_> for the space token (I don't think newline tokens
feature here).
<f> is replaced by <a><_><d><_><d><_><(><x><)>. This replacement is
scanned for macros to expand along with any tokens that follow <f> (but
let's say there are none).
<a> not followed by <_>*<(> has no expansion so I believe we can now
output the first tokens (i.e. no repeated scanning happens)
result: <a><_>
result: <a><_><d><_>
<d><_><(> is a function-like macro invocation, so we must collect the
arguments. This involves scanning (with no expansion) for a matching
<)> treating <,> at the outer level as a separator. That's easy. The
argument list has one token list: <x>.
Before substituting <x> we expand any macros, but there are none. This
round of expansion is done in isolation as if it were a short, separate
source file. There are no # or ## tokens in sight so we can simply
replace <x> in the replacement list with <x> from the macro arguments.
replacement list: <(><x><)>
Again, this list is scanned for macros to expand (including, this time, any
tokens that might follow in the input stream but there is nothing to
result: <a><_><d><_><(><x><)>
Now that this first <d> is followed by <(>, why wouldn't it now be
expanded? I believe that is what MSVC must have done.

Is it because once a macro name has been processed once (and either it
has been expanded, or it couldn't be expanded because it wasn't followed
by "(") then it's no longer eligible to be expanded again?

(That is not the problem I get. I now fail to expand this #5 properly
because I added a loop. That was to get around this example:

#define info(x) (mem(x))
#define llink(x) info(x+1)
#define prevbreak llink
#define serial info

serial(prevbreak(10));

Output should be: (mem((mem(10+1))));

All compilers manage this except TCC which produces (mem(info(10+1)));
As did mine before I made the fix, but that broke example #5 (however
this one occurs in a real program so it takes priority).

The issue here is that at some point, a macro name is produced that is
not at first followed by "(", so is not expanded, but later it is.
Similar to my point about why the d(x) in "a d (x)" wasn't expanded above.)
--
bartc
Thiago Adams
2017-05-16 14:02:02 UTC
Post by bartc
The majority verdict on #5 is that it should be 'a d (x)'. So my code
should really be changed (I thought I'd fixed 1-11 actually).
It would be very nice if you publish the samples/results in your github.

As a comment, Microsoft is planning to fix the preprocessor, not because of C, but because of C++.

https://blogs.msdn.microsoft.com/vcblog/2017/05/10/c17-features-in-vs-2017-3/

"[C] C99 preprocessor support is still partial, in that variadic macros mostly work. We’re planning to overhaul the preprocessor before marking this as complete."

C++ perspective:
"This document details the changes that need to be made to the working draft to resynchronize the preprocessor and translation phases of C++ with C99"
Working draft changes for C99 preprocessor synchronization
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1653.htm
Ben Bacarisse
2017-05-16 15:26:33 UTC
Post by bartc
Post by bartc
Post by bartc
The majority verdict on #5 is that it should be 'a d (x)'. So my code
should really be changed (I thought I'd fixed 1-11 actually).
#define a(x) mac_a(x)
#define d(x) (x)
#define f a d d (x)
So, as far I can see, this is what happens. I'll write <x> for the
token x and <_> for the space token (I don't think newline tokens
feature here).
<f> is replaced by <a><_><d><_><d><_><(><x><)>. This replacement is
scanned for macros to expand along with any tokens that follow <f> (but
let's say there are none).
<a> not followed by <_>*<(> has no expansion so I believe we can now
output the first tokens (i.e. no repeated scanning happens)
result: <a><_>
result: <a><_><d><_>
<d><_><(> is a function-like macro invocation, so we must collect the
arguments. This involves scanning (with no expansion) for a matching
<)> treating <,> at the outer level as a separator. That's easy. The
argument list has one token list: <x>.
Before substituting <x> we expand any macros, but there are none. This
round of expansion is done in isolation as if it were a short, separate
source file. There are no # or ## tokens in sight so we can simply
replace <x> in the replacement list with <x> from the macro arguments.
replacement list: <(><x><)>
Again, this list is scanned for macros to expand (including, this time, any
tokens that might follow in the input stream but there is nothing to
result: <a><_><d><_><(><x><)>
Now that this first <d> is followed by <(>, why wouldn't it now be
expanded? I believe that is what MSVC must have done.
Do you think it should be expanded in this case:

#define d(x) [x]
#define p (y)
d p

? Replacement lists are re-scanned (once, I believe) but not the
result. Once tokens have been emitted the scanning does not back up to
see if subsequent tokens have now made a valid call.
Post by bartc
Is it because once a macro name has been processed once (and either it
has been expanded, or it couldn't be expanded because it wasn't
followed by "(") then it's no longer eligible to be expanded again?
No, that wording applies to macros that have been expanded -- they are
not recognised when scanning the replacement list. d has not been
expanded here and it does not appear in a replacement list anymore.
It's been through the algorithm and out the other end.
Post by bartc
(That is not the problem I get. I now fail to expand this #5 properly
#define info(x) (mem(x))
#define llink(x) info(x+1)
#define prevbreak llink
#define serial info
serial(prevbreak(10));
Output should be: (mem((mem(10+1))));
All compilers manage this except TCC which produces (mem(info(10+1)));
As did mine before I made the fix, but that broke example #5 (however
this one occurs in a real program so it takes priority).
This sort of "balloon animal" debugging -- where you squeeze it into
shape in one place and a bug pops up somewhere else --
usually indicates a need to step back and review the overall algorithm.
At least that's how I deal with this sort of situation.
Post by bartc
The issue here is that at some point, a macro name is produced that is
not at first followed by "(", so is not expanded, but later it
is. Similar to my point about why the d(x) in "a d (x)" wasn't
expanded above.)
You do have to get the sequence right. I started to go through this
example but I failed to find a good notation to show the nested
expansions and I ended up thinking it was not helping. I might revisit
it if I get time.
--
Ben.
Hans-Peter Diettrich
2017-05-18 01:27:33 UTC
Post by Ben Bacarisse
<snip>
Post by bartc
Post by Ben Bacarisse
Post by bartc
Are C's preprocessor and macro expansion rules really so poorly
defined that so many compilers get it wrong?
Not, I think, in this case. It seems very clear. Do you think the
wording of the standard needs to be improved and, if so, how?
I would prefer that the possibilities were purposely kept simple.
Ah, that's not what I meant. I was not asking you to invent a macro
language you'd prefer, I was asking if you can suggest a way in which
the wording of (or, for that matter, any other changes to) the standard
would make the current intended semantics clearer.
IMO the problem arises from translation phase 4, with the mix of macro
expansion and preprocessor directive handling. If macro expansion would
start only after all macro arguments are collected, at least no
preprocessor directives can occur in the argument list. This convention
also would produce the same result, regardless of whether a function or
a functional macro is compiled.
Post by Ben Bacarisse
This is a good case to consider since there is clearly some
disagreement. (There's a high probability that I'm wrong about it being
defined but it's certain that it's not as clear as I thought it was.)
<snip>
Post by bartc
Post by Ben Bacarisse
Post by bartc
But then maybe someone will have a macro expansion that generates
#-directives
Macros can't (validly) expand to directives.
Synthetic preprocessor directive construction should be
disallowed/ignored, because this leads to self-modifying code. This would
be in accordance with tokenization, where it is impossible to construct
comments from /##/, as MSVC did some time ago.
Post by Ben Bacarisse
Post by bartc
6: e(a,,a)
AFAIK empty macro arguments are not allowed.

DoDi
Ben Bacarisse
2017-05-18 02:25:46 UTC
Post by Hans-Peter Diettrich
Post by Ben Bacarisse
<snip>
Post by bartc
Post by Ben Bacarisse
Post by bartc
Are C's preprocessor and macro expansion rules really so poorly
defined that so many compilers get it wrong?
Not, I think, in this case. It seems very clear. Do you think the
wording of the standard needs to be improved and, if so, how?
I would prefer that the possibilities were purposely kept simple.
Ah, that's not what I meant. I was not asking you to invent a macro
language you'd prefer, I was asking if you can suggest a way in which
the wording of (or, for that matter, any other changes to) the standard
would make the current intended semantics clearer.
IMO the problem arises from translation phase 4, with the mix of macro
expansion and preprocessor directive handling. If macro expansion
would start only after all macro arguments are collected, at least no
preprocessor directives can occur in the argument list.
Macro expansion does start only after the arguments are collected. At
least it can be arranged that way. Arguments are also expanded but that
does not have to happen until after they are all collected.
Post by Hans-Peter Diettrich
This
convention also would produce the same result, regardless of whether a
function or a functional macro is compiled.
I don't follow this, but then I think I've misunderstood what you mean
above.

<snip>
Post by Hans-Peter Diettrich
Post by Ben Bacarisse
Post by bartc
6: e(a,,a)
AFAIK empty macro arguments are not allowed.
No, they are allowed.
--
Ben.
James Kuyper
2017-05-18 02:35:46 UTC
...
Post by Hans-Peter Diettrich
Post by bartc
6: e(a,,a)
AFAIK empty macro arguments are not allowed.
6.10.3p4 mentions "arguments consisting of no preprocessing tokens",
which implies that it's permissible for such arguments to exist. Section
7 of the foreword mentions "empty macro arguments" as one of the features
introduced in C99.
Tim Rentsch
2017-05-20 01:38:18 UTC
Post by James Kuyper
...
Post by Hans-Peter Diettrich
Post by bartc
6: e(a,,a)
AFAIK empty macro arguments are not allowed.
6.10.3p4 mentions "arguments consisting of no preprocessing tokens",
which implies that it's permissible for such arguments to exist. Section
7 of the foreword mentions "empty macro arguments" as one of the features
introduced in C99.
If you will excuse a micro-quibble... They are mentioned as one
of the /changes/ in C99. Empty macro arguments were actually
introduced in the original standard, in section G.5.12 of C90
(and IIANM under a different numbering in C89), as a "Common
Extension".
Thiago Adams
2017-05-15 13:00:50 UTC
Post by bartc
#include <stdio.h>
#define M(a,b,c) printf("%d %d %d\n",(a),(b),(c));
#define C 2
int main(void) {
#if C>=3
M(10,20,
#else
M(100,200,
#endif
30)
}
It either calls printf with arguments 10,20,30 or 100,200,30. But it
splits the macro invocation with conditionals, that causes compile
errors with pelles c, lccwin, dmc, and MSVC. It compiles with gcc and
(surprisingly) tiny C. Also (using online compilers) with clang.
It's working in my preprocessor, not because I understood the standard, but because the way I did it worked. I assumed that the scanner will see

'M' '(' '10' ',' '20', '\n' '\n' '\n' '30' ')'

The extra '\n' are for #else and #endif

How about split macro call with #include?

--header.h--
#define AB(a, b)
AB(1,

--main.c--
#include "header.h"
2)

//unexpected end of file in macro expansion (VC++)

So the token stream is not continuous across #include (the end of the included file acts as 'EOF'), but it is continuous across #if blocks (with added \n).
bartc
2017-05-15 14:05:28 UTC
Post by Thiago Adams
Post by bartc
#include <stdio.h>
#define M(a,b,c) printf("%d %d %d\n",(a),(b),(c));
#define C 2
int main(void) {
#if C>=3
M(10,20,
#else
M(100,200,
#endif
30)
}
It either calls printf with arguments 10,20,30 or 100,200,30. But it
splits the macro invocation with conditionals, that causes compile
errors with pelles c, lccwin, dmc, and MSVC. It compiles with gcc and
(surprisingly) tiny C. Also (using online compilers) with clang.
It's working in my preprocessor, not because I understood the standard, but because the way I did it worked.
Yes, there is that too. Sometimes it works by chance. I thought doing
multiple passes, to first deal with #include and #if, would be better
behaved. But thinking again, that won't work for #if because the
expression will depend on macros, so they need to be expanded first
before #if knows if its expression is true or false. And presumably you
can't do it for #includes because they rely a lot on #ifs too.

And what happens here:

#if M(
#if C
A
#else
B
#endif
)
#endif

It seems that this isn't allowed because the #if expression must be on
one line. That seems a good rule to me; why can't it apply everywhere!

I assumed that the scanner will see
Post by Thiago Adams
'M' '(' '10' ',' '20', '\n' '\n' '\n' '30' ')'
The extra '\n' are for #else and #endif
How about split macro call with #include?
--header.h--
#define AB(a, b)
AB(1,
--main.c--
#include "header.h"
2)
//unexpected end of file in macro expansion (VC++)
So, #include is not something continuous 'EOF', but #if blocks are (with added \n).
No, I just tried splitting a macro call across include files, and it
didn't work.
--
bartc
Thiago Adams
2017-05-24 16:44:32 UTC
Post by Thiago Adams
Post by bartc
#include <stdio.h>
#define M(a,b,c) printf("%d %d %d\n",(a),(b),(c));
#define C 2
int main(void) {
#if C>=3
M(10,20,
#else
M(100,200,
#endif
30)
}
It either calls printf with arguments 10,20,30 or 100,200,30. But it
splits the macro invocation with conditionals, that causes compile
errors with pelles c, lccwin, dmc, and MSVC. It compiles with gcc and
(surprisingly) tiny C. Also (using online compilers) with clang.
It's working in my preprocessor, not because I understood the standard, but because the way I did it worked. I assumed that the scanner will see
I lost track of the conversation.
The Visual Studio preprocessor ignores tokens before '('.
Is this correct? I found macro expansions in the Windows headers with spaces before '('.

My preprocessor has a problem, because I consume the tokens
while I am searching for '('. But if '(' is not found then
I should not consume the tokens. So I will have to keep track
of lookaheads. :-/


Sample

#define max(a, b) ((a) > (b) ? (a) : (b))

//max is not a macro here (But there are two tokens ahead)

int max /*comment1*/ /*comment2*/ ;

//max is a macro here
max /*comment1*/ /*comment2*/ (1, 2);
Thiago Adams
2017-05-27 00:26:09 UTC
Post by Thiago Adams
Post by Thiago Adams
Post by bartc
#include <stdio.h>
#define M(a,b,c) printf("%d %d %d\n",(a),(b),(c));
#define C 2
int main(void) {
#if C>=3
M(10,20,
#else
M(100,200,
#endif
30)
}
It either calls printf with arguments 10,20,30 or 100,200,30. But it
splits the macro invocation with conditionals, that causes compile
errors with pelles c, lccwin, dmc, and MSVC. It compiles with gcc and
(surprisingly) tiny C. Also (using online compilers) with clang.
It's working in my preprocessor, not because I understood the standard, but because the way I did it worked. I assumed that the scanner will see
I lost track of the conversation.
Visual studio preprocessor ignores tokens before '('.
Is this correct? I found on windows headers expansion with spaces before '('.
My preprocessor has a problem, because I consume the tokens
while I am searching for '('. But if '(' is not found then
I should not consume the tokens. So I will have to keep track
of lookaheads. :-/
Sample
#define max(a, b) ((a) > (b) ? (a) : (b))
//max is not a macro here (But there are two tokens ahead)
int max /*comment1*/ /*comment2*/ ;
//max is a macro here
max /*comment1*/ /*comment2*/ (1, 2);
Answering my own question..

In Phase 3
...
3) Each comment is replaced by one space character.

Phase 4
...
1) The preprocessor is executed.

http://en.cppreference.com/w/c/language/translation_phases
Tim Rentsch
2017-05-15 13:43:11 UTC
Post by bartc
#include <stdio.h>
#define M(a,b,c) printf("%d %d %d\n",(a),(b),(c));
#define C 2
int main(void) {
#if C>=3
M(10,20,
#else
M(100,200,
#endif
30)
}
It either calls printf with arguments 10,20,30 or 100,200,30. But it
splits the macro invocation with conditionals, that causes compile
errors with pelles c, lccwin, dmc, and MSVC. It compiles with gcc and
(surprisingly) tiny C. [...]
Did you give gcc the -pedantic-errors option, like several
people have suggested?
fir
2017-05-15 16:47:49 UTC
Post by bartc
#include <stdio.h>
#define M(a,b,c) printf("%d %d %d\n",(a),(b),(c));
#define C 2
int main(void) {
#if C>=3
M(10,20,
#else
M(100,200,
#endif
30)
}
It either calls printf with arguments 10,20,30 or 100,200,30. But it
splits the macro invocation with conditionals, that causes compile
errors with pelles c, lccwin, dmc, and MSVC. It compiles with gcc and
(surprisingly) tiny C. Also (using online compilers) with clang.
The thing is that every so often you come across troublesome macros like
this that only work on some compilers. But why should that be the case?
Are C's preprocessor and macro expansion rules really so poorly defined
that so many compilers get it wrong? (I certainly thought so when I
tried to implement a preprocessor earlier this year.)
Maybe, you can get more consistent behaviour by doing multiple passes,
so that all the #ifs are done first for example, then the macro expansion.
But then maybe someone will have a macro expansion that generates
#-directives, or other macro invocations created from parts joined
together with ## or that use #-stringifying, that gcc will somehow
manage to compile as expected! Then that becomes the benchmark for what
is expected to work.
So, does anyone actually know EXACTLY what the capabilities of the C
macro system are? Or do compilers just make them up as they go along?
With gcc in the lead. (I don't intend to make this work in my own
implementation. I believe there should be clearly-defined limits to what
is possible and what is considered reasonable.)
(This is not a made-up example; the following was posted in
comp.lang.python today in "How to install Python package from source on
If you're using 3.6, you'll have to build from source. The package has
a single C extension without external dependencies, so it should be a
straight-forward build if you have Visual Studio 2015+ installed with
the C/C++ compiler for x86. Ideally it should work straight from pip.
But I tried and it failed in 3.6.1 due to the new PySlice_GetIndicesEx
macro. Apparently MSVC doesn't like preprocessor code like this in
#if PY_MAJOR_VERSION >= 3
if (PySlice_GetIndicesEx(item, Py_SIZE(self),
#else
if (PySlice_GetIndicesEx((PySliceObject*)item, Py_SIZE(self),
#endif
&start, &stop, &step, &slicelength) < 0) {
It fails with a C1057 error (unexpected end of file in macro
expansion). The build will succeed if you copy the common line with
`&start` to each case and comment out the original line, such that the
macro invocation isn't split across an #if / #endif. This is an ugly
consequence of making PySlice_GetIndicesEx a macro. I wonder if it
could be written differently to avoid this problem.
A macro language layer such as the one used in C (does it have a name? "C macro language"?) may be useful [though I have NEVER used it (except for very simple attempts); you will not see even one #define in all my code from the beginning until now].

I always believed that some alias language which takes C program semantics into account could do better
than an abstract text-based macro language, but I may be wrong [and if such an alias language would be useful, maybe it should just be built into core C? I don't know].