Discussion:
gcc failure (optimised/non-optimised)
James Harris
2017-04-22 08:06:52 UTC
Permalink
Raw Message
I saw this posted elsewhere. The gcc optimiser evidently changes the
behaviour of the code relative to its non-optimised effect.

int main(int argc, char **argv) {
int i;
for (i = 0; i < i + 1; i++)
;
return 0;
}


$ cc iter-test.c
$ time ./a.out
real 0m7.314s
user 0m7.304s
sys 0m0.000s

$ cc iter-test.c -O3
$ time ./a.out
(ran until interrupted)

It seems that when compared with the unoptimised code the optimiser (at
least of gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)) adjusts the
loop test so that it is always true.

Run times
-O0: 7s (unoptimised)
-O1: 2s
-O2: ran until interrupted
-O3: ran until interrupted

Thoughts?
--
James Harris
Andrew Cooper
2017-04-22 08:49:29 UTC
Post by James Harris
I saw this posted elsewhere. The gcc optimiser evidently changes the
behaviour of the code relative to its non-optimised effect.
int main(int argc, char **argv) {
int i;
for (i = 0; i < i + 1; i++)
;
return 0;
}
$ cc iter-test.c
$ time ./a.out
real 0m7.314s
user 0m7.304s
sys 0m0.000s
$ cc iter-test.c -O3
$ time ./a.out
(ran until interrupted)
It seems that when compared with the unoptimised code the optimiser (at
least of gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)) adjusts the
loop test so that it is always true.
Run times
-O0: 7s (unoptimised)
-O1: 2s
-O2: ran until interrupted
-O3: ran until interrupted
Thoughts?
Overflow of signed integers is UB. Therefore "i < i + 1" can
legitimately be simplified to true.

Use -fwrapv if you want GCC/Clang to strictly consider signed integer
overflow to have defined two's-complement behaviour.

http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html is
an excellent set of articles which everyone should read.

~Andrew
James Harris
2017-04-22 09:52:27 UTC
Post by Andrew Cooper
Post by James Harris
I saw this posted elsewhere. The gcc optimiser evidently changes the
behaviour of the code relative to its non-optimised effect.
int main(int argc, char **argv) {
int i;
for (i = 0; i < i + 1; i++)
;
return 0;
}
$ cc iter-test.c
$ time ./a.out
real 0m7.314s
user 0m7.304s
sys 0m0.000s
$ cc iter-test.c -O3
$ time ./a.out
(ran until interrupted)
It seems that when compared with the unoptimised code the optimiser (at
least of gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)) adjusts the
loop test so that it is always true.
Run times
-O0: 7s (unoptimised)
-O1: 2s
-O2: ran until interrupted
-O3: ran until interrupted
Thoughts?
Overflow of signed integers is UB. Therefore "i < i + 1" can
legitimately be simplified to true.
Use -fwrapv if you want GCC/Clang to strictly consider signed integer
overflow to have defined two's-complement behaviour.
http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html is
an excellent set of articles which everyone should read.
Thanks. That's very interesting.

What gets me is that the compiler is (potentially) altering the
behaviour of the code without issuing a warning. That's true even with
-Wall -Wextra.

I found that one has to add -Wstrict-overflow=5 to get the compiler to
warn about the signed-overflow assumption its optimiser is making.

$ cc iter-test.c -Wall -Wextra -Wstrict-overflow=5 -O2
iter-test.c: In function 'main':
iter-test.c:3:5: warning: assuming signed overflow does not occur when
reducing constant in comparison [-Wstrict-overflow]
for (i = 0; i < i + 1; i++)
^

One to add to my list of compiler switches to use. But doing that has
two problems:

1. How many more similar switches would be needed to catch all the other
forms of undefined behaviour that the compiler is able to recognise?

2. I think I would rather this and similar warnings were enabled by
default and needed a conscious decision to disable. Failing that, IMO it
would be OK to have a switch such as -Wundefined which turned on all
such checking but still allowed individual checks to be turned off.
--
James Harris
Gareth Owen
2017-04-22 10:37:05 UTC
Post by James Harris
Post by Andrew Cooper
Overflow of signed integers is UB. Therefore "i < i + 1" can
legitimately be simplified to true.
Use -fwrapv if you want GCC/Clang to strictly consider signed integer
overflow to have defined two's-complement behaviour.
http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html is
an excellent set of articles which everyone should read.
Thanks. That's very interesting.
What gets me is that the compiler is (potentially) altering the
behaviour of the code without issuing a warning. That's true even with
-Wall -Wextra.
The code does not have (defined) behaviour. Code whose behaviour is not
defined does not have to have consistent modes of misbehaviour.
James Harris
2017-04-24 08:07:13 UTC
Post by Gareth Owen
Post by James Harris
Post by Andrew Cooper
Overflow of signed integers is UB. Therefore "i < i + 1" can
legitimately be simplified to true.
Use -fwrapv if you want GCC/Clang to strictly consider signed integer
overflow to have defined two's-complement behaviour.
http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html is
an excellent set of articles which everyone should read.
Thanks. That's very interesting.
What gets me is that the compiler is (potentially) altering the
behaviour of the code without issuing a warning. That's true even with
-Wall -Wextra.
The code does not have (defined) behaviour. Code whose behaviour is not
defined does not have to have consistent modes of misbehaviour.
That's the correct "C" position, of course, and I understand it.
However, I am uncomfortable with a compiler changing the effect of the
code without warning.
--
James Harris
Keith Thompson
2017-04-24 15:33:47 UTC
[...]
Post by James Harris
Post by Gareth Owen
Post by James Harris
What gets me is that the compiler is (potentially) altering the
behaviour of the code without issuing a warning. That's true even with
-Wall -Wextra.
The code does not have (defined) behaviour. Code whose behaviour is not
defined does not have to have consistent modes of misbehaviour.
That's the correct "C" position, of course, and I understand it.
However, I am uncomfortable with a compiler changing the effect of the
code without warning.
Consider this:

char message[5] = "hello";
puts(message);

This has undefined behavior because `message` does not contain a
(valid null-terminated) string. Its actual behavior is likely to
vary depending on optimization level and other compiler options.
Does that make you equally uncomfortable?
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
James Harris
2017-04-28 21:32:00 UTC
Post by Keith Thompson
[...]
Post by James Harris
Post by Gareth Owen
Post by James Harris
What gets me is that the compiler is (potentially) altering the
behaviour of the code without issuing a warning. That's true even with
-Wall -Wextra.
The code does not have (defined) behaviour. Code whose behaviour is not
defined does not have to have consistent modes of misbehaviour.
That's the correct "C" position, of course, and I understand it.
However, I am uncomfortable with a compiler changing the effect of the
code without warning.
char message[5] = "hello";
puts(message);
This has undefined behavior because `message` does not contain a
(valid null-terminated) string. Its actual behavior is likely to
vary depending on optimization level and other compiler options.
Does that make you equally uncomfortable?
Strangely, no. Perhaps because what will be 'put' will depend on the
layout of memory rather than the compiler making a different assumption
for a different optimisation level.
--
James Harris
s***@casperkitty.com
2017-04-28 21:53:46 UTC
Post by James Harris
Post by Keith Thompson
char message[5] = "hello";
puts(message);
This has undefined behavior because `message` does not contain a
(valid null-terminated) string. Its actual behavior is likely to
vary depending on optimization level and other compiler options.
Does that make you equally uncomfortable?
Strangely, no. Perhaps because what will be 'put' will depend on the
layout of memory rather than the compiler making a different assumption
for a different optimisation level.
Given

#include <stdio.h>

char message[5] = "Hello";
char magic = 0;

int main(void)
{

if (message+5 == &magic)
printf("%d", message[5]);
return 0;
}

evaluation of the pointer expression "message+5" is defined behavior,
as is the comparison between that address and the address of magic. Even
if message+5 == &magic, however, that does not imply anything about the
behavior of evaluating message[5].

Consider the function

int test(int x)
{
if (magic) return magic;
message[x]++;
return magic;
}

In the event that message+5 == &magic, should a compiler be required to
generate code which will return 1 if the above function is invoked with
magic==0 and x==5?
Gareth Owen
2017-04-29 06:00:35 UTC
Post by s***@casperkitty.com
if (message+5 == &magic)
I think this equality expression may be UB in C11 (S6.5.8.5)
It's merely unspecified in C++.
Ben Bacarisse
2017-04-29 10:19:50 UTC
Post by Gareth Owen
Post by s***@casperkitty.com
if (message+5 == &magic)
I think this equality expression may be UB in C11 (S6.5.8.5)
It's merely unspecified in C++.
You are looking at the wrong section. It's 6.5.9 (p6) that applies.

The take-home result is that you can always test valid pointers for
equality and you may get a positive result from testing the valid "one
past the end of an array" pointer and the address of some other object
that just happens to follow on in memory. And since non-array objects
are considered to be arrays of length one when constructing pointers,
this applies to ordinary objects too.
--
Ben.
Gareth Owen
2017-04-29 10:30:52 UTC
Post by Ben Bacarisse
Post by Gareth Owen
Post by s***@casperkitty.com
if (message+5 == &magic)
I think this equality expression may be UB in C11 (S6.5.8.5)
It's merely unspecified in C++.
You are looking at the wrong section. It's 6.5.9 (p6) that applies.
I stand corrected. Thanks.
Noob
2017-04-22 14:12:55 UTC
Post by James Harris
What gets me is that the compiler is (potentially) altering the
behaviour of the code without issuing a warning. That's true even with
-Wall -Wextra.
The problem with your logic is that the code has undefined behavior;
in other words, there is no behavior to "alter".

I think you're alluding to the principle of least surprise.
https://en.wikipedia.org/wiki/Principle_of_least_astonishment

Since C often closely maps to the underlying HW, it often comes
as a surprise that "int" is slightly more abstract in the abstract
machine.
Post by James Harris
I found that one has to add -Wstrict-overflow=5 to get the compiler to
warn about the signed-overflow assumption its optimiser is making.
Would the UB sanitizer help here?
https://developers.redhat.com/blog/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/
-fsanitize=undefined
Post by James Harris
One to add to my list of compiler switches to use. But doing that has
1. How many more similar switches would be needed to catch all the other
forms of undefined behaviour that the compiler is able to recognise?
I don't think the compiler always keeps enough information around
to be able to precisely pinpoint where the UB originated from, in
the later optimization stages.
Post by James Harris
2. I think I would rather this and similar warnings were enabled by
default and needed a conscious decision to disable. Failing that, IMO it
would be OK to have a switch such as -Wundefined which turned on all
such checking but still allowed individual checks to be turned off.
Perhaps you should open an issue on bugzilla?

Regards.
James Harris
2017-04-24 08:16:52 UTC
Post by Noob
Post by James Harris
What gets me is that the compiler is (potentially) altering the
behaviour of the code without issuing a warning. That's true even with
-Wall -Wextra.
The problem with your logic is that the code has undefined behavior;
in other words, there is no behavior to "alter".
I think you're alluding to the principle of least surprise.
https://en.wikipedia.org/wiki/Principle_of_least_astonishment
Since C often closely maps to the underlying HW, it often comes
as a surprise that "int" is slightly more abstract in the abstract
machine.
Post by James Harris
I found that one has to add -Wstrict-overflow=5 to get the compiler to
warn about the signed-overflow assumption its optimiser is making.
Would the UB sanitizer help here?
https://developers.redhat.com/blog/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/
-fsanitize=undefined
Sadly not, though it is interesting! I upgraded and tried it but
-fsanitize=undefined produces the report at runtime. The compiler makes
assumptions at compile time and it's there that I think a compiler
should produce a warning.
Post by Noob
Post by James Harris
One to add to my list of compiler switches to use. But doing that has
1. How many more similar switches would be needed to catch all the other
forms of undefined behaviour that the compiler is able to recognise?
I don't think the compiler always keeps enough information around
to be able to precisely pinpoint where the UB originated from, in
the latest optimization stages.
Post by James Harris
2. I think I would rather this and similar warnings were enabled by
default and needed a conscious decision to disable. Failing that, IMO it
would be OK to have a switch such as -Wundefined which turned on all
such checking but still allowed individual checks to be turned off.
Perhaps you should open an issue on bugzilla?
Well, it's not a bug, more a feature suggestion. If bugzilla is the
right place for that then perhaps I should.

I was mainly interested in the question of "what a compiler /should/ do"
when it makes optimisation assumptions.
--
James Harris
Andrew Haley
2017-04-23 10:41:06 UTC
Post by James Harris
What gets me is that the compiler is (potentially) altering the
behaviour of the code without issuing a warning. That's true even with
-Wall -Wextra.
I found that one has to add -Wstrict-overflow=5 to get the compiler to
warn about the signed-overflow assumption its optimiser is making.
$ cc iter-test.c -Wall -Wextra -Wstrict-overflow=5 -O2
iter-test.c:3:5: warning: assuming signed overflow does not occur when
reducing constant in comparison [-Wstrict-overflow]
for (i = 0; i < i + 1; i++)
^
One to add to my list of compiler switches to use. But doing that has
1. How many more similar switches would be needed to catch all the other
forms of undefined behaviour that the compiler is able to recognise?
It's impossible at compile time without triggering a lot of false
positives. It's possible at runtime, but difficult.
Post by James Harris
2. I think I would rather this and similar warnings were enabled by
default and needed a conscious decision to disable. Failing that, IMO it
would be OK to have a switch such as -Wundefined which turned on all
such checking but still allowed individual checks to be turned off.
There would be too much collateral damage.

Consider this:

extern int flonger(int n);
void poo(int m) {
for (int i = 0; i < m; i += 2)
flonger(i * 10);
}

Do you want a warning for every such loop? The compiler doesn't
necessarily keep track of all the places that it assumes signed
overflow does not occur.

Andrew.
s***@casperkitty.com
2017-04-23 21:14:35 UTC
Post by Andrew Haley
extern int flonger(int n);
void poo(int m) {
for (int i = 0; i < m; i += 2)
flonger(i * 10);
}
Do you want a warning for every such loop? The compiler doesn't
necessarily keep track of all the places that it assumes signed
overflow does not occur.
It might be useful to point out why it should matter: if a compiler knew that
flonger would always return, it could replace the loop with:

if (m > 0)
{
int __temp = ((m-1)/2)*20;
for (int i=0; i<=__temp; i+=20)
flonger(i);
}

I'm not sure how often that would be more efficient than:

if (m > 0)
{
int __temp1 = 0;
unsigned __temp2 = (m+1u)/2;
do
{
flonger(__temp1);
__temp1+=20;
} while(--__temp2);
}

but the former construct would be prone to exit the loop early if an overflow
would occur later, while the latter construct would behave as "expected"
unless something unusual happens in the __temp1+=20 line.
Andrew Haley
2017-04-24 09:04:27 UTC
Post by s***@casperkitty.com
Post by Andrew Haley
extern int flonger(int n);
void poo(int m) {
for (int i = 0; i < m; i += 2)
flonger(i * 10);
}
Do you want a warning for every such loop? The compiler doesn't
necessarily keep track of all the places that it assumes signed
overflow does not occur.
It might be useful to point out why it should matter
A really perspicacious reader could run the program through GCC with
and without -fwrapv, look at the generated code, and hopefully become
enlightened.

Andrew.
James Harris
2017-04-24 08:22:58 UTC
Post by Andrew Haley
Post by James Harris
What gets me is that the compiler is (potentially) altering the
behaviour of the code without issuing a warning. That's true even with
-Wall -Wextra.
I found that one has to add -Wstrict-overflow=5 to get the compiler to
warn about the signed-overflow assumption its optimiser is making.
$ cc iter-test.c -Wall -Wextra -Wstrict-overflow=5 -O2
iter-test.c:3:5: warning: assuming signed overflow does not occur when
reducing constant in comparison [-Wstrict-overflow]
for (i = 0; i < i + 1; i++)
^
One to add to my list of compiler switches to use. But doing that has
1. How many more similar switches would be needed to catch all the other
forms of undefined behaviour that the compiler is able to recognise?
It's impossible at compile time without triggering a lot of false
positives. It's possible at runtime, but difficult.
Surely a compiler could report that the assumptions it makes at one
level of optimisation differ from those it makes at a different
optimisation level.
Post by Andrew Haley
Post by James Harris
2. I think I would rather this and similar warnings were enabled by
default and needed a conscious decision to disable. Failing that, IMO it
would be OK to have a switch such as -Wundefined which turned on all
such checking but still allowed individual checks to be turned off.
There would be too much collateral damage.
extern int flonger(int n);
void poo(int m) {
for (int i = 0; i < m; i += 2)
flonger(i * 10);
}
Do you want a warning for every such loop? The compiler doesn't
necessarily keep track of all the places that it assumes signed
overflow does not occur.
I am not sure what signed-integer optimisation you see a compiler making
in that. Of course i * 10 may overflow but I don't see where a compiler
would make different assumptions at different optimisation levels.
--
James Harris
James R. Kuyper
2017-04-24 16:05:40 UTC
Post by James Harris
Post by Andrew Haley
Post by James Harris
What gets me is that the compiler is (potentially) altering the
behaviour of the code without issuing a warning. That's true even with
-Wall -Wextra.
I found that one has to add -Wstrict-overflow=5 to get the compiler to
warn about the signed-overflow assumption its optimiser is making.
$ cc iter-test.c -Wall -Wextra -Wstrict-overflow=5 -O2
iter-test.c:3:5: warning: assuming signed overflow does not occur when
reducing constant in comparison [-Wstrict-overflow]
for (i = 0; i < i + 1; i++)
^
One to add to my list of compiler switches to use. But doing that has
1. How many more similar switches would be needed to catch all the other
forms of undefined behaviour that the compiler is able to recognise?
It's impossible at compile time without triggering a lot of false
positives. It's possible at runtime, but difficult.
Surely a compiler could report that the assumptions it was making at one
level of optimisation were different from those produced at a different
optimisation level.
So, whenever I choose -O3, I should see a message from the compiler
saying "Selection of -O3 enables several thousand assumptions that are
not made at -O2. For a complete list, see the following website:"?

Keep in mind that the committee often chooses to make violation of a
particular rule "undefined behavior" rather than a "constraint
violation" precisely because it's not always feasible to identify such
violations. Making it a "constraint violation" would make detection of
the violation mandatory in all cases. I agree that a good compiler
should warn about the cases that are easy to detect, but they are not
required to do so, and you shouldn't rely upon violations always being
detectable.

Furthermore, a compiler does not necessarily collect the information
needed to identify which assumptions it's relying upon. All it needs to
do is identify when it's permissible to apply a given optimization. At
the point where it reaches that conclusion, there's lots of different
possibilities for the context of that conclusion. To get a little more
specific, let's assume that, when a particular optimization has been
determined to be feasible, the possible contexts for that optimization
can be divided into 16 different categories, with each category having a
different reason why the optimization is permissible. It might be the
case that for contexts which fall into categories 5, 7, 8 and 9, the
optimization is permissible because the behavior is undefined. In
categories 5 and 8, the optimization might change the observable
behavior of the code. However, the optimizer doesn't know which of the
16 categories applies - the designer only knew that it's in a situation
where those 16 categories are the only ones that could apply. So the
compiler doesn't know whether or not it's relying upon the assumption
that the behavior is defined, but it is still justified in performing
the optimization.
Stefan Ram
2017-04-24 20:35:06 UTC
Post by James R. Kuyper
Keep in mind that the committee often chooses to make violation of a
particular rule "undefined behavior" rather than a "constraint
violation" precisely because it's not always feasible to identify such
violations.
This is also confirmed by the rationale itself:

»Undefined behavior gives the implementor license not to
catch certain program errors that are difficult to diagnose.«.

Rationale for International Standard
- Programming Languages - C, Revision 5.10, April 2003

(However, when an implementation exploits UB by generating
special code, then the implementation should usually /have/
identified the UB.)
James R. Kuyper
2017-04-24 21:06:56 UTC
Post by Stefan Ram
Post by James R. Kuyper
Keep in mind that the committee often chooses to make violation of a
particular rule "undefined behavior" rather than a "constraint
violation" precisely because it's not always feasible to identify such
violations.
»Undefined behavior gives the implementor license not to
catch certain program errors that are difficult to diagnose.«.
Rationale for International Standard
- Programming Languages - C, Revision 5.10, April 2003
(However, when an implementation exploits UB by generating
special code, then the implementation should usually /have/
identified the UB.)
The point of my message was to remind people that this is NOT
necessarily the case - you can exploit UB by having a set A of
conditions that are sufficient to ensure that the optimization is valid,
and an additional set of conditions B that are consistent with A, and
render the behavior undefined. When A and B are both true, the
optimization is legal because the behavior is undefined. When A is true
and B is not true, the optimization is still legal, for different
reasons. All the compiler needs to determine is whether A is true, it
doesn't need to determine whether B applies, and therefore need NOT be
aware that the behavior is undefined.

In this case "signed_integer_expression < signed_integer_expression +
positive_constant_expression", where the two signed integer expressions
are guaranteed to have the same value, is condition A. That condition is
sufficient to justify the optimization of replacing that comparison
expression with 1. If signed_integer_expression +
positive_constant_expression overflows, that's condition B, and the
optimization is justified because the behavior is undefined. If it
doesn't overflow, the optimization is justified by the ordinary rules of
C arithmetic.

An implementation could in principle check whether overflow is
guaranteed to occur (as it is in this case), impossible (as it might be
with adequate error checking), or indeterminable (as it would be if
signed_integer_expression had a value that came from outside the same
translation unit). But it doesn't need to perform such a check - it's permitted to
perform the optimization regardless of which of the three applies.
That's good, because in general, determining which of the three applies
can be proven equivalent to solving the halting problem.
s***@casperkitty.com
2017-04-24 22:06:44 UTC
Post by James R. Kuyper
Post by Stefan Ram
(However, when an implementation exploits UB by generating
special code, then the implementation should usually /have/
identified the UB.)
In this case "signed_integer_expression < signed_integer_expression +
positive_constant_expression", where the two signed integer expressions
are guaranteed to have the same value, is condition A. That condition is
sufficient to justify the optimization of replacing that comparison
expression with 1. If signed_integer_expression +
positive_constant_expression overflows, that's condition B, and the
optimization is justified because the behavior is undefined. If it
doesn't overflow, the optimization is justified by the ordinary rules of
C arithmetic.
In this kind of situation, as in many others, most of the optimizations
which would be facilitated by treating some action as UB could be
facilitated just as well by specifying a behavioral model which allows an
implementation broad latitude with regard to behavior, but still keeps it
within certain bounds. Such a behavioral model would fit well with many
application requirements, which require that a program behave in precisely-
defined fashion when given valid input, and behave in constrained, but not
precisely-defined, fashion when given invalid input. The C Standard
attempts to define a concept of "analyzability" that distinguishes between
"critical" and non-critical forms of UB. Unfortunately, it is specified so
loosely as to be practically useless unless implementers can be relied upon
to supplement it with common sense.
Richard Bos
2017-04-22 21:31:01 UTC
Post by Andrew Cooper
Overflow of signed integers is UB. Therefore "i < i + 1" can
legitimately be simplified to true.
Use -fwrapv if you want GCC/Clang to strictly consider signed integer
overflow to have defined two's-complement behaviour.
Or, by preference, don't, and rather write code which doesn't rely on
undefined behaviour in the first place.

Richard
Richard Bos
2017-04-22 09:29:11 UTC
Post by James Harris
I saw this posted elsewhere. The gcc optimiser evidently changes the
behaviour of the code relative to its non-optimised effect.
int main(int argc, char **argv) {
int i;
for (i = 0; i < i + 1; i++)
;
return 0;
}
That's not a compiler failure - the code is broken. Signed integer
overflow has undefined behaviour, and therefore, gcc is allowed to
assume that it doesn't ever happen. It's allowed to do so because, when
it _does_ happen, any result is allowed, including pretending that it
didn't happen.

Try changing the int to an unsigned int and run your tests again. If it
optimises the check into non-existence again, _then_ gcc is at fault.

Richard
bartc
2017-04-22 09:47:16 UTC
Post by Richard Bos
Post by James Harris
I saw this posted elsewhere. The gcc optimiser evidently changes the
behaviour of the code relative to its non-optimised effect.
int main(int argc, char **argv) {
int i;
for (i = 0; i < i + 1; i++)
;
return 0;
}
That's not a compiler failure - the code is broken. Signed integer
overflow has undefined behaviour, and therefore, gcc is allowed to
assume that it doesn't ever happen. It's allowed to do so because, when
it _does_ happen, any result is allowed, including pretending that it
didn't happen.
Nevertheless, it is desirable to have consistent behaviour across
optimisation levels.

I changed the code a little to this:

int main(void) {
int i,m;
m=0x7fffffff;
for (i = 0; i < m + 1; i++)
;
return 0;
}

Now, with -O0, it terminates immediately. With -O3, it runs forever.
Those two behaviours couldn't be more contrasting!

It gets worse; change it to:

int main(void) {
int i,m;
m=fn();
for (i = 0; i < m + 1; i++)
;
return 0;
}

where fn() is defined in a separate module so the compiler of this
module can't see it. Now, both -O0 and -O3 compilations terminate
immediately.

The behaviour of the -O3-compiled code has changed dramatically just by
a minor rearrangement of the code, the sort that happens all the time in
a real program.
--
bartc
James Harris
2017-04-22 10:36:38 UTC
Post by bartc
Post by Richard Bos
Post by James Harris
I saw this posted elsewhere. The gcc optimiser evidently changes the
behaviour of the code relative to its non-optimised effect.
int main(int argc, char **argv) {
int i;
for (i = 0; i < i + 1; i++)
;
return 0;
}
That's not a compiler failure - the code is broken. Signed integer
overflow has undefined behaviour, and therefore, gcc is allowed to
assume that it doesn't ever happen. It's allowed to do so because, when
it _does_ happen, any result is allowed, including pretending that it
didn't happen.
Nevertheless, it is desirable to have consistent behaviour across
optimisation levels.
int main(void) {
int i,m;
m=0x7fffffff;
for (i = 0; i < m + 1; i++)
;
return 0;
}
Now, with -O0, it terminates immediately. With -O3, it runs forever.
Those two behaviours couldn't be more contrasting!
int main(void) {
int i,m;
m=fn();
for (i = 0; i < m + 1; i++)
;
return 0;
}
where fn() is defined in a separate module so the compiler of this
module can't see it. Now, both -O0 and -O3 compilations terminate
immediately.
The behaviour of the -O3-compiled code has changed dramatically just by
a minor rearrangement of the code, the sort that happens all the time in
a real program.
That's my gripe, too. I accept that C specifies cases of UB, understand
why it does so, and welcome the fact that such things are programmer
responsibility. What I find surprising, though, is that the compiler
generates code which behaves differently _without warning_ by default.

IMO disable-able warnings would be better.
--
James Harris
Kenny McCormack
2017-04-22 13:50:41 UTC
In article <odfbi9$b0c$***@dont-email.me>,
James Harris <***@gmail.com> wrote:
...
Post by James Harris
That's my gripe, too. I accept that C specifies cases of UB, understand
why it does so, and welcome the fact that such things are programmer
responsibility. What I find surprising, though, is that the compiler
generates code which behaves differently _without warning_ by default.
The point here is that it is a "Doctor, it hurts when I do this" sort of
thing. By specifying optimization (via -O<anything>), you are telling the
compiler to assume that your code is all correct and to optimize
accordingly. If you didn't want that, you'd not put in -Ox.

Now, here's the thing - and I accept that this is controversial and against
the grain of this newsgroup - but the way it *should* work is that when you
don't specify any optimization (i.e., -O0), the code should be compiled "as
is" - i.e., what you see is what you get. Then, if you specify that
optimization be done, then you have to accept that the compiler may
"distort" your code - as they say, you asked for it, you got it.

Unfortunately, the C standards (the Bible of this NG) make no such
guarantees. The compiler is free to do whatever it likes with your code,
even when compiled with -O0. That's the flaw.

P.S. Again, I'm fully aware that the above is controversial and I fully
expect a slew of responses telling me I'm wrong. Unless you really like
hearing the sound of your own voice, don't bother.
--
Trump has normalized hate.

The media has normalized Trump.
bartc
2017-04-22 14:35:59 UTC
Post by Kenny McCormack
...
Post by James Harris
That's my gripe, too. I accept that C specifies cases of UB, understand
why it does so, and welcome the fact that such things are programmer
responsibility. What I find surprising, though, is that the compiler
generates code which behaves differently _without warning_ by default.
The point here is that it is a "Doctor, it hurts when I do this" sort of
thing. By specifying optimization (via -O<anything>), you are telling the
compiler to assume that your code is all correct and to optimize
accordingly. If you didn't want that, you'd not put in -Ox.
Now, here's the thing - and I accept that this is controversial and against
the grain of this newsgroup - but the way it *should* work is that when
don't specify any optimization (i.e., -O0), the code should be compiled "as
is" - i.e., what you see is what you get. Then, if you specify that
optimization be done, then you have to accept that the compiler may
"distort" your code - as they say, you asked for it, you got it.
I expect it to do exactly the same thing, only faster (assuming this is
optimising for speed).

I also expect that gcc may sometimes not do the operation at all
(optimising it away entirely), but this is mainly a nuisance when trying
to compare performance with another compiler, option, language or algorithm.
Post by Kenny McCormack
Unfortunately, the C standards (the Bible of this NG) make no such
guarantees. The compiler is free to do whatever it likes with your code,
even when compiled with -O0. That's the flaw.
C has decided that overflow of signed types is undefined behaviour. In
reality, in most sane systems, it is quite predictable: unsigned types
just wrap around. The same underlying machine operations are often used
for both signed and unsigned.

Presumably some rare machines are different, hence the UB, even on
machines where overflow behaviour is predictable. But it means the
compiler doing unexpected things. If I write:

int a,b,c; // file scope non-static
....
a < b+c;

where b and c may have been set up elsewhere, the compiler can't do
much. If b+c does overflow, the code will still have a certain behaviour
(which may be desired, maybe not).

But as soon as the compiler gets some extra information about the values
of b and c, then it will pounce on that, and generate different code
when optimising. And with different behaviour, not just faster.
--
bartc
James Harris
2017-04-24 08:36:45 UTC
Post by bartc
Post by Kenny McCormack
...
Post by James Harris
That's my gripe, too. I accept that C specifies cases of UB, understand
why it does so, and welcome the fact that such things are programmer
responsibility. What I find surprising, though, is that the compiler
generates code which behaves differently _without warning_ by default.
The point here is that it is a "Doctor, it hurts when I do this" sort of
thing. By specifying optimization (via -O<anything>), you are telling the
compiler to assume that your code is all correct and to optimize
accordingly. If you didn't want that, you'd not put in -Ox.
Now, here's the thing - and I accept that this is controversial and against
the grain of this newsgroup - but the way it *should* work is that when
don't specify any optimization (i.e., -O0), the code should be compiled "as
is" - i.e., what you see is what you get. Then, if you specify that
optimization be done, then you have to accept that the compiler may
"distort" your code - as they say, you asked for it, you got it.
I expect it to do the exactly same thing, only faster (assuming this is
optimising for speed).
My view is similar to yours but not identical. I accept that a compiler
can achieve greater speed by making certain assumptions. What I would
want, though, is the compiler at least to say that it was doing
something which /could/ alter the behaviour.
--
James Harris
Keith Thompson
2017-04-24 15:37:09 UTC
James Harris <***@gmail.com> writes:
[...]
Post by James Harris
My view is similar to yours but not identical. I accept that a compiler
can achieve greater speed by making certain assumptions. What I would
want, though, is the compiler at least to say that it was doing
something which /could/ alter the behaviour.
The problem is that, as I understand it, it's impractical to do that.

I read an article a while ago from the authors of clang and/or llvm
discussing the issue. I'll post a link when/if I find it again.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
David Brown
2017-04-25 09:07:31 UTC
Post by Keith Thompson
[...]
Post by James Harris
My view is similar to yours but not identical. I accept that a compiler
can achieve greater speed by making certain assumptions. What I would
want, though, is the compiler at least to say that it was doing
something which /could/ alter the behaviour.
The problem is that, as I understand it, it's impractical to do that.
I read an article a while ago from the authors of clang and/or llvm
discussing the issue. I'll post a link when/if I find it again.
Are you referring to these? If I'm right, it will save you a little
research effort - and if I'm wrong, these are useful articles anyway.


<http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html>
<http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html>
<http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html>
Keith Thompson
2017-04-25 18:57:11 UTC
Post by David Brown
Post by Keith Thompson
[...]
Post by James Harris
My view is similar to yours but not identical. I accept that a compiler
can achieve greater speed by making certain assumptions. What I would
want, though, is the compiler at least to say that it was doing
something which /could/ alter the behaviour.
The problem is that, as I understand it, it's impractical to do that.
I read an article a while ago from the authors of clang and/or llvm
discussing the issue. I'll post a link when/if I find it again.
Are you referring to these? If I'm right, it will save you a little
research effort - and if I'm wrong, these are useful articles anyway.
<http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html>
<http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html>
<http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html>
Yes, I think that's it. The third one includes a section "Why can't you
warn when optimizing based on undefined behavior?". Thanks!
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
James Harris
2017-04-24 08:34:18 UTC
Post by Kenny McCormack
...
Post by James Harris
That's my gripe, too. I accept that C specifies cases of UB, understand
why it does so, and welcome the fact that such things are programmer
responsibility. What I find surprising, though, is that the compiler
generates code which behaves differently _without warning_ by default.
The point here is that it is a "Doctor, it hurts when I do this" sort of
thing. By specifying optimization (via -O<anything>), you are telling the
compiler to assume that your code is all correct and to optimize
accordingly. If you didn't want that, you'd not put in -Ox.
I see your point: optimisations often involve assumptions. Optimisation
can, of course, just make compilation take longer, though.

Perhaps the best option would be a single switch to turn on warnings
about any assumptions an optimiser was making which were different from
those the unoptimised code would make, and to be able to turn such
warnings off individually. Then optimised code could be guaranteed to
use the same assumptions as unoptimised code unless the programmer got a
warning about it.

Of course, a simple warning about undefined behaviour could also be
useful - and might be better!
Post by Kenny McCormack
Now, here's the thing - and I accept that this is controversial and against
the grain of this newsgroup - but the way it *should* work is that when
don't specify any optimization (i.e., -O0), the code should be compiled "as
is" - i.e., what you see is what you get. Then, if you specify that
optimization be done, then you have to accept that the compiler may
"distort" your code - as they say, you asked for it, you got it.
Unfortunately, the C standards (the Bible of this NG) make no such
guarantees. The compiler is free to do whatever it likes with your code,
even when compiled with -O0. That's the flaw.
P.S. Again, I'm fully aware that the above is controversial and I fully
expect a slew of responses telling me I'm wrong. Unless you really like
hearing the sound of your own voice, don't bother.
--
James Harris
David Brown
2017-04-24 09:25:48 UTC
Post by James Harris
Post by Kenny McCormack
...
Post by James Harris
That's my gripe, too. I accept that C specifies cases of UB, understand
why it does so, and welcome the fact that such things are programmer
responsibility. What I find surprising, though, is that the compiler
generates code which behaves differently _without warning_ by default.
The point here is that it is a "Doctor, it hurts when I do this" sort of
thing. By specifying optimization (via -O<anything>), you are telling the
compiler to assume that your code is all correct and to optimize
accordingly. If you didn't want that, you'd not put in -Ox.
I see your point: optimisations often involve assumptions. Optimisation
can, of course, just make compilation take longer, though.
Perhaps the best option would be a single switch to turn on warnings
about any assumptions an optimiser was making which were different from
those the unoptimised code would make, and to be able to turn such
warnings off individually. Then optimised code could be guaranteed to
use the same assumptions as unoptimised code unless the programmer got a
warning about it.
You have a fundamental misunderstanding about C here. Yes, compiler
optimisations can often be based on assumptions - such as the assumption
that signed integer overflow will not occur (or that you don't care what
happens if it /does/ occur). But no, these assumptions are /not/
different or "invalid" when optimisation is not enabled. It is /never/
valid in ordinary C to let signed integers overflow. The compiler can
/always/ assume that such overflow does not occur, and there is no way
to talk about the "expected behaviour" of the program when it has signed
overflow - optimised or not.

A compiler implements C. It does not have to try to guess what it
thinks the programmer might have meant to write - it takes what the
programmer /did/ write, and generates code on the assumption that the
programmer understands the language, and wrote what he meant to write.

When you give a compiler a program containing undefined behaviour, such
as signed integer overflow, the compiler can generate any code it wants.
Practical realities of compiler design will usually limit the generated
code to a few different options - such as giving a specific output, or
hanging in an infinite loop. But the compiler is also free to write
code that prints "Hello, world!" just because you ran the compiler on a
Sunday.

Having said that, it is of course always a useful thing when the
compiler can warn you about possible or definite mistakes in your code.
But remember that while it may be easy for a programmer to spot a
mistake in one sample of code, it is not necessarily easy to give clear
rules about finding such mistakes - and compiler writers need clear
rules to implement warning checks. No one wants a check that produces
lots of false positives on perfectly good code.
Post by James Harris
Of course, a simple warning about undefined behaviour could also be
useful - and might be better!
Tim Rentsch
2017-04-27 20:57:49 UTC
Post by Kenny McCormack
...
That's my gripe, too. I accept that C specifies cases of UB, understand
why it does so, and welcome the fact that such things are programmer
responsibility. What I find surprising, though, is that the compiler
generates code which behaves differently _without warning_ by default.
The point here is that it is a "Doctor, it hurts when I do this" sort of
thing. By specifying optimization (via -O<anything>), you are telling the
compiler to assume that your code is all correct and to optimize
accordingly. If you didn't want that, you'd not put in -Ox.
Now, here's the thing - and I accept that this is controversial and against
the grain of this newsgroup - but the way it *should* work is that when
don't specify any optimization (i.e., -O0), the code should be compiled "as
is" - i.e., what you see is what you get. Then, if you specify that
optimization be done, then you have to accept that the compiler may
"distort" your code - as they say, you asked for it, you got it.
Unfortunately, the C standards (the Bible of this NG) make no such
guarantees. The compiler is free to do whatever it likes with your code,
even when compiled with -O0. That's the flaw.
The problem with this idea is not that it's wrong but that
it's unworkable. There just isn't a way to write such a
specification and have it satisfy all the other constraints
that the ISO C standard needs to satisfy.
Post by Kenny McCormack
P.S. Again, I'm fully aware that the above is controversial and I fully
expect a slew of responses telling me I'm wrong. Unless you really like
hearing the sound of your own voice, don't bother.
Generally I try to write my newsgroup postings for the benefit
of all potentially interested readers, not just the person I'm
responding to.
Tim Rentsch
2017-04-27 20:38:21 UTC
[the presence of undefined behavior causes different results
when programs are compiled using different optimization levels]
That's my gripe, too. I accept that C specifies cases of UB,
understand why it does so, and welcome the fact that such things are
programmer responsibility. What I find surprising, though, is that the
compiler generates code which behaves differently _without warning_ by
default.
I understand the gripe, and to some extent agree with it.
It's important to keep in mind though that the gripe is
with the implementation, not with the C language.
James Harris
2017-04-28 21:38:08 UTC
Post by Tim Rentsch
[the presence of undefined behavior causes different results
when programs are compiled using different optimization levels]
That's my gripe, too. I accept that C specifies cases of UB,
understand why it does so, and welcome the fact that such things are
programmer responsibility. What I find surprising, though, is that the
compiler generates code which behaves differently _without warning_ by
default.
I understand the gripe, and to some extent agree with it.
It's important to keep in mind though that the gripe is
with the implementation, not with the C language.
Yes, the 'gripe' was about gcc, not C.
--
James Harris
Stefan Ram
2017-04-22 13:34:15 UTC
Post by bartc
Nevertheless, it is desirable to have consistent behaviour across
optimisation levels.
Different optimization levels constitute different
implementations.

And it's no big surprise when the behavior of
a program with UB depends on the implementation.
Keith Thompson
2017-04-22 15:35:48 UTC
bartc <***@freeuk.com> writes:
[...]
Post by bartc
int main(void) {
int i,m;
m=0x7fffffff;
for (i = 0; i < m + 1; i++)
;
return 0;
}
[...]

Assuming 32-bit int, 0x7fffffff is equal to INT_MAX.

What behavior did you expect when evaluating INT_MAX + 1?
(Any possible expectation is incorrect.)
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
bartc
2017-04-22 16:16:49 UTC
Post by Keith Thompson
[...]
Post by bartc
int main(void) {
int i,m;
m=0x7fffffff;
for (i = 0; i < m + 1; i++)
;
return 0;
}
[...]
Assuming 32-bit int, 0x7fffffff is equal to INT_MAX.
What behavior did you expect when evaluating INT_MAX + 1?
I expect 0x7fffffff+1 to be equal to 0x80000000. Which normally equates
to MIN_INT. Then I would expect 0<0x80000000 to be false so that no
iterations are performed.

This is the behaviour observed with Pelles C, lccwin, DMC and Tiny C,
both optimised and unoptimised, when there is that option. (And with my
compiler...)

It's also the behaviour with gcc -O0 and -O1.

It's the behaviour of gcc with any optimisation setting, when the
assignment to m with 0x7fffffff is hidden from it.

Only with gcc, only with -O2 or -O3, and only when it can /see/ that m
has the value 0x7fffffff, does it play its UB card and generate code
that loops forever.

If something had exactly this code in assembly:

mov [i], 0
mov [m], 0x7FFF_FFFF
load R1, [i]
load R2, [m]
add R2, 1
cmp R1, R2
jmp ge,.... # signed compare

and decided to port it C, then it would not work the same way if they
used gcc -O2 or gcc -O3.
Post by Keith Thompson
(Any possible expectation is incorrect.)
Only because the language has said it is incorrect for EVERY possible
implementation, even the ones where it is well-defined. And gcc takes
advantage of that.
--
bartc
bartc
2017-04-22 17:55:36 UTC
Post by bartc
Post by Keith Thompson
What behavior did you expect when evaluating INT_MAX + 1?
I expect 0x7fffffff+1 to be equal to 0x80000000. Which normally equates
to MIN_INT. Then I would expect 0<0x80000000 to be false so that no
iterations are performed.
This is the behaviour observed with Pelles C, lccwin, DMC and Tiny C,
both optimised and unoptimised, when there is that option. (And with my
compiler...)
It's also the behaviour with gcc -O0 and -O1.
It's the behaviour of gcc with any optimisation setting, when the
assignment to m with 0x7fffffff is hidden from it.
Only with gcc, only with -O2 or -O3, and only when it can /see/ that m
has the value 0x7fffffff, does it play its UB card and generate code
that loops forever.
Above was tested with x64 and partly on x86 (32-bit). I've tested now on
ARM32 and the same thing: -O3 creates an infinite loop when the
initialisation of 'm' is not visible.

Be interesting if anyone had examples where the pattern of behaviour was
different.
--
bartc
Noob
2017-04-22 19:04:30 UTC
Post by bartc
Post by Keith Thompson
What behavior did you expect when evaluating INT_MAX + 1?
I expect 0x7fffffff+1 to be equal to 0x80000000.
Then use the appropriate flag [diatribe redacted].

https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#index-fwrapv
Ben Bacarisse
2017-04-22 20:51:38 UTC
Post by bartc
Post by Keith Thompson
[...]
Post by bartc
int main(void) {
int i,m;
m=0x7fffffff;
for (i = 0; i < m + 1; i++)
;
return 0;
}
[...]
Assuming 32-bit int, 0x7fffffff is equal to INT_MAX.
What behavior did you expect when evaluating INT_MAX + 1?
I expect 0x7fffffff+1 to be equal to 0x80000000. Which normally
equates to MIN_INT. Then I would expect 0<0x80000000 to be false so
that no iterations are performed.
This is not a clear explanation of what you expect because you are using
a C-like notation (0x80000000 etc) with (I think) a non-C meaning. On a
32-bit system, 0x7fffffff is a constant of type int with the value
2147483647. The expression 0x7fffffff+1 is of type int but has a value
(2147483648) that can't be represented in an int. The constant
0x80000000 does not have the same type -- it's an unsigned int -- but
its value (2147483648) is what many people would expect 2147483647+1 to
be. Furthermore, no one who knows C should expect 0<0x80000000 to be
false.

Now I know what you mean, but that's because of having discussed things
with you before. If you want to discuss your non-C expectations, it
would be much clearer to use a notation that does not carry a meaning you
don't want. For example, I might show fixed-width bit patterns in hex
like this: |7F FF FF FF| + 1 => |80 00 00 00|. Now it's clear that I
don't mean what C means by 0x80000000.

<snip>
--
Ben.
s***@casperkitty.com
2017-04-22 21:56:49 UTC
Post by bartc
I expect 0x7fffffff+1 to be equal to 0x80000000. Which normally equates
to MIN_INT. Then I would expect 0<0x80000000 to be false so that no
iterations are performed.
Allowing a compiler to have INT_MAX+1 yield a value which may arbitrarily
behave as INT_MIN or as a number which is one larger than INT_MAX will
permit useful optimizations in many cases; if applying an (int) cast to
any value outside the range of "int" would force its truncation and sign
extension, then any semantics which could be achieved on a system that
always performs such truncation would still be achievable in code that
would generally be just as clear--if not clearer--than the original.

Given an expression like:

long long l = i1*i2;

requiring a compiler to sign-extend the bottom 32-bits of the product may
yield slower code than would result from just using a 64-bit product
directly. Likewise, given something like:

const int x = ..somevalue..;
const long long L = ..somevalue..;
for (i = 0; i < 100; i++)
{   ...
    doSomethingWith(i*x + L);
}

it may be faster for a compiler to initialize a temporary variable to x+L and
add x once for each loop iteration, than to have to compute a 32-bit value for
i*x, sign-extend that, and add L.

Giving compilers the freedom to treat sub-expressions as longer types can
enable many useful optimizations; if code really wants values truncated to
the original type, it should explicitly specify it in a fashion analogous
to e.g. (float)(f+1.0f) > f. In the absence of a cast, a system which does
not promise to refrain from doing so could use extended precision when
evaluating the sub-expression f+1.0f and retain that precision into the
comparison. Adding the cast would force any extra precision to be discarded.
Post by bartc
Only because the language has said it is incorrect for EVERY possible
implementation, even the ones where it is well-defined. And gcc takes
advantage of that.
Where does the language say it's incorrect for every possible implementation?
It says it's "non-portable OR erroneous", but a statement "X is Y or Z"
does not in any way, shape, or form imply that X is Z except in cases where
X isn't Y.

If one recognizes that the Standard makes no attempt to mandate that any
implementation be suitable for any particular purpose, the fact that the
Standard would not regard a behavior as making an implementation non-
conforming does not imply any judgment on the part of the Committee that
the behavior would not make the implementation unsuitable for some or
many purposes.
Noob
2017-04-30 14:49:59 UTC
Post by bartc
I expect 0x7fffffff+1 to be equal to 0x80000000. Which normally equates
to MIN_INT. Then I would expect 0<0x80000000 to be false so that no
iterations are performed.
Let's consider a slightly different situation.

int foo(int n) {
    return n * 2 / 2;
}

Does anyone disagree that an optimizing compiler would
determine that this is the identity function, and would
drop the calls altogether, by replacing foo(x) with x.

Yet, if a non-optimizing compiler (e.g. gcc -O0) compiles the
code literally, and the user calls foo with x = 1_500_000_000,
then the actual result would be -647483648.

I suppose the compiler could warn: "Assuming n * 2 never overflows",
but one would get such a warning for *every* multiplication. I'd
wager that warning would not be very popular, and many would
disable it.

Regards.
Ben Bacarisse
2017-04-30 15:02:45 UTC
Post by Noob
Post by bartc
I expect 0x7fffffff+1 to be equal to 0x80000000. Which normally equates
to MIN_INT. Then I would expect 0<0x80000000 to be false so that no
iterations are performed.
Let's consider a slightly different situation.
int foo(int n) {
return n * 2 / 2;
}
Does anyone disagree that an optimizing compiler would
determine that this is the identity function, and would
drop the calls altogether, by replacing foo(x) with x.
Yes, I would! It's a small point but if, for example, x is double
then foo(x) is not the identity function and can't be replaced by x.

<snip>
--
Ben.
Robert Wessel
2017-04-30 16:51:58 UTC
On Sun, 30 Apr 2017 16:02:45 +0100, Ben Bacarisse
Post by Ben Bacarisse
Post by Noob
Post by bartc
I expect 0x7fffffff+1 to be equal to 0x80000000. Which normally equates
to MIN_INT. Then I would expect 0<0x80000000 to be false so that no
iterations are performed.
Let's consider a slightly different situation.
int foo(int n) {
return n * 2 / 2;
}
Does anyone disagree that an optimizing compiler would
determine that this is the identity function, and would
drop the calls altogether, by replacing foo(x) with x.
Yes, I would! It's a small point but if, for example, x is double
then foo(x) is not the identity function and can't be replaced by x.
Perhaps a clearer, and more accurate, statement would be that we'd
expect an optimizing compiler to replace the call to foo() with
whatever the conversion of the parameter to int is.
Noob
2017-04-30 19:42:43 UTC
Post by Ben Bacarisse
Post by Noob
Post by bartc
I expect 0x7fffffff+1 to be equal to 0x80000000. Which normally equates
to MIN_INT. Then I would expect 0<0x80000000 to be false so that no
iterations are performed.
Let's consider a slightly different situation.
int foo(int n) {
return n * 2 / 2;
}
Does anyone disagree that an optimizing compiler would
determine that this is the identity function, and would
drop the calls altogether, by replacing foo(x) with x.
Yes, I would! It's a small point but if, for example, x is double
then foo(x) is not the identity function and can't be replaced by x.
Nice nit. Allow me to rephrase.

An optimizing compiler would replace foo(x) with (int)x

Regards.
bartc
2017-04-30 16:35:39 UTC
Post by Noob
Post by bartc
I expect 0x7fffffff+1 to be equal to 0x80000000. Which normally equates
to MIN_INT. Then I would expect 0<0x80000000 to be false so that no
iterations are performed.
Let's consider a slightly different situation.
int foo(int n) {
return n * 2 / 2;
}
Does anyone disagree that an optimizing compiler would
determine that this is the identity function, and would
drop the calls altogether, by replacing foo(x) with x.
Only if it does the same when it's unsigned int.

I see the above operation as a left shift followed by a right shift.

With unsigned, this would have no effect when the top bit is 0 (as a
zero will be shifted back in).

With signed, I think it has no effect when the top two bits are the same.
Post by Noob
Yet, if a non-optimizing compiler (e.g. gcc -O0) compiles the
code literally, and the user calls foo with x = 1_500_000_000,
then the actual result would be -647483648.
If I do the same with unsigned x=3 billion, and using an unsigned
version of foo, then the result will be 852516352, whether optimised or not.
--
bartc
Robert Wessel
2017-04-30 16:57:42 UTC
Post by bartc
Post by Noob
Post by bartc
I expect 0x7fffffff+1 to be equal to 0x80000000. Which normally equates
to MIN_INT. Then I would expect 0<0x80000000 to be false so that no
iterations are performed.
Let's consider a slightly different situation.
int foo(int n) {
return n * 2 / 2;
}
Does anyone disagree that an optimizing compiler would
determine that this is the identity function, and would
drop the calls altogether, by replacing foo(x) with x.
Only if it does the same when it's unsigned int.
That makes no sense.
Post by bartc
I see the above operation as a left shift followed by a right shift.
That's not how multiplication and division are defined in C.
Post by bartc
With unsigned, this has would have no effect when the top bit is 0 (as a
zero will be shifted back in).
With signed, I think it has no effect when the top two bits are the same.
Post by Noob
Yet, if a non-optimizing compiler (e.g. gcc -O0) compiles the
code literally, and the user calls foo with x = 1_500_000_000,
then the actual result would be -647483648.
If I do the same with unsigned x=3 billion, and using an unsigned
version of foo, then the result will be 85251635, whether optimised or not.
While true, that completely misses the point. The OP used a signed
integer for a reason. It's also different if the function took a
float (and assuming IEEE math), but again, he didn't do that. This is
an example of where the compiler, by taking advantage of undefined
behavior (signed integer overflow), can remove the "undefined" behavior
when optimization is *enabled*, leading to even more difficulty in
warning of such a thing.
bartc
2017-04-30 17:44:48 UTC
Post by Robert Wessel
Post by bartc
Post by Noob
int foo(int n) {
return n * 2 / 2;
}
Does anyone disagree that an optimizing compiler would
determine that this is the identity function, and would
drop the calls altogether, by replacing foo(x) with x.
Only if it does the same when it's unsigned int.
That makes no sense.
If you saw this anywhere else:

f(x) = x*2/2

You might be justified in calling f an identity function, whether x was
integer or float.

It's harder to justify it within the context of a programming language
where x has a fixed range and untidy things can happen at the extremes
of the range.

But if you're going to do so, you don't want to make exceptions
according to some subtle difference in representation.

I wouldn't, myself, call foo() an identity function, as I'd prefer the
expression (n*2)/2 to be evaluated as it is. However, with n*(2/2), you
can probably eliminate the multiply without much controversy.
Post by Robert Wessel
Post by bartc
I see the above operation as a left shift followed by a right shift.
That's not how multiplication and division are defined in C.
It's more than likely how they would be implemented (for powers of two
anyway!). And it's how multiplying and dividing by powers of ten is done
in pen and paper calculations for decimal (add zeros or move the decimal
point).
Post by Robert Wessel
While true, that completely misses the point. The OP used a signed
integer for a reason.
Actually, I think most people type 'int', even when unsigned would do,
because it's quicker than either typing 'unsigned int', or having to
commit to a width, and typing 'uint32_t' followed by checking that
stdint.h has been included.
--
bartc
David Kleinecke
2017-04-30 18:34:20 UTC
Post by bartc
Post by Robert Wessel
Post by bartc
Post by Noob
int foo(int n) {
return n * 2 / 2;
}
Does anyone disagree that an optimizing compiler would
determine that this is the identity function, and would
drop the calls altogether, by replacing foo(x) with x.
Only if it does the same when it's unsigned int.
That makes no sense.
f(x) = x*2/2
You might be justified in calling f an identity function, whether x was
integer or float.
It's harder to justify it within the context of a programming language
where x has a fixed range and untidy things can happen at the extremes
of the range.
But if you're going to do so, you don't want to make exceptions
according to some subtle difference in representation.
I wouldn't, myself, call foo() an identity function, as I'd prefer the
expression (n*2)/2 to be evaluated as it is. However, with n*(2/2), you
can probably eliminate the multiply without much controversy.
Post by Robert Wessel
Post by bartc
I see the above operation as a left shift followed by a right shift.
That's not how multiplication and division are defined in C.
It's more than likely how they would be implemented (for powers of two
anyway!). And it's how multiplying and dividing by powers of ten is done
in pen and paper calculations for decimal (add zeros or move the decimal
point).
Post by Robert Wessel
While true, that completely misses the point. The OP used a signed
integer for a reason.
Actually, I think most people type 'int', even when unsigned would do,
because it's quicker than either typing 'unsigned int', or having to
commit to a width, and typing 'uint32_t' followed by checking that
stdint.h has been included.
If one is thinking of calculation, signed int is more natural,
but unsigned is more natural in (I believe) most uses of C.
But I fear it is far too late to make "int" mean "unsigned
int". I usually cheat by typedefing "Int" as "unsigned int"
and using Int as the "default" type.

PS: I have a general practice of making all type names
camel case - Bool, Long, LongLong, Void, etc. So sometimes
I let Int be signed and Unt be unsigned integers.
Keith Thompson
2017-04-30 19:03:43 UTC
David Kleinecke <***@gmail.com> writes:
[...]
Post by David Kleinecke
If one is thinking calculation signed int is more natural
but unsigned is more natural in (I believe) most uses of C.
But I fear it is far too late to make "int" mean "unsigned
int". I usually cheat by typedeffing "Int" as "unsigned int"
and using Int the "default" type.
Please never ask me to read your code.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
David Kleinecke
2017-04-30 22:30:10 UTC
Post by Keith Thompson
[...]
Post by David Kleinecke
If one is thinking calculation signed int is more natural
but unsigned is more natural in (I believe) most uses of C.
But I fear it is far too late to make "int" mean "unsigned
int". I usually cheat by typedeffing "Int" as "unsigned int"
and using Int the "default" type.
Please never ask me to read your code.
I may some day post some code. Feel free to ignore it.
David Brown
2017-04-30 19:18:13 UTC
Post by David Kleinecke
Post by bartc
Actually, I think most people type 'int', even when unsigned would do,
because it's quicker than either typing 'unsigned int', or having to
commit to a width, and typing 'uint32_t' followed by checking that
stdint.h has been included.
If one is thinking calculation signed int is more natural
but unsigned is more natural in (I believe) most uses of C.
But I fear it is far too late to make "int" mean "unsigned
int". I usually cheat by typedeffing "Int" as "unsigned int"
and using Int the "default" type.
That strikes me as an extraordinarily bad idea. And I have seen many
bad ideas in C programming.

I use a lot more unsigned types in my coding than is common in C - often
they are a more natural choice in low-level embedded programming. But I
write "uint32_t", or sometimes "unsigned int". Then anyone reading the
code can understand exactly what the types are.

If you call your type "Int", anyone looking at your code will first be
confused as to why there is a capital there. Then they will assume it
is an "int", and perhaps you normally code in Pascal rather than C. I
doubt if anyone would consider it likely that you meant an "unsigned
int" - it would be viewed as so unlikely that many would not bother to
check.
Post by David Kleinecke
PS: I have a general practice of making all type names
camel case - Bool, Long, LongLong, Void, etc. So sometimes
I let Int be signed and Unt be unsigned integers.
So sometimes you use Int to mean "signed int", and sometimes you use it
to mean "unsigned int". I thought it would be hard to come up with a
worse idea than using "Int" for "unsigned int" - but it seems I
underestimated you!

If a type has a standard name, use it. If you have some serious
aversion to writing bool, long, etc., then at the very least make sure
that Bool, Long, etc., are defined exactly as the normal lower case
versions.
David Kleinecke
2017-04-30 22:33:33 UTC
Post by David Brown
Post by David Kleinecke
Post by bartc
Actually, I think most people type 'int', even when unsigned would do,
because it's quicker than either typing 'unsigned int', or having to
commit to a width, and typing 'uint32_t' followed by checking that
stdint.h has been included.
If one is thinking calculation signed int is more natural
but unsigned is more natural in (I believe) most uses of C.
But I fear it is far too late to make "int" mean "unsigned
int". I usually cheat by typedeffing "Int" as "unsigned int"
and using Int the "default" type.
That strikes me as an extraordinarily bad idea. And I have seen many
bad ideas in C programming.
I use a lot more unsigned types in my coding than is common in C - often
they are a more natural choice in low-level embedded programming. But I
write "uint32_t", or sometimes "unsigned int". Then anyone reading the
code can understand exactly what the types are.
If you call your type "Int", anyone looking at your code will first be
confused as to why there is a capital there. Then they will assume it
is an "int", and perhaps you normally code in Pascal rather than C. I
doubt if anyone would consider it likely that you meant an "unsigned
int" - it would be viewed as so unlikely that many would not bother to
check.
Post by David Kleinecke
PS: I have a general practice of making all type names
camel case - Bool, Long, LongLong, Void, etc. So sometimes
I let Int be signed and Unt be unsigned integers.
So sometimes you use Int to mean "signed int", and sometimes you use it
to mean "unsigned int". I thought it would be hard to come up with a
worse idea than using "Int" for "unsigned int" - but it seems I
underestimated you!
If a type has a standard name, use it. If you have some serious
aversion to writing bool, long, etc., then at the very least make sure
that Bool, Long, etc., are defined exactly as the normal lower case
versions.
The kind of code I read and write rather rarely has any simple
types other than Bool and Void. Most of the types are typedefs
of structs.
ty
David Brown
2017-05-01 14:27:18 UTC
Post by David Kleinecke
Post by David Brown
So sometimes you use Int to mean "signed int", and sometimes you use it
to mean "unsigned int". I thought it would be hard to come up with a
worse idea than using "Int" for "unsigned int" - but it seems I
underestimated you!
If a type has a standard name, use it. If you have some serious
aversion to writing bool, long, etc., then at the very least make sure
that Bool, Long, etc., are defined exactly as the normal lower case
versions.
The kind of code I read and write rather rarely has any simple
types other than Bool and Void. Most of the types are typedefs
of structs.
ty
It's fine to give your own types your own names. All I am saying is
don't give existing standard types new names that are confusing or
pointless, and don't pick names that are confusingly like existing
standard types.

s***@casperkitty.com
2017-05-01 00:28:37 UTC
Post by Robert Wessel
While true, that completely misses the point. The OP used a signed
integer for a reason. It's also different if the function took a
float (and assuming IEEE math), but again, he didn't do that. This is
an example of where the compiler taking advantage of undefined
behavior (signed integer overflow), can remove "undefined" behavior
when optimization is *enabled*. Leading to even more difficulty about
warning of such a thing.
Allowing a compiler to substitute longer types for signed integers (as is
commonly done for floating-point types), or behave as though it is doing
so, will enable many useful optimizations. In floating-point types, such
extra precision may be suppressed by the use of explicit casts. Unless
FLT_EVAL_METHOD is defined to promise otherwise, a compiler given:

double d = float1 + float2;

would be allowed to either perform the computation on float values directly
and then convert the result to "double", or convert the operands to a longer
type, add them as that type, and then convert the result to double (if it
isn't already). If an implementation were to extend the freedom offered by
the Standard solely for that purpose and would evaluate all integer
operations other than division/remainder as though it computed the result
using mathematical integers and then truncated to an Unspecified number of
bits at least as large as the result type, that would allow for many useful
optimizations. If the implementation further promised that, as with
floating point values, an explicit cast would discard any precision beyond
that of the indicated type (so that just as e.g.

double d = (float)(float1 + float2);

would yield a value that was precisely representable as "float", the
expression

(int)(x+y) > z

would truncate the result of x+y into the range of "int" and then perform
the comparison, any program which could be written under "precise wrapping"
semantics could be written to be compatible with the optimization without
having to convert operands to unsigned and then back to int.

The purpose of a compiler should, fundamentally, be to produce executable
code that meets a programmer's requirements. If a programmer's requirements
would be met just as well by having a compiler evaluate x+y>z by mod-reducing
the sum before the comparison or by having it behave as though it evaluates
the expression using "extra bits", allowing a compiler to use whichever is
more efficient in any given context may yield a faster executable than would
be possible if the program had to rigidly specify one or the other.
Scott Lurndal
2017-05-01 13:12:25 UTC
Post by Robert Wessel
While true, that completely misses the point. The OP used a signed
integer for a reason.
Actually, I suspect that the vast majority of C programmers
who use 'int' use it out of inertia, not because they've
considered carefully the domain. I frequently see int
used where the domain of the function is unsigned.
Noob
2017-04-30 20:23:28 UTC
Post by bartc
Post by Noob
Post by bartc
I expect 0x7fffffff+1 to be equal to 0x80000000. Which normally equates
to MIN_INT. Then I would expect 0<0x80000000 to be false so that no
iterations are performed.
Let's consider a slightly different situation.
int foo(int n) {
return n * 2 / 2;
}
Does anyone disagree that an optimizing compiler would
determine that this is the identity function, and would
drop the calls altogether, by replacing foo(x) with x.
Only if it does the same when it's unsigned int.
Of course it doesn't, as you yourself point out below.
Post by bartc
I see the above operation as a left shift followed by a right shift.
With unsigned, this would have no effect when the top bit is 0 (as a
zero will be shifted back in).
With signed, I think it has no effect when the top two bits are the same.
The foo operation on an unsigned int clears the top bit.

int imul(int n) { return n * 2 / 2; }
int umul(unsigned int n) { return n * 2 / 2; }

imul:
movl %edi, %eax
ret

umul:
movl %edi, %eax
andl $2147483647, %eax
ret

imul is the identity function, umul clears the top bit.
Post by bartc
Post by Noob
Yet, if a non-optimizing compiler (e.g. gcc -O0) compiles the
code literally, and the user calls foo with x = 1_500_000_000,
then the actual result would be -647483648.
If I do the same with unsigned x=3 billion, and using an unsigned
version of foo, then the result will be 852516352, whether optimised or not.
My point was showing one type of optimization enabled when
ignoring UB. Dealing with (signed) overflow or (unsigned)
wrap-around is a slightly different issue.

You often mention widening multiply; this is a code
pattern which GCC recognizes on supported platforms:

long long int lmul(int u, int v) { return (long long)u * v; }

lmul:
movslq %edi, %rax
movslq %esi, %rsi
imulq %rsi, %rax
ret

And the same is possible for 64x64 -> 128 multiply.

Regards.
s***@casperkitty.com
2017-04-30 17:08:56 UTC
Post by Noob
Does anyone disagree that an optimizing compiler would
determine that this is the identity function, and would
drop the calls altogether, by replacing foo(x) with x.
That would depend upon what, if anything, the compiler documents about the
behavior of integer overflow. If the compiler documents that overflow will
truncate and sign-extend the result (possibly as a consequence of certain
compiler flags), then such behavior would be defective. Likewise if it
documents that it will trap on all overflows *without regard for whether they
would affect the numerical correctness of the result*.

If the compiler documents that integer overflows may, at the compiler's
discretion, behave as though performed with arithmetically-correct values
and then truncated to any arbitrary size which is at least as large as the
types involved, but that it will have no other side-effect (IMHO such a mode
would offer the best blend of optimization and safety for most purposes) or
that overflows would trap any time they would result in a computation that
would differ from using types that could hold arbitrarily-large numbers (the
mode that would offer the best safety for most purposes while still allowing
many useful optimizations), it could elide the function above since either
documented behavior would allow the function to behave as though it used an
integer type large enough to hold the result.
Noob
2017-04-30 20:32:50 UTC
Post by s***@casperkitty.com
Post by Noob
Does anyone disagree that an optimizing compiler would
determine that this is the identity function, and would
drop the calls altogether, by replacing foo(x) with x.
That would depend upon what, if anything, the compiler documents about the
behavior of integer overflow.
Are you confusing IDB and UB?

Why would any compiler document the behavior of signed overflow?
Are you thinking of some extension?

Regards.
Keith Thompson
2017-04-30 20:59:23 UTC
Post by Noob
Post by s***@casperkitty.com
Post by Noob
Does anyone disagree that an optimizing compiler would
determine that this is the identity function, and would
drop the calls altogether, by replacing foo(x) with x.
That would depend upon what, if anything, the compiler documents about the
behavior of integer overflow.
Are you confusing IDB and UB?
I obviously can't speak for supercat, but I doubt it.
Post by Noob
Why would any compiler document the behavior of signed overflow?
Are you thinking of some extension?
A implementation is permitted to document the behavior of constructs
that have undefined behavior according to the C standard.
"gcc -fwrapv" is one implementation that does so for signed
integer overflow ("gcc -fwrapv" can be thought of as a distinct
implementation from gcc without "-fwrapv"). I wouldn't say it's
necessary to think of this as an "extension", since the behavior
is consistent with what the standard requires.

(Nothing in the standard requires an implementation to obey its own
documentation, except arguably for documentation that's required
by the standard, so if "gcc -fwrapv" didn't work as specified it
would be a bug in gcc, but not a conformance violation.)
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2017-05-01 00:47:53 UTC
Post by Noob
Why would any compiler document the behavior of signed overflow?
Are you thinking of some extension?
Because, among other things, it may sometimes be necessary to perform
computations that may or may not be meaningful, before one knows whether
they are meaningful or not. It may also sometimes be easier to perform
a bunch of computations, some of which will be meaningful and some not,
than to avoid performing the meaningless ones.

In many cases, even very loose guarantees about overflow behavior--ones
which could on many implementations be upheld at zero cost outside of rare
or contrived situations--can avoid the need to have expensive code avoid
overflows in computations whose results will end up being essentially
ignored anyway. Why shouldn't a quality compiler offer up such guarantees?
j***@gmail.com
2017-05-01 04:49:23 UTC
Post by s***@casperkitty.com
Post by Noob
Why would any compiler document the behavior of signed overflow?
Are you thinking of some extension?
Because, among other things, it may sometimes be necessary to perform
computations that may or may not be meaningful, before one knows whether
they are meaningful or not. It may also sometimes be easier to perform
a bunch of computations, some of which will be meaningful and some not,
than to avoid performing the meaningless ones.
In many cases, even very loose guarantees about overflow behavior--ones
which could on many implementations be upheld at zero cost outside of rare
or contrived situations--can avoid the need to have expensive code avoid
overflows in computations whose results will end up being essentially
ignored anyway. Why shouldn't a quality compiler offer up such guarantees?
Because C still has the cachet of being an assembler for a virtual portable
run-time.

So to speak.

Which clarifies in my mind the arguments relative to optimization and
undefined behavior (and implementation-defined).

Optimization necessarily transforms a program and induces semantics beyond
what is written.

I'm sure there will be some argument about this, but we have to remember
that the closest thing to a bug-free program that can possibly be written
is a single no-op.

and

#include <stdlib.h>
int main( int argc, char **argv )
{ return EXIT_SUCCESS; }

isn't really quite a no-op in this sense, although it's about as close to
a bug-free program as you can write in C.

This kind of speaks to why I think there should be multiple languages under
the "C" umbrella (separate from C++), and standards for each.

Or, perhaps, the standard could be broken down into modules, and a compiler
could claim support for specific modules of the standard, and maybe even
provide #pragmas to turn specific modules on and off.

Or is this already being done and I am just unaware of it?

--
Joel Rees

http://reiisi.blogspot.com
Keith Thompson
2017-04-30 18:24:47 UTC
Noob <***@127.0.0.1> writes:
[...]
Post by Noob
Let's consider a slightly different situation.
int foo(int n) {
return n * 2 / 2;
}
Does anyone disagree that an optimizing compiler would
determine that this is the identity function, and would
drop the calls altogether, by replacing foo(x) with x.
I agree that an optimizing compiler (actually any compiler) *could*
do that, given that x is of type int. But there is no requirement
to do so.

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Tim Rentsch
2017-04-30 19:45:22 UTC
Post by Noob
I expect 0x7fffffff+1 to be equal to 0x80000000. Which normally equates
to MIN_INT. Then I would expect 0<0x80000000 to be false so that no
iterations are performed.
Let's consider a slightly different situation.
int foo(int n) {
return n * 2 / 2;
}
Does anyone disagree that an optimizing compiler would
determine that this is the identity function, and would
drop the calls altogether, by replacing foo(x) with x.
Certainly I wouldn't be surprised to see a compiler doing
that, for an argument of type int.
Post by Noob
Yet, if a non-optimizing compiler (e.g. gcc -O0) compiles the
code literally, and the user calls foo with x = 1_500_000_000,
then the actual result would be -647483648.
I suppose the compiler could warn: "Assuming n * 2 never overflows",
but one would get such a warning for *every* multiplication.
I think that's a straw man argument. If a compiler were going to
give some sort of warning along these lines, presumably the
warnings would be given /only/ when the assumption causes a code
transformation that would change the behavior relative to not
making the assumption. Most multiplications don't fall into that
category.
Andrey Tarasevich
2017-04-24 06:11:25 UTC
Post by bartc
Post by Richard Bos
That's not a compiler failure - the code is broken. Signed integer
overflow has undefined behaviour, and therefore, gcc is allowed to
assume that it doesn't ever happen. It's allowed to do so because, when
it _does_ happen, any result is allowed, including pretending that it
didn't happen.
Nevertheless, it is desirable to have consistent behaviour across
optimisation levels.
...
That makes no sense whatsoever.

Undefined behavior is unstable by definition. But abstract theoretical
considerations aside, optimization level would be one obvious
down-to-Earth practical factor that always can (and will) "destabilize"
manifestations of undefined behavior. After all, this is the very reason
we have undefined behavior in the language spec.

So, no, when it comes to undefined behavior consistency across
optimization levels is not even remotely desirable.
--
Best regards,
Andrey Tarasevich
bartc
2017-04-24 09:46:44 UTC
Post by Andrey Tarasevich
Post by bartc
Post by Richard Bos
That's not a compiler failure - the code is broken. Signed integer
overflow has undefined behaviour, and therefore, gcc is allowed to
assume that it doesn't ever happen. It's allowed to do so because, when
it _does_ happen, any result is allowed, including pretending that it
didn't happen.
Nevertheless, it is desirable to have consistent behaviour across
optimisation levels.
...
That makes no sense whatsoever.
Undefined behavior is unstable by definition. But abstract theoretical
considerations aside, optimization level would be one obvious
down-to-Earth practical factor that always can (and will) "destabilize"
manifestations of undefined behavior. After all, this is the very reason
we have undefined behavior in the language spec.
So, no, when it comes to undefined behavior consistency across
optimization levels is not even remotely desirable.
You write a program P and send it as C source code to someone else to
work on a certain platform.

You don't want to give instructions as to what compiler to use or what
optimisations they should be applying. Maybe it's a module that is to be
part of a bigger program, which uses its own uniform set of options.

This is why it must not matter what optimisations are applied.

Here, signed overflow /may/ be a bug in the program, or it may be by
design. If P is designed to run on the vast range of hardware where
signed overflows just wrap around, then it doesn't matter; the behaviour
will be consistent, for same width of 'int' at least.

Until C decides to intervene and decides it should matter, and changes
the behaviour.

(Take the 8-bit value 01111111 which is +127 when interpreted as signed.
Now increment it, which doesn't have signed/unsigned variations on most
hardware. You will get 10000000, which is -128 on two's complement
hardware (pretty much every machine). It just wraps.

Yet C has a problem with this.

And ironically, if you start with the 8-bit value 11111111, which is -1
signed, and increment that, you get zero (not unexpectedly). Yet here,
the hardware operation /will/ genuinely have overflowed - there will
have been a carry bit generated intended for the ninth bit of the result.)
--
Bartc
David Brown
2017-04-24 11:23:52 UTC
Post by bartc
Post by Andrey Tarasevich
Post by bartc
Post by Richard Bos
That's not a compiler failure - the code is broken. Signed integer
overflow has undefined behaviour, and therefore, gcc is allowed to
assume that it doesn't ever happen. It's allowed to do so because, when
it _does_ happen, any result is allowed, including pretending that it
didn't happen.
Nevertheless, it is desirable to have consistent behaviour across
optimisation levels.
...
That makes no sense whatsoever.
Undefined behavior is unstable by definition. But abstract theoretical
considerations aside, optimization level would be one obvious
down-to-Earth practical factor that always can (and will) "destabilize"
manifestations of undefined behavior. After all, this is the very reason
we have undefined behavior in language spec.
So, no, when it comes to undefined behavior consistency across
optimization levels is not even remotely desirable.
You write a program P and send it as C source code to someone else to
work on a certain platform.
You don't want to give instructions as to what compiler to use or what
optimisations they should be applying. Maybe it's a module that is to be
part of a bigger program, which uses its own uniform set of options.
This is why it must not matter what optimisations are applied.
Agreed. (Well, I agree that it generally should not matter what
optimisations are applied as far as the correctness of the code is
concerned. And the ability to distribute the C source without detailed
compilation requirements is one of many reasons for that.)
Post by bartc
Here, signed overflow /may/ be a bug in the program, or it may be by
design.
If it is by design, then the designer is buggy :-)

In C, signed overflow is /always/ a mistake. There are no circumstances
in standard C programming in which it is not an error.

You may have reason to program in an extended dialect of C in which you
want signed overflow to have a specific behaviour - typically two's
complement wraparound, but possibly saturating arithmetic or run-time
traps or signals. That's also fine - but then you are not coding in
standard C. You are coding in "gcc C with -fwrapv" or something
similar. And you need to specify such details with your code, because
you are /not/ distributing "C source code" any more.
Post by bartc
If P is designed to run on the vast range of hardware where
signed overflows just wrap around, then it doesn't matter; the behaviour
will be consistent, for same width of 'int' at least.
That argument would have merit if you were talking about a language that
defines signed integer arithmetic in terms of the underlying hardware.
C does not do that.

Different languages have different ways of defining (or leaving
undefined) the behaviour of impossible or meaningless actions. If we
assume 16-bit int (because the numbers are easier), and have x = 32767,
then what should be the result of adding 1 to x? Standard mathematics
says it should be 32768, and anything else is wrong. Typical processors
say it should be -32768, because that is the easiest in the hardware
design. Some algorithms would much prefer 32767, because saturation is
"less wrong" than anything else. Some algorithms would prefer a NaN
type indicator, or jumping to some error routine.

C says "Adding 1 to x here is meaningless. There is no behaviour which
can be called 'correct'. So we don't give it a meaning. The compiler
can assume it never happens, if that lets it generate more efficient
code for correct cases".
Post by bartc
Until C decides to intervene and decides it should matter, and changes
the behaviour.
You have got that ass-backwards. C decides /not/ to intervene, decides
it does /not/ matter, and says there is no behaviour to change.
Post by bartc
(Take the 8-bit value 01111111 which is +127 when interpreted as signed.
Now increment it, which doesn't have signed/unsigned variations on most
hardware. You will get 10000000, which is -128 on two's complement
hardware (pretty much every machine). It just wraps.
Yet C has a problem with this.
No, C has no problem with this - /you/, and a few other equally mistaken
programmers, have problems with it. You appear to believe that C is not
a programming language, but merely a kind of high level assembly - and
you are wrong. C is not, and never has been, a high level assembler or
semi-portable assembler.
Post by bartc
And ironically, if you start with the 8-bit value 11111111, which is -1
signed, and increment that, you get zero (not unexpectedly), Yet here,
the hardware operation /will/ genuinely have overflowed - there will
have been a carry bit generated intended for the ninth bit of the result.)
The hardware details are irrelevant (which is good, because you are
wrong to generalise - on some systems, the extra carry bit is never
generated as there is no logic for it at all. And on many other
processors, it is usually immediately ignored and not available to the
programmer).
bartc
2017-04-24 14:38:53 UTC
Post by David Brown
Different languages have different ways of defining (or leaving
undefined) the behaviour of impossible or meaningless actions. If we
assume 16-bit int (because the numbers are easier), and have x = 32767,
then what should be the result of adding 1 to x? Standard mathematics
says it should be 32768, and anything else is wrong. Typical processors
say it should be -32768, because that is the easiest in the hardware
design. Some algorithms would much prefer 32767, because saturation is
"less wrong" than anything else. Some algorithms would prefer a NaN
type indicator, or jumping to some error routine.
C says "Adding 1 to x here is meaningless. There is no behaviour which
can be called 'correct'. So we don't give it a meaning. The compiler
can assume it never happens, if that lets it generate more efficient
code for correct cases".
You can apply the same argument to unsigned. With the same 16 bits,
what's the result of adding 1 to 65535? Mathematics says it should be
65536. Typical processors will give 0, as that is simplest. Some
algorithms prefer NaN, etc.

It's exactly the same issue!

The only difference might be that unsigned representation is more
widespread than two's complement. Is that the reason why C says it's a
no-no?

0.1% of machines don't use two's complement, therefore the other 99.9%
aren't allowed to assume it. Furthermore, if they do, then some C
compilers will endeavour to break your code.
Post by David Brown
Post by bartc
Yet C has a problem with this.
No, C has no problem with this - /you/, and a few other equally mistaken
programmers, have problems with it. You appear to believe that C is not
a programming language, but merely a kind of high level assembly - and
you are wrong. C is not, and never has been, a high level assembler or
semi-portable assembler.
That's a problem with C then.

Someone writes a program in X where signed overflow has a behaviour
consistent with two's complement. And they want to run it on machine Y
which uses two's complement. So far, so good.

But if they need to pass it through C, in order for X (which might just
be pseudo-code in the programmer's mind) to be translated into a form
that runs on Y, then the problems start.
--
bartc
David Brown
2017-04-24 15:31:05 UTC
Post by bartc
Post by David Brown
Different languages have different ways of defining (or leaving
undefined) the behaviour of impossible or meaningless actions. If we
assume 16-bit int (because the numbers are easier), and have x = 32767,
then what should be the result of adding 1 to x? Standard mathematics
says it should be 32768, and anything else is wrong. Typical processors
say it should be -32768, because that is the easiest in the hardware
design. Some algorithms would much prefer 32767, because saturation is
"less wrong" than anything else. Some algorithms would prefer a NaN
type indicator, or jumping to some error routine.
C says "Adding 1 to x here is meaningless. There is no behaviour which
can be called 'correct'. So we don't give it a meaning. The compiler
can assume it never happens, if that lets it generate more efficient
code for correct cases".
You can apply the same argument to unsigned. With the same 16 bits,
what's the result of adding 1 to 65535? Mathematics says it should be
65536. Typical processors will give 0, as that is simplest. Some
algorithms prefer NaN, etc.
It's exactly the same issue!
You certainly could. The C language designers felt it was useful to
define unsigned arithmetic as modulo arithmetic.

It would be entirely reasonable for a language designer to say that
signed integer overflow used two's complement wraparound. But the point
is, that the /C/ designers did not do so - they specifically and
explicitly made it undefined behaviour. It would be entirely reasonable
for a language designer to make unsigned overflow undefined behaviour -
again, the /C/ designers did not do so. They could also have made it
implementation defined, but chose not to. (Though an implementation is
always free to define the behaviour if it wants to.)

When designing your own language, you can make the choices /you/ want
here. Some languages use the same system as C. Others make different
decisions. Given that /any/ behaviour on overflow (signed or unsigned)
is going to be wrong in many circumstances, but some behaviours might be
useful in some cases, a language designer has plenty of options here.
He picks one, documents it, and makes that the behaviour for his
language. For different languages, the "best" choice may well be different.

So the C language designers decided on undefined behaviour for signed
overflow, and wraparound behaviour for unsigned overflow. That gives
programmers a useful set of features, it gives compilers opportunities
for some good optimisations, and it can be easily implemented on a wide
range of hardware (which in the early days of C included some machines
that would have a lot of inconvenience with any of the possible choices).


I am not trying to say that making signed integer overflow undefined
behaviour is the "right" choice (though I agree with the C designers in
this case). I am just trying to explain to you that that is how C
works. If you want to program in C, or make C tools, then you need to
understand the language as it really is - you can't just pretend it is
something else or works in a different way, just because you might have
preferred some differences. You can make these changes when you are
inventing the Bart-C variation of C - but not when you are writing /real/ C.
Post by bartc
The only difference might be that unsigned representation is more
widespread than two's complement. Is that the reason why C says it's a
no-no?
That may well have been a major consideration of the original C language
designers.
Post by bartc
0.1% of machines don't use two's complement, therefore the other 99.9%
aren't allowed to assume it. Furthermore, if they do, then some C
compilers will endeavour to break your code.
You misunderstand. The compiler will not "break" your code if you
assume that signed integers use two's complement wrap-around. Your code
is /already/ broken. It is incorrect C code - it is a sequence of
letters and symbols without a valid definition in C. The compiler does
not "break" it by assuming your signed integers don't overflow - your
signed integers are not allowed to overflow in C. The compiler does not
produce "correct" or "working" code when the relevant optimisations are
disabled - your C code has /no/ defined behaviour, and therefore /no/
generated code can be said to be "correct". It is a case of "garbage
in, garbage out" - depending on compiler options, you may get different
varieties of garbage out.

As the father of computing, Charles Babbage, put it:

On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into
the machine wrong figures, will the right answers come out?' I am not
able rightly to apprehend the kind of confusion of ideas that could
provoke such a question.
Post by bartc
Post by David Brown
Post by bartc
Yet C has a problem with this.
No, C has no problem with this - /you/, and a few other equally mistaken
programmers, have problems with it. You appear to believe that C is not
a programming language, but merely a kind of high level assembly - and
you are wrong. C is not, and never has been, a high level assembler or
semi-portable assembler.
That's a problem with C then.
No, it is a problem with /you/. You don't understand what C is - that
is /your/ problem.

You are not alone in having such problems with C - not everyone is able
to grasp how C works. But most people either learn C properly in the
end, or give up and either use a different programming language, or stop
programming. I cannot comprehend why you have such a stubborn
resistance to making this choice.
Post by bartc
Someone writes a program in X where signed overflow has a behaviour
consistent with two's complement. And they want to run it on machine Y
which uses two's complement. So far, so good.
That is fine, as long as "X" is not the C programming language (or any
other language that does not guarantee such overflow behaviour).
Post by bartc
But if they need to pass it through C, in order for X (which might just
be pseudo-code in the programmer's mind) to be translated into a form
that runs on Y, then the problems start.
It is not a problem in the slightest. If you are writing a translator
from language X in which signed overflow is defined, into C in which it
is not defined, then you have three choices:

1. Make it a requirement that the resulting C is treated as "C with
defined signed integer overflow" - insist on compilation with "-fwrapv"
flag in gcc and clang, and disallow compilers that do not have
/documented guarantees/ of this type of behaviour.

2. Make it a requirement that the resulting C is compiled on machines
with two's complement signed integers, and use unions or casts to
convert between signed integers and unsigned integers as necessary to
get the required behaviour. This will result in ugly C intermediary
code, but that should be of no relevance. It is likely that the code
will compile optimally on most compilers.

3. Generate the code in a manner that avoids signed integer overflow.
This might be a little fiddly, and certainly a bit messy in the
generated code - but it is intermediary code, and it is /allowed/ to be
messy.

What you don't get to do, is moan that C doesn't work the way you think
it ought to work, and claim it is C's problem. /You/ are defining
language X. /You/ are making the X-to-C compiler. It is /your/ problem.
Ben Bacarisse
2017-04-24 16:15:13 UTC
Permalink
Raw Message
Post by David Brown
Post by bartc
Post by David Brown
Different languages have different ways of defining (or leaving
undefined) the behaviour of impossible or meaningless actions. If we
assume 16-bit int (because the numbers are easier), and have x = 32767,
then what should be the result of adding 1 to x? Standard mathematics
says it should be 32768, and anything else is wrong. Typical processors
say it should be -32768, because that is the easiest in the hardware
design. Some algorithms would much prefer 32767, because saturation is
"less wrong" than anything else. Some algorithms would prefer a NaN
type indicator, or jumping to some error routine.
C says "Adding 1 to x here is meaningless. There is no behaviour which
can be called 'correct'. So we don't give it a meaning. The compiler
can assume it never happens, if that lets it generate more efficient
code for correct cases".
You can apply the same argument to unsigned.
Not in C. Unsigned arithmetic is defined. I think you meant you
*could* make the same argument about unsigned arithmetic by making it,
too, undefined on overflow. As things stand in C, you can't make the
same argument because the premise fails: there is behaviour that can be
called correct.

<snip>
Post by David Brown
It would be entirely reasonable for a language designer to say that
signed integer overflow used two's complement wraparound. But the point
is, that the /C/ designers did not do so - they specifically and
explicitly made it undefined behaviour. It would be entirely reasonable
for a language designer to make unsigned overflow undefined behaviour -
again, the /C/ designers did not do so. They could also have made it
implementation defined, but chose not to.
I think it was the committee that made it undefined (who, I suppose,
count as language designers too) but if we take K&R1 as the view of the
language designers, they chose to make it a slight variation on
"implementation defined":

"The action taken on overflow or underflow depends on the machine at
hand." (K&R 1st ed. page 38)

<snip>
--
Ben.
s***@casperkitty.com
2017-04-24 17:36:18 UTC
Permalink
Raw Message
Post by Ben Bacarisse
I think it was the committee that made it undefined (who, I suppose,
count as language designers too) but if we take K&R1 as the view of the
language designers, they chose to make it a slight variation on
"The action taken on overflow or underflow depends on the machine at
hand." (K&R 1st ed. page 38)
Note that Undefined Behavior explicitly lists as one of the possible
outcomes "behaving during translation or program execution in a documented
manner characteristic of the environment". The Standard does not specify
in what cases quality implementations *should* behave in such a fashion,
but it would seem curious that the authors would have mentioned it if they
did not expect that many implementations would behave in such fashion. The
grammatical construction does not indicate whether the sentence should be
interpreted as:

- behaving in a fashion which an environment documents as a characteristic.

- behaving in a fashion which is documented by the implementation and
happens to be characteristic of the environment (which may or may not
document it).

Given the ambiguity, I would suggest that in cases where an environment
documents a characteristic behavior, any quality implementation whose
behavior would differ should document such difference.
Keith Thompson
2017-04-24 19:06:13 UTC
Permalink
Raw Message
Post by s***@casperkitty.com
Post by Ben Bacarisse
I think it was the committee that made it undefined (who, I suppose,
count as language designers too) but if we take K&R1 as the view of the
language designers, they chose to make it a slight variation on
"The action taken on overflow or underflow depends on the machine at
hand." (K&R 1st ed. page 38)
And on page 185, in the "C Reference Manual" appendix:

The handling of overflow and divide check in expression evaluation
is machine-dependent. All existing implementations of C ignore
integer overflows; treatment of division by 0, and all
floating-point exceptions, varies between machines, and is usually
adjustable by a library function.

Which implies, I think, that INT_MAX + 1 should yield INT_MIN on a
machine with 2's-complement arithmetic (assuming those names have their
modern meanings). K&R1 says that the action depends on the machine, not
on the compiler. The statement is not very specific, so it would be
difficult to say that a compiler that generated code that behaves
differently was non-conforming, but that seems to have been the intent.

On the other hand, I might argue that the statement that "All existing
implementations of C ignore integer overflows" implies that some future
implementation might behave differently, perhaps treating signed integer
overflow as a fatal run-time error.
Post by s***@casperkitty.com
Note that Undefined Behavior explicitly lists as one of the possible
outcomes "behaving during translation or program execution in a documented
manner characteristic of the environment". The Standard does not specify
in what cases quality implementations *should* behave in such a fashion,
but it would seem curious that the authors would have mentioned it if they
did not expect that many implementations would behave in such fashion. The
grammatical construction does not indicate whether the sentence should be
- behaving in a fashion which an environment documents as a characteristic.
- behaving in a fashion which is documented by the implementation and
happens to be characteristic of the environment (which may or may not
document it).
Given the ambiguity, I would suggest that in cases where an environment
documents a characteristic behavior, any quality implementation whose
behavior would differ should document such difference.
I would suggest that *the language changed*. The statement in K&R1
was superseded by ANSI C's statement that signed integer overflow
has undefined behavior. K&R1 did not have the modern concept of
"undefined behavior". This may have broken some existing code that
depended on 2's-complement wraparound. Them's the breaks.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2017-04-24 21:03:08 UTC
Permalink
Raw Message
Post by Keith Thompson
On the other hand, I might argue that the statement that "All existing
implementations of C ignore integer overflows" implies that some future
implementation might behave differently, perhaps treating signed integer
overflow as a fatal run-time error.
Indeed. I would expect that some compilers probably had such options even
in 1989, since there are purposes for which they would be useful. There is
a huge difference, though, between saying that implementations need not
behave in a particular fashion in cases where the implementer judges that
there are compelling reasons to do otherwise, and saying that code which
relies upon platform-specific behavior should be expected to work on
implementations where there is no particular reason why it shouldn't.
Post by Keith Thompson
I would suggest that *the language changed*. The statement in K&R1
was superseded by ANSI C's statement that signed integer overflow
has undefined behavior. K&R1 did not have the modern concept of
"undefined behavior". This may have broken some existing code that
depended on 2's-complement wraparound. Them's the breaks.
The stated purpose of C89 was to be compatible with existing programs. In
cases where some platforms defined a consistent behavior, and some programs
relied upon it, I see no reason to believe that the authors of the Standard
meant to imply that quality general-purpose implementations for platforms
that define the behaviors should not continue to treat them as defined, nor
even that such code should be regarded as deprecated in cases where they
did not define a better alternative (the latter having happened with things
like variadic-argument processing). The notion that the authors of the
Standard meant to throw compatibility with such techniques out the window
is a much more recent invention, and contradicts the express stated purpose
of the Standard.
David Brown
2017-04-25 09:09:38 UTC
Permalink
Raw Message
Post by Ben Bacarisse
Post by David Brown
Post by bartc
Post by David Brown
Different languages have different ways of defining (or leaving
undefined) the behaviour of impossible or meaningless actions. If we
assume 16-bit int (because the numbers are easier), and have x = 32767,
then what should be the result of adding 1 to x? Standard mathematics
says it should be 32768, and anything else is wrong. Typical processors
say it should be -32768, because that is the easiest in the hardware
design. Some algorithms would much prefer 32767, because saturation is
"less wrong" than anything else. Some algorithms would prefer a NaN
type indicator, or jumping to some error routine.
C says "Adding 1 to x here is meaningless. There is no behaviour which
can be called 'correct'. So we don't give it a meaning. The compiler
can assume it never happens, if that lets it generate more efficient
code for correct cases".
You can apply the same argument to unsigned.
Not in C. Unsigned arithmetic is defined. I think you meant you
*could* make the same argument about unsigned arithmetic by making it,
too, undefined on overflow. As things stand in C, you can't make the
same argument because the premise fails: there is behaviour that can be
called correct.
<snip>
Post by David Brown
It would be entirely reasonable for a language designer to say that
signed integer overflow used two's complement wraparound. But the point
is, that the /C/ designers did not do so - they specifically and
explicitly made it undefined behaviour. It would be entirely reasonable
for a language designer to make unsigned overflow undefined behaviour -
again, the /C/ designers did not do so. They could also have made it
implementation defined, but chose not to.
I think it was the committee that made it undefined (who, I suppose,
count as language designers too) but if we take K&R1 as the view of the
language designers, they chose to make it a slight variation on
"The action taken on overflow or underflow depends on the machine at
hand." (K&R 1st ed. page 38)
<snip>
I don't know the early history of C as well as you (and a number of
other group regulars) do. I used the term "language designers" in a
rather general sense - including both the original K&R authors, and the
committee(s) through the years.
s***@casperkitty.com
2017-04-24 17:25:37 UTC
Permalink
Raw Message
Post by David Brown
You certainly could. The C language designers felt it was useful to
define unsigned arithmetic as modulo arithmetic.
It would be entirely reasonable for a language designer to say that
signed integer overflow used two's complement wraparound. But the point
is, that the /C/ designers did not do so - they specifically and
explicitly made it undefined behaviour. It would be entirely reasonable
for a language designer to make unsigned overflow undefined behaviour -
again, the /C/ designers did not do so. They could also have made it
implementation defined, but chose not to. (Though an implementation is
always free to define the behaviour if it wants to.)
Would it have made sense for implementations targeting hardware with
integer semantics other than silent-wraparound two's-complement to generate
whatever extra code was necessary to emulate such behavior? In most cases,
it probably wouldn't, and it would have been bad for the Standard to mandate
such behavior in cases where it wouldn't make sense.

Would it be useful for some implementations targeting hardware with silent-
wraparound integer semantics to offer modes which emulate other semantics
(e.g. deterministically force Abnormal Program Termination in case of
overflow)? I would say it would, and it is likewise useful that the Standard
allows such implementations.

Does either of the above imply that the authors of the Standard would not
have expected and intended that quality general-purpose implementations
targeting platforms with silent-wraparound hardware should allow programmers
to reap the benefits of such hardware except when asked to do otherwise?
Jorgen Grahn
2017-04-27 05:51:02 UTC
Permalink
Raw Message
On Mon, 2017-04-24, bartc wrote:
...
Post by bartc
You can apply the same argument to unsigned. With the same 16 bits,
what's the result of adding 1 to 65535? Mathematics says it should be
65536.
Depends on what kind of mathematics. This kind says 0:

https://en.wikipedia.org/wiki/Group_%28mathematics%29#Modular_arithmetic

/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Richard Bos
2017-05-01 09:08:36 UTC
Permalink
Raw Message
Post by Jorgen Grahn
...
Post by bartc
You can apply the same argument to unsigned. With the same 16 bits,
what's the result of adding 1 to 65535? Mathematics says it should be
65536.
https://en.wikipedia.org/wiki/Group_%28mathematics%29#Modular_arithmetic
And, crucially, that's the kind of arithmetic unsigned integers work in,
in C.

(Note also that the reason for the difference in treatment of signed and
unsigned integers is obvious: there is no equivalent for unsigned
representations of the distinction between 2'C, 1'C and S/M. The only
different kind of unsigned representation you _can_ reasonably have is
saturation; and that is unnatural and massively expensive to every kind
of machine representation, including even mechanical odometers.)

Richard
bartc
2017-05-01 09:48:05 UTC
Permalink
Raw Message
Post by Richard Bos
Post by Jorgen Grahn
...
Post by bartc
You can apply the same argument to unsigned. With the same 16 bits,
what's the result of adding 1 to 65535? Mathematics says it should be
65536.
https://en.wikipedia.org/wiki/Group_%28mathematics%29#Modular_arithmetic
And, crucially, that's the kind of arithmetic unsigned integers work in,
in C.
(Note also that the reason for the difference in treatment of signed and
unsigned integers is obvious: there is no equivalent for unsigned
representations of the distinction between 2'C, 1'C and S/M.
OK. Where can I buy a machine that uses ones' complement or signed
magnitude integers? Or uses padding bits?

Because I'd really like to try out my C code and see that all the fuss
about dealing with signed integers was worthwhile.
--
bartc
Ben Bacarisse
2017-05-01 10:39:42 UTC
Permalink
Raw Message
Post by bartc
Post by Richard Bos
Post by Jorgen Grahn
...
Post by bartc
You can apply the same argument to unsigned. With the same 16 bits,
what's the result of adding 1 to 65535? Mathematics says it should be
65536.
https://en.wikipedia.org/wiki/Group_%28mathematics%29#Modular_arithmetic
And, crucially, that's the kind of arithmetic unsigned integers work in,
in C.
(Note also that the reason for the difference in treatment of signed and
unsigned integers is obvious: there is no equivalent for unsigned
representations of the distinction between 2'C, 1'C and S/M.
OK. Where can I buy a machine that uses ones' complement or signed
magnitude integers? Or uses padding bits?
Padding bits are easy: just declare a _Bool object. I don't think
anyone makes sign and magnitude or ones' complement machines anymore.
You'd have to buy a used one.
Post by bartc
Because I'd really like to try out my C code and see that all the fuss
about dealing with signed integers was worthwhile.
What fuss? I've never been bothered by those representations. What is
it about them that bother you?
--
Ben.
Noob
2017-05-01 11:54:50 UTC
Permalink
Raw Message
Post by bartc
OK. Where can I buy a machine that uses ones' complement or signed
magnitude integers? Or uses padding bits?
You'll probably need a time machine. C was created in the early 70s.
And the first version of the standard is nearly 30 years old.

https://en.wikipedia.org/wiki/Ones'_complement
https://en.wikipedia.org/wiki/UNIVAC_1100/2200_series#UNISYS_2200_series
Post by bartc
Many early computers, including the CDC 6600, the LINC, the PDP-1,
and the UNIVAC 1107, used ones' complement notation. Successors of
the CDC 6600 continued to use ones' complement until the late 1980s,
and the descendants of the UNIVAC 1107 (the UNIVAC 1100/2200 series)
still do, but the majority of modern computers use two's complement.
s***@casperkitty.com
2017-04-24 15:51:33 UTC
Permalink
Raw Message
Post by David Brown
In C, signed overflow is /always/ a mistake. There are no circumstances
in standard C programming in which it is not an error.
Was it a mistake when Dennis Ritchie exploited it in the printf routine
given in the 1974 C Reference Manual?
Post by David Brown
That argument would have merit if you were talking about a language that
defines signed integer arithmetic in terms of the underlying hardware.
C does not do that.
The C Standard *allows*, but does not require, implementations to define
things in terms of underlying platform behavior except when that would be
inconsistent with other requirements in the Standard. Implementations that
are designed for low-level programming will often do so.

When the Standard was written, there existed code that relied upon certain
behaviors like two's-complement wraparound, and there existed platforms
where guaranteeing that behavior would be expensive. The authors of the
Standard didn't say that implementations that do something other than two's-
complement are "broken", but nor did they say that there is anything "broken"
about programs that rely upon it, beyond the fact that such code will not be
portable to implementations that do something else.

The Standard uses the phrase "non-portable or erroneous" to describe UB. The
use of the conjunction "OR" allows for the possibility that the behavior may
be non-portable without being erroneous.

The rationale given for the promotion rules for short unsigned values makes explicit mention
of the fact that the majority of then-current implementations used silent
wraparound two's-complement semantics and mentions situations in which this
is useful. I find it curious that they would mention such situations if they
would regard as "broken" any code that would exploit them.
Keith Thompson
2017-04-24 19:13:12 UTC
Permalink
Raw Message
Post by s***@casperkitty.com
Post by David Brown
In C, signed overflow is /always/ a mistake. There are no circumstances
in standard C programming in which it is not an error.
Was it a mistake when Dennis Ritchie exploited it in the printf routine
given in the 1974 C Reference Manual?
No. C as it existed at that time did not have the concept of undefined
behavior. It was not what we now call "standard C".

For a modern implementation, it would not be a mistake if the author
knows how the implementation happens to behave; since printf is part of
the implementation, the only requirement is that it works as specified.

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Richard Bos
2017-05-01 09:11:46 UTC
Permalink
Raw Message
Post by s***@casperkitty.com
Post by David Brown
In C, signed overflow is /always/ a mistake. There are no circumstances
in standard C programming in which it is not an error.
Was it a mistake when Dennis Ritchie exploited it in the printf routine
given in the 1974 C Reference Manual?
No, because printf() is an implementation library function, and the
implementation itself need not be written in Standard C in the first
place. Much of it cannot.

Richard
Tim Rentsch
2017-04-27 21:04:47 UTC
Permalink
Raw Message
Post by Andrey Tarasevich
Post by bartc
That's not a compiler failure - the code is broken. Signed integer
overflow has undefined behaviour, and therefore, gcc is allowed to
assume that it doesn't ever happen. It's allowed to do so because, when
it _does_ happen, any result is allowed, including pretending that it
didn't happen.
Nevertheless, it desirable to have consistent behaviour across
optimisation levels.
...
That makes no sense whatsoever.
Undefined behavior is unstable by definition. [...]
That isn't right. You might say undefined behavior
has the _potential_ to be unstable by definition, but
that's not the same thing. Kind of like weather: there
can be a 10% chance that it /will/ rain, but the chance
that it /might/ rain is always 100% (or maybe 0% on one
of the outer planets, but you get the idea).
James Harris
2017-04-22 09:57:44 UTC
Permalink
Raw Message
Post by Richard Bos
Post by James Harris
I saw this posted elsewhere. The gcc optimiser evidently changes the
behaviour of the code relative to its non-optimised effect.
int main(int argc, char **argv) {
int i;
for (i = 0; i < i + 1; i++)
;
return 0;
}
That's not a compiler failure - the code is broken. Signed integer
overflow has undefined behaviour, and therefore, gcc is allowed to
assume that it doesn't ever happen. It's allowed to do so because, when
it _does_ happen, any result is allowed, including pretending that it
didn't happen.
Try changing the int to an unsigned int and run your tests again. If it
optimises the check into non-existence again, _then_ gcc is at fault.
Tried that and the non-optimised and optimised do work the same way.

IMO gcc is still at fault (and I use the term for a preference, not a
technical failure) for not reporting, by default, that the optimiser was
making an assumption. It doesn't seem even to have a way to turn on
warnings for all assumptions that the optimiser is making. Maybe there
is and I've just not found it yet. But ATM it seems there are loads of
individual switches which must be supplied on the command line if you
want to catch all recognisable cases of the optimiser making assumptions
that could change code behaviour.
--
James Harris
Gareth Owen
2017-04-22 10:38:36 UTC
Permalink
Raw Message
Post by James Harris
IMO gcc is still at fault (and I use the term for a preference, not a
technical failure) for not reporting, by default, that the optimiser
was making an assumption.
It's not an optimiser assumption, its a contract the programmer
willingly enters into when they chose to write (correct) C.
Kenny McCormack
2017-04-22 13:43:58 UTC
Permalink
Raw Message
Post by Gareth Owen
Post by James Harris
IMO gcc is still at fault (and I use the term for a preference, not a
technical failure) for not reporting, by default, that the optimiser
was making an assumption.
It's not an optimiser assumption, its a contract the programmer
willingly enters into when they chose to write (correct) C.
That is a stupid thing to say. Quite in keeping with this newsgroup, but
utterly useless and insulting to the OP. Congrats.
--
Rich people pay Fox people to convince middle class people to blame poor people.

(John Fugelsang)
Gareth Owen
2017-04-22 17:22:21 UTC
Permalink
Raw Message
Post by Kenny McCormack
Post by Gareth Owen
Post by James Harris
IMO gcc is still at fault (and I use the term for a preference, not a
technical failure) for not reporting, by default, that the optimiser
was making an assumption.
It's not an optimiser assumption, its a contract the programmer
willingly enters into when they chose to write (correct) C.
That is a stupid thing to say
No it isn't. The assumption is that there's some integer for which
(i < i + 1)
is not true.

And in C, as in actual mathematics, that's a **wrong** assumption.

And UB is UB for the entire program. If you're unprepared to deal with
the consequences of that, then you're going to get confused.
Kenny McCormack
2017-04-22 17:48:03 UTC
Permalink
Raw Message
Post by Gareth Owen
Post by Kenny McCormack
Post by Gareth Owen
Post by James Harris
IMO gcc is still at fault (and I use the term for a preference, not a
technical failure) for not reporting, by default, that the optimiser
was making an assumption.
It's not an optimiser assumption, its a contract the programmer
willingly enters into when they chose to write (correct) C.
That is a stupid thing to say
No it isn't.
It is. You're just too stupid to realize it.

It's like telling a Trump voter he's stupid. He won't agree with you.
--
A 70 year old man who watches 6 hours of TV a day, plays a lot of golf
and seems to always be in Florida is a retiree, not a president.
s***@casperkitty.com
2017-04-22 21:24:04 UTC
Permalink
Raw Message
Post by Gareth Owen
It's not an optimiser assumption, its a contract the programmer
willingly enters into when they chose to write (correct) C.
Unlike most "standards", the C Standard does not define conformance of C
programs in a fashion that is useful for most purposes. It defines a
category of "strictly conforming" programs which can't do many of the
tasks required of many practical C programs, or "conforming" programs,
a term which is so broad as to be almost meaningless. If the above code
were targeted at a particular implementation that specified that automatic
variables will initially hold whatever bit pattern was in their stack slot
when the function was called, and that adding +1 to INT_MAX would yield
INT_MIN, then the code would be non-portable but not "erroneous".

For some reason, compiler writers seem to assume that when the Standard says
"non-portable or erroneous", it means "broken". The above code makes
assumptions about signed integer behavior which would limit its use to
compilers that perform integer operations in very "literal" fashion. In
many cases, compilers will be able to generate more efficient code if they
are allowed to behave as though integer expressions may be evaluated with
excess range and then truncated to the specified size (or not) at the
compiler's convenience.

There are three levels of behavioral guarantees typical compilers without
documented overflow traps may offer with regard to integer overflow:

1. They may guarantee that all integer expressions will be immediately
truncated to a value of the appropriate type, in a fashion analogous to
unsigned types.

2. They may guarantee that the expressions will behave like numbers whose
lower (type size) bits are correct, but which may behave as though they
have additional upper bits that might be kept or dropped at any point
at the compiler's leisure.

3. They may reserve the right to behave in any manner whatsoever, including
corrupting unrelated parts of the program.

The code as written requires a compiler that upholds guarantee #1. If the
expression were rewritten as (int)(i+1) > i, and if a system guaranteed that
casting to "int" an integer value beyond the range of that type will
chop off and sign-extend the lower (int-sized) bits, then it would work with
compilers that uphold either #1 or #2; it would also be more visually
obvious that the programmer wants the program to do something which
behaves differently from the normal rules of arithmetic.
j***@gmail.com
2017-04-27 10:03:14 UTC
Permalink
Raw Message
Post by Gareth Owen
Post by James Harris
IMO gcc is still at fault (and I use the term for a preference, not a
technical failure) for not reporting, by default, that the optimiser
was making an assumption.
It's not an optimiser assumption, its a contract the programmer
willingly enters into when they chose to write (correct) C.
I never signed it, and I've been programming in C for a pretty long time.

There is a difference between implementation defined and undefined.

--
Joel Rees
James Kuyper
2017-04-27 14:12:03 UTC
Permalink
Raw Message
Post by j***@gmail.com
Post by Gareth Owen
Post by James Harris
IMO gcc is still at fault (and I use the term for a preference, not a
technical failure) for not reporting, by default, that the optimiser
was making an assumption.
It's not an optimiser assumption, its a contract the programmer
willingly enters into when they chose to write (correct) C.
I never signed it, and I've been programming in C for a pretty long time.
You don't have to sign it, and you're therefore not legally bound by it.
Calling it a contract is just a metaphor. If you provide code that obeys
the rules imposed by the C standard, a conforming implementation of C
must produce behavior consistent with the requirements imposed by that
standard for the code. There's no requirement that you actually obey
those rules (and this code doesn't), nor is there any requirement that
any particular implementation of C conform to the standard.
Richard Bos
2017-04-22 21:48:53 UTC
Permalink
Raw Message
Post by James Harris
IMO gcc is still at fault (and I use the term for a preference, not a
technical failure) for not reporting, by default, that the optimiser was
making an assumption. It doesn't seem even to have a way to turn on
warnings for all assumptions that the optimiser is making.
Have you read the third part of the article Andrew Cooper posted? It
explains this quite clearly: it is often not even possible to do this
because so many of those assumptions interact, and where it _is_
possible, it almost certainly does not give you the clear messages you
expect. It really is much more complicated than you presume it to be.
Maybe in such simple demonstration code as you posted you can force a
meaning, but certainly in any realistic code base, you're trying to
combat the tide.

Richard
James Harris
2017-04-24 09:14:16 UTC
Permalink
Raw Message
Post by Richard Bos
Post by James Harris
IMO gcc is still at fault (and I use the term for a preference, not a
technical failure) for not reporting, by default, that the optimiser was
making an assumption. It doesn't seem even to have a way to turn on
warnings for all assumptions that the optimiser is making.
Have you read the third part of the article Andrew Cooper posted? It
explains this quite clearly: it is often not even possible to do this
because so many of those assumptions interact, and where it _is_
possible, it almost certainly does not give you the clear messages you
expect.
Well, the summary from part 3 is:

People often ask why the compiler doesn't produce warnings when it is
taking advantage of undefined behavior to do an optimization, since any
such case might actually be a bug in the user code. The challenges with
this approach are that it is 1) likely to generate far too many warnings
to be useful - because these optimizations kick in all the time when
there is no bug, 2) it is really tricky to generate these warnings only
when people want them, and 3) we have no good way to express (to the
user) how a series of optimizations combined to expose the opportunity
being optimized.

My response(s).

(1. Likely to generate too many warnings.) That is surely not a reason
to deny the programmer the option. Similar could have been said of the
original C compiler, but people liked "lint". Also, he says that the LLVM
IR doesn't carry enough info from the frontend. Of course, it could
carry more than it does. And it might be up to the C frontend to flag up
UB code. After all, the frontend knows, in C terms, what is defined and
what is not.

(2. Tricky to generate only when wanted.) Simple: make the warnings
"turn offable". Then a programmer can choose a compilation run to
generate and investigate such warnings. That would/could restrict the
warnings to the specific TUs which were compiled with such warnings
enabled.

(3. No good way to report.) Not sure about this one. I don't think
anyone would envisage detailed reports such as the one he gives as an
example.
Post by Richard Bos
It really is much more complicated than you presume it to be.
Maybe in such simple demonstration code as you posted you can force a
meaning, but certainly in any realistic code base, you're trying to
combat the tide.
Are you saying that realistic code contains lots of fragments that will
lead to UB? That surprises me a little. I'm sure others have suggested
that programmers should avoid UB code.

Maybe it's personal preference. I find compiler warnings cut down
development time. Am quite happy to do more work and to go by the rules
if it helps produce robust code.
--
James Harris
bartc
2017-04-24 09:33:13 UTC
Permalink
Raw Message
Post by James Harris
Post by Richard Bos
It really is much more complicated than you presume it to be.
Maybe in such simple demonstration code as you posted you can force a
meaning, but certainly in any realistic code base, you're trying to
combat the tide.
Are you saying that realistic code contains lots of fragments that will
lead to UB? That surprises me a little. I'm sure others have suggested
that programmers should avoid UB code.
If A and B are of type int (they could be expressions), then any
operation like:

A + B

could have undefined behaviour, if the result would be greater than
INT_MAX (or less than INT_MIN).

As I understand it anyway.

If a compiler gave a warning for all such possibilities, then every
program would be flooded with them.
Post by James Harris
Maybe it's personal preference. I find compiler warnings cut down
development time. Am quite happy to do more work and to go by the rules
if it helps produce robust code.
But sometimes you can't get rid of them. With one compiler, it warned
about unused labels, with no way to turn off the warning. But this was
with auto-generated code that /did/ produce lots of such labels. Nothing
wrong with the code, as it would already have been verified.

Sometimes you evade a warning for one compiler, by doing something which
triggers one in another!
--
bartc
David Brown
2017-04-24 11:32:23 UTC
Permalink
Raw Message
Post by James Harris
Post by Richard Bos
Post by James Harris
IMO gcc is still at fault (and I use the term for a preference, not a
technical failure) for not reporting, by default, that the optimiser was
making an assumption. It doesn't seem even to have a way to turn on
warnings for all assumptions that the optimiser is making.
Have you read the third part of the article Andrew Cooper posted? It
explains this quite clearly: it is often not even possible to do this
because so many of those assumptions interact, and where it _is_
possible, it almost certainly does not give you the clear messages you
expect.
People often ask why the compiler doesn't produce warnings when it is
taking advantage of undefined behavior to do an optimization, since any
such case might actually be a bug in the user code. The challenges with
this approach are that it is 1) likely to generate far too many warnings
to be useful - because these optimizations kick in all the time when
there is no bug, 2) it is really tricky to generate these warnings only
when people want them, and 3) we have no good way to express (to the
user) how a series of optimizations combined to expose the opportunity
being optimized.
My response(s).
(1. Likely to generate too many warnings.) That is surely not a reason
to give a programmer the option. Similar could have been said of the
original C compiler but people liked "lint". Also, he says that the LLVM
IR doesn't carry enough info from the frontend. Of course, it could
carry more than it does. And it might be up to the C frontend to flag up
UB code. After all, the frontend know in C terms what is defined and
what is not.
(2. Tricky to generate only when wanted.) Simple: make the warnings
"turn offable". Then a programmer can choose a compilation run to
generate and investigate such warnings. That would/could restrict the
warnings to the specific TUs which were compiled with such warnings
enabled.
I think it has already been established that gcc at least /has/ a
configurable warning that will catch your example:

-Wstrict-overflow=5

Another warning that will catch this case is:

-Wsuggest-attribute=noreturn

(this is good for spotting accidental infinite loops)
Scott Lurndal
2017-04-24 13:47:19 UTC
Permalink
Raw Message
Post by James Harris
I saw this posted elsewhere. The gcc optimiser evidently changes the
behaviour of the code relative to its non-optimised effect.
int main(int argc, char **argv) {
int i;
for (i = 0; i < i + 1; i++)
;
return 0;
}
The compiler generates the following with -O2:

0000000000400400 <main>:
400400: eb fe jmp 400400 <main>
400402: 66 90 xchg %ax,%ax


The compiler could just as easily elide the loop, as the value of
'i' is never subsequently used.
Noob
2017-04-26 06:53:05 UTC
Permalink
Raw Message
Post by James Harris
I saw this posted elsewhere. The gcc optimiser evidently changes the
behaviour of the code relative to its non-optimised effect.
int main(int argc, char **argv) {
int i;
for (i = 0; i < i + 1; i++)
;
return 0;
}
I asked a non-programmer friend what he thought of this pseudo-code:

int i <- 0;
while i < i+1
i <- i+1;

He replied:
"Your algorithm never terminates, the condition is always true."

So, you see, your expectations are "tainted" by your knowledge
of the underlying hardware.

Regards.