Discussion:
Casts/conversions
James Harris
2017-07-04 19:08:20 UTC
On 04/07/2017 17:03, James Harris wrote:

...

(Copying in comp.lang.c re. a potentially abstruse query about integer
promotions, below.)

Starting with c as a (signed) char value of -2.
So (unsigned char) c would presumably convert -2 into 0xfe and then
leave it for the integer promotions to convert to 0x000000fe.
What about (unsigned) c instead? Would that cause c to be promoted to
int and then made unsigned? If so, it would produce a very different -
and, in this case, wrong - result of 0xfffffffe.
Maybe (unsigned char) would be the best/right/only thing to do when
using a signed char as an index from the start of an array of 256 bytes.
Just tried it on gcc with the following code

#include <stdio.h>
int main(void) {
char c = -2;
printf("%08x, %08x, %08x\n", (int)c, (unsigned)c, (unsigned char)c);
}

And I'm glad to say it's borne out..! Output:

fffffffe, fffffffe, 000000fe

But what's happening here with the last (0xfe) output (which is the only
one which is as desired)? Could it be that rather than the char simply
being cast to an unsigned char the integer promotions come in
conceptually before and after casting, leading to the following?

1. The original signed char is promoted to signed int 0xfffffffe.

2. The cast is applied, making it an unsigned char with value 0xfe.

3. Unsigned char 0xfe is promoted to unsigned int 0x000000fe.

4. printf is called with that unsigned promoted value.

I am thinking that the promotions would be done theoretically, in the
type system of the compiler. They would not be machine-code conversion
steps.
--
James Harris
bartc
2017-07-04 19:39:57 UTC
Post by James Harris
Just tried it on gcc with the following code
#include <stdio.h>
int main(void) {
char c = -2;
printf("%08x, %08x, %08x\n", (int)c, (unsigned)c, (unsigned char)c);
}
fffffffe, fffffffe, 000000fe
But what's happening here with the last (0xfe) output (which is the only
one which is as desired)? Could it be that rather than the char simply
being cast to an unsigned char the integer promotions come in
conceptually before and after casting, leading to the following?
Yes, I think that's what happens. A char type needs to be widened to
int-size before doing anything with it. And any cast that results in
something narrower than int always needs widening, or promoting.

(BTW you only get that output when plain char is signed.)
--
bartc
Ben Bacarisse
2017-07-05 00:37:31 UTC
Post by bartc
Post by James Harris
Just tried it on gcc with the following code
#include <stdio.h>
int main(void) {
char c = -2;
printf("%08x, %08x, %08x\n", (int)c, (unsigned)c, (unsigned char)c);
}
fffffffe, fffffffe, 000000fe
But what's happening here with the last (0xfe) output (which is the
only one which is as desired)? Could it be that rather than the char
simply being cast to an unsigned char the integer promotions come in
conceptually before and after casting, leading to the following?
Yes, I think that's what happens.
No. The integer promotions are only applied in specific situations and
the operand of a cast operator is /not/ one of them. It makes no
difference here, but James is asking about what happens conceptually.
Here, the signed char value -2 is converted to unsigned char value 254
and that is then promoted to signed int value 254 as part of the default
argument promotions (because printf is variadic).
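A minimal sketch of those two steps, assuming CHAR_BIT is 8 and plain
char is signed (variable names here are just for illustration):

#include <stdio.h>

int main(void) {
    signed char c = -2;
    unsigned char uc = (unsigned char)c; /* conversion: -2 + (UCHAR_MAX+1) == 254 */
    int promoted = uc;                   /* what the default argument promotion yields */
    printf("%d %d\n", uc, promoted);     /* uc promotes to int here too; prints "254 254" */
    return 0;
}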
Post by bartc
A char type needs to be widened to int-size before doing anything with
it. And any cast that results in something narrower than int always
needs widening, or promoting.
That is very often what happens but you can't say always like that.
--
Ben.
Keith Thompson
2017-07-04 20:32:30 UTC
Post by James Harris
...
(Copying in comp.lang.c re. a potentially abstruse query about integer
promotions, below.)
Starting with c as a (signed) char value of -2.
Note that plain "char" can be either signed or unsigned.
Post by James Harris
So (unsigned char) c would presumably convert -2 into 0xfe and then
leave it for the integer promotions to convert to 0x000000fe.
There are a few implicit assumptions here: that CHAR_BIT==8, that int
is 32 bits, and that plain char is signed. Nothing wrong
with assumptions, but I prefer them to be explicit.

Note that 0xfe, 0x000000fe, and 254 all mean exactly the same thing.
Leading zeros in a hexadecimal constant are ignored. I understand
that the intent is that 0xfe is the value 254 stored in 8 bits,
and 0x000000fe is the value 254 stored in 32 bits. There's no
really good way to express that in C notation.
Post by James Harris
What about (unsigned) c instead? Would that cause c to be promoted to
int and then made unsigned? If so, it would produce a very different -
and, in this case, wrong - result of 0xfffffffe.
No. The integer promotions (such as char to int) are applied only
when the standard says they are. They're applied to the operands of
arithmetic operators, but not to the operand of a cast operator.
`(unsigned)c` converts the value of c (which is of type char)
to type unsigned.
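A short sketch of that distinction, assuming 8-bit signed char and
32-bit int:

#include <stdio.h>

int main(void) {
    signed char a = 100, b = 100;
    int sum = a + b;          /* integer promotions: both operands become int, so sum == 200 */
    unsigned u = (unsigned)a; /* no prior promotion: one direct conversion to unsigned */
    printf("%d %u\n", sum, u);
    return 0;
}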
Post by James Harris
Maybe (unsigned char) would be the best/right/only thing to do when
using a signed char as an index from the start of an array of 256 bytes.
Just tried it on gcc with the following code
#include <stdio.h>
int main(void) {
char c = -2;
printf("%08x, %08x, %08x\n", (int)c, (unsigned)c, (unsigned char)c);
}
fffffffe, fffffffe, 000000fe
But what's happening here with the last (0xfe) output (which is the only
one which is as desired)? Could it be that rather than the char simply
being cast to an unsigned char the integer promotions come in
conceptually before and after casting, leading to the following?
1. The original signed char is promoted to signed int 0xfffffffe.
No, it's left as a char with the value -2. (0xfe and -2 are distinct
values.)
Post by James Harris
2. The cast is applied, making it an unsigned char with value 0xfe.
Yes.
Post by James Harris
3. Unsigned char 0xfe is promoted to unsigned int 0x000000fe.
No, it's promoted to int. It's an argument to a variadic function.
It's promoted to int if int can hold all the values of its type (which
is the case here, given the assumptions given above); otherwise it's
promoted to unsigned int. The most common case for the latter is that
unsigned short is promoted to unsigned int *if* short and int have the
same size. (Slightly oversimplified, ignoring padding bits.)
Post by James Harris
4. printf is called with that unsigned promoted value.
printf is called with an int value, the result of promoting the unsigned
char value converted from the original char value.

Corresponding signed and unsigned types are interchangeable as function
arguments for values within the range of both.

The code in question, again, is:

char c = -2;
printf("%08x, %08x, %08x\n", (int)c, (unsigned)c, (unsigned char)c);

Since "%x" requires an argument of type unsigned, I'd just give it one.

printf("%08x\n", (unsigned)c);

though the other forms are useful for exploring what happens when you
use something of a different type.
Post by James Harris
I am thinking that the promotions would be done theoretically, in the
type system of the compiler. They would not be machine-code conversion
steps.
Yes, probably. The standard specifies the results of conversions, not
how they're performed. In many cases, integer conversions can be done
just by reinterpreting the representation. Conversions from one size to
another might involve truncation, zero-extension, or sign-extension.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
James Harris
2017-07-05 07:45:49 UTC
...
Post by Keith Thompson
What about (unsigned) c instead? Would that cause c to be promoted to
int and then made unsigned? If so, it would produce a very different -
and, in this case, wrong - result of 0xfffffffe.
No. The integer promotions (such as char to int) are applied only
when the standard says they are.
OK.
Post by Keith Thompson
They're applied to the operands of
arithmetic operators, but not to the operand of a cast operator.
`(unsigned)c` converts the value of c (which is of type char)
to type unsigned.
But it's notable how it effects that conversion. It could go in one of
two ways

signed char --> unsigned char --> unsigned int
signed char --> signed int --> unsigned int

In other words, the cast has two ostensible changes to make to convert
from signed char to unsigned int. It could change to an unsigned form
first and then widen. Or it could widen first and then change to an
unsigned form. The results would be either of

0x000000fe
0xfffffffe

The standards docs I've looked at don't specify which path would be
taken - only that a conversion would be carried out. As it happens,
assuming integer promotion before the cast would explain the result from
gcc. I take your point that integer promotions only happen where
specified but then isn't there otherwise an ambiguity in the specs over
what signed char to unsigned int means?
--
James Harris
David Brown
2017-07-05 08:42:12 UTC
Post by James Harris
...
Post by Keith Thompson
What about (unsigned) c instead? Would that cause c to be promoted to
int and then made unsigned? If so, it would produce a very different -
and, in this case, wrong - result of 0xfffffffe.
No. The integer promotions (such as char to int) are applied only
when the standard says they are.
OK.
Post by Keith Thompson
They're applied to the operands of
arithmetic operators, but not to the operand of a cast operator.
`(unsigned)c` converts the value of c (which is of type char)
to type unsigned.
But it's notable how it effects that conversion. It could go in one of
two ways
signed char --> unsigned char --> unsigned int
signed char --> signed int --> unsigned int
It is better to think of a conversion from one type to another type as
"value preserving" where possible, rather than imagining steps in
between. If you have a signed char with value -2, and convert it to
another type (either by standard promotion for arithmetic, by casts, or
by assignment to another type), then the language says the value should
be preserved.

The second rule you have to remember is that when converting to an
unsigned type, modulo arithmetic is used to put it into the correct
range. (When converting to a signed type, if the value is outside the
valid range of the new type, the result is implementation-defined, or
an implementation-defined signal is raised.)

So if you have a signed char with value -2, and convert it to another
signed integer type, you keep the value -2.

If you have a signed char with value -2, and convert it to an unsigned
integer type, you keep the value -2. Since that is outside the valid
range for the target type, it is reduced by modular arithmetic to fit.
For an 8-bit unsigned type, this means 0xfe. For a 32-bit unsigned
type, it is 0xfffffffe.

If you have an unsigned char with value 0xfe, and convert it to a signed
integer type with a large enough range, it retains the value 0xfe (equal
to 254). If you try to convert it to a signed integer type without that
range, such as signed char (or plain char) on your system, the result
is implementation-defined (or an implementation-defined signal is
raised). The compiler /might/ give you -2, but it is not required to.

If you have an unsigned char with value 0xfe, and convert it to an
unsigned integer type, it retains the value 0xfe.
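Those rules can be checked directly; a sketch, assuming 8-bit char and
32-bit int and unsigned:

#include <stdio.h>

int main(void) {
    signed char c = -2;
    printf("%d\n", (int)c);                /* value preserved: -2 */
    printf("%d\n", (int)(unsigned char)c); /* modulo 256: 254, i.e. 0xfe */
    printf("%u\n", (unsigned)c);           /* modulo 2^32: 4294967294, i.e. 0xfffffffe */
    return 0;
}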


Does that help?
Post by James Harris
In other words, the cast has two ostensible changes to make to convert
from signed char to unsigned int. It could change to an unsigned form
first and then widen. Or it could widen first and then change to an
unsigned form. The results would be either of
0x000000fe
0xfffffffe
The standards docs I've looked at don't specify which path would be
taken - only that a conversion would be carried out. As it happens,
assuming integer promotion before the cast would explain the result from
gcc. I take your point that integer promotions only happen where
specified but then isn't there otherwise an ambiguity in the specs over
what signed char to unsigned int means?
James Harris
2017-07-05 14:23:19 UTC
On 05/07/2017 09:42, David Brown wrote:

...
Post by David Brown
Does that help?
Things are clearer now, thanks. As you said in another post, working out
how to read the C reference manuals is a challenge. :-)
--
James Harris
David Brown
2017-07-05 15:23:05 UTC
Post by James Harris
...
Post by David Brown
Does that help?
Things are clearer now, thanks. As you said in another post, working out
how to read the C reference manuals is a challenge. :-)
Understanding the /standards/ can be a challenge. There are reference
manuals that are better. As a general rule, the more technically
precise a reference is, the harder it is to understand - something that
gives a quick and simple overview or "rule of thumb" will not cover all
the details. But often that is enough.

My usual point of reference for the language and standard library is
<http://en.cppreference.com/w/c>, with this particular topic at
<http://en.cppreference.com/w/c/language/conversion>. The website is
quite accurate, as far as I have found, often has useful examples, is
clear about the differences in different standard versions, and has good
cross-referencing on its topics.
Ben Bacarisse
2017-07-05 10:34:49 UTC
...
Post by Keith Thompson
What about (unsigned) c instead? Would that cause c to be promoted to
int and then made unsigned? If so, it would produce a very different -
and, in this case, wrong - result of 0xfffffffe.
No. The integer promotions (such as char to int) are applied only
when the standard says they are.
OK.
Post by Keith Thompson
They're applied to the operands of
arithmetic operators, but not to the operand of a cast operator.
`(unsigned)c` converts the value of c (which is of type char)
to type unsigned.
But it's notable how it effects that conversion. It could go in one of two ways
signed char --> unsigned char --> unsigned int
signed char --> signed int --> unsigned int
In the example you quoted above, (unsigned)c, there is only one
conversion from signed char to unsigned int.
In other words, the cast has two ostensible changes to make to convert
from signed char to unsigned int. It could change to an unsigned form
first and then widen. Or it could widen first and then change to an
unsigned form.
There is only one conversion. The value -2 is converted to unsigned int
by adding one more than UINT_MAX.
The results would be either of
0x000000fe
0xfffffffe
The first would be wrong.
The standards docs I've looked at don't specify which path would be
taken - only that a conversion would be carried out.
What documents have you been consulting? A cast specifies just one
conversion and the standard specifies exactly how that conversion is
done.
As it happens,
assuming integer promotion before the cast would explain the result
from gcc. I take your point that integer promotions only happen where
specified but then isn't there otherwise an ambiguity in the specs
over what signed char to unsigned int means?
I don't think so. But in cases like this it's better to be very explicit
about the text. What did you read that suggests there is some
ambiguity? The cast (unsigned)c is a single conversion and that is done
according to 6.3.1.3 p2: "the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in
the new type until the value is in the range of the new type".
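A quick check of that wording, assuming 32-bit unsigned int (so that
UINT_MAX is 4294967295):

#include <limits.h>
#include <stdio.h>

int main(void) {
    signed char c = -2;
    unsigned u = (unsigned)c;           /* one addition of UINT_MAX + 1 brings -2 into range */
    printf("%u %u\n", u, UINT_MAX - 1); /* both print 4294967294 */
    return 0;
}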
--
Ben.
James Harris
2017-07-05 13:56:43 UTC
Post by Ben Bacarisse
...
Post by Keith Thompson
What about (unsigned) c instead? Would that cause c to be promoted to
int and then made unsigned? If so, it would produce a very different -
and, in this case, wrong - result of 0xfffffffe.
No. The integer promotions (such as char to int) are applied only
when the standard says they are.
OK.
Post by Keith Thompson
They're applied to the operands of
arithmetic operators, but not to the operand of a cast operator.
`(unsigned)c` converts the value of c (which is of type char)
to type unsigned.
But it's notable how it effects that conversion. It could go in one of two ways
signed char --> unsigned char --> unsigned int
signed char --> signed int --> unsigned int
In the example you quoted above, (unsigned)c, there is only one
conversion from signed char to unsigned int.
In other words, the cast has two ostensible changes to make to convert
from signed char to unsigned int. It could change to an unsigned form
first and then widen. Or it could widen first and then change to an
unsigned form.
There is only one conversion. The value -2 is converted to unsigned int
by adding one more than UINT_MAX.
OK.
Post by Ben Bacarisse
The results would be either of
0x000000fe
0xfffffffe
The first would be wrong.
The standards docs I've looked at don't specify which path would be
taken - only that a conversion would be carried out.
What documents have you been consulting? A cast specifies just one
conversion and the standard specifies exactly how that conversion is
done.
As an example, n1570, but evidently not in the right place. I had
checked sections like 6.3.1.8: Usual arithmetic conversions, 6.5.4: Cast
operators, notes on integer promotions, and others. They seemed
reasonable places to look but it turns out they were not enough.
Post by Ben Bacarisse
As it happens,
assuming integer promotion before the cast would explain the result
from gcc. I take your point that integer promotions only happen where
specified but then isn't there otherwise an ambiguity in the specs
over what signed char to unsigned int means?
I don't think so. But in cases like this it's better to be very explicit
about the text. What did you read that suggests there is some
ambiguity? The cast (unsigned)c is a single conversion and that is done
according to 6.3.1.3 p2: "the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in
the new type until the value is in the range of the new type".
Thanks, that explains it. Perhaps this points to what is possibly a
feature of the standard: forward-only references. For example, I had
looked in 6.5.4 Cast operators - a reasonable place to look, one would
think, to find out how cast operators acted. Yet the semantics of a cast
is explained only as the following:


5 Preceding an expression by a parenthesized type name converts the
value of the expression to the named type. This construction is called a
cast.104) A cast that specifies no conversion has no effect on the type
or value of an expression.

6 If the value of the expression is represented with greater range or
precision than required by the type named by the cast (6.3.1.8), then
the cast specifies a conversion even if the type of the expression is
the same as the named type and removes any extra range and precision.


It then gives forward references to things like equality operators
(6.5.9), and type names (6.7.7) but no backwards references to the
useful and relevant 6.3.1.3 conversion section. Any idea why? Are the
standards expected to be read like a book, from the beginning, rather
than them being a reference manual? Why do they not also have backward
references?
--
James Harris
Ben Bacarisse
2017-07-05 14:55:55 UTC
James Harris <***@gmail.com> writes:
<snip>
Post by James Harris
Thanks, that explains it. Perhaps this points to what is possibly a
feature of the standard: forward-only references. For example, I had
looked in 6.5.4 Cast operators - a reasonable place to look, one would
think, to find out how cast operators acted. Yet the semantics of a
5 Preceding an expression by a parenthesized type name converts the
value of the expression to the named type. This construction is called
a cast.104) A cast that specifies no conversion has no effect on the
type or value of an expression.
6 If the value of the expression is represented with greater range or
precision than required by the type named by the cast (6.3.1.8), then
the cast specifies a conversion even if the type of the expression is
the same as the named type and removes any extra range and precision.
It then gives forward references to things like equality operators
(6.5.9), and type names (6.7.7) but no backwards references to the
useful and relevant 6.3.1.3 conversion section. Any idea why? Are the
standards expected to be read like a book, from the beginning, rather
than them being a reference manual? Why do they not also have backward
references?
I don't know why there are only forward references.

But in this case there is a whole section (6.3) on conversions because
they are so common in C. They occur in initialisation, assignment,
parameter passing and in many expressions, so it makes sense to cover
them early on. Having done that, the description of the cast operator
need only say it performs a conversion. Where would you look to find
out the details other than in the section on conversions? I agree that
it would make some sense for this to refer back to 6.3, but that back
reference would be very common. (A reference back to 6.3.1.3 would be
inappropriately specific.)
--
Ben.
James Kuyper
2017-07-04 22:05:05 UTC
Post by James Harris
...
(Copying in comp.lang.c re. a potentially abstruse query about integer
promotions, below.)
Starting with c as a (signed) char value of -2.
So (unsigned char) c would presumably convert -2 into 0xfe and then
leave it for the integer promotions to convert to 0x000000fe.
Without the relevant context, I can't be sure, but I assume that you
have verified that char is signed on the implementation that you're using?
Post by James Harris
What about (unsigned) c instead? Would that cause c to be promoted to
int and then made unsigned? If so, it would produce a very different -
and, in this case, wrong - result of 0xfffffffe.
Maybe (unsigned char) would be the best/right/only thing to do when
using a signed char as an index from the start of an array of 256 bytes.
Just tried it on gcc with the following code
#include <stdio.h>
int main(void) {
To make your point somewhat more portably, it would be better to declare
this as "signed char c".
Post by James Harris
char c = -2;
This line involves a conversion from int to char. Assuming that char is
a signed type, it's guaranteed to be able to store that value, so c has
a value of -2.
Post by James Harris
printf("%08x, %08x, %08x\n", (int)c, (unsigned)c, (unsigned char)c);
(int)c has the value of c, converted to signed int, which is guaranteed
to be able to represent that value, so it remains unchanged, at -2.
(unsigned)c has the value (UINT_MAX+1)-2 == UINT_MAX-1. (unsigned char)c
would have the value (UCHAR_MAX+1)-2 == UCHAR_MAX-1.

The default argument promotions are applied to all arguments after the
format string. Those promotions leave the second and third arguments
unchanged. However, the fourth argument will be converted to int, except
in the unlikely case that UCHAR_MAX > INT_MAX, in which case it will be
converted to unsigned int; either way, the value will be unchanged by
the conversion: it will still be UCHAR_MAX-1.

The x format specifier expects a value of type unsigned int. The first
%x corresponds to an argument of type int, and on most systems, so will
the third %x. Such a type mis-match renders the behavior of your code
undefined. For positive values, unsigned int and int are required to
have the same representation, "which is intended to imply
interchangeability as arguments ...", but the value of the first
argument is negative, so that doesn't apply.

I'm not sure what you're trying to do, but printing an int value with a
%x specifier doesn't prove much of anything. Is there a format specifier
that takes an int value which would be suitable for making your point?
Post by James Harris
}
fffffffe, fffffffe, 000000fe
But what's happening here with the last (0xfe) output (which is the only
one which is as desired)? Could it be that rather than the char simply
being cast to an unsigned char the integer promotions come in
conceptually before and after casting, leading to the following?
1. The original signed char is promoted to signed int 0xfffffffe.
Most types of expressions that take arithmetic operands cause the
integer promotions to be applied to some of those operands - but a cast
expression isn't one of those types, so this step does not occur.
Post by James Harris
2. The cast is applied, making it an unsigned char with value 0xfe.
That, on the other hand, is correct.
Post by James Harris
3. Unsigned char 0xfe is promoted to unsigned int 0x000000fe.
The default argument promotions are applied to any function argument for
which there's no corresponding parameter in a function prototype, which
applies to the variable arguments of a variadic function such as
printf(). However, if 0xfe is the result of converting to unsigned char,
that implies that UCHAR_MAX is 0xff, in which case all values of type
unsigned char can be represented as ints. That being the case, the
result, when promoted, is of type 'int', not 'unsigned int'.
Post by James Harris
4. printf is called with that unsigned promoted value.
It's called with a value that has been promoted only once, not twice,
and the result will normally have type 'int'.
Post by James Harris
I am thinking that the promotions would be done theoretically, in the
type system of the compiler. They would not be machine-code conversion
steps.
C conversions are value preserving to as great an extent as possible,
particularly the ones that occur implicitly. That means that in many
cases, they are no-ops at the machine code level. Even the conversion of
negative values to unsigned type, which cannot be value-preserving, can
be a no-op on 2's complement machines. Conversion of unsigned values to a
smaller unsigned type may involve nothing more complicated than storing
a value in an N-bit register, and then retrieving it using a smaller
register that overlaps that larger one, which looks very similar to a
no-op if you're not paying close attention.
However, in general, conversions do involve machine code.
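A sketch of the narrowing case just described, assuming a 16-bit
unsigned short:

#include <stdio.h>

int main(void) {
    unsigned x = 0x12345678u;
    unsigned short s = (unsigned short)x; /* reduced modulo USHRT_MAX + 1: 0x5678 */
    printf("%x\n", (unsigned)s);          /* prints 5678 */
    return 0;
}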
James Harris
2017-07-05 14:20:11 UTC
Post by James Kuyper
Post by James Harris
...
(Copying in comp.lang.c re. a potentially abstruse query about integer
promotions, below.)
Starting with c as a (signed) char value of -2.
So (unsigned char) c would presumably convert -2 into 0xfe and then
leave it for the integer promotions to convert to 0x000000fe.
Without the relevant context, I can't be sure, but I assume that you
have verified that char is signed on the implementation that you're using?
Post by James Harris
What about (unsigned) c instead? Would that cause c to be promoted to
int and then made unsigned? If so, it would produce a very different -
and, in this case, wrong - result of 0xfffffffe.
Maybe (unsigned char) would be the best/right/only thing to do when
using a signed char as an index from the start of an array of 256 bytes.
Just tried it on gcc with the following code
#include <stdio.h>
int main(void) {
To make your point somewhat more portably, it would be better to declare
this as "signed char c".
Post by James Harris
char c = -2;
This line involves a conversion from int to char. Assuming that char is
a signed type, it's guaranteed to be able to store that value, so c has
a value of -2.
Post by James Harris
printf("%08x, %08x, %08x\n", (int)c, (unsigned)c, (unsigned char)c);
(int)c has the value of c, converted to signed int, which is guaranteed
to be able to represent that value, so it remains unchanged, at -2.
(unsigned)c has the value (UINT_MAX+1)-2 == UINT_MAX-1. (unsigned char)c
would have the value (UCHAR_MAX+1)-2 == UCHAR_MAX-1.
The default argument promotions are applied to all arguments after the
format string. Those promotions leave the second and third arguments
unchanged. However, the fourth argument will be converted to int, except
in the unlikely case that UCHAR_MAX > INT_MAX,
in which case it will be
converted to unsigned int; either way, the value will be unchanged by
the conversion: it will still be UCHAR_MAX-1.
I find C's conversion rules strange where they depend on the
architecture, e.g. an unsigned short being converted to an integer would
be a signed int on most machines but an unsigned int on those machines
which have shorts being as wide as ints. I'm not sure of the potential
ramifications of that for program portability but it seems inconsistent
- albeit that I can see why the rule would exist. Perhaps the "problem"
is automatic implicit conversions between signed and unsigned integers.
Post by James Kuyper
The x format specifier expects a value of type unsigned int. The first
%x corresponds to an argument of type int, and on most systems, so will
the third %x. Such a type mis-match renders the behavior of your code
undefined. For positive values, unsigned int and int are required to
have the same representation, "which is intended to imply
interchangeability as arguments ...", but the value of the first
argument is negative, so that doesn't apply.
I'm not sure what you're trying to do, but printing an int value with a
%x specifier doesn't prove much of anything. Is there a format specifier
that takes an int value which would be suitable for making your point?
%i or %d would do but would be less clear. Are you saying that %x cannot
be used portably to print negative numbers in a hexadecimal form? UB
usually means that anything could happen, doesn't it? In this case, does
that technically mean more is at risk than just getting erroneous output?

What's the "right" way to print signed numbers as hex?
--
James Harris
s***@casperkitty.com
2017-07-05 14:45:06 UTC
Post by James Harris
I find C's conversion rules strange where they depend on the
architecture, e.g. an unsigned short being converted to an integer would
be a signed int on most machines but an unsigned int on those machines
which have shorts being as wide as ints. I'm not sure of the potential
ramifications of that for program portability but it seems inconsistent
- albeit that I can see why the rule would exist. Perhaps the "problem"
is automatic implicit conversions between signed and unsigned integers.
Such conversion rules point to C as a language which is designed to allow
itself to be implemented on many platforms, *at the expense* of reducing
the portability of code. I'm not sure why so many people think C was
intended to facilitate the writing of portable *programs*, when it should
be given a C- in that regard.

A language which is intended to facilitate the writing of portable programs
should include types whose behavior is consistent on all platforms. If such
a language is designed to facilitate multi-platform implementation, it should
*also* include machine-specific types, and code intended to be portable among
platforms should favor those *except when exact behaviors are required*. C
unfortunately, does not include any such types. Some implementations provide
a 16-bit type such that given x==1 and y==2, x-y would equal 65535; others
provide a 16-bit type such that x-y would equal -1. The rules in the
Standard, however, effectively forbid implementations from including both.
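Both behaviours fall out of the promotion rules; a sketch using
uint16_t as a stand-in for such a 16-bit type (assuming a C99
implementation that provides it):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint16_t x = 1, y = 2;
    /* Where int is wider than 16 bits, x and y promote to int and
       x - y is -1. On an implementation with 16-bit int, they would
       promote to unsigned int and x - y would be 65535. */
    printf("%ld\n", (long)(x - y));
    return 0;
}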
Keith Thompson
2017-07-05 15:48:09 UTC
James Harris <***@gmail.com> writes:
[...]
Post by James Harris
I find C's conversion rules strange where they depend on the
architecture, e.g. an unsigned short being converted to an integer would
be a signed int on most machines but an unsigned int on those machines
which have shorts being as wide as ints. I'm not sure of the potential
ramifications of that for program portability but it seems inconsistent
- albeit that I can see why the rule would exist. Perhaps the "problem"
is automatic implicit conversions between signed and unsigned integers.
I also find them strange.

When the ANSI C standard was being developed, existing C compilers used
one of two promotion schemes: "value preserving" and "unsigned
preserving".

The decision is discussed in the ANSI C Rationale,
<https://www.lysator.liu.se/c/rat/c2.html>:

The unsigned preserving rules greatly increase the number of
situations where unsigned int confronts signed int to yield
a questionably signed result, whereas the value preserving
rules minimize such confrontations. Thus, the value preserving
rules were considered to be safer for the novice, or unwary,
programmer. After much discussion, the Committee decided in
favor of value preserving rules, despite the fact that the UNIX
C compilers had evolved in the direction of unsigned preserving.

[...]
Post by James Harris
%i or %d would do but would be less clear. Are you saying that %x cannot
be used portably to print negative numbers in a hexadecimal form? UB
usually means that anything could happen, doesn't it? In this case, does
that technically mean more is at risk than just getting erroneous output?
That's correct. The "%x" format requires an argument of type unsigned
int. You can get away with giving it an argument of type signed int
*if* the value is representable in both types (this guarantee is made in
a non-normative footnote).

Yes, this:

printf("0x%x\n", -1);

has undefined behavior. In practice, it's very very likely to
print the value of UINT_MAX in hexadecimal (typically 0xffffffff).
It would print a different value on a system that uses something
other than 2's-complement (you're unlikely to encounter such a
system). A conforming compiler *could* recognize that the behavior
is undefined and reject it, or generate code that does something
unexpected, but that's unlikely.
Post by James Harris
What's the "right" way to print signed numbers as hex?
Convert to unsigned.

printf("0x%x\n", (unsigned)-1);

has defined behavior that matches the likely behavior of
printf("0x%x\n", -1). Note that this won't print the value of -1;
-1 and 0xffffffff (equivalently 4294967295) are two distinct values.

If you want to print -1 in hex as "-0x1", you can't do it directly:

int n = -1;
if (n >= 0) {
    printf("0x%x\n", (unsigned)n);
}
else {
    printf("-0x%x\n", (unsigned)-n);
}

There might still be a glitch for n==INT_MIN, since in that case -n can
overflow.
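
One way to sidestep that glitch, as a sketch: convert first, then
negate in unsigned arithmetic, where the wraparound is well defined:

#include <stdio.h>

int main(void) {
    int n = -1; /* any value, including INT_MIN */
    if (n >= 0) {
        printf("0x%x\n", (unsigned)n);
    }
    else {
        /* (unsigned)n is n + UINT_MAX + 1; subtracting that from 0u
           gives the magnitude of n modulo UINT_MAX + 1, with no
           signed overflow even for n == INT_MIN */
        printf("-0x%x\n", 0u - (unsigned)n);
    }
    return 0;
}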
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
James Harris
2017-07-05 17:32:10 UTC
Post by Keith Thompson
[...]
Post by James Harris
I find C's conversion rules strange where they depend on the
architecture, e.g. an unsigned short being converted to an integer would
be a signed int on most machines but an unsigned int on those machines
which have shorts being as wide as ints. I'm not sure of the potential
ramifications of that for program portability but it seems inconsistent
- albeit that I can see why the rule would exist. Perhaps the "problem"
is automatic implicit conversions between signed and unsigned integers.
I also find them strange.
When the ANSI C standard was being developed, existing C compilers used
one of two promotion schemes: "value preserving" and "unsigned
preserving".
The decision is discussed in the ANSI C Rationale,
The unsigned preserving rules greatly increase the number of
situations where unsigned int confronts signed int to yield
a questionably signed result, whereas the value preserving
rules minimize such confrontations. Thus, the value preserving
rules were considered to be safer for the novice, or unwary,
programmer. After much discussion, the Committee decided in
favor of value preserving rules, despite the fact that the UNIX
C compilers had evolved in the direction of unsigned preserving.
Thanks, it's always interesting to see how language design decisions
developed!

Of course, widening an unsigned number can always preserve the value. I
guess they are talking about the effects of those widened values then
interacting with other values which might be signed.
--
James Harris
h***@gmail.com
2017-07-07 21:18:16 UTC
On Wednesday, July 5, 2017 at 8:48:35 AM UTC-7, Keith Thompson wrote:

(snip)
Post by Keith Thompson
has undefined behavior. In practice, it's very very likely to
print the value of UINT_MAX in hexadecimal (typically 0xffffffff).
It would print a different value on a system that uses something
other than 2's-complement (you're unlikely to encounter such a
system). A conforming compiler *could* recognize that the behavior
is undefined and reject it, or generate code that does something
unexpected, but that's unlikely.
The CDC 6500 is running:

https://www.computer.org/csdl/mags/co/2017/04/mco2017040010.html

but I suspect that there is no C compiler for it.

Seems to me that most often when %x is used on a signed value,
one wants to see the bit representation of the value.

One should know when one is using a ones complement machine,
that the value will be different when used as a negative
int, than the same bit representation on a twos complement
machine.

Someday, someone will have a 7090 running, and we can test
the effects of sign magnitude arithmetic.

(The 7090 uses 36 bit words, but integers are 16 bits.)
Keith Thompson
2017-07-07 23:22:45 UTC
***@gmail.com writes:
[...]
Post by h***@gmail.com
Seems to me that most often when %x is used on a signed value,
one wants to see the bit representation of the value.
Probably, but the standard doesn't say that that's what it does.

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
h***@gmail.com
2017-07-08 08:13:19 UTC
On Friday, July 7, 2017 at 4:23:01 PM UTC-7, Keith Thompson wrote:

(snip, I wrote)
Post by Keith Thompson
Post by h***@gmail.com
Seems to me that most often when %x is used on a signed value,
one wants to see the bit representation of the value.
Probably, but the standard doesn't say that that's what it does.
But okay, say you have a ones complement or sign magnitude machine
and say:

i=-100;
printf("%x",(unsigned)i);

what should it print?

(assume int is 36 bits.)

Now, for extra challenge, consider that such machines might
not have a data type that is the same width as int, but
unsigned. (This might be true for the 7090 and Unisys 110x
machines.)
Ben Bacarisse
2017-07-08 10:48:55 UTC
Post by h***@gmail.com
(snip, I wrote)
Post by Keith Thompson
Post by h***@gmail.com
Seems to me that most often when %x is used on a signed value,
one wants to see the bit representation of the value.
Probably, but the standard doesn't say that that's what it does.
But okay, say you have a ones complement or sign magnitude machine
i=-100;
printf("%x",(unsigned)i);
what should it print?
You don't show the type of i. Presumably int, yes? If so, the output
must be the hex representation of UINT_MAX-100+1.
Post by h***@gmail.com
(assume int is 36 bits.)
That's not quite enough to know the actual answer. You need to know how
many value bits int and unsigned int have. If unsigned has 36 value
bits, the output will be fffffff9c (if I've got the arithmetic right).
Post by h***@gmail.com
Now, for extra challenge, consider that such machines might
not have a data type that is the same width as int, but
unsigned. (This might be true for the 7090 and Unisys 110x
machines.)
Sometimes, C on such machines has an unsigned type that is one bit
shorter than int (in effect, the sign bit is treated as a padding bit),
but unsigned int can't have fewer value bits than signed int. Provided
the C implementation can meet the standard's requirements, the answer is
the same: UINT_MAX-100+1.
--
Ben.
James Kuyper
2017-07-09 01:36:50 UTC
Post by h***@gmail.com
(snip, I wrote)
Post by Keith Thompson
Post by h***@gmail.com
Seems to me that most often when %x is used on a signed value,
one wants to see the bit representation of the value.
Probably, but the standard doesn't say that that's what it does.
But okay, say you have a ones complement or sign magnitude machine
i=-100;
printf("%x",(unsigned)i);
what should it print?
The answer is completely unaffected by the way signed integers are
represented. It's UINT_MAX+1-100.
Post by h***@gmail.com
(assume int is 36 bits.)
If 'int' has 36 bits, including the sign bit, then that means that
INT_MAX is 2^35-1, which in turn implies that UINT_MAX must be at least
that high, but it could be higher. In principle, that's all you can be
sure of; but in practice, UINT_MAX is likely to be 2^36-1 on such a machine.
Post by h***@gmail.com
Now, for extra challenge, consider that such machines might
not have a data type that is the same width as int, but
unsigned. (This might be true for the 7090 and Unisys 110x
machines.)
That implies that unsigned int must have a width greater than that of
int, but is not sufficient information in itself to determine that
width. However, regardless of how wide it is, the value it should print
is still UINT_MAX+1-100.
Keith Thompson
2017-07-09 04:14:47 UTC
Post by James Kuyper
Post by h***@gmail.com
(snip, I wrote)
Post by Keith Thompson
Post by h***@gmail.com
Seems to me that most often when %x is used on a signed value,
one wants to see the bit representation of the value.
Probably, but the standard doesn't say that that's what it does.
But okay, say you have a ones complement or sign magnitude machine
i=-100;
printf("%x",(unsigned)i);
what should it print?
The answer is completely unaffected by the way signed integers are
represented. It's UINT_MAX+1-100.
And to be clear, it's up to the compiler to generate whatever code it
has to so that the conversion yields that result. The rules for
conversions are stated in terms of values, not representations.

The fact that many such conversions are effectively no-ops for
2's-complement systems is not coincidental.

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
h***@gmail.com
2017-07-10 05:37:43 UTC
(snip)
Post by Keith Thompson
Post by James Kuyper
The answer is completely unaffected by the way signed integers are
represented. It's UINT_MAX+1-100.
And to be clear, it's up to the compiler to generate whatever code it
has to so that the conversion yields that result. The rules for
conversions are stated in terms of values, not representations.
The fact that many such conversions are effectively no-ops for
2's-complement systems is not coincidental.
The Unisys machines are interesting for ones complement.

There are pretty much two ways to make ones complement machines.

In the first, and maybe more obvious, -0 equals +0.

In C, ~0U == 0.

(In the most obvious way, x + -x always gives ~0.)

But there is another way: ensure that -0 is never produced as a
result in normal operation, even when one (or both) operands of an
operator are -0.

If you build a ones' complement subtractor (which you can
use for subtraction) and subtract the complement for addition,
it is pretty easy to make this work.

In this case, signed variables give the expected result when
used as the appropriate number of flag bits. For the Unisys 2200,
it seems that -0 is less than +0, and greater than -1. It should
also work for unsigned without the special option.

The special compiler option arranges things, presumably with
extra conditional testing, such that -0 (that is, ~0) comes
out in the right place. It has to do this for pretty much
every operation, which defeats much of the speed that one
expects with C bitwise, or other operations on unsigned.

But even more, as I noted earlier, a common use for %x is to
see the bit value of a variable. That might be, for example,
in the case of bit flags. But C does not have a reliable way
to view the bits of a value using %x, signed or unsigned, on
a ones complement machine.

Sign magnitude has different questions, which I will leave
until someone writes a C compiler for the 7090.
Ike Naar
2017-07-10 14:08:34 UTC
Post by h***@gmail.com
(snip)
Post by Keith Thompson
Post by James Kuyper
The answer is completely unaffected by the way signed integers are
represented. It's UINT_MAX+1-100.
And to be clear, it's up to the compiler to generate whatever code it
has to so that the conversion yields that result. The rules for
conversions are stated in terms of values, not representations.
The fact that many such conversions are effectively no-ops for
2's-complement systems is not coincidental.
The Unisys machines are interesting for ones complement.
Ones' complement or sign-and-magnitude?
The Unisys machines that I know of use the latter.
s***@casperkitty.com
2017-07-10 15:37:43 UTC
Post by Ike Naar
Ones' complement or sign-and-magnitude?
The Unisys machines that I know of use the latter.
The one I've read the manual for was ones'-complement. I'm not really sure
I see much advantage to sign-magnitude. Both ones'-complement and sign-
magnitude forms would make it easy to add hardware to a register which will
show signed numbers in "human" readable form using one set of lights for
positive numbers and another for negative (making it easy to distinguish a
register which is flickering between -16 and +2 from one that's flickering
between -2 and +16). Although two's-complement will allow a simpler ALU
than ones'-complement, it would require more complicated register-display
hardware. If a machine includes hardware to continuously display 16
registers and only has one ALU, the savings in display hardware might
outweigh the extra complexity in the ALU. Of course, if a machine doesn't
have any register-display hardware, that advantage goes away.

Complexity of both the ALU and display will be about the same for ones'-
complement and sign-magnitude formats. Ones'-complement, however, has the
advantage that unsigned math on values up to 2*INT_MAX will use the same
bit-level operations as signed math. The only possible advantage I could
see for sign-magnitude form would be in a machine which processed integer
math with floating-point hardware and "jammed" the exponent.
h***@gmail.com
2017-07-10 17:12:41 UTC
Post by s***@casperkitty.com
Post by Ike Naar
Ones' complement or sign-and-magnitude?
The Unisys machines that I know of use the latter.
The one I've read the manual for was ones'-complement. I'm not really sure
I see much advantage to sign-magnitude. Both ones'-complement and sign-
magnitude forms would make it easy to add hardware to a register which will
show signed numbers in "human" readable form using one set of lights for
positive numbers and another for negative (making it easy to distinguish a
register which is flickering between -16 and +2 from one that's flickering
between -2 and +16). Although two's-complement will allow a simpler ALU
than ones'-complement, it would require more complicated register-display
hardware. If a machine includes hardware to continuously display 16
registers and only has one ALU, the savings in display hardware might
outweigh the extra complexity in the ALU. Of course, if a machine doesn't
have any register-display hardware, that advantage goes away.
The IBM 704, 709, 7090, and 7094 series are sign magnitude, with 36 bit
floating point and 16 bit integers. And yes, all computers of the
time have display lights and switches for reading and setting registers
and memory.

Some of the early machines of that era use CRTs for main memory, storing
charge on the screen, and reading it back later. (And with refresh
as the charge decays. Early versions of DRAM.) Which also has the
advantage that you get a visual display of the contents of memory.
I believe one bit of the word per CRT device, so it isn't so easy
to read out the bits of a value.
Post by s***@casperkitty.com
Complexity of both the ALU and display will be about the same for ones'-
complement and sign-magnitude formats. Ones'-complement, however, has the
advantage that unsigned math on values up to 2*INT_MAX will use the same
bit-level operations as signed math. The only possible advantage I could
see for sign-magnitude form would be in a machine which processed integer
math with floating-point hardware and "jammed" the exponent.
Note that sign magnitude is still usual for floating point.
I am not sure right now how the 7090 does fixed point, other than
that it uses 15 value bits and a sign bit, maybe not contiguous.

Many machines around that time use the same hardware for fixed
and floating point. Some arrange floating point normalization such
that the exponent is zero for values that can represent integers.
Scott Lurndal
2017-07-11 12:44:38 UTC
Post by s***@casperkitty.com
Post by Ike Naar
Ones' complement or sign-and-magnitude?
The Unisys machines that I know of use the latter.
The one I've read the manual for was ones'-complement. I'm not really sure
There were three lines of Burroughs mainframes and two lines of
Sperry mainframes at the time of the merger. One line from each
of Burroughs and Sperry are still being sold (albeit in emulated form).

Burroughs:

Small Systems (B1XXX). Reconfigurable instruction set during
task scheduling. Different instruction set
per high-level language (COBOL, Fortran, etc).
24-bit datapath.

Small systems reference: https://en.wikipedia.org/wiki/Burroughs_B1700

Medium Systems (B2XXX, B3XXX, B4XXX).
BCD machine. Addressable to the nibble. Stored
signed data in N+1 nibbles (sign digit first
followed by 1 to 100 numeric digits). Sign digit 0xd
indicates negative, any other value indicates
positive (canonicalized to 0xc by the processor).

Medium systems reference: http://vseries.lurndal.org/
https://en.wikipedia.org/wiki/Burroughs_Medium_Systems

Large Systems (B5xxx, B6xxx, B7xxx).
48-bit capability-based stack architecture.

https://en.wikipedia.org/wiki/Burroughs_large_systems

Sperry:
1100/2200 Univac. 36-bit machine
System 90 IBM/360 compatible.

The large systems still exist as Unisys Clearpath Libra.
The 1100/2200 still exists as Unisys Clearpath Dorado.
James R. Kuyper
2017-07-10 15:36:19 UTC
On 07/10/2017 01:37 AM, ***@gmail.com wrote:
...
Post by h***@gmail.com
in the case of bit flags. But C does not have a reliable way
to view the bits of a value using %x, signed or unsigned, on
a ones complement machine.
If you want to look at the actual representation of any value in a given
type, access an object of that type containing that value as an array of
unsigned char, and print the value of each element of that array using %x.
h***@gmail.com
2017-07-10 16:51:43 UTC
On Monday, July 10, 2017 at 8:36:32 AM UTC-7, James R. Kuyper wrote:

(snip, I wrote)
Post by James R. Kuyper
Post by h***@gmail.com
in the case of bit flags. But C does not have a reliable way
to view the bits of a value using %x, signed or unsigned, on
a ones complement machine.
If you want to look at the actual representation of any value in a given
type, access an object of that type containing that value as an array of
unsigned char, and print the value of each element of that array using %x.
With the 36 bit word, I suspect that char (and unsigned char)
are 9 bits each. That would work well for octal, not so well
for hex. Well, I would want output as 9 hex digits.

But OK, with the right shifts and masking and conversion
to unsigned char and printing, it could work.
j***@verizon.net
2017-07-10 17:06:30 UTC
Post by h***@gmail.com
(snip, I wrote)
Post by James R. Kuyper
Post by h***@gmail.com
in the case of bit flags. But C does not have a reliable way
to view the bits of a value using %x, signed or unsigned, on
a ones complement machine.
If you want to look at the actual representation of any value in a given
type, access an object of that type containing that value as an array of
unsigned char, and print the value of each element of that array using %x.
With the 36 bit word, I suspect that char (and unsigned char)
are 9 bits each. That would work well for octal, not so well
for hex. Well, I would want output as 9 hex digits.
But OK, with the right shifts and masking and conversion
to unsigned char and printing, it could work.
Where do the shifts, masks, and conversion to unsigned char come in?

#include <stdio.h>

void print_representation(int value)
{
    unsigned char *puc = (unsigned char*)&value;
    printf("The representation for %d is:", value);
    for(size_t i=0; i<sizeof(value); i++)
        printf("%x ", puc[i]);
    printf("\n");
}

That will work just as well with 36-bit int and 9-bit unsigned char as with 32-bit int and 8-bit unsigned char. If "value" is negative, you'll see different results depending upon whether "int" has a 2's complement, 1's complement, or sign-magnitude representation. You'll also see different results depending upon the byte ordering of "int", and you may see some interesting artifacts if "int" has padding bits, which is all as it should be if you're really interested in looking at the representation.
h***@gmail.com
2017-07-10 17:36:42 UTC
On Monday, July 10, 2017 at 10:06:46 AM UTC-7, ***@verizon.net wrote:


(snip, I wrote)

(snip)
Post by j***@verizon.net
Post by h***@gmail.com
With the 36 bit word, I suspect that char (and unsigned char)
are 9 bits each. That would work well for octal, not so well
for hex. Well, I would want output as 9 hex digits.
But OK, with the right shifts and masking and conversion
to unsigned char and printing, it could work.
Where do the shifts, masks, and conversion to unsigned char come in?
void print_representation(int value)
{
    unsigned char *puc = (unsigned char*)&value;
    printf("The representation for %d is:", value);
    for(size_t i=0; i<sizeof(value); i++)
        printf("%x ", puc[i]);
    printf("\n");
}
That will work just as well with 36-bit int and 9-bit
unsigned char as with 32-bit int and 8-bit unsigned char.
If "value" is negative, you'll see different results
depending upon whether "int" has a 2's complement,
1's complement, or sign-magnitude representation.
Yes, that is what I want.
Post by j***@verizon.net
You'll also see different results depending upon the
byte ordering of "int", and you may see some interesting
artifacts if "int" has padding bits, which is all as it
should be if you're really interested in looking at
the representation.
But I want a 36 bit word to come out as nine hex digits.

The usual character representation in the early days of 36
bit machines is six 6-bit characters, which doesn't work
for C char.

Until IBM S/360, though, octal was the usual people-readable
representation of binary values. That may have been more for
documentation than software. The IBM Fortran compilers
for 36 bit machines don't have a format for printing octal.

DEC did put O format into their machines, even when the word
size wasn't a multiple of three. With VAX, DEC went to hex,
though O format is still there.

There is a story that, along with the development of VAX, DEC
published a calendar (for internal use) with the dates in
hex representation.
James R. Kuyper
2017-07-10 23:06:22 UTC
Post by h***@gmail.com
(snip, I wrote)
(snip)
Post by j***@verizon.net
Post by h***@gmail.com
With the 36 bit word, I suspect that char (and unsigned char)
are 9 bits each. That would work well for octal, not so well
for hex. Well, I would want output as 9 hex digits.
But OK, with the right shifts and masking and conversion
to unsigned char and printing, it could work.
Where do the shifts, masks, and conversion to unsigned char come in?
void print_representation(int value)
{
unsigned char *puc = (unsigned char*)&value;
printf("The representation for %d is:", value);
for(size_t i=0; i<sizeof(value); i++)
printf("%x ", puc[i]);
printf("\n");
}
That will work just as well with 36-bit int and 9-bit
unsigned char as with 32-bit int and 8-bit unsigned char.
If "value" is negative, you'll see different results
depending upon whether "int" has a 2's complement,
1's complement, or sign-magnitude representation.
Yes, that is what I want.
Post by j***@verizon.net
You'll also see different results depending upon the
byte ordering of "int", and you may see some interesting
artifacts if "int" has padding bits, which is all as it
should be if you're really interested in looking at
the representation.
But I want a 36 bit word to come out as nine hex digits.
OK - then shifts are needed, but I still see no need for masking or
conversion to unsigned char:

#include <stdio.h>
#include <stdint.h>
#include <limits.h>

void print_representation(int value)
{
unsigned char *puc = (unsigned char*)&value;
uintmax_t rep = puc[0];

for(size_t i=1; i<sizeof value; i++)
{
rep <<= CHAR_BIT;
rep += puc[i];
}
printf("The representation of %d is %jx\n", value, rep);
}

Note: while "rep" could probably be safely declared as "unsigned",
technically there is no type that is guaranteed to be large enough for
"rep" to do what it's intended to do. It is possible for (sizeof
value)*CHAR_BIT to be bigger than the number of value bits in uintmax_t,
which would require "int" to have at least CHAR_BIT padding bits, and
would cause "rep" to overflow. A work-around that correctly handles that
obscure possibility would be quite complicated, but that's not something
you should need to worry about with any real-world implementation. I
settled for using uintmax_t, which is probably unnecessarily big, but
doesn't complicate anything.
Ben Bacarisse
2017-07-10 17:40:19 UTC
Permalink
Raw Message
Post by h***@gmail.com
(snip)
Post by Keith Thompson
Post by James Kuyper
The answer is completely unaffected by the way signed integers are
represented. It's UINT_MAX+1-100.
And to be clear, it's up to the compiler to generate whatever code it
has to so that the conversion yields that result. The rules for
conversions are stated in terms of values, not representations.
The fact that many such conversions are effectively no-ops for
2's-complement systems is not coincidental.
The Unisys machines are interesting for ones complement.
I think some are and some are sign+magnitude. I'm going from various
Unisys texts found on the web, and I'm not 100% sure what they apply
to. One at least talks about the bit used for the sign and those used for
the magnitude.
Post by h***@gmail.com
There are pretty much two ways to make ones complement machines.
In the first, and maybe more obvious, -0 equals +0.
In C, ~0U == 0.
The only way that ~0U == 0 can be true is on a ones' complement system
where unsigned promotes to int. That's not likely, though I don't think
it's impossible.

You possibly meant to write ~0 == 0 because 0 has type int, and ~0
inverts every bit.
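
A minimal demonstration (just a sketch; the first line of output
assumes 32-bit unsigned int):

#include <stdio.h>

int main(void)
{
    /* ~0U has type unsigned int and value UINT_MAX on every
       conforming implementation, whatever the signed representation. */
    printf("%x\n", ~0U);      /* ffffffff given 32-bit unsigned int */
    printf("%d\n", ~0U == 0); /* 0: the comparison is always false */
    return 0;
}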

<snip>
--
Ben.
h***@gmail.com
2017-07-10 21:13:38 UTC
Permalink
Raw Message
On Monday, July 10, 2017 at 10:40:33 AM UTC-7, Ben Bacarisse wrote:


(snip, I wrote)
Post by Ben Bacarisse
Post by h***@gmail.com
There are pretty much two ways to make ones complement machines.
In the first, and maybe more obvious, -0 equals +0.
In C, ~0U == 0.
The only way that ~0U == 0 can be true is on a ones' complement system
where unsigned promotes to int. That's not likely, though I don't think
it's impossible.
You possibly meant to write ~0 == 0 because 0 has type int, and ~0
inverts every bit.
I suspect that you are saying that I forgot the promotion
rules regarding signed, unsigned, and ==, which is likely true.

But consider hardware which only supplies a ones complement ALU.

I was about to write -0==0, and then ~0==0, but went for ~0U==0.
The Unisys hardware can distinguish -0 and 0 in comparisons,
and so also ~0U. For hardware where ~0==0, it might also be
that ~0U==0U, because that is the way that the hardware works.

It might be that Univac, the predecessor to Unisys, produced
sign magnitude integer machines. Last I knew, Unisys is still
selling the 2200. This isn't ancient history.

CDC ones complement machines were still popular not so many
years ago, at least into years when C became popular. I don't
know that a production C compiler was written for one.
James R. Kuyper
2017-07-10 22:10:12 UTC
Permalink
Raw Message
...
Post by h***@gmail.com
Post by Ben Bacarisse
Post by h***@gmail.com
There are pretty much two ways to make ones complement machines.
In the first, and maybe more obvious, -0 equals +0.
In C, ~0U == 0.
The only way that ~0U == 0 can be true is on a ones' complement system
where unsigned promotes to int. That's not likely, though I don't think
it's impossible.
It is impossible. "If an int can represent all values of the original
type (as restricted by the width, for a bit-field), the value is
converted to an int; otherwise, it is converted to an unsigned int.
These are called the integer promotions." (6.3.1.1)
"unsigned" can't represent any of the negative values representable by
"int", so it always promotes to "unsigned int", regardless of how signed
ints are represented. That promotion is, of course, a no-op.
Post by h***@gmail.com
Post by Ben Bacarisse
You possibly meant to write ~0 == 0 because 0 has type int, and ~0
inverts every bit.
I suspect that you are saying that I forgot the promotion
rules regarding signed, unsigned, and ==, which is likely true.
Actually, no. It's the "usual arithmetic conversions" (6.3.1.8) that
matter. The "usual arithmetic conversions" start with the integer
promotions, but both 'int' and 'unsigned' are unaffected by the integer
promotions.

0U has the type "unsigned int". Therefore, so does ~0U, which has a
value of UINT_MAX. In the expression ~0U == 0, one operand is unsigned,
and the other is signed, and both types have the same integer conversion
rank. The relevant clause is therefore: "... if the operand that has
unsigned integer type has rank greater or equal to the rank of the type
of the other operand, then the operand with signed integer type is
converted to the type of the operand with unsigned integer type."
(6.3.1.8p1)

Conversion of a signed value of 0 to an unsigned type produces a value
of 0; it can never compare equal to UINT_MAX.
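
The same rule is easier to see with -1 (a small sketch; both lines
follow from the conversions just described):

#include <stdio.h>

int main(void)
{
    /* In -1 == ~0U, the usual arithmetic conversions convert the
       signed -1 to unsigned int, yielding UINT_MAX, so this prints 1.
       Converting 0 the same way yields 0, so the second line prints 0. */
    printf("%d\n", -1 == ~0U);
    printf("%d\n",  0 == ~0U);
    return 0;
}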
Post by h***@gmail.com
But consider hardware which only supplies a ones complement ALU.
I was about to write -0==0, and then ~0==0, but went for ~0U==0.
-0 doesn't work, either. It's required to have a value of 0. On a one's
complement system, there can be two different ways to represent a value
of 0. It's implementation-defined whether the second way is supported as
a normal value, or treated as a trap representation. If it is a normal
value, it's called a "negative zero" (6.2.6.2p2).
However, 6.2.6.2p3 imposes strict limits on how negative zeros can be
created. The integer constant 0 isn't one of those ways. The '-'
operator can produce a negative zero, but only if its operand is a
negative 0, which doesn't apply in this case. ~0, on the other hand, is
permitted (but not required) to produce a negative 0 on such an
implementation.

That still wouldn't change anything. 6.2.6.2p2 describes how the value
represented by a signed integer is affected by the sign bit, and the bit
pattern that is permitted to be called a negative zero represents the
value -(2^M-1)+(2^M-1), which is the same value represented by a normal
0, so they're required to compare equal.
Post by h***@gmail.com
The Unisys hardware can distinguish -0 and 0 in comparisons,
and so also ~0U. For hardware where ~0==0, it might also be
that ~0U==0U, because that is the way that the hardware works.
It doesn't matter how the hardware works. A conforming implementation of
C is required to have 0, 0U, -0, -0U, and ~0 all compare equal, and for
~0U to compare unequal to any of them. If that's not what the hardware
does naturally for one or more of those cases, the implementation is
required to generate whatever fix-up code as needed to meet those
requirements.
James R. Kuyper
2017-07-10 22:29:40 UTC
Permalink
Raw Message
Post by James R. Kuyper
...
Post by Ben Bacarisse
Post by h***@gmail.com
There are pretty much two ways to make ones complement machines.
In the first, and maybe more obvious, -0 equals +0.
In C, ~0U == 0.
The only way that ~0U == 0 can be true is on a ones' complement system
where unsigned promotes to int. That's not likely, though I don't think
it's impossible.
It is impossible. "If an int can represent all values of the original
type (as restricted by the width, for a bit-field), the value is
converted to an int; otherwise, it is converted to an unsigned int.
These are called the integer promotions." (6.3.1.1)
"unsigned" can't represent any of the negative values representable by
"int", so it always promotes to "unsigned int", regardless of how signed
ints are represented. That promotion is, of course, a no-op.
I got it wrong, again. I did the argument backwards: the relevant issue
is whether "int" can represent all values representable as "unsigned",
not the other way around.

The standard requires that "The range of nonnegative values of a signed
integer type is a subrange of the corresponding unsigned integer type,
..." (6.2.5p9), which implies that the number of value bits for the
unsigned type be greater than or equal to the number of value bits for
the signed type (as confirmed by 6.2.6.2p2). For most real world
implementations, the unsigned type has one more value bit, using the
same bit that the corresponding signed type uses as a sign bit. For
those implementations, "unsigned" can represent values that cannot be
represented using "int", so it promotes to "unsigned int", which is a
no-op. However, for an implementation where they have the same number of
value bits, "unsigned" promotes to "int". On such an implementation,
the sign bit for the signed type must be a padding bit for the unsigned
type.

However, regardless of that issue, the promotion causes no change in the
value. ~0U still represents UINT_MAX, which for such implementations is
the same value as INT_MAX. It still can't compare equal to 0, which is
the relevant point.
Ben Bacarisse
2017-07-11 01:28:42 UTC
Permalink
Raw Message
Post by h***@gmail.com
(snip, I wrote)
Post by Ben Bacarisse
Post by h***@gmail.com
There are pretty much two ways to make ones complement machines.
In the first, and maybe more obvious, -0 equals +0.
In C, ~0U == 0.
The only way that ~0U == 0 can be true is on a ones' complement system
where unsigned promotes to int. That's not likely, though I don't think
it's impossible.
You possibly meant to write ~0 == 0 because 0 has type int, and ~0
inverts every bit.
I suspect that you are saying that I forgot the promotion
rules regarding signed, unsigned, and ==, which is likely true.
But consider hardware which only supplies a ones complement ALU.
I was about to write -0==0, and then ~0==0, but went for ~0U==0.
The Unisys hardware can distinguish -0 and 0 in comparisons,
and so also ~0U. For hardware where ~0==0, it might also be
that ~0U==0U, because that is the way that the hardware works.
I'm now getting a bit confused because you are writing things that look
like C but are clearly not intended to be C. This is not uncommon here
and it can cause a lot of trouble. People (not you, I know) write about
-1 being 0xffffffff when, in C, these are two quite different values.

My preference is to invent a notation for hardware bit patterns so that
nothing looks like C when it isn't.

<snip>
--
Ben.
h***@gmail.com
2017-07-11 02:12:37 UTC
Permalink
Raw Message
On Monday, July 10, 2017 at 6:28:57 PM UTC-7, Ben Bacarisse wrote:

(snip, I wrote)
Post by Ben Bacarisse
Post by h***@gmail.com
But consider hardware which only supplies a ones complement ALU.
I was about to write -0==0, and then ~0==0, but went for ~0U==0.
The Unisys hardware can distinguish -0 and 0 in comparisons,
and so also ~0U. For hardware where ~0==0, it might also be
that ~0U==0U, because that is the way that the hardware works.
I'm now getting a bit confused because you are writing things that look
like C but are clearly not intended to be C. This is not uncommon here
and it can cause a lot of trouble. People (not you, I know) write about
-1 being 0xffffffff when, in C, these are two quite different values.
Well, okay, say that there existed hardware that did everything
that you would expect hardware running C to do, but in addition
it allowed for

~0U==0U to be true (1).

you can complain all you want, but hardware makers aren't going
to change hardware once it is built, just because you don't like it.


It occurred to me some years ago, when C allowed for two possibilities
for division with negative dividend or divisor, that it wasn't so
likely that hardware would be built for one of those ways.
Specifically, that Fortran requires it to work one way, and that
hardware designers know that. Hardware expected to work with
Fortran would work that way. (Though it isn't so hard to fix
with some conditionals, and divide doesn't usually occur all
that often.) Note that Fortran was first implemented on
a sign magnitude machine (IBM 704) which might have led
to its choice on divide.

It seems, though, that hardware designers don't consider the
requirements of C when designing hardware. If they did, they
would have designed 36 and 72 bit unsigned operations into
the 2200.

Actually, I suspect that hardware that doesn't do unsigned
multiply and divide isn't all that rare, and that it takes
some extra work to get right for those machines. But some
programs might do a lot of operations on unsigned int, where
it would slow it down by a large factor to fix up each one.
It might even be implemented as a subroutine call, which could
really slow things down.
James Kuyper
2017-07-11 02:31:49 UTC
Permalink
Raw Message
...
Post by h***@gmail.com
Post by Ben Bacarisse
I'm now getting a bit confused because you are writing things that look
like C but are clearly not intended to be C. This is not uncommon here
and it can cause a lot of trouble. People (not you, I know) write about
-1 being 0xffffffff when, in C, these are two quite different values.
Well, okay, say that there existed hardware that did everything
that you would expect hardware running C to do, but in addition
it allowed for
~0U==0U to be true (1).
you can complain all you want, but hardware makers aren't going
to change hardware once it is built, just because you don't like it.
It's not about changing the hardware, it's about what you have to do in
order to use that hardware to create a conforming implementation of C.

If the hardware has instructions that you would be tempted to use for
implementing the ~ and == operators, but the way in which they work
would cause ~0U to compare equal to 0U, then those instructions cannot
be used without modification to implement those operations in C.

Does the hardware you're thinking of cause these problems because of the
way it implements the ~ operator, or the way it handles the == operator?
More specifically, does the ~ operator fail to produce UINT_MAX when
applied to 0U, or does the == operator incorrectly treat UINT_MAX as
equivalent to 0? Does this hardware fail to match the behavior defined
by the C standard for the ~ and == operators only for one specific
value, or for many different values?
Post by h***@gmail.com
It occured to me some years ago, when C allowed for two possibilities
for division with negative dividend or divisor, that it wasn't so
likely that hardware would be built for one of those ways.
Specifically, that Fortran requires it to work one way, and that
hardware designers know that. Hardware expected to work with
That's why, when C was changed to allow for only one of those
possibilities, it was the one that matched Fortran which survived.
Ben Bacarisse
2017-07-11 03:10:36 UTC
Permalink
Raw Message
Post by h***@gmail.com
(snip, I wrote)
Post by Ben Bacarisse
Post by h***@gmail.com
But consider hardware which only supplies a ones complement ALU.
I was about to write -0==0, and then ~0==0, but went for ~0U==0.
The Unisys hardware can distinguish -0 and 0 in comparisons,
and so also ~0U. For hardware where ~0==0, it might also be
that ~0U==0U, because that is the way that the hardware works.
I'm now getting a bit confused because you are writing things that look
like C but are clearly not intended to be C. This is not uncommon here
and it can cause a lot of trouble. People (not you, I know) write about
-1 being 0xffffffff when, in C, these are two quite different values.
Well, okay, say that there existed hardware that did everything
that you would expect hardware running C to do, but in addition
it allowed for
~0U==0U to be true (1).
you can complain all you want, but hardware makers aren't going
to change hardware once it is built, just because you don't like it.
I think we are talking at cross purposes. I am not complaining about
the hardware or expecting it to change. I am commenting on the way its
behaviour is described.

~0U == 0U

is always false in C. If the hardware does odd things, the compiler
must generate extra code to ensure that ~0U is UINT_MAX and not 0.

But you are not talking about C here I think. You are talking about
what the hardware does, but describing it in what looks like C. That is
going to get confusing.

<snip>
--
Ben.
h***@gmail.com
2017-07-11 06:44:45 UTC
Permalink
Raw Message
On Monday, July 10, 2017 at 8:10:51 PM UTC-7, Ben Bacarisse wrote:

(snip about the results of ones complement comparison)
Post by Ben Bacarisse
I think we are talking at cross purposes. I am not complaining about
the hardware or expecting it to change. I am commenting on the way its
behaviour is described.
~0U == 0U
is always false in C. If the hardware does odd things, the compiler
must generate extra code to ensure that ~0U is UINT_MAX and not 0.
I suppose so. If, for example (again using C notation):

x==y && x+1==y+1

should work, even when two different zero values compare equal.
Post by Ben Bacarisse
But you are not talking about C here I think. You are talking about
what the hardware does, but describing it in what looks like C. That is
going to get confusing.
I suppose, but since I don't know 2200 assembly language, that
wouldn't have helped. And I suspect most readers here don't,
either.

The design of the 2200 is such that it doesn't generate the
negative zero bit pattern in ordinary arithmetic. I suspect
that one compiles (or assembles) -x as 0-x, such that a positive
zero results. But if you do actually ~0 then you get the
negative zero bit pattern, which as noted above, on the 2200
does not compare equal to 0.

I suppose I could write them in Fortran notation, which
doesn't have an unsigned type. Then there will be no
confusion I even had to look these up, as I never tried
this before:

NOT(0).EQ.0

will be true on some machines, but not the 2200.
Ben Bacarisse
2017-07-11 11:46:47 UTC
Permalink
Raw Message
Post by h***@gmail.com
(snip about the results of ones complement comparison)
Post by Ben Bacarisse
I think we are talking at cross purposes. I am not complaining about
the hardware or expecting it to change. I am commenting on the way it's
behaviour is described.
~0U == 0U
is always false in C. If the hardware does odd things, the compiler
must generate extra code to ensure that ~0U is UINT_MAX and not 0.
x==y && x+1==y+1
should work, even when two different zero values compare equal.
Now I don't know if you are talking about C in general, some specific C
implementation, or trying to explain some hardware property using C-like
notation.
Post by h***@gmail.com
Post by Ben Bacarisse
But you are not talking about C here I think. You are talking about
what the hardware does, but describing it in what looks like C. That is
going to get confusing.
I suppose, but since I don't know 2200 assembly language, that
wouldn't have helped. And I suspect most readers here don't,
either.
There are other options!

Often you can just say it in words (maybe "+ve and -ve zero don't
compare equal on the 2200"), or you can invent a notation to describe
bit patterns, maybe using # instead of 0x and/or separating the sign bit
off to make things clearer: 1#000....000 and so on. Sure, you need to
give a quick description of what you mean, but there is likely to be
less confusion than writing, say, a C hex constant. These don't
describe bit patterns -- they denote integer values.
Post by h***@gmail.com
The design of the 2200 is such that it doesn't generate the
negative zero bit pattern in ordinary arithmetic. I suspect
that one compiles (or assembles) -x as 0-x, such that a positive
zero results. But if you do actually ~0 then you get the
negative zero bit pattern, which as noted above, on the 2200
does not compare equal to 0.
Now the English parts are clear, but even the two little bit of C-like
notation (-x and 0-x) introduce some confusion. I don't know what is
special about -x or 0-x (rather than +x or 0+x for example) that matters
to this discussion.
Post by h***@gmail.com
I suppose I could write them in Fortran notation, which
doesn't have an unsigned type. Then there will be no
confusion I even had to look these up, as I never tried
NOT(0).EQ.0
will be true on some machines, but not the 2200.
But then we'd have to know what the Fortran standard says about these
operators. They might require or permit a compiler to do more than the
hardware offers, though I take it you know otherwise or you would not
have used that example.
--
Ben.
h***@gmail.com
2017-07-11 20:09:30 UTC
Permalink
Raw Message
On Tuesday, July 11, 2017 at 4:47:03 AM UTC-7, Ben Bacarisse wrote:

(snip, I wrote)
Post by Ben Bacarisse
Post by h***@gmail.com
Post by Ben Bacarisse
~0U == 0U
(snip, and later)
Post by Ben Bacarisse
Post by h***@gmail.com
x==y && x+1==y+1
should work, even when two different zero values compare equal.
Now I don't know if you are talking about C in general, some specific C
implementation, or trying to explain some hardware property using C-like
notation.
OK, the problem with stating bit patterns is that they have a length.

I could write 0xfffffffff if I wanted a 36 bit word of all ones, but
then it is fixed at 36 bits. ~0 is a word of all ones, of whatever
length an int is in the C system it is written in.

If you don't like C hex constants, I sometimes write them in
the IBM S/360 and successor assembler form, X'FFFFFFFFF' which seems
obvious enough for just about everyone.
Post by Ben Bacarisse
Post by h***@gmail.com
Post by Ben Bacarisse
But you are not talking about C here I think. You are talking about
what the hardware does, but describing it in what looks like C. That is
going to get confusing.
I suppose, but since I don't know 2200 assembly language, that
wouldn't have helped. And I suspect most readers here don't,
either.
There are other options!
There are languages meant for describing hardware, verilog
and VHDL. Verilog uses operators similar to C, though with
a few changes. Two interesting ones are the & and | unary
reduction operators. &x is the (single bit) value resulting
from anding all the bits of its operand, |x from oring all
the bits. There is the continuous assignment statement,
which represents the result of a wire connecting to some
combination of logic, and with a possible delay.
Post by Ben Bacarisse
Often you can just say it in words (maybe "+ve and -ve zero don't
compare equal on the 2200"), or you can invent a notation to describe
bit patterns, maybe using # instead of 0x and/or separating the sign bit
off to make things clearer: 1#000....000 and so on. Sure, you need to
give a quick description of what you mean, but there is likely to be
less confusion than writing, say, a C hex constant. These don't
describe bit patterns -- they denote integer values.
Well, they describe bit patterns when used with bitwise operators,
but with other operators, they are integer values.
Post by Ben Bacarisse
Post by h***@gmail.com
The design of the 2200 is such that it doesn't generate the
negative zero bit pattern in ordinary arithmetic. I suspect
that one compiles (or assembles) -x as 0-x, such that a positive
zero results. But if you do actually ~0 then you get the
negative zero bit pattern, which as noted above, on the 2200
does not compare equal to 0.
Now the English parts are clear, but even the two little bit of C-like
notation (-x and 0-x) introduce some confusion. I don't know what is
special about -x or 0-x (rather than +x or 0+x for example) that matters
to this discussion.
Twos complement adders, adding bit patterns representing twos
complement integers, generate the same bit patterns as unsigned
adders adding the same bit patterns, representing unsigned values.
Convenient, in that people can sometimes ignore which operation
is being done. This is not true for ones complement.

For all values except zero, the negative of a ones complement
number inverts all the bits. If you invert all the bits of a
normal (positive) zero, you get a negative zero. People expect
all zeros to compare equal, so the obvious way to build a ones
complement machine is to add logic that allows both the all-zeros
and all-ones bit patterns to compare equal. You can see that those
using the bit patterns to represent unsigned values, or just
patterns of bits without a value (such as flags) will be surprised
at that result. If you add a value, and its ones complement
(all bits inverted) the result is naturally all bits one, or
negative zero.

It turns out, though, that there is another way to build
such hardware. If you build the logic for a ones complement
subtractor, the most obvious way generates all zero bits
(positive zero) when subtracting equal values, including
both operands as positive zero. Even more, it is easy to
arrange such that subtracting the ones complement (invert
all bits) to do addition, generates positive zero, even
when both operands (before the complement) are positive zero.

For the case of the unary negation operator, you then don't
want to just invert all bits, but instead use the subtractor
with zero for the first operand. That will generate positive
zero negating positive zero, but otherwise generate the
appropriate complement.

If you add ones complement values using an unsigned adder,
and there is a carry out from the addition, you have to
add one to the sum. This is called end-around carry.
There is a similar operation for the subtractor. If you
are doing unsigned addition or subtraction using a ones
complement adder or subtractor, you need to correct for this.
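
As a sketch of end-around carry (a toy model of a 36-bit ones
complement adder, written in ordinary C with invented names, not
actual 2200 behavior):

#include <stdio.h>

#define W    36
#define MASK ((1ULL << W) - 1)

/* Add two W-bit ones complement words using an unsigned adder,
   folding any carry out of the top bit back into bit 0. */
static unsigned long long oc_add(unsigned long long a,
                                 unsigned long long b)
{
    unsigned long long sum = (a & MASK) + (b & MASK);
    if (sum >> W)   /* carry out? */
        sum += 1;   /* end-around carry */
    return sum & MASK;
}

int main(void)
{
    printf("%llo\n", oc_add(5, ~3ULL & MASK));  /* 2: 5 + (-3) */
    printf("%llo\n", oc_add(1, ~1ULL & MASK));  /* all ones: negative zero */
    return 0;
}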

The result, for the 2200, is that it is easy to do unsigned
arithmetic with the largest value of 2*INT_MAX, but takes
a lot of extra work to get it right for the value with
all bits one.

And with 36 bits, all the values of an unsigned 32 bit integer
can be represented just fine.
Post by Ben Bacarisse
Post by h***@gmail.com
I suppose I could write them in Fortran notation, which
doesn't have an unsigned type. Then there will be no
confusion. I even had to look these up, as I never tried
NOT(0).EQ.0
will be true on some machines, but not the 2200.
But then we'd have to know what the Fortran standard says about these
operators. They might require or permit a compiler to do more than the
hardware offers, though I take it you know otherwise or you would not
have used that example.
The Fortran standard, to allow for sign magnitude or ones
complement machines, leaves that open. As with C, if you only
use values between zero and INT_MAX, everything is fine.

NOT(x) is defined to invert all the bits of x, including the
sign bit. In the case of negative values, the value is
implementation dependent, but the bit representation isn't.

There are no unsigned integer types, so the problem of
representing their values doesn't occur.

Note that Fortran also allows for radix other than two.
bartc
2017-07-11 20:15:15 UTC
Permalink
Raw Message
Post by h***@gmail.com
OK, the problem with stating bit patterns is that they have a length.
I could write 0xfffffffff if I wanted a 36 bit word of all ones, but
then it is fixed at 36 bits. ~0 is a word of all ones, of whatever
length an int is in the C system it is written in.
I wonder what ~0 does in a language with arbitrary precision integers?
--
bartc
s***@casperkitty.com
2017-07-11 20:53:56 UTC
Permalink
Raw Message
Post by bartc
I wonder what ~0 does in a language with arbitrary precision integers?
It should yield an infinite string of ones to the left of the radix point.
For any value of i, any number where the 'i' digits preceding the radix
point are all 1's will be congruent to -1 mod 2**i, and the only integer
which is congruent to -1 mod all those values is, in fact, -1.

From a storage perspective, the infinite string of ones poses no problem,
since any finite integer value will start with either an infinite string
of zeroes or an infinite string of ones. All one need do to accommodate
both possibilities is specify that all bits to the left of the leftmost
represented bit have the same value as that bit.
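
A sketch of that storage convention (names invented; the final
conversion assumes the usual two's complement reading):

#include <stdio.h>

/* Extend a w-bit pattern to 64 bits by replicating its leftmost
   represented bit, as described above. */
static long long extend(unsigned long long bits, int w)
{
    if ((bits >> (w - 1)) & 1)   /* leftmost represented bit is 1 */
        bits |= ~0ULL << w;      /* so all bits to its left are 1 */
    return (long long)bits;
}

int main(void)
{
    printf("%lld\n", extend(0xF, 4));  /* all ones extended: -1 */
    printf("%lld\n", extend(0x7, 4));  /* top bit 0: stays 7 */
    return 0;
}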
h***@gmail.com
2017-07-11 21:59:15 UTC
Permalink
Raw Message
On Tuesday, July 11, 2017 at 1:15:20 PM UTC-7, Bart wrote:

(snip, I wrote)
Post by bartc
Post by h***@gmail.com
OK, the problem with stating bit patterns is that they have a length.
I could write 0xfffffffff if I wanted a 36 bit word of all ones, but
then it is fixed at 36 bits. ~0 is a word of all ones, of whatever
length an int is in the C system it is written in.
I wonder what ~0 does in a language with arbitrary precision integers?
Some years ago, I asked here about an arbitrary, grows as needed,
integer type. The reply was that it couldn't be legal C.

But if you really want one, you add a complement bit.

In Python (or maybe Numpy, I always forget) when you transpose a
matrix, it sets the T bit, and doesn't actually move anything.
Array references check the T bit to index the right way.

Which reminds me that someone else noted that Unisys still
sells descendants of the Burroughs machines. These use a type,
or tag, field on data in storage, indicating what is stored there.
Instructions and data have different tags, such that you can't
overwrite instructions, and then execute data. Array bounds are
always checked.
Keith Thompson
2017-07-11 22:09:53 UTC
Permalink
Raw Message
Post by h***@gmail.com
(snip, I wrote)
Post by bartc
Post by h***@gmail.com
OK, the problem with stating bit patterns is that they have a length.
I could write 0xfffffffff if I wanted a 36 bit word of all ones, but
then it is fixed at 36 bits. ~0 is a word of all ones, of whatever
length an int is in the C system it is written in.
I wonder what ~0 does in a language with arbitrary precision integers?
Some years ago, I asked here about an arbitrary, grows as needed,
integer type. The reply was that it couldn't be legal C.
Such a type can certainly be implemented as a library (there are
numerous such implementations), but it can't be a C integer type.
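
GMP is one widely used example; a minimal use looks something like
this:

#include <stdio.h>
#include <gmp.h>

int main(void)
{
    mpz_t n;
    mpz_init_set_ui(n, 1);
    mpz_mul_2exp(n, n, 100);   /* n = 2^100, beyond any C integer type */
    gmp_printf("%Zd\n", n);
    mpz_clear(n);
    return 0;
}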

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2017-07-11 22:25:19 UTC
Permalink
Raw Message
Post by Keith Thompson
Such a type can certainly be implemented as a library (there are
numerous such implementations), but it can't be a C integer type.
Such a thing cannot be a C object, since the maximum number of states that
can be encapsulated by an object of type T is UCHAR_MAX+1 to the power
sizeof (T), and both UCHAR_MAX and sizeof (T) are required to be finite.
Tim Rentsch
2017-07-12 10:14:15 UTC
Permalink
Raw Message
Post by s***@casperkitty.com
Post by Keith Thompson
Such a type can certainly be implemented as a library (there are
numerous such implementations), but it can't be a C integer type.
You've really outdone yourself this time - oversnipping to the
point where there is no context left at all. Bravo!
Post by s***@casperkitty.com
Such a thing cannot be a C object, since the maximum number of states that
can be encapsulated by an object of type T is UCHAR_MAX+1 to the power
sizeof (T), and both UCHAR_MAX and sizeof (T) are required to be finite.
Of course it can. Don't be stupid.
Keith Thompson
2017-07-12 16:33:07 UTC
Permalink
Raw Message
[...]

(Context: Arbitrary precision integer types.)
Post by Tim Rentsch
Post by s***@casperkitty.com
Such a thing cannot be a C object, since the maximum number of states that
can be encapsulated by an object of type T is UCHAR_MAX+1 to the power
sizeof (T), and both UCHAR_MAX and sizeof (T) are required to be finite.
Of course it can. Don't be stupid.
I suspect what supercat meant is that an arbitrary precision integer
cannot be a self-contained fixed-size C object. If so, I'm not sure
what his point adds to the discussion, even though it's valid.

If this works reliably:

huge_integer obj = some_value;
obj = huge_add(obj, 1);

then a huge_integer object cannot contain within itself all the
information needed to represent a huge integer value.

It could be an array type, but that would place a substantial memory
management burden on code that uses it. Much more commonly, it could
be, or it could contain, a pointer that refers to some dynamic data
structure. (The values such a structure can represent will still be
limited by available memory and address space, unless it uses files.)
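
Something along these lines, with invented names (a sketch of the
pointer-based layout, not a complete implementation):

#include <stddef.h>

typedef struct {
    size_t ndigits;    /* how many limbs are in use */
    int    sign;       /* -1, 0, or +1 */
    unsigned *limbs;   /* dynamically allocated magnitude */
} huge_integer;

Assigning one huge_integer to another copies ndigits, sign, and the
pointer, but not the limbs it points to - which is exactly the
memory management burden mentioned above.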
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2017-07-12 16:47:36 UTC
Permalink
Raw Message
Post by Keith Thompson
It could be an array type, but that would place a substantial memory
management burden on code that uses it. Much more commonly, it could
be, or it could contain, a pointer that refers to some dynamic data
structure. (The values such a structure can represent will still be
limited by available memory and address space, unless it uses files.)
The C Standard makes pretty clear that if X and Y are disjoint objects
of *any* type T, then X can be copied to Y via:

for (int i=0; i<sizeof X; i++)
((unsigned char*)&Y)[i] = ((unsigned char*)&X)[i] & UCHAR_MAX;

but I can't think of any way an implementation could uphold that
guarantee if T supported more than (UCHAR_MAX+1) ** sizeof (T) states.

I can imagine uses for a "safe" version of C which allows for the
possibility that values of type "unsigned char" could hold extra state
which would be retained/copied if the code were written as:

for (int i=0; i<sizeof X; i++)
((unsigned char*)&Y)[i] = ((unsigned char*)&X)[i];

but which could get stripped off when the values are converted to other
integer types, including in arithmetic promotions. Such an implementation
could make it possible to "forge" pointers via bit manipulation, and thus
assure that all invalid pointer operations will trap. Such an implementation
could not be conforming with the present standard, however.
Keith Thompson
2017-07-12 17:11:41 UTC
Permalink
Raw Message
Post by s***@casperkitty.com
Post by Keith Thompson
It could be an array type, but that would place a substantial memory
management burden on code that uses it. Much more commonly, it could
be, or it could contain, a pointer that refers to some dynamic data
structure. (The values such a structure can represent will still be
limited by available memory and address space, unless it uses files.)
The C Standard makes pretty clear that if X and Y are disjoint objects
for (int i=0; i<sizeof X; i++)
((unsigned char*)&Y)[i] = ((unsigned char*)&X)[i] & UCHAR_MAX;
i should be of type size_t, not int, and I don't know why you think the
"& UCHAR_MAX" is needed.
Post by s***@casperkitty.com
but I can't think of any way an implementation could uphold that
guarantee if T supported more than (UCHAR_MAX+1) ** sizeof (T) states.
An object X can hold at most CHAR_BIT & sizeof (X) bits of
information. That's fairly obvious, I don't think anyone has
suggested otherwise, and I don't know why you insist on restating
it in more complicated terms.
Post by s***@casperkitty.com
I can imagine uses for a "safe" version of C which allows for the
possibility that values of type "unsigned char" could hold extra state
[snip]

Perhaps, but I fail to see the relevance.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Ike Naar
2017-07-12 17:39:12 UTC
Permalink
Raw Message
Post by Keith Thompson
Post by s***@casperkitty.com
Post by Keith Thompson
It could be an array type, but that would place a substantial memory
management burden on code that uses it. Much more commonly, it could
be, or it could contain, a pointer that refers to some dynamic data
structure. (The values such a structure can represent will still be
limited by available memory and address space, unless it uses files.)
The C Standard makes pretty clear that if X and Y are disjoint objects
for (int i=0; i<sizeof X; i++)
((unsigned char*)&Y)[i] = ((unsigned char*)&X)[i] & UCHAR_MAX;
i should be of type size_t, not int, and I don't know why you think the
"& UCHAR_MAX" is needed.
Post by s***@casperkitty.com
but I can't think of any way an implementation could uphold that
guarantee if T supported more than (UCHAR_MAX+1) ** sizeof (T) states.
An object X can hold at most CHAR_BIT & sizeof (X) bits of
information.
The '&' should probably be a '*'.
Keith Thompson
2017-07-12 19:22:53 UTC
Permalink
Raw Message
[...]
Post by Ike Naar
Post by Keith Thompson
An object X can hold at most CHAR_BIT & sizeof (X) bits of
information.
The '&' should probably be a '*'.
Yes, thanks.

Stupid fingers!
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2017-07-12 17:44:49 UTC
Permalink
Raw Message
Post by Keith Thompson
i should be of type size_t, not int, and I don't know why you think the
"& UCHAR_MAX" is needed.
You're right about the size of the object, but my point with UCHAR_MAX isn't
that it's needed, but that it is guaranteed not to have any adverse effect.
If one had an implementation where every storage location was tagged with
its type, and "unsigned char" could hold such tags in addition to its
numerical value, such an implementation could *almost* be conforming. It
would be tripped up, however, by the fact that an "unsigned char" is
forbidden from holding any observable state other than a number in 0..UCHAR_MAX.
Post by Keith Thompson
An object X can hold at most CHAR_BIT & sizeof (X) bits of
information. That's fairly obvious, I don't think anyone has
suggested otherwise, and I don't know why you insist on restating
it in more complicated terms.
If C were to add an arbitrary-precision type to the language, it could not
behave in a fashion consistent with how other types behave, and any
aggregates containing such types would behave in a fashion contrary to how
any other aggregates behave.
Post by Keith Thompson
Post by s***@casperkitty.com
I can imagine uses for a "safe" version of C which allows for the
possibility that values of type "unsigned char" could hold extra state
[snip]
Perhaps, but I fail to see the relevance.
Such a version of C could encapsulate an arbitrary amount of information
in things that behave like C objects.
Keith Thompson
2017-07-12 19:26:05 UTC
Permalink
Raw Message
Post by s***@casperkitty.com
Post by Keith Thompson
i should be of type size_t, not int, and I don't know why you think the
"& UCHAR_MAX" is needed.
You're right about the size of the object, but my point with UCHAR_MAX isn't
that it's needed, but that it is guaranteed not to have any adverse effect.
So is "+ 0".
Post by s***@casperkitty.com
If one had an implementation where every storage location was tagged with
its type, and "unsigned char" could hold such tags in addition to its
numerical value, such an implementation could *almost* be conforming.
Or, in other words, not conforming.

[snip]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith Thompson
2017-07-11 22:11:00 UTC
Permalink
Raw Message
bartc <***@freeuk.com> writes:
[...]
Post by bartc
I wonder what ~0 does in a language with arbitrary precision integers?
In Python 3:

Python 3.5.2+ (default, Sep 22 2016, 12:18:14)
[GCC 6.2.0 20160927] on linux
Type "help", "copyright", "credits" or "license" for more information.
Post by bartc
print(~0)
-1

For a language with arbitrary precision *unsigned* integers, it could be
an issue.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Ben Bacarisse
2017-07-12 00:23:02 UTC
Permalink
Raw Message
Post by bartc
Post by h***@gmail.com
OK, the problem with stating bit patterns is that they have a length.
I could write 0xfffffffff if I wanted a 36 bit word of all ones, but
then it is fixed at 36 bits. ~0 is a word of all ones, of whatever
length an int is in the C system it is written in.
I wonder what ~0 does in a language with arbitrary precision integers?
I'm sure it varies, but in Haskell:

Prelude> import Data.Bits
Prelude Data.Bits> complement 0
-1
--
Ben.
Ben Bacarisse
2017-07-12 00:47:03 UTC
Permalink
Raw Message
Post by h***@gmail.com
(snip, I wrote)
Post by Ben Bacarisse
Post by h***@gmail.com
Post by Ben Bacarisse
~0U == 0U
(snip, and later)
Post by Ben Bacarisse
Post by h***@gmail.com
x==y && x+1==y+1
should work, even when two different zero values compare equal.
Now I don't know if you are talking about C in general, some specific C
implementation, or trying to explain some hardware property using C-like
notation.
OK, the problem with stating bit patterns is that they have a length.
I could write 0xfffffffff if I wanted a 36 bit word of all ones, but
then it is fixed at 36 bits. ~0 is a word of all ones, of whatever
length an int is in the C system it is written in.
But 0xfffffffff is more than just 36 one bits: it's a C constant with a type
almost certainly unsigned (it depends on a few details of the
implementation). Thus 0xfffffffff + 1 won't overflow because C's
unsigned arithmetic does not. On a 36-bit machine the value must be 0.
~0 + 1 may cause a trap. It might have value 0. It might have value 1.
It might have value INT_MIN+1. This is the problem with using C to talk
about bit patterns -- it says too much.

<snip>
Post by h***@gmail.com
Post by Ben Bacarisse
Often you can just say it in words (maybe "+ve and -ve zero don't
compare equal on the 2200"), or you can invent a notation to describe
bit patterns, maybe using # instead of 0x and/or separating the sign bit
off to make things clearer: 1#000....000 and so on. Sure, you need to
give a quick description of what you mean, but there is likely to be
less confusion than writing, say, a C hex constant. These don't
describe bit patterns -- they denote integer values.
Well, they describe bit patterns when used with bitwise operators,
Not really. You can't unambiguously describe the bit pattern of a
negative zero using C hex constants. You might write 0x800000000, but
that's actually a large positive value. People will, with luck, know
what you mean but it introduces some ambiguity.

<snip>
--
Ben.
Scott Lurndal
2017-07-12 12:45:03 UTC
Permalink
Raw Message
Post by h***@gmail.com
(snip, I wrote)
Post by Ben Bacarisse
Post by h***@gmail.com
Post by Ben Bacarisse
~0U == 0U
(snip, and later)
Post by Ben Bacarisse
Post by h***@gmail.com
x==y && x+1==y+1
should work, even when two different zero values compare equal.
Now I don't know if you are talking about C in general, some specific C
implementation, or trying to explain some hardware property using C-like
notation.
OK, the problem with stating bit patterns is that they have a length.
I could write 0xfffffffff if I wanted a 36 bit word of all ones, but
then it is fixed at 36 bits. ~0 is a word of all ones, of whatever
length an int is in the C system it is written in.
If you don't like C hex constants, I sometimes write them in
the IBM S/360 and successor assembler form, X'FFFFFFFFF' which seems
obvious enough for just about everyone.
Personally, I like the Verilog syntax <bit-length><radix><value>,
for example:
3b011
8hff
h***@gmail.com
2017-07-12 12:55:59 UTC
Permalink
Raw Message
On Wednesday, July 12, 2017 at 5:45:19 AM UTC-7, Scott Lurndal wrote:

(snip, I wrote)
Post by Scott Lurndal
Post by h***@gmail.com
OK, the problem with stating bit patterns is that they have a length.
I could write 0xfffffffff if I wanted a 36 bit word of all ones, but
then it is fixed at 36 bits. ~0 is a word of all ones, of whatever
length an int is in the C system it is written in.
If you don't like C hex constants, I sometimes write them in
the IBM S/360 and successor assembler form, X'FFFFFFFFF' which seems
obvious enough for just about everyone.
Personally, I like the Verilog syntax <bit-length><radix><value>,
3b011
8hff
It is actually 3'b011 and 8'hff.

I don't mind them, but sometimes put in x instead of h, as so
many other languages use x for hex constants.

-------------------------------------------------------

IBM OS/360 and later VAX/VMS Fortran use Zxxxx for hex constants,
only allowed in DATA statements. When added to the standard,
and allowed outside DATA statements, Z'xxxx'.

They can be used as bit patterns for other than integer
values, as arguments to the appropriate conversion function
such as REAL.
James R. Kuyper
2017-07-12 15:08:55 UTC
Permalink
Raw Message
Post by h***@gmail.com
(snip, I wrote)
Post by Ben Bacarisse
Post by h***@gmail.com
Post by Ben Bacarisse
~0U == 0U
(snip, and later)
Post by Ben Bacarisse
Post by h***@gmail.com
x==y && x+1==y+1
should work, even when two different zero values compare equal.
Now I don't know if you are talking about C in general, some specific C
implementation, or trying to explain some hardware property using C-like
notation.
OK, the problem with stating bit patterns is that they have a length.
I could write 0xfffffffff if I wanted a 36 bit word of all ones, but
then it is fixed at 36 bits. ~0 is a word of all ones, of whatever
length an int is in the C system it is written in.
If you don't like C hex constants, I sometimes write them in
The key problem with using C hex constants to represent bit patterns is
that they all have a specific type. Unless that type is explicitly
specified using the U, L, LL (or u, l, and ll) suffixes, it is
implementation-specific, by reason of depending upon the values of
[U]INT_MAX, [U]LONG_MAX, and [U]LLONG_MAX. There's no way to force the
constant to be signed, and there's no way to write a hex constant of any
type with a lower conversion rank than 'int', or an extended integer type.

Using a C hex constant to describe the bit pattern of a value which is
not of the same type as that constant is just asking for confusion. And
if that bit pattern includes a sign bit, it's absolutely guaranteed to
not have the same type.
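
One way to watch those rules in action is C11's _Generic (a sketch;
the comments assume 32-bit int and hence describe only one possible
outcome):

#include <stdio.h>

#define TYPE_NAME(x) _Generic((x),                \
    int: "int",                                   \
    unsigned int: "unsigned int",                 \
    long: "long",                                 \
    unsigned long: "unsigned long",                \
    long long: "long long",                       \
    unsigned long long: "unsigned long long",     \
    default: "something else")

int main(void)
{
    puts(TYPE_NAME(0x7fffffff)); /* int: it fits INT_MAX */
    puts(TYPE_NAME(0xffffffff)); /* unsigned int: hex constants may go unsigned */
    puts(TYPE_NAME(2147483648)); /* long (or long long): decimal never goes unsigned */
    return 0;
}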

There's a similar problem with using C operators to describe machine
language instructions that cannot be used to implement those operators,
by reason of not having behavior that's consistent with the C standard's
requirements for those operators.
s***@casperkitty.com
2017-07-12 16:10:25 UTC
Permalink
Raw Message
Post by James R. Kuyper
There's a similar problem with using C operators to describe machine
language instructions that cannot be used to implement those operators,
by reason of not having behavior that's consistent with the C standard's
requirements for those operators.
Likewise, there's no integer operator whose meaning would be to compute
2ⁿ as a "number" rather than as a value of some specific type. E.g. if one
wanted to say that an "unsigned long long" must be able to hold at least
2⁶⁴ distinct values, writing that value as 1<<64 wouldn't seem satisfactory,
because C89 defined the behavior of << in terms of bit patterns rather than
multiplication (and in a way inconsistent with multiplication on non-two's-
complement platforms), and C99 makes it Undefined Behavior in any case
where its result could differ from that of multiplication.
Keith Thompson
2017-07-12 16:39:56 UTC
Permalink
Raw Message
"James R. Kuyper" <***@verizon.net> writes:
[...]
Post by James R. Kuyper
The key problem with using C hex constants to represent bit patterns is
that they all have a specific type. Unless that type is explicitly
specified using the U, L, LL (or u, l, and ll) suffixes, it is
implementation-specific, by reason of depending upon the values of
[U]INT_MAX, [U]LONG_MAX, and [U]LLONG_MAX. There's no way to force the
constant to be signed, and there's no way to write a hex constant of any
type with a lower conversion rank than 'int', or an extended integer type.
Quibble: A suffixed constant can still be of an implementation-defined
type, though the choice of types is more tightly constrained. For
example, 0xffffffffffffffffL (16 fs) can be of type unsigned int,
unsigned long int or unsigned long long int, depending on the values of
UINT_MAX and ULONG_MAX. N1570 6.4.4.1.
Post by James R. Kuyper
Using a C hex constant to describe the bit pattern of a value which is
not of the same type as that constant is just asking for confusion. And
if that bit pattern includes a sign bit, it's absolutely guaranteed to
not have the same type.
And even if we assume that the number of digits, including leading 0s,
is significant, a hex constant can only represent a size that's a
multiple of 4 bits. (That's usually not a problem.)

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
James R. Kuyper
2017-07-12 17:26:41 UTC
Permalink
Raw Message
Post by Keith Thompson
[...]
Post by James R. Kuyper
The key problem with using C hex constants to represent bit patterns is
that they all have a specific type. Unless that type is explicitly
specified using the U, L, LL (or u, l, and ll) suffixes, it is
implementation-specific, by reason of depending upon the values of
[U]INT_MAX, [U]LONG_MAX, and [U]LLONG_MAX. There's no way to force the
constant to be signed, and there's no way to write a hex constant of any
type with a lower conversion rank than 'int', or an extended integer type.
Quibble: A suffixed constant can still be of an implementation-defined
type, though the choice of types is more tightly constrained. For
example, 0xffffffffffffffffL (16 fs) can be of type unsigned int,
"unsigned int" is not in the list provided by 6.4.4.1p5 for a
hexadecimal constant ending in "L".
Post by Keith Thompson
unsigned long int or unsigned long long int, depending on the values of
That list also includes long int and long long int. Did you intend to
use a U suffix, rather than an L one?
Post by Keith Thompson
UINT_MAX and ULONG_MAX. N1570 6.4.4.1.
I'd forgotten about the fact that an L suffix doesn't prevent the
constant from having the type long long or unsigned long long. I've
seldom, if ever, used a constant ending in L that was large enough for
that issue to come up.

What I should have said:
You can constrain an integer constant to have an unsigned type by using
a u or U suffix. Otherwise, you can force it to have signed type, by
using decimal format.
You can force a minimum rank for the type of an integer constant by
using the l, L, ll, or LL suffixes. You can force a maximum rank for the
type by making sure the constant is no larger than the minimum permitted
value of the corresponding *_MAX macro. Otherwise, the type depends upon
a comparison of the value of the constant with the
implementation-defined values of [U]INT_MAX, [U]LONG_MAX, and [U]LLONG_MAX.
Keith Thompson
2017-07-12 19:22:21 UTC
Permalink
Raw Message
Post by James R. Kuyper
Post by Keith Thompson
[...]
Post by James R. Kuyper
The key problem with using C hex constants to represent bit patterns is
that they all have a specific type. Unless that type is explicitly
specified using the U, L, LL (or u, l, and ll) suffixes, it is
implementation-specific, by reason of depending upon the values of
[U]INT_MAX, [U]LONG_MAX, and [U]LLONG_MAX. There's no way to force the
constant to be signed, and there's no way to write a hex constant of any
type with a lower conversion rank than 'int', or an extended integer type.
Quibble: A suffixed constant can still be of an implementation-defined
type, though the choice of types is more tightly constrained. For
example, 0xffffffffffffffffL (16 fs) can be of type unsigned int,
"unsigned int" is not in the list provided by 6.4.4.1p5 for a
hexadecimal constant ending in "L".
Post by Keith Thompson
unsigned long int or unsigned long long int, depending on the values of
That list also includes long int and long long int. Did you intend to
use a U suffix, rather than an L one?
I obviously hadn't had enough coffee.

0xffffffffffffffffU can be of type unsigned long int or unsigned long
long int, depending on the value of ULONG_MAX.

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2017-07-11 03:59:22 UTC
Permalink
Raw Message
Post by h***@gmail.com
It occurred to me some years ago, when C allowed for two possibilities
for division with negative dividend or divisor, that it wasn't so
likely that hardware would be built for one of those ways.
Specifically, that Fortran requires it to work one way, and that
hardware designers know that. Hardware expected to work with
Fortran would work that way. (Though it isn't so hard to fix
with some conditionals, and divide doesn't usually occur all
that often.) Note that Fortran was first implemented on
a sign magnitude machine (IBM 704) which might have led
to its choice on divide.
Actually, for most cases involving division by a constant, most platforms
could perform Euclidean division and modulus faster than they can
perform FORTRAN division and remainder (since unsigned division by a
constant can often be performed using a combination of a multiply and a
shift, and Euclidean signed division would require at most one or two
additions as well, but FORTRAN-style division requires extra conditional
logic).
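
A sketch of the difference (invented helper names; C's own / and %
truncate toward zero, as Fortran requires):

#include <stdio.h>

/* Euclidean division: the remainder is always in 0 <= r < |b|. */
static long long euc_div(long long a, long long b)
{
    long long q = a / b;      /* truncating quotient */
    if (a % b < 0)            /* negative remainder: adjust q */
        q -= (b > 0) ? 1 : -1;
    return q;
}

static long long euc_mod(long long a, long long b)
{
    long long r = a % b;
    if (r < 0)
        r += (b > 0) ? b : -b;
    return r;
}

int main(void)
{
    printf("%d %d\n", -7 / 2, -7 % 2);                      /* -3 -1 */
    printf("%lld %lld\n", euc_div(-7, 2), euc_mod(-7, 2));  /* -4  1 */
    return 0;
}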
Scott Lurndal
2017-07-11 12:46:49 UTC
Permalink
Raw Message
Post by h***@gmail.com
It might be that Univac, the predecessor to Unisys, produced
Burroughs bought Sperry[-Univac] in 1986 then changed its name to Unisys.
Post by h***@gmail.com
sign magnitude integer machines. Last I knew, Unisys is still
selling the 2200. This isn't ancient history.
The current Clearpath Dorado (2200 family) is being sold,
albeit they're all emulated using Intel processors.
Tim Rentsch
2017-07-10 22:18:05 UTC
Permalink
Raw Message
Post by Ben Bacarisse
[...] In C, ~0U == 0.
The only way that ~0U == 0 can be true is on a ones' complement system
where unsigned promotes to int. [...]
There is never a case where unsigned promotes to int. There was a
version of the Standard (or a draft Standard) where it seemed that
unsigned could promote to int if INT_MAX == UINT_MAX, but that
text was the result of an editing oversight, and corrected prior
to C11. The expression '~0U' is always of type unsigned, and
always UINT_MAX.
j***@verizon.net
2017-07-10 22:40:20 UTC
Permalink
Raw Message
Post by Tim Rentsch
Post by Ben Bacarisse
[...] In C, ~0U == 0.
The only way that ~0U == 0 can be true is on a ones' complement system
where unsigned promotes to int. [...]
There is never a case where unsigned promotes to int. There was a
version of the Standard (or a draft Standard) where it seemed that
unsigned could promote to int if INT_MAX == UINT_MAX, ...
I just issued a correction to a previous message based upon reaching that same conclusion. I'm going to be even more annoyed with myself if my correction was wrong.
Post by Tim Rentsch
... but that
text was the result of an editing oversight, and corrected prior
to C11. ...
I used the wording of n1570.pdf to reach that conclusion, so if they corrected it, I must have misinterpreted the corrected wording. What was the correction?
Tim Rentsch
2017-07-10 23:59:01 UTC
Permalink
Raw Message
Post by j***@verizon.net
Post by Tim Rentsch
Post by Ben Bacarisse
[...] In C, ~0U == 0.
The only way that ~0U == 0 can be true is on a ones' complement system
where unsigned promotes to int. [...]
There is never a case where unsigned promotes to int. There was a
version of the Standard (or a draft Standard) where it seemed that
unsigned could promote to int if INT_MAX == UINT_MAX, ...
I just issued a correction to a previous message based upon reaching
that same conclusion. I'm going to be even more annoyed with myself
if my correction was wrong.
Post by Tim Rentsch
... but that
text was the result of an editing oversight, and corrected prior
to C11. ...
I used the wording of n1570.pdf to reach that conclusion, so if they
corrected it, I must have misinterpreted the corrected wording. What
was the correction?
Actually there were two problems, and two corrections.

The original C99 had the problem that it did not cover enumerated
types that are compatible with int or unsigned int. This problem
was corrected in one of the post-C99 TC's (I don't know which
one), and the correction can be seen in N1256.

The text in N1256 is the one with the editing oversight. It
corrects the problem of not handling enumerated types compatible
with int or unsigned int (section 6.3.1.1 p2, near the end),
but then accidentally allows the pathological possibility of
unsigned int promoting to int. Oops.

The problem in N1256 is corrected in N1570 (the same subparagraph
of 6.3.1.1 p2). The only types eligible for promotion to int
or unsigned int are those with an integer conversion rank less
than /or equal to/ the integer conversion rank of int, _except_
int and unsigned int are excluded from that set.
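
A hedged illustration of that rank rule: unsigned short has rank below
int, so it is eligible for promotion, and which type it promotes to
depends on whether int can represent all of its values.

#include <stdio.h>

int main(void) {
    unsigned short us = 0xFFFF;
    /* Where int is wider than short (the common case), us promotes to
       signed int, so ~us is -65536 and this prints 0. On a 16-bit-int
       implementation, us would promote to unsigned int, ~us would be
       0, and this would print 1. */
    printf("%d\n", ~us == 0);
    return 0;
}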
Ben Bacarisse
2017-07-11 01:40:43 UTC
Post by Tim Rentsch
Post by Ben Bacarisse
[...] In C, ~0U == 0.
The only way that ~0U == 0 can be true is on a ones' complement system
where unsigned promotes to int. [...]
There is never a case where unsigned promotes to int. There was a
version of the Standard (or a draft Standard) where it seemed that
unsigned could promote to int if INT_MAX == UINT_MAX, but that
text was the result of an editing oversight, and corrected prior
to C11. The expression '~0U' is always of type unsigned, and
always UINT_MAX.
Yay! I am delighted to hear that -- that all-important parenthetical
remark "(other than int or unsigned int)" in 6.3.1.1 p2. I've been
wording replies to allow for that nonsensical possibility for some time
and now I need not do so.

Was this corrected in the published C99 or did it have to wait for C11?
--
Ben.
Tim Rentsch
2017-07-12 10:12:34 UTC
Post by Ben Bacarisse
Post by Tim Rentsch
Post by Ben Bacarisse
[...] In C, ~0U == 0.
The only way that ~0U == 0 can be true is on a ones' complement system
where unsigned promotes to int. [...]
There is never a case where unsigned promotes to int. There was a
version of the Standard (or a draft Standard) where it seemed that
unsigned could promote to int if INT_MAX == UINT_MAX, but that
text was the result of an editing oversight, and corrected prior
to C11. The expression '~0U' is always of type unsigned, and
always UINT_MAX.
Yay! I am delighted to hear that -- that all-important parenthetical
remark "(other than int or unsigned int)" in 6.3.1.1 p2. I've been
wording replies to allow for that nonsensical possibility for some time
and now I need not do so.
Was this corrected in the published C99 or did it have to wait for C11?
C99 had a different problem. That problem was "fixed" in one of
the TC's, but the change caused the "unsigned promoting to int"
anomaly (as seen in, eg, N1256). The second problem was noticed
and corrected in the leadup to C11 - the new wording can be seen
in N1570. Incidentally, N1570 also clarifies the rule with
respect to bitfields, where the result can depend on the width of
the bitfield in question.
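
A sketch of the bitfield point (the struct is illustrative, not from the
thread): a 4-bit unsigned bitfield's values all fit in int, so it
promotes to int, and the width is what decides that.

#include <stdio.h>

struct s { unsigned b : 4; };  /* values 0..15 all fit in int */

int main(void) {
    struct s x = { 15 };
    /* x.b promotes to (signed) int, so the subtraction goes negative.
       A full-width field such as unsigned b : 32 (with 32-bit int)
       would promote to unsigned int instead, and the expression would
       be a huge positive value. */
    printf("%d\n", x.b - 16);  /* prints -1 */
    return 0;
}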
Ben Bacarisse
2017-07-12 14:16:33 UTC
Post by Tim Rentsch
Post by Ben Bacarisse
Post by Tim Rentsch
Post by Ben Bacarisse
[...] In C, ~0U == 0.
The only way that ~0U == 0 can be true is on a ones' complement system
where unsigned promotes to int. [...]
There is never a case where unsigned promotes to int. There was a
version of the Standard (or a draft Standard) where it seemed that
unsigned could promote to int if INT_MAX == UINT_MAX, but that
text was the result of an editing oversight, and corrected prior
to C11. The expression '~0U' is always of type unsigned, and
always UINT_MAX.
Yay! I am delighted to hear that -- that all-important parenthetical
remark "(other than int or unsigned int)" in 6.3.1.1 p2. I've been
wording replies to allow for that nonsensical possibility for some time
and now I need not do so.
Was this corrected in the published C99 or did it have to wait for C11?
C99 had a different problem. That problem was "fixed" in one of
the TC's, but the change caused the "unsigned promoting to int"
anomaly (as seen in, eg, N1256). The second problem was noticed
and corrected in the leadup to C11 - the new wording can be seen
in N1570.
Ah, yes, I think you said that in a longer reply to someone else but I
was not paying proper attention.
Post by Tim Rentsch
Incidentally, N1570 also clarifies the rule with
respect to bitfields, where the result can depend on the width of
the bitfield in question.
Thanks.
--
Ben.
James R. Kuyper
2017-07-05 15:48:20 UTC
...
Post by James Harris
Post by James Kuyper
Post by James Harris
#include <stdio.h>
int main(void) {
To make your point somewhat more portably, it would be better to declare
this as "signed char c".
Post by James Harris
char c = -2;
This line involves a conversion from int to char. Assuming that char is
a signed type, it's guaranteed to be able to store that value, so c has
a value of -2.
Post by James Harris
printf("%08x, %08x, %08x\n", (int)c, (unsigned)c, (unsigned char)c);
(int)c has the value of c, converted to signed int, which is guaranteed
to be able to represent that value, so it remains unchanged, at -2.
(unsigned)c has the value (UINT_MAX+1)-2 == UINT_MAX-1. (unsigned char)c
would have the value (UCHAR_MAX+1)-2 == UCHAR_MAX-1.
The default argument promotions are applied to all arguments after the
format string. Those promotions leave the second and third arguments
unchanged. However, the fourth argument will be converted to int, except
in the unlikely case that UCHAR_MAX > INT_MAX,
in which case it will be
converted to unsigned int; either way, the value will be unchanged by
the conversion: it will still be UCHAR_MAX-1.
I find C's conversion rules strange where they depend on the
architecture: e.g. an unsigned short converted to an integer becomes a
signed int on most machines but an unsigned int on machines where short
is as wide as int. I'm not sure of the potential ramifications of that
for program portability but it seems inconsistent - though I can see
why the rule would exist. Perhaps the "problem" is automatic implicit
conversion between signed and unsigned integers.
Those rules make it feasible to efficiently and usefully implement C on
a wide variety of platforms, which is part of the reason why C is the
language that is, in fact, implemented on the widest variety of
platforms. The down side is that they make it somewhat harder to port
code between all of those different platforms.
Post by James Harris
Post by James Kuyper
The x format specifier expects a value of type unsigned int. The first
%x corresponds to an argument of type int, and on most systems, so will
the third %x. Such a type mis-match renders the behavior of your code
undefined. For positive values, unsigned int and int are required to
have the same representation, "which is intended to imply
interchangeability as arguments ...", but the value of the first
argument is negative, so that doesn't apply.
I'm not sure what you're trying to do, but printing an int value with a
%x specifier doesn't prove much of anything. Is there a format specifier
that takes an int value which would be suitable for making your point?
%i or %d would do but would be less clear. Are you saying that %x cannot
be used portably to print negative numbers in a hexadecimal form? UB
usually means that anything could happen, doesn't it? In this case, does
that technically mean more is at risk than just getting erroneous output?
What's the "right" way to print signed numbers as hex?
The designers of printf() apparently expected people to only want to use
hexadecimal (or octal) format for unsigned values. As a result, some of
the work that printf() does for you when handling "%d" will have to be
done by your own code when printing signed values in hexadecimal format.

printf("%s%08x", c < 0 ? "-" : "", (unsigned)(c < 0 ? -c : c));

Note that the expression "-c" has undefined behavior if c == INT_MIN on
a 2's complement system. Of course, if char is signed, that could only
happen if CHAR_MIN==INT_MIN, which is very unlikely, but it is
permitted. There might be a clever way to handle that corner case
without too much additional complication, but I can't justify spending
the time right now to figure out what that way is.
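
One way to handle that corner case, as a hedged sketch (the helper is
mine, not from the post): do the negation after converting to unsigned,
where wraparound is well defined, so -INT_MIN is never evaluated in
signed arithmetic.

#include <stdio.h>

/* Print a signed int as sign-plus-hex-magnitude; 0U - (unsigned)c is
   computed mod UINT_MAX+1, so even c == INT_MIN causes no signed
   overflow. */
static void print_signed_hex(int c) {
    unsigned mag = c < 0 ? 0U - (unsigned)c : (unsigned)c;
    printf("%s%08x\n", c < 0 ? "-" : "", mag);
}

int main(void) {
    print_signed_hex(-2);  /* prints -00000002 */
    return 0;
}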
James R. Kuyper
2017-07-05 17:38:40 UTC
I forgot to answer the following questions in my earlier response:

On 07/05/2017 10:20 AM, James Harris wrote:
...
Post by James Harris
%i or %d would do but would be less clear. Are you saying that %x cannot
be used portably to print negative numbers in a hexadecimal form?
Correct.
Post by James Harris
... UB
usually means that anything could happen, doesn't it? ...
More precisely, it means that the standard imposes no requirements on
the behavior. Other things (such as the laws of physics) might prevent a
given possibility, but the standard does not.
Post by James Harris
... In this case, does
that technically mean more is at risk than just getting erroneous output?
In principle, yes. In practice, the only plausible way I can see it
causing more serious problems is an optimization that I'm sure would
infuriate supercat: If, between two different points in the code where
the value of 'c' is changed, the code executes printf("%x", c), then the
compiler is permitted to optimize that code between those two points on
the assumption that c is not negative. The following code:

if(c < 0)
{
// code handling negative c
} else {
// code handling non-negative c
}

could be optimized to:

// code handling non-negative c
s***@casperkitty.com
2017-07-05 19:00:21 UTC
Post by James R. Kuyper
In principle, yes. In practice, the only plausible way I can see it
causing more serious problems is an optimization that I'm sure would
infuriate supercat: If, between two different points in the code where
the value of 'c' is changed, the code executes printf("%x", c), then the
compiler is permitted to optimize that code between those two points on
if(c < 0)
{
// code handling negative c
} else {
// code handling non-negative c
}
// code handling non-negative c
If "unsigned" has padding bits, one of those padding bits could be mapped
to the sign bit for "int". On some such platforms, an attempt to use a
%X format specifier with a negative value might plausibly trap or coerce
the value to unsigned.

On most platforms, the danger is essentially what you observe: that an
implementation might behave in goofy fashion solely because the Standard
would allow it, rather than for lack of a logical, useful, and practical
behavior.
j***@verizon.net
2017-07-05 20:09:19 UTC
Post by s***@casperkitty.com
Post by James R. Kuyper
In principle, yes. In practice, the only plausible way I can see it
causing more serious problems is an optimization that I'm sure would
infuriate supercat: If, between two different points in the code where
the value of 'c' is changed, the code executes printf("%x", c), then the
compiler is permitted to optimize that code between those two points on
if(c < 0)
{
// code handling negative c
} else {
// code handling non-negative c
}
// code handling non-negative c
If "unsigned" has padding bits, one of those padding bits could be mapped
to the sign bit for "int". On some such platforms, an attempt to use a
%X format specifier with a negative value might plausibly trap or coerce
the value to unsigned.
On most platforms, the danger is essentially what you observe: that an
implementation might behave in goofy fashion solely because the Standard
would allow it, rather than for lack of a logical, useful, and practical
behavior.
It's not "just because the Standard would allow it". It's because you have an obligation, as a programmer, to make sure that c is not negative before executing printf("%x", c). The compiler is entitled to assume that you've done whatever is necessary to satisfy that obligation, without bothering to check whether you actually did - because what you did to make sure that c is not negative could be far too subtle for it to figure out.

Given that assumption, the if(c<0) test is unnecessary, as is the code associated with the "true" branch of that if-statement. The optimization is being performed to avoid wasting space storing that unnecessary code, and to avoid wasting time executing the unnecessary condition test. It's not being done just because the standard allows it to be done.

Note: this optimization assumes that the code which ensures that c is not negative was written by someone smart enough to realize that this needed to be ensured, but that the code which tests whether c was negative was written by someone dumb enough to be unaware of the fact that this has already been arranged. This might seem like an unlikely combination - but most large projects are worked on by coders with a wide range of competence. We already know that whoever wrote the testing code was not very smart: if he had any uncertainty about whether or not c was negative, he should have moved the printf() expression into the else-clause of his if-statement:

if (c<0)
{
// code handling negative c
} else {
// code handling non-negative c, part 1
printf("%x", c);
// code handling non-negative c, part 2
}
s***@casperkitty.com
2017-07-05 23:13:55 UTC
Post by j***@verizon.net
Post by s***@casperkitty.com
On most platforms, the danger is essentially what you observe: that an
implementation might behave in goofy fashion solely because the Standard
would allow it, rather than for lack of a logical, useful, and practical
behavior.
It's not "just because the Standard would allow it". It's because you have an obligation, as a programmer, to make sure that c is not negative before executing printf("%x", c).
Not if using an implementation that promises to behave sanely in such cases.
Post by j***@verizon.net
The compiler is entitled to assume that you've done whatever is necessary to satisfy that obligation, without bothering to check whether you actually did - because what you did to make sure that c is not negative could be far too subtle for it to figure out.
If an implementation is intended to be suitable for purposes of running
code written for other implementations that offer certain behavioral
guarantees when practical, it should honor those guarantees when practical,
or else document that it cannot practically do so. The Standard makes no
effort to require that implementations be suitable for any particular
purpose, but if there's some reason for such behavior beyond "because the
Standard would allow it", answer me this: given two compilers:

- One guarantees that in the scenario at hand it will behave as though the
value were converted to unsigned.

- The other usually behaves as if the value were converted to unsigned,
but will occasionally process such cases in arbitrary and unpredictable
fashion when the Standard would allow it.

In what non-contrived scenarios would the second be preferable to the first?
Post by j***@verizon.net
Given that assumption, the if(c<0) test is unnecessary, as is the code associated with the "true" branch of that if-statement. The optimization is being performed to avoid wasting space storing that unnecessary code, and to avoid wasting time executing the unnecessary condition test. It's not being done just because the standard allows it to be done.
In what non-contrived situations would the kinds of inferences you describe
provide benefits that could not be achieved more easily, safely, and
effectively with directives that would explicitly invite compilers to make
targeted behavioral assumptions?
Post by j***@verizon.net
if (c<0)
{
// code handling negative c
} else {
// code handling non-negative c, part 1
printf("%x", c);
// code handling non-negative c, part 2
}
If c would be positive for all valid inputs, there would be no need for
the conditional on any implementation which guaranteed that "c" would simply
be treated as unsigned (nonsensical output in response to invalid input
would be acceptable provided no nuclear missiles were launched, etc.), and
adding the conditional would impair efficiency compared with using an
implementation which tries to be compatible with others when practical.
Keith Thompson
2017-07-06 00:00:41 UTC
Post by s***@casperkitty.com
Post by j***@verizon.net
Post by s***@casperkitty.com
On most platforms, the danger is essentially what you observe: that an
implementation might behave in goofy fashion solely because the Standard
would allow it, rather than for lack of a logical, useful, and practical
behavior.
It's not "just because the Standard would allow it". It's because you
have an obligation, as a programmer, to make sure that c is not
negative before executing printf("%x", c).
Not if using an implementation that promises to behave sanely in such cases.
Unless you, as a programmer, wish to write portable code, or code whose
behavior is defined by the C standard.

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
h***@gmail.com
2017-07-08 09:51:46 UTC
It seems that for the Unisys 2200 systems:

http://public.support.unisys.com/2200/docs/cp14.0/pdf/78310422-011.pdf
(page 2-9)

that in the default mode, UINT_MAX is 2**36-2, wrapping the negative ones complement values to positive values.

With a compiler option:

https://public.support.unisys.com/2200/docs/cp15.0/pdf/78310430-016.pdf
(page 2-29)

you can get the usual 0 to 2**36-1, where the bit pattern for negative zero is used for the largest value. This slows down execution (presumably a special test at each use), so is not the default.

In the default mode, the bit patterns for negative zero and positive zero compare as equal, as either int or unsigned int. I suspect, then, if cast to (unsigned) and printed with %x, one gets the hex value that represents the bit pattern, with the possible exception of 0x800000000.

Being a 36 bit machine, programs needing 32 bit unsigned integers might work just fine.
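
To spell out the arithmetic (a hedged sketch of what that UINT_MAX
implies; I have no 2200 to test on): conversion to unsigned adds
UINT_MAX+1, so -1 lands on 2**36-2 rather than on the all-ones 2**36-1.

/* Default mode as described above, UINT_MAX == 2**36 - 2: */
unsigned u = -1;  /* u == UINT_MAX == 0xFFFFFFFFE, not 0xFFFFFFFFF */
/* The all-ones pattern 0xFFFFFFFFF is ones' complement negative zero
   and is not a distinct unsigned value in this mode. */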
James Harris
2017-07-08 10:51:04 UTC
Post by h***@gmail.com
http://public.support.unisys.com/2200/docs/cp14.0/pdf/78310422-011.pdf
(page 2-9)
that in the default mode, UINT_MAX is 2**36-2,
Wow! And not a typo! Thanks for pointing that out. Quirky machines are
always of interest.
Post by h***@gmail.com
wrapping the negative ones complement values to positive values.
https://public.support.unisys.com/2200/docs/cp15.0/pdf/78310430-016.pdf
(page 2-29)
you can get the usual 0 to 2**36-1, where the bit pattern for negative zero is used for the largest value. This slows down execution (presumably a special test at each use), so is not the default.
In the default mode, the bit patterns for negative zero and positive zero compare as equal, as either int or unsigned int. I suspect, then, if cast to (unsigned) and printed with %x, one gets the hex value that represents the bit pattern, with the possible exception of 0x800000000.
Being a 36 bit machine, programs needing 32 bit unsigned integers might work just fine.
--
James Harris
Ben Bacarisse
2017-07-08 12:04:49 UTC
Post by James Harris
Post by h***@gmail.com
http://public.support.unisys.com/2200/docs/cp14.0/pdf/78310422-011.pdf
(page 2-9)
that in the default mode, UINT_MAX is 2**36-2,
Wow! And not a typo! Thanks for pointing that out. Quirky machines are
always of interest.
I'm guessing you don't follow c.l.c regularly, because that C
implementation (and others from Unisys) are the go-to examples of
oddball implementations.

But note that page 2-9 describes the compiler flag you need to use to
make this /not/ be the case. The default mode is non-conforming, though
that page does not give the details. The compiler can provide
conforming semantics, though at some cost because the hardware does not
play along.

<snip>
--
Ben.
s***@casperkitty.com
2017-07-08 22:16:18 UTC
Post by James Harris
Post by h***@gmail.com
http://public.support.unisys.com/2200/docs/cp14.0/pdf/78310422-011.pdf
(page 2-9)
that in the default mode, UINT_MAX is 2**36-2,
Wow! And not a typo! Thanks for pointing that out. Quirky machines are
always of interest.
The main reason I can see for a C implementation to use ones'-complement
math would be a lack of add and subtract operations that operate mod 2**n.
If instructions are available that would allow efficient unsigned operations
with UINT_MAX equal to 2*INT_MAX+1, those same instructions would be usable
for two's-complement arithmetic.

One thing I haven't seen documented on the Unisys machine is how its "long
long" type works. I would not be surprised if the memory representation
of "long long" uses something other than a straight binary representation,
and if a straight binary "unsigned long long" would be expensive to support
on that platform.

In any case, so far as I can tell there has never been and likely never will
be a C99 or C11 production implementation for anything other than two's-
complement machines, since efficient support for the mandatory "unsigned
long long" type would either require the ability to efficiently perform
multi-word arithmetic using a power-of-two base, a word size of exactly
64 bits and an ability to do math mod 2**64, or a word size larger than 64
bits. The first two options would imply an ability to use two's-complement
math; a ones'-complement or sign-magnitude machine with a 65+-bit word size
could support "unsigned long long" without being able to efficiently
handle two's-complement math, but I don't know that any such machines will
ever be used to run C.
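
For contrast, a minimal sketch (the names are mine) of the power-of-two
multi-word arithmetic being described: a 64-bit unsigned add built from
32-bit limbs, which is cheap exactly because each limb wraps mod 2**32.

#include <inttypes.h>
#include <stdio.h>

/* A 64-bit unsigned value as two 32-bit limbs, low word first. */
typedef struct { uint32_t lo, hi; } u64pair;

/* Addition mod 2**64: each limb add wraps mod 2**32, and the carry out
   of the low word is detected via well-defined unsigned wraparound.
   This is exactly what is awkward on a 36-bit ones'-complement machine. */
static u64pair add64(u64pair a, u64pair b) {
    u64pair r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);  /* carry if the low word wrapped */
    return r;
}

int main(void) {
    u64pair a = { 0xFFFFFFFFu, 0 }, b = { 1, 0 };
    u64pair s = add64(a, b);
    printf("%08" PRIx32 "%08" PRIx32 "\n", s.hi, s.lo);  /* 0000000100000000 */
    return 0;
}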