Keith Thompson
2021-05-24 02:14:09 UTC
requirements for the representation of _Bool. I've referred to the
C11 standard and to drafts of C17 and C2x (N2596). C11 and C17 do
not differ in this area as far as I can tell, but there are some
new things in the C2x proposal.
An object declared as type _Bool is large enough to store the values
0 and 1.
_Bool is an unsigned integer type.
The rank of _Bool shall be less than the rank of all other standard
integer types. This implies that the range of values of _Bool is
a subrange of the range of values of unsigned char. A _Bool object
cannot store a value less than 0 or greater than UCHAR_MAX.
When any scalar value is converted to _Bool, the result is 0 if the
value compares equal to 0; otherwise, the result is 1. This makes
it difficult to store a value other than 0 or 1 in a _Bool object,
but it can be done (or at least attempted) via type punning using
a union with _Bool and unsigned char members.
C11 footnote: "While the number of bits in a _Bool object is at least
CHAR_BIT, the width (number of sign and value bits) of a _Bool may be
just 1 bit." This acknowledges that _Bool *may* have more than one
value bit, and therefore may represent values other than 0 and 1.
N2596 drops the parenthesized clause (probably because _Bool has
no sign bit).
N2596 adds a macro BOOL_WIDTH to <limits.h>, "width for an object
of type _Bool". It is *at least* 1, implying again that it can
be greater than 1. (I don't see any implementation that defines
BOOL_WIDTH.)
(N2596 also changes the definitions of false and true in <stdbool.h>
so they're of type _Bool rather than int. This doesn't affect
representation.)
Conclusions:
sizeof (_Bool) >= 1. It may be greater than 1, but that would
be weird. If sizeof (_Bool) > 1, then it must have padding bits.
_Bool has no sign bit.
_Bool has *at least* one value bit. It may have more, but no more
than CHAR_BIT of them.
The standard allows some variations in how _Bool is represented.
C programmers would be well advised to avoid writing code for which
this matters.
A conforming implementation may do any of the following (I'll assume
for brevity that CHAR_BIT==8):
* _Bool has 8 value bits. Any value from 0 to 255 inclusive
is valid. Storing a value other than 0 or 1 can be done via
type punning using a union of a _Bool and an unsigned char.
* _Bool has 1 value bit and 7 padding bits, with 254 trap
representations. Using type punning to store a value other than
0 or 1 in a _Bool object, and then accessing that object's value,
results in undefined behavior.
* _Bool has 1 value bit, 7 padding bits, and no trap representations.
Since padding bits by definition do not contribute to the value,
only the value bit's value is relevant. Using type punning to store
a value other than 0 or 1 in a _Bool object gives it a value of 0
if the value is even, 1 if the value is odd.
Other variations are possible (and arguably silly). For example, _Bool
might have 4 value bits and 4 padding bits, or it might be bigger than
1 byte. I expect that kind of thing only on the DeathStation 9000.
Here's a small program that attempts to explore how an implementation
represents objects of type _Bool:
#include <stdio.h>
#include <limits.h>
union U {
    _Bool b;
    unsigned char rep;
};

int main(void) {
    union U obj;
    _Bool b;
    for (obj.rep = 0; obj.rep <= 3; obj.rep++) {
        printf("obj.b = %d, which is %s, obj.rep = %d",
               obj.b, obj.b ? "true " : "false", obj.rep);
        b = obj.b;
        printf(" ... b = %d, which is %s\n", b, b ? "true " : "false");
    }
}
Using gcc 11.1.0, on Ubuntu 20.04 x86_64, I get this output:
obj.b = 0, which is false, obj.rep = 0 ... b = 0, which is false
obj.b = 1, which is true , obj.rep = 1 ... b = 1, which is true
obj.b = 2, which is true , obj.rep = 2 ... b = 2, which is true
obj.b = 3, which is true , obj.rep = 3 ... b = 3, which is true
This mostly looks like _Bool has 8 value bits, but if that were the
case, then I *think* that the value of b would always be 0 or 1.
The rules of simple assignment (b = obj.b) specify that the value
of the right operand is converted to the type of the assignment
expression. Converting *any* scalar value to _Bool yields 0 or 1,
even if the value is already of type _Bool. So I conclude that
for gcc, 2 and 3 (and probably anything other than 0 or 1) are
trap representations for _Bool, and that _Bool has 1 value bit,
7 padding bits, and 254 trap representations.
It's possible that the intent is for _Bool to have 8 value bits and
that the gcc authors' interpretation of the requirements for simple
assignment differs from mine. (I won't presume to say who's right.)
Using clang 12.0.0 on the same system, I get:
obj.b = 0, which is false, obj.rep = 0 ... b = 0, which is false
obj.b = 1, which is true , obj.rep = 1 ... b = 1, which is true
obj.b = 0, which is false, obj.rep = 2 ... b = 0, which is false
obj.b = 1, which is true , obj.rep = 3 ... b = 1, which is true
All bits other than the low-order one are ignored. This is
consistent with _Bool having 1 value bit, 7 padding bits, and no
trap representations. It's also consistent with 2 and 3 being
trap representations, since that would cause undefined behavior.
It's not consistent with _Bool having more than 1 value bit.
When implementers add support for BOOL_WIDTH, they'll have to decide
explicitly how many value bits _Bool has.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */