sizeof struct with flexible array: when did it change?

Kaz Kylheku

2024-10-07 02:32:13 UTC

C99 said that the size of a structure that ends in a flexible array
member is the same as the offset of that flexible member in a
similar structure in which the array has some unspecified size.

The latest draft says that the size is calculated as if the flexible
array member were omitted, except that there may be more padding than
the omission would imply.

I can't think of a reasonable interpretation of the original
wording which would allow the size to be other than the offset
of the array, when the array is of a character type.

The current wording clearly does allow the size to go beyond the offset
in that case.

Don't get burned: don't rely on the size of a flexible array struct.
Use the offsetof of that flexible member.
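
A minimal sketch of the advice above, sizing an allocation from offsetof rather than sizeof. The type struct msg and the helpers msg_alloc_size and msg_new are hypothetical names invented for illustration. The clamp to sizeof is an extra precaution assumed by this sketch, not something stated in the thread.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example type; not taken from the thread. */
struct msg {
    int  id;
    char tag;
    char text[];            /* flexible array member */
};

/* Bytes to request for a struct msg that stores n chars in text.
   Based on offsetof, per the advice above. The clamp to sizeof is a
   precaution assumed by this sketch, so the allocation is never
   smaller than the struct type itself. */
size_t msg_alloc_size(size_t n)
{
    size_t need = offsetof(struct msg, text) + n;
    return need < sizeof(struct msg) ? sizeof(struct msg) : need;
}

/* Allocate a msg holding a copy of s, or return NULL on failure. */
struct msg *msg_new(int id, const char *s)
{
    size_t len = strlen(s) + 1;          /* include the terminator */
    struct msg *m = malloc(msg_alloc_size(len));
    if (m != NULL) {
        m->id = id;
        m->tag = 0;
        memcpy(m->text, s, len);
    }
    return m;
}
```

A caller would write something like msg_new(7, "hello") and later free the result. Sizing with sizeof (struct msg) + len would also be safe here, merely requesting any trailing padding bytes as well.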

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca

Tim Rentsch

2024-10-07 03:43:52 UTC

Post by Kaz Kylheku
C99 said that the size of a structure that ends in a flexible array
member is the same as the offset of that flexible member in a
similar structure in which the array has some unspecified size.
The latest draft says that the size is calculated as if the flexible
array member were omitted, except that there may be more padding than
the omission would imply.

The change was made in a TC to C99, sometime in the early
2000s. (No, I don't know which TC specifically, but the
wording change can be seen in N1256.)

Nick Bowler

2024-10-07 18:32:33 UTC

Post by Kaz Kylheku
I can't think of a reasonable interpretation of the original wording
which would allow the size to be other than the offset of the array,
when the array is of a character type.
The current wording clearly does allow the size to go beyond the offset
in that case.

The original wording includes no requirement that the offset of the
replacement array used for the size calculation has any relationship
whatsoever with the offset of the flexible array member.

For example, in

struct foo { int a; char b; char c[]; };

in many real-world implementations the offset of c is 5 but the size
of the structure is 8. On these implementations, the size matches the
offset of c in a similar structure where c is replaced by a length-1
array of int, and also matches the size of a similar structure with c
deleted, so this is consistent with old and new wordings.
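
The comparison in this example can be checked directly. The sketch below prints the four quantities mentioned; struct foo is from the post, while the names foo_del, foo_int1 and print_foo_layout are made up for illustration. On the implementations the post describes (4-byte int), one would expect 5 for the first line and 8 for the other three, but the values are implementation-specific.

```c
#include <stddef.h>
#include <stdio.h>

struct foo      { int a; char b; char c[]; };    /* from the post above */
struct foo_del  { int a; char b; };              /* c deleted */
struct foo_int1 { int a; char b; int c[1]; };    /* c replaced by int[1] */

/* Print the four quantities the post compares. */
void print_foo_layout(void)
{
    printf("offsetof(struct foo, c)      %zu\n", offsetof(struct foo, c));
    printf("sizeof(struct foo)           %zu\n", sizeof(struct foo));
    printf("sizeof(struct foo_del)       %zu\n", sizeof(struct foo_del));
    printf("offsetof(struct foo_int1, c) %zu\n", offsetof(struct foo_int1, c));
}
```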

I don't think the updated wording alters any implementation requirement,
but it does seem quite a bit less complicated to explain.

Post by Kaz Kylheku
Don't get burned: don't rely on the size of a flexible array struct.
Use the offsetof of that flexible member.

An evil compiler could probably make the size less than the offset
of the flexible array member and be conforming, with both old and
new wordings. This would break some examples but an evil compiler
obviously won't care about non-normative trivialities like examples.

So you need to use offsetof when porting to the DeathStation 9000.

Otherwise, avoid evil compilers; the handful of extra bytes added to
some malloc calls probably makes no practical difference.

Kaz Kylheku

2024-10-07 23:42:41 UTC

Post by Nick Bowler

The original wording includes no requirement that the offset of the
replacement array used for the size calculation has any relationship
whatsoever with the offset of the flexible array member.

But there is no reason why they would be different.

Post by Nick Bowler
For example, in
struct foo { int a; char b; char c[]; };
in many real-world implementations the offset of c is 5 but the size
of the structure is 8.

How that can be is clear under the new wording. The most strictly
aligned member is the int a. Assuming sizeof(int) is 4, that calls
for 3 bytes of padding after b.

Post by Nick Bowler
On these implementations, the size matches the
offset of c in a similar structure where c is replaced by a length-1
array of int, and also matches the size of a similar structure with c
deleted, so this is consistent with old and new wordings.

array of int, what? The element type of the replacement array must be
the same:

"First, the size of the structure shall be equal to the offset of the
last element of an otherwise identical structure that replaces the
flexible array member with an array of unspecified length.(106)
---
106. The length is unspecified to allow for the fact that
implementations may give array members different
alignments according to their lengths."

The structure being "otherwise identical" means only the array size
varies in an unspecified manner.

The alignment changing for an array of char, due to variation in size,
makes no sense.

Those compilers you refer to do *not* vary the alignment of a char
array based on its size. For any X from, say, 1 to 256,
the offset of c will be 5 in the following:

struct foo { int a; char b; char c[X]; };

The original C99 wording therefore requires them to give a size of 5
to the flexible-array version of this structure. And that of course
causes an alignment problem if the structure is arrayed.
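
The claim that the offset of c does not depend on the array length can be tested on a given compiler. This is a sketch only: the DEFINE_FOO macro, the struct tags it generates, and the function name are hypothetical, and only a few lengths are sampled rather than every X from 1 to 256.

```c
#include <stddef.h>

/* Same members as the post's struct foo, with a fixed array length X. */
#define DEFINE_FOO(X) struct foo_##X { int a; char b; char c[X]; }

DEFINE_FOO(1);
DEFINE_FOO(2);
DEFINE_FOO(16);
DEFINE_FOO(256);

/* Returns 1 if the offset of c is the same for every length tried. */
int char_array_offset_is_stable(void)
{
    size_t off = offsetof(struct foo_1, c);
    return off == offsetof(struct foo_2, c)
        && off == offsetof(struct foo_16, c)
        && off == offsetof(struct foo_256, c);
}
```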

The real solution to all this would have been to specify that
structures which have a flexible array member, directly or
recursively, are incomplete types.

And thus:

sizeof (struct foo) -> constraint violation

{ struct foo local_foo; } -> constraint violation

typedef struct foo foo_array[42]; -> constraint violation

Don't support taking the size, defining objects, or making arrays.

The present wording does allow sane use of these structures as array
elements.

What GCC seems to be doing is simply nothing special. When determining
the most strictly aligned member of the struct, it takes the flexible
array into account (the alignment of its element type). It otherwise
ignores it (or perhaps treats it as a size zero subobject). The
structure is padded after that for the sake of the most strictly aligned
member.

If it were specified that way in ISO C, it would be an improvement:
that the array's element type is used when determining what is the
most strictly aligned member of the structure, but the array is
otherwise considered deleted.
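
If the rule described above is accurate, the size can be predicted by padding the end of the last ordinary member up to the struct's alignment. The sketch below checks that prediction for a hypothetical struct fd whose flexible array has a stricter alignment than its other member. All names are invented for illustration, and _Alignof assumes C11 or later.

```c
#include <stddef.h>

struct fd     { char tag; double vals[]; };   /* hypothetical example */
struct fd_del { char tag; };                  /* same, array deleted */

/* Round n up to the next multiple of align (align > 0). */
size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* Size predicted by the rule described above: take the end of the last
   ordinary member and pad it to the struct's alignment, which already
   accounts for the flexible array's element type. */
size_t predicted_fd_size(void)
{
    size_t end_of_fixed = offsetof(struct fd, tag) + sizeof(char);
    return round_up(end_of_fixed, _Alignof(struct fd));
}
```

Where double is more strictly aligned than char, sizeof (struct fd) would be expected to exceed sizeof (struct fd_del), which is the "more padding than the omission would imply" case in the current wording.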

Post by Nick Bowler
I don't think the updated wording alters any implementation requirement,
but it does seem quite a bit less complicated to explain.

Don't get burned: don't rely on the size of a flexible array struct.
Use the offsetof of that flexible member.

If the size is anything other than what the program expects, whether
it is larger or smaller, that breaks the program.

For instance, if the wrong value is used when displacing a pointer to
the flexible member to recover a pointer to the struct.

This issue showed up in exactly one program of mine in which I
experimented with using the flexible array member.

It was reported by a user who ran into a crash.

In all previous coding I have always used the [1] struct hack,
or else something like this:

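#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define FLEX_ARRAY              /* C99: true flexible array member */
#elif defined(__GNUC__)
#define FLEX_ARRAY 0            /* GNU C zero-length array extension */
#else
#define FLEX_ARRAY 1            /* classic [1] struct hack */
#endif

struct foo { ...; char s[FLEX_ARRAY]; };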

and then of course in such a program you wouldn't think of
using anything other than offsetof(struct foo, s).

But it has also shown up in another way in another program of mine.

In the TXR Lisp FFI type compiler, the size of a flexible struct
is not calculated the way GCC does it.

TXR 296:

1> (typedef test (struct test (i int) (s short) (a (array char))))
#<ffi-type (struct test (i int) (s short) (a (array char)))>
2> (sizeof test)
6
3> (offsetof test a)
6

Private repo with fix:

1> (typedef test (struct test (i int) (s short) (a (array char))))
#<ffi-type (struct test (i int) (s short) (a (array char)))>
2> (sizeof test)
8
3> (offsetof test a)
6

Jeremy Brubaker

2024-10-09 12:55:42 UTC

Post by Kaz Kylheku
What GCC seems to be doing is simply nothing special. When determining
the most strictly aligned member of the struct, it takes the flexible
array into account (the alignment of its element type). It otherwise
ignores it (or perhaps treats it as a size zero subobject). The
structure is padded after that for the sake of the most strictly
aligned member.

Post by Kaz Kylheku
Don't get burned: don't rely on the size of a flexible array struct.
Use the offsetof of that flexible member.

As the user who had the pleasure of running into said crash, here is a
brief demo of the sizes and addresses reported by my system (gcc 13.3.1)
using both methods of determining the start of the struct:

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

typedef struct dstr {
    int a;
    size_t b;
    int c;
    char str[];
} dstr;

typedef struct ref {
    int a;
    size_t b;
    int c;
} ref;

/* Old method: steps back by sizeof, which may exceed the offset of str. */
#define old_dstr_of(str) ((dstr *) ((str) - sizeof (dstr)))
/* New method: steps back by the actual offset of str. */
#define new_dstr_of(s) ((dstr *) ((s) - offsetof (struct dstr, str)))

int main (int argc, char ** argv)
{
    dstr *ds = malloc (sizeof (dstr));

    printf ("sizeof(int) %zu\n", sizeof (int));
    printf ("sizeof(char) %zu\n", sizeof (char));
    printf ("sizeof(size_t) %zu\n", sizeof (size_t));
    printf ("sizeof(dstr) %zu\n", sizeof (dstr));
    printf ("sizeof(ref) %zu\n", sizeof (ref));
    puts ("");

    puts ("Addresses:");
    /* %p expects a void *, so cast each pointer explicitly. */
    printf ("ds %p\n", (void *) ds);
    printf ("ds->str %p\n", (void *) ds->str);
    printf ("old dstr_of %p\n", (void *) old_dstr_of(ds->str));
    printf ("new dstr_of %p\n", (void *) new_dstr_of(ds->str));

    free (ds);
    return 0;
}

And the output on my machine:

sizeof(int) 4
sizeof(char) 1
sizeof(size_t) 8
sizeof(dstr) 24
sizeof(ref) 24

Addresses:
ds 0x9d62a0
ds->str 0x9d62b4
old dstr_of 0x9d629c
new dstr_of 0x9d62a0

--
() www.asciiribbon.org | Jeremy Brubaker
/\ - against html mail | јЬruЬаkе@оrіоnаrtѕ.іо / neonrex on IRC

Success is something I will dress for when I get there, and not until.

Scott Lurndal

2024-10-09 15:06:23 UTC

Post by Jeremy Brubaker

Post by Kaz Kylheku
Don't get burned: don't rely on the size of a flexible array struct.
Use the offsetof of that flexible member.

As the user who had the pleasure of running into said crash, here is a
brief demo of the sizes and addresses reported by my system (gcc 13.3.1)

On my system your program produces
similar results.

$ /tmp/aa
sizeof(int) 4
sizeof(char) 1
sizeof(size_t) 8
sizeof(dstr) 24
sizeof(ref) 24

Addresses:
ds 0x2350010
ds->str 0x2350024
old dstr_of 0x235000c
new dstr_of 0x2350010

However, after modifying the
structure definitions to avoid internal padding:

typedef struct dstr {
    int a;
    int c;
    size_t b;
    char str[];
} dstr;

typedef struct ref {
    int a;
    int c;
    size_t b;
} ref;

$ /tmp/aa
sizeof(int) 4
sizeof(char) 1
sizeof(size_t) 8
sizeof(dstr) 16
sizeof(ref) 16

Addresses:
ds 0xbce010
ds->str 0xbce020
old dstr_of 0xbce010
new dstr_of 0xbce010
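
The two runs differ only in whether there is padding after the last ordinary member. In the first output the gap between str and the end of the struct is 24 - 20 = 4 bytes; in the second it is 16 - 16 = 0. The sketch below computes that gap for both member orders. The struct tags and function names are hypothetical, chosen so both layouts can coexist in one file.

```c
#include <stddef.h>

struct dstr_padded  { int a; size_t b; int c; char str[]; };  /* first layout */
struct dstr_reorder { int a; int c; size_t b; char str[]; };  /* reordered layout */

/* Bytes between the start of str and the end of the struct. */
size_t gap_padded(void)
{
    return sizeof(struct dstr_padded) - offsetof(struct dstr_padded, str);
}

size_t gap_reordered(void)
{
    return sizeof(struct dstr_reorder) - offsetof(struct dstr_reorder, str);
}
```

A gap of zero means the sizeof-based and offsetof-based macros happen to agree, which is why the reordered struct hides the bug instead of fixing it.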

Tim Rentsch

2024-10-14 04:55:32 UTC

Post by Nick Bowler

Post by Kaz Kylheku
I can't think of a reasonable interpretation of the original
wording which would allow the size to be other than the offset
of the array, when the array is of a character type.
The current wording clearly does allow the size to go beyond
the offset in that case.

The original wording includes no requirement that the offset of
the replacement array used for the size calculation has any
relationship whatsoever with the offset of the flexible array
member.

The original wording is moot because it was superseded by the TC.
The purpose of a TC is not to change the language but to clarify
what semantics are intended. The point of the revised wording in
the TC is to say "this is what the earlier wording meant".