relearning C: why does an in-place change to a char* segfault?


Mark Summerfield

2024-08-01 08:06:57 UTC

This program segfaults at the commented line:

#include <ctype.h>
#include <stdio.h>

void uppercase_ascii(char *s) {
    while (*s) {
        *s = toupper(*s); // SEGFAULT
        s++;
    }
}

int main() {
    char* text = "this is a test";
    printf("before [%s]\n", text);
    uppercase_ascii(text);
    printf("after [%s]\n", text);
}

I know there are better ways to do ASCII uppercase, I don't care about
that; what I don't understand is why I can't do an in-place edit of a non-
const char*?

I build using scons, which does:

gcc -o inplace.o -c -Wall -g inplace.c

gcc -o inplace inplace.o

The error with gdb is:

Starting program: /tmp/inplace/inplace
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
before [this is a test]

Program received signal SIGSEGV, Segmentation fault.
0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a test")
at inplace.c:6
6 *s = toupper(*s);

Mark Summerfield

2024-08-01 08:24:45 UTC

The formatting was messed up by Pan.

The function was:

void uppercase_ascii(char *s) {
    while (*s) {
        *s = toupper(*s);
        s++;
    }
}

Ben Bacarisse

2024-08-01 10:53:48 UTC

Post by Mark Summerfield
The formatting was messed up by Pan.
void uppercase_ascii(char *s) {
while (*s) {
*s = toupper(*s);

There's a tricky technicality with all of the character functions. They
take an int argument so that EOF (typically -1) can be passed, but
otherwise the argument must be an int "which shall be representable as
an unsigned char" or the result is undefined.

If char is signed (as it very often is) then in some locales, like the
ISO-8859-* ones, many lower-case letters are negative so, to be 100%
portable, you should write

*s = toupper((unsigned char)*s);

Now, since the behaviour is undefined, many implementations "do what you
want" but that only means you won't spot the bug by testing until the
code is linked to some old library that does not fix the issue!
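A minimal sketch of Mark's uppercase_ascii with that cast applied. It assumes the default "C" locale (no setlocale call), in which toupper leaves every byte that is not an ASCII lower-case letter unchanged:

```c
#include <ctype.h>

/* Uppercase a writable string in place. Converting each char to
   unsigned char first keeps negative values (possible when plain
   char is signed) away from toupper(), whose argument must be EOF
   or representable as an unsigned char. */
void uppercase_ascii(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}
```

The cast fixes the argument-range problem only; the function still has to be given a writable array such as char buf[] = "this is a test", not a pointer to a string literal.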

Post by Mark Summerfield
s++;
}
}

Note that this does not crop up in a typical input loop:

int ch;
while ((ch = getchar()) != EOF)
    putchar(toupper(ch));

because the input function "obtains [the] character as an unsigned char
converted to an int".
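The same loop can be wrapped in a function over arbitrary streams. A sketch, where uppercase_stream is a hypothetical name and getc/putc stand in for getchar/putchar:

```c
#include <ctype.h>
#include <stdio.h>

/* Copy in to out, uppercasing each character. getc returns either
   EOF or an unsigned char value converted to int, so ch is already
   in the range toupper() accepts and needs no cast. */
void uppercase_stream(FILE *in, FILE *out)
{
    int ch;   /* int, not char, so EOF stays distinguishable */
    while ((ch = getc(in)) != EOF)
        putc(toupper(ch), out);
}
```

Calling uppercase_stream(stdin, stdout) gives exactly the loop above.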

--
Ben.

Richard Harnden

2024-08-01 08:38:13 UTC

Post by Mark Summerfield
#include <ctype.h>
#include <stdio.h>
void uppercase_ascii(char *s) {
    while (*s) {
        *s = toupper(*s); // SEGFAULT
        s++;
    }
}
int main() {
    char* text = "this is a test";
    printf("before [%s]\n", text);
    uppercase_ascii(text);
    printf("after [%s]\n", text);
}

text is pointing to "this is a test" - and that is stored in the program
binary and that's why you can't modify it.

Change it to:

char text[] = "this is a test";

You can modify that; text gets its own copy.
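A small sketch contrasting the two declarations; the names text_array and text_ptr are made up for illustration:

```c
#include <ctype.h>

/* An array initialized from a literal: it has its own 15 bytes
   (14 characters plus the terminating '\0') and may be modified. */
char text_array[] = "this is a test";

/* A pointer to the literal's own array. Writing through it is
   undefined behavior, so declare it const and let the compiler
   reject such writes. */
const char *text_ptr = "this is a test";

void uppercase_ascii(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}
```

With these declarations uppercase_ascii(text_array) is fine, while uppercase_ascii(text_ptr) discards the const qualifier, which the compiler must diagnose.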

Mark Summerfield

2024-08-01 08:54:23 UTC

On Thu, 1 Aug 2024 09:38:13 +0100, Richard Harnden wrote:

[snip]

Post by Richard Harnden
text is pointing to "this is a test" - and that is stored in the program
binary and that's why you can't modify it.
char text[] = "this is a test";
You can modify that; text gets its own copy.

Thanks that works; & thanks for the explanation.

Bart

2024-08-01 10:12:47 UTC

Post by Richard Harnden

Post by Mark Summerfield
#include <ctype.h>
#include <stdio.h>
void uppercase_ascii(char *s) {
     while (*s) {
         *s = toupper(*s); // SEGFAULT
         s++;
     }
}
int main() {
     char* text = "this is a test";
     printf("before [%s]\n", text);
     uppercase_ascii(text);
     printf("after [%s]\n", text);
}

text is pointing to "this is a test" - and that is stored in the program
binary and that's why you can't modify it.

That's not the reason for the segfault in this case. With some
compilers, you *can* modify it, but that will permanently modify that
string constant. (If the code is repeated, the text is already in
capitals the second time around.)

It segfaults when the string is stored in a read-only part of the binary.

Keith Thompson

2024-08-01 20:59:49 UTC

Post by Richard Harnden

Post by Mark Summerfield
#include <ctype.h>
#include <stdio.h>
void uppercase_ascii(char *s) {
     while (*s) {
         *s = toupper(*s); // SEGFAULT
         s++;
     }
}
int main() {
     char* text = "this is a test";
     printf("before [%s]\n", text);
     uppercase_ascii(text);
     printf("after [%s]\n", text);
}

text is pointing to "this is a test" - and that is stored in the
program binary and that's why you can't modify it.

That's not the reason for the segfault in this case.

I'm fairly sure it is.

Post by Bart
With some
compilers, you *can* modify it, but that will permanently modify that
string constant. (If the code is repeated, the text is already in
capitals the second time around.)
It segfaults when the string is stored in a read-only part of the binary.

A string literal creates an array object with static storage duration.
Any attempt to modify that array object has undefined behavior. (Which
means there's no guarantee that your program will crash.)

Storing the array in a memory segment that results in a trap on an
attempt to modify it is probably the most common implementation
strategy. Storing the array in writable memory is another, but is rare
these days. (gcc had an option to do this, but it was removed some time
ago).

If you want a pointer to a string literal, it's best to define it as
"const", so attempts to write to it can be caught at compile time:

const char* text = "this is a test";
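When the text starts out as a string literal but has to be changed, one option is to work on a copy. A sketch, where uppercase_copy is a hypothetical helper and the caller is assumed to free() the result:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated uppercase copy of s, or NULL if the
   allocation fails. s is const char *, so string literals can be
   passed safely: they are only read, never written. */
char *uppercase_copy(const char *s)
{
    size_t len = strlen(s);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    for (size_t i = 0; i <= len; i++)   /* <= also copies the '\0' */
        copy[i] = (char)toupper((unsigned char)s[i]);
    return copy;
}
```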

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

Bart

2024-08-01 21:07:16 UTC

Post by Keith Thompson

Post by Richard Harnden

Post by Mark Summerfield
#include <ctype.h>
#include <stdio.h>
void uppercase_ascii(char *s) {
     while (*s) {
         *s = toupper(*s); // SEGFAULT
         s++;
     }
}
int main() {
     char* text = "this is a test";
     printf("before [%s]\n", text);
     uppercase_ascii(text);
     printf("after [%s]\n", text);
}

text is pointing to "this is a test" - and that is stored in the
program binary and that's why you can't modify it.

That's not the reason for the segfault in this case.

I'm fairly sure it is.

Post by Bart
With some
compilers, you *can* modify it, but that will permanently modify that
string constant. (If the code is repeated, the text is already in
capitals the second time around.)
It segfaults when the string is stored in a read-only part of the binary.

A string literal creates an array object with static storage duration.
Any attempt to modify that array object has undefined behavior.

What's the difference between such an object, and an array like one of
these:

static char A[100];
static char B[100]={1};

Do these not also have static storage duration? Yet presumably these can
be legally modified.

Keith Thompson

2024-08-01 21:28:37 UTC

Post by Keith Thompson

Post by Richard Harnden

Post by Mark Summerfield
#include <ctype.h>
#include <stdio.h>
void uppercase_ascii(char *s) {
     while (*s) {
         *s = toupper(*s); // SEGFAULT
         s++;
     }
}
int main() {
     char* text = "this is a test";
     printf("before [%s]\n", text);
     uppercase_ascii(text);
     printf("after [%s]\n", text);
}

text is pointing to "this is a test" - and that is stored in the
program binary and that's why you can't modify it.

That's not the reason for the segfault in this case.

I'm fairly sure it is.

Post by Bart
With some
compilers, you *can* modify it, but that will permanently modify that
string constant. (If the code is repeated, the text is already in
capitals the second time around.)
It segfaults when the string is stored in a read-only part of the binary.

A string literal creates an array object with static storage
duration.
Any attempt to modify that array object has undefined behavior.

What's the difference between such an object, and an array like one of
static char A[100];
static char B[100]={1};
Do these not also have static storage duration? Yet presumably these
can be legally modified.

Perhaps you thought I meant to imply that objects with static storage
duration are read-only. I didn't. I wrote "A string literal creates an
array object with static storage duration. Any attempt to modify that
array object has undefined behavior." Both statements are true, and
neither follows from the other.

An array object associated with a string literal has static storage
duration because the standard says so. Attempting to modify such an
object has undefined behavior because the standard says so.


James Kuyper

2024-08-02 00:20:43 UTC

...

Post by Keith Thompson

Post by Bart
compilers, you *can* modify it, but that will permanently modify that
string constant. (If the code is repeated, the text is already in
capitals the second time around.)
It segfaults when the string is stored in a read-only part of the binary.

A string literal creates an array object with static storage
duration.
Any attempt to modify that array object has undefined behavior.

What's the difference between such an object, and an array like one of
static char A[100];
static char B[100]={1};

The difference is that when 6.4.5p7 says "... If the program attempts
to modify such an array, the behavior is undefined.", it is not talking
about arrays with static storage duration in general, but only
specifically about the arrays with static storage duration that are
created to store the contents of string literals.

For other arrays, whether or not it is defined behavior to modify them
depends upon whether or not the array's definition is const-qualified.
The arrays associated with string literals should have been specified as
const-qualified, in which case any code that put them at risk of being
modified would have required either a cast or a diagnostic.

In C++ string literals are const-qualified, but "const" was a late
addition to C, and by the time it was added to C, the committee's desire
to ensure backwards compatibility prevented doing so in what would
otherwise have been the most reasonable way.

Kaz Kylheku

2024-08-02 01:06:08 UTC

Post by Keith Thompson

Post by Bart
It segfaults when the string is stored in a read-only part of the binary.

A string literal creates an array object with static storage duration.
Any attempt to modify that array object has undefined behavior.

What's the difference between such an object, and an array like one of

Programming languages can have objects that have the same lifetime, yet some
of which are mutable and some of which are immutable.

If the compiler believes that the immutable objects are in fact
not mutated, it's a bad idea to modify them behind the compiler's
back.

There doesn't have to be any actual difference in the implementation of
these objects, like in what area they are stored, other than the rules
regarding their correct use, namely prohibiting modification.

The Racket language has both mutable and immutable cons cells.
The difference is that the immutable cons cells simply lack the
operations needed to mutate them. I'm not an expert on the Racket
internals but I don't see a reason why they couldn't be stored in the
same heap.

Post by Bart
static char A[100];
static char B[100]={1};
Do these not also have static storage duration? Yet presumably these can
be legally modified.

That 1 which initializes B[0] cannot be modified.

There is no portable way to request that.

C++ implementations have late initialization for block scope statics.

A program which somehow gains access to the initialization data for those,
and modifies it, would be squarely in undefined behavior territory.

In mainstream C implementations there typically isn't a separate storage
for the initialization data for statics. They are set up before the
program runs.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca

Bart

2024-08-02 09:43:36 UTC

Post by Kaz Kylheku

Post by Keith Thompson

Post by Bart
It segfaults when the string is stored in a read-only part of the binary.

A string literal creates an array object with static storage duration.
Any attempt to modify that array object has undefined behavior.

What's the difference between such an object, and an array like one of

Programming languages can have objects that have the same lifetime, yet some
of which are mutable and some of which are immutable.
If the compiler believes that the immutable objects are in fact
not mutated, it's a bad idea to modify them behind the compiler's
back.
There doesn't have to be any actual difference in the implementation of
these objects, like in what area they are stored, other than the rules
regarding their correct use, namely prohibiting modification.
The Racket language has both mutable and immutable cons cells.
The difference is that the immutable cons cells simply lack the
operations needed to mutate them. I'm not an expert on the Racket
internals but I don't see a reason why they couldn't be stored in the
same heap.

Post by Bart
static char A[100];
static char B[100]={1};
Do these not also have static storage duration? Yet presumably these can
be legally modified.

That 1 which initializes B[0] cannot be modified.

Why not? I haven't requested that those are 'const'. Further, gcc has no
problem running this program:

#include <stdio.h>

int main(void) {
    static char A[100];
    static char B[100]={1};

    printf("%d %d %d\n", A[0], B[0], 1);
    A[0]=55;
    B[0]=89;
    printf("%d %d %d\n", A[0], B[0], 1);
}

But it does use readonly memory for string literals.

(The point of A and B was to represent .bss and .data segments
respectively. A's data is not part of the EXE image; B's is.

While the point of 'static' was to avoid having to specify whether A and
B were at module scope or within a function.)

Post by Kaz Kylheku
That 1 which initializes B[0] cannot be modified.

Or do you literally mean the value of that '1'? Then it doesn't make
sense; here that is a copy of the literal stored in one cell of 'B'. The
value of the cell can change, and then that particular copy of '1' is lost.

Here:

static char B[100] = {1, 1, 1, 1, 1, 1};

changing B[0] will not affect the 1s in B[1..5], and in my example
above, that standalone '1' is not affected.

Richard Damon

2024-08-02 15:03:13 UTC

Post by Kaz Kylheku

Post by Keith Thompson

Post by Bart
It segfaults when the string is stored in a read-only part of the binary.

A string literal creates an array object with static storage duration.
Any attempt to modify that array object has undefined behavior.

What's the difference between such an object, and an array like one of

Programming languages can have objects that have the same lifetime, yet some
of which are mutable and some of which are immutable.
If the compiler believes that the immutable objects are in fact
not mutated, it's a bad idea to modify them behind the compiler's
back.
There doesn't have to be any actual difference in the implementation of
these objects, like in what area they are stored, other than the rules
regarding their correct use, namely prohibiting modification.
The Racket language has both mutable and immutable cons cells.
The difference is that the immutable cons cells simply lack the
operations needed to mutate them. I'm not an expert on the Racket
internals but I don't see a reason why they couldn't be stored in the
same heap.

Post by Bart
static char A[100];
static char B[100]={1};
Do these not also have static storage duration? Yet presumably these can
be legally modified.

That 1 which initializes B[0] cannot be modified.

Why not? I haven't requested that those are 'const'. Further, gcc has no
problem running this program:
    static char A[100];
    static char B[100]={1};
    printf("%d %d %d\n", A[0], B[0], 1);
    A[0]=55;
    B[0]=89;
    printf("%d %d %d\n", A[0], B[0], 1);
But it does use readonly memory for string literals.
(The point of A and B was to represent .bss and .data segments
respectively. A's data is not part of the EXE image; B's is.
While the point of 'static' was to avoid having to specify whether A and
B were at module scope or within a function.)

Post by Kaz Kylheku
That 1 which initializes B[0] cannot be modified.

Or do you literally mean the value of that '1'? Then it doesn't make
sense; here that is a copy of the literal stored in one cell of 'B'. The
value of the cell can change, and then that particular copy of '1' is lost.
static char B[100] = {1, 1, 1, 1, 1, 1};
changing B[0] will not affect the 1s in B[1..5], and in my example
above, that standalone '1' is not affected.

The key point is that the {1} isn't the value located in B[0], but the
source of that value when B was initialized, which, if B is in the .data
segment, is the source of the data used to initialize that .data segment,
which might exist nowhere in the actual RAM of the machine, but
might exist just in the file that was loaded.
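A sketch of that distinction, using a hypothetical function next_value with a block-scope static array: the initializer supplies B[0]'s starting value once, and later assignments change the object without touching the initializer.

```c
/* B has static storage duration: it is initialized once, before
   the first call, and keeps its value between calls. */
int next_value(void)
{
    static char B[100] = {1};   /* B[0] starts at 1, the rest at 0 */
    return B[0]++;              /* yield the current value, then bump it */
}
```

Successive calls return 1, 2, 3, ...; the {1} is never applied again.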

When accessing the value of a string literal, the compiler needs to do
something so the value is accessible, perhaps by creating a const object
like any other const object, and exposing that.

The confusing part is that while it creates a "const char[]" object, the
type of that object when referred to in code is just "char[]", a
difference imposed to avoid breaking most code that used strings when
the standard was just coming out.

Most implementations have an option to at least give a warning if the
literal is used in a way that loses the const (gcc's -Wwrite-strings, for
example), and most programs today should be compiled using that option.

James Kuyper

2024-08-02 18:19:49 UTC

Post by Kaz Kylheku

Post by Keith Thompson

Post by Bart
It segfaults when the string is stored in a read-only part of the binary.

A string literal creates an array object with static storage duration.
Any attempt to modify that array object has undefined behavior.

What's the difference between such an object, and an array like one of
static char A[100];
static char B[100]={1};
Do these not also have static storage duration? Yet presumably these can
be legally modified.

That 1 which initializes B[0] cannot be modified.

Why not? I haven't requested that those are 'const'. ...

You don't get a choice in the matter. The C language doesn't permit
numeric literals of any kind to be modified by your code. They can't be,
and don't need to be, declared 'const'. I've heard that in some other
languages, if you call foo(3), and foo() changes the value of its
argument to 2, then subsequent calls to bar(3) will pass a value of 2 to
bar(). That sounds like such a ridiculous mis-feature that I hesitate to
identify which languages I had heard accused of having that feature, but
it is important to note that C is not one of them.

Just as 1 is an integer literal whose value cannot be modified, "Hello,
world!" is a string literal whose contents cannot be safely modified.
The key difference is that, in many contexts, "Hello, world!" gets
automatically converted into a pointer to its first element, a feature
that makes it a lot easier to work with string literals - but also opens
up the possibility of attempting to write through that pointer. Doing so
has undefined behavior, which can include the consequences of storing
the contents of string literals in read-only memory.
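A sketch showing both views of the same literal; HELLO_SIZE and hello_length are illustrative names:

```c
#include <stddef.h>
#include <string.h>

/* As an array, the literal has 14 elements: 13 characters plus the
   terminating '\0'. sizeof is one of the contexts where it does not
   decay to a pointer. */
enum { HELLO_SIZE = sizeof "Hello, world!" };

/* Here it decays: strlen receives a pointer to the first element. */
size_t hello_length(void)
{
    return strlen("Hello, world!");
}
```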

That pointer's value should logically have had the type "const char*",
which would have made most attempts to write through that pointer
constraint violations, but the language didn't have 'const' at the time
that decision was made. In C++ the value is const-qualified. In C, the
best you can do is to make sure that if you define a pointer and
initialize it to point inside a string literal, you declare that
pointer as "const char*".

... Further, gcc has no problem running this program:
    static char A[100];
    static char B[100]={1};
    printf("%d %d %d\n", A[0], B[0], 1);
    A[0]=55;
    B[0]=89;
    printf("%d %d %d\n", A[0], B[0], 1);

Of course, why should it? Neither A nor B is a string literal; they are
only initialized by copying from a string literal. Since their
definitions are not const-qualified, there's no problem with such code.

Bart

2024-08-02 18:33:20 UTC

Post by James Kuyper

Post by Kaz Kylheku

Post by Keith Thompson

Post by Bart
It segfaults when the string is stored in a read-only part of the binary.

A string literal creates an array object with static storage duration.
Any attempt to modify that array object has undefined behavior.

What's the difference between such an object, and an array like one of
static char A[100];
static char B[100]={1};
Do these not also have static storage duration? Yet presumably these can
be legally modified.

That 1 which initializes B[0] cannot be modified.

Why not? I haven't requested that those are 'const'. ...

You don't get a choice in the matter. The C language doesn't permit
numeric literals of any kind to be modified by your code.

My post wasn't about numerical literals. I assumed it was about that '1'
value which is stored in B's first cell.

However, just in case KK was talking about that unlikely possibility, I
covered that as well.

Post by James Kuyper
They can't be,
and don't need to be, declared 'const'. I've heard that in some other
languages, if you call foo(3), and foo() changes the value of its
argument to 2, then subsequent calls to bar(3) will pass a value of 2 to
bar(). That sounds like such a ridiculous mis-feature that I hesitate to
identify which languages I had heard accused of having that feature, but
it is important to note that C is not one of them.
Just as 1 is an integer literal whose value cannot be modified,

It can't be modified in a way that would also affect other instances of
'1' within that module or program, because it is very unlikely to be shared.

I don't know of any implementations of this kind of language which do
that. (The nearest might be FORTRAN IV, when '1' was passed by reference to
a subroutine and the subroutine then assigned to that parameter.)

Where it would be more plausible is here:

const char* B[] = {"A", "A", "A"};

where if you can somehow change that first "A", then the other two could
also change if the compiler decides to share those 3 identical strings.
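The standard leaves exactly this open: in C11, 6.4.5p7 says it is unspecified whether the arrays for identical string literals are distinct. Code therefore should compare contents rather than addresses. A sketch, where same_contents is a hypothetical helper:

```c
#include <string.h>

/* Three identical literals: whether they share one array is
   unspecified, so B[0] == B[1] may be true with one compiler (or set
   of options) and false with another. */
const char *B[] = {"A", "A", "A"};

/* Comparing contents is well defined either way. */
int same_contents(const char *x, const char *y)
{
    return strcmp(x, y) == 0;
}
```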

Lawrence D'Oliveiro

2024-08-03 01:31:17 UTC

Post by James Kuyper
I've heard that in some other
languages, if you call foo(3), and foo() changes the value of its
argument to 2, then subsequent calls to bar(3) will pass a value of 2 to
bar(). That sounds like such a ridiculous mis-feature that I hesitate to
identify which languages I had heard accused of having that feature ...

I heard that, too. I think it was on some early FORTRAN compilers, on
early machine architectures, without stacks or reentrancy. And with the
weird FORTRAN argument-passing conventions.

Richard Damon

2024-08-03 02:01:21 UTC

Post by Lawrence D'Oliveiro

Post by James Kuyper
I've heard that in some other
languages, if you call foo(3), and foo() changes the value of its
argument to 2, then subsequent calls to bar(3) will pass a value of 2 to
bar(). That sounds like such a ridiculous mis-feature that I hesitate to
identify which languages I had heard accused of having that feature ...

I heard that, too. I think it was on some early FORTRAN compilers, on
early machine architectures, without stacks or reentrancy. And with the
weird FORTRAN argument-passing conventions.

I remember it too; it was based on the fact that all arguments were
pass by reference (so they could be either in or out parameters), and
constants were passed as pointers to the location of memory where that
constant was stored, and perhaps used elsewhere too. Why waste precious
memory to set up a temporary, initialize it, and hold the value,
when you could just pass the address of a location that you knew had the
right value?
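A rough model of that convention, written in C rather than FORTRAN: every argument is passed by address, and the constant 3 lives in one shared cell. pool_3, foo and bar are hypothetical names, and real compilers of the era differed in the details.

```c
/* Model of "pass everything by address": the constant 3 is stored in
   one shared cell, and every CALL ...(3) passes that cell's address. */
static int pool_3 = 3;

void foo(int *x) { *x = 2; }          /* assigns to its parameter */
int  bar(int *x) { return *x; }        /* only reads its parameter  */

/* Roughly: CALL FOO(3) followed by a call to BAR(3). */
int call_sequence(void)
{
    foo(&pool_3);
    return bar(&pool_3);               /* 2, not 3: the "constant" changed */
}
```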

Joe Pfeiffer

2024-08-03 14:32:00 UTC

Post by Richard Damon

Post by Lawrence D'Oliveiro

Post by James Kuyper
I've heard that in some other
languages, if you call foo(3), and foo() changes the value of its
argument to 2, then subsequent calls to bar(3) will pass a value of 2 to
bar(). That sounds like such a ridiculous mis-feature that I hesitate to
identify which languages I had heard accused of having that feature ...

I heard that, too. I think it was on some early FORTRAN compilers, on
early machine architectures, without stacks or reentrancy. And with the
weird FORTRAN argument-passing conventions.

I remember it too; it was based on the fact that all arguments were
pass by reference (so they could be either in or out parameters), and
constants were passed as pointers to the location of memory where that
constant was stored, and perhaps used elsewhere too. Why waste
precious memory to set up a temporary, initialize it, and hold
the value, when you could just pass the address of a location that you
knew had the right value?

I actually had a bug once in my FORTRAN code on a CDC6400 where I changed the
value of an argument in a function, and then passed in a constant. That
"constant" had the new value for the rest of the program. Finding that
one was a challenge, particularly since I was a very inexperienced
undergrad at the time.

Lawrence D'Oliveiro

2024-08-04 01:05:01 UTC

... was based on the fact that all arguments were pass by reference ...

Slightly more subtle than that: simple variables (and I think array
elements) were passed by reference; more complex expressions had their
value stored in a temporary and the temporary was passed by reference.

The “more complex” criterion could be triggered by something as simple as
putting an extra pair of parentheses around a variable reference.

It was a calling convention that really made no logical sense.

Tim Rentsch

2024-08-12 09:52:15 UTC

Post by Richard Damon

Post by James Kuyper
I've heard that in some other
languages, if you call foo(3), and foo() changes the value of it's
argument to 2, then subsequent calls to bar(3) will pass a value of 2 to
bar(). That sounds like such a ridiculous mis-feature that I hesitate to
identify which languages I had heard accused of having that feature ...

I heard that, too. I think it was on some early FORTRAN compilers, on
early machine architectures, without stacks or reentrancy. And with the
weird FORTRAN argument-passing conventions.

I remember it too; it was based on the fact that all arguments were
pass by reference (so they could be either in or out parameters), and
constants were passed as pointers to the location of memory where that
constant was stored, and perhaps used elsewhere too. Why waste
precious memory to set up a temporary, initialize it, and hold
the value, when you could just pass the address of a location that you
knew had the right value?

I think the original FORTRAN, and FORTRAN II, used call by reference.
In the early 1960s FORTRAN changed to using call by value-result
(which is similar to call by reference but slightly different).
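The difference between the two conventions only shows up when the subroutine can also reach the argument some other way. A sketch in C that models both, assuming a global g that the subroutine reads directly; all names are hypothetical, and this says nothing about which FORTRAN compilers used which convention:

```c
int g = 1;   /* global the subroutine can also read directly */

/* Call by reference: x aliases g, so the write is visible via g at once. */
int sub_by_reference(int *x)
{
    *x = 5;
    return g;            /* 5 when called as sub_by_reference(&g) */
}

/* Call by value-result: the body works on a local copy, and the
   caller's variable is updated only when the call returns. */
int sub_value_result(int *x)
{
    int local = *x;      /* copy in */
    local = 5;           /* assignment inside the subroutine */
    int seen = g;        /* g still holds the old value here */
    *x = local;          /* copy out on return */
    return seen;
}
```

In both cases g ends up as 5; only what the subroutine observes in the meantime differs.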

Tim Rentsch

2024-08-14 00:46:05 UTC

Post by James Kuyper
Just as 1 is an integer literal whose value cannot be modified,
[...]

The C language doesn't have integer literals. C has string
literals, and compound literals, and it has integer constants.
But C does not have integer literals.

Keith Thompson

2024-08-14 01:44:18 UTC

Post by Tim Rentsch

Post by James Kuyper
Just as 1 is an integer literal whose value cannot be modified,
[...]

The C language doesn't have integer literals. C has string
literals, and compound literals, and it has integer constants.
But C does not have integer literals.

Technically correct (but IMHO not really worth worrying about).

There is a proposal for C2y, authored by Jens Gustedt, to change the
term "constant" to "literal" for character, integer, and floating
constants. (I think it's a good idea.)

<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3239.htm>


Tim Rentsch

2024-08-15 23:00:35 UTC

Post by Keith Thompson

Post by Tim Rentsch

Post by James Kuyper
Just as 1 is an integer literal whose value cannot be modified,
[...]

The C language doesn't have integer literals. C has string
literals, and compound literals, and it has integer constants.
But C does not have integer literals.

Technically correct (but IMHO not really worth worrying about).

Anyone who flogs other posters for incorrectly using terminology
defined in the ISO C standard should set a good example by using
the ISO-C-defined terms correctly himself.

Post by Keith Thompson
There is a proposal for C2y, authored by Jens Gustedt, to change the
term "constant" to "literal" for character, integer, and floating
constants. (I think it's a good idea.)
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3239.htm>

The more C is changed to resemble C++ the worse it becomes. It
isn't surprising that you like it.

Keith Thompson

2024-08-15 23:27:29 UTC

Post by Tim Rentsch

Post by Keith Thompson

Post by Tim Rentsch

Post by James Kuyper
Just as 1 is an integer literal whose value cannot be modified,
[...]

The C language doesn't have integer literals. C has string
literals, and compound literals, and it has integer constants.
But C does not have integer literals.

Technically correct (but IMHO not really worth worrying about).

Anyone who flogs other posters for incorrectly using terminology
defined in the ISO C standard should set a good example by using
the ISO-C-defined terms correctly himself.

In fact I do. In my own writing, I use the term "integer constant",
not "integer literal", when discussing C. (It's likely I haven't
been 100% consistent.)

My point is that, while "integer literal" is inconsistent with the
terminology used in the C standard, it is not ambiguous or confusing.

C does not define the term "literal" (it defines the phrases "string
literal" and "compound literal"). The word "literal" by itself
is a very common term used when discussing programs in general.
When I looked into it last time this came up, I found that most of
the programming languages I looked into refer to 42 as a literal,
not as a constant.

I'll also note that the word "constant" is overloaded in C.
For example, as of C17, the description of "sizeof" says: "If the
type of the operand is a variable length array type, the operand
is evaluated; otherwise, the operand is not evaluated and the
result is an integer constant." Though the meaning is clear,
it's an incorrect usage. (C23 changes this to "... and the result
is an integer constant expression", which is better, but it's the
expression, not its result, that is an integer constant expression.)
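A small sketch of both cases of that sizeof rule. It assumes a compiler that supports variable length arrays (optional since C11; gcc and clang support them), and the names are illustrative:

```c
#include <stddef.h>

static double samples[4];

/* Non-VLA operand: not evaluated, and the element count below is an
   integer constant expression, usable as an array size or in
   _Static_assert. */
static char counts[sizeof samples / sizeof samples[0]];
_Static_assert(sizeof samples / sizeof samples[0] == 4, "four samples");

size_t counts_size(void)
{
    return sizeof counts;            /* 4 */
}

/* VLA operand: the size depends on n, so it is computed at run time.
   n must be greater than zero. */
size_t vla_bytes(size_t n)
{
    int v[n];
    return sizeof v;                 /* n * sizeof(int) */
}
```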

Replacing the term "constant" by "literal" would, in my opinion,
improve the clarity of the standard. I see no drawbacks to such
a change (other than the overhead of *any* change to the standard).

Post by Tim Rentsch

Post by Keith Thompson
There is a proposal for C2y, authored by Jens Gustedt, to change the
term "constant" to "literal" for character, integer, and floating
constants. (I think it's a good idea.)
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3239.htm>

The more C is changed to resemble C++ the worse it becomes. It
isn't surprising that you like it.

I presume that was intended as a personal insult. I urge you to do
better.

I acknowledge your opinion. I do not share it.


Tim Rentsch

2024-09-28 00:33:56 UTC

Post by Keith Thompson

Post by Tim Rentsch
The more C is changed to resemble C++ the worse it becomes. It
isn't surprising that you like it.

I presume that was intended as a personal insult.

It wasn't.

James Kuyper

2024-08-14 14:33:03 UTC

Post by Tim Rentsch

Post by James Kuyper
Just as 1 is an integer literal whose value cannot be modified,
[...]

The C language doesn't have integer literals. C has string
literals, and compound literals, and it has integer constants.
But C does not have integer literals.

True, but C++ does, and it means the same thing by "integer literal"
that C means by "integer constant". C doesn't define the term "integer
literal" with any conflicting meaning, and my use of the C++ terminology
allowed me to make the parallel with string literals clearer, so I don't
see any particular problem with my choice of words.

Tim Rentsch

2024-08-15 23:05:11 UTC

Post by James Kuyper

Post by Tim Rentsch

Post by James Kuyper
Just as 1 is an integer literal whose value cannot be modified,
[...]

The C language doesn't have integer literals. C has string
literals, and compound literals, and it has integer constants.
But C does not have integer literals.

True, but C++ does, and it means the same thing by "integer literal"
that C means by "integer constant".

This is comp.lang.c, not comp.lang.c++. You flog Bart for using
C-standard-defined terms wrongly. This case is no different.

Post by James Kuyper
C doesn't define the term "integer
literal" with any conflicting meaning, and my use of the C++ terminology
allowed me to make the parallel with string literals clearer, so I don't
see any particular problem with my choice of words.

In this case you are in the wrong. Just be a man and admit it. Oh, I
forgot, your rhetorical religion doesn't allow you to admit any
linguistic imperfection, so you try to sleaze your way to a different
subject so you can continue to argue.

Bonita Montero

2024-08-04 13:52:59 UTC

Post by Keith Thompson

Post by Richard Harnden

Post by Mark Summerfield
#include <ctype.h>
#include <stdio.h>
void uppercase_ascii(char *s) {
      while (*s) {
          *s = toupper(*s); // SEGFAULT
          s++;
      }
}
int main() {
      char* text = "this is a test";
      printf("before [%s]\n", text);
      uppercase_ascii(text);
      printf("after [%s]\n", text);
}

text is pointing to "this is a test" - and that is stored in the
program binary and that's why you can't modify it.

That's not the reason for the segfault in this case.

I'm fairly sure it is.

Post by Bart
With some
compilers, you *can* modify it, but that will permanently modify that
string constant. (If the code is repeated, the text is already in
capitals the second time around.)
It segfaults when the string is stored in a read-only part of the binary.

A string literal creates an array object with static storage duration.
Any attempt to modify that array object has undefined behavior.

What's the difference between such an object, and an array like one of
static char A[100];
static char B[100]={1};

These char arrays are modifiable because they're not const.

Post by Bart
Do these not also have static storage duration? Yet presumably these can
be legally modified.

Tim Rentsch

2024-08-12 21:11:47 UTC

Keith Thompson <Keith.S.Thompson+***@gmail.com> writes:

[...]

Post by Keith Thompson
A string literal creates an array object with static storage
duration. [...]

A small quibble. Every string literal does sit in an array,
but it might not be a _new_ array, because different string
literals are allowed to overlap as long as the bytes in the
overlapping arrays have the right values.

Vir Campestris

2024-08-13 14:34:19 UTC

Post by Tim Rentsch
[...]

Post by Keith Thompson
A string literal creates an array object with static storage
duration. [...]

A small quibble. Every string literal does sit in an array,
but it might not be a _new_ array, because different string
literals are allowed to overlap as long as the bytes in the
overlapping arrays have the right values.

And this is exactly why string literals should always have been const.

A compiler is entitled to share memory between strings, so given

puts("lap");
puts("overlap");

it's entitled to make them overlap. Then add

char * p = "lap";
*p='X';

and it can overwrite the shared string (I think), which would mean that
printing "lap" again would have a different result.

But that ship has sailed. I'm not even sure const had been invented that
far back!
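A minimal sketch of getting the effect Vir describes without undefined behaviour, assuming a writable copy is what's wanted: initialise char arrays from the literals instead of pointing at them. Each array is a distinct object with its own copy of the characters, so writing to one cannot disturb the other or the literal, whatever the compiler does about sharing literal storage. set_first and demo_lap are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical helper: overwrite the first character of a string.
   s must point into a modifiable array, never directly at a literal. */
void set_first(char *s, char c)
{
    if (*s != '\0')
        *s = c;
}

void demo_lap(void)
{
    char lap[] = "lap";          /* a private, writable copy */
    char overlap[] = "overlap";  /* a separate array, never shared with lap */

    set_first(lap, 'X');         /* well defined: changes only the array lap */

    puts(lap);                   /* Xap */
    puts(overlap);               /* overlap */
    puts("lap");                 /* lap: the literal itself is untouched */
}
```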

Andy

Keith Thompson

2024-08-13 20:08:16 UTC

Post by Vir Campestris

Post by Tim Rentsch
[...]

Post by Keith Thompson
A string literal creates an array object with static storage
duration. [...]

A small quibble. Every string literal does sit in an array,
but it might not be a _new_ array, because different string
literals are allowed to overlap as long as the bytes in the
overlapping arrays have the right values.

And this is exactly why string literals should always have been const.
A compiler is entitled to share memory between strings, so given
puts("lap");
puts("overlap");
it's entitled to make them overlap. Then add
char * p = "lap";
*p='X';
and it can overwrite the shared string (I think), which would mean that
printing "lap" again would have a different result.
But that ship has sailed. I'm not even sure const had been invented
that far back!

The reason that *wasn't* done is that it would have broken existing
code.

In pre-ANSI C, "const" didn't exist, and this:
char *ptr = "hello";
was the only way to create a pointer into a string literal object.
Making string literals const would have broken such code, with no clean
way to rewrite it so it would be accepted by old and new compilers.

I suppose you could do something like:

#ifndef __STDC__
#define const
#endif

In 20/20 hindsight, my personal opinion is that it would have been
better to make string literals const in C89/C90. Compilers could
still accept old const-incorrect code with a non-fatal warning,
and programmers would be encouraged but not immediately forced to
use const.

This could still be done in C2y, but I'm not aware of any proposals.


David Brown

2024-08-14 08:40:05 UTC

Post by Keith Thompson
In 20/20 hindsight, my personal opinion is that it would have been
better to make string literals const in C89/C90. Compilers could
still accept old const-incorrect code with a non-fatal warning,
and programmers would be encouraged but not immediately forced to
use const.

Agreed.

That's basically what happened when C++ was designed.

Post by Keith Thompson
This could still be done in C2y, but I'm not aware of any proposals.

There is always going to be some hassle with things like search
functions - 100% const correctness is not easy when you don't have
overloads. (It's not always easy even in C++ where you /do/ have
overloads and templates.)

Tim Rentsch

2024-08-14 00:41:16 UTC

Post by Keith Thompson
In 20/20 hindsight, my personal opinion is that it would have been
better to make string literals const in C89/C90.

Fortunately wiser heads prevailed.

Keith Thompson

2024-08-14 01:47:08 UTC

Tim Rentsch <***@z991.linuxsc.com> writes:
[...]

C was already well established before 'const' was invented, and it
was a number of years after that before some C compilers started
allowing 'const' in source code. The cost of not being backward
compatible would be high; the cost of adding const incrementally in
new code is low. Generally speaking using string literals in open
code is a bad idea anyway, regardless whether there is any concern
that the string might be modified. I think most people who want
string literals to be of type const char[] are only thinking about
one side of the equation. It's always important to remember to
look at both sides of the cost/benefit forces, and not focus on
just the (imagined) benefits or (imagined) downsides.

I can't speak for most people, but I want string literals to be const
and I've thought about both sides of the equation. (Existing code could
be compiled with options to enable the old behavior and could be changed
incrementally.)


Kaz Kylheku

2024-08-14 03:16:26 UTC

Post by Keith Thompson
I can't speak for most people, but I want string literals to be const
and I've thought about both sides of the equation. (Existing code could
be compiled with options to enable the old behavior and could be changed
incrementally.)

C++ made string literals const in C++98; C++11 then removed the
deprecated implicit conversion from a literal to char *.

That makes it much easier to be in favor of the change; it not
only helps prevent bugs, but improves C and C++ compatibility.

When programmers write string manipulating functions, they tend
to test them with string literal arguments. When string literals
are const, that encourages the programmers to make arguments
const whenever they can be, which tends to improve the functions.

I work in C codebases that are also compiled as C++, so const
string literals are second nature. It's old hat by now.

Also, <string.h> could have type generic functions where it
makes sense to support both const char * and char *.

E.g. strchr could return const char * if the
parameter is const char *, and char * when the parameter is char *.
The one function we have now strips the qualifier, which is bad;
when you find a character in a const string, you get a non-const
pointer to it.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca

Keith Thompson

2024-08-14 03:49:45 UTC

Kaz Kylheku <643-408-***@kylheku.com> writes:
[...]

Post by Kaz Kylheku
Also, <string.h> could have type generic functions where it
makes sense to support both const char * and char *.
E.g. strchr could return const char * if the
parameter is const char *, and char * when the parameter is char *.
The one function we have now strips the qualifier, which is bad;
when you find a character in a const string, you get a non-const
pointer to it.

C23 does exactly this. It changes memchr, strchr, strpbrk, strrchr, and
strstr into generic functions (macros, presumably using _Generic) whose
return type is pointer-to-const if and only if the appropriate argument
is pointer-to-const. If you suppress the macro definition, you get a
function that takes a const-qualified argument and returns a non-const
result.

(C++ does something similar for its functions in <cstring>, imported
from C, but by providing separate const and non-const overloads.)
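For compilers without C23's type-generic <string.h>, a rough approximation can be sketched in C11 with _Generic. This assumes C11 and is an illustration only, not the C23 library's actual definition; the macro name strchr_qual is hypothetical.

```c
#include <string.h>

/* Hypothetical C11 sketch of a qualifier-preserving strchr:
   a const char * argument yields const char *, a char * argument
   yields char *. Only the selected branch is evaluated, so s is
   evaluated once. Other argument types fail to compile. */
#define strchr_qual(s, c) _Generic((s),                  \
        const char *: (const char *)strchr((s), (c)),    \
        char *:       strchr((s), (c)))
```

With this, strchr_qual applied to a const char * yields a const char *, so an accidental write through the result is a compile-time error rather than undefined behaviour.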


Scott Lurndal

2024-08-01 13:28:06 UTC

Post by Mark Summerfield
#include <ctype.h>
#include <stdio.h>
void uppercase_ascii(char *s) {
while (*s) {
*s = toupper(*s); // SEGFAULT
s++;
}
}
int main() {
char* text = "this is a test";
printf("before [%s]\n", text);
uppercase_ascii(text);
printf("after [%s]\n", text);
}
I know there are better ways to do ASCII uppercase, I don't care about
that; what I don't understand is why I can't do an in-place edit of a non-
const char*?

Because char* is a pointer, not a string. In this case, it is
pointing to a string stored in read-only memory.
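A minimal sketch of the usual fix, assuming all that's wanted is an in-place ASCII uppercase: make text an array initialised from the literal rather than a pointer to it, so the characters live in an ordinary modifiable object. The (unsigned char) cast before toupper follows Ben's advice earlier in the thread; demo_uppercase is just a hypothetical stand-in for main.

```c
#include <ctype.h>
#include <stdio.h>

/* Uppercases s in place; s must point into a modifiable array. */
void uppercase_ascii(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);  /* cast keeps negative chars in range */
}

/* Hypothetical driver standing in for main(). */
void demo_uppercase(void)
{
    char text[] = "this is a test";  /* array: a writable copy of the literal */
    printf("before [%s]\n", text);
    uppercase_ascii(text);
    printf("after [%s]\n", text);    /* after [THIS IS A TEST] */
}
```

The change that matters for the segfault is char text[] in place of char *text; sizeof text is then 15, the array holding its own copy of the 14 characters plus the terminating null.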

Michael S

2024-08-01 14:40:26 UTC

On Thu, 01 Aug 2024 08:06:57 +0000

Post by Mark Summerfield
#include <ctype.h>
#include <stdio.h>
void uppercase_ascii(char *s) {
while (*s) {
*s = toupper(*s); // SEGFAULT
s++;
}
}
int main() {
char* text = "this is a test";
printf("before [%s]\n", text);
uppercase_ascii(text);
printf("after [%s]\n", text);
}

The answers to your question are already given above, so I'd talk about
something else. Sorry about it.

To my surprise, none of the 3 major compilers that I tried issued the
warning at this line:
char* text = "this is a test";
If implicit conversion of 'const char*' to 'char*' does not warrant a
compiler warning, then I don't know what does.
Is there something in the Standard that explicitly forbids a diagnostic
for this sort of conversion?

BTW, all 3 compilers issue reasonable warnings when I write it slightly
differently:
const char* ctext = "this is a test";
char* text = ctext;

I am starting to suspect that compilers (and the Standard?) consider
string literals as being of type 'char*' rather than 'const char*'.

David Brown

2024-08-01 17:56:00 UTC

Post by Michael S
[...]

The answers to your question are already given above, so I'd talk about
something else. Sorry about it.
To my surprise, none of the 3 major compilers that I tried issued the
warning at this line:
char* text = "this is a test";
If implicit conversion of 'const char*' to 'char*' does not warrant a
compiler warning, then I don't know what does.
Is there something in the Standard that explicitly forbids a diagnostic
for this sort of conversion?
BTW, all 3 compilers issue reasonable warnings when I write it slightly
differently:
const char* ctext = "this is a test";
char* text = ctext;
I am starting to suspect that compilers (and the Standard?) consider
string literals as being of type 'char*' rather than 'const char*'.

Your suspicions are correct - in C, string literals are used to
initialise an array of char (or wide char, or other appropriate
character type). Perhaps you are thinking of C++, where the type is
an array of "const char" (or of another const character type).

So in C, when a string literal is used in an expression it is converted
to a "char *" pointer. You can, of course, assign that to a "const char
*" pointer. But it does not make sense to have a warning when assigning
it to a non-const "char *" pointer. This is despite it being undefined
behaviour (explicitly stated in the standards) to attempt to write to a
string literal.
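A quick way to confirm David's point, assuming a C11 compiler (for _Generic) in its default, conforming mode: &"abc" points to the array object the literal creates, and its type is pointer to array of 4 plain char, not of const char. The macro and function names are made up for the example.

```c
/* Maps a pointer-to-array type to a number (C11 _Generic):
   1 = array of 4 char, 2 = array of 4 const char, 0 = anything else. */
#define ARRAY4_KIND(p) _Generic((p),    \
        char (*)[4]:       1,           \
        const char (*)[4]: 2,           \
        default:           0)

/* "abc" is an lvalue array of 4 elements (3 letters plus '\0'),
   so &"abc" points to the whole array. In standard C this returns 1. */
int abc_literal_kind(void)
{
    return ARRAY4_KIND(&"abc");
}
```

With default options abc_literal_kind() returns 1, and sizeof "abc" is 4 (three letters plus the terminating null).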

The reason string literals are not const in C is backwards compatibility
- they existed before C had "const", and making string literals into
"const char" arrays would mean that existing code that assigned them to
non-const pointers would then be in error. C++ was able to do the right
thing and make them arrays of const char because it had "const" from the
beginning.

gcc has the option "-Wwrite-strings" that makes string literals in C
have "const char" array type, and thus give errors when you try to
assign to a non-const char * pointer. But the option has to be
specified explicitly (it is not in -Wall) because it changes the meaning
of the code and can cause compatibility issues with existing correct code.

candycanearter07

2024-08-02 05:30:02 UTC

Post by David Brown

[...]
gcc has the option "-Wwrite-strings" that makes string literals in C
have "const char" array type, and thus give errors when you try to
assign to a non-const char * pointer. But the option has to be
specified explicitly (it is not in -Wall) because it changes the meaning
of the code and can cause compatibility issues with existing correct code.

-Wwrite-strings is included in -Wpedantic.

--
user <candycane> is generated from /dev/urandom

Keith Thompson

2024-08-02 10:02:03 UTC

[...]

Post by candycanearter07

Post by David Brown
gcc has the option "-Wwrite-strings" that makes string literals in C
have "const char" array type, and thus give errors when you try to
assign to a non-const char * pointer. But the option has to be
specified explicitly (it is not in -Wall) because it changes the meaning
of the code and can cause compatibility issues with existing correct code.

-Wwrite-strings is included in -Wpedantic.

No it isn't, nor is it included in -Wall -- and it wouldn't make sense
to do so.

The -Wpedantic option is intended to produce all required diagnostics
for the specified C standard. -Wwrite-strings gives string literals the
type `const char[LENGTH]`, which enables useful diagnostics but is
*non-conforming*.

For example, this program:

```
#include <stdio.h>
int main(void) {
char *s = "hello, world";
puts(s);
}
```

is valid (no diagnostic required), since it doesn't actually write to
the string literal object, but `-Wwrite-strings` causes gcc to warn
about it (because making the pointer non-const creates the potential for
an error).


Richard Harnden

2024-08-02 12:04:55 UTC

Post by Keith Thompson
[...]

Post by candycanearter07

Post by David Brown
gcc has the option "-Wwrite-strings" that makes string literals in C
have "const char" array type, and thus give errors when you try to
assign to a non-const char * pointer. But the option has to be
specified explicitly (it is not in -Wall) because it changes the meaning
of the code and can cause compatibility issues with existing correct code.

-Wwrite-strings is included in -Wpedantic.

No it isn't, nor is it included in -Wall -- and it wouldn't make sense
to do so.
The -Wpedantic option is intended to produce all required diagnostics
for the specified C standard. -Wwrite-strings gives string literals the
type `const char[LENGTH]`, which enables useful diagnostics but is
*non-conforming*.
For example, this program:
```
#include <stdio.h>
int main(void) {
char *s = "hello, world";
puts(s);
}
```
is valid (no diagnostic required), since it doesn't actually write to
the string literal object, but `-Wwrite-strings` causes gcc to warn
about it (because making the pointer non-const creates the potential for
an error).

Is there any reason not to always write ...

static const char *s = "hello, world";

... ?

You get all the warnings for free that way.

James Kuyper

2024-08-02 13:59:40 UTC

On 8/2/24 08:04, Richard Harnden wrote:
...

Post by Richard Harnden
Is there any reason not to always write ...
static const char *s = "hello, world";
... ?
You get all the warnings for free that way.

If you hate being notified of the errors that can be caught by
appropriate use of 'const', as many do, that can be considered a
disadvantage. I can't claim to understand why they feel that way, but
such people do exist.

Keith Thompson

2024-08-02 18:24:06 UTC

Richard Harnden <***@gmail.invalid> writes:
[...]

Post by Richard Harnden
Is there any reason not to always write ...
static const char *s = "hello, world";
... ?
You get all the warnings for free that way.

The "static", if this is at block scope, specifies that the pointer
object, not the array object, has static storage duration. If it's at
file scope it specifies that the name "s" is not visible to other
translation units. Either way, use it if that's what you want, don't
use it if it isn't.

There's no good reason not to use "const". (If string literal objects
were const, you'd have to use "const" here.)

If you also want the pointer to be const, you can write:

const char *const s = "hello, world";
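A hypothetical illustration of Keith's point about block-scope static: the pointer object persists between calls and can be re-pointed, while the const in the pointed-to type still forbids writing through it. Declaring it const char *const instead would make the reassignment a compile-time error. next_greeting is an invented name.

```c
/* Alternates between two greetings on successive calls. */
const char *next_greeting(void)
{
    /* static: the pointer object persists across calls.
       const char: the characters cannot be modified through it. */
    static const char *s = "hello";

    const char *current = s;
    s = (current[0] == 'h') ? "world" : "hello";  /* re-pointing is allowed */
    return current;
}
```

Successive calls return "hello", "world", "hello", and so on.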


Richard Damon

2024-08-02 18:42:08 UTC

Post by Keith Thompson
[...]

Post by Richard Harnden
Is there any reason not to always write ...
static const char *s = "hello, world";
... ?
You get all the warnings for free that way.

The "static", if this is at block scope, specifies that the pointer
object, not the array object, has static storage duration. If it's at
file scope it specifies that the name "s" is not visible to other
translation units. Either way, use it if that's what you want, don't
use it if it isn't.
There's no good reason not to use "const". (If string literal objects
were const, you'd have to use "const" here.)
const char *const s = "hello, world";

The one good reason to not make it const is if you are passing it
to functions that take (non-const) char* parameters that don't actually
change that parameter's contents.

These may still exist in legacy code since so far nothing has required
them to change.

Perhaps it is getting to the point that the language needs to abandon
support for that ancient code, and force "const correctness" (which I
admit some will call const-pollution) onto code, first with a formal
deprecation period, where implementations are strongly encouraged to make
the violation of the rule a warning, and then later changing the type of
string constants.

Of course, implementations would still be free to accept such code, and
maybe not even warn about it in non-pedantic mode, but making it
part of the Standard would be a step toward cleaning this up.

James Kuyper

2024-08-02 18:58:10 UTC

Post by Richard Damon

[...]

Post by Richard Harnden
Is there any reason not to always write ...
static const char *s = "hello, world";
... ?

...

Post by Richard Damon

There's no good reason not to use "const". (If string literal objects
were const, you'd have to use "const" here.)

...

Post by Richard Damon
The one good reason to not make it const is if you are passing it
to functions that take (non-const) char* parameters that don't
actually change that parameter's contents.

Actually, that's not a good reason. If you can't modify the function's
interface, you should use a (char*) cast, which will serve to remind
future programmers that this is a dangerous function call. You shouldn't
make the pointer's own type "char *".

Richard Damon

2024-08-02 19:11:20 UTC

Post by James Kuyper

Post by Richard Damon

[...]

Post by Richard Harnden
Is there any reason not to always write ...
static const char *s = "hello, world";
... ?

...

Post by Richard Damon

There's no good reason not to use "const". (If string literal objects
were const, you'd have to use "const" here.)

...

Post by Richard Damon
The one good reason to not make it const is if you are passing it
to functions that take (non-const) char* parameters that don't
actually change that parameter's contents.

Actually, that's not a good reason. If you can't modify the function's
interface, you should use a (char*) cast, which will serve to remind
future programmers that this is a dangerous function call. You shouldn't
make the pointer's own type "char *".

Depends on the library and how many times it is used. It may be a
perfectly safe call, as the function is defined not to change its
parameter, but being external code the signature might not be fixable.

Adding the cast at each call may cause a "crying wolf" response that
trains people to just add the cast where it seems to be needed (even if
not warranted). You likely DO want a note at the statement explaining
the situation.
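One way to reconcile James's and Richard's positions, sketched under the assumption that the legacy function really never writes through its argument: confine the cast to a single const-correct wrapper, with the justification recorded once, so call sites need no casts. legacy_count_vowels stands in for an unmodifiable third-party function and is hypothetical, as is the wrapper count_vowels.

```c
#include <stddef.h>

/* Stand-in for third-party code whose signature cannot be changed.
   Its documentation (assumed) promises it never modifies s. */
size_t legacy_count_vowels(char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == 'a' || *s == 'e' || *s == 'i' || *s == 'o' || *s == 'u')
            n++;
    return n;
}

/* The one place const is cast away. Safe only because
   legacy_count_vowels is documented not to write through s. */
size_t count_vowels(const char *s)
{
    return legacy_count_vowels((char *)s);
}
```

Callers can then pass const char * data, including string literals, directly: count_vowels("hello, world") returns 3.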

Tim Rentsch

2024-08-12 15:32:32 UTC

Post by Richard Damon

Post by Keith Thompson
[...]

Post by Richard Harnden
Is there any reason not to always write ...
static const char *s = "hello, world";
... ?

...

Post by Richard Damon

Post by Keith Thompson
There's no good reason not to use "const". (If string literal objects
were const, you'd have to use "const" here.)

...

Post by Richard Damon
The one good reason to not make it const is if you are passing it
to functions that take (non-const) char* parameters that don't
actually change that parameter's contents.

Actually, that's not a good reason. If you can't modify the function's
interface, you should use a (char*) cast, which will serve to remind
future programmers that this is a dangerous function call. You shouldn't
make the pointer's own type "char *".

Depends on the library and how many times it is used. It may be a
perfectly safe call, as the function is defined not to change its
parameter, but being external code the signature might not be fixable.

Right. It isn't always feasible to assume source code can be
modified, especially without causing downstream problems.

Adding the cast at each call may cause a "crying wolf" response that
trains people to just add the cast where it seems to be needed (even
if not warranted).

Exactly. The last thing we want to do is have developers learn
habits that tend to push code in the direction of being less
safe.

Tim Rentsch

2024-08-12 15:27:04 UTC

Post by Richard Damon

Post by Keith Thompson
[...]

Post by Richard Harnden
Is there any reason not to always write ...
static const char *s = "hello, world";
... ?
You get all the warnings for free that way.

The "static", if this is at block scope, specifies that the pointer
object, not the array object, has static storage duration. If it's at
file scope it specifies that the name "s" is not visible to other
translation units. Either way, use it if that's what you want, don't
use it if it isn't.
There's no good reason not to use "const". (If string literal objects
were const, you'd have to use "const" here.)
const char *const s = "hello, world";

The one good reason to not make it const is if you are passing it
to functions that take (non-const) char* parameters that don't
actually change that parameter's contents.

Right.

Post by Richard Damon
These may still exist in legacy code since so far nothing has required
them to change.
Perhaps it is getting to the point that the language needs to abandon
support for that ancient code, and force "const correctness" (which I
admit some will call const-pollution) onto code, first with a formal
deprecation period, where implementations are strongly encouraged to
make the violation of the rule a warning, and then later changing the
type of string constants.

Given the widespread availability of compiler options to treat
string literals as being const-qualified, it seems better to
leave the language alone and have people use those options as
they see fit. Making existing programs that have worked fine
for years become non-conforming is a heavy and unnecessary
burden, with an ROI that is at best very small and more likely
negative.

Chris M. Thomasson

2024-08-02 19:27:47 UTC

Post by Keith Thompson
[...]

Post by Richard Harnden
Is there any reason not to always write ...
static const char *s = "hello, world";
... ?
You get all the warnings for free that way.

The "static", if this is at block scope, specifies that the pointer
object, not the array object, has static storage duration. If it's at
file scope it specifies that the name "s" is not visible to other
translation units. Either way, use it if that's what you want, don't
use it if it isn't.
There's no good reason not to use "const". (If string literal objects
were const, you'd have to use "const" here.)
const char *const s = "hello, world";

For some reason I had a sort of a habit wrt const pointers:

(experimental code, no ads, raw text...)
https://pastebin.com/raw/f52a443b1

________________________________
/* Interfaces
____________________________________________________________________*/
#include <stddef.h>

struct object_prv_vtable {
int (*fp_destroy) (void* const);
};

struct device_prv_vtable {
int (*fp_read) (void* const, void*, size_t);
int (*fp_write) (void* const, void const*, size_t);
};

struct device_vtable {
struct object_prv_vtable const object;
struct device_prv_vtable const device;
};

struct device {
struct device_vtable const* vtable;
};

#define object_destroy(mp_self) ( \
(mp_self)->vtable->object.fp_destroy((mp_self)) \
)

#define device_read(mp_self, mp_buf, mp_size) ( \
(mp_self)->vtable->device.fp_read((mp_self), (mp_buf), (mp_size)) \
)

#define device_write(mp_self, mp_buf, mp_size) ( \
(mp_self)->vtable->device.fp_write((mp_self), (mp_buf), (mp_size)) \
)
________________________________

;^)

Ben Bacarisse

2024-08-02 22:29:42 UTC

Post by Chris M. Thomasson
(experimental code, no ads, raw text...)
https://pastebin.com/raw/f52a443b1
________________________________
/* Interfaces
____________________________________________________________________*/
#include <stddef.h>
struct object_prv_vtable {
int (*fp_destroy) (void* const);
};
struct device_prv_vtable {
int (*fp_read) (void* const, void*, size_t);
int (*fp_write) (void* const, void const*, size_t);
};

Why? It seems like an arbitrary choice to const qualify some pointer
types and some pointed-to types (but never both).

Post by Chris M. Thomasson
;^)

Does the wink mean I should not take what you write seriously? If so,
please ignore my question.

--
Ben.

Chris M. Thomasson

2024-08-02 23:11:43 UTC

Post by Ben Bacarisse

Post by Chris M. Thomasson
(experimental code, no ads, raw text...)
https://pastebin.com/raw/f52a443b1
________________________________
/* Interfaces
____________________________________________________________________*/
#include <stddef.h>
struct object_prv_vtable {
int (*fp_destroy) (void* const);
};
struct device_prv_vtable {
int (*fp_read) (void* const, void*, size_t);
int (*fp_write) (void* const, void const*, size_t);
};

Why? It seems like an arbitrary choice to const qualify some pointer
types and some pointed-to types (but never both).

I just wanted to get the point across that the first parameter, akin
to "this" in C++, is a const pointer. It shall not be modified in any
way, shape or form. It is as it is, so to speak:

void foo(struct foobar const* const self);

constant pointer to a constant foobar, fair enough?

void
foo(struct foobar const* const self)
{
//self is there... Do not mutate it!
//Please for self is "special"?
}
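For reference, a hypothetical sketch (struct member and function names invented) of what each const in struct foobar const* const self does. The const before the * makes the foobar read-only through self; the const after it makes self itself unassignable. That second kind is top-level, so it is not part of the function's type: a prototype may leave it out and the definition may add it. Likewise, a void* const parameter in a function-pointer type imposes nothing on callers, which may bear on Ben's question about the vtable.

```c
struct foobar {
    int value;
};

/* Prototype without the top-level const on self ... */
int foobar_get(struct foobar const *self);

/* ... and a definition with it. The two agree, because qualifiers on a
   parameter itself are ignored when function types are compared. */
int foobar_get(struct foobar const *const self)
{
    /* self->value = 0;   rejected: the pointed-to foobar is const */
    /* self = 0;          rejected: self itself is const           */
    return self->value;
}

/* Here only the pointer is const, so writing through it is allowed. */
void foobar_set(struct foobar *const self, int value)
{
    self->value = value;
}

/* A function taking plain void * can initialise a pointer whose
   parameter is declared void *const, as in the vtable above. */
int destroy_noop(void *p)
{
    (void)p;   /* nothing to release in this sketch */
    return 0;
}

int (*const fp_destroy)(void *const) = destroy_noop;
```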

Post by Ben Bacarisse

Post by Chris M. Thomasson
;^)

Does the wink mean I should not take what you write seriously? If so,
please ignore my question.

The wink was meant to show my habit in basically a jestful sort of way.
Some people did not seem to like it very much, even though it was just
me doing my thing. I can adapt rather quickly.

Ben Bacarisse

2024-08-05 01:06:36 UTC

Post by Ben Bacarisse

Post by Chris M. Thomasson
(experimental code, no ads, raw text...)
https://pastebin.com/raw/f52a443b1
________________________________
/* Interfaces
____________________________________________________________________*/
#include <stddef.h>
struct object_prv_vtable {
int (*fp_destroy) (void* const);
};
struct device_prv_vtable {
int (*fp_read) (void* const, void*, size_t);
int (*fp_write) (void* const, void const*, size_t);
};

Why? It seems like an arbitrary choice to const qualify some pointer
types and some pointed-to types (but never both).

I just wanted to get the point across that the first parameter, aka, akin
to "this" in C++ is a const pointer. Shall not be modified in any way shape
void foo(struct foobar const* const self);
constant pointer to a constant foobar, fair enough?

No. If you intended a const pointer to const object why didn't you
write that? My point was that the consts seem to be scattered about
without any apparent logic and you've not explained why.

Post by Ben Bacarisse

Post by Chris M. Thomasson
;^)

Does the wink mean I should not take what you write seriously? If so,
please ignore my question.

The wink was meant to show my habit in basically a jestful sort of way.

Your habit of what?

--
Ben.

Chris M. Thomasson

2024-08-05 02:37:11 UTC

Post by Ben Bacarisse

Post by Ben Bacarisse

Post by Chris M. Thomasson
(experimental code, no ads, raw text...)
https://pastebin.com/raw/f52a443b1
________________________________
/* Interfaces
____________________________________________________________________*/
#include <stddef.h>
struct object_prv_vtable {
int (*fp_destroy) (void* const);
};
struct device_prv_vtable {
int (*fp_read) (void* const, void*, size_t);
int (*fp_write) (void* const, void const*, size_t);
};

Why? It seems like an arbitrary choice to const qualify some pointer
types and some pointed-to types (but never both).

I just wanted to get the point across that the first parameter, aka, akin
to "this" in C++ is a const pointer. Shall not be modified in any way shape
void foo(struct foobar const* const self);
constant pointer to a constant foobar, fair enough?

No. If you intended a const pointer to const object why didn't you
write that? My point was that the consts seem to be scattered about
without any apparent logic and you've not explained why.

Post by Ben Bacarisse

Post by Chris M. Thomasson
;^)

Does the wink mean I should not take what you write seriously? If so,
please ignore my question.

The wink was meant to show my habit in basically a jestful sort of way.

Your habit of what?

To write the declaration with names and the const access I want, so:

extern void (void const* const ptr);

void (void const* const ptr)
{
// ptr is a const pointer to a const void
}

Chris M. Thomasson

2024-08-05 02:38:24 UTC

Post by Chris M. Thomasson

Post by Chris M. Thomasson
(experimental code, no ads, raw text...)
https://pastebin.com/raw/f52a443b1
________________________________
/* Interfaces
____________________________________________________________________*/
#include <stddef.h>
struct object_prv_vtable {
    int (*fp_destroy) (void* const);
};
struct device_prv_vtable {
    int (*fp_read) (void* const, void*, size_t);
    int (*fp_write) (void* const, void const*, size_t);
};

Why? It seems like an arbitrary choice to const qualify some pointer
types and some pointed-to types (but never both).

I just wanted to get the point across that the first parameter, aka, akin
to "this" in C++ is a const pointer. Shall not be modified in any way shape
void foo(struct foobar const* const self);
constant pointer to a constant foobar, fair enough?

No. If you intended a const pointer to const object why didn't you
write that? My point was that the consts seem to be scattered about
without any apparent logic and you've not explained why.

Post by Chris M. Thomasson
;^)

Does the wink mean I should not take what you write seriously? If so,
please ignore my question.

The wink was meant to show my habit in basically a jestful sort of way.

Your habit of what?

extern void (void const* const ptr);
void (void const* const ptr)
{
// ptr is a const pointer to a const void
}

Perhaps give the function a name... ;^)

To write the declaration with names and the const access I want, so:

extern void foobar(void const* const ptr);

void foobar(void const* const ptr)
{
// ptr is a const pointer to a const void
}

Ben Bacarisse

2024-08-05 11:03:08 UTC

Post by Chris M. Thomasson

Post by Ben Bacarisse

Post by Ben Bacarisse

Post by Chris M. Thomasson
(experimental code, no ads, raw text...)
https://pastebin.com/raw/f52a443b1
________________________________
/* Interfaces
____________________________________________________________________*/
#include <stddef.h>
struct object_prv_vtable {
int (*fp_destroy) (void* const);
};
struct device_prv_vtable {
int (*fp_read) (void* const, void*, size_t);
int (*fp_write) (void* const, void const*, size_t);
};

Why? It seems like an arbitrary choice to const qualify some pointer
types and some pointed-to types (but never both).

I just wanted to get the point across that the first parameter, aka, akin
to "this" in C++ is a const pointer. Shall not be modified in any way shape
void foo(struct foobar const* const self);
constant pointer to a constant foobar, fair enough?

No. If you intended a const pointer to const object why didn't you
write that? My point was that the consts seem to be scattered about
without any apparent logic and you've not explained why.

Post by Ben Bacarisse

Post by Chris M. Thomasson
;^)

Does the wink mean I should not take what you write seriously? If so,
please ignore my question.

The wink was meant to show my habit in basically a jestful sort of way.

Your habit of what?

extern void (void const* const ptr);
void (void const* const ptr)
{
// ptr is a const pointer to a const void
}

I don't think you are following what I'm saying. If you think there
might be some value in finding out, you could ask a few questions. I
won't say it again ;-)

--
Ben.

Chris M. Thomasson

2024-08-05 20:35:28 UTC

Post by Ben Bacarisse

Post by Chris M. Thomasson

Post by Ben Bacarisse

Post by Ben Bacarisse

Post by Chris M. Thomasson
(experimental code, no ads, raw text...)
https://pastebin.com/raw/f52a443b1
________________________________
/* Interfaces
____________________________________________________________________*/
#include <stddef.h>
struct object_prv_vtable {
int (*fp_destroy) (void* const);
};
struct device_prv_vtable {
int (*fp_read) (void* const, void*, size_t);
int (*fp_write) (void* const, void const*, size_t);
};

Why? It seems like an arbitrary choice to const qualify some pointer
types and some pointed-to types (but never both).

I just wanted to get the point across that the first parameter, aka, akin
to "this" in C++ is a const pointer. Shall not be modified in any way shape
void foo(struct foobar const* const self);
constant pointer to a constant foobar, fair enough?

No. If you intended a const pointer to const object why didn't you
write that? My point was that the consts seem to be scattered about
without any apparent logic and you've not explained why.

Post by Ben Bacarisse

Post by Chris M. Thomasson
;^)

Does the wink mean I should not take what you write seriously? If so,
please ignore my question.

The wink was meant to show my habit in basically a jestful sort of way.

Your habit of what?

extern void (void const* const ptr);
void (void const* const ptr)
{
// ptr is a const pointer to a const void
}

I don't think you are following what I'm saying. If you think there
might be some value in finding out, you could ask a few questions. I
won't say it again ;-)

I must be misunderstanding you. My habit in such code was to always make
the "this" pointer wrt some of my "object" oriented code a const
pointer. This was always the first parameter:

extern void foobar(void const* const ptr);

or

extern void foobar(void* const ptr);

Actually, I used the name of self for a while.

extern void foobar(void const* const self);
extern void foobar(void* const self);

Ben Bacarisse

2024-08-05 20:54:59 UTC

Post by Chris M. Thomasson

Post by Ben Bacarisse

Post by Chris M. Thomasson

Post by Ben Bacarisse

Post by Ben Bacarisse

Post by Chris M. Thomasson
(experimental code, no ads, raw text...)
https://pastebin.com/raw/f52a443b1
________________________________
/* Interfaces
____________________________________________________________________*/
#include <stddef.h>
struct object_prv_vtable {
int (*fp_destroy) (void* const);
};
struct device_prv_vtable {
int (*fp_read) (void* const, void*, size_t);
int (*fp_write) (void* const, void const*, size_t);
};

Why? It seems like an arbitrary choice to const qualify some pointer
types and some pointed-to types (but never both).

I just wanted to get the point across that the first parameter, aka, akin
to "this" in C++ is a const pointer. Shall not be modified in any way shape
void foo(struct foobar const* const self);
constant pointer to a constant foobar, fair enough?

No. If you intended a const pointer to const object why didn't you
write that? My point was that the consts seem to be scattered about
without any apparent logic and you've not explained why.

Post by Ben Bacarisse

Post by Chris M. Thomasson
;^)

Does the wink mean I should not take what you write seriously? If so,
please ignore my question.

The wink was meant to show my habit in basically a jestful sort of way.

Your habit of what?

extern void (void const* const ptr);
void (void const* const ptr)
{
// ptr is a const pointer to a const void
}

I don't think you are following what I'm saying. If you think there
might be some value in finding out, you could ask a few questions. I
won't say it again ;-)

I must be misunderstanding you. My habit in such code was to always make
the "this" pointer wrt some of my "object" oriented code a const
extern void foobar(void const* const ptr);

OK. So I conclude you don't want to know what I was saying. That's
fine. It was a trivial point.

--
Ben.

Chris M. Thomasson

2024-08-05 22:39:31 UTC

Post by Ben Bacarisse

Post by Chris M. Thomasson

Post by Ben Bacarisse

Post by Chris M. Thomasson

Post by Ben Bacarisse

Post by Ben Bacarisse

Post by Chris M. Thomasson
(experimental code, no ads, raw text...)
https://pastebin.com/raw/f52a443b1
________________________________
/* Interfaces
____________________________________________________________________*/
#include <stddef.h>
struct object_prv_vtable {
int (*fp_destroy) (void* const);
};
struct device_prv_vtable {
int (*fp_read) (void* const, void*, size_t);
int (*fp_write) (void* const, void const*, size_t);
};

Why? It seems like an arbitrary choice to const qualify some pointer
types and some pointed-to types (but never both).

I just wanted to get the point across that the first parameter, aka, akin
to "this" in C++ is a const pointer. Shall not be modified in any way shape
void foo(struct foobar const* const self);
constant pointer to a constant foobar, fair enough?

No. If you intended a const pointer to const object why didn't you
write that? My point was that the consts seem to be scattered about
without any apparent logic and you've not explained why.

Post by Ben Bacarisse

Post by Chris M. Thomasson
;^)

Does the wink mean I should not take what you write seriously? If so,
please ignore my question.

The wink was meant to show my habit in basically a jestful sort of way.

Your habit of what?

extern void (void const* const ptr);
void (void const* const ptr)
{
// ptr is a const pointer to a const void
}

I don't think you are following what I'm saying. If you think there
might be some value in finding out, you could ask a few questions. I
won't say it again ;-)

I must be misunderstanding you. My habit in such code was to always make
the "this" pointer wrt some of my "object" oriented code a const
extern void foobar(void const* const ptr);

OK. So I conclude you don't want to know what I was saying. That's
fine. It was a trivial point.

I must have completely missed it. Sorry about that. Please redefine?

Ben Bacarisse

2024-08-06 11:29:29 UTC

Post by Chris M. Thomasson
I must have completely missed it. Sorry about that. Please redefine?

It's going to seem silly after all these exchanges. I simply wanted to
know why you chose to use const as you originally posted:

| struct object_prv_vtable {
| int (*fp_destroy) (void* const);
| int (*fp_read) (void* const, void*, size_t);
| int (*fp_write) (void* const, void const*, size_t);
| };

because that looks peculiar (to the point of being arbitrary) to me.
You went on to talk about "self" pointers being const pointers to const
void, but that was not what you wrote, so it did not address what I was
asking about.

In general, const qualified argument types are rarely used and are even
more rarely used in function (or type) declarations because they have
no effect at all in that position. For example, I can assign fp_destroy
from a function declared without the const-qualified parameter:

int destroy(void *self) { /* ... */; return 1; }
...
vtab.fp_destroy = destroy;

or, if I do want the compiler to check that the function does not alter
its parameter, I can add the const in the function definition (where it
can be useful) even if it is missing from the declaration:

struct object_prv_vtable {
int (*fp_destroy) (void*);
/* ... */
};

int destroy(void *const self) { /* ... */; return 1; }
...
vtab.fp_destroy = destroy;

But if you want the const there so that the declaration matches the
function definition, why not do that for all the parameters? Basically,
I would have expected either this (just one const where it matters):

struct object_prv_vtable {
int (*fp_destroy) (void *);
int (*fp_read) (void *, void *, size_t);
int (*fp_write) (void *, void const *, size_t);
};

and the actual functions that get assigned to these pointers might, if
you want that extra check, have all their parameters marked const. Or,
for consistency, you might have written

struct object_prv_vtable {
int (*fp_destroy) (void * const);
int (*fp_read) (void * const, void * const, size_t const);
int (*fp_write) (void * const, void const * const, size_t const);
};

even if none of the actual functions have const parameters.

Finally, if you had intended to write what you later went on to talk
about, you would have written either

struct object_prv_vtable {
int (*fp_destroy) (const void *);
int (*fp_read) (const void *, void *, size_t);
int (*fp_write) (const void *, void const *, size_t);
};

or

struct object_prv_vtable {
int (*fp_destroy) (const void * const);
int (*fp_read) (const void * const, void * const, size_t const);
int (*fp_write) (const void * const, void const * const, size_t const);
};

TL;DR: where you put the consts in the original just seemed arbitrary.

I'll also note that the term "const pointer" is often used when the
pointer is not const! It most often means that the pointed-to type is
const qualified. As such, it's best to avoid the term altogether.

--
Ben.

Chris M. Thomasson

2024-08-06 19:48:12 UTC

Post by Ben Bacarisse

Post by Chris M. Thomasson
I must have completely missed it. Sorry about that. Please redefine?

It's going to seem silly after all these exchanges. I simply wanted to
| struct object_prv_vtable {
| int (*fp_destroy) (void* const);
| int (*fp_read) (void* const, void*, size_t);
| int (*fp_write) (void* const, void const*, size_t);
| };
because that looks peculiar (to the point of being arbitrary) to me.
You went on to talk about "self" pointers being const pointers to const
void, but that was not what you wrote, so it did not address what I was
asking about.
In general, const qualified argument types are rarely used and are even
more rarely used in function (or type) declarations because they have
no effect at all in that position. For example, I can assign fp_destroy
int destroy(void *self) { /* ... */; return 1; }
...
vtab.fp_destroy = destroy;
or, if I do want the compiler to check that the function does not alter
its parameter, I can add the const in the function definition (where it
struct object_prv_vtable {
int (*fp_destroy) (void*);
/* ... */
};
int destroy(void *const self) { /* ... */; return 1; }
...
vtab.fp_destroy = destroy;
But if you want the const there so that the declaration matches the
function definition, why not do that for all the parameters? Basically,
struct object_prv_vtable {
int (*fp_destroy) (void *);
int (*fp_read) (void *, void *, size_t);
int (*fp_write) (void *, void const *, size_t);
};
and the actual functions that get assigned to these pointers might, if
you want that extra check, have all their parameters marked const. Or,
for consistency, you might have written
struct object_prv_vtable {
int (*fp_destroy) (void * const);
int (*fp_read) (void * const, void * const, size_t const);
int (*fp_write) (void * const, void const * const, size_t const);
};
even if none of the actual functions have const parameters.
Finally, if you had intended to write what you later went on to talk
about, you would have written either
struct object_prv_vtable {
int (*fp_destroy) (const void *);
int (*fp_read) (const void *, void *, size_t);
int (*fp_write) (const void *, void const *, size_t);
};
or
struct object_prv_vtable {
int (*fp_destroy) (const void * const);
int (*fp_read) (const void * const, void * const, size_t const);
int (*fp_write) (const void * const, void const * const, size_t const);
};
TL;DR: where you put the consts in the original just seemed arbitrary.
I'll also note that the term "const pointer" is often used when the
pointer is not const! It most often means that the pointed-to type is
const qualified. As such, it's best to avoid the term altogether.

I wanted to get across that the pointer value for the first parameter
itself should not be modified. I read (void* const) as a const pointer
to a "non-const" void. Now a const pointer to a const void is (void
const* const), from my code, notice the first parameter?

I consider the first parameter to be special in this older OO experiment
of mine. It shall not be modified, so I wrote it into the API:

struct device_prv_vtable {
int (*fp_read) (void* const, void*, size_t);
int (*fp_write) (void* const, void const*, size_t);
};

// impl... (struct usb_drive is assumed to be defined elsewhere)
#include <stdio.h> /* for printf */
static int usb_drive_device_read(void* const, void*, size_t);
static int usb_drive_device_write(void* const, void const*, size_t);

int usb_drive_device_read(
void* const self_,
void* buf,
size_t size
) {
struct usb_drive* const self = self_;
printf("usb_drive_device_read(%p, %p, %lu)\n",
(void*)self, buf, (unsigned long)size);
return 0;
}

int usb_drive_device_write(
void* const self_,
void const* buf,
size_t size
) {
struct usb_drive* const self = self_;
printf("usb_drive_device_write(%p, %p, %lu)\n",
(void*)self, buf, (unsigned long)size);
return 0;
}

Ben Bacarisse

2024-08-06 22:59:28 UTC

Post by Chris M. Thomasson

Post by Ben Bacarisse

Post by Chris M. Thomasson
I must have completely missed it. Sorry about that. Please redefine?

It's going to seem silly after all these exchanges. I simply wanted to
| struct object_prv_vtable {
| int (*fp_destroy) (void* const);
| int (*fp_read) (void* const, void*, size_t);
| int (*fp_write) (void* const, void const*, size_t);
| };
because that looks peculiar (to the point of being arbitrary) to me.
You went on to talk about "self" pointers being const pointers to const
void, but that was not what you wrote, so it did not address what I was
asking about.
In general, const qualified argument types are rarely used and are even
more rarely used in function (or type) declarations because they have
no effect at all in that position. For example, I can assign fp_destroy
int destroy(void *self) { /* ... */; return 1; }
...
vtab.fp_destroy = destroy;
or, if I do want the compiler to check that the function does not alter
its parameter, I can add the const in the function definition (where it
struct object_prv_vtable {
int (*fp_destroy) (void*);
/* ... */
};
int destroy(void *const self) { /* ... */; return 1; }
...
vtab.fp_destroy = destroy;
But if you want the const there so that the declaration matches the
function definition, why not do that for all the parameters? Basically,
struct object_prv_vtable {
int (*fp_destroy) (void *);
int (*fp_read) (void *, void *, size_t);
int (*fp_write) (void *, void const *, size_t);
};
and the actual functions that get assigned to these pointers might, if
you want that extra check, have all their parameters marked const. Or,
for consistency, you might have written
struct object_prv_vtable {
int (*fp_destroy) (void * const);
int (*fp_read) (void * const, void * const, size_t const);
int (*fp_write) (void * const, void const * const, size_t const);
};
even if none of the actual functions have const parameters.
Finally, if you had intended to write what you later went on to talk
about, you would have written either
struct object_prv_vtable {
int (*fp_destroy) (const void *);
int (*fp_read) (const void *, void *, size_t);
int (*fp_write) (const void *, void const *, size_t);
};
or
struct object_prv_vtable {
int (*fp_destroy) (const void * const);
int (*fp_read) (const void * const, void * const, size_t const);
int (*fp_write) (const void * const, void const * const, size_t const);
};
TL;DR: where you put the consts in the original just seemed arbitrary.
I'll also note that the term "const pointer" is often used when the
pointer is not const! It most often means that the pointed-to type is
const qualified. As such, it's best to avoid the term altogether.

I wanted to get across that the pointer value for the first parameter
itself should not be modified. I read (void* const) as a const pointer to a
"non-const" void. Now a const pointer to a const void is (void const*
const), from my code, notice the first parameter?
I consider the first parameter to be special in this older OO experiment of

You could have said that when I asked many posts ago! I can't see a
sound technical reason to put a const there but that parameter is in
some way different I suppose. The effect on readers is likely to be a
puzzled, mild confusion.

Note that it is not really "in the API" as it is entirely optional whether
the implementation has a const first parameter.

--
Ben.

Chris M. Thomasson

2024-08-12 23:18:18 UTC

[...]

Also, take notice of:

struct device_vtable {
struct object_prv_vtable const object;
struct device_prv_vtable const device;
};

:^)

Chris M. Thomasson

2024-08-05 22:44:18 UTC

On 8/5/2024 1:54 PM, Ben Bacarisse wrote:
[...]

Another habit of mine was to write:

void* const self_,

or:

void* const this_,

For the first parameter... That trailing underscore... ;^o

Tim Rentsch

2024-08-12 21:38:36 UTC

Post by Keith Thompson
[...]

Post by Richard Harnden
Is there any reason not to always write ...
static const char *s = "hello, world";
... ?
You get all the warnings for free that way.

The "static", if this is at block scope, specifies that the
pointer object, not the array object, has static storage duration.
If it's at file scope it specifies that the name "s" is not
visible to other translation units. Either way, use it if that's
what you want, don't use it if it isn't.
There's no good reason not to use "const". [...]

Other people have different opinions on that question.

Keith Thompson

2024-08-12 21:55:32 UTC

Post by Tim Rentsch

Post by Keith Thompson
[...]

Post by Richard Harnden
Is there any reason not to always write ...
static const char *s = "hello, world";
... ?
You get all the warnings for free that way.

The "static", if this is at block scope, specifies that the
pointer object, not the array object, has static storage duration.
If it's at file scope it specifies that the name "s" is not
visible to other translation units. Either way, use it if that's
what you want, don't use it if it isn't.
There's no good reason not to use "const". [...]

Other people have different opinions on that question.

You could have told us your opinion. You could have explained why
someone might have a different opinion. You could have given us
a good reason not to use "const", assuming there is such a reason.
You know the language well enough to make me suspect you might have
something specific in mind.

That could have been interesting and useful.

Instead, you chose to waste everyone's time with a practically
content-free response.

Yes, different people have different opinions. Golly, I never
knew that.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

Tim Rentsch

2024-09-03 13:11:52 UTC

Post by Keith Thompson

Post by Tim Rentsch

Post by Keith Thompson
[...]

Post by Richard Harnden
Is there any reason not to always write ...
static const char *s = "hello, world";
... ?
You get all the warnings for free that way.

The "static", if this is at block scope, specifies that the
pointer object, not the array object, has static storage duration.
If it's at file scope it specifies that the name "s" is not
visible to other translation units. Either way, use it if that's
what you want, don't use it if it isn't.
There's no good reason not to use "const". [...]

Other people have different opinions on that question.

You could have told us your opinion. You could have explained why
someone might have a different opinion. You could have given us a
good reason not to use "const", assuming there is such a reason.
You know the language well enough to make me suspect you might
have something specific in mind. [...]

I said all that I thought needed saying. I see no reason
to add to it.

d***@comcast.net

2024-08-25 20:52:15 UTC

On Fri, 2 Aug 2024 13:04:55 +0100, Richard Harnden
<***@gmail.invalid> wrote:

[string literals not typed const in C even though writing prohibited]

Post by Richard Harnden
Is there any reason not to always write ...
static const char *s = "hello, world";
... ?
You get all the warnings for free that way.

But sizeof s is 8 or 4 regardless of the string, while sizeof "some
string" is the length of the string plus 1 (for the null terminator).

static const char s[] = "hello, world";
// autosized by initializer

would be a better replacement, or in C99+ if at file scope

(const char[]){"hello, world"}

Keith Thompson

2024-08-25 21:26:59 UTC

Post by d***@comcast.net
On Fri, 2 Aug 2024 13:04:55 +0100, Richard Harnden
[string literals not typed const in C even though writing prohibited]

Post by Richard Harnden
Is there any reason not to always write ...
static const char *s = "hello, world";
... ?
You get all the warnings for free that way.

But sizeof s is 8 or 4 regardless of the string, while sizeof "some
string" is the length of the string plus 1 (for the null terminator).
static const char s[] = "hello, world";
// autosized by initializer
would be a better replacement, or in C99+ if at file scope
(const char[]){"hello, world"}

Most uses of that string are very likely to be via function arguments.

If it's defined at file scope, defining s as an array rather than as a
pointer can be useful for any code that refers to it directly (and needs
its size), but as soon as you pass it to a function you lose the size
information (and probably need to pass an extra argument for the
length).

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

Tim Rentsch

2024-08-12 21:33:48 UTC

Post by Keith Thompson
[...]

Post by candycanearter07

Post by David Brown
gcc has the option "-Wwrite-strings" that makes string literals in
C have "const char" array type, and thus give errors when you try
to assign to a non-const char * pointer. But the option has to be
specified explicitly (it is not in -Wall) because it changes the
meaning of the code and can cause compatibility issues with
existing correct code.

-Wwrite-strings is included in -Wpedantic.

No it isn't, nor is it included in -Wall -- and it wouldn't make
sense to do so.
The -Wpedantic option is intended to produce all required
diagnostics for the specified C standard. -Wwrite-strings
gives string literals the type `const char[LENGTH]`, which
enables useful diagnostics but is *non-conforming*.

As long as the -Wwrite-strings diagnostics are only warnings the
result is still conforming.

Keith Thompson

2024-08-12 21:45:13 UTC

Post by Tim Rentsch

Post by Keith Thompson
[...]

Post by candycanearter07

Post by David Brown
gcc has the option "-Wwrite-strings" that makes string literals in
C have "const char" array type, and thus give errors when you try
to assign to a non-const char * pointer. But the option has to be
specified explicitly (it is not in -Wall) because it changes the
meaning of the code and can cause compatibility issues with
existing correct code.

-Wwrite-strings is included in -Wpedantic.

No it isn't, nor is it included in -Wall -- and it wouldn't make
sense to do so.
The -Wpedantic option is intended to produce all required
diagnostics for the specified C standard. -Wwrite-strings
gives string literals the type `const char[LENGTH]`, which
enables useful diagnostics but is *non-conforming*.

As long as the -Wwrite-strings diagnostics are only warnings the
result is still conforming.

It's not just about diagnostics. This program:

#include <stdio.h>
int main(void) {
puts(_Generic("hello",
char*: "char*",
const char*: "const char*",
default: "?"));
}

must print "char*" in a conforming implementation. With
(gcc|clang) -Wwrite-strings, it prints "const char*".

And something as simple as:

char *p = "hello";

is rejected with a fatal error with "-Wwrite-strings -pedantic-errors".

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

Tim Rentsch

2024-08-12 23:05:29 UTC

Post by Mark Summerfield

Post by Tim Rentsch

Post by Keith Thompson
[...]

Post by candycanearter07

Post by David Brown
gcc has the option "-Wwrite-strings" that makes string literals in
C have "const char" array type, and thus give errors when you try
to assign to a non-const char * pointer. But the option has to be
specified explicitly (it is not in -Wall) because it changes the
meaning of the code and can cause compatibility issues with
existing correct code.

-Wwrite-strings is included in -Wpedantic.

No it isn't, nor is it included in -Wall -- and it wouldn't make
sense to do so.
The -Wpedantic option is intended to produce all required
diagnostics for the specified C standard. -Wwrite-strings
gives string literals the type `const char[LENGTH]`, which
enables useful diagnostics but is *non-conforming*.

As long as the -Wwrite-strings diagnostics are only warnings the
result is still conforming.

#include <stdio.h>
int main(void) {
puts(_Generic("hello",
char*: "char*",
const char*: "const char*",
default: "?"));
}
must print "char*" in a conforming implementation. With
(gcc|clang) -Wwrite-strings, it prints "const char*".

Good point. I hadn't considered such cases.

Post by Mark Summerfield
char *p = "hello";
is rejected with a fatal error with "-Wwrite-strings -pedantic-errors".

That violates the "As long as the -Wwrite-strings diagnostics are
only warnings" condition.

David Brown

2024-08-13 11:08:57 UTC

Post by Tim Rentsch

Post by Mark Summerfield

Post by Tim Rentsch

Post by Keith Thompson
[...]

Post by candycanearter07

Post by David Brown
gcc has the option "-Wwrite-strings" that makes string literals in
C have "const char" array type, and thus give errors when you try
to assign to a non-const char * pointer. But the option has to be
specified explicitly (it is not in -Wall) because it changes the
meaning of the code and can cause compatibility issues with
existing correct code.

-Wwrite-strings is included in -Wpedantic.

No it isn't, nor is it included in -Wall -- and it wouldn't make
sense to do so.
The -Wpedantic option is intended to produce all required
diagnostics for the specified C standard. -Wwrite-strings
gives string literals the type `const char[LENGTH]`, which
enables useful diagnostics but is *non-conforming*.

As long as the -Wwrite-strings diagnostics are only warnings the
result is still conforming.

#include <stdio.h>
int main(void) {
puts(_Generic("hello",
char*: "char*",
const char*: "const char*",
default: "?"));
}
must print "char*" in a conforming implementation. With
(gcc|clang) -Wwrite-strings, it prints "const char*".

Good point. I hadn't considered such cases.

Post by Mark Summerfield
char *p = "hello";
is rejected with a fatal error with "-Wwrite-strings -pedantic-errors".

That violates the "As long as the -Wwrite-strings diagnostics are
only warnings" condition.

Indeed.

I personally think it is nice to have an option to make string literals
"const" in C, even though it is non-conforming. I also think it is very
useful to have a warning on attempts to write to string literals. But I
think gcc has made a mistake here by conflating the two. I'd rather see
the warning being enabled by default (or at least in -Wall), while the
"make string literals const" option should require an explicit flag and
be a "-f" flag rather than a "-W" flag. The current situation seems to
be a quick-and-dirty way to get the warning.

Other people may have different opinions, of course :-)

Keith Thompson

2024-08-13 20:00:26 UTC

Post by David Brown

Post by Tim Rentsch

Post by Mark Summerfield

Post by Tim Rentsch

Post by Keith Thompson
[...]

Post by candycanearter07

Post by David Brown
gcc has the option "-Wwrite-strings" that makes string literals in
C have "const char" array type, and thus give errors when you try
to assign to a non-const char * pointer. But the option has to be
specified explicitly (it is not in -Wall) because it changes the
meaning of the code and can cause compatibility issues with
existing correct code.

-Wwrite-strings is included in -Wpedantic.

No it isn't, nor is it included in -Wall -- and it wouldn't make
sense to do so.
The -Wpedantic option is intended to produce all required
diagnostics for the specified C standard. -Wwrite-strings
gives string literals the type `const char[LENGTH]`, which
enables useful diagnostics but is *non-conforming*.

As long as the -Wwrite-strings diagnostics are only warnings the
result is still conforming.

#include <stdio.h>
int main(void) {
puts(_Generic("hello",
char*: "char*",
const char*: "const char*",
default: "?"));
}
must print "char*" in a conforming implementation. With
(gcc|clang) -Wwrite-strings, it prints "const char*".

Good point. I hadn't considered such cases.

Post by Mark Summerfield
char *p = "hello";
is rejected with a fatal error with "-Wwrite-strings -pedantic-errors".

That violates the "As long as the -Wwrite-strings diagnostics are
only warnings" condition.

Indeed.
I personally think it is nice to have an option to make string
literals "const" in C, even though it is non-conforming. I also think
it is very useful to have a warning on attempts to write to string
literals. But I think gcc has made a mistake here by conflating the
two. I'd rather see the warning being enabled by default (or at least
in -Wall), while the "make string literals const" option should
require an explicit flag and be a "-f" flag rather than a "-W" flag.
The current situation seems to be a quick-and-dirty way to get the
warning.
Other people may have different opinions, of course :-)

I agree. An alternative way to implement "-Wwrite-strings" might have
been to invent a new attribute that can be applied to string literal
objects. With the current "-Wwrite-strings", gcc marks string literal
objects as const, with all the non-conforming consequences that implies.
Instead, they could have added an attribute like say, "unwritable" that
triggers warnings but no other changes in semantics and no fatal errors
(unless you use -Werror, but then you're literally asking for it).

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

David Brown

2024-08-03 17:54:20 UTC

Post by candycanearter07

Post by David Brown

Post by Michael S
On Thu, 01 Aug 2024 08:06:57 +0000

Post by Mark Summerfield
#include <ctype.h>
#include <stdio.h>
void uppercase_ascii(char *s) {
while (*s) {
*s = toupper(*s); // SEGFAULT
s++;
}
}
int main() {
char* text = "this is a test";
printf("before [%s]\n", text);
uppercase_ascii(text);
printf("after [%s]\n", text);
}

The answers to your question are already given above, so I'd talk about
something else. Sorry about it.
To my surprise, none of the 3 major compilers that I tried issued a
warning for this line:
char* text = "this is a test";
If implicit conversion of 'const char*' to 'char*' does not warrant a
compiler warning, then I don't know what does.
Is there something in the Standard that explicitly forbids diagnostic
for this sort of conversion?
BTW, all 3 compilers issue reasonable warnings when I write it slightly
differently:
const char* ctext = "this is a test";
char* text = ctext;
I am starting to suspect that compilers (and the Standard?) consider
string literals as being of type 'char*' rather than 'const char*'.

Your suspicions are correct - in C, string literals are used to
initialise an array of char (or wide char, or other appropriate
character type). Perhaps you are thinking of C++, where the type is
"const char" (or other const character type).
So in C, when a string literal is used in an expression it is converted
to a "char *" pointer. You can, of course, assign that to a "const char
*" pointer. But it does not make sense to have a warning when assigning
it to a non-const "char *" pointer. This is despite it being undefined
behaviour (explicitly stated in the standards) to attempt to write to a
string literal.
The reason string literals are not const in C is backwards compatibility
- they existed before C had "const", and making string literals into
"const char" arrays would mean that existing code that assigned them to
non-const pointers would then be in error. C++ was able to do the right
thing and make them arrays of const char because it had "const" from the
beginning.
gcc has the option "-Wwrite-strings" that makes string literals in C
have "const char" array type, and thus give errors when you try to
assign to a non-const char * pointer. But the option has to be
specified explicitly (it is not in -Wall) because it changes the meaning
of the code and can cause compatibility issues with existing correct code.

-Wwrite-strings is included in -Wpedantic.

No, it is not - which is a good thing, because -Wpedantic should not
include features that change the semantics of the language! (IMHO the
flag should not be called -Wwrite-strings, but -fconst-string-literals
or similar. It's not really a normal warning option.)

For C++, -pedantic-errors includes the -Wwrite-strings flag which then
makes implicit conversion of string literal expressions to non-const
char* pointers an error. But that's C++, not C.

James Kuyper

2024-08-01 16:02:30 UTC

Post by Mark Summerfield
#include <ctype.h>
#include <stdio.h>
void uppercase_ascii(char *s) {
while (*s) {
*s = toupper(*s); // SEGFAULT
s++;
}
}
int main() {
char* text = "this is a test";

"In translation phase 7, a byte or code of value zero is appended to
each multibyte character sequence that results from a string literal or
literals. 89) The multibyte character sequence is then used to
initialize an array of static storage duration and length just
sufficient to contain the sequence. ..." (6.4.5p6)

"... If the program attempts to modify such an array, the behavior is
undefined." (6.4.5p7).

This gives implementations the freedom, for instance, to store that array
in read-only memory, though they don't have to do so. The segfault you
got suggests that the implementation you're using did so. On other
platforms, writes to read-only memory might be silently ignored. On a
platform where it is possible to write to such memory, the
implementation is still free to optimize the code on the assumption that
you won't. That could produce bizarrely unexpected behavior if you
actually do modify it.

What you want to do is initialize an array with the string literal:

char text[] = "this is a test";

Nominally, such an array is initialized by copying from the string
literal's array. However, there's no way for strictly conforming code to
determine whether or not there are two such arrays. If the "text" array
has static storage duration, the string literal's array is likely to be
optimized away.

Kaz Kylheku

2024-08-01 19:39:04 UTC

Post by Mark Summerfield
#include <ctype.h>
#include <stdio.h>
void uppercase_ascii(char *s) {
while (*s) {
*s = toupper(*s); // SEGFAULT
s++;
}
}
int main() {
char* text = "this is a test";

The "this is a test" object is a literal. It is part of the program's image.
When you try to change it, you're making your program self-modifying.

The ISO C language standard doesn't require implementations to support
self-modifying programs; the behavior is left undefined.

It could work in some documented, reliable way, in a given
implementation.

It's the same with any other constant in the program. Say you have
a malloc(1024) somewhere in the program. That 1024 number is encoded
into the program's image somehow, and in principle you could write code
to somehow get at that number and change it to 256. Long before you got
that far, you would be in undefined behavior territory. If it worked,
it could have surprising effects. For instance, there could be another
call to malloc(1024) in the program and, surprisingly, *that* one also
changes to malloc(256).

A literal like "this is a test" is similar to that 1024, except
that it's very easy to get at it. The language defines it as an object
with an address, and to get that address all we have to do is evaluate
that expression itself. A minimal piece of code that requests the
undefined consequences of modifying a string literal is as easy
as "a"[0] = 0.

Post by Mark Summerfield
Program received signal SIGSEGV, Segmentation fault.
0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a test")
at inplace.c:6
6 *s = toupper(*s);

On Linux, the string literals of a C executable are located together
with the program text. They are interspersed among the machine
instructions which reference them. The program text is mapped
read-only, so an attempted modification is an access violation trapped
by the OS, turned into a SIGSEGV signal.

GCC used to have a -fwritable-strings option, but it has been removed
for quite some time now.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca

Bart

2024-08-01 20:42:48 UTC

Post by Kaz Kylheku

Post by Mark Summerfield
#include <ctype.h>
#include <stdio.h>
void uppercase_ascii(char *s) {
while (*s) {
*s = toupper(*s); // SEGFAULT
s++;
}
}
int main() {
char* text = "this is a test";

The "this is a test" object is a literal. It is part of the program's image.

So is the text here:

char text[]="this is a test";

But this can be changed without making the program self-modifying.

I guess it depends on what is classed as the program's 'image'.

I'd say the image in the state it is in just after loading or just
before execution starts (since certain fixups are needed). But some
sections will be writable during execution, some not.

Post by Kaz Kylheku
When you try to change it, you're making your program self-modifying.

Post by Mark Summerfield
Program received signal SIGSEGV, Segmentation fault.
0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a test")
at inplace.c:6
6 *s = toupper(*s);

On Linux, the string literals of a C executable are located together
with the program text. They are interspersed among the machine
instructions which reference them. The program text is mapped
read-only, so an attempted modification is an access violation trapped
by the OS, turned into a SIGSEGV signal.

Does it really do that? That's the method I've used for read-only
strings, to put them into the code-segment (since I neglected to support
a dedicated read-only data section, and it's too much work now).

But I don't like it since the code section is also executable; you could
inadvertently execute code within a string (which might happen to
contain machine code for other purposes).

The dangers are small, but there must be reasons why a dedicated
section is normally used. gcc on Windows creates up to 19 sections, so
it would be odd for literal strings to share with code.

Keith Thompson

2024-08-01 21:13:32 UTC

Post by Kaz Kylheku

Post by Mark Summerfield
#include <ctype.h>
#include <stdio.h>
void uppercase_ascii(char *s) {
while (*s) {
*s = toupper(*s); // SEGFAULT
s++;
}
}
int main() {
char* text = "this is a test";

The "this is a test" object is a literal. It is part of the
program's image.

char text[]="this is a test";
But this can be changed without making the program self-modifying.

Incorrect. The string literal results in the creation of an array
object. Any attempt to modify that array object would have undefined
behavior -- but there's no way to modify it because its address isn't
available to the code.

`text` is a distinct object. At execution time (assuming it's defined
at block scope), that object is initialized by copying from the string
literal object. (This is what happens in the abstract machine; there
are opportunities for optimization that might result in the string
literal object not existing in the generated code.)

Post by Bart
I guess it depends on what is classed as the program's 'image'.

Not really.

Given:

int n = 42;

you can't modify 42, but you can modify n. There's no need to consider
the idea of self-modifying code. You're just trying to make it seem
more confusing than it really is.

[...]

Ben Bacarisse

2024-08-01 21:40:23 UTC

Post by Kaz Kylheku

Post by Mark Summerfield
#include <ctype.h>
#include <stdio.h>
void uppercase_ascii(char *s) {
while (*s) {
*s = toupper(*s); // SEGFAULT
s++;
}
}
int main() {
char* text = "this is a test";

The "this is a test" object is a literal. It is part of the program's image.

char text[]="this is a test";
But this can be changed without making the program self-modifying.

Different "this". The array generated by the string can't be modified
without UB. The "this" that can be changed in the corrected version is
a plain, automatically allocated array of char, initialised with the
values from the string.

Post by Bart
I guess it depends on what is classed as the program's 'image'.

The self-modifying remark is a bit of a red herring, but altering the
value of named automatic objects can't reasonably be classed as altering
the program's image at all.

Post by Bart
I'd say the image in the state it is in just after loading or just before
execution starts (since certain fixups are needed). But some sections will
be writable during execution, some not.

Post by Kaz Kylheku
When you try to change it, you're making your program self-modifying.

Post by Mark Summerfield
Program received signal SIGSEGV, Segmentation fault.
0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a test")
at inplace.c:6
6 *s = toupper(*s);

On Linux, the string literals of a C executable are located together
with the program text. They are interspersed among the machine
instructions which reference them. The program text is mapped
read-only, so an attempted modification is an access violation trapped
by the OS, turned into a SIGSEGV signal.

Does it really do that?

Linux does not really have much to do with it; the C implementation
decides, though the OS will influence what choices make more or less
sense.

For example, with my gcc (13.2.0) on Ubuntu the string is put into a
section called .rodata, but tcc on the same Linux box puts it in .data.
As a result the tcc compiled program runs without any issues and outputs

before [this is a test]
after [THIS IS A TEST]

Some C implementations on some Linux systems might put strings in the
text segment, but I've not seen a system that does that for decades.
(Mind you, "Linux" refers to a huge class of systems ranging from top-end
servers to tiny embedded devices.)

--
Ben.

Kaz Kylheku

2024-08-02 00:37:44 UTC

Post by Kaz Kylheku

Post by Mark Summerfield
#include <ctype.h>
#include <stdio.h>
void uppercase_ascii(char *s) {
while (*s) {
*s = toupper(*s); // SEGFAULT
s++;
}
}
int main() {
char* text = "this is a test";

The "this is a test" object is a literal. It is part of the program's image.

char text[]="this is a test";
But this can be changed without making the program self-modifying.

The array which is initialized by the literal is what can be
changed.

In this situation, the literal is just initializer syntax,
not required to be an object with an address.

But there could well be such an object in the program image,
especially if the array is automatic, and thus instantiated
many times.

If the program tries to search for that object and modify it,
it will run into UB.

Post by Bart
I guess it depends on what is classed as the program's 'image'.
I'd say the image in the state it is in just after loading or just
before execution starts (since certain fixups are needed). But some
sections will be writable during execution, some not.

Programs can self-modify in ways designed into the run time.
The toaster has certain internal receptacles that can take
certain forks, according to some rules, which do not affect
the user operating the toaster according to the manual.

Post by Bart
The dangers are small, but there must be reasons why a dedicated
section is normally used. gcc on Windows creates up to 19 sections, so
it would be odd for literal strings to share with code.

One reason is that PC-relative addressing can be used by code to
find its literals. Since that usually has a limited range, it helps
to keep the literals with the code. Combining sections also reduces
size. The addressing is also relocatable, which is useful in shared
libs.

Bart

2024-08-02 10:36:36 UTC

Post by Kaz Kylheku

Post by Kaz Kylheku

Post by Mark Summerfield
#include <ctype.h>
#include <stdio.h>
void uppercase_ascii(char *s) {
while (*s) {
*s = toupper(*s); // SEGFAULT
s++;
}
}
int main() {
char* text = "this is a test";

The "this is a test" object is a literal. It is part of the program's image.

char text[]="this is a test";
But this can be changed without making the program self-modifying.

The array which is initialized by the literal is what can be
changed.
In this situation, the literal is just initializer syntax,
not required to be an object with an address.

I don't spot the 'int main() {' part of your example; my version of it
was meant to be static. (My A, B examples explicitly used 'static'.)

Post by Kaz Kylheku

Post by Bart
I guess it depends on what is classed as the program's 'image'.
I'd say the image in the state it is in just after loading or just
before execution starts (since certain fixups are needed). But some
sections will be writable during execution, some not.

Programs can self-modify in ways designed into the run time.
The toaster has certain internal receptacles that can take
certain forks, according to some rules, which do not affect
the user operating the toaster according to the manual.

Post by Bart
The dangers are small, but there must be reasons why a dedicated
section is normally used. gcc on Windows creates up to 19 sections, so
it would be odd for literal strings to share with code.

One reason is that PC-relative addressing can be used by code to
find its literals. Since that usually has a limited range, it helps
to keep the literals with the code. Combining sections also reduces
size. The addressing is also relocatable, which is useful in shared
libs.

You must be talking about ARM then, with its limited address
displacement (I think 12 bits or +/- 2KB).

On x64, PC-relative uses a 32-bit offset so the range is +/- 2GB; enough
to have string literals located in their own read-only section of memory.

I'm sure you can do that on ARM too; I can think of several ways (and
there are loads more registers to play with, to keep as bases for tables
of such data). But I don't know what real code does.

Tim Rentsch

2024-08-12 20:47:02 UTC

Post by Kaz Kylheku

Post by Mark Summerfield
#include <ctype.h>
#include <stdio.h>
void uppercase_ascii(char *s) {
while (*s) {
*s = toupper(*s); // SEGFAULT
s++;
}
}
int main() {
char* text = "this is a test";

The "this is a test" object is a literal. It is part of the
program's image.

char text[]="this is a test";
But this can be changed without making the program self-modifying.

The array which is initialized by the literal is what can be
changed.
In this situation, the literal is just initializer syntax,
not required to be an object with an address.

In the abstract machine I believe the initializing string
literal is required to be an object with an address. The
discussion of string literals in 6.4.5 says there is such
an object for every string literal, and I don't see any
text in 6.7.9, covering Initialization, that overrules or
contradicts that.

David Brown

2024-08-02 22:14:09 UTC

Post by Kaz Kylheku

Post by Mark Summerfield
#include <ctype.h>
#include <stdio.h>
void uppercase_ascii(char *s) {
     while (*s) {
         *s = toupper(*s); // SEGFAULT
         s++;
     }
}
int main() {
     char* text = "this is a test";

The "this is a test" object is a literal. It is part of the program's image.

char text[]="this is a test";
But this can be changed without making the program self-modifying.

"this is a test" is a string literal, and is typically part of the
program's image. (There are some C implementations that do things
differently, like storing such initialisation data in a compressed format.)

The array "char text[]", however, is a normal variable of type array of
char. It is most definitely not part of the program image - it is in
ram (statically allocated or on the stack, depending on the context) and
is initialised by copying the characters from the string literal (prior
to main(), or at each entry to its scope if it is a local variable).

The string literal initialisation data cannot be changed without
self-modifying code or other undefined behaviour. The variable "text"
is just a normal array and can be changed at will.

I guess it depends on what is classed as the program's 'image'.

No, it depends on understanding what the C means and not trying to
confuse yourself and others.

I'd say the image in the state it is in just after loading or just
before execution starts (since certain fixups are needed). But some
sections will be writable during execution, some not.

That is a poor definition because you are not considering initialised
data, and you are not clear about what you mean by "before execution
starts". A C program typically has an entry point that clears the
zero-initialised program-lifetime data, initialises the initialised
program-lifetime data by copying from a block in the program image, then
sets up things like stdin, heap support, argc/argv, and various other
run-time setup features. Then it calls main(). The initialised data
section and zero-initialised data section are certainly part of the
state of the program at the start of the execution from C's viewpoint -
entry to main(). They are equally certainly not part of the program image.

One reasonable definition of "program image" would be "the file on the
disk" (on general-purpose OS's) or "the binary data in flash" on typical
embedded systems. Another might be the read-only data sections set up
by the OS loader just before jumping to the entry point of the C
run-time code (long before main() is called and the C code itself starts).

Post by Kaz Kylheku
When you try to change it, you're making your program self-modifying.

Post by Mark Summerfield
Program received signal SIGSEGV, Segmentation fault.
0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a test")
at inplace.c:6
6 *s = toupper(*s);

On Linux, the string literals of a C executable are located together
with the program text. They are interspersed among the machine
instructions which reference them. The program text is mapped
read-only, so an attempted modification is an access violation trapped
by the OS, turned into a SIGSEGV signal.

Does it really do that? That's the method I've used for read-only
strings, to put them into the code-segment (since I neglected to support
a dedicated read-only data section, and it's too much work now).

No, Linux systems don't have read-only data or string literals
interspersed with code. They have such data in separate segments, for
better cache efficiency and to allow different section attributes
(read-only data can't be executed).

But I don't like it since the code section is also executable; you could
inadvertently execute code within a string (which might happen to
contain machine code for other purposes).

That's why code and read-only data are rarely interspersed.

The dangers are small, but there must be reasons why a dedication
section is normally used. gcc on Windows creates up to 19 sections, so
it would be odd for literal strings to share with code.

Scott Lurndal

2024-08-03 17:07:59 UTC

Post by David Brown

char text[]="this is a test";
But this can be changed without making the program self-modifying.

"this is a test" is a string literal, and is typically part of the
program's image. (There are some C implementations that do things
differently, like storing such initialisation data in a compressed format.)
The array "char text[]", however, is a normal variable of type array of
char. It is most definitely not part of the program image - it is in
ram (statically allocated or on the stack, depending on the context) and
is initialised by copying the characters from the string literal (prior
to main(), or at each entry to its scope if it is a local variable).

Linux (ELF):

A file-scope static declaration of char text[] will emit the string
literal into the .data section and that data section will be loaded
into memory by the ELF loader. There is no copy made at runtime
before main().

#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

char text1[] = "This is a test of a static-scope string";

int
main(int argc, char **argv)
{
char text2[] = "This is a test of a function-scope string";

fprintf(stdout, "%p %s\n", (void *)&text1, text1);
fprintf(stdout, "%s\n", text2);

return 0;
}

$ /tmp/a
0x601060 This is a test of a static-scope string
This is a test of a function-scope string

$ objdump -p /tmp/a

/tmp/a: file format elf64-x86-64

Program Header:
PHDR off 0x0000000000000040 vaddr 0x0000000000400040 paddr 0x0000000000400040 align 2**3
filesz 0x00000000000001f8 memsz 0x00000000000001f8 flags r-x
INTERP off 0x0000000000000238 vaddr 0x0000000000400238 paddr 0x0000000000400238 align 2**0
filesz 0x000000000000001c memsz 0x000000000000001c flags r--
LOAD off 0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**21
filesz 0x00000000000007dc memsz 0x00000000000007dc flags r-x
LOAD off 0x0000000000000e10 vaddr 0x0000000000600e10 paddr 0x0000000000600e10 align 2**21
filesz 0x0000000000000278 memsz 0x0000000000000290 flags rw-

.data section:

0000e00: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000e10: 5005 4000 0000 0000 3005 4000 0000 0000 P.@.....0.@.....
0000e20: 0000 0000 0000 0000 0100 0000 0000 0000 ................
0000e30: 0100 0000 0000 0000 0c00 0000 0000 0000 ................
0000e40: 2804 4000 0000 0000 0d00 0000 0000 0000 (.@.............
0000e50: a406 4000 0000 0000 1900 0000 0000 0000 ..@.............
0000e60: 100e 6000 0000 0000 1b00 0000 0000 0000 ..`.............
0000e70: 0800 0000 0000 0000 1a00 0000 0000 0000 ................
0000e80: 180e 6000 0000 0000 1c00 0000 0000 0000 ..`.............
0000e90: 0800 0000 0000 0000 f5fe ff6f 0000 0000 ...........o....
0000ea0: 9802 4000 0000 0000 0500 0000 0000 0000 ..@.............
0000eb0: 3803 4000 0000 0000 0600 0000 0000 0000 8.@.............
0000ec0: c002 4000 0000 0000 0a00 0000 0000 0000 ..@.............
0000ed0: 4700 0000 0000 0000 0b00 0000 0000 0000 G...............
0000ee0: 1800 0000 0000 0000 1500 0000 0000 0000 ................
0000ef0: 0000 0000 0000 0000 0300 0000 0000 0000 ................
0000f00: 0010 6000 0000 0000 0200 0000 0000 0000 ..`.............
0000f10: 4800 0000 0000 0000 1400 0000 0000 0000 H...............
0000f20: 0700 0000 0000 0000 1700 0000 0000 0000 ................
0000f30: e003 4000 0000 0000 0700 0000 0000 0000 ..@.............
0000f40: b003 4000 0000 0000 0800 0000 0000 0000 ..@.............
0000f50: 3000 0000 0000 0000 0900 0000 0000 0000 0...............
0000f60: 1800 0000 0000 0000 feff ff6f 0000 0000 ...........o....
0000f70: 9003 4000 0000 0000 ffff ff6f 0000 0000 ..@........o....
0000f80: 0100 0000 0000 0000 f0ff ff6f 0000 0000 ...........o....
0000f90: 8003 4000 0000 0000 0000 0000 0000 0000 ..@.............
0000fa0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000fb0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000fc0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000fd0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000fe0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000ff0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0001000: 280e 6000 0000 0000 0000 0000 0000 0000 (.`.............
0001010: 0000 0000 0000 0000 6604 4000 0000 0000 ........f.@.....
0001020: 7604 4000 0000 0000 8604 4000 0000 0000 v.@.......@.....
0001030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0001040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0001050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0001060: 5468 6973 2069 7320 6120 7465 7374 206f This is a test o
0001070: 6620 6120 7374 6174 6963 2d73 636f 7065 f a static-scope
0001080: 2073 7472 696e 6700 4743 433a 2028 474e string.GCC: (GN

$ printf "0x%x\n" $(( 0x601060 - 0x0000000000600e10 ))
0x250

Keith Thompson

2024-08-04 00:11:55 UTC

Post by Scott Lurndal

Post by David Brown

char text[]="this is a test";
But this can be changed without making the program self-modifying.

"this is a test" is a string literal, and is typically part of the
program's image. (There are some C implementations that do things
differently, like storing such initialisation data in a compressed format.)
The array "char text[]", however, is a normal variable of type array of
char. It is most definitely not part of the program image - it is in
ram (statically allocated or on the stack, depending on the context) and
is initialised by copying the characters from the string literal (prior
to main(), or at each entry to its scope if it is a local variable).

A file-scope static declaration of char text[] will emit the string
literal into the .data section and that data section will be loaded
into memory by the ELF loader. There is no copy made at runtime
before main().
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
char text1[] = "This is a test of a static-scope string";

In the abstract machine, there's an anonymous array object corresponding
to the string literal, and `text1` is a distinct object, also with static
storage duration. The compiler optimizes the anonymous array away and
only stores the data in `text1`.

Post by Scott Lurndal
int
main(int argc, char **argv)
{
char text2[] = "This is a test of a function-scope string";

If the two string literals were identical, the compiler would be permitted
to store them in the same place (it's unspecified, so the implementation
doesn't have to document this). Presumably there's code to copy from
a static array into `text2`, executed within `main`.

Post by Scott Lurndal
fprintf(stdout, "%p %s\n", (void *)&text1, text1);
fprintf(stdout, "%s\n", text2);
return 0;
}
$ /tmp/a
0x601060 This is a test of a static-scope string
This is a test of a function-scope string
$ objdump -p /tmp/a
/tmp/a: file format elf64-x86-64
PHDR off 0x0000000000000040 vaddr 0x0000000000400040 paddr 0x0000000000400040 align 2**3
filesz 0x00000000000001f8 memsz 0x00000000000001f8 flags r-x
INTERP off 0x0000000000000238 vaddr 0x0000000000400238 paddr 0x0000000000400238 align 2**0
filesz 0x000000000000001c memsz 0x000000000000001c flags r--
LOAD off 0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**21
filesz 0x00000000000007dc memsz 0x00000000000007dc flags r-x
LOAD off 0x0000000000000e10 vaddr 0x0000000000600e10 paddr 0x0000000000600e10 align 2**21
filesz 0x0000000000000278 memsz 0x0000000000000290 flags rw-
0000e00: 0000 0000 0000 0000 0000 0000 0000 0000 ................

[36 lines deleted]

Post by Scott Lurndal
0001050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0001060: 5468 6973 2069 7320 6120 7465 7374 206f This is a test o
0001070: 6620 6120 7374 6174 6963 2d73 636f 7065 f a static-scope
0001080: 2073 7472 696e 6700 4743 433a 2028 474e string.GCC: (GN
$ printf "0x%x\n" $(( 0x601060 - 0x0000000000600e10 ))
0x250

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

Keith Thompson

2024-08-04 00:07:37 UTC

David Brown <***@hesbynett.no> writes:
[...]

Post by David Brown
"this is a test" is a string literal, and is typically part of the
program's image. (There are some C implementations that do things
differently, like storing such initialisation data in a compressed format.)

[...]

What implementations do that? Typically data that's all zeros isn't
stored in the image, but general compression isn't something I've seen
(not that I've paid much attention). It would save space in the image,
but it would require decompression at load time and wouldn't save any
space at run time.

Lawrence D'Oliveiro

2024-08-04 01:08:40 UTC

... general compression isn't something I've seen ...

I recall Apple had a patent on some aspects of the “PEF” executable format
that they created for their PowerPC machines running old MacOS. This had
to do with some clever instruction encodings for loading stuff into
memory.

Keith Thompson

2024-08-04 02:58:37 UTC

Post by Lawrence D'Oliveiro

... general compression isn't something I've seen ...

I recall Apple had a patent on some aspects of the “PEF” executable format
that they created for their PowerPC machines running old MacOS. This had
to do with some clever instruction encodings for loading stuff into
memory.

Is that relevant to what I asked about?

What I had in mind is something that, given this:

static int buf[1000] = { 1, 1, 1, ..., 1 }; // say, 1000 elements

would store something less than 1000*sizeof(int) bytes in the executable
file. It wouldn't be hard to do, but I'm not convinced it would be
worthwhile.

Richard Damon

2024-08-04 11:22:57 UTC

Post by Keith Thompson

static int buf[1000] = { 1, 1, 1, ..., 1 }; // say, 1000 elements
would store something less than 1000*sizeof(int) bytes in the executable
file. It wouldn't be hard to do, but I'm not convinced it would be
worthwhile.

I vaguely seem to remember an embedded format that did something like
this. The .init segment that was "copied" to the .data segment had a
simple run-length encoding option. For non-repetitive data, it just
encoded 1 copy of length n. But it could also encode repeats like your
example. When EPROM was a scarce commodity, squeezing out a bit of size
for the .init segment was useful.

My guess is that since it didn't persist, it didn't actually help that much.

Tim Rentsch

2024-08-12 09:55:01 UTC

Post by Richard Damon

I vaguely seem to remember an embedded format that did something like
this. The .init segment that was "copied" to the .data segment had
a simple run-length encoding option. For non-repetitive data, it
just encoded 1 copy of length n. But it could also encode repeats
like your example. When EPROM was a scarce commodity, squeezing out a
bit of size for the .init segment was useful.
My guess is that since it didn't persist, it didn't actually help that much.

Or maybe it helped back in the day, but since then technology has
changed and it doesn't help any more.

Lawrence D'Oliveiro

2024-08-05 06:33:22 UTC

Post by Keith Thompson

Post by Lawrence D'Oliveiro

... general compression isn't something I've seen ...

I recall Apple had a patent on some aspects of the “PEF” executable
format that they created for their PowerPC machines running old MacOS.
This had to do with some clever instruction encodings for loading stuff
into memory.

Is that relevant to what I asked about?

“Compression”

Keith Thompson

2024-08-05 06:38:14 UTC

Post by Lawrence D'Oliveiro

“Compression”

Was that intended to be responsive?

Lawrence D'Oliveiro

2024-08-05 21:27:16 UTC

Post by Keith Thompson

Was that intended to be responsive?

Hint: you have to know something about executable formats.

Keith Thompson

2024-08-05 22:40:42 UTC

Post by Lawrence D'Oliveiro

Hint: you have to know something about executable formats.

I am profoundly uninterested in hints.

Here's what you snipped from what I wrote upthread:

What I had in mind is something that, given this:

static int buf[1000] = { 1, 1, 1, ..., 1 }; // say, 1000 elements

would store something less than 1000*sizeof(int) bytes in the executable
file. It wouldn't be hard to do, but I'm not convinced it would be
worthwhile.

There's a lot I don't know about executable formats, and you seem
uninterested in doing more than showing off your presumed knowledge
without actually sharing it. Others have already answered my direct
question (Richard Damon and David Brown mentioned implementations
that use simple run-length encoding, and David gave some reasons
why it could be useful), so you can stop wasting everyone's time.

Bart

2024-08-06 15:57:16 UTC

Post by Keith Thompson

static int buf[1000] = { 1, 1, 1, ..., 1 }; // say, 1000 elements
would store something less than 1000*sizeof(int) bytes in the executable
file. It wouldn't be hard to do, but I'm not convinced it would be
worthwhile.
There's a lot I don't know about executable formats, and you seem
uninterested in doing more than showing off your presumed knowledge
without actually sharing it. Others have already answered my direct
question (Richard Damon and David Brown mentioned implementations
that use simple run-length encoding, and David gave some reasons
why it could be useful), so you can stop wasting everyone's time.

Storing those 1000 integers is normally going to take 4000 bytes (at
least, since data sections may be rounded up etc).

Doing it in under 4000 bytes would require some extra help. Who or what
is going to do that, and at what point?

There are two lots of support needed:

(1) Some process needs to run either while generating the EXE, or
compressing an existing EXE, to convert that data into a more compact form

(2) When launched, some other process is needed to decompress the data
before reaching the normal entry point.

I can tell you that nothing about Windows' EXE format will help here for
either (1) or (2), since it would need support from the OS loader to
decompress any data, and that doesn't exist.

So it would presumably need to be done by some extra code that is added
to the executable, that needs to be arranged to run as part of the
user-code.

A compiler that supports such compression could do this job: compressing
sections, and then generating extra code, which must be called first,
to decompress those sections.

Or an external utility like UPX can be applied, which typically reduces
the size of an EXE by 2/3 (both code /and/ data), and which
transparently expands it when launched.

So, with the existence of such a utility, I wouldn't even bother trying
it within a compiler.

David Brown

2024-08-06 18:40:39 UTC

Post by Bart

Storing those 1000 integers is normally going to take 4000 bytes (at
least, since data sections may be rounded up etc).
Doing it in under 4000 bytes would require some extra help. Who or what
is going to do that, and at what point?
(1) Some process needs to run either while generating the EXE, or
compressing an existing EXE, to convert that data into a more compact form
(2) When launched, some other process is needed to decompress the data
before reaching the normal entry point.
I can tell you that nothing about Windows' EXE format will help here for
either (1) or (2), since it would need support from the OS loader to
decompress any data, and that doesn't exist.
So it would presumably need to be done by some extra code that is added
to the executable, that needs to be arranged to run as part of the
user-code.
A compiler that supports such compression could do this job: compressing
sections, and then generating extra code, which must be called first,
to decompress those sections.
Or an external utility like UPX can be applied, which typically reduces
the size of an EXE by 2/3 (both code /and/ data), and which
transparently expands it when launched.
So, with the existence of such a utility, I wouldn't even bother trying
it within a compiler.

That may all be true for Windows - you know far more about executable
formats on Windows, and how the OS loads and runs them, than I do.

But it is not true for the kind of embedded development tools that I
have seen using compression for initialised data - tools such as UPX are
simply not applicable in this case.

However, it is fair to say that it is not the compiler itself that will
do the compression or decompression. In the implementations I have
seen, it is the linker that compresses the initialised data section's
data. And the code for decompressing it is part of the C runtime
support code (the stuff that, amongst other things, zeros out the bss
section).

David Brown

2024-08-04 15:20:42 UTC

Post by Keith Thompson
[...]

Post by David Brown
"this is a test" is a string literal, and is typically part of the
program's image. (There are some C implementations that do things
differently, like storing such initialisation data in a compressed format.)

[...]
What implementations do that? Typically data that's all zeros isn't
stored in the image, but general compression isn't something I've seen
(not that I've paid much attention). It would save space in the image,
but it would require decompression at load time and wouldn't save any
space at run time.

It is a technique I have seen in embedded systems. It is not uncommon
for flash or other non-volatile storage to be significantly slower than
ram, and for it to be helpful to keep the flash image as small as
possible (this also helps for things like over-the-air updates). The
compression is typically fairly simple, such as run-length encoding, to
avoid significant time, code space and temporary ram space, but it can
help with some initialised data.

Keith Thompson

2024-08-01 21:06:03 UTC

[...]

Post by Kaz Kylheku

Post by Mark Summerfield
int main() {
char* text = "this is a test";

The "this is a test" object is a literal. It is part of the program's image.
When you try to change it, you're making your program self-modifying.

The ISO C language standard doesn't require the object to be part of the
program's image. A fully conforming implementation could store it in
read/write memory and allow it to be modified. Or, it could store it in
some kind of storage where attempts to write to it appear to succeed,
but do not actually modify it (this is implausible, but allowed by the
standard).

Post by Kaz Kylheku
The ISO C language standard doesn't require implementations to support
self-modifying programs; the behavior is left undefined.

The behavior is undefined because the standard explicitly says so.
There is no reference to self-modifying programs.

[...]

Tim Rentsch

2024-08-14 00:43:09 UTC

Post by Mark Summerfield
#include <ctype.h>
#include <stdio.h>
void uppercase_ascii(char *s) {
while (*s) {
*s = toupper(*s); // SEGFAULT
s++;
}
}
int main() {
char* text = "this is a test";

The "this is a test" object is a literal. It is part of the
program's image. When you try to change it, you're making your
program self-modifying.
The ISO C language standard doesn't require implementations to
support self-modifying programs; the behavior is left undefined.
It could work in some documented, reliable way, in a given
implementation.
It's the same with any other constant in the program. [...]

That is wrong both technically and practically. And obviously
so.


