"undefined behavior"?

DFS

2024-06-12 20:47:23 UTC

Wrote a C program to mimic the stats shown on:

https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php

My code compiles and works fine - every stat matches - except for one
anomaly: when using a dataset of consecutive numbers 1 to N, all
values > 40 are flagged as outliers. Up to 40, no problem. Random
numbers dataset of any size: no problem.

And values 41+ definitely don't meet the conditions for outliers (using
the IQR * 1.5 rule).

Very strange.

Edit: I just noticed I didn't initialize a char:
before: char outliers[100];
after : char outliers[100] = "";

And the problem went away. Reset it to before and problem came back.

Makes no sense. What could cause the program to go FUBAR at data point
41+ only when the dataset is consecutive numbers?

Also, why doesn't gcc just do you a solid and initialize to "" for you?

Barry Schwarz

2024-06-12 21:30:26 UTC

Post by DFS
https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
My code compiles and works fine - every stat matches - except for one
anomaly: when using a dataset of consecutive numbers 1 to N, all
values > 40 are flagged as outliers. Up to 40, no problem. Random
numbers dataset of any size: no problem.
And values 41+ definitely don't meet the conditions for outliers (using
the IQR * 1.5 rule).
Very strange.
before: char outliers[100];
after : char outliers[100] = "";
And the problem went away. Reset it to before and problem came back.
Makes no sense. What could cause the program to go FUBAR at data point
41+ only when the dataset is consecutive numbers?
Also, why doesn't gcc just do you a solid and initialize to "" for you?

Makes perfect sense. The first rule of undefined behavior is
"Whatever happens is exactly correct." You are not entitled to any
expectations and none of the behavior (or perhaps all of the behavior)
can be called unexpected.

Since we cannot see your code, I will guess that you use a non-zero
value in outliers[i] to indicate that the corresponding value has been
identified as an outlier. Since you did not initialize the array
outliers, you have no idea what indeterminate value any element of the
array contains when your program begins execution. Apparently some of
them are non-zero. The fact that the first 40 are zero and the
remaining non-zero is merely an artifact of how your system builds
this particular program with that particular set of compile and link
options. Change anything and you could see completely different
behavior, or not.

I don't use gcc but, in debug mode, some compilers will put
recognizable "garbage values" in uninitialized variables so you can
spot the condition more easily.

In any case, the C language does not prevent you from shooting
yourself in the foot if you choose to. Evaluating an indeterminate
value is one fairly common way to do this.

--
Remove del for email

DFS

2024-06-12 21:53:35 UTC

Post by Barry Schwarz

Post by DFS
https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
My code compiles and works fine - every stat matches - except for one
anomaly: when using a dataset of consecutive numbers 1 to N, all
values > 40 are flagged as outliers. Up to 40, no problem. Random
numbers dataset of any size: no problem.
And values 41+ definitely don't meet the conditions for outliers (using
the IQR * 1.5 rule).
Very strange.
before: char outliers[100];
after : char outliers[100] = "";
And the problem went away. Reset it to before and problem came back.
Makes no sense. What could cause the program to go FUBAR at data point
41+ only when the dataset is consecutive numbers?
Also, why doesn't gcc just do you a solid and initialize to "" for you?

Makes perfect sense. The first rule of undefined behavior is
"Whatever happens is exactly correct." You are not entitled to any
expectations and none of the behavior (or perhaps all of the behavior)
can be called unexpected.

I HATE bogus answers like this.

Aren't you embarrassed to say things like that?

Post by Barry Schwarz
Since we cannot see your code, I will guess that you use a non-zero
value in outliers[i] to indicate that the corresponding value has been
identified as an outlier.

No.

I compare the data point to the lower and upper bounds of a stat rule
commonly called the "IQR Rule":

lo = Q1 - (1.5 * IQR)
hi = Q3 + (1.5 * IQR)

If it falls outside the range of lo-hi I strcat the value to a char.

The outlier routine starts line 170.

If you change

char outliers[200]="", temp[10]="";
to
char outliers[200], temp[10];

you might see what happens when you run the program for consecutive values:

$ ./prog 100 -c

=========================================================================

//this code is hereby released to the public domain

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#include <time.h>

/*
this program computes the descriptive statistics of a randomly
generated set of N integers

1.0 release Dec 2020
2.0 release Jun 2024

used the population skewness and Kurtosis formulas from:

https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
also test the results of this code against that site

compile: gcc -Wall prog.c -o prog -lm
usage : ./prog N -option (where N is 2 or higher, and option is -r or
-c or -o)
-r generates N random numbers
-c generates consecutive numbers 1 to N
-o generates random numbers with outliers
*/

//random ints
int randNbr(int low, int high) {
return (low + rand() / (RAND_MAX / (high - low + 1) + 1));
}

//comparator function used with qsort
int compareint (const void * a, const void * b)
{
if (*(int*)a > *(int*)b) return 1;
else if (*(int*)a < *(int*)b) return -1;
else return 0;
}

int main(int argc, char *argv[])
{
if(argc < 3) {
printf("Missing argument:\n");
printf(" * enter a number greater than 2\n");
printf(" * enter an option -r -c or -o\n");
exit(0);
}

//vars
int i=0, lastmode=0;
int N = atoi(argv[1]);
int nums[N];
//int *nums = malloc(N * sizeof(int));

double sumN=0.0, median=0.0, Q1=0.0, Q2=0.0, Q3=0.0, IQR=0.0;
double stddev = 0.0, kurtosis = 0.0;
double sqrdiffmean = 0.0, cubediffmean = 0.0, quaddiffmean = 0.0;
double meanabsdev = 0.0, rootmeansqr = 0.0;
char mode[100], tmp[12];

//generate random dataset
if(strcmp(argv[2],"-r") == 0) {
srand(time(NULL));
for(i=0;i<N;i++) { nums[i] = randNbr(1,N*3); }

printf("%d Randoms:\n", N);
printf("No commas : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
printf("\nWith commas: "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
qsort(nums,N,sizeof(int),compareint);
printf("\nSorted : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
printf("\nSorted : "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
}

//generate random dataset with outliers
if(strcmp(argv[2],"-o") == 0) {
srand(time(NULL));
nums[0] = 1; nums[1] = 3;
for(i=2;i<N-2;i++) { nums[i] = randNbr(100,N*30); }
nums[N-2] = 1000; nums[N-1] = 2000;

printf("%d Randoms with outliers:\n", N);
printf("No commas : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
printf("\nWith commas: "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
qsort(nums,N,sizeof(int),compareint);
printf("\nSorted : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
printf("\nSorted : "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
}

//generate consecutive numbers 1 to N
if(strcmp(argv[2],"-c") == 0) {
for(i=0;i<N;i++) { nums[i] = i + 1; }

printf("%d Consecutive:\n", N);
printf("No commas : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
printf("\nWith commas : "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
}

//various
for(i=0;i<N;i++) {sumN += nums[i];}
double min = nums[0], max = nums[N-1];

//calc descriptive stats
double mean = sumN / (double)N;
int ucnt = 1, umaxcnt=1;
for(i = 0; i < N; i++)
{
sqrdiffmean += pow(nums[i] - mean, 2); // for variance and sum squares
cubediffmean += pow(nums[i] - mean, 3); // for skewness
quaddiffmean += pow(nums[i] - mean, 4); // for Kurtosis
meanabsdev += fabs((nums[i] - mean)); // for mean absolute deviation
rootmeansqr += nums[i] * nums[i]; // for root mean square

//mode
if(ucnt == umaxcnt && lastmode != nums[i])
{
sprintf(tmp,"%d ",nums[i]);
strcat(mode,tmp);
}

if(nums[i]-nums[i+1]!=0) {ucnt=1;} else {ucnt++;}

if(ucnt>umaxcnt)
{
umaxcnt=ucnt;
memset(mode, '\0', sizeof(mode));
sprintf(tmp, "%d ", nums[i]);
strcat(mode, tmp);
lastmode = nums[i];
}
}

// median and quartiles
// quartiles divide sorted dataset into four sections
// Q1 = median of values less than Q2
// Q2 = median of the data set
// Q3 = median of values greater than Q2
if(N % 2 == 0) {
Q2 = median = (nums[(N/2)-1] + nums[N/2]) / 2.0;
i = N/2;
if(i % 2 == 0) {
Q1 = (nums[(i/2)-1] + nums[i/2]) / 2.0;
Q3 = (nums[i + ((i-1)/2)] + nums[i+(i/2)]) / 2.0;
}
if(i % 2 != 0) {
Q1 = nums[(i-1)/2];
Q3 = nums[i + ((i-1)/2)];
}
}

if(N % 2 != 0) {
Q2 = median = nums[(N-1)/2];
i = (N-1)/2;
if(i % 2 == 0) {
Q1 = (nums[(i/2)-1] + nums[i/2]) / 2.0;
Q3 = (nums[i + (i/2)] + nums[i + (i/2) + 1]) / 2.0;
}
if(i % 2 != 0) {
Q1 = nums[(i-1)/2];
Q3 = nums[i + ((i+1)/2)];
}
}

// outliers: below Q1−1.5xIQR, or above Q3+1.5xIQR
IQR = Q3 - Q1;
char outliers[200]="", temp[10]="";
if (N > 3) {

//range for outliers
double lo = Q1 - (1.5 * IQR);
double hi = Q3 + (1.5 * IQR);

//no outliers
if ( min > lo && max < hi) {
strcat(outliers,"none (using IQR * 1.5 rule)");
}

//at least one outlier
if ( min < lo || max > hi) {
for(i = 0; i < N; i++) {
double val = (double)nums[i];
if(val < lo || val > hi) {
sprintf(temp,"%.0f ",val);
temp[strlen(temp)] = '\0';
strcat(outliers,temp);
}
}
strcat(outliers," (using IQR * 1.5 rule)");
}
outliers[strlen(outliers)] = '\0';
}

stddev = sqrt(sqrdiffmean/N);
kurtosis = quaddiffmean / (N * pow(sqrt(sqrdiffmean/N),4));

//output
printf("\n--------------------------------------------------------------\n");
printf("Minimum = %.0f\n", min);
printf("Maximum = %.0f\n", max);
printf("Range = %.0f\n", max - min);
printf("Size N = %d\n" , N);
printf("Sum N = %.0f\n", sumN);
printf("Mean μ = %.2f\n", mean);
printf("Median = %.1f\n", median);
if(umaxcnt > 1) {
printf("Mode(s) = %s (%d occurrences ea)\n", mode,umaxcnt);}
if(umaxcnt < 2) {
printf("Mode(s) = na (no repeating values)\n");}
printf("Std Dev σ = %.4f\n", stddev);
printf("Variance σ^2 = %.4f\n", sqrdiffmean/N);
printf("Mid Range = %.1f\n", (max + min)/2);
printf("Quartiles");
if(N > 3) {printf(" Q1 = %.1f\n", Q1);}
if(N < 4) {printf(" Q1 = na\n");}
printf(" Q2 = %.1f (median)\n", Q2);
if(N > 3) {printf(" Q3 = %.1f\n", Q3);}
if(N < 4) {printf(" Q3 = na\n");}
printf("IQR = %.1f (interquartile range)\n", IQR);
if(N > 3) {printf("Outliers = %s\n", outliers);}
if(N < 4) {printf("Outliers = na\n");}
printf("Sum Squares SS = %.2f\n", sqrdiffmean);
printf("MAD = %.4f (mean absolute deviation)\n",
meanabsdev / N);
printf("Root Mean Sqr = %.4f\n", sqrt(rootmeansqr / N));
printf("Std Error Mean = %.4f\n", stddev / sqrt(N));
printf("Skewness γ1 = %.4f\n", cubediffmean / (N *
pow(sqrt(sqrdiffmean/N),3)));
printf("Kurtosis β2 = %.4f\n", kurtosis);
printf("Kurtosis Excess α4 = %.4f\n", kurtosis - 3);
printf("CV = %.6f (coefficient of variation)\n",
sqrt(sqrdiffmean/N) / mean);
printf("RSD = %.4f%% (relative std deviation)\n", 100 *
(sqrt(sqrdiffmean/N) / mean));
printf("--------------------------------------------------------------\n");
printf("Check results against\n");
printf("https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php");
printf("\n\n");

//free(nums);
return(0);
}

=========================================================================

Keith Thompson

2024-06-12 22:30:52 UTC

Post by Barry Schwarz

Post by DFS
https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
My code compiles and works fine - every stat matches - except for one
anomaly: when using a dataset of consecutive numbers 1 to N, all
values > 40 are flagged as outliers. Up to 40, no problem. Random
numbers dataset of any size: no problem.
And values 41+ definitely don't meet the conditions for outliers (using
the IQR * 1.5 rule).
Very strange.
before: char outliers[100];
after : char outliers[100] = "";
And the problem went away. Reset it to before and problem came back.
Makes no sense. What could cause the program to go FUBAR at data point
41+ only when the dataset is consecutive numbers?
Also, why doesn't gcc just do you a solid and initialize to "" for you?

Makes perfect sense. The first rule of undefined behavior is
"Whatever happens is exactly correct." You are not entitled to any
expectations and none of the behavior (or perhaps all of the behavior)
can be called unexpected.

I HATE bogus answers like this.
Aren't you embarrassed to say things like that?

He has nothing to be embarrassed about. What he wrote is correct.

The C standard's definition of "undefined behavior" is "behavior, upon
use of a nonportable or erroneous program construct or of erroneous
data, for which this International Standard imposes no requirements".

If you don't like the way C deals with undefined behavior, that's
perfectly valid, and a lot of people are likely to agree with you.
But I advise against lashing out at people who are correctly explaining
what the C standard says.

DFS, since you've been posting in comp.lang.c for at least ten years,
I'm surprised you're having difficulties with this.

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

DFS

2024-06-12 23:07:29 UTC

Post by Keith Thompson

Post by Barry Schwarz

Post by DFS
https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
My code compiles and works fine - every stat matches - except for one
anomaly: when using a dataset of consecutive numbers 1 to N, all
values > 40 are flagged as outliers. Up to 40, no problem. Random
numbers dataset of any size: no problem.
And values 41+ definitely don't meet the conditions for outliers (using
the IQR * 1.5 rule).
Very strange.
before: char outliers[100];
after : char outliers[100] = "";
And the problem went away. Reset it to before and problem came back.
Makes no sense. What could cause the program to go FUBAR at data point
41+ only when the dataset is consecutive numbers?
Also, why doesn't gcc just do you a solid and initialize to "" for you?

Makes perfect sense. The first rule of undefined behavior is
"Whatever happens is exactly correct." You are not entitled to any
expectations and none of the behavior (or perhaps all of the behavior)
can be called unexpected.

I HATE bogus answers like this.
Aren't you embarrassed to say things like that?

He has nothing to be embarrassed about. What he wrote is correct.

No it's not.

"Whatever happens is exactly correct." is nonsense.

"You are not entitled to any expectations" is nonsense.

Post by Keith Thompson
The C standard's definition of "undefined behavior" is "behavior, upon
use of a nonportable or erroneous program construct or of erroneous
data, for which this International Standard imposes no requirements".
If you don't like the way C deals with undefined behavior, that's
perfectly valid, and a lot of people are likely to agree with you.

Thanks for feeling my pain!

It's frustrating. By now I spent a half-hour dealing with it. gcc
could've just filled the char[] variable with 0s by default. I bet that
would save a LOT of people time and headaches.

Post by Keith Thompson
But I advise against lashing out at people who are correctly explaining
what the C standard says.

The C standard really says "Whatever happens is exactly correct."?

Post by Keith Thompson
DFS, since you've been posting in comp.lang.c for at least ten years,

Time flies.

How do you know I've posted here that long?

Post by Keith Thompson
I'm surprised you're having difficulties with this.

I'm surprised at some of the wonkiness of gcc and C.

* warns relentlessly when the printf specifier doesn't match the var
type, but gives no warning when you use an int with memset (instead of
the size_t specified in the function prototype).

* a missing bracket } throws 50 nonsensical compiler errors.

* warns of unused vars but not uninitialized ones

* one uninitialized var makes your program do crazy things. Worse than
crazy is it's identically crazy each time.

./prog 40 -c
outliers: none

./prog 41 -c
outliers: 41

./prog 42 -c
outliers: 41 42

./prog 43 -c
outliers: 41 42 43

./prog 44 -c
outliers: 41 42 43 44

etc. And none were outliers - not even close.

At least if it showed nonsense data it would be easier to track down.
Maybe.

The thing is, none of those values (40+) were ever in that char[] prior
to running the code for a set of 50 consecutive values.

And I edited/compiled the code many times, but still got the identical
error.

I doubt my environment (gcc 11.4 on Windows Subsys for Linux on Ubuntu)
has anything to do with it.

Keith Thompson

2024-06-13 00:33:45 UTC

Post by Keith Thompson

Post by Barry Schwarz

Post by DFS
https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
My code compiles and works fine - every stat matches - except for one
anomaly: when using a dataset of consecutive numbers 1 to N, all
values > 40 are flagged as outliers. Up to 40, no problem. Random
numbers dataset of any size: no problem.
And values 41+ definitely don't meet the conditions for outliers (using
the IQR * 1.5 rule).
Very strange.
before: char outliers[100];
after : char outliers[100] = "";
And the problem went away. Reset it to before and problem came back.
Makes no sense. What could cause the program to go FUBAR at data point
41+ only when the dataset is consecutive numbers?
Also, why doesn't gcc just do you a solid and initialize to "" for you?

Makes perfect sense. The first rule of undefined behavior is
"Whatever happens is exactly correct." You are not entitled to any
expectations and none of the behavior (or perhaps all of the behavior)
can be called unexpected.

I HATE bogus answers like this.
Aren't you embarrassed to say things like that?

He has nothing to be embarrassed about. What he wrote is correct.

No it's not.
"Whatever happens is exactly correct." is nonsense.
"You are not entitled to any expectations" is nonsense.

Neither statement is nonsense.

I quoted the C standard's definition of "undefined behavior".
The C standard *imposes no requirements* on code whose behavior
is undefined.

Perhaps "Nothing that happens is incorrect" would be clearer.
The standard joke is that code with undefined behavior can make
demons fly out of your nose. Obviously it can't, but the point
is that if it did, it would not violate the requirements of the
C standard.

If your code has undefined behavior, and you have any expectations
at all about how it will behave, none of those expectations are
supported by the C standard. Compilers perform optimizations
under the assumption that the code's behavior is not undefined,
which can and does result in arbitrarily weird behavior if you lie
to the compiler by feeding it code whose behavior is undefined.

Post by Keith Thompson
The C standard's definition of "undefined behavior" is "behavior, upon
use of a nonportable or erroneous program construct or of erroneous
data, for which this International Standard imposes no requirements".
If you don't like the way C deals with undefined behavior, that's
perfectly valid, and a lot of people are likely to agree with you.

Thanks for feeling my pain!
It's frustrating. By now I spent a half-hour dealing with it. gcc
could've just filled the char[] variable with 0s by default. I bet
that would save a LOT of people time and headaches.

Perhaps -- but it would also hurt the performance of code that
doesn't *need* automatic objects to be initialized implicitly.

I am neither defending nor attacking the decisions that went into
the ISO C standard. I am explaining what it says.

If you think that the language *should* require automatic objects
to be initialized to zero, that's a perfectly valid opinion.
But you need to accept the fact that the language doesn't
require such initialization, it never has, and most C compilers do
not perform such initializations (unless perhaps you specify some
obscure option).

If that half hour led you to learn that, I suggest it was time
well spent.

And leaving out such a requirement was not accidental. It was a
deliberate decision made for reasons the authors felt were valid.
Zero-initializing all uninitialized automatic objects might be
an idea worth considering for a future standard, but it *would*
hurt performance.

Post by Keith Thompson
But I advise against lashing out at people who are correctly explaining
what the C standard says.

The C standard really says "Whatever happens is exactly correct."?

Not in those words, but that's what "imposes no requirements" means.
If you write:
printf("%d\n", INT_MAX + 1);
and your program prints "0", or "hello, world", or invokes nethack, that
behavior does not violate the requirements of the C standard.

Post by Keith Thompson
DFS, since you've been posting in comp.lang.c for at least ten years,

Time flies.
How do you know I've posted here that long?

I have a collection of saved articles from this newsgroup.
I took a quick look and found some of your posts from 10 years ago.
There are a number of other ways to search for old articles. That,
and I recognized the name "DFS" well enough to infer that you're
a semi-regular.

Post by Keith Thompson
I'm surprised you're having difficulties with this.

I'm surprised at some of the wonkiness of gcc and C.

By all means be surprised, but *learn*.

Post by DFS
* warns relentlessly when the printf specifier doesn't match the var
type, but gives no warning when you use an int with memset (instead of
the size_t specified in the function prototype).

Of course. The warning for a bad print specifier is not required,
but it's useful and fairly easy to generate.

Passing an int as the third argument in a call to memset() is
perfectly valid and well defined, and does not justify a warning.
Yes, the parameter is defined with type size_t, which is an unsigned
integer type. Given that the prototype is visible (which it always
will be if you have the required `#include <string.h>`), you can
pass an argument of any integer or floating-point type and it will
be implicitly converted to size_t. There is no ambiguity.

(If the int value exceeds SIZE_MAX, which is impossible in most
implementations, then the conversion still yields a well-defined
result.)

printf is a variadic function, so the types of the arguments after
the format string are not specified in its declaration. The printf
function has to *assume* that arguments have the types specified
by the format string. This:
printf("%d\n", foo);
(probably) has undefined behavior if foo is of type size_t.
There is no implicit conversion to the expected type. Note that
the format string doesn't have to be a string literal, so it's
not always even possible for the compiler to check the types.
Variadic functions give you a lot of flexibility at the cost of
making some type errors difficult to detect.

(I wrote "probably" because size_t *might* be a typedef for unsigned
int, and there are special rules about arguments of corresponding
signed and unsigned types.)

Post by DFS
* a missing bracket } throws 50 nonsensical compiler errors.

Recovering from parse errors can be difficult. Look for the *first*
reported syntax error, fix it, and recompile.

Post by DFS
* warns of unused vars but not uninitialized ones

Compilers warn about what they *can* warn about. No warnings are
required, but good compilers try to do as much analysis as they can.

In a simple example:
char buf[100];
printf("%d\n", buf[50]);
with "gcc -Wall" I get a "warning: ‘buf’ is used uninitialized".

In your code, you had something like:

char outliers[200];
// ...
if (some_condition) {
strcat(outliers, "some string literal");
}

That has undefined behavior, but it's a bit more difficult
to diagnose than a direct reference to the array. In a quick
experiment, gcc warned about the reference if it's unconditional,
but not if it's in an if statement.

No such warnings are required by the language. It's a matter of how
much effort the compiler developers have put into it while trying to
avoid *too many* warnings.

Post by DFS
* one uninitialized var makes your program do crazy things. Worse than
crazy is it's identically crazy each time.

That's just the nature of undefined behavior. Consistent incorrect
behavior is allowed. Inconsistent incorrect behavior is allowed.
Seemingly correct behavior is allowed (and is perhaps the worst,
because it means you have a hidden bug that you can't test for that
might manifest in a future version).

You defined and did not initialize a character array. There could
be any number of reasons that its contents happened to be the same
from one run of your program to another.

And now you know how to fix the problem: make sure all objects are
initialized before you try to use their values. That's a fairly
easy rule to follow, even if compilers don't always do enough to
help enforce it.

Malcolm McLean

2024-06-13 04:47:57 UTC

Post by Keith Thompson
printf is a variadic function, so the types of the arguments after
the format string are not specified in its declaration. The printf
function has to *assume* that arguments have the types specified
printf("%d\n", foo);
(probably) has undefined behavior if foo is of type size_t.

And isn't that a nightmare?

Post by Keith Thompson
There is no implicit conversion to the expected type. Note that
the format string doesn't have to be a string literal, so it's
not always even possible for the compiler to check the types.
Variadic functions give you a lot of flexibility at the cost of
making some type errors difficult to detect.
(I wrote "probably" because size_t *might* be a typedef for unsigned
int, and there are special rules about arguments of corresponding
signed and unsigned types.)

We just can't have size_t variables swilling around in programs for these
reasons.

--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc

Scott Lurndal

2024-06-13 15:39:25 UTC

Post by Malcolm McLean

Post by Keith Thompson
printf is a variadic function, so the types of the arguments after
the format string are not specified in its declaration. The printf
function has to *assume* that arguments have the types specified
printf("%d\n", foo);
(probably) has undefined behavior if foo is of type size_t.

And isn't that a nightmare?

No, because compilers have been able to diagnose mismatches
for more than two decades.

Post by Malcolm McLean
We just can't have size_t variables swilling around in programs for these
reasons.

POSIX defines a set of strings that can be used by a programmer to
specify the format string for size_t on any given implementation.

Ben Bacarisse

2024-06-13 17:08:03 UTC

Post by Scott Lurndal
POSIX defines a set of strings that can be used by a programmer to
specify the format string for size_t on any given implementation.

And C provides "%zu".

--
Ben.

bart

2024-06-13 18:01:23 UTC

Post by Scott Lurndal

Post by Malcolm McLean

Post by Keith Thompson
printf is a variadic function, so the types of the arguments after
the format string are not specified in its declaration. The printf
function has to *assume* that arguments have the types specified
printf("%d\n", foo);
(probably) has undefined behavior if foo is of type size_t.

And isn't that a nightmare?

No, because compilers have been able to diagnose mismatches
for more than two decades.

What about the previous 3 decades?

What about the compilers that can't do that?

What about even the latest gcc 14.1 that won't diagnose it even with
-Wpedantic -Wextra?

What about when the format string is a variable?

What about the example given below?

It is definitely a language problem. Dealing with some of it with some
compilers with some options isn't a solution, it's just a workaround.

Meanwhile for over 4 decades I've been able to just write 'print foo'
with no format mismatch, because such a silly concept doesn't exist.
THAT's how you deal with it.

Post by Scott Lurndal

Post by Malcolm McLean
We just can't have size_t variables swilling around in programs for these
reasons.

POSIX defines a set of strings that can be used by a programmer to
specify the format string for size_t on any given implementation.

And here it just gets even uglier. You also get situations like this:

uint64_t i=0;
printf("%lld\n", i);

This compiles OK with gcc -Wall, on Windows64. But compile under Linux64
and it complains the format should be %ld. Change it to %ld, and it
complains under Windows.

It can't tell you that you should be using one of those ludicrous macros.

I've also just noticed that 'i' is unsigned but the format calls for
signed. That may or may not be deliberate, but the compiler didn't say
anything.

Malcolm McLean

2024-06-13 18:54:31 UTC

Post by Scott Lurndal

Post by Malcolm McLean

Post by Keith Thompson
printf is a variadic function, so the types of the arguments after
the format string are not specified in its declaration. The printf
function has to *assume* that arguments have the types specified
printf("%d\n", foo);
(probably) has undefined behavior if foo is of type size_t.

And isn't that a nightmare?

No, because compilers have been able to diagnose mismatches
for more than two decades.

What about the previous 3 decades?
What about the compilers that can't do that?
What about even the latest gcc 14.1 that won't diagnose it even with
-Wpedantic -Wextra?
What about when the format string is a variable?
What about the example given below?
It is definitely a language problem. Dealing with some of it with some
compilers with some options isn't a solution, it's just a workaround.
Meanwhile for over 4 decades I've been able to just write 'print foo'
with no format mismatch, because such a silly concept doesn't exist.
THAT's how you deal with it.

Post by Scott Lurndal

Post by Malcolm McLean
We just can't have size_t variables swilling around in programs for these
reasons.

POSIX defines a set of strings that can be used by a programmer to
specify the format string for size_t on any given implementation.

uint64_t i=0;
printf("%lld\n", i);
This compiles OK with gcc -Wall, on Windows64. But compile under Linux64
and it complains the format should be %ld. Change it to %ld, and it
complains under Windows.
It can't tell you that you should be using one of those ludicrous macros.
I've also just noticed that 'i' is unsigned but the format calls for
signed. That may or may not be deliberate, but the compiler didn't say
anything.

Exactly. We can't have this just to print out an integer.

In Baby X I provide a function called bbx_malloc(). It is guaranteed
never to return null. Currently it just calls exit() on allocation failure.
But it also limits allocation to slightly under INT_MAX. Which should be
plenty for a Baby program, and if you want more, you always have big
boy's malloc.
But at a stroke, that gets rid of any need for size_t, and long is very
special purpose (it holds the 32 bit rgba values).
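A minimal sketch of the kind of wrapper described above, assuming only what the post states: it never returns null, it exits on failure, and it caps requests a little under INT_MAX. The names capped_malloc and CAPPED_MALLOC_MAX and the size of the margin are hypothetical; this is not the actual bbx_malloc() source.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical cap: "slightly under INT_MAX"; the real margin is not stated. */
#define CAPPED_MALLOC_MAX (INT_MAX - 1024)

/* Returns nonzero if a request of `size` bytes is within the cap. */
static int capped_size_ok(int size)
{
    return size >= 0 && size <= CAPPED_MALLOC_MAX;
}

/* Takes an int rather than a size_t, and never returns NULL:
   on a bad size or an allocation failure it terminates the program. */
static void *capped_malloc(int size)
{
    void *p;

    if (!capped_size_ok(size)) {
        fprintf(stderr, "capped_malloc: bad size %d\n", size);
        exit(EXIT_FAILURE);
    }
    /* malloc(0) may legally return NULL, so always ask for at least 1 byte. */
    p = malloc(size > 0 ? (size_t)size : 1);
    if (p == NULL) {
        fprintf(stderr, "capped_malloc: out of memory\n");
        exit(EXIT_FAILURE);
    }
    assert(p != NULL); /* the postcondition callers rely on */
    return p;
}
```

Callers can then write `char *s = capped_malloc(n);` with no null check, at the price of giving up any chance to recover from an allocation failure.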

On which subject, I spruced up the colour page today.

https://malcolmmclean.github.io/babyx/BabyColours.html

I think that looks rather nice. I had to write a little C program to
generate the table, of course.

--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc

Chris M. Thomasson

2024-06-13 19:34:45 UTC

Post by Malcolm McLean

Post by Scott Lurndal

Post by Malcolm McLean

Post by Keith Thompson
printf is a variadic function, so the types of the arguments after
the format string are not specified in its declaration. The printf
function has to *assume* that arguments have the types specified
printf("%d\n", foo);
(probably) has undefined behavior if foo is of type size_t.

And isn't that a nightmare?

No, because compilers have been able to diagnose mismatches
for more than two decades.

What about the previous 3 decades?
What about the compilers that can't do that?
What about even the latest gcc 14.1 that won't diagnose it even with
-Wpedantic -Wextra?
What about when the format string is a variable?
What about the example given below?
It is definitely a language problem. Dealing with some of it with some
compilers with some options isn't a solution, it's just a workaround.
Meanwhile for over 4 decades I've been able to just write 'print foo'
with no format mismatch, because such a silly concept doesn't exist.
THAT's how you deal with it.

Post by Scott Lurndal

Post by Malcolm McLean
We just can't have size_t variables swilling around in programs for these
reasons.

POSIX defines a set of strings that can be used by a programmer to
specify the format string for size_t on any given implementation.

uint64_t i=0;
printf("%lld\n", i);
This compiles OK with gcc -Wall, on Windows64. But compile under
Linux64 and it complains the format should be %ld. Change it to %ld,
and it complains under Windows.
It can't tell you that you should be using one of those ludicrous macros.
I've also just noticed that 'i' is unsigned but the format calls for
signed. That may or may not be deliberate, but the compiler didn't say
anything.

Exactly. We can't have this just to print out an integer.
In Baby X I provide a function called bbx_malloc(). It is guaranteed
never to return null. Currently it just calls exit() on allocation failure.

[...]

It calls exit() on an allocation failure? Isn't that rather harsh, hmm...?
Why not let the program decide what to do?

Malcolm McLean

2024-06-13 23:32:55 UTC

Post by Chris M. Thomasson

Post by Malcolm McLean

Post by Scott Lurndal

Post by Malcolm McLean

Post by Keith Thompson
printf is a variadic function, so the types of the arguments after
the format string are not specified in its declaration. The printf
function has to *assume* that arguments have the types specified
printf("%d\n", foo);
(probably) has undefined behavior if foo is of type size_t.

And isn't that a nightmare?

No, because compilers have been able to diagnose mismatches
for more than two decades.

What about the previous 3 decades?
What about the compilers that can't do that?
What about even the latest gcc 14.1 that won't diagnose it even with
-Wpedantic -Wextra?
What about when the format string is a variable?
What about the example given below?
It is definitely a language problem. Dealing with some of it with
some compilers with some options isn't a solution, it's just a
workaround.
Meanwhile for over 4 decades I've been able to just write 'print foo'
with no format mismatch, because such a silly concept doesn't exist.
THAT's how you deal with it.

Post by Scott Lurndal

Post by Malcolm McLean
We just can't have size_t variables swilling around in programs for these
reasons.

POSIX defines a set of strings that can be used by a programmer to
specify the format string for size_t on any given implementation.

uint64_t i=0;
printf("%lld\n", i);
This compiles OK with gcc -Wall, on Windows64. But compile under
Linux64 and it complains the format should be %ld. Change it to %ld,
and it complains under Windows.
It can't tell you that you should be using one of those ludicrous macros.
I've also just noticed that 'i' is unsigned but the format calls for
signed. That may or may not be deliberate, but the compiler didn't
say anything.

Exactly. We can't have this just to print out an integer.
In Baby X I provide a function called bbx_malloc(). It is guaranteed
never to return null. Currently it just calls exit() on allocation failure.

[...]
It calls exit() on an allocation failure? Isn't that rather harsh, hmm...?
Why not let the program decide what to do?

Not really. For instance, the shell gives the user the option to specify
an editor with the -editor option. The program then duplicates that string.

Now in any legitimate scenario, that string will be a few hundred
characters at the most, and be some complex path on a weird and
wonderful filing system. And normally the editor will be within the host
shell's execution environment, and the string will be "vi" or "emacs" or
similar. The program is not going to fail an allocation at that point,
and if it does, then someone has defeated the usual restriction on OS
line length and is playing silly games, so it is best to pack up and go
home.

--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc

Ben Bacarisse

2024-06-13 23:55:12 UTC

Post by Malcolm McLean

Post by bart
uint64_t i=0;
printf("%lld\n", i);
This compiles OK with gcc -Wall, on Windows64. But compile under Linux64
and it complains the format should be %ld. Change it to %ld, and it
complains under Windows.
It can't tell you that you should be using one of those ludicrous macros.
I've also just noticed that 'i' is unsigned but the format calls for
signed. That may or may not be deliberate, but the compiler didn't say
anything.

Exactly. We can't have this just to print out an integer.

This is how C works. There's no point in moaning about it. Use another
language or do what you have to in C.
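For reference, "what you have to" do in standard C (C99 and later) can be sketched like this: PRIu64 from <inttypes.h> for uint64_t, and %zu for size_t. The helper names format_u64 and format_size are made up for illustration, and the sketch assumes a printf family that conforms to C99.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write a uint64_t in decimal into buf.
   PRIu64 expands to whichever conversion ("lu", "llu", ...) matches
   uint64_t on the current platform, so the same source line is
   correct on both Linux64 and Windows64. */
static int format_u64(char *buf, size_t buflen, uint64_t value)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "%" PRIu64, value);
}

/* Hypothetical helper: write a size_t in decimal into buf.
   The z length modifier was added in C99 for exactly this type. */
static int format_size(char *buf, size_t buflen, size_t value)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "%zu", value);
}
```

Both conversions are unsigned, which also removes the signed/unsigned mismatch noted in the quoted example.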

Post by Malcolm McLean
In Baby X I provide a function called bbx_malloc(). It is guaranteed
never to return null. Currently it just calls exit() on allocation failure.
But it also limits allocation to slightly under INT_MAX. Which should be
plenty for a Baby program, and if you want more, you always have big boy's
malloc.

And if you need to change the size?

Post by Malcolm McLean
But at a stroke, that gets rid of any need for size_t,

But sizeof, strlen (and friends like the mbs... and wcs... functions),
strspn (and friend), strftime, fread, fwrite, etc. all return
size_t.

For people taught to ignore size_t, care is also needed when calling
functions that take size_t arguments as the signed to unsigned
conversion can cause surprises when not flagged by the compiler. I
don't know if I am right, but I would bet that many of the "don't bother
with size_t" crowd are also in the "don't bother with all those warning
flags to the compiler" crowd.
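A small sketch of the kind of surprise meant here, assuming a made-up function clamp_length() that takes size_t arguments the way many standard library functions do. All names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function with size_t parameters. */
static size_t clamp_length(size_t requested, size_t limit)
{
    return requested < limit ? requested : limit;
}

/* A caller that keeps its lengths in int. If `requested` has gone
   negative (an error return, say), the implicit int -> size_t
   conversion at the call turns -1 into SIZE_MAX without a diagnostic
   at default warning levels. */
static size_t clamp_from_int(int requested, size_t limit)
{
    return clamp_length(requested, limit);
}

static void demo_sign_conversion(void)
{
    assert((size_t)-1 == SIZE_MAX);
    /* A "negative length" is not clamped to 0; it becomes the full limit. */
    assert(clamp_from_int(-1, 100) == 100);
    assert(clamp_from_int(5, 100) == 5);
}
```

GCC and Clang can report the conversion with -Wsign-conversion, which as far as I know is not enabled by -Wall for C.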

Post by Malcolm McLean
and long is very
special purpose (it holds the 32 bit rgba values).

Isn't that rather wasteful when long is 64 bits?

--
Ben.

Malcolm McLean

2024-06-14 01:48:51 UTC

Post by Ben Bacarisse

Post by Malcolm McLean

Post by bart
uint64_t i=0;
printf("%lld\n", i);
This compiles OK with gcc -Wall, on Windows64. But compile under Linux64
and it complains the format should be %ld. Change it to %ld, and it
complains under Windows.
It can't tell you that you should be using one of those ludicrous macros.
I've also just noticed that 'i' is unsigned but the format calls for
signed. That may or may not be deliberate, but the compiler didn't say
anything.

Exactly. We can't have this just to print out an integer.

This is how C works. There's no point in moaning about it. Use another
language or do what you have to in C.

Post by Malcolm McLean
In Baby X I provide a function called bbx_malloc(). It is guaranteed
never to return null. Currently it just calls exit() on allocation failure.
But it also limits allocation to slightly under INT_MAX. Which should be
plenty for a Baby program, and if you want more, you always have big boy's
malloc.

And if you need to change the size?

Post by Malcolm McLean
But at a stroke, that gets rid of any need for size_t,

But sizeof, strlen (and friends like the mbs... and wcs... functions),
strspn (and friend), strftime, fread, fwrite. etc. etc. all return
size_t.

But these are not Baby X functions. Obviously Baby X isn't responsible
if you call functions without the bbx_ prefix and things go wrong. But
if you don't use standard library functions, there should be no way in
Baby X of creating an object too big to be indexed as an array of
unsigned char by an int. And so we simply do away with all the
complexities of different integer types which make C code bug prone,
hard to read, and are of no real interest. The types are char, unsigned
char, int, and double, depending on whether a value is an ASCII
character, a series of undefined bits, an integer, or a real value. And
what could be more simple and sensible?

Post by Ben Bacarisse
For people taught to ignore size_t, care is also needed when calling
functions that take size_t arguments as the signed to unsigned
conversion can cause surprises when not flagged by the compiler. I
don't know if I am right, but I would bet that many of the "don't bother
with size_t" crowd are also in the "don't bother with all those warning
flags to the compiler" crowd.

Post by Malcolm McLean
and long is very
special purpose (it holds the 32 bit rgba values).

Isn't that rather wasteful when long is 64 bits?

No, because we store images as unsigned char buffers. But it's
convenient to pass around colour values in a single variable.
However there is the worry that accessing rgba channels as bytes, rather
than insisting that the buffer be aligned, accessing it as a 32-bit
value, and pulling channels out with shifts and masking, might be a bit
slower. C does provide a way to solve this, but only by polluting the
codebase, and the current way is absolutely robust.

--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc

Ben Bacarisse

2024-06-14 11:44:13 UTC

Post by Malcolm McLean

Post by Ben Bacarisse

Post by Malcolm McLean

Post by bart
uint64_t i=0;
printf("%lld\n", i);
This compiles OK with gcc -Wall, on Windows64. But compile under Linux64
and it complains the format should be %ld. Change it to %ld, and it
complains under Windows.
It can't tell you that you should be using one of those ludicrous macros.
I've also just noticed that 'i' is unsigned but the format calls for
signed. That may or may not be deliberate, but the compiler didn't say
anything.

Exactly. We can't have this just to print out an integer.

This is how C works. There's no point in moaning about it. Use another
language or do what you have to in C.

Post by Malcolm McLean
In Baby X I provide a function called bbx_malloc(). It is guaranteed
never to return null. Currently it just calls exit() on allocation failure.
But it also limits allocation to slightly under INT_MAX. Which should be
plenty for a Baby program, and if you want more, you always have big boy's
malloc.

And if you need to change the size?

Post by Malcolm McLean
But at a stroke, that gets rid of any need for size_t,

But sizeof, strlen (and friends like the mbs... and wcs... functions),
strspn (and friend), strftime, fread, fwrite. etc. etc. all return
size_t.

But these are not Baby X functions.

Neither is malloc but you wanted to replace that to get rid of the need
for size_t.

I confess that I am all at sea about what you are doing. In essence, I
don't understand the rules of the game so I should probably just stop
commenting.

Post by Malcolm McLean

Post by Ben Bacarisse

Post by Malcolm McLean
and long is very
special purpose (it holds the 32 bit rgba values).

Isn't that rather wasteful when long is 64 bits?

No, because we store images as unsigned char buffers. But it's convenient
to pass around colour values in a single variable.

Right. So you don't always use long for "holding rgba values". Another
rule I didn't know.

Post by Malcolm McLean
However there is the worry that accessing rgba channels as bytes rather
than insisting that the buffer be aligned, and accessing as a 32-bit
value,

Which is why I thought you might be including images in the notion of
"holding rgba values".

--
Ben.

Malcolm McLean

2024-06-14 14:30:23 UTC

Post by Ben Bacarisse

Post by Malcolm McLean

Post by Ben Bacarisse

Post by Malcolm McLean

Post by bart
uint64_t i=0;
printf("%lld\n", i);
This compiles OK with gcc -Wall, on Windows64. But compile under Linux64
and it complains the format should be %ld. Change it to %ld, and it
complains under Windows.
It can't tell you that you should be using one of those ludicrous macros.
I've also just noticed that 'i' is unsigned but the format calls for
signed. That may or may not be deliberate, but the compiler didn't say
anything.

Exactly. We can't have this just to print out an integer.

This is how C works. There's no point in moaning about it. Use another
language or do what you have to in C.

Post by Malcolm McLean
In Baby X I provide a function called bbx_malloc(). It is guaranteed
never to return null. Currently it just calls exit() on allocation failure.
But it also limits allocation to slightly under INT_MAX. Which should be
plenty for a Baby program, and if you want more, you always have big boy's
malloc.

And if you need to change the size?

Post by Malcolm McLean
But at a stroke, that gets rid of any need for size_t,

But sizeof, strlen (and friends like the mbs... and wcs... functions),
strspn (and friend), strftime, fread, fwrite. etc. etc. all return
size_t.

But these are not Baby X functions.

Neither is malloc but you wanted to replace that to get rid of the need
for size_t.
I confess that I am all at sea about what you are doing. In essence, I
don't understand the rules of the game so I should probably just stop
commenting.

Yes, I really need to get that website together so that people cotton on
to what Baby X is, what it can and cannot do, and what is the point.

Post by Ben Bacarisse

Post by Malcolm McLean

Post by Ben Bacarisse

Post by Malcolm McLean
and long is very
special purpose (it holds the 32 bit rgba values).

Isn't that rather wasteful when long is 64 bits?

No, because we store images as unsigned char buffers. But it's convenient
to pass around colour values in a single variable.

Right. So you don't always use long for "holding rgba values". Another
rule I didn't know.

Image buffers are arrays of unsigned char holding 32 bit rgba values in
the order red, green, blue, alpha, always and regardless of the format
used on the host platform.

However if you need to pass a colour value to a function, you normally
pass a BBX_RGBA value, which is typedefed to unsigned long, and is
opaque, and you query the channels using the macros in bbx_color.h

#ifndef bbx_color_h
#define bbx_color_h

typedef unsigned long BBX_RGBA;

#define bbx_rgba(r,g,b,a) ((BBX_RGBA) ( ((r) << 24) | ((g) << 16) | ((b) << 8) | (a) ))
#define bbx_rgb(r, g, b) bbx_rgba(r,g,b, 255)
#define bbx_red(col) ((col >> 24) & 0xFF)
#define bbx_green(col) ((col >> 16) & 0xFF)
#define bbx_blue(col) ((col >> 8) & 0xFF)
#define bbx_alpha(col) (col & 0xFF)

#define BBX_RgbaToX(col) ( (col >> 8) & 0xFFFFFF )

#endif

The last macro is to make it easier to interface with Xlib, and has the
prefix BBX_ (upper case) indicating that it is for internal use by the
bbx library / system and not meant for user programs.

Now of course C will allow you to create images using arrays of
BBX_RGBAs, but Baby X programmers must not do that, or their programs
may break if endianness switches or long has more than 32 bits, and the
point of Baby X is that it never breaks: it always runs everywhere and
anywhere there is a C compiler and a windowing library we can retarget
bbxlib at.
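As a rough sketch of why the byte-buffer layout is independent of endianness and of the width of long: a pixel can be moved between the buffer and a packed value using only byte indexing and shifts. The names RGBA_SKETCH, pixel_get and pixel_set are hypothetical stand-ins and are not taken from the Baby X sources.

```c
#include <assert.h>

typedef unsigned long RGBA_SKETCH; /* stand-in for BBX_RGBA */

/* Read pixel `index` from a buffer laid out as r, g, b, a bytes.
   Each byte is widened to unsigned long before shifting, so the
   result is the same on any host byte order. */
static RGBA_SKETCH pixel_get(const unsigned char *buf, int index)
{
    const unsigned char *p = buf + 4 * index;

    assert(index >= 0);
    return ((RGBA_SKETCH)p[0] << 24) | ((RGBA_SKETCH)p[1] << 16) |
           ((RGBA_SKETCH)p[2] << 8)  |  (RGBA_SKETCH)p[3];
}

/* Write a packed colour back as four bytes in r, g, b, a order. */
static void pixel_set(unsigned char *buf, int index, RGBA_SKETCH col)
{
    unsigned char *p = buf + 4 * index;

    assert(index >= 0);
    p[0] = (unsigned char)((col >> 24) & 0xFF);
    p[1] = (unsigned char)((col >> 16) & 0xFF);
    p[2] = (unsigned char)((col >> 8) & 0xFF);
    p[3] = (unsigned char)(col & 0xFF);
}
```

An array of unsigned long indexed directly would instead occupy 8 bytes per pixel where long is 64 bits, and its byte order in memory would follow the host.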

Post by Ben Bacarisse

Post by Malcolm McLean
However there is the worry that accessing rgba channels as bytes rather
than insisting that the buffer be aligned, and accessing as a 32-bit
value,

Which is why I thought you might be including images in the notion of
"holding rgba values".

Images do hold rgba values, of course. But no, I'm not defining a new
image format for Baby X. A Filesystem XML format, yes, a new image
format, no.

--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc

Richard Harnden

2024-06-14 15:32:58 UTC

Post by Malcolm McLean
Yes, I really need to get that website together so that people cotton on
to what Baby X is, what it can and cannot do, and what is the point.

Is it a shell? A windowing toolkit? A filesystem? A resource compiler?

I have no idea.

Malcolm McLean

2024-06-14 18:06:03 UTC

Post by Richard Harnden

Post by Malcolm McLean
Yes, I really need to get that website together so that people cotton
on to what Baby X is, what it can and cannot do, and what is the point.

Is it a shell? A windowing toolkit? A filesystem? A resource compiler?
I have no idea.

It consists of three components:

Baby X - the GUI toolkit which allows small programs to run on either
Linux or Windows with a graphical interface (a small number of files are
switched depending on the target).

Baby X RC - the resource compiler - a program written in C
which converts resources - images, fonts, audio files etc. - into
compilable C code so you can easily get them into Baby X programs.

Baby X FS - the filing system - code that allows you to create
a virtual drive on your computer and access files from it using special
fopen(), fclose() functions, but standard library functions like
fprintf() or fgetc() for the other operations. It consists of a library
designed to be incorporated into the Baby X program itself, and a suite
of supporting programs to generate the FileSystem XML files it relies on
and to manipulate them. The jewel in the crown is the shell,
babyxfs_shell, which invokes a UNIX-like shell which uses a FileSystem
XML file as backing store.

They are designed to be used together as a single system for making Baby
X programs. But each component is independent of the others and can be
used on its own.

--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc

bart

2024-06-14 18:31:29 UTC

Baby X FS - the filing system - code that allows you to create a
virtual drive on your computer and access files from it using special
fopen(), fclose() functions, but standard library functions like
fprintf() or fgetc() for the other operations

I think what people don't get is why they should use this filing system, when
they already have a perfectly good one within their OS on which fopen()
etc already work.

When you do fclose() after writing a file, will it get written to some
persistent media?

Because either it stays in memory (dangerous if your machine crashes, or
someone just turns it off), or it gets written to the same SSD/SD/HDD
media that the real OS uses. In which case, what is the point?

I gather this is not any kind of OS with its own drivers for the
peripherals on the machine, that takes over the real OS, or runs as some
kind of virtual machine.

Malcolm McLean

2024-06-14 19:13:33 UTC

Baby X FS - the filing system - code that allows you to create a
virtual drive on your computer and access files from it using special
fopen(), fclose() functions, but standard library functions like
fprintf() or fgetc() for the other operations

I think what people don't get is why they should use this filing system, when
they already have a perfectly good one within their OS on which fopen()
etc already work.
When you do fclose() after writing a file, will it get written to some
persistent media?
Because either it stays in memory (dangerous if your machine crashes, or
someone just turns it off), or it gets written to the same SSD/SD/HDD
media that the real OS uses. In which case, what is the point?
I gather this is not any kind of OS with its own drivers for the
peripherals on the machine, that takes over the real OS, or runs as some
kind of virtual machine.

The main use is intended to be as a read-only internal drive containing
resources. But the interface also supports opening files in write mode,
and they are held in memory. In the case of an application with
FileSystem XML embedded as a string, the files will revert to the
original on program termination - there's no sensible way of altering an
embedded internal string at run time. In the case of programs which
supply the FileSystem XML file in a configuration file, the filesystem
has a function called bbx_fs_dump() which writes its entire state to disk
as a FileSystem XML file.

--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc

Ben Bacarisse

2024-06-14 21:29:00 UTC

Post by Ben Bacarisse

Post by Malcolm McLean

Post by Ben Bacarisse

Post by Malcolm McLean

Post by bart
uint64_t i=0;
printf("%lld\n", i);
This compiles OK with gcc -Wall, on Windows64. But compile under Linux64
and it complains the format should be %ld. Change it to %ld, and it
complains under Windows.
It can't tell you that you should be using one of those ludicrous macros.
I've also just noticed that 'i' is unsigned but the format calls for
signed. That may or may not be deliberate, but the compiler didn't say
anything.

Exactly. We can't have this just to print out an integer.

This is how C works. There's no point in moaning about it. Use another
language or do what you have to in C.

Post by Malcolm McLean
In Baby X I provide a function called bbx_malloc(). It is guaranteed
never to return null. Currently it just calls exit() on allocation failure.
But it also limits allocation to slightly under INT_MAX. Which should be
plenty for a Baby program, and if you want more, you always have big boy's
malloc.

And if you need to change the size?

Post by Malcolm McLean
But at a stroke, that gets rid of any need for size_t,

But sizeof, strlen (and friends like the mbs... and wcs... functions),
strspn (and friend), strftime, fread, fwrite. etc. etc. all return
size_t.

But these are not Baby X functions.

Neither is malloc but you wanted to replace that to get rid of the need
for size_t.
I confess that I am all at sea about what you are doing. In essence, I
don't understand the rules of the game so I should probably just stop
commenting.

Yes, I really need to get that website together so that people cotton on to
what Baby X is, what it can and cannot do, and what is the point.

I know what Baby X is. I don't know why "these are not Baby X
functions" applies to the ones I listed and not to malloc.

...

However if you need to pass a colour value to a function, you normally pass a
BBX_RGBA value, which is typedefed to unsigned long, and is opaque, and you
query the channels using the macros in bbx_color.h
#ifndef bbx_color_h
#define bbx_color_h
typedef unsigned long BBX_RGBA;

Curious. The macros below seem to assume that int is 32 bits, so why
use long?

#define bbx_rgba(r,g,b,a) ((BBX_RGBA) ( ((r) << 24) | ((g) << 16) | ((b) << 8) | (a) ))

This is likely to involve undefined behaviour when r >= 128. (I presume
you are ruling out int narrower than 32 bits or there are other problems
as well.)
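To spell the problem out: r is normally an int, so with a 32-bit int the expression (r) << 24 for r >= 128 shifts a 1 into the sign bit, which is undefined for a signed type; the cast to BBX_RGBA is only applied afterwards. One possible repair, sketched here under the assumption that channel values are in 0..255, converts each channel to the unsigned result type before shifting. The name bbx_rgba_fixed is made up for this sketch and is not the library's code.

```c
#include <assert.h>

typedef unsigned long BBX_RGBA;

/* Each channel is converted to unsigned long first. unsigned long has
   at least 32 bits, so a shift by 24 of a value in 0..255 is always
   well defined, even where int is only 16 bits wide. */
#define bbx_rgba_fixed(r, g, b, a)                        \
    ( ((BBX_RGBA)(r) << 24) | ((BBX_RGBA)(g) << 16) |     \
      ((BBX_RGBA)(b) << 8)  |  (BBX_RGBA)(a) )

static void demo_rgba_fixed(void)
{
    int r = 255, g = 0, b = 0, a = 0;

    /* The top bit is set with no signed overflow along the way. */
    assert(bbx_rgba_fixed(r, g, b, a) == 0xFF000000UL);
    assert(bbx_rgba_fixed(128, 1, 2, 3) == 0x80010203UL);
}
```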

#define bbx_rgb(r, g, b) bbx_rgba(r,g,b, 255)
#define bbx_red(col) ((col >> 24) & 0xFF)
#define bbx_green(col) ((col >> 16) & 0xFF)
#define bbx_blue(col) ((col >> 8) & 0xFF)
#define bbx_alpha(col) (col & 0xFF)

It might not be an issue (as col is opaque and unlikely to be an
expression) but I'd still write (col) here to stop the reader having to
check or reason that out.
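A contrived illustration of what the reader would otherwise have to rule out. The macro names are hypothetical; the first copies the posted form of bbx_alpha and the second adds the parentheses.

```c
#include <assert.h>

#define alpha_unparenthesized(col) (col & 0xFF)   /* form as posted */
#define alpha_parenthesized(col)   ((col) & 0xFF) /* suggested form */

static void demo_alpha(void)
{
    unsigned long first = 0x11223344UL;
    unsigned long second = 0x55667788UL;
    int use_first = 1;

    /* Expands to (use_first ? first : second & 0xFF): the mask binds
       only to `second`, so `first` comes back unmasked. */
    assert(alpha_unparenthesized(use_first ? first : second) == 0x11223344UL);

    /* With the argument parenthesized the mask applies to the whole
       expression, as intended. */
    assert(alpha_parenthesized(use_first ? first : second) == 0x44UL);
}
```

With a plain variable as the argument both forms give the same result, which is why the unparenthesized version appears to work.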

#define BBX_RgbaToX(col) ( (col >> 8) & 0xFFFFFF )
#endif
The last macro is to make it easier to interface with Xlib, and has the
prefix BBX_ (upper case) indicating that it is for internal use by the bbx
library / system and not meant for user programs.

As a reader of the code, I made exactly the reverse assumption. When I
see lower-case macros I assume they are for internal use.

--
Ben.

Malcolm McLean

2024-06-14 22:35:20 UTC

Post by Ben Bacarisse

Post by Ben Bacarisse

Post by Malcolm McLean

Post by Ben Bacarisse

Post by Malcolm McLean

Post by bart
uint64_t i=0;
printf("%lld\n", i);
This compiles OK with gcc -Wall, on Windows64. But compile under Linux64
and it complains the format should be %ld. Change it to %ld, and it
complains under Windows.
It can't tell you that you should be using one of those ludicrous macros.
I've also just noticed that 'i' is unsigned but the format calls for
signed. That may or may not be deliberate, but the compiler didn't say
anything.

Exactly. We can't have this just to print out an integer.

This is how C works. There's no point in moaning about it. Use another
language or do what you have to in C.

Post by Malcolm McLean
In Baby X I provide a function called bbx_malloc(). It is guaranteed
never to return null. Currently it just calls exit() on allocation failure.
But it also limits allocation to slightly under INT_MAX. Which should be
plenty for a Baby program, and if you want more, you always have big boy's
malloc.

And if you need to change the size?

Post by Malcolm McLean
But at a stroke, that gets rid of any need for size_t,

But sizeof, strlen (and friends like the mbs... and wcs... functions),
strspn (and friend), strftime, fread, fwrite. etc. etc. all return
size_t.

But these are not Baby X functions.

Neither is malloc but you wanted to replace that to get rid of the need
for size_t.
I confess that I am all at sea about what you are doing. In essence, I
don't understand the rules of the game so I should probably just stop
commenting.

Yes, I really need to get that website together so that people cotton on to
what Baby X is, what it can and cannot do, and what is the point.

I know what Baby X is. I don't know why "these are not Baby X
functions" applies to the ones I listed and not to malloc.
...

However if you need to pass a colour value to a function, you normally pass a
BBX_RGBA value, which is typedefed to unsigned long, and is opaque, and you
query the channels using the macros in bbx_color.h
#ifndef bbx_color_h
#define bbx_color_h
typedef unsigned long BBX_RGBA;

Curious. The macros below seem to assume that int is 32 bits, so why
use long?

#define bbx_rgba(r,g,b,a) ((BBX_RGBA) ( ((r) << 24) | ((g) << 16) | ((b) << 8) | (a) ))

This is likely to involve undefined behaviour when r >= 128. (I presume
you are ruling out int narrower than 32 bits or there are other problems
as well.)

No, it's been miswritten. Which is what I mean about C's integer types
being a source of bugs. That code does not look buggy, but it is.

Post by Ben Bacarisse

#define bbx_rgb(r, g, b) bbx_rgba(r,g,b, 255)
#define bbx_red(col) ((col >> 24) & 0xFF)
#define bbx_green(col) ((col >> 16) & 0xFF)
#define bbx_blue(col) ((col >> 8) & 0xFF)
#define bbx_alpha(col) (col & 0xFF)

It might not be an issue (as col is opaque and unlikely to be an
expression) but I'd still write (col) here to stop the reader having to
check or reason that out.

#define BBX_RgbaToX(col) ( (col >> 8) & 0xFFFFFF )
#endif
The last macro is to make it easier to interface with Xlib, and has the
prefix BBX_ (upper case) indicating that it is for internal use by the bbx
library / system and not meant for user programs.

As a reader of the code, I made exactly the reverse assumption. When I
see lower-case macros I assume they are for internal use.

They're function-like macros. Iterating over an rgba buffer is very
processor-intensive, and so we do have to compromise safety for speed
here. All function-like symbols bbx_ are provided by Baby X for users,
all symbols BBX_ have that prefix to reduce the chance of collisions
with other code.

--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc

Ben Bacarisse

2024-06-14 23:14:22 UTC

Post by Malcolm McLean

Post by Ben Bacarisse

Post by Ben Bacarisse

Post by Malcolm McLean

Post by Ben Bacarisse

Post by Malcolm McLean

Post by bart
uint64_t i=0;
printf("%lld\n", i);
This compiles OK with gcc -Wall, on Windows64. But compile under Linux64
and it complains the format should be %ld. Change it to %ld, and it
complains under Windows.
It can't tell you that you should be using one of those
ludicrous macros.
I've also just noticed that 'i' is unsigned but the format calls for
signed. That may or may not be deliberate, but the compiler didn't say
anything.

Exactly. We can't have this just to print out an integer.

This is how C works. There's no point in moaning about it. Use another
language or do what you have to in C.

Post by Malcolm McLean
In Baby X I provide a function called bbx_malloc(). It is guaranteed
never to return null. Currently it just calls exit() on
allocation failure.
But it also limits allocation to slightly under INT_MAX. Which should be
plenty for a Baby program, and if you want more, you always have big boy's
malloc.

And if you need to change the size?

Post by Malcolm McLean
But at a stroke, that gets rid of any need for size_t,

But sizeof, strlen (and friends like the mbs... and wcs... functions),
strspn (and friend), strftime, fread, fwrite. etc. etc. all return
size_t.

But these are not Baby X functions.

Neither is malloc but you wanted to replace that to get rid of the need
for size_t.
I confess that I am all at sea about what you are doing. In essence, I
don't understand the rules of the game so I should probably just stop
commenting.

Yes, I really need to get that website together so that people cotton on to
what Baby X is, what it can and cannot do, and what is the point.

I know what Baby X is. I don't know why "these are not Baby X
functions" applies to the ones I listed and not to malloc.
...

However if you need to pass a colour value to a function, you normally pass a
BBX_RGBA value, which is typedefed to unsigned long, and is opaque, and you
query the channels using the macros in bbx_color.h
#ifndef bbx_color_h
#define bbx_color_h
typedef unsigned long BBX_RGBA;

Curious. The macros below seem to assume that int is 32 bits, so why
use long?

Why use long?

Post by Malcolm McLean

Post by Ben Bacarisse

#define bbx_rgba(r,g,b,a) ((BBX_RGBA) ( ((r) << 24) | ((g) << 16) | ((b) << 8) | (a) ))

This is likely to involve undefined behaviour when r >= 128. (I presume
you are ruling out int narrower than 32 bits or there are other problems
as well.)

No, it's been miswritten. Which is what I mean about C's integer types
being a source of bugs. That code does not look buggy, but it is.

I have no idea what this means. You start with "no" but I can't work
out what you think is wrong about what I said. And what does "has been
miswritten" mean? Both the tense and the use of "miswritten" are
confusing to me. And, to me, the code does look "buggy".

Post by Malcolm McLean

Post by Ben Bacarisse

#define bbx_rgb(r, g, b) bbx_rgba(r,g,b, 255)
#define bbx_red(col) ((col >> 24) & 0xFF)
#define bbx_green(col) ((col >> 16) & 0xFF)
#define bbx_blue(col) ((col >> 8) & 0xFF)
#define bbx_alpha(col) (col & 0xFF)

It might not be an issue (as col is opaque and unlikely to be an
expression) but I'd still write (col) here to stop the reader having to
check or reason that out.

#define BBX_RgbaToX(col) ( (col >> 8) & 0xFFFFFF )
#endif
The last macro is to make it easier to interface with Xlib, and has the
prefix BBX_ (upper case) indicating that it is for internal use by the bbx
library / system and not meant for user programs.

As a reader of the code, I made exactly the reverse assumption. When I
see lower-case macros I assume they are for internal use.

They're function-like macros. Iterating over an rgba buffer is very
processor-intensive, and so we do have to compromise safety for speed
here.

I am not suggesting otherwise.

Post by Malcolm McLean
All function-like symbols bbx_ are provided by Baby X for users, all
symbols BBX_ have that prefix to reduce the chance of collisions with other
code.

Clearly. I'm not sure why you have reiterated this. I did not intend
to change your mind, just to point out that it's the reverse of the
common convention.

--
Ben.

David Brown

2024-06-15 18:57:49 UTC

Post by Malcolm McLean

Post by Ben Bacarisse

Post by Malcolm McLean

Post by Malcolm McLean

uint64_t i=0;
printf("%lld\n", i);
This compiles OK with gcc -Wall, on Windows64. But compile under Linux64
and it complains the format should be %ld. Change it to %ld, and it
complains under Windows.
It can't tell you that you should be using one of those
ludicrous macros.
I've also just noticed that 'i' is unsigned but the format calls for
signed. That may or may not be deliberate, but the compiler didn't say
anything.

Exactly. We can't have this just to print out an integer.

This is how C works. There's no point in moaning about it. Use another
language or do what you have to in C.

Post by Malcolm McLean
In Baby X I provide a function called bbx_malloc(). It is guaranteed
never to return null. Currently it just calls exit() on
allocation failure.
But it also limits allocation to slightly under INT_MAX. Which should be
plenty for a Baby program, and if you want more, you always have big boy's
malloc.

And if you need to change the size?

Post by Malcolm McLean
But at a stroke, that gets rid of any need for size_t,

But sizeof, strlen (and friends like the mbs... and wcs... functions),
strspn (and friend), strftime, fread, fwrite. etc. etc. all return
size_t.

But these are not Baby X functions.

Neither is malloc but you wanted to replace that to get rid of the need
for size_t.
I confess that I am all at sea about what you are doing. In essence, I
don't understand the rules of the game so I should probably just stop
commenting.

Yes, I really need to get that website together so that people cotton on to
what Baby X is, what it can and cannot do, and what is the point.

I know what Baby X is. I don't know why "these are not Baby X
functions" applies to the ones I listed and not to malloc.
...

However if you need to pass a colour value to a function, you normally pass a
BBX_RGBA value, which is typedefed to unsigned long, and is opaque, and you
query the channels using the macros in bbx_color.h
#ifndef bbx_color_h
#define bbx_color_h
typedef unsigned long BBX_RGBA;

Curious. The macros below seem to assume that int is 32 bits, so why
use long?

#define bbx_rgba(r,g,b,a) ((BBX_RGBA) ( ((r) << 24) | ((g) << 16) | ((b) << 8) | (a) ))

This is likely to involve undefined behaviour when r >= 128. (I presume
you are ruling out int narrower than 32 bits or there are other problems
as well.)

No, it's been miswritten. Which is what I mean about C's integer types
being a source of bugs. That code does not look buggy, but it is.

#define bbx_rgb(r, g, b) bbx_rgba(r,g,b, 255)
#define bbx_red(col) ((col >> 24) & 0xFF)
#define bbx_green(col) ((col >> 16) & 0xFF)
#define bbx_blue(col) ((col >> 8) & 0xFF)
#define bbx_alpha(col) (col & 0xFF)

It might not be an issue (as col is opaque and unlikely to be an
expression) but I'd still write (col) here to stop the reader having to
check or reason that out.

#define BBX_RgbaToX(col) ( (col >> 8) & 0xFFFFFF )
#endif
The last macro is to make it easier to interface with Xlib, and has the
prefix BBX_ (upper case) indicating that it is for internal use by the bbx
library / system and not meant for user programs.

As a reader of the code, I made exactly the reverse assumption. When I
see lower-case macros I assume they are for internal use.

They're function-like macros. Iterating over an rgba buffer is very
processor-intensive, and so we do have to compromise safety for speed
here. All function-like symbols bbx_ are provided by Baby X for users,
all symbols BBX_ have that prefix to reduce the chance of collisions
with other code.

In this little exchange, there have been several points where your code
is unclear, inefficient, non-portable or downright buggy, purely due to
your insistence on using an outdated version of C.

If you want BBX_RGBA to be a typedef for an unsigned 32-bit integer, write:

typedef uint32_t BBX_RGBA;

If you want bbx_rgba() to be a function that is typesafe, correct, and
efficient (for any decent compiler), write :

static inline BBX_RGBA bbx_rgba(uint32_t r, uint32_t g,
        uint32_t b, uint32_t a)
{
    return (r << 24) | (g << 16) | (b << 8) | a;
}

If you want your colour types to be "opaque", as you claimed, make it a
struct with inline accessor functions.
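A minimal sketch of what that could look like, assuming a single 32-bit word with red in the top byte and alpha in the low byte. The member name "packed" and this layout are illustrative assumptions, not Baby X's actual representation.

```c
#include <stdint.h>

/* Wrapping the word in a struct stops callers treating it as an integer. */
typedef struct { uint32_t packed; } BBX_RGBA;

/* Build a colour from four channel values, each expected to be 0..255. */
static inline BBX_RGBA bbx_rgba(uint32_t r, uint32_t g, uint32_t b, uint32_t a)
{
    BBX_RGBA col = { (r << 24) | (g << 16) | (b << 8) | a };
    return col;
}

/* Accessors: the only code that knows the layout. */
static inline uint32_t bbx_red(BBX_RGBA col)   { return (col.packed >> 24) & 0xFF; }
static inline uint32_t bbx_green(BBX_RGBA col) { return (col.packed >> 16) & 0xFF; }
static inline uint32_t bbx_blue(BBX_RGBA col)  { return (col.packed >> 8) & 0xFF; }
static inline uint32_t bbx_alpha(BBX_RGBA col) { return col.packed & 0xFF; }
```

With this shape, an expression such as col + 1 or col >> 8 in user code fails to compile, so the layout can change later without breaking callers.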

Use static inline functions instead of function-like macros and you
don't need the extra parentheses round things (and you don't need to
justify to readers why they are not there). You can use small letter
names without running contrary to common conventions.

Your insistence on hobbling your choice of language shows through in the
poor quality of the code - or at least, the missed opportunities to make
the code better and safer for both you and your users.

Richard Harnden

2024-06-15 19:27:25 UTC

    typedef uint32_t BBX_RGBA;
If you want bbx_rgba() to be a function that is typesafe, correct, and
    static inline BBX_RGBA bbx_rgba(uint32_t r, uint32_t g,
            uint32_t b, uint32_t a)
    {
        return (r << 24) | (g << 16) | (b << 8) | a;
    }

Shouldn't that be ... ?

static inline BBX_RGBA bbx_rgba(uint8_t r, uint8_t g,
        uint8_t b, uint8_t a)

Ben Bacarisse

2024-06-15 22:13:01 UTC

Post by Richard Harnden

    typedef uint32_t BBX_RGBA;
If you want bbx_rgba() to be a function that is typesafe, correct, and
    static inline BBX_RGBA bbx_rgba(uint32_t r, uint32_t g,
            uint32_t b, uint32_t a)
    {
        return (r << 24) | (g << 16) | (b << 8) | a;
    }

Shouldn't that be ... ?
static inline BBX_RGBA bbx_rgba(uint8_t r, uint8_t g,
uint8_t b, uint8_t a)

Maybe, but the function then needs more care as uint8_t will promote to
int and then r << 24 can be undefined. One needs

((BBX_RGBA)r << 24) | (g << 16) | (b << 8) | a

(assuming that int is never going to be 16 bits or the same issue comes
up with the g << 16 shift). Given this assumption, I'd just check that
unsigned int is at least 32 bits and use that for BBX_RGBA.
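Put together, a sketch of the uint8_t version with the cast and a compile-time check on the width of int might read as follows. The name bbx_rgba8 is hypothetical, and the static_assert is just one way of stating the 32-bit assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

typedef uint32_t BBX_RGBA;

/* g << 16 and b << 8 are evaluated in int after integer promotion,
   so refuse to compile where int is narrower than 32 bits. */
static_assert(INT_MAX >= 0x7FFFFFFF, "this sketch assumes int is at least 32 bits");

static inline BBX_RGBA bbx_rgba8(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    /* Without the cast, r promotes to int and r << 24 overflows a 32-bit
       int whenever r >= 128, which is undefined behaviour. */
    return ((BBX_RGBA)r << 24) | (g << 16) | (b << 8) | a;
}
```

Only the red channel needs the cast: 255 << 16 and 255 << 8 both fit in a 32-bit int.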

--
Ben.

David Brown

2024-06-16 10:53:38 UTC

Post by Richard Harnden

     typedef uint32_t BBX_RGBA;
If you want bbx_rgba() to be a function that is typesafe, correct, and
     static inline BBX_RGBA bbx_rgba(uint32_t r, uint32_t g,
             uint32_t b, uint32_t a)
     {
         return (r << 24) | (g << 16) | (b << 8) | a;
     }

Shouldn't that be ... ?
static inline BBX_RGBA bbx_rgba(uint8_t r, uint8_t g,
uint8_t b, uint8_t a)

As Ben says, that will not work on its own - "r" would get promoted to
signed int before the shift, and we are back to undefined behaviour.

I think there is plenty of scope for improvement in a variety of ways,
depending on what the author is looking for. For example, uint8_t might
not exist on all platforms (indeed there are current processors that
don't support it, not just dinosaur devices). But any system that
supports a general-purpose gui, such as Windows or *nix systems, will
have these types and will also have a 32-bit int. So the code author
can balance portability with convenient assumptions.

There are also balances to be found between run-time checking and
efficiency, and how to handle bad data. If the function can assume that
no one calls it with values outside 0..255, or that it doesn't matter
what happens if such values are used, then you don't need any checks.
As it stands, with uint32_t parameters, out-of-range values will lead to
fully defined but wrong results. Switching to "uint8_t" types would
give a different fully defined but wrong result. Maybe the function
should use saturation, or run-time checks and error messages - that will
depend on where it is in the API, what the code author wants, and what
users expect.
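As a sketch of two of those choices, the following compares the unchecked function with a saturating variant. The names clamp_channel, bbx_rgba_unchecked and bbx_rgba_sat are hypothetical.

```c
#include <stdint.h>

typedef uint32_t BBX_RGBA;

/* No checks: an out-of-range channel spills into its neighbours. */
static inline BBX_RGBA bbx_rgba_unchecked(uint32_t r, uint32_t g,
                                          uint32_t b, uint32_t a)
{
    return (r << 24) | (g << 16) | (b << 8) | a;
}

/* Saturation: anything above 255 is treated as 255. */
static inline uint32_t clamp_channel(uint32_t v)
{
    return v > 255 ? 255 : v;
}

static inline BBX_RGBA bbx_rgba_sat(uint32_t r, uint32_t g,
                                    uint32_t b, uint32_t a)
{
    return (clamp_channel(r) << 24) | (clamp_channel(g) << 16)
         | (clamp_channel(b) << 8)  | clamp_channel(a);
}
```

For example, a blue value of 256 yields 0x00010000 from the unchecked version (it lands in the green channel) and 0x0000FF00 from the saturating one. Both results are fully defined; which one is "right" is the API decision described above.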

Malcolm McLean

2024-06-16 13:44:53 UTC

Post by David Brown

Post by Richard Harnden

     typedef uint32_t BBX_RGBA;
If you want bbx_rgba() to be a function that is typesafe, correct,
     static inline BBX_RGBA bbx_rgba(uint32_t r, uint32_t g,
             uint32_t b, uint32_t a)
     {
         return (r << 24) | (g << 16) | (b << 8) | a;
     }

Shouldn't that be ... ?
static inline BBX_RGBA bbx_rgba(uint8_t r, uint8_t g,
uint8_t b, uint8_t a)

As Ben says, that will not work on its own - "r" would get promoted to
signed int before the shift, and we are back to undefined behaviour.
I think there is plenty of scope for improvement in a variety of ways,
depending on what the author is looking for. For example, uint8_t might
not exist on all platforms (indeed there are current processors that
don't support it, not just dinosaur devices). But any system that
supports a general-purpose gui, such as Windows or *nix systems, will
have these types and will also have a 32-bit int. So the code author
can balance portability with convenient assumptions.
There are also balances to be found between run-time checking and
efficiency, and how to handle bad data. If the function can assume that
no one calls it with values outside 0..255, or that it doesn't matter
what happens if such values are used, then you don't need any checks. As
it stands, with uint32_t parameters, out-of-range values will lead to
fully defined but wrong results. Switching to "uint8_t" types would
give a different fully defined but wrong result. Maybe the function
should use saturation, or run-time checks and error messages - that will
depend on where it is in the API, what the code author wants, and what
users expect.

It's the general function which converts rgba quads or rgb triplets
(with a little wrapper) to opaque colour values to pass about to
graphics functions. And currently it's not used to access the raster for
BBX_Canvas elements.
But that's where Baby X does a lot of work which is a target for
optimisation. And the opaque format might have to change to match the
internal format used by the platform. However optimising the graphics
isn't the priority for now, which is getting the API and the
documentation right.
And what should happen if user passes wrong values? Since the function
is provided as a macro rather than a subroutine, it's kind of accepted
that it is too low-level for error checking. So user will usually draw
the wrong colour on his screen, which might be a hard bug to diagnose.

But yes, this is absolutely the sort of thing you need to get right.

--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc

Chris M. Thomasson

2024-06-14 18:49:02 UTC

On 6/14/2024 4:44 AM, Ben Bacarisse wrote:
[...]

Post by Ben Bacarisse
Which is why I thought you might be including images in the notion of
"holding rgba values".

Fwiw, I remember doing a channel based hit map that stored an image
using RGBA but used floats. Each pixel would have a hit:

struct hit
{
    float m_color[4];
};

It would take all of the hits and depending on what was going on during
iteration it would increment parts of hit::m_color[4]. The sum of all of
the colors would be the total hits. Fwiw, here is an example of one of
my results from a trig based bifurcation iterated function system I
developed. There is actually some pseudo code in the description:

Notice the white colors? Those are high density points that create the
"main" connections in the bifurcation. Here is my pseudo code that has
allowed others to recreate it:
______________
A highly experimental #iterated function system of mine that creates
many #fractal #bifurcation diagrams locked in the unit square. Afaict,
the animation makes it appear as if everything is rotating around a
cylinder. Here is my #IFS that was used to create this animation:
______________
// px_mutation interpolates from -4...4 across each frame; 1440 here.

render frames:
_________
// angle interpolates from 0...pi2 across iterations
// px = py = 0

// Iteration:
px = sin(angle * px_mutation);
py = cos(angle * py);
______________

Plot every pixel in the ifs. Actually, I am adding color to each pixel
visited during iteration.
______________

It was fun to create! :^)

Ben Bacarisse

2024-06-14 21:32:12 UTC

Fwiw, I remember doing a channel based hit map that stored an image using
struct hit
{
float m_color[4];
};
It would take all of the hits and depending on what was going on during
iteration it would increment parts of hit::m_color[4].

Not in C you didn't!

--
Ben.

Chris M. Thomasson

2024-06-15 07:56:51 UTC

Post by Ben Bacarisse

Fwiw, I remember doing a channel based hit map that stored an image using
struct hit
{
float m_color[4];
};
It would take all of the hits and depending on what was going on during
iteration it would increment parts of hit::m_color[4].

Not in C you didn't!

Are you referring to the hit::m_color[4] part I wrote? Yeah. That is not
C. My bad. I was just trying to denote the m_color array in the struct
hit for my post.

Keith Thompson

2024-06-13 22:58:39 UTC

Post by Scott Lurndal

Post by Malcolm McLean

Post by Keith Thompson
printf is a variadic function, so the types of the arguments after
the format string are not specified in its declaration. The printf
function has to *assume* that arguments have the types specified
printf("%d\n", foo);
(probably) has undefined behavior if foo is of type size_t.

And isn't that a nightmare?

No, because compilers have been able to diagnose mismatches
for more than two decades.

What about the previous 3 decades?

They're over. Sheesh.

Post by bart
What about the compilers that can't do that?

Use a compiler that can. If you're using a compiler for production that
can't produce the warnings, you can even use a different compiler just
to get the warnings (if your code isn't too dependent on the compiler
you're using). Code reviews also help.

Post by bart
What about even the latest gcc 14.1 that won't diagnose it even with
-Wpedantic -Wextra?

I don't know. The default gcc on my system diagnoses it by default, but
various versions of gcc I've built from source do not. Perhaps Ubuntu
configures gcc differently. (Ubuntu 22.04.4, gcc 11.4.0.) I'm building
gcc 11.4.0 from source, and I'll compare its behavior to that of
Ubuntu's gcc 11.4.0-1ubuntu1~22.04.

The "-pedantic" option enables diagnostics that are required by the C
standard. I wouldn't expect it to enable optional warnings like those
for format strings.

The "-Wextra" option enables additional warnings that are not enabled by
"-Wall". Common usage if you want a lot of warnings is "-Wall -Wextra".
It doesn't make much sense to use "-Wextra" by itself.

You've used an unusual set of options that avoid enabling format string
warnings.

Format string warnings are enabled by "-Wformat", which is included in
"-Wall".

On serious projects, gcc is rarely invoked with default options.
If you don't like the default settings, I'm likely to agree with
you, but specifying the options you want is a lot more effective
than complaining.

But the mechanism for enabling the warning, and whether it's enabled by
default, is a gcc issue, not a C issue.
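For the size_t case discussed earlier, a small sketch of the matching conversion. The helper name format_size is hypothetical; "%zu" is the standard C99 conversion for size_t.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a size_t into buf; returns what snprintf returns. */
static int format_size(char *buf, size_t bufsize, size_t value)
{
    /* "%zu" matches size_t on every conforming implementation.
       Writing "%d" here instead is the kind of mismatch that
       gcc reports when -Wformat (included in -Wall) is enabled. */
    return snprintf(buf, bufsize, "%zu", value);
}
```

Building with at least "gcc -Wall" is what turns a mismatched conversion in a literal format string into a diagnostic rather than a silent bug.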

Post by bart
What about when the format string is a variable?

Then the compiler probably won't be able to diagnose it. (How often
do you use a variable format string?)

Post by bart
What about the example given below?
It is definitely a language problem. Dealing with some of it with some
compilers with some options isn't a solution, it's just a workaround.

Because C doesn't have the language features necessary for the library
to provide something as flexible as printf with more type safety.

Post by bart
Meanwhile for over 4 decades I've been able to just write 'print foo'
with no format mismatch, because such a silly concept doesn't exist.
THAT's how you deal with it.

By using a different language, which perhaps you should consider
discussing in a different newsgroup. We discuss C here.

If foo is an int, for example, printf lets you decide how to print
it (leading zeros or spaces, decimal vs. hex vs. octal (or binary
in C23), upper vs. lower case for hex). Perhaps "print foo" in
your language has similar features.

Yes, the fact that incorrect printf format strings cause undefined
behavior, and that that's sometimes difficult to diagnose, is a
language problem. I don't recall anyone saying it isn't. But it's
really not that hard to deal with it as a programmer.

If you have ideas (other than abandoning C) for a flexible
type-safe printing function, by all means share them. What are your
suggestions? Adding `print` as a new keyword so you can use `print
foo` is unlikely to be considered practical; I'd want a much more
general mechanism that's not limited to stdio files. Reasonable new
language features that enable type-safe printf-like functions could
be interesting. I'm not aware of any such proposals for C.

Post by Scott Lurndal

Post by Malcolm McLean
We just can't have size_t variables swilling around in programs for these
reasons.

POSIX defines a set of strings that can be used by a programmer to
specify the format string for size_t on any given implementation.

uint64_t i=0;
printf("%lld\n", i);
This compiles OK with gcc -Wall, on Windows64. But compile under
Linux64 and it complains the format should be %ld. Change it to %ld,
and it complains under Windows.
It can't tell you that you should be using one of those ludicrous macros.

And you know why, right? uint64_t is a typedef (an alias) for some
existing type, typically either unsigned long or unsigned long long.
If uint64_t is a typedef for unsigned long long, then i is of type
unsigned long long, and the format string is correct.

Sure, that's a language problem. It's unfortunate that code can be
either valid or a constraint violation depending on how the current
implementation defines uint64_t. I just don't spend much time
complaining about it.
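For reference, the macros in question are the PRI* family from <inttypes.h>. A sketch of the portable spelling, with a hypothetical helper name format_u64:

```c
#include <inttypes.h>
#include <stdio.h>

/* Format a uint64_t into buf; returns what snprintf returns. */
static int format_u64(char *buf, size_t bufsize, uint64_t value)
{
    /* PRIu64 expands to the right length modifier and conversion
       ("lu", "llu", ...) for whatever type uint64_t aliases here. */
    return snprintf(buf, bufsize, "%" PRIu64, value);
}
```

The same source then compiles without format warnings whether uint64_t is unsigned long or unsigned long long.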

I wouldn't mind seeing a new kind of typedef that creates a new type
rather than an alias. Then uint64_t could be a distinct type.
That could cause some problems for _Generic, for example.

C99 added <stdint.h>, defining fixed-width and other integer types using
existing language features. Sure, there are some disadvantages in the
way it was done. The alternative, creating new language features, would
likely have resulted in the proposal not being accepted until some time
after C99, if ever.

Post by bart
I've also just noticed that 'i' is unsigned but the format calls for
signed. That may or may not be deliberate, but the compiler didn't say
anything.

The standard allows using an argument of an integer type with a format
of the corresponding type of the other signedness, as long as the value
is in the range of both. (I vaguely recall the standard's wording being
a bit vague on this point.)

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

bart

2024-06-14 01:18:45 UTC

Post by Keith Thompson

Post by bart
Meanwhile for over 4 decades I've been able to just write 'print foo'
with no format mismatch, because such a silly concept doesn't exist.
THAT's how you deal with it.

By using a different language, which perhaps you should consider
discussing in a different newsgroup. We discuss C here.

That was my point about the 3 decades it took to do something about it.
In the end nothing really changed.

Post by Keith Thompson
If foo is an int, for example, printf lets you decide how to print
it (leading zeros or spaces, decimal vs. hex vs. octal (or binary
in C23), upper vs. lower case for hex). Perhaps "print foo" in
your language has similar features.

The format string specified two things. One is to do with the type of an
expression, which the compiler knows. After all that's how sometimes it
can tell you you've got it wrong.

And if it can do that, it could also put in the format for you.

Post by Keith Thompson
Yes, the fact that incorrect printf format strings cause undefined
behavior, and that that's sometimes difficult to diagnose, is a
language problem. I don't recall anyone saying it isn't. But it's
really not that hard to deal with it as a programmer.
If you have ideas (other than abandoning C) for a flexible
type-safe printing function, by all means share them. What are your
suggestions?

A few years ago I played with a "%?" format code in my 'bcc' compiler
and demonstrated it here. The ? gets replaced by some suitable format
code. This is done within the compiler, not the printf library.

For other display control, such as hex output, or to provide other info
such as width, that still needs to be provided as it is done now.

This would cover most of my points except variable format strings, which
you said were not worth worrying about.

Here is a demo:

--------------------------
#include <stdio.h>
#include <stdint.h>
#include <time.h>

int main(void) {
    uint64_t a = 0xFFFFFFFF00000000;
    float b = 1.46;
    int c = -67;
    char* d = "Hello";
    int* e = &c;

    for (int i=0; i<100000000; ++i);

    clock_t f = clock();

    printf("%=? %=? %=? %=? %=? %=?\n", a, b, c, d, e, f);
    printf("%=? %=? %=? %=? %=? %=?\n", f, e, d, c, b, a);
}
--------------------------

This prints 6 variables of diverse types with a suitable default format.
Then it prints them in reverse order, without having to change those
format codes.

The '=' is an extra feature which displays the name of the argument.

The output from this was:

A=18446744069414584320 B=1.460000 C=-67 D=Hello E=000000000080FF08 F=219
F=219 E=000000000080FF08 D=Hello C=-67 B=1.460000 A=18446744069414584320

It's not quite as good as my language where it's just:

println =a, =b, =c, =d, =e, =f

but I think it was an interesting experiment. This required 50 lines of
code within my C compiler; a bit more for a full treatment.

Post by Keith Thompson
Adding `print` as a new keyword so you can use `print
foo` is unlikely to be considered practical; I'd want a much more
general mechanism that's not limited to stdio files. Reasonable new
language features that enable type-safe printf-like functions could
be interesting. I'm not aware of any such proposals for C.

Post by Scott Lurndal

Post by Malcolm McLean
We just can't have size_t variables swilling around in programs for these
reasons.

POSIX defines a set of strings that can be used by a programmer to
specify the format string for size_t on any given implementation.

uint64_t i=0;
printf("%lld\n", i);
This compiles OK with gcc -Wall, on Windows64. But compile under
Linux64 and it complains the format should be %ld. Change it to %ld,
and it complains under Windows.
It can't tell you that you should be using one of those ludicrous macros.

And you know why, right? uint64_t is a typedef (an alias) for some
existing type, typically either unsigned long or unsigned long long.
If uint64_t is a typedef for unsigned long long, then i is of type
unsigned long long, and the format string is correct.
Sure, that's a language problem. It's unfortunate that code can be
either valid or a constraint violation depending on how the current
implementation defines uint64_t. I just don't spend much time
complaining about it.
I wouldn't mind seeing a new kind of typedef that creates a new type
rather than an alias. Then uint64_t could be a distinct type.
That could cause some problems for _Generic, for example.
C99 added <stdint.h>, defining fixed-width and other integer types using
existing language features. Sure, there are some disadvantages in the
way it was done. The alternative, creating new language features, would
likely have resulted in the proposal not being accepted until some time
after C99, if ever.

Post by bart
I've also just noticed that 'i' is unsigned but the format calls for
signed. That may or may not be deliberate, but the compiler didn't say
anything.

The standard allows using an argument of an integer type with a format
of the corresponding type of the other signedness, as long as the value
is in the range of both. (I vaguely recall the standard's wording being
a bit vague on this point.)

David Brown

2024-06-14 17:08:13 UTC

Post by Keith Thompson
If foo is an int, for example, printf lets you decide how to print
it (leading zeros or spaces, decimal vs. hex vs. octal (or binary
in C23), upper vs. lower case for hex). Perhaps "print foo" in
your language has similar features.

C23 also adds explicit width length modifiers. So instead of having to
guess if uint64_t is "%llu" or "%lu" on a particular platform, or using
the PRIu64 macro, you can now use "%w64u" for uint64_t (or
uint_least64_t if the exact width type does not exist). I think that's
about as neat as you could get, within the framework of printf.

Post by Keith Thompson
Yes, the fact that incorrect printf format strings cause undefined
behavior, and that that's sometimes difficult to diagnose, is a
language problem. I don't recall anyone saying it isn't. But it's
really not that hard to deal with it as a programmer.

It is particularly easy if you have a decent compiler and know how to
enable the right warning flags!

Post by Keith Thompson
If you have ideas (other than abandoning C) for a flexible
type-safe printing function, by all means share them. What are your
suggestions? Adding `print` as a new keyword so you can use `print
foo` is unlikely to be considered practical; I'd want a much more
general mechanism that's not limited to stdio files. Reasonable new
language features that enable type-safe printf-like functions could
be interesting. I'm not aware of any such proposals for C.

It is possible to come a long way with variadic macros and _Generic.
You can at least end up being able to write something like :

int x = 123;
const char * s = "Hello, world!";
uint64_t u = 0x4242;

Print("X = ", x, " the string is ", s, " and u = 0x",
as_hex(u, 6), newline);

rather than:

printf("X = %i the string is %s and u = 0x%06" PRIx64 "\n", x, s, u);

Which you think is better is a matter of opinion.
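As a sketch of the building block behind such a Print, _Generic can pick the conversion from the argument's type. FMT_FOR and FORMAT_ONE are hypothetical names, only a handful of types are covered, and a full variadic Print would layer argument counting on top of this.

```c
#include <stdint.h>
#include <stdio.h>

/* Select a printf conversion from the static type of x.
   x is not evaluated here; only its type is inspected. */
#define FMT_FOR(x) _Generic((x),      \
    int:                "%d",         \
    unsigned int:       "%u",         \
    long:               "%ld",        \
    unsigned long:      "%lu",        \
    long long:          "%lld",       \
    unsigned long long: "%llu",       \
    double:             "%f",         \
    char *:             "%s",         \
    const char *:       "%s")

/* Format one value into buf without the caller naming a conversion. */
#define FORMAT_ONE(buf, size, x) snprintf((buf), (size), FMT_FOR(x), (x))
```

Because uint64_t is an alias for one of the listed unsigned types, FORMAT_ONE(buf, sizeof buf, u) picks "%lu" or "%llu" as appropriate for the platform. A type that is not listed is a compile-time error rather than undefined behaviour at run time.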

Post by Keith Thompson
I wouldn't mind seeing a new kind of typedef that creates a new type
rather than an alias. Then uint64_t could be a distinct type.
That could cause some problems for _Generic, for example.

I too would like such a typedef. Using it for uint64_t would cause
problems for /existing/ uses of _Generic, but would make future uses better.

Keith Thompson

2024-06-14 19:34:51 UTC

Post by David Brown

Post by Keith Thompson
If foo is an int, for example, printf lets you decide how to print
it (leading zeros or spaces, decimal vs. hex vs. octal (or binary
in C23), upper vs. lower case for hex). Perhaps "print foo" in
your language has similar features.

C23 also adds explicit width length modifiers. So instead of having
to guess if uint64_t is "%llu" or "%lu" on a particular platform, or
using the PRIu64 macro, you can now use "%w64u" for uint64_t (or
uint_least64_t if the exact width type does not exist). I think
that's about as neat as you could get, within the framework of printf.

Note that the new "%wN" modifier applies only to [u]intN_t and
[u]int_leastN_t types, not to all integer types with a width of N bits.

The standard doesn't guarantee that integer types with the same
representation are interchangeable, so for example printf("%d", 0L) and
printf("%ld", 0) both have undefined behavior. An implementation would
probably have to go out of its way to make either of those do anything
other than printing "0", but the behavior is still undefined (i.e., the
standard doesn't guarantee it will work).

That's still the case in C23, even for the %wN modifiers. For a typical
implementation with 32-bit integer types, uint32_t and uint_least32_t
will be the same type (C17 doesn't require that), and "%w32u" will work
with that type. It's not guaranteed to work with any other 32-bit
unsigned type. For an implementation that doesn't have any 32-bit
integer type, uint32_t won't exist, uint_least32_t will be, say, 64
bits, and "%w32u" will work with *that* type.

That covers the exact-width and "least" types. The "%wfN" modifiers
cover the "fast" types.

So if you want to use C23's new "%wN" modifiers, you have to use the
types defined in <stdint.h> if you want to avoid undefined behavior.
On the other hand, though `int n = 42; printf("%w32d\n", n);` has
undefined behavior, it's very very likely to work if int is 32 bits.
(`gcc -Wformat` warns about using "%ld" with a long long argument
even when long and long long have the same size, but not about using
"%w32d" with a 32-bit int argument.)

The new modifiers are supported in glibc 2.39, which is included in
Ubuntu 24.04. They're not supported in newlib (used by Cygwin) or in
MS Visual Studio 2022.

[...]

Post by David Brown

Post by Keith Thompson
I wouldn't mind seeing a new kind of typedef that creates a new type
rather than an alias. Then uint64_t could be a distinct type.
That could cause some problems for _Generic, for example.

I too would like such a typedef. Using it for uint64_t would cause
problems for /existing/ uses of _Generic, but would make future uses better.

Currently, there are (in the absence of extended integer types) only a
finite number of incompatible integer types. This makes it possible to
write a _Generic expression that accepts an operand of any integer type,
which can be useful if you have an integer typedef and don't know the
underlying type. This new kind of typedef would allow programmers to
introduce an unlimited number of new incompatible integer types.
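A sketch of the kind of _Generic expression meant here, covering every standard integer type (extended integer types are ignored). The macro name INTEGER_TYPE_NAME is hypothetical.

```c
#include <stdint.h>

/* Map any standard integer type to its name. Every association names a
   distinct type, so a typedef such as uint64_t or size_t always matches
   exactly one of them. */
#define INTEGER_TYPE_NAME(x) _Generic((x),           \
    _Bool:              "_Bool",                     \
    char:               "char",                      \
    signed char:        "signed char",               \
    unsigned char:      "unsigned char",             \
    short:              "short",                     \
    unsigned short:     "unsigned short",            \
    int:                "int",                       \
    unsigned int:       "unsigned int",              \
    long:               "long",                      \
    unsigned long:      "unsigned long",             \
    long long:          "long long",                 \
    unsigned long long: "unsigned long long")
```

Given uint64_t v, INTEGER_TYPE_NAME(v) yields "unsigned long" on some implementations and "unsigned long long" on others. With a typedef that created a new type, no finite list like this could be complete.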

I haven't seen a lot of code that does that kind of thing, and none
that I didn't write myself.

Perhaps if this is introduced, there should be a way to determine the
underlying type. C23 introduces typeof and typeof_unqual; perhaps we
could have typeof_underlying. It could also apply to enum types.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

David Brown

2024-06-15 20:13:24 UTC

Post by Keith Thompson

Post by David Brown

Post by Keith Thompson
If foo is an int, for example, printf lets you decide how to print
it (leading zeros or spaces, decimal vs. hex vs. octal (or binary
in C23), upper vs. lower case for hex). Perhaps "print foo" in
your language has similar features.

C23 also adds explicit width length modifiers. So instead of having
to guess if uint64_t is "%llu" or "%lu" on a particular platform, or
using the PRIu64 macro, you can now use "%w64u" for uint64_t (or
uint_least64_t if the exact width type does not exist). I think
that's about as neat as you could get, within the framework of printf.

Note that the new "%wN" modifier applies only to [u]intN_t and
[u]int_leastN_t types, not to all integer types with a width of N bits.

Yes, but by the definition of these types, if uintN_t exists then
uint_leastN_t is the same type. (This is a new detail in C23.)

Post by Keith Thompson
The standard doesn't guarantee that integer types with the same
representation are interchangeable, so for example printf("%d", 0L) and
printf("%ld", 0) both have undefined behavior. An implementation would
probably have to go out of its way to make either of those do anything
other than printing "0", but the behavior is still undefined (i.e., the
standard doesn't guarantee it will work).

True. For C17, uint32_t and uint_least32_t could be different and
incompatible types. It's highly unlikely, but possible. That was fixed
in C23. (7.22.1.1p3)

Post by Keith Thompson
That's still the case in C23, even for the %wN modifiers. For a typical
implementation with 32-bit integer types, uint32_t and uint_least32_t
will be the same type (C17 doesn't require that), and "%w32u" will work
with that type. It's not guaranteed to work with any other 32-bit
unsigned type. For an implementation that doesn't have any 32-bit
integer type, uint32_t won't exist, uint_least32_t will be, say, 64
bits, and "%w32u" will work with *that* type.

Yes, that is correct.

It can be surprising for some people to hear that types with identical
size and characteristics can still be incompatible. But at least with
C23, we don't have to worry about that for the uintN_t and uint_leastN_t
types. (The same applies to the signed versions.)

If you want to use the bit width length modifiers in C23 printf, you
might still have to cast your "int" or "long" data to an appropriate
intN_t or int_leastN_t.

Post by Keith Thompson
That covers the exact-width and "least" types. The "%wfN" modifiers
cover the "fast" types.
So if you want to use C23's new "%wN" modifiers, you have to use the
types defined in <stdint.h> if you want to avoid undefined behavior.

Yes. But if you want particular sizes for your types, that's a good
idea anyway.

Post by Keith Thompson
On the other hand, though `int n = 42; printf("%w32d\n", n);` has
undefined behavior, it's very very likely to work if int is 32 bits.
(`gcc -Wformat` warns about using "%ld" with a long long argument
even when long and long long have the same size, but not about using
"%w32d" with a 32-bit int argument.)
The new modifiers are supported in glibc 2.39, which is included in
Ubuntu 24.04. They're not supported in newlib (used by Cygwin) or in
MS Visual Studio 2022.
[...]

Post by David Brown

Post by Keith Thompson
I wouldn't mind seeing a new kind of typedef that creates a new type
rather than an alias. Then uint64_t could be a distinct type.
That could cause some problems for _Generic, for example.

I too would like such a typedef. Using it for uint64_t would cause
problems for /existing/ uses of _Generic, but would make future uses better.

Currently, there are (in the absence of extended integer types) only a
finite number of incompatible integer types. This makes it possible to
write a _Generic expression that accepts an operand of any integer type,
which can be useful if you have an integer typedef and don't know the
underlying type. This new kind of typedef would allow programmers to
introduce an unlimited number of new incompatible integer types.
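
A minimal sketch of the kind of _Generic expression described above, assuming no extended integer types. The macro name INT_TYPE_NAME and the typedef my_count_t are invented for illustration only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro: map any standard integer type to its name.
   Every association names a distinct type, so the list is valid. */
#define INT_TYPE_NAME(x) _Generic((x), \
    _Bool: "_Bool", \
    char: "char", \
    signed char: "signed char", \
    unsigned char: "unsigned char", \
    short: "short", \
    unsigned short: "unsigned short", \
    int: "int", \
    unsigned int: "unsigned int", \
    long: "long", \
    unsigned long: "unsigned long", \
    long long: "long long", \
    unsigned long long: "unsigned long long", \
    default: "not a standard integer type")

/* An ordinary typedef is only an alias, so it selects the branch of
   its underlying type even if the caller does not know what that is. */
typedef unsigned long my_count_t;   /* illustrative typedef */

static const char *name_of_my_count(void)
{
    my_count_t c = 0;
    assert(strcmp(INT_TYPE_NAME(c), INT_TYPE_NAME(0UL)) == 0);
    return INT_TYPE_NAME(c);
}
```

With a typedef that created a genuinely new type, my_count_t would fall through to the default branch, which is the trade-off being discussed.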

Yes. But it would also allow you to make a "strong typedef" for a
particular use and have a _Generic that distinguishes it. I believe I
would find that more useful than the disadvantage you describe.
(Perhaps it would be even better if it were possible to extend
_Generic's, rather than cover all the types in one go.)

Post by Keith Thompson
I haven't seen a lot of code that does that kind of thing, and none
that I didn't write myself.
Perhaps if this is introduced, there should be a way to determine the
underlying type. C23 introduces typeof and typeof_unqual; perhaps we
could have typeof_underlying. It could also apply to enum types.

Interesting idea.

Keith Thompson

2024-06-14 20:43:28 UTC

Keith Thompson <Keith.S.Thompson+***@gmail.com> writes:
[...]

Post by Keith Thompson
I don't know. The default gcc on my system diagnoses it by default, but
various versions of gcc I've built from source do not. Perhaps Ubuntu
configures gcc differently. (Ubuntu 22.04.4, gcc 11.4.0.) I'm building
gcc 11.4.0 from source, and I'll compare its behavior to that of
Ubuntu's gcc 11.4.0-1ubuntu1~22.04.

[...]

Context: Warning about incorrect printf format strings, such as
`printf("%d\n", strlen(s));` ("%d" requires an int argument but strlen()
returns a result of type size_t).

I've confirmed that, on my Ubuntu 22.04 system, the system-provided
gcc ("Ubuntu 11.4.0-1ubuntu1~22.04") warns about the mismatch, but
gcc 11.4.0 built from source does not.

So Ubuntu (or its upstream Debian) does a custom build of gcc that
enables "-Wformat" by default.

Confirmed by this answer on Stack Overflow:
<https://stackoverflow.com/a/50112401/827263>
"""
This is not caused by a difference in GCC versions. Rather, Ubuntu has
modified GCC to enable -Wformat -Wformat-security by default. If you
pass those options on Arch Linux, you should see the same behaviour
there.
"""

(The answer contains a link to a web page that no longer exists.)

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

Keith Thompson

2024-06-13 21:47:44 UTC

Post by Malcolm McLean

Post by Keith Thompson
printf is a variadic function, so the types of the arguments after
the format string are not specified in its declaration. The printf
function has to *assume* that arguments have the types specified
printf("%d\n", foo);
(probably) has undefined behavior if foo is of type size_t.

And isn't that a nightmare?

Not at all. Compilers commonly diagnose mismatches when the format
string is a string literal, as it most commonly is. The format
specifier for size_t is "%zu", since C99.

Post by Malcolm McLean

Post by Keith Thompson
There is no implicit conversion to the expected type. Note that
the format string doesn't have to be a string literal, so it's
not always even possible for the compiler to check the types.
Variadic functions give you a lot of flexibility at the cost of
making some type errors difficult to detect.
(I wrote "probably" because size_t *might* be a typedef for unsigned
int, and there are special rules about arguments of corresponding
signed and unsigned types.)
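
To make the point about variadic functions concrete, here is a small sketch of a user-written one. The function sum_ints is hypothetical; it shows that the callee has no way to see the real argument types and simply assumes them, much as printf trusts its format string.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical variadic function: adds up `count` int arguments.
   va_arg is told "int" each time; nothing checks that the caller
   really passed ints. */
static long sum_ints(int count, ...)
{
    assert(count >= 0);
    va_list ap;
    long total = 0;
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

A call such as sum_ints(2, strlen(s), 1) would have undefined behaviour wherever size_t is not compatible with int, and since sum_ints is not a printf-like function the compiler has no format string to check it against.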

We just can't have size_t variables swilling around in programs for
these reasons.

We can and do.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

Malcolm McLean

2024-06-13 23:41:08 UTC

Post by Keith Thompson

Post by Malcolm McLean

Post by Keith Thompson
printf is a variadic function, so the types of the arguments after
the format string are not specified in its declaration. The printf
function has to *assume* that arguments have the types specified
printf("%d\n", foo);
(probably) has undefined behavior if foo is of type size_t.

And isn't that a nightmare?

Not at all. Compilers commonly diagnose mismatches when the format
string is a string literal, as it most commonly is. The format
specifier for size_t is "%zu", since C99.

Post by Malcolm McLean

Post by Keith Thompson
There is no implicit conversion to the expected type. Note that
the format string doesn't have to be a string literal, so it's
not always even possible for the compiler to check the types.
Variadic functions give you a lot of flexibility at the cost of
making some type errors difficult to detect.
(I wrote "probably" because size_t *might* be a typedef for unsigned
int, and there are special rules about arguments of corresponding
signed and unsigned types.)

We just can't have size_t variables swilling around in programs for
these reasons.

We can and do.

And this is how things break.

Now, running a third party editor under your control so that the user can
edit a text and return control and the edited text back to you when he
exits the editor. Yes, I understand that this is a difficult thing to
do, the software engineering isn't consistent, and the way you have to
do it may change from one version of C to another.
But printing out a variable which holds the length of a string? And
something so basic breaks from one version of C to the next? We should
have no tolerance for that at all.

--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc

Keith Thompson

2024-06-14 00:09:48 UTC

Post by Malcolm McLean

Post by Keith Thompson

Post by Malcolm McLean

Post by Keith Thompson
printf is a variadic function, so the types of the arguments after
the format string are not specified in its declaration. The printf
function has to *assume* that arguments have the types specified
printf("%d\n", foo);
(probably) has undefined behavior if foo is of type size_t.

And isn't that a nightmare?

Not at all. Compilers commonly diagnose mismatches when the format
string is a string literal, as it most commonly is. The format
specifier for size_t is "%zu", since C99.

Post by Malcolm McLean

Post by Keith Thompson
There is no implicit conversion to the expected type. Note that
the format string doesn't have to be a string literal, so it's
not always even possible for the compiler to check the types.
Variadic functions give you a lot of flexibility at the cost of
making some type errors difficult to detect.
(I wrote "probably" because size_t *might* be a typedef for unsigned
int, and there are special rules about arguments of corresponding
signed and unsigned types.)

We just can't have size_t variables swilling around in programs for
these reasons.

We can and do.

And this is how things break.
Now, running a third party editor under your control so that the user can
edit a text and return control and the edited text back to you when
he exits the editor. Yes, I understand that this is a difficult thing
to do, the software engineering isn't consistent, and the way you have
to do it may change from one version of C to another.
But printing out a variable which holds the length of a string? And
something so basic breaks from one version of C to the next? We should
have no tolerance for that at all.

What broke? And how would *you* print the result of strlen()?

strlen() has returned a result of type size_t since C89/C90.

C99 (that's 25 years ago) added the "%zu" format specifier. Today,
you're unlikely to find an implementation that doesn't support
printf("%zu\n", strlen(s));
But even if you need to deal with pre-C99 implementations for some
reason, this:
printf("%lu\n", (unsigned long)strlen(s));
works reliably in C90, and works in C99 and later as long as size_t is
no wider than unsigned long -- and even then it breaks (printing an
incorrect value) only if the actual value returned by strlen(s) exceeds
ULONG_MAX, which is at least 4294967295. If you're using 4-gigabyte
strings, you probably want to avoid calling strlen() on them anyway.

This:
printf("%d\n", strlen(s));
has *never* been valid (it has undefined behavior unless the
implementation you're using happens to make size_t a typedef for
unsigned int and the value doesn't exceed INT_MAX, which might be
as small as 32767).
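
The two valid forms above can be put side by side in a short sketch. The helper name format_length is made up, and snprintf is used only so the result can be inspected (snprintf is itself C99; the point here is the pairing of format and argument type).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format strlen(s) in both ways described above.
   Returns nonzero if the two renderings agree. */
static int format_length(const char *s, char *c99, char *c90, size_t cap)
{
    assert(s != NULL && c99 != NULL && c90 != NULL && cap > 0);
    /* C99 and later: %zu is the conversion for size_t. */
    snprintf(c99, cap, "%zu", strlen(s));
    /* C90 style: convert explicitly so the argument really is an
       unsigned long, matching %lu. */
    snprintf(c90, cap, "%lu", (unsigned long)strlen(s));
    return strcmp(c99, c90) == 0;
}
```

Both buffers should hold the same digits for any string shorter than ULONG_MAX characters.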

We're simply not going to throw away the last quarter century of
progress in C and go back to C90. You can if you like, but don't
expect anyone else to follow you.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

David Brown

2024-06-12 21:38:45 UTC

Post by DFS
https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
My code compiles and works fine - every stat matches - except for one
anomaly: when using a dataset of consecutive numbers 1 to N, all
values > 40 are flagged as outliers. Up to 40, no problem. Random
numbers dataset of any size: no problem.
And values 41+ definitely don't meet the conditions for outliers (using
the IQR * 1.5 rule).
Very strange.
before: char outliers[100];
after : char outliers[100] = "";
And the problem went away. Reset it to before and problem came back.
Makes no sense. What could cause the program to go FUBAR at data point
41+ only when the dataset is consecutive numbers?
Also, why doesn't gcc just do you a solid and initialize to "" for you?

It is /really/ difficult to know exactly what your problem is without
seeing your C code! There may be other problems that you haven't seen yet.

Non-static local variables without initialisers have an "indeterminate"
value. Trying to use these "indeterminate"
values is undefined behaviour - you have absolutely no control over what
might happen. Any particular behaviour you see is down to luck from the
rest of the code and what happened to be in memory at the time.

There is no automatic initialisation of non-static local variables,
because that would often be inefficient. The best way to avoid errors
like yours, IMHO, is not to declare such variables until you have data
to put in them - thus you always have a sensible initialiser of real
data. Occasionally that is not practical, but it works in most cases.

For a data array, zero initialisation is common. Typically you do this
with :

int xs[100] = { 0 };

That puts the explicit 0 in the first element of xs, and then the rest
of the array is cleared with zeros.
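
A small sketch of both initialisers discussed in this thread, the { 0 } form for a data array and the "" form for a string buffer. The function name locals_start_clean is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: returns 1 if both freshly initialised locals
   start out in the expected state. */
static int locals_start_clean(void)
{
    int xs[100] = { 0 };        /* first element explicit, rest zeroed */
    char outliers[100] = "";    /* empty string; remaining bytes zeroed */

    for (size_t i = 0; i < sizeof xs / sizeof xs[0]; i++)
        if (xs[i] != 0)
            return 0;

    /* strcat needs a terminated destination; "" guarantees one. */
    assert(strlen(outliers) == 0);
    strcat(outliers, "none");
    return strcmp(outliers, "none") == 0;
}
```

Without the = "" initialiser, the strcat call would start searching for a terminator in whatever bytes happened to be on the stack.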

I recommend never using "char" as a type unless you really mean a
character, limited to 7-bit ASCII. So if your "outliers" array really
is an array of such characters, "char" is fine. If it is intended to be
numbers and for some reason you specifically want 8-bit values, use
"uint8_t" or "int8_t", and initialise with { 0 }.

A major lesson here is to learn how to use your tools. C is not a
forgiving language. Make use of all the help your tools can give you -
enable warnings here. "gcc -Wall" enables a range of common warnings
with few false positives in normal well-written code, including ones
that check for attempts to read uninitialised data. "-Wextra" enables a
slew of extra warnings. Some of these will annoy people and trigger on
code they find reasonable, while most are good choices for a lot of code
- but personal preference varies significantly. And remember to enable
optimisation, since it makes the static checking more powerful.

If you /really/ want gcc to zero out such local data automatically, use
"-ftrivial-auto-var-init=zero". But it is much better to use warnings
and write correct code - options like that one are an addition to
well-checked code for paranoid software in security-critical contexts.

Keith Thompson

2024-06-12 22:18:34 UTC

David Brown <***@hesbynett.no> writes:
[...]

Post by David Brown
I recommend never using "char" as a type unless you really mean a
character, limited to 7-bit ASCII. So if your "outliers" array really
is an array of such characters, "char" is fine. If it is intended to
be numbers and for some reason you specifically want 8-bit values, use
"uint8_t" or "int8_t", and initialise with { 0 }.

[...]

The implementation-definedness of plain char is awkward, but char
arrays generally work just fine for UTF-8 strings. If char is
signed, byte values greater than 127 will be stored as negative
values, but it will almost certainly just work (if your system
is configured to handle UTF-8). Likewise for Latin-1 and similar
8-bit character sets.

The standard string functions operate on arrays of plain char, so
storing UTF-8 strings in arrays of uint8_t or unsigned char will
seriously restrict what you can do with them.
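
A sketch of how UTF-8 data can live in a plain char array while still being inspected as values 0 to 255. The names count_high_bytes and sample are made up; the sample spells out the UTF-8 bytes explicitly so the source file encoding does not matter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count bytes in s whose value is above 127.
   Converting each plain char to unsigned char yields the 0..255
   value whether or not plain char is signed. */
static size_t count_high_bytes(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++)
        if ((unsigned char)*s > 127)
            n++;
    return n;
}

/* "hello" with an e-acute: that letter is 0xC3 0xA9 in UTF-8. */
static const char sample[] = "h\xC3\xA9llo";
```

The string stays usable with strlen, puts and friends; only code that looks at individual byte values needs the conversion.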

(I'd like to see a future standard require plain char to be unsigned,
but I don't know how likely that is.)

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

David Brown

2024-06-13 12:42:22 UTC

Post by Keith Thompson
[...]

Post by David Brown
I recommend never using "char" as a type unless you really mean a
character, limited to 7-bit ASCII. So if your "outliers" array really
is an array of such characters, "char" is fine. If it is intended to
be numbers and for some reason you specifically want 8-bit values, use
"uint8_t" or "int8_t", and initialise with { 0 }.

[...]
The implementation-definedness of plain char is awkward, but char
arrays generally work just fine for UTF-8 strings.

Yes, but "generally work" is not quite as strong as I would like. My
preference for UTF-8 strings is a const unsigned char type (with C23, it
will be char8_t, which is defined to be the same type as "unsigned
char"). But u8"Hello, world" UTF-8 string literals (since C11) are
considered to be like an array of type "char" in C (until C23), so I
guess UTF-8 strings will be safe in plain char arrays. Still, the bytes
in a UTF-8 string are code units with values between 0 and 255, so I
prefer to store these in a type that can hold that range of values.

(What happens if you have a platform that uses ones' complement
arithmetic, with "char" being signed and a range of -127 to +127, and
you have a u8"..." string which has a code unit of 0x80 that cannot be
represented in "char" ? It's just a hypothetical question, of course.)

Post by Keith Thompson
If char is
signed, byte values greater than 127 will be stored as negative
values, but it will almost certainly just work (if your system
is configured to handle UTF-8). Likewise for Latin-1 and similar
8-bit character sets.
The standard string functions operate on arrays of plain char, so
storing UTF-8 strings in arrays of uint8_t or unsigned char will
seriously restrict what you can do with them.
(I'd like to see a future standard require plain char to be unsigned,
but I don't know how likely that is.)

I would also prefer that, but too much existing code relies on plain
char being signed on the platforms it runs on. I personally think the
idea of having signed or unsigned characters is a very poor choice of
names for the terms, but it's way too late to change that! C23 has
"char8_t" which is always unsigned.

(In C23, "char8_t" is defined in <uchar.h> and is the same type as
"unsigned char". In C++20, in contrast, "char8_t" is a keyword and a
distinct type with identical size and range to "unsigned char".)

Keith Thompson

2024-06-13 23:39:52 UTC

Post by David Brown

Post by Keith Thompson
[...]

Post by David Brown
I recommend never using "char" as a type unless you really mean a
character, limited to 7-bit ASCII. So if your "outliers" array really
is an array of such characters, "char" is fine. If it is intended to
be numbers and for some reason you specifically want 8-bit values, use
"uint8_t" or "int8_t", and initialise with { 0 }.

[...]
The implementation-definedness of plain char is awkward, but char
arrays generally work just fine for UTF-8 strings.

Yes, but "generally work" is not quite as strong as I would like.

Agreed, but we're stuck with it.

Post by David Brown
My
preference for UTF-8 strings is a const unsigned char type (with C23,
it will be char8_t, which is defined to be the same type as "unsigned
char").

But then you can't use standard library functions (unless you use
pointer conversions).

#include <stdio.h>
int main(void) {
const char *s = "héllo, wörld";
const unsigned char *u = (const unsigned char *)"héllo, wörld"; // cast needed here too
puts(s);
puts(u); // constraint violation
puts((const char*)u); // valid but ugly
}

Implementations that make plain char signed *have to* deal sanely with
8-bit data. The standard might permit some things to misbehave, but as
a QoI issue it's reasonably safe to assume that it Just Works unless
you're using the DeathStation 9000.

Post by David Brown
(What happens if you have a platform that uses ones' complement
arithmetic, with "char" being signed and a range of -127 to +127, and
you have a u8"..." string which has a code unit of 0x80 that cannot be
represented in "char" ? It's just a hypothetical question, of
course.)

C23 mandates two's-complement for all integer types.
Ones'-complement implementations are rare, and I don't think any of
them support recent C standards, so u8"..." is going to be a syntax
error anyway. My guess (and it's nothing more than that) is that
any ones'-complement implementations make plain char unsigned just
to avoid this kind of issue. But even if they don't, a signed byte
with all bits 1 (-0 in ones'-complement) is likely to be treated
as 0xff by I/O functions.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

Tim Rentsch

2024-06-19 00:23:09 UTC

Post by Keith Thompson
(I'd like to see a future standard require plain char to be unsigned,
but I don't know how likely that is.)

It seems unnecessary given that the upcoming C standard
is choosing to mandate two's complement for all signed
integer types.

Keith Thompson

2024-06-19 00:42:48 UTC

Post by Tim Rentsch

Post by Keith Thompson
(I'd like to see a future standard require plain char to be unsigned,
but I don't know how likely that is.)

It seems unnecessary given that the upcoming C standard
is choosing to mandate two's complement for all signed
integer types.

It's less necessary, but I'd still like to see it.

These days, strings very commonly hold UTF-8 data. The fact that bytes
whose values exceed 127 are negative is conceptually awkward, even if
everything happens to work. It rarely if ever makes sense to treat a
character value as negative. (And of course signed char still exists,
or int8_t if you prefer 8 bits vs. CHAR_BIT bits.)

A drawback is that it could break existing (non-portable) code that
assumes plain char is signed.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

Tim Rentsch

2024-06-22 16:28:14 UTC

Post by Keith Thompson

Post by Tim Rentsch

Post by Keith Thompson
(I'd like to see a future standard require plain char to be unsigned,
but I don't know how likely that is.)

It seems unnecessary given that the upcoming C standard
is choosing to mandate two's complement for all signed
integer types.

It's less necessary, but I'd still like to see it.
These days, strings very commonly hold UTF-8 data. The fact that bytes
whose values exceed 127 are negative is conceptually awkward, even if
everything happens to work. It rarely if ever makes sense to treat a
character value as negative.

The combination of mandating two's complement and using a compiler
option like -funsigned-char (supported by both gcc and clang)
should be enough to do what you want.

Post by Keith Thompson
(And of course signed char still exists,
or int8_t if you prefer 8 bits vs. CHAR_BIT bits.)

It makes me laugh when people use int8_t instead of signed char.
If CHAR_BIT isn't 8 then there won't be any int8_t. And of
course we can always throw in a static assertion if it is felt
necessary to protect against implementations that don't have
8-bit chars. (A static assertion also can verify that two's
complement is being used for signed char.)
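
The static assertions mentioned above might look something like this sketch. The helper low_byte_as_signed is hypothetical and is there only to show code that leans on the two assumptions.

```c
#include <assert.h>
#include <limits.h>

/* Compile-time checks; the build fails if either assumption is false. */
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
static_assert(SCHAR_MIN == -SCHAR_MAX - 1,
              "this code assumes two's complement signed char");

/* Hypothetical helper that relies on those assumptions: interpret the
   low byte of value as a signed 8-bit quantity. */
static int low_byte_as_signed(unsigned value)
{
    unsigned char b = (unsigned char)(value & 0xFFu);
    /* Map 0..255 onto -128..127 by plain arithmetic, avoiding an
       implementation-defined conversion. */
    return b < 128 ? (int)b : (int)b - 256;
}
```

On an implementation without 8-bit chars the translation unit would be rejected up front instead of misbehaving at run time.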

Post by Keith Thompson
A drawback is that it could break existing (non-portable) code that
assumes plain char is signed.

Exactly! No reason to break the whole world when you can get
what you want just by using a compiler option.

DFS

2024-06-12 22:29:27 UTC

Post by David Brown

Post by DFS
https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
My code compiles and works fine - every stat matches - except for one
anomaly: when using a dataset of consecutive numbers 1 to N, all
values > 40 are flagged as outliers. Up to 40, no problem. Random
numbers dataset of any size: no problem.
And values 41+ definitely don't meet the conditions for outliers
(using the IQR * 1.5 rule).
Very strange.
before: char outliers[100];
after : char outliers[100] = "";
And the problem went away. Reset it to before and problem came back.
Makes no sense. What could cause the program to go FUBAR at data
point 41+ only when the dataset is consecutive numbers?
Also, why doesn't gcc just do you a solid and initialize to "" for you?

It is /really/ difficult to know exactly what your problem is without
seeing your C code! There may be other problems that you haven't seen yet.

The outlier section starts on line 169
=====================================================================================

//this code is hereby released to the public domain

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#include <time.h>

/*
this program computes the descriptive statistics of a randomly
generated set of N integers

1.0 release Dec 2020
2.0 release Jun 2024

used the population skewness and Kurtosis formulas from:

https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
also test the results of this code against that site

compile: gcc -Wall prog.c -o prog -lm
usage : ./prog N -option (where N is 2 or higher, and option is -r or
-c or -o)
-r generates N random numbers
-c generates consecutive numbers 1 to N
-o generates random numbers with outliers
*/

//random ints
int randNbr(int low, int high) {
return (low + rand() / (RAND_MAX / (high - low + 1) + 1));
}

//comparator function used with qsort
int compareint (const void * a, const void * b)
{
if (*(int*)a > *(int*)b) return 1;
else if (*(int*)a < *(int*)b) return -1;
else return 0;
}

int main(int argc, char *argv[])
{
if(argc < 3) {
printf("Missing argument:\n");
printf(" * enter a number greater than 2\n");
printf(" * enter an option -r -c or -o\n");
exit(0);
}

//vars
int i=0, lastmode=0;
int N = atoi(argv[1]);
int nums[N];

double sumN=0.0, median=0.0, Q1=0.0, Q2=0.0, Q3=0.0, IQR=0.0;
double stddev = 0.0, kurtosis = 0.0;
double sqrdiffmean = 0.0, cubediffmean = 0.0, quaddiffmean = 0.0;
double meanabsdev = 0.0, rootmeansqr = 0.0;
char mode[100], tmp[12];

//generate random dataset
if(strcmp(argv[2],"-r") == 0) {
srand(time(NULL));
for(i=0;i<N;i++) { nums[i] = randNbr(1,N*3); }

printf("%d Randoms:\n", N);
printf("No commas : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
printf("\nWith commas: "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
qsort(nums,N,sizeof(int),compareint);
printf("\nSorted : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
printf("\nSorted : "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
}

//generate random dataset with outliers
if(strcmp(argv[2],"-o") == 0) {
srand(time(NULL));
nums[0] = 1; nums[1] = 3;
for(i=2;i<N-2;i++) { nums[i] = randNbr(100,N*30); }
nums[N-2] = 1000; nums[N-1] = 2000;

printf("%d Randoms with outliers:\n", N);
printf("No commas : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
printf("\nWith commas: "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
qsort(nums,N,sizeof(int),compareint);
printf("\nSorted : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
printf("\nSorted : "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
}

//generate consecutive numbers 1 to N
if(strcmp(argv[2],"-c") == 0) {
for(i=0;i<N;i++) { nums[i] = i + 1; }

printf("%d Consecutive:\n", N);
printf("No commas : "); for(i=0;i<N;i++) { printf("%d ", nums[i]); }
printf("\nWith commas : "); for(i=0;i<N;i++) { printf("%d,", nums[i]); }
}

//various
for(i=0;i<N;i++) {sumN += nums[i];}
double min = nums[0], max = nums[N-1];

//calc descriptive stats
double mean = sumN / (double)N;
int ucnt = 1, umaxcnt=1;
for(i = 0; i < N; i++)
{
sqrdiffmean += pow(nums[i] - mean, 2); // for variance and sum squares
cubediffmean += pow(nums[i] - mean, 3); // for skewness
quaddiffmean += pow(nums[i] - mean, 4); // for Kurtosis
meanabsdev += fabs((nums[i] - mean)); // for mean absolute deviation
rootmeansqr += nums[i] * nums[i]; // for root mean square

//mode
if(ucnt == umaxcnt && lastmode != nums[i])
{
sprintf(tmp,"%d ",nums[i]);
strcat(mode,tmp);
}

if(nums[i]-nums[i+1]!=0) {ucnt=1;} else {ucnt++;}

if(ucnt>umaxcnt)
{
umaxcnt=ucnt;
memset(mode, '\0', sizeof(mode));
sprintf(tmp, "%d ", nums[i]);
strcat(mode, tmp);
lastmode = nums[i];
}
}

// median and quartiles
// quartiles divide sorted dataset into four sections
// Q1 = median of values less than Q2
// Q2 = median of the data set
// Q3 = median of values greater than Q2
if(N % 2 == 0) {
Q2 = median = (nums[(N/2)-1] + nums[N/2]) / 2.0;
i = N/2;
if(i % 2 == 0) {
Q1 = (nums[(i/2)-1] + nums[i/2]) / 2.0;
Q3 = (nums[i + ((i-1)/2)] + nums[i+(i/2)]) / 2.0;
}
if(i % 2 != 0) {
Q1 = nums[(i-1)/2];
Q3 = nums[i + ((i-1)/2)];
}
}

if(N % 2 != 0) {
Q2 = median = nums[(N-1)/2];
i = (N-1)/2;
if(i % 2 == 0) {
Q1 = (nums[(i/2)-1] + nums[i/2]) / 2.0;
Q3 = (nums[i + (i/2)] + nums[i + (i/2) + 1]) / 2.0;
}
if(i % 2 != 0) {
Q1 = nums[(i-1)/2];
Q3 = nums[i + ((i+1)/2)];
}
}

// outliers: below Q1−1.5xIQR, or above Q3+1.5xIQR
IQR = Q3 - Q1;
char outliers[200]="", temp[10]="";
if (N > 3) {

//range for outliers
double lo = Q1 - (1.5 * IQR);
double hi = Q3 + (1.5 * IQR);

//no outliers
if ( min > lo && max < hi) {
strcat(outliers,"none (using IQR * 1.5 rule)");
}

//at least one outlier
if ( min < lo || max > hi) {
for(i = 0; i < N; i++) {
double val = (double)nums[i];
if(val < lo || val > hi) {
sprintf(temp,"%.0f ",val);
temp[strlen(temp)] = '\0';
strcat(outliers,temp);
}
}
strcat(outliers," (using IQR * 1.5 rule)");
}
outliers[strlen(outliers)] = '\0';
}

stddev = sqrt(sqrdiffmean/N);
kurtosis = quaddiffmean / (N * pow(sqrt(sqrdiffmean/N),4));

//output
printf("\n--------------------------------------------------------------\n");
printf("Minimum = %.0f\n", min);
printf("Maximum = %.0f\n", max);
printf("Range = %.0f\n", max - min);
printf("Size N = %d\n" , N);
printf("Sum N = %.0f\n", sumN);
printf("Mean μ = %.2f\n", mean);
printf("Median = %.1f\n", median);
if(umaxcnt > 1) {
printf("Mode(s) = %s (%d occurrences ea)\n", mode,umaxcnt);}
if(umaxcnt < 2) {
printf("Mode(s) = na (no repeating values)\n");}
printf("Std Dev σ = %.4f\n", stddev);
printf("Variance σ^2 = %.4f\n", sqrdiffmean/N);
printf("Mid Range = %.1f\n", (max + min)/2);
printf("Quartiles");
if(N > 3) {printf(" Q1 = %.1f\n", Q1);}
if(N < 4) {printf(" Q1 = na\n");}
printf(" Q2 = %.1f (median)\n", Q2);
if(N > 3) {printf(" Q3 = %.1f\n", Q3);}
if(N < 4) {printf(" Q3 = na\n");}
printf("IQR = %.1f (interquartile range)\n", IQR);
if(N > 3) {printf("Outliers = %s\n", outliers);}
if(N < 4) {printf("Outliers = na\n");}
printf("Sum Squares SS = %.2f\n", sqrdiffmean);
printf("MAD = %.4f (mean absolute deviation)\n",
meanabsdev / N);
printf("Root Mean Sqr = %.4f\n", sqrt(rootmeansqr / N));
printf("Std Error Mean = %.4f\n", stddev / sqrt(N));
printf("Skewness γ1 = %.4f\n", cubediffmean / (N *
pow(sqrt(sqrdiffmean/N),3)));
printf("Kurtosis β2 = %.4f\n", kurtosis);
printf("Kurtosis Excess α4 = %.4f\n", kurtosis - 3);
printf("CV = %.6f (coefficient of variation)\n",
sqrt(sqrdiffmean/N) / mean);
printf("RSD = %.4f%% (relative std deviation)\n", 100 *
(sqrt(sqrdiffmean/N) / mean));
printf("--------------------------------------------------------------\n");
printf("Check results against\n");
printf("https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php");
printf("\n\n");

return(0);
}

=====================================================================================
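
Reading the listing above, a few things look like plausible sources of trouble with consecutive data; this is only a reading of the posted code. The mode array is declared without an initialiser yet is passed to strcat, nums[i+1] is read when i is N-1, and with -c every value is unique, so every number is appended to the 100-byte mode buffer. A minimal sketch of a bounded append follows; the name append_bounded is made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append src to the string in dst without ever
   writing past dst[cap - 1]. Returns 1 if everything fit, 0 if the
   text had to be truncated. dst must already hold a terminated
   string, so the caller still has to initialise it (e.g. with ""). */
static int append_bounded(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);
    size_t used = strlen(dst);
    assert(used < cap);
    /* snprintf writes at most cap - used bytes including the
       terminator and returns the length it wanted to write. */
    int need = snprintf(dst + used, cap - used, "%s", src);
    return need >= 0 && (size_t)need < cap - used;
}
```

Used in place of the bare strcat calls, something like this would turn an overflow into a visibly truncated list.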

Post by David Brown
Non-static local variables without initialisers have an "indeterminate"
value. Trying to use these "indeterminate"
values is undefined behaviour - you have absolutely no control over what
might happen. Any particular behaviour you see is down to luck from the
rest of the code and what happened to be in memory at the time.

In 2024 that's surprising. I can't be the only one to forget to
initialize a char[] variable.

Post by David Brown
There is no automatic initialisation of non-static local variables,
because that would often be inefficient.

It would've saved me half an hour of frustration.

Now I'm getting 'stack smashing detected' errors (after the program runs
correctly) when using datasets of consecutive numbers.

hmmmm 2 issues in a row using consecutives - that's a clue!

Post by David Brown
The best way to avoid errors
like yours, IMHO, is not to declare such variables until you have data
to put in them - thus you always have a sensible initialiser of real
data. Occasionally that is not practical, but it works in most cases.

Data is definitely going in them: either the value 'none' or a list of
the outliers and some text.

Post by David Brown
For a data array, zero initialisation is common. Typically you do this
int xs[100] = { 0 };
That puts the explicit 0 in the first element of xs, and then the rest
of the array is cleared with zeros.
I recommend never using "char" as a type unless you really mean a
character, limited to 7-bit ASCII. So if your "outliers" array really
is an array of such characters, "char" is fine. If it is intended to be
numbers and for some reason you specifically want 8-bit values, use
"uint8_t" or "int8_t", and initialise with { 0 }.

I did mean characters, limited to: 0-9a-zA-Z()

I think I'm using the char variable correctly.
sprintf(tempchar,"%d ",outlier);
strcat(char,tempchar);

Post by David Brown
A major lesson here is to learn how to use your tools. C is not a
forgiving language. Make use of all the help your tools can give you -
enable warnings here. "gcc -Wall" enables a range of common warnings
with few false positives in normal well-written code, including ones
that check for attempts to read uninitialised data.

I always use -Wall, and I was using it here.

Post by David Brown
"-Wextra" enables a
slew of extra warnings. Some of these will annoy people and trigger on
code they find reasonable, while most are good choices for a lot of code
- but personal preference varies significantly. And remember to enable
optimisation, since it makes the static checking more powerful.

Just did this:
gcc -Wall -Wextra -O3 mmv2.c -o mmv2 -lm

and no warnings or errors at all.

But: it now aborts near the front when using consecutive data points
(but not randoms).

*** buffer overflow detected ***: terminated
Aborted

I'm actually happy about that. I should be able to find and fix it.

Post by David Brown
If you /really/ want gcc to zero out such local data automatically, use
"-ftrivial-auto-var-init=zero". But it is much better to use warnings
and write correct code - options like that one are an addition to
well-checked code for paranoid software in security-critical contexts.

Great answer! I can always count on D Brown for excellent advice.
Thank you.

Ike Naar

2024-06-13 07:25:58 UTC

Post by DFS
//no outliers
if ( min > lo && max < hi) {

The condition for 'no outliers' is not the complement of
the condition for 'at least one outlier' below.

Post by DFS
strcat(outliers,"none (using IQR * 1.5 rule)");
}
//at least one outlier
if ( min < lo || max > hi) {
for(i = 0; i < N; i++) {
double val = (double)nums[i];
if(val < lo || val > hi) {
sprintf(temp,"%.0f ",val);
temp[strlen(temp)] = '\0';

This is unnecessary;
sprintf terminates the generated string with a null character.

Post by DFS
strcat(outliers,temp);
}
}
strcat(outliers," (using IQR * 1.5 rule)");
}

DFS

2024-06-13 15:13:04 UTC

Post by Ike Naar

Post by DFS
//no outliers
if ( min > lo && max < hi) {

The condition for 'no outliers' is not the complement of
the condition for 'at least one outlier' below.

You're saying some outliers will not be flagged?

Post by Ike Naar

Post by DFS
strcat(outliers,"none (using IQR * 1.5 rule)");
}
//at least one outlier
if ( min < lo || max > hi) {
for(i = 0; i < N; i++) {
double val = (double)nums[i];
if(val < lo || val > hi) {
sprintf(temp,"%.0f ",val);
temp[strlen(temp)] = '\0';

This is unnecessary;
sprintf terminates the generated string with a null character.

Thanks.

Post by Ike Naar

Post by DFS
strcat(outliers,temp);
}
}
strcat(outliers," (using IQR * 1.5 rule)");
}

Scott Lurndal

2024-06-13 15:40:29 UTC

Post by Ike Naar

Post by DFS
temp[strlen(temp)] = '\0';

This is unnecessary;
sprintf terminates the generated string with a null character.

Thanks.

Most programmers should consider sprintf to be deprecated and
should never use it. snprintf is safer and more capable.

Lew Pitcher

2024-06-13 15:49:46 UTC

Post by Ike Naar

Post by DFS
//no outliers
if ( min > lo && max < hi) {

The condition for 'no outliers' is not the complement of
the condition for 'at least one outlier' below.

You're saying some outliers will not be flagged?

[1] How does the above statement evaluate when (min == lo) and (max == hi)?

Post by Ike Naar

Post by DFS
strcat(outliers,"none (using IQR * 1.5 rule)");
}
//at least one outlier
if ( min < lo || max > hi) {

[2] How does the above statement evaluate when (min == lo) and (max == hi)?

[3] Given the answers to questions 1 and 2, are there any values that
satisfy /both/ the "no outliers" and "at least one outlier" conditions?
Are there any values that satisfy /neither/ conditions?

[snip]

HTH

--
Lew Pitcher
"In Skills We Trust"

DFS

2024-06-13 17:05:43 UTC

Post by Lew Pitcher

Post by Ike Naar

Post by DFS
//no outliers
if ( min > lo && max < hi) {

The condition for 'no outliers' is not the complement of
the condition for 'at least one outlier' below.

You're saying some outliers will not be flagged?

[1] How does the above statement evaluate when (min == lo) and (max == hi)?

Post by Ike Naar

Post by DFS
//at least one outlier
if ( min < lo || max > hi) {

[2] How does the above statement evaluate when (min == lo) and (max == hi)?
[3] Given the answers to questions 1 and 2, are there any values that
satisfy /both/ the "no outliers" and "at least one outlier" conditions?
Are there any values that satisfy /neither/ conditions?
[snip]
HTH

It does help. The original code won't miss any outliers, but it also
won't notify you there were none in the exceedingly rare case that the
bounds of the dataset exactly match the bounds of the outlier rule.

No outliers test:
Orig : if (min > lo && max < hi)
Fixed: if (min >= lo && max <= hi)

At least one outlier test:
Orig: if (min < lo || max > hi) {
No fix necessary

Thanks Lew.
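
A minimal sketch of the two tests side by side, to make the gap visible. The function names are hypothetical and are not from the original program. For ordinary (non-NaN) values, the exact complement of (min < lo || max > hi) is (min >= lo && max <= hi), which is the fixed form.

```c
#include <stdbool.h>

/* Original "no outliers" test: strict comparisons on both bounds. */
static bool no_outliers_orig(double min, double max, double lo, double hi)
{
    return min > lo && max < hi;
}

/* Fixed "no outliers" test: the complement of has_outliers() below. */
static bool no_outliers_fixed(double min, double max, double lo, double hi)
{
    return min >= lo && max <= hi;
}

/* "At least one outlier" test, unchanged from the original. */
static bool has_outliers(double min, double max, double lo, double hi)
{
    return min < lo || max > hi;
}
```

With min == lo and max == hi, both no_outliers_orig() and has_outliers() return false, so neither branch would run. Writing the second test as a plain else of the first would avoid having to keep two conditions in step at all.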

David Brown

2024-06-13 13:15:55 UTC

Post by David Brown

Post by DFS
https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
My code compiles and works fine - every stat matches - except for one
anomaly: when using a dataset of consecutive numbers 1 to N, all
values > 40 are flagged as outliers. Up to 40, no problem. Random
numbers dataset of any size: no problem.
And values 41+ definitely don't meet the conditions for outliers
(using the IQR * 1.5 rule).
Very strange.
before: char outliers[100];
after : char outliers[100] = "";
And the problem went away. Reset it to before and problem came back.
Makes no sense. What could cause the program to go FUBAR at data
point 41+ only when the dataset is consecutive numbers?
Also, why doesn't gcc just do you a solid and initialize to "" for you?

It is /really/ difficult to know exactly what your problem is without
seeing your C code! There may be other problems that you haven't seen yet.

The outlier section starts on line 169
=====================================================================================

<snip>

Apart from the initialisation issue, I would suggest you re-consider the
way you add strings to the "outliers" buffer. If there are too many of
them, it will overflow - there's nothing to stop you putting more than
200 characters into it. I would recommend dropping the "temp" variable
and instead keep track of a pointer to the terminated null character of
your current "outliers" string. Use "snprintf" to "print" directly into
the string, rather than going via "temp", and use the return value of
the "snprintf" to update your end pointer. You will easily be able to
avoid the risk of overrun, while also being slightly more efficient too.
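
A minimal sketch of that suggestion, assuming the buffer starts out as an empty string and that the caller tracks the current length (an index rather than a raw pointer, which amounts to the same thing). The helper name append_outlier and its parameters are hypothetical, not taken from the posted program.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: appends one value to buf, which holds cap bytes
   and currently contains *len characters followed by a terminator
   (so *len < cap must hold on entry).
   Returns 1 on success, 0 if the value does not fit; in that case buf
   is left as it was, still terminated. */
static int append_outlier(char *buf, size_t cap, size_t *len, double val)
{
    size_t room = cap - *len;              /* bytes left, including '\0' */
    int n = snprintf(buf + *len, room, "%.0f ", val);

    if (n < 0 || (size_t)n >= room) {      /* error, or output truncated */
        buf[*len] = '\0';                  /* undo any partial write */
        return 0;
    }
    *len += (size_t)n;                     /* advance the end position */
    return 1;
}
```

snprintf returns the number of characters it would have written given enough room, so comparing that value against the remaining space detects truncation without a separate strlen call.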

The line:

outliers[strlen(outliers)] = '\0';

is completely useless. "strlen" starts at the beginning of "outliers",
and counts along until it finds a null character - thus either
"outliers[strlen(outliers)]" is already equal to '\0', or your attempt
at calculating "strlen" with an overrun buffer will lead to more
undefined behaviour.

Post by David Brown
Non-static local variables without initialisers have "indeterminate"
value if there is no initialiser. Trying to use these "indeterminate"
values is undefined behaviour - you have absolutely no control over
what might happen. Any particular behaviour you see is down to luck
from the rest of the code and what happened to be in memory at the time.

In 2024 that's surprising. I can't be the only one to forget to
initialize a char[] variable.

You are not - attempting to use an uninitialised variable is a common
error. That is why C compilers provide warnings about this kind of
thing, along with run-time tools like the sanitizers Ben recommended, to
help find such mistakes. But compiler vendors can't force people to use
such tools and warning flags, nor can the tools find /all/ cases of
errors. At some point, programmers have to take responsibility for
knowing the language they are using, and writing their code correctly.
Good tools and good use of those tools is an aid to careful coding, not
an alternative to it.

Post by David Brown
There is no automatic initialisation of non-static local variables,
because that would often be inefficient.

It would've saved me half an hour of frustration.

And the things you have learned as a result - from your own debugging,
and the threads here - will save you many more hours of frustration in
the future.

There are languages that focus on ease of use and do all the management
of things like strings and buffers, and prevent users from mistakes like
this, at the cost of slower run-times. There are languages that do very
little automatically for the programmer and have absolutely minimal
overheads, for maximal efficiency. C is the latter kind of language.

Remember, while you might see automatic initialisation of local
variables as a negligible overhead, other people might not - I've worked
on C code for microcontrollers where a wasted processor cycle or two is
too much. If your code does not care about such efficiencies, then you
have to question whether C is the right language in the first place. I
believe most modern code that is written in C would be better if it were
written in other higher level languages (precisely because a half hour
of /your/ time is usually more valuable than a few microseconds of your
computer's time).

On the subject of initialisation, I strongly suggest that you do /not/
get in the habit of always initialising your variables to 0 when you
define them. Do that only if 0 is the real, appropriate starting value.
Prefer to avoid declaring the variable at all until you need it, then
define it with its initial value (and consider making it "const" to
reduce the risk of other coding errors). If the structure of the code
requires you to define the variable before you have a value for it,
prefer to leave it without an initial value. Then compiler warnings
have a much better chance of spotting mistakes.

Post by DFS
Now I'm getting 'stack smashing detected' errors (after the program runs
correctly) when using datasets of consecutive numbers.

I think Ben found that buffer overrun for you, and showed you how to
find it yourself in the future.

Post by DFS
hmmmm 2 issues in a row using consecutives - that's a clue!

Post by David Brown
The best way to avoid errors like yours, IMHO, is not to declare such
variables until you have data to put in them - thus you always have a
sensible initialiser of real data. Occasionally that is not
practical, but it works in most cases.

Data is definitely going in them: either the value 'none' or a list of
the outliers and some text.

Now that I have your source code, I can see the error is the way you put
data in - strcat() reads the existing data, it does not just write data.
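
A small sketch of why that matters: strcat() first scans the destination for a null character and only then starts writing. The function name and the buffer contents below are made up for illustration. With an uninitialised array there is no guarantee where, or whether, that null character is found, which would explain results that depend on whatever was left on the stack.

```c
#include <string.h>

/* Hypothetical demonstration; buf must hold at least 8 bytes.
   The buffer is filled so that a terminator sits at index 3 with stale
   characters after it, then strcat() appends to it. */
static void strcat_starts_at_first_nul(char *buf)
{
    memcpy(buf, "abc\0OLD", 8);   /* 'a','b','c','\0','O','L','D','\0' */
    strcat(buf, "def");           /* scans to index 3, then writes there */
}
```

After the call the buffer holds "abcdef": the append position was decided by the existing contents, not by the array's start.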

Post by David Brown
For a data array, zero initialisation is common. Typically you do
int xs[100] = { 0 };
That puts the explicit 0 in the first element of xs, and then the rest
of the array is cleared with zeros.
I recommend never using "char" as a type unless you really mean a
character, limited to 7-bit ASCII. So if your "outliers" array really
is an array of such characters, "char" is fine. If it is intended to
be numbers and for some reason you specifically want 8-bit values, use
"uint8_t" or "int8_t", and initialise with { 0 }.

I did mean characters, limited to: 0-9a-zA-Z()

OK.

Post by DFS
I think I'm using the char variable correctly.
sprintf(tempchar,"%d ",outlier);
strcat(char,tempchar);

Yes. Without your source code, I could only guess.

But see earlier in this post for a suggestion to improve your use of the
variable.

Post by David Brown
A major lesson here is to learn how to use your tools. C is not a
forgiving language. Make use of all the help your tools can give you
- enable warnings here. "gcc -Wall" enables a range of common
warnings with few false positives in normal well-written code,
including ones that check for attempts to read uninitialised data.

I always use -Wall, and I was using it here.

Good. Unfortunately, good though gcc is, it is not perfect. Improving
warnings is a continuous endeavour for the gcc developers, but they
usually have to err on the side of avoiding false positives.

Post by David Brown
"-Wextra" enables a slew of extra warnings. Some of these will annoy
people and trigger on code they find reasonable, while most are good
choices for a lot of code - but personal preference varies
significantly. And remember to enable optimisation, since it makes the
static checking more powerful.

Post by DFS
gcc -Wall -Wextra -O3 mmv2.c -o mmv2 -lm

"-O3" is rarely much use - stick to "-O2" for normal use. The extra
optimisations enabled by "-O3" help in some code, but work worse on
other code due to the increased size, so they should be used with care.
Certainly "-O3" is rarely worth it unless you are also using a "-march="
flag (such as "-fmarch=native") to tune for a particular processor and
enable stuff like vectorisation. Getting the fastest code is more of an
art than a science!

Post by DFS
and no warnings or errors at all.
But: it now aborts near the front when using consecutive data points
(but not randoms).
*** buffer overflow detected ***: terminated
Aborted
I'm actually happy about that. I should be able to find and fix it.

Post by David Brown
If you /really/ want gcc to zero out such local data automatically,
use "-ftrivial-auto-var-init=zero". But it is much better to use
warnings and write correct code - options like that one are an
addition to well-checked code for paranoid software in
security-critical contexts.

Great answer! I can always count on D Brown for excellent advice.
Thank you.

I try :-)

You get the best results by combining the advice from a variety of people
here, along with your own experimentations.

Keith Thompson

2024-06-13 23:47:30 UTC

David Brown <***@hesbynett.no> writes:
[...]

Post by David Brown
Certainly "-O3" is rarely worth it unless you are also using a
"-march=" flag (such as "-fmarch=native") to tune for a particular
processor and enable stuff like vectorisation. Getting the fastest
code is more of an art than a science!

Typo: it's "-march=native".

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

David Brown

2024-06-14 17:13:42 UTC

Post by Keith Thompson
[...]

Post by David Brown
Certainly "-O3" is rarely worth it unless you are also using a
"-march=" flag (such as "-fmarch=native") to tune for a particular
processor and enable stuff like vectorisation. Getting the fastest
code is more of an art than a science!

Typo: it's "-march=native".

Thanks.

Janis Papanagnou

2024-06-12 21:38:55 UTC

Post by DFS
https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
My code compiles and works fine - every stat matches - except for one
anomaly: when using a dataset of consecutive numbers 1 to N, all
values > 40 are flagged as outliers. Up to 40, no problem. Random
numbers dataset of any size: no problem.
And values 41+ definitely don't meet the conditions for outliers (using
the IQR * 1.5 rule).
Very strange.
before: char outliers[100];
after : char outliers[100] = "";
And the problem went away. Reset it to before and problem came back.
Makes no sense. What could cause the program to go FUBAR at data point
41+ only when the dataset is consecutive numbers?
Also, why doesn't gcc just do you a solid and initialize to "" for you?

Yeah, I had a similar problem like you; I had a declaration

char answer[100];

and was surprised that it wasn't initialized with "42".

Seriously; why do you expect [in C] a declaration to initialize that
stack object? (There are other languages that do initializations as
the language defines it, but C doesn't; it may help to learn before
programming in any language?) And why do you think that "" would be
an appropriate initialization (i.e. a single '\0' character) and not
all 100 elements set to '\0'? (Someone else might want to access the
element 'answer[99]'.) And should we pay for initializing 1000000000
characters in case one declares an appropriate huge array?

Janis

Keith Thompson

2024-06-12 22:22:26 UTC

[...]

Post by Janis Papanagnou

Post by DFS
before: char outliers[100];
after : char outliers[100] = "";

[...]

Post by Janis Papanagnou
Seriously; why do you expect [in C] a declaration to initialize that
stack object? (There are other languages that do initializations as
the language defines it, but C doesn't; it may help to learn before
programming in any language?) And why do you think that "" would be
an appropriate initialization (i.e. a single '\0' character) and not
all 100 elements set to '\0'? (Someone else might want to access the
element 'answer[99]'.) And should we pay for initializing 1000000000
characters in case one declares an appropriate huge array?

This:
char outliers[100] = "";
initializes all 100 elements to zero. So does this:
char outliers[100] = { '\0' };
Any elements or members not specified in an initializer are set to zero.

If you want to set an array's 0th element to 0 and not waste time
initializing the rest, you can assign it separately:
char outliers[100];
outliers[0] = '\0';
or
char outliers[100];
strcpy(outliers, "");
though the overhead of the function call is likely to outweigh the
cost of initializing the array.
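
A short sketch of both forms, with hypothetical function names. The first checks that the "" initialiser really does zero all 100 elements; the second gives only element 0 a value, which is enough for strcat() to work because it never reads past the first null character.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if every element of an array initialised with "" is zero. */
static int empty_string_init_zeroes_all(void)
{
    char outliers[100] = "";
    for (size_t i = 0; i < sizeof outliers; i++)
        if (outliers[i] != '\0')
            return 0;
    return 1;
}

/* Sets only element 0, then appends; returns the resulting length. */
static size_t first_element_only(void)
{
    char outliers[100];
    outliers[0] = '\0';          /* elements 1..99 stay indeterminate */
    strcat(outliers, "none");    /* safe: the terminator is at index 0 */
    return strlen(outliers);
}
```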

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

DFS

2024-06-12 22:34:22 UTC

Post by DFS
[...]

Post by Janis Papanagnou

Post by DFS
before: char outliers[100];
after : char outliers[100] = "";

[...]

Post by Janis Papanagnou
Seriously; why do you expect [in C] a declaration to initialize that
stack object? (There are other languages that do initializations as
the language defines it, but C doesn't; it may help to learn before
programming in any language?) And why do you think that "" would be
an appropriate initialization (i.e. a single '\0' character) and not
all 100 elements set to '\0'? (Someone else might want to access the
element 'answer[99]'.) And should we pay for initializing 1000000000
characters in case one declares an appropriate huge array?

char outliers[100] = "";
char outliers[100] = { '\0' };
Any elements or members not specified in an initializer are set to zero.
If you want to set an array's 0th element to 0 and not waste time
initializing the rest, you can assign it separately:
char outliers[100];
outliers[0] = '\0';
or
char outliers[100];
strcpy(outliers, "");
though the overhead of the function call is likely to outweigh the
cost of initializing the array.

Thanks. I'll have to remember these things. I like to use char arrays.

The problem is I don't use C very often, so I don't develop muscle memory.

David Brown

2024-06-13 13:21:54 UTC

[...]

Post by Janis Papanagnou

Post by DFS
before: char outliers[100];
after : char outliers[100] = "";

[...]

Post by Janis Papanagnou
Seriously; why do you expect [in C] a declaration to initialize that
stack object? (There are other languages that do initializations as
the language defines it, but C doesn't; it may help to learn before
programming in any language?) And why do you think that "" would be
an appropriate initialization (i.e. a single '\0' character) and not
all 100 elements set to '\0'? (Someone else might want to access the
element 'answer[99]'.) And should we pay for initializing 1000000000
characters in case one declares an appropriate huge array?

char outliers[100] = "";
char outliers[100] = { '\0' };
Any elements or members not specified in an initializer are set to zero.

Yes. It's good to point that out, since people might assume that using
a string literal here only initialises the bit covered by that string
literal.

(In C23 you can also write "char outliers[100] = {};" to get all zeros.)

If you want to set an array's 0th element to 0 and not waste time
initializing the rest, you can assign it separately:
     char outliers[100];
     outliers[0] = '\0';
or
     char outliers[100];
     strcpy(outliers, "");
though the overhead of the function call is likely to outweigh the
cost of initializing the array.

A good compiler will generate the same code for both cases - strcpy() is
often inlined for such uses.

Thanks. I'll have to remember these things. I like to use char arrays.
The problem is I don't use C very often, so I don't develop muscle memory.

What programming language do you usually use? And why are you writing
in C instead? (Or do you simply not do much programming?)

DFS

2024-06-13 14:38:12 UTC

[...]

Post by Janis Papanagnou

Post by DFS
before: char outliers[100];
after : char outliers[100] = "";

[...]

Post by Janis Papanagnou
Seriously; why do you expect [in C] a declaration to initialize that
stack object? (There are other languages that do initializations as
the language defines it, but C doesn't; it may help to learn before
programming in any language?) And why do you think that "" would be
an appropriate initialization (i.e. a single '\0' character) and not
all 100 elements set to '\0'? (Someone else might want to access the
element 'answer[99]'.) And should we pay for initializing 1000000000
characters in case one declares an appropriate huge array?

char outliers[100] = "";
char outliers[100] = { '\0' };
Any elements or members not specified in an initializer are set to zero.

Yes. It's good to point that out, since people might assume that using
a string literal here only initialises the bit covered by that string
literal.
(In C23 you can also write "char outliers[100] = {};" to get all zeros.)

If you want to set an array's 0th element to 0 and not waste time
initializing the rest, you can assign it separately:
     char outliers[100];
     outliers[0] = '\0';
or
     char outliers[100];
     strcpy(outliers, "");
though the overhead of the function call is likely to outweigh the
cost of initializing the array.

A good compiler will generate the same code for both cases - strcpy() is
often inlined for such uses.

Thanks. I'll have to remember these things. I like to use char arrays.
The problem is I don't use C very often, so I don't develop muscle memory.

What programming language do you usually use? And why are you writing
in C instead? (Or do you simply not do much programming?)

I write a little code every few days. Mostly python.

I like C for its blazing speed. Very addicting. And it's much more
challenging/frustrating than python.

I coded a subset (8 stat measures) of this C program 3.5 years ago, and
recently decided to finish duplicating all 23 stats shown at:

https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php

Working on the outliers code, I decided to add an option to generate
data with consecutive numbers. That's when I ran $./dfs 50 -c and
noticed every value above 40 was considered an outlier. And this didn't
change over a bunch of code edits/file saves/compiles.

Understanding how an uninitialized variable caused that persistent issue
is beyond my pay grade.

That's when I whined to clc. Before I even posted, though, I spotted
the uninitialized var (outliers). Later I spotted another one (mode).

One led to 'undefined behavior', the other to 'stack smashing'. Both
only occurred when using consecutive numbers.

But with y'all's help I believe I found and fixed ALL issues. I can
dream anyway.

David Brown

2024-06-14 17:18:35 UTC

[...]

Post by Janis Papanagnou

Post by DFS
before: char outliers[100];
after : char outliers[100] = "";

[...]

Post by Janis Papanagnou
Seriously; why do you expect [in C] a declaration to initialize that
stack object? (There are other languages that do initializations as
the language defines it, but C doesn't; it may help to learn before
programming in any language?) And why do you think that "" would be
an appropriate initialization (i.e. a single '\0' character) and not
all 100 elements set to '\0'? (Someone else might want to access the
element 'answer[99]'.) And should we pay for initializing 1000000000
characters in case one declares an appropriate huge array?

char outliers[100] = "";
char outliers[100] = { '\0' };
Any elements or members not specified in an initializer are set to zero.

Yes. It's good to point that out, since people might assume that
using a string literal here only initialises the bit covered by that
string literal.
(In C23 you can also write "char outliers[100] = {};" to get all zeros.)

If you want to set an array's 0th element to 0 and not waste time
initializing the rest, you can assign it separately:
     char outliers[100];
     outliers[0] = '\0';
or
     char outliers[100];
     strcpy(outliers, "");
though the overhead of the function call is likely to outweigh the
cost of initializing the array.

A good compiler will generate the same code for both cases - strcpy()
is often inlined for such uses.

Thanks. I'll have to remember these things. I like to use char arrays.
The problem is I don't use C very often, so I don't develop muscle memory.

What programming language do you usually use? And why are you writing
in C instead? (Or do you simply not do much programming?)

I write a little code every few days. Mostly python.

Certainly if I wanted to calculate some statistics from small data sets,
I'd go for Python - I would not consider C unless it was for an
embedded system.

I like C for its blazing speed. Very addicting. And it's much more
challenging/frustrating than python.

With small data sets, Python has blazing speed - /every/ language has
blazing speed. And for large data sets, use numpy on Python and you
/still/ have blazing speeds - a lot faster than anything you would write
in C (because numpy's underlying code is written in C by people who are
much better at writing fast numeric code than you or I).

The only reason to use C for something like this is for the challenge and
fun, which is fair enough.

I coded a subset (8 stat measures) of this C program 3.5 years ago, and
https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
Working on the outliers code, I decided to add an option to generate
data with consecutive numbers. That's when I ran $./dfs 50 -c and
noticed every value above 40 was considered an outlier. And this didn't
change over a bunch of code edits/file saves/compiles.
Understanding how an uninitialized variable caused that persistent issue
is beyond my pay grade.

Understanding that you should not read from a variable that has never
been given a value is well within the pay grade of every programmer.
And it's something that every C programmer should understand. (And now
you understand it too!)

That's when I whined to clc. Before I even posted, though, I spotted
the uninitialized var (outliers). Later I spotted another one (mode).
One led to 'undefined behavior', the other to 'stack smashing'. Both
only occurred when using consecutive numbers.
But with y'all's help I believe I found and fixed ALL issues. I can
dream anyway.

Scott Lurndal

2024-06-14 17:36:19 UTC

Post by David Brown

What programming language do you usually use? And why are you writing
in C instead? (Or do you simply not do much programming?)

I write a little code every few days. Mostly python.

Certainly if I wanted to calculate some statistics from small data sets,
I'd go for Python - I would not consider C unless it was for an
embedded system.

I'd likely turn to R instead of Python for that.

David Brown

2024-06-15 20:15:09 UTC

Post by Scott Lurndal

Post by David Brown

What programming language do you usually use? And why are you writing
in C instead? (Or do you simply not do much programming?)

I write a little code every few days. Mostly python.

Certainly if I wanted to calculate some statistics from small data sets,
I'd go for Python - I would not consider C unless it was for an
embedded system.

I'd likely turn to R instead of Python for that.

The only thing I know about R is that it would be a good choice for
statistics code if I knew R. Since I don't know anything more about R,
I'd go for Python :-)

DFS

2024-06-14 23:05:49 UTC

Post by David Brown

I write a little code every few days. Mostly python.

Certainly if I wanted to calculate some statistics from small data sets,
I'd go for Python - I would not consider C unless it was for an
embedded system.

I like C for its blazing speed. Very addicting. And it's much more
challenging/frustrating than python.

With small data sets, Python has blazing speed - /every/ language has
blazing speed. And for large data sets, use numpy on Python and you
/still/ have blazing speeds - a lot faster than anything you would write
in C (because numpy's underlying code is written in C by people who are
much better at writing fast numeric code than you or I).
The only reason to use C for something like this is for the challenge and
fun, which is fair enough.

It was fun, especially when I got every stat to match the website exactly.

I just now ported that C stats program to python. The original C took
me ~2.5 days to write and test.

The port to python then took about 2 hours.

It mainly consisted of replacing printf with print, removing brackets
{}, changing vars max and min to dmax and dmin, dropping the \n from
printf's, replacing fabs() with abs(), etc.

Line count dropped about 20%.

During conversion, I got a Python error I don't remember seeing in the past:

"TypeError: list indices must be integers or slices, not float"

because division returns a float, and some of the array addressing was
like this: nums[i/2].

My initial fix was this clunk (convert to int()):

# median and quartiles
# quartiles divide sorted dataset into four sections
# Q1 = median of values less than Q2
# Q2 = median of the data set
# Q3 = median of values greater than Q2
if N % 2 == 0:
    Q2 = median = (nums[int((N/2)-1)] + nums[int(N/2)]) / 2.0
    i = int(N/2)
    if i % 2 == 0:
        Q1 = (nums[int((i/2)-1)] + nums[int(i/2)]) / 2.0
        Q3 = (nums[int(i + ((i-1)/2))] + nums[int(i+(i/2))]) / 2.0
    else:
        Q1 = nums[int((i-1)/2)]
        Q3 = nums[int(i + ((i-1)/2))]

if N % 2 != 0:
    Q2 = median = nums[int((N-1)/2)]
    i = int((N-1)/2)
    if i % 2 == 0:
        Q1 = (nums[int((i/2)-1)] + nums[int(i/2)]) / 2.0
        Q3 = (nums[int(i + (i/2))] + nums[int(i + (i/2) + 1)]) / 2.0
    else:
        Q1 = nums[int((i-1)/2)]
        Q3 = nums[int(i + ((i+1)/2))]

And then with some substitution:

if N % 2 == 0:
    i = int(N/2)
    Q2 = median = (nums[i - 1] + nums[i]) / 2.0
    x = int(i/2)
    y = int((i-1)/2)
    if i % 2 == 0:
        Q1 = (nums[x - 1] + nums[x]) / 2.0
        Q3 = (nums[i + y] + nums[i + x]) / 2.0
    else:
        Q1 = nums[y]
        Q3 = nums[i + y]

if N % 2 != 0:
    i = int((N-1)/2)
    Q2 = median = nums[i]
    x = int(i/2)
    y = int((i-1)/2)
    z = int((i+1)/2)
    if i % 2 == 0:
        Q1 = (nums[x - 1] + nums[x]) / 2.0
        Q3 = (nums[i + x] + nums[i + x + 1]) / 2.0
    else:
        Q1 = nums[y]
        Q3 = nums[i + z]

How would you do it?

If you have an easy to apply formula for computing the quartiles, let's
hear it!
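
One possible way to shorten this, sketched in C (the language of the original program) rather than Python. It assumes the same convention as the code above: the data are sorted, and when N is odd the median element is left out of both halves. The names median_sorted and quartiles are hypothetical. Other quartile conventions exist and give different values, so this is only a restatement of the method shown, not a general formula.

```c
#include <stddef.h>

/* Median of n sorted values starting at a; n must be at least 1. */
static double median_sorted(const double *a, size_t n)
{
    return (n % 2) ? a[n / 2] : (a[n / 2 - 1] + a[n / 2]) / 2.0;
}

/* Quartiles of N sorted values (N >= 2). The lower half is the first
   N/2 values and the upper half is the last N/2 values, so the middle
   element is skipped automatically when N is odd. */
static void quartiles(const double *nums, size_t N,
                      double *q1, double *q2, double *q3)
{
    size_t half = N / 2;                           /* size of each half */
    *q1 = median_sorted(nums, half);
    *q2 = median_sorted(nums, N);
    *q3 = median_sorted(nums + (N + 1) / 2, half); /* start of upper half */
}
```

The four even/odd branches collapse into one helper because each quartile is just a median taken over a different slice of the same sorted array.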

Keith Thompson

2024-06-15 01:39:50 UTC

DFS <***@dfs.com> writes:
[...]

Post by DFS
"TypeError: list indices must be integers or slices, not float"
because division returns a float, and some of the array addressing was
like this: nums[i/2].

[...]

C's "/" operator yields a result with the type of the operands (after
promotion to a common type).

Python's "/" operator yields a floating-point result. For C-style
integer division, Python uses "//". (Python 2 is more C-like.)

See https://docs.python.org/3/ or comp.lang.python, which is reasonably
active.
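
A tiny C illustration of the first point, with hypothetical function names: when both operands are int the quotient is an int, truncated toward zero, and making either operand a double gives a floating-point result.

```c
/* Both operands are int: the result is an int, truncated toward zero. */
static int int_half(int i)
{
    return i / 2;
}

/* One operand is double: i is converted and the result is a double. */
static double real_half(int i)
{
    return i / 2.0;
}
```

So int_half(7) is 3 while real_half(7) is 3.5. One difference to keep in mind when porting: Python's // rounds toward negative infinity, so -7 // 2 is -4 there, whereas -7 / 2 in C is -3. For non-negative array indices the two agree.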

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

DFS

2024-06-15 03:49:37 UTC

Post by Keith Thompson
[...]

Post by DFS
"TypeError: list indices must be integers or slices, not float"
because division returns a float, and some of the array addressing was
like this: nums[i/2].

[...]
C's "/" operator yields a result with the type of the operands (after
promotion to a common type).
Python's "/" operator yields a floating-point result. For C-style
integer division, Python uses "//". (Python 2 is more C-like.)

I was surprised python did that, since every division used in the array
addressing results in an integer.

After casting i to an int before any array addressing, // works.

Thanks

Keith Thompson

2024-06-15 03:56:41 UTC

Post by Keith Thompson
[...]

Post by DFS
"TypeError: list indices must be integers or slices, not float"
because division returns a float, and some of the array addressing was
like this: nums[i/2].

[...]
C's "/" operator yields a result with the type of the operands (after
promotion to a common type).
Python's "/" operator yields a floating-point result. For C-style
integer division, Python uses "//". (Python 2 is more C-like.)

I was surprised python did that, since every division used in the
array addressing results in an integer.
After casting i to an int before any array addressing, // works.

I'm surprised you needed to convert i to an int. I would think that
just replacing nums[i/2] by nums[i//2] would do the trick, as long
as i always has an int value (note Python's dynamic typing). If i
is acquiring a float value, that's probably a bug, given the name.

But if you want help with your Python code, comp.lang.python is the
place to ask.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

DFS

2024-06-15 04:45:24 UTC

Post by Keith Thompson

Post by Keith Thompson
[...]

Post by DFS
"TypeError: list indices must be integers or slices, not float"
because division returns a float, and some of the array addressing was
like this: nums[i/2].

[...]
C's "/" operator yields a result with the type of the operands (after
promotion to a common type).
Python's "/" operator yields a floating-point result. For C-style
integer division, Python uses "//". (Python 2 is more C-like.)

I was surprised python did that, since every division used in the
array addressing results in an integer.
After casting i to an int before any array addressing, // works.

I'm surprised you needed to convert i to an int. I would think that
just replacing nums[i/2] by nums[i//2] would do the trick,
as long as i always has an int value (note Python's dynamic typing).
If i is acquiring a float value, that's probably a bug, given the name.

I spotted the issue. Just prior to using i for array addressing I said:
i = N/2.

The fix is set i = int(N/2)

Post by Keith Thompson
But if you want help with your Python code, comp.lang.python is the
place to ask.

Thanks for your help, but David Brown is a Python developer and I'll ask
him python questions here whenever I care to.

In the recent past you were involved in discussions on perl, Fortran and
awk, among other off-topics.

Rules for thee but not for me?

Janis Papanagnou

2024-06-15 05:03:16 UTC

Post by Keith Thompson

Post by DFS
After casting i to an int before any array addressing, // works.

I'm surprised you needed to convert i to an int. I would think that
just replacing nums[i/2] by nums[i//2] would do the trick,
as long as i always has an int value (note Python's dynamic typing).
If i is acquiring a float value, that's probably a bug, given the name.

i = N/2.
The fix is set i = int(N/2)

Given what Keith suggested, and assuming N is an integer, wouldn't it
be more sensible to use the int division operator '//' and just write
i = N // 2 ? I mean, why do a float division on integer operands and
then again coerce the result to int again?

Janis

DFS

2024-06-15 11:39:40 UTC

Post by Janis Papanagnou

Post by Keith Thompson

Post by DFS
After casting i to an int before any array addressing, // works.

I'm surprised you needed to convert i to an int. I would think that
just replacing nums[i/2] by nums[i//2] would do the trick,
as long as i always has an int value (note Python's dynamic typing).
If i is acquiring a float value, that's probably a bug, given the name.

i = N/2.
The fix is set i = int(N/2)

Given what Keith suggested, and assuming N is an integer, wouldn't it
be more sensible to use the int division operator '//' and just write
i = N // 2 ? I mean, why do a float division on integer operands and
then again coerce the result to int again?

Python bytecode
$ python3 -m dis file.py

i = N//2
1068 LOAD_NAME 12 (N)
1070 LOAD_CONST 10 (2)
1072 BINARY_FLOOR_DIVIDE
1074 STORE_NAME 10 (i)

i = int(N/2)
1068 LOAD_NAME 11 (int)
1070 LOAD_NAME 12 (N)
1072 LOAD_CONST 10 (2)
1074 BINARY_TRUE_DIVIDE
1076 CALL_FUNCTION 1
1078 STORE_NAME 10 (i)

Fewer ops is better, so I'll go with your suggestion. Good catch.

James Kuyper

2024-06-15 05:05:16 UTC

...

Post by Keith Thompson
I'm surprised you needed to convert i to an int. I would think that
just replacing nums[i/2] by nums[i//2] would do the trick,
as long as i always has an int value (note Python's dynamic typing).
If i is acquiring a float value, that's probably a bug, given the name.

i = N/2.
The fix is set i = int(N/2)

Alternatively, i = N//2

Post by Keith Thompson
But if you want help with your Python code, comp.lang.python is the
place to ask.

Thanks for your help, but David Brown is a Python developer and I'll ask
him python questions here whenever I care to.

Keep in mind that he's just one Python developer. With all due respect
to David, you're likely to get better answers to your Python questions
by going to a Python forum filled with Python developers.
It's not about "following the rules" - rules are meaningless when
enforcement is impossible, as it is in an unmoderated newsgroup like
this one. It's about getting the best possible answer to your questions.
If you prefer to get lower-quality answers to your Python questions,
continue asking them in forums where they are off-topic - but why would
you prefer that?

Keith Thompson

2024-06-15 05:20:23 UTC

[...]

Post by Keith Thompson
But if you want help with your Python code, comp.lang.python is the
place to ask.

Thanks for your help, but David Brown is a Python developer and I'll
ask him python questions here whenever I care to.
In the recent past you were involved in discussions on perl, Fortran
and awk, among other off-topics.
Rules for thee but not for me?

comp.lang.python is full of Python experts, and you'll get better help
there than here. That's why different newsgroups exist. And if you
start an *extended* discussion of Python here, people will be annoyed.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

bart

2024-06-15 08:37:06 UTC

Post by Keith Thompson
[...]

Post by DFS
"TypeError: list indices must be integers or slices, not float"
because division returns a float, and some of the array addressing was
like this: nums[i/2].

[...]
C's "/" operator yields a result with the type of the operands (after
promotion to a common type).
Python's "/" operator yields a floating-point result. For C-style
integer division, Python uses "//". (Python 2 is more C-like.)

I was surprised python did that, since every division used in the
array addressing results in an integer.
After casting i to an int before any array addressing, // works.

I'm surprised you needed to convert i to an int. I would think that
just replacing nums[i/2] by nums[i//2] would do the trick,
as long as i always has an int value (note Python's dynamic typing).
If i is acquiring a float value, that's probably a bug, given the name.

i = N/2.
The fix is set i = int(N/2)

But if you want help with your Python code, comp.lang.python is the
place to ask.

Thanks for your help, but David Brown is a Python developer and I'll ask
him python questions here whenever I care to.

Yeah do that. Set up a private corner of comp.lang.c where David Brown
has a sideline answering questions about Python from only one poster.

Nobody else is allowed to answer.

Sounds ridiculous, yes?

Post by DFS
In the recent past you were involved in discussions on perl, Fortran and
awk, among other off-topics.
Rules for thee but not for me?

David Brown

2024-06-15 20:22:09 UTC

Post by Keith Thompson
[...]

Post by DFS
"TypeError: list indices must be integers or slices, not float"
because division returns a float, and some of the array addressing was
like this: nums[i/2].

[...]
C's "/" operator yields a result with the type of the operands (after
promotion to a common type).
Python's "/" operator yields a floating-point result. For C-style
integer division, Python uses "//". (Python 2 is more C-like.)

I was surprised python did that, since every division used in the
array addressing results in an integer.
After casting i to an int before any array addressing, // works.

I'm surprised you needed to convert i to an int. I would think that
just replacing nums[i/2] by nums[i//2] would do the trick,
as long as i always has an int value (note Python's dynamic typing).
If i is acquiring a float value, that's probably a bug, given the name.

i = N/2.
The fix is set i = int(N/2)

But if you want help with your Python code, comp.lang.python is the
place to ask.

Thanks for your help, but David Brown is a Python developer and I'll ask
him python questions here whenever I care to.

I consider myself more of a C developer than a Python developer, but I
use Python regularly. I would say that my knowledge of the C language
and standard, while not as deep as some others here, covers a far higher
proportion of the language than my knowledge of Python covers of Python.
But I think you can make good use of Python while knowing a smaller
fraction of the language and library than for C.

Post by DFS
In the recent past you were involved in discussions on perl, Fortran and
awk, among other off-topics.
Rules for thee but not for me?

If occasional questions or discussions about other languages pop up
here, people will often answer them. But for more in-depth discussions
or questions, this is not the newsgroup - comp.lang.python is the place
for Python questions. (You'll also probably get better answers there
than I can give.)

The rules are for everyone, but they are a bit fuzzy. (And different
posters have different levels of fuzziness.)

Janis Papanagnou

2024-06-13 00:19:59 UTC

Post by DFS
char outliers[100] = "";
char outliers[100] = { '\0' };
Any elements or members not specified in an initializer are set to zero.

Oops! This surprised me. (But you are right.) The overhead isn't
[syntactically] obvious, but I'm anyway always setting a single
'\0' character if I want to store strings in a 'char[]' and have
it initialized to an empty string (like below).

Post by DFS
If you want to set an array's 0th element to 0 and not waste time
char outliers[100];
outliers[0] = '\0';
or
char outliers[100];
strcpy(outliers, "");
though the overhead of the function call is likely to outweigh the
cost of initializing the array.

It wouldn't occur to me to use the strcpy() function, but is the
function call really that expensive in C ?

Janis

David Brown

2024-06-13 13:28:20 UTC

Post by Janis Papanagnou

Post by DFS
char outliers[100] = "";
char outliers[100] = { '\0' };
Any elements or members not specified in an initializer are set to zero.

Oops! This surprised me. (But you are right.) The overhead isn't
[syntactically] obvious, but I'm anyway always setting a single
'\0' character if I want to store strings in a 'char[]' and have
it initialized to an empty string (like below).

Post by DFS
If you want to set an array's 0th element to 0 and not waste time
char outliers[100];
outliers[0] = '\0';
or
char outliers[100];
strcpy(outliers, "");
though the overhead of the function call is likely to outweigh the
cost of initializing the array.

It wouldn't occur to me to use the strcpy() function, but is the
function call really that expensive in C ?

That depends on your toolchain.

If you are using a Windows-based compiler with an external DLL for the C
library and the compiler doesn't handle the strcpy() directly, then it
can be quite a lot of overhead. You have the call to the DLL, which
involves a few steps of indirection. The library strcpy() may be
optimised for handling large strings, and may save and restore a lot of
registers (such as SIMD vector registers).

If you are using a compiler (whatever the platform) that optimises
"strcpy", it will generate identical code to "outliers[0] = '\0';".
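
As a rough illustration of the two forms being compared (the wrapper names `clear_by_store` and `clear_by_strcpy` are hypothetical, chosen just for this sketch): both leave the buffer holding an empty string and neither touches anything past the first element. Whether a given compiler emits the same machine code for both depends on the compiler and its options, so inspect the generated assembly (for example with gcc's `-S`) rather than assume it.

```c
#include <string.h>

/* Make buf an empty string by storing the terminator directly. */
void clear_by_store(char *buf)
{
    buf[0] = '\0';
}

/* Same effect through the library: strcpy copies only the
   terminating '\0' of the empty string literal. */
void clear_by_strcpy(char *buf)
{
    strcpy(buf, "");
}
```

Either way, elements 1 to 99 of a `char outliers[100];` keep whatever they held before, which is fine as long as the code only treats the array as a string.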

Keith Thompson

2024-06-12 21:57:19 UTC

Post by DFS
https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php

And where is that C program? Do you expect us to help debug it without
seeing your code?

Post by DFS
My code compiles and works fine - every stat matches - except for one
anomaly: when using a dataset of consecutive numbers 1 to N, all
values > 40 are flagged as outliers. Up to 40, no problem. Random
numbers dataset of any size: no problem.
And values 41+ definitely don't meet the conditions for outliers
(using the IQR * 1.5 rule).
Very strange.
before: char outliers[100];
after : char outliers[100] = "";
And the problem went away. Reset it to before and problem came back.

Great, you've found the problem and solved it.

See question 1.30 of the comp.lang.c FAQ, <https://www.c-faq.com/>.

Post by DFS
Makes no sense. What could cause the program to go FUBAR at data
point 41+ only when the dataset is consecutive numbers?

Without seeing your code, we can't really tell what's going on, but
assuming that your `outliers` array has automatic storage duration
(i.e., is defined inside a function definition without `static`), its
initial value is indeterminate. It's not random; it's garbage. It
might be consistent from one run of your program to the next, just
because it gets its initial value from whatever is in memory when the
object is allocated. It might even happen to be all zeros, though
that's apparently not what happened in your case.

Post by DFS
Also, why doesn't gcc just do you a solid and initialize to "" for you?

Because you didn't ask it to. The language says that static objects
(ones defined at file scope or with the `static` keyword) are
initialized to zero (that's a bit of an oversimplification; I'm skipping
over what "zero" means in this context). Automatic (local) objects
without initializers start with garbage values.
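
A minimal sketch of the static case, with invented names (`static_buf`, `all_zero`, `static_buf_is_zero`). It deliberately checks only the file-scope array; doing the same check on an uninitialized automatic array would mean reading indeterminate values, which is exactly the problem under discussion.

```c
#include <stddef.h>

/* Static storage duration, no initializer: every element starts as 0. */
static char static_buf[100];

/* Returns 1 if every byte of buf[0..n-1] is zero, else 0. */
int all_zero(const char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}

/* Checks the file-scope array above; never reads an
   uninitialized automatic object. */
int static_buf_is_zero(void)
{
    return all_zero(static_buf, sizeof static_buf);
}
```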

Initializing a character array with a string literal initializes the
entire array. Characters beyond the length of the string literal are
initialized to 0 ('\0').
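
For example (a sketch; the function names are made up for illustration), counting the nonzero bytes of an automatic array right after a string-literal initializer shows that only the literal's own characters are nonzero:

```c
#include <stddef.h>

/* Nonzero bytes in a 100-char array initialized with "":
   the literal supplies the first '\0', the rest are set to 0 too. */
size_t nonzero_after_empty_literal(void)
{
    char outliers[100] = "";
    size_t count = 0;
    for (size_t i = 0; i < sizeof outliers; i++)
        if (outliers[i] != '\0')
            count++;
    return count;
}

/* Same check with a short literal: 'a', 'b', then six '\0'. */
size_t nonzero_after_short_literal(void)
{
    char buf[8] = "ab";
    size_t count = 0;
    for (size_t i = 0; i < sizeof buf; i++)
        if (buf[i] != '\0')
            count++;
    return count;
}
```

The first function returns 0 and the second returns 2, since every element not covered by the literal is initialized to zero.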

Compilers can sometimes warn you when your code depends on indeterminate
values. I'd expect gcc to do so with the right options (try "-Wall").

Initializing automatic objects to all-bits-zero might be useful,
and I think some compilers might offer that as an option. But it
could substantially hurt performance. If you're careful enough to
write code that never depends on the values of uninitialized objects,
zero-initialization is a waste of time. Initializing static objects
to zero is cheap; it's done at program load time.

There are languages that make it difficult, or even impossible, to
read uninitialized objects. C is not such a language. It places a
greater burden on the programmer for the sake of runtime performance.
And yes, C's behavior here is a source of bugs that are often
difficult to diagnose, though compiler warnings can be very helpful
if you enable them.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

bart

2024-06-13 09:43:51 UTC

Post by DFS
https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
My code compiles and works fine - every stat matches - except for one
anomaly: when using a dataset of consecutive numbers 1 to N, all values
> 40 are flagged as outliers. Up to 40, no problem. Random numbers
dataset of any size: no problem.
And values 41+ definitely don't meet the conditions for outliers (using
the IQR * 1.5 rule).
Very strange.
before: char outliers[100];
after : char outliers[100] = "";
And the problem went away. Reset it to before and problem came back.
Makes no sense. What could cause the program to go FUBAR at data point
41+ only when the dataset is consecutive numbers?

I assume outliers is inside a function.

What are the 100 values of outliers if you don't initialise it? You can
try printing them out (as individual numbers not as a string) although
just doing that, and adding that extra code, may change the actual values.

However that doesn't matter if it still goes wrong; you may still get a
hint as to why it's behaving as it is.

Post by DFS
Also, why doesn't gcc just do you a solid and initialize to "" for you?

Initialising to "" will zero the entire array. You really want the
compiler to do that work, even when you're going to overwrite it anyway?

Bonita Montero

2024-06-13 09:45:08 UTC

Post by DFS
https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
My code compiles and works fine - every stat matches - except for one
anomaly: when using a dataset of consecutive numbers 1 to N, all values
> 40 are flagged as outliers. Up to 40, no problem. Random numbers
dataset of any size: no problem.
And values 41+ definitely don't meet the conditions for outliers (using
the IQR * 1.5 rule).
Very strange.
before: char outliers[100];
after : char outliers[100] = "";
And the problem went away. Reset it to before and problem came back.
Makes no sense. What could cause the program to go FUBAR at data point
41+ only when the dataset is consecutive numbers?
Also, why doesn't gcc just do you a solid and initialize to "" for you?

Maybe it's because you have too much uninitialized data, which doesn't
touch the stack's guard page, and the initialized variables are beyond
that and not in the guard page, in the uninitialized area of the stack.
