Discussion:
Whaddaya think?
DFS
2024-06-15 19:36:22 UTC
I want to read numbers in from a file, say:

47 185 99 74 202 118 78 203 264 207 19 17 34 167 148 54 297 271 118 245
294 188 140 134 251 188 236 160 48 189 228 94 74 27 168 275 144 245 178
108 152 197 125 185 63 272 239 60 242 56 4 235 244 144 69 195 32 4 54 79
193 282 173 267 8 40 241 152 285 119 259 136 15 83 21 78 55 259 137 297
15 141 232 259 285 300 153 16 4 207 95 197 188 267 164 195 7 104 47 291


This code:
1 opens the file
2 fscanf thru the file to count the number of data points
3 allocate memory
4 rewind and fscanf again to add the data to the int array


Any issues with this method?

Any 'better' way?

Thanks


----------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {

int N=0, i=0, j=0;
int *nums;

FILE* datafile = fopen(argv[1], "r");
while(fscanf(datafile, "%d", &j) != EOF){
N++;
}

nums = calloc(N, sizeof(int));
rewind(datafile);
while(fscanf(datafile, "%d", &j) != EOF){
nums[i++] = j;
}
fclose (datafile);
printf("\n");

for(i=0;i<N;i++) {
printf("%d. %d\n", i+1, nums[i]);
}
printf("\n");
free(nums);
return(0);

}
----------------------------------------------------------
Malcolm McLean
2024-06-15 21:33:00 UTC
Post by DFS
47 185 99 74 202 118 78 203 264 207 19 17 34 167 148 54 297 271 118 245
294 188 140 134 251 188 236 160 48 189 228 94 74 27 168 275 144 245 178
108 152 197 125 185 63 272 239 60 242 56 4 235 244 144 69 195 32 4 54 79
193 282 173 267 8 40 241 152 285 119 259 136 15 83 21 78 55 259 137 297
15 141 232 259 285 300 153 16 4 207 95 197 188 267 164 195 7 104 47 291
1 opens the file
2 fscanf thru the file to count the number of data points
3 allocate memory
4 rewind and fscanf again to add the data to the int array
Any issues with this method?
Any 'better' way?
Thanks
----------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
    int N=0, i=0, j=0;
    int *nums;
    FILE* datafile = fopen(argv[1], "r");
    while(fscanf(datafile, "%d", &j) != EOF){
        N++;
    }
    nums = calloc(N, sizeof(int));
    rewind(datafile);
    while(fscanf(datafile, "%d", &j) != EOF){
        nums[i++] = j;
    }
    fclose (datafile);
    printf("\n");
    for(i=0;i<N;i++) {
        printf("%d. %d\n", i+1, nums[i]);
    }
    printf("\n");
    free(nums);
    return(0);
}
----------------------------------------------------------
Some files can't be rewound. Whilst C doesn't have dynamic arrays,
languages that do support them usually build them on top of the C-linked
realloc() function. It's tedious, but not hard to do.
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
Ben Bacarisse
2024-06-15 22:03:10 UTC
47 185 99 74 202 118 78 203 264 207 19 17 34 167 148 54 297 271 118 245 294
188 140 134 251 188 236 160 48 189 228 94 74 27 168 275 144 245 178 108 152
197 125 185 63 272 239 60 242 56 4 235 244 144 69 195 32 4 54 79 193 282
173 267 8 40 241 152 285 119 259 136 15 83 21 78 55 259 137 297 15 141 232
259 285 300 153 16 4 207 95 197 188 267 164 195 7 104 47 291
1 opens the file
2 fscanf thru the file to count the number of data points
3 allocate memory
4 rewind and fscanf again to add the data to the int array
Any issues with this method?
There are two issues: (1) you end up with a program that can't be
"piped" to (because the input can't be rewound), and (2) the file might
change between counting and reading. How much either matters will
depend on the context. I like piping to programs so (1) would bother
me.
Any 'better' way?
I'd allocate the array on the fly. It's one of those things that, once
you've done it, becomes a stock bit of coding. In fact, you can write a
simple dynamic array module, and use it again and again.
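A minimal sketch of what such a module might look like. The names int_vec, int_vec_push and int_vec_free are hypothetical (they are not from this thread), and the starting capacity of 16 is an arbitrary assumption:

```c
#include <stdlib.h>

/* Hypothetical growable array of int; all names are illustrative. */
struct int_vec {
    int *data;    /* storage, NULL while empty */
    size_t len;   /* elements in use */
    size_t cap;   /* elements allocated */
};

/* Append one value. Returns 0 on success, -1 if memory runs out
   (the existing contents stay valid in that case). */
int int_vec_push(struct int_vec *v, int value)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 16;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = value;
    return 0;
}

/* Release the storage and reset the vector to its empty state. */
void int_vec_free(struct int_vec *v)
{
    free(v->data);
    v->data = NULL;
    v->len = v->cap = 0;
}
```

A vector starts out as `struct int_vec v = {0};` and the read loop then becomes `while (fscanf(fp, "%d", &value) == 1) int_vec_push(&v, value);` plus whatever error handling is wanted.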
----------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
int N=0, i=0, j=0;
int *nums;
FILE* datafile = fopen(argv[1], "r");
while(fscanf(datafile, "%d", &j) != EOF){
It's always better to loop while fscanf succeeds rather than trying to
handle all the errors. You might not care about the case where this loop
fails, but it's just better to get into the right habit:

while (fscanf(datafile, "%d", &j) == 1) ...
nums = calloc(N, sizeof(int));
The cost is low, but there's no need to use calloc here as you are going
to assign exactly N locations.
rewind(datafile);
while(fscanf(datafile, "%d", &j) != EOF){
nums[i++] = j;
}
As above, though I'd read into &nums[i] directly.
fclose (datafile);
printf("\n");
for(i=0;i<N;i++) {
printf("%d. %d\n", i+1, nums[i]);
}
printf("\n");
free(nums);
return(0);
Because I have acquired the habit, I'd also check for errors,
particularly on argc, fopen and malloc.
}
----------------------------------------------------------
--
Ben.
bart
2024-06-15 23:22:34 UTC
Post by Ben Bacarisse
47 185 99 74 202 118 78 203 264 207 19 17 34 167 148 54 297 271 118 245 294
188 140 134 251 188 236 160 48 189 228 94 74 27 168 275 144 245 178 108 152
197 125 185 63 272 239 60 242 56 4 235 244 144 69 195 32 4 54 79 193 282
173 267 8 40 241 152 285 119 259 136 15 83 21 78 55 259 137 297 15 141 232
259 285 300 153 16 4 207 95 197 188 267 164 195 7 104 47 291
1 opens the file
2 fscanf thru the file to count the number of data points
3 allocate memory
4 rewind and fscanf again to add the data to the int array
Any issues with this method?
There are two issues: (1) you end up with a program that can't be
"piped" to (because the input can't be rewound), and (2) the file might
change between counting and reading.
It might change even while you're reading it once.
Ben Bacarisse
2024-06-16 09:30:43 UTC
Post by bart
Post by Ben Bacarisse
47 185 99 74 202 118 78 203 264 207 19 17 34 167 148 54 297 271 118 245 294
188 140 134 251 188 236 160 48 189 228 94 74 27 168 275 144 245 178 108 152
197 125 185 63 272 239 60 242 56 4 235 244 144 69 195 32 4 54 79 193 282
173 267 8 40 241 152 285 119 259 136 15 83 21 78 55 259 137 297 15 141 232
259 285 300 153 16 4 207 95 197 188 267 164 195 7 104 47 291
1 opens the file
2 fscanf thru the file to count the number of data points
3 allocate memory
4 rewind and fscanf again to add the data to the int array
Any issues with this method?
There are two issues: (1) you end up with a program that can't be
"piped" to (because the input can't be rewound), and (2) the file might
change between counting and reading.
It might change even while you're reading it once.
Your program will see the data it sees -- in that sense the file does
not change. When there are two (or more) phases to the input, your
program has to handle some new error conditions that are logically
avoided by just reading what's available (even if, to some outside
observer, it's "changing").
--
Ben.
DFS
2024-06-16 15:52:30 UTC
Post by Ben Bacarisse
47 185 99 74 202 118 78 203 264 207 19 17 34 167 148 54 297 271 118 245 294
188 140 134 251 188 236 160 48 189 228 94 74 27 168 275 144 245 178 108 152
197 125 185 63 272 239 60 242 56 4 235 244 144 69 195 32 4 54 79 193 282
173 267 8 40 241 152 285 119 259 136 15 83 21 78 55 259 137 297 15 141 232
259 285 300 153 16 4 207 95 197 188 267 164 195 7 104 47 291
1 opens the file
2 fscanf thru the file to count the number of data points
3 allocate memory
4 rewind and fscanf again to add the data to the int array
Any issues with this method?
There are two issues: (1) you end up with a program that can't be
"piped" to (because the input can't be rewound), and (2) the file might
change between counting and reading. How much either matters will
depend on the context. I like piping to programs so (1) would bother
me.
Any 'better' way?
I'd allocate the array on the fly. It's one of those things that, once
you've done it, becomes a stock bit of coding. In fact, you can write a
simple dynamic array module, and use it again and again.
----------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
int N=0, i=0, j=0;
int *nums;
FILE* datafile = fopen(argv[1], "r");
while(fscanf(datafile, "%d", &j) != EOF){
It's always better to loop while fscanf succeeds rather than trying to
handle all the errors. You might not care about the case where this loop
fails, but it's just better to get into the right habit:
while (fscanf(datafile, "%d", &j) == 1) ...
nums = calloc(N, sizeof(int));
The cost is low, but there's no need to use calloc here as you are going
to assign exactly N locations.
rewind(datafile);
while(fscanf(datafile, "%d", &j) != EOF){
nums[i++] = j;
}
As above, though I'd read into &nums[i] directly.
fclose (datafile);
printf("\n");
for(i=0;i<N;i++) {
printf("%d. %d\n", i+1, nums[i]);
}
printf("\n");
free(nums);
return(0);
Because I have acquired the habit, I'd also check for errors,
particularly on argc, fopen and malloc.
}
----------------------------------------------------------
Thanks for the tips.

I'm not into error checking on my personal code. But I am into brief
and efficient.

New effort
* dropped 2 variables
* allocate 'on the fly'
* one fscanf thru the file
* 4 less lines of code

----------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {

int N=0;
int *nums = malloc(2 * sizeof(int));

FILE* datafile = fopen(argv[1], "r");
while(fscanf(datafile, "%d", &nums[N++]) == 1){
nums = realloc(nums, (N+1) * sizeof(int));
}
fclose (datafile);

N--;
for(int i=0;i<N;i++) {
printf("%d.%d ", i+1, nums[i]);
}
free(nums);

printf("\n");
return 0;

}
----------------------------------------------------------




original 19 lines not incl close brackets
----------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {

    int N=0, i=0, j=0;
    int *nums;

    FILE* datafile = fopen(argv[1], "r");
    while(fscanf(datafile, "%d", &j) != EOF){
        N++;
    }

    nums = calloc(N, sizeof(int));
    rewind(datafile);
    while(fscanf(datafile, "%d", &j) != EOF){
        nums[i++] = j;
    }
    fclose (datafile);
    printf("\n");

    for(i=0;i<N;i++) {
        printf("%d. %d\n", i+1, nums[i]);
    }
    printf("\n");
    free(nums);
    return(0);

}
----------------------------------------------------------
Ben Bacarisse
2024-06-16 23:17:18 UTC
Post by DFS
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
int N=0;
int *nums = malloc(2 * sizeof(int));
FILE* datafile = fopen(argv[1], "r");
while(fscanf(datafile, "%d", &nums[N++]) == 1){
nums = realloc(nums, (N+1) * sizeof(int));
}
fclose (datafile);
N--;
This N-- is a bit "tricksy". Better to increment in the realloc (or the
while body) so it only happens when an int has been read.
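One way that suggestion could look, as a sketch only. The function name read_all and its out-parameters are hypothetical; it still grows the array by one element per value, as in the posted code, but the count is bumped only after a value has actually been read, so no trailing N-- is needed:

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads ints from fp until fscanf stops matching.
   On success returns 0 and stores the array in *out and the count in *n
   (the caller frees *out). Returns -1 if an allocation fails. */
int read_all(FILE *fp, int **out, size_t *n)
{
    int *nums = NULL;
    size_t count = 0;
    int value;

    while (fscanf(fp, "%d", &value) == 1) {
        int *p = realloc(nums, (count + 1) * sizeof *p);
        if (p == NULL) {
            free(nums);
            return -1;
        }
        nums = p;
        nums[count++] = value;   /* counted only once a value was read */
    }
    *out = nums;
    *n = count;
    return 0;
}
```

Here the count always equals the number of data points, which is what the rest of a stats program would want N to be.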
--
Ben.
DFS
2024-06-17 12:49:51 UTC
Post by Ben Bacarisse
Post by DFS
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
int N=0;
int *nums = malloc(2 * sizeof(int));
FILE* datafile = fopen(argv[1], "r");
while(fscanf(datafile, "%d", &nums[N++]) == 1){
nums = realloc(nums, (N+1) * sizeof(int));
}
fclose (datafile);
N--;
This N-- is a bit "tricksy". Better to increment in the realloc (or the
while body) so it only happens when an int has been read.
I don't like it either, but I've already spent about 3 full days on the
whole stats program, and life's too short. N has to be the number of
data points, because it's used throughout the rest of the program.
Keith Thompson
2024-06-15 22:22:06 UTC
Post by DFS
47 185 99 74 202 118 78 203 264 207 19 17 34 167 148 54 297 271 118
245 294 188 140 134 251 188 236 160 48 189 228 94 74 27 168 275 144
245 178 108 152 197 125 185 63 272 239 60 242 56 4 235 244 144 69 195
32 4 54 79 193 282 173 267 8 40 241 152 285 119 259 136 15 83 21 78 55
259 137 297 15 141 232 259 285 300 153 16 4 207 95 197 188 267 164 195
7 104 47 291
1 opens the file
2 fscanf thru the file to count the number of data points
3 allocate memory
4 rewind and fscanf again to add the data to the int array
Any issues with this method?
Any 'better' way?
Thanks
In a quick test, your code compiles without errors and runs correctly
with your input. I do get a warning about argc being unused, which you
should address.
Post by DFS
----------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
int N=0, i=0, j=0;
The usual convention is to use all-caps for macro names. Calling your
variable N is not a real problem, but could be slightly confusing.

N is the number of integers in the input. i is an index. j is a value
read from the file. That's not at all clear from the names.

I suggest using longer and more descriptive names in lower case.
"N" could be "count". "i" is fine for an index, but "j" could be
"value".

Consider using size_t rather than int for the count and index. That's
mostly a style point; it's not going to make any practical difference
unless you have at least INT_MAX elements.
Post by DFS
int *nums;
FILE* datafile = fopen(argv[1], "r");
Undefined behavior if no argument was provided, i.e., argc < 2.
Post by DFS
while(fscanf(datafile, "%d", &j) != EOF){
Numeric input with the *scanf functions has undefined behavior if the
scanned value is outside the range of the target type. For example, if
the input contains "99999999999999999999999999999999999999999999999999",
arbitrary bad things could happen. (Most likely it will just store some
incorrect value in j, with no indication that there was an error.)

strtol is trickier to use, but you can detect errors.
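A hedged sketch of the strtol approach. The helper name parse_int and its return convention are assumptions made for illustration; the errno, end-pointer and range checks are the standard way to detect strtol failures:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Converts the start of s to an int (hypothetical helper).
   Returns 0 on success, storing the value in *out and setting *end just
   past the digits. Returns -1 if no digits were found or the value does
   not fit in an int. */
int parse_int(const char *s, int *out, char **end)
{
    errno = 0;
    long v = strtol(s, end, 10);
    if (*end == s)
        return -1;                 /* no conversion performed */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -1;                 /* out of range for int */
    *out = (int)v;
    return 0;
}
```

To use it on a file, one would read a line at a time with fgets and call parse_int repeatedly, advancing the start pointer to *end after each success.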

fscanf returns EOF on reaching the end of the file or on a read error,
and that's the only condition you check. It returns the number of items
scanned. If the input doesn't contain a string that can be interpreted
as an integer, fscanf will return 0, and you'll be stuck in an infinite
loop. `while (fscanf(...) == 1)` is more robust, but it doesn't
distinguish between a read error and bad data. It's up to you how and
whether to distinguish among different kinds of errors.
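For anyone who does want to tell the cases apart, a small sketch. The enum and the function name count_ints are hypothetical; the logic relies only on fscanf's return value and ferror:

```c
#include <stdio.h>

/* Why the scan loop stopped (illustrative names). */
enum scan_end { SCAN_EOF, SCAN_READ_ERROR, SCAN_BAD_DATA };

/* Counts the ints in fp and reports why scanning ended. */
enum scan_end count_ints(FILE *fp, size_t *count)
{
    int value, rc;

    *count = 0;
    while ((rc = fscanf(fp, "%d", &value)) == 1)
        ++*count;

    if (rc == 0)
        return SCAN_BAD_DATA;      /* next token is not an integer */
    /* rc == EOF: either a genuine end of file or a read error */
    return ferror(fp) ? SCAN_READ_ERROR : SCAN_EOF;
}
```

Because the loop runs only while fscanf returns 1, a stray non-numeric token ends the loop instead of spinning forever.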

Your sample input consists of decimal integers with no sign. Decide
whether you want to handle "-123" or "+123". (fscanf will do so; so will
strtol.)
Post by DFS
N++;
}
nums = calloc(N, sizeof(int));
Consider using `sizeof *nums` rather than `sizeof(int)`. That way you
don't have to change the type in two places if the element type changes.

You'll be updating all the elements of the nums array, so there's not
much point in zeroing it. If you use malloc:

nums = malloc(N * sizeof *nums);

Whether you use calloc() or malloc(), you should check the return
value. If it returns a null pointer, it means the allocation failed.
Aborting the program is probably a good way to handle it.
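As an illustration of that check, a sketch of a wrapper that aborts on failure. The name alloc_ints_or_die is made up for this example, and the overflow guard on the size calculation is an extra precaution not mentioned above:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocates room for n ints, or prints a message and exits
   (hypothetical helper). */
int *alloc_ints_or_die(size_t n)
{
    int *p = NULL;

    /* Skip the call entirely if n * sizeof *p would overflow. */
    if (n <= SIZE_MAX / sizeof *p)
        p = malloc(n * sizeof *p);

    if (p == NULL && n != 0) {
        fprintf(stderr, "cannot allocate %zu ints\n", n);
        exit(EXIT_FAILURE);
    }
    return p;
}
```

The call site then shrinks to `nums = alloc_ints_or_die(N);` with no further checking needed there.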

(There are complications on Linux-based systems which I won't get into
here. Google "OOM killer" and "overcommit" for details.)
Post by DFS
rewind(datafile);
This can fail if the input file is not seekable. For example, on a
Linux-based system you could do something like:
./your_program /dev/stdin < file
Perhaps that's an acceptable restriction, but be aware of it.
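Since rewind() returns nothing, a program that wants to notice the failure can use fseek() instead. A minimal sketch, with try_rewind being a hypothetical name:

```c
#include <stdio.h>

/* Returns 1 if fp could be repositioned to its start, 0 otherwise
   (for example when fp refers to a pipe or a terminal). */
int try_rewind(FILE *fp)
{
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return 0;
    /* rewind() also clears the error indicator; a successful fseek()
       only clears the end-of-file indicator, so do the rest here. */
    clearerr(fp);
    return 1;
}
```

If try_rewind() returns 0, the two-pass approach cannot work on that stream and the program can report it rather than silently reading nothing on the second pass.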
Post by DFS
while(fscanf(datafile, "%d", &j) != EOF){
Again, UB for out of range values.

It's not guaranteed that you'll get the same data the second time you
read the file; some other process could modify it. This might not be
worth worrying about.
Post by DFS
nums[i++] = j;
}
fclose (datafile);
printf("\n");
You haven't produced any output yet; why print a blank line? (Of course
you can if you want to.)
Post by DFS
for(i=0;i<N;i++) {
printf("%d. %d\n", i+1, nums[i]);
}
printf("\n");
free(nums);
return(0);
A minor style point: a return statement doesn't require parentheses.
IMHO using parentheses makes it look too much like a function call. I'd
write `return 0;`, or more likely I'd just omit it, since falling off
the end of main does an implicit `return 0;` (starting in C99).
Post by DFS
}
A method that doesn't require rescanning the input file is to initially
allocate some reasonable amount of memory, then use realloc() to
expand the array as needed. Doubling the array size is probably
reasonable. It will consume more memory than a single allocation.
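A sketch of the doubling scheme, under some assumptions: the function name read_ints_doubling and the initial capacity of 64 are arbitrary choices for illustration, and the final realloc() that trims the buffer is one possible way to hand back the extra memory once the count is known:

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads every int from fp into a buffer that doubles when full, then
   trims it to fit. Returns 0 on success (caller frees *out), or -1 if
   an allocation fails. */
int read_ints_doubling(FILE *fp, int **out, size_t *n)
{
    size_t cap = 64, len = 0;
    int *buf = malloc(cap * sizeof *buf);
    int value;

    if (buf == NULL)
        return -1;

    while (fscanf(fp, "%d", &value) == 1) {
        if (len == cap) {
            int *p = realloc(buf, 2 * cap * sizeof *p);
            if (p == NULL) {
                free(buf);
                return -1;
            }
            buf = p;
            cap *= 2;
        }
        buf[len++] = value;
    }

    /* Give back the unused tail; keep the old block if shrinking fails. */
    if (len > 0) {
        int *p = realloc(buf, len * sizeof *p);
        if (p != NULL)
            buf = p;
    }
    *out = buf;
    *n = len;
    return 0;
}
```

Because it never seeks, the same function works on stdin, so the program can be piped to.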
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
DFS
2024-06-16 16:20:07 UTC
Post by Keith Thompson
Post by DFS
47 185 99 74 202 118 78 203 264 207 19 17 34 167 148 54 297 271 118
245 294 188 140 134 251 188 236 160 48 189 228 94 74 27 168 275 144
245 178 108 152 197 125 185 63 272 239 60 242 56 4 235 244 144 69 195
32 4 54 79 193 282 173 267 8 40 241 152 285 119 259 136 15 83 21 78 55
259 137 297 15 141 232 259 285 300 153 16 4 207 95 197 188 267 164 195
7 104 47 291
1 opens the file
2 fscanf thru the file to count the number of data points
3 allocate memory
4 rewind and fscanf again to add the data to the int array
Any issues with this method?
Any 'better' way?
Thanks
In a quick test, your code compiles without errors and runs correctly
with your input. I do get a warning about argc being unused, which you
should address.
-Wall doesn't warn about that, but -Wall -Wextra does.

In the bigger program of which this is a part, argc IS used.
Post by Keith Thompson
Post by DFS
----------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
int N=0, i=0, j=0;
The usual convention is to use all-caps for macro names. Calling your
variable N is not a real problem, but could be slightly confusing.
N is the number of integers in the input. i is an index. j is a value
read from the file. That's not at all clear from the names.
I suggest using longer and more descriptive names in lower case.
"N" could be "count". "i" is fine for an index, but "j" could be
"value".
N is used in statistics, and this is a stats program.
Post by Keith Thompson
Consider using size_t rather than int for the count and index. That's
mostly a style point; it's not going to make any practical difference
unless you have at least INT_MAX elements.
Post by DFS
int *nums;
FILE* datafile = fopen(argv[1], "r");
Undefined behavior if no argument was provided, i.e., argc < 2.
Post by DFS
while(fscanf(datafile, "%d", &j) != EOF){
Numeric input with the *scanf functions has undefined behavior if the
scanned value is outside the range of the target type. For example, if
the input contains "99999999999999999999999999999999999999999999999999",
arbitrary bad things could happen. (Most likely it will just store some
incorrect value in j, with no indication that there was an error.)
strtol is trickier to use, but you can detect errors.
fscanf returns EOF on reaching the end of the file or on a read error,
and that's the only condition you check. It returns the number of items
scanned. If the input doesn't contain a string that can be interpreted
as an integer, fscanf will return 0, and you'll be stuck in an infinite
loop. `while (fscanf(...) == 1)` is more robust, but it doesn't
distinguish between a read error and bad data. It's up to you how and
whether to distinguish among different kinds of errors.
Your sample input consists of decimal integers with no sign. Decide
whether you want to handle "-123" or "+123". (fscanf will do so; so will
strtol.)
A change I might make down the road is to process positive floats. For
now it's just positive ints.
Post by Keith Thompson
Post by DFS
N++;
}
nums = calloc(N, sizeof(int));
Consider using `sizeof *nums` rather than `sizeof(int)`. That way you
don't have to change the type in two places if the element type changes.
You'll be updating all the elements of the nums array, so there's not
much point in zeroing it. If you use malloc:
nums = malloc(N * sizeof *nums);
Whether you use calloc() or malloc(), you should check the return
value. If it returns a null pointer, it means the allocation failed.
Aborting the program is probably a good way to handle it.
I usually don't do error checking on my personal code.
Post by Keith Thompson
(There are complications on Linux-based systems which I won't get into
here. Google "OOM killer" and "overcommit" for details.)
Post by DFS
rewind(datafile);
This can fail if the input file is not seekable. For example, on a
Linux-based system you could do something like:
./your_program /dev/stdin < file
Perhaps that's an acceptable restriction, but be aware of it.
Post by DFS
while(fscanf(datafile, "%d", &j) != EOF){
Again, UB for out of range values.
It's not guaranteed that you'll get the same data the second time you
read the file; some other process could modify it. This might not be
worth worrying about.
I updated the code to do one fscanf() thru the file.

I looked for an easy way to lock it while reading, but as I understand
flock() it only places an 'advisory lock' on the file, and other
processes are still free to modify it.
Post by Keith Thompson
Post by DFS
nums[i++] = j;
}
fclose (datafile);
printf("\n");
You haven't produced any output yet; why print a blank line? (Of course
you can if you want to.)
Post by DFS
for(i=0;i<N;i++) {
printf("%d. %d\n", i+1, nums[i]);
}
printf("\n");
free(nums);
return(0);
A minor style point: a return statement doesn't require parentheses.
IMHO using parentheses makes it look too much like a function call. I'd
write `return 0;`, or more likely I'd just omit it, since falling off
the end of main does an implicit `return 0;` (starting in C99).
Can't omit it. It's required by my brain.
Post by Keith Thompson
Post by DFS
}
A method that doesn't require rescanning the input file is to initially
allocate some reasonable amount of memory, then use realloc() to
expand the array as needed. Doubling the array size is probably
reasonable. It will consume more memory than a single allocation.
Done in a way, as you'll see below.


Thanks for the thorough analysis and good tips.


Updated
* dropped 2 variable declarations
* allocate 'on the fly'
* one fscanf thru the file
* 4 less lines of code (not incl brackets)

----------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {

    int N=0;
    int *nums = malloc(2 * sizeof(int));

    FILE* datafile = fopen(argv[1], "r");
    while(fscanf(datafile, "%d", &nums[N++]) == 1){
        nums = realloc(nums, (N+1) * sizeof(int));
    }
    fclose (datafile);

    N--;
    for(int i=0;i<N;i++) {
        printf("%d.%d ", i+1, nums[i]);
    }
    free(nums);

    printf("\n");
    return 0;

}
----------------------------------------------------------
Keith Thompson
2024-06-16 20:54:14 UTC
Post by DFS
Post by Keith Thompson
Post by DFS
47 185 99 74 202 118 78 203 264 207 19 17 34 167 148 54 297 271 118
245 294 188 140 134 251 188 236 160 48 189 228 94 74 27 168 275 144
245 178 108 152 197 125 185 63 272 239 60 242 56 4 235 244 144 69 195
32 4 54 79 193 282 173 267 8 40 241 152 285 119 259 136 15 83 21 78 55
259 137 297 15 141 232 259 285 300 153 16 4 207 95 197 188 267 164 195
7 104 47 291
1 opens the file
2 fscanf thru the file to count the number of data points
3 allocate memory
4 rewind and fscanf again to add the data to the int array
Any issues with this method?
Any 'better' way?
Thanks
In a quick test, your code compiles without errors and runs
correctly
with your input. I do get a warning about argc being unused, which you
should address.
-Wall doesn't warn about that, but -Wall -Wextra does.
True (at least for gcc).
Post by DFS
In the bigger program of which this is a part, argc IS used.
I suggest that argc *should* have been used in this program.
Post by DFS
Post by Keith Thompson
Post by DFS
----------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
int N=0, i=0, j=0;
The usual convention is to use all-caps for macro names. Calling your
variable N is not a real problem, but could be slightly confusing.
N is the number of integers in the input. i is an index. j is a value
read from the file. That's not at all clear from the names.
I suggest using longer and more descriptive names in lower case.
"N" could be "count". "i" is fine for an index, but "j" could be
"value".
N is used in statistics, and this is a stats program.
OK, using the names i and j suggests they're used similarly when they're
not.

[...]
Post by DFS
I looked for an easy way to lock it while reading, but as I understand
flock() it only places an 'advisory lock' on the file, and other
processes are still free to modify it.
Locking the file probably isn't worth the effort. If I were doing this
kind of thing, I'd just keep in the back of my mind that if some other
process modifies the file while my program is running, bad things can
happen. For something that's going to be used in production, more care
is appropriate.

[...]
Post by DFS
Post by Keith Thompson
A minor style point: a return statement doesn't require parentheses.
IMHO using parentheses makes it look too much like a function call. I'd
write `return 0;`, or more likely I'd just omit it, since falling off
the end of main does an implicit `return 0;` (starting in C99).
Can't omit it. It's required by my brain.
OK, that's harmless -- but be prepared to see the "return 0;" omitted in
code written by other people.
Post by DFS
Post by Keith Thompson
Post by DFS
}
A method that doesn't require rescanning the input file is to
initially
allocate some reasonable amount of memory, then use realloc() to
expand the array as needed. Doubling the array size is probably
reasonable. It will consume more memory than a single allocation.
Done in a way, as you'll see below.
Thanks for the thorough analysis and good tips.
Updated
* dropped 2 variable declarations
* allocate 'on the fly'
* one fscanf thru the file
* 4 less lines of code (not incl brackets)
----------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
    int N=0;
    int *nums = malloc(2 * sizeof(int));
Again, I really recommend

int *nums = malloc(2 * sizeof *nums);
Post by DFS
    FILE* datafile = fopen(argv[1], "r");
Undefined behavior if the program is invoked with no arguments.
No check whether fopen succeeded.
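Both checks fit in a few lines. A sketch, where open_input is a hypothetical helper and the usage message wording is just an example:

```c
#include <stdio.h>

/* Opens the file named by the first command-line argument.
   Returns the stream, or NULL after printing a diagnostic. */
FILE *open_input(int argc, char *argv[])
{
    if (argc < 2) {
        /* argv[1] is a null pointer here, so it must not reach fopen */
        fprintf(stderr, "usage: %s datafile\n",
                argc > 0 ? argv[0] : "program");
        return NULL;
    }

    FILE *fp = fopen(argv[1], "r");
    if (fp == NULL)
        perror(argv[1]);   /* e.g. "data.txt: No such file or directory" */
    return fp;
}
```

main would then start with `FILE *datafile = open_input(argc, argv);` and return EXIT_FAILURE if that is NULL, which also gets rid of the unused-argc warning.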
Post by DFS
    while(fscanf(datafile, "%d", &nums[N++]) == 1){
I've already discussed the problems of using fscanf for numeric input.
Post by DFS
        nums = realloc(nums, (N+1) * sizeof(int));
I'd think about moving the increment of N so that I don't have to use
"N+1" in realloc or add "N--;" at the end. I haven't taken the time to
work out the details, but I think this can be done more cleanly.

Calling realloc() every time could be wasteful. (On my system, realloc
reallocates at about 25 bytes and not again until about 128 kbytes, but
don't rely on that.) A common scheme is to call realloc() to double the
buffer size as needed. This requires keeping track of the size of the
buffer and the number of elements used separately. But for a
quick-and-dirty demo, calling realloc() every time isn't too bad.
Post by DFS
    }
    fclose (datafile);
    N--;
    for(int i=0;i<N;i++) {
        printf("%d.%d ", i+1, nums[i]);
    }
    free(nums);
    printf("\n");
    return 0;
}
----------------------------------------------------------
You changed the output format. Your original program printed each
number on its own line (with extra blank lines at the top and bottom for
some reason); this program prints all the numbers on a single line.
Neither is right or wrong, but consistency is nice -- and very long
lines can cause problems in looking at the output.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
James Kuyper
2024-06-17 02:41:26 UTC
...
Post by DFS
Post by Keith Thompson
Post by DFS
return(0);
A minor style point: a return statement doesn't require parentheses.
IMHO using parentheses makes it look too much like a function call. I'd
write `return 0;`, or more likely I'd just omit it, since falling off
the end of main does an implicit `return 0;` (starting in C99).
Can't omit it. It's required by my brain.
The parentheses you're putting in are completely unrelated to the use of
parentheses in _Generic(), function calls, compound literals,
sizeof(type name), alignof(), _BitInt(), _Atomic(), typeof(),
typeof_unqual(), alignas(), function declarators, static_assert(), if(),
switch(), for(), while(), do ... while(), function-like macro definitions
and invocations or cast expressions. In all of those cases, the
parentheses are part of the grammar.

The parentheses that you put in return(0) serve only for grouping
purpose. They are semantically equivalent to the parentheses in "i =
(0);"; they are just as legal, and just as pointless.

If your brain doesn't immediately understand why what I said above is
true, I recommend retraining it.
Tim Rentsch
2024-06-17 05:45:28 UTC
Post by James Kuyper
...
Post by DFS
Post by DFS
return(0);
A minor style point: a return statement doesn't require parentheses.
IMHO using parentheses makes it look too much like a function call. I'd
write `return 0;`, or more likely I'd just omit it, since falling off
the end of main does an implicit `return 0;` (starting in C99).
Can't omit it. It's required by my brain.
The parentheses you're putting in are completely unrelated to the use of
parentheses in _Generic(), function calls, compound literals,
sizeof(type name), alignof(), _BitInt(), _Atomic(), typeof(),
typeof_unqual(), alignas(), function declarators, static_assert(), if(),
switch(), for(), while(), do ... while(), function-like macro definitions
and invocations or cast expressions. In all of those cases, the
parentheses are part of the grammar. [...]
I'm pretty sure the "it" in "Can't omit it" was meant to refer
to having the return statement, not to the parentheses.
Kaz Kylheku
2024-06-17 07:39:36 UTC
Post by James Kuyper
...
Post by DFS
Post by Keith Thompson
Post by DFS
return(0);
A minor style point: a return statement doesn't require parentheses.
IMHO using parentheses makes it look too much like a function call. I'd
write `return 0;`, or more likely I'd just omit it, since falling off
the end of main does an implicit `return 0;` (starting in C99).
Can't omit it. It's required by my brain.
I think DFS might mean that they find themselves unable to omit the
unnecessary return 0 statement entirely.

I also hate it; I feel that the implicit return 0 in main is a
misfeature that was added due to caving in to bad programmers.

Making int main(void) { } correct is like legalizing weed.
Potheads are still potheads. Since I'm not one, I write a
return statement in main.
Post by James Kuyper
The parentheses you're putting in are completely unrelated to the use of
parentheses in _Generic(), function calls, compound literals,
sizeof(type name), alignof(), _BitInt(), _Atomic(), typeof(),
typeof_unqual(), alignas(), function declarators, static_assert(), if(),
switch(), for(), while(), do ... while(), function-like macro definitions
and invocations or cast expressions. In all of those cases, the
parentheses are part of the grammar.
Speaking of while, the do/while construct does not require parentheses
in order to disambiguate anything, since it has a mandatory semicolon.
Yet, it still has them.

There would be no issue with this grammar:

iteration_statement := 'do' statement 'while' expression ';'

the fragment "'while' expression ';'" is exactly like
"'return' expression ';'".

Obviously, the parentheses are there for consistency with the
top-testing while loop.

It seems that in some people's eyes, the same consistency should extend
to the return statement.

More widespread than that is a practice of always using parentheses
around the argument of sizeof, even if it's an expression and not
a type.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Chris M. Thomasson
2024-06-17 08:22:51 UTC
Post by Kaz Kylheku
...
Post by DFS
Post by Keith Thompson
Post by DFS
return(0);
A minor style point: a return statement doesn't require parentheses.
IMHO using parentheses makes it look too much like a function call. I'd
write `return 0;`, or more likely I'd just omit it, since falling off
the end of main does an implicit `return 0;` (starting in C99).
Can't omit it. It's required by my brain.
I think DFS might mean that they find themselves unable to omit the
unnecessary return 0 statement entirely.
I also hate it; I feel that the implicit return 0 in main is a
misfeature that was added due to caving in to bad programmers.
Making int main(void) { } correct is like legalizing weed.
Potheads are still potheads. Since I'm not one, I write a
return statement in main.
[...]

Indeed! Toke, Toke...

Peter Tosh - Legalize It (Audio)



lol. :^)
DFS
2024-06-17 13:50:14 UTC
Permalink
Post by Kaz Kylheku
I think DFS might mean that they find themselves
he finds himself
Post by Kaz Kylheku
unable to omit the unnecessary return 0 statement entirely.
yes
Richard Harnden
2024-06-17 15:23:54 UTC
Permalink
Post by DFS
Post by Kaz Kylheku
I think DFS might mean that they find themselves
he finds himself
Post by Kaz Kylheku
unable to omit the unnecessary return 0 statement entirely.
yes
If a function is defined to return an int, then you should return one.

Anything else is just lazy/sloppy. Just because main allows it as a
special case doesn't mean it's a good idea.

I mean: it's really not much extra to type.
David Brown
2024-06-17 16:46:59 UTC
Permalink
Post by Richard Harnden
Post by DFS
Post by Kaz Kylheku
I think DFS might mean that they find themselves
he finds himself
Post by Kaz Kylheku
unable to omit the unnecessary return 0 statement entirely.
yes
If a function is defined to return an int, then you should return one.
Anything else is just lazy/sloppy.  Just because main allows it as a
special case doesn't mean it's a good idea.
I mean: it's really not much extra to type.
There's nothing wrong with ending your "main" with "return 0;". What
Keith said was that it is unnecessary, that using parentheses in the
form "return(0);" looks like a function call and is considered poor
style by many people, and that it is useful to know that when "main"
exits without an explicit returned value, it does so as though it had
exited with "return 0;". (And in another branch, he said the return
type for "main" on hosted C systems should be "int".)

These are all true statements.

If you prefer to end "main" with "return 0;", that's absolutely fine -
but it is /not/ lazy or sloppy to omit it.
Kaz Kylheku
2024-06-22 22:14:04 UTC
Permalink
Post by Richard Harnden
Post by DFS
Post by Kaz Kylheku
I think DFS might mean that they find themselves
he finds himself
Post by Kaz Kylheku
unable to omit the unnecessary return 0 statement entirely.
yes
If a function is defined to return an int, then you should return one.
Anything else is just lazy/sloppy. Just because main allows it as a
special case doesn't mean it's a good idea.
I mean: it's really not much extra to type.
The misfeature of missing return being success was, I believe, not
intended to make programs shorter. It was intended to correct the
random termination statuses of countless numbers of programs in a single
stroke.

Deliberately relying on this in a new program is like relying on a
diaper. If you're of intermediate age, you don't do this.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Keith Thompson
2024-06-23 00:10:15 UTC
Permalink
Kaz Kylheku <643-408-***@kylheku.com> writes:
[...]
Post by Kaz Kylheku
The misfeature of missing return being success was, I believe, not
intended to make programs shorter. It was intended to correct the
random termination statuses of countless numbers of programs in a single
stroke.
Agreed.
Post by Kaz Kylheku
Deliberately relying on this in a new program is like relying on a
diaper. If you're of intermediate age, you don't do this.
No, deliberately relying on this in a new program is like relying on a
language feature that's been clearly specified for the past quarter
century.

If I relied only on language features that I *like*, I wouldn't be able
to write much code.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Phil Carmody
2024-06-23 08:20:16 UTC
Permalink
Post by Kaz Kylheku
Post by Richard Harnden
If a function is defined to return an int, then you should return one.
Anything else is just lazy/sloppy. Just because main allows it as a
special case doesn't mean it's a good idea.
I mean: it's really not much extra to type.
The misfeature of missing return being success was, I believe, not
intended to make programs shorter. It was intended to correct the
random termination statuses of countless numbers of programs in a single
stroke.
Deliberately relying on this in a new program is like relying on a
diaper. If you're of intermediate age, you don't do this.
Astronauts do this quite frequently. Some pilots too. And divers. And
crane operators. It's a well-established solution to a known problem.

However, I'd still put the explicit return in for a reason of
literal portability: were I to want to lift that code out into
a separate function called by main(), I'd want it to behave the
same.

Phil
--
We are no longer hunters and nomads. No longer awed and frightened, as we have
gained some understanding of the world in which we live. As such, we can cast
aside childish remnants from the dawn of our civilization.
-- NotSanguine on SoylentNews, after Eugen Weber in /The Western Tradition/
Tim Rentsch
2024-06-18 07:19:28 UTC
Permalink
Post by Kaz Kylheku
Speaking of while, the do/while construct does not require parentheses
in order to disambiguate anything, since it has a mandatory semicolon.
Yet, it still has them.
It has them to allow an extension for a "loop-and-a-half" control
structure:

do statement while ( expression ) statement

and so for example

do c = getchar(); while( c != EOF ) n++;

to count characters on standard input.
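As a point of comparison, the form shown above is not accepted by standard C today. A minimal sketch of the same loop-and-a-half written with a mid-loop break, assuming a hypothetical helper named count_chars (not from the thread) that reads from any FILE *:

```c
#include <stdio.h>

/* Hypothetical helper: counts the characters remaining on a stream. */
static long count_chars(FILE *in)
{
    long n = 0;
    int c;                  /* int, not char, so EOF stays distinguishable */

    for (;;) {
        c = getc(in);       /* first half: always runs before the test */
        if (c == EOF)
            break;          /* the test sits in the middle of the loop */
        n++;                /* second half: runs only if the test passed */
    }
    return n;
}
```

Calling count_chars(stdin) gives the count for standard input. The more common idiom folds the first half into the condition, as in while ((c = getc(in)) != EOF) n++;, which behaves the same way.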
Keith Thompson
2024-06-18 10:10:04 UTC
Permalink
Post by Tim Rentsch
Post by Kaz Kylheku
Speaking of while, the do/while construct does not require parentheses
in order to disambiguate anything, since it has a mandatory semicolon.
Yet, it still has them.
It has them to allow an extension for a "loop-and-a-half" control
do statement while ( expression ) statement
and so for example
do c = getchar(); while( c != EOF ) n++;
to count characters on standard input.
Oh? Do you have any evidence that that was the intent? Does any
compiler provide such an extension? (As you know it's a syntax error in
standard C.)
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Tim Rentsch
2024-06-19 00:24:44 UTC
Permalink
Post by Tim Rentsch
Post by Kaz Kylheku
Speaking of while, the do/while construct does not require parentheses
in order to disambiguate anything, since it has a mandatory semicolon.
Yet, it still has them.
It has them to allow an extension for a "loop-and-a-half" control
do statement while ( expression ) statement
and so for example
do c = getchar(); while( c != EOF ) n++;
to count characters on standard input.
Oh? Do you have any evidence that that was the intent? [...]
I think you're reading something into my remark that it
didn't say.
Keith Thompson
2024-06-19 00:55:36 UTC
Permalink
Post by Tim Rentsch
Post by Tim Rentsch
Post by Kaz Kylheku
Speaking of while, the do/while construct does not require parentheses
in order to disambiguate anything, since it has a mandatory semicolon.
Yet, it still has them.
It has them to allow an extension for a "loop-and-a-half" control
do statement while ( expression ) statement
and so for example
do c = getchar(); while( c != EOF ) n++;
to count characters on standard input.
Oh? Do you have any evidence that that was the intent? [...]
I think you're reading something into my remark that it
didn't say.
Or at least that you didn't mean.

What did you actually mean by "It has them to allow an extension
..."? It seemed very clear to me that you meant to imply an intent,
and I can't think of any other sensible interpretation of your words.

do-while *could* have been specified without required parentheses.
The only reason I can think of that it wasn't is consistency
with other constructs (if, for, while), and in my opinion that's
a perfectly valid reason. If you're seriously suggesting that
there's another reason, I'd be interested in learning about it.
If any existing compiler has the loop-and-a-half extension you
mentioned, or anyone even considered such an extension, I'd be
interested in learning about that as well. (If it was a joke,
just say so and we can drop this.)

Of course you could have explained what you meant in the first place.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Kaz Kylheku
2024-06-19 01:58:57 UTC
Permalink
Post by Keith Thompson
Post by Tim Rentsch
Post by Tim Rentsch
Post by Kaz Kylheku
Speaking of while, the do/while construct does not require parentheses
in order to disambiguate anything, since it has a mandatory semicolon.
Yet, it still has them.
It has them to allow an extension for a "loop-and-a-half" control
do statement while ( expression ) statement
and so for example
do c = getchar(); while( c != EOF ) n++;
to count characters on standard input.
Oh? Do you have any evidence that that was the intent? [...]
I think you're reading something into my remark that it
didn't say.
Or at least that you didn't mean.
FWIW, it would seem that the phrase pattern:

do statement while expression ;

may be compatible with the proposed extension in a way
manageable via LALR(1) parsing.

I don't see difficulties in recursive descent, either.

The near minimal Yacc grammar pasted below produces no conflicts,
and is only slightly contorted. We treat the ')' token as the
lowest precedence operator, and ';' as highest, which eliminates
conflicts in the way that we want.

I can explain why; another way is to remove the %nonassoc declarations,
use "yacc -v", and study the conflict details in the y.output file.

It's not clear whether the grammar can be nicely factored into the form
used in the standard, which makes no use of precedence or associativity.
(But would that be a requirement for leaving room for an extension?)

%{

%}

%nonassoc ')'
%token DO WHILE NUM
%left '+'
%nonassoc ';'

%%

while_statement : DO statement WHILE expr ';'
| DO statement WHILE '(' expr ')' statement

statement : ';'
| expr ';'
| '{' expr '}'
| '{' '}'
;

expr : '(' expr ')'
| expr '+' expr
| '+' expr
| NUM
;

%%
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
DFS
2024-06-17 12:50:06 UTC
Permalink
Post by James Kuyper
...
Post by DFS
Post by Keith Thompson
Post by DFS
return(0);
A minor style point: a return statement doesn't require parentheses.
IMHO using parentheses makes it look too much like a function call. I'd
write `return 0;`, or more likely I'd just omit it, since falling off
the end of main does an implicit `return 0;` (starting in C99).
Can't omit it. It's required by my brain.
The parentheses you're putting in are completely unrelated to the use of
parentheses in _Generic(), function calls, compound literals,
sizeof(type name), alignof(), _BitInt(), _Atomic(), typeof(),
typeof_unqual(), alignas(), function declarators, static_assert(), if(),
switch(), for(), while(), do ... while(), function-like macro definitions
and invocations or cast expressions. In all of those cases, the
parentheses are part of the grammar.
The parentheses that you put in return(0) serve only for grouping
purposes. They are semantically equivalent to the parentheses in "i =
(0);"; they are just as legal, and just as pointless.
If your brain doesn't immediately understand why what I said above is
true, I recommend retraining it.
I meant omit a return altogether.

But looking around, I rarely see return(0). Don't know why it became a
thing for me.

Moving forward, return 0 it is.
Ben Bacarisse
2024-06-17 14:41:17 UTC
Permalink
Post by DFS
Post by James Kuyper
...
Post by DFS
Post by Keith Thompson
Post by DFS
return(0);
A minor style point: a return statement doesn't require parentheses.
IMHO using parentheses makes it look too much like a function call. I'd
write `return 0;`, or more likely I'd just omit it, since falling off
the end of main does an implicit `return 0;` (starting in C99).
Can't omit it. It's required by my brain.
The parentheses you're putting in are completely unrelated to the use of
parentheses in _Generic(), function calls, compound literals,
sizeof(type name), alignof(), _BitInt(), _Atomic(), typeof(),
typeof_unqual(), alignas(), function declarators, static_assert(), if(),
switch(), for(), while(), do ... while(), function-like macro definitions
and invocations or cast expressions. In all of those cases, the
parentheses are part of the grammar.
The parentheses that you put in return(0) serve only for grouping
purposes. They are semantically equivalent to the parentheses in "i =
(0);"; they are just as legal, and just as pointless.
If your brain doesn't immediately understand why what I said above is
true, I recommend retraining it.
I meant omit a return altogether.
But looking around, I rarely see return(0). Don't know why it became a
thing for me.
Moving forward, return 0 it is.
By the way, you might have retained return (exp); from old C. C
originally required the parentheses, but they got dropped quite early
on. The syntax in K&R (1st edition) does not require them, but almost
all the code examples in the book still have them!

I took a while to drop them as I came to C from B where they were always
required so I'd got the habit.
--
Ben.
Janis Papanagnou
2024-06-18 06:12:40 UTC
Permalink
Post by Ben Bacarisse
Post by DFS
Moving forward, return 0 it is.
By the way, you might have retained return (exp); from old C. C
originally required the parentheses, but they got dropped quite early
on. The syntax in K&R (1st edition) does not require them, but almost
all the code example in the book still have them!
This is an interesting observation! (That I can confirm.)

That's probably why originally I also used parentheses.
I saw the examples but didn't inspect the syntax appendix.

But how did the early compilers behave?
Did they follow the code samples' syntax or the formal syntax?
Post by Ben Bacarisse
I took a while to drop them as I came to C from B where they were always
required so I'd got the habit.
I dropped them as soon as I noticed that it's possible.

Janis
Keith Thompson
2024-06-18 10:07:10 UTC
Permalink
Post by Janis Papanagnou
Post by Ben Bacarisse
Post by DFS
Moving forward, return 0 it is.
By the way, you might have retained return (exp); from old C. C
originally required the parentheses, but they got dropped quite early
on. The syntax in K&R (1st edition) does not require them, but almost
all the code example in the book still have them!
This is an interesting observation! (That I can confirm.)
That's probably why originally I also used parentheses.
I saw the examples but didn't inspect the syntax appendix.
But how did the early compilers behave?
Did they follow the code samples' syntax or the formal syntax?
The syntax in the 1975 C reference manual required parentheses as part
of the syntax of a return statement:
return ;
return ( expression ) ;
By 1978, when K&R1 was published, the syntax had changed to:
return ;
return expression ;

If you write `return (42);`, even in modern C, it's still syntactically
valid. The parentheses are simply part of the expression, not part of
the syntax of the return statement.
Post by Janis Papanagnou
Post by Ben Bacarisse
I took a while to drop them as I came to C from B where they were always
required so I'd got the habit.
I dropped them as soon as I noticed that it's possible.
My personal preference (which I don't follow entirely consistently) is
to try to avoid making things that aren't function calls look too much
like function calls.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
James Kuyper
2024-06-17 04:30:01 UTC
Permalink
...
Post by DFS
Post by Keith Thompson
Post by DFS
return(0);
A minor style point: a return statement doesn't require parentheses.
IMHO using parentheses makes it look too much like a function call. I'd
write `return 0;`, or more likely I'd just omit it, since falling off
the end of main does an implicit `return 0;` (starting in C99).
Can't omit it. It's required by my brain.
What behavior does your brain expect of the following code?:

return(a+b)*2;

If you have any trouble with interpreting that code, you need to retrain
your brain.
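For anyone who wants to check their reading, here is a small sketch (both function names are made up for illustration). Because return is a statement and not a function, the parentheses only group a+b, and the whole expression (a+b)*2 is what gets returned:

```c
/* Hypothetical helpers, for illustration only. */
static int with_parens(int a, int b)
{
    return(a+b)*2;          /* parsed as: return ((a+b)*2); */
}

static int without_parens(int a, int b)
{
    return (a + b) * 2;     /* same expression, conventional spacing */
}
```

With a = 1 and b = 2, both functions return 6. A reader who treats return(...) as a call would wrongly expect the multiplication to apply to the result of the "call".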
Michael S
2024-06-15 22:56:49 UTC
Permalink
On Sat, 15 Jun 2024 15:36:22 -0400
Post by DFS
47 185 99 74 202 118 78 203 264 207 19 17 34 167 148 54 297 271 118
245 294 188 140 134 251 188 236 160 48 189 228 94 74 27 168 275 144
245 178 108 152 197 125 185 63 272 239 60 242 56 4 235 244 144 69 195
32 4 54 79 193 282 173 267 8 40 241 152 285 119 259 136 15 83 21 78
55 259 137 297 15 141 232 259 285 300 153 16 4 207 95 197 188 267 164
195 7 104 47 291
1 opens the file
2 fscanf thru the file to count the number of data points
3 allocate memory
4 rewind and fscanf again to add the data to the int array
Any issues with this method?
Any 'better' way?
Thanks
----------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
int N=0, i=0, j=0;
int *nums;
FILE* datafile = fopen(argv[1], "r");
while(fscanf(datafile, "%d", &j) != EOF){
N++;
}
nums = calloc(N, sizeof(int));
rewind(datafile);
while(fscanf(datafile, "%d", &j) != EOF){
nums[i++] = j;
}
fclose (datafile);
printf("\n");
for(i=0;i<N;i++) {
printf("%d. %d\n", i+1, nums[i]);
}
printf("\n");
free(nums);
return(0);
}
----------------------------------------------------------
If you want to preserve your sanity, never use fscanf().
Lawrence D'Oliveiro
2024-06-16 03:26:30 UTC
Permalink
Post by Michael S
If you want to preserve your sanity, never use fscanf().
Quoth the man page <https://manpages.debian.org/3/scanf.3.en.html>:

It is very difficult to use these functions correctly, and it is
preferable to read entire lines with fgets(3) or getline(3) and
parse them later with sscanf(3) or more specialized functions such
as strtol(3).
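A minimal sketch of the parsing half of that advice, assuming the line has already been read into a string (for example with fgets). The helper name parse_ints and its return convention are assumptions for illustration, not anything from the man page:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse whitespace-separated decimal integers from
 * one line of text into out[0..max-1].
 * Returns the number of values stored, or -1 on a malformed or
 * out-of-range token, or if the line holds more than max values.
 */
static int parse_ints(const char *line, int *out, int max)
{
    int count = 0;
    const char *p = line;

    for (;;) {
        char *end;
        long v;

        errno = 0;
        v = strtol(p, &end, 10);
        if (end == p)
            break;                  /* no digits consumed: stop scanning */
        if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
            return -1;              /* value does not fit in an int */
        if (count == max)
            return -1;              /* caller's array is full */
        out[count++] = (int)v;
        p = end;
    }

    /* Whatever follows the last number must be white space only. */
    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r')
        p++;
    return *p == '\0' ? count : -1;
}
```

Unlike a "%d" conversion, strtol reports out-of-range input through errno, so the caller can reject it. A line longer than the fgets buffer can still split a number across two reads, which this sketch does not handle.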
Janis Papanagnou
2024-06-16 03:41:12 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Michael S
If you want to preserve your sanity, never use fscanf().
It is very difficult to use these functions correctly, and it is
preferable to read entire lines with fgets(3) or getline(3) and
parse them later with sscanf(3) or more specialized functions such
as strtol(3).
This would be also my first impulse, but you'd have to know
_in advance_ how long the data stream would be; the function
requires an existing buffer. So you'd anyway need a stepwise
input. On the plus side there's maybe a better performance
to read large buffer chunks and compose them on demand? But
a problem is the potential cut of the string of a number; it
requires additional clumsy handling. So it might anyway be
better (i.e. much more convenient) to use fscanf() ?

Janis
Keith Thompson
2024-06-16 04:17:57 UTC
Permalink
Post by Janis Papanagnou
Post by Lawrence D'Oliveiro
Post by Michael S
If you want to preserve your sanity, never use fscanf().
It is very difficult to use these functions correctly, and it is
preferable to read entire lines with fgets(3) or getline(3) and
parse them later with sscanf(3) or more specialized functions such
as strtol(3).
This would be also my first impulse, but you'd have to know
_in advance_ how long the data stream would be; the function
requires an existing buffer. So you'd anyway need a stepwise
input. On the plus side there's maybe a better performance
to read large buffer chunks and compose them on demand? But
a problem is the potential cut of the string of a number; it
requires additional clumsy handling. So it might anyway be
better (i.e. much more convenient) to use fscanf() ?
I advise never using any of the *scanf() functions for numeric input
unless you have control over what appears on the input stream.
They have undefined behavior if the input value is out of range.
(I consider this a bug in the language.) Typical behavior is to
store some arbitrary integer value with no error indication.

As for reading lines, getline() can allocate a buffer long enough
to hold the line, but it's defined by POSIX, not by ISO C.

For the original problem, where the input consists of digits and
whitespace, you could read a character at a time and accumulate the
value of each number. (You probably want to handle leading signs as
well, which isn't difficult.) That is admittedly reinventing the
wheel, but the existing wheels aren't entirely round. You still
have to dynamically allocate the array of ints, assuming you need
to store all of them rather than processing each value as it's read.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
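A rough sketch of the character-at-a-time approach described above, for input made only of digits and white space. The name read_ints is hypothetical, sign handling is left out for brevity, and the array doubles in size as needed so no counting pass is required:

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read non-negative decimal integers separated by
 * white space from `in` into a growing array.
 * On success stores the array in *out (caller frees it) and returns the
 * count; returns -1 on an unexpected character, int overflow, or
 * allocation failure, leaving *out untouched.
 */
static long read_ints(FILE *in, int **out)
{
    int *nums = NULL;
    size_t used = 0, cap = 0;
    int c = getc(in);

    while (c != EOF) {
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            c = getc(in);               /* skip separators */
            continue;
        }
        if (c < '0' || c > '9') {
            free(nums);
            return -1;                  /* not a digit, not white space */
        }

        int value = 0;
        while (c >= '0' && c <= '9') {
            int digit = c - '0';
            if (value > (INT_MAX - digit) / 10) {
                free(nums);
                return -1;              /* value * 10 + digit would overflow */
            }
            value = value * 10 + digit;
            c = getc(in);
        }

        if (used == cap) {              /* grow geometrically */
            size_t newcap = cap ? cap * 2 : 16;
            int *tmp = realloc(nums, newcap * sizeof *tmp);
            if (tmp == NULL) {
                free(nums);
                return -1;
            }
            nums = tmp;
            cap = newcap;
        }
        nums[used++] = value;
    }

    *out = nums;                        /* NULL if no numbers were read */
    return (long)used;
}
```

Because overflow is checked before each multiply, out-of-range input is reported as an error instead of producing an arbitrary value.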
Lawrence D'Oliveiro
2024-06-16 04:41:34 UTC
Permalink
.. but it's defined by POSIX, not by ISO C.
Dang. There’s another reset of the days-since-last-mention-of-POSIX-on-
this-list counter.

Has it ever actually reached 1?
Janis Papanagnou
2024-06-16 04:44:27 UTC
Permalink
Post by Keith Thompson
For the original problem, where the input consists of digits and
whitespace, you could read a character at a time and accumulate the
value of each number. (You probably want to handle leading signs as
well, which isn't difficult.)
Yes. Been there, done that. Sometimes it's good enough to go back
to the roots if higher-level functions are imperfect or quirky.
Post by Keith Thompson
That is admittedly reinventing the
wheel, but the existing wheels aren't entirely round. You still
have to dynamically allocate the array of ints, assuming you need
to store all of them rather than processing each value as it's read.
A subclass of tasks can certainly process data on the fly but for
the general solution there should be a convenient way to handle it.

I still prefer higher-level languages that take the burden from me.

Janis
DFS
2024-06-16 15:09:01 UTC
Permalink
Post by Janis Papanagnou
Post by Keith Thompson
For the original problem, where the input consists of digits and
whitespace, you could read a character at a time and accumulate the
value of each number. (You probably want to handle leading signs as
well, which isn't difficult.)
Yes. Been there, done that. Sometimes it's good enough to go back
to the roots if higher-level functions are imperfect or quirky.
Post by Keith Thompson
That is admittedly reinventing the
wheel, but the existing wheels aren't entirely round. You still
have to dynamically allocate the array of ints, assuming you need
to store all of them rather than processing each value as it's read.
A subclass of tasks can certainly process data on the fly but for
the general solution there should be a convenient way to handle it.
I still prefer higher-level languages that take the burden from me.
nums = []
with open('data.txt','r') as f:
    for nbr in f.read().split():
        nums.append(int(nbr))
    print(*sorted(nums))
David Brown
2024-06-16 15:56:01 UTC
Permalink
Post by DFS
Post by Janis Papanagnou
Post by Keith Thompson
For the original problem, where the input consists of digits and
whitespace, you could read a character at a time and accumulate the
value of each number.  (You probably want to handle leading signs as
well, which isn't difficult.)
Yes. Been there, done that. Sometimes it's good enough to go back
to the roots if higher-level functions are imperfect or quirky.
Post by Keith Thompson
That is admittedly reinventing the
wheel, but the existing wheels aren't entirely round.  You still
have to dynamically allocate the array of ints, assuming you need
to store all of them rather than processing each value as it's read.
A subclass of tasks can certainly process data on the fly but for
the general solution there should be a convenient way to handle it.
I still prefer higher-level languages that take the burden from me.
nums = []
with open('data.txt','r') as f:
    for nbr in f.read().split():
        nums.append(int(nbr))
    print(*sorted(nums))
nums = sorted(map(int, open('data.txt', 'r').read().split()))

But you'll learn more doing it with C :-) And it's nice to see someone
starting on-topic threads here.
bart
2024-06-16 17:14:55 UTC
Permalink
Post by David Brown
Post by DFS
Post by Janis Papanagnou
Post by Keith Thompson
For the original problem, where the input consists of digits and
whitespace, you could read a character at a time and accumulate the
value of each number.  (You probably want to handle leading signs as
well, which isn't difficult.)
Yes. Been there, done that. Sometimes it's good enough to go back
to the roots if higher-level functions are imperfect or quirky.
Post by Keith Thompson
That is admittedly reinventing the
wheel, but the existing wheels aren't entirely round.  You still
have to dynamically allocate the array of ints, assuming you need
to store all of them rather than processing each value as it's read.
A subclass of tasks can certainly process data on the fly but for
the general solution there should be a convenient way to handle it.
I still prefer higher-level languages that take the burden from me.
nums = []
with open('data.txt','r') as f:
    for nbr in f.read().split():
        nums.append(int(nbr))
    print(*sorted(nums))
nums = sorted(map(int, open('data.txt', 'r').read().split()))
OK, a bit of a challenge for my scripting language. I managed this first:

nums := sort(mapv(toval, splitstring(readstrfile("data.txt"))))

It needed a change to 'splitstring' to allow a default separator
consisting of white space of any length. And a one-line helper function
'toval' since the usual candidates, special built-ins, were not valid
for 'mapv'.

It also works like this:

nums := readstrfile("data.txt") -> splitstring -> mapv(toval) -> sort

But only by chance since the 'piped' argument is the last one of
multi-parameter functions, rather than the first.
David Brown
2024-06-17 12:44:47 UTC
Permalink
Post by David Brown
Post by DFS
Post by Janis Papanagnou
Post by Keith Thompson
For the original problem, where the input consists of digits and
whitespace, you could read a character at a time and accumulate the
value of each number.  (You probably want to handle leading signs as
well, which isn't difficult.)
Yes. Been there, done that. Sometimes it's good enough to go back
to the roots if higher-level functions are imperfect or quirky.
Post by Keith Thompson
That is admittedly reinventing the
wheel, but the existing wheels aren't entirely round.  You still
have to dynamically allocate the array of ints, assuming you need
to store all of them rather than processing each value as it's read.
A subclass of tasks can certainly process data on the fly but for
the general solution there should be a convenient way to handle it.
I still prefer higher-level languages that take the burden from me.
nums = []
with open('data.txt','r') as f:
    for nbr in f.read().split():
        nums.append(int(nbr))
    print(*sorted(nums))
nums = sorted(map(int, open('data.txt', 'r').read().split()))
  nums := sort(mapv(toval, splitstring(readstrfile("data.txt"))))
It needed a change to 'splitstring' to allow a default separator
consisting of white space of any length. And a one-line helper function
'toval' since the usual candidates, special built-ins, were not valid
for 'mapv'.
That's nice, but irrelevant - the OP can use the Python version if he
decides writing the C version is not fun any more, but your language is
useless to everyone but you.
  nums := readstrfile("data.txt") -> splitstring -> mapv(toval) -> sort
But only by chance since the 'piped' argument is the last one of
multi-parameter functions, rather than the first.
A piping syntax is, IMHO, also a nice feature (though again the OP will
have no direct use of your language).

Some people might like to do this all with shell pipes:

cat data.txt | xargs -n 1 | sort -n | xargs

That kind of thing can quickly get more awkward as the details change,
such as if the data is separated by commas - by the time you have
figured out the "awk" or "sed" commands needed, you'd be much faster
with Python.
DFS
2024-06-17 13:52:06 UTC
Permalink
Post by David Brown
Post by DFS
nums = []
with open('data.txt','r') as f:
    for nbr in f.read().split():
        nums.append(int(nbr))
    print(*sorted(nums))
nums = sorted(map(int, open('data.txt', 'r').read().split()))
showoff!
Janis Papanagnou
2024-06-16 04:51:33 UTC
Permalink
Post by Janis Papanagnou
Post by Lawrence D'Oliveiro
Post by Michael S
If you want to preserve your sanity, never use fscanf().
It is very difficult to use these functions correctly, and it is
preferable to read entire lines with fgets(3) or getline(3) and
parse them later with sscanf(3) or more specialized functions such
as strtol(3).
This would be also my first impulse, but you'd have to know
_in advance_ how long the data stream would be; the function
requires an existing buffer. So you'd anyway need a stepwise
input. [...]
Would it be sensible to have a malloc()'ed buffer used for the first
fgets() and then subsequent fgets() work on the realloc()'ed part? I
suppose the previously set data in the malloc area would be retained
so that there's no re-composition of cut numbers necessary?

Janis
Keith Thompson
2024-06-16 05:21:59 UTC
Permalink
Post by Janis Papanagnou
Post by Janis Papanagnou
Post by Lawrence D'Oliveiro
Post by Michael S
If you want to preserve your sanity, never use fscanf().
It is very difficult to use these functions correctly, and it is
preferable to read entire lines with fgets(3) or getline(3) and
parse them later with sscanf(3) or more specialized functions such
as strtol(3).
This would be also my first impulse, but you'd have to know
_in advance_ how long the data stream would be; the function
requires an existing buffer. So you'd anyway need a stepwise
input. [...]
Would it be sensible to have a malloc()'ed buffer used for the first
fgets() and then subsequent fgets() work on the realloc()'ed part? I
suppose the previously set data in the malloc area would be retained
so that there's no re-composition of cut numbers necessary?
Sure. "The contents of the new object shall be the same as that of the
old object prior to deallocation, up to the lesser of the new and old
sizes."

Keep in mind that you can't call realloc() on a non-null pointer that
wasn't allocated by an allocation function.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Janis Papanagnou
2024-06-16 05:41:38 UTC
Permalink
Post by Keith Thompson
Post by Janis Papanagnou
Post by Janis Papanagnou
Post by Lawrence D'Oliveiro
Post by Michael S
If you want to preserve your sanity, never use fscanf().
It is very difficult to use these functions correctly, and it is
preferable to read entire lines with fgets(3) or getline(3) and
parse them later with sscanf(3) or more specialized functions such
as strtol(3).
This would be also my first impulse, but you'd have to know
_in advance_ how long the data stream would be; the function
requires an existing buffer. So you'd anyway need a stepwise
input. [...]
Would it be sensible to have a malloc()'ed buffer used for the first
fgets() and then subsequent fgets() work on the realloc()'ed part? I
suppose the previously set data in the malloc area would be retained
so that there's no re-composition of cut numbers necessary?
Sure. "The contents of the new object shall be the same as that of the
old object prior to deallocation, up to the lesser of the new and old
sizes."
Keep in mind that you can't call realloc() on a non-null pointer that
wasn't allocated by an allocation function.
Thanks. - I've just tried it with this ad hoc test code

#include <stdlib.h>
#include <stdio.h>

void main (int argc, char * argv[])
{
int chunk = 10;
int bufsize = chunk+1;
char * buf = malloc(bufsize);
char * anchor = buf;
while (fgets(buf, chunk+1, stdin) != NULL)
if (realloc(anchor, bufsize += chunk) != NULL)
buf += chunk;
puts(anchor);
}

I wonder whether it can be simplified by making malloc() obsolete
and using realloc() in a redesigned loop.

Janis
Keith Thompson
2024-06-16 05:49:11 UTC
Post by Janis Papanagnou
Post by Keith Thompson
Post by Janis Papanagnou
Post by Janis Papanagnou
Post by Lawrence D'Oliveiro
Post by Michael S
If you want to preserve your sanity, never use fscanf().
It is very difficult to use these functions correctly, and it is
preferable to read entire lines with fgets(3) or getline(3) and
parse them later with sscanf(3) or more specialized functions such
as strtol(3).
This would be also my first impulse, but you'd have to know
_in advance_ how long the data stream would be; the function
requires an existing buffer. So you'd anyway need a stepwise
input. [...]
Would it be sensible to have a malloc()'ed buffer used for the first
fgets() and then subsequent fgets() work on the realloc()'ed part? I
suppose the previously set data in the malloc area would be retained
so that there's no re-composition of cut numbers necessary?
Sure. "The contents of the new object shall be the same as that of the
old object prior to deallocation, up to the lesser of the new and old
sizes."
Keep in mind that you can't call realloc() on a non-null pointer that
wasn't allocated by an allocation function.
Thanks. - I've just tried it with this ad hoc test code
#include <stdlib.h>
#include <stdio.h>
void main (int argc, char * argv[])
*Ahem* -- int main.
Post by Janis Papanagnou
{
int chunk = 10;
int bufsize = chunk+1;
char * buf = malloc(bufsize);
char * anchor = buf;
while (fgets(buf, chunk+1, stdin) != NULL)
if (realloc(anchor, bufsize += chunk) != NULL)
buf += chunk;
puts(anchor);
}
I wonder whether it can be simplified by making malloc() obsolete
and using realloc() in a redesigned loop.
Yes.

"If ptr is a null pointer, the realloc function behaves like the malloc
function for the specified size."
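
A minimal sketch of such a realloc()-only loop, assuming text input without embedded NUL characters. The name read_all and its chunk parameter are hypothetical, not from the posted code. Unlike the ad hoc test above, it keeps the pointer returned by realloc(), since the block may move when it grows, and it advances by the number of characters actually read rather than by a fixed chunk.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the thread): read all of `in` into one
 * NUL-terminated heap buffer, growing it by `chunk` characters per step.
 * Returns NULL on allocation failure; the caller frees the result. */
char *read_all(FILE *in, int chunk)
{
    char *buf = NULL;   /* realloc(NULL, n) behaves like malloc(n) */
    size_t used = 0;    /* characters stored so far, excluding the NUL */

    if (chunk < 1)
        chunk = 1;

    for (;;) {
        /* Make room for up to `chunk` more characters plus the NUL. */
        char *tmp = realloc(buf, used + (size_t)chunk + 1);
        if (tmp == NULL) {
            free(buf);          /* the old block is still valid on failure */
            return NULL;
        }
        buf = tmp;              /* the block may have moved: use only `buf` */

        if (fgets(buf + used, chunk + 1, in) == NULL) {
            buf[used] = '\0';   /* end of input (or read error) */
            return buf;
        }
        used += strlen(buf + used);   /* fgets may stop early at a newline */
    }
}
```

Calling read_all(stdin, 10) and passing the result to puts() would mirror the test program; a real program would probably pick a larger chunk or grow the buffer geometrically.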
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Janis Papanagnou
2024-06-16 09:29:05 UTC
Post by Keith Thompson
Post by Janis Papanagnou
void main (int argc, char * argv[])
*Ahem* -- int main.
Never sure about whether it was/is correct to 'void'-declare
the return value and/or the [unused] main() arguments. (I'm
still from the early C time when types were even omitted as
function return specification (presuming an implicit int or
no return), as in the K&R book.) During the past decades I
tended to declare my intention by writing f(void) instead
of f() and void f() where no results are delivered. K&R at
least seems to say that 'void' can only be declared for the
return type of functions that do not return anything.

As long as my C compiler doesn't mind 'int main (void)' or
'void main (int, char **)' I don't care much for test code.
I'm sure this stance of mine might be considered offensive
in a 'C' NG. - Apologies! :-)

Janis
Malcolm McLean
2024-06-16 15:04:07 UTC
Post by Janis Papanagnou
Post by Keith Thompson
Post by Janis Papanagnou
void main (int argc, char * argv[])
*Ahem* -- int main.
Never sure about whether it was/is correct to 'void'-declare
the return value and/or the [unused] main() arguments. (I'm
still from the early C time when types were even omitted as
function return specification (presuming an implicit int or
no return), as in the K&R book.) During the past decades I
tended to declare my intention by writing f(void) instead
of f() and void f() where no results are delivered. K&R at
least seems to say that 'void' can only be declared for the
return type of functions that do not return anything.
As long as my C compiler doesn't mind 'int main (void)' or
'void main (int, char **)' I don't care much for test code.
I'm sure this stance of mine might be considered offensive
in a 'C' NG. - Apologies! :-)
Janis
And is a mapping of every input to the empty set a "function" or not? I
think it is but mathematicians might weigh in on that.
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
Janis Papanagnou
2024-06-16 15:13:29 UTC
Post by Malcolm McLean
And is a mapping of every input to the empty set a "function" or not? I
think it is but mathematicians might weigh in on that.
I'm well aware that there's a semantic distinction between "procedures"
and "functions". And there are differences in how languages treat these.
In "C" I've learned that basically "everything is a function", even
if they don't return anything, or if they do but the result is ignored.
In other languages these two types are called 'procedure' and '<type>
procedure'. It might indeed be a source for religious disputes as so
many things in IT/CS and elsewhere. I think it's not worth the time.

Janis
Keith Thompson
2024-06-16 20:32:05 UTC
Post by Janis Papanagnou
Post by Keith Thompson
Post by Janis Papanagnou
void main (int argc, char * argv[])
*Ahem* -- int main.
Never sure about whether it was/is correct to 'void'-declare
the return value and/or the [unused] main() arguments. (I'm
still from the early C time when types were even omitted as
function return specification (presuming an implicit int or
no return), as in the K&R book.) During the past decades I
tended to declare my intention by writing f(void) instead
of f() and void f() where no results are delivered. K&R at
least seems to say that 'void' can only be declared for the
return type of functions that do not return anything.
As long as my C compiler doesn't mind 'int main (void)' or
'void main (int, char **)' I don't care much for test code.
I'm sure this stance of mine might be considered offensive
in a 'C' NG. - Apologies! :-)
No version of C has ever permitted "void main" except when an
implementation documents and permits it. The 1989 ANSI C standard
both introduced the "void" keyword and specified the two permissible
definitions of main, both of which return int. Prior to C99,
defining main without an explicit return type was equivalent to
"int main". Many compilers will permit "void main", and might not
warn about it by default, but it has undefined behavior unless the
implementation documents it as an option. The calling environment will
assume that main returns an int value.

This applies to hosted implementations; in a freestanding
implementation, the program entry point is defined by the implementation
and might not even be called "main".

There is no advantage in writing "void main" rather than "int main".
Since C99, falling off the end of main does an implicit "return 0;".
"int main" is guaranteed to work; "void main" is not.

On my system, if I define "void main" the exit status seen by the shell
is some arbitrary value (I got 41 just now with gcc, 48 with clang, 165
with tcc).

See also questions 11.12a and following in the comp.lang.c FAQ,
<https://www.c-faq.com/>.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Janis Papanagnou
2024-06-17 05:41:01 UTC
Post by Keith Thompson
[...] K&R at
least seems to say that 'void' can only be declared for the
return type of functions that do not return anything.
[...]
No version of C has ever permitted "void main" except when an
implementation documents and permits it. [...]
I cannot comment on main() being handled differently than
other C functions. I was just quoting my old copy of K&R.

I don't understand what you mean with "no version of C has
ever permitted", given that my C compiler doesn't complain.

WRT return value to the environment I'd expect any random
or arbitrary value being returned in case that none had been
explicitly specified to be returned.

If I want a defined exit status (which is what I usually
want) I specify 'int main (...)' and provide an explicit
return statement (or exit() call).

Janis
Keith Thompson
2024-06-17 06:20:13 UTC
Post by Janis Papanagnou
Post by Keith Thompson
[...] K&R at
least seems to say that 'void' can only be declared for the
return type of functions that do not return anything.
[...]
No version of C has ever permitted "void main" except when an
implementation documents and permits it. [...]
I cannot comment on main() being handled differently than
other C functions. I was just quoting my old copy of K&R.
First or second edition?

But main() *is* handled differently than other functions, and
that's important to understand. It's effectively called by the
environment, which means that your definition has to cooperate
with what the environment expects. What's slightly weird about
it is that it can be defined in (at least) two different ways,
with or without argc and argv.

Similarly, signal handlers and qsort comparison functions are code
that you write that's invoked by the environment or the runtime
library. You don't get to change what type they return and expect
your program to work correctly.

This is all in the "Program startup" subsection of section 5 of any
edition or draft of the C standard, and it hasn't changed much from
C89 (where it's in section 2) up to the latest post-C23 draft.
Post by Janis Papanagnou
I don't understand what you mean with "no version of C has
ever permitted", given that my C compiler doesn't complain.
I mean that no edition of the C standard has ever mentioned "void
main". The only explicitly permitted return type has been int since
the first standard in 1989. In K&R1, there was no void keyword.
(I think the "void" keyword was introduced before 1989, but the
ANSI C standard formalized it.)

"void main" does not require a diagnostic, so perhaps "permitted" was
not the best word. But a conforming compiler can reject a program
that uses "void main" (that's one of the allowed consequences of
undefined behavior).
Post by Janis Papanagnou
WRT return value to the environment I'd expect any random
or arbitrary value being returned in case that none had been
explicitly specified to be returned.
Feel free to expect that. The standard says that, unless the
implementation documents that "void main" is permitted, the behavior
is undefined. Not just the status, the behavior of the program.

It happens that, for most compilers, the actual behavior of "void
main" is relatively harmless -- but why take that risk?
Post by Janis Papanagnou
If I want a defined exit status (which is what I usually
want) I specify 'int main (...)' and provide an explicit
return statement (or exit() call).
Why would you ever not want a defined exit status, given that it's
easier to have one than not to have one? (Since C99 an explicit
return or exit() is optional.) I can't think of any reason *at all*
to use "void main" in C with a hosted implementation. Can you?
(If you don't care about the exit status, you can just write
"int main" and not bother with a return statement or exit() call.
The exit status will be 0, but that's not a problem if you don't
care about it.)
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Janis Papanagnou
2024-06-17 07:16:00 UTC
Post by Keith Thompson
Post by Janis Papanagnou
Post by Keith Thompson
[...] K&R at
least seems to say that 'void' can only be declared for the
return type of functions that do not return anything.
[...]
No version of C has ever permitted "void main" except when an
implementation documents and permits it. [...]
I cannot comment on main() being handled differently than
other C functions. I was just quoting my old copy of K&R.
First or second edition?
It's a translation of a "(c) 1977 Prentice Hall" original, with
no further edition mentioned, so it's probably the 1st edition?
Post by Keith Thompson
But main() *is* handled differently than other functions,
Be assured, I don't object! It was just not mentioned that it's
a special case.
Post by Keith Thompson
and
that's important to understand. It's effectively called by the
environment, which means that your definition has to cooperate
with what the environment expects.
I'm not sure whether my K&R copy addresses that at all. A quick
view and I see only one instance where "main()" is mentioned at
the beginning: main() { printf("hello, world\n"); }
No types here, and no environment aspects mentioned.

Also mind that (other) languages don't need to interact with the
environment. Just recently I noticed that the Algol 68 Genie
does not pass the value of the outmost block to the environment.
Environment questions, interaction with the OS, may not be part
of the language. (I'm not saying anything about the C standard
[that I don't know]. Just a comment in principle.)
Post by Keith Thompson
What's slightly weird about
it is that it can be defined in (at least) two different ways,
with or without argc and argv.
Frankly, I have no idea about the details of evolution of the
C language. The old K&R source I have I had considered a pain;
it provoked more questions than giving answers. And since then
C changed a lot. That's why I stay mostly conservative with C
and, if in doubt, just check things undogmatically with my compiler.
Post by Keith Thompson
[...]
Post by Janis Papanagnou
If I want a defined exit status (which is what I usually
want) I specify 'int main (...)' and provide an explicit
return statement (or exit() call).
Why would you ever not want a defined exit status, given that it's
easier to have one than not to have one?
Aren't we agreeing here? (The only difference is that you are
formulating in a negated form where I positively said the same.)
Post by Keith Thompson
(Since C99 an explicit
return or exit() is optional.) I can't think of any reason *at all*
to use "void main" in C with a hosted implementation. Can you?
Well, to indicate that there's no status information or that
it's irrelevant. E.g. as was the case in the test fragment I
posted.

In programs I typically write there are a lot of things that can
possibly go wrong - and that the program cannot fix -, mostly
(but not exclusively) externalities. So it's typical that I
interrogate return status of functions, map them to a defined
set of return codes, create own codes for data inconsistencies
etc. And this status is relevant and of course returned to the
environment (often accompanied by some textual information on
stderr).
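
A rough sketch of that kind of mapping, under stated assumptions: the enum result codes and the function exit_status_for are hypothetical names invented for illustration, and only EXIT_SUCCESS and EXIT_FAILURE are used because those are the portable exit values (distinct numeric codes are system-specific).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical internal result codes, for illustration only. */
enum result { RES_OK, RES_BAD_INPUT, RES_IO_ERROR };

/* Map an internal result to a process exit status and report failures
 * on stderr. Only EXIT_SUCCESS and EXIT_FAILURE are portable values. */
int exit_status_for(enum result r)
{
    switch (r) {
    case RES_OK:
        return EXIT_SUCCESS;
    case RES_BAD_INPUT:
        fputs("error: inconsistent input data\n", stderr);
        return EXIT_FAILURE;
    case RES_IO_ERROR:
        fputs("error: I/O failure\n", stderr);
        return EXIT_FAILURE;
    }
    return EXIT_FAILURE;    /* unknown code: treat as failure */
}
```

A main defined as "int main(void)" could then end with a return of exit_status_for(...) applied to whatever result the program's own logic produced.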
Post by Keith Thompson
(If you don't care about the exit status, you can just write
"int main" and not bother with a return statement or exit() call.
The exit status will be 0, but that's not a problem if you don't
care about it.)
Whatever current C standards - and I'm not sure what ancient
'cc' is on my system and to what standard it complies - say,
if I specify an 'int' return type I also want a 'return' (or
exit()) - consider it as "code hygienics" - even if it's not
necessary according to more recent standards.

Janis
James Kuyper
2024-06-17 13:38:54 UTC
...
Post by Janis Papanagnou
Post by Keith Thompson
and
that's important to understand. It's effectively called by the
environment, which means that your definition has to cooperate
with what the environment expects.
I'm not sure whether my K&R copy addresses that at all. A quick
view and I see only one instance where "main()" is mentioned at
the beginning: main() { printf("hello, world\n"); }
No types here, and no environment aspects mentioned.
K&R C did not have function prototypes. main() declared with no
arguments indicates that main takes an unknown number of arguments, of
unspecified type - as such it's compatible with taking either two
arguments, or none. For backwards compatibility, you're still allowed to
declare functions K&R style, but it's been more than 3 decades since it
was a good idea to do so.

...
Post by Janis Papanagnou
Post by Keith Thompson
Post by Janis Papanagnou
If I want a defined exit status (which is what I usually
want) I specify 'int main (...)' and provide an explicit
return statement (or exit() call).
Why would you ever not want a defined exit status, given that it's
easier to have one than not to have one?
Aren't we agreeing here? (The only difference is that you are
formulating in a negated form where I positively said the same.)
You implied, by saying "If I want a defined exit status", that there are
occasions where you don't want a defined exit status - and he's
questioning that. Things that are undefined are seldom useful. If the
exit status is undefined, it might be a failure status. In many
contexts, that would cause no problems, but there's also places where it
would.

...
Post by Janis Papanagnou
Well, to indicate that there's no status information or that
it's irrelevant. E.g. as was the case in the test fragment I
posted.
That's the problem - your "indication that there's no status
information" doesn't achieve the desired effect. Instead, it results in
an unspecified status being returned to the system. It might be a
successful status, or an unsuccessful status. On the systems I use,
scripts that execute programs will often abort if the program returns an
unsuccessful status code. If there's nothing that needs to be brought to
the system's attention, use "return 0;", not "void main()".
Keith Thompson
2024-06-17 23:17:15 UTC
James Kuyper <***@alumni.caltech.edu> writes:
[...]
Post by James Kuyper
That's the problem - your "indication that there's no status
information" doesn't achieve the desired effect. Instead, it results in
an unspecified status being returned to the system.
That's the likely effect, but in fact "void main" causes the entire
program's behavior to be undefined (unless it's documented by the
current implementation).
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Janis Papanagnou
2024-06-18 05:09:34 UTC
Post by James Kuyper
Post by Janis Papanagnou
Post by Keith Thompson
Post by Janis Papanagnou
If I want a defined exit status (which is what I usually
want) I specify 'int main (...)' and provide an explicit
return statement (or exit() call).
Why would you ever not want a defined exit status, given that it's
easier to have one than not to have one?
Aren't we agreeing here? (The only difference is that you are
formulating in a negated form where I positively said the same.)
You implied, by saying "If I want a defined exit status", that there are
occasions where you don't want a defined exit status
...where I don't _need_ one. Yes.
E.g. in code like main() { printf("hello, world\n"); }
Post by James Kuyper
- and he's
questioning that. Things that are undefined are seldom useful.
I disagree. If things are undefined it _may_ just not matter.
If things matter they should not (ideally never) be undefined.
Post by James Kuyper
If the
exit status is undefined, it might be a failure status. In many
contexts, that would cause no problems, but there's also places where it
would.
Exactly.
Post by James Kuyper
...
Post by Janis Papanagnou
Well, to indicate that there's no status information or that
it's irrelevant. E.g. as was the case in the test fragment I
posted.
That's the problem - your "indication that there's no status
information" doesn't achieve the desired effect. Instead, it results in
an unspecified status being returned to the system. It might be a
successful status, or an unsuccessful status. On the systems I use,
scripts that execute programs will often abort if the program returns an
unsuccessful status code. If there's nothing that needs to be brought to
the system's attention, use "return 0;", not "void main()".
In cases where the return status is a substantial part of the
external specification, yes. Return status is not an end in itself!
YMMV.

Janis
James Kuyper
2024-06-18 07:25:59 UTC
...
Post by Janis Papanagnou
Post by James Kuyper
You implied, by saying "If I want a defined exit status", that there are
occasions where you don't want a defined exit status
...where I don't _need_ one. Yes.
E.g. in code like main() { printf("hello, world\n"); }
Post by James Kuyper
- and he's
questioning that. Things that are undefined are seldom useful.
I disagree. If things are undefined it _may_ just not matter.
If things matter they should not (ideally never) be undefined.
Post by James Kuyper
If the
exit status is undefined, it might be a failure status. In many
contexts, that would cause no problems, but there's also places where it
would.
Exactly.
I was merely directly addressing your comment by pointing out that the
status returned was unspecified. While technically correct that's like
saying that a nuclear bomb could be used to light a match. It's
unspecified because, as Keith pointed out, the behavior of the entire
program is undefined.
Is there anything that a program could be written to do on your computer
that you would not like it to do? The C standard imposes no restrictions
on the behavior of a program created by translating code that has
undefined behavior, so a fully conforming implementation may legally
translate such code into a program with that behavior. If the
implementation you're using documents what it does with "void main()",
you can rely upon that documentation - but do you know for a fact that
it does document it?
When you deliberately write a program that you know to have undefined
behavior, what you are, in effect, telling the implementation to do is
create an executable that has any behavior it wants to give you. If
that's OK with you, there's no need to write a new program. Any existing
program already has behavior that such code could legally be
translated to, so you might as well just execute any arbitrary program
that's already been translated, and save yourself some time.
Keith Thompson
2024-06-18 09:57:12 UTC
Post by Janis Papanagnou
Post by James Kuyper
Post by Janis Papanagnou
Post by Keith Thompson
Post by Janis Papanagnou
If I want a defined exit status (which is what I usually
want) I specify 'int main (...)' and provide an explicit
return statement (or exit() call).
Why would you ever not want a defined exit status, given that it's
easier to have one than not to have one?
Aren't we agreeing here? (The only difference is that you are
formulating in a negated form where I positively said the same.)
You implied, by saying "If I want a defined exit status", that there are
occasions where you don't want a defined exit status
...where I don't _need_ one. Yes.
E.g. in code like main() { printf("hello, world\n"); }
That's been a syntax error since C99, which dropped the "implicit
int" rule. (Some compilers accept it by default to cater to old
code, if you don't ask them to enforce the current language rules.)
Post by Janis Papanagnou
Post by James Kuyper
- and he's
questioning that. Things that are undefined are seldom useful.
I disagree. If things are undefined it _may_ just not matter.
If things matter they should not (ideally never) be undefined.
But you have to go out of your way to make it undefined, by
violating the rules of the language. Why would you do that?
Sheer stubbornness?

In C90 and earlier, falling off the end of main caused an unspecified
exit value, which seems to be what you want for some reason.
In C99 and later, there is no valid way to do that.

<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf>
is a draft of the 2011 ISO C standard. Section 5.1.2.2.1,
"Program startup", covers what we've been discussing about main.
One more thing to know is that if a requirement uses the word
"shall" outside a constraint (such as the "shall" in paragraph 1),
violating that requirement causes undefined behavior.

You've gotten the idea, based on a badly out of date book, that C
gives you a way to leave the exit status unspecified. It doesn't.

[...]
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Keith Thompson
2024-06-17 23:15:50 UTC
Post by Janis Papanagnou
Post by Keith Thompson
Post by Janis Papanagnou
Post by Keith Thompson
[...] K&R at
least seems to say that 'void' can only be declared for the
return type of functions that do not return anything.
[...]
No version of C has ever permitted "void main" except when an
implementation documents and permits it. [...]
I cannot comment on main() being handled differently than
other C functions. I was just quoting my old copy of K&R.
First or second edition?
It's a translation of a "(c) 1977 Prentice Hall" original, with
no further edition mentioned, so it's probably the 1st edition?
K&R1, the first edition, was published in 1978. K&R2, based on the
then-new ANSI C standard, was published in 1988. The 1977 date is
odd, but you clearly have a translation of the first edition.

That book is of great historical interest, but it's *not* a good
source of information of C as it is now. I don't even know where
to find a C compiler for modern systems that supports that version
of the language. Even the second edition is out of date, but it's
a good start for learning modern C.

[...]
Post by Janis Papanagnou
Post by Keith Thompson
What's slightly weird about
it is that it can be defined in (at least) two different ways,
with or without argc and argv.
Frankly, I have no idea about the details of evolution of the
C language. The old K&R source I have I had considered a pain;
it provoked more questions than giving answers. And since then
C changed a lot. That's why I stay mostly conservative with C
and, if in doubt, just check things undogmatically with my compiler.
You've mentioned several things you have no idea about. Are you
interested in learning?
Post by Janis Papanagnou
Post by Keith Thompson
[...]
Post by Janis Papanagnou
If I want a defined exit status (which is what I usually
want) I specify 'int main (...)' and provide an explicit
return statement (or exit() call).
Why would you ever not want a defined exit status, given that it's
easier to have one than not to have one?
Aren't we agreeing here? (The only difference is that you are
formulating in a negated form where I positively said the same.)
We're not agreeing unless you've changed your mind.

You implied that there are cases where you don't want a defined exit
status. For me, there are no such cases. I would have to go out of my way
to avoid having a defined exit status, and if I used your method then
the behavior of my program, not just its exit status, would be
undefined.
Post by Janis Papanagnou
Post by Keith Thompson
(Since C99 an explicit
return or exit() is optional.) I can't think of any reason *at all*
to use "void main" in C with a hosted implementation. Can you?
Well, to indicate that there's no status information or that
it's irrelevant. E.g. as was the case in the test fragment I
posted.
Again, that's not what "void main" means. Unless the implementation
documents that it permits "void main" (Microsoft's C compiler is the
only one I know of that does so, and it's vague about the semantics),
"void main" makes exactly as much sense as "float main".

You can write "int main(void)" and omit the return statement.
You can write "int main(void)" and add "return 0;". You can write
"int main(void)" and add "return rand();" if you really want to.

[...]
Post by Janis Papanagnou
Whatever current C standards - and I'm not sure what ancient
'cc' is on my system and to what standard it complies - say,
Perhaps you should find out what your ancient "cc" does. What OS
are you on? Does "cc --version", "cc -V", or "man cc" give you
any meaningful information?
Post by Janis Papanagnou
if I specify an 'int' return type I also want a 'return' (or
exit()) - consider it as "code hygienics" - even if it's not
necessary according to more recent standards.
That's fine. A return statement or exit() call is unnecessary
in main() due to a special-case rule that was added in 1999 for
compatibility with C++. I don't particularly like that rule myself.
I choose to omit the return statement in small programs, but if
you want to add the "return 0;", I have absolutely no objection.
(I used to do that myself.) It even makes your code more portable
to old compilers that support C90. (tcc claims to support C99,
but it has a bug in this area.)
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Janis Papanagnou
2024-06-18 06:02:13 UTC
Post by Keith Thompson
You've mentioned several things you have no idea about.
(I mainly have no idea about the detailed differences across
the evolution of the "C" standard.) I explicitly stated that to
make it clear, since what's valid in "std X" (or
in legacy systems) may be void in "std Y".
Post by Keith Thompson
Are you interested in learning?
I think you are unnecessarily provocative, but I'm honestly
answering your question...

It depends. - Generally, I'm constantly learning of course
(there's no "off"-switch).

Concerning current "C"? - Clearly not in detailed changes.
(I'm aware that the density of folks here who have every C
standard version at hand and probably even consider the
latest version a bible (sort of) is high. That's fine.
But I'm not a [religious (sort of)] follower.)

"C" was never (thus still isn't) my "language of choice".
I programmed already in a couple other (better) languages
when I stumbled across C (in the 1980's). It has never been
a paragon of a "good" programming language to me. (YMMV)

In practice I switched _very early_ (as soon as it was
available) to C++, mainly because of the OO concepts that
I already knew from and used with Simula. The unreliable
"C" base of that language was still a nuisance. (It's not
surprising that "C" has later evolved in important parts.
But that ship has sailed [for me; maybe also more widely].)

I'm still (academically) interested in several questions
concerning the C programming language. That's one reason
why I'm raising or discussing topics here. It's private
interest. (It should be a clear indication of learning.)
Post by Keith Thompson
[...]
(I skip the part of your post that I just answered in a
reply to James.)
Post by Keith Thompson
[...]
Post by Janis Papanagnou
Whatever current C standards - and I'm not sure what ancient
'cc' is on my system and to what standard it complies - say,
Perhaps you should find out what your ancient "cc" does. What OS
are you on? Does "cc --version", "cc -V", or "man cc" give you
any meaningful information?
I'm on an (old) Unix system. The version of my GNU 'cc' is
usually not important for the things I'm doing. I've just
once (in the past year) used a '-std=...' switch to have
some specific behavior guaranteed or a feature available.
(I cannot use features of newer standards, year 2000+, but
that's unimportant for the things I'm using that language for.)

Professionally I used C only in the late 1980's for a short
period of time.

Frankly, professionally I do other things than programming
[in C or else] these days.

Given my age, and in the light of what I outlined above,
don't expect to convince me with "expert details" of C,
specifically newer C standards. (I'm sure they are very
important for younger folks that [want/have to] use C in
their professional context.)

Quite a few folks here seem to be of a similar age to me;
I'm astonished there's so much eagerness concerning the
"C" language. :-)
Post by Keith Thompson
[...]
Janis
Keith Thompson
2024-06-19 02:07:48 UTC
Keith Thompson <Keith.S.Thompson+***@gmail.com> writes:
[...]
Post by Keith Thompson
That's fine. A return statement or exit() call is unnecessary
in main() due to a special-case rule that was added in 1999 for
compatibility with C++. I don't particularly like that rule myself.
I choose to omit the return statement in small programs, but if
you want to add the "return 0;", I have absolutely no objection.
(I used to do that myself.) It even makes your code more portable
to old compilers that support C90. (tcc claims to support C99,
but it has a bug in this area.)
A minor point: The latest unreleased version of tcc appears to fix this
bug. In tcc 0.9.27, falling off the end of main (defined as "int
main(void)") returns some random status. In the latest version, it
returns 0, based on a quick experiment and a cursory examination of the
generated object code. (tcc doesn't have an option to generate an
assembly listing; I used "tcc -c" followed by "objdump -d".)
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
David Brown
2024-06-19 07:50:45 UTC
Permalink
Post by Keith Thompson
[...]
Post by Keith Thompson
That's fine. A return statement or exit() call is unnecessary
in main() due to a special-case rule that was added in 1999 for
compatibility with C++. I don't particularly like that rule myself.
I choose to omit the return statement in small programs, but if
you want to add the "return 0;", I have absolutely no objection.
(I used to do that myself.) It even makes your code more portable
to old compilers that support C90. (tcc claims to support C99,
but it has a bug in this area.)
A minor point: The latest unreleased version of tcc appears to fix this
bug. In tcc 0.9.27, falling off the end of main (defined as "int
main(void)") returns some random status. In the latest version, it
returns 0, based on a quick experiment and a cursory examination of the
generated object code. (tcc doesn't have an option to generate an
assembly listing; I used "tcc -c" followed by "objdump -d".)
Godbolt has support for tcc, which might be convenient if you want to
look at its output.

<https://godbolt.org/z/5hK7PbGbj>
Keith Thompson
2024-06-19 20:13:19 UTC
Permalink
Post by David Brown
Post by Keith Thompson
[...]
Post by Keith Thompson
That's fine. A return statement or exit() call is unnecessary
in main() due to a special-case rule that was added in 1999 for
compatibility with C++. I don't particularly like that rule myself.
I choose to omit the return statement in small programs, but if
you want to add the "return 0;", I have absolutely no objection.
(I used to do that myself.) It even makes your code more portable
to old compilers that support C90. (tcc claims to support C99,
but it has a bug in this area.)
A minor point: The latest unreleased version of tcc appears to fix
this bug. In tcc 0.9.27, falling off the end of main (defined as
"int main(void)") returns some random status. In the latest version,
it returns 0, based on a quick experiment and a cursory examination
of the generated object code. (tcc doesn't have an option to
generate an assembly listing; I used "tcc -c" followed by "objdump
-d".)
Godbolt has support for tcc, which might be convenient if you want to
look at its output.
<https://godbolt.org/z/5hK7PbGbj>
If anyone cares, I found the git commit where this was fixed:

commit 3b9c3fd1860ceaa5684d5837455084707a7848c9
Author: Michael Matz <***@suse.de>
Date: 2018-11-03 22:17:20 +0100

Fix noreturn in main()

ISO C requires 'main' falling through the end without explicit
returns to implicitely return 0 (if declared as returning int).

The most recent release of tcc is 0.9.27, released 2017-12-17.
But the git repo (<git://repo.or.cz/tinycc.git>, mirror at
<https://github.com/TinyCC/tinycc>) has updates as recently as
2024-03-22. Post-0.9.27 updates are on the "mob" branch.

Godbolt's "TCC (trunk)" is built from the latest version in git.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
James Kuyper
2024-06-17 06:21:22 UTC
Permalink
Post by Janis Papanagnou
Post by Keith Thompson
[...] K&R at
least seems to say that 'void' can only be declared for the
return type of functions that do not return anything.
[...]
No version of C has ever permitted "void main" except when an
implementation documents and permits it. [...]
I cannot comment on main() being handled differently than
other C functions. I was just quoting my old copy of K&R.
It is handled differently. Your own functions can be declared in a wide
variety of ways, so long as the declaration that is relevant to function
designator in a function call is compatible with the definition of the
function that it designates.
C standard library functions can only be declared in ways compatible
with the specifications in the C standard.
main(), on the other hand, is unique, in that you have two incompatible
choices of how to define it, and an implementation can designate
additional choices. You can define main() in any way compatible with one
of the options supported by your implementation; but portable code
should define it only in one of the two ways specified by the C standard.
K&R is long obsolete; up-to-date drafts of the standard that are almost
identical to the latest version of the standard are free and easily
available.
Post by Janis Papanagnou
I don't understand what you mean with "no version of C has
ever permitted", given that my C compiler doesn't complain.
He wrote "No version of C has ever permitted "void main" except when an
implementation documents and permits it." Note that he is talking about
versions of the standard, not versions of any particular implementation
of C. If your C compiler "documents and permits" "void main", then it
certainly shouldn't complain about it. However, since the C standard
does not mandate support for void main, you've no guarantee of
portability of code that uses void main to other implementations of C.
Janis Papanagnou
2024-06-17 07:22:58 UTC
Permalink
Post by James Kuyper
Post by Janis Papanagnou
Post by Keith Thompson
[...] K&R at
least seems to say that 'void' can only be declared for the
return type of functions that do not return anything.
[...]
No version of C has ever permitted "void main" except when an
implementation documents and permits it. [...]
I cannot comment on main() being handled differently than
other C functions. I was just quoting my old copy of K&R.
It is handled differently. Your own functions can be declared in a wide
variety of ways, so long as the declaration that is relevant to function
designator in a function call is compatible with the definition of the
function that it designates.
C standard library functions can only be declared in ways compatible
with the specifications in the C standard.
main(), on the other hand, is unique, in that you have two incompatible
choices of how to define it, and an implementation can designate
additional choices. You can define main() in any way compatible with one
of the options supported by your implementation; but portable code
should define it only in one of the two ways specified by the C standard.
K&R is long obsolete; up-to-date drafts of the standard that are almost
identical to the latest version of the standard are free and easily
available.
Post by Janis Papanagnou
I don't understand what you mean with "no version of C has
ever permitted", given that my C compiler doesn't complain.
He wrote "No version of C has ever permitted "void main" except when an
implementation documents and permits it." Note that he is talking about
versions of the standard, not versions of any particular implementation
of C. If your C compiler "documents and permits" "void main", then it
certainly shouldn't complain about it. However, since the C standard
does not mandate support for void main, you've no guarantee of
portability of code that uses void main to other implementations of C.
Re portability: Of course there are other requirements for portable
and generally for professional code. I wrote a lot more professional
code in C++ than in C but the same requirements hold. Defining the
version of the standard, the supported platforms, activation of high
warning levels - we wanted our code free of warnings! -, and whatnot.

Janis
Michael S
2024-06-16 08:11:34 UTC
Permalink
On Sun, 16 Jun 2024 07:41:38 +0200
Post by Janis Papanagnou
Post by Keith Thompson
Post by Janis Papanagnou
Post by Janis Papanagnou
Post by Lawrence D'Oliveiro
Post by Michael S
If you want to preserve your sanity, never use fscanf().
Quoth the man page
It is very difficult to use these functions correctly, and
it is preferable to read entire lines with fgets(3) or
getline(3) and parse them later with sscanf(3) or more
specialized functions such as strtol(3).
This would be also my first impulse, but you'd have to know
_in advance_ how long the data stream would be; the function
requires an existing buffer. So you'd anyway need a stepwise
input. [...]
Would it be sensible to have a malloc()'ed buffer used for the
first fgets() and then subsequent fgets() work on the realloc()'ed
part? I suppose the previously set data in the malloc area would
be retained so that there's no re-composition of cut numbers
necessary?
Sure. "The contents of the new object shall be the same as that of
the old object prior to deallocation, up to the lesser of the new
and old sizes."
Keep in mind that you can't call realloc() on a non-null pointer
that wasn't allocated by an allocation function.
Thanks. - I've just tried it with this ad hoc test code
#include <stdlib.h>
#include <stdio.h>
void main (int argc, char * argv[])
{
int chunk = 10;
int bufsize = chunk+1;
char * buf = malloc(bufsize);
char * anchor = buf;
while (fgets(buf, chunk+1, stdin) != NULL)
if (realloc(anchor, bufsize += chunk) != NULL)
buf += chunk;
puts(anchor);
}
Not sure what this code is supposed to do.
However it looks unlikely that it does what you meant for it to do.
I recommend to read the [f*****g] manual.
https://cplusplus.com/reference/cstdio/fgets/
https://cplusplus.com/reference/cstdlib/realloc/
Post by Janis Papanagnou
I wonder whether it can be simplified by making malloc() obsolete
and using realloc() in a redesigned loop.
Janis
Janis Papanagnou
2024-06-16 09:07:22 UTC
Permalink
Post by Michael S
Not sure what this code is supposed to do.
Not sure what your comment is supposed to tell me.
Post by Michael S
However it looks unlikely that it does what you meant for it to do.
I recommend to read the [f*****g] manual.
https://cplusplus.com/reference/cstdio/fgets/
https://cplusplus.com/reference/cstdlib/realloc/
I don't need the Web to access man pages. Thanks.

Janis
Michael S
2024-06-16 09:38:50 UTC
Permalink
On Sun, 16 Jun 2024 11:07:22 +0200
Post by Janis Papanagnou
Post by Michael S
Not sure what this code is supposed to do.
Not sure what your comment is supposed to tell me.
I hoped that after reading the manual you would know.
But it obviously didn't work out.
So, I'll tell you a little more:
1. It does not read one line of arbitrary length
2. There is more than one mistake
3. All mistakes seem to be caused by deep misconceptions about fgets()
and realloc().
Post by Janis Papanagnou
Post by Michael S
However it looks unlikely that it does what you meant for it to do.
I recommend to read the [f*****g] manual.
https://cplusplus.com/reference/cstdio/fgets/
https://cplusplus.com/reference/cstdlib/realloc/
I don't need the Web to access man pages. Thanks.
Janis
Maybe not the Web in general, but this particular site is more pleasant to
read than a typical *nix man page.
Besides, I don't know how it looks on your man page, but if it is
similar to the one in the link below then apart from relevant info it contains
a few blah-blah paragraphs.
https://www.man7.org/linux/man-pages/man3/fgets.3p.html
Janis Papanagnou
2024-06-16 10:03:21 UTC
Permalink
Post by Michael S
On Sun, 16 Jun 2024 11:07:22 +0200
Post by Janis Papanagnou
Post by Michael S
Not sure what this code is supposed to do.
Not sure what your comment is supposed to tell me.
I hoped that after reading the manual you would know.
The point is that I inspected these manuals before I wrote the
test code, and after some initial mistakes it ran as expected.
Post by Michael S
But it obviously didn't work out.
1. It does not read one line of arbitrary length
No, it does not do that. I wrote the test program as one way
to circumvent the problem reading data of arbitrary length.
The relevant text of my previous post was:

| Would it be sensible to have a malloc()'ed buffer used for the first
| fgets() and then subsequent fgets() work on the realloc()'ed part? I
| suppose the previously set data in the malloc area would be retained
| so that there's no re-composition of cut numbers necessary?

Its intention was to read chunks of data to construct the
buffer subsequently, starting from a small buffer instance and
let it grow as more data (from the single external data line)
are read. The intention was to avoid having to wastefully
specify a too large buffer from the beginning, yet not being
sure it suffices.

(And just to make sure; "arbitrary" length is of course meant
to be in the range of available memory, neither exa-byte, nor
"unlimited" was meant.)
Post by Michael S
2. There is more than one mistake
I'm sure that's possible with an "ad hoc test code". Only your
comment is meaningless without pointing me to any issue you see.
Post by Michael S
3. All mistakes seem to be caused by deep misconceptions about fgets()
and realloc().
Again; which ones?

Janis
Post by Michael S
[...]
Michael S
2024-06-16 11:31:43 UTC
Permalink
On Sun, 16 Jun 2024 12:03:21 +0200
Post by Janis Papanagnou
Again; which ones?
Janis
[...]
The main misconceptions are about what is returned by fgets() and by
realloc().
fgets() returns its first argument in all cases except EOF or FS read
error. That includes the case when the buffer is too short to
accommodate full input line.
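To make that concrete, here is a minimal sketch (the helper name read_chunk is hypothetical, and it assumes text input without embedded NUL characters). A non-NULL return from fgets() only says that something was read; whether the chunk ends a line has to be checked separately, for example by looking for the newline:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Reads one chunk with fgets() and reports what happened:
 *   -1  end of file or read error, nothing was stored
 *    1  the chunk ends with '\n', i.e. it completes a line
 *    0  no newline seen: the buffer filled up first, or the
 *       input ended without a trailing newline
 */
int read_chunk(char *buf, size_t size, FILE *in)
{
    assert(buf != NULL && in != NULL);
    assert(size >= 2 && size <= INT_MAX);

    if (fgets(buf, (int)size, in) == NULL)
        return -1;
    return strchr(buf, '\n') != NULL;
}
```

A caller that wants whole lines keeps calling this and appending until it sees 1 or -1.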

With realloc() there are two issues:
1. It can sometimes return its first argument, but does not have to. In
scenario like yours it will return a different pointer quite soon.
2. When realloc() returns NULL, it does not de-allocate its first
argument.

The second case, of course, is not important in practice, because in
practice you're very unlikely to see realloc() returning NULL, and if
nevertheless it did happen, your program is unlikely to survive and give
meaningful result anyway. Still, here on c.l.c we like to pretend that
we can meaningfully handle allocation failures.
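Both realloc() points can be covered by the usual temporary-pointer idiom. The following is only a sketch: grow_buffer is a hypothetical name, and it assumes the requested size is non-zero.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, for illustration only.
 * Resizes *bufp to new_size bytes.
 * Returns 0 on success and -1 on failure. On failure *bufp is left
 * unchanged and still points to a valid block, because realloc()
 * does not free the old block when it returns NULL.
 */
int grow_buffer(char **bufp, size_t new_size)
{
    assert(bufp != NULL && new_size > 0);

    char *tmp = realloc(*bufp, new_size);
    if (tmp == NULL)
        return -1;   /* old block untouched, caller still owns it */

    *bufp = tmp;     /* may differ from the old address; the old
                        pointer value must not be used any more */
    return 0;
}
```

Any other pointer into the old block (a saved "anchor", a write position) has to be recomputed from the new base after a successful call, which is why keeping an offset is usually easier than keeping a second pointer.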
Janis Papanagnou
2024-06-16 15:37:34 UTC
Permalink
Post by Michael S
On Sun, 16 Jun 2024 12:03:21 +0200
Post by Janis Papanagnou
Again; which ones?
Janis
[...]
The main misconceptions are about what is returned by fgets() and by
realloc().
fgets() returns its first argument in all cases except EOF or FS read
error. That includes the case when the buffer is too short to
accommodate full input line.
fgets() returns s on success, and NULL on error or when end
of file occurs while no characters have been read.

I am interested in the success case, thus "fgets() != NULL".
The buffer size is controlled by the second function parameter.
Post by Michael S
1. It can sometimes return its first argument, but does not have to. In
scenario like yours it will return a different pointer quite soon.
2. When realloc() returns NULL, it does not de-allocate its first
argument.
The second case, of course, is not important in practice, because in
practice you're very unlikely to see realloc() returning NULL, and if
nevertheless it did happen, your program is unlikely to survive and give
meaningful result anyway. Still, here on c.l.c we like to pretend that
we can meaningfully handle allocation failures.
You may provide "correct" code (if you think mine is wrong). Or just
inspect how it behaves; here's the output after two printf's added:

$ printf "123 456 789 101112 77 88 99 101 999" | realloc
+++>123 456 78<+++
===>123 456 78<===
+++>9 101112 7<+++
===>123 456 789 101112 7<===
+++>7 88 99 10<+++
===>123 456 789 101112 77 88 99 10<===
+++>1 999<+++
===>123 456 789 101112 77 88 99 101 999<===
123 456 789 101112 77 88 99 101 999

The +++data+++ is the chunk read, and the ===data=== is the overall
buffer content, and the final line again the result (as before, with
the added newline as documented for puts()).

Even though my code is just an "ad hoc test code" to demonstrate the
procedure I outlined - and as such test code certainly lacking quite
some error handling and much more - it does exactly what I intended it
to do, and what I've implemented according to what I read in the man
pages. I cannot see any "misconception", it does what was _intended_.

There's indeed one point that I _deliberately_ ignored for the test
code; actually the point you mentioned as "not important in practice".

Again: You may provide "correct" code (if you think mine is "wrong"),
or "better" code, usable for production instead of a test code.

But your tone and statements were (as observed so often) inadequate;
I quote from your post that started the subthread:
"However it looks unlikely that it does what you meant for it to do."

It does exactly what I meant to do (as you can see in the logs above).

Janis
Michael S
2024-06-16 21:45:33 UTC
Permalink
On Sun, 16 Jun 2024 17:37:34 +0200
Post by Janis Papanagnou
Post by Michael S
On Sun, 16 Jun 2024 12:03:21 +0200
Post by Janis Papanagnou
Again; which ones?
Janis
[...]
The main misconceptions are about what is returned by fgets() and by
realloc().
fgets() returns its first argument in all cases except EOF or FS
read error. That includes the case when the buffer is too short to
accommodate full input line.
fgets() returns s on success, and NULL on error or when end
of file occurs while no characters have been read.
I am interested in the success case, thus "fgets() != NULL".
The buffer size is controlled by the second function parameter.
Post by Michael S
1. It can sometimes return its first argument, but does not have
to. In scenario like yours it will return a different pointer quite
soon. 2. When realloc() returns NULL, it does not de-allocate its
first argument.
The second case, of course, is not important in practice, because in
practice you're very unlikely to see realloc() returning NULL, and
if nevertheless it did happen, your program is unlikely to survive
and give meaningful result anyway. Still, here on c.l.c we like to
pretend that we can meaningfully handle allocation failures.
You may provide "correct" code (if you think mine is wrong). Or just
$ printf "123 456 789 101112 77 88 99 101 999" | realloc
+++>123 456 78<+++
===>123 456 78<===
+++>9 101112 7<+++
===>123 456 789 101112 7<===
+++>7 88 99 10<+++
===>123 456 789 101112 77 88 99 10<===
+++>1 999<+++
===>123 456 789 101112 77 88 99 101 999<===
123 456 789 101112 77 88 99 101 999
The +++data+++ is the chunk read, and the ===data=== is the overall
buffer content, and the final line again the result (as before, with
the added newline as documented for puts()).
Even though my code is just an "ad hoc test code" to demonstrate the
procedure I outlined - and as such test code certainly lacking quite
some error handling and much more - it does exactly what I intended it
to do, and what I've implemented according to what I read in the man
pages. I cannot see any "misconception", it does what was _intended_.
There's indeed one point that I _deliberately_ ignored for the test
code; actually the point you mentioned as "not important in practice".
Again: You may provide "correct" code (if you think mine is "wrong"),
or "better" code, usable for production instead of a test code.
But your tone and statements were (as observed so often) inadequate;
"However it looks unlikely that it does what you meant for it to do."
Did you consider that, maybe, your understanding of the C library is
inadequate?
Post by Janis Papanagnou
It does exactly what I meant to do (as you can see in the logs above).
Janis
The only thing I see in your logs is that your testing skills are on par
with your coding skills.

I am not quite sure what your code is supposed to do.
However, my impression was that we wanted to read text file line by
line and to process lines separately. Your code certainly does not do
it. As I said above it contains several mistakes, including one that is
particularly serious and hard to diagnose - a use of de-allocated
buffer.

Below is an example of how to do it correctly.
It's not the only possible method, but any correct method would have
similar complexity.


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
const int sz_incr = 10;
size_t bufsz = 32;
char* buffer = malloc(bufsz);
if (!buffer) {
perror("malloc()");
return 1;
}

size_t rdi = 0;
int err = 0;
for (size_t line_i = 1; ;) {
buffer[bufsz-1] = 1; // set guard
if (!fgets(&buffer[rdi], bufsz-rdi, stdin))
break; // eof or error

if (buffer[bufsz-1] != 0 || buffer[bufsz-2] == '\n') {
// Full line - we can process it here
// As an example, let's print our line as hex
printf("%10zu:", line_i);
size_t len = (char*)memchr(&buffer[rdi], '\n',
  bufsz-rdi-1)+1-buffer;
for (size_t i = 0; i < len; ++i)
printf(" %02x", (unsigned char)buffer[i]);
printf("\n");
rdi = 0;
++line_i;
} else {
// line is longer than bufsz-1
rdi = bufsz-1;
bufsz += sz_incr;
char* tmp = realloc(buffer, bufsz);
if (!tmp) {
perror("realloc()");
err = 1;
break;
}
buffer = tmp;
}
}

free(buffer);
if (ferror(stdin)) {
perror("fgets(stdin)");
err = 2;
}
return err;
}
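For the original question (a file of whitespace-separated integers), the "process it here" step in the program above could hand the complete line to strtol(), as the man page quoted earlier in the thread suggests. A rough sketch; parse_longs is a hypothetical name and the policy of stopping at the first bad token is just one possible choice:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, for illustration only.
 * Parses up to max whitespace-separated decimal integers from a
 * NUL-terminated line and stores them in out[].
 * Stops at the end of the line, at the first token that is not a
 * number, or at the first value that does not fit in a long.
 * Returns the number of values stored.
 */
size_t parse_longs(const char *line, long *out, size_t max)
{
    assert(line != NULL && out != NULL);

    size_t n = 0;
    while (n < max) {
        char *end;
        errno = 0;
        long v = strtol(line, &end, 10);
        if (end == line)       /* no digits consumed: done or junk */
            break;
        if (errno == ERANGE)   /* value out of range for long */
            break;
        out[n++] = v;
        line = end;            /* continue after the parsed number */
    }
    return n;
}
```

Unlike the "%d" conversion of fscanf(), this makes overflow and non-numeric input detectable by the caller.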
Keith Thompson
2024-06-17 00:06:10 UTC
Permalink
Janis Papanagnou <janis_papanagnou+***@hotmail.com> writes:
[...]
Post by DFS
#include <stdlib.h>
#include <stdio.h>
void main (int argc, char * argv[])
{
int chunk = 10;
int bufsize = chunk+1;
char * buf = malloc(bufsize);
char * anchor = buf;
while (fgets(buf, chunk+1, stdin) != NULL)
if (realloc(anchor, bufsize += chunk) != NULL)
buf += chunk;
puts(anchor);
}
realloc() can return the pointer you pass to it if there's enough room
in the existing location. (Or it can relocate the buffer even if there
is enough room.)

But if realloc() moves the buffer (copying the existing data to it), it
returns a pointer to the new location and invalidates the old one. You
discard the new pointer, only comparing it to NULL.

Perhaps you assumed that realloc() always expands the buffer in place.
It doesn't.

If the above program worked for you, I suspect that either realloc()
never relocated the buffer, or you continued using the original buffer
(and beyond) after realloc() invalidated it.

The worst consequence of undefined behavior is having your code appear
to "work".
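For comparison, here is a sketch of the same chunk-wise idea with the result of realloc() kept and the write position tracked as an offset instead of a second pointer. The name slurp and the chunk size are arbitrary choices for this illustration, and it assumes text input without embedded NUL characters:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Reads everything from `in` into one malloc'ed, NUL-terminated
 * string, growing the buffer by CHUNK bytes whenever it is full.
 * Returns NULL on allocation failure or read error; otherwise the
 * caller must free() the result.
 */
char *slurp(FILE *in)
{
    enum { CHUNK = 10 };
    size_t len = 0;            /* characters stored so far */
    size_t cap = CHUNK + 1;    /* current buffer size */
    char *buf = malloc(cap);

    assert(in != NULL);
    if (buf == NULL)
        return NULL;
    buf[0] = '\0';

    /* cap - len is always at least 2 here, so fgets() can make progress */
    while (fgets(buf + len, (int)(cap - len), in) != NULL) {
        len += strlen(buf + len);
        if (cap - len < 2) {   /* full: only the NUL slot is left */
            char *tmp = realloc(buf, cap + CHUNK);
            if (tmp == NULL) {
                free(buf);     /* old block is still ours to free */
                return NULL;
            }
            buf = tmp;         /* use the new address from now on */
            cap += CHUNK;
        }
    }
    if (ferror(in)) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

Because the next write position is computed as buf + len after every realloc(), it stays valid even when the block moves.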
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Tim Rentsch
2024-06-17 05:40:48 UTC
Permalink
Post by Keith Thompson
The worst consequence of undefined behavior is having your code
appear to "work".
Personally I think causing a missile launch that started a
worldwide thermonuclear war would be a worse consequence.
YMMV.
Janis Papanagnou
2024-06-17 05:52:25 UTC
Permalink
Post by Tim Rentsch
Post by Keith Thompson
The worst consequence of undefined behavior is having your code
appear to "work".
Personally I think causing a missile launch that started a
worldwide thermonuclear war would be a worse consequence.
YMMV.
I think I wouldn't code a missile control system in "C". ;-)

Janis
DFS
2024-06-17 13:45:36 UTC
Permalink
Post by Janis Papanagnou
Post by Tim Rentsch
Post by Keith Thompson
The worst consequence of undefined behavior is having your code
appear to "work".
Personally I think causing a missile launch that started a
worldwide thermonuclear war would be a worse consequence.
YMMV.
I think I wouldn't code a missile control system in "C". ;-)
Janis
Per "Google AI Overview": "In 1987, the Department of Defense mandated
that Ada be the standard programming language for Defense computer
resources used in military command and control systems."
Chris M. Thomasson
2024-06-17 20:16:27 UTC
Permalink
Post by DFS
Post by Janis Papanagnou
Post by Tim Rentsch
Post by Keith Thompson
The worst consequence of undefined behavior is having your code
appear to "work".
Personally I think causing a missile launch that started a
worldwide thermonuclear war would be a worse consequence.
YMMV.
I think I wouldn't code a missile control system in "C". ;-)
Janis
Per "Google AI Overview": "In 1987, the Department of Defense mandated
that Ada be the standard programming language for Defense computer
resources used in military command and control systems."
Check this out:

JOINT STRIKE FIGHTER
AIR VEHICLE
C++ CODING STANDARDS

https://www.stroustrup.com/JSF-AV-rules.pdf

;^)
DFS
2024-06-17 21:07:41 UTC
Permalink
Post by Chris M. Thomasson
Post by DFS
Post by Janis Papanagnou
Post by Tim Rentsch
Post by Keith Thompson
The worst consequence of undefined behavior is having your code
appear to "work".
Personally I think causing a missile launch that started a
worldwide thermonuclear war would be a worse consequence.
YMMV.
I think I wouldn't code a missile control system in "C". ;-)
Janis
Per "Google AI Overview": "In 1987, the Department of Defense mandated
that Ada be the standard programming language for Defense computer
resources used in military command and control systems."
JOINT STRIKE FIGHTER
AIR VEHICLE
C++ CODING STANDARDS
https://www.stroustrup.com/JSF-AV-rules.pdf
;^)
Scary.

I want to add a new AV Rule:

* The Joint Strike Fighter Air Vehicle C++ Coding Standards document
will not leave out Rules 161 and 172.
Keith Thompson
2024-06-17 22:48:00 UTC
Permalink
[...]
Post by DFS
Post by Chris M. Thomasson
JOINT STRIKE FIGHTER
AIR VEHICLE
C++ CODING STANDARDS
https://www.stroustrup.com/JSF-AV-rules.pdf
* The Joint Strike Fighter Air Vehicle C++ Coding Standards document
will not leave out Rules 161 and 172.
To save readers some time, the numbering in that document skips
rules 161 and 172. The obvious explanation (which I haven't
confirmed) is that an earlier version had rules with those numbers
that were dropped, and renumbering the other rules was impractical
because there were existing references to them.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Scott Lurndal
2024-06-17 22:48:21 UTC
Permalink
Post by DFS
Post by Chris M. Thomasson
Post by DFS
Post by Janis Papanagnou
Post by Tim Rentsch
Post by Keith Thompson
The worst consequence of undefined behavior is having your code
appear to "work".
Personally I think causing a missile launch that started a
worldwide thermonuclear war would be a worse consequence.
YMMV.
I think I wouldn't code a missile control system in "C". ;-)
Janis
Per "Google AI Overview": "In 1987, the Department of Defense mandated
that Ada be the standard programming language for Defense computer
resources used in military command and control systems."
JOINT STRIKE FIGHTER
AIR VEHICLE
C++ CODING STANDARDS
https://www.stroustrup.com/JSF-AV-rules.pdf
;^)
Scary.
It's useful to note that these rules were published two decades ago.
Keith Thompson
2024-06-17 22:44:13 UTC
Permalink
Post by DFS
Post by Janis Papanagnou
Post by Tim Rentsch
Post by Keith Thompson
The worst consequence of undefined behavior is having your code
appear to "work".
Personally I think causing a missile launch that started a
worldwide thermonuclear war would be a worse consequence.
YMMV.
I think I wouldn't code a missile control system in "C". ;-)
Janis
Per "Google AI Overview": "In 1987, the Department of Defense mandated
that Ada be the standard programming language for Defense computer
resources used in military command and control systems."
Please don't post AI-based misinformation.

The DOD Ada mandate was introduced in 1991, and effectively dropped in 1997.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
David Brown
2024-06-18 13:00:16 UTC
Permalink
Post by Keith Thompson
Post by DFS
Post by Janis Papanagnou
Post by Tim Rentsch
Post by Keith Thompson
The worst consequence of undefined behavior is having your code
appear to "work".
Personally I think causing a missile launch that started a
worldwide thermonuclear war would be a worse consequence.
YMMV.
I think I wouldn't code a missile control system in "C". ;-)
Janis
Per "Google AI Overview": "In 1987, the Department of Defense mandated
that Ada be the standard programming language for Defense computer
resources used in military command and control systems."
Please don't post AI-based misinformation.
The DOD Ada mandate was introduced in 1991, and effectively dropped in 1997.
And of course the USA DoD (I assume, when DFS failed to mention the
country, he meant the USA) is only for one country. There are a great
many other countries making missile control software around the world,
and I know without doubt that Ada is not mandated in all of them.

The open-source RTEMS "Real-Time Executive for Missile Systems" RTOS is
written in a mix of C and Ada, and supports at least C, Ada and C++ for
user code.
Janis Papanagnou
2024-06-18 04:57:06 UTC
Permalink
Post by DFS
Post by Janis Papanagnou
I think I wouldn't code a missile control system in "C". ;-)
Per "Google AI Overview": "In 1987, the Department of Defense mandated
that Ada be the standard programming language for Defense computer
resources used in military command and control systems."
This is actually what I'd have expected, that they might prefer Ada
(as in the aviation or space flight areas).

There are jokes[*] that illustrate the dilemma with safety in
engineering. Using specific (unsafe) languages as well as reducing
funds for QA measures or externalizing processes and components (or
many other possible sources of increased unreliability); there are
quite a few tragic examples, sadly.

Janis

[*] e.g.
https://www.reddit.com/r/Jokes/comments/6a1czd/a_group_of_engineering_professors_were_invited_to/
Tim Rentsch
2024-06-18 07:25:19 UTC
Permalink
Post by Janis Papanagnou
Post by Tim Rentsch
Post by Keith Thompson
The worst consequence of undefined behavior is having your code
appear to "work".
Personally I think causing a missile launch that started a
worldwide thermonuclear war would be a worse consequence.
YMMV.
I think I wouldn't code a missile control system in "C". ;-)
It's extremely unlikely that I will ever be working on a missile
control system, either in C or in any other language. But that
doesn't change either the truth or the point of my comment.
James Kuyper
2024-06-17 06:38:16 UTC
Permalink
Post by Tim Rentsch
Post by Keith Thompson
The worst consequence of undefined behavior is having your code
appear to "work".
Personally I think causing a missile launch that started a
worldwide thermonuclear war would be a worse consequence.
YMMV.
The standard doesn't say anything to prohibit such a consequence, but in
real life such an outcome is possible only if your program is executing
in an environment that allows it to send out launch messages to real
missiles. In such a context, a program that was intended to launch a
missile strike, and seemed to do so, but actually failed to do so, would
arguably be worse. If the enemy knows that you are running such
defective software, that enemy might not be deterred from attacking.
Tim Rentsch
2024-06-19 00:01:07 UTC
Permalink
Post by James Kuyper
Post by Tim Rentsch
Post by Keith Thompson
The worst consequence of undefined behavior is having your code
appear to "work".
Personally I think causing a missile launch that started a
worldwide thermonuclear war would be a worse consequence.
YMMV.
The standard doesn't say anything to prohibit such a consequence, but in
real life such an outcome is possible only if your program is executing
in an environment that allows it to send out launch messages to real
missiles. In such a context, a program that was intended to launch a
missile strike, and seemed to do so, but actually failed to do so, would
arguably be worse. If the enemy knows that you are running such
defective software, that enemy might not be deterred from attacking.
None of that has any relevance to my comment.
Janis Papanagnou
2024-06-17 05:48:39 UTC
Permalink
Post by Keith Thompson
[...]
Post by DFS
#include <stdlib.h>
#include <stdio.h>
void main (int argc, char * argv[])
{
int chunk = 10;
int bufsize = chunk+1;
char * buf = malloc(bufsize);
char * anchor = buf;
while (fgets(buf, chunk+1, stdin) != NULL)
if (realloc(anchor, bufsize += chunk) != NULL)
buf += chunk;
puts(anchor);
}
realloc() can return the pointer you pass to it if there's enough room
in the existing location. (Or it can relocate the buffer even if there
is enough room.)
But if realloc() moves the buffer (copying the existing data to it), it
returns a pointer to the new location and invalidates the old one. You
discard the new pointer, only comparing it to NULL.
Perhaps you assumed that realloc() always expands the buffer in place.
It doesn't.
No, I didn't assume that. I just missed that 'anchor' will get lost.
Thanks!
Post by Keith Thompson
If the above program worked for you, I suspect that either realloc()
never relocated the buffer, or you continued using the original buffer
(and beyond) after realloc() invalidated it. [...]
Yes, that was certainly the case. (I did no thorough test with large
data sets, just a simple ad hoc test.)
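
For reference, the pattern Keith describes (keep whatever realloc() returns, and only overwrite your pointer once you know the call succeeded) can be sketched roughly like this. The helper name append_bytes() and the doubling growth policy are assumptions made up for illustration, not anything from the posted code:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append n bytes from src to a growing, always
   '\0'-terminated buffer. Returns 0 on success, -1 if realloc() fails
   (in which case *buf is untouched and still valid). */
static int append_bytes(char **buf, size_t *len, size_t *cap,
                        const char *src, size_t n)
{
    if (*len + n + 1 > *cap) {
        size_t newcap = *cap ? *cap * 2 : 16;
        while (newcap < *len + n + 1)
            newcap *= 2;

        char *tmp = realloc(*buf, newcap);  /* may move the block */
        if (tmp == NULL)
            return -1;                      /* old pointer is still good */
        *buf = tmp;                         /* adopt the (possibly new) address */
        *cap = newcap;
    }
    memcpy(*buf + *len, src, n);
    *len += n;
    (*buf)[*len] = '\0';
    return 0;
}
```

Start with a NULL buffer and zero length and capacity; realloc(NULL, n) behaves like malloc(n), so no separate first allocation is needed.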


Elsethread I suggested merging the malloc() with the realloc() call.
The resulting code would be simpler (and might address that problem).

int chunk = 10;
int bufsize = 1;
char * anchor = NULL;
while ((anchor = realloc (anchor, bufsize += chunk)) != NULL &&
fgets (anchor+bufsize-chunk-1, chunk+1, stdin) != NULL)
;
puts (anchor);


Do you see the exposed problem (or any other issues) here, too?

Janis
Keith Thompson
2024-06-17 06:29:29 UTC
Permalink
Post by Janis Papanagnou
Post by Keith Thompson
[...]
Post by DFS
#include <stdlib.h>
#include <stdio.h>
void main (int argc, char * argv[])
{
int chunk = 10;
int bufsize = chunk+1;
char * buf = malloc(bufsize);
char * anchor = buf;
while (fgets(buf, chunk+1, stdin) != NULL)
if (realloc(anchor, bufsize += chunk) != NULL)
buf += chunk;
puts(anchor);
}
realloc() can return the pointer you pass to it if there's enough room
in the existing location. (Or it can relocate the buffer even if there
is enough room.)
But if realloc() moves the buffer (copying the existing data to it), it
returns a pointer to the new location and invalidates the old one. You
discard the new pointer, only comparing it to NULL.
Perhaps you assumed that realloc() always expands the buffer in place.
It doesn't.
No, I didn't assume that. I just missed that 'anchor' will get lost.
Thanks!
Post by Keith Thompson
If the above program worked for you, I suspect that either realloc()
never relocated the buffer, or you continued using the original buffer
(and beyond) after realloc() invalidated it. [...]
Yes, that was certainly the case. (I did no thorough test with large
data sets, just a simple ad hoc test.)
Elsethread I suggested to merge the malloc() with the realloc() call.
The resulting code would be simpler (and might address that problem).
int chunk = 10;
int bufsize = 1;
char * anchor = NULL;
while ((anchor = realloc (anchor, bufsize += chunk)) != NULL &&
fgets (anchor+bufsize-chunk-1, chunk+1, stdin) != NULL)
;
puts (anchor);
Do you see the exposed problem (or any other issues) here, too?
If stdin is empty, you never store anything in the buffer and
puts(anchor) has undefined behavior because there might not be a terminating
'\0'. If the first realloc() fails, anchor is a null pointer and again
puts(anchor) has undefined behavior.

If nothing goes wrong, puts() adds an extra newline to the output.

That's all that jumped out at me looking at the code, but did you test
it with multi-line input? When I tried it it printed only the first
line of input (followed by that extra newline).

I'm still not entirely sure what the code is supposed to do.

```
$ ( echo one ; echo two ) | ./janis
one

$
```
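
One possible way to address the points above is sketched below. It is not Janis's code: read_line() is a hypothetical helper, and it advances by the number of characters fgets() actually stored rather than by a full chunk. That difference appears to be why the posted loop shows only the first line, since a short line leaves its '\0' in the middle of the buffer. Empty input and a failed realloc() both return NULL instead of reaching puts().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one line of any length from fp.
   Returns a malloc()'ed string without the trailing newline, or NULL
   at end of input or if realloc() fails. The caller free()s the result. */
static char *read_line(FILE *fp)
{
    enum { CHUNK = 10 };          /* same small chunk size as the posted code */
    char *line = NULL;
    size_t len = 0;

    for (;;) {
        char *tmp = realloc(line, len + CHUNK + 1);
        if (tmp == NULL) {
            free(line);
            return NULL;
        }
        line = tmp;

        if (fgets(line + len, CHUNK + 1, fp) == NULL) {
            if (len == 0) {       /* nothing read at all: end of input */
                free(line);
                return NULL;
            }
            line[len] = '\0';     /* last line had no newline */
            return line;
        }

        len += strlen(line + len);   /* advance by what was really read */
        if (len > 0 && line[len - 1] == '\n') {
            line[len - 1] = '\0';
            return line;
        }
    }
}
```

Calling it in a loop until it returns NULL handles multi-line input; printing each result with puts() then adds exactly one newline per line.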
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Janis Papanagnou
2024-06-17 07:35:07 UTC
Permalink
Post by Keith Thompson
Post by Janis Papanagnou
[...]
Elsethread I suggested to merge the malloc() with the realloc() call.
The resulting code would be simpler (and might address that problem).
int chunk = 10;
int bufsize = 1;
char * anchor = NULL;
while ((anchor = realloc (anchor, bufsize += chunk)) != NULL &&
fgets (anchor+bufsize-chunk-1, chunk+1, stdin) != NULL)
;
puts (anchor);
Do you see the exposed problem (or any other issues) here, too?
If stdin is empty, you never store anything in the buffer and
puts(anchor) has undefined behavior because there might not be a terminating
'\0'. If the first realloc() fails, anchor is a null pointer and again
puts(anchor) has undefined behavior.
If nothing goes wrong, puts() adds an extra newline to the output.
Yes, the purpose of puts() was to show whether the function that I
wanted to check works properly on a long line of data.
Post by Keith Thompson
That's all that jumped out at me looking at the code, but did you test
it with multi-line input? When I tried it it printed only the first
line of input (followed by that extra newline).
I'm still not entirely sure what the code is supposed to do.
I just wanted the realloc-append logic codified and verified. (As
a possible building block to create a function for the purpose of
the original task outlined by the OP.)

Janis
Malcolm McLean
2024-06-16 07:19:28 UTC
Permalink
Post by Janis Papanagnou
Post by Lawrence D'Oliveiro
Post by Michael S
If you want to preserve your sanity, never use fscanf().
It is very difficult to use these functions correctly, and it is
preferable to read entire lines with fgets(3) or getline(3) and
parse them later with sscanf(3) or more specialized functions such
as strtol(3).
This would be also my first impulse, but you'd have to know
_in advance_ how long the data stream would be; the function
requires an existing buffer. So you'd anyway need a stepwise
input. On the plus side there's maybe a better performance
to read large buffer chunks and compose them on demand? But
a problem is the potential cut of the string of a number; it
requires additional clumsy handling. So it might anyway be
better (i.e. much more convenient) to use fscanf() ?
Janis
Try this (pseudo-code):

tempfp = tmpfile();

while ((ch = fgetc(fp)) != EOF)
{
if (isdigit(ch))
{
ungetc(ch, fp);
x = parsenumber(fp); /* parsenumber() is a placeholder, not a library function */
fwrite(&x, sizeof(int), 1, tempfp);
N++;
}
}

answer = malloc(N * sizeof(int));
fseek(tempfp, 0, SEEK_SET);
fread(answer, sizeof(int), N, tempfp);
reverse(answer, N);
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
Michael S
2024-06-16 07:44:25 UTC
Permalink
On Sun, 16 Jun 2024 05:41:12 +0200
Post by Janis Papanagnou
Post by Lawrence D'Oliveiro
Post by Michael S
If you want to preserve your sanity, never use fscanf().
It is very difficult to use these functions correctly, and it is
preferable to read entire lines with fgets(3) or getline(3) and
parse them later with sscanf(3) or more specialized functions
such as strtol(3).
This would be also my first impulse, but you'd have to know
_in advance_ how long the data stream would be; the function
requires an existing buffer.
Define formats with sensible maximal line length (512 sounds about
right) and refuse any input that has longer lines.
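
A rough sketch of that approach, for what it is worth: fgets() into a fixed buffer, reject any line over the limit, then walk the line with strtol(). The function name read_numbers(), the fixed-size output array and the -1 error convention are assumptions for illustration only.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 512   /* assumed limit, taken from the suggestion above */

/* Hypothetical helper: read whitespace-separated ints from fp into out[].
   Returns the number stored, or -1 on an over-long line, a malformed
   token, an out-of-range value, or more than max values. */
static long read_numbers(FILE *fp, int *out, size_t max)
{
    char line[MAX_LINE + 2];              /* room for '\n' and '\0' */
    size_t n = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        size_t len = strlen(line);
        if (len == sizeof line - 1 && line[len - 1] != '\n')
            return -1;                    /* line longer than MAX_LINE */

        char *p = line;
        for (;;) {
            char *end;
            errno = 0;
            long v = strtol(p, &end, 10);
            if (end == p)
                break;                    /* no further number on this line */
            if (errno == ERANGE || v < INT_MIN || v > INT_MAX || n == max)
                return -1;
            out[n++] = (int)v;
            p = end;
        }

        p += strspn(p, " \t\r\n");        /* only whitespace may remain */
        if (*p != '\0')
            return -1;                    /* junk that is not a number */
    }
    return (long)n;
}
```

Unlike a bare fscanf() loop, this reports "12 x 3" as an error instead of silently stopping at the x.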
Post by Janis Papanagnou
So you'd anyway need a stepwise
input. On the plus side there's maybe a better performance
to read large buffer chunks and compose them on demand? But
a problem is the potential cut of the string of a number; it
requires additional clumsy handling. So it might anyway be
better (i.e. much more convenient) to use fscanf() ?
No, the behaviour of fsacnf() is too non-intuitive.
Post by Janis Papanagnou
Janis
Janis Papanagnou
2024-06-16 09:13:35 UTC
Permalink
Post by Michael S
On Sun, 16 Jun 2024 05:41:12 +0200
Post by Janis Papanagnou
Post by Lawrence D'Oliveiro
Post by Michael S
If you want to preserve your sanity, never use fscanf().
It is very difficult to use these functions correctly, and it is
preferable to read entire lines with fgets(3) or getline(3) and
parse them later with sscanf(3) or more specialized functions
such as strtol(3).
This would be also my first impulse, but you'd have to know
_in advance_ how long the data stream would be; the function
requires an existing buffer.
Define formats with sensible maximal line length (512 sounds about
right) and refuse any input that has longer lines.
You're not serious, are you? - Or wasn't it clear that it was
about reading lines (of arbitrary lengths) in one go?
Post by Michael S
Post by Janis Papanagnou
So you'd anyway need a stepwise
input. On the plus side there's maybe a better performance
to read large buffer chunks and compose them on demand? But
a problem is the potential cut of the string of a number; it
requires additional clumsy handling. So it might anyway be
better (i.e. much more convenient) to use fscanf() ?
No, the behaviour of fsacnf() is too non-intuitive.
Maybe fsacnf() is non-intuitive.

Myself, I've never had problems with fscanf(), though. - To each
his own. :-)

Janis
DFS
2024-06-16 15:03:15 UTC
Permalink
Post by Michael S
On Sat, 15 Jun 2024 15:36:22 -0400
If you want to preserve your sanity, never use fscanf().
ha!

You misspelled C.
Lew Pitcher
2024-06-16 15:52:16 UTC
Permalink
Post by DFS
47 185 99 74 202 118 78 203 264 207 19 17 34 167 148 54 297 271 118 245
294 188 140 134 251 188 236 160 48 189 228 94 74 27 168 275 144 245 178
108 152 197 125 185 63 272 239 60 242 56 4 235 244 144 69 195 32 4 54 79
193 282 173 267 8 40 241 152 285 119 259 136 15 83 21 78 55 259 137 297
15 141 232 259 285 300 153 16 4 207 95 197 188 267 164 195 7 104 47 291
1 opens the file
2 fscanf thru the file to count the number of data points
3 allocate memory
4 rewind and fscanf again to add the data to the int array
Any issues with this method?
Others have (and will continue to) address this question.
Post by DFS
Any 'better' way?
Not so much "better", as "other".

ISTM that you waste an opportunity (and expose a common 'blind-spot')
in your first loop:

Here,
Post by DFS
while(fscanf(datafile, "%d", &j) != EOF){
N++;
}
you discard a lot of work (done for you by fscanf() to determine the
value of each input number) just to be able to count the number of
numbers in your input. What if there were a way to put this (to you)
byproduct of fscanf() to use, and avoid using fscanf() entirely in
the second pass?

You /could/ create a temporary, binary, file, and write the fscanf()'ed
values to it as part of the first loop. Once the first loop completes,
you rewind this temporary file, and load your integer array by reading
the (now converted to native integer format) values from that file.

Still two passes, but using fscanf() in only one of those passes.

(BTW, the 'blind-spot' I mentioned is that we often forget that
we /can/ use temporary files to store intermediary results. Sometimes
we can manipulate a temporary file easier than we can manipulate
malloc()ed (or other) storage. )
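
For comparison only, here is a single-pass sketch that keeps the converted values as they arrive. Note that it swaps the temporary file for a growing malloc()ed array, so it is a different technique from the one suggested above; load_ints() and the starting capacity of 64 are made-up choices.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: load every int from fp in one pass.
   Stores the number of values in *count. Returns NULL if nothing was
   read or if allocation failed; otherwise the caller free()s the array. */
static int *load_ints(FILE *fp, size_t *count)
{
    int *nums = NULL, value;
    size_t n = 0, cap = 0;

    /* "== 1" stops at end of file and also at non-numeric input.
       "!= EOF" would loop forever on a stray letter, because fscanf()
       returns 0 there without consuming the offending character. */
    while (fscanf(fp, "%d", &value) == 1) {
        if (n == cap) {
            size_t newcap = cap ? cap * 2 : 64;
            int *tmp = realloc(nums, newcap * sizeof *tmp);
            if (tmp == NULL) {
                free(nums);
                *count = 0;
                return NULL;
            }
            nums = tmp;
            cap = newcap;
        }
        nums[n++] = value;
    }
    *count = n;
    return nums;
}
```

Doubling the capacity keeps the number of realloc() calls logarithmic in the input size, at the cost of up to half the final block being unused.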
--
Lew Pitcher
"In Skills We Trust"
Malcolm McLean
2024-06-16 16:17:52 UTC
Permalink
Post by Lew Pitcher
Post by DFS
47 185 99 74 202 118 78 203 264 207 19 17 34 167 148 54 297 271 118 245
294 188 140 134 251 188 236 160 48 189 228 94 74 27 168 275 144 245 178
108 152 197 125 185 63 272 239 60 242 56 4 235 244 144 69 195 32 4 54 79
193 282 173 267 8 40 241 152 285 119 259 136 15 83 21 78 55 259 137 297
15 141 232 259 285 300 153 16 4 207 95 197 188 267 164 195 7 104 47 291
1 opens the file
2 fscanf thru the file to count the number of data points
3 allocate memory
4 rewind and fscanf again to add the data to the int array
Any issues with this method?
Others have (and will continue to) address this question.
Post by DFS
Any 'better' way?
Not so much "better", as "other".
ISTM that you waste an opportunity (and expose a common 'blind-spot')
Here,
Post by DFS
while(fscanf(datafile, "%d", &j) != EOF){
N++;
}
you discard a lot of work (done for you by fscanf() to determine the
value of each input number) just to be able to count the number of
numbers in your input. What if there were a way to put this (to you)
byproduct of fscanf() to use, and avoid using fscanf() entirely in
the second pass?
You /could/ create a temporary, binary, file, and write the fscanf()'ed
values to it as part of the first loop. Once the first loop completes,
you rewind this temporary file, and load your integer array by reading
the (now converted to native integer format) values from that file.
Still two passes, but using fscanf() in only one of those passes.
(BTW, the 'blind-spot' I mentioned is that we often forget that
we /can/ use temporary files to store intermediary results. Sometimes
we can manipulate a temporary file easier than we can manipulate
malloc()ed (or other) storage. )
Exactly. People complain that it is a hassle to realloc() a buffer.

So just use a temporary file.
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
Lawrence D'Oliveiro
2024-06-18 08:06:40 UTC
Permalink
(BTW, the 'blind-spot' I mentioned is that we often forget that we /can/
use temporary files to store intermediary results. Sometimes we can
manipulate a temporary file easier than we can manipulate malloc()ed (or
other) storage. )
Or we could use an “I/O stream”, basically a temporary file in RAM.
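
If "I/O stream" here means something like POSIX open_memstream() (that reading is an assumption), a sketch could look like the following. The helper name load_ints_memstream() is hypothetical, and the final cast assumes the stream's buffer comes from the malloc() family and is therefore suitably aligned for int.

```c
#define _POSIX_C_SOURCE 200809L   /* open_memstream() is POSIX, not ISO C */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: collect ints from fp through an in-memory stream.
   Returns NULL if nothing was read or on failure; otherwise the caller
   free()s the returned array. */
static int *load_ints_memstream(FILE *fp, size_t *count)
{
    assert(fp != NULL && count != NULL);

    char *bytes = NULL;
    size_t size = 0;
    int value, ok = 1;
    FILE *mem = open_memstream(&bytes, &size);

    *count = 0;
    if (mem == NULL)
        return NULL;

    while (ok && fscanf(fp, "%d", &value) == 1)
        ok = (fwrite(&value, sizeof value, 1, mem) == 1);

    /* bytes and size are only meaningful after a successful fclose() */
    if (fclose(mem) != 0)
        return NULL;
    if (!ok || size == 0) {
        free(bytes);
        return NULL;
    }

    *count = size / sizeof(int);
    return (int *)bytes;          /* assumes malloc-family alignment */
}
```

This keeps the "write binary values to a stream" shape of the temporary-file idea while never touching the file system.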
Lew Pitcher
2024-06-25 17:37:24 UTC
Permalink
Post by Lew Pitcher
Post by DFS
47 185 99 74 202 118 78 203 264 207 19 17 34 167 148 54 297 271 118 245
294 188 140 134 251 188 236 160 48 189 228 94 74 27 168 275 144 245 178
108 152 197 125 185 63 272 239 60 242 56 4 235 244 144 69 195 32 4 54 79
193 282 173 267 8 40 241 152 285 119 259 136 15 83 21 78 55 259 137 297
15 141 232 259 285 300 153 16 4 207 95 197 188 267 164 195 7 104 47 291
1 opens the file
2 fscanf thru the file to count the number of data points
3 allocate memory
4 rewind and fscanf again to add the data to the int array
[snip]
Post by Lew Pitcher
You /could/ create a temporary, binary, file, and write the fscanf()'ed
values to it as part of the first loop. Once the first loop completes,
you rewind this temporary file, and load your integer array by reading
the (now converted to native integer format) values from that file.
Still two passes, but using fscanf() in only one of those passes.
[snip]

For what it's worth, here's an example of what I suggest:

/*
The following code provides two examples of the approach I suggested.

Example 1: while counting input numbers, write temp file with int values
malloc() a buffer big enough for that count of int values
fread() the temp file into the malloc()'ed buffer
Note: conformant to ISO Standard C.

Example 2: while counting input numbers, write temp file with int values
mmap() the temp file, starting at the beginning, and sized to
include all the int values in the file.
Note: conformant to POSIX C extensions to ISO Standard C.

Note: compile with -DUSE_MMAP to obtain mmap() variant, otherwise
this will compile the malloc()/fread() variant
*/

#include <stdio.h>
#include <stdlib.h>

#ifdef USE_MMAP
#include <sys/mman.h>
#define BANNER "Example of array loading using mmap()"
#define FREEALLOC(x)
#else
#define BANNER "Example of array loading using malloc() and fread()"
#define FREEALLOC(x) free((x))
#endif

static int *LoadIntArray(FILE *fp, size_t *Count);

int main(void)
{
int status = EXIT_FAILURE, *array;
size_t count;

puts(BANNER);

if ((array = LoadIntArray(stdin,&count)))
{
printf("%zu elements loaded\n",count);
for (size_t index = 0; index < count; ++index)
printf("array[%3zu] == %d\n",index,array[index]);

FREEALLOC(array); /* if necessary, free() the malloc()'ed array */
status = EXIT_SUCCESS;
}
return status;
}

static int *LoadIntArray(FILE *fp,size_t *Count)
{
FILE *tmp;
int *array = NULL;
size_t count = 0;

if ((tmp = tmpfile()))
{
int buffer;

for (count = 0; fscanf(fp,"%d",&buffer) == 1; ++count)
fwrite(&buffer,sizeof buffer, 1,tmp);

if (count)
{
#ifdef USE_MMAP
/*
** USE mmap() to map temp_file data into process memory
** (flush first so the buffered fwrite() data is really in the file)
*/
fflush(tmp);
array = mmap(NULL,
count * sizeof *array,
PROT_READ,MAP_PRIVATE,
fileno(tmp),
0);
if (array == MAP_FAILED)
{
array = NULL;
fprintf(stderr,"FAIL: Cannot mmap %zu element array\n",count);
}
#else
/*
** USE malloc() to reserve a big enough heap-space buffer,
** then fread() the temp_file data into that buffer
*/
if ((array = malloc(count * sizeof *array)))
{
rewind(tmp);
if (fread(array,sizeof *array,count,tmp) != count)
{
free(array);
array = NULL;
fprintf(stderr,"FAIL: Cannot load %zu element array\n",count);
}
}
else fprintf(stderr,"FAIL: Cannot malloc() %zu element array\n",count);
#endif
}
fclose(tmp);

}
else fprintf(stderr,"FAIL: Cannot allocate temporary work file\n");

*Count = count; /* byproduct value that caller might find useful */
return array; /* either NULL (on failure) or pointer to array */
}
--
Lew Pitcher
"In Skills We Trust"
DFS
2024-06-25 18:09:23 UTC
Permalink
Post by Lew Pitcher
Post by Lew Pitcher
Post by DFS
47 185 99 74 202 118 78 203 264 207 19 17 34 167 148 54 297 271 118 245
294 188 140 134 251 188 236 160 48 189 228 94 74 27 168 275 144 245 178
108 152 197 125 185 63 272 239 60 242 56 4 235 244 144 69 195 32 4 54 79
193 282 173 267 8 40 241 152 285 119 259 136 15 83 21 78 55 259 137 297
15 141 232 259 285 300 153 16 4 207 95 197 188 267 164 195 7 104 47 291
1 opens the file
2 fscanf thru the file to count the number of data points
3 allocate memory
4 rewind and fscanf again to add the data to the int array
[snip]
Post by Lew Pitcher
You /could/ create a temporary, binary, file, and write the fscanf()'ed
values to it as part of the first loop. Once the first loop completes,
you rewind this temporary file, and load your integer array by reading
the (now converted to native integer format) values from that file.
Still two passes, but using fscanf() in only one of those passes.
[snip]
/*
The following code provides two examples of the approach I suggested.
Example 1: while counting input numbers, write temp file with int values
malloc() a buffer big enough for that count of int values
fread() the temp file into the malloc()'ed buffer
Note: conformant to ISO Standard C.
Example 2: while counting input numbers, write temp file with int values
mmap() the temp file, starting at the beginning, and sized to
include all the int values in the file.
Note: conformant to POSIX C extensions to ISO Standard C.
Note: compile with -DUSE_MMAP to obtain mmap() variant, otherwise
this will compile the malloc()/fread() variant
*/
#include <stdio.h>
#include <stdlib.h>
#ifdef USE_MMAP
#include <sys/mman.h>
#define BANNER "Example of array loading using mmap()"
#define FREEALLOC(x)
#else
#define BANNER "Example of array loading using malloc() and fread()"
#define FREEALLOC(x) free((x))
#endif
static int *LoadIntArray(FILE *fp, size_t *Count);
int main(void)
{
int status = EXIT_FAILURE, *array;
size_t count;
puts(BANNER);
if ((array = LoadIntArray(stdin,&count)))
{
printf("%zu elements loaded\n",count);
for (size_t index = 0; index < count; ++index)
printf("array[%3zu] == %d\n",index,array[index]);
FREEALLOC(array); /* if necessary, free() the malloc()'ed array */
status = EXIT_SUCCESS;
}
return status;
}
static int *LoadIntArray(FILE *fp,size_t *Count)
{
FILE *tmp;
int *array = NULL;
size_t count = 0;
if ((tmp = tmpfile()))
{
int buffer;
for (count = 0; fscanf(fp,"%d",&buffer) == 1; ++count)
fwrite(&buffer,sizeof buffer, 1,tmp);
if (count)
{
#ifdef USE_MMAP
/*
** USE mmap() to map temp_file data into process memory
*/
array = mmap(NULL,
count * sizeof *array,
PROT_READ,MAP_PRIVATE,
fileno(tmp),
0);
if (array == MAP_FAILED)
{
array = NULL;
fprintf(stderr,"FAIL: Cannot mmap %zu element array\n",count);
}
#else
/*
** USE malloc() to reserve a big enough heap-space buffer,
** then fread() the temp_file data into that buffer
*/
if ((array = malloc(count * sizeof *array)))
{
rewind(tmp);
if (fread(array,sizeof *array,count,tmp) != count)
{
free(array);
array = NULL;
fprintf(stderr,"FAIL: Cannot load %zu element array\n",count);
}
}
else fprintf(stderr,"FAIL: Cant malloc() %zu element array\n",count);
#endif
}
fclose(tmp);
}
else fprintf(stderr,"FAIL: Cannot allocate temporary work file\n");
*Count = count; /* byproduct value that caller might find useful */
return array; /* either NULL (on failure) or pointer to array */
}
$ gcc -Wall LewPitcher_readnums.c -o lprn
$ (no compile errors)
$ ./lprn nums.txt
Example of array loading using malloc() and fread()

(then it just hung)


$ gcc -Wall LewPitcher_readnums.c -o lprn -DUSE_MMAP
$ (no compile errors)
$ ./lprn nums.txt
Example of array loading using mmap()

(then it just hung)


Am I supposed to hardcode the filename in there somewhere?
Lew Pitcher
2024-06-25 18:11:10 UTC
Permalink
[snip]
Post by DFS
$ gcc -Wall LewPitcher_readnums.c -o lprn
$ (no compile errors)
$ ./lprn nums.txt
Example of array loading using malloc() and fread()
The program (both versions) take input from stdin.
Try
./lprn <nums.txt
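
If you would rather pass the filename as an argument, one small change is to pick the input stream up front and hand it to LoadIntArray() in place of stdin. A minimal sketch, assuming main() is changed to take argc and argv; open_input() is a made-up helper name:

```c
#include <stdio.h>

/* Hypothetical helper: use argv[1] as the input file when one is given,
   otherwise fall back to stdin. Returns NULL if the file cannot be
   opened (after reporting the reason on stderr). */
static FILE *open_input(int argc, char *argv[])
{
    if (argc < 2)
        return stdin;

    FILE *fp = fopen(argv[1], "r");
    if (fp == NULL)
        perror(argv[1]);
    return fp;
}
```

With that in place both "./lprn nums.txt" and "./lprn <nums.txt" would work; remember to fclose() the stream only when it is not stdin.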
--
Lew Pitcher
"In Skills We Trust"