Post by Michael S
For starter, it looks like designers of fgets() did not believe in
their own motto about files being just streams of bytes.
They obviously did, which is exactly why they painstakingly preserved
the annoying line terminators in the returned data.
Post by Michael S
I don't know the history, so, may be, the function was defined this way
for portability with systems where text files have special record-based
structure?
You are sliding into muddled thinking here.
Post by Michael S
Then, everything about it feels inelegant.
A return value carries just 1 bit of information, success or failure.
Why would you assert a claim for which the standard library alone
is replete with counterexamples: getchar, malloc, getenv, pow, sin.
Did you mean /the/ return value (of fgets)?
Post by Michael S
So why did they encode this information in baroque way instead of
something obvious, 0 and 1?
Because you can express this concept:
  char work_area[SIZE];
  char *line;
  while ((line = fgets(work_area, sizeof work_area, stream)))
  {
    /* process line */
  }
The work_area just provides storage for the operation: line is the
returned line.
The loop would work even if fgets sometimes returned pointers that
are not to the first byte of work_area. It just so happens that
they always are.
It is meaningful to capture the returned value and work with
it as if it were distinct from the buffer.
Post by Michael S
Appending zero at the end also feels like a hack, but it is necessary
because of the main problem.
Appending zero is necessary so that the result meets the definition
of a C character string, without which it cannot be passed into
string-manipulating functions like strlen.
Home-grown functions that resemble fgets, but forget to add a null
byte sometimes, are the subjects of security CVEs.
Post by Michael S
And the main problem is: how the user is
supposed to figure out how many bytes were read?
Yes, how are they, if you take away the null byte?
Post by Michael S
In well-designed API this question should be answered in O(1) time.
In the context of C strings, that buys you almost nothing.
Even if you know the length, it's going to get measured numerous
more times.
It would be good if fgets nuked the terminating newline.
Many uses of fgets, after every operation, look for the newline
and nuke it, before doing anything else.
There is a nice idiom for that, by the way, which avoids a
temporary variable and an if test:
  line[strcspn(line, "\n")] = 0;
strcspn(line, "\n") calculates the length of the prefix of line
which consists of non-newlines. That value is precisely the
array index of the first newline, if there is one, or else
of the terminating null, if there isn't a newline. Either
way, you can clobber that with a null byte.
Once you see the above, you will never do this again:
  newline = strchr(line, '\n');
  if (newline)
    *newline = 0;
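For anyone who wants to try the strcspn idiom on both cases (newline present, newline absent), here is a minimal sketch. The helper name chomp is made up for illustration; it is not a standard function. As a side effect it hands back the length, since strcspn has already computed it.

```c
#include <string.h>

/* Hypothetical helper (not a standard function): strip the first newline,
   if any, from a null-terminated line in place, and return the length of
   what remains. */
size_t chomp(char *line)
{
    /* Index of the first '\n', or of the terminating null if there is none. */
    size_t len = strcspn(line, "\n");

    /* Overwrites the newline, or rewrites the null that was already there. */
    line[len] = '\0';
    return len;
}
```

Calling chomp on "hello\n" leaves "hello" and returns 5; calling it on a line with no newline (a truncated or final unterminated line) leaves the text unchanged and returns its length.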
Post by Michael S
With fgets(), it can be answered in O(N) time when input is trusted to
contain no zeros.
We have decided in the C world that text does not contain zeros.
This has become so pervasive that the remaining naysayers can safely
be regarded as part of a lunatic fringe.
Software that tries to support the presence of raw nulls in text is
actively harmful for security.
For instance, a piece of text with embedded nulls might have valid
overall syntax which makes it immune to an injection attack.
But when it is sent to another piece of software which interprets
the null as a terminator, the syntax is chopped in half, allowing
it to be completed by a malicious actor.
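To make the chopping concrete, here is a small sketch of the mismatch between a length-aware producer and a null-terminated consumer. The helper name visible_length and the sample payload in the note below are invented for illustration; they do not come from any real protocol.

```c
#include <string.h>

/* Hypothetical helper: given a buffer holding n bytes of "text", report how
   many of those bytes a consumer that stops at the first null would see. */
size_t visible_length(const char *data, size_t n)
{
    const char *nul = memchr(data, '\0', n);

    /* No embedded null: both sides agree on the length. */
    return nul ? (size_t)(nul - data) : n;
}
```

With the 22-byte payload "user=guest\0;role=admin", the length-aware side sees all 22 bytes, while visible_length (and strlen) report 10: everything after "user=guest" is invisible to the second program. A producer that rejects input whenever visible_length(data, n) != n sidesteps the disagreement.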
Post by Michael S
When input is arbitrary, finding out the answer is
even harder and requires quirks.
When input is arbitrary, don't use fgets? It's for text.
Post by Michael S
The function foo() is more generic than fgets(). For use instead of
fgets() it should be accompanied by standard constant EOL_CHAR.
I am not completely satisfied with proposed solution. The API is
still less obvious than it could be. But it is much better than fgets().
If last_c is '\n', you're still writing the pesky newline that
the caller will often want to remove.
Adding a terminating null and returning a pointer to that null
would be better.
You could then call the operation again with the returned dst
pointer, and it would continue extending the string,
without obliterating the last character.
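A rough sketch of what I mean, under stated assumptions: the name read_until, the (dst, end) parameter pair, and the decision to consume the delimiter without storing it are all my own choices for illustration, not the proposed foo() and not any standard function. Status signaling (delimiter seen versus EOF versus buffer full) is left out to keep it short.

```c
#include <stdio.h>

/* Hypothetical sketch: read bytes from stream into [dst, end) until delim
   is consumed, EOF is hit, or only room for the null byte remains.
   The delimiter is consumed but not stored.
   Precondition: dst < end, so there is always room for the null.
   Always null-terminates; returns a pointer to that terminating null. */
char *read_until(char *dst, char *end, int delim, FILE *stream)
{
    int c;

    /* Keep one byte in reserve for the terminator. */
    while (end - dst > 1 && (c = getc(stream)) != EOF && c != delim)
        *dst++ = (char)c;

    *dst = '\0';
    return dst;
}
```

Because the return value points at the null, passing it back in as dst on the next call appends to the same string: reading "ab\ncd\n" twice with '\n' as the delimiter leaves "abcd" in the buffer. The length is also available in O(1) as the returned pointer minus the start of the buffer.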
I'm sure I've seen a foo-like function in software before:
reading delimited by an arbitrary byte, with length signaling.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca