Discussion:
Baby X is born again
Malcolm McLean
2024-06-11 09:13:13 UTC
Permalink
I've finally got Baby X (not the resource compiler, the Windows toolkit)
to link X11 on my Mac. And I can start work on it again. But it was far
from easy to get it to link.

Can friendly people please download it and see if it compiles on other
platforms?
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
bart
2024-06-11 12:59:23 UTC
Permalink
Post by Malcolm McLean
I've finally got Baby X (not the resource compiler, the Windows toolkit)
to link X11 on my Mac. And I can start work on it again. But it was far
from easy to get it to link.
Can friendly people please download it and see if it compiles on other
platforms?
In src\windows, I tried compiling both main.c and testbed.c (not using
any makefiles, just directly applying a compiler).

They both call a startbabyx() function which is incompatible with the
one defined in BabyX.h.

It seems they pass an extra Hinstance first parameter. If I switch from
WinMain() to main(), then testbed.c compiles - by itself. (Note that
Windows GUI apps don't need WinMain; they work just as well with main.)

I tried another module BBX_Canvas.h. gcc complained about a type
mismatch (but it is 14.1 which is stricter).

tcc said it couldn't find windowsx.h.

My mcc had problems with UINT32 which doesn't seem to be defined
anywhere that I could see.
Malcolm McLean
2024-06-11 13:35:10 UTC
Permalink
Post by bart
Post by Malcolm McLean
I've finally got Baby X (not the resource compiler, the Windows
toolkit) to link X11 on my Mac. And I can start work on it again. But
it was far from easy to get it to link.
Can friendly people please download it and see if it compiles on other
platforms?
In src\windows, I tried compiling both main.c and testbed.c (not using
any makefiles, just directly applying a compiler).
They both call a startbabyx() function which is incompatible with the
one defined in BabyX.h.
It seems they pass an extra Hinstance first parameter. If I switch from
WinMain() to main(), then testbed.c compiles - by itself. (Note that
Windows GUI apps don't need WinMain; they work just as well with main.)
I tried another module BBX_Canvas.h. gcc complained about a type
mismatch (but it is 14.1 which is stricter).
tcc said it couldn't find windowsx.h.
My mcc had problems with UINT32 which doesn't seem to be defined
anywhere that I could see.
It's these little things which make all the difference. I doubt it needs
much to fix. But I'll need to be on Windows to do it. There were a
couple of tiny glitches getting it to compile on Mac, but the
main problem was linking X11. I got very frustrated with it, and I'm
supposedly a professional programmer. I'm afraid the average Mac user
wouldn't have a clue.
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
Kenny McCormack
2024-06-12 10:04:10 UTC
Permalink
In article <v49jqf$12ol8$***@dont-email.me>,
Malcolm McLean <***@gmail.com> wrote:
...
I'm afraid the average Mac user wouldn't have a clue.
That is true by definition. It is, after all, the computer for the rest of us.
--
"He is exactly as they taught in KGB school: an egoist, a liar, but talented - he
knows the mind of the wrestling-loving, under-educated, authoritarian-admiring
white male populous."
- Malcolm Nance, p59. -
Ben Bacarisse
2024-06-11 14:16:00 UTC
Permalink
I've finally got Baby X (not the resource compiler, the Windows toolkit) to
link X11 on my Mac. And I can start work on it again. But it was far from
easy to get it to link.
Can friendly people please download it and see if it compiles on other
platforms?
Compiles and the test programs run on Ubuntu 24.04 (LTS).
--
Ben.
Malcolm McLean
2024-06-11 14:35:32 UTC
Permalink
Post by Ben Bacarisse
I've finally got Baby X (not the resource compiler, the Windows toolkit) to
link X11 on my Mac. And I can start work on it again. But it was far from
easy to get it to link.
Can friendly people please download it and see if it compiles on other
platforms?
Compiles and the test programs run on Ubuntu 24.04 (LTS).
Oh brilliant. And, as the name implies, Linux is where the best users
will be. It was designed as a wrapper over X11. But then I decided to port
to Windows, and that meant I bit off more than I could chew, because
it's just too hard to work with two machines at the same time. So Baby
X was abandoned for a long time.
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
Ben Bacarisse
2024-06-11 23:34:43 UTC
Permalink
Post by Malcolm McLean
Post by Ben Bacarisse
I've finally got Baby X (not the resource compiler, the Windows toolkit) to
link X11 on my Mac. And I can start work on it again. But it was far from
easy to get it to link.
Can friendly people please download it and see if it compiles on other
platforms?
Compiles and the test programs run on Ubuntu 24.04 (LTS).
Oh brilliant.
My installation may not be typical in that I have a lot of -dev packages
installed, but it will be similar to those of others who develop software
--
Ben.
Malcolm McLean
2024-06-11 23:50:53 UTC
Permalink
Post by Ben Bacarisse
Post by Malcolm McLean
Post by Ben Bacarisse
I've finally got Baby X (not the resource compiler, the Windows toolkit) to
link X11 on my Mac. And I can start work on it again. But it was far from
easy to get it to link.
Can friendly people please download it and see if it compiles on other
platforms?
Compiles and the test programs run on Ubuntu 24.04 (LTS).
Oh brilliant.
My installation may not be typical in that I have a lot of -dev packages
installed, but it will be similar to those of others who develop software
I know.

Because I couldn't compile it for the Mac, and whilst I was working,
somehow I just never had the initiative to do it. But now I don't have
those calls on my time. And it was a complete nightmare trying to link a
program with X11 on the Mac: you need to download an app from a group
called XQuartz, and even then I couldn't work out which path I needed to
pass to the linker. It only works because I use CMake's "find_package"
function. And Baby X, as the name implies, is meant to keep everything
simple, and I'm just not achieving that.
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
Bonita Montero
2024-06-11 16:02:26 UTC
Permalink
Post by Malcolm McLean
I've finally got Baby X (not the resource compiler, the Windows toolkit)
to link X11 on my Mac. And I can start work on it again. But it was far
from easy to get it to link.
Can friendly people please download it and see if it compiles on other
platforms?
For large files it would be more convenient to have an .obj-output in
the proper format for Windows or Linux. I implemented a binary file to
char-array compiler myself, and for large files the compilation time was
totally intolerable; all the compilers I tested (g++, clang++, MSVC)
ran into out-of-memory conditions sooner or later, depending on the
size of the char array.
I don't really see a necessity to have conversion for image formats
with such a tool as the conversion isn't needed very frequently and
you could save the image in the proper format yourself.
Malcolm McLean
2024-06-11 16:15:11 UTC
Permalink
Post by Bonita Montero
Post by Malcolm McLean
I've finally got Baby X (not the resource compiler, the Windows
toolkit) to link X11 on my Mac. And I can start work on it again. But
it was far from easy to get it to link.
Can friendly people please download it and see if it compiles on other
platforms?
For large files it would be more convenient to have an .obj-output in
the proper format for Windows or Linux. I implemented a binary file to
char-array compiler myself, and for large files the compilation time was
totally intolerable; all the compilers I tested (g++, clang++, MSVC)
ran into out-of-memory conditions sooner or later, depending on the
size of the char array.
I don't really see a necessity to have conversion for image formats
with such a tool as the conversion isn't needed very frequently and
you could save the image in the proper format yourself.
These are Baby programs. But they use a cut down GUI. So they need to
get fonts and images into the program somehow. And so Baby X does that
by converting to 32 bit C arrays which can be compiled and linked as
normal. And for that, you need a tool. Writing a tiff file decoder is
not a trivial exercise.
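Roughly, an image ends up as something like this (a sketch only - the
names and exact layout here are illustrative, not the exact babyxrc
output):

/* Illustrative sketch: a 2x2 image decoded to 32-bit pixels,
   emitted as ordinary C that can be compiled and linked. */
const int face_width = 2;
const int face_height = 2;
const unsigned long face_pixels[4] = {
    0xFF0000FFUL, 0xFF00FF00UL,
    0xFFFF0000UL, 0xFFFFFFFFUL
};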
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
Bonita Montero
2024-06-12 05:40:04 UTC
Permalink
Post by Malcolm McLean
These are Baby programs. But they use a cut down GUI. So they need to
get fonts and images into the program somehow. And so Baby X does that
by converting to 32 bit C arrays which can be compiled and linked as
normal. And for that, you need a tool. Writing a tiff file decoder is
not a trivial exercise.
I converted my code into sth. that produces a C-string as an output.
Printing that is still very fast, i.e. the files produced are written
with about 2.6GiB/s. But the problem is still that all compilers don't
parse large files but quit with an out of memory error. So having a
.obj output along with a small header file would be the best.

That's my program called cdump:

#include <iostream>
#include <fstream>
#include <charconv>
#include <span>
#include <vector>
#include <cstdlib>      // EXIT_FAILURE

using namespace std;

int main( int argc, char **argv )
{
    using u = unsigned char;
    if( argc < 4 )
    {
        cout << "usage: " << argv[0] << " infile symbol outfile" << endl;
        return EXIT_FAILURE;
    }
    char const
        *inFile = argv[1],
        *symbol = argv[2],
        *outFile = argv[3];
    // slurp the whole input file into memory
    ifstream ifs;
    ifs.exceptions( ifstream::failbit | ifstream::badbit );
    ifs.open( inFile, ifstream::binary | ifstream::ate );
    streampos spSize( ifs.tellg() );
    if( spSize > (size_t)-1 )
    {
        cout << "file too large" << endl;
        return EXIT_FAILURE;
    }
    size_t size = (size_t)spSize;
    ifs.seekg( 0, ifstream::beg );
    union ndi { u c; ndi() {} };    // non-initializing element, avoids zero-fill
    vector<ndi> rawBytes( size );
    span<u> bytes( &rawBytes.data()->c, rawBytes.size() );
    ifs.read( (char *)bytes.data(), bytes.size() );
    // write the output through a 1 MiB buffer
    ofstream ofs;
    ofs.exceptions( ofstream::failbit | ofstream::badbit );
    ofs.open( outFile, ofstream::binary | ofstream::trunc );
    vector<ndi> rawBuf( 1ull << 20 );
    span<u> buf( &rawBuf.begin()->c, rawBuf.size() );
    ofs << "const unsigned char " << symbol << "[" << size + 1 << "] = \n";
    auto rd = bytes.begin();
    auto wrt = buf.begin();
    auto flush = [&]
    {
        ofs.write( (char *)buf.data(), wrt - buf.begin() );
        wrt = buf.begin();
    };
#if defined(_WIN32)
    constexpr bool CRLF = true;
#else
    constexpr bool CRLF = false;
#endif
    while( rd != bytes.end() ) [[likely]]
    {
        // emit up to 12 bytes per line as one "\xNN..." string literal
        size_t
            remaining = bytes.end() - rd,
            n = remaining > 12 ? 12 : remaining;
        auto rowEnd = rd + n;
        *wrt++ = '\t';
        *wrt++ = '"';
        do
        {
            *wrt++ = '\\';
            *wrt++ = 'x';
            auto toHex = []( u c ) -> u { return c + (c < 10 ? '0' : -10 + 'A'); };
            *wrt++ = toHex( *rd >> 4 );
            *wrt++ = toHex( *rd & 0xF );
        } while( ++rd != bytes.end() && rd != rowEnd );
        *wrt++ = '"';
        if( rd == bytes.end() )
            *wrt++ = ';';
        if constexpr( CRLF )
            *wrt++ = '\r';
        *wrt++ = '\n';
        if( buf.end() - wrt < 128 ) [[likely]]
            flush();
    }
    flush();
}
David Brown
2024-06-12 07:01:58 UTC
Permalink
Post by Bonita Montero
Post by Malcolm McLean
These are Baby programs. But they use a cut down GUI. So they need to
get fonts and images into the program somehow. And so Baby X does that
by converting to 32 bit C arrays which can be compiled and linked as
normal. And for that, you need a tool. Writing a tiff file decoder is
not a trivial exercise.
I converted my code into sth. that produces a C-string as an output.
Printing that is still very fast, i.e. the files produced are written
with about 2.6GiB/s. But the problem is still that all compilers don't
parse large files but quit with an out of memory error. So having a
.obj output along with a small header file would be the best.
How big files are you talking about? In an earlier thread (which I
thought had beaten this topic to death), "xxd -i" include files were
fine to at least a few tens of megabytes with gcc. And it would be,
IMHO, absurd to have much bigger files than that embedded with your
executable in this manner. I can understand wanting some icons and a
few resource files in a PC executable, but if you have a lot of files or
big files then a single massive executable often does not make much
sense as the binary file.

If you /do/ want such a file, it is typically for making a portable
package that can be run directly without installing. But then you don't
mess around with inventing your own little pretend file systems, or
embedding the files manually, or using absurd ideas like XML text
strings. You use standard, well-established solutions and tools such
as AppImage on Linux or self-extracting zip files on Windows.
Malcolm McLean
2024-06-12 09:51:12 UTC
Permalink
Post by Bonita Montero
Post by Malcolm McLean
These are Baby programs. But they use a cut down GUI. So they need to
get fonts and images into the program somehow. And so Baby X does
that by converting to 32 bit C arrays which can be compiled and
linked as normal. And for that, you need a tool. Writing a tiff file
decoder is not a trivial exercise.
I converted my code into sth. that produces a C-string as an output.
Printing that is still very fast, i.e. the files produced are written
with about 2.6GiB/s. But the problem is still that all compilers don't
parse large files but quit with an out of memory error. So having a
.obj output along with a small header file would be the best.
How big files are you talking about?  In an earlier thread (which I
thought had beaten this topic to death), "xxd -i" include files were
fine to at least a few tens of megabytes with gcc.  And it would be,
IMHO, absurd to have much bigger files than that embedded with your
executable in this manner.  I can understand wanting some icons and a
few resource files in a PC executable, but if you have a lot of files or
big files then a single massive executable often does not make much
sense as the binary file.
If you /do/ want such a file, it is typically for making a portable
package that can be run directly without installing.  But then you don't
mess around with inventing your own little pretend file systems, or
embedding the files manually, or using absurd ideas like XML text
strings.   You use standard, well-established solutions and tools such
as AppImage on Linux or self-extracting zip files on Windows.
You don't get what Baby X is all about.

These solutions will not work for the audience I am trying to target.
Baby X is clean, portable, and simple. As much as I can make it. And
it's meant to be easy for people who are just beginning programmers to use.

But the main focus now is help and documentation. The improved ls
command is now in the shell, and the next task is to improve the "help"
command, at the same time as writing more docs. The two tasks naturally
go together, and the website is beginning to gel.
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
David Brown
2024-06-12 13:20:59 UTC
Permalink
Post by Malcolm McLean
Post by Bonita Montero
Post by Malcolm McLean
These are Baby programs. But they use a cut down GUI. So they need
to get fonts and images into the program somehow. And so Baby X does
that by converting to 32 bit C arrays which can be compiled and
linked as normal. And for that, you need a tool. Writing a tiff file
decoder is not a trivial exercise.
I converted my code into sth. that produces a C-string as an output.
Printing that is still very fast, i.e. the files produced are written
with about 2.6GiB/s. But the problem is still that all compilers don't
parse large files but quit with an out of memory error. So having a
.obj output along with a small header file would be the best.
How big files are you talking about?  In an earlier thread (which I
thought had beaten this topic to death), "xxd -i" include files were
fine to at least a few tens of megabytes with gcc.  And it would be,
IMHO, absurd to have much bigger files than that embedded with your
executable in this manner.  I can understand wanting some icons and a
few resource files in a PC executable, but if you have a lot of files
or big files then a single massive executable often does not make much
sense as the binary file.
If you /do/ want such a file, it is typically for making a portable
package that can be run directly without installing.  But then you
don't mess around with inventing your own little pretend file systems,
or embedding the files manually, or using absurd ideas like XML text
strings.   You use standard, well-established solutions and tools such
as AppImage on Linux or self-extracting zip files on Windows.
You don't get what Baby X is all about.
True.
Post by Malcolm McLean
These solutions will not work for the audience I am trying to target.
Baby X is clean, portable, and simple. As much as I can make it. And
it's meant to be easy for people who are just beginning programmers to use.
Who are these people? And why would they care if it is "clean",
whatever you mean by that? Why would they care about portability? The
great majority of developers spend most of their time targeting a single
platform. Beginners are unlikely to have more than one OS to work with.
I think cross-platform portability between Linux and Windows is
usually a good thing, but not a big issue for beginners. Macs and other
systems are irrelevant in practice. And no one - apart from you and a
guy called Paul - has the slightest interest in making gui toolkits in
strict C89/C90. To the nearest percent, no one, beginner or expert,
writes gui programs in C.

"Simple" is good. Big toolkits like GTK, wxWidgets or Qt can easily be
overwhelming for beginners. But too simple and limited is not helpful
either - once the beginner has made their "Hello, world" gui with an OK
button, they need more. Why should anyone pick Baby X rather than, say,
FLTK?
Post by Malcolm McLean
But the main focus now is help and documentation. The improved ls
command is now in the shell, and the next task is to improve the "help"
command, at the same time as writing more docs. The two tasks naturally
go together, and the website is beginning to gel.
To be clear here, I do not want to discourage you from your project in
any way. I am trying to ask questions to make you think, and to focus
appropriately. It seems to me that Baby X is your real passion here and
the project that you think will be useful to others (regardless of what
I may think of it). I believe your "Filesystem XML" and even more so,
your shell and utilities like "ls", are a distraction and a rabbit hole.
It does not make sense to spend months developing that to save the
user a couple of seconds packing or unpacking the XML file to normal files.

Help, documentation and examples are going to be much more valuable to
users.
Bonita Montero
2024-06-12 11:07:41 UTC
Permalink
How big files are you talking about?  In an earlier thread (which I
thought had beaten this topic to death), "xxd -i" include files were
fine to at least a few tens of megabytes with gcc.  And it would be,
IMHO, absurd to have much bigger files than that embedded with your
executable ...
I also have no need for such large files, but I can imagine that others
do. A tool which outputs an .obj file as well as a small .h file with an
external declaration would be most appropriate.
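Such a tool might emit a small header along these lines (the symbol
names are made up):

/* Hypothetical companion header for the generated .obj - names illustrative. */
#ifndef BACKUP_DATA_H
#define BACKUP_DATA_H

extern const unsigned char backup_data[];
extern const unsigned long backup_data_len;

#endif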
bart
2024-06-12 11:27:13 UTC
Permalink
Post by Bonita Montero
I converted my code into sth. that produces a C-string as an output.
Printing that is still very fast, i.e. the files produced are written
with about 2.6GiB/s. But the problem is still that all compilers don't
parse large files but quit with an out of memory error. So having a
.obj output along with a small header file would be the best.
How big files are you talking about?  In an earlier thread (which I
thought had beaten this topic to death), "xxd -i" include files were
fine to at least a few tens of megabytes with gcc.
What was never discussed is why xxd (and the faster alternates that
some posted to do that task more quickly), produces lists of numbers anyway.

Why not strings containing the embedded binary data?
  And it would be,
IMHO, absurd to have much bigger files than that embedded with your
executable in this manner.
BM complained that some files expressed as xxd-like output were causing
problems with compilers.

I suggested using a string representation. While the generated text file
is not much smaller, it is seen by the compiler as one string
expression, instead of millions of small expressions. Or at least,
1/20th the number if you split the strings across lines.

It's a no-brainer. Why spend 10 times as long on processing such data?
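To make the contrast concrete, here are the same four bytes in the two
generated forms (symbol names made up):

/* xxd -i style: one expression per byte for the compiler to parse. */
const unsigned char blob_numbers[4] = { 0x41, 0x42, 0x43, 0x44 };

/* String style: the whole payload is a single string-literal token. */
const unsigned char blob_string[4]  = "\x41\x42\x43\x44";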
Bonita Montero
2024-06-12 12:35:31 UTC
Permalink
Post by bart
I suggested using a string representation. While the generated text
file is not much smaller, it is seen by the compiler as one string
expression, instead of millions of small expressions. Or at least,
1/20th the number if you split the strings across lines.
I implemented that with my second code but the compilers are still
limited with that.
bart
2024-06-12 13:13:24 UTC
Permalink
Post by Bonita Montero
Post by bart
I suggested using a string representation. While the generated text
file  is not much smaller, it is seen by the compiler as one string
expression, instead of millions of small expressions. Or at least,
1/20th the number if you split the strings across lines.
I implemented that with my second code but the compilers are still
limited with that.
What size of file are we talking about, and how much memory in your machine?

I need another test with a 55MB test file. These are the results:

          {65,66,67,...      "\x41\x42\x43...

g++       284 seconds        13   seconds
tcc        20 seconds         2.5 seconds

A 20x slowdown suggests problems exceeding memory.

Do you have a test file that does work in either format? If so how much
difference was there between them? If very little, then you're doing
something wrong.

(I did one more test with my language which directly imported the 55MB
binary without either of those intermediate textual formats. It took
under 0.7 seconds.

That needs built-in language support, but using a more apt textual
format is a sensible first step.)
Bonita Montero
2024-06-12 13:43:02 UTC
Permalink
          {65,66,67,...      "\x41\x42\x43...
g++       284 seconds        13   seconds
tcc        20 seconds         2.5 seconds
I just tested this with my personal backup.
bart
2024-06-12 13:52:33 UTC
Permalink
Post by Bonita Montero
           {65,66,67,...      "\x41\x42\x43...
g++       284 seconds        13   seconds
tcc        20 seconds         2.5 seconds
I just tested this with my personal backup.
I meant to say 'I did' rather than 'I need'.

I keep forgetting that this forum doesn't allow editing.
David Brown
2024-06-12 13:46:44 UTC
Permalink
Post by Bonita Montero
I converted my code into sth. that produces a C-string as an output.
Printing that is still very fast, i.e. the files produced are written
with about 2.6GiB/s. But the problem is still that all compilers don't
parse large files but quit with an out of memory error. So having a
.obj output along with a small header file would be the best.
How big files are you talking about?  In an earlier thread (which I
thought had beaten this topic to death), "xxd -i" include files were
fine to at least a few tens of megabytes with gcc.
What was never discussed is why xxd  (and the faster alternates that
some posted to do that task more quickly), produces lists of numbers anyway.
Why not strings containing the embedded binary data?
There are some cases where lists of numbers would be useable while
strings would not be. But I suppose the opposite will apply too.

While string literals can contain embedded null characters (a string
literal in C does not have to be a string), I don't feel as comfortable
using a messy string literal full of escape codes for binary data. A
list of hex numbers, with appropriate line lengths, is also vastly
neater if you need to look at the data (or accidentally open it in a
text editor).

I also don't imagine that string literals would be much faster for
compilation, at least for file sizes that I think make sense. And I
have heard (it could be wrong) that MSVC has severe limits on the size
of string literals, though it is not a compiler I ever use myself.

But of course, if you prefer string literals, use them. I don't think
xxd can generate them, but it should not be hard to write a program that
does.
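Something along these lines would do it (a rough sketch in C, with
made-up program and symbol names and minimal error handling):

#include <stdio.h>
#include <stdlib.h>

/* strdump: write a binary file as C string-literal lines.
   Usage: strdump infile symbol > out.c        (names illustrative) */
int main(int argc, char **argv)
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s infile symbol\n", argv[0]);
        return EXIT_FAILURE;
    }
    FILE *in = fopen(argv[1], "rb");
    if (!in) {
        perror(argv[1]);
        return EXIT_FAILURE;
    }
    printf("const unsigned char %s[] =\n    \"", argv[2]);
    int c, col = 0;
    while ((c = fgetc(in)) != EOF) {
        printf("\\x%02X", (unsigned)c);     /* every byte as a hex escape */
        if (++col == 16) {                  /* new adjacent literal per line */
            printf("\"\n    \"");
            col = 0;
        }
    }
    printf("\";\n");
    fclose(in);
    return 0;
}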
  And it would be, IMHO, absurd to have much bigger files than that
embedded with your executable in this manner.
BM complained that some files expressed as xxd-like output were causing
problems with compilers.
I suggested using a string representation. While the generated text file
is not much smaller, it is seen by the compiler as one string
expression, instead of millions of small expressions. Or at least,
1/20th the number if you split the strings across lines.
It's a no-brainer. Why spend 10 times as long on processing such data?
10 times negligible is still negligible.

But to be clear, I'd still rate a string literal like this as vastly
nicer than some XML monstrosity!
Michael S
2024-06-12 21:29:33 UTC
Permalink
On Wed, 12 Jun 2024 15:46:44 +0200
Post by David Brown
I also don't imagine that string literals would be much faster for
compilation, at least for file sizes that I think make sense.
Just shows how little do you know about internals of typical compiler.
Which, by itself, is o.k. What is not o.k. is that with your level of
knowledge you have a nerve to argue vs bart that obviously knows a lot
more.
Post by David Brown
And I
have heard (it could be wrong) that MSVC has severe limits on the
size of string literals, though it is not a compiler I ever use
myself.
Citation, please..
Malcolm McLean
2024-06-12 22:22:42 UTC
Permalink
Post by Michael S
On Wed, 12 Jun 2024 15:46:44 +0200
Post by David Brown
I also don't imagine that string literals would be much faster for
compilation, at least for file sizes that I think make sense.
Just shows how little do you know about internals of typical compiler.
Which, by itself, is o.k. What is not o.k. is that with your level of
knowledge you have a nerve to argue vs bart that obviously knows a lot
more.
Post by David Brown
And I
have heard (it could be wrong) that MSVC has severe limits on the
size of string literals, though it is not a compiler I ever use
myself.
Citation, please..
When I was writing Crossword Designer I had a massive array containing a
lot of rare words which Visual Studio choked on, but it was possible to
set a flag. However, the strings themselves were all very short.
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
David Brown
2024-06-13 11:53:54 UTC
Permalink
Post by Michael S
On Wed, 12 Jun 2024 15:46:44 +0200
Post by David Brown
I also don't imagine that string literals would be much faster for
compilation, at least for file sizes that I think make sense.
Just shows how little do you know about internals of typical compiler.
Which, by itself, is o.k. What is not o.k. is that with your level of
knowledge you have a nerve to argue vs bart that obviously knows a lot
more.
I know more than most C programmers about how certain C compilers work,
and what works well with them, and what is relevant for them - though I
certainly don't claim to know everything. Obviously Bart knows vastly
more about how /his/ compiler works. He also tends to do testing with
several small and odd C compilers, which can give interesting results
even though they are of little practical relevance for real-world C
development work.

Testing a 1 MB file of random data, gcc -O2 took less than a second to
compile it. One megabyte is about the biggest size I would think makes
sense to embed directly in C code unless you are doing something very
niche - usually if you need that much data, you'd be better off with
separate files and standardised packaging systems like zip files,
installer setup.exe builds, or that kind of thing.

Using string literals, the compile time was shorter, but when you are
already below a second, it's all just irrelevant noise.

For much bigger files, string literals are likely to be faster for
compilation for gcc because the compiler does not track as much
information (for use in diagnostic messages). But it makes no
difference to real world development.
Post by Michael S
Post by David Brown
And I
have heard (it could be wrong) that MSVC has severe limits on the
size of string literals, though it is not a compiler I ever use
myself.
Citation, please..
<https://letmegooglethat.com/?q=msvc+string+literal+length+limit>

Actually, I think it was from Bart that I first heard that MSVC has
limitations on its string literal lengths, but I could well be
misremembering that. I am confident, however, that it was here in
c.l.c., as MSVC is not a tool I have used myself.

<https://learn.microsoft.com/en-us/cpp/cpp/string-and-character-literals-cpp>

It seems that version 17.0 has removed the arbitrary limits, while
before that it was limited to 65K in their C++ compiler.

For the MSVC C compiler, I see this:

<https://learn.microsoft.com/en-us/cpp/c-language/maximum-string-length>

Each individual string is up to 2048 bytes, which can be concatenated to
a maximum of 65K in total.

I see other links giving different values, but I expect the MS ones to
be authoritative. It is possible that newer versions of their C
compiler have removed the limit, just as for their C++ compiler, but it
was missing from that webpage.

(And I noticed also someone saying that MSVC is 70x faster at using
string literals compared to lists of integers for array initialisation.)
bart
2024-06-13 13:46:32 UTC
Permalink
Post by David Brown
Post by Michael S
On Wed, 12 Jun 2024 15:46:44 +0200
Post by David Brown
I also don't imagine that string literals would be much faster for
compilation, at least for file sizes that I think make sense.
Just shows how little do you know about internals of typical compiler.
Which, by itself, is o.k. What is not o.k. is that with your level of
knowledge you have a nerve to argue vs bart that obviously knows a lot
more.
I know more than most C programmers about how certain C compilers work,
and what works well with them, and what is relevant for them - though I
certainly don't claim to know everything.  Obviously Bart knows vastly
more about how /his/ compiler works.  He also tends to do testing with
several small and odd C compilers, which can give interesting results
even though they are of little practical relevance for real-world C
development work.
Testing a 1 MB file of random data, gcc -O2 took less than a second to
compile it.  One megabyte is about the biggest size I would think makes
sense to embed directly in C code unless you are doing something very
niche - usually if you need that much data, you'd be better off with
separate files and standardised packaging systems like zip files,
installer setup.exe builds, or that kind of thing.
Here are some tests embedding a 1.1 MB binary on my machine:

                   Numbers         One string

gcc 14.1 -O0       3.2  (0.2)      0.4  (0.2)     Seconds

tcc                0.4  (0.03)     0.07 (0.03)

Using 'One string' makes gcc as fast as Tiny C working with 'Numbers'!

The figures in brackets are the build times for hello.c, to better
appreciate the differences.

Including those overheads, 'One string' makes gcc 8 times as fast as
with 'Numbers'. Excluded those overheads, and it is 15 times as fast
(3.0 vs 0.2).

For comparison, here is the timing for my non-C compiler using direct
embedding:

mm                 0.05 (0.03)

The extra time compared with 'hello' is 20ms; tcc was 370/40ms, and gcc
was 3000/200ms.
Post by David Brown
Using string literals, the compile time was shorter, but when you are
already below a second, it's all just irrelevant noise.
My machine is slower than yours. Anyway, it's not just about one machine
and one program. You're choosing to spend 10 times as long to do a task,
using resources that could be used for other processes, and using extra
power.

But if you are creating a tool for N other people to use who may be
running it M times a day on data of size X, you can't just dismiss these
considerations. You don't know how far people will push the operating
limits of your tool.
Post by David Brown
Each individual string is up to 2048 bytes, which can be concatenated to
a maximum of 65K in total.
I see other links giving different values, but I expect the MS ones to
be authoritative.  It is possible that newer versions of their C
compiler have removed the limit, just as for their C++ compiler, but it
was missing from that webpage.
(And I noticed also someone saying that MSVC is 70x faster at using
string literals compared to lists of integers for array initialisation.)
That doesn't sound unreasonable.

Note that it is not necessary to use one giant string; you can chop it
up into smaller strings, say with one line's worth of values per string,
and still get most of the benefits. It's just a tiny bit more fiddly to
generate the strings.

Within my compiler, each single number takes a 64-byte record to
represent. So 1MB of data takes 64MB, while a 1MB string takes one
64-byte record plus the 1MB of the string data.

Then there are the various type analysis and other passes that have to
be done a million times rather than once. I'd imagine that compilers
like gcc do a lot more.
tTh
2024-06-13 15:11:57 UTC
Permalink
Post by bart
Note that it is not necessary to use one giant string; you can chop it
up into smaller strings, say with one line's worth of values per string,
and still get most of the benefits. It's just a tiny bit more fiddly to
generate the strings.
And what about the ending '\0' of all those small strings ?
--
+---------------------------------------------------------------------+
| https://tube.interhacker.space/a/tth/video-channels |
+---------------------------------------------------------------------+
bart
2024-06-13 15:32:01 UTC
Permalink
Post by bart
Note that it is not necessary to use one giant string; you can chop it
up into smaller strings, say with one line's worth of values per
string, and still get most of the benefits. It's just a tiny bit more
fiddly to generate the strings.
   And what about the ending '\0' of all those small strings ?
What about it? If you specify the bounds of the char array, then a
terminator won't be added.
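For example (valid in C, where the implicit '\0' is simply dropped when
the bound matches exactly; C++ rejects the initializer as too long):

const unsigned char four[4] = "ABCD";   /* exactly four bytes, no '\0' stored */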

But I've now realised lots of shorter strings will make it awkward to
define the data (now a table of chars, with the last row being partially
full, making an exact size tricky).

It can only really work if the separate strings are concatenated into
one big string. This may still run into string length limitations, but
it depends on whether the limitation applies to individual strings
(which is OK), or to the sum of all the strings (which isn't).
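For example, a generator can emit many short adjacent literals that the
compiler concatenates into one initializer (the data below is just an
illustration):

/* Adjacent string literals are concatenated into a single initializer,
   and the exact bound again drops the trailing '\0' (C only). */
const unsigned char blob[8] =
    "\x89\x50\x4E\x47"
    "\x0D\x0A\x1A\x0A";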
Bonita Montero
2024-06-14 06:47:30 UTC
Permalink
Post by bart
What about it? If you specify the bounds of the char array, then a
terminator won't be added.
This works in C but not in C++. With g++ you've to specify -fpermissive
to make that work also in C++.
Michael S
2024-06-13 16:13:51 UTC
Permalink
On Thu, 13 Jun 2024 14:46:32 +0100
Post by bart
Within my compiler, each single number takes a 64-byte record to
represent. So 1MB of data takes 64MB, while a 1MB string takes one
64-byte record plus the 1MB of the string data.
Then there are the various type analysis and other passes that have
to be done a million times rather then once. I'd imagine that
compilers like gcc do a lot more.
For gcc, up to a certain limit, I measured ~160 bytes per number.
Above that very big limit (probably 64M numbers) gcc appears
to switch into a more economical mode - ~112 bytes per number. At ~300M
numbers it appears to become more economical still, but is still above 100
bytes per number. At 400M numbers - 100 bytes per number. Going further
became quite time-consuming, so I gave up.
Michael S
2024-06-13 14:43:54 UTC
Permalink
On Thu, 13 Jun 2024 13:53:54 +0200
Post by David Brown
Post by Michael S
On Wed, 12 Jun 2024 15:46:44 +0200
Post by David Brown
I also don't imagine that string literals would be much faster for
compilation, at least for file sizes that I think make sense.
Just shows how little do you know about internals of typical
compiler. Which, by itself, is o.k. What is not o.k. is that with
your level of knowledge you have a nerve to argue vs bart that
obviously knows a lot more.
I know more than most C programmers about how certain C compilers
work, and what works well with them, and what is relevant for them -
though I certainly don't claim to know everything. Obviously Bart
knows vastly more about how /his/ compiler works. He also tends to
do testing with several small and odd C compilers, which can give
interesting results even though they are of little practical
relevance for real-world C development work.
Since he does compilers himself, he has a much better feeling [than you
or me] for what is hard and what is easy, what is small and what is big,
what is fast and what is slow. That applies to all compilers except
those that are very unusual. "Major" compilers are not unusual at all.
Post by David Brown
Testing a 1 MB file of random data, gcc -O2 took less than a second
to compile it.
Somewhat more than a second on less modern hardware. Enough for me to
feel that compilation is not instant.
But 1 MB is just an arbitrary number. For 20 MB everybody would feel
the difference. And for 50 MB few people would not want it to be much
faster.
Post by David Brown
One megabyte is about the biggest size I would think
makes sense to embed directly in C code unless you are doing
something very niche - usually if you need that much data, you'd be
better off with separate files and standardised packaging systems
like zip files, installer setup.exe builds, or that kind of thing.
Using string literals, the compile time was shorter, but when you are
already below a second, it's all just irrelevant noise.
For much bigger files, string literals are likely to be faster for
compilation for gcc because the compiler does not track as much
information
And that is sort of the thing that bart knows immediately. Unlike you
and me.
Post by David Brown
(for use in diagnostic messages).
But it makes no
difference to real world development.
Post by Michael S
Post by David Brown
And I
have heard (it could be wrong) that MSVC has severe limits on the
size of string literals, though it is not a compiler I ever use
myself.
Citation, please..
<https://letmegooglethat.com/?q=msvc+string+literal+length+limit>
Actually, I think it was from Bart that I first heard that MSVC has
limitations on its string literal lengths, but I could well be
misremembering that. I am confident, however, that it was here in
c.l.c., as MSVC is not a tool I have used myself.
<https://learn.microsoft.com/en-us/cpp/cpp/string-and-character-literals-cpp>
It seems that version 17.0 has removed the arbitrary limits, while
before that it was limited to 65K in their C++ compiler.
<https://learn.microsoft.com/en-us/cpp/c-language/maximum-string-length>
Each individual string is up to 2048 bytes, which can be concatenated
to a maximum of 65K in total.
I see other links giving different values, but I expect the MS ones
to be authoritative. It is possible that newer versions of their C
compiler have removed the limit, just as for their C++ compiler, but
it was missing from that webpage.
(And I noticed also someone saying that MSVC is 70x faster at using
string literals compared to lists of integers for array
initialisation.)
I didn't know it, thanks.
It means that the string method can't be used universally.

Still, for C (as opposed to C++), the compiler limitation can be worked
around by declaring the container as a struct. E.g. for an array of
length 1234567:

struct {
    char bulk[123][10000];
    char tail[4567];
} bar = {
    {
        "init0-to-99999",
        "init10000-to-199999",
        ....
    },
    "init123400-to1234566"
};

For that I'd expect compilation speed almost as fast as for one string.
David Brown
2024-06-14 16:43:59 UTC
Permalink
Post by Michael S
On Thu, 13 Jun 2024 13:53:54 +0200
Post by David Brown
Post by Michael S
On Wed, 12 Jun 2024 15:46:44 +0200
Post by David Brown
I also don't imagine that string literals would be much faster for
compilation, at least for file sizes that I think make sense.
Just shows how little do you know about internals of typical
compiler. Which, by itself, is o.k. What is not o.k. is that with
your level of knowledge you have a nerve to argue vs bart that
obviously knows a lot more.
I know more than most C programmers about how certain C compilers
work, and what works well with them, and what is relevant for them -
though I certainly don't claim to know everything. Obviously Bart
knows vastly more about how /his/ compiler works. He also tends to
do testing with several small and odd C compilers, which can give
interesting results even though they are of little practical
relevance for real-world C development work.
Since he does compilers himself, he has a much better feeling [than you
or me] for what is hard and what is easy, what is small and what is big,
what is fast and what is slow. That applies to all compilers except
those that are very unusual. "Major" compilers are not unusual at all.
I know enough about compiler design and implementation to have a pretty
good idea about many parts of it, though certainly not all of it. To be
clear - theoretical knowledge is not the same as practical experience.
I realise that array initialisation by a sequence of numbers has
overhead (this was discussed at length in a previous thread), and that a
string literal will likely have less. But the difference is not
particularly significant for realistic file sizes.
Post by Michael S
Post by David Brown
Testing a 1 MB file of random data, gcc -O2 took less than a second
to compile it.
Somewhat more than a second on less modern hardware. Enough for me to
feel that compilation is not instant.
But 1 MB is just an arbitrary number. For 20 MB everybody would feel
the difference. And for 50 MB few people would not want it to be much
faster.
But what would be the point of trying to embed such files in the first
place? There are much better ways of packing large files. You can
always increase sizes for things until you get problems or annoying
slowdowns, but that does not mean that will happen in practical situations.

And even if you /did/ want to embed a 20 MB file, and even if that took
20 seconds, so what? Unless you have a masochistic build setup, such as
refusing to use "make" or insisting that everything goes in one C file
that is re-compiled all the time, that 20 second compile is a one-off
time cost on the rare occasion when you change the big binary file.

Now, I am quite happy to agree that faster is better, all other things
being equal. And convenience and simplicity is better. Once the
compilers I use support #embed, if I need to embed a file and I don't
need anything more than an array initialisation, I'll use #embed. Until
then, 5 seconds writing an "xxd -i" line in a makefile and a 20 second
compile (if it took that long) beats 5 minutes writing a Python script
to generate string literals even if the compile is now 2 seconds.
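For reference, the eventual #embed usage would look something like this
(a sketch - the file name is illustrative, and the compiler must
actually support C23 #embed):

/* C23 #embed expands to a comma-separated list of the file's bytes. */
static const unsigned char icon_png[] = {
#embed "icon.png"
};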
Post by Michael S
Post by David Brown
One megabyte is about the biggest size I would think
makes sense to embed directly in C code unless you are doing
something very niche - usually if you need that much data, you'd be
better off with separate files and standardised packaging systems
like zip files, installer setup.exe builds, or that kind of thing.
Using string literals, the compile time was shorter, but when you are
already below a second, it's all just irrelevant noise.
For much bigger files, string literals are likely to be faster for
compilation for gcc because the compiler does not track as much
information
And that is sort of the thing that bart knows immediately. Unlike you
and me.
I can't answer for /you/, but /I/ knew this - look through the
discussions on #embed if you like. To be fair, I did not consider
string literals very much there - they are a pretty pointless
alternative when #embed is coming and integer sequences are fast enough
for any realistic use.
Post by Michael S
Post by David Brown
Post by Michael S
Post by David Brown
And I
have heard (it could be wrong) that MSVC has severe limits on the
size of string literals, though it is not a compiler I ever use
myself.
<https://learn.microsoft.com/en-us/cpp/c-language/maximum-string-length>
Each individual string is up to 2048 bytes, which can be concatenated
to a maximum of 65K in total.
I see other links giving different values, but I expect the MS ones
to be authoritative. It is possible that newer versions of their C
compiler have removed the limit, just as for their C++ compiler, but
it was missing from that webpage.
(And I noticed also someone saying that MSVC is 70x faster at using
string literals compared to lists of integers for array
initialisation.)
I didn't know it, thanks.
I didn't know the details either, until you challenged me and I looked
them up!
Post by Michael S
It means that the string method can't be used universally.
That depends on the state of the current MSVC compiler - and perhaps
other compilers. The C standards only require support for 4095
characters in a string literal. (They also only require support for
objects up to 32767 bytes in length - and for that size, any method
should be fast.)
Post by Michael S
Still, for C (as opposed to C++), the compiler limitation can be worked
around by declaring the container as a struct. E.g. for an array of
length 1234567:
struct {
    char bulk[123][10000];
    char tail[4567];
} bar = {
    {
        "init0-to-99999",
        "init10000-to-199999",
        ....
    },
    "init123400-to1234566"
};
For that I'd expect compilation speed almost as fast as for one string.
I suppose so, but it is not pretty!
bart
2024-06-14 18:24:04 UTC
Permalink
Post by David Brown
Post by Michael S
Somewhat more than a second on less modern hardware. Enough for me to
feel that compilation is not instant.
But 1 MB is just an arbitrary number. For 20 MB everybody would feel
the difference. And for 50 MB few people would not want it to be much
faster.
But what would be the point of trying to embed such files in the first
place?  There are much better ways of packing large files.
I remember complaining that some tool installations were bloated at
100MB, 500MB, 1000MB or beyond, and your attitude was So what, since
there is now almost unlimited storage.

But now of course, it's Why would someone ever want to do X with such a
large file! Suddenly large files are undesirable when it suits you.
Post by David Brown
  You can
always increase sizes for things until you get problems or annoying
slowdowns, but that does not mean that will happen in practical situations.
And even if you /did/ want to embed a 20 MB file, and even if that took
20 seconds, so what?  Unless you have a masochistic build setup, such as
refusing to use "make" or insisting that everything goes in one C file
that is re-compiled all the time, that 20 second compile is a one-off
time cost on the rare occasion when you change the big binary file.
Now, I am quite happy to agree that faster is better, all other things
being equal.  And convenience and simplicity is better.  Once the
compilers I use support #embed, if I need to embed a file and I don't
need anything more than an array initialisation, I'll use #embed.  Until
then, 5 seconds writing an "xxd -i" line in a makefile and a 20 second
compile (if it took that long) beats 5 minutes writing a Python script
to generate string literals even if the compile is now 2 seconds.
That's a really bad attitude. It partly explains why such things as
#embed take so long to get added.

I've heard lots of horror stories elsewhere about projects taking
minutes, tens of minutes or even hours to build.

How much of that is due to attitudes like yours? You've managed to find
ways of working around speed problems, by throwing hardware resources at
it (fast processors, loads of memory, multiple cores, SSD, RAM-disk), or
using ingenuity in *avoiding* having to compile stuff as much as
possible. Or maybe the programs you build aren't that big.

But that is not how you fix such problems. Potential bottlenecks should
be identified and investigated.

/Could/ it be faster? /Could/ it use less memory? /Could/ a simple
language extension help out?

I can understand you having little interest in it because you just use
the tools that available and can't do much about it, but it should be
somebody's job to keep on top of this stuff.
Post by David Brown
Until
then, 5 seconds writing an "xxd -i" line in a makefile and a 20 second
compile (if it took that long) beats 5 minutes writing a Python script
to generate string literals even if the compile is now 2 seconds.
So now you need 'xxd'. And 'Python'. And 'make'. When it could all be
done effortlessly, more easily and 100 times faster within the language
without all that mucking about.
Post by David Brown
Unless you have a masochistic build setup, such as
refusing to use "make" or insisting that everything goes in one C file
that is re-compiled all the time,
When you write such tools, you don't know what people are going to do
with them, how much they will push their limits. And you can't really
dictate how they develop or build their software.
David Brown
2024-06-15 10:35:37 UTC
Permalink
Post by bart
Post by David Brown
Post by Michael S
Somewhat more than a second on less modern hardware. Enough for me to
feel that compilation is not instant.
But 1 MB is just an arbitrary number. For 20 MB everybody would feel
the difference. And for 50 MB few people would not want it to be much
faster.
But what would be the point of trying to embed such files in the first
place?  There are much better ways of packing large files.
I remember complaining that some tool installations were bloated at
100MB, 500MB, 1000MB or beyond, and your attitude was So what, since
there is now almost unlimited storage.
We all remember that :-)
Post by bart
But now of course, it's Why would someone ever want to do X with such a
large file! Suddenly large files are undesirable when it suits you.
It's a /completely/ different situation. Anyone doing development work
is going to have a machine with lots of space - 1 GB is peanuts for
space on a disk. But that does not mean it makes sense to have a 1 GB
initialised array in an executable!

Consider /why/ you might want to include a binary blob inside an
executable. I can think of a number of scenarios :

1. You want a "setup.exe" installation file. Then you use appropriate
tools for the job, you don't use inclusion in a C file.

2. You want a "portable" version of a big program - portable apps on
Windows, AppImage on Linux, or something like that. Then you use
appropriate tools for the job so that the application can access the
enclosed files as /normal/ files (not some weird "XML Filesystem" nonsense).

3. You are targeting a platform where there is no big OS and no
filesystem, and everything is within a single statically-linked binary.
Then embedded files in C arrays are a good solution, but your files are
always small because your system is small.

4. You want to include a few "resources" like icons or images in your
executable, because you don't need much and it makes the results neater.
Then you use some kind of "resource compiler", such as has been used
on Windows for decades.

I'm sure there are a few other niche cases where the convenience of a
single executable file is more important than the inconvenience of not
being able to access the files with normal file operations. Even then,
it's unlikely that they will be big files.



To give an analogy, consider books. In a home, it's no problem having a
set of bookshelves with hundreds of books on them - that's your disk
storage. It is also sometimes convenient to have books packed together
in single units, boxes, even though you need to unpack them to get to
the books - that's your setup.exe or AppImage files. And sometimes it
is nice to have a few /small/ books inside one binding, such as a
trilogy in one binding - that's your embedded files. But no one wants
the complete Encyclopedia Britannica in one binding.
Post by bart
Post by David Brown
  You can always increase sizes for things until you get problems or
annoying slowdowns, but that does not mean that will happen in
practical situations.
And even if you /did/ want to embed a 20 MB file, and even if that
took 20 seconds, so what?  Unless you have a masochistic build setup,
such as refusing to use "make" or insisting that everything goes in
one C file that is re-compiled all the time, that 20 second compile is
a one-off time cost on the rare occasion when you change the big
binary file.
Now, I am quite happy to agree that faster is better, all other things
being equal.  And convenience and simplicity is better.  Once the
compilers I use support #embed, if I need to embed a file and I don't
need anything more than an array initialisation, I'll use #embed.
Until then, 5 seconds writing an "xxd -i" line in a makefile and a 20
second compile (if it took that long) beats 5 minutes writing a Python
script to generate string literals even if the compile is now 2 seconds.
That's a really bad attitude. It partly explains why such things as
#embed take so long to get added.
Using the best tool available for the job, and using a better tool if
one becomes available, is a "bad attitude" ?

Or did you mean it is a "bad attitude" to concentrate on things that are
important and make a real difference, instead of improving on something
that was never really a big issue in the first place?
Post by bart
I've heard lots of horror stories elsewhere about projects taking
minutes, tens of minutes or even hours to build.
I agree - some kinds of builds take a /long/ time. Embedding binary
blobs has absolutely nothing to do with it. Indeed, long build times
are often the result of trying to put too much in one build rather than
splitting things up in separate files and libraries. (Sometimes such
big builds are justified, such as for large programs with very large
user bases.)
Post by bart
How much of that is due to attitudes like yours? You've managed to find
ways of working around speed problems, by throwing hardware resources at
it (fast processors, loads of memory, multiple cores, SSD, RAM-disk), or
using ingenuity in *avoiding* having to compile stuff as much as
possible. Or maybe the programs you build aren't that big.
You are joking, right? Or trolling?

(I'm snipping the rest, because if it is not trolling, it would take far
too long to explain to you how the software development world works for
everyone else.)
James Kuyper
2024-06-17 06:22:33 UTC
Permalink
Post by Michael S
On Thu, 13 Jun 2024 13:53:54 +0200
...
Post by Michael S
Post by David Brown
I know more than most C programmers about how certain C compilers
work, and what works well with them, and what is relevant for them -
though I certainly don't claim to know everything. Obviously Bart
knows vastly more about how /his/ compiler works. He also tends to
do testing with several small and odd C compilers, which can give
interesting results even though they are of little practical
relevance for real-world C development work.
Since he does compilers himself, he has a much better feeling [than you
or me] for what is hard and what is easy, what is small and what is big,
what is fast and what is slow. That applies to all compilers except
those that are very unusual. "Major" compilers are not unusual at all.
The problem is that Bart's compiler is VERY unusual. It's customized for
his use, and he has lots of quirks in the way he thinks compilers should
work, which are very different from those of most other programmers. In
particular, compilation speed is very important to him, while execution
speed is almost completely unimportant, which is pretty much the
opposite of the way most programmers prioritize those things.
Kaz Kylheku
2024-06-17 07:30:44 UTC
Permalink
Post by James Kuyper
The problem is that Bart's compiler is VERY unusual. It's customized for
his use, and he has lots of quirks in the way he thinks compilers should
work, which are very different from those of most other programmers. In
particular, compilation speed is very important to him, while execution
speed is almost completely unimportant, which is pretty much the
opposite of the way most programmers prioritize those things.
Most programmers use Javascript and Python, which follow Bart's
priorities. Fast, invisible compilation to some kind of byte code (plus
possibly later JIT), slow execution time.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Michael S
2024-06-17 09:11:05 UTC
Permalink
On Mon, 17 Jun 2024 07:30:44 -0000 (UTC)
Post by Kaz Kylheku
Post by James Kuyper
The problem is that Bart's compiler is VERY unusual. It's
customized for his use, and he has lots of quirks in the way he
thinks compilers should work, which are very different from those
of most other programmers. In particular, compilation speed is very
important to him, while execution speed is almost completely
unimportant, which is pretty much the opposite of the way most
programmers prioritize those things.
Most programmers use Javascript and Python, which follow Bart's
priorities. Fast, invisible compilation to some kind of byte code
(plus possibly later JIT), slow execution time.
O.T.
My understanding is that Javascript and "default" Python are not quite
in the same boat w.r.t. speed. If we measure the speed of execution of a
program that does not spend 99% of the time in library functions
written in lower-level languages, then Javascript would be between C
and Python, probably closer to the former on a logarithmic scale. At
least for long-running computations.
That's my impression; I never did any measurements.
Michael S
2024-06-17 09:25:27 UTC
Permalink
On Mon, 17 Jun 2024 07:30:44 -0000 (UTC)
Post by Kaz Kylheku
Post by James Kuyper
The problem is that Bart's compiler is VERY unusual. It's
customized for his use, and he has lots of quirks in the way he
thinks compilers should work, which are very different from those
of most other programmers. In particular, compilation speed is very
important to him, while execution speed is almost completely
unimportant, which is pretty much the opposite of the way most
programmers prioritize those things.
Most programmers use Javascript and Python, which follow Bart's
priorities. Fast, invisible compilation to some kind of byte code
(plus possibly later JIT), slow execution time.
I'd dare to say that most programmers care about speed of compilation
more than they care about speed of execution even (or especially) when
they use "visible" compilation processes. Except when compilation is
already very fast.
BTW, my impression was that Bart's 'C' compiler uses 'visible'
compilation.

Then again, neither speed of compilation nor speed of execution are top
priorities for most pros. My guess is that #1 priority is conformance
with co-workers/employer, #2 is convenient IDE, preferably integrated
with debugger, #3 is support, but there is big distance between #2, and
#3. #4 are religious issues of various forms. Speed of compilation is at
best #5.
Tim Rentsch
2024-06-18 17:54:09 UTC
Permalink
Post by Michael S
On Mon, 17 Jun 2024 07:30:44 -0000 (UTC)
Post by Kaz Kylheku
Post by James Kuyper
The problem is that Bart's compiler is VERY unusual. It's
customized for his use, and he has lots of quirks in the way he
thinks compilers should work, which are very different from those
of most other programmers. In particular, compilation speed is very
important to him, while execution speed is almost completely
unimportant, which is pretty much the opposite of the way most
programmers prioritize those things.
Most programmers use Javascript and Python, which follow Bart's
priorities. Fast, invisible compilation to some kind of byte code
(plus possibly later JIT), slow execution time.
I'd dare to say that most programmers care about speed of compilation
more than they care about speed of execution even (or especially) when
they use "visible" compilation processes. Except when compilation is
already very fast.
BTW, my impression was that Bart's 'C' compiler uses 'visible'
compilation.
Then again, neither speed of compilation nor speed of execution are top
priorities for most pros. My guess is that #1 priority is conformance
with co-workers/employer, #2 is convenient IDE, preferably integrated
with debugger, #3 is support, but there is big distance between #2, and
#3. #4 are religious issues of various forms. Speed of compilation is at
best #5.
I agree that speed of compilation is nowhere near the top of my
list, and probably that is true for many or most other developers
as well. I suspect that what the start of the list looks like,
both in terms of what items appear and in what order, varies a
fair amount between different developers and different groups.
David Brown
2024-06-17 13:23:55 UTC
Permalink
Post by Kaz Kylheku
Post by James Kuyper
The problem is that Bart's compiler is VERY unusual. It's customized for
his use, and he has lots of quirks in the way he thinks compilers should
work, which are very different from those of most other programmers. In
particular, compilation speed is very important to him, while execution
speed is almost completely unimportant, which is pretty much the
opposite of the way most programmers prioritize those things.
Most programmers use Javascript and Python, which follow Bart's
priorities. Fast, invisible compilation to some kind of byte code (plus
possibly later JIT), slow execution time.
That is not at all why people use Javascript and/or Python.

They want fast /development/ time - compilation speed, to the extent
that such languages have a compilation speed - is a minor issue. I know
that it would not bother me in the slightest if my Python code took some
small but non-zero compilation time for "real" programs (as distinct
from small scripts). I use Python rather than C because for PC code,
that can often involve files, text manipulation, networking, and various
data structures, the Python code is at least an order of magnitude
shorter and faster to write. When I see the amount of faffing around in
order to read and parse a file consisting of a list of integers, I find
it amazing that anyone would actively choose C for the task (unless it
is for the fun of it).

And people who use these languages - indeed any languages - want their
code to be /fast enough/. Faster than that should not be a priority.

Bart's priorities in his C compiler do not match those of Python or
Javascript programmers. (His scripting language might be closer.)
Development time with his C compiler will be significantly worse than
normal C with a quality C compiler - you might save a second or two on
compilation time, but the lack of features, compatibility, modern
standards, and static checking could cost you hours, days, or months.
And the fact that it does not produce as efficient results as tools like
gcc and clang makes it less useful - one of the prime motivations for
using C is to get high speed code.

His C compiler might have use as a companion tool to his other language
tools that generate C, and it could also be seen as a testbed for
playing with new potential features in C as it is easier to modify than,
say, gcc or clang. But it is not a tool that matches the priorities of
languages such as Javascript or Python.
Michael S
2024-06-18 08:56:50 UTC
Permalink
On Mon, 17 Jun 2024 15:23:55 +0200
Post by David Brown
I use Python rather than C because for
PC code, that can often involve files, text manipulation, networking,
and various data structures, the Python code is at least an order of
magnitude shorter and faster to write. When I see the amount of
faffing around in order to read and parse a file consisting of a list
of integers, I find it amazing that anyone would actively choose C
for the task (unless it is for the fun of it).
The faffing (what does it mean, BTW?) is caused by unrealistic
requirements. More specifically, by the requirements (A) to support
arbitrary line length and (B) to process the file line by line. Drop just
one of those requirements and everything becomes quite simple.
[O.T.]
That is despite the fact that the fgets() API is designed rather badly -
the return value is much less useful than it could easily be. It would be
interesting to find out who was responsible.
[/O.T.]
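(To make the complaint concrete: a minimal sketch, not from the original
post, of the usual fgets() pattern - the return value only says whether a
line was read, so the length has to be recovered by scanning the buffer
again.)

-------------------
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[256];

    /* fgets() returns either the buffer pointer or NULL, so to learn
       how much was read, or to strip the newline, the caller has to
       make a second pass over the buffer. */
    while (fgets(line, sizeof line, stdin) != NULL) {
        size_t len = strcspn(line, "\n");   /* second scan of the data */
        line[len] = '\0';
        printf("%zu: %s\n", len, line);
    }
    return 0;
}
-------------------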

For a task like that Python could indeed be several times shorter, but
only if you wrote your Python script exclusively for yourself, cutting
all corners, like not providing a short help text for the user, not
testing that the input format matches expectations and, most importantly,
not reporting input format problems in a potentially useful manner.
OTOH, if we write our utility in a more "anal" manner, as we should if
we expect it to be used by other people or by ourselves a long time after
it was written (at my age, a couple of months is long enough, and I am not
that much older than you), then the code size difference between the
Python and C variants will be much smaller, probably a factor of 2 or so.

W.r.t. faster to code, it very strongly depends on familiarity.
You haven't done that sort of task in 'C' since your school days, right?
Or ever? And you are doing them in Python quite regularly? Then that is a
much bigger reason for the difference than the language itself.
Now, for more complicated tasks Python as a language, and even more
importantly Python as a massive set of useful libraries, can have a
very big productivity advantage over 'C'. But that does not apply to a
very simple thing like reading numbers from a text file.

In the real world, I wrote a utility akin to that less than two years ago.
It converted big matrices from space-delimited text to Matlab v4 .mat
format. Why did I do it? Because while both Matlab and GNU Octave are
capable of reading text files like those, they are quite slow at doing
so. With the huge files I was using at the time, it became
uncomfortable.
I wrote it in 'C' (or was it C-style C++? I don't remember) mostly
because I knew how to produce v4 .mat files in C. If I were doing it in
Python, I'd have had to learn how to do it in Python, and in the end it
would have taken me more time rather than less. I never even came to
the point of evaluating whether the speed of Python's functions for
parsing text was sufficient for my needs.
David Brown
2024-06-18 12:36:40 UTC
Permalink
Post by Michael S
On Mon, 17 Jun 2024 15:23:55 +0200
Post by David Brown
I use Python rather than C because for
PC code, that can often involve files, text manipulation, networking,
and various data structures, the Python code is at least an order of
magnitude shorter and faster to write. When I see the amount of
faffing around in order to read and parse a file consisting of a list
of integers, I find it amazing that anyone would actively choose C
for the task (unless it is for the fun of it).
The faffing (what does it mean, BTW ?) is caused by unrealistic
requirements. More specifically, by requirements of (A) to support
arbitrary line length (B) to process file line by line. Drop just one
of those requirements and everything become quite simple.
"Faffing around" or "faffing about" means messing around doing
unimportant or unnecessary things instead of useful things. In this
case, it means writing lots of code for handling memory management to
read a file instead of using a higher-level language and just reading
the file.

Yes, dropping requirements might make the task easier in C. But you
still don't get close to being as easy as it is in a higher level
language. (That does not have to be Python - I simply use that as an
example that I am familiar with, and many others here will also have at
least some experience of it.)
Post by Michael S
For task like that Python could indeed be several times shorter, but
only if you wrote your python script exclusively for yourself, cutting
all corners, like not providing short help for user, not testing that
input format matches expectations and most importantly not reporting
input format problems in potentially useful manner.
No, even if that were part of the specifications, it would still be far
easier in Python. The brief Python samples I have posted don't cover
such user help, options, error checking, etc., but that's because they
are brief samples.
Post by Michael S
OTOH, if we write our utility in more "anal" manner, as we should if
we expect it to be used by other people or by ourselves long time after
it was written (in my age, couple of months is long enough and I am not
that much older than you) then code size difference between python and
C variants will be much smaller, probably factor of 2 or so.
Unless half the code is a text string for a help page, I'd expect a
bigger factor. And I'd expect the development time difference to be an
even bigger factor - with Python you avoid a number of issues that are
easy to get wrong in C (such as memory management). Of course that
would require a reasonable familiarity of both languages for a fair
comparison.

C and Python are both great languages, with their pros and cons and
different areas where they shine. There can be good reasons for writing
a program like this in C rather than Python, but C is often used without
good technical reasons. To me, it is important to know a number of
tools and pick the best one for any given job.
Post by Michael S
W.r.t. faster to code, it very strongly depends on familiarity.
You didn't do that sort of tasks in 'C' since your school days, right?
Or ever? And you are doing them in Python quite regularly? Then that is
much bigger reason for the difference than the language itself.
Sure - familiarity with a particular tool is a big reason for choosing it.
Post by Michael S
Now, for more complicated tasks Python, as the language, and even more
importantly, Python as a massive set of useful libraries could have
very big productivity advantage over 'C'. But it does not apply to very
simple thing like reading numbers from text file.
IMHO, it does. I have slightly lost track of which programs were being
discussed in which thread, but the Python code for the task is a small
fraction of the size of the C code. I agree that if you want to add
help messages and nicer error messages, the difference will go down.

Here is a simple task - take a file name as a command-line argument,
then read all white-space (space, tab, newlines, mixtures) separated
integers. Add them up and print the count, sum, and average (as an
integer). Give a brief usage message if the file name is missing, and a
brief error if there is something that is not an integer. This should
be a task that you see as very simple in C.


#!/usr/bin/python3
import sys

if len(sys.argv) < 2 :
print("Usage: sums.py <input-file>")
sys.exit(1)

data = list(map(int, open(sys.argv[1], "r").read().split()))
n = len(data)
s = sum(data)
print("Count: %i, sum %i, average %i" % (n, s, s // n))
Post by Michael S
In the real world, I wrote utility akin to that less than two years ago.
It converted big matrices from space delimited text to Matlab v4 .mat
format. Why did I do it? Because while both Matlab and Gnu Octave are
capable of reading text files like those, but they are quite slow doing
so. With huge files that I was using at the moment, it became
uncomfortable.
I wrote it in 'C' (or was it C-style C++ ? I don't remember) mostly
because I knew how to produce v4 .mat files in C. If I were doing it in
Python, I'd have to learn how to do it in Python and at the end it
would have taken me more time rather than less. I didn't even came to
the point of evaluating whether speed of python's functions for parsing
text was sufficient for my needs.
Of course if you don't know Python, it will be slower to write it in Python!

And there are times when Python /could/ be used, but C would be better -
C has faster run-time for most purposes. In many situations you can get
Python to run fast, by being careful of the code structures you use, or
using JIT tools, or using toolkits like numpy. And of course these
require additional development effort and learning to use.
bart
2024-06-18 13:48:15 UTC
Permalink
Post by David Brown
[...]
IMHO, it does.  I have slightly lost track of which programs were being
discussed in which thread, but the Python code for the task is a small
fraction of the size of the C code.  I agree that if you want to add
help messages and nicer error messages, the difference will go down.
Here is a simple task - take a file name as an command-line argument,
then read all white-space (space, tab, newlines, mixtures) separated
integers.  Add them up and print the count, sum, and average (as an
integer).  Give a brief usage message if the file name is missing, and a
brief error if there is something that is not an integer.  This should
be a task that you see as very simple in C.
#!/usr/bin/python3
import sys
    print("Usage: sums.py <input-file>")
    sys.exit(1)
data = list(map(int, open(sys.argv[1], "r").read().split()))
n = len(data)
s = sum(data)
print("Count: %i, sum %i, average %i" % (n, s, s // n))
A rather artificial task that you have chosen so that it can be done
as a Python one-liner, for the main body.

Some characteristics of how it is done are that the whole file is read
into memory as effectively a single string, and all the numbers are
collated into an in-memory array before it is processed.

Numbers are also conveniently separated by white-space (no commas!), so
that .split can be used.

You are using features from Python that allow arbitrarily large integers,
which also avoids any overflow on that sum.

A C version wouldn't have all those built-ins to draw on (presumably you
expect the starting point to be 'int main(int n, char** args){}'; using
existing libraries is not allowed).

Some would write it so that the file is processed serially and doesn't
have to occupy memory, so that it can deal with files that might fill up
memory.

They might also try to avoid building a large data[] array that may
need to grow in size unless the bounds are determined in advance.

The C version would be doing it in a different manner, and would likely
be more efficient.

I haven't tried it directly in C (I don't have a C 'readfile' to hand);
I tried it in my language on a 100MB test input of 15M random numbers
ranging up to one million.

It took just under 0.5 seconds. When I optimised it via C and gcc-O3, it
took just over 0.3 seconds (so the C was 50% faster).

In CPython, your version took 6 seconds, and PyPy was 4.8 seconds.

With a more arbitrary input format, this would be the kind of job that a
compiler's lexer does. But nobody seriously writes lexers in Python.


(This is the main program from my attempt; not C, but equally low level:

-------------------
proc main=
int n:=0, x, length:=0, sum:=0

sptr:=readfile("data.txt")
if sptr=nil then stop fi
eof:=0

while x:=nextnumber(); not eof do
++length
sum+:=x
od

println "Length =", length
println "Sum =", sum
println "Average =", sum/length
end
-------------------

Not shown is the fiddly 'nextnumber' routine. It uses 64-bit signed
values, and handles negative numbers.

This is it in action, run directly from source code (tcc can do this too!):

C:\mapps>mm -run test
Length = 15494902
Sum = 7745911799036
Average = 499900
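(Since the C version of the task was never actually posted, here is a
minimal sketch of it for comparison - an illustration only, assuming
fscanf() for input, 64-bit sums with no overflow checking, and invented
wording for the messages.)

-------------------
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "Usage: sums <input-file>\n");
        return 1;
    }

    FILE *f = fopen(argv[1], "r");
    if (f == NULL) {
        fprintf(stderr, "Cannot open %s\n", argv[1]);
        return 1;
    }

    long long x, sum = 0, count = 0;
    int r;

    /* fscanf skips whitespace (spaces, tabs, newlines) before "%lld" */
    while ((r = fscanf(f, "%lld", &x)) == 1) {
        sum += x;
        count++;
    }
    fclose(f);

    if (r != EOF) {                 /* stopped on something non-numeric */
        fprintf(stderr, "Bad input: not an integer\n");
        return 1;
    }
    if (count == 0) {
        fprintf(stderr, "No numbers in input\n");
        return 1;
    }

    printf("Count: %lld, sum %lld, average %lld\n",
           count, sum, sum / count);
    return 0;
}
-------------------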
David Brown
2024-06-18 16:11:07 UTC
Permalink
Post by bart
[...]
A rather artificial task that you have chosen so that it can be done
as a Python one-liner, for the main body.
It is an artificial task that matches Michael's description of a "very
simple thing like reading numbers from text file". Perhaps I should
have asked for the median and mode as well as the mean. In Python, that
would mean adding these lines :


from collections import Counter

print("Mode: %i" % Counter(data).most_common(1)[0][0])

if n % 2 == 1 :
median = sorted(data)[n // 2]
else :
median = sum(sorted(data)[(n // 2 - 1) : (n // 2 + 1)]) / 2
print("Median: %s" % median)


Or there is statistics.mode() and statistics.mean(), but I expect you'd
call that cheating. And I know that sorting the data is inefficient
compared to using heaps to calculate the median, but this is targeting
low developer time, not low run time.

How much more would that be in C?
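(To put a rough number on that question: assuming the values have already
been read into an array data[0..n-1] of long long, the extra C code for
the median and mode might look like the sketch below - the helper names
are invented, and the whole array is simply sorted rather than doing
anything cleverer.)

-------------------
#include <stddef.h>
#include <stdlib.h>

static int cmp_ll(const void *a, const void *b)
{
    long long x = *(const long long *)a, y = *(const long long *)b;
    return (x > y) - (x < y);
}

/* caller first does: qsort(data, n, sizeof data[0], cmp_ll); */

static double median(const long long *data, size_t n)  /* n > 0, sorted */
{
    if (n % 2 == 1)
        return (double)data[n / 2];
    return (data[n / 2 - 1] + data[n / 2]) / 2.0;
}

static long long mode(const long long *data, size_t n) /* n > 0, sorted */
{
    long long best = data[0], cur = data[0];
    size_t best_run = 1, run = 1;

    for (size_t i = 1; i < n; i++) {
        run = (data[i] == cur) ? run + 1 : 1;   /* length of current run */
        cur = data[i];
        if (run > best_run) {
            best_run = run;
            best = cur;
        }
    }
    return best;
}
-------------------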
Post by bart
Some characteristics of how it is done are that the whole file is read
into memory as effectively a single string, and all the numbers are
collated into an in-memory array before it is processed.
Yes. And that's fine.
Post by bart
Numbers are also conveniently separated by white-space (no commas!), so
that .split can be used.
Yes, that was the specification.  But if you want it to support spaces,
newlines, tabs and commas, you can replace the commas before splitting:

.replace(",", " ").split()

(str.split() with no arguments splits on any run of whitespace, so for
anything more exotic you would reach for re.split() instead.)

I'd probably arrange the code with a couple of extra lines in that case,
as it's not nice to put too much functionality in one line.
Post by bart
You are using features from Python that allow arbitrary large integers
that also avoid any overflow on that sum.
I'm using features from Python in my Python code when showing that
Python has features making it more convenient than C for this kind of
task! What a horror! That's downright /evil/ of me!
Post by bart
A C version wouldn't have all those built-ins to draw on (presumably you
expect the starting point to be 'int main(int n ,char** args){}'; using
existing libraries is not allowed).
Some would write it so that the file is processed serially and doesn't
have to occupy memory, or needed to deal with files that might fill up
memory.
/Exactly/.
Post by bart
They might also try and avoid building a large data[] array that may
need to grow in size unless the bounds are determined in addvance.
The C version would be doing it in a different mannner, and likely to be
more efficient.
Run-time speed was not at issue. We all know that it is possible to
write C code for a task like this which will run a great deal faster
than the Python code, especially if you can give extra restrictions to
the incoming data.
Post by bart
I haven't tried it directly in C (I don't have a C 'readfile's to hand);
I tried it in my language on a 100MB test input of 15M random numbers
ranging up to one million.
No one is interested in that - that was not part of the task.
Post by bart
With a more arbitrary input format, this would be the kind of job that a
compiler's lexer does. But nobody seriously writes lexers in Python.
Yes, people do. (Look up the PLY project, for example.) Nobody
seriously writes lexers in C these days. They use Python or another
high level language during development, prototyping and experimentation,
and if the language takes off as a realistic general-purpose language,
they either write the lexer and the rest of the tools in the new
language itself, or they use C++.
bart
2024-06-18 16:38:58 UTC
Permalink
Post by David Brown
Post by bart
I haven't tried it directly in C (I don't have a C 'readfile's to
hand); I tried it in my language on a 100MB test input of 15M random
numbers ranging up to one million.
No one is interested in that - that was not part of the task.
You're arguing in favour of a high-level scripting language for the task
rather than a lower-level one which you claim involves a lot of
'faffing' around.

I tried it using mine (C would have taken 10 minutes longer), and found
it wasn't actually that hard, especially if you have some ready-made
routines lying around.

Plus it was an order of magnitude faster. Plus I showed how it could be
run without a discrete build step, just like Python (tcc has that feature
for C).

So for this example there wasn't a lot in it, while the low-level code
was shown to be faster without trying too hard.

For a throwaway program that is only run once you probably would use the
nearest scripting language; its extra runtime (5 seconds for my example)
is shorter than the extra coding time.

But that's not why you might use C.
Post by David Brown
Post by bart
With a more arbitrary input format, this would be the kind of job that
a compiler's lexer does. But nobody seriously writes lexers in Python.
Yes, people do.  (Look up the PLY project, for example.)  Nobody
seriously writes lexers in C these days.  They use Python or another
high level language during development, prototyping and experimentation,
and if the language takes off as a realistic general-purpose language,
they either write the lexer and the rest of the tools in the new
language itself, or they use C++.
I'm not talking about experimentation.

Actually I'm starting to wonder whether you use C much at all, and why.

You come across as a Python and C++ 'fan-boy'.
David Brown
2024-06-18 16:54:36 UTC
Permalink
Post by bart
Post by David Brown
Post by bart
I haven't tried it directly in C (I don't have a C 'readfile's to
hand); I tried it in my language on a 100MB test input of 15M random
numbers ranging up to one million.
No one is interested in that - that was not part of the task.
You're arguing in favour of a high level scripting language for task
rather than a lower level one which you claim involves a lot of
'faffing' around.
Yes - for cases like the ones we've been looking at recently where
Python code is vastly simpler and faster to write, and easier to get
correct, and where the speed is fine for realistic use-cases.

No one has suggested that Python is faster to /run/ than reasonable C
code - the point is that where Python already runs as fast as you
need, going faster is of no benefit.  But being faster to develop /is/ a
benefit.
Post by bart
I tried it using mine (C would have taken 10 minutes longer), and found
it wasn't actually that hard, especially if you have some ready-made
routines lying around.
You are already far beyond the time it takes to write such code in
Python. And anything in your language is irrelevant to everyone except you.
Post by bart
Plus it was a magnitude faster. Plus I showed how it could be run
without a discrete build step just like Python (Tcc has that feature for C).
So for this example, there wasn't a lot in it, while the low-level could
was shown to be faster without trying too hard.
For a throwaway program that is only run once you probably would use the
nearest scripting language; its extra runtime (5 seconds for my example)
is shorter than the next extra coding time.
But that's not why you might use C.
I agree. That's the point - use C when C is the best choice, use a
higher level language when /that/ is the best choice.
Post by bart
Post by David Brown
Post by bart
With a more arbitrary input format, this would be the kind of job
that a compiler's lexer does. But nobody seriously writes lexers in
Python.
Yes, people do.  (Look up the PLY project, for example.)  Nobody
seriously writes lexers in C these days.  They use Python or another
high level language during development, prototyping and
experimentation, and if the language takes off as a realistic
general-purpose language, they either write the lexer and the rest of
the tools in the new language itself, or they use C++.
I'm not talking about experimentation.
Actually I'm starting to wonder whether you use C much at all, and why.
I use C for embedded systems where C is the right (or only!) choice. I
also use C++ on such systems when that is the right choice.
Post by bart
You come across as a Python and C++ 'fan-boy'.
I'm a fan of picking the best available tool for the job. For something
involving manipulating text, strings, and file data on a PC, that is
rarely C.

Unless, of course, you are doing stuff in C for the fun of it, in which
case C is clearly the right choice!
DFS
2024-06-18 18:14:27 UTC
Permalink
        median = sorted(data)[n // 2]
        median = sum(sorted(data)[(n // 2 - 1) : (n // 2 + 1)]) / 2
I think your else formula (n % 2 == 0) is incorrect:

n = 4
data = [1,2,3,4]
median = 2.5

Yours appears to sum (1,2,3) = 6 / 2 = 3.

Am I reading it correctly?
Mark Bourne
2024-06-18 19:52:58 UTC
Permalink
         median = sorted(data)[n // 2]
         median = sum(sorted(data)[(n // 2 - 1) : (n // 2 + 1)]) / 2
n      = 4
data   = [1,2,3,4]
median = 2.5
Yours appears to sum (1,2,3) = 6 / 2 = 3.
Am I reading it correctly?
Python ranges include the start index but exclude the end index. So
data[1:3] gives the items at data[1] and data[2], but not data[3].
Indexes are zero-based, so data[1:3] == [2, 3], sum([2, 3]) == 5, and 5
/ 2 == 2.5.
--
Mark.
DFS
2024-06-18 20:07:05 UTC
Permalink
         median = sorted(data)[n // 2]
         median = sum(sorted(data)[(n // 2 - 1) : (n // 2 + 1)]) / 2
n      = 4
data   = [1,2,3,4]
median = 2.5
Yours appears to sum (1,2,3) = 6 / 2 = 3.
Am I reading it correctly?
Python ranges include the start index but exclude the end index.  So
data[1:3] gives the items at data[1] and data[2], but not data[3].
Indexes are zero-based, so data[1:3] == [2, 3], sum([2, 3]) == 5, and 5
/ 2 == 2.5.
I knew python is index-0 based and I've done a lot of string slicing,
but I didn't know you could sum a slice.

Thanks.
Malcolm McLean
2024-06-18 14:30:14 UTC
Permalink
Post by David Brown
[...]
Here is a simple task - take a file name as an command-line argument,
then read all white-space (space, tab, newlines, mixtures) separated
integers.  Add them up and print the count, sum, and average (as an
integer).  Give a brief usage message if the file name is missing, and a
brief error if there is something that is not an integer.  This should
be a task that you see as very simple in C.
#!/usr/bin/python3
import sys
    print("Usage: sums.py <input-file>")
    sys.exit(1)
data = list(map(int, open(sys.argv[1], "r").read().split()))
n = len(data)
s = sum(data)
print("Count: %i, sum %i, average %i" % (n, s, s // n))
And here's a simple task for you. Our filesystem uses a new technology
and is a bit dicey. Occasionally you will get a read error. Can you
modify the python to print out that a read error has occurred?
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
David Brown
2024-06-18 16:15:13 UTC
Permalink
Post by Malcolm McLean
And here's a simple task for you. Our filesystem uses a new technology
and is a bit dicey. Occasionally you will get a read error. Can you
modify the python to print out that a read error has occurred?
The error is in the specification of the task.

If you are trying to suggest that sometimes Python is not a suitable
language and C might be better for some tasks, then I already know that.

(Mind you, it's quite possible that Python and fuse might be a suitable
combination for prototyping a filesystem.)
Mark Bourne
2024-06-18 20:09:25 UTC
Permalink
Post by Malcolm McLean
Post by David Brown
#!/usr/bin/python3
import sys
     print("Usage: sums.py <input-file>")
     sys.exit(1)
data = list(map(int, open(sys.argv[1], "r").read().split()))
n = len(data)
s = sum(data)
print("Count: %i, sum %i, average %i" % (n, s, s // n))
And here's a simple task for you. Our filesystem uses a new technology
and is a bit dicey. Occasionally you will get a read error. Can you
modify the python to print out that a read error has occurred?
No modification required ;) A read error will raise an exception;
unhandled exceptions are printed to stderr along with a stack trace, and
the program terminates with a failure status.

If you don't want the stack trace, you can handle the exception to just
print a message and exit, but this is comp.lang.c, not comp.lang.python.
--
Mark.
Michael S
2024-06-18 15:40:26 UTC
Permalink
On Tue, 18 Jun 2024 14:36:40 +0200
Post by David Brown
Of course if you don't know Python, it will be slower to write it in Python!
I don't know Python well, but that does not mean that I don't know it at
all.
A few minutes ago I took a look at the docs, and it seems the situation
with writing binary data files with a predefined layout is better than I
was suspecting. They have something called the "Buffer Protocol". It
allows the layout to be specified in a declarative manner, similarly to a
C struct or maybe even to Ada's records with a representation clause.
However, an attempt to read the doc page further down proved that my
suspicion about the steepness of the learning curve was not wrong :(
Malcolm McLean
2024-06-18 16:39:01 UTC
Permalink
Post by Michael S
On Tue, 18 Jun 2024 14:36:40 +0200
Post by David Brown
Of course if you don't know Python, it will be slower to write it in Python!
I don't know Python well, but it does not meant that I don't know it at
all.
Few minutes ago I took a look into docs and it seems that situation with
writing binary data files with predefined layout is better than what I
was suspecting. They have something called "Buffer Protocol". It allows
to specify layout in declarative manner, similarly to C struct or may
be even to Ada's records with representation clause.
However attempt to read the doc page further down proved that my
suspicion about steepness of the learning curve was not wrong :(
My main experience of Python was that we had some resource files which
were icons, in matching light and dark themes. The light theme had the
suffix _L followed by the extension, and the dark themes had _D. And they
needed to be sorted alphabetically, except that _L should be placed
before _D.
And it didn't take long to get Python to sort the list alphabetically,
but there seemed to be no way into the sort comparison function itself.
And I had to give up.

--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
Keith Thompson
2024-06-18 22:49:43 UTC
Permalink
Malcolm McLean <***@gmail.com> writes:
[...]
Post by Malcolm McLean
And it didn't take long to get Python to sort the list alphabetically,
but there seemed no way in to the sort comparision function
itself. And I had to give up.
<OT>
https://docs.python.org/3/library/functions.html#sorted
https://docs.python.org/3/library/stdtypes.html#list.sort
</OT>
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Malcolm McLean
2024-06-19 09:25:08 UTC
Permalink
Post by Keith Thompson
[...]
Post by Malcolm McLean
And it didn't take long to get Python to sort the list alphabetically,
but there seemed no way in to the sort comparision function
itself. And I had to give up.
<OT>
https://docs.python.org/3/library/functions.html#sorted
https://docs.python.org/3/library/stdtypes.html#list.sort
</OT>
key specifies a function of one argument that is used to extract a
comparison key from each element in iterable (for example,
key=str.lower). The default value is None (compare the elements directly).

You see the problem. I can sort on any field. I can sort alphabetically
upwards and downwards. But I don't want to do that. I want to use a
non-alphabetical comparison function on two fields, and I need to
specify that myself, because it's impossible that it is available
anywhere. And that is to sort alphabetically, except where the strings
match apart from an embedded "_L_" or "_D_", where the string with the
embedded "_L_" should be treated as closer to A than the string with the
embedded "_D_".

And I'm sure there is some way to achieve this. But in C, it is achieved
simply by declaring that qsort takes a function pointer to user-supplied
code.
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
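(For what it's worth, the kind of qsort() comparison Malcolm describes
might look like the sketch below: build a key in which an embedded "_L_"
or "_D_" is rewritten so that plain strcmp() gives "alphabetical, but the
light variant just before the matching dark variant". The helper, the
buffer sizes and the sample names are illustrative assumptions, not code
from any of the posts.)

-------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Rewrite an embedded "_L_" as "_0_" and "_D_" as "_1_", so that
   strcmp() on the keys orders the light icon just before the matching
   dark icon, and everything else alphabetically. */
static void make_key(const char *name, char *key, size_t cap)
{
    snprintf(key, cap, "%s", name);
    char *p = strstr(key, "_L_");
    if (p == NULL)
        p = strstr(key, "_D_");
    if (p != NULL)
        p[1] = (p[1] == 'L') ? '0' : '1';
}

static int cmp_names(const void *a, const void *b)
{
    char ka[256], kb[256];
    make_key(*(const char *const *)a, ka, sizeof ka);
    make_key(*(const char *const *)b, kb, sizeof kb);
    return strcmp(ka, kb);
}

int main(void)
{
    const char *names[] = {
        "quill_icon_D_.png", "quill_icon_L_.png",
        "zebra.png", "aardvark.png",
    };
    size_t n = sizeof names / sizeof names[0];

    qsort(names, n, sizeof names[0], cmp_names);
    for (size_t i = 0; i < n; i++)
        puts(names[i]);
    return 0;
}
-------------------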
David Brown
2024-06-19 10:42:31 UTC
Permalink
Post by Malcolm McLean
Post by Keith Thompson
[...]
Post by Malcolm McLean
And it didn't take long to get Python to sort the list alphabetically,
but there seemed no way in to the sort comparision function
itself. And I had to give up.
<OT>
https://docs.python.org/3/library/functions.html#sorted
https://docs.python.org/3/library/stdtypes.html#list.sort
</OT>
key specifies a function of one argument that is used to extract a
comparison key from each element in iterable (for example,
key=str.lower). The default value is None (compare the elements directly).
You see the problem. I can sort on any field. I can sort alphaetically
upwards and downwards. But I don't want to do that. I want to use a
non-alphabetical comaprison function on two fields, and I need to
specify that myself, because it's impossible that it is available
anywhere. And that is to sort alphalbetically, except where the strings
match except for an emedded "_L_" or "_D_" where the string wth the
embedded "L" shoud be treated as closer to A than the string with the
emebdded "_D_".
def LD_key(n) :
if "_L" in n : return (0, n)
if "_D_" in n : return (1, n)
return (2, n)

Now you have a key function that will put all names containing "_L_"
first, then all names containing "_D_", then everything else, with
alphabetic sorting within those groups.

There is no problem here - you just have to think about things in a
different way.

(I don't know why Python 3 dropped the comparison function support from
sort()/sorted(). It might be that a key function is more efficient,
since you call it once for each item rather than once for each comparison.)
Post by Malcolm McLean
And I'm sure there is some way to achiev e this. But in C, it s achieved
simply by declaring that qsort takes a function pointer to user-supplied
code.
Yes, there is some way to achieve this all in Python. And like pretty
much every other question that is commonly asked, google will tell you
the answer. Sometimes things seem hard - then you do a little research,
learn a bit, and then it's easy.
Malcolm McLean
2024-06-19 16:52:11 UTC
Permalink
Post by Malcolm McLean
Post by Keith Thompson
[...]
Post by Malcolm McLean
And it didn't take long to get Python to sort the list alphabetically,
but there seemed no way in to the sort comparision function
itself. And I had to give up.
<OT>
https://docs.python.org/3/library/functions.html#sorted
https://docs.python.org/3/library/stdtypes.html#list.sort
</OT>
key specifies a function of one argument that is used to extract a
comparison key from each element in iterable (for example,
key=str.lower). The default value is None (compare the elements directly).
You see the problem. I can sort on any field. I can sort alphaetically
upwards and downwards. But I don't want to do that. I want to use a
non-alphabetical comaprison function on two fields, and I need to
specify that myself, because it's impossible that it is available
anywhere. And that is to sort alphalbetically, except where the
strings match except for an emedded "_L_" or "_D_" where the string
wth the embedded "L" shoud be treated as closer to A than the string
with the emebdded "_D_".
    if "_L" in n : return (0, n)
    if "_D_" in n : return (1, n)
    return (2, n)
Now you have a key function that will put all names containing "_L_"
first, then all names containing "_D_", then everything else, with
alphabetic sorting within those groups.
Yes, but that's not quite what we want. A typical input would go.

quill_icon_D_.png
***@2x.png
quill_icon_L_.png
***@2x.png
aardvark.png
zebra.png

and the output we want is

aardvark.png
quill_icon_L_.png
quill_icon_D_.png
***@2x.png
***@2x.png
zebra.png
There is no problem here - you just have to think about things in a
different way.
(I don't know why Python 3 dropped the comparison function support from
sort()/sorted().  It might be that a key function is more efficient,
since you call it once for each item rather than once for each comparison.)
Post by Malcolm McLean
And I'm sure there is some way to achiev e this. But in C, it s
achieved simply by declaring that qsort takes a function pointer to
user-supplied code.
Yes, there is some way to achieve this all in Python.  And like pretty
much every other question that is commonly asked, google will tell you
the answer.  Sometimes things seem hard - then you do a little research,
learn a bit, and then its easy.
I struggled with it for a while and, as you say, had recourse to the
web. It was the very first Python program I tried to write, and it
seemed unacceptably difficult to carry out a simple customised sort. I
got alphabetical sorting going quite easily, but you don't need
scripting for that.

It wasn't what I was meant to be spending my time on anyway. So I had to
drop the idea of automating the collation of resources with a Python
script.
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
David Brown
2024-06-19 17:49:50 UTC
Permalink
Post by Malcolm McLean
Yes, but that's not quite what we want. A typical input would go.
It's extremely hard to guess what you want (not "we", but "you" - no one
else wants this kind of thing) when you have bizarre requirements and
only give bits of them. So modifying the Python code is left as an
exercise if you are interested, especially as it is off-topic.

I appreciate that Python programming will be more difficult than C
programming if you are familiar with C and have never written Python.
That's not the point. The point is that for someone reasonably familiar
with both languages, some types of coding - such as the ones discussed
here - are faster and easier to develop in Python.
David Brown
2024-06-19 05:44:44 UTC
Permalink
Post by Malcolm McLean
Post by Michael S
On Tue, 18 Jun 2024 14:36:40 +0200
Post by David Brown
Of course if you don't know Python, it will be slower to write it in Python!
I don't know Python well, but it does not meant that I don't know it at
all.
Few minutes ago I took a look into docs and it seems that situation with
writing binary data files with predefined layout is better than what I
was suspecting. They have something called "Buffer Protocol". It allows
to specify layout in declarative manner, similarly to C struct or may
be even to Ada's records with representation clause.
However attempt to read the doc page further down proved that my
suspicion about steepness of the learning curve was not wrong :(
My main experience of Python was that we had some resource files which
were icons, in matching light and dark themes. The light theme had
suffix _L followed by extension, and the dark themes had _D. And they
needed to be sorted alphabetically, except that _L should be placed
before _D.
And it didn't take long to get Python to sort the list alphabetically,
but there seemed no way in to the sort comparision function itself. And
I had to give up.
Python "sort" is a bit like C "qsort" (desperately trying to relate this
to the group topicality) in that you can define your own comparison
function, and use that for "sort". For simple comparison functions,
people often use lambdas, for more complicated ones it's clearer to
define a function with a name.
Keith Thompson
2024-06-19 09:27:04 UTC
Permalink
David Brown <***@hesbynett.no> writes:
[...]
Post by David Brown
Python "sort" is a bit like C "qsort" (desperately trying to relate
this to the group topicality) in that you can define your own
comparison function, and use that for "sort". For simple comparison
functions, people often use lambdas, for more complicated ones it's
clearer to define a function with a name.
Not exactly (see the Python documentation), but this isn't the place to
go into the details.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Michael S
2024-06-19 10:17:58 UTC
Permalink
On Wed, 19 Jun 2024 07:44:44 +0200
Post by David Brown
Post by Malcolm McLean
My main experience of Python was that we had some resource files
which were icons, in matching light and dark themes. The light
theme had suffix _L followed by extension, and the dark themes had
_D. And they needed to be sorted alphabetically, except that _L
should be placed before _D.
And it didn't take long to get Python to sort the list
alphabetically, but there seemed no way in to the sort comparision
function itself. And I had to give up.
Python "sort" is a bit like C "qsort" (desperately trying to relate
this to the group topicality) in that you can define your own
comparison function, and use that for "sort". For simple comparison
functions, people often use lambdas, for more complicated ones it's
clearer to define a function with a name.
Off topic:
Indeed, Python sort has an option for specifying a comparison function,
but I would not call it very similar to the comparison function of C
qsort, since in Python the comparison is not applied directly to the
record. Instead, it is applied to keys that are derived from the record.
Besides, my impression is that in Python sorting with a user-supplied
comparison function is less idiomatic than doing all the heavy lifting in
a user-supplied key function.
For the case presented by Malcolm, I'd certainly do it all in key(),
without a custom cmp_to_key(). Maybe it's a little less efficient, but
significantly easier to comprehend.

Back to topic:
C qsort() sucks. They forgot to provide an option for a 3rd parameter
(context) in the comparison callback.
Back to O.T.:
Python's sort bypasses the problem by allowing a lambda as the key()
function, so it can see the caller's variables. IMHO,
it still sucks.
The C++ way, where the comparison can be a functor, sucks far less. I
find it less than obvious, but at least all the functionality one could
ever want is available.

Back to topic: why has the C standard committee still not added something
like GNU qsort_r() to the standard?
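(For reference, the context parameter being asked for does already exist
as an extension. Below is a sketch assuming the glibc qsort_r() signature
- the BSD variant orders its arguments differently - sorting an index
array by values held in caller-owned data, with no file-scope variable
needed.)

-------------------
#define _GNU_SOURCE             /* glibc qsort_r() */
#include <stdio.h>
#include <stdlib.h>

/* The third parameter carries the caller's context, so the comparison
   can reach the data being compared without any global state. */
static int cmp_by_value(const void *a, const void *b, void *ctx)
{
    const double *values = ctx;
    double x = values[*(const int *)a];
    double y = values[*(const int *)b];
    return (x > y) - (x < y);
}

int main(void)
{
    double values[] = { 3.5, 1.25, 2.0 };
    int order[] = { 0, 1, 2 };

    qsort_r(order, 3, sizeof order[0], cmp_by_value, values);

    for (int i = 0; i < 3; i++)
        printf("%d ", order[i]);    /* prints: 1 2 0 */
    printf("\n");
    return 0;
}
-------------------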
David Brown
2024-06-19 10:53:21 UTC
Permalink
Post by Michael S
On Wed, 19 Jun 2024 07:44:44 +0200
Post by David Brown
Post by Malcolm McLean
My main experience of Python was that we had some resource files
which were icons, in matching light and dark themes. The light
theme had suffix _L followed by extension, and the dark themes had
_D. And they needed to be sorted alphabetically, except that _L
should be placed before _D.
And it didn't take long to get Python to sort the list
alphabetically, but there seemed no way in to the sort comparision
function itself. And I had to give up.
Python "sort" is a bit like C "qsort" (desperately trying to relate
this to the group topicality) in that you can define your own
comparison function, and use that for "sort". For simple comparison
functions, people often use lambdas, for more complicated ones it's
clearer to define a function with a name.
Indeed, Python sort has option for specifying comparison function, but
I would not call it very similar to comparison function of C qsort
since in Python comparison is not applied directly to the record.
Yes, it seems that the comparison function support in sort() was in
Python 2 but was dropped for Python 3.
Post by Michael S
Instead, it applied to the keys that are derived from record.
Besides, my impression is that in Python sorting by user-supplied
comparison function is less idiomatic than doing all heavy lifting in
user-supplied key function.
A key function can be applied once per item, while a comparison function
is called once per comparison - thus key functions will be (or at least
/can/ be) more efficient.
Post by Michael S
For the case, presented by Malcolm, I'd certainly do it all in key(),
without custom cmp_to_key(). May be, it's a little less efficient, but
significantly easier to comprehend.
C qsort() sucks. They forgot to provide an option for 3rd parameter
(context) in comparison callback.
It is also not guaranteed to be stable (which is important in some
contexts), and it is a misnomer - it is rarely a quicksort.
Post by Michael S
Python's sort bypasses the problem by allowing lambda as a key()
function, so it could have visibility of variables of the caller. IMHO,
it still sucks.
C++ way, where comparison can be functor, sucks far less. I find it
less than obvious, but at least all functionality one can ever want is
available.
The C++ way is also massively more efficient than C in cases where the
comparison function is simple. And it is typesafe.
Post by Michael S
Back to topic: why C standard committee still didn't add something like
gnu qsort_r() to the standard?
That is a little more flexible, but it's still ugly!
Malcolm McLean
2024-06-19 17:42:42 UTC
Permalink
Post by David Brown
Post by Michael S
On Wed, 19 Jun 2024 07:44:44 +0200
Post by David Brown
Post by Malcolm McLean
My main experience of Python was that we had some resource files
which were icons, in matching light and dark themes. The light
theme had suffix _L followed by extension, and the dark themes had
_D. And they needed to be sorted alphabetically, except that _L
should be placed before _D.
And it didn't take long to get Python to sort the list
alphabetically, but there seemed no way in to the sort comparision
function itself. And I had to give up.
Python "sort" is a bit like C "qsort" (desperately trying to relate
this to the group topicality) in that you can define your own
comparison function, and use that for "sort".  For simple comparison
functions, people often use lambdas, for more complicated ones it's
clearer to define a function with a name.
Indeed, Python sort has option for specifying comparison function, but
I would not call it very similar to comparison function of C qsort
since in Python comparison is not applied directly to the record.
Yes, it seems that the comparison function support in sort() was in
Python 2 but was dropped for Python 3.
This is exactly the sort of carry-on which causes problems. You have a
requirement for a custom sort. And, whilst I'm sure that the sort I
asked for is possible to achieve, it's not exactly obvious how to
achieve it for someone brought up on sort. So you try looking it up
on the web, and every answer is based on the Python 2 sort, which has been
taken away.
Post by David Brown
Post by Michael S
Instead, it applied to the keys that are derived from record.
Besides, my impression is that in Python sorting by user-supplied
comparison function is less idiomatic than doing all heavy lifting in
user-supplied key function.
A key function can be applied once per item, while a comparison function
is called once per comparison - thus key functions will be (or at least
/can/ be) more efficient.
Post by Michael S
For the case, presented by Malcolm, I'd certainly do it all in key(),
without custom cmp_to_key(). May be, it's a little less efficient, but
significantly easier to comprehend.
C qsort() sucks. They forgot to provide an option for 3rd parameter
(context) in comparison callback.
Back to topic: why C standard committee still didn't add something like
gnu qsort_r() to the standard?
That is a little more flexible, but it's still ugly!
You seldom need a context pointer for sort functions. They must be pure
functions, or at least they must return the same result for the same two
comparisons. And it's rare that it makes sense to give them extra
parameters.
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
David Brown
2024-06-19 05:39:26 UTC
Permalink
Post by Michael S
On Tue, 18 Jun 2024 14:36:40 +0200
Post by David Brown
Of course if you don't know Python, it will be slower to write it in Python!
I don't know Python well, but it does not meant that I don't know it at
all.
Few minutes ago I took a look into docs and it seems that situation with
writing binary data files with predefined layout is better than what I
was suspecting. They have something called "Buffer Protocol". It allows
to specify layout in declarative manner, similarly to C struct or may
be even to Ada's records with representation clause.
However attempt to read the doc page further down proved that my
suspicion about steepness of the learning curve was not wrong :(
"Buffer protocol" is for passing data between Python and C extensions,
which is certainly a complicated business.

For dealing with binary data in specific formats in Python, the "struct"
module is your friend. It lets you pack and unpack data with specific
sizes and endianness using a compact format string notation. I've used
it for dealing with binary file formats and especially for network
packets. There's also the ctypes module, which is aimed at duplicating
C-style types and structures, primarily for interfacing with DLLs and
dynamic .so libraries.
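To picture the kind of fixed layout being talked about from the C side,
here is a purely illustrative sketch (the record fields and file name are
made up): a 7-byte little-endian record that the Python side could unpack
with the struct format "<IHB".

#include <stdint.h>
#include <stdio.h>

/* A hypothetical record with a fixed on-disk layout: 32-bit id,
   16-bit count, 8-bit flags, all little-endian, no padding (7 bytes). */
struct record {
    uint32_t id;
    uint16_t count;
    uint8_t  flags;
};

/* Serialise byte by byte, so the file layout does not depend on the
   compiler's struct padding or the host's endianness. */
static int write_record(FILE *f, const struct record *r)
{
    unsigned char buf[7];

    buf[0] = r->id & 0xFF;
    buf[1] = (r->id >> 8) & 0xFF;
    buf[2] = (r->id >> 16) & 0xFF;
    buf[3] = (r->id >> 24) & 0xFF;
    buf[4] = r->count & 0xFF;
    buf[5] = (r->count >> 8) & 0xFF;
    buf[6] = r->flags;
    return fwrite(buf, sizeof buf, 1, f) == 1 ? 0 : -1;
}

int main(void)
{
    struct record r = { 1234, 42, 1 };
    FILE *f = fopen("record.bin", "wb");

    if (!f || write_record(f, &r) != 0) {
        perror("record.bin");
        return 1;
    }
    fclose(f);
    return 0;
}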
James Kuyper
2024-06-18 07:26:45 UTC
Permalink
Post by Kaz Kylheku
Post by James Kuyper
The problem is that Bart's compiler is VERY unusual. It's customized for
his use, and he has lots of quirks in the way he thinks compilers should
work, which are very different from those of most other programmers. In
particular, compilation speed is very important to him, while execution
speed is almost completely unimportant, which is pretty much the
opposite of the way most programmers prioritize those things.
Most programmers use Javascript and Python, which follow Bart's
priorities. Fast, invisible compilation to some kind of byte code (plus
possibly later JIT), slow execution time.
Perhaps I should have said "most C programmers"; C tends to attract
those who have a need for fast execution time.
Most of my own programming experience has been with programs that worked
on data coming down to Earth from NASA satellites. My programs read one
or more input files, process them, and write one or more output files,
with no human interaction of any kind. Those programs each ran in batch
processing mode thousands of times a day, and the load they placed on the
processors was a significant cost factor - the slower they operated, the
more processors we had to maintain in order to get the output data
coming out as fast as the input data was coming in. Even though they
performed complex scientific calculations on the data, they were
primarily I/O bound, so our top priority was to design them to minimize
the amount of I/O that needed to be done.
I fully understand that this experience gives me a biased view of
programming - but so does everyone else's experience. I am in no danger
of believing that all programs are batch processing, and you should not
imagine that all programs are interactive. Some of the biggest, most
powerful computers in the world process weather forecasting data 24/7, and
many of those programs operate in a batch mode keeping pace with
real-time data, similar to the way mine operated.
bart
2024-06-17 10:30:19 UTC
Permalink
Post by James Kuyper
Post by Michael S
On Thu, 13 Jun 2024 13:53:54 +0200
...
Post by Michael S
Post by David Brown
I know more than most C programmers about how certain C compilers
work, and what works well with them, and what is relevant for them -
though I certainly don't claim to know everything. Obviously Bart
knows vastly more about how /his/ compiler works. He also tends to
do testing with several small and odd C compilers, which can give
interesting results even though they are of little practical
relevance for real-world C development work.
Since he do compilers himself, he has much better feeling [that you
or me] of what is hard and what is easy, what is small and what is big,
what is fast and what is slow. That applies to all compilers except
those that are very unusual. "Major" compiler are not unusual at all.
The problem is that Bart's compiler is VERY unusual. It's customized for
his use, and he has lots of quirks in the way he thinks compilers should
work, which are very different from those of most other programmers.
In
particular, compilation speed is very important to him, while execution
speed is almost completely unimportant, which is pretty much the
opposite of the way most programmers prioritize those things.
Compilation speed is important to everyone. That's why so many tricks
are used to get around the lack of speed in a big compiler, or so many
extra resources are thrown at the problem.

Runtime performance is important too, but at this level of language, the
difference between optimised and unoptimised code is narrow. Unoptimised
may be between 1x and 2x slower, typically.

Perhaps it's slower on benchmarks, or on code written in a C++ style that
generates lots of redundancies and relies on optimisation to make it fast.

But, during development, you probably wouldn't use optimisation anyway.

In that case, you're still suffering slow build times with a big
compiler, but you don't get any faster code at the end of it.

I sometimes suggest to people to use Tiny C most of the time, and run
gcc from time to time for extra analysis and extra checks, and use
gcc -O3 for production builds.

(I have also suggested that gcc should incorporate a -O-1 option that
runs a secretly bundled copy of Tiny C.)
David Brown
2024-06-17 13:43:31 UTC
Permalink
Post by bart
Post by James Kuyper
Post by Michael S
On Thu, 13 Jun 2024 13:53:54 +0200
...
Post by Michael S
Post by David Brown
I know more than most C programmers about how certain C compilers
work, and what works well with them, and what is relevant for them -
though I certainly don't claim to know everything. Obviously Bart
knows vastly more about how /his/ compiler works. He also tends to
do testing with several small and odd C compilers, which can give
interesting results even though they are of little practical
relevance for real-world C development work.
Since he do compilers himself, he has much better feeling [that you
or me] of what is hard and what is easy, what is small and what is big,
what is fast and what is slow. That applies to all compilers except
those that are very unusual. "Major" compiler are not unusual at all.
The problem is that Bart's compiler is VERY unusual. It's customized for
his use, and he has lots of quirks in the way he thinks compilers should
work, which are very different from those of most other programmers.
In
particular, compilation speed is very important to him, while execution
speed is almost completely unimportant, which is pretty much the
opposite of the way most programmers prioritize those things.
Compilation speed is important to everyone. That's why so many tricks
are used to get around the lack of speed in a big compiler, or so many
extra resources are thrown at the problem.
What "tricks" ?
Post by bart
Runtime performance is important too, but at this level of language, the
difference between optimised and unoptimised code is narrow. Unoptimised
may be between 1x and 2x slower, typically.
That depends on the language, type of code, and target platform.
Typical C code on an x86_64 platform might be two or three times slower
when using a poorly optimising compiler. After all, the designers of
x86 cpus put a great deal of effort into making shitty code run fast.
For high-performance code written with care and requiring fast results,
the performance difference will be bigger. For C++, especially code
that makes good use of abstractions, the difference can be very much
bigger. For C code on an embedded ARM device or other microcontroller,
it's not unusual to see a 5x speed improvement on optimised code.

Speed is not the only good reason for picking C as the language for a
task, but it is often a relevant factor. And if it is a factor, then
you will usually prefer faster speeds.
Post by bart
Perhaps slower on benchmarks, or code written in C++ style that
generates lots of redundances that relies on optimisation to make it fast.
But, during developement, you probably wouldn't use optimisation anyway.
I virtually always have optimisation enabled during development. I
might, when trying to chase down a specific bug, reduce some specific
optimisations, but I have never seen the point of crippling a
development tool when doing development work - it makes no sense at all.
Post by bart
In that case, you're still suffering slow build times with a big
compiler, but you don't get any faster code at the end of it.
I sometimes suggest to people to use Tiny C most of the time, and run
gcc from time to time for extra analysis and extra checks, and use
gcc-O3 for production builds.
I cannot imagine any situation where I would think that might be a good
idea.

But then, I see development tools as tools to help my work as a
developer, while you seem to consider tools (other than your own) as
objects of hatred to be avoided whenever possible or dismissed as
"tricks". I don't expect we will ever agree there.
Post by bart
(I have also suggested that gcc should incorporate a -O-1 option that
runs a secretly bundled of Tiny C.)
Malcolm McLean
2024-06-17 15:48:36 UTC
Permalink
Post by David Brown
Post by bart
Post by James Kuyper
Post by Michael S
On Thu, 13 Jun 2024 13:53:54 +0200
...
Post by Michael S
Post by David Brown
I know more than most C programmers about how certain C compilers
work, and what works well with them, and what is relevant for them -
though I certainly don't claim to know everything. Obviously Bart
knows vastly more about how /his/ compiler works. He also tends to
do testing with several small and odd C compilers, which can give
interesting results even though they are of little practical
relevance for real-world C development work.
Since he do compilers himself, he has much better feeling [that you
or me] of what is hard and what is easy, what is small and what is big,
what is fast and what is slow. That applies to all compilers except
those that are very unusual. "Major" compiler are not unusual at all.
The problem is that Bart's compiler is VERY unusual. It's customized for
his use, and he has lots of quirks in the way he thinks compilers should
work, which are very different from those of most other programmers.
In
particular, compilation speed is very important to him, while execution
speed is almost completely unimportant, which is pretty much the
opposite of the way most programmers prioritize those things.
Compilation speed is important to everyone. That's why so many tricks
are used to get around the lack of speed in a big compiler, or so many
extra resources are thrown at the problem.
What "tricks" ?
Post by bart
Runtime performance is important too, but at this level of language,
the difference between optimised and unoptimised code is narrow.
Unoptimised may be between 1x and 2x slower, typically.
That depends on the language, type of code, and target platform. Typical
C code on an x86_64 platform might be two or three times slower when
using a poorly optimising compiler.  After all, the designers of x86
cpus put a great deal of effort into making shitty code run fast. For
high-performance code written with care and requiring fast results, the
performance difference will be bigger.  For C++, especially code that
makes good use of abstractions, the difference can be very much bigger.
For C code on an embedded ARM device or other microcontroller, it's not
unusual to see a 5x speed improvement on optimised code.
Speed is not the only good reason for picking C as the language for a
task, but it is often a relevant factor.  And if it is a factor, then
you will usually prefer faster speeds.
Post by bart
Perhaps slower on benchmarks, or code written in C++ style that
generates lots of redundances that relies on optimisation to make it fast.
But, during developement, you probably wouldn't use optimisation anyway.
I virtually always have optimisation enabled during development.  I
might, when trying to chase down a specific bug, reduce some specific
optimisations, but I have never seen the point of crippling a
development tool when doing development work - it makes no sense at all.
I never do.
Until I had to give up work, I was making real-time tools for artists.
And if it didn't work in under just-noticeable time on the debug build,
you could be pretty sure it wouldn't be working in under just-noticeable
time on the release build either. So I never turned the release build on,
but of course the downstream deployment team built it as release for
delivery to customers. And that might mean that they could do 2000 paths
instead of 1000 before the tool slowed to the point that it became
unusable. So not a game changer. But not something to deprive a customer
of either.
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
David Brown
2024-06-17 16:21:26 UTC
Permalink
Post by Malcolm McLean
Post by David Brown
Post by bart
Post by James Kuyper
Post by Michael S
On Thu, 13 Jun 2024 13:53:54 +0200
...
Post by Michael S
Post by David Brown
I know more than most C programmers about how certain C compilers
work, and what works well with them, and what is relevant for them -
though I certainly don't claim to know everything. Obviously Bart
knows vastly more about how /his/ compiler works. He also tends to
do testing with several small and odd C compilers, which can give
interesting results even though they are of little practical
relevance for real-world C development work.
Since he do compilers himself, he has much better feeling [that you
or me] of what is hard and what is easy, what is small and what is big,
what is fast and what is slow. That applies to all compilers except
those that are very unusual. "Major" compiler are not unusual at all.
The problem is that Bart's compiler is VERY unusual. It's customized for
his use, and he has lots of quirks in the way he thinks compilers should
work, which are very different from those of most other programmers.
In
particular, compilation speed is very important to him, while execution
speed is almost completely unimportant, which is pretty much the
opposite of the way most programmers prioritize those things.
Compilation speed is important to everyone. That's why so many tricks
are used to get around the lack of speed in a big compiler, or so
many extra resources are thrown at the problem.
What "tricks" ?
Post by bart
Runtime performance is important too, but at this level of language,
the difference between optimised and unoptimised code is narrow.
Unoptimised may be between 1x and 2x slower, typically.
That depends on the language, type of code, and target platform.
Typical C code on an x86_64 platform might be two or three times
slower when using a poorly optimising compiler.  After all, the
designers of x86 cpus put a great deal of effort into making shitty
code run fast. For high-performance code written with care and
requiring fast results, the performance difference will be bigger.
For C++, especially code that makes good use of abstractions, the
difference can be very much bigger. For C code on an embedded ARM
device or other microcontroller, it's not unusual to see a 5x speed
improvement on optimised code.
Speed is not the only good reason for picking C as the language for a
task, but it is often a relevant factor.  And if it is a factor, then
you will usually prefer faster speeds.
Post by bart
Perhaps slower on benchmarks, or code written in C++ style that
generates lots of redundances that relies on optimisation to make it fast.
But, during developement, you probably wouldn't use optimisation anyway.
I virtually always have optimisation enabled during development.  I
might, when trying to chase down a specific bug, reduce some specific
optimisations, but I have never seen the point of crippling a
development tool when doing development work - it makes no sense at all.
I never do.
Until I had to give up work, I was making real time tools for artists.
And if it didn't work in under just noticeable time on the debug build,
it wouldn't be working in under just noticeable time on the release
build, you could be pretty sure. So I never turned the release build on,
but of course the downstream deployment team built it as release for
delivery to customers. And that might mean that they could do 2000 paths
instead of 1000 before the tool slowed to the point that it became
unusable. So not a game changer. But not something to deprive a customer
of either.
Having a distinction in optimisation between "debug" and "release"
builds is simply /wrong/. Release what you have debugged, debug what
you intend to release.

Sometimes it is helpful to fiddle with optimisation settings for
specific debugging tasks - though usually it is better to do this for
specific files or functions. And of course the more heavyweight
debugging tools, such as sanitizers, are used during development and not
in releases.

Optimisation is important. Right out of the gate, it means you can let
your tools do a much better job of static analysis (though it is
possible to treat static analysis as a separate tool from compilation),
and you never want to leave bugs to testing if your tools can find them
at the static analysis stage. The more problems you find early, the better.

The other major point of optimisation is that it means you can use better
abstractions. You don't need to use outdated and unsafe function-like
macros - you can use proper functions. You can split code up into
smaller parts, make new variables as and when they are convenient, and
in general write clearer and more maintainable code because you are
leaving the donkey work of optimisation up to the compiler.
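As a small illustration of that point (a sketch, not anyone's production
code): the macro and the function below do the same job, but the function
is type-checked and evaluates its arguments once, and with optimisation
enabled a call to it is typically inlined to the same code as the macro.

#include <stdio.h>

/* The traditional macro: no type checking, and the arguments are
   evaluated twice, so MAX_MACRO(x++, y) misbehaves. */
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* The plain function: type-checked, arguments evaluated once.  With
   optimisation enabled (e.g. gcc -O2) a call to it is inlined. */
static inline int max_int(int a, int b)
{
    return a > b ? a : b;
}

int main(void)
{
    int x = 3, y = 7;

    printf("%d %d\n", MAX_MACRO(x, y), max_int(x, y));   /* prints: 7 7 */
    return 0;
}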

Basically, if you are not using a good optimising compiler with
optimisation enabled, then the chances are high that C is the wrong
choice of language for the task.

Using C without optimisation is like driving a car but refusing to go
out of first gear. You would probably have been better off with a
bicycle or driving a tank, according to the task at hand.
Malcolm McLean
2024-06-17 19:17:25 UTC
Permalink
Post by David Brown
Using C without optimisation is like driving a car but refusing to go
out of first gear.  You would probably have been better off with a
bicycle or driving a tank, according to the task at hand.
I drive C in first gear when I'm developing, which means that the car is
given instructions to go to the right place and obey all the rules of
the road. But it never gets out of first gear when I'm driving it.
However, because of the nature of what we do, which is mostly interactive
programming, usually "just noticeable" time is sufficient. It's a
bit like driving in London - a top-of-the-range sports car is no better
than a beat-up old Mini; they travel at the same speed because of all
the interactions.

Then I hand it over to the deployment team, and they take the restraints
off and allow it to go up to top gear, and it is compiled with full
optimisation. And I don't actually have a computer with one of the most
important hardware targets, but it's all written in C++, a bit in C, and
none in assembler. So I can't profile it, and I have to rely on insight
into where the inner loop will be, and how to avoid expensive operations
in the inner loop.

And hopefully those subroutines will be called for many years to come,
on hardware as yet undesigned.

With Baby X, I did have severe problems with the rendering speed on an
old Windows machine. But I haven't noticed them now it's running on the
Apple Mac. However, as the name suggests, Baby X was first designed for
Xlib. I only added Windows support later, and all the RGBA buffers were
in the wrong format. But faster processors cover a multitude of sins, if
you keep things lean.
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
David Brown
2024-06-17 19:29:57 UTC
Permalink
Post by Malcolm McLean
Post by David Brown
Using C without optimisation is like driving a car but refusing to go
out of first gear.  You would probably have been better off with a
bicycle or driving a tank, according to the task at hand.
I drive C in first gear when I'm developing, which means that the car is
given instructions to go to the right place and obey all, the rules of
the road.
I do my C development with optimisations enabled, which means that the C
compiler will obey all the rules and requirements of C. Optimisations
don't change the meaning of correct code - they only have an effect on
the results of your code if you have written incorrect code. I don't
know about you, but my aim in development is to write /correct/ code.
If disabling optimisations helped in some way, it would be due to bugs
and luck.
Post by Malcolm McLean
But it never gets out of frst gear when I'm driving it.
However because of the nature of what we do, which is interactivce
programming mostly, usually "just noticeable" time is sufficient. It's a
bit like driving in London - a top of the range sports car is no better
than a beat up old mini, they travel at the same speed because of all
the interactions.
If I am writing PC code where the timing is determined by user
interaction, I would not be writing in C - it is almost certainly a poor
choice of language for the task.
Post by Malcolm McLean
They I had it over to the deployment team, and they take the restraints
off, and allow it to go up to top gear, and it is compiled with full
optimisation.
That is insane development practice, if I understand you correctly. For
some kinds of development work, it can make sense to have one person (or
team) make prototypes or proofs-of-concept, and then have another person
(or team) use that as a guide, specification and test comparison when
writing a fast implementation for the real product. But the prototype
should be in a high-level language, written in the clearest and simplest
manner - not crappy code in a low-level language that works by luck when
it is not optimised!
Post by Malcolm McLean
And I don't actually have a computer with one of the most
important hardware targets, but it's all written in C++, a bit in C, and
none in assembler. So I can't profile it, and I have to rely on insight
into where the inner loop will be, and how to avoid expensive operations
in the inner loop.
If you are writing C++ and are not happy about using optimisation, you
are in the wrong job.
Post by Malcolm McLean
And hopefully those subroutines will be called for many years to come,
or hardware as yet un-designed.
With Baby X, I did have severe problems with the rendering speed on an
old Windows machine. But I haven;t noticed them now its runnng on the
Apple Mac. However as the name suggests, Baby X was first designed for X
lib. I only added Windows support later, and all the rgba buffers were
in the wrong format. But faster processors cover a multitude of sins, if
you keep things lean.
Malcolm McLean
2024-06-17 21:06:35 UTC
Permalink
Post by David Brown
Post by Malcolm McLean
Post by David Brown
Using C without optimisation is like driving a car but refusing to go
out of first gear.  You would probably have been better off with a
bicycle or driving a tank, according to the task at hand.
I drive C in first gear when I'm developing, which means that the car
is given instructions to go to the right place and obey all, the rules
of the road.
I do my C development with optimisations enabled, which means that the C
compiler will obey all the rules and requirements of C.  Optimisations
don't change the meaning of correct code - they only have an effect on
the results of your code if you have written incorrect code.  I don't
know about you, but my aim in development is to write /correct/ code. If
disabling optimisations helped in some way, it would be due to bugs and
luck.
Post by Malcolm McLean
But it never gets out of frst gear when I'm driving it. However
because of the nature of what we do, which is interactivce programming
mostly, usually "just noticeable" time is sufficient. It's a bit like
driving in London - a top of the range sports car is no better than a
beat up old mini, they travel at the same speed because of all the
interactions.
If I am writing PC code where the timing is determined by user
interaction, I would not be writing in C - it is almost certainly a poor
choice of language for the task.
Post by Malcolm McLean
They I had it over to the deployment team, and they take the
restraints off, and allow it to go up to top gear, and it is compiled
with full optimisation.
That is insane development practice, if I understand you correctly.  For
some kinds of development work, it can make sense to have one person (or
team) make prototypes or proofs-of-concept, and then have another person
(or team) use that as a guide, specification and test comparison when
writing a fast implementation for the real product.  But the prototype
should be in a high-level language, written in the clearest and simplest
manner - not crappy code in a low-level language that works by luck when
it is not optimised!
Post by Malcolm McLean
And I don't actually have a computer with one of the most important
hardware targets, but it's all written in C++, a bit in C, and none in
assembler. So I can't profile it, and I have to rely on insight into
where the inner loop will be, and how to avoid expensive operations in
the inner loop.
If you are writing C++ and are not happy about using optimisation, you
are in the wrong job.
You know what hardware your code will run on. I don't.
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
David Brown
2024-06-18 06:44:32 UTC
Permalink
Post by Malcolm McLean
Post by David Brown
Post by Malcolm McLean
Post by David Brown
Using C without optimisation is like driving a car but refusing to
go out of first gear.  You would probably have been better off with
a bicycle or driving a tank, according to the task at hand.
I drive C in first gear when I'm developing, which means that the car
is given instructions to go to the right place and obey all, the
rules of the road.
I do my C development with optimisations enabled, which means that the
C compiler will obey all the rules and requirements of C.
Optimisations don't change the meaning of correct code - they only
have an effect on the results of your code if you have written
incorrect code.  I don't know about you, but my aim in development is
to write /correct/ code. If disabling optimisations helped in some
way, it would be due to bugs and luck.
Post by Malcolm McLean
But it never gets out of frst gear when I'm driving it. However
because of the nature of what we do, which is interactivce
programming mostly, usually "just noticeable" time is sufficient.
It's a bit like driving in London - a top of the range sports car is
no better than a beat up old mini, they travel at the same speed
because of all the interactions.
If I am writing PC code where the timing is determined by user
interaction, I would not be writing in C - it is almost certainly a
poor choice of language for the task.
Post by Malcolm McLean
They I had it over to the deployment team, and they take the
restraints off, and allow it to go up to top gear, and it is compiled
with full optimisation.
That is insane development practice, if I understand you correctly.
For some kinds of development work, it can make sense to have one
person (or team) make prototypes or proofs-of-concept, and then have
another person (or team) use that as a guide, specification and test
comparison when writing a fast implementation for the real product.
But the prototype should be in a high-level language, written in the
clearest and simplest manner - not crappy code in a low-level language
that works by luck when it is not optimised!
Post by Malcolm McLean
And I don't actually have a computer with one of the most important
hardware targets, but it's all written in C++, a bit in C, and none
in assembler. So I can't profile it, and I have to rely on insight
into where the inner loop will be, and how to avoid expensive
operations in the inner loop.
If you are writing C++ and are not happy about using optimisation, you
are in the wrong job.
You know what hardware your code will run on. I don't.
That is absolutely true, and it gives me certain advantages. It is also
the case that high-quality optimisation is vital to my work.

But it is also absolutely irrelevant to the point I was making.
bart
2024-06-17 21:01:05 UTC
Permalink
Post by David Brown
Using C without optimisation is like driving a car but refusing to go
out of first gear.  You would probably have been better off with a
bicycle or driving a tank, according to the task at hand.
Which bit is the car: the compiler, or the program that it produces?

When I am developing, it is the compiler that is used more often. Or, if
I spend a lot of time on a particular build of an application, the speed
at which it runs is rarely critical, since during most testing, the
scale of the tasks is small.

So if the compiler is the car, then one like tcc goes at 60mph while gcc
goes at walking pace.
David Brown
2024-06-18 07:01:48 UTC
Permalink
Post by bart
Post by David Brown
Using C without optimisation is like driving a car but refusing to go
out of first gear.  You would probably have been better off with a
bicycle or driving a tank, according to the task at hand.
Which bit is the car: the compiler, or the program that it produces?
The compiler. It is the compiler you are pointlessly and
counter-productively limiting.

Clearly (at least to people not intentionally misinterpreting this) the
speed of the car is not analogous to the /speed/ of the compiler, but
its functionality.
Post by bart
When I am developing, it is the compiler that is used more often. Or, if
I spend a lot of time on a particular build of an application, the speed
at which it runs is rarely critical, since during most testing, the
scale of the tasks is small.
So if the compiler is the car, then one like tcc goes at 60mph while gcc
goes at walking pace.
If you spend most of your development time compiling, you are an
/extremely/ unusual developer.

I would expect that for most developers, the great majority of their
time is spent reading - reading their own code, reading other people's
code, reading documentation, API details, manuals, specifications,
notes, and everything else. The tool they spend most time with is their
IDE, along with whatever tools they use in testing and debugging and
whatever tools they use for documentation, and whatever collaboration
tools they use with colleagues (zoom, whiteboards, coffee machines,
etc.). Proportions will of course vary wildly.

I haven't measured the times for my own work, but at a vague guess I'd
suppose I have perhaps 5 to 30 seconds of build time per hour on average
during most development. Occasionally I'll have peaks where I am doing
small changes, rebuilds and testing in quick succession, but even there
the build times are very rarely a major time factor compared to testing
time. (And that's with perhaps 500 files of C, C++ and headers.)
James Kuyper
2024-06-17 23:52:16 UTC
Permalink
On 6/17/24 12:21, David Brown wrote:
...
Post by David Brown
Having a distinction in optimisation between "debug" and "release"
builds is simply /wrong/. Release what you have debugged, debug what
you intend to release.
I fully agree that you should debug what you intend to release; but I
can't agree that it always makes sense to release what you've debugged.
There are ways to debug code that make it horribly inefficient - they
are also good ways to uncover certain kinds of bugs. There should be a
debug mode where you enable that inefficient code, and track down and
remove any bugs that you find. Then you go to release mode, and test it
as thoroughly as possible with the code as it is intended to be
released, which is never possible to the same extent in debug mode. Do not
release until the final version of the code has passed both sets of
tests. If release testing uncovers a bug that requires a code change,
that means that debug testing also needs to be redone.
David Brown
2024-06-18 09:26:32 UTC
Permalink
Post by James Kuyper
...
Post by David Brown
Having a distinction in optimisation between "debug" and "release"
builds is simply /wrong/. Release what you have debugged, debug what
you intend to release.
I fully agree that you should debug what you intend to release; but I
can't agree that it always makes sense to release what you've debugged.
There are ways to debug code that make it horribly inefficient - they
are also good ways to uncover certain kinds of bugs.
Yes, as I said I sometimes change things while chasing particular bugs -
with sanitizers mentioned as an example. And I might compile a
particular file with low optimisation, or disable particular
optimisations to make debugging easier.

Most often, I do this with additions to the source code in question -
marking some functions as "noinline" (a gcc attribute) to make
breakpoints easier, or marking some variables as "volatile" to make it
easier to see them with a debugger, or simply adding some extra
printf's. You do what you need to do in order to find the bugs, and how
you do that depends on the bugs and the type of tools you use and the
kind of program you have. But I do not change the rest of the build.
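The kind of source-level tweak meant here might look like this (a sketch -
the function and variable names are made up, and the attribute syntax is
gcc/clang specific):

#include <stdio.h>

/* Keep this function out of line so a breakpoint set on it is always
   hit, even in an optimised build. */
__attribute__((noinline))
static int scale(int x)
{
    /* volatile stops the compiler from optimising the variable away,
       so it stays visible in the debugger. */
    volatile int intermediate = x * 3;

    return intermediate + 1;
}

int main(void)
{
    for (int i = 0; i < 4; i++)
        printf("%d\n", scale(i));
    return 0;
}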


Of course it is correct in a sense that you don't release what you
debug, because you debug code that is not working, and you don't release
it until it is fixed!

But you should not (IMHO) be using special debug modes for the main part
of your debugging and testing; you should be aiming to have as realistic
a scenario as you can for the code. While optimisation does not change
the effect of correct code (other than perhaps different choices for
unspecified behaviour), few programmers are perfect. Sometimes code
errors will, by luck, give the desired behaviour with no optimisation
but erroneous behaviour with high optimisation. I have no interest at
all in having my code pass its tests with -O0 and fail with -O2 - I want
to go straight to seeing the problem.
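A classic illustration (a sketch, not taken from this thread) of code that
"works" unoptimised but can fail when optimised, because it relies on
undefined behaviour - here, signed integer overflow:

#include <limits.h>
#include <stdio.h>

/* Relies on signed overflow wrapping, which is undefined behaviour.
   Unoptimised it usually "works"; with optimisation the compiler may
   assume x + 1 can never be less than x and delete the check. */
static int will_overflow(int x)
{
    return x + 1 < x;
}

int main(void)
{
    printf("%d\n", will_overflow(INT_MAX));
    return 0;
}

With gcc -O2 the compiler is entitled to assume that x + 1 is never less
than x for signed x, fold the comparison to 0, and the "overflow check"
silently disappears.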
Post by James Kuyper
There should be a
debug mode where you enable that inefficient code, and track down and
remove any bugs that you find. Then you go to release mode, and test it
as thoroughly as possible with the code as it is intended to be
released, which is never as much can be possible in debug mode. Do not
release until the final version of the code has passed both sets of
tests. If release testing uncovers a bug that requires a code change,
that means that debug testing also needs to be redone.
There are different development strategies appropriate for different
types of program, and different types of development teams. Splits
between development, debugging and testing can vary.


Perhaps my attitude here is unusual, and due to the type of work I do.
For me, a "project" consists of all my own source code, all the
libraries, headers, microcontroller SDK's, all the third-party source
code, the build process (normally a makefile and often a few scripts,
including all compiler flags), the toolchain, and the library.

The binary that goes into the product is thus entirely reproducible. I
don't deliver a collection of C files to the customer, I provide the
whole build system - and the code is debugged, tested and guaranteed
only with that toolchain and build settings. It is important that if
the customer finds a problem years later, I am working with /exactly/
the same binary as was delivered.

Of course I try to make the code as independent of the details of the
toolchain, libraries and SDK's as reasonably possible, and I will move
over to new versions if appropriate - but that means full re-testing,
re-qualification, and so on.
bart
2024-06-17 19:24:07 UTC
Permalink
Post by David Brown
Post by bart
Post by James Kuyper
Post by Michael S
On Thu, 13 Jun 2024 13:53:54 +0200
...
Post by Michael S
Post by David Brown
I know more than most C programmers about how certain C compilers
work, and what works well with them, and what is relevant for them -
though I certainly don't claim to know everything. Obviously Bart
knows vastly more about how /his/ compiler works. He also tends to
do testing with several small and odd C compilers, which can give
interesting results even though they are of little practical
relevance for real-world C development work.
Since he do compilers himself, he has much better feeling [that you
or me] of what is hard and what is easy, what is small and what is big,
what is fast and what is slow. That applies to all compilers except
those that are very unusual. "Major" compiler are not unusual at all.
The problem is that Bart's compiler is VERY unusual. It's customized for
his use, and he has lots of quirks in the way he thinks compilers should
work, which are very different from those of most other programmers.
In
particular, compilation speed is very important to him, while execution
speed is almost completely unimportant, which is pretty much the
opposite of the way most programmers prioritize those things.
Compilation speed is important to everyone. That's why so many tricks
are used to get around the lack of speed in a big compiler, or so many
extra resources are thrown at the problem.
What "tricks" ?
Post by bart
Runtime performance is important too, but at this level of language,
the difference between optimised and unoptimised code is narrow.
Unoptimised may be between 1x and 2x slower, typically.
That depends on the language, type of code, and target platform. Typical
C code on an x86_64 platform might be two or three times slower when
using a poorly optimising compiler.  After all, the designers of x86
cpus put a great deal of effort into making shitty code run fast. For
high-performance code written with care and requiring fast results, the
performance difference will be bigger.  For C++, especially code that
makes good use of abstractions, the difference can be very much bigger.
For C code on an embedded ARM device or other microcontroller, it's not
unusual to see a 5x speed improvement on optimised code.
Speed is not the only good reason for picking C as the language for a
task, but it is often a relevant factor.  And if it is a factor, then
you will usually prefer faster speeds.
Post by bart
Perhaps slower on benchmarks, or code written in C++ style that
generates lots of redundances that relies on optimisation to make it fast.
But, during developement, you probably wouldn't use optimisation anyway.
I virtually always have optimisation enabled during development.  I
might, when trying to chase down a specific bug, reduce some specific
optimisations, but I have never seen the point of crippling a
development tool when doing development work - it makes no sense at all.
Post by bart
In that case, you're still suffering slow build times with a big
compiler, but you don't get any faster code at the end of it.
I sometimes suggest to people to use Tiny C most of the time, and run
gcc from time to time for extra analysis and extra checks, and use
gcc-O3 for production builds.
I cannot imagine any situation where I would think that might be a good
idea.
But then, I see development tools as tools to help my work as a
developer, while you seem to consider tools (other than your own) as
objects of hatred to be avoided whenever possible or dismissed as
"tricks".  I don't expect we will ever agree there.
Here's one use-case of a C compiler, to process the output of my
whole-program non-C compiler. The 'mc' transpiler first converts to C
then invokes a C compiler according to options:

C:\qx52>tm mc -mcc qc
W:Invoking C compiler: mcc -out:qc.exe qc.c
Compiling qc.c to qc.exe
TM: 0.31

C:\qx52>tm mc -tcc qc
W:Invoking C compiler: tcc -oqc.exe qc.c
c:\windows\system32\user32.dll -luser32 c:\windows\system32\kernel32.dll
-fdollars-in-identifiers
TM: 0.27

C:\qx52>tm mc -gcc qc
W:Invoking C compiler: gcc -m64 -oqc.exe qc.c -s
TM: 2.44

C:\qx52>tm mc -gcc -opt qc
W:Invoking C compiler: gcc -m64 -O3 -oqc.exe qc.c -s
TM: 14.47

The actual translation to C takes 0.1 seconds, so tcc is 13 times faster
at producing an executable than gcc -O0, and about 80 times faster than
gcc -O3 (67 times faster than -O2).

If you don't need optimised code right now, why would you invoke gcc
rather than tcc? It's a no-brainer.

My C compiler is in there as well, and it's also quite fast, but if I
can run it, it means I'm running on Windows and can also directly use my
main compiler, which completes the job in 0.1 seconds:

C:\qx52>tm mm qc
TM: 0.09

But if running on Linux for example, I need to use intermediate C, and
tcc makes more sense as a default than gcc.

With tcc, I also only need two files totalling 230KB (I don't need std
headers); I can bundle it with the compiler.
David Brown
2024-06-18 12:40:27 UTC
Permalink
Post by bart
Post by Michael S
On Thu, 13 Jun 2024 13:53:54 +0200
...
If you don't need optimised code right now, why would you invoke gcc
rather than tcc? It's a no-brainer.
You might use tcc if you have no brain. People who do C development
seriously don't use the compiler just to generate an exe file. gcc is a
development tool, not just a compiler. (As is clang, and MSVC.) If you
think compilation speed of a subset of C is all that matters, you are
not doing C development.
bart
2024-06-18 13:55:57 UTC
Permalink
Post by bart
Post by Michael S
On Thu, 13 Jun 2024 13:53:54 +0200
...
If you don't need optimised code right now, why would you invoke gcc
rather than tcc? It's a no-brainer.
You might use tcc if you have no brain.  People who do C development
seriously don't use the compiler just to generate an exe file.  gcc is a
development tool, not just a compiler.  (As is clang, and MSVC.)  If you
think compilation speed of a subset of C is all that matters, you are
not doing C development.
It's all that mattered in the context that you snipped. Which actually
/isn't/ C development; the C has been generated by a program. This is
hardly an uncommon use of a C compiler; there, all you want of it is (1)
to generate executable code, and (2) perhaps make it generate fast code.

There are a number of use-cases where the extra capabilities of C aren't
relevant.

If I do this:

gcc prog.c
del a.exe

where 'del a.exe' is done by mistake (or perhaps I do 'gcc prog2.c'
which wipes out a.exe; I like super-smart compilers!), then I have to do
'gcc prog.c' again.

But it has already been analysed, and nothing has changed; I just want a
translation from .c file to .exe file.
James Kuyper
2024-06-18 13:39:10 UTC
Permalink
On 17/06/2024 21:24, bart wrote:
...
Post by bart
If you don't need optimised code right now, why would you invoke gcc
rather than tcc? It's a no-brainer.
On virtually every occasion when I've heard someone claim that a given
decision is a no-brainer, I would generally make a different decision if
I actually applied my brain to the issue. This is no exception.
bart
2024-06-18 13:58:30 UTC
Permalink
Post by James Kuyper
...
Post by bart
If you don't need optimised code right now, why would you invoke gcc
rather than tcc? It's a no-brainer.
On virtually every occasion when I've heard someone claim that a given
decision is a no-brainer, I would generally make a different decision if
I actually applied my brain to the issue. This is no exception.
So your brain would tell you to choose a tool which takes at least 10
times as long to do the same task?

OK.
Scott Lurndal
2024-06-18 14:33:34 UTC
Permalink
Post by bart
Post by James Kuyper
...
Post by bart
If you don't need optimised code right now, why would you invoke gcc
rather than tcc? It's a no-brainer.
On virtually every occasion when I've heard someone claim that a given
decision is a no-brainer, I would generally make a different decision if
I actually applied my brain to the issue. This is no exception.
So your brain would tell you to choose a tool which takes at least 10
times as long to do the same task?
That's a ridiculous characterization. Why on earth would I compile
with tcc when it generates completely different (and very poorly
performing) code than the production compiler that I would
use for the version shipped to customers?

The difference in compile time for the vast majority of source
files being compiled with those two compilers is in the noise.
James Kuyper
2024-06-18 17:02:50 UTC
Permalink
Post by bart
Post by James Kuyper
...
Post by bart
If you don't need optimised code right now, why would you invoke gcc
rather than tcc? It's a no-brainer.
On virtually every occasion when I've heard someone claim that a given
decision is a no-brainer, I would generally make a different decision if
I actually applied my brain to the issue. This is no exception.
So your brain would tell you to choose a tool which takes at least 10
times as long to do the same task?
No, "the task" isn't "compile a program", it's "develop a program",
which includes only a quite negligible amount of time spent compiling it.
What I know about TCC is relatively limited, but the Wikipedia article
is consistent with what I thought I knew. It says that tcc supports all
of the features of C90, most of C99, and some gnu extensions. That is
not the dialect of C I want to write in. I want full conformance with
the latest official version of C, with any unintentional use of gnu
extensions flagged with a diagnostic.
Having to write my code in a crippled version of C would be a waste of
my time, and having to fix it to take advantage of the features of a
more modern version of C when I'm ready to optimize it would be a
further waste of time. I'd save far more development time by writing in
the same dialect of C from the very beginning, than I could ever
possibly save by dividing entirely negligible compile times by a factor
of 10.
bart
2024-06-18 18:06:48 UTC
Permalink
Post by James Kuyper
Post by bart
Post by James Kuyper
...
Post by bart
If you don't need optimised code right now, why would you invoke gcc
rather than tcc? It's a no-brainer.
On virtually every occasion when I've heard someone claim that a given
decision is a no-brainer, I would generally make a different decision if
I actually applied my brain to the issue. This is no exception.
So your brain would tell you to choose a tool which takes at least 10
times as long to do the same task?
No, "the task" isn't "compile a program", it's "develop a program",
which includes only a quite negligible amount of time spent compiling it.
What I know about TCC is relatively limited, but the Wikipedia article
is consistent with what I though I knew. It says that tcc supports all
of the features of C90, most of C99, and some gnu extensions. That is
not the dialect of C I want to write in. I want full conformance with
the latest official version of C, with any unintentional use of gnu
extensions flagged with a diagnostic.
Having to write my code in a crippled version of C would be a waste of
my time, and having to fix it to take advantage of the features of a
more modern version of C when I'm ready to optimize it would be a
further waste of time. I'd save far more development time by writing in
the same dialect of C from the very beginning, then I could ever
possibly save by dividing entirely negligible compile times by a factor
of 10.
No, the task in my examples was to turn the validated C generated by a
program into runnable binary.

The C can be generated very quickly; then invoking gcc, even using -O0,
would be like hitting a brick wall. With Tiny C, the whole process is
more fluent.

My generated C tends to be very conservative. The most controversial
feature it has is the use of "$" in identifiers, which Tcc for some
reason doesn't support by default unless you enable it with a
long-winded option (see the examples I showed).

This use of C is fairly common among programming language implementations,
where you don't need a lot of fancy analysis, since that has already been done by
the front end compiler. And it doesn't really need extra features
either; that's also taken care of.
Tim Rentsch
2024-06-18 18:07:28 UTC
Permalink
Post by bart
Post by James Kuyper
...
Post by bart
If you don't need optimised code right now, why would you invoke gcc
rather than tcc? It's a no-brainer.
On virtually every occasion when I've heard someone claim that a given
decision is a no-brainer, I would generally make a different decision if
I actually applied my brain to the issue. This is no exception.
So your brain would tell you to choose a tool which takes at least 10
times as long to do the same task?
"When two people do the same thing, it's not exactly the same."

- the ancient playwright Terence
bart
2024-06-18 18:22:46 UTC
Permalink
Post by David Brown
Post by bart
Post by James Kuyper
Post by Michael S
On Thu, 13 Jun 2024 13:53:54 +0200
...
Post by Michael S
Post by David Brown
I know more than most C programmers about how certain C compilers
work, and what works well with them, and what is relevant for them -
though I certainly don't claim to know everything. Obviously Bart
knows vastly more about how /his/ compiler works. He also tends to
do testing with several small and odd C compilers, which can give
interesting results even though they are of little practical
relevance for real-world C development work.
Since he do compilers himself, he has much better feeling [that you
or me] of what is hard and what is easy, what is small and what is big,
what is fast and what is slow. That applies to all compilers except
those that are very unusual. "Major" compiler are not unusual at all.
The problem is that Bart's compiler is VERY unusual. It's customized for
his use, and he has lots of quirks in the way he thinks compilers should
work, which are very different from those of most other programmers.
In
particular, compilation speed is very important to him, while execution
speed is almost completely unimportant, which is pretty much the
opposite of the way most programmers prioritize those things.
Compilation speed is important to everyone. That's why so many tricks
are used to get around the lack of speed in a big compiler, or so many
extra resources are thrown at the problem.
What "tricks" ?
Going to considerable lengths to avoid actually doing any compilation,
or to somehow cache previous results (I mean things like .pch files
rather than .o files).

Have a look at any makefile.

If compilation was instant, half the reasons for a makefile and its
dependency graphs would disappear.

For the scale of programs I write, with the tools I use, compilation
*is* more or less instant.

(Roughly 0.1 seconds; faster than it takes to press and release the
Enter key, for my main compiler. My C compiler takes a bit longer, as it
has been accelerated, but it tends to be used for smaller projects if it
is something I've written.)
Post by David Brown
That depends on the language, type of code, and target platform. Typical
C code on an x86_64 platform might be two or three times slower when
using a poorly optimising compiler.  After all, the designers of x86
cpus put a great deal of effort into making shitty code run fast.
Yes, that's one reason why you can get away without an optimiser, for
sensibly written source code. But it also makes reasoning about optimal
code much harder: removing superfluous instructions often makes code slower!
Tim Rentsch
2024-06-18 23:34:59 UTC
Permalink
Post by bart
If compilation was instant, half the reasons for a makefile and its
dependency graphs would disappear.
Even if that conclusion were right, it's irrelevant, because the
premise is false. Furthermore as compilers get faster there is
an irresistible force to add diagnostic tools to simplify and
improve the reliability of program development, and that will
easily soak up as many cycles as are available.
David Brown
2024-06-19 08:05:53 UTC
Permalink
Post by bart
Post by David Brown
Post by bart
Post by James Kuyper
Post by Michael S
On Thu, 13 Jun 2024 13:53:54 +0200
...
Post by Michael S
Post by David Brown
I know more than most C programmers about how certain C compilers
work, and what works well with them, and what is relevant for them -
though I certainly don't claim to know everything. Obviously Bart
knows vastly more about how /his/ compiler works. He also tends to
do testing with several small and odd C compilers, which can give
interesting results even though they are of little practical
relevance for real-world C development work.
Since he does compilers himself, he has a much better feeling [than you
or me] of what is hard and what is easy, what is small and what is big,
what is fast and what is slow. That applies to all compilers except
those that are very unusual. "Major" compilers are not unusual at all.
The problem is that Bart's compiler is VERY unusual. It's customized for
his use, and he has lots of quirks in the way he thinks compilers should
work, which are very different from those of most other programmers. In
particular, compilation speed is very important to him, while execution
speed is almost completely unimportant, which is pretty much the
opposite of the way most programmers prioritize those things.
Compilation speed is important to everyone. That's why so many tricks
are used to get around the lack of speed in a big compiler, or so
many extra resources are thrown at the problem.
What "tricks" ?
Going to considerable lengths to avoid actually doing any compilation,
or to somehow cache previous results (I mean things like .pch files
rather than .o files).
Have a look at any makefile.
As I suspected, your idea of "tricks" is mostly what other people call
useful or essential tools.

I would use makefiles even if compilation was instant. I /do/ use
makefiles even when compilation is near instant. I use them even if
every run requires a full rebuild of everything. I use them for all
kinds of tasks other than compiling C - I first started using them for
cross-assembly builds on DOS.

The point of a makefile (or other build system) is twofold:

1. Get consistent results, with minimal risk of sometimes getting the
build process wrong.

2. Save time and effort for the developer.

It takes a special kind of dedication and stubborn, wilful ignorance to
fail to see the benefit of build tools. (make is not the only option
available.)
Post by bart
If compilation was instant, half the reasons for a makefile and its
dependency graphs would disappear.
My makefiles would be simpler if compilation were instant, but they
would be equally essential to my work.
Post by bart
For the scale of programs I write, with the tools I use, compilation
*is* more or less instant.
Some of us write serious code, use serious tools, and use them in
serious ways.
Post by bart
(Roughly 0.1 seconds; faster than it takes to press and release the
Enter key, for my main compiler. My C compiler takes a bit longer, as it
hasn't been accelerated, but it tends to be used for smaller projects if it
is something I've written.)
Post by David Brown
That depends on the language, type of code, and target platform.
Typical C code on an x86_64 platform might be two or three times
slower when using a poorly optimising compiler.  After all, the
designers of x86 cpus put a great deal of effort into making shitty
code run fast.
Yes, that's one reason why you can get away without an optimiser, for
sensibly written source code. But it also makes reasoning about optimal
code much harder: removing superfluous instructions often makes code slower!
No, removing superfluous instructions very rarely makes code slower.
But I agree that it is hard to figure out optimal code sequences on
modern cpus, especially x86_64 devices, and even more so when you want
fast results on a range of such processors. Writing a good optimiser is
a very difficult task. But when compilers with good optimisers exist,
/using/ them is not at all hard for getting reasonable results.
(Squeezing out the last few percent /is/ hard.)
bart
2024-06-19 10:52:04 UTC
Permalink
Post by David Brown
Post by bart
Post by David Brown
Post by bart
Post by James Kuyper
Post by Michael S
On Thu, 13 Jun 2024 13:53:54 +0200
...
Post by Michael S
Post by David Brown
I know more than most C programmers about how certain C compilers
work, and what works well with them, and what is relevant for them -
though I certainly don't claim to know everything. Obviously Bart
knows vastly more about how /his/ compiler works. He also tends to
do testing with several small and odd C compilers, which can give
interesting results even though they are of little practical
relevance for real-world C development work.
Since he does compilers himself, he has a much better feeling [than you
or me] of what is hard and what is easy, what is small and what is big,
what is fast and what is slow. That applies to all compilers except
those that are very unusual. "Major" compilers are not unusual at all.
The problem is that Bart's compiler is VERY unusual. It's customized
for his use, and he has lots of quirks in the way he thinks compilers
should work, which are very different from those of most other
programmers. In particular, compilation speed is very important to him,
while execution
speed is almost completely unimportant, which is pretty much the
opposite of the way most programmers prioritize those things.
Compilation speed is important to everyone. That's why so many
tricks are used to get around the lack of speed in a big compiler,
or so many extra resources are thrown at the problem.
What "tricks" ?
Going to considerable lengths to avoid actually doing any compilation,
or to somehow cache previous results (I mean things like .pch files
rather than .o files).
Have a look at any makefile.
As I suspected, your idea of "tricks" is mostly what other people call
useful or essential tools.
I would use makefiles even if compilation was instant.  I /do/ use
makefiles even when compilation is near instant.  I use them even if
every run requires a full rebuild of everything.  I use them for all
kinds of tasks other than compiling C - I first started using them for
cross-assembly builds on DOS.
1. Get consistent results, with minimal risk of sometimes getting the
build process wrong.
2. Save time and effort for the developer.
It takes a special kind of dedication and stubborn, wilful ignorance to
fail to see the benefit of build tools.  (make is not the only option
available.)
Post by bart
If compilation was instant, half the reasons for a makefile and its
dependency graphs would disappear.
My makefiles would be simpler if compilation were instant, but they
would be equally essential to my work.
Post by bart
For the scale of programs I write, with the tools I use, compilation
*is* more or less instant.
Some of us write serious code, use serious tools, and use them in
serious ways.
I understand. You can't take a product seriously unless it's big, and
it's slow, and it's got lots of shiny buttons!

My company had the same problem once: I had a product that fitted onto
one floppy disk, which seemed insubstantial. So it was supplied on a CD
instead to make it seem bigger than it was (a CD had a capacity 500
times greater than a floppy).

However, I can paste here the result of running a C program. Could you
tell whether it was built with a 0.2MB compiler or a 0.2GB one? Could
you tell whether it was built in 0.1 seconds or if it took a minute?

Could you tell whether execution took 5 seconds or 10 seconds?
David Brown
2024-06-19 17:53:55 UTC
Permalink
Post by bart
Post by David Brown
Some of us write serious code, use serious tools, and use them in
serious ways.
I understand.
No. You don't understand. No doubt you never will, because you have
spend such a lot of time and effort to be sure that you will never
understand.

I'm glad that you are happy with the tools you use - and let's leave it
there.

Malcolm McLean
2024-06-17 13:09:30 UTC
Permalink
Post by James Kuyper
Post by Michael S
On Thu, 13 Jun 2024 13:53:54 +0200
...
Post by Michael S
Post by David Brown
I know more than most C programmers about how certain C compilers
work, and what works well with them, and what is relevant for them -
though I certainly don't claim to know everything. Obviously Bart
knows vastly more about how /his/ compiler works. He also tends to
do testing with several small and odd C compilers, which can give
interesting results even though they are of little practical
relevance for real-world C development work.
Since he does compilers himself, he has a much better feeling [than you
or me] of what is hard and what is easy, what is small and what is big,
what is fast and what is slow. That applies to all compilers except
those that are very unusual. "Major" compilers are not unusual at all.
The problem is that Bart's compiler is VERY unusual. It's customized for
his use, and he has lots of quirks in the way he thinks compilers should
work, which are very different from those of most other programmers. In
particular, compilation speed is very important to him, while execution
speed is almost completely unimportant, which is pretty much the
opposite of the way most programmers prioritize those things.
Yes, but that's probably what you want. As a one man band, bart can't
bear Aple and Microsoft in priiducing a compiler which creates highly
optimised code that executes quickly. And that's what the vast majority
of customers want.
But say that 0.1% of customers are more interested in compilation speed.
Now, Apple and Microsoft might not even bother catering to, what is to
them, just a tiny market and a disraction for the development team. So
bart can plausibly produce a compiler which does compile code correctly,
and much faster than the big boys. And there are about 28 million pepole
in the world who derive thetr living as computer programmers. 0.1% of
that is 28,000, Charge 10 dollars each, and that's a nice little
business for one person.
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
David Brown
2024-06-17 16:38:40 UTC
Permalink
Post by Malcolm McLean
Post by James Kuyper
Post by Michael S
On Thu, 13 Jun 2024 13:53:54 +0200
...
Post by Michael S
Post by David Brown
I know more than most C programmers about how certain C compilers
work, and what works well with them, and what is relevant for them -
though I certainly don't claim to know everything. Obviously Bart
knows vastly more about how /his/ compiler works. He also tends to
do testing with several small and odd C compilers, which can give
interesting results even though they are of little practical
relevance for real-world C development work.
Since he does compilers himself, he has a much better feeling [than you
or me] of what is hard and what is easy, what is small and what is big,
what is fast and what is slow. That applies to all compilers except
those that are very unusual. "Major" compilers are not unusual at all.
The problem is that Bart's compiler is VERY unusual. It's customized for
his use, and he has lots of quirks in the way he thinks compilers should
work, which are very different from those of most other programmers. In
particular, compilation speed is very important to him, while execution
speed is almost completely unimportant, which is pretty much the
opposite of the way most programmers prioritize those things.
Yes, but that's probably what you want.
Who is "you" here? Possibly "you" is Bart, but it is certainly not /me/.
Post by Malcolm McLean
As a one man band, bart can't
bear Aple and Microsoft in priiducing a compiler which creates highly
optimised code that executes quickly. And that's what the vast majority
of customers want.
I believe I can figure out the words you used, despite the spelling
mistakes, but I can't figure out what you are trying to say. One man
band developers generally want the best tools they can get hold of,
within the limits of their budgets - home-made tools can be part of
that, but not for something like a C compiler.
Post by Malcolm McLean
But say that 0.1% of customers are more interested in compilation speed.
Now, Apple and Microsoft might not even bother catering to, what is to
them, just a tiny market and a disraction for the development team. So
bart can plausibly produce a compiler which does compile code correctly,
and much faster than the big boys. And there are about 28 million pepole
in the world who derive thetr living as computer programmers. 0.1% of
that is 28,000, Charge 10 dollars each, and that's a nice little
business for one person.
Your connection with reality is tenuous at best.

People /do/ like faster compilation speed, though it is rarely a problem
in practice for C. But no one who has used a real C compiler would want
to step down to Bart's tool just to shave a second off their build times.

It is quite believable that some people will find big tools intimidating
and want something that they view as smaller and simpler, but not for
compiler speed. And that market is already saturated by things like
lcc-win and tcc. (These are, unlike Bart's tool, compilers that make a
significant effort to be correct for standard C. Bart's compiler is
made for his own use only, and is only likely to be correct for the
subset of C that he wants to use. There's absolutely nothing wrong with
that, but his tool is far from being ready to sell to others as a C
compiler.)
James Kuyper
2024-06-18 07:28:17 UTC
Permalink
...
Post by Malcolm McLean
Post by James Kuyper
The problem is that Bart's compiler is VERY unusual. It's customized for
his use, and he has lots of quirks in the way he thinks compilers should
work, which are very different from those of most other programmers. In
particular, compilation speed is very important to him, while execution
speed is almost completely unimportant, which is pretty much the
opposite of the way most programmers prioritize those things.
Yes, but that's probably what you want.
Who is "you" here? Possibly "you" is Bart, but it is certainly not /me/.
In a response to a message by me, his "you" is most plausibly me - but
that certainly isn't true of me, either.
Kenny McCormack
2024-06-12 09:40:02 UTC
Permalink
In article <v4bcbj$1gqlo$***@raubtier-asyl.eternal-september.org>,
Bonita Montero <***@gmail.com> wrote:
...
Post by Bonita Montero
I converted my code into something that produces a C string as output.
Printing that is still very fast, i.e. the files produced are written
at about 2.6GiB/s. But the problem remains that none of the compilers
will parse large files; they quit with an out-of-memory error. So having
a .obj output along with a small header file would be the best.
#include <iostream>
#include <fstream>
#include <charconv>
#include <span>
#include <vector>
Do you know what newsgroup this is?
--
Faced with the choice between changing one's mind and proving that there is
no need to do so, almost everyone gets busy on the proof.

- John Kenneth Galbraith -
Bonita Montero
2024-06-12 10:59:05 UTC
Permalink
Post by Kenny McCormack
...
Post by Bonita Montero
I converted my code into something that produces a C string as output.
Printing that is still very fast, i.e. the files produced are written
at about 2.6GiB/s. But the problem remains that none of the compilers
will parse large files; they quit with an out-of-memory error. So having
a .obj output along with a small header file would be the best.
#include <iostream>
#include <fstream>
#include <charconv>
#include <span>
#include <vector>
Do you know what newsgroup this is?
The output of this tool is a C-file.
bart
2024-06-11 17:09:41 UTC
Permalink
Post by Bonita Montero
Post by Malcolm McLean
I've finally got Baby X (not the resource compiler, the Windows
toolkit) to link X11 on my Mac. And I can start work on it again. But
it was far from easyt to get it to link.
Can friendly people plesse dowload it and see if it compiles on other
platforms?
For large files it would be more convenient to have an .obj-output in
the proper format for Windows or Linux. I implemented a binary file to
char-array compiler myself, and for large files the compilation time was
totally intolerable; all the compilers I tested (g++, clang++, MSVC)
ran into out-of-memory conditions sooner or later, depending on the
size of the char array.
A char array initialised a byte at a time?

That is going to be inefficient.

Instead of a large array like {65, 65, 66, ...}, try generating a
single string like:

"\101\102\103..."
"\x41\x42\x43..."

Or maybe a number of shorter strings with a few dozen values per line.

Each string should be represented internally by the compiler as a single
object occupying one byte per element, instead of dozens or even
hundreds of bytes per element.
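
A minimal sketch of such a generator, using made-up names (bin2str,
blob_data, blob_len); every byte is emitted as a three-digit octal
escape, so adjacent escapes can never run together, and since the output
is a string literal the compiled array is one byte longer than blob_len
(for the trailing '\0'):

#include <stdio.h>

int main(int argc, char **argv)
{
    FILE *f;
    int c, col = 0;
    long len = 0;

    if (argc < 2) {
        fprintf(stderr, "usage: bin2str file\n");
        return 1;
    }
    f = fopen(argv[1], "rb");
    if (!f) {
        perror(argv[1]);
        return 1;
    }

    printf("unsigned char blob_data[] =\n\"");
    while ((c = getc(f)) != EOF) {
        printf("\\%03o", c);      /* e.g. \101 for 'A' */
        len++;
        if (++col == 24) {        /* keep the generated source lines short */
            printf("\"\n\"");
            col = 0;
        }
    }
    printf("\";\n");
    printf("unsigned long blob_len = %ldUL;\n", len);
    fclose(f);
    return 0;
}

The generated file then compiles like any ordinary translation unit, and
other code can reference the data with 'extern unsigned char
blob_data[];' and 'extern unsigned long blob_len;'.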
Bonita Montero
2024-06-11 18:22:34 UTC
Permalink
Post by bart
A char array initialised a byte at a time?
That is going to be inefficient.
xxd does it that way.
bart
2024-06-11 19:31:56 UTC
Permalink
Post by Bonita Montero
Post by bart
A char array initialised a byte at a time?
That is going to be inefficient.
xxd does it that way.
Which option is that?

All recent discussions of xxd have used the '-i' option which writes out
the data as individual hex bytes such as '0x41,'.
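
For reference, 'xxd -i foo.bin' on a three-byte file emits C of roughly
this shape, with the foo_bin names derived from the input filename:

unsigned char foo_bin[] = {
  0x41, 0x42, 0x43
};
unsigned int foo_bin_len = 3;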

The most compact option appears to be '-ps', but that is not a C data
format.

Or do you mean that xxd does a byte at a time, and so your version did
the same?

In that case don't be afraid to do your own thing if it is better.

I've just done a test. First writing a 5MB binary as 5 million
individual bytes, one per line. Compiling that with gcc took 15 seconds.

Then I wrote it as a single string full of hex codes as I suggested.

Now compilation took 1.3 seconds.

Using Tiny C, the compile time dropped from 1.75 seconds to under 0.3
seconds. That is roughly 12 times faster with gcc and 6 times faster
with Tiny C; string data is always faster to compile.
Kalevi Kolttonen
2024-06-11 18:21:03 UTC
Permalink
Post by Malcolm McLean
Can friendly people plesse dowload it and see if it compiles on other
platforms?
Fully up-to-date Fedora 40 with the following GCC:

***@lappari ~$ rpm -qi gcc|head -4
Name : gcc
Version : 14.1.1
Release : 4.fc40
Architecture: x86_64

The build fails:

[ 85%] Building C object CMakeFiles/babyxfs_shell.dir/babyxfs_src/shell/bbx_fs_shell.c.o
/home/kalevi/tmp/babyx/babyxrc/babyxfs_src/shell/bbx_fs_shell.c: In function ‘cp’:
/home/kalevi/tmp/babyx/babyxrc/babyxfs_src/shell/bbx_fs_shell.c:503:9: error: ‘return’ with no value, in function returning non-void [-Wreturn-mismatch]
503 | return;
| ^~~~~~
/home/kalevi/tmp/babyx/babyxrc/babyxfs_src/shell/bbx_fs_shell.c:481:12: note: declared here
481 | static int cp(BBX_FS_SHELL *shell, int argc, char **argv)
| ^~
/home/kalevi/tmp/babyx/babyxrc/babyxfs_src/shell/bbx_fs_shell.c: In function ‘bbx_fs_system’:
/home/kalevi/tmp/babyx/babyxrc/babyxfs_src/shell/bbx_fs_shell.c:656:9: warning: ‘strncat’ specified bound 1024 equals destination size [-Wstringop-overflow=]
656 | strncat(line, " ", 1024);
| ^~~~~~~~~~~~~~~~~~~~~~~~
make[2]: *** [CMakeFiles/babyxfs_shell.dir/build.make:160: CMakeFiles/babyxfs_shell.dir/babyxfs_src/shell/bbx_fs_shell.c.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:341: CMakeFiles/babyxfs_shell.dir/all] Error 2
make: *** [Makefile:91: all] Error 2


After fixing line 503 to be "return 0;", the build completed
and produced the executables.

But you should also address the -Wstringop-overflow warning.

I also got a warning about tmpnam() being dangerous and
a suggestion to use mkstemp() instead.
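
For what it's worth, a sketch of the usual shape of fixes for those two
diagnostics; this is illustrative only, not the actual bbx_fs_shell.c
code, and the 1024-byte buffer and the temp-file template are assumed
here for the example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char line[1024] = "cp";

    /* -Wstringop-overflow: the bound given to strncat should be the
       space remaining in the destination, not the total buffer size. */
    strncat(line, " ", sizeof line - strlen(line) - 1);

    /* tmpnam() is racy; mkstemp() creates and opens the file atomically.
       The template path below is a made-up example. */
    char tmpl[] = "/tmp/babyxfs_XXXXXX";
    int fd = mkstemp(tmpl);
    if (fd == -1) {
        perror("mkstemp");
        return 1;
    }
    close(fd);
    remove(tmpl);
    return 0;
}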

br,
KK