Discussion:
Command line globber/tokenizer library for C?
Ted Nolan <tednolan>
2024-09-10 19:01:37 UTC
I have the case where my C program is handed a string which is basically
a command line.

Is there a common open source C library for tokenizing and globbing
this into an argc/argv as a shell would do? I've googled, but I get
too much C++ & other language stuff.

Note that I'm not asking for getopt(), that comes afterwards, and
I'm not asking for any variable interpolation, but just that a string
like, say

hello -world "This is foo.*" foo.*

becomes something like

my_argv[0] "hello"
my_argv[1] "-world"
my_argv[2] "This is foo.*"
my_argv[3] foo.h
my_argv[4] foo.c
my_argv[5] foo.txt

my_argc = 6

I could live without the globbing if that's a bridge too far.
--
columbiaclosings.com
What's not in Columbia anymore..
Lawrence D'Oliveiro
2024-09-10 20:58:36 UTC
Post by Ted Nolan <tednolan>
I have the case where my C program is handed a string which is basically
a command line.
If that’s what your OS is giving you, your OS is doing it wrong.
Keith Thompson
2024-09-10 21:12:50 UTC
Post by Lawrence D'Oliveiro
Post by Ted Nolan <tednolan>
I have the case where my C program is handed a string which is basically
a command line.
If that’s what your OS is giving you, your OS is doing it wrong.
He didn't say the string is coming from the OS.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Janis Papanagnou
2024-09-10 21:05:32 UTC
Post by Ted Nolan <tednolan>
I have the case where my C program is handed a string which is basically
a command line.
IIUC you don't want the shell to do the expansion but, sort of,
re-invent the wheel in your application (a'la DOS). - Okay.
Post by Ted Nolan <tednolan>
Is there a common open source C library for tokenizing and globbing
this into an argc/argv as a shell would do? I've googled, but I get
too much C++ & other language stuff.
I also suppose that by "tokenizing" you don't mean something like
strtok (3) - extract tokens from strings
but a field separation as the Unix shell does using 'IFS'.

I don't know of a C library but if I'd want to implement a function
that all POSIX shells do then I'd look into the shell packages...

For Kornshell (e.g. version 93u+m) I see these files in the package
src/lib/libast/include/glob.h
src/lib/libast/misc/glob.c
that obviously care about the globbing function. (I suspect you'll
need some more supporting files from the ksh package.)

HTH

Janis
Post by Ted Nolan <tednolan>
Note that I'm not asking for getopt(), that comes afterwards, and
I'm not asking for any variable interpolation, but just that a string
like, say
hello -world "This is foo.*" foo.*
becomes something like
my_argv[0] "hello"
my_argv[1] "-world"
my_argv[2] "This is foo.*"
my_argv[3] foo.h
my_argv[4] foo.c
my_argv[5] foo.txt
my_argc = 6
I could live without the globbing if that's a bridge too far.
Ted Nolan <tednolan>
2024-09-10 22:11:29 UTC
Post by Janis Papanagnou
Post by Ted Nolan <tednolan>
I have the case where my C program is handed a string which is basically
a command line.
IIUC you don't want the shell to do the expansion but, sort of,
re-invent the wheel in your application (a'la DOS). - Okay.
Post by Ted Nolan <tednolan>
Is there a common open source C library for tokenizing and globbing
this into an argc/argv as a shell would do? I've googled, but I get
too much C++ & other language stuff.
I also suppose that by "tokenizing" you don't mean something like
strtok (3) - extract tokens from strings
but a field separation as the Unix shell does using 'IFS'.
More or less, and something that understands double and single quoting
so that a token can have white space inside. Backslash handling
would be nice too so

'Who\'s a good boy?'

would work as one token.
Post by Janis Papanagnou
I don't know of a C library but if I'd want to implement a function
that all POSIX shells do then I'd look into the shell packages...
For Kornshell (e.g. version 93u+m) I see these files in the package
src/lib/libast/include/glob.h
src/lib/libast/misc/glob.c
that obviously care about the globbing function. (I suspect you'll
need some more supporting files from the ksh package.)
HTH
Janis
Thanks, fixing up something out of shell components is probably more
than I want to take on here though.

Ted
--
columbiaclosings.com
What's not in Columbia anymore..
Keith Thompson
2024-09-10 21:37:15 UTC
Post by Ted Nolan <tednolan>
I have the case where my C program is handed a string which is basically
a command line.
Is there a common open source C library for tokenizing and globbing
this into an argc/argv as a shell would do? I've googled, but I get
too much C++ & other language stuff.
Note that I'm not asking for getopt(), that comes afterwards, and
I'm not asking for any variable interpolation, but just that a string
like, say
hello -world "This is foo.*" foo.*
becomes something like
my_argv[0] "hello"
my_argv[1] "-world"
my_argv[2] "This is foo.*"
my_argv[3] foo.h
my_argv[4] foo.c
my_argv[5] foo.txt
my_argc = 6
I could live without the globbing if that's a bridge too far.
What environment(s) does this need to run in?

I don't know of a standard(ish) function that does this. POSIX defines
the glob() function, but it only does globbing, not word-splitting.

If you're trying to emulate the way the shell (which one?) parses
command lines, and *if* you're on a system that has a shell, you can
invoke a shell to do the work for you. Here's a quick and dirty
example:

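#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(void) {
    const char *line = "hello -world \"This is foo.*\" foo.*";
    /* 50 bytes of slack covers the fixed "printf '%s\n' " prefix and the NUL. */
    char *cmd = malloc(50 + strlen(line));
    if (cmd == NULL) return EXIT_FAILURE;
    /* The shell splits and globs `line`; printf prints one word per line. */
    sprintf(cmd, "printf '%%s\n' %s", line);
    system(cmd);
    free(cmd);
    return 0;
}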

This prints the arguments to stdout, one per line (and doesn't handle
arguments with embedded newlines very well). You could modify the
command to write the output to a temporary file and then read that file,
or you could use popen() if it's available.

Of course this is portable only to systems that have a Unix-style shell,
and it can even behave differently depending on how the default shell
behaves. And invoking a new process is going to make this relatively
slow, which may or may not matter depending on how many times you need
to do it.

There is no completely portable solution, since you need to be able to
get directory listings to handle wildcards.

A quick Google search points to this question:

https://stackoverflow.com/q/21335041/827263
"How to split a string using shell-like rules in C++?"

An answer refers to Boost.Program_options, which is specific to C++.
Apparently boost::program_options::split_unix() does what you're looking
for.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Ted Nolan <tednolan>
2024-09-10 22:13:06 UTC
Post by Keith Thompson
Post by Ted Nolan <tednolan>
I have the case where my C program is handed a string which is basically
a command line.
Is there a common open source C library for tokenizing and globbing
this into an argc/argv as a shell would do? I've googled, but I get
too much C++ & other language stuff.
Note that I'm not asking for getopt(), that comes afterwards, and
I'm not asking for any variable interpolation, but just that a string
like, say
hello -world "This is foo.*" foo.*
becomes something like
my_argv[0] "hello"
my_argv[1] "-world"
my_argv[2] "This is foo.*"
my_argv[3] foo.h
my_argv[4] foo.c
my_argv[5] foo.txt
my_argc = 6
I could live without the globbing if that's a bridge too far.
What environment(s) does this need to run in?
I don't know of a standard(ish) function that does this. POSIX defines
the glob() function, but it only does globbing, not word-splitting.
If you're trying to emulate the way the shell (which one?) parses
command lines, and *if* you're on a system that has a shell, you can
invoke a shell to do the work for you. Here's a quick and dirty
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(void) {
const char *line = "hello -world \"This is foo.*\" foo.*";
char *cmd = malloc(50 + strlen(line));
sprintf(cmd, "printf '%%s\n' %s", line);
system(cmd);
}
This prints the arguments to stdout, one per line (and doesn't handle
arguments with embedded newlines very well). You could modify the
command to write the output to a temporary file and then read that file,
or you could use popen() if it's available.
Of course this is portable only to systems that have a Unix-style shell,
and it can even behave differently depending on how the default shell
behaves. And invoking a new process is going to make this relatively
slow, which may or may not matter depending on how many times you need
to do it.
There is no completely portable solution, since you need to be able to
get directory listings to handle wildcards.
Yeah, that's the kind of thing I was hoping to avoid, and probably more
than I want to get into, but thanks!
Post by Keith Thompson
https://stackoverflow.com/q/21335041/827263
"How to split a string using shell-like rules in C++?"
An answer refers to Boost.Program_options, which is specific to C++.
Apparently boost::program_options::split_unix() does what you're looking
for.
--
columbiaclosings.com
What's not in Columbia anymore..
Kenny McCormack
2024-09-11 01:56:27 UTC
Post by Ted Nolan <tednolan>
I have the case where my C program is handed a string which is basically
a command line.
Is there a common open source C library for tokenizing and globbing
this into an argc/argv as a shell would do? I've googled, but I get
too much C++ & other language stuff.
Note that I'm not asking for getopt(), that comes afterwards, and
I'm not asking for any variable interpolation, but just that a string
like, say
Have a look at wordexp(3).
--
Trump has normalized hate.

The media has normalized Trump.
Ted Nolan <tednolan>
2024-09-11 02:54:32 UTC
Post by Kenny McCormack
Post by Ted Nolan <tednolan>
I have the case where my C program is handed a string which is basically
a command line.
Is there a common open source C library for tokenizing and globbing
this into an argc/argv as a shell would do? I've googled, but I get
too much C++ & other language stuff.
Note that I'm not asking for getopt(), that comes afterwards, and
I'm not asking for any variable interpolation, but just that a string
like, say
Have a look at wordexp(3).
Very interesting, thanks!

Something added since last time I paged through section 3...
--
columbiaclosings.com
What's not in Columbia anymore..
Bonita Montero
2024-09-11 12:17:33 UTC
Do you think it would make sense to switch the language ?

#include <Windows.h>
#include <iostream>
#include <string_view>

using namespace std;

template<typename CharType, typename Consumer>
    requires requires( Consumer consumer, basic_string_view<CharType> sv )
        { { consumer( sv ) }; }
void Tokenize( basic_string_view<CharType> sv, Consumer consumer )
{
    using sv_t = basic_string_view<CharType>;
    auto it = sv.begin();
    for( ; it != sv.end(); )
    {
        CharType end;
        typename sv_t::iterator tkBegin;
        if( *it == '\"' )
        {
            end = '\"';
            tkBegin = ++it;
        }
        else
        {
            end = ' ';
            tkBegin = it++;
        }
        for( ; it != sv.end() && *it != end; ++it );
        consumer( sv_t( tkBegin, it ) );
        if( it != sv.end() ) [[unlikely]]
        {
            while( ++it != sv.end() && *it == ' ' );
            continue;
        }
    }
}

int main()
{
    LPWSTR pCmdLine = GetCommandLineW();
    size_t i = 1;
    Tokenize( wstring_view( pCmdLine ), [&]( wstring_view sv )
        {
            wcout << i++ << L": \"" << sv << L"\"" << endl;
        } );
}
Ted Nolan <tednolan>
2024-09-11 12:22:16 UTC
Post by Bonita Montero
Do you think it would make sense to switch the language ?
No, not an option, thanks.
--
columbiaclosings.com
What's not in Columbia anymore..
Bonita Montero
2024-09-11 12:28:33 UTC
Post by Ted Nolan <tednolan>
Post by Bonita Montero
Do you think it would make sense to switch the language ?
No, not an option, thanks.
I could write a C-bridge for you.
Ted Nolan <tednolan>
2024-09-11 12:44:19 UTC
Post by Bonita Montero
Post by Ted Nolan <tednolan>
Post by Bonita Montero
Do you think it would make sense to switch the language ?
No, not an option, thanks.
I could write a C-bridge for you.
No, thank you.
--
columbiaclosings.com
What's not in Columbia anymore..
Bart
2024-09-11 12:42:00 UTC
Post by Bonita Montero
#include <Windows.h>
#include <iostream>
#include <string_view>
using namespace std;
template<typename CharType, typename Consumer>
    requires requires( Consumer consumer, basic_string_view<CharType> sv ) { { consumer( sv ) }; }
void Tokenize( basic_string_view<CharType> sv, Consumer consumer )
{
    using sv_t = basic_string_view<CharType>;
    auto it = sv.begin();
    for( ; it != sv.end(); )
    {
        CharType end;
        typename sv_t::iterator tkBegin;
        if( *it == '\"' )
        {
            end = '\"';
            tkBegin = ++it;
        }
        else
        {
            end = ' ';
            tkBegin = it++;
        }
        for( ; it != sv.end() && *it != end; ++it );
        consumer( sv_t( tkBegin, it ) );
        if( it != sv.end() ) [[unlikely]]
        {
            while( ++it != sv.end() && *it == ' ' );
            continue;
        }
    }
}
int main()
{
    LPWSTR pCmdLine = GetCommandLineW();
    size_t i = 1;
    Tokenize( wstring_view( pCmdLine ), [&]( wstring_view sv )
        {
            wcout << i++ << L": \"" << sv << L"\"" << endl;
        } );
}
This doesn't do globbing (expanding non-quoted wildcard filenames into
lists of individual filenames).

Neither is it clear if the OP is on Windows. (Otherwise I can supply
something in C for the globbing part. Chopping up the line into
separate items is fairly trivial.)
Kenny McCormack
2024-09-11 14:59:48 UTC
Post by Bonita Montero
Do you think it would make sense to switch the language ?
Do you think it would make sense to pay attention to the "Newsgroups" line
in your header before clicking "Send"?
--
The randomly chosen signature file that would have appeared here is more than 4
lines long. As such, it violates one or more Usenet RFCs. In order to remain
in compliance with said RFCs, the actual sig can be found at the following URL:
http://user.xmission.com/~gazelle/Sigs/IceCream
Bonita Montero
2024-09-11 18:14:11 UTC
Post by Kenny McCormack
Post by Bonita Montero
Do you think it would make sense to switch the language ?
Do you think it would make sense to pay attention to the "Newsgroups" line
in your header before clicking "Send"?
I just wanted to suggest a simpler language.
Compare that with a manual implementation of the same in C.
Kenny McCormack
2024-09-11 18:17:04 UTC
Post by Bonita Montero
Post by Kenny McCormack
Post by Bonita Montero
Do you think it would make sense to switch the language ?
Do you think it would make sense to pay attention to the "Newsgroups" line
in your header before clicking "Send"?
I just wanted to suggest a simpler language.
Compare that with a manual implementation of the same in C.
You know the rules around here, just as well as I do.
--
The coronavirus is the first thing, in his 74 pathetic years of existence,
that the orange menace has come into contact with, that he couldn't browbeat,
bully, bullshit, bribe, sue, legally harrass, get Daddy to fix, get his
siblings to bail him out of, or, if all else fails, simply wish it away.
Ted Nolan <tednolan>
2024-09-11 18:49:19 UTC
Post by Bonita Montero
Post by Kenny McCormack
Post by Bonita Montero
Do you think it would make sense to switch the language ?
Do you think it would make sense to pay attention to the "Newsgroups" line
in your header before clicking "Send"?
I just wanted to suggest a simpler language.
Compare that with a manual implementation of the same in C.
Thanks, I appreciate that, but it does have to be C.
--
columbiaclosings.com
What's not in Columbia anymore..
Keith Thompson
2024-09-11 21:43:35 UTC
Post by Ted Nolan <tednolan>
Post by Bonita Montero
Post by Kenny McCormack
Post by Bonita Montero
Do you think it would make sense to switch the language ?
Do you think it would make sense to pay attention to the "Newsgroups" line
in your header before clicking "Send"?
I just wanted to suggest a simpler language.
Compare that with a manual implementation of the same in C.
Thanks, I appreciate that, but it does have to be C.
We could help you more effectively if we understood your requirements.

Why exactly does it have to be C?

What system or systems do you need to support? (I asked this before and
you didn't answer.)

If you only care about Windows, for example, that's going to affect what
solutions we can offer; likewise if you only care about POSIX-based
systems, or only about Linux-based systems.

It might also be useful to know more about the context. If this is for
some specific application, what is that application intended to do, and
why does it need to do tokenization and globbing?
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Ted Nolan <tednolan>
2024-09-12 03:06:15 UTC
Post by Keith Thompson
Post by Ted Nolan <tednolan>
Post by Bonita Montero
Post by Kenny McCormack
Post by Bonita Montero
Do you think it would make sense to switch the language ?
Do you think it would make sense to pay attention to the "Newsgroups" line
in your header before clicking "Send"?
I just wanted to suggest a simpler language.
Compare that with a manual implementation of the same in C.
Thanks, I appreciate that, but it does have to be C.
We could help you more effectively if we understood your requirements.
Why exactly does it have to be C?
What system or systems do you need to support? (I asked this before and
you didn't answer.)
If you only care about Windows, for example, that's going to affect what
solutions we can offer; likewise if you only care about POSIX-based
systems, or only about Linux-based systems.
It might also be useful to know more about the context. If this is for
some specific application, what is that application intended to do, and
why does it need to do tokenization and globbing?
This would be for work, so I am limited in what I can say about it, but
it has to be in C because it would be a C callout from a GT.M mumps
process. GT.M stores the command line tail (everything it doesn't need
to get a program running) in the special variable $ZCMDLINE which can
be passed to a callout. I would like to parse that string as the
shell does a command line. Basically, if it isn't a C library that
is commonly available through Linux package managers I probably can't
use it. In the end this is a "nice to have" and I have a q&d approach
that I will probably use.
--
columbiaclosings.com
What's not in Columbia anymore..
Keith Thompson
2024-09-12 03:37:06 UTC
Post by Ted Nolan <tednolan>
Post by Keith Thompson
Post by Ted Nolan <tednolan>
Post by Bonita Montero
Post by Kenny McCormack
Post by Bonita Montero
Do you think it would make sense to switch the language ?
Do you think it would make sense to pay attention to the "Newsgroups" line
in your header before clicking "Send"?
I just wanted to suggest a simpler language.
Compare that with a manual implementation of the same in C.
Thanks, I appreciate that, but it does have to be C.
We could help you more effectively if we understood your requirements.
Why exactly does it have to be C?
What system or systems do you need to support? (I asked this before and
you didn't answer.)
If you only care about Windows, for example, that's going to affect what
solutions we can offer; likewise if you only care about POSIX-based
systems, or only about Linux-based systems.
It might also be useful to know more about the context. If this is for
some specific application, what is that application intended to do, and
why does it need to do tokenization and globbing?
This would be for work, so I am limited in what I can say about it, but
it has to be in C because it would be a C callout from a GT.M mumps
process. GT.M stores the command line tail (everything it doesn't need
to get a program running) in the special variable $ZCMDLINE which can
be passed to a callout. I would like to parse that string as the
shell does a command line. Basically, if it isn't a C library that
is commonly available through Linux package managers I probably can't
use it. In the end this is a "nice to have" and I have a q&d approach
that I will probably use.
Since you mentioned Linux package managers, I presume this only needs to
work on Linux-based systems, which means you can use POSIX-specific
functions. That could have been useful to know earlier.

And you might consider posting to comp.unix.programmer for more
system-specific solutions.

Earlier I suggested using system() to pass the string to the shell.
That wouldn't work on Windows, but it should be ok for your
requirements. There are good reasons not to want to do that, but "there
might not be a POSIX shell available" apparently isn't one of them.

I'd also suggest nailing down your exact requirements; "as the
shell does" is inexact, since different shells behave differently.

Suggested reading:
https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V3_chap02.html
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Ted Nolan <tednolan>
2024-09-12 03:56:09 UTC
Post by Keith Thompson
Post by Ted Nolan <tednolan>
Post by Keith Thompson
Post by Ted Nolan <tednolan>
Post by Bonita Montero
Post by Kenny McCormack
Post by Bonita Montero
Do you think it would make sense to switch the language ?
Do you think it would make sense to pay attention to the "Newsgroups" line
in your header before clicking "Send"?
I just wanted to suggest a simpler language.
Compare that with a manual implementation of the same in C.
Thanks, I appreciate that, but it does have to be C.
We could help you more effectively if we understood your requirements.
Why exactly does it have to be C?
What system or systems do you need to support? (I asked this before and
you didn't answer.)
If you only care about Windows, for example, that's going to affect what
solutions we can offer; likewise if you only care about POSIX-based
systems, or only about Linux-based systems.
It might also be useful to know more about the context. If this is for
some specific application, what is that application intended to do, and
why does it need to do tokenization and globbing?
This would be for work, so I am limited in what I can say about it, but
it has to be in C because it would be a C callout from a GT.M mumps
process. GT.M stores the command line tail (everything it doesn't need
to get a program running) in the special variable $ZCMDLINE which can
be passed to a callout. I would like to parse that string as the
shell does a command line. Basically, if it isn't a C library that
is commonly available through Linux package managers I probably can't
use it. In the end this is a "nice to have" and I have a q&d approach
that I will probably use.
Since you mentioned Linux package managers, I presume this only needs to
work on Linux-based systems, which means you can use POSIX-specific
functions. That could have been useful to know earlier.
And you might consider posting to comp.unix.programmer for more
system-specific solutions.
Earlier I suggested using system() to pass the string to the shell.
That wouldn't work on Windows, but it should be ok for your
requirements. There are good reasons not to want to do that, but "there
might not be a POSIX shell available" apparently isn't one of them.
I'd also suggest nailing down your exact requirements; "as the
shell does" is inexact, since different shells behave differently.
https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V3_chap02.html
--
void Void(void) { Void(); } /* The recursive call of the void */
Thank you. system() would not work as I don't want to execute
anything, just parse into an argv-like array.

I appreciate the responses, but it looks like I will be staying with
my q&d approach for now.
--
columbiaclosings.com
What's not in Columbia anymore..
Kenny McCormack
2024-09-12 13:22:48 UTC
In article <***@mid.individual.net>,
Ted Nolan <tednolan> <tednolan> wrote:
...
Post by Ted Nolan <tednolan>
Thank you. system() would not work as I don't want to execute
anything, just parse into an argv-like array.
I appreciate the responses, but it looks like I will be staying with
my q&d approach for now.
This is a "solved problem". Or, to put it another way, if wordexp(3) is
not the solution, then there is no general solution (and that means, yes,
you'll have to "roll your own", as many here have suggested you do).
Post by Ted Nolan <tednolan>
columbiaclosings.com
What's not in Columbia anymore..
Which Columbia are we talking about here? And why?
--
Mike Huckabee has yet to consciously uncouple from Josh Duggar.
Lawrence D'Oliveiro
2024-09-12 22:07:47 UTC
I would guess that majority of non-US readers don't know about existence
of both of these places.
I personally had two options in mind: a big country in South America and
a big university in NYC.
Maybe US readers don’t realize that one of those is not “Columbia”.
Ted Nolan <tednolan>
2024-09-12 13:50:38 UTC
Post by Kenny McCormack
...
Post by Ted Nolan <tednolan>
Thank you. system() would not work as I don't want to execute
anything, just parse into an argv-like array.
I appreciate the responses, but it looks like I will be staying with
my q&d approach for now.
This is a "solved problem". Or, to put it another way, if wordexp(3) is
not the solution, then there is no general solution (and that means, yes,
you'll have to "roll your own", as many here have suggested you do).
Post by Ted Nolan <tednolan>
columbiaclosings.com
What's not in Columbia anymore..
Which Columbia are we talking about here? And why?
SC. It keeps me busy.
--
columbiaclosings.com
What's not in Columbia anymore..
David Brown
2024-09-12 14:43:01 UTC
On Thu, 12 Sep 2024 14:06:07 -0000 (UTC)
...
Post by Ted Nolan <tednolan>
Post by Kenny McCormack
Post by Ted Nolan <tednolan>
columbiaclosings.com
What's not in Columbia anymore..
Which Columbia are we talking about here? And why?
SC. It keeps me busy.
OK. For some reason, I was thinking Maryland.
I would guess that majority of non-US readers don't know about
existence of both of these places.
The trouble is the minority of US posters who don't know about the
existence of non-US places.
I personally had two options in mind: a big country in South America and
a big university in NYC. Both are sorta closing, if not literally.
The country is spelt "Colombia" - but it's still the first thing I
thought of.
Michael S
2024-09-12 14:20:52 UTC
On Thu, 12 Sep 2024 14:06:07 -0000 (UTC)
...
Post by Ted Nolan <tednolan>
Post by Kenny McCormack
Post by Ted Nolan <tednolan>
columbiaclosings.com
What's not in Columbia anymore..
Which Columbia are we talking about here? And why?
SC. It keeps me busy.
OK. For some reason, I was thinking Maryland.
I would guess that majority of non-US readers don't know about
existence of both of these places.
I personally had two options in mind: a big country in South America and
a big university in NYC. Both are sorta closing, if not literally.
Kenny McCormack
2024-09-12 14:06:07 UTC
In article <***@mid.individual.net>,
Ted Nolan <tednolan> <tednolan> wrote:
...
Post by Ted Nolan <tednolan>
Post by Kenny McCormack
Post by Ted Nolan <tednolan>
columbiaclosings.com
What's not in Columbia anymore..
Which Columbia are we talking about here? And why?
SC. It keeps me busy.
OK. For some reason, I was thinking Maryland.
--
I'm building a wall.
Lawrence D'Oliveiro
2024-09-12 04:14:05 UTC
Post by Ted Nolan <tednolan>
GT.M stores the command line tail (everything it doesn't need
to get a program running) in the special variable $ZCMDLINE which can be
passed to a callout.
What, all the arguments smooshed together into a single string?

That’s a dumb way to do it.
Ben Bacarisse
2024-09-12 09:43:52 UTC
Post by Ted Nolan <tednolan>
Post by Keith Thompson
Post by Ted Nolan <tednolan>
Post by Bonita Montero
Post by Kenny McCormack
Post by Bonita Montero
Do you think it would make sense to switch the language ?
Do you think it would make sense to pay attention to the "Newsgroups" line
in your header before clicking "Send"?
I just wanted to suggest a simpler language.
Compare that with a manual implementation of the same in C.
Thanks, I appreciate that, but it does have to be C.
We could help you more effectively if we understood your requirements.
Why exactly does it have to be C?
What system or systems do you need to support? (I asked this before and
you didn't answer.)
If you only care about Windows, for example, that's going to affect what
solutions we can offer; likewise if you only care about POSIX-based
systems, or only about Linux-based systems.
It might also be useful to know more about the context. If this is for
some specific application, what is that application intended to do, and
why does it need to do tokenization and globbing?
This would be for work, so I am limited in what I can say about it, but
it has to be in C because it would be a C callout from a GT.M mumps
process. GT.M stores the command line tail (everything it doesn't need
to get a program running) in the special variable $ZCMDLINE which can
be passed to a callout. I would like to parse that string as the
shell does a command line. Basically, if it isn't a C library that
is commonly available through Linux package managers I probably can't
use it. In the end this is a "nice to have" and I have a q&d approach
that I will probably use.
If it were down to me I'd do the word splitting "by hand" and use POSIX
glob(3) to do the file expansion.

For the word splitting, the key would be to know where these strings
come from and what is really needed. That would enable you to pick a
syntax that makes sense for your particular use-case. For example, if
the strings are typed by people, I wouldn't use the typical shell
quoting. I would not want anyone (other than technical Unix users) to
have to type

'He said "you can'"'""t"

You might get away with a very simple word splitting algorithm.
--
Ben.
Bart
2024-09-11 20:19:58 UTC
Post by Bonita Montero
Post by Kenny McCormack
Post by Bonita Montero
Do you think it would make sense to switch the language ?
Do you think it would make sense to pay attention to the "Newsgroups" line
in your header before clicking "Send"?
I just wanted to suggest a simpler language.
Compare that with a manual implementation of the same in C.
C++ is a simpler language? You're having a laugh!

I made a version of your code that was about 50 lines, so a higher line
count, but was some 10% smaller in character count.

It doesn't need 'templates', or 'basic-string-view', or 'Consumer',
whatever that is, or iterators. This is a trivial exercise as I said.

However, if working on Windows, there may be no need: there is already a
CommandLineToArgvW function.
Bonita Montero
2024-09-12 02:22:08 UTC
Reply
Permalink
Post by Bart
C++ is a simpler language? You're having a laugh!
The solutions are simpler because you've a fifth of the code as in C.
Bart
2024-09-12 11:29:26 UTC
Post by Bonita Montero
Post by Bart
C++ is a simpler language? You're having a laugh!
The solutions are simpler because you've a fifth of the code as in C.
In this case, it actually needed somewhat more code, even if the line
count was half.

But your solutions are always incomprehensible because they strive for
the most advanced features possible.
Kenny McCormack
2024-09-12 12:13:55 UTC
Post by Bart
Post by Bonita Montero
Post by Bart
C++ is a simpler language? You're having a laugh!
The solutions are simpler because you've a fifth of the code as in C.
In this case, it actually needed somewhat more code, even if the line
count was half.
But your solutions are always incomprehensible because they strive for
the most advanced features possible.
And, of course, totally off-topic.

Maybe I should start posting Fortran "solutions".

Or maybe Haskell?

Or Intercal?
--
Mike Huckabee has yet to consciously uncouple from Josh Duggar.
Janis Papanagnou
2024-09-12 12:24:37 UTC
Post by Kenny McCormack
Maybe I should start posting Fortran "solutions".
Or maybe Haskell?
Or Intercal?
The latter might certainly be enlightening. I always had problems
writing such code. And seeing functional code would help. - But
it's off-topic as you say. Less off-topic are (IMO) C++ solutions
in contrast to C; C++ has a C base and C appears to me to advance
"with an eye on" C++.

Janis
Kenny McCormack
2024-09-12 13:20:14 UTC
Post by Janis Papanagnou
Post by Kenny McCormack
Maybe I should start posting Fortran "solutions".
Or maybe Haskell?
Or Intercal?
The latter might certainly be enlightening. I always had problems
writing such code. And seeing functional code would help. - But
it's off-topic as you say. Less off-topic are (IMO) C++ solutions
in contrast to C; C++ has a C base and C appears to me to advance
"with an eye on" C++.
It's not me saying this. I am just repeating the CLC party line.

Ask Leader Keith. He'll tell you.

It has always been CLC policy that C++ is just as off-topic as Fortran or
C# or any other language (other than C, of course). And, of course, that
being "off topic" is the highest and most unforgivable sin.

Just ask Leader Keith. He'll tell you.

Leader Keith will tell you that we are not here to solve problems or to
discuss programming techniques. We are here to debate minutiae of the
various standards documents.
--
Elect a clown, expect a circus.
Bonita Montero
2024-09-12 15:48:25 UTC
Programming C++ with only a "C" mindset I'd not consider advisable.
That's what I've generally observed; with sole knowledge of X there
seems to be an impetus and preference to infer those techniques to
programming in Y. A lot of early C++ programs I've seen were just,
umm, "enhanced" "C" programs.
I'm using most new language facilities, but the mindset is still the same.
Janis Papanagnou
2024-09-12 15:59:01 UTC
Post by Bonita Montero
Programming C++ with only a "C" mindset I'd not consider advisable.
That's what I've generally observed; with sole knowledge of X there
seems to be an impetus and preference to infer those techniques to
programming in Y. A lot of early C++ programs I've seen were just,
umm, "enhanced" "C" programs.
I'm using most new language facilities, but the mindset is still the same.
You already said that in your previous posting. See my reply in my
response to that post.

Janis
Lawrence D'Oliveiro
2024-09-12 22:32:10 UTC
A lot of early C++ programs I've seen were just, umm, "enhanced" "C"
programs.
Given that C++ makes “virtual” optional instead of standard behaviour, I’d
say that C++ is in fact designed to be used that way.
James Kuyper
2024-09-12 22:50:11 UTC
Post by Lawrence D'Oliveiro
A lot of early C++ programs I've seen were just, umm, "enhanced" "C"
programs.
Given that C++ makes “virtual” optional instead of standard behaviour, I’d
say that C++ is in fact designed to be used that way.
Like many other aspects of C++, that was dictated by the necessity of
retaining a certain minimum level of backwards compatibility with
existing C code. You shouldn't draw any larger conclusions from that choice.
Lawrence D'Oliveiro
2024-09-13 01:37:08 UTC
Post by James Kuyper
Post by Lawrence D'Oliveiro
A lot of early C++ programs I've seen were just, umm, "enhanced" "C"
programs.
Given that C++ makes “virtual” optional instead of standard behaviour,
I’d say that C++ is in fact designed to be used that way.
Like many other aspects of C++, that was dictated by the necessity of
retaining a certain minimum level of backwards compatibility with
existing C code.
No it wasn’t. OO was an entirely new feature, with no counterpart in C, so
there was nothing to maintain “backwards compatibility” with.
Michael S
2024-09-13 08:30:29 UTC
On Fri, 13 Sep 2024 01:37:08 -0000 (UTC)
Post by Lawrence D'Oliveiro
Post by James Kuyper
Post by Lawrence D'Oliveiro
A lot of early C++ programs I've seen were just, umm, "enhanced"
"C" programs.
Given that C++ makes “virtual” optional instead of standard
behaviour, I’d say that C++ is in fact designed to be used that
way.
Like many other aspects of C++, that was dictated by the necessity of
retaining a certain minimum level of backwards compatibility with
existing C code.
No it wasn’t. OO was an entirely new feature, with no counterpart in
C, so there was nothing to maintain “backwards compatibility” with.
Agreed.
Method syntax was entirely new with no backward compatibility
restrictions.
BTW, in this sort of discussion I'd rather avoid non-concrete words
like "OO".
Lawrence D'Oliveiro
2024-09-13 02:58:07 UTC
Post by Lawrence D'Oliveiro
A lot of early C++ programs I've seen were just, umm, "enhanced" "C"
programs.
Given that C++ makes “virtual” optional instead of standard behaviour,
I’d say that C++ is in fact designed to be used that way.
There's different semantics with and without a 'virtual' specification.
Precisely. And consider what the meaning of a non-virtual destructor is:
it is essentially always the wrong thing to do.
Janis Papanagnou
2024-09-13 12:31:06 UTC
Post by Lawrence D'Oliveiro
Post by Lawrence D'Oliveiro
A lot of early C++ programs I've seen were just, umm, "enhanced" "C"
programs.
Given that C++ makes “virtual” optional instead of standard behaviour,
I’d say that C++ is in fact designed to be used that way.
There's different semantics with and without a 'virtual' specification.
it is essentially always the wrong thing to do.
I've used both design patterns depending on what I intended,
so I cannot say that one would be "wrong" in any way.

(Upthread I seem to have rightly sensed that this might lead
to a "right/wrong" ("real" OO) sort of discussion. I abstain.)

Janis
Kaz Kylheku
2024-09-13 03:38:59 UTC
Post by Lawrence D'Oliveiro
A lot of early C++ programs I've seen were just, umm, "enhanced" "C"
programs.
Given that C++ makes “virtual” optional instead of standard behaviour, I’d
say that C++ is in fact designed to be used that way.
That is half right, goofy. C++ is certainly designed to be used without
virtual functions. But it's also designed to be used with virtual functions,
too. Both ways are by design!

Moreover, a VIRTUAL keyword was already present in Simula-67, which
inspired C++. Virtual functions were not added as an afterthought into a
language that was originally designed otherwise. But even if they were,
such an addition is a design change. If you design a thing to be used
one way, without envisioning another way, and then some time later hit
upon the idea for that other way and add it to the design, then both
ways are now designed in, and intended to be used.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Kaz Kylheku
2024-09-13 04:18:36 UTC
Post by Lawrence D'Oliveiro
A lot of early C++ programs I've seen were just, umm, "enhanced" "C"
programs.
Given that C++ makes “virtual” optional instead of standard behaviour, I’d
say that C++ is in fact designed to be used that way.
There's different semantics with and without a 'virtual' specification.
Even if you want polymorphism (and have to use 'virtual') there's no
need to define it as _default_ (and "disable" it where unnecessary).
The development of C++ follows, or at least used to, a "don't pay for
what you don't use" principle: programs not using certain mechanisms
that have an implementation cost should not ideally bear the cost of
implementing them.

When a class doesn't use virtual functions, that class doesn't need the
"vtable" implementation mechanism. And even if functions are called
through pointers or references to the object, the dispatch is static.
That would not be the case with virtuals, because a class must be
suspected of being a base class.
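
In C terms the difference can be sketched roughly as a direct call versus a call through a per-object table of function pointers. This is only an illustration of the cost being discussed: the names (shape, shape_ops, area_static, area_dynamic) are invented, and a real C++ implementation is not obliged to lay things out exactly this way.

```c
#include <assert.h>
#include <stddef.h>

struct shape;

/* Hand-rolled stand-in for a vtable: one slot per "virtual" function. */
struct shape_ops {
    double (*area)(const struct shape *);
};

struct shape {
    const struct shape_ops *ops;  /* extra pointer, only needed for dynamic dispatch */
    double w, h;
};

static double rect_area(const struct shape *s) { return s->w * s->h; }
static double tri_area(const struct shape *s)  { return 0.5 * s->w * s->h; }

static const struct shape_ops rect_ops = { rect_area };
static const struct shape_ops tri_ops  = { tri_area };

/* "Non-virtual": the callee is fixed at compile time, whatever s points to. */
double area_static(const struct shape *s)
{
    return rect_area(s);
}

/* "Virtual": the callee is looked up through the object at run time. */
double area_dynamic(const struct shape *s)
{
    assert(s->ops != NULL);
    return s->ops->area(s);
}
```

A struct that only ever goes through area_static() could drop the ops member entirely, which is the "don't pay for what you don't use" case.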

This "don't pay for what you don't use" principle doesn't mean that
the language is designed to be used with a preference toward
the cheap choices. It's just pragmatics. Programs do not all use all
available features, so why turn them on? A program which needs no
special features that have an implementation cost would then have to
add verbiage to opt out of all of them. Moreover, old programs would have to
be maintained to add more verbiage to opt out of newer hidden expenses
that have been made default.

We can imagine a version of the "ls" program in which a large number
of options were enabled by default, such that users
had to turn off what they don't need. Just because things are not
that way (the options are sanely optional), it does not follow
that ls is designed to be used with no options.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Bonita Montero
2024-09-12 14:14:41 UTC
Post by Kenny McCormack
It has always been CLC policy that C++ is just as off-topic as Fortran or
C# or any other language (other than C, of course). And, of course, that
being "off topic" is the highest and most unforgivable sin.
A switch to C++ is much more likely than to Fortran.
Doesn't matter. I'm talking policy, not personal feelings.
C and C++ are programmed with the same mindset.
Janis Papanagnou
2024-09-12 15:40:17 UTC
Post by Bonita Montero
C and C++ are programmed with the same mindset.
Careful! This is depending on the background and experiences of the
respective programmer(s). If you come, say, from Simula you'd most
likely have another (OOP) perspective than if you'd come from "C".

Programming C++ with only a "C" mindset I'd not consider advisable.
That's what I've generally observed; with sole knowledge of X there
seems to be an impetus and preference to infer those techniques to
programming in Y. A lot of early C++ programs I've seen were just,
umm, "enhanced" "C" programs.

Janis
Bonita Montero
2024-09-12 14:01:17 UTC
Post by Kenny McCormack
It has always been CLC policy that C++ is just as off-topic as Fortran or
C# or any other language (other than C, of course). And, of course, that
being "off topic" is the highest and most unforgivable sin.
A switch to C++ is much more likely than to Fortran.
Kenny McCormack
2024-09-12 14:07:46 UTC
Post by Kenny McCormack
It has always been CLC policy that C++ is just as off-topic as Fortran or
C# or any other language (other than C, of course). And, of course, that
being "off topic" is the highest and most unforgivable sin.
A switch to C++ is much more likely than to Fortran.
Doesn't matter. I'm talking policy, not personal feelings.
--
"Only a genius could lose a billion dollars running a casino."
"You know what they say: the house always loses."
"When life gives you lemons, don't pay taxes."
"Grab 'em by the p***y!"
Bonita Montero
2024-09-12 14:00:07 UTC
Post by Kenny McCormack
Post by Bart
Post by Bonita Montero
Post by Bart
C++ is a simpler language? You're having a laugh!
The solutions are simpler because you've a fifth of the code as in C.
In this case, it actually needed somewhat more code, even if the line
count was half.
But your solutions are always incomprehensible because they strive for
the most advanced features possible.
And, of course, totally off-topic.
Maybe I should start posting Fortran "solutions".
Or maybe Haskell?
Or Intercal?
Maybe Rust, that would fit like C++ because it's also a systems
programming language with some capabilities like C or C++. Haskell
would share far fewer properties.
Janis Papanagnou
2024-09-12 12:20:01 UTC
Post by Bart
Post by Bonita Montero
Post by Bart
C++ is a simpler language? You're having a laugh!
The solutions are simpler because you've a fifth of the code as in C.
In this case, it actually needed somewhat more code, even if the line
count was half.
But your solutions are always incomprehensible because they strive for
the most advanced features possible.
I don't know of the other poster's solutions. But a quick browse seems
to show nothing incomprehensible or anything that should be difficult
to understand. (YMMV; especially if you're not familiar with C++ then
I'm sure the code may look like noise to you.)

In the given context of C and C++ I've always perceived the features
of C++ to add to comprehensibility of source code where the respective
C code required writing clumsy code and needed (unnecessary) syntactic
ballast to implement similar functions and program constructs.

Your undifferentiated complaint sounds more like someone not willing
to understand the other concepts or have a reluctance or laziness to
make yourself familiar with them.

Janis
Bonita Montero
2024-09-12 15:47:23 UTC
Not only "roughly imagine"; I think the imperative languages have
so many common basic concepts that you can have a quite good idea,
especially if you know more than just two or three such languages.
Then tell me which language a) has this kind of mostly minimized
language-facilities and b) you can layout data structures 1:1
like they fit into memory (platform-dependent).
Yes, C++ can be written with a "C" mindset. But this is nothing
I'd suggest. Better make yourself familiar with the new concepts
(OO, genericity, or even simple things like references). - IMO.
I'm using mostly all new features as you can see from my code.
But the mindset is still the same.
Bonita Montero
2024-09-12 16:09:43 UTC
Post by Bonita Montero
Then tell me which language a) has this kind of mostly minimized
language-facilities and b) you can layout data structures 1:1
like they fit into memory (platform-dependent).
Don't know what you're trying to say here or what it is you aim
at. If you think it's worth discussing please elaborate.
For 95% of C's language facilities it's easy to imagine which computation
steps on an ISA-level are processed. For C that's also easy to answer.
In both languages it's easy to lay out data structures as they could
be hex-dumped with a debugger. The combination of both features may
not hold true for a lot of other languages.
Post by Bonita Montero
I'm using mostly all new features as you can see from my code.
But the mindset is still the same.
I don't know you or your background or much of your programming.
So please understand that I'm not inclined to make any comments
about you or your code; this would be all speculative and not
contribute anything to the discussion. If you had the impression
that what I said was referring to you personally you were wrong.
I just wanted to say that the kind of thinking is the same in C
and C++.
Janis Papanagnou
2024-09-12 15:56:58 UTC
Post by Bonita Montero
Not only "roughly imagine"; I think the imperative languages have
so many common basic concepts that you can have a quite good idea,
especially if you know more than just two or three such languages.
Then tell me which language a) has this kind of mostly minimized
language-facilities and b) you can layout data structures 1:1
like they fit into memory (platform-dependent).
Don't know what you're trying to say here or what it is you aim
at. If you think it's worth discussing please elaborate.
Post by Bonita Montero
Yes, C++ can be written with a "C" mindset. But this is nothing
I'd suggest. Better make yourself familiar with the new concepts
(OO, genericity, or even simple things like references). - IMO.
I'm using mostly all new features as you can see from my code.
But the mindset is still the same.
I don't know you or your background or much of your programming.
So please understand that I'm not inclined to make any comments
about you or your code; this would be all speculative and not
contribute anything to the discussion. If you had the impression
that what I said was referring to you personally you were wrong.

Janis
Lawrence D'Oliveiro
2024-09-13 02:43:56 UTC
Post by Bonita Montero
Then tell me which language a) has this kind of mostly minimized
language-facilities and b) you can layout data structures 1:1 like they
fit into memory (platform-dependent).
Python.
Bonita Montero
2024-09-13 05:27:47 UTC
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Then tell me which language a) has this kind of mostly minimized
language-facilities and b) you can layout data structures 1:1 like they
fit into memory (platform-dependent).
Python.
lol
Lawrence D'Oliveiro
2024-09-13 06:49:20 UTC
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Then tell me which language a) has this kind of mostly minimized
language-facilities and b) you can layout data structures 1:1 like
they fit into memory (platform-dependent).
Python.
Have a look at
<https://gitlab.com/ldo/inotipy_examples/-/blob/master/fanotify_7_example?ref_type=heads>,
and compare the C original from
<https://manpages.debian.org/7/fanotify.7.en.html>. The Python code is
half the size and can use high-level async calls.
Michael S
2024-09-13 08:49:35 UTC
On Fri, 13 Sep 2024 06:49:20 -0000 (UTC)
Post by Kenny McCormack
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Then tell me which language a) has this kind of mostly minimized
language-facilities and b) you can layout data structures 1:1 like
they fit into memory (platform-dependent).
Python.
Have a look at
<https://gitlab.com/ldo/inotipy_examples/-/blob/master/fanotify_7_example?ref_type=heads>,
and compare the C original from
<https://manpages.debian.org/7/fanotify.7.en.html>. The Python code is
half the size and can use high-level async calls.
What exactly does your response have to do with producing data structures
with predefined layout?
Lawrence D'Oliveiro
2024-09-13 22:04:43 UTC
What exactly does your response have to do with producing data structures with
predefined layout?
Look at those structures: they have a specific predefined layout.
Bart
2024-09-13 22:48:38 UTC
Post by Lawrence D'Oliveiro
What exactly does your response have to do with producing data structures with
predefined layout?
Look at those structures: they have a specific predefined layout.
Look at them where? One link is a man-page with several C structs
defined (triple-spaced for some reason).

But I can't see anything in the Python link that looks like it might be
defining a struct layout.

So I would also question what it has to do with it.
Lawrence D'Oliveiro
2024-09-14 01:41:02 UTC
Post by Lawrence D'Oliveiro
Post by Michael S
What exactly does your response have to do with producing data structures
with predefined layout?
Look at those structures: they have a specific predefined layout.
One link is a man-page with several C structs defined ...
Correct. Structures that the Python wrapper is able to map exactly.

And the choice between which particular structure variants to use is
dynamic, dependent on the event type. So the Python wrapper is able to
dynamically generate a suitable type-safe wrapper -- something that a
statically-typed language cannot do.
Bart
2024-09-14 09:58:34 UTC
Post by Lawrence D'Oliveiro
Post by Lawrence D'Oliveiro
Post by Michael S
What exactly does your response have to do with producing data structures
with predefined layout?
Look at those structures: they have a specific predefined layout.
One link is a man-page with several C structs defined ...
Correct. Structures that the Python wrapper is able to map exactly.
And the choice between which particular structure variants to use is
dynamic, dependent on the event type. So the Python wrapper is able to
dynamically generate a suitable type-safe wrapper -- something that a
statically-typed language cannot do.
So, where IS the struct defined in that Python code? Which line number?

If it is defined elsewhere, then that Python proves nothing.

For example, where is this struct:

struct fanotify_event_info_header {
    __u8 info_type;
    __u8 pad;
    __u16 len;
};

defined in that Python? I have an idea how this might be done using the
ctypes module for example, but it's not pretty. However I'm not even
seeing that.
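
For comparison, this is roughly what pinning down that layout looks like on the C side. It is a sketch only: info_header is a stand-in for the kernel struct, with uint8_t/uint16_t assumed to be equivalent to __u8/__u16, and read_info_header is an invented helper. The static assertions state the layout explicitly rather than leaving it implied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct fanotify_event_info_header (assumed field types). */
struct info_header {
    uint8_t  info_type;
    uint8_t  pad;
    uint16_t len;
};

/* Fail the build if the compiler's layout is not the expected 1+1+2 bytes. */
_Static_assert(sizeof(struct info_header) == 4, "unexpected padding");
_Static_assert(offsetof(struct info_header, info_type) == 0, "info_type offset");
_Static_assert(offsetof(struct info_header, pad) == 1, "pad offset");
_Static_assert(offsetof(struct info_header, len) == 2, "len offset");

/* Copy a header out of a raw event buffer; memcpy avoids any
   assumption about the alignment of buf. */
struct info_header read_info_header(const unsigned char *buf)
{
    struct info_header h;

    assert(buf != NULL);
    memcpy(&h, buf, sizeof h);
    return h;
}
```

Any binding in another language has to reproduce the same four offsets somewhere, whether by hand or by generating them.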
Lawrence D'Oliveiro
2024-09-14 22:37:49 UTC
Post by Bart
So, where IS the struct defined in that Python code?
In the API wrapper module, of course.

<https://gitlab.com/ldo/inotipy>
Janis Papanagnou
2024-09-12 15:46:44 UTC
Spaces Hard tabs
C++ 829 682 characters
You are counting spaces, tabs and characters to characterize programs'
quality or legibility or what? - Abandon all hope ye who enter here.

Janis
Bart
2024-09-12 16:15:35 UTC
Post by Janis Papanagnou
Spaces Hard tabs
C++ 829 682 characters
You are counting spaces, tabs and characters to characterize programs'
quality or legibility or what? - Abandon all hope ye who enter here.
I'm counting the number of characters needed to express the function.
Since one of BM's claims is that the C++ example was smaller than C.

The difference between the two columns is whether indentation uses hard
tabs or spaces. The C version is more deeply indented so that makes a
difference. (Also the width of the tabs, but everything was measured
with tabs set to 4 characters.)
Bonita Montero
2024-09-12 16:26:48 UTC
Post by Bart
I'm counting the number of characters needed to express the function.
Since one of BM's claims is that the C++ example was smaller than C.
That was a general statement about C++ and not on my code. I'm using
abstractions like iterators for the benefit of bounds-checking while
debugging but the code is similar.
But usually in C++ you write a fifth of the code or less. Look how much
work a simple vector<T>::emplace_back() saves over manually relocating a
complex vector-like data structure in C. Or consider how convenient a map
or unordered_map is compared with, for example, something like libavl.
This stupid character-counting from a simple example shows that Bart
has no professional C++ experience.
Post by Bart
The difference between the two columns is whether indentation uses hard
tabs or spaces. The C version is more deeply indented so that makes a
difference. (Also the width of the tabs, but everything was measured
with tabs set to 4 characters.)
That's as ridiculous as Bart's discussion.
Bart
2024-09-12 16:28:27 UTC
On Thu, 12 Sep 2024 14:44:03 +0100
Apart from unnecessary ilen limit, of unnecessary goto into block (I
have nothing against forward gotos out of blocks, but gotos into blocks
make me nervous) and of variable 'length' that serves no purpose, your
code simply does not fulfill requirements of OP.
I can immediately see two gotchas: no handling of escaped double
quotation marks \" and no handling of single quotation marks. Quite
possibly there are additional omissions.
BM's C++ version doesn't handle embedded quotes or single quotes either.
Neither expands wildcards into sequences of filename arguments.

But you're right about 'length' which in the end was not used. It makes
the C version even smaller without it.

I wasn't trying to match the OP's requirements, as I don't know what
they are.

If this has to exactly match how the OS parses the command line into
separate parameters, then that's likely to be a significantly more
complex program, especially if it is to run on Linux.

There's probably no point in trying to create such a program; you'd need
to find a way of utilising the OS to do the work.
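
On POSIX systems one candidate for letting the platform do the work is wordexp(3), which performs shell-style word splitting, quote removal and pathname expansion. A minimal sketch follows; the wrapper name split_like_shell is invented. Note that wordexp() also performs tilde and variable expansion, which goes beyond what the OP asked for, and that WRDE_NOCMD is used here so that command substitution in the input is rejected rather than executed.

```c
#include <assert.h>
#include <stddef.h>
#include <wordexp.h>

/* Hypothetical wrapper: split line into words the way a POSIX shell would.
   Returns 0 on success, otherwise one of the WRDE_* error codes.
   On success the words are in out->we_wordv[0 .. out->we_wordc-1] and the
   caller releases them with wordfree(out). */
int split_like_shell(const char *line, wordexp_t *out)
{
    assert(line != NULL && out != NULL);
    return wordexp(line, out, WRDE_NOCMD);
}
```

Unquoted shell operators such as | & ; < > in the input make the call fail with WRDE_BADCHAR, so this only suits strings that really are plain argument lists.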

Note that I wasn't posting to solve the OP's problem, but as a
counter-example to that C++ code which literally hurt my eyes to look at.
Bonita Montero
2024-09-12 17:02:35 UTC
Post by Bart
BM's C++ version doesn't handle embedded quotes or single quotes either.
Neither expands wildcards into sequences of filename arguments.
Ok, that must be impossible with C++.
I just wanted to show how to do it basically and what are the
advantages: no intermediate data structure through functional
programming and debug iterators.
Michael S
2024-09-12 19:38:28 UTC
On Thu, 12 Sep 2024 19:02:35 +0200
Post by Bonita Montero
Post by Bart
BM's C++ version doesn't handle embedded quotes or single quotes either.
Neither expands wildcards into sequences of filename arguments.
Ok, that must be impossible with C++.
I just wanted to show how to do it basically and what are the
advantages: no intermediate data structure through functional
programming and debug iterators.
Callback is as easy in C as in C++.
Debug iterators are not needed in such a simple program. At least, I don't
need them.
Here is an equivalent of your parser written in C. It does not look 5
times longer.

Attention! That is an equivalent of Bonita's code, no more and
hopefully no less. The routine does not fulfill requirements of OP!

#include <stddef.h>

void parse(const char* src,
           void (*OnToken)(const char* beg, size_t len, void* context),
           void* context) {
  char c0 = ' ', c1 = '\t';
  const char* beg = 0;
  for (;;src++) {
    char c = *src;
    if (c == c0 || c == c1 || c == 0) {
      if (beg) {
        OnToken(beg, src-beg, context);
        c0 = ' ', c1 = '\t';
        beg = 0;
      }
      if (c == 0)
        break;
    } else if (!beg) {
      beg = src;
      if (c == '"') {
        c0 = c1 = c;
        ++beg;
      }
    }
  }
}
Bonita Montero
2024-09-13 05:28:34 UTC
Post by Michael S
Callback is as easy in C as in C++.
Absolutely not because callbacks can't have state in C.
Michael S
2024-09-13 08:38:15 UTC
On Fri, 13 Sep 2024 07:28:34 +0200
Post by Bonita Montero
Post by Michael S
Callback is as easy in C as in C++.
Absolutely not because callbacks can't have state in C.
So what is 'context' parameter in my code?
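
To make the point concrete, here is a minimal sketch of a callback that keeps state across calls through that parameter. parse() is the routine posted earlier in the thread, repeated so the example is self-contained; token_log and log_token are invented names for the caller's side.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* parse() as posted in the thread (quote-aware whitespace splitter). */
void parse(const char *src,
           void (*OnToken)(const char *beg, size_t len, void *context),
           void *context)
{
    char c0 = ' ', c1 = '\t';
    const char *beg = 0;
    for (;; src++) {
        char c = *src;
        if (c == c0 || c == c1 || c == 0) {
            if (beg) {
                OnToken(beg, (size_t)(src - beg), context);
                c0 = ' ', c1 = '\t';
                beg = 0;
            }
            if (c == 0)
                break;
        } else if (!beg) {
            beg = src;
            if (c == '"') {      /* quoted token: only '"' ends it */
                c0 = c1 = c;
                ++beg;
            }
        }
    }
}

/* Hypothetical caller-side state, reached only through 'context'. */
struct token_log {
    size_t count;      /* tokens seen so far */
    size_t longest;    /* length of the longest token */
    char   last[64];   /* copy of the most recent token, truncated to fit */
};

void log_token(const char *beg, size_t len, void *context)
{
    struct token_log *log = context;
    size_t n = len < sizeof log->last - 1 ? len : sizeof log->last - 1;

    assert(log != NULL);
    log->count++;
    if (len > log->longest)
        log->longest = len;
    memcpy(log->last, beg, n);
    log->last[n] = '\0';
}
```

A caller declares a zero-initialised struct token_log, passes its address as the third argument of parse(), and reads the fields afterwards; no globals are involved.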
Bonita Montero
2024-09-13 12:12:32 UTC
Post by Michael S
On Fri, 13 Sep 2024 07:28:34 +0200
Post by Bonita Montero
Post by Michael S
Callback is as easy in C as in C++.
Absolutely not because callbacks can't have state in C.
So what is 'context' parameter in my code?
In C++ the state is its own internal "this"-like object and you don't
need any explicit parameters. Just a [&] and the lambda refers to the
whole outer context.
Michael S
2024-09-13 12:25:00 UTC
On Fri, 13 Sep 2024 14:12:32 +0200
Post by Bonita Montero
Post by Michael S
On Fri, 13 Sep 2024 07:28:34 +0200
Post by Bonita Montero
Post by Michael S
Callback is as easy in C as in C++.
Absolutely not because callbacks can't have state in C.
So what is 'context' parameter in my code?
In C++ the state is its own internal "this"-like object and you don't
need any explicit parameters.
So, do you admit that callback in C can have state?
Post by Bonita Montero
Just a [&] and the lambda refers to the whole outer context.
Bad software engineering practice that easily leads to incomprehensible
code.
When in C++ and not in mood for C-style, I very much prefer functors.
Ideologically they are the same as C-style context, but a little
sugarized syntactically.
Bonita Montero
2024-09-13 13:20:25 UTC
Post by Michael S
On Fri, 13 Sep 2024 14:12:32 +0200
Post by Bonita Montero
Post by Michael S
On Fri, 13 Sep 2024 07:28:34 +0200
Post by Bonita Montero
Post by Michael S
Callback is as easy in C as in C++.
Absolutely not because callbacks can't have state in C.
So what is 'context' parameter in my code?
In C++ the state is its own internal "this"-like object and you don't
need any explicit parameters.
So, do you admit that callback in C can have state?
No, because this is a parameter and a lambda is a glue of an
object and a calling operator. The object is the state and a
C function-pointer misses that.
Post by Michael S
Bad software engineering practice that easily leads to incomprehensible
code.
I'm using this convention with nearly every lambda, and if I can't
remember later which outer variables are used I remove the &; the
locals that are then no longer captured are underlined in red, and once
I've noticed which locals were used I press ^Z.
Post by Michael S
When in C++ and not in mood for C-style, I very much prefer functors.
Ideologically they are the same as C-style context, but a little
sugarized syntactically.
No, a C++ functor may be an object with a calling operator. In C you
don't have the implicit object; that's magnitudes less convenient. C
is always multiple times more work.
Lawrence D'Oliveiro
2024-09-13 22:24:00 UTC
Post by Bonita Montero
In C++ the state is its own internal "this"-like object and you don't
need any explicit parameters.
But you need a calling convention that passes “this” explicitly.
Bonita Montero
2024-09-13 23:42:05 UTC
Post by Lawrence D'Oliveiro
Post by Bonita Montero
In C++ the state is its own internal "this"-like object and you don't
need any explicit parameters.
But you need a calling convention that passes “this” explicitly.
That's not part of the C++-language.
Lawrence D'Oliveiro
2024-09-14 01:41:32 UTC
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Bonita Montero
In C++ the state is its own internal "this"-like object and you don't
need any explicit parameters.
But you need a calling convention that passes “this” explicitly.
That's not part of the C++-language.
If the implementation doesn’t do it, it doesn’t work.
Tim Rentsch
2024-09-13 16:05:04 UTC
Michael S <***@yahoo.com> writes:

[..iterate over words in a string..]
Post by Michael S
#include <stddef.h>
void parse(const char* src,
void (*OnToken)(const char* beg, size_t len, void* context),
void* context) {
char c0 = ' ', c1 = '\t';
const char* beg = 0;
for (;;src++) {
char c = *src;
if (c == c0 || c == c1 || c == 0) {
if (beg) {
OnToken(beg, src-beg, context);
c0 = ' ', c1 = '\t';
beg = 0;
}
if (c == 0)
break;
} else if (!beg) {
beg = src;
if (c == '"') {
c0 = c1 = c;
++beg;
}
}
}
}
I couldn't resist writing some code along similar lines. The
entry point is words_do(), which returns one on success and
zero if the end of string is reached inside double quotes.


typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

static _Bool collect_word( const char *, const char *, _Bool, Gopher );
static _Bool is_space( char );


_Bool
words_do( const char *s, Gopher go ){
    char c = *s;

    return
        is_space(c) ? words_do( s+1, go ) :
        c ? collect_word( s, s, 1, go ) :
        /***************/ 1;
}

_Bool
collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;

    return
        c == 0 ? go->f( go, r, s ), w :
        is_space(c) && w ? go->f( go, r, s ), words_do( s, go ) :
        /***************/ collect_word( s+1, r, w ^ c == '"', go );
}

_Bool
is_space( char c ){
    return c == ' ' || c == '\t';
}
Michael S
2024-09-15 09:22:11 UTC
On Fri, 13 Sep 2024 09:05:04 -0700
Post by Tim Rentsch
[..iterate over words in a string..]
Post by Michael S
#include <stddef.h>
void parse(const char* src,
void (*OnToken)(const char* beg, size_t len, void* context),
void* context) {
char c0 = ' ', c1 = '\t';
const char* beg = 0;
for (;;src++) {
char c = *src;
if (c == c0 || c == c1 || c == 0) {
if (beg) {
OnToken(beg, src-beg, context);
c0 = ' ', c1 = '\t';
beg = 0;
}
if (c == 0)
break;
} else if (!beg) {
beg = src;
if (c == '"') {
c0 = c1 = c;
++beg;
}
}
}
}
I couldn't resist writing some code along similar lines. The
entry point is words_do(), which returns one on success and
zero if the end of string is reached inside double quotes.
typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * ); };
static _Bool collect_word( const char *, const char *, _Bool, Gopher );
static _Bool is_space( char );
_Bool
words_do( const char *s, Gopher go ){
char c = *s;
return
is_space(c) ? words_do( s+1, go )
: c ? collect_word( s, s, 1, go )
: /***************/ 1;
}
_Bool
collect_word( const char *s, const char *r, _Bool w, Gopher go ){
char c = *s;
return
c == 0 ? go->f( go, r, s ), w
: is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
: /***************/ collect_word( s+1, r, w ^ c == '"', go );
}
_Bool
is_space( char c ){
return c == ' ' || c == '\t';
}
Can you give an example implementation of go->f() ?
It seems to me that it would have to use CONTAINING_RECORD or
container_of or analogous non-standard macro.

Also, while formally the program is written in C, in spirit it's
something else. Maybe Lisp.
Lisp compilers are known to be very good at tail call elimination.
C compilers also can do it, but not reliably. In this particular case I
am afraid that common C compilers will implement it as written, i.e.
without turning recursion into iteration.

Tested on godbolt.
gcc -O2 turns it into iteration starting from v.4.4
clang -O2 turns it into iteration starting from v.4.0
Latest icc still does not turn it into iteration along at least one
code path.
Latest MSVC implements it as written, 100% recursion.
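
For compilers that do not eliminate the tail calls, the same state machine can be written as an explicit loop. What follows is only a sketch, assuming the Gopher interface from the quoted code; the names words_do_iter, WordCounter and count_word are made up for illustration.

```c
#include <stdbool.h>

typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

static bool is_space( char c ){
    return c == ' ' || c == '\t';
}

/* Hypothetical loop-based equivalent of words_do()/collect_word().
   Returns true on success, false if the string ends inside double quotes. */
bool words_do_iter( const char *s, Gopher go ){
    for (;;) {
        while (is_space(*s)) s++;          /* skip separators */
        if (*s == 0) return true;          /* no more words */

        const char *r = s;                 /* start of the current word */
        bool w = true;                     /* true while outside quotes */
        while (*s != 0 && !(is_space(*s) && w)) {
            if (*s == '"') w = !w;         /* toggle quoted state */
            s++;
        }
        go->f( go, r, s );                 /* report the word [r, s) */
        if (*s == 0) return w;             /* false if a quote is left open */
    }
}

/* Small client used to try it out: counts the words reported. */
typedef struct { struct gopher_s go; unsigned words; } WordCounter;

static void count_word( Gopher go, const char *s, const char *t ){
    (void) s; (void) t;                    /* boundaries unused here */
    ((WordCounter *)(void *) go)->words++;
}
```

Since there are no calls left except the indirect one through go->f, the stack depth no longer depends on the optimizer.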
Tim Rentsch
2024-09-16 07:52:26 UTC
Michael S <***@yahoo.com> writes:

[comments reordered]
Post by Michael S
Also, while formally the program is written in C, in spirit it's
something else. Maybe Lisp.
I would call it a functional style, but still C. Not a C style
as most people are used to seeing it, I grant you that. I still
think of it as C though.
Post by Michael S
Lisp compilers are known to be very good at tail call elimination.
C compilers also can do it, but not reliably. In this particular
case I am afraid that common C compilers will implement it as
written, i.e. without turning recursion into iteration.
I routinely use gcc and clang, and both are good at turning
this kind of mutual recursion into iteration (-Os or higher,
although clang was able to eliminate all the recursion at -O1).
I agree the recursion elimination is not as reliable as one
would like; in practice though I find it quite usable.
Post by Michael S
Tested on godbolt.
gcc -O2 turns it into iteration starting from v.4.4
clang -O2 turns it into iteration starting from v.4.0
Both as expected.
Post by Michael S
Latest icc still does not turn it into iteration along at least one
code path.
That's disappointing, but good to know.
Post by Michael S
Latest MSVC implements it as written, 100% recursion.
I'm not surprised at all. In my admittedly very limited experience,
MSVC is garbage.
Post by Michael S
Can you give an example implementation of go->f() ?
It seems to me that it would have to use CONTAINING_RECORD or
container_of or analogous non-standard macro.
You say that like you think such macros don't have well-defined
behavior. If I needed such a macro probably I would just
define it myself (and would be confident that it would
work correctly).

In this case I don't need a macro because I would put the gopher
struct at the beginning of the containing struct. For example:

#include <stdio.h>

typedef struct {
struct gopher_s go;
unsigned words;
} WordCounter;


static void
print_word( Gopher go, const char *s, const char *t ){
WordCounter *context = (void*) go;
int n = t-s;

printf( " word: %.*s\n", n, s );
context->words ++;
}


int
main(){
WordCounter wc = { { print_word }, 0 };
char *words = "\tthe quick \"brown fox\" jumps over the lazy dog.";

words_do( words, &wc.go );
printf( "\n" );
printf( " There were %u words found\n", wc.words );
return 0;
}
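
For the case where the gopher struct cannot be the first member, a container_of-style macro can be built on offsetof from <stddef.h>. This is a minimal sketch, not code from the thread; CONTAINER_OF, LateCounter and count_word are names invented for the example.

```c
#include <stddef.h>

typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

/* Hypothetical helper: recover a pointer to the enclosing struct
   from a pointer to one of its members. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct {
    unsigned words;          /* deliberately placed before the gopher */
    struct gopher_s go;
} LateCounter;

static void count_word( Gopher go, const char *s, const char *t ){
    LateCounter *context = CONTAINER_OF( go, LateCounter, go );
    (void) s; (void) t;      /* word boundaries unused in this sketch */
    context->words++;
}
```

The caller would still pass &counter.go to words_do(); only the way the callback gets back to its context changes.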
Michael S
2024-09-16 09:23:38 UTC
On Mon, 16 Sep 2024 00:52:26 -0700
Post by Tim Rentsch
[comments reordered]
Post by Michael S
Also, while formally the program is written in C, in spirit it's
something else. Maybe Lisp.
I would call it a functional style, but still C. Not a C style
as most people are used to seeing it, I grant you that. I still
think of it as C though.
Post by Michael S
Lisp compilers are known to be very good at tail call elimination.
C compilers also can do it, but not reliably. In this particular
case I am afraid that common C compilers will implement it as
written, i.e. without turning recursion into iteration.
I routinely use gcc and clang, and both are good at turning
this kind of mutual recursion into iteration (-Os or higher,
although clang was able to eliminate all the recursion at -O1).
I agree the recursion elimination is not as reliable as one
would like; in practice though I find it quite usable.
Post by Michael S
Tested on godbolt.
gcc -O2 turns it into iteration starting from v.4.4
clang -O2 turns it into iteration starting from v.4.0
Both as expected.
So, only 15 years for gcc and only 7 years for clang.
Post by Tim Rentsch
Post by Michael S
Latest icc still does not turn it into iteration along at least one
code path.
That's disappointing, but good to know.
Post by Michael S
Latest MSVC implements it as written, 100% recursion.
I'm not surprised at all. In my admittedly very limited experience,
MSVC is garbage.
For the sort of code that is important to me, gcc, clang and MSVC tend to
generate code of similar quality. clang is the most suspect of the three,
sometimes unexpectedly producing utter crap. On the other hand, it is
sometimes the most brilliant.
In the case of gcc, I hate that they recently put tree-slp-vectorize under
the -O2 umbrella.
Post by Tim Rentsch
Post by Michael S
Can you give an example implementation of go->f() ?
It seems to me that it would have to use CONTAINING_RECORD or
container_of or analogous non-standard macro.
You say that like you think such macros don't have well-defined
behavior. If I needed such a macro probably I would just
define it myself (and would be confident that it would
work correctly).
In this case I don't need a macro because I would put the gopher
struct at the beginning of the containing struct. For example:
#include <stdio.h>
typedef struct {
struct gopher_s go;
unsigned words;
} WordCounter;
static void
print_word( Gopher go, const char *s, const char *t ){
WordCounter *context = (void*) go;
That's what I was missing. Simple and adequate.
Post by Tim Rentsch
int n = t-s;
printf( " word: %.*s\n", n, s );
context->words ++;
}
int
main(){
WordCounter wc = { { print_word }, 0 };
char *words = "\tthe quick \"brown fox\" jumps over the lazy dog.";
words_do( words, &wc.go );
printf( "\n" );
printf( " There were %u words found\n", wc.words );
return 0;
}
There are a couple of differences between your parsing and mine.
1. "42""43"
You parse it as a single word, I split it. It seems your behavior is
closer to that of both bash and cmd.exe.
2. I strip " characters from "-delimited words. You seem to leave them.
In this case what I do is more similar to both bash and cmd.exe.

Not that it matters.
Tim Rentsch
2024-09-17 10:12:04 UTC
Post by Michael S
On Mon, 16 Sep 2024 00:52:26 -0700
Post by Tim Rentsch
[comments reordered]
Post by Michael S
Also, while formally the program is written in C, in spirit it's
something else. Maybe Lisp.
I would call it a functional style, but still C. Not a C style
as most people are used to seeing it, I grant you that. I still
think of it as C though.
Post by Michael S
Lisp compilers are known to be very good at tail call elimination.
C compilers also can do it, but not reliably. In this particular
case I am afraid that common C compilers will implement it as
written, i.e. without turning recursion into iteration.
I routinely use gcc and clang, and both are good at turning
this kind of mutual recursion into iteration (-Os or higher,
although clang was able to eliminate all the recursion at -O1).
I agree the recursion elimination is not as reliable as one
would like; in practice though I find it quite usable.
Post by Michael S
Tested on godbolt.
gcc -O2 turns it into iteration starting from v.4.4
clang -O2 turns it into iteration starting from v.4.0
Latest icc still does not turn it into iteration along at least one
code path.
That's disappointing, but good to know.
Post by Michael S
Latest MSVC implements it as written, 100% recursion.
I'm not surprised at all. In my admittedly very limited experience,
MSVC is garbage.
For the sort of code that is important to me, gcc, clang and MSVC tend to
generate code of similar quality.
To clarify, my earlier comment about MSVC is about what it thinks
the language is, not anything about quality of generated code. But
the lack of tail call elimination fits in with what else I have
seen.
Post by Michael S
clang is the most suspect of the three, sometimes unexpectedly
producing utter crap. On the other hand, it is sometimes the most
brilliant.
That's interesting. Recently I encountered a problem where clang
did just fine but gcc generated bad code under -O3.
Post by Michael S
In the case of gcc, I hate that they recently put tree-slp-vectorize
under the -O2 umbrella.
Yes, gcc is like a box of chocolates - you never know what you're
going to get.
Post by Michael S
Post by Tim Rentsch
Post by Michael S
Can you give an example implementation of go->f() ?
It seems to me that it would have to use CONTAINING_RECORD or
container_of or analogous non-standard macro.
You say that like you think such macros don't have well-defined
behavior. If I needed such a macro probably I would just
define it myself (and would be confident that it would
work correctly).
In this case I don't need a macro because I would put the gopher
struct at the beginning of the containing struct. For example:
#include <stdio.h>
typedef struct {
struct gopher_s go;
unsigned words;
} WordCounter;
static void
print_word( Gopher go, const char *s, const char *t ){
WordCounter *context = (void*) go;
That's what I was missing. Simple and adequate.
I now prefer this technique for callbacks. Cuts down on the
number of parameters, safer than a (void*) parameter, and it puts
the function pointer near the context state so it's easier to
connect the two (and less worry about them getting out of sync).
Post by Michael S
Post by Tim Rentsch
int n = t-s;
printf( " word: %.*s\n", n, s );
context->words ++;
}
int
main(){
WordCounter wc = { { print_word }, 0 };
char *words = "\tthe quick \"brown fox\" jumps over the lazy dog.";
words_do( words, &wc.go );
printf( "\n" );
printf( " There were %u words found\n", wc.words );
return 0;
}
There are a couple of differences between your parsing and mine.
1. "42""43"
You parse it as a single word, I split it. It seems your behavior is
closer to that of both bash and cmd.exe.
Yes. I chose that deliberately because I often use patterns like
foo."$suffix" and it made sense to allow quoted subparts for that
reason.
Post by Michael S
2. I strip " characters from "-delimited words. You seem to leave them.
In this case what I do is more similar to both bash and cmd.exe
I do, both because it's easier, and in case the caller wants to
know where the quotes are. If it's important to strip them out
it's up to the caller to do that.
Post by Michael S
Not that it matters.
Yeah. These choices are only minor details; the general
approach taken is the main thing.
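
Stripping the quotes on the caller's side could look roughly like this. It is a sketch only: copy_unquoted is a made-up helper name, and it assumes the word is delimited by the two pointers that the callback receives.

```c
#include <stddef.h>

/* Hypothetical helper: copy the word [s, t) into dst, dropping every
   double quote, so foo."bar baz" becomes foo.bar baz.  At most
   dstsize-1 characters are stored and the result is always terminated.
   Returns the number of characters stored. */
size_t copy_unquoted( char *dst, size_t dstsize, const char *s, const char *t ){
    size_t n = 0;

    if (dstsize == 0) return 0;
    for ( ; s < t; s++) {
        if (*s == '"') continue;               /* skip the quote itself */
        if (n + 1 < dstsize) dst[n++] = *s;    /* silently truncate if full */
    }
    dst[n] = 0;
    return n;
}
```

A callback such as print_word() could call this on its (s, t) pair before using the word, which keeps the tokenizer itself free of any copying.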
a***@fricas.org
2024-09-17 22:34:33 UTC
Post by Michael S
On Fri, 13 Sep 2024 09:05:04 -0700
Post by Tim Rentsch
[..iterate over words in a string..]
I couldn't resist writing some code along similar lines. The
entry point is words_do(), which returns one on success and
zero if the end of string is reached inside double quotes.
typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * ); };
static _Bool collect_word( const char *, const char *, _Bool, Gopher );
static _Bool is_space( char );
_Bool
words_do( const char *s, Gopher go ){
char c = *s;
return
is_space(c) ? words_do( s+1, go )
: c ? collect_word( s, s, 1, go )
: /***************/ 1;
}
_Bool
collect_word( const char *s, const char *r, _Bool w, Gopher go ){
char c = *s;
return
c == 0 ? go->f( go, r, s ), w
: is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
: /***************/ collect_word( s+1, r, w ^ c == '"', go );
}
_Bool
is_space( char c ){
return c == ' ' || c == '\t';
}
<snip>
Post by Michael S
Tested on godbolt.
gcc -O2 turns it into iteration starting from v.4.4
clang -O2 turns it into iteration starting from v.4.0
Latest icc still does not turn it into iteration along at least one
code path.
Latest MSVC implements it as written, 100% recursion.
I tested using gcc 12. AFAICS the calls to 'go->f' in 'collect_word'
are not tail calls, and gcc 12 compiles them as normal calls.
The other calls are compiled to jumps. But the call to 'collect_word'
in 'words_do' is not a "sibcall", and depending on the calling convention
the compiler may treat it as a normal call. The two other calls, that is,
the call to 'words_do' in 'words_do' and the call to 'collect_word' in
'collect_word', are clearly tail self-recursion, and the compiler
should always optimize them to a jump.
--
Waldek Hebisch
Tim Rentsch
2024-09-17 23:33:16 UTC
Post by a***@fricas.org
Post by Michael S
On Fri, 13 Sep 2024 09:05:04 -0700
Post by Tim Rentsch
[..iterate over words in a string..]
I couldn't resist writing some code along similar lines. The
entry point is words_do(), which returns one on success and
zero if the end of string is reached inside double quotes.
typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * ); };
static _Bool collect_word( const char *, const char *, _Bool, Gopher );
static _Bool is_space( char );
_Bool
words_do( const char *s, Gopher go ){
char c = *s;
return
is_space(c) ? words_do( s+1, go )
: c ? collect_word( s, s, 1, go )
: /***************/ 1;
}
_Bool
collect_word( const char *s, const char *r, _Bool w, Gopher go ){
char c = *s;
return
c == 0 ? go->f( go, r, s ), w
: is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
: /***************/ collect_word( s+1, r, w ^ c == '"', go );
}
_Bool
is_space( char c ){
return c == ' ' || c == '\t';
}
<snip>
Post by Michael S
Tested on godbolt.
gcc -O2 turns it into iteration starting from v.4.4
clang -O2 turns it into iteration starting from v.4.0
Latest icc still does not turn it into iteration along at least one
code path.
Latest MSVC implements it as written, 100% recursion.
I tested using gcc 12. AFAICS the calls to 'go->f' in 'collect_word'
are not tail calls, and gcc 12 compiles them as normal calls.
Right, they are not tail calls, simply ordinary calls (indirect
calls, but still ordinary calls).
Post by a***@fricas.org
The other calls are compiled to jumps. But the call to 'collect_word'
in 'words_do' is not a "sibcall", and depending on the calling convention
the compiler may treat it as a normal call. The two other calls, that is,
the call to 'words_do' in 'words_do' and the call to 'collect_word' in
'collect_word', are clearly tail self-recursion, and the compiler
should always optimize them to a jump.
Yes, a different set of calling conventions could result in the
call to collect_word from words_do being a normal call. It
should be possible to correct that by adding two dummy parameters
to words_do(), and wrapping the result in one outer function so
that there is at most one extra call besides the call from outside.
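
One possible reading of that suggestion, as a sketch: the space-skipping function gets the same parameter list as collect_word(), and a thin outer words_do() starts it off. The name skip_spaces and the test client (WordCounter, count_word) are invented here; whether a given compiler then emits jumps still depends on the compiler and its options.

```c
typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

static _Bool skip_spaces( const char *, const char *, _Bool, Gopher );
static _Bool collect_word( const char *, const char *, _Bool, Gopher );
static _Bool is_space( char );

/* Outer entry point: the only call that need not become a jump. */
_Bool words_do( const char *s, Gopher go ){
    return skip_spaces( s, s, 1, go );
}

/* Same parameter list as collect_word(); r and w are dummies here. */
static _Bool skip_spaces( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;

    return
        is_space(c) ? skip_spaces( s+1, r, w, go ) :
        c           ? collect_word( s, s, 1, go ) :
        /*********/   1;
}

static _Bool collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;

    return
        c == 0           ? go->f( go, r, s ), w :
        is_space(c) && w ? go->f( go, r, s ), skip_spaces( s, s, 1, go ) :
        /**************/   collect_word( s+1, r, w ^ (c == '"'), go );
}

static _Bool is_space( char c ){
    return c == ' ' || c == '\t';
}

/* Small client used to try it out: counts the words reported. */
typedef struct { struct gopher_s go; unsigned words; } WordCounter;

static void count_word( Gopher go, const char *s, const char *t ){
    (void) s; (void) t;
    ((WordCounter *)(void *) go)->words++;
}
```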
Michael S
2024-09-17 23:46:11 UTC
On Tue, 17 Sep 2024 22:34:33 -0000 (UTC)
Post by a***@fricas.org
Post by Michael S
On Fri, 13 Sep 2024 09:05:04 -0700
Post by Tim Rentsch
[..iterate over words in a string..]
I couldn't resist writing some code along similar lines. The
entry point is words_do(), which returns one on success and
zero if the end of string is reached inside double quotes.
typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * ); };
static _Bool collect_word( const char *, const char *, _Bool, Gopher );
static _Bool is_space( char );
_Bool
words_do( const char *s, Gopher go ){
char c = *s;
return
is_space(c) ? words_do( s+1, go )
: c ? collect_word( s, s, 1, go )
: /***************/ 1;
}
_Bool
collect_word( const char *s, const char *r, _Bool w, Gopher go ){
char c = *s;
return
c == 0 ? go->f( go, r, s ), w
: is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
: /***************/ collect_word( s+1, r, w ^ c == '"', go );
}
_Bool
is_space( char c ){
return c == ' ' || c == '\t';
}
<snip>
Post by Michael S
Tested on godbolt.
gcc -O2 turns it into iteration starting from v.4.4
clang -O2 turns it into iteration starting from v.4.0
Latest icc still does not turn it into iteration along at least one
code path.
Latest MSVC implements it as written, 100% recursion.
I tested using gcc 12. AFAICS the calls to 'go->f' in 'collect_word'
are not tail calls, and gcc 12 compiles them as normal calls.
Naturally.
Post by a***@fricas.org
The other calls are compiled to jumps. But the call to 'collect_word'
in 'words_do' is not a "sibcall", and depending on the calling convention
the compiler may treat it as a normal call. The two other calls, that is,
the call to 'words_do' in 'words_do' and the call to 'collect_word' in
'collect_word', are clearly tail self-recursion, and the compiler
should always optimize them to a jump.
"Should" or not, MSVC does not eliminate them.

The funny thing is that it does eliminate all four calls after I rewrote
the code in more boring style.

_Bool
words_do( const char *s, Gopher go ){
char c = *s;
#if 1
if (is_space(c))
return words_do( s+1, go );
if (c)
return collect_word( s, s, 1, go );
return 1;
#else
return
is_space(c) ? words_do( s+1, go ) :
c ? collect_word( s, s, 1, go ):
/***************/ 1;
#endif
}

static
_Bool
collect_word( const char *s, const char *r, _Bool w, Gopher go ){
char c = *s;
#if 1
if (c == 0) {
go->f( go, r, s );
return w;
}
if (is_space(c) && w) {
go->f( go, r, s );
return words_do( s, go );
}
return collect_word( s+1, r, w ^ c == '"', go );
#else
return
c == 0 ? go->f( go, r, s ), w :
is_space(c) && w ? go->f( go, r, s ), words_do( s, go ) :
/***************/ collect_word( s+1, r, w ^ c == '"', go );
#endif
}

Kenny McCormack
2024-09-12 17:39:46 UTC
Post by Bonita Montero
Post by Bart
BM's C++ version doesn't handle embedded quotes or single quotes either.
Neither expand wildcards into sequences of filename arguments.
Ok, that must be impossible with C++.
I just wanted to show how to do it basically and what the
advantages are: no intermediate data structure thanks to functional
programming, and debug iterators.
All of which would have been fine - and they'd probably all be raving about
what a clever boy you are - if you'd only posted it to an appropriate
newsgroup.
--
Many (most?) Trump voters voted for him because they thought if they
supported Trump enough, they'd get to *be* Trump.

Similarly, Trump believes that if *he* praises Putin enough, he'll get to *be* Putin.
Bart
2024-09-12 13:44:03 UTC
Post by Janis Papanagnou
Post by Bart
Post by Bonita Montero
Post by Bart
C++ is a simpler language? You're having a laugh!
The solutions are simpler because you've a fifth of the code as in C.
In this case, it actually needed somewhat more code, even if the line
count was half.
But your solutions are always incomprehensible because they strive for
the most advanced features possible.
I don't know of the other poster's solutions. But a quick browse seems
to show nothing incomprehensible or anything that should be difficult
to understand. (YMMV; especially if you're not familiar with C++ then
I'm sure the code may look like noise to you.)
In the given context of C and C++ I've always perceived the features
of C++ to add to comprehensibility of source code where the respective
C code required writing clumsy code and needed (unnecessary) syntactic
ballast to implement similar functions and program constructs.
Your undifferentiated complaint sounds more like someone not willing
to understand the other concepts or have a reluctance or laziness to
make yourself familiar with them.
I'm saying it's not necessary to use such advanced features to do some
trivial parsing.

I've given a C solution below. (To test outside of Windows, remove
windows.h and set cmdline to any string containing a test input or use a
local function to get the program's command line as one string.)

It uses no special features. Anybody can understand such code. Anybody
can port it to another language far more easily than the C++. (Actually
I wrote it first in my language then ported it to C. I only needed to do
1- to 0-based conversion.)

There are two things missing compared with the C++ (other than it uses
UTF8 strings):

* Individual parameters are capped in length (to 1023 chars here). This
can be solved by determining only the span of the item then working from
that.

* Handling an unknown number of parameters is not automatic:

For the latter, the example uses a fixed array size. For a dynamic array
size, call 'strtoargs' with a count of 0 to first determine the number
of args, then allocate an array and call again to populate it.


-------------------------------------------
#include <windows.h>
#include <stdio.h>
#include <string.h>

int strtoargs(char* cmd, char** dest, int count) {
enum {ilen=1024};
char item[ilen];
int n=0, length, c;
char *p=cmd, *q, *end=&item[ilen-1];

while (c=*p++) {
if (c==' ' || c=='\t')
continue;
else if (c=='"') {
length=0;
q=item;

while (c=*p++, c!='"') {
if (c==0) {
--p;
break;
} else {
if (q<end) *q++ = c;
}
}
goto store;
} else {
length=0;
q=item;
--p;

while (c=*p++, c!=' ' && c!='\t') {
if (c==0) {
--p;
break;
} else {
if (q<end) *q++ = c;
}
}

store: *q=0;
++n;
if (n<=count) dest[n-1]=strdup(item);
}
}
return n;
}

int main(void) {
char* cmdline;
enum {cap=30};
char* args[cap];
int n;

cmdline = GetCommandLineA();

n=strtoargs(cmdline, args, cap);

for (int i=0; i<n; ++i) {
if (i<cap)
printf("%d %s\n", i, args[i]);
else
printf("%d <overflow>\n", i);
}
}
-------------------------------------------
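
The two remedies mentioned in the post (working from the span of each item, and calling twice to size the array) can be combined. The following is a sketch under assumptions, not the posted function: strtoargs_span and split_cmdline are made-up names, and the splitting rules are meant to mirror the code above (blank or tab separates, a leading " starts a quoted item).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of strtoargs(): each item is copied from its span
   in cmd, so there is no fixed length cap.  Stores at most count items in
   dest and returns the total number of items found. */
int strtoargs_span(const char *cmd, char **dest, int count) {
    int n = 0;
    const char *p = cmd;

    while (*p) {
        if (*p == ' ' || *p == '\t') { ++p; continue; }

        const char *beg, *end;
        if (*p == '"') {                       /* quoted item */
            beg = ++p;
            while (*p && *p != '"') ++p;
            end = p;
            if (*p == '"') ++p;                /* skip the closing quote */
        } else {                               /* bare item */
            beg = p;
            while (*p && *p != ' ' && *p != '\t') ++p;
            end = p;
        }

        if (n < count) {
            size_t len = (size_t)(end - beg);
            char *copy = malloc(len + 1);
            if (copy) { memcpy(copy, beg, len); copy[len] = 0; }
            dest[n] = copy;                    /* NULL on allocation failure */
        }
        ++n;
    }
    return n;
}

/* Two-pass use: count first, then allocate and populate. */
char **split_cmdline(const char *cmd, int *argc_out) {
    int n = strtoargs_span(cmd, NULL, 0);      /* pass 1: count only */
    char **args = malloc(((size_t)n + 1) * sizeof *args);

    if (!args) return NULL;
    strtoargs_span(cmd, args, n);              /* pass 2: fill the array */
    args[n] = NULL;                            /* argv-style terminator */
    *argc_out = n;
    return args;
}
```

With a count of 0 the dest pointer is never dereferenced, which is what makes the counting pass safe with NULL.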
Bart
2024-09-12 14:16:02 UTC
Post by Bart
Post by Janis Papanagnou
Post by Bart
Post by Bonita Montero
Post by Bart
C++ is a simpler language? You're having a laugh!
The solutions are simpler because you've a fifth of the code as in C.
In this case, it actually needed somewhat more code, even if the line
count was half.
But your solutions are always incomprehensible because they strive for
the most advanced features possible.
I don't know of the other poster's solutions. But a quick browse seems
to show nothing incomprehensible or anything that should be difficult
to understand. (YMMV; especially if you're not familiar with C++ then
I'm sure the code may look like noise to you.)
In the given context of C and C++ I've always perceived the features
of C++ to add to comprehensibility of source code where the respective
C code required writing clumsy code and needed (unnecessary) syntactic
ballast to implement similar functions and program constructs.
Your undifferentiated complaint sounds more like someone not willing
to understand the other concepts or have a reluctance or laziness to
make yourself familiar with them.
I'm saying it's not necessary to use such advanced features to do some
trivial parsing.
I've given a C solution below.
BTW here are the source sizes for the tokeniser function. (For C++ I've
included the 'using' statement.)

       Spaces   Hard tabs

C++    829      682         characters
C      959      634
M      785      548         (My original of the C version)

So my C version is actually smaller than the C++ when using hard tabs.

In any case, the C++ is not significantly smaller than the C, and
certainly not a fifth the size.

For proper higher level solutions in different languages, below is one
of mine. That function is 107 bytes with hard tabs.

(It's not possible to just split the string on white space because of
quoted items with embedded spaces.)

-------------------------------
func strtoargs(cmdline)=
args::=()
sreadln(cmdline)

while k:=sread("n") do
args &:= k
od
args
end

println strtoargs(getcommandlinea())
Bonita Montero
2024-09-12 14:23:58 UTC
Post by Bart
So my C version is actually smaller than the C++ when using hard tabs.
Did you really do your own parsing? And your own filename expansion?
Michael S
2024-09-12 15:16:25 UTC
On Thu, 12 Sep 2024 14:44:03 +0100
Post by Bart
Post by Janis Papanagnou
Post by Bart
Post by Bonita Montero
Post by Bart
C++ is a simpler language? You're having a laugh!
The solutions are simpler because you've a fifth of the code as in C.
In this case, it actually needed somewhat more code, even if the line
count was half.
But your solutions are always incomprehensible because they strive
for the most advanced features possible.
I don't know of the other poster's solutions. But a quick browse
seems to show nothing incomprehensible or anything that should be
difficult to understand. (YMMV; especially if you're not familiar
with C++ then I'm sure the code may look like noise to you.)
In the given context of C and C++ I've always perceived the features
of C++ to add to comprehensibility of source code where the
respective C code required writing clumsy code and needed
(unnecessary) syntactic ballast to implement similar functions and
program constructs.
Your undifferentiated complaint sounds more like someone not willing
to understand the other concepts or have a reluctance or laziness to
make yourself familiar with them.
I'm saying it's not necessary to use such advanced features to do
some trivial parsing.
I've given a C solution below. (To test outside of Windows, remove
windows.h and set cmdline to any string containing a test input or
use a local function to get the program's command line as one string.)
It uses no special features. Anybody can understand such code.
Anybody can port it to another language far more easily than the C++.
(Actually I wrote it first in my language then ported it to C. I only
needed to do 1- to 0-based conversion.)
There are two things missing compared with the C++ (other than it
uses UTF8 strings):
* Individual parameters are capped in length (to 1023 chars here).
This can be solved by determining only the span of the item then
working from that.
* Handling an unknown number of parameters is not automatic:
For the latter, the example uses a fixed array size. For a dynamic
array size, call 'strtoargs' with a count of 0 to first determine the
number of args, then allocate an array and call again to populate it.
-------------------------------------------
#include <windows.h>
#include <stdio.h>
#include <string.h>
int strtoargs(char* cmd, char** dest, int count) {
enum {ilen=1024};
char item[ilen];
int n=0, length, c;
char *p=cmd, *q, *end=&item[ilen-1];
while (c=*p++) {
if (c==' ' || c=='\t')
continue;
else if (c=='"') {
length=0;
q=item;
while (c=*p++, c!='"') {
if (c==0) {
--p;
break;
} else {
if (q<end) *q++ = c;
}
}
goto store;
} else {
length=0;
q=item;
--p;
while (c=*p++, c!=' ' && c!='\t') {
if (c==0) {
--p;
break;
} else {
if (q<end) *q++ = c;
}
}
store: *q=0;
++n;
if (n<=count) dest[n-1]=strdup(item);
}
}
return n;
}
int main(void) {
char* cmdline;
enum {cap=30};
char* args[cap];
int n;
cmdline = GetCommandLineA();
n=strtoargs(cmdline, args, cap);
for (int i=0; i<n; ++i) {
if (i<cap)
printf("%d %s\n", i, args[i]);
else
printf("%d <overflow>\n", i);
}
}
-------------------------------------------
Apart from the unnecessary ilen limit, the unnecessary goto into a block (I
have nothing against forward gotos out of blocks, but gotos into blocks
make me nervous) and the variable 'length' that serves no purpose, your
code simply does not fulfill the OP's requirements.
I can immediately see two gotchas: no handling of escaped double
quotation marks \" and no handling of single quotation marks. Quite
possibly there are additional omissions.
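
To make those two gotchas concrete, here is a sketch of a word extractor that handles them. The quoting rules are an assumed small subset of POSIX shell behavior, chosen for illustration; next_word is a made-up name and this is not a drop-in replacement for the code above.

```c
#include <stddef.h>

/* Hypothetical sketch: extract the next word from *pp into out, which must
   have room for strlen(*pp)+1 chars.  Returns 1 if a word was produced,
   0 at end of input, -1 if a quote is left open.  Assumed rules:
     '...'  everything inside is literal
     "..."  backslash escapes only " and \
     \x     outside quotes, x is taken literally */
int next_word(const char **pp, char *out) {
    const char *p = *pp;
    char quote = 0;                          /* 0, '\'' or '"' */

    while (*p == ' ' || *p == '\t') ++p;     /* skip separators */
    if (*p == 0) { *pp = p; return 0; }

    for (; *p; ++p) {
        char c = *p;
        if (quote == '\'') {
            if (c == '\'') quote = 0; else *out++ = c;
        } else if (quote == '"') {
            if (c == '"') quote = 0;
            else if (c == '\\' && (p[1] == '"' || p[1] == '\\')) *out++ = *++p;
            else *out++ = c;
        } else if (c == '\'' || c == '"') {
            quote = c;                       /* open a quoted part */
        } else if (c == '\\' && p[1]) {
            *out++ = *++p;                   /* escaped char outside quotes */
        } else if (c == ' ' || c == '\t') {
            break;                           /* unquoted blank ends the word */
        } else {
            *out++ = c;
        }
    }
    *out = 0;
    *pp = p;
    return quote ? -1 : 1;
}
```

Calling it in a loop until it returns 0 yields the words one by one; quoted and unquoted parts that touch, such as foo."bar", end up in the same word.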
Scott Lurndal
2024-09-12 15:37:33 UTC
On Thu, 12 Sep 2024 14:44:03 +0100
Post by Bart
For the latter, the example uses a fixed array size. For a dynamic
array size, call 'strtoargs' with a count of 0 to first determine the
number of args, then allocate an array and call again to populate it.
-------------------------------------------
#include <windows.h>
#include <stdio.h>
#include <string.h>
int strtoargs(char* cmd, char** dest, int count) {
enum {ilen=1024};
char item[ilen];
int n=0, length, c;
char *p=cmd, *q, *end=&item[ilen-1];
while (c=*p++) {
if (c==' ' || c=='\t')
continue;
else if (c=='"') {
length=0;
q=item;
while (c=*p++, c!='"') {
if (c==0) {
--p;
break;
} else {
if (q<end) *q++ = c;
}
}
goto store;
} else {
length=0;
q=item;
--p;
while (c=*p++, c!=' ' && c!='\t') {
if (c==0) {
--p;
break;
} else {
if (q<end) *q++ = c;
}
}
store: *q=0;
++n;
if (n<=count) dest[n-1]=strdup(item);
}
}
return n;
}
int main(void) {
char* cmdline;
enum {cap=30};
char* args[cap];
int n;
cmdline = GetCommandLineA();
n=strtoargs(cmdline, args, cap);
for (int i=0; i<n; ++i) {
if (i<cap)
printf("%d %s\n", i, args[i]);
else
printf("%d <overflow>\n", i);
}
}
-------------------------------------------
Apart from the unnecessary ilen limit, the unnecessary goto into a block (I
have nothing against forward gotos out of blocks, but gotos into blocks
make me nervous) and the variable 'length' that serves no purpose, your
code simply does not fulfill the OP's requirements.
I can immediately see two gotchas: no handling of escaped double
quotation marks \" and no handling of single quotation marks. Quite
possibly there are additional omissions.
/*
* For most commands, we'll split the rest of the line into
* individual arguments, separated by whitespace. However,
* some commands may wish to process the entire remainder of
* the line as a single argument. Those commands will set the
* ce_splitargs field to zero in the command table.
*/
if (cep->ce_splitargs) {
argcount = 0;
cp = line;
while (*cp != '\0') {
if (argcount == MAX_ARGCOUNT) {
fprintf(stdout,
"Error: More than %d arguments unsupported\n",
MAX_ARGCOUNT);
return 1;
}
while (*cp != '\0' && isspace((unsigned char)*cp)) cp++; /* cast: plain char may be negative */
if (*cp == '\0') continue;
if (*cp == '"') {
in_quote = true;
cp++;
}
arglist[argcount++] = cp;
if (in_quote) {
while (*cp != '\0' && *cp != '"') cp++;
in_quote = false;
} else {
while (*cp != '\0' && !isspace((unsigned char)*cp)) cp++;
}
if (*cp == '\0') continue;
*cp++ = '\0';
}
} else {
arglist[0] = command;
arglist[1] = line;
argcount = 2;
}
Michael S
2024-09-12 15:49:11 UTC
On Thu, 12 Sep 2024 15:37:33 GMT
***@slp53.sl.home (Scott Lurndal) wrote:

<snip code from unidentified source>

Huh?
Bonita Montero
2024-09-12 14:04:12 UTC
Post by Janis Papanagnou
I don't know of the other poster's solutions. But a quick browse seems
to show nothing incomprehensible or anything that should be difficult
to understand. (YMMV; especially if you're not familiar with C++ then
I'm sure the code may look like noise to you.)
C++ shares a property with C: the language facilities are mostly so
simple that it's easy to roughly imagine the resulting code. So C++
can be written with the same mindset.
Janis Papanagnou
2024-09-12 15:30:26 UTC
Post by Bonita Montero
Post by Janis Papanagnou
I don't know of the other poster's solutions. But a quick browse seems
to show nothing incomprehensible or anything that should be difficult
to understand. (YMMV; especially if you're not familiar with C++ then
I'm sure the code may look like noise to you.)
C++ shares a property with C: the language facilities are mostly so
simple that it's easy to roughly imagine the resulting code. So C++
can be written with the same mindset.
Not only "roughly imagine"; I think the imperative languages have
so many common basic concepts that you can have a quite good idea,
especially if you know more than just two or three such languages.

But there are features, even basic ones, that do not exist in
"C", which evidently confuses folks who are focused on some specific
restricted or poorer language(s).

Yes, C++ can be written with a "C" mindset. But this is nothing
I'd suggest. Better make yourself familiar with the new concepts
(OO, genericity, or even simple things like references). - IMO.

Janis
Bonita Montero
2024-09-12 13:58:44 UTC
Post by Bart
Post by Bonita Montero
Post by Bart
C++ is a simpler language? You're having a laugh!
The solutions are simpler because you've a fifth of the code as in C.
In this case, it actually needed somewhat more code, even if the line
count was half.
But your solutions are always incomprehensible because they strive for
the most advanced features possible.
I don't use the newest features as an end in themselves. I use these features
because they make sense. E.g. I use a lot of functional programming to
avoid vectors of return values. Instead I hand the data directly to
a callback. That's more efficient and more convenient.
And I use concepts to get meaningful errors when type properties aren't
met. Maybe you think it's better to live with the errors from inside
a templated function; I think the errors which say which part of a
concept isn't met are more readable.
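
For illustration only, a minimal sketch of the callback style described
above, applied to the tokenizing topic of this thread. It is not the
poster's code; the name forEachToken is hypothetical, and the sketch
assumes plain whitespace splitting with no quoting or globbing.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: split `line` on whitespace and hand each token to
// `onToken` as soon as it is found, instead of collecting a std::vector.
// Returns the number of tokens delivered.
template<typename Callback>
std::size_t forEachToken( const std::string &line, Callback onToken )
{
    std::size_t count = 0;
    std::size_t i = 0;
    while( i < line.size() )
    {
        // Skip whitespace in front of the next token.
        while( i < line.size() && std::isspace( static_cast<unsigned char>( line[i] ) ) )
            ++i;
        if( i == line.size() )
            break;
        // Advance to the end of the token.
        std::size_t start = i;
        while( i < line.size() && !std::isspace( static_cast<unsigned char>( line[i] ) ) )
            ++i;
        onToken( line.substr( start, i - start ) );
        ++count;
    }
    return count;
}
```

A caller would pass a lambda, for example one that prints each token or
appends it to its own container, so the helper never has to allocate a
result vector itself.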
Lawrence D'Oliveiro
2024-09-12 22:09:52 UTC
Reply
Permalink
Post by Bonita Montero
I tried to experiment with that with /proc/<pid>/cmdline. The first
problem was that the arguments aren't space delimited, but broken up
with zeroes.
That’s not a “problem”: it actually simplifies the parsing, because you
can unambiguously extract the original command arguments without having to
apply any complicated parsing/quoting/unquoting rules.
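
A minimal sketch of that point, assuming the contents of
/proc/<pid>/cmdline have already been read into a std::string. The
helper name splitNulSeparated is hypothetical. Because every argument
ends in a NUL byte, splitting needs no quoting rules at all:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split a buffer in /proc/<pid>/cmdline format,
// where every argument is terminated by a NUL byte.
std::vector<std::string> splitNulSeparated( const std::string &raw )
{
    std::vector<std::string> args;
    std::string::size_type start = 0;
    while( start < raw.size() )
    {
        std::string::size_type end = raw.find( '\0', start );
        if( end == std::string::npos )
            end = raw.size();   // tolerate a missing final terminator
        // Copy the bytes between the separators; empty arguments are kept.
        args.emplace_back( raw, start, end - start );
        start = end + 1;
    }
    return args;
}
```

Arguments that contain spaces, quotes or glob characters come back
exactly as the process received them.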
Bonita Montero
2024-09-12 15:08:24 UTC
Reply
Permalink
Post by Ted Nolan <tednolan>
I have the case where my C program is handed a string which is basically
a command line.
I tried to experiment with that with /proc/<pid>/cmdline. The first
problem was that the arguments aren't space delimited, but broken up
with zeroes. The second problem was that the cmdline file doesn't contain
the original command line, but the one with file names already expanded.
This was my code so far:

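#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <unistd.h>

using namespace std;

int main()
{
    // Build the path to this process's cmdline file.
    string cmdLineFile = "/proc/" + to_string( getpid() ) + "/cmdline";
    ifstream ifs;
    ifs.exceptions( ifstream::failbit | ifstream::badbit );
    ifs.open( cmdLineFile, ios::binary );
    // Read the whole file as raw bytes so that arguments containing
    // whitespace are kept intact.
    string fullCmdLine( (istreambuf_iterator<char>( ifs )), istreambuf_iterator<char>() );
    ifs.close();
    // Drop the NUL that terminates the last argument, then turn the
    // remaining NUL separators into spaces for printing.
    if( !fullCmdLine.empty() && fullCmdLine.back() == '\0' )
        fullCmdLine.pop_back();
    replace( fullCmdLine.begin(), fullCmdLine.end(), '\0', ' ' );
    cout << fullCmdLine << endl;
}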
Bonita Montero
2024-09-12 15:24:16 UTC
Reply
Permalink
Post by Ted Nolan <tednolan>
I have the case where my C program is handed a string which is basically
a command line.
I tried to experiment with that with /proc/<pid>/cmdline. The first
problem was that the arguments aren't space delimited, but broken up
with zeroes. The second problem was that the cmdline file doesn't contain
the original command line, but the one with file names already expanded.
More OT b***s***.
The problem would be the same in C.
Kenny McCormack
2024-09-12 15:23:02 UTC
Reply
Permalink
Post by Ted Nolan <tednolan>
I have the case where my C program is handed a string which is basically
a command line.
I tried to experiment with that with /proc/<pid>/cmdline. The first
problem was that the arguments aren't space delimited, but broken up
with zeroes. The second problem was that the cmdline file doesn't contain
the original command line, but the one with file names already expanded.
More OT b***s***.

Take it somewhere else.
--
Q: How much do dead batteries cost?

A: Nothing. They are free of charge.