Discussion:
Threads across programming languages
Stefan Ram
2024-04-29 15:19:19 UTC
paavo512 <***@osa.pri.ee> wrote or quoted:
|Anyway, multithreading performance is a non-issue for Python so far as
|the Python interpreter runs in a single-threaded regime anyway, under a
|global GIL lock. They are planning to get rid of GIL, but this work is
|still in development AFAIK. I'm sure it will take years to stabilize the
|whole Python zoo without GIL.

The GIL only prevents multiple threads from executing Python
bytecode simultaneously. While a thread is blocked waiting on
inputs (like sockets), the GIL is released, so such waits can be
distributed across multiple threads and cores.

With asyncio, however, you can easily cover the use case of
threads that "wait in parallel": a single thread can wait on
thousands of sockets, and there are fewer opportunities for errors
than with multithreading.
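
A minimal sketch of that idea, assuming asyncio.sleep() as a stand-in for a slow socket read. The names fake_socket_wait, wait_on_many and run_demo are made up for illustration; a real program would await an actual stream or socket instead.

```python
import asyncio
import threading


async def fake_socket_wait(conn_id: int, delay: float) -> tuple[int, int]:
    # Stand-in for awaiting data on a socket; a real program would
    # await a stream read here instead of sleeping.
    await asyncio.sleep(delay)
    # Report which OS thread resumed this coroutine.
    return conn_id, threading.get_ident()


async def wait_on_many(n: int) -> list[tuple[int, int]]:
    # All n waits are pending at the same time on one event loop.
    tasks = [fake_socket_wait(i, 0.05) for i in range(n)]
    return await asyncio.gather(*tasks)


def run_demo(n: int = 1000) -> tuple[int, int]:
    results = asyncio.run(wait_on_many(n))
    thread_ids = {tid for _, tid in results}
    return len(results), len(thread_ids)


if __name__ == "__main__":
    completed, threads_used = run_demo()
    print(f"{completed} waits completed on {threads_used} thread(s)")
```

Every coroutine reports the same thread identifier, which is the point: the waiting overlaps, the execution does not.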

Additionally, there are libraries like numpy that use true
multithreading internally to distribute computational tasks
across multiple cores. By using such libraries, you can take
advantage of that. (Not to mention the AI libraries that have their
work done in highly parallel fashion by graphics cards.)

If you want real threads, you could probably work with Cython
sometimes.

Other languages like JavaScript seem to have an advantage there
because they have no GIL, but with JavaScript, for example,
that's because it always runs in a single thread overall. And in
the languages where there are threads without a GIL, you quickly
realize that programming correct non-trivial programs with
parallel processing is error-prone.

Often in Python you can use "ThreadPoolExecutor" to start
multiple threads. If the GIL then becomes a problem (which is
not the case if you're waiting on I/O), you can easily swap it
out for "ProcessPoolExecutor": Then processes are used instead
of threads, and there is no GIL for those.
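
As a rough sketch of how small that swap can be, the example below passes the executor class in as a parameter. The helper names count_primes and run_jobs are hypothetical, and the prime counting is only a placeholder for compute-bound work.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Type


def count_primes(limit: int) -> int:
    # Deliberately naive, CPU-bound work: count the primes below `limit`.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_jobs(executor_cls: Type[Executor], limits: list[int]) -> list[int]:
    # The only thing that changes between threads and processes
    # is the executor class passed in.
    with executor_cls(max_workers=4) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    limits = [20_000] * 4
    # Threads: on a build with a GIL, the four jobs take turns.
    print(run_jobs(ThreadPoolExecutor, limits))
    # Processes: each worker has its own interpreter and its own GIL.
    print(run_jobs(ProcessPoolExecutor, limits))
```

The `if __name__ == "__main__":` guard matters for the process variant, because worker processes may re-import the main module when they start.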

If four cores are available, by dividing up compute-intensive tasks
using "ProcessPoolExecutor", you can expect a speedup factor of
roughly two to four.

With the Celery library, tasks can be distributed across multiple
processes that can also run on different computers. See, for
example, "Parallel Programming with Python" by Jan Palach.
Paavo Helde
2024-04-29 16:13:09 UTC
Post by Stefan Ram
|Anyway, multithreading performance is a non-issue for Python so far as
|the Python interpreter runs in a single-threaded regime anyway, under a
|global GIL lock. They are planning to get rid of GIL, but this work is
|still in development AFAIK. I'm sure it will take years to stabilize the
|whole Python zoo without GIL.
The GIL only prevents multiple Python statements from being
interpreted simultaneously, but if you're waiting on inputs (like
sockets), it's not active, so that could be distributed across
multiple cores.
With asyncio, however, you can easily handle the application
for threads to "wait in parallel" for thousands of sockets in a
single thread, and there are fewer opportunities for errors than
with multithreading.
In C++, async io is provided e.g. by the asio library.

Just for waiting on thousands of sockets I believe a single select()
call would be sufficient; no threads or asio are needed. But you probably
meant something more.
Post by Stefan Ram
Additionally, there are libraries like numpy that use true
multithreading internally to distribute computational tasks
across multiple cores. By using such libraries, you can take
advantage of that. (Not to mention the AI libraries that have their
work done in highly parallel fashion by graphics cards.)
If you want real threads, you could probably work with Cython
sometimes.
Huh, my goal is to avoid Python, not to work with it. Unfortunately this
(avoiding Python) becomes harder all the time.
Post by Stefan Ram
Other languages like JavaScript seem to have an advantage there
because they don't know a GIL, but with JavaScript, for example,
it's because it always runs in a single thread overall. And in
the languages where there are threads without a GIL, you quickly
realize that programming correct non-trivial programs with
parallel processing is error-prone.
Been there, done that, worked through it ... 15 years ago. Nowadays
non-trivial multi-threaded parallel processing in C++ seems pretty easy
to me; one just needs to follow some principles and take care to get
the details correct. I guess it's about the same as memory management in
C: one can get it correct by taking some care. In C++ I can forget
about memory management as it is largely automatic, but for
multithreading I still need to take care.
Lawrence D'Oliveiro
2024-04-29 20:29:42 UTC
Post by Paavo Helde
Just for waiting on thousands on sockets I believe a single select()
call would be sufficient ...
We use poll(2) or epoll(2) nowadays. select(2) is antiquated.
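
For comparison, a small sketch of the same readiness-based waiting from Python's standard selectors module, which chooses epoll, kqueue, poll or select depending on what the platform provides. The function name wait_for_readable and the use of socketpair() in place of real network connections are assumptions made for the example.

```python
import selectors
import socket


def wait_for_readable(n_pairs: int = 100) -> list[bytes]:
    # DefaultSelector picks the most efficient mechanism the platform
    # offers (epoll on Linux, kqueue on BSD/macOS, otherwise poll/select).
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(n_pairs)]
    try:
        for i, (reader, writer) in enumerate(pairs):
            reader.setblocking(False)
            sel.register(reader, selectors.EVENT_READ, data=i)
            writer.sendall(f"msg {i}".encode())

        received: dict[int, bytes] = {}
        while len(received) < n_pairs:
            # One call waits on every registered socket at once.
            for key, _events in sel.select(timeout=1.0):
                received[key.data] = key.fileobj.recv(64)
        return [received[i] for i in range(n_pairs)]
    finally:
        sel.close()
        for reader, writer in pairs:
            reader.close()
            writer.close()


if __name__ == "__main__":
    messages = wait_for_readable()
    print(len(messages), "sockets serviced by one thread")
```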
Chris M. Thomasson
2024-04-29 20:33:22 UTC
Post by Lawrence D'Oliveiro
Post by Paavo Helde
Just for waiting on thousands on sockets I believe a single select()
call would be sufficient ...
We use poll(2) or epoll(2) nowadays. select(2) is antiquated.
AIO on Linux, IOCP on Windows.
Lawrence D'Oliveiro
2024-04-29 22:41:23 UTC
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
Post by Paavo Helde
Just for waiting on thousands on sockets I believe a single select()
call would be sufficient ...
We use poll(2) or epoll(2) nowadays. select(2) is antiquated.
AIO on Linux, IOCP on windows.
AIO is for block I/O. Try io_uring instead.
Chris M. Thomasson
2024-04-29 23:46:17 UTC
Post by Lawrence D'Oliveiro
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
Post by Paavo Helde
Just for waiting on thousands on sockets I believe a single select()
call would be sufficient ...
We use poll(2) or epoll(2) nowadays. select(2) is antiquated.
AIO on Linux, IOCP on windows.
AIO is for block I/O. Try io_uring instead.
Afaict, AIO is analogous to IOCP.
Lawrence D'Oliveiro
2024-04-30 00:11:43 UTC
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
Post by Paavo Helde
Just for waiting on thousands on sockets I believe a single select()
call would be sufficient ...
We use poll(2) or epoll(2) nowadays. select(2) is antiquated.
AIO on Linux, IOCP on windows.
AIO is for block I/O. Try io_uring instead.
Afaict, AIO is analogous to IOCP.
So not really analogous to io_uring, then?
Bonita Montero
2024-05-01 04:54:13 UTC
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
Post by Paavo Helde
Just for waiting on thousands on sockets I believe a single select()
call would be sufficient ...
We use poll(2) or epoll(2) nowadays. select(2) is antiquated.
AIO on Linux, IOCP on windows.
Boost.ASIO does all that for you with a convenient interface.
If enabled, it even uses io_uring or the Windows counterpart.
Lawrence D'Oliveiro
2024-05-01 07:10:09 UTC
Post by Bonita Montero
Boost.ASIO does that all for you with a convenient interface.
If enabled it even uses io_uring or the Windows' pendant.
How many languages does it support?
Bonita Montero
2024-05-01 08:13:03 UTC
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Boost.ASIO does that all for you with a convenient interface.
If enabled it even uses io_uring or the Windows' pendant.
How many languages does it support?
Just C++, because it's Boost. It has a strong functional interface,
and I like functional programming very much, and C++ has been a fully
featured functional programming language since C++11.
Lawrence D'Oliveiro
2024-05-01 08:53:12 UTC
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Boost.ASIO does that all for you with a convenient interface.
If enabled it even uses io_uring or the Windows' pendant.
How many languages does it support?
Just C++ ...
Not much use, then.
It has a strong functional interface and I
like functional programming very much and C++ is a fully
featured functional programming language since C++11.
But functions and classes are not first-class objects in C++, like in
Python. You cannot define function factories and class factories, like you
can in Python.
Bonita Montero
2024-05-01 08:59:16 UTC
Post by Lawrence D'Oliveiro
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Boost.ASIO does that all for you with a convenient interface.
If enabled it even uses io_uring or the Windows' pendant.
How many languages does it support?
Just C++ ...
Not much use, then.
System-level programming is mostly done in C++.
Post by Lawrence D'Oliveiro
But functions and classes are not first-class objects in C++, ...
Of course, since C++11.
Post by Lawrence D'Oliveiro
You cannot define function factories and class factories, like you
can in Python.
Python is not for me since it is extremely slow.
Lawrence D'Oliveiro
2024-05-01 20:34:25 UTC
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Boost.ASIO does that all for you with a convenient interface.
If enabled it even uses io_uring or the Windows' pendant.
How many languages does it support?
Just C++ ...
Not much use, then.
System-level programming is mostly made with C++.
No, it is actually mostly C, with Rust making inroads these days.

And you don’t have to be doing “system-level” programming to be needing
event-driven paradigms.
Post by Bonita Montero
Post by Lawrence D'Oliveiro
But functions and classes are not first-class objects in C++, ...
Of course, since C++11.
No they aren’t. You cannot easily define a C++ function that returns a
general function or class as a result, just for example.
Post by Bonita Montero
Post by Lawrence D'Oliveiro
You cannot define function factories and class factories, like you can
in Python.
Python is nothing for me since it is extremely slow.
Remember, we’re talking about maximizing I/O throughput here, so CPU is
not the bottleneck.
Bonita Montero
2024-05-02 03:45:29 UTC
Post by Lawrence D'Oliveiro
No, it is actually mostly C, with Rust making inroads these days.
C++ superseded C for that a long time ago, judging by job offers.
Rust is a language a lot of people talk about and no one actually uses.
Post by Lawrence D'Oliveiro
And you don’t have to be doing “system-level” programming to be needing
event-driven paradigms.
If you do asynchronous I/O you need performance, and this isn't
possible with Python.
Post by Lawrence D'Oliveiro
No they aren’t. You cannot easily define a C++ function that returns a
general function or class as a result, just for example.
function<void ()> fn();
Post by Lawrence D'Oliveiro
Remember, we’re talking about maximizing I/O throughput here, so CPU is
not the bottleneck.
With io_uring you can easily handle millions of I/Os with a single
thread, but not with Python.
David Brown
2024-05-02 13:53:16 UTC
Post by Bonita Montero
Post by Lawrence D'Oliveiro
No, it is actually mostly C, with Rust making inroads these days.
C++ has superseeded C with that for a long time with Job offers.
Rust is a language a lot of people talk about and no one actually uses.
Post by Lawrence D'Oliveiro
And you don’t have to be doing “system-level” programming to be needing
event-driven paradigms.
If you make asnychronous I/O you need performance, and this isn't
possible with Python.
Post by Lawrence D'Oliveiro
No they aren’t. You cannot easily define a C++ function that returns a
general function or class as a result, just for example.
function<void ()> fn();
That is a /long/ way from treating functions as first-class objects.
But it is certainly a step in that direction, as are lambdas.

You also claimed that classes are first-class objects in C++. Have you
anything to back that up?
Bonita Montero
2024-05-02 15:10:47 UTC
Post by David Brown
That is a /long/ way from treating functions as first-class objects.
A C-style function is also a function-object in C++ because it has
a calling operator.
Post by David Brown
But it is certainly a step in that direction, as are lambdas.
Lambdas can be assigned to a function<> object to make them
runtime-polymorphic. Otherwise they can be generic types, which are
compile-time polymorphic, like the function object for std::sort().
Post by David Brown
You also claimed that classes are first-class objects in C++.
I never said that, and having something like class Class in Java is
beyond C++'s performance constraints.
Lawrence D'Oliveiro
2024-05-02 23:24:26 UTC
Post by David Brown
You also claimed that classes are first-class objects in C++.
I never said that ...
No, you just ignored the point.
... and having sth. like class Class in Java is beyond
C++'s performance constraints.
Java’s “Class” object is a pretty pitiful, awkward and crippled attempt at
run-time manipulation of classes. Still falls far short of Python’s full
treatment of classes as first-class objects.

That also extends to the fact that Python classes, being objects, must be
instances of classes themselves. The class that a class is an instance of
is called its “metaclass”.
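
A short illustration of both points, written as a sketch rather than a recommended design. The factory name make_point_class and the generated Point2D/Point3D classes are invented for the example; the three-argument form of type() is standard Python.

```python
def make_point_class(name: str, fields: tuple[str, ...]) -> type:
    # Build a new class at run time. type(name, bases, namespace) is
    # the same call the `class` statement makes behind the scenes.
    def __init__(self, *values):
        if len(values) != len(fields):
            raise TypeError(f"{name} expects {len(fields)} values")
        for field, value in zip(fields, values):
            setattr(self, field, value)

    def __repr__(self):
        args = ", ".join(f"{f}={getattr(self, f)!r}" for f in fields)
        return f"{name}({args})"

    return type(name, (object,), {"__init__": __init__, "__repr__": __repr__})


# Classes are ordinary values: returned from a function, bound to names.
Point2D = make_point_class("Point2D", ("x", "y"))
Point3D = make_point_class("Point3D", ("x", "y", "z"))

p = Point2D(1, 2)
print(p)                    # Point2D(x=1, y=2)
print(type(p) is Point2D)   # True: p is an instance of the generated class
print(type(Point2D))        # <class 'type'>: the class's own class, its metaclass
```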
David Brown
2024-05-03 07:38:46 UTC
Post by Bonita Montero
Post by David Brown
That is a /long/ way from treating functions as first-class objects.
A C-style function is also a function-object in C++ because it has
a calling operator.
No it is not. C-style functions (or C++ functions for that matter) are
not objects, and do not have calling operators. Built-in operators do
not belong to a type, in the way that class operators do.
Post by Bonita Montero
Post by David Brown
But it is certainly a step in that direction, as are lambdas.
Lambdas can be assigned to function<>-object to make them runtime
-polymorphic. Otherwise they can be generic types, which are compile
-time polymorphic - like the function-object for std::sort();
You missed the point entirely. Lambdas can be used in many ways like
functions, and it is possible for one function (or lambda) to return a
different function, and can be used for higher-order functions
(functions that have functions as parameters or return types). They do
not mean that C++ can treat functions as first-class objects, but they
/do/ mean that you can get many of the effects you might want if C++
functions really were first-class objects.
Post by Bonita Montero
Post by David Brown
You also claimed that classes are first-class objects in C++.
I never said that and having sth. like class Class in Java is
beyond C++'s performance constraints.
You repeatedly replied to Lawrence's posts confirming that you believed
they were. (Re-read your posts in this thread.) I was fairly sure you
were making completely unsubstantiated claims, but it was always
possible you had thought of something interesting.

I like C++, but it is absurd and unhelpful to claim it is something that
it is not. Neither functions nor classes are first-class objects in
C++. C++ is not, by any stretch of the imagination, a "fully featured
functional programming language". It supports some functional
programming techniques, which is nice, but that does not make it a
functional programming language.
Bonita Montero
2024-05-03 07:58:29 UTC
No it is not.  C-style functions (or C++ functions for that matter) are
not objects, and do not have calling operators.  Built-in operators do
not belong to a type, in the way that class operators do.
You can assign a C-style function pointer to an auto function-object.
That these function objects all have the same type doesn't matter.
You missed the point entirely.  Lambdas can be used in many ways like
functions, and it is possible for one function (or lambda) to return a
different function, and can be used for higher-order functions
(functions that have functions as parameters or return types).  They do
not mean that C++ can treat functions as first-class objects, but they
/do/ mean that you can get many of the effects you might want if C++
functions really were first-class objects.
C-style functions and lambda-types are generically interchangeable.
David Brown
2024-05-03 09:18:06 UTC
Post by Bonita Montero
No it is not.  C-style functions (or C++ functions for that matter)
are not objects, and do not have calling operators.  Built-in
operators do not belong to a type, in the way that class operators do.
You can assign a C-style function pointer to an auto function-object.
A C-style function /pointer/ is an object. A C-style /function/ is not.
Do you understand the difference?
Post by Bonita Montero
That these function objects all have the same type doesn't metter.
You missed the point entirely.  Lambdas can be used in many ways like
functions, and it is possible for one function (or lambda) to return a
different function, and can be used for higher-order functions
(functions that have functions as parameters or return types).  They
do not mean that C++ can treat functions as first-class objects, but
they /do/ mean that you can get many of the effects you might want if
C++ functions really were first-class objects.
C-style functions and lambda-types are generically interchangeable.
Bonita Montero
2024-05-03 11:23:13 UTC
Post by Bonita Montero
No it is not.  C-style functions (or C++ functions for that matter)
are not objects, and do not have calling operators.  Built-in
operators do not belong to a type, in the way that class operators do.
You can assign a C-style function pointer to an auto function-object.
A C-style function /pointer/ is an object.  A C-style /function/ is not.
 Do you understand the difference?
Practically there isn't a difference.
Post by Bonita Montero
That these function objects all have the same type doesn't metter.
You missed the point entirely.  Lambdas can be used in many ways like
functions, and it is possible for one function (or lambda) to return
a different function, and can be used for higher-order functions
(functions that have functions as parameters or return types).  They
do not mean that C++ can treat functions as first-class objects, but
they /do/ mean that you can get many of the effects you might want if
C++ functions really were first-class objects.
C-style functions and lambda-types are generically interchangeable.
Michael S
2024-05-03 15:01:02 UTC
On Fri, 3 May 2024 13:23:13 +0200
Post by Bonita Montero
Post by Bonita Montero
No it is not.  C-style functions (or C++ functions for that
matter) are not objects, and do not have calling operators.
Built-in operators do not belong to a type, in the way that class
operators do.
You can assign a C-style function pointer to an auto
function-object.
A C-style function /pointer/ is an object.  A C-style /function/ is
not. Do you understand the difference?
Practically there isn't a difference.
For C, I agree, mostly because C has no nested functions.
For C++ (after C++11) I am less sure, because of lambdas with
non-empty captures.
Bonita Montero
2024-05-03 15:18:55 UTC
Post by Michael S
On Fri, 3 May 2024 13:23:13 +0200
Post by Bonita Montero
Post by Bonita Montero
No it is not.  C-style functions (or C++ functions for that
matter) are not objects, and do not have calling operators.
Built-in operators do not belong to a type, in the way that class
operators do.
You can assign a C-style function pointer to an auto
function-object.
A C-style function /pointer/ is an object.  A C-style /function/ is
not. Do you understand the difference?
Practically there isn't a difference.
For C, I agree, mostly because C has no nested functions.
For C++ (after C++11) I am less sure, because of lambdas with
non-empty captures.
Lambdas without captures can be cast to C function pointers, and those
lambdas all convert to the same function-pointer type if the signature
of the call operator is the same.
A nice trick to force the conversion to a function pointer is to apply
the unary + operator to a non-capturing lambda, since unary plus can be
applied to any pointer (I can really recommend the book "C++ Lambda
Story" for such details); this makes it possible to have the type of a
function-pointer definition inferred. Or, if you want to create a
function<> object from the lambda that is guaranteed not to allocate
when given a C function pointer, you can enforce that by prefixing the
assigned lambda with the + sign.
Lawrence D'Oliveiro
2024-05-03 22:20:59 UTC
Post by Michael S
For C, I agree, mostly because C has no nested functions.
GCC implements nested functions in the C compiler. Though oddly, not in
C++.

I posted a C example using nested functions in the “Recursion, Yo” thread.
Lawrence D'Oliveiro
2024-05-02 23:21:57 UTC
Post by Bonita Montero
Post by Lawrence D'Oliveiro
No, it is actually mostly C, with Rust making inroads these days.
C++ has superseeded C with that for a long time with Job offers.
Rust is a language a lot of people talk about and no one actually uses.
Fun fact: the Linux kernel (the world’s most successful software project),
originally entirely C-based, is now incorporating Rust-based development.
It never accepted C++.
Post by Bonita Montero
Post by Lawrence D'Oliveiro
And you don’t have to be doing “system-level” programming to be needing
event-driven paradigms.
If you make asnychronous I/O you need performance, and this isn't
possible with Python.
I/O performance certainly is possible with Python, and it has the high-
performance production-quality frameworks to prove it.
Post by Bonita Montero
Post by Lawrence D'Oliveiro
No they aren’t. You cannot easily define a C++ function that returns a
general function or class as a result, just for example.
function<void ()> fn();
Try it with something that has actual lexically-bound local variables in
it:

def factory(count : int) :

    def counter() :
        nonlocal count
        count += 1
        return count
    #end counter

#begin
    return counter
#end factory

f1 = factory(3)
f2 = factory(30)
print(f1())
print(f2())
print(f1())
print(f2())

output:

4
31
5
32
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Remember, we’re talking about maximizing I/O throughput here, so CPU is
not the bottleneck.
With io_uring you can easily handle millions of I/Os with a single
thread, but not with Python.
Debunked above.
Bonita Montero
2024-05-03 06:45:58 UTC
Post by Lawrence D'Oliveiro
Fun fact: the Linux kernel (the world’s most successful software project),
originally entirely C-based, is now incorporating Rust-based development.
It never accepted C++.
There are for sure an order of magnitude more C++ developers than C
developers, because C++ is an order of magnitude more productive.
Post by Lawrence D'Oliveiro
I/O performance certainly is possible with Python, and it has the high-
performance production-quality frameworks to prove it.
I thought about high performance code with >= 1e5 IOs/s.
That's not possible with Python.
Post by Lawrence D'Oliveiro
Try it with something that has actual lexically-bound local variables in
nonlocal count
count += 1
return count
#end counter
#begin
return counter
#end factory
f1 = factory(3)
f2 = factory(30)
print(f1())
print(f2())
print(f1())
print(f2())
4
31
5
32
That should be similar:

#include <iostream>
#include <functional>

using namespace std;

function<int ()> factory()
{
    return []
        {
            static int count = 0;
            return ++count;
        };
}

int main()
{
    auto
        f1 = factory(),
        f2 = factory();
    cout << f1() << endl;
    cout << f2() << endl;
    cout << f1() << endl;
    cout << f2() << endl;
}
Post by Lawrence D'Oliveiro
Post by Bonita Montero
With io_uring you can easily handle millions of I/Os with a single
thread, but not with Python.
Debunked above.
That's not possible with Python because Python is slow.
Bonita Montero
2024-05-03 07:05:09 UTC
Post by Bonita Montero
using namespace std;
function<int ()> factory()
{
    return []
        {
            static int count = 0;
            return ++count;
        };
}
function<int ()> factory()
{
    static auto fn = []
    {
        static int count = 0;
        return ++count;
    };
    return cref( fn );
}

This is better: it doesn't allocate external memory for the function<>
object.
Bonita Montero
2024-05-03 07:09:23 UTC
Post by Bonita Montero
Post by Bonita Montero
using namespace std;
function<int ()> factory()
{
     return []
         {
             static int count = 0;
             return ++count;
         };
}
function<int ()> factory()
{
    static auto fn = []
    {
        static int count = 0;
        return ++count;
    };
    return cref( fn );
}
Better this, it doesn't allocate external memory for the function<>
object.
Or do you mean something like this, where each function object has its
own count state:

function<int ()> factory()
{
    return [count=0]() mutable { return ++count; };
}
Lawrence D'Oliveiro
2024-05-04 02:33:59 UTC
Post by Bonita Montero
Post by Lawrence D'Oliveiro
I/O performance certainly is possible with Python, and it has the high-
performance production-quality frameworks to prove it.
I thought about high performance code with >= 1e5 IOs/s.
That's not possible with Python.
Sure it is. Try the “stress_test” script I wrote here
<https://gitlab.com/ldo/inotipy_examples>. It can easily generate more I/O
events than the Linux kernel can cope with, on whatever machine you’re on.
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Try it with something that has actual lexically-bound local variables
nonlocal count
count += 1
return count
#end counter
#begin
return counter
#end factory
f1 = factory(3)
f2 = factory(30)
print(f1())
print(f2())
print(f1())
print(f2())
4
31
5
32
#include <iostream>
#include <functional>
using namespace std;
function<int ()> factory()
{
    return []
        {
            static int count = 0;
            return ++count;
        };
}
int main()
{
    auto
        f1 = factory(),
        f2 = factory();
    cout << f1() << endl;
    cout << f2() << endl;
    cout << f1() << endl;
    cout << f2() << endl;
}
Ahem, and what is the output from your C++ version?

(Hint: I don’t think it’s correct.)
Chris M. Thomasson
2024-05-04 03:05:03 UTC
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Post by Lawrence D'Oliveiro
I/O performance certainly is possible with Python, and it has the high-
performance production-quality frameworks to prove it.
I thought about high performance code with >= 1e5 IOs/s.
That's not possible with Python.
Sure it is. Try the “stress_test” script I wrote here
<https://gitlab.com/ldo/inotipy_examples>. It can easily generate more I/O
events than the Linux kernel can cope with, on whatever machine you’re on.
Ahh the stress test. Record moments, especially right when the server
dies from running out of resources. Record that moment. Restart the
test... Set a limit a little lower than the one that killed the system
before... See if it dies again... If so, repeat until it does not die.
Then record this number for the system.

[...]
Chris M. Thomasson
2024-05-04 03:07:37 UTC
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Post by Lawrence D'Oliveiro
I/O performance certainly is possible with Python, and it has the high-
performance production-quality frameworks to prove it.
I thought about high performance code with >= 1e5 IOs/s.
That's not possible with Python.
Sure it is. Try the “stress_test” script I wrote here
<https://gitlab.com/ldo/inotipy_examples>. It can easily generate more I/O
events than the Linux kernel can cope with, on whatever machine you’re on.
Ahh the stress test. Record moments, especially right when the server
dies from running out of resources. Record that moment. Restart the
test... Set a limit a little lower than the one that killed the system
before... See if it dies again... If so, repeat until it does not die.
Then record this number for the system.
[...]
This process should be done during the installation of the new server
software... :^)

The server says install now, or run stress tests, if you check here, we
will try to crash your system, but reboot with a lower number and keep
trying until we see no crash... This number is the death point of the
tests wrt the system... ;^D ROFL!!!
Chris M. Thomasson
2024-05-04 03:09:10 UTC
Post by Chris M. Thomasson
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Post by Lawrence D'Oliveiro
I/O performance certainly is possible with Python, and it has the high-
performance production-quality frameworks to prove it.
I thought about high performance code with >= 1e5 IOs/s.
That's not possible with Python.
Sure it is. Try the “stress_test” script I wrote here
<https://gitlab.com/ldo/inotipy_examples>. It can easily generate more I/O
events than the Linux kernel can cope with, on whatever machine you’re on.
Ahh the stress test. Record moments, especially right when the server
dies from running out of resources. Record that moment. Restart the
test... Set a limit a little lower than the one that killed the system
before... See if it dies again... If so, repeat until it does not die.
Then record this number for the system.
[...]
This process should be done during the installation of the new server
software... :^)
The server says install now, or run stress tests, if you check here, we
will try to crash your system, but reboot with a lower number and keep
trying until we see no crash... This number is the death point of the
tests wrt the system... ;^D ROFL!!!
I actually wrote server tests that did exactly that back on the good ol
winnt 4.0. lol!
Chris M. Thomasson
2024-05-02 05:20:47 UTC
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Boost.ASIO does that all for you with a convenient interface.
If enabled it even uses io_uring or the Windows' pendant.
How many languages does it support?
Just C++ ...
Not much use, then.
System-level programming is mostly made with C++.
No, it is actually mostly C, with Rust making inroads these days.
And you don’t have to be doing “system-level” programming to be needing
event-driven paradigms.
Post by Bonita Montero
Post by Lawrence D'Oliveiro
But functions and classes are not first-class objects in C++, ...
Of course, since C++11.
No they aren’t. You cannot easily define a C++ function that returns a
general function or class as a result, just for example.
Post by Bonita Montero
Post by Lawrence D'Oliveiro
You cannot define function factories and class factories, like you can
in Python.
Python is nothing for me since it is extremely slow.
Remember, we’re talking about maximizing I/O throughput here, so CPU is
not the bottleneck.
It can be if your thread synchronization scheme is subpar. I have
actually seen code where an IOCP completion thread locks a global mutex.
Something like this pseudo-code:
_________________
for (;;)
{
    iocp_overlapped& p = GQCS(INFINITE);

    lock_mutex(global);

    // process event...

    unlock_mutex(global);
}
_________________
This is really BAD! It will create a rather massive bottleneck under
times of heavy load... Also, it creates a nasty condition where things
can become deadlocked if processing the overlapped completion calls into
unknown user code to do some work. I have had to debug someone else's
code like this before. Not exactly fun...
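
One common way around both problems is to hold the lock only while touching shared state and to call user code after releasing it. The sketch below shows that pattern in Python rather than in IOCP terms; the Dispatcher class and the handler names are hypothetical and not taken from the code described above.

```python
import threading
from typing import Callable

Handler = Callable[[str], None]


class Dispatcher:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._handlers: list[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        with self._lock:
            self._handlers.append(handler)

    def dispatch(self, event: str) -> None:
        # Hold the lock only long enough to copy the shared state ...
        with self._lock:
            handlers = list(self._handlers)
        # ... and call user code with the lock released, so a handler
        # may call subscribe() or dispatch() again without deadlocking.
        for handler in handlers:
            handler(event)


def demo() -> list[str]:
    seen: list[str] = []
    d = Dispatcher()

    def late_handler(event: str) -> None:
        seen.append(f"late:{event}")

    def first_handler(event: str) -> None:
        seen.append(f"first:{event}")
        if event == "a":
            # Re-enters the dispatcher from inside a callback.
            d.subscribe(late_handler)

    d.subscribe(first_handler)
    d.dispatch("a")
    d.dispatch("b")
    return seen


if __name__ == "__main__":
    print(demo())
```

Had dispatch() invoked the handlers while still holding the non-reentrant lock, the subscribe() call inside first_handler would block forever.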
Chris M. Thomasson
2024-05-02 05:22:29 UTC
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Boost.ASIO does that all for you with a convenient interface.
If enabled it even uses io_uring or the Windows' pendant.
How many languages does it support?
Just C++ ...
Not much use, then.
System-level programming is mostly made with C++.
No, it is actually mostly C, with Rust making inroads these days.
And you don’t have to be doing “system-level” programming to be needing
event-driven paradigms.
Post by Bonita Montero
Post by Lawrence D'Oliveiro
But functions and classes are not first-class objects in C++, ...
Of course, since C++11.
No they aren’t. You cannot easily define a C++ function that returns a
general function or class as a result, just for example.
Post by Bonita Montero
Post by Lawrence D'Oliveiro
You cannot define function factories and class factories, like you can
in Python.
Python is nothing for me since it is extremely slow.
Remember, we’re talking about maximizing I/O throughput here, so CPU is
not the bottleneck.
It can be if your thread synchronization scheme is sub par. I have
actually seen code where an IOCP completion thread locks a global mutex.
_________________
for (;;)
{
   iocp_overlapped& p = GQCS(INFINITE);
   lock_mutex(global);
   // process event...
   unlock_mutex(global);
}
_________________
This is really BAD! It will create a rather massive bottleneck under
times of heavy load...
The global lock seems to work fine under light sporadic usage. However,
when the server gets under load, this mutex really messes things up.
Post by Chris M. Thomasson
Also, it creates a nasty condition where things
can become deadlocked if processing the overlapped completion calls into
unknown user code to do some work. I have had to debug someone else's code
like this before. Not exactly fun...
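A minimal Python sketch of the usual fix, under stated assumptions: `process_event`, `handle_completion`, `totals` and `state_lock` are hypothetical names, and the per-event work is assumed to touch no shared state. The expensive part runs with no lock held, and the lock covers only the short update of the shared structure. It shows the shape of the fix, not a benchmark.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

state_lock = threading.Lock()  # hypothetical lock guarding only `totals`
totals = {}                    # shared state updated by all workers


def process_event(payload):
    # Stand-in for the expensive per-event work; touches no shared state.
    return payload.upper(), len(payload)


def handle_completion(payload):
    key, size = process_event(payload)  # done without holding any lock
    with state_lock:                    # lock held only for the short update
        totals[key] = totals.get(key, 0) + size


def run(payloads, workers=4):
    # Each worker thread plays the role of one completion thread.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(handle_completion, payloads))
    return dict(totals)
```

Because no user code runs while `state_lock` is held, a handler that calls back into the server cannot deadlock on that lock.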
Lawrence D'Oliveiro
2024-05-02 05:39:15 UTC
Permalink
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
Remember, we’re talking about maximizing I/O throughput here, so CPU is
not the bottleneck.
It can be if your thread synchronization scheme is sub par.
Another reason to avoid threads. So long as your async tasks have an await
call somewhere in their main loops, that should be sufficient to avoid
most bottlenecks.
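A small asyncio sketch of that point, with made-up task names: each task reaches an `await` on every pass through its loop, so the event loop can switch to the other task there. `asyncio.sleep(0)` stands in for a real awaited I/O call.

```python
import asyncio


async def worker(name, steps, log):
    for i in range(steps):
        log.append((name, i))
        # The await is where this task hands control back to the event loop.
        await asyncio.sleep(0)


async def main():
    log = []
    # Both tasks run in one thread and interleave at their await points.
    await asyncio.gather(worker("a", 3, log), worker("b", 3, log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

A task whose loop never awaits would run to completion before the other one got a turn, which is the bottleneck being warned about.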
Bonita Montero
2024-05-02 05:53:21 UTC
Permalink
Post by Lawrence D'Oliveiro
Another reason to avoid threads. So long as your async tasks have an await
call somewhere in their main loops, that should be sufficient to avoid
most bottlenecks.
If you have a stream of individual I/Os and the processing of the I/Os
takes more time than the time between the I/Os you need threads. And
usually the processing takes longer.
You constantly deny technologies which overburden you.
Lawrence D'Oliveiro
2024-05-02 23:16:23 UTC
Permalink
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Another reason to avoid threads. So long as your async tasks have an await
call somewhere in their main loops, that should be sufficient to avoid
most bottlenecks.
If you have a stream of individual I/Os and the processing of the I/Os
takes more time than the time between the I/Os you need threads.
That makes the CPU the bottleneck. Which is not the case we’re discussing
here.
Bonita Montero
2024-05-03 07:00:30 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Another reason to avoid threads. So long as your async tasks have an await
call somewhere in their main loops, that should be sufficient to avoid
most bottlenecks.
If you have a stream of individual I/Os and the processing of the I/Os
takes more time than the time between the I/Os you need threads.
That makes the CPU the bottleneck. Which is not the case we’re discussing
here.
No, the processing between the I/O can mostly depend on other I/Os,
which is the standard case for server applications.
Lawrence D'Oliveiro
2024-05-04 02:30:38 UTC
Permalink
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Bonita Montero
If you have a stream of individual I/Os and the processing of the I/Os
takes more time than the time between the I/Os you need threads.
That makes the CPU the bottleneck. Which is not the case we’re
discussing here.
No, the processing between the I/O can mostly depend on other I/Os,
which is the standard case for server applications.
In that situation, multithreading isn’t going to speed things up.
Chris M. Thomasson
2024-05-04 03:36:58 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Bonita Montero
If you have a stream of individual I/Os and the processing of the I/Os
takes more time than the time between the I/Os you need threads.
That makes the CPU the bottleneck. Which is not the case we’re
discussing here.
No, the processing between the I/O can mostly depend on other I/Os,
which is the standard case for server applications.
In that situation, multithreading isn’t going to speed things up.
ummm, so what does the server do after getting an io completion...?

It has to do something. Look something up in a RCU protected database
structure, etc., etc... Thread sync scalability usually becomes an issue
right when your server experiences any type of decent load. Especially
heavy load! ;^o
Chris M. Thomasson
2024-05-02 20:28:15 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
Remember, we’re talking about maximizing I/O throughput here, so CPU is
not the bottleneck.
It can be if your thread synchronization scheme is sub par.
Another reason to avoid threads.
Why? Believe it or not, there are ways to create _highly_ scalable
thread synchronization schemes. Using a global lock is not one of
them... :^)

For database servers, RCU is probably the best you can get. It simply
shines for read mostly workloads.
Post by Lawrence D'Oliveiro
So long as your async tasks have an await
call somewhere in their main loops, that should be sufficient to avoid
most bottlenecks.
async tasks are using threads... No? Also, what type of synchronization
schemes are they using under the covers? I would hope they are using
some efficient lock and/or wait free algorithms. I have a lot of
experience in this area.
Lawrence D'Oliveiro
2024-05-02 23:15:16 UTC
Permalink
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
Remember, we’re talking about maximizing I/O throughput here, so CPU
is not the bottleneck.
It can be if your thread synchronization scheme is sub par.
Another reason to avoid threads.
Why? Believe it or not, there are ways to create _highly_ scalable
thread synchronization schemes.
I’m sure there are. But none of that is relevant when the CPU isn’t the
bottleneck anyway.
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
So long as your async tasks have an await call somewhere in their main
loops, that should be sufficient to avoid most bottlenecks.
async tasks are using threads... No?
No. They are built on coroutines. Specifically, the “stackless” variety.

<https://gitlab.com/ldo/python_topics_notebooks/-/blob/master/Generators%20&%20Coroutines.ipynb?ref_type=heads>
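To illustrate the idea (this is a toy sketch, not the code from the linked notebook): plain generators are enough to show stackless coroutines at work. Each `yield` suspends only the current frame, and a hypothetical round-robin scheduler, `run_all`, resumes the tasks in turn, all on one thread. Python's `async def` coroutines use a similar suspend/resume mechanism.

```python
import threading
from collections import deque


def task(name, count, out):
    for i in range(count):
        out.append((name, i, threading.get_ident()))
        yield  # suspend here; only this one frame is saved


def run_all(tasks):
    # Minimal round-robin scheduler: resume each task until it finishes.
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)
        except StopIteration:
            continue  # task finished, do not requeue it
        ready.append(current)


out = []
run_all([task("a", 2, out), task("b", 2, out)])
```

Every entry in `out` carries the same thread identifier, since no additional thread is ever created.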
Chris M. Thomasson
2024-05-02 23:58:54 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
Remember, we’re talking about maximizing I/O throughput here, so CPU
is not the bottleneck.
It can be if your thread synchronization scheme is sub par.
Another reason to avoid threads.
Why? Believe it or not, there are ways to create _highly_ scalable
thread synchronization schemes.
I’m sure there are. But none of that is relevant when the CPU isn’t the
bottleneck anyway.
The CPU can become a bottleneck. Depends on how the programmer
implements things.
Post by Lawrence D'Oliveiro
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
So long as your async tasks have an await call somewhere in their main
loops, that should be sufficient to avoid most bottlenecks.
async tasks are using threads... No?
No. They are built on coroutines. Specifically, the “stackless” variety.
<https://gitlab.com/ldo/python_topics_notebooks/-/blob/master/Generators%20&%20Coroutines.ipynb?ref_type=heads>
So, there is no way to take advantage of multiple threads on Python?
Heck, even JavaScript has WebWorkers... ;^)
Kaz Kylheku
2024-05-03 00:15:52 UTC
Permalink
Post by Chris M. Thomasson
The CPU can become a bottleneck.
Unfortunately, not in a way that you could use for playing slide
guitar, let alone actually drinking beer through it.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Chris M. Thomasson
2024-05-03 00:22:15 UTC
Permalink
Post by Kaz Kylheku
Post by Chris M. Thomasson
The CPU can become a bottleneck.
Unfortunately, not in a way that you could use for playing slide
guitar, let alone actually drinking beer through it.
:^D The problem is that I have had to debug server code that actually
locked a global mutex ala:

for (;;)
{
    io_complete& io = wait_for_io(INFINITE);

    lock();
    io.foobar();
    unlock();
}

Oh shit.
Chris M. Thomasson
2024-05-03 07:07:57 UTC
Permalink
Post by Chris M. Thomasson
Post by Kaz Kylheku
Post by Chris M. Thomasson
The CPU can become a bottleneck.
Unfortunately, not in a way that you could use for playing slide
guitar, let alone actually drinking beer through it.
:^D The problem is that I have had to debug server code that actually
locked a global mutex ala:
for (;;)
{
   io_complete& io = wait_for_io(INFINITE);
   lock();
     io.foobar();
A fun part...

io.foobar() does some things that might call lock() again, during
certain scenarios. Oh, so the programmer says, well, lock() needs to be
recursive... Oh, well, it seems to work. Deadlock! Shit!
Post by Chris M. Thomasson
   unlock();
}
Oh shit.
Lawrence D'Oliveiro
2024-05-03 02:25:52 UTC
Permalink
Post by Chris M. Thomasson
The CPU can become a bottleneck.
Then that becomes an entirely different situation from what we’re
discussing.
Post by Chris M. Thomasson
So, there is no way to take advantage of multiple threads on Python?
There is, but the current scheme has limitations in CPU-intensive
situations. They’re working on a fix, without turning it into a memory hog
like Java.
Ross Finlayson
2024-05-03 19:33:30 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Chris M. Thomasson
The CPU can become a bottleneck.
Then that becomes an entirely different situation from what we’re
discussing.
Post by Chris M. Thomasson
So, there is no way to take advantage of multiple threads on Python?
There is, but the current scheme has limitations in CPU-intensive
situations. They’re working on a fix, without turning it into a memory hog
like Java.
Yeah, it can be that way. "How are things?" "Yesterday
I implemented an entire web service on the cloud."
"Oh, really, how'd that go?" "I opened Initializer and
added a starter and copied how to pop the queue
and put the queue name in a file, then I added it
to git and it went into the CICD pipeline and now
it's in Prod." "Great." "It only even needs 1 gigabyte of RAM."

Surely when it's like, "the only time this framework app
uses 1 gigabyte of RAM is at boot time it totally templates
itself into a gigabyte of RAM", then the guy's like "see,
I'm totally not using RAM." Yet it's like, "well, yeah,
but the meter for the RAM you're not using is on".



At least then for re-routines, and if it helps it's quite
an idee fixe at this point, it's clear as described they can
be implemented in most languages with or without
threads as with just a minimum of threads and thread
locals and exception handling being well-defined and
the most usual sort of procedural call stack, then,
get this: taking plain usual code, giving it a ton of
threads, making every invocation one of these things,
and automatically parallelizing the code automatically
according to the flow-graph dependencies declared
in the synchronous, blocking, routine.

Now _that's_ ridiculous.

Though, in C++ with this sort of approach, the only
sort of "unusable" object is a future<result<T, E>>
as it were, or "the ubiquitous type" sort of thing
then as to overload its access as to invoke "get()",
if there was a sort of way to overload the "." and "->"
operators, and have them most simply be compiled
as invoke "." and "->". Does std::identity work this way?
David Brown
2024-05-03 08:34:11 UTC
Permalink
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
Remember, we’re talking about maximizing I/O throughput here, so CPU
is not the bottleneck.
It can be if your thread synchronization scheme is sub par.
Another reason to avoid threads.
Why? Believe it or not, there are ways to create _highly_ scalable
thread synchronization schemes.
I’m sure there are. But none of that is relevant when the CPU isn’t the
bottleneck anyway.
The CPU can become a bottleneck. Depends on how the programmer
implements things.
Post by Lawrence D'Oliveiro
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
So long as your async tasks have an await call somewhere in their main
loops, that should be sufficient to avoid most bottlenecks.
async tasks are using threads... No?
No. They are built on coroutines. Specifically, the “stackless” variety.
<https://gitlab.com/ldo/python_topics_notebooks/-/blob/master/Generators%20&%20Coroutines.ipynb?ref_type=heads>
So, there is no way to take advantage of multiple threads on Python?
Heck, even JavaScript has WebWorkers... ;^)
Python supports multi-threading. It uses a global lock (the "GIL") in
the Python interpreter - thus only one thread can be running Python code
at a time. However, if you are doing anything serious with Python, much
of the time will be spent either blocked (waiting for network, IO, etc.)
or using compiled or external code (using your favourite gui toolkit,
doing maths with numpy, etc.). The GIL is released while executing such
code.
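A rough way to see this, as a sketch with assumed names and timings: `time.sleep` stands in for a blocking socket or file call and releases the GIL while it waits, so four such calls on four threads overlap instead of running back to back.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_call(delay):
    # Stand-in for blocking I/O; the GIL is released while sleeping.
    time.sleep(delay)
    return delay


def timed_run(n=4, delay=0.3):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(blocking_call, [delay] * n))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = timed_run()
    print(f"{len(results)} calls finished in {elapsed:.2f}s")
```

Run sequentially, the four calls would need about `n * delay` seconds; with threads the elapsed time should be close to a single `delay`.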

Thus if you are using Python for cpu-intensive work (and doing so
sensibly), you have full multi-threading. If you are using it for
IO-intensive work, you have full multi-threading. It's not going to be
as efficient as well-written compiled code, even with JIT and pypy, but
in practice it gets pretty close while being very convenient and
developer friendly.

If you really need parallel running of Python code, or better separation
between tasks, Python has a multi-processing module that makes it simple
to control and pass data between separate Python processes, each with
their own GIL.
Michael S
2024-05-03 15:05:33 UTC
Permalink
On Fri, 3 May 2024 10:34:11 +0200
Post by David Brown
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
Remember, we’re talking about maximizing I/O throughput here,
so CPU is not the bottleneck.
It can be if your thread synchronization scheme is sub par.
Another reason to avoid threads.
Why? Believe it or not, there are ways to create _highly_ scalable
thread synchronization schemes.
I’m sure there are. But none of that is relevant when the CPU
isn’t the bottleneck anyway.
The CPU can become a bottleneck. Depends on how the programmer
implements things.
Post by Lawrence D'Oliveiro
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
So long as your async tasks have an await call somewhere in
their main loops, that should be sufficient to avoid most
bottlenecks.
async tasks are using threads... No?
No. They are built on coroutines. Specifically, the “stackless” variety.
<https://gitlab.com/ldo/python_topics_notebooks/-/blob/master/Generators%20&%20Coroutines.ipynb?ref_type=heads>
So, there is no way to take advantage of multiple threads on
Python? Heck, even JavaScript has WebWorkers... ;^)
Python supports multi-threading. It uses a global lock (the "GIL")
in the Python interpreter - thus only one thread can be running
Python code at a time. However, if you are doing anything serious
with Python, much of the time will be spent either blocked (waiting
for network, IO, etc.) or using compiled or external code (using your
favourite gui toolkit, doing maths with numpy, etc.). The GIL is
released while executing such code.
Thus if you are using Python for cpu-intensive work (and doing so
sensibly), you have full multi-threading. If you are using it for
IO-intensive work, you have full multi-threading. It's not going to
be as efficient as well-written compiled code, even with JIT and
pypy, but in practice it gets pretty close while being very
convenient and developer friendly.
If you really need parallel running of Python code, or better
separation between tasks, Python has a multi-processing module that
makes it simple to control and pass data between separate Python
processes, each with their own GIL.
A typical scenario is that you started your Python program while
thinking that it wouldn't be CPU-intensive. And then it grew and became
CPU-intensive.
That's actually a good case, because it means that your program is used
and is doing something worthwhile.
Bonita Montero
2024-05-03 15:20:00 UTC
Permalink
Post by Michael S
A typical scenario is that you started your Python program while
thinking that it wouldn't be CPU-intensive. And then it grew and became
CPU-intensive.
That's actually a good case, because it means that your program is used
and is doing something worthwhile.
I don't think it makes a big difference if Python has a GIL or
not since it is interpreted and extremely slow with that anyway.
Michael S
2024-05-03 15:47:54 UTC
Permalink
On Fri, 3 May 2024 17:20:00 +0200
Post by Bonita Montero
Post by Michael S
A typical scenario is that you started your Python program while
thinking that it wouldn't be CPU-intensive. And then it grew and
became CPU-intensive.
That's actually a good case, because it means that your program is
used and is doing something worthwhile.
I don't think it makes a big difference if Python has a GIL or
not since it is interpreted and extremely slow with that anyway.
64 times faster than slow wouldn't be fast, but could be acceptable.
And 64 HW threads nowadays is almost low-end server, I have one at
work, just in case.
Also, I don't see why in the future Python could not be JITted.
Javascript was also considered slow 15-20 years ago, now it's pretty
fast.
But then, my knowledge of Python is very shallow. Possibly, it's not
JITted yet because of fundamental reasons rather than due to lack of
demand.
Lawrence D'Oliveiro
2024-05-03 22:19:27 UTC
Permalink
Post by Michael S
Also, I don't see why in the future Python could not be JITted.
It might require more use of static type annotations. Which some are
adopting in their Python code.
bart
2024-05-03 23:27:53 UTC
Permalink
Post by Michael S
On Fri, 3 May 2024 17:20:00 +0200
Post by Bonita Montero
Post by Michael S
A typical scenario is that you started your Python program while
thinking that it wouldn't be CPU-intensive. And then it grew and
became CPU-intensive.
That's actually a good case, because it means that your program is
used and is doing something worthwhile.
I don't think it makes a big difference if Python has a GIL or
not since it is interpreted and extremely slow with that anyway.
64 times faster than slow wouldn't be fast, but could be acceptable.
And 64 HW threads nowadays is almost low-end server, I have one at
work, just in case.
Also, I don't see why in the future Python could not be JITted.
Javascript was also considered slow 15-20 years ago, now it's pretty
fast.
But then, my knowledge of Python is very shallow. Possibly, it's not
JITted yet because of fundamental reasons rather than due to lack of
demand.
PyPy has been around for many years.
Chris M. Thomasson
2024-05-01 18:55:57 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Boost.ASIO does that all for you with a convenient interface.
If enabled it even uses io_uring or the Windows counterpart.
How many languages does it support?
Just C++ ...
Not much use, then.
Well, it's a C++ lib, so, well... Okay. :^)
Post by Lawrence D'Oliveiro
It has a strong functional interface and I
like functional programming very much and C++ is a fully featured
functional programming language since C++11.
But functions and classes are not first-class objects in C++, like in
Python. You cannot define function factories and class factories, like you
can in Python.
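For readers unfamiliar with the terms, here is a short Python sketch of both kinds of factory. The names `make_multiplier` and `make_record_class` are invented for the example. It shows what the Python side of the comparison looks like and makes no claim about what C++ can or cannot do.

```python
def make_multiplier(factor):
    # Function factory: builds and returns a new function at run time.
    def multiply(x):
        return x * factor
    return multiply


def make_record_class(name, fields):
    # Class factory: builds and returns a new class at run time.
    def __init__(self, *values):
        if len(values) != len(fields):
            raise TypeError(f"{name} expects {len(fields)} values")
        for field, value in zip(fields, values):
            setattr(self, field, value)

    return type(name, (), {"__init__": __init__, "fields": tuple(fields)})


triple = make_multiplier(3)
Point = make_record_class("Point", ["x", "y"])
p = Point(1, 2)
```

Both `triple` and `Point` are ordinary objects: they can be stored, passed around and returned like any other value.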
Chris M. Thomasson
2024-05-01 18:55:00 UTC
Permalink
Post by Bonita Montero
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
Post by Paavo Helde
Just for waiting on thousands of sockets I believe a single select()
call would be sufficient ...
We use poll(2) or epoll(2) nowadays. select(2) is antiquated.
AIO on Linux, IOCP on windows.
Boost.ASIO does that all for you with a convenient interface.
If enabled it even uses io_uring or the Windows counterpart.
I never used ASIO. Back when I wrote server code for WinNT 4.0 I used
IOCP directly. Then I learned about AIO, and ported most of it. It was a
fairly interesting port. I love the GetQueuedCompletionStatusEx function:

https://learn.microsoft.com/en-us/windows/win32/fileio/getqueuedcompletionstatusex-func

Have you ever read about cohort scheduling? GetQueuedCompletionStatusEx
works rather well for that because it returns multiple events. A quick
little sort and we have a cohort:

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2001-39.pdf

I still remember when I first read that paper. :^D
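The "quick little sort" can be sketched in Python, assuming each dequeued completion is reduced to a hypothetical `(kind, payload)` pair rather than a real OVERLAPPED_ENTRY: sort the batch by kind, then hand each group of like events to its handler in one go.

```python
from itertools import groupby
from operator import itemgetter


def make_cohorts(events):
    """Group one batch of (kind, payload) completion events by kind."""
    # sorted() is stable, so arrival order is kept within each kind.
    ordered = sorted(events, key=itemgetter(0))
    return [
        (kind, [payload for _, payload in group])
        for kind, group in groupby(ordered, key=itemgetter(0))
    ]


batch = [
    ("read", "r1"),
    ("accept", "a1"),
    ("read", "r2"),
    ("write", "w1"),
    ("accept", "a2"),
]
cohorts = make_cohorts(batch)
```

Processing each cohort back to back keeps the same handler code and data hot in the cache, which is the effect cohort scheduling aims for.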
Bonita Montero
2024-05-01 04:53:21 UTC
Permalink
Post by Lawrence D'Oliveiro
We use poll(2) or epoll(2) nowadays. select(2) is antiquated.
Use Boost.ASIO.
Lawrence D'Oliveiro
2024-05-01 07:09:31 UTC
Permalink
Post by Bonita Montero
Post by Lawrence D'Oliveiro
We use poll(2) or epoll(2) nowadays. select(2) is antiquated.
Use Boost.ASIO.
And what does that use?
Bonita Montero
2024-05-01 08:11:14 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Post by Lawrence D'Oliveiro
We use poll(2) or epoll(2) nowadays. select(2) is antiquated.
Use Boost.ASIO.
And what does that use?
Boost.ASIO can even use io_uring if it is available. And it has
a callback-interface with function-objects which are called on
completion; that's much more convenient than to have io_uring
manually.
Check annas-archive.org for the Boost.ASIO book from Apress.
Lawrence D'Oliveiro
2024-05-01 08:53:42 UTC
Permalink
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Post by Lawrence D'Oliveiro
We use poll(2) or epoll(2) nowadays. select(2) is antiquated.
Use Boost.ASIO.
And what does that use?
Boost.ASIO can even use io_uring if it is available.
No async/await? Oh, they haven’t added that to C++--yet.
Bonita Montero
2024-05-01 09:00:04 UTC
Permalink
Post by Lawrence D'Oliveiro
No async/await? Oh, they haven’t added that to C++--yet.
No, Boost.ASIO is event driven with asynchronous callbacks
in a foreign thread's context.
Lawrence D'Oliveiro
2024-05-01 20:31:04 UTC
Permalink
Post by Lawrence D'Oliveiro
No async/await? Oh, they haven’t added that to C++--yet.
No, Boost.ASIO is event driven with asynchronous callbacks in a foreign
thread's context.
Callbacks can be a clunky way of event handling, since they force you to
break up your logic sequence into discontiguous pieces. This is why
coroutines have become popular, since they keep the logic flow together.
Scott Lurndal
2024-05-01 21:00:19 UTC
Permalink
Post by Lawrence D'Oliveiro
No async/await? Oh, they haven’t added that to C++--yet.
No, Boost.ASIO is event driven with asynchronous callbacks in a foreign
thread's context.
Callbacks can be a clunky way of event handling, since they force you to
break up your logic sequence into discontiguous pieces. This is why
coroutines have become popular, since they keep the logic flow together.
Callbacks work just fine, as the logic for submitting a request
is quite different from the logic for completing a request; indeed,
they more closely mirror the hardware interrupt that signals completion.

I wouldn't call coroutines popular at all, outside of python generators.
Michael S
2024-05-01 21:05:24 UTC
Permalink
On Wed, 01 May 2024 21:00:19 GMT
Post by Scott Lurndal
Post by Lawrence D'Oliveiro
No async/await? Oh, they haven’t added that to C++--yet.
No, Boost.ASIO is event driven with asynchronous callbacks in a
foreign thread's context.
Callbacks can be a clunky way of event handling, since they force
you to break up your logic sequence into discontiguous pieces. This
is why coroutines have become popular, since they keep the logic
flow together.
Callbacks work just fine, as the logic for submitting a request
is quite different from the logic for completing a request; indeed,
they more closely mirror the hardware interrupt that signals
completion.
I wouldn't call coroutines popular at all, outside of python
generators.
My impression was that in golang world co-routines are relatively
popular. But I can be wrong about it.
Lawrence D'Oliveiro
2024-05-01 23:05:50 UTC
Permalink
Post by Scott Lurndal
Post by Lawrence D'Oliveiro
Callbacks can be a clunky way of event handling, since they force you to
break up your logic sequence into discontiguous pieces. This is why
coroutines have become popular, since they keep the logic flow together.
Callbacks work just fine, as the logic for submitting a request
is quite different from the logic for completing a request ...
They are typically part of the same logic flow. Having to break it up
into separate callback pieces can make it harder to appreciate the
continuity, making the code harder to maintain. It can also require
more code.

Have a look at the two versions of the “rocket launch” example I
posted here
<https://github.com/HamPUG/meetings/tree/master/2017/2017-05-08/ldo-generators-coroutines-asyncio>:
not only is the callback version about 30% bigger, it is also harder
to understand.

Sure, it’s a toy example (44 versus 57 lines). But I think it does
illustrate the issues involved.
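A much smaller sketch of the same contrast (not the linked code; the function names are invented): a countdown written once with callbacks and once as a coroutine, both on asyncio. In the callback version the sequence is split across re-entries of one function, with state carried in the arguments. In the coroutine version it reads top to bottom.

```python
import asyncio


def countdown_callbacks(loop, n, log, done):
    # Callback style: each step schedules the next one explicitly.
    if n == 0:
        log.append("liftoff")
        done.set_result(None)
        return
    log.append(n)
    loop.call_later(0.01, countdown_callbacks, loop, n - 1, log, done)


async def countdown_coroutine(n, log):
    # Coroutine style: the whole sequence is one loop in one place.
    for i in range(n, 0, -1):
        log.append(i)
        await asyncio.sleep(0.01)
    log.append("liftoff")


async def main():
    loop = asyncio.get_running_loop()
    cb_log, co_log = [], []
    done = loop.create_future()
    countdown_callbacks(loop, 3, cb_log, done)
    await done  # wait until the callback chain signals completion
    await countdown_coroutine(3, co_log)
    return cb_log, co_log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Both versions produce the same log. The difference lies in where the control flow and the intermediate state live.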
Bonita Montero
2024-05-02 03:46:59 UTC
Permalink
Post by Lawrence D'Oliveiro
No, Boost.ASIO is event driven with asynchronous callbacks in a foreign
thread's context.
Callbacks can be a clunky way of event handling, since they force
you to break up your logic sequence into discontiguous pieces.
Callbacks are the most convenient use for asynchronous I/O.
Chris M. Thomasson
2024-05-02 20:33:55 UTC
Permalink
Post by Bonita Montero
Post by Lawrence D'Oliveiro
No, Boost.ASIO is event driven with asynchronous callbacks in a foreign
thread's context.
Callbacks can be a clunky way of event handling, since they force
you to break up your logic sequence into discontiguous pieces.
Callbacks are the most convenient use for asynchronous I/O.
They work nicely if you know what you are doing. I have seen some
nightmare code with callbacks. Usually due to holding a lock while
executing a callback. This is not good at all. Especially when the lock
is recursive. Yikes! Oh, this is bringing back some memories. Oh crap.
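One common way around it, sketched in Python with a hypothetical `Notifier` class: copy the callback list while holding the lock, release the lock, and only then invoke the callbacks. A callback that re-enters the object then needs no recursive lock and cannot deadlock on it.

```python
import threading


class Notifier:
    def __init__(self):
        self._lock = threading.Lock()  # deliberately non-recursive
        self._callbacks = []

    def subscribe(self, callback):
        with self._lock:
            self._callbacks.append(callback)

    def publish(self, event):
        with self._lock:
            snapshot = list(self._callbacks)  # copy under the lock
        for callback in snapshot:             # call with the lock released
            callback(event)


notifier = Notifier()
seen = []


def first(event):
    seen.append(event)
    # Re-entrant call: would deadlock if publish() still held the lock.
    notifier.subscribe(lambda e: seen.append(("late", e)))


notifier.subscribe(first)
notifier.publish("x")
notifier.publish("y")
```

The trade-off is that a callback added during `publish` only sees later events, which is usually the behaviour you want anyway.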
Lawrence D'Oliveiro
2024-05-03 22:22:16 UTC
Permalink
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Callbacks can be a clunky way of event handling, since they force you
to break up your logic sequence into discontiguous pieces.
Callbacks are the most convenient use for asynchronous I/O.
Wonder why the C++ folks are proposing to add async/await, then ...

C++ is already about 5× the complexity of Python, yet still nowhere near
as expressive.
Scott Lurndal
2024-04-30 16:48:51 UTC
Permalink
Post by Paavo Helde
Post by Stefan Ram
|Anyway, multithreading performance is a non-issue for Python so far as
|the Python interpreter runs in a single-threaded regime anyway, under a
|global GIL lock. They are planning to get rid of GIL, but this work is
|still in development AFAIK. I'm sure it will take years to stabilize the
|whole Python zoo without GIL.
The GIL only prevents multiple Python statements from being
interpreted simultaneously, but if you're waiting on inputs (like
sockets), it's not active, so that could be distributed across
multiple cores.
With asyncio, however, you can easily handle the application
for threads to "wait in parallel" for thousands of sockets in a
single thread, and there are fewer opportunities for errors than
with multithreading.
In C++, async io is provided e.g. by the asio library.
And the POSIX aio interfaces, on systems that support them.

I used lio_listio heavily in Oracle's RDBMS.
Bonita Montero
2024-04-29 16:18:57 UTC
Permalink
Post by Stefan Ram
With asyncio, however, you can easily handle the application
for threads to "wait in parallel" for thousands of sockets in a
single thread, and there are fewer opportunities for errors than
with multithreading.
But you need multithreading to have maximum throughput since you often
process the data while other data is available.
Lawrence D'Oliveiro
2024-04-29 20:31:15 UTC
Permalink
Post by Bonita Montero
But you need multithreading to have maximum throughput since you often
process the data while other data is available.
In a lot of applications, the bottleneck is the network I/O, or a GUI
waiting for the next user event, that kind of thing. In this situation,
multithreading is more trouble than it’s worth. This is why coroutines (in
the form of async/await) have made a comeback over the last decade or so.
Bonita Montero
2024-04-30 03:58:31 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Bonita Montero
But you need multithreading to have maximum throughput since you often
process the data while other data is available.
In a lot of applications, the bottleneck is the network I/O, or a GUI
waiting for the next user event, that kind of thing. In this situation,
multithreading is more trouble than it’s worth. This is why coroutines (in
the form of async/await) have made a comeback over the last decade or so.
Having a single thread and using state machines is more effort.
Lawrence D'Oliveiro
2024-04-30 04:09:00 UTC
Permalink
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Bonita Montero
But you need multithreading to have maximum throughput since you often
process the data while other data is available.
In a lot of applications, the bottleneck is the network I/O, or a GUI
waiting for the next user event, that kind of thing. In this situation,
multithreading is more trouble than it’s worth. This is why coroutines
(in the form of async/await) have made a comeback over the last decade
or so.
Having a single thread and using state machines is more effortz.
It would indeed. That’s why coroutines (async/await) are so handy.
Bonita Montero
2024-04-30 05:59:06 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Bonita Montero
But you need multithreading to have maximum throughput since you often
process the data while other data is available.
In a lot of applications, the bottleneck is the network I/O, or a GUI
waiting for the next user event, that kind of thing. In this situation,
multithreading is more trouble than it’s worth. This is why coroutines
(in the form of async/await) have made a comeback over the last decade
or so.
Having a single thread and using state machines is more effortz.
It would indeed. That’s why coroutines (async/await) are so handy.
Using a thread is even more handy.
Lawrence D'Oliveiro
2024-04-30 06:42:58 UTC
Permalink
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Having a single thread and using state machines is more effortz.
It would indeed. That’s why coroutines (async/await) are so handy.
Using a thread is even more handy.
Do you know what a “heisenbug” is?
Bonita Montero
2024-04-30 07:37:18 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Having a single thread and using state machines is more effortz.
It would indeed. That’s why coroutines (async/await) are so handy.
Using a thread is even more handy.
Do you know what a “heisenbug” is?
This has nothing to do with a heisenbug. Threads perform slightly
worse than managing state with a coroutine, but they're more
convenient to develop. In Python the difference wouldn't count.
Lawrence D'Oliveiro
2024-04-30 09:09:32 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Having a single thread and using state machines is more effortz.
It would indeed. That’s why coroutines (async/await) are so handy.
Using a thread is even more handy.
Do you know what a “heisenbug” is?
[No]
Do you know what a “race condition” is?
Bonita Montero
2024-04-30 09:25:13 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Having a single thread and using state machines is more effort.
It would indeed. That’s why coroutines (async/await) are so handy.
Using a thread is even more handy.
Do you know what a “heisenbug” is?
[No]
Do you know what a “race condition” is?
Race conditions are mostly easy to handle. If you do I/O and the
send / receive operations are atomic, you wouldn't even need a mutex, and
the race condition isn't a problem although it is not handled.
Chris M. Thomasson
2024-04-30 19:56:14 UTC
Permalink
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Having a single thread and using state machines is more effort.
It would indeed. That’s why coroutines (async/await) are so handy.
Using a thread is even more handy.
Do you know what a “heisenbug” is?
[No]
Do you know what a “race condition” is?
Race conditions are mostly easy to handle.
Have you fixed your massive bug in your DCL futex thing?
Post by Bonita Montero
If you do I/O and the
send / receive operations are atomic, you wouldn't even need a mutex, and
the race condition isn't a problem although it is not handled.
Lawrence D'Oliveiro
2024-04-30 20:22:40 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Having a single thread and using state machines is more effort.
It would indeed. That’s why coroutines (async/await) are so handy.
Using a thread is even more handy.
Do you know what a “heisenbug” is?
[No]
Do you know what a “race condition” is?
[No]
You haven’t actually done much thread programming, have you?
Chris M. Thomasson
2024-04-30 20:33:51 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Lawrence D'Oliveiro
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Having a single thread and using state machines is more effort.
It would indeed. That’s why coroutines (async/await) are so handy.
Using a thread is even more handy.
Do you know what a “heisenbug” is?
[No]
Do you know what a “race condition” is?
[No]
You haven’t actually done much thread programming, have you?
Apparently not! Yikes!
Ross Finlayson
2024-05-01 03:27:15 UTC
Permalink
Post by Chris M. Thomasson
Post by Lawrence D'Oliveiro
Post by Lawrence D'Oliveiro
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Having a single thread and using state machines is more effort.
It would indeed. That’s why coroutines (async/await) are so handy.
Using a thread is even more handy.
Do you know what a “heisenbug” is?
[No]
Do you know what a “race condition” is?
[No]
You haven’t actually done much thread programming, have you?
Apparently not! Yikes!
I think a lot of people had heard of the ACE threads toolkit
out of WUSTL. (It's a portability abstraction layer and
some mutexes.) Or, the "ACE object-oriented thread
encapsulation class library", for C++.


https://openscholarship.wustl.edu/cse_research/389/


I'm reading this again and it makes a lot more sense
now than it did then, and it was totally profound.

https://www.enseignement.polytechnique.fr/profs/informatique/Leo.Liberti/public/computing/parallel/threads/comp.programming.threads-faq.html


Back in the day there were some good books on "systems programming"
and "portable programming", like the Addison-Wesley series. When it
comes to Windows or NT, which is still sort of what Windows is,
there's Windows Internals, like the 3rd and 4th editions, about
where these things in Windows are born. It's good when eating
for digesting to thoroughly know the food, to thoroughly chew
the food, and to know where the food comes from. "Advanced Windows".
"Addison-Wesley Professional."

I'm a fan of BM's style, it's thoroughly sort of evolved.
It's like "if there's a type_trait I'll use it, if not
I know the constants". Also "I have a pretty good idea what
code the compiler generates for this code, and whether
it's extern linkage or not, and what it is if it is."
"What are its ordinals, what are its exports." Then knowing
about type_traits and this kind of thing, and metaprogramming
with templates in C++, is a pretty strong strength.

It's a good idea when eating the food to know where
the food comes from, and, you know, thoroughly chew
the food. Knowing where babies come from, this kind
of thing.


About design and "correctness, correctness, correctness:
pick one", there's something to be said for algorithms,
that resolve basically to compare-and-swap or CMPXCHG,
often about an idea of single-producer single-consumer
with a list of buffers and consuming the head of the
buffer and concatenating to the tail of the buffer,
while the consumer sees only the head and the producer
concatenates only to the tail.
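The single-producer single-consumer idea above can be sketched in Python as a bounded ring buffer. This swaps the "list of buffers" for a fixed array and is only an illustration: Python has no compare-and-swap primitive, so the sketch relies on index ownership (only the consumer writes the head, only the producer writes the tail) and on CPython executing each attribute store whole. A C or C++ version would need atomic loads and stores with suitable memory ordering. The class name `SpscRing` is hypothetical.

```python
from typing import Any


class SpscRing:
    """Bounded single-producer single-consumer ring buffer (illustrative)."""

    def __init__(self, capacity: int) -> None:
        # One slot stays empty so that "full" and "empty" are distinguishable.
        self._slots: list[Any] = [None] * (capacity + 1)
        self._head = 0  # next slot to read; written only by the consumer
        self._tail = 0  # next slot to write; written only by the producer

    def try_push(self, item: Any) -> bool:
        nxt = (self._tail + 1) % len(self._slots)
        if nxt == self._head:
            return False  # full: the producer never overwrites unread data
        self._slots[self._tail] = item
        self._tail = nxt  # publish only after the slot has been filled
        return True

    def try_pop(self) -> tuple[bool, Any]:
        if self._head == self._tail:
            return False, None  # empty
        item = self._slots[self._head]
        self._slots[self._head] = None  # drop the reference held by the buffer
        self._head = (self._head + 1) % len(self._slots)
        return True, item
```

The design point is that neither side ever writes the other side's index, so the two ends need no lock between them.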


Making modules so that modules are well-defined and
independent or, you know, with their logical and concrete
adapters, and then making it so that they're machines
and all sort of work together, is a challenge of
"library design". I.e., libraries and patterns are
supposed to be composed and consumed by users, programmers,
not necessarily written by them. Yet, it's the users'
fault if they don't know where their food comes from.

"Components", libraries, modules, ....

Composability is key. At some point somebody's going
to just concatenate the files and compile them.


These days things are often sort of simplified
into "the four facilities: DB MQ WS FS", with
the atomicity of transactions, consuming the queue,
operations and faults, and filesystem atomicity.
Chris M. Thomasson
2024-04-30 19:55:33 UTC
Permalink
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Having a single thread and using state machines is more effort.
It would indeed. That’s why coroutines (async/await) are so handy.
Using a thread is even more handy.
Do you know what a “heisenbug” is?
This has nothing to do with a heisenbug. Threads are slightly less
performant than managing state with a coroutine,
Huh? Threads allow one to take advantage of the processing power of a
multi-core/socket system. We can multiplex coroutines on a single
thread, okay fine, ugg... This is not going to use the full spectrum of
a system wrt multiple processing units. We can decide to use at least as
many threads as there are cores. This can get the full power of said
system. Sometimes using number_of_cores * 2 threads might be in order.
Post by Bonita Montero
but they're more
convenient to develop. In Python the difference wouldn't count.
Bonita Montero
2024-05-01 03:39:19 UTC
Permalink
Post by Chris M. Thomasson
Huh? Threads allow one to take advantage of the processing power of a
multi-core/socket system.
I'm talking about the scenario Lawrence mentioned.
Chris M. Thomasson
2024-05-01 18:49:57 UTC
Permalink
Post by Bonita Montero
Post by Chris M. Thomasson
Huh? Threads allow one to take advantage of the processing power of a
multi-core/socket system.
I'm talking about the scenario Lawrence mentioned.
Oh. Sorry about that.
Blue-Maned_Hawk
2024-05-01 06:29:40 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Having a single thread and using state machines is more effort.
It would indeed. That’s why coroutines (async/await) are so handy.
Using a thread is even more handy.
Do you know what a “heisenbug” is?
You don't need threads to get heisenbugs.
--
Blue-Maned_Hawk│shortens to Hawk│/blu.mɛin.dʰak/│he/him/his/himself/Mr.
blue-maned_hawk.srht.site
1, 4, 2, 5, 3!
Lawrence D'Oliveiro
2024-05-01 07:09:09 UTC
Permalink
Post by Blue-Maned_Hawk
You don't need threads to get heisenbugs.
They are particularly prone to them.
Bonita Montero
2024-05-01 08:09:39 UTC
Permalink
Post by Blue-Maned_Hawk
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Bonita Montero
Having a single thread and using state machines is more effort.
It would indeed. That’s why coroutines (async/await) are so handy.
Using a thread is even more handy.
Do you know what a “heisenbug” is?
You don't need threads to get heisenbugs.
We're in the context of I/O with multiple (thread)-states attached.
With that heisenbugs are unlikely.
Ross Finlayson
2024-04-29 16:44:16 UTC
Permalink
Post by Stefan Ram
|Anyway, multithreading performance is a non-issue for Python so far as
|the Python interpreter runs in a single-threaded regime anyway, under a
|global GIL lock. They are planning to get rid of GIL, but this work is
|still in development AFAIK. I'm sure it will take years to stabilize the
|whole Python zoo without GIL.
The GIL only prevents multiple Python statements from being
interpreted simultaneously, but while a thread is waiting on input (like
sockets), the GIL is released, so that waiting could be distributed across
multiple cores.
With asyncio, however, you can easily have an application
"wait in parallel" on thousands of sockets in a
single thread, and there are fewer opportunities for errors than
with multithreading.
Additionally, there are libraries like numpy that use true
multithreading internally to distribute computational tasks
across multiple cores. By using such libraries, you can take
advantage of that. (Not to mention the AI libraries that have their
work done in highly parallel fashion by graphics cards.)
If you want real threads, you could probably work with Cython
sometimes.
Other languages like JavaScript seem to have an advantage there
because they don't know a GIL, but with JavaScript, for example,
it's because it always runs in a single thread overall. And in
the languages where there are threads without a GIL, you quickly
realize that programming correct non-trivial programs with
parallel processing is error-prone.
Often in Python you can use "ThreadPoolExecutor" to start
multiple threads. If the GIL then becomes a problem (which is
not the case if you're waiting on I/O), you can easily swap it
out for "ProcessPoolExecutor": Then processes are used instead
of threads, and there is no GIL for those.
If four cores are available, by dividing up compute-intensive tasks
using "ProcessPoolExecutor", you can expect a speedup factor of two
to eight.
With the Celery library, tasks can be distributed across multiple
processes that can also run on different computers. See, for
example, "Parallel Programming with Python" by Jan Palach.
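The executor swap described in the quoted text can be sketched as follows. The function names, the `EXECUTORS` table and the prime-counting workload are hypothetical stand-ins for "a compute-intensive task"; only `ThreadPoolExecutor`, `ProcessPoolExecutor` and `Executor.map` come from the standard library.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# The two pool types share one interface, which is what makes the swap cheap.
EXECUTORS = {"threads": ThreadPoolExecutor, "processes": ProcessPoolExecutor}


def count_primes(limit: int) -> int:
    # Deliberately CPU-bound pure-Python work: counts primes below `limit`.
    return sum(
        1
        for n in range(2, limit)
        if all(n % d for d in range(2, int(n ** 0.5) + 1))
    )


def run_with(kind: str, limits: list[int]) -> list[int]:
    # Same code path for both pool types; only the executor class differs.
    workers = os.cpu_count() or 1
    with EXECUTORS[kind](max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))
```

Calling `run_with("processes", ...)` instead of `run_with("threads", ...)` is the whole change. On platforms that start worker processes by spawning a fresh interpreter, the call should sit under an `if __name__ == "__main__":` guard. Any speedup depends on the workload and the machine, so measure rather than assume.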
It sort of seems there are two approaches to
the parallel, and the asynchronous.

There's, "divide-and-conquer", and "information-cooperation".

The linear-speedup of the embarrassingly parallel
in the divide-and-conquer, or single-instruction-multiple-data,
is a pretty great thing.

Notions like map-reduce when the count of values
per key is about same and thusly the computing
the aggregates (summaries, digests, aggregate
and analytic functions) can be accomplished by
horizontal scaling (more boxes with same resources),
is another usual divide-and-conquer approach
(horizontal scaling).
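A toy, single-process sketch in Python of that map-reduce shape: records are routed to partitions by key, and each partition computes its own aggregate independently. The function names are hypothetical, and a real system would run the partitions on separate boxes.

```python
from collections import defaultdict
from typing import Iterable


def map_phase(
    records: Iterable[tuple[str, int]], partitions: int
) -> list[dict[str, list[int]]]:
    # Route each (key, value) pair to a partition chosen by the key's hash,
    # so all values for one key land in the same partition.
    buckets: list[dict[str, list[int]]] = [
        defaultdict(list) for _ in range(partitions)
    ]
    for key, value in records:
        buckets[hash(key) % partitions][key].append(value)
    return buckets


def reduce_phase(bucket: dict[str, list[int]]) -> dict[str, int]:
    # Each partition aggregates on its own; no other partition is consulted.
    return {key: sum(values) for key, values in bucket.items()}


def map_reduce(
    records: Iterable[tuple[str, int]], partitions: int = 2
) -> dict[str, int]:
    totals: dict[str, int] = {}
    for bucket in map_phase(records, partitions):
        totals.update(reduce_phase(bucket))
    return totals
```

When the number of values per key is roughly even, the partitions finish at about the same time, which is the condition the post gives for horizontal scaling to work well.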

Once upon a time there was this great idea called
"Aglets" or "mobile agents" or "mobile code". This
is basically that a functional program is distributed
to nodes, to run on the facilities of the nodes with
some resources, then to return to the "aglet-hive"
what results can be composed. This is also usually
called anything the "agent" or "instrumentation"
on the box. (The box, a process model, its processes,
their threads, their "inter-thread calls", their "inter-process
calls", their network, a node, a box. Aglets are the
little caps or tape at the end of shoe-laces, here
with regards to notions like "aggregate functions"
and "analytic functions".)


The cooperation is basically any notion of a callback.
The callback is one of the fundamental notions of
flow-of-control, and about the most elementary
notion of the functional paradigm in otherwise
the procedural or the imperative paradigm.

Otherwise for threads to fork, to divide, then
whether they join, is a callback.

So, first learning the idea of a callback is like,
"you mean I need to provide a different entry
point for this code to return and then where
I'm at is exiting forever as if in a shell process
model exec'ing another process and resulting
that this process becomes that one", and it's
like "yeah, you just give it a callback address
and what results is that's where it goes".
It's functional.

(Functional/event-driven, procedural/imperative.)


Some people learn functional first, and others
procedural first. It's hard to say how people
think, in their mental models of the things,
which is pretty much always flow-machines.
The chip, or the old planar integrated-circuit
the usually standard logic the chip, is systolic,
the systolic flow driven by the systolic clock,
that most people have a flow-model of code.


So, there's callbacks, and then there's funnels
and distributors, say, then as with regards to
something like a "Clos network", any kind of
usual model of data-flow, it's a flow-machine.

Funnels/sprinklers: Venturi effect.

In flow machines, there's basically something
like the "Ford-Fulkerson flow algorithm", which is
a sort of algorithm that formalizes
and optimizes flow (the maximum flow through a network).


Threads fork, and they also join.

The only hardware threads are independent cores,
and, their semantics of memory barriers,
according to their clocks. The rest is organization
of context and routine and state and stack,
and for the general purpose usually pre-emptive,
or, "time-sharing".

Or, you know, "nodes".


It's a time-sharing system.


So, there's processes and a process model,
there's the inter-process, then there's threading
models, and the re-entrant and shared and
the mutex, according to ordering and serial
guarantees or "delivery", it's message-passing
of course, vis-a-vis "the core" or memory,
a monad or a purely functional state,
it's a distributed system of nodes.


Once there was an initiative called "Parallel C",
language and compiler extensions to support
language constructs embodying the notions of
the parallel.

Somebody came up with pi calculus, process
calculus, communicating sequential processes
and the like. I've heard of Dijkstra's law yet I
forget it, and any entry point is a "goto".

In clusters, there's a usual notion of message-passing,
often organized about the process model among nodes
of the cluster. There's MPI and old Sun Grid Engine.
Once there was Wolfpack cluster. The clusters are
often mounted in a rack together and about things
like Infiniband networking and NUMA memory.

"HPC" they call it, though that includes both
clusters the horizontally scale-able, and also
computers of the super-scalar word variety.

The process model, is the usual way of the old
control-plane way to organize processes with
shared resources and separate quotas, and
to support independent process spaces,
making for fork as spawn and otherwise
though pipes and message-slots what make
for join. Then there are threads, OS threads,
and in some cases fibers, OS threads, as
about processes, OS threads.

Some usual virtual machines or runtimes like
ye olde Java, have threads and synchronization
for barriers and monitors and mutexes and what,
about system calls and barriers and monitors and
mutexes, in the process model.

The interpreted runtimes are usually "single-threaded",
about though the notions of event loops and responsiveness.
That follows from "a Win32 app the message pump", then
mostly these days since "a JavaScript binding for the script
element binding of an HTML with HTTP user-agent, for
UI-Events according to old W3C now whatwg, and maybe
it's ECMAScript and with modules or something, and then
also there's new-fangled web workers which are threads for
ECMAScript or JavaScript which are about the same".

Writing algorithms in event loops is a sort of
exercise in frustration, in a sense. Yet, when
recursion is figured out as having to build a state
instead of just filling the stack, it's a thing.
(A single-threaded thing.)
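As a sketch of "building a state instead of filling the stack", here is a tree sum in Python written so that each call does one bounded piece of work and returns, the way a handler in an event loop must. The `Node` and `TreeSum` names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    value: int
    children: list["Node"] = field(default_factory=list)


class TreeSum:
    """Sums a tree one node per step, keeping pending work in an explicit list."""

    def __init__(self, root: Node) -> None:
        self._pending = [root]  # takes the place of the recursive call stack
        self.total = 0

    def step(self) -> bool:
        # Process a single node, then return so other events can be handled.
        # The return value says whether more work remains.
        if not self._pending:
            return False
        node = self._pending.pop()
        self.total += node.value
        self._pending.extend(node.children)
        return bool(self._pending)
```

An event loop would call `step()` once per tick; a plain `while job.step(): pass` drives it to completion in a test.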

These days most distributed algorithms are sort
of advised for "horizontal scaling" and "eventual
consistency" with often "opportunistic locks" in
a world of "Murphy guarantees the un-ordered".

Then transactions of the more critical sort are
often "boxes with huge RAM rollback segments"
or as with regards to "matching and reconciliation",
after the fact.


Then of course obligatory about C++, or, C/C++,
it's about OS threads and, "guarantees".


Here my approach is "re-routines". "Re-routines:
it's a co-routine, though instead of suspending
it just quits, and instead of switching it just
builds its own monad for the recursion, and
instead of having callbacks, it's always callbacks,
and instead of having futures everywhere,
it's futures everywhere. Try, try again."

In the process model in the runtime though,
it's mostly about what services the bus DMA.
"Systems", programming.

Boxes is nodes, ....
Ross Finlayson
2024-04-29 18:51:07 UTC
Permalink
Post by Ross Finlayson
Post by Stefan Ram
|Anyway, multithreading performance is a non-issue for Python so far as
|the Python interpreter runs in a single-threaded regime anyway, under a
|global GIL lock. They are planning to get rid of GIL, but this work is
|still in development AFAIK. I'm sure it will take years to stabilize the
|whole Python zoo without GIL.
The GIL only prevents multiple Python statements from being
interpreted simultaneously, but while a thread is waiting on input (like
sockets), the GIL is released, so that waiting could be distributed across
multiple cores.
With asyncio, however, you can easily have an application
"wait in parallel" on thousands of sockets in a
single thread, and there are fewer opportunities for errors than
with multithreading.
Additionally, there are libraries like numpy that use true
multithreading internally to distribute computational tasks
across multiple cores. By using such libraries, you can take
advantage of that. (Not to mention the AI libraries that have their
work done in highly parallel fashion by graphics cards.)
If you want real threads, you could probably work with Cython
sometimes.
Other languages like JavaScript seem to have an advantage there
because they don't know a GIL, but with JavaScript, for example,
it's because it always runs in a single thread overall. And in
the languages where there are threads without a GIL, you quickly
realize that programming correct non-trivial programs with
parallel processing is error-prone.
Often in Python you can use "ThreadPoolExecutor" to start
multiple threads. If the GIL then becomes a problem (which is
not the case if you're waiting on I/O), you can easily swap it
out for "ProcessPoolExecutor": Then processes are used instead
of threads, and there is no GIL for those.
If four cores are available, by dividing up compute-intensive tasks
using "ProcessPoolExecutor", you can expect a speedup factor of two
to eight.
With the Celery library, tasks can be distributed across multiple
processes that can also run on different computers. See, for
example, "Parallel Programming with Python" by Jan Palach.
It sort of seems there are two approaches to
the parallel, and the asynchronous.
There's, "divide-and-conquer", and "information-cooperation".
The linear-speedup of the embarrassingly parallel
in the divide-and-conquer, or single-instruction-multiple-data,
is a pretty great thing.
Notions like map-reduce when the count of values
per key is about same and thusly the computing
the aggregates (summaries, digests, aggregate
and analytic functions) can be accomplished by
horizontal scaling (more boxes with same resources),
is another usual divide-and-conquer approach
(horizontal scaling).
Once upon a time there was this great idea called
"Aglets" or "mobile agents" or "mobile code". This
is basically that a functional program is distributed
to nodes, to run on the facilities of the nodes with
some resources, then to return to the "aglet-hive"
what results can be composed. This is also usually
called anything the "agent" or "instrumentation"
on the box. (The box, a process model, its processes,
their threads, their "inter-thread calls", their "inter-process
calls", their network, a node, a box. Aglets are the
little caps or tape at the end of shoe-laces, here
with regards to notions like "aggregate functions"
and "analytic functions".)
The cooperation is basically any notion of a callback.
The callback is one of the fundamental notions of
flow-of-control, and about the most elementary
notion of the functional paradigm in otherwise
the procedural or the imperative paradigm.
Otherwise for threads to fork, to divide, then
whether they join, is a callback.
So, first learning the idea of a callback is like,
"you mean I need to provide a different entry
point for this code to return and then where
I'm at is exiting forever as if in a shell process
model exec'ing another process and resulting
that this process becomes that one", and it's
like "yeah, you just give it a callback address
and what results is that's where it goes".
It's functional.
(Functional/event-driven, procedural/imperative.)
Some people learn functional first, and others
procedural first. It's hard to say how people
think, in their mental models of the things,
which is pretty much always flow-machines.
The chip, or the old planar integrated-circuit
the usually standard logic the chip, is systolic,
the systolic flow driven by the systolic clock,
that most people have a flow-model of code.
So, there's callbacks, and then there's funnels
and distributors, say, then as with regards to
something like a "Clos network", any kind of
usual model of data-flow, it's a flow-machine.
Funnels/sprinklers: Venturi effect.
In flow machines, there's basically something
like the "Ford-Fulkerson flow algorithm", which is
a sort of algorithm that formalizes
and optimizes flow (the maximum flow through a network).
Threads fork, and they also join.
The only hardware threads are independent cores,
and, their semantics of memory barriers,
according to their clocks. The rest is organization
of context and routine and state and stack,
and for the general purpose usually pre-emptive,
or, "time-sharing".
Or, you know, "nodes".
It's a time-sharing system.
So, there's processes and a process model,
there's the inter-process, then there's threading
models, and the re-entrant and shared and
the mutex, according to ordering and serial
guarantees or "delivery", it's message-passing
of course, vis-a-vis "the core" or memory,
a monad or a purely functional state,
it's a distributed system of nodes.
Once there was an initiative called "Parallel C",
language and compiler extensions to support
language constructs embodying the notions of
the parallel.
Somebody came up with pi calculus, process
calculus, communicating sequential processes
and the like. I've heard of Dijkstra's law yet I
forget it, and any entry point is a "goto".
In clusters, there's a usual notion of message-passing,
often organized about the process model among nodes
of the cluster. There's MPI and old Sun Grid Engine.
Once there was Wolfpack cluster. The clusters are
often mounted in a rack together and about things
like Infiniband networking and NUMA memory.
"HPC" they call it, though that includes both
clusters the horizontally scale-able, and also
computers of the super-scalar word variety.
The process model, is the usual way of the old
control-plane way to organize processes with
shared resources and separate quotas, and
to support independent process spaces,
making for fork as spawn and otherwise
though pipes and message-slots what make
for join. Then there are threads, OS threads,
and in some cases fibers, OS threads, as
about processes, OS threads.
Some usual virtual machines or runtimes like
ye olde Java, have threads and synchronization
for barriers and monitors and mutexes and what,
about system calls and barriers and monitors and
mutexes, in the process model.
The interpreted runtimes are usually "single-threaded",
about though the notions of event loops and responsiveness.
That follows from "a Win32 app the message pump", then
mostly these days since "a JavaScript binding for the script
element binding of an HTML with HTTP user-agent, for
UI-Events according to old W3C now whatwg, and maybe
it's ECMAScript and with modules or something, and then
also there's new-fangled web workers which are threads for
ECMAScript or JavaScript which are about the same".
Writing algorithms in event loops is a sort of
exercise in frustration, in a sense. Yet, when
recursion is figured out as having to build a state
instead of just filling the stack, it's a thing.
(A single-threaded thing.)
These days most distributed algorithms are sort
of advised for "horizontal scaling" and "eventual
consistency" with often "opportunistic locks" in
a world of "Murphy guarantees the un-ordered".
Then transactions of the more critical sort are
often "boxes with huge RAM rollback segments"
or as with regards to "matching and reconciliation",
after the fact.
Then of course obligatory about C++, or, C/C++,
it's about OS threads and, "guarantees".
it's a co-routine, though instead of suspending
it just quits, and instead of switching it just
builds its own monad for the recursion, and
instead of having callbacks, it's always callbacks,
and instead of having futures everywhere,
it's futures everywhere. Try, try again."
In the process model in the runtime though,
it's mostly about what services the bus DMA.
"Systems", programming.
Boxes is nodes, ....
Often these days, "nodes" is "virts".

It's a time-sharing system.


It's often all considered "flows",
mechanical flows and interactive flows,
nonblocking flows and blocking flows,
idempotent flows and non-side-effect-free flows,
logged, journaled, audited, or not,
"work" flows.

(It's business objects.)

Async is grief, .... (Was "dames is grief", which is
examined through a post-modernist deconstructivist
lens of revisionist neologism. Revisionist, not disfigurist.
"You want it when?")

https://en.wikipedia.org/wiki/CAP_theorem


It's a time-sharing system.


So, the usual idea of a sort of forward-safe
distributed algorithm, involves, vending a unique
ID, which is critical transactionally, and then for its data,
that it has a "forward-only" state machine, for that
as a model of asynchronous submission and completion,
it's consistent and eventually consistent, while opportunistic
locking is to prevent corruption, and pessimistic checking
is to follow up completion.

This is for a sort of usual "model of a dispatch and
completion of a distributed asynchronous routine",
workflow.

So, it's all idempotent and safe, and the bottleneck
is vending the ID, and, enough rollback state record-wise
to keep it consistent (and correct). I.e., there's no
"safe" distributed algorithm without at least one bottleneck,
and a well-defined state-machine with only forward transitions,
and that actions on it are idempotent.
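A single-process sketch in Python of the three ingredients named above: one place that vends IDs, a state machine with only forward transitions, and idempotent actions. The state names and the `Workflow` class are hypothetical, and a distributed version would need the ID vendor and the state store to be shared and durable.

```python
import itertools
from enum import IntEnum


class State(IntEnum):
    SUBMITTED = 1
    RUNNING = 2
    COMPLETED = 3


class Workflow:
    def __init__(self) -> None:
        self._ids = itertools.count(1)  # the single ID vendor: the bottleneck
        self._states: dict[int, State] = {}

    def submit(self) -> int:
        # Vend a unique ID and record the initial state for it.
        job_id = next(self._ids)
        self._states[job_id] = State.SUBMITTED
        return job_id

    def advance(self, job_id: int, target: State) -> State:
        # Forward-only: a late or repeated message never moves the state back,
        # so applying the same transition twice has the same effect as once.
        if target > self._states[job_id]:
            self._states[job_id] = target
        return self._states[job_id]
```

Because `advance` ignores anything that is not a step forward, duplicate deliveries are harmless, which covers the "no dupes" half of the delivery rules; "no drops" still has to come from retries on the sending side.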

"On the order of orders", or "events" I suppose.

This these days is about "guarantees" and "deliveries",
and the two rules: 1) no drops, 2) no dupes.


It's business objects, it's workflows, it's a time-sharing system.


What's old is new again. Also, old wrapped as new.
Usually enough running eventually on old, ....
Ross Finlayson
2024-05-02 03:09:48 UTC
Permalink
Post by Ross Finlayson
Then of course obligatory about C++, or, C/C++,
it's about OS threads and, "guarantees".
it's a co-routine, though instead of suspending
it just quits, and instead of switching it just
builds its own monad for the recursion, and
instead of having callbacks, it's always callbacks,
and instead of having futures everywhere,
it's futures everywhere. Try, try again."
In the process model in the runtime though,
it's mostly about what services the bus DMA.
"Systems", programming.
Boxes is nodes, ....
Re-Routines

So, the idea of the re-routine, is a sort of co-routine. That is, it
fits the definition of being a co-routine, though with the difference that
when the asynchronous filling of the memo of its operation is unfulfilled, it
quits by throwing an exception, and is then expected to be called again
when the filling of the memo is fulfilled, so that it returns.
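One possible reading of that description, as a minimal Python sketch. Everything here is an assumption made for illustration: the names `Unready`, `Memo`, `greeting` and `drive`, the string keys, and the `backend` dictionary standing in for asynchronous work. It also simplifies by raising at the point of lookup rather than on later use of an un-usable value.

```python
class Unready(Exception):
    """Raised when a result the routine needs has not been filled in yet."""


class Memo:
    def __init__(self) -> None:
        self.results: dict[str, object] = {}
        self.requested: list[str] = []

    def get(self, key: str) -> object:
        if key not in self.results:
            if key not in self.requested:
                self.requested.append(key)  # ask for it once
            raise Unready(key)  # quit instead of suspending
        return self.results[key]


def greeting(memo: Memo) -> str:
    # Plain sequential code; each re-run replays fulfilled steps from the memo.
    user = memo.get("user")
    city = memo.get(f"city-of-{user}")
    return f"{user} from {city}"


def drive(routine, memo: Memo, backend: dict[str, object]):
    # Stand-in for the task worker: after the routine quits, fill in what it
    # asked for and call it again from the top. Returns (result, run count).
    runs = 0
    while True:
        runs += 1
        try:
            return routine(memo), runs
        except Unready:
            for key in memo.requested:
                memo.results[key] = backend[key]  # the "callback" fills the memo
            memo.requested.clear()
```

With two dependent lookups the routine runs three times: twice quitting early and once to completion. The cost of the approach is that re-executed code must be free of side effects up to the point where it quits.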

The idea is that re-routines are originated in an origination or
initiation context, an original re-routine, then it invokes either other
re-routines, or plain code with an adaptor to keep the routine going,
side routines, then as with regards to exit routines, and return routines.

It's sort of in the language of the comedy routine, yet is a paradigm in
the parallel and concurrent process model of the cooperating
multithreading, and the co-routine. It's a model of cooperative
multithreading, with the goal of being guaranteed by the syntax of the
language, and little discipline.

The syntax and language of the re-routine is a subset of the ordinary
syntax and language of the runtime.

The basic expectation is that a ready result is a "usable" object or
value, and behaves entirely ordinarily, while an unready result, is an
"un-usable" object or value, that can be assigned to an lvalue in the
usual definition, or added to a collection, yet when de-referenced or
accessed, or via inspection, is determined "un-usable", with a defined
behavior, to throw an exception, or throwable, and with the idea that
it's not even necessarily a declared or checked exception. In languages
with exception handling, exceptions un-wind the stack context of their
invocation until, as from they were thrown ("throw", "raise"), they are
caught ("catch", "rescue").

It's figured that copying and moving around an "un-usable" object is
ordinary, then that any sort of access to its object or value throws, and that
any re-routine, has it so that any object or value it results returning,
is only "usable" objects or values, or collections of usable objects or
values. It's figured that collections or any other holders, are OK with
un-usable objects, only that effectively de-referencing the object or
value that is un-usable, throws an un-usable exception.
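A small Python sketch of such a holder: it can be copied around and stored in collections while unready, and only access to the value throws. The names `Holder` and `UnusableError` are hypothetical and not taken from any library.

```python
class UnusableError(Exception):
    """Raised on access to a value that has not been fulfilled yet."""


_UNSET = object()  # private sentinel, so that None remains a legal value


class Holder:
    def __init__(self) -> None:
        self._value: object = _UNSET

    def fulfill(self, value: object) -> None:
        self._value = value

    def get(self) -> object:
        # Passing the holder around is always fine; only this access can throw.
        if self._value is _UNSET:
            raise UnusableError("value not ready")
        return self._value
```

A list such as `[Holder(), Holder()]` is therefore an ordinary collection; the exception appears only where code actually needs the value.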

So, in languages like Java and C#, which run in a runtime that
interprets the objects and values, and where there's a reserved value
assignable to any object or value type, un-ready results are just
"null". In languages like C++, where there are no un-usable objects of
this type, and where the semantics of assignment may be complicated, and
where de-referencing "null" causes a segfault or machine error instead
of an exception that can be caught, re-routines return the type of a
value-holder for the object or value, called a "future", then that any
access to the object or value through the holder can "throw" if
un-ready, or return the object or value when ready. The
"future" is a library type and mostly common in all languages, and
already has these loose semantics, then that it's un-necessary in Java
or C# because why bother cluttering the signature of these
re-routines, though maybe it's a good idea anyways, whether catching
"NullPointerExceptions" is any more involved than catching
"FutureExceptions".

The re-routine thusly, meets only a Java declaration, or as of a pure
abstract C++ method, its method signature is any usual signature that
expects usable arguments, and throws any exception specification, and
returns usable return values, though with never throwing the exceptions,
that indicate the unfulfilled, that is the pending fulfilled, "un-usable
exceptions".

Then, the result of a re-routine, is its usable return value, or, what
usable exceptions, are from the normal flow-of-control in the normal
syntax of the language, and the behavior is defined thusly exactly the
same. The originator of a re-routine, gets called-back with the result,
either the return value or an exception that it can itself re-throw or
swallow or translate, re-routines that call re-routines just get the
return values or have thrown the exceptions, re-routines that call
side-routines, must have that the side-routine calls-back, what is
otherwise an "exit" routine, from the re-routine.

(As a matter of style, then indicating that re-routines are re-routines,
there's one original re-routine, it calls re-routines or side-routines,
and if it calls side-routines, then it's figured that it's an exit
routine, yet not in the usual sense that "exit" means to "exit" the
runtime, just that it expects the side-routine to invoke a call-back it
provides, to re-enter the re-routine.)


The context of the origination, then, is that a thread is expected to
pick up these re-routines as tasks, from a queue or provider, a supplier
of the tasks. The idea is that each re-routine is a class instance, i.e.
an object that is an instance of a class, and the instance has its memo
associated with it, from the origin.

In languages like Java, there's a "ThreadLocal", and in C++ there's a
storage specification, "thread_local". When the task supplier is
invoked, it's in the calling context, of the thread, a worker or task
worker. The body of "get()", sets the values of otherwise the static or
global ThreadLocal or thread_local, of the task's memo, then that as
long as the thread is working on the task, the re-routine's instance
access to the memo, is specific to original re-routine, and the
re-routines it calls, all in the context, of the same thread. It's
figured that any state of the re-routine, is specific to the instance of
the re-routine, and the thread, and its scope, its thread locals, and
globals. The re-routine instance may be of a bunch of re-routine
instances so their memo is the thread local memo. The memo's nowhere
part of the method signatures, of the re-routines.


Callers calling Re-Routines

This is the origination of a re-routine: it's basically an exit-routine
from the caller, to the submission to the task queue of the re-routine
originator, with calling back the caller.

Re-Routines calling Re-Routines

Re-routines, basically have an interceptor or layer, an aspect, before
invoking the body of the routine.

A: step -1) if the usable return value or usable exception is already in
the memo, return it (throw it respectively)
B: step 0) if any of the arguments, or any of the held values, are
un-usable, then throw an un-usable exception

C: step 1) invoke the body of the re-routine, and
C1: if calling a side-routine, put an un-usable return value in the
memo, and invoke the side-routine, and return an un-usable object
C2: if calling a re-routine, it's these same semantics, the re-routines
keep the same semantics going
C9: when eventually returning or throwing a usable exception, put it in
the memo
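
The steps above can be sketched roughly as follows, in Python. This is an
illustration under assumptions, not the actual implementation: the names
`PENDING`, `Unusable`, `re_routine`, `side_routine` and the demo functions
are all hypothetical, the memo is a plain dictionary keyed by a caller-chosen
key, and the asynchronous completion is simulated by a list of callbacks
that the demo loop fires by hand.

```python
PENDING = object()          # stands in for the "un-usable object"


class Unusable(Exception):
    """The "un-usable exception": some input is not fulfilled yet."""


def lookup(memo, key):
    """Step -1: return, or re-throw, a result already in the memo."""
    kind, payload = memo[key]
    if kind == "error":
        raise payload
    return payload


def re_routine(memo, key, body, *args):
    if key in memo:                              # step -1
        return lookup(memo, key)
    if any(arg is PENDING for arg in args):      # step 0
        raise Unusable(key)
    try:
        result = body(*args)                     # step 1
    except Unusable:
        raise                                    # not a result: memoize nothing
    except Exception as error:
        memo[key] = ("error", error)             # C9: usable exception
        raise
    if result is not PENDING:
        memo[key] = ("value", result)            # C9: usable return value
    return result


def side_routine(memo, key, start):
    if key not in memo:                          # C1: first visit only
        memo[key] = ("value", PENDING)
        start(lambda value: memo.__setitem__(key, ("value", value)))
    return lookup(memo, key)


def run_demo():
    memo, callbacks, launches = {}, [], []

    def fetch(name):
        def start(callback):                     # hypothetical async launch
            launches.append(name)
            callbacks.append(lambda: callback(name.upper()))
        return side_routine(memo, ("fetch", name), start)

    def concat(a, b):
        return a + b

    def original():
        a = fetch("a")      # both siblings get launched on the first pass
        b = fetch("b")
        return re_routine(memo, "concat", concat, a, b)

    attempts = 0
    while True:
        attempts += 1
        try:
            return original(), attempts, launches
        except Unusable:
            for fire in callbacks:               # side-routines "call back"
                fire()
            callbacks.clear()
```

On the first pass both `fetch` calls are launched before anything throws,
and the second pass runs straight through on memoized values, so
`run_demo()` needs two attempts and launches each side-routine once.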

Re-Routines calling Side-Routines

It's figured that re-routines calling side-routines, makes the
re-routine an exit-routine, then that when it's called back by the
side-routine, is to initiate the exit-routine as an
exit-re-enter-routine. The idea is that the exit routine provides a
callback, and invokes whatever function in whatever thread context, and
the specification of the callback, is that the original initiator or
originator supplier, has a way to re-submit the task, of the exit-routine.

Then there isn't really a strong compile-time guarantee that
side-routines call back their exit-re-enter-routine. It's figured that
side-routines must accept a signature of the call-back, and it's figured
they do, thus that the side-routines call back with the return value or
exception, and the callback body puts the return value or exception on
the memo, translated as necessary to a usable object or a usable
exception (or a translation of an unusable object or exception as a
usable exception), then re-submits what was the exit-routine, which as
the exit-re-enter-routine now completes. This is how the routine is
re-entered: the re-routine is re-entrant.

Re-Routines are Re-Entrant

The idea here is that the routine that's unready, when it's fulfilled,
it can either call all over again the entire original re-routine, or, it
can just invoke itself, that on completion invoking its re-routine
caller and so on and so forth, about whether a re-routine can act as a
side-routine in this manner, or it just always calls the original
re-routine which runs all the way through.

The idea is that a re-routine, is entirely according to the flow of
control, and also that all its iterations are ordered, or contained in a
re-routine that anything un-ordered must be the last thing that comes
out of a re-routine, so that in the memo, the entire call-graph of a
re-routine, is just a serial ordering in the siblings, and a very simple
tree, which results in a breadth-first traversal, in the access to the memo,
the organization of the memo, and the maintenance of the memo.

As the Re-routine is going along, there is that, in the normal
flow-of-control of each re-routine, it's serial, so, the original
re-routine has an entry-point, and that is the root of the memo-tree.
Then, whatever re-routines it calls, are siblings. The idea is that as a
data structure, when a sibling is created, is also created a root for
its children. So, the siblings get added to the root for the children,
and each has added an empty root for its children. The value of the
tree-node, is a holder of either an object or exception, of the
re-routine, to be initially populated with unusable object and
exception. The tree-node is created, on the entry point of the re-routine.

Then, the behavior of re-routines calling re-routines, basically has to
establish for a given invocation of the re-routine, what is its root.
This is basically a path as a list of integers, the n'th child's n'th
child's n'th child's n'th child.

Then, about the maintenance of the tree, is to make it so that it
needs to be thread-safe, as any re-routine can write on the memo at any
time. It needs to be thread-safe without any blocking by locking, if
possible, and the only way it can block is by locking, and it can't
deadlock.

The count of siblings is un-known, then as whether to just make a memory
organization, that the re-routine knows its ancestry integers, so the
memo is just a sequence of zero-terminated integer sequences with
object-exception pairs, then those are just concatenated to the memo,
and lookup is linear in the memo.
(Though, the first encountered values are first, and search can run from
both sides taking turns.) I.e., the re-routine knows its ancestry
integers when its value results from a re-routine or
exit-re-enter-routine, then it updates the memo by concatenating its
ancestry integers and the usable value/exception.
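
As a rough sketch of that flat memo organization, assuming Python tuples
in place of zero-terminated integer sequences: the class name `FlatMemo`
and its methods are hypothetical, and thread-safety is deliberately left
out of this illustration.

```python
class FlatMemo:
    """Append-only memo of (ancestry path, value, exception) records."""

    def __init__(self):
        self.records = []

    def put(self, path, value=None, error=None):
        """Concatenate a record for the given ancestry integers."""
        self.records.append((tuple(path), value, error))

    def get(self, path):
        """Linear lookup; the most recent record for a path wins."""
        path = tuple(path)
        for recorded, value, error in reversed(self.records):
            if recorded == path:
                return value, error
        return None                    # nothing memoized for this path yet


memo = FlatMemo()
memo.put((0,), value="root result")
memo.put((0, 1), error=ValueError("second child failed"))
memo.put((0, 1, 2), value="grandchild result")
```

The path `(0, 1, 2)` reads as "the root's child 1's child 2", which is the
list-of-integers coordinate described above.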


Now why is this any good when the stack of the usual runtime just does
this all already? When the references to the locals are local and on the
stack an offset away? When the runtime will just suspend the entire
stack and block the thread and wait as a co-routine?

Well, the idea is that there isn't unboundedly many threads anyways, and
somehow a conscientious approach to cooperative multithreading, must
arrive at performing as well as preemptive multithreading, and somehow a
model of non-blocking routine, must arrive at performing as well as
blocking routine,
and this doesn't do either but runs as non-blocking code and looks like
blocking code and also launches unfulfilled siblings in parallel
automatically without blocking when their inputs are fulfilled, without
synchronizing or declaring how they meet and join in the satisfaction
of their dependencies, because it's automatically declared in the language.

So, that seems the best thing, that as far as sibling calls, for example
each of a list of a lot of items, is independent, they all get launched
in order with their coordinates the ancestry index, as coming back load
up the memo, the original caller needn't know nor care except write "for
each". They don't block and also launch in parallel courtesy having
usable values at all.

About the memo and maintaining the memo, is that, eventually a
re-routine returns. Then, it doesn't matter what re-routines it calls
return, once all of a re-routine's sub-re-routines return, and it puts its
value on the memo, all their values can be zeroed out, resulting on the
eventual conclusion of the re-routine, a memo of all zeros.

Or, you know, neatening it up and logging it.



A most usual idea is that routines start as plain old routines
implementing an interface or base class. Now so far these are interfaces
with only one method, with regards to that otherwise what gets stored in
the memory is ancestry/method/memoized instead of just ancestry/memoized.

Then, plain old routines are sub-classed, overriding and hiding the
routine's methods, providing the default implementation, then just
calling the superclass implementation. The issue then gets into that
re-routines and side-routines get separated, which would result in a big
mess, as to whether the thread_local should implement this passage of
the state, _without changing the signature of the routines_.
Lawrence D'Oliveiro
2024-05-02 06:48:42 UTC
Post by Ross Finlayson
So, the idea of the re-routine, is a sort of co-routine. That is, it
fits the definition of being a co-routine, though as with that when its
asynchronous filling of the memo of its operation is unfulfilled, it
quits by throwing an exception, then is expected to be called again,
when its filling of the memo is fulfilled, thus that it returns.
The normal, non-comedy way of handling this is to have the task await
something variously called a “future” or “promise”: when that object is
marked as completed, then the task is automatically woken again to fulfil
its purpose.
Bonita Montero
2024-05-02 11:22:39 UTC
Post by Lawrence D'Oliveiro
Post by Ross Finlayson
So, the idea of the re-routine, is a sort of co-routine. That is, it
fits the definition of being a co-routine, though as with that when its
asynchronous filling of the memo of its operation is unfulfilled, it
quits by throwing an exception, then is expected to be called again,
when its filling of the memo is fulfilled, thus that it returns.
The normal, non-comedy way of handling this is to have the task await
something variously called a “future” or “promise”: when that object is
marked as completed, then the task is automatically woken again to fulfil
its purpose.
The problem with a future and a promise is that in most languages you
can't wait for multiple futures at once to have out of order completion.
So if you have a server application futures are easy to use, but mostly
less efficient.
I never think about futures and I have my own thread pool class to which
I dispatch my function-objects. These respond to a generic queue which
can handle the responses out of order and can also handle different
states than a completion. That's what you usually need with a server
application.
For me promises and futures are mostly for educational purposes as the
first step to show how asynchronous operations work to proceed to more
efficient solutions later.
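
In Python terms, the arrangement described here (a thread pool whose jobs
report to one generic queue, out of order and with states other than
completion) might be sketched as below. This is only an illustration under
assumptions, not the poster's own thread pool class: the job names, the
"started"/"done" states and the `threading.Event` that makes the ordering
deterministic are all hypothetical.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def run_demo():
    responses = queue.Queue()          # one generic queue for all responses
    fast_finished = threading.Event()

    def slow():
        responses.put(("slow", "started"))   # a state other than completion
        fast_finished.wait()                 # stands in for a long operation
        responses.put(("slow", "done"))

    def fast():
        responses.put(("fast", "started"))
        responses.put(("fast", "done"))
        fast_finished.set()

    completed = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        pool.submit(slow)              # submitted first, completes last
        pool.submit(fast)
        while len(completed) < 2:
            name, state = responses.get()    # handle whatever arrives next
            if state == "done":
                completed.append(name)
    return completed
```

The consumer never waits on a particular job; it takes responses in
arrival order, so the job submitted second is handled first.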
Chris M. Thomasson
2024-05-02 20:38:02 UTC
Post by Bonita Montero
Post by Lawrence D'Oliveiro
Post by Ross Finlayson
So, the idea of the re-routine, is a sort of co-routine. That is, it
fits the definition of being a co-routine, though as with that when its
asynchronous filling of the memo of its operation is unfulfilled, it
quits by throwing an exception, then is expected to be called again,
when its filling of the memo is fulfilled, thus that it returns.
The normal, non-comedy way of handling this is to have the task await
something variously called a “future” or “promise”: when that object is
marked as completed, then the task is automatically woken again to fulfil
its purpose.
The problem with a future and a promise is that in most languages you
can't wait for multiple futures at once to have out of order completion.
So if you have a server application futures are easy to use, but mostly
less efficient.
I never think about futures and I have my own thread pool class to which
I dispatch my function-objects. These respond to a generic queue which
can handle the responses out of order and can also handle different
states than a completion. That's what you usually need with a server
application.
Right. Agreed here.
Post by Bonita Montero
For me promises and futures are mostly for educational purposes as the
first step to show how asynchronous operations work to proceed to more
efficient solutions later.
Lawrence D'Oliveiro
2024-05-04 02:35:40 UTC
Post by Bonita Montero
Post by Lawrence D'Oliveiro
The normal, non-comedy way of handling this is to have the task await
something variously called a “future” or “promise”: when that object is
marked as completed, then the task is automatically woken again to
fulfil its purpose.
The problem with a future and a promise is that in most languages you
can't wait for multiple futures at once to have out of order completion.
Of course you can. Any decent event-loop framework will provide this
capability. Python’s asyncio does. I use it all the time.
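
For instance, a minimal sketch with asyncio's `as_completed`, which yields
results in completion order rather than submission order. The `fetch`
coroutine and its delays are hypothetical stand-ins for real I/O.

```python
import asyncio


async def fetch(name, delay):
    """Hypothetical stand-in for an I/O operation taking `delay` seconds."""
    await asyncio.sleep(delay)
    return name


async def main():
    tasks = [
        asyncio.create_task(fetch("slow", 0.2)),    # started first
        asyncio.create_task(fetch("fast", 0.01)),   # finishes first
    ]
    order = []
    for next_done in asyncio.as_completed(tasks):
        order.append(await next_done)   # whichever task completes next
    return order


def run_demo():
    return asyncio.run(main())
```

`asyncio.wait(..., return_when=asyncio.FIRST_COMPLETED)` is another way to
get the same out-of-order handling.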
Ross Finlayson
2024-05-03 01:30:03 UTC
Post by Lawrence D'Oliveiro
Post by Ross Finlayson
So, the idea of the re-routine, is a sort of co-routine. That is, it
fits the definition of being a co-routine, though as with that when its
asynchronous filling of the memo of its operation is unfulfilled, it
quits by throwing an exception, then is expected to be called again,
when its filling of the memo is fulfilled, thus that it returns.
The normal, non-comedy way of handling this is to have the task await
something variously called a “future” or “promise”: when that object is
marked as completed, then the task is automatically woken again to fulfil
its purpose.
Thanks for writing, warm regards.

Aye, it's typical that people think that "await" will make for blocking
a thread on an "async" future, because that's the language construct
they've heard about and what people make of things like the "threading
building blocks" or, "machines", in their synchrony, abstractly they're
complex machines.

(These days a lot of what could have been in the MMX registers for SIMD
that those are integer vectors and a lot of them have gone instead to
employ what's the same unit as for the XMM then XMM2, into floating
point or some fixed point vectors, what were often matrix
multiplications for affine geometry in screen coordinates, now is lots
of arithmetic coding or Hopfield nets. "Threading Building Blocks" was a
library that Intel released with language bindings to the intrinsics of
synchronization primitives and other threading building blocks for
complex synchrony. These days something like the UEFI BIOS has that
there's an interface where people are actually supposed to write to the
timing with regards mostly to real-time DRAM refresh, then with the fast
side of the bus and the slow side, then what people get out of that is
just plain SMBIOS and ACPI and some UEFI functions, all sort of mostly
all booted up in an EFI BIOS often in Forth, the totally ubiquitous
64-bit setup on all PCs everywhere, with PCI and PCIe, and some other
very usual adapters like the bog-standard 802.x and WIFI, and some
blinking lights.)


It's all about _timing_, if you get my drift, and just provided all
smoothly above that as it's just another protocol, the firmware.


"Re-Routines": is a great idea, given that, in languages without
language features for asynchrony or for that matter threads, cooperative
multithreading or multitasking is still a great thing. When there's
only one thread of control, a scheduler can still round-robin these
queues of non-blocking tasks, which are non-blocking either courtesy of
being re-routines or thanks to select/epoll/kqueue or non-blocking I/O;
it's pretty much the same pattern.

So I've been thinking about this awhile. In about 2016 I was like "you
know, some day Usenet will need a fresh ecosystem of servers", and
so over on sci.math I started tapping away at "Meta: a usenet server
just for sci.math", and came up with this idea of re-routines, and
implemented most of it.



If there's one great thing about a re-routine: it's really easy to mock.


(Yeah, I know, I heard that some people actually do not even perceive
that which is not, "literal", which is not figurative, ..., which is not
figurative, ....)

The re-routine is exactly the same and is a model of definition of
synchrony by the usual flow-of-control, that's almost exactly what
"defined behavior of synchrony" is, the definition of state according to
the guarantees of flow-of-control, in the language in the syntax in all
the languages that have procedural flow-of-control.

So, it's, really easy to mock.

Then, it's sort of an abstraction of what the language also usually
does with the program stack and the call stack. I.e., the memo, where
"memoization" is a very usual term in optimization and can be
unconfused with "cache", the memo has a holder for each of
the results still being used in a re-routine, and a model of
the call stack with regards to "no callbacks, callbacks everywhere,
no futures, futures everywhere", as it's a great model of implicits.


One of the things I would gripe about these days is that
people don't program to PIMPL, which is an abstract
"pointer-to-implementation", what in Java is called
"extracting interfaces". There's just connected a giant
adapter with a thousand methods when almost always
the use-case is like "I push to the queue" or "I pop from
the queue", and it's like, you know, it's not so much that
it's easier to mock when the surface is minimal, as that,
it's much easier.

So, here, re-routines are easier to mock in a sense,
but especially easier to implement usual synchronous
modules of them, when the idea is "actually I'd like to
run this machine in-memory and synchronously
before introducing asynchrony and the distributed".

Especially the idea of "re-using the same code for the
synchronous edition and later asynchronous edition",
is mostly for that by the very nature of declaring and
initialization as of returning and holding and passing
and accessing of usable objects, defining dependencies
of synchrony, that's sort of what there is to it.

So, it's a great idea, I've been tapping away on it on
the design of servers for usual protocols on "Meta:
a usenet server just for sci.math".

I imagine it's a very old idea of just sort of modeling
the call stack first-class in routine, as a model of
cooperative multithreading, if it's really a joke then
there are only a dozen jokes in the world already
constantly wrapped as new, maybe it's just too good
to tell.
Chris M. Thomasson
2024-04-29 20:22:52 UTC
Post by Stefan Ram
|Anyway, multithreading performance is a non-issue for Python so far as
|the Python interpreter runs in a single-threaded regime anyway, under a
|global GIL lock. They are planning to get rid of GIL, but this work is
|still in development AFAIK. I'm sure it will take years to stabilize the
|whole Python zoo without GIL.
The GIL only prevents multiple Python statements from being
interpreted simultaneously, but if you're waiting on inputs (like
sockets), it's not active, so that could be distributed across
multiple cores.
With asyncio, however, you can easily handle the application
for threads to "wait in parallel" for thousands of sockets in a
single thread, and there are fewer opportunities for errors than
with multithreading.
Additionally, there are libraries like numpy that use true
multithreading internally to distribute computational tasks
across multiple cores. By using such libraries, you can take
advantage of that. (Not to mention the AI libraries that have their
work done in highly parallel fashion by graphics cards.)
If you want real threads, you could probably work with Cython
sometimes.
Other languages like JavaScript seem to have an advantage there
because they don't know a GIL, but with JavaScript, for example,
it's because it always runs in a single thread overall. And in
the languages where there are threads without a GIL, you quickly
realize that programming correct non-trivial programs with
parallel processing is error-prone.
[...]

Have you ever used webworkers?
Lawrence D'Oliveiro
2024-04-29 20:36:57 UTC
With asyncio, however, you can easily handle the application for
threads to "wait in parallel" for thousands of sockets in a single
thread, and there are fewer opportunities for errors than with
multithreading.
It makes event-loop programming much more convenient. I posted a
simple example here from some years ago
<https://github.com/HamPUG/meetings/tree/master/2017/2017-05-08/ldo-generators-coroutines-asyncio>:
compare the version based on callbacks, with the one using asyncio:
the former is about 30% bigger.
Stefan Ram
2024-04-30 09:04:48 UTC
Post by Stefan Ram
The GIL only prevents multiple Python statements from being
interpreted simultaneously, but if you're waiting on inputs (like
sockets), it's not active, so that could be distributed across
multiple cores.
Disclaimer: This is not on-topic here as it discusses Python,
not C or C++.

FWIW, here's some multithreaded Python code modeled after what
I use in an application.

I am using Python to prepare a press review for me, getting article
headers from several newssites, removing all headers matching a list
of regexps, and integrating everything into a single HTML resource.
(I do not like to read about Lindsay Lohan, for example, so articles
with the text "Lindsay Lohan" will not show up on my HTML review.)

I'm usually downloading all pages at once using Python threads,
which will make sure that a thread uses the CPU while another
thread is waiting for TCP/IP data. This is the code, taken from
my Python program and a bit simplified:

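from multiprocessing.dummy import Pool   # thread-based Pool

...

# one worker thread per URI on a fast connection, otherwise just one
with Pool( 9 if fast_internet else 1 ) as pool:
    for i in range( 9 ):
        # apply_async returns an AsyncResult; call .get() on it later
        content[ i ] = pool.apply_async( fetch, [ uris[ i ] ])
    pool.close()
    pool.join()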

. I'm using my "fetch" function to fetch a single URI, and the
loop starts nine threads within a thread pool to fetch the
content of those nine URIs "in parallel". This is observably
faster than corresponding sequential code.

(However, sometimes I have a slow connection and have to download
sequentially in order not to overload the slow connection, which
would result in stalled downloads. To accomplish this, I just
change the "9" to "1" in the first line above.)

In case you wonder about the "dummy":

|The multiprocessing.dummy module provides a wrapper
|for the multiprocessing module, except implemented using
|thread-based concurrency.
|
|It provides a drop-in replacement for multiprocessing,
|allowing a program that uses the multiprocessing API to
|switch to threads with a single change to import statements.

. So, this is an area where multithreading the Python way is easy
to use and enhances performance even in the presence of the GIL!
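
For comparison, a sketch of the same download pattern with
"ThreadPoolExecutor" from the standard library. The `fetch` function below
is a hypothetical stand-in that only sleeps, and `fetch_all` is a name made
up for this example; neither is the code from the application described
above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(uri):
    """Hypothetical stand-in for a real download: sleeps, returns text."""
    time.sleep(0.05)        # the GIL is released while a thread waits
    return "content of " + uri


def fetch_all(uris, fast_internet=True):
    """Fetch all URIs, in parallel threads unless the connection is slow."""
    workers = max(1, len(uris)) if fast_internet else 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the order of `uris`, not completion order
        return list(pool.map(fetch, uris))
```

Swapping in "ProcessPoolExecutor" keeps the same `map` call, which is what
makes it easy to move to processes if the GIL becomes the bottleneck.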
Chris M. Thomasson
2024-05-04 03:44:32 UTC
Post by Stefan Ram
Post by Stefan Ram
The GIL only prevents multiple Python statements from being
interpreted simultaneously, but if you're waiting on inputs (like
sockets), it's not active, so that could be distributed across
multiple cores.
Disclaimer: This is not on-topic here as it discusses Python,
not C or C++.
FWIW, here's some multithreaded Python code modeled after what
I use in an application.
I am using Python to prepare a press review for me, getting article
headers from several newssites, removing all headers matching a list
of regexps, and integrating everything into a single HTML resource.
(I do not like to read about Lindsay Lohan, for example, so articles
with the text "Lindsay Lohan" will not show up on my HTML review.)
I'm usually downloading all pages at once using Python threads,
which will make sure that a thread uses the CPU while another
thread is waiting for TCP/IP data. This is the code, taken from
from multiprocessing.dummy import Pool
...
content[ i ] = pool.apply_async( fetch,[ uris[ i ] ])
pool.close()
pool.join()
. I'm using my "fetch" function to fetch a single URI, and the
loop starts nine threads within a thread pool to fetch the
content of those nine URIs "in parallel". This is observably
faster than corresponding sequential code.
(However, sometimes I have a slow connection and have to download
sequentially in order not to overload the slow connection, which
would result in stalled downloads. To accomplish this, I just
change the "9" to "1" in the first line above.)
|The multiprocessing.dummy module provides a wrapper
|for the multiprocessing module, except implemented using
|thread-based concurrency.
|
|It provides a drop-in replacement for multiprocessing,
|allowing a program that uses the multiprocessing API to
|switch to threads with a single change to import statements.
. So, this is an area where multithreading the Python way is easy
to use and enhances performance even in the presence of the GIL!
Agreed. However, it's a very small sample. Try to download 60,000 files
concurrently from different sources all at once. This can be where the
single lock messes with performance...
