80386 C compiler
Paul Edwards
2024-11-24 14:00:08 UTC
Hi.

I have been after a public domain C compiler for decades.
None of them reach C90 compliance. SubC comes the
closest but was written without full use of C90, which
makes it difficult to read. I'm after C90 written in C90.

A number of people have tried, but they always seem
to fall short. One of those attempts is pdcc. The
preprocessor was done, but the attempt (by someone
else) to add C code generation was abandoned.

I decided to take a look at it, and it looks to me like
a significant amount of work has already been done.

Also, my scope is limited - I am only after enough
functionality to get my 80386 OS (PDOS) compiled,
and I don't mind short=int=long = 32 bits, I don't
mind not having float. I don't use bitfields.

Anyway, I have had some success in making enhancements
to it, and here is one:

https://sourceforge.net/p/pdos/gitcode/ci/3356e623785e2c2e16c28c5bf8737e72dfd39e04/

But I don't really know what I'm doing (I do know some
of the theory - but this is a particular design).

E.g. now that I have managed to get a variable passed to
a function, I now want the address of that variable passed
to the function - i.e. I want to do &x instead of x - and I am
not sure whether to create a new ADDRESS type, or
whether it is part of VARREF or what - in the original
(incomplete) concept. Or CC_EXPR_AMPERSAND.
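
One way to picture the CC_EXPR_AMPERSAND option is as a separate unary node that wraps the variable reference, so that the code generator emits lea where it would otherwise emit mov. What follows is only a minimal sketch under stated assumptions: the names expr_kind, EXPR_VARREF, EXPR_ADDRESS_OF and gen_expr are hypothetical, the variable is assumed to live at an EBP-relative offset, and none of it is taken from the pdcc source.

```c
#include <stdio.h>

/* Hypothetical node kinds; pdcc's real names (e.g. CC_EXPR_AMPERSAND)
   and layout may well differ. */
enum expr_kind { EXPR_VARREF, EXPR_ADDRESS_OF };

struct expr {
    enum expr_kind kind;
    int frame_offset;      /* EXPR_VARREF: offset of the variable from EBP */
    struct expr *operand;  /* EXPR_ADDRESS_OF: expression whose address is taken */
};

/* Write one 80386 instruction into out that leaves the value of e in EAX.
   Returns 0 on success, -1 if the expression is not supported here. */
int gen_expr(const struct expr *e, char *out)
{
    if (e->kind == EXPR_VARREF) {
        /* x: load the contents of the stack slot */
        sprintf(out, "mov eax, [ebp%+d]", e->frame_offset);
        return 0;
    }
    if (e->kind == EXPR_ADDRESS_OF && e->operand != NULL
        && e->operand->kind == EXPR_VARREF) {
        /* &x: compute the address of the stack slot instead */
        sprintf(out, "lea eax, [ebp%+d]", e->operand->frame_offset);
        return 0;
    }
    return -1;
}
```

Keeping the address-of as its own node, rather than a flag on the variable reference, would also leave room for operands such as &a[i] or &s.m later, but that is a design guess rather than something the original concept is known to require.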

I am happy to do the actual coding work - I'm just looking
for some nudges in the right direction if anyone can assist.

Thanks. Paul.
fir
2024-11-24 17:51:22 UTC
You mean there is no such compiler? Raise a fund for someone to
write it and they will write it. If a few thousand people
give some money, it will be written.
fir
2024-11-24 17:58:14 UTC
A C compiler, IMO, needs something like a few months to about a year
of coding by one person, probably, if he is somewhat experienced and healthy,
so I would estimate the cost at something like
100k, maybe 200k, maybe 300k dollars/euros, depending on things.
(The raw cost, IMO, could be more like 100-150k, but risk, taxes, etc.
could increase it; on the other hand, there are people probably able to
do things cheaply.)
(I'm not a merchant, so I don't know, but that is my estimate.)
Paul Edwards
2024-11-25 00:00:52 UTC
Post by fir
Post by fir
You mean there is no such compiler? Raise a fund for someone to
write it and they will write it. If a few thousand people
give some money, it will be written.
A C compiler, IMO, needs something like a few months to about a year
of coding by one person, probably, if he is somewhat experienced and healthy,
so I would estimate the cost at something like
100k, maybe 200k, maybe 300k dollars/euros, depending on things.
(The raw cost, IMO, could be more like 100-150k, but risk, taxes, etc.
could increase it; on the other hand, there are people probably able to
do things cheaply.)
(I'm not a merchant, so I don't know, but that is my estimate.)
I'm basically paying myself to write the C compiler.

I have reached a point in my life where this is the most useful
thing I can think of doing.

BFN. Paul.
Bart
2024-11-24 18:00:29 UTC
Post by fir
You mean there is no such compiler? Raise a fund for someone to
write it and they will write it. If a few thousand people
give some money, it will be written.
There are any number of open source C compilers. But they need to be
good enough (too many support only a subset, which may not be enough for
the OP) and they need to be public domain for the OP's purposes.
BGB
2024-11-24 23:46:59 UTC
Post by Bart
There are any number of open source C compilers. But they need to be
good enough (too many support only a subset, which may not be enough for
the OP) and they need to be public domain for the OP's purposes.
I am more in the camp of MIT or BSD license should be good enough for
most things.

Trying to go full public domain has a few of its own issues:
* Not always recognized as valid;
* Implicitly lacks "No Warranty" and "No Liability" protections for the
author (say, if someone wanted to file a lawsuit over the code being
buggy, etc).
* ...

There could almost be a "MIT Minus" or something, which could be, say,
MIT with a clause saying one is allowed to discard the license terms for
sake of derived works (but still offering protection from liability).



As for C compilers, I have a compiler for my own uses, but:
* MIT licensed;
* Doesn't target x86.
* Sorta implements C99 with various fragments of newer standards.
** Though, is a bit hit/miss on the now-optional parts.
** VLAs sorta exist but do not necessarily work correctly.
*** Currently unsupported in DLLs;
*** Seemingly may result in memory leaks if used.
*** Essentially, they are implemented via runtime library calls.
**** With memory provided indirectly via malloc.

Old target list (for which the code still exists):
* SH-4 (AKA: SuperH, most well known for SEGA Saturn and Dreamcast)
* BJX-1 (Was a highly modified version of SH-4)
* BTSR1 (a small SH inspired ISA, intended to be comparable to MSP430).
** Not maintained, RV32IC seems like a better option.

Currently active targets:
* BJX-2: Now a group of several closely related variants.
** All are 64-bit, most using a 48-bit VAS (some had a 32-bit VAS)
** Baseline: 16/32/64/96 bit instructions, 32 or 64 GPRs
** XG2: 32/64/96 bit, 64 GPRs
* RISC-V, RV64G + Custom Extensions
** Has some extensions which can help notably with performance.
** Can support plain RV64G as well.
** No current support for the 16-bit 'C' encodings.
* XG3RV
** Mostly a tweaked and repacked version of XG2 used alongside RV64G.
** The XG3 encoding space replaces the RV64 'C' (Compressed) extension.
** Both XG3 and RV64 instructions may be encoded at the same time.
** XG3 is used in a functionally-similar subset, just with 64 GPRs.


Not yet bothered with a target for RV32IC, GCC does this well enough.
* x86/x86-64/ARM: We generally have GCC and Clang.

Granted, GCC and Clang are both very large and slow/painful to rebuild
from source. My compiler is at least a lot smaller and easy to rebuild.

Likely, far more of the total effort of my project has ended up going
into my compiler than into the emulator or Verilog implementation though.



The BJX-2 register space had 64 registers and was split in half for the
RV64G modes (32 GPRs and 32 FPRs), whereas XG3 and my jumbo-prefix
extensions partly undo this split.

( Decided to try changing the way I write my ISA name as maybe adding a
hyphen will get me less trouble... ).


Though, partly this is because for performance BGBCC seems to need a lot
of registers (it could barely operate with the SH4's 16 GPRs, and still
has a fairly high spill-and-fill rate with 32 GPRs).

Though, I can note that my compiler with XG3RV, despite not adding
much over RV64+Jumbo, does beat both the code density and performance of
RV64G via "GCC -O3" (and also beats the code density of RV64GC, as in
this case, fewer instructions are better than smaller instructions).


A big part of the performance delta between the ISAs could be addressed
by adding a few major features to RV64:
* Jumbo Prefixes: Prefix may extend 12-bit imm/disp fields to 33 bits;
** Also extends LUI, AUIPC, and JAL to 33-bit forms.
* Load/Store with a register index;
* Load/Store Pair.

With BGBCC vs GCC RV64G, this gives around a 30% speedup.
* It is closer to 70% if comparing against BGBCC with plain RV64G.
* BGBCC can't match GCC if both are targeting RV64G.
** I am not sure what GCC would do if it had my extensions.

The specific extensions here mostly target the dominant sources of
inefficiency in the RV64G encodings as they exist (the ISA design deals
poorly with exceeding what can be encoded directly in an immediate, ...).



The jumbo prefixes may also be used to merge the register space back
into a 64 register space (at the cost of using 64-bit instruction
encodings to do so), but this only extends the imm/disp fields to 23
bits (except for LUI/AUIPC/JAL, which always have an expanded 6b
register field with jumbo prefixes).

Note that J+AUIPC loads an address of PC +/- 4GB into Rd. Likewise,
J+JAL is +/- 4GB (with LSB as MBZ).


The relative performance gains from XG3RV vs extended RV64G were
smaller; it mostly serves to improve code density (makes Doom roughly
16% smaller; and is around 44% smaller than plain RV64G).

Main thing it has (in theory) is access to a lot of the specialized SIMD
instructions and similar that exist in my ISA but lack equivalents on
the RV64G side of things.

There are a few instructions that exist here which are tempting to add
as extended instructions to RV64:
* Compare Equal, Not-Equal, and Greater-Equal instructions (SEQ, SNE, SGE);
* Load/Store relative to GP with a larger displacement (TBD, 2).


Some notable features from BJX-2 were effectively made optional in XG3,
such as support for an SR.T bit (originally carried over from SuperH), and
predication (in BJX-2, instructions could be encoded for whether or not
to execute based on the status of the SR.T bit). However, no direct
architectural equivalent exists in RV64.

In XG3RV, the questionable design choice had been made to conceptually
hold these parts of the architectural state in the high-order bits of
PC and LR/RA (in my other ISA variants, LR merely captured these bits
from SR).


2: It is tempting to consider, possibly:
LW/LD, SW/SD, with an addressing mode like: [GP+Disp14u*4|8]
So, able to encode an access 64 or 128K relative to GP rather than +/-
2K. This would save some space over the use of a jumbo prefix (at least
with my compiler tending to use GP to access global variables).
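
The 64K and 128K figures follow from the field width: a 14-bit unsigned displacement scaled by the access size. A minimal sketch of that arithmetic, where gp_reach is a hypothetical helper name and not part of any toolchain:

```c
/* Bytes reachable from GP by an unsigned displacement of disp_bits bits
   that is scaled by the access size (4 for LW/SW, 8 for LD/SD).
   Assumes disp_bits is smaller than the width of unsigned long. */
unsigned long gp_reach(unsigned int disp_bits, unsigned long scale)
{
    return (1UL << disp_bits) * scale;
}
```

With these definitions, gp_reach(14, 4) is 65536 and gp_reach(14, 8) is 131072, against 2048 bytes in each direction for the standard 12-bit signed displacement.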

Where, it would be "better" here if one could access most of the global
variables in a single 32-bit instruction. But, wouldn't fit in as well
with the existing ISA encodings.


Generally, BGBCC uses a modified PE/COFF variant.
* For RV64G, I switched it to default to using plain PE/COFF.
* Some people might find this slightly easier to deal with.

Though, can note that GNU binutils still has no idea how to handle RV64
PE/COFF, as it seemingly treats every machine-type as its own file
format (and does not support any RV64 + PE/COFF targets).

Where, for some of the other ISAs, BGBCC generates LZ4 compressed
binaries (file headers are uncompressed, but the rest of the image is
compressed). Rationale is mostly that loading binaries from an SDcard is
IO bound, and within the limits tested, LZ4 did best for executable code.


I have another byte-oriented LZ format (RP2) which works better for
general data, but seemingly worse on program binaries. Entropy-coded
formats were not used, as the speed cost of Huffman decoding is higher
than that of the time spent reading data from an SDcard.

The main difference seems to be that RP2 correlates match length and distance
(to encode a larger distance also encodes a longer match-length field).
This correlation is true of most data, but less true of program
binaries. LZ4 has a fixed 16-bit match distance, and came out ahead.


TBD if I should add support for 64-bit ELF, but ELF kinda sucks IMHO
(and for ELF PIE binaries, roughly around half of the binary ends up
eaten by metadata).

Where, bloated binaries are bad for both loading time and memory use (it
is bad to have a 900K binary as an "ELF tax" when PE/COFF would have
only needed 400K; well, and say the binary LZ4 compresses down to around
260K in the latter case; though one will still need 400K in RAM).

Further not helped in this case by ELF needing to load a new copy of the
binaries for every program instance, whereas with my ABI, I was able to
share the read-only sections across multiple instances (only the
data/bss sections need to be instantiated per-process).

Can note that in my case, the PE/COFF "Global Pointer" entry in the Data
Directory is effectively used to express the start of ".data" (which is
also where the Global Pointer points), along with the combined size of
data and bss (if the size is non-zero).

Global Pointer:
RVA=Size=0: No Global Pointer
RVA!=0, Size=0: Global Pointer points here, may not be relocated.
RVA!=0, Size!=0: Start of data area, may be relocated per instance.
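
The three cases above could be decoded by a loader roughly as follows. This is only a sketch: the enum and function names are hypothetical, and the fourth combination (RVA=0 with a non-zero size) is not described above, so it is treated as invalid here purely as an assumption.

```c
/* Hypothetical classification of the PE/COFF "Global Pointer" data
   directory entry as described in the list above. */
enum gp_kind {
    GP_NONE,          /* RVA=Size=0: no Global Pointer */
    GP_FIXED,         /* RVA!=0, Size=0: points here, may not be relocated */
    GP_PER_INSTANCE,  /* RVA!=0, Size!=0: start of data area, per instance */
    GP_INVALID        /* RVA=0, Size!=0: not covered by the description */
};

enum gp_kind classify_gp(unsigned long rva, unsigned long size)
{
    if (rva == 0 && size == 0)
        return GP_NONE;
    if (rva != 0 && size == 0)
        return GP_FIXED;
    if (rva != 0 && size != 0)
        return GP_PER_INSTANCE;
    return GP_INVALID;
}
```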


...
Paul Edwards
2024-11-25 00:15:37 UTC
Post by BGB
Post by Bart
There are any number of open source C compilers. But they need to be
good enough (too many support only a subset, which may not be enough for
the OP) and they need to be public domain for the OP's purposes.
I am more in the camp of MIT or BSD license should be good enough for
most things.
Yes, there are a lot of people in a lot of camps, and this is
where we end up - no public domain C90 compiler
(cc64 is close, but is generated code, and not 80386).

And so that is what I am trying to achieve now. I've given
up waiting for someone else to do it.
Claimed issues.
Post by BGB
* Not always recognized as valid;
I'm happy to say in the documentation that you can use
CC0 instead if you wish.
Post by BGB
* Implicitly lacks "No Warranty" and "No Liability" protections for the
author (say, if someone wanted to file a lawsuit over the code being
buggy, etc).
You can add such a disclaimer to anything - copyright or
public domain.

Have you ever heard of anyone in the world ever being sued
for writing public domain code that had a bug in it?

PDOS is public domain. It shouldn't be difficult to find what
could be described as a bug in it.

I encourage you to find a bug in it, try to sue me, and see
how far you get.

Anyway - this is part of the reason why we are where we are.
Post by BGB
* MIT licensed;
Yes, yet another non-public domain C compiler described.

BFN. Paul.
Janis Papanagnou
2024-11-24 18:52:55 UTC
Post by Paul Edwards
I have been after a public domain C compiler for decades.
[...] I'm after C90 written in C90.
Why formulate the latter condition if you can bootstrap?
(Did you mean; written in a "C" not more recent than C90?)

Janis

https://en.wikipedia.org/wiki/Bootstrapping_(compilers)
Paul Edwards
2024-11-24 23:46:36 UTC
Post by Janis Papanagnou
Post by Paul Edwards
I have been after a public domain C compiler for decades.
[...] I'm after C90 written in C90.
Why formulate the latter condition if you can bootstrap?
(Did you mean; written in a "C" not more recent than C90?)
Yes - written in C90 so that it can be maintained with
just knowledge of C90.

And also written in C90 so that it is written naturally
for a C90 programmer, not using a subset of C90
(like SubC written on the assumption that "struct"
doesn't exist).

BFN. Paul.
Kaz Kylheku
2024-11-25 18:23:58 UTC
Post by Paul Edwards
Post by Janis Papanagnou
Post by Paul Edwards
I have been after a public domain C compiler for decades.
[...] I'm after C90 written in C90.
Why formulate the latter condition if you can bootstrap?
(Did you mean; written in a "C" not more recent than C90?)
Yes - written in C90 so that it can be maintained with
just knowledge of C90.
And also written in C90 so that it is written naturally
for a C90 programmer, not using a subset of C90
But, do yourself a favor and have it, as an extension, allow
non-constant expressions to initialize block-scoped aggregates:

void fn(int a)
{
int x[3] = { foo(), bar(), a }; /* not in C90 */

(You don't have to use it in the source code of the thing,
so it can be bootstrapped by other C90 compilers without
the extension.)

Also, pin down the truncation behavior of / and % to match C99.
(Though, again, without relying on that in the C90 source
of the compiler.)
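
As a sketch of how the compiler's own source could produce C99 results (for constant folding, say) without relying on what the host compiler does with negative operands: div() is already specified in C90 to truncate toward zero. The names fold_div and fold_rem are hypothetical, and the caller is assumed to have ruled out b == 0 and INT_MIN / -1.

```c
#include <stdlib.h>

/* Quotient truncated toward zero, as C99 requires of the / operator.
   div() behaves this way even in C90, unlike / on negative operands. */
int fold_div(int a, int b)
{
    return div(a, b).quot;
}

/* Remainder with the sign of the dividend, as C99 requires of %. */
int fold_rem(int a, int b)
{
    return div(a, b).rem;
}
```

For example, fold_div(-7, 2) is -3 and fold_rem(-7, 2) is -1, whichever way the host's own / and % round.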

Define the behavior of a [0] array at the end of a struct,
so that the C90 struct hack is "blessed" in your implementation.
The C99 flexible array member cannot be used, after all.
You can have it so that [0] has the same semantics as C99 []
in that role.
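
For reference, the struct hack as it is usually written for existing C90 compilers, with a [1] array rather than [0], which strict C90 rejects. This is a sketch with hypothetical names (struct name, name_new); with the proposed [0] extension the declaration would change but the allocation arithmetic would stay the same.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct name {
    size_t len;
    char text[1];  /* struct hack: storage for len + 1 chars is allocated */
};

/* Allocate a struct name holding a copy of s. Returns NULL on failure;
   the caller releases the result with free(). */
struct name *name_new(const char *s)
{
    size_t len = strlen(s);
    struct name *n = malloc(offsetof(struct name, text) + len + 1);

    if (n == NULL)
        return NULL;
    n->len = len;
    memcpy(n->text, s, len + 1);  /* copies the terminating NUL too */
    return n;
}
```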
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Rosario19
2024-11-25 21:14:06 UTC
Post by Kaz Kylheku
void fn(int a)
{
int x[3] = { foo(), bar(), a }; /* not in C90 */
is in the above foo() called before bar()?

void fn(int a)
{
int x[3];
x[0]=foo(); x[1]=bar(); x[2]=a;

this would be ok with every C compiler
Kaz Kylheku
2024-11-26 17:59:08 UTC
Post by Rosario19
Post by Kaz Kylheku
void fn(int a)
{
int x[3] = { foo(), bar(), a }; /* not in C90 */
is in the above foo() called before bar()?
No, you cannot rely on that. Maybe it's fixed in a more recent standard,
but C99 (which I happen to have open in a PDF reader tab) stated that
"The order in which any side effects occur among the initialization list
expressions is unspecified.". This implies that there is no sequence
point between any two initializing expressions, which means we don't
know whose expression's function call takes place first.

In any case, a C90 compiler with the above support as an extension to
C90 can specify rigid sequencing behavior.
Post by Rosario19
void fn(int a)
{
int x[3];
x[0]=foo(); x[1]=bar(); x[2]=a;
this would be ok with every C compiler
One problem is, if you're doing this because your compiler is C90, you
also have to do something about all declarations which follow the int
x[3], since they cannot occur after a statement. You can add another
level of block nesting for them, or whatever.
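
A sketch of the extra block nesting, with foo and bar as stand-in definitions so that the fragment compiles on its own, and fn returning a value only so that the result can be observed:

```c
/* Stand-in definitions for illustration only. */
static int foo(void) { return 1; }
static int bar(void) { return 2; }

int fn(int a)
{
    int x[3];

    x[0] = foo(); x[1] = bar(); x[2] = a;
    {
        /* C90 allows declarations only at the start of a block, so a
           declaration that follows the assignments needs a nested block. */
        int sum = x[0] + x[1] + x[2];
        return sum;
    }
}
```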

Initialization is preferable to leaving an object uninitialized and
assigning. There is a scope where the name is visible, but the object
is not initialized, inviting code to be inserted there which tries
to use it.

If I needed foo to be called before bar, I would still rather do
the following than assignment:

int f = foo();
int b = bar();
int x[3] = { f, b, a };
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Keith Thompson
2024-11-26 21:05:12 UTC
Post by Kaz Kylheku
Post by Rosario19
Post by Kaz Kylheku
void fn(int a)
{
int x[3] = { foo(), bar(), a }; /* not in C90 */
is in the above foo() called before bar()?
No, you cannot rely on that. Maybe it's fixed in a more recent standard,
but C99 (which I happen to have open in a PDF reader tab) stated that
"The order in which any side effects occur among the initialization list
expressions is unspecified.". This implies that there is no sequence
point between any two initializing expressions, which means we don't
know whose expression's function call takes place first.
N3096 (C23 draft) has:
"""
The evaluations of the initialization list expressions are
indeterminately sequenced with respect to one another and thus the order
in which any side effects occur is unspecified.
"""

C23 is more explicit (redundant?) than C99, which doesn't mention the
lack of a sequence point. (C11 dropped sequence points, replacing them
with "sequenced before", "sequenced after", and "unsequenced", basically
a new way of describing the same semantics.)

Given:

int n = 42;
int a[] = { n++, n++ };

C99 could imply that the value of a is merely unspecified, either {
42, 43 } or { 43, 42 }. Though it can almost certainly be inferred
from other parts of the C99 standard that there is no sequence
point between the two evaluations of n++ (I haven't taken the time
to check).
Post by Kaz Kylheku
In any case, a C90 compiler with the above support as an extension to
C90 can specify rigid sequencing behavior.
True, but I don't know of anyone who's interested in a C90 compiler
with this kind of extension. Paul Edwards has made it clear he's only
interested in unextended C90, and anyone else can just use a more modern
compiler.

[...]
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Paul Edwards
2024-11-26 21:30:21 UTC
Post by Keith Thompson
True, but I don't know of anyone who's interested in a C90 compiler
with this kind of extension. Paul Edwards has made it clear he's only
interested in unextended C90, and anyone else can just use a more modern
compiler.
While not a "compiler" per se, there is one extension to
C90 I might add, which is to have formal names like:

ESC_CHAR '\x1b'
ESC_CHAR_STR "\x1b"

that would allow me to support ASCII and EBCDIC in
my "starter suite".

Microemacs and msged need them.

I probably need names for the control keys too for microemacs.

I'll need to revisit the code to be sure.

But that's what my expectations for a minimal close-to-C90
standard are - something that will allow a portable implementation
of the basic tools.

It wasn't obvious to me when I started that that was even possible.

Note that this assumes the pre-existence of something like a
BIOS (similar to UEFI), which I call a pseudo-bios. That is
not expected to be portable.

And although I am not expecting the C library to be portable,
to my surprise it is in fact portable other than setjmp/longjmp.

BFN. Paul.
Keith Thompson
2024-11-26 22:27:51 UTC
Post by Paul Edwards
Post by Keith Thompson
True, but I don't know of anyone who's interested in a C90 compiler
with this kind of extension. Paul Edwards has made it clear he's only
interested in unextended C90, and anyone else can just use a more modern
compiler.
While not a "compiler" per se, there is one extension to
ESC_CHAR '\x1b'
ESC_CHAR_STR "\x1b"
that would allow me to support ASCII and EBCDIC in
my "starter suite".
I don't see why this needs to be a language extension. Just define it
as a macro wherever it's needed.
Post by Paul Edwards
Microemacs and msged need them.
Do they?
Post by Paul Edwards
I probably need names for the control keys too for microemacs.
I'll need to revisit the code to be sure.
My guess is that getting microemacs and/or msged to work with EBCDIC is
going to involve more than just defining the Escape character.

For example, here's a code fragment from msged:

while ((ch != 'a') && (ch != 'r')) {
    ch = 0x7f & getkey();
    ch = tolower(ch);
    if (ch == 0x1b)
        return(NULL);
}

0x1b is the ASCII code for the Escape character. Defining a macro
*within the code* is nearly trivial; the only tricky part would be
determining whether the current system uses EBCDIC. But masking the
character value will break on an EBCDIC system, where many printable
characters have codes exceeding 0x7f. (This is assuming there's any
reason at all to make microemacs and msged support EBCDIC, something I'm
very skeptical about.)
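
A minimal sketch of such a macro, reusing the ESC_CHAR and ESC_CHAR_STR names from upthread. The EBCDIC test is an assumption: it relies on the preprocessor evaluating 'A' the same way the execution character set does, which is implementation-defined. is_escape is a hypothetical helper.

```c
/* In EBCDIC, 'A' is 0xC1 and Escape is 0x27; in ASCII, Escape is 0x1b. */
#if 'A' == 0xC1
#define ESC_CHAR '\x27'
#define ESC_CHAR_STR "\x27"
#else
#define ESC_CHAR '\x1b'
#define ESC_CHAR_STR "\x1b"
#endif

/* Return nonzero if ch is the Escape character. The key value is not
   masked with 0x7f, since that would mangle EBCDIC codes above 0x7f. */
int is_escape(int ch)
{
    return ch == ESC_CHAR;
}
```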

If you insist on using a language extension to support the Escape
character, you could just copy gcc's '\e'.

[...]
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Paul Edwards
2024-11-27 05:23:49 UTC
Post by Keith Thompson
Post by Paul Edwards
Post by Keith Thompson
True, but I don't know of anyone who's interested in a C90 compiler
with this kind of extension. Paul Edwards has made it clear he's only
interested in unextended C90, and anyone else can just use a more modern
compiler.
While not a "compiler" per se, there is one extension to
ESC_CHAR '\x1b'
ESC_CHAR_STR "\x1b"
that would allow me to support ASCII and EBCDIC in
my "starter suite".
I don't see why this needs to be a language extension. Just define it
as a macro wherever it's needed.
Because it is something I expect from the language - a portable
way to provide the keys required to drive an ANSI X3.64 terminal.
Post by Keith Thompson
Post by Paul Edwards
Microemacs and msged need them.
Do they?
How else do you propose providing a fullscreen interface?

We have a standard - ANSI X3.64.
Post by Keith Thompson
Post by Paul Edwards
I probably need names for the control keys too for microemacs.
I'll need to revisit the code to be sure.
My guess is that getting microemacs and/or msged to work with EBCDIC is
going to involve more than just defining the Escape character.
microemacs has been working on EBCDIC for years. I ported
it already (to a sufficient extent, anyway).
Post by Keith Thompson
while ((ch != 'a') && (ch != 'r')) {
    ch = 0x7f & getkey();
    ch = tolower(ch);
    if (ch == 0x1b)
        return(NULL);
}
I haven't attempted to do msged yet. But yes, that's exactly
the sort of code that I want to eliminate. Although that
particular bit of code isn't in the msged version I am using:

D:\devel\msged\src>grep -i 7f *
fido.c: tms.tm_year = (x & 0x7f) + 80;
keys.h: #define Key_C_BS 0x007f
keys.h: #define Key_A_8 0x7f00
spawn.asm: xfcb2 db 16 dup(?) ; 70..7F - default FCB

D:\devel\msged\src>

I can't remember if I previously eliminated it myself, but if I did,
it must have been in the 1990s, because I see no change involving
a 7f from my oldest available release from the 1990s.

But yes - that's the whole point - I expect to be able to write
that code portably, in either the standard, or a modified
standard - whatever is required to get ANSI X3.159-1989
to support ANSI X3.64.

It could be an ANSI X3.64 extension I suppose.
Post by Keith Thompson
0x1b is the ASCII code for the Escape character. Defining a macro
*within the code* is nearly trivial;
Defining it in a standard C90 header file or some extension
is equally trivial, and would put it where it belongs, rather
than in every single fullscreen application.
Post by Keith Thompson
the only tricky part would be
determining whether the current system uses EBCDIC. But masking the
character value will break on an EBCDIC system, where many printable
characters have codes exceeding 0x7f.
And a C90-compliant program *already* shouldn't be doing
such masks, as C90 *already* allows for EBCDIC.
Post by Keith Thompson
(This is assuming there's any
reason at all to make microemacs and msged support EBCDIC, something I'm
very skeptical about.)
What editor would you like me to use on my mainframe
operating systems (z/PDOS and z/PDOS-generic) instead?
edlin?

I was using microemacs today on z/PDOS-generic to modify
my makefile.zpg in PDPCLIB, as I am now able to run the
entire toolchain to produce an executable. Actually I could
already do that previously, but now I can use standard
HLASM in the process.
Post by Keith Thompson
If you insist on using a language extension to support the Escape
character, you could just copy gcc's '\e'.
That puts a burden on the compiler - every compiler,
basically - which is far from the trivial addition to an
existing header file, or a new header file, that I
suggested as an alternative.

BTW, the subject says 80386 C compiler, but I am likely
going to do S/370 (which also runs on z/Arch) at the same
time, as I realized that circumstances have changed such
that I can have a public domain suite for the mainframe
too if the compiler is covered. A bit of work would be
required to z/PDOS-generic to support the IFOX
MVS executables to cover the assembler. Unfortunately
the assembler is written in assembler instead of C (for
obvious historical reasons), but it's still public domain.

BFN. Paul.
Keith Thompson
2024-11-27 05:59:06 UTC
Post by Paul Edwards
Post by Keith Thompson
Post by Paul Edwards
Post by Keith Thompson
True, but I don't know of anyone who's interested in a C90
compiler with this kind of extension. Paul Edwards has made it
clear he's only interested in unextended C90, and anyone else can
just use a more modern compiler.
While not a "compiler" per se, there is one extension to
ESC_CHAR '\x1b'
ESC_CHAR_STR "\x1b"
that would allow me to support ASCII and EBCDIC in
my "starter suite".
I don't see why this needs to be a language extension. Just define it
as a macro wherever it's needed.
Because it is something I expect from the language - a portable
way to provide the keys required to drive an ANSI X3.64 terminal.
I don't think that's a reasonable expectation, but it's your project,
so do what you like.
Post by Paul Edwards
Post by Keith Thompson
Post by Paul Edwards
Microemacs and msged need them.
Do they?
How else do you propose providing a fullscreen interface?
We have a standard - ANSI X3.64.
It doesn't address EBCDIC. If you want to create your own standard that
does, nobody is going to stop you.
Post by Paul Edwards
Post by Keith Thompson
Post by Paul Edwards
I probably need names for the control keys too for microemacs.
I'll need to revisit the code to be sure.
My guess is that getting microemacs and/or msged to work with EBCDIC is
going to involve more than just defining the Escape character.
microemacs has been working on EBCDIC for years. I ported
it already (to a sufficient extent, anyway).
Apparently you think you need a language extension for something
you've already implemented. Odd.
Post by Paul Edwards
Post by Keith Thompson
while ((ch != 'a') && (ch != 'r')) {
ch = 0x7f & getkey();
ch = tolower(ch);
if (ch == 0x1b)
return(NULL);
}
I haven't attempted to do msged yet. But yes, that's exactly
the sort of code that I want to eliminate. Although that
It's from https://github.com/jrnutt/msged .

[...]
Post by Paul Edwards
But yes - that's the whole point - I expect to be able to write
that code portably, in either the standard, or a modified
standard - whatever is required to get ANSI X3.159-1989
to support ANSI X3.64.
The (ancient and obsolete) 1990 ISO C standard does not support your
expectation.
Post by Paul Edwards
It could be an ANSI X3.64 extension I suppose.
Post by Keith Thompson
0x1b is the ASCII code for the Escape character. Defining a macro
*within the code* is nearly trivial;
Defining it in a standard C90 header file or some extension
is equally trivial, and would put it where it belongs, rather
than in every single fullscreen application.
OK, you have your opinion about where it "belongs". I won't continue
arguing about it.
Post by Paul Edwards
Post by Keith Thompson
the only tricky part would be
determining whether the current system uses EBCDIC. But masking the
character value will break on an EBCDIC system, where many printable
characters have codes exceeding 0x7f.
And a C90-compliant program *already* shouldn't be doing
such masks, as C90 *already* allows for EBCDIC.
Or for any other character set that satisfies a few requirements, and
which may or may not have a character called "escape".

Apparently by "C90-compliant" you mean "100% portable", which is
certainly not what I mean by it. The source code for microemacs and
msged is not 100% portable. If you want to work on making it so, knock
yourself out.
Post by Paul Edwards
Post by Keith Thompson
(This is assuming there's any
reason at all to make microemacs and msged support EBCDIC, something I'm
very skeptical about.)
What editor would you like me to use on my mainframe
operating systems (z/PDOS and z/PDOS-generic) instead?
edlin?
Use any editor you like. You've now said you already have a working
microemacs for mainframe operating systems.

[...]
Post by Paul Edwards
Post by Keith Thompson
If you insist on using a language extension to support the Escape
character, you could just copy gcc's '\e'.
That puts a burden on the compiler - every compiler,
basically - which is far from the trivial addition to an
existing header file, or a new header file, that I
suggested as an alternative.
What do you mean by "every compiler"? I thought all you cared
about was a (currently nonexistent) public domain C90 compiler.

[...]
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Paul Edwards
2024-11-27 13:50:35 UTC
Post by Keith Thompson
Post by Paul Edwards
Post by Keith Thompson
Post by Paul Edwards
Post by Keith Thompson
True, but I don't know of anyone who's interested in a C90
compiler with this kind of extension. Paul Edwards has made it
clear he's only interested in unextended C90, and anyone else can
just use a more modern compiler.
While not a "compiler" per se, there is one extension to
ESC_CHAR '\x1b'
ESC_CHAR_STR "\x1b"
that would allow me to support ASCII and EBCDIC in
my "starter suite".
I don't see why this needs to be a language extension. Just define it
as a macro wherever it's needed.
Because it is something I expect from the language - a portable
way to provide the keys required to drive an ANSI X3.64 terminal.
I don't think that's a reasonable expectation, but it's your project,
so do what you like.
Post by Paul Edwards
Post by Keith Thompson
Post by Paul Edwards
Microemacs and msged need them.
Do they?
How else do you propose providing a fullscreen interface?
We have a standard - ANSI X3.64.
It doesn't address EBCDIC. If you want to create your own standard that
does, nobody is going to stop you.
EBCDIC ANSI is (or can be) identical to ASCII ANSI.
The same text is used to drive the screen. All the characters
have EBCDIC equivalents. They're all displayable ASCII
except for when the escape character is used - and that's
why I proposed adding a formal define for it - so that you
can drive a terminal in a portable manner.
Post by Keith Thompson
Post by Paul Edwards
Post by Keith Thompson
Post by Paul Edwards
I probably need names for the control keys too for microemacs.
I'll need to revisit the code to be sure.
My guess is that getting microemacs and/or msged to work with EBCDIC is
going to involve more than just defining the Escape character.
microemacs has been working on EBCDIC for years. I ported
it already (to a sufficient extent, anyway).
Apparently you think you need a language extension for something
you've already implemented. Odd.
Depends on the definition of "need".

Yes I have it working. But it required conditional compilation
for the hex code of the ESC character. That's what I find odd.
That I can write an entire toolchain, including OS, portably
in C90, except for a small portion of the editor.
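A sketch of what such a header might contain, so that the conditional compilation lives in one place rather than in each fullscreen application. ESC_CHAR and ESC_CHAR_STR are the names proposed in this thread; the ANSI_* macros and the EBCDIC test on 'A' are assumptions made for illustration.

```c
/*
 * Sketch only: none of these names are part of C90.
 * Detecting EBCDIC by testing 'A' is an assumption; the value of a
 * character constant in #if is implementation-defined and need not
 * match the execution character set.
 */
#if 'A' == 0xC1            /* EBCDIC: 'A' is 0xC1, ESC is 0x27 */
#define ESC_CHAR     '\x27'
#define ESC_CHAR_STR "\x27"
#else                      /* ASCII: ESC is 0x1b */
#define ESC_CHAR     '\x1b'
#define ESC_CHAR_STR "\x1b"
#endif

/* X3.64 "erase display" and "cursor home"; all but ESC are plain text. */
#define ANSI_CLEAR_SCREEN ESC_CHAR_STR "[2J"
#define ANSI_CURSOR_HOME  ESC_CHAR_STR "[H"
```

An application would then write something like fputs(ANSI_CLEAR_SCREEN, stdout) and never mention a code point itself.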
Post by Keith Thompson
Post by Paul Edwards
Post by Keith Thompson
while ((ch != 'a') && (ch != 'r')) {
ch = 0x7f & getkey();
ch = tolower(ch);
if (ch == 0x1b)
return(NULL);
}
I haven't attempted to do msged yet. But yes, that's exactly
the sort of code that I want to eliminate. Although that
It's from https://github.com/jrnutt/msged .
Thanks. That is msged 2.x. I am using msged 3.x. Which may
have been created after the original author disappeared,
presumably died.
Post by Keith Thompson
Post by Paul Edwards
But yes - that's the whole point - I expect to be able to write
that code portably, in either the standard, or a modified
standard - whatever is required to get ANSI X3.159-1989
to support ANSI X3.64.
The (ancient and obsolete) 1990 ISO C standard does not support your
expectation.
It comes very very close. I'm just trying to eliminate that gap
with a reasonable extension.
Post by Keith Thompson
Post by Paul Edwards
It could be an ANSI X3.64 extension I suppose.
Post by Keith Thompson
0x1b is the ASCII code for the Escape character. Defining a macro
*within the code* is nearly trivial;
Defining it in a standard C90 header file or some extension
is equally trivial, and would put it where it belongs, rather
than in every single fullscreen application.
OK, you have your opinion about where it "belongs". I won't continue
arguing about it.
Post by Paul Edwards
Post by Keith Thompson
the only tricky part would be
determining whether the current system uses EBCDIC. But masking the
character value will break on an EBCDIC system, where many printable
characters have codes exceeding 0x7f.
And a C90-compliant program *already* shouldn't be doing
such masks, as C90 *already* allows for EBCDIC.
Or for any other character set that satisfies a few requirements, and
which may or may not have a character called "escape".
Correct - so I would be introducing something stricter than
C90 - a requirement for the character set to include an ESC,
so that an ANSI terminal can be driven.

And I think there is no avoiding a set of control keys so that
the microemacs keystrokes can be supported.

So yeah - ASCII and EBCDIC would both qualify. Other
theoretical character sets would be supported too. But yes,
some other real or theoretical character sets would not be
supported, so you can't use "the" portable version of
microemacs to drive an attached ANSI terminal.
Post by Keith Thompson
Apparently by "C90-compliant" you mean "100% portable", which is
certainly not what I mean by it.
Well - "strictly conforming" is the proper term I believe.

I want to be strictly conforming to something close to C90
that takes into consideration the need for fullscreen apps
on another ANSI standard (X3.64).
Post by Keith Thompson
The source code for microemacs and
msged is not 100% portable. If you want to work on making it so, knock
yourself out.
Right - that's exactly what I'm doing. But I cannot do so if I
need to hardcode the code point of the ESC character.

Something has to give.
Post by Keith Thompson
Post by Paul Edwards
Post by Keith Thompson
(This is assuming there's any
reason at all to make microemacs and msged support EBCDIC, something I'm
very skeptical about.)
What editor would you like me to use on my mainframe
operating systems (z/PDOS and z/PDOS-generic) instead?
edlin?
Use any editor you like. You've now said you already have a working
microemacs for mainframe operating systems.
For two somewhat independent mainframe operating systems (rewrites), yes.

It should be possible to get them to work under other mainframe
operating systems (ie MVS TSO) too, with an appropriate
attached terminal (likely emulated by a PC as is almost
universally done now). Would likely work on z/OS USS or
whatever it is called too, but I've never investigated that.
Post by Keith Thompson
Post by Paul Edwards
Post by Keith Thompson
If you insist on using a language extension to support the Escape
character, you could just copy gcc's '\e'.
That puts a burden on the compiler - every compiler,
basically - which is far from the trivial addition to an
existing header file, or a new header file, that I
suggested as an alternative.
What do you mean by "every compiler"? I thought all you cared
about was a (currently nonexistent) public domain C90 compiler.
No. I care about all compilers.

The (work in progress) public domain ones satisfy one thing.

But I want to be able to go to any compiler I happen to be
using and add the necessary C90-X3.64-and-possibly-others
extensions.

Note that it is only by empirical testing that I found that the
only thing I needed was extensions for X3.64. But it is
possible that I have missed something and I would like
C90 extended to support some other ANSI standard
(or something else). Before the empirical results were in
I certainly didn't know what the limits of C90 were. Oh
yeah - in support of the FAT filesystem I have some
conditional compilation between ASCII and EBCDIC
too. To mark the file as "deleted". So possibly I need to
(in principle at least), change the FAT filesystem delete
character to something defined in C90+X3.64.

Note that z/PDOS-generic has a FAT file system in EBCDIC.
The EBCDIC character is what you get from what I consider
to be a definitive 819/1047 translation in Hercules.

BFN. Paul.
Tim Rentsch
2024-11-28 00:45:17 UTC
Post by Keith Thompson
Post by Rosario19
Post by Kaz Kylheku
void fn(int a)
{
int x[3] = { foo(), bar(), a }; /* not in C90 */
is in the above foo() called before bar()?
No, you cannot rely on that. Maybe it's fixed in a more recent standard,
but C99 (which I happen to have open in a PDF reader tab) stated that
"The order in which any side effects occur among the initialization list
expressions is unspecified.". This implies that there is no sequence
point between any two initializing expressions, which means we don't
know whose expression's function call takes place first.
"""
The evaluations of the initialization list expressions are
indeterminately sequenced with respect to one another and thus the order
in which any side effects occur is unspecified.
"""
C23 is more explicit (redundant?) than C99, which doesn't mention the
lack of a sequence point. (C11 dropped sequence points, replacing them
with "sequenced before", "sequenced after", and "unsequenced", basically
a new way of describing the same semantics.)
int n = 42;
int a[] = { n++, n++ };
C99 could imply that the value of a is merely unspecified, either {
42, 43 } or { 43, 42 }. Though it can almost certainly be inferred
from other parts of the C99 standard that there is no sequence
point between the two evaluations of n++ (I haven't taken the time
to check).
Under C99 rules, I believe this initializer has undefined
behavior, because of more than one modification of an object
without an intervening sequence point.
David Brown
2024-11-27 10:00:42 UTC
Post by Kaz Kylheku
Post by Rosario19
Post by Kaz Kylheku
void fn(int a)
{
int x[3] = { foo(), bar(), a }; /* not in C90 */
is in the above foo() called before bar()?
No, you cannot rely on that. Maybe it's fixed in a more recent standard,
The implication of the word "fixed" is that you think the current
standards are somehow "broken" in this respect. Do you think that is the
case?

The C standards go to quite an effort to say when you have a guarantee
about the order of execution, and when the compiler has the freedom to
re-arrange things for greater efficiency (and perhaps to discourage
people from writing unclear code).
Kaz Kylheku
2024-11-27 19:42:50 UTC
Post by David Brown
Post by Kaz Kylheku
Post by Rosario19
Post by Kaz Kylheku
void fn(int a)
{
int x[3] = { foo(), bar(), a }; /* not in C90 */
is in the above foo() called before bar()?
No, you cannot rely on that. Maybe it's fixed in a more recent standard,
The implication of the word "fixed" is that you think the current
standards are somehow "broken" in this respect. Do you think that is the
case?
The specification has an inconsistency, because it gives the order
in which initializations occur, yet not the order of evaluation of
the expressions that produce their values.

Above we know that x[0] is initialized first before x[1].

That doesn't even matter unless initializations are being observed,
which they can be if there is self-reference going on, like:

int x[3] = { foo(), x[0] + bar(), x[0] + x[1] }

I'm assuming this sort of thing must be the purpose for specifying
the order of initialization.

It looks inconsistent to me that the effects of the subobjects receiving
their initial values are ordered, but all other effects are not.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
James Kuyper
2024-11-27 19:59:08 UTC
On 11/27/24 14:42, Kaz Kylheku wrote:
...
Post by Kaz Kylheku
The specification has an inconsistency, because it gives the order
in which initializations occur, yet not the order of evaluation of
the expressions that produce their values.
That's not an inconsistency, it's a deliberate choice to give
implementations freedom to use whichever order is most convenient.
Kaz Kylheku
2024-11-27 21:52:10 UTC
Post by James Kuyper
...
Post by Kaz Kylheku
The specification has an inconsistency, because it gives the order
in which initializations occur, yet not the order of evaluation of
the expressions that produce their values.
That's not an inconsistency, it's a deliberate choice to give
implementations freedom to use whichever order is most convenient.
Implementations are not given freedom about initialization order;
in { A, B } the initialization implied by A happens before B.

Granting a freedom here while taking it away there is inconsistent.

Expression B may rely on the initialization A having completed, but
not on the effects of A having been settled.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
James Kuyper
2024-11-27 23:07:04 UTC
Post by Kaz Kylheku
Post by James Kuyper
...
Post by Kaz Kylheku
The specification has an inconsistency, because it gives the order
in which initializations occur, yet not the order of evaluation of
the expressions that produce their values.
That's not an inconsistency, it's a deliberate choice to give
implementations freedom to use whichever order is most convenient.
Implementations are not given freedom about initialization order;
in { A, B } the initialization implied by A happens before B.
Granting a freedom here while taking it away there is inconsistent.
Expression B may rely on the initialization A having completed, but
not on the effects of A having been settled.
I'm sorry - I thought you meant that they were logically inconsistent.
What you're actually saying is more like stylistically inconsistent.

In C90, the order in which the initializers were evaluated didn't
matter, because they were required to be static initializers. It was
only in C99 that they were allowed to be arbitrary expressions.

However, in the same version of the standard, designated initializers
were added. Designated initializers are allowed to update elements in a
different order from their order in memory, and to initialize the same
element multiple times, with only the final initialization actually
occurring. This can be convenient for setting up a rule and then adding
exceptions to that rule. If there weren't a rule mandating the order in
which initializers were applied, when two or more initializers affect
the same object, it wouldn't be possible to be certain which one
overrode the others.
Kaz Kylheku
2024-11-30 01:30:15 UTC
Post by James Kuyper
Post by Kaz Kylheku
Post by James Kuyper
...
Post by Kaz Kylheku
The specification has an inconsistency, because it gives the order
in which initializations occur, yet not the order of evaluation of
the expressions that produce their values.
That's not an inconsistency, it's a deliberate choice to give
implementations freedom to use whichever order is most convenient.
Implementations are not given freedom about initialization order;
in { A, B } the initialization implied by A happens before B.
Granting a freedom here while taking it away there is inconsistent.
Expression B may rely on the initialization A having completed, but
not on the effects of A having been settled.
I'm sorry - I thought you meant that they were logically inconsistent.
What you're actually saying is more like stylistically inconsistent.
In C90, the order in which the initializers were evaluated didn't
matter, because they were required to be static initializers. It was
only in C99 that they were allowed to be arbitrary expressions.
However, in the same version of the standard, designated initializers
were added. Designated initializers are allowed to update elements in a
different order from their order in memory, and to initialize the same
element multiple times, with only the final initialization actually
occurring. This can be convenient for setting up a rule and then adding
exceptions to that rule.
But it simply ends up being left to right.

Given { A, B, C }, the members are initialized in order of increasing
offset address, corresponding to left-to-right order in the syntax.

Given { [2] = A, [1] = B, [0] = C }, they are initialized in the
left-to-right order in the syntax: [2] first, then [1] then [0].

So we have order. And yet we don't have order; the expressions are not
actually sequenced.
Post by James Kuyper
If there weren't a rule mandating the order in
which initializers were applied, when two or more initializers affect
the same object, it wouldn't be possible to be certain which one
overrode the others.
It would make sense for that simply to be a constraint violation;
two initializations for the same object are being requested.

There is no sequencing in the initialization: { i++, i++ } would
be undefined behavior. Yet, you can request multiple initializations
of the same subobject and have it safely resolved to the rightmost?
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Tim Rentsch
2024-11-30 05:00:20 UTC
Post by Kaz Kylheku
If there weren't a rule mandating the order in which initializers
were applied, when two or more initializers affect the same
object, it wouldn't be possible to be certain which one overrode
the others.
That's wrong. The priority rule for initializing the same subobject
depends not on order of evaluation but on syntactic order. There
doesn't have to be a rule for evaluation order to make the order
of subobject overriding be well defined.
Post by Kaz Kylheku
It would make sense for that simply to be a constraint violation;
two initializations for the same object are being requested.
It isn't that simple. There are situations where overriding the
initialization of a particular subobject makes sense, and is
useful. Example:

typedef struct { int x, y; } Bas;
typedef struct { Bas b[2]; } Foo;

Foo
sample_foo( Bas b ){
Foo foo = { b, b, .b[1].y = -1 };
return foo;
}

The subobject .b[1].y is overridden, but we can't take the previous
initialization of .b[1] without changing the semantics.
James Kuyper
2024-11-30 14:00:32 UTC
...
Post by Kaz Kylheku
Post by James Kuyper
In C90, the order in which the initializers were evaluated didn't
matter, because they were required to be static initializers. It was
only in C99 that they were allowed to be arbitrary expressions.
However, in the same version of the standard, designated initializers
were added. Designated initializers are allowed to update elements in a
different order from their order in memory, and to initialize the same
element multiple times, with only the final initialization actually
occurring. This can be convenient for setting up a rule and then adding
exceptions to that rule.
But it simply ends up being left to right.
Why is that a "but"? If you want to give users control over the order
of initialization, what is simpler or more natural than using the
textual order?
Post by Kaz Kylheku
Given { A, B, C }, the members are initialized in order of increasing
offset address, corresponding to left-to-right order in the syntax.
Given { [2] = A, [1] = B, [0] = C }, they are initialized in the
left-to-right order in the syntax: [2] first, then [1] then [0].
So we have order. And yet we don't have order; the expressions are not
actually sequenced.
You can always make something sound contradictory or confusing by
leaving out the details that resolve the contradiction or remove the
confusion.
Yes, the initializations of the members of an aggregate object are
ordered. And also yes, the evaluations of the initializer expressions
for those objects are unordered, the same as is generally the case -
there are only a few features of C that impose order on expressions - the
semicolon at the end of a declaration or statement is the most common.
As a general rule, whenever you need to order the evaluation of
expressions that would otherwise be unordered, the way to do so is
simply to put them in different declarations or statements, which often
requires creating temporaries to hold the results of intermediate
evaluations.
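A minimal sketch of that approach, assuming C99 or later for the non-constant initializers. The functions foo() and bar() stand in for the ones discussed in this thread; the call log and fill_ordered() are hypothetical names added only so the order can be observed.

```c
#include <stddef.h>

/* Hypothetical call log so the order of calls can be observed. */
static char call_log[8];
static size_t call_count;

static void record(char c)
{
    if (call_count < sizeof call_log)
        call_log[call_count++] = c;
}

static int foo(void) { record('f'); return 1; }
static int bar(void) { record('b'); return 2; }

/*
 * The initializer of each declaration is a full expression, so there is
 * a sequence point after it: foo() has completed before bar() is called.
 * The initializer list itself then contains no calls at all.
 */
void fill_ordered(int a, int out[3])
{
    int f = foo();
    int b = bar();
    int x[3] = { f, b, a };   /* non-constant initializers: C99 and later */
    int i;

    for (i = 0; i < 3; i++)
        out[i] = x[i];
}
```

Because the list holds only the temporaries f and b, the order in which its expressions are evaluated no longer affects the result.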
Post by Kaz Kylheku
Post by James Kuyper
If there weren't a rule mandating the order in
which initializers were applied, when two or more initializers affect
the same object, it wouldn't be possible to be certain which one
overrode the others.
It would make sense for that simply to be a constraint violation;
two initializations for the same object are being requested.
The committee felt otherwise. The standard quite explicitly says: "...
each initializer provided for a particular subobject overriding any
previously listed initializer for the same subobject ..." (6.7.10p20). I
agree - I can see obscure situations where that rule makes the feature
more convenient. The standard also provides a relevant example where the
intended behavior depends upon this feature:

"EXAMPLE 13 Space can be "allocated" from both ends of an array by using
a single designator:
int a[MAX] = {
1, 3, 5, 7, 9, [MAX-5] = 8, 6, 4, 2, 0
};

In the above, if MAX is greater than ten, there will be some zero-valued
elements in the middle; if it is less than ten, some of the values
provided by the first five initializers will be overridden by the
second five."

If you wanted the behavior to depend upon the value of MAX in precisely
the fashion provided by this feature, and this feature were not
available, the code would have to be a lot more complicated.
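As a concrete illustration of both cases, here is the same initializer shape with the array size written out instead of MAX. The array names wide and narrow are made up for this sketch, and designated initializers require C99 or later; some compilers warn about the overridden elements.

```c
/* Size 12 (greater than ten): elements 5 and 6 are left as zero. */
static const int wide[12] = { 1, 3, 5, 7, 9, [12 - 5] = 8, 6, 4, 2, 0 };

/*
 * Size 7 (less than ten): [7 - 5] restarts at index 2, so the earlier
 * initializers 5, 7 and 9 are overridden by 8, 6 and 4.
 */
static const int narrow[7] = { 1, 3, 5, 7, 9, [7 - 5] = 8, 6, 4, 2, 0 };
```

Read in order, wide holds 1 3 5 7 9 0 0 8 6 4 2 0 and narrow holds 1 3 8 6 4 2 0.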
Post by Kaz Kylheku
There is no sequencing in the initialization: { i++, i++ } would
be undefined behavior. Yet, you can request multiple initializations
of the same subobject and have it safely resolved to the rightmost?
Correct. If you need initializer expressions to be ordered, you'll have
to put them in different statements or declarations.
David Brown
2024-11-28 08:12:40 UTC
Post by Kaz Kylheku
Post by David Brown
Post by Kaz Kylheku
Post by Rosario19
Post by Kaz Kylheku
void fn(int a)
{
int x[3] = { foo(), bar(), a }; /* not in C90 */
is in the above foo() called before bar()?
No, you cannot rely on that. Maybe it's fixed in a more recent standard,
The implication of the word "fixed" is that you think the current
standards are somehow "broken" in this respect. Do you think that is the
case?
The specification has an inconsistency, because it gives the order
in which initializations occur, yet not the order of evaluation of
the expressions that produce their values.
Above we know that x[0] is initialized first before x[1].
That doesn't even matter unless initializations are being observed,
int x[3] = { foo(), x[0] + bar(), x[0] + x[1] }
I'm assuming this sort of thing must be the purpose for specifying
the order of initialization.
It looks inconsistent to me that the effects of the subobjects receiving
their initial values are ordered, but all other effects are not.
I don't see any justification in the standard for assuming that the
initialisers are evaluated in any particular order. The standard (at
least my reading of section 6.7.9) gives a clear order to the
initialisation itself (which may, of course, be re-ordered by the
compiler under "as-if" rules) so that if you are using designated
initialisers, it is clear that the last initialiser for each element is
what counts. Not only does that section say nothing about the order of
evaluation for the parts, it says that if an initialiser is overridden,
then it may not be evaluated at all (with the implication being that it
might be evaluated and the result dropped).

Section 6.7.9p23:

"""
The evaluations of the initialization list expressions are
indeterminately sequenced with respect to one another and thus the order
in which any side effects occur is unspecified. 152)

152) In particular, the evaluation order need not be the same as the
order of subobject initialization.
"""


This is very much like the evaluation of arguments to a function call -
there is no specified order or sequencing between the evaluations of the
arguments.
Tim Rentsch
2024-11-28 03:26:21 UTC
Post by Rosario19
Post by Kaz Kylheku
void fn(int a)
{
int x[3] = { foo(), bar(), a }; /* not in C90 */
is in the above foo() called before bar()?
No, you cannot rely on that. Maybe it's fixed in a more recent
standard, but C99 (which I happen to have open in a PDF reader
tab) stated that "The order in which any side effects occur among
the initialization list expressions is unspecified.". This
implies that there is no sequence point between any two
initializing expressions, which means we don't know whose
expression's function call takes place first.
Challenge exercise for C standard enthusiasts: It is possible
(in C99 and later) to write an initializer for x[] that puts
in the same values as the initializer above, but guarantees
foo() is called before bar(). Hint: nothing else is needed
besides a different writing of the initializer for x[] (still
an array of length 3). How to do it?
Rosario19
2024-11-30 15:41:03 UTC
Post by Kaz Kylheku
Post by Rosario19
Post by Kaz Kylheku
void fn(int a)
{
int x[3] = { foo(), bar(), a }; /* not in C90 */
is in the above foo() called before bar()?
No, you cannot rely on that. Maybe it's fixed in a more recent standard,
but C99 (which I happen to have open in a PDF reader tab) stated that
"The order in which any side effects occur among the initialization list
expressions is unspecified.". This implies that there is no sequence
point between any two initializing expressions, which means we don't
know whose expression's function call takes place first.
In any case, a C90 compiler with the above support as an extension to
C90 can specify rigid sequencing behavior.
Post by Rosario19
void fn(int a)
{
int x[3];
x[0]=foo(); x[1]=bar(); x[2]=a;
this would be ok with every C compiler
One problem is, if you're doing this because your compiler is C90, you
also have to do something about all declarations which follow the int
x[3], since they cannot occur after a statement. You can add another
level of block nesting for them, or whatever.
int fn(int a)
{ int x[3];
int b=9;
x[0]=foo(); x[1]=bar(); x[2]=a;
return x[0]==0||a==b;
}

I don't see another level of block nesting
Post by Kaz Kylheku
Initialization is preferable to leaving an object uninitialized and
assigning. There is a scope where the name is visible, but the object
is not initialized, inviting code to be inserted there which tries
to use it.
If I needed foo to be called before bar, I would still rather do
int f = foo();
int b = bar();
int x[3] = { f, b, a };
ok

Paul Edwards
2024-11-26 02:48:10 UTC
Post by Kaz Kylheku
Post by Paul Edwards
Post by Janis Papanagnou
Post by Paul Edwards
I have been after a public domain C compiler for decades.
[...] I'm after C90 written in C90.
Why formulate the latter condition if you can bootstrap?
(Did you mean; written in a "C" not more recent than C90?)
Yes - written in C90 so that it can be maintained with
just knowledge of C90.
And also written in C90 so that it is written naturally
for a C90 programmer, not using a subset of C90
But, do yourself a favor and, have it as an extension to allow
void fn(int a)
{
int x[3] = { foo(), bar(), a }; /* not in C90 */
(You don't have to use it in the source code of the thing,
so it can be bootstrapped by other C90 compilers without
the extension.)
Also, pin down the truncation behavior of / and % to match C99.
(Though, again, without relying on that in the C90 source
of the compiler.)
Define the behavior of a [0] array at the end of a struct,
so that the C90 struct hack is "blessed" in your implementation.
The C99 flexible array member cannot be used, after all.
You can have it so that [0] has the same semantics as C99 []
in that role.
I don't have any such code in PDOS, so it is very unlikely I
will be doing anything along those lines.

My goal is to get the existing PDOS source code to compile.
Plus the tools, including the new C compiler. So that there is
a completely public domain infrastructure that can be used as
a base to produce all of the above, and more.

If I was to enhance it to do the above to meet some market
need, it is more likely that it would be a commercial derivative
rather than being public domain.

BFN. Paul.
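For reference on the / and % suggestion quoted above: C90 leaves the rounding direction of / implementation-defined when an operand is negative, whereas C99 requires truncation toward zero. C90 source that wants the C99 result without depending on the compiler can go through div(), which C90 already specifies as truncating. A small sketch (the wrapper names trunc_div and trunc_rem are hypothetical):

```c
#include <stdlib.h>  /* div() and div_t are both part of C90 */

/* Quotient truncated toward zero, matching what C99 requires of '/'.
   As with '/', the behaviour is undefined when den is 0. */
int trunc_div(int num, int den)
{
    return div(num, den).quot;
}

/* Remainder carrying the sign of the dividend, matching C99 '%'. */
int trunc_rem(int num, int den)
{
    return div(num, den).rem;
}
```

With these, trunc_div(-7, 2) is -3 and trunc_rem(-7, 2) is -1 on any conforming C90 implementation, and quot * den + rem always reproduces num.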
Lynn McGuire
2024-11-25 21:55:09 UTC
Permalink
Post by Paul Edwards
Hi.
I have been after a public domain C compiler for decades.
None of them reach C90 compliance. SubC comes the
closest but was written without full use of C90, which
makes it difficult to read. I'm after C90 written in C90.
A number of people have tried, but they always seem
to fall short. One of those attempts is pdcc. The
preprocessor was done, but the attempt (by someone
else) to add C code generation was abandoned.
I decided to take a look at it, and it looks to me like
a significant amount of work has already been done.
Also, my scope is limited - I am only after enough
functionality to get my 80386 OS (PDOS) compiled,
and I don't mind short=int=long = 32 bits, I don't
mind not having float. I don't use bitfields.
Anyway, I have had some success in making enhancements
https://sourceforge.net/p/pdos/gitcode/ci/3356e623785e2c2e16c28c5bf8737e72dfd39e04/
But I don't really know what I'm doing (I do know some
of the theory - but this is a particular design).
E.g. now that I have managed to get a variable passed to
a function, I now want the address of that variable passed
to the function - ie I want to do &x instead of x - and I am
not sure whether to create a new ADDRESS type, or
whether it is part of VARREF or what - in the original
(incomplete) concept. Or CC_EXPR_AMPERSAND.
I am happy to do the actual coding work - I'm just looking
for some nudges in the right direction if anyone can assist.
Thanks. Paul.
Did you look at the Open Watcom compilers: C, C++, and F77 ?
https://openwatcom.org/

Open Watcom has many modes of compilation: 8086, 80286, 80386, etc.

Lynn
Keith Thompson
2024-11-25 22:10:05 UTC
Permalink
Post by Lynn McGuire
Post by Paul Edwards
I have been after a public domain C compiler for decades.
[...]
Post by Lynn McGuire
Did you look at the Open Watcom compilers: C, C++, and F77 ?
https://openwatcom.org/
Open Watcom has many modes of compilation: 8086, 80286, 80386, etc.
Open Watcom's compilers are not public domain, so they don't meet Paul's
(rather odd) requirements.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Lynn McGuire
2024-11-26 00:32:48 UTC
Permalink
Post by Keith Thompson
Post by Lynn McGuire
Post by Paul Edwards
I have been after a public domain C compiler for decades.
[...]
Post by Lynn McGuire
Did you look at the Open Watcom compilers: C, C++, and F77 ?
https://openwatcom.org/
Open Watcom has many modes of compilation: 8086, 80286, 80386, etc.
Open Watcom's compilers are not public domain, so they don't meet Paul's
(rather odd) requirements.
Are you sure about the public domain thing ? The license is here:
http://openwatcom.org/ftp/install/license.txt

I forgot to mention that the Open Watcom compilers are released for the
following platforms: DOS, Linux, OS/2, and Win32. A Win64 fork and port
is being worked on at:
https://open-watcom.github.io/

Lynn
Keith Thompson
2024-11-26 00:49:58 UTC
Permalink
Post by Lynn McGuire
Post by Keith Thompson
Post by Lynn McGuire
Post by Paul Edwards
I have been after a public domain C compiler for decades.
[...]
Post by Lynn McGuire
Did you look at the Open Watcom compilers: C, C++, and F77 ?
https://openwatcom.org/
Open Watcom has many modes of compilation: 8086, 80286, 80386, etc.
Open Watcom's compilers are not public domain, so they don't meet
Paul's (rather odd) requirements.
http://openwatcom.org/ftp/install/license.txt
Yes, I'm sure that that's a license that imposes some restrictions.
It's not public domain.

Just one example:

"""
You must retain and reproduce in all copies of Original Code the copyright
and other proprietary notices and disclaimers of Sybase as they appear in the
Original Code, and keep intact all notices in the Original Code that refer to
this License;
"""

Anything that's public domain has no copyright.

Why do you ask?

[...]
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
BGB
2024-11-26 19:22:01 UTC
Permalink
Post by Keith Thompson
Post by Lynn McGuire
Post by Keith Thompson
Post by Lynn McGuire
Post by Paul Edwards
I have been after a public domain C compiler for decades.
[...]
Post by Lynn McGuire
Did you look at the Open Watcom compilers: C, C++, and F77 ?
https://openwatcom.org/
Open Watcom has many modes of compilation: 8086, 80286, 80386, etc.
Open Watcom's compilers are not public domain, so they don't meet
Paul's (rather odd) requirements.
http://openwatcom.org/ftp/install/license.txt
Yes, I'm sure that that's a license that imposes some restrictions.
It's not public domain.
Skims license...

Yeah, I wouldn't touch that one with a stick...
Post by Keith Thompson
"""
You must retain and reproduce in all copies of Original Code the copyright
and other proprietary notices and disclaimers of Sybase as they appear in the
Original Code, and keep intact all notices in the Original Code that refer to
this License;
"""
Anything that's public domain has no copyright.
This is one of the "less bad" restrictions in there (it is merely what
most other licenses require).

It isn't really a FOSS license at all.
Post by Keith Thompson
Why do you ask?
[...]
Looking around, it seems my idea for a "MIT Minus" license already
exists in two different variants:
MIT No Attribution
BSD Zero Clause

Which do basically let people do whatever with the code, while still
providing the protection of a no warranty clause.


I guess it is possible I could consider moving to "MIT No Attribution"
for some of my stuff if the normal MIT (Expat) license is seen as asking
too much.

But, possibly, oh the terror, some of my past projects had used the LGPL...


My projects still do bundle some GPL code, but it is "OK" as I don't use
any of the GPL code in the non GPL parts of the project (it mostly
applies to things like the Doom and Quake engines and similar).

Though, they are lacking the WAD and PAK files, as redistributing them
is questionable. In theory, the Shareware files could be distributed in
an unmodified form, but safer is to omit them.


For my own uses, I was largely using WAD files that were converted to
WAD2 mostly so that they could use data compression (mostly LZ4 and/or
RP2 in this case; packing tool using whichever gave a smaller file).
Functionally, this doesn't change much, but is mostly to make loading
times faster (as with program loading, loading in Doom is often IO bound).

Though, Doom does spend a long time doing texture loading during startup
with a primarily CPU bound task. Namely, in the WAD file, textures exist
as "patch" lumps, and at load time the Doom engine glues them together
to build the final wall textures. There is no obvious way to speed this
up; note that id's original response was to display some square brackets
and fill them in with a series of dots (eg: "[....... ]").
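
A rough sketch of the "gluing" step described above, under simplifying assumptions: real Doom patches are stored as columns of posts with transparent gaps, whereas here each patch is treated as an opaque row-major rectangle. The struct and function names are hypothetical and do not come from the Doom source.

```c
/* Hypothetical, simplified patch: an opaque rectangle of palette indices. */
struct patch {
    int width;
    int height;
    const unsigned char *pixels;  /* row-major, width * height bytes */
};

/* Copy one patch into a texture buffer at offset (ox, oy),
   clipping anything that falls outside the texture. */
void blit_patch(unsigned char *tex, int tex_w, int tex_h,
                const struct patch *p, int ox, int oy)
{
    int x, y;

    for (y = 0; y < p->height; y++) {
        int ty = oy + y;
        if (ty < 0 || ty >= tex_h)
            continue;
        for (x = 0; x < p->width; x++) {
            int tx = ox + x;
            if (tx < 0 || tx >= tex_w)
                continue;
            tex[ty * tex_w + tx] = p->pixels[y * p->width + x];
        }
    }
}
```

A wall texture is then built by calling blit_patch once per patch listed for that texture, later patches overwriting earlier ones where they overlap. The per-pixel loop over every patch of every texture is where the CPU time goes.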

The engines retain the ability to load the original IWAD format as well.


Quake didn't originally use or support WAD2 with compressed lumps, but
it was provided for in the format. Originally, it specified uncompressed
lumps and LZSS (IIRC). I used LZ4 and RP2 instead, as they are both
faster and compress better than LZSS (but, back in the 90s, LZSS was
fairly popular; taking on a lot of the roles one might now find LZ4 or
similar taking on).

Some of this stuff could have theoretically been avoided with filesystem
level compression, but this isn't really a thing in FAT32.


Sadly, there is no Wolf3D port, as while I had partially ported Wolf3D,
what was lacking was a version of the Wolf3D engine under friendly
license terms. I did experimentally also recreate Wolf3D in my modified
port of the ROTT engine (also GPL), but couldn't distribute it with data
files as, again, I can't legally distribute these (and the original DOS
files won't work; things would need to be regenerated using the data
files from the Wolf3D iOS port, which while available were also not
under friendly license terms).

Did partially make a set of "placeholder" assets partly derived from
parts scavenged from FreeDoom, but it is pretty far from recapturing the
Wolf3D experience (and, likely, someone would also need to make a
"legally distinct" stand-in for the Wolf3D levels).

Potentially, I could make a tool to allow expressing maps as ASCII text
files, likely as 64x64 grids of characters, with data encoded in the
characters. Simple case might be to use a single grid of characters with
a translation key (Wolf3D had used 3 planes internally, but a lot of
cases are mutually exclusive, like one isn't generally going to be
putting entities inside walls, ...).
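
One possible shape for such a loader, as a sketch only: a single plane of characters, one text row per map row, translated through a small key table. The tile codes, the key characters and the function names are all made up for illustration; nothing here reflects the actual Wolf3D map format.

```c
#define MAP_DIM 64

/* Hypothetical tile codes. */
enum {
    TILE_EMPTY = 0,
    TILE_WALL = 1,
    TILE_DOOR = 2,
    TILE_START = 3,
    TILE_UNKNOWN = 255
};

struct key_entry {
    char ch;
    unsigned char tile;
};

/* Hypothetical translation key: character in the text file -> tile code. */
static const struct key_entry map_key[] = {
    { '.', TILE_EMPTY },
    { '#', TILE_WALL },
    { '+', TILE_DOOR },
    { '@', TILE_START }
};

unsigned char translate_char(char ch)
{
    unsigned int i;

    for (i = 0; i < sizeof map_key / sizeof map_key[0]; i++) {
        if (map_key[i].ch == ch)
            return map_key[i].tile;
    }
    return TILE_UNKNOWN;
}

/* Fill grid from newline-separated rows of text. Short rows and missing
   rows stay TILE_EMPTY; characters beyond column 63 and rows beyond
   row 63 are ignored. Returns the number of rows read. */
int parse_map(const char *text, unsigned char grid[MAP_DIM][MAP_DIM])
{
    int row = 0, col = 0;
    int r, c;

    for (r = 0; r < MAP_DIM; r++)
        for (c = 0; c < MAP_DIM; c++)
            grid[r][c] = TILE_EMPTY;

    while (*text != '\0' && row < MAP_DIM) {
        if (*text == '\n') {
            row++;
            col = 0;
        } else if (col < MAP_DIM) {
            grid[row][col++] = translate_char(*text);
        }
        text++;
    }
    return (col > 0) ? row + 1 : row;
}
```

Because walls and entities are mutually exclusive in most cells, one character per cell is enough for the simple case; a second plane could be added later for the exceptions.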