Post by bart
Post by BGB
Post by Ben Bacarisse
Post by BGB
Post by Ben Bacarisse
While eliminating structs could also simplify things, structs also tend to
be a lot more useful.
Indeed. And I'd have to use them for this!
int a[4][4];
...
l=a[j][k];
int a[16];
...
l=a[j*4+k];
That's what you want to force me to write, but I can use an array of
arrays despite your arbitrary ban on them by simply putting the array in
a struct.
...
Post by BGB
In most contexts, I don't really see how a struct is preferable to a
multiply, but either way...
And I can't see how an array of arrays is harder for your compiler than
an array of structs. C's indexing requires the compiler to know the
size of the items pointed to.
I suspect that there is something amiss with your design if you are
considering this limiting in order to simplify the compiler. A simple
compiler should not care what kind of thing p points to in
p[i]
only what size of object p points to.
When I designed the compiler code, the initial approach for internal
type layout was to bit-pack it into 32 bits, say:
Basic1
(31:28): Layout of Type (0=Basic)
(27:16): Array Size
(15:12): Pointer Level Count
(11: 0): Base Type
Basic2
(31:28): Layout of Type (1=Basic2)
(27: 8): Array Size
( 7: 6): Pointer Level Count
( 5: 0): Base Type
Basic3
(31:28): Layout of Type (2=Basic3)
(27:24): Array Size
(23:20): Pointer Level Count
(19: 0): Base Type
Overflow
(31:28): Layout of Type (3=Overflow)
(27:24): MBZ
(23: 0): Index into Type-Overflow Table
And, a few other cases...
Basic1 was the default, able to express arrays from 0..4095 elements,
with 0..7 levels of pointer indirection, and 0..4095 for the base type.
Where, 0=T, 1=T*, 2=T**, ..., 7=T*******
8=T[], 9=T[][], A=T*[], B=T*[*], C=&T, ...
That's quite a level of detail. This looks more like an encoding you
might devise for a CPU instruction set. And yet you say elsewhere that
the whole compiler is 250K lines? So you're not trying to squeeze it
into a 16KB ROM or something.
These values are packed inside of register descriptors, of which there
are multiple in each 3AC op. And, there may be a whole lot of 3AC ops in
a program; a much bulkier encoding would cause the compiler to use a lot
more RAM.
Though, a likely RAM-saving option would be to redesign the compiler
such that only one function needs to have 3AC loaded at a time. But, for
some tasks (such as tracing out everything that is reachable), this
would mean repetitively re-decoding the stack IR for various functions.
Well, and/or building more metadata, such as (for every function)
building a table of any other functions called, any global variables
accessed, etc. But, this is just trading one type of memory bulk for
another.
Like, say, if there are 4000 functions in a program, any metadata that
needs to be stored per function adds up quickly.
Post by bart
Post by BGB
It could also be used to encode another type, which was needed for
things like multidimensional arrays and some other complex types. But,
this seemed like an ugly hack... (And was at odds with how I imagined
types working, but seemed like a technical necessity).
I can see how multi-dim arrays would be troublesome with such a scheme.
Yeah.
It doesn't express them naturally, so I have to hack it.
It can, however, fairly easily express "FooStruct *x;".
But, if one wants, say:
FooStruct *x[8192];
Then it needs to put it into the type-overflow table.
Post by bart
My own approach is to use a bunch of parallel arrays (a table of structs
didn't appeal). They are populated with the standard types at the lower
end, then new types can be added.
A C type is represented by an index into those arrays. One of those
arrays:
[0:maxtype]int tttarget # All names start with 'tt'
For a pointer, it contains the index of the target type. For arrays,
it is the index of the element type. There is no restriction on nesting,
other than 'maxtype' (set to some large number; one day I'll make the
limit flexible).
Yeah, dunno...
I guess simpler could have been to do everything via the type-overflow
table and not have bothered with bit-twiddled type descriptors.
Originally though, the type overflow table was a response to noting that
there were a lot of situations that could not fit into a bit-packed format.
Post by bart
Post by BGB
One downside as-is, is that if a given variable is assigned more than
4096 times in a given function, it can no longer be given a unique ID.
Though uncommon, this is not entirely implausible (with sufficiently
large functions), and there isn't currently any good way to deal with
this (apart from raising a compiler error).
#include <stdio.h>
int main(void) {
int a, b=2, c=3, d=4;
a=b+c*d;
printf("%d\n", a);
}
Except that the 'a=b+c*d;' line is repeated 1M or 2M times, usually
within an included file.
Some compilers (including for other languages) find this challenging.
As-is, if the line were repeated more than 4096 times, it would break my
compiler, unless an overflow case were added for the local variable
descriptor.
Though, one option would be to drop the type field and expand the ID and
sequence numbers.
Since, generally, for local variables the type is also encoded in a
descriptor struct for the variable (though, the inline type is useful
for things like "soft casting", etc).
Also relevant for things like temporaries, where the descriptor only
contains an abstract type (temporaries primarily rely on sequence
numbers for identity, with only a small number of "canonical" temporary
variables representing every temporary in the function).
For temporaries, would need some sort of sequence-number overflow case
that does not lose the ability to encode the type.
But, yeah, compiler breaking because a given variable is assigned too
many times in a function is kinda lame.
Post by bart
Post by BGB
But, as noted, the 3AC IR only exists in memory.
In the IR, the operations are expressed as a sort of linear bytecode
operating on a virtual stack machine; with types expressed as ASCII strings.
Logically, the stack holds "ccxl_register" values, and the number of
3AC ops is typically less than the number of stack-machine operations
(those which exist simply to shuffle registers around essentially
disappear in the translation process).
LOAD x
LOAD y
ADD
STORE z
Might turn into a single 3AC operation.
I've used 3AC as an IL (several attempts), and also a stack VM. The 3AC
always looked so simple, but it also was a lot of hard work to turn it
into decent register-based code.
Now I use a stack-based IL, which is far easier to work with.
The problem with generating code directly from a stack IL is that the
generated code tends to be rather inefficient.
But, going directly from an AST to 3AC or SSA is a pain...
And storing and reloading SSA as a "good" IR is a pain (and something
that just sorta serializes the internal structures, like what LLVM
bitcode does, is not ideal).
Say, in effect, any compiler which uses LLVM bitcode, would also need to
do things the same way that LLVM does them. And, a structural flattening
of my compiler's internal 3AC form, would require any other compiler to
do things the same way.
Luckily at least, a stack IR/IL can avoid this.
But, annoyingly, my existing IL has some design issues that would limit
its "general use"; mostly in that it is a monolithic linear bytecode
format that also expresses all of the metadata using bytecode ops.
The likely fix would be to go over to a structure more like I used for
the BS2VM, of basically representing everything inside a TLV format,
with each function and variable being individually wrapped (with offset
indices mapping table numbers to their corresponding TLV lumps).
Though, the actual contents of the bytecode wouldn't change that much,
and I had gone back and forth on whether it would be better to use
structural representation for metadata, or continue the hack of using
bytecode ops to encode the metadata.
But, attempts to rework BGBCC to use such a structure have thus far
tended to fizzle out.
I had a few times also considered trying to adopt .NET CIL, except that
it doesn't express C very well, and it represents metadata very
differently than my existing compiler (and its way of expressing
metadata isn't very flexible, *).
*: Basically, it uses a bunch of tables organized in a way almost like
in a relational database, but with a bit-packed representation (with the
bit-width of various members also depending on the number of rows or
range of primary keys within the other tables, etc). The table layouts
are effectively fixed, so one would be SOL if they want to express
anything that doesn't fit into the existing tables.
Post by bart
I wouldn't start off with turning stack code into 3AC! (SSA is supposed
to be /the/ way to generate code in compilers; they can keep it.)
My 3AC format isn't pure 3AC, but rather an intermediate between 3AC and
SSA (hence the whole sequence number issue...).
But, I ended up going this direction (at first in my BGBScript VM), as
it allowed for significant performance improvements over operating in
terms of a stack machine.
Can note that I was able (at the time) to make BGBScript and BGBScript2,
when mostly using a static-typed language variant and JIT compiled,
reasonably performance competitive with native C. BGBScript2 (beyond the
syntax changes) was also in some ways somewhat simplified vs what its
predecessor had become (its scope model and type-inference was a
significant source of pain; and effectively turning it into C# made this
part simpler).
A naive purely dynamic interpreter is a lot simpler, but then one is
hard-pressed to get much better than around 50x to 100x slower versus
native C (and, if running any non-trivial code in the VM, something with
a 100x slowdown is undesirable; and one needs either static typing or
type inference to have any hope of decent performance).
Post by bart
Those 4 stack instructions might turn into 3/4 machine instructions on
x64 (perhaps fewer if register-resident). But if converting them to one
3AC instruction, you then have to expand them again.
FWIW:
No current plans to target BGBCC to x86-64, as MSVC and GCC serve this
use case well enough.
Post by bart
Perhaps 3AC suits another architecture better (I think ARM uses
3-register ops; x64 is 2-register ops).
It makes sense on my own BJX2 ISA, which at this point is primarily
3-register ops (with 64 GPRs, etc).
I might have almost been tempted to consider switching more over to
RISC-V and GCC, except that (even with superscalar, etc), RISC-V
performance is a little weak.
Granted, "real world" it may not matter, because RISC-V is starting to
appear in ASICs, and a 1GHz CPU is going to beat a 50MHz CPU (in an
FPGA) fairly easily even if it takes more clock cycles to run similar code.
But, had noted that in some contrived benchmarks (software OpenGL and
neural-nets), my CPU core (at 50MHz) was able to be performance
competitive with an early 2000s laptop running at 1.4 GHz, but seemingly
mostly because x87 sucks in the NN case, ...
Well, and on some metrics the difference between them isn't as huge as
the difference in clock-speed would seem to imply. The laptop was
significantly faster at plain integer code though (for code that is not
memory bound).
The laptop seemed to be (relative to clock speed) bogged down in a few
areas:
The x87 FPU ops are slow;
Memory is also relatively slow.
Like, despite the 28x difference in clock speed, there was only around a
3.5x difference in memory bandwidth.
Technically though the laptop is slower than the newer RasPi models (or
running code in Termux on my cellphone...).
But, despite my efforts, getting Quake and similar running at decent
framerates on a 50MHz CPU remains elusive.
But, if I could bump it to 100MHz while keeping everything else the
same; Quake would generally be running at around 20fps.
Had ended up mostly switching my efforts to GLQuake as, ironically, my
software OpenGL rasterizer had tended to be a little faster than Quake's
software renderer (at least, on my own ISA; on RISC-V, the situation is
very different).
Had also noted that, vs my Ryzen, my software GL rasterizer had tended
to do around 5x better in a "clock cycles per pixel" sense (though, the
Ryzen still wins in terms of raw performance due to having a 74x
clock-speed advantage).
Much of this difference is because for my ISA, I had various "helper
instructions" for things like packing and unpacking RGB555 pixels into
packed-integer SIMD vectors, etc.
I suspect this sort of thing wrecks performance when building for the
other ISAs (turns into a bunch of shifting and masking).
Technically was using RGB555 and Z12.S4 buffers mostly because it needs
too much memory bandwidth to do RGBA32 and Z24.S8, ...
Though, my GL implementation currently still suffers from a relatively
slow transform/projection stage (an eventual TODO being likely to redo
this part).
As-is:
Pop a primitive off the stack;
Project vertices to screen space;
Check for culling;
If it can be culled, do so.
Check if primitive can be drawn as-is:
If yes, draw it.
Else, fragment primitive, and push fragments back onto stack.
Generally, the fragmentation is being done in world-space, so each
fragmented primitive requires any new vertices to be projected (it can
cache the previously projected vertices from the parent primitive).
Generally, the projection does the XYZ/W conversion upfront.
A likely "better" option:
Run polygon vertices through modelview and projection matrices;
Clip polygon against frustum planes (possibly in XYZW space);
Check and subdivide (similar to before, just in XYZW space);
Split into final triangles and quads and draw.
Theoretically, interpolating in XYZW space should work, with the XYZ/W
step being used for drawing the primitives (though, one still needs to
divide by W to know whether the primitive needs to be subdivided).
Though, this is because the final rasterization stages (and generally
also my experimental hardware rasterizer) work in terms of affine
texturing; where larger on-screen primitives need to be subdivided to
avoid obvious texture warping.
Basically, if one has seen the 3D rendering on the original PlayStation,
they should probably know what sort of warping I am dealing with...