Post by BGB
By-value structs of 16 bytes or less are passed as if they were a
64 or 128 bit integer type (as a single register or as a register
pair, with a layout matching their in-memory representation).
...
But, yeah, at the IL level, one could potentially eliminate structs
and arrays as a separate construct, and instead have bare pointers
and a generic "reserve a blob of bytes in the frame and initialize
this pointer to point to it" operator (with the business end of this
operator happening in the function prolog).
The problem with this, that I mentioned elsewhere, is how well it
would work with SYS V ABI, since the rules for structs are complex,
and apparently recursive.
Having just a block of bytes might not be enough.
In my case, I am not bothering with the SysV style ABIs (well, along
with there not being any x86 or x86-64 target...).
I'd imagine it's worse with ARM targets as there are so many more
registers to try and deconstruct structs into.
Not messed much with the ARM64 ABI or similar, but I will draw the line
in the sand somewhere.
Struct passing/return is enough of an edge case that one can just sort
of declare it "no go" between compilers with "mostly but not strictly
compatible" ABIs.
Post by BGB
For my ISA, it is a custom ABI, but follows mostly similar rules to
some of the other "Microsoft style" ABIs (where, I have noted that
across multiple targets, MS tools have tended to use similar ABI
designs).
When you do your own thing, it's easy.
In the 1980s, I didn't need to worry about call conventions used for
other software, since there /was/ no other software! I had to write
everything, save for the odd calls to DOS which used some form of SYSCALL.
Then, arrays and structs were actually passed and returned by value (not
via hidden references), by copying the data to and from the stack.
However, I don't recall ever using the feature, as I considered it
inefficient. I always used explicit references in my code.
Most of the time, one is passing/returning structures as pointers, and
not by value.
By value structures are usually small.
When a structure is not small, it is both simpler to implement, and
usually faster, to internally pass it by reference.
If you pass a large structure to a function by value, via an on-stack
copy, and the function assigns it to another location (say, a global
variable):
Pass by reference: Only a single copy operation is needed;
Pass by value on-stack: At least two copy operations are needed.
One also needs to reserve enough space in the function arguments list to
hold any structures passed, which could be bad if they are potentially
large.
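As a rough illustration of the copy counts above, here is a minimal C sketch. The names `big_t`, `g_saved`, `save_by_value` and `save_by_ref` are made up for this example, and the comments describe a naive ABI that really does copy the struct onto the stack (an optimizing compiler may do something smarter):

```c
/* Hypothetical "large" struct, too big to fit in a register pair. */
typedef struct { unsigned char bytes[256]; } big_t;

big_t g_saved;   /* destination, e.g. a global variable */

/* By value via an on-stack copy (naive ABI):
 *   copy 1: caller copies the struct into the argument area,
 *   copy 2: callee copies the argument into the global. */
void save_by_value(big_t v)
{
    g_saved = v;
}

/* By reference: only the single copy into the global is needed. */
void save_by_ref(const big_t *p)
{
    g_saved = *p;
}
```

With an ABI that passes large structs by hidden reference, `save_by_value` can end up looking much like `save_by_ref` internally, apart from any copy needed to preserve by-value semantics.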
But, on my ISA, ABI is sort of like:
R4 ..R7 : Arg0 ..Arg3
R20..R23: Arg4 ..Arg7
R36..R39: Arg8 ..Arg11 (optional)
R52..R55: Arg12..Arg15 (optional)
Return Value:
R2, R3:R2 (128 bit)
R2 is also used to pass in the return value pointer.
'this':
Generally passed in either R3 or R18, depending on ABI variant.
Where, callee-save:
R8 ..R14, R24..R31,
R40..R47, R56..R63
R15=SP
Non-saved scratch:
R2 ..R7 , R16..R23,
R32..R39, R48..R55
Arguments beyond the first 8/16 register arguments are passed on stack.
In this case, a spill space for the first 8/16 arguments (64 or 128
bytes) is provided on stack before the first non-register argument.
If the function accepts a fixed number of arguments and the number of
argument registers is 8 or less, spill space need only be provided for
the first 8 arguments (calling vararg functions will always reserve
space for 16 registers in the 16-register ABI). This spill space
effectively belongs to the callee rather than the caller.
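The argument register numbers listed above follow a regular pattern: each group of four arguments sits 16 registers above the previous group. A small sketch in C of that mapping, where the helper name `arg_register` and the `max_reg_args` parameter (8 or 16, depending on ABI variant) are just for illustration and not anything from the actual compiler:

```c
/* Hypothetical helper: map an argument index to the register number
 * from the table above (R4..R7, R20..R23, R36..R39, R52..R55).
 * Returns -1 when the argument would be passed on the stack. */
int arg_register(int arg_index, int max_reg_args)
{
    if (arg_index < 0 || arg_index >= max_reg_args)
        return -1;   /* beyond the register arguments: goes on the stack */

    /* Low 2 bits pick the register within a group of four,
     * the remaining bits pick the group (groups are 16 apart). */
    return 4 + (arg_index & 3) + 16 * (arg_index >> 2);
}
```

For example, `arg_register(5, 8)` gives 21 (Arg5 in R21), while `arg_register(8, 8)` gives -1 since the 8-register variant passes Arg8 on the stack.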
Structures (by value):
1.. 8 bytes: Passed in a single register
9..16 bytes: Passed in a pair, padded to the next even pair
17+: Pass as a reference.
Things like 128-bit types are also passed/returned in register pairs.
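The size rules for by-value structures can be sketched in C as below. The names `struct_pass`, `classify_struct` and `pair_start_slot` are invented for the example. The reading of "padded to the next even pair" as "round the argument slot index up to an even number" is an assumption, not something spelled out above:

```c
#include <stddef.h>

/* How a by-value struct gets passed, per the size rules above. */
enum struct_pass { PASS_ONE_REG, PASS_REG_PAIR, PASS_BY_REF };

enum struct_pass classify_struct(size_t size)
{
    if (size <= 8)
        return PASS_ONE_REG;    /* 1..8 bytes: single register */
    if (size <= 16)
        return PASS_REG_PAIR;   /* 9..16 bytes: register pair */
    return PASS_BY_REF;         /* 17+ bytes: passed as a reference */
}

/* Assumed meaning of "padded to the next even pair": if the next
 * free argument slot is odd, skip it so the pair starts on an
 * even slot. */
int pair_start_slot(int next_free_slot)
{
    return (next_free_slot + 1) & ~1;
}
```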
Contrast, RV ABI:
X10..X17 are used for arguments;
No spill space is provided;
...
My variant uses similar rules to my own ABI for passing/returning
structures, with:
X28, structure return pointer
X29, 'this'
Normal return values go into X10 or X11:X10.
Note that in both ABIs, passing 'this' in a register would mean that
class instances and COM objects are not equivalent (COM object methods
always pass 'this' as the first argument).
The 'this' register is implicitly also used by lambdas to pass in the
pointer to the captured bindings area (which mostly resembles a
structure containing each variable captured by the lambda).
Can note though that in this case, capturing a binding by reference
means the lambda is limited to automatic lifetime (non-automatic lambdas
may only capture by value). In this case, capture by value is the default.
Post by BGB
For my compiler targeting RISC-V, it uses a variation of RV's ABI rules.
Argument passing is basically similar, but struct pass/return is
different; and it passes floating-point values in GPRs (and, in my own
ISA, all floating-point values use GPRs, as there are no FPU
registers; though FPU registers do exist for RISC-V).
Supporting C's variadic functions, which is needed for many languages
when calling C across an FFI, usually requires different rules. On Win64
ABI for example, by passing low variadic arguments in both GPRs and FPU
registers.
I simplified things by assuming only GPRs are used.
Post by bart
/Implementing/ variadic functions (which only occurs if implementing C)
is another headache if it has to work with the ABI (which can be assumed
for a non-static function).
I barely have a working solution for Win64 ABI, which needs to be done
via stdarg.h, but wouldn't have a clue how to do it for SYS V.
(Even Win64 has problems, as it assumes a downward-growing stack; in my
IL interpreter, the stack grows upwards!)
Most targets use a downward growing stack.
Mine is no exception here...
Post by BGB
Not likely a huge issue as one is unlikely to use ELF and PE/COFF in
the same program.
For the "OS" that runs on my CPU core, it is natively using PE/COFF, but
That's interesting: you deliberately used one of the most complex file
formats around, when you could have devised your own?
For what I wanted, I would have mostly needed to recreate most of the
same functionality as PE/COFF anyways.
When one considers the entire loading process (including DLLs/SOs), then
PE/COFF loading is actually simpler than ELF loading (ELF subjects the
loader to needing to deal with symbol and relocation tables), similar to
PIE loading.
Things like the MZ stub are optional in my case, and mostly ignored if
present (in my LZ compressed PE variants, the MZ stub is omitted entirely).
I had at one point considered doing a custom format resembling LZ
compressed MachO, but ended up not bothering, as it wouldn't have really
saved anything over LZ compressed PE/COFF.
Some "unneeded cruft" like the Resource Section was discarded, mostly
replaced by an embedded WAD2 image. The header was modified some to
allow for backwards compatibility with the Windows format (mostly
creating a dummy header in the original format that points to the WAD2
directory).
Idea is that icons, bitmaps, and other things, would mostly be held in
WAD lumps. Though, resources which may be accessed via symbols in the
EXE/DLL need to be stored uncompressed (where "__rsrc_lumpname" may be
used to access the contents of resource-section lumps as an extern symbol).
Say, for example:
extern byte __rsrc_mybitmap[]; //resolves to a DIB/BMP or similar
For now, resource formats:
Images:
BMP (various settings)
4, 8, and 16 bpp typical
Supports a non-standard 16-bpp alpha-blended mode (*1).
Supports non-standard 16 color and 256 color with a transparent color.
Supports CRAM BMP as well (2 bpp)
QOI (assumes RGBA32, nominally lossless)
QOI is a semi-simplistic non-entropy-coded format.
Can give PNG-like compression in some cases.
Reasonably fast/cheap to decode.
LCIF, custom lossy format, color-cell compression.
OK Q/bpp but mostly only on the low-end.
Resembles a QOI+CRAM hybrid.
UPIC, lossy or lossless, JPEG-like (*2)
*1:
0rrrrrgggggbbbbb Normal/Opaque
1rrrraggggabbbba With 3 bit alpha (4b/ch RGB).
For 16 and 256 color, a variant is supported with a transparent color.
Generally the high intensity magenta is reused as the transparent color.
This is encoded in the color palette (if all colors apart from one have
the alpha bits set to FF, and one color has 00, then that color is
assumed to be a transparent color).
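The two 16-bit pixel layouts in *1 could be unpacked along these lines. This is only a sketch: the type `rgba8` and function `decode_rgb555a` are made-up names, the ordering of the three alpha bits (bit 10 as the most significant, bit 0 as the least) is an assumption, and so is the way channels are widened to 8 bits:

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } rgba8;   /* hypothetical output type */

rgba8 decode_rgb555a(uint16_t px)
{
    rgba8 out;
    if (!(px & 0x8000)) {
        /* 0rrrrrgggggbbbbb: opaque, 5 bits per channel. */
        unsigned r = (px >> 10) & 31, g = (px >> 5) & 31, b = px & 31;
        out.r = (uint8_t)((r << 3) | (r >> 2));   /* widen 5 -> 8 bits */
        out.g = (uint8_t)((g << 3) | (g >> 2));
        out.b = (uint8_t)((b << 3) | (b >> 2));
        out.a = 255;
    } else {
        /* 1rrrraggggabbbba: 4 bits per channel, alpha bit after each. */
        unsigned r = (px >> 11) & 15, g = (px >> 6) & 15, b = (px >> 1) & 15;
        /* Assumed alpha bit order: bit 10 (MSB), bit 5, bit 0 (LSB). */
        unsigned a = (((px >> 10) & 1u) << 2) |
                     (((px >>  5) & 1u) << 1) |
                      (px & 1u);
        out.r = (uint8_t)(r * 17);                /* widen 4 -> 8 bits */
        out.g = (uint8_t)(g * 17);
        out.b = (uint8_t)(b * 17);
        out.a = (uint8_t)((a * 255) / 7);         /* widen 3 -> 8 bits */
    }
    return out;
}
```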
CRAM 2 bpp: Uses a limited form of the 8-bit CRAM format:
16 bits, 4x4 pixels, 1 bit per pixel
2x 8 bits: Color Endpoints
The rest of the format being unsupported, so it can simply assume a
fixed 32-bits per 4x4 pixel cell.
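Decoding one such 32-bit cell is then just a matter of using each mask bit to pick between the two endpoints. A sketch in C, where `decode_cram_cell` is a made-up name, and both the bit-to-pixel ordering (bit i selects pixel i in row-major order) and the sense of the bit (1 selects the second endpoint) are assumptions rather than anything stated above:

```c
#include <stdint.h>

/* Expand one 4x4 cell: 16 selector bits plus two 8-bit color
 * endpoints (palette indices) into 16 output pixels.
 * Bit/pixel ordering here is assumed, not taken from the format. */
void decode_cram_cell(uint16_t mask, uint8_t color0, uint8_t color1,
                      uint8_t out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = ((mask >> i) & 1) ? color1 : color0;
}
```

Since every cell has the same size, cell N of an image is always at byte offset N*4, with no need to parse earlier cells first.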
*2: The UPIC format is structurally similar to JPEG, but:
Uses TLV packaging (vs FF-escape tagging);
Uses Rice coding (vs Huffman)
Uses Z3.V5 VLC, vs Z4.V4
Uses Block-Haar and RCT
Vs DCT and YCbCr.
Supports an alpha channel.
Y 1 (*2A)
YA 1:1 (*2A)
YUV 4:2:0
YUV 4:4:4 (*2A)
YUVA 4:2:0:4
YUVA 4:4:4:4 (*2A)
*2A: May be used in the lossless modes, depending on image.
VLC coding resembles Deflate's match distance encoding, with sign-folded
values. Runs of zero coefficients have a shorter limit, but similar.
Like with JPEG, an 0x00 symbol encodes an early EOB.
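For reference, sign folding is usually done by interleaving negative and non-negative values so that small magnitudes get small unsigned codes. The sketch below uses the common 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4 mapping; whether UPIC uses exactly this mapping is an assumption, and the function names are made up:

```c
#include <stdint.h>

/* Fold a signed value into an unsigned one:
 * 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ... */
uint32_t sign_fold(int32_t v)
{
    uint32_t u = (uint32_t)v << 1;
    return (v < 0) ? ~u : u;
}

/* Inverse of sign_fold. */
int32_t sign_unfold(uint32_t u)
{
    int32_t half = (int32_t)(u >> 1);
    return (u & 1) ? (-half - 1) : half;
}
```

The folded value is then what gets fed to the VLC/Rice stage, since those only deal with non-negative numbers.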
In tests, on my main PC:
Vs JPEG: It is a little faster
Q/bpp is similar, better/worse depends on image.
Slightly worse on photos, but "similar".
Generally somewhat better on artificial images.
Vs PNG:
Faster to decode (with less memory overhead);
Better compression on many images (particularly photo-like).
Note that UPIC was designed to not require any large intermediate
buffers, so will decode directly to an RGB555 or RGBA32 output buffer
(decoding happens in terms of individual 16x16 pixel macroblocks).
It was designed to be moderately fast and to try to minimize memory
overhead for decoding (vs either PNG or JPEG, which need a more
significant chunk of working memory to decode).
Block-Haar is a Haar transform made to fit the same 8x8 pixel blocks as
DCT, where Haar maps (A,B)->(C,D):
C=(A+B)/2 (*: X/2 here being defined as (X>>1))
D=A-B
But, can be reversed exactly, IIRC:
B=C-(D/2)
A=B+D
By doing multiple stages of Haar transform, one can build an 8-pixel
version, and then use horizontal and vertical transforms for an 8x8
block. It is computationally fairly cheap, and lossless.
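A sketch of the two-point step and one possible 8-point arrangement in C. The pairing and output ordering in `haar8_fwd` (a plain pyramid: 8 -> 4 -> 2 -> 1 averages, with differences stored behind them) is an assumption for illustration and not necessarily what UPIC does. It also assumes `>>` on negative ints is an arithmetic shift, which C leaves implementation-defined but which holds on the usual compilers:

```c
/* Forward: (A,B) -> (C,D) with C=(A+B)>>1, D=A-B. */
void haar_fwd(int a, int b, int *c, int *d)
{
    *c = (a + b) >> 1;
    *d = a - b;
}

/* Exact inverse: B=C-(D>>1), A=B+D. */
void haar_inv(int c, int d, int *a, int *b)
{
    *b = c - (d >> 1);
    *a = *b + d;
}

/* 8-point forward transform as a pyramid of 2-point steps.
 * out[0] = overall average, out[1] = top-level difference,
 * out[2..3] and out[4..7] = differences of the finer levels. */
void haar8_fwd(const int in[8], int out[8])
{
    int tmp[8], avg[4], dif[4];
    for (int i = 0; i < 8; i++)
        tmp[i] = in[i];
    for (int n = 8; n > 1; n >>= 1) {
        for (int i = 0; i < n / 2; i++)
            haar_fwd(tmp[2 * i], tmp[2 * i + 1], &avg[i], &dif[i]);
        for (int i = 0; i < n / 2; i++) {
            tmp[i] = avg[i];
            out[n / 2 + i] = dif[i];
        }
    }
    out[0] = tmp[0];
}

/* 8-point inverse: undo the levels in the opposite order. */
void haar8_inv(const int in[8], int out[8])
{
    int tmp[8], next[8];
    tmp[0] = in[0];
    for (int n = 2; n <= 8; n <<= 1) {
        for (int i = 0; i < n / 2; i++)
            haar_inv(tmp[i], in[n / 2 + i], &next[2 * i], &next[2 * i + 1]);
        for (int i = 0; i < n; i++)
            tmp[i] = next[i];
    }
    for (int i = 0; i < 8; i++)
        out[i] = tmp[i];
}
```

Running `haar8_fwd` over each row and then each column of an 8x8 block (and the inverses in the opposite order) gives the 2D version, and it stays exactly reversible because every 2-point step is.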
The Walsh-Hadamard transform can give similar properties, but generally
involves a few extra steps that make it more computationally expensive.
It is possible to use a lifting transform to make a Reversible DCT, but
it is slow...
BGBCC accepts JPEG and PNG for input and can convert them to
BMP/QOI/UPIC as needed.
For audio storage, generally using the RIFF WAV format. For bulk audio,
both A-Law and IMA ADPCM work OK. Granted, IMA ADPCM is not space
efficient for stereo, but mostly OK for mono (most common use-case for
sound effects).
Post by bart
I did exactly that at a period when my generated DLLs were buggy for
some reason (it turned out to be two reasons). I created a simple
dynamic library format of my own. Then I found the same format worked
also for executables.
But I needed a loader program to run them, as Windows obviously didn't
understand the format. Such a program can be written in 800 lines of C,
and can dynamically load libraries in both my format, and proper DLLs (not
the buggy ones I generated!).
A hello-world program is under 300 bytes compared with 2 or
2.5KB of EXE. And the format is portable to Linux, so no need to
generate ELF (but I haven't tried). Plus the format might be transparent
to AV software (haven't tried that either).
OK.
By design, my PEL format (PE+LZ) isn't going to get under 2K (1K for
headers, 1K for LZ'ed sections).
But, usually this is not a problem.