Post by email@example.com
Post by Gordon Burditt
A triple-compile test can be very valuable for finding compiler bugs:
1. Compile the compiler source code with any old compiler you've
got around (such as the last-known-stable version of the compiler
you are developing, or the compiler your system came with).
Call the result C1.
2. Compile the same compiler source code with C1. Call the result C2.
3. Compile the same compiler source code with C2. Call the result C3.
C2 and C3 should be bit-for-bit identical: both were produced by
compilers built from the same source.
How about taking any two compilers one has lying around, and compiling
the compiler source with each, then with the compilers produced thereby,
and then with the compilers produced by those?
You forgot to explicitly state the step where you compare the final
executables for each set of compilations against each other. It's
important, even though I know that's what was meant.
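As a toy illustration of why the comparison step makes sense (this is
not a build recipe; every name in it is invented), one can model a
compiler as a deterministic function whose output's behaviour depends
only on the source it was built from:

```python
# Toy model of the bootstrap argument above.  A "compiler" is modelled
# as a pure function: the binary's layout may depend on which compiler
# built it, but its behaviour depends only on the source.  No tampering
# is modelled -- that is exactly the assumption the test relies on.

def compile_with(quirk, source):
    artifact = f"binary[{quirk}]:{source}"   # layout may vary by builder
    return (artifact, source)                # behaviour fixed by source

def run_compiler(binary, source):
    _artifact, built_from = binary
    return compile_with(built_from, source)  # behaves per its own source

compiler_src = "source code of the compiler under test"

# Start from two different bootstrap compilers:
c1_a = compile_with("old stable compiler", compiler_src)
c1_b = compile_with("vendor compiler", compiler_src)
assert c1_a != c1_b               # first-round binaries differ

# One more round through the self-built compiler and both chains converge:
c2_a = run_compiler(c1_a, compiler_src)
c2_b = run_compiler(c1_b, compiler_src)
c3_a = run_compiler(c2_a, compiler_src)

assert c2_a == c2_b               # independent bootstraps agree
assert c3_a == c2_a               # fixed point: C3 == C2
print("converged")
```

In this idealized model the final executables agree no matter which
bootstrap compiler you started from, which is why a mismatch in the
real test is a red flag.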
Post by firstname.lastname@example.org
That would ensure that
any secret code-tampering logic which was present in either of the
originals couldn't undetectably alter the results unless the exact same
code-tampering logic was present in both originals.
That's fine as far as it goes. It doesn't handle a number of other
possible problems for the paranoid:
- If you pick two versions of GCC (or two versions from any other
vendor), there's a high probability that if one was tampered with
at the vendor, the tampering is also present in a later version.
- If you pick two very different compilers, they may not run on the
same platform or generate code for the same CPU and OS, making
the final executables impossible to compare.
- The OS has been tampered with to make the compiler act differently.
- The hypervisor has been tampered with to make the compiler act
differently.
- The CPU has been tampered with to make the compiler act differently.
Do you really think Intel and AMD and other CPU manufacturers
haven't been compromised by the NSA?
- The tampering is in the C library, not the compiler proper, and
all compilers for the platform use the same C library (msvcrt.dll,
anyone?) Nobody ever recompiles it because they don't have the
source code (I presume: Microsoft rarely makes source code
available, and I'm guessing msvcrt.dll is no exception.)
Also, the triple-compile test is really much more useful for
discovering accidental bugs, not intentional tampering where the
tamperer may try to interfere with any tests you might run, or
conveniently kill you with an explosion in your cell phone battery
before the test results are in.
I do recall a similar problem of hidden data transfer from a tool
to its output that really happened, long, long ago. The tool was
a Z80 assembler running on a TRS-80 Model II. Building it usually
used a previously known-good version of that same assembler.
1. At one point, the feature was added to permit using the name
of an opcode as a symbol that had as its value the value of the
machine opcode. The opcode table had fields like opcode name,
machine opcode value, what type of arguments it took (registers,
1-byte immediate data, 2-byte immediate data, address, etc.)
and the address of a handler that processed that opcode.
2. At a later point, the assembler was changed to use feature #1
in its opcode table.
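The table and feature #1 can be sketched roughly like this (in Python
rather than Z80 assembly; the field names, handler names, and values
are all invented for illustration):

```python
# Rough sketch of the opcode table and feature #1; everything here is
# illustrative -- the real assembler was written in Z80 assembly.

OPCODE_TABLE = {
    #  name:  (machine opcode value, argument kind, handler name)
    "NOP":  (0x00,   "none",    "emit_plain"),
    "LD":   (0x40,   "reg,reg", "emit_reg_pair"),
    "OTIR": (0xEDB3, "none",    "emit_ed_prefixed"),  # ED-prefixed opcode
}

def symbol_value(name, user_symbols):
    """Feature #1: where a symbol is expected, an opcode name resolves
    to that opcode's machine value from the assembler's own table."""
    if name in user_symbols:
        return user_symbols[name]
    if name in OPCODE_TABLE:
        return OPCODE_TABLE[name][0]
    raise KeyError(name)

# Change #2 amounted to writing the assembler's own opcode table as
#     DEFW OTIR        ; instead of  DEFW 0EDB3H
# so the numeric value is supplied by the assembler doing the build,
# not by the source being assembled.  A wrong value in one binary then
# propagates to every later build with no trace in the source.
print(hex(symbol_value("OTIR", {})))
```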
Much later, someone complains that the value generated for one of
the more obscure Z80 opcodes is wrong. (OTIR, I think.) I
look at the output of the sample code given, then reproduce it. I
then look at the manuals for the Z80 processor. Yep, it's wrong.
Further, the assembler produced the same opcode value for two
different opcode names, so it was unlikely to be a typo introduced
into the manufacturer documentation after the assembler was written.
So I then go to the source code. There's no place in the source
code where the value of that opcode appears!
Ever since change #2, the values of the opcodes were passed down
from predecessor to the current version. Patching the binary, then
rebuilding from unchanged source using the fixed assembler (which
would re-create the binary with the patch already in it!) would fix
it, but the fix would still appear nowhere in the source.
I think the real fix was to revert change #2 above. This was done
very carefully, as many unrelated changes had been made since then:
1. replace in the opcode table the symbol with the value
of the opcode, except DO NOT fix the incorrect opcode
(which the assembler didn't use in its own code). This
is a big change where typographical errors are likely
and would be a disaster. Also add a detailed comment
of why not to re-apply change #2 no matter how good an
idea it may seem to be.
2. Re-assemble and verify that the executable didn't change.
3. Now fix the incorrect opcode. There's a place in the
source code to change it now.
4. Re-assemble. Verify that only one byte changed. Verify
that the sample source code assembles correctly. Just for
thoroughness, assemble the test file that includes all
opcodes and make sure nothing else changed.
5. Update the master source archive with the new version
and inform everyone who might have a private copy to
update it. (No source code control systems then on
desktop floppy-disk computers, and I don't think the
Model II even had a hard disk yet (8 megabytes; no, not
RAM or cache: hard disk space).)
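Step 4's "verify that only one byte changed" check is just a
byte-by-byte comparison of the old and new binaries; a minimal sketch
(the sample bytes here are invented, not the real assembler's):

```python
def diff_bytes(old, new):
    """Return (offset, old_byte, new_byte) for every position where
    two equal-sized binaries differ."""
    if len(old) != len(new):
        raise ValueError("sizes differ: more than a one-byte patch")
    return [(i, a, b) for i, (a, b) in enumerate(zip(old, new)) if a != b]

# Invented example: one opcode byte corrected in an otherwise
# unchanged binary.
old_binary = bytes([0x3E, 0x01, 0xED, 0xB2, 0xC9])
new_binary = bytes([0x3E, 0x01, 0xED, 0xB3, 0xC9])

changes = diff_bytes(old_binary, new_binary)
assert changes == [(3, 0xB2, 0xB3)]
print("bytes changed:", len(changes))
```

Any second difference, or a size change, would mean the rebuild did
more than apply the intended fix.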
The only way you would have detected this by using two different
assemblers to build the one of interest would be if one of the
assemblers you picked had the value of an opcode wrong. Also, as
features were added, probably the only assembler that could assemble
the source code of interest was the assembler of interest anyway.
Z80 assemblers weren't nearly as standardized as C.