Discussion:
Another test with mcc64
m***@gmail.com
2017-07-01 07:39:09 UTC
I made another test with mcc64, where I found several things.

The prototypes for strtoul() and atof() are missing from stdlib.h.
The definitions of DBL_MIN, DBL_MAX, FLT_DIG, DBL_DIG, FLT_RADIX,
FLT_MANT_DIG and DBL_MANT_DIG are missing from float.h.

Using the sequence 08 in a macro causes the error:

OCTAL BAD DIGIT 436 analyze.c

The macro definitions are:
#define INT64TYPE_FORMAT "ll"
#define F_X64(width) "%" #width INT64TYPE_FORMAT "x"

The code that triggers the error is:
printf("\"%s\", 0x" F_X64(08) ", *, ",
striAsUnquotedCStri(sourceFilePath), options);

As can be seen, the sequence 08 is used to form the format
%08llx.
In printf formats a leading 0 does NOT introduce an octal number;
it is a flag that requests zero padding.

To make progress under Windows, access to the Windows-specific headers is
also necessary.


Regards,
Thomas Mertes
--
Seed7 Homepage: http://seed7.sourceforge.net
Seed7 - The extensible programming language: User defined statements
and operators, abstract data types, templates without special
syntax, OO with interfaces and multiple dispatch, statically typed,
interpreted or compiled, portable, runs under linux/unix/windows.
bartc
2017-07-01 10:33:14 UTC
Post by m***@gmail.com
I made another test with mcc64, where I found several things.
The prototypes for strtoul() and atof() are missing from stdlib.h.
The definitions of DBL_MIN, DBL_MAX, FLT_DIG, DBL_DIG, FLT_RADIX,
FLT_MANT_DIG and DBL_MANT_DIG are missing from float.h
OCTAL BAD DIGIT 436 analyze.c
Ha! That was fixed long ago, but I deliberately introduced that message
to help me find examples of such macro calls, to point out in a c.l.c
post. The mistake was uploading a version with that still in.
Post by m***@gmail.com
To make progress under Windows, access to the Windows-specific headers is
also necessary.
I've made a fix (and imported the DBL_MIN etc definitions from N1570.PDF
- they look about right). A revised version is here:

https://github.com/bartg/langs/blob/master/bccproj/mcc64.c

But note that I've stopped work on this version of that compiler, other
than for things such as updates to headers.

Probably I will come back to it later in the year when it can be a front
end to a self-contained compiler, and with a revised code generator (I'm
not happy with this one).

[NOTE: that new version may not work conventionally in compiling one .c
file to one .obj file. It may only do multiple .c files to one .exe file
for example. It might not be suitable to invoke from a make file. But
that is not definite.]

However there will always be a problem in getting the standard headers
(which now seem to include POSIX) fully and correctly populated.

Especially so with windows.h. Other compilers' windows.h vary from
20,000 lines to around 220,000 lines I think (in from 3 to 150 separate
headers). And all the associated WinAPI headers in MSVC come to
2,000,000 lines.

My windows.h currently is one file of 177 lines!
--
bartc
m***@gmail.com
2017-07-03 21:24:12 UTC
Post by bartc
Post by m***@gmail.com
I made another test with mcc64, where I found several things.
The prototypes for strtoul() and atof() are missing from stdlib.h.
The definitions of DBL_MIN, DBL_MAX, FLT_DIG, DBL_DIG, FLT_RADIX,
FLT_MANT_DIG and DBL_MANT_DIG are missing from float.h
OCTAL BAD DIGIT 436 analyze.c
Ha! That was fixed long ago, but I deliberately introduced that message
to help me find examples of such macro calls, to point out in a c.l.c
post. The mistake was uploading a version with that still in.
Post by m***@gmail.com
To make progress under Windows, access to the Windows-specific headers is
also necessary.
I've made a fix (and imported the DBL_MIN etc definitions from N1570.PDF
https://github.com/bartg/langs/blob/master/bccproj/mcc64.c
But note that I've stopped work on this version of that compiler, other
than for things such as updates to headers.
Sorry to hear that you have stopped working on mcc64.c.

Concerning updates to headers, I found several things:
You should add a const to the declaration of wcslen in wchar.h:
size_t wcslen(const wchar_t*);

A declaration of raise() is missing from signal.h:
int raise(int sig);

A definition of isnan() is missing from math.h.

The include file winsock2.h is missing.

The function wfopen (respectively _wfopen) is also missing.

There also seems to be a problem when the file name of an include
directive is provided by a macro. I get the following errors:
Lex error include?
or
Lex error Can't find include file
Post by bartc
Probably I will come back to it later in the year when it can be a front
end to a self-contained compiler, and with a revised code generator (I'm
not happy with this one).
[NOTE: that new version may not work conventionally in compiling one .c
file to one .obj file. It may only do multiple .c files to one .exe file
for example. It might not be suitable to invoke from a make file. But
that is not definite.]
For Seed7 I need a C compiler that can compile one .c file to one .obj file.

Regards,
Thomas Mertes
bartc
2017-07-03 22:25:30 UTC
Post by m***@gmail.com
Post by bartc
But note that I've stopped work on this version of that compiler, other
than for things such as updates to headers.
Sorry to hear that you have stopped working on mcc64.c.
(It'll be back. But too much C recently has started to fry my brain, so I
need to get back to working on a saner language for a while.)
Post by m***@gmail.com
size_t wcslen(const wchar_t*);
int raise(int sig);
A definition of isnan() is missing from math.h.
The include file winsock2.h is missing.
The function wfopen (respectively _wfopen) is also missing.
Is this still for Seed7 sources (rather than Seed7-generated C)? As I
said before, it will be more effective for me to whizz through the
sources. I've tried it once but didn't know what files are supposed to
compile, and which shouldn't, but I can try again and use gcc as a guide.
Post by m***@gmail.com
There also seems to be a problem when the file name of an include
Lex error include?
or
Lex error Can't find include file
You mean code like this:

#define STDIO "stdio.h"

#include STDIO

?

That was a very quick fix, but only when the filename is a string. This:

#define FILE <sys/stat.h>
#include FILE

I can't do. (And if it involves building a filename from the tokens
'sys', '/', 'stat', '.' and 'h', then I don't want to! Sheesh...)
Post by m***@gmail.com
Post by bartc
[NOTE: that new version may not work conventionally in compiling one .c
file to one .obj file. It may only do multiple .c files to one .exe file
for example. It might not be suitable to invoke from a make file. But
that is not definite.]
For Seed7 I need a C compiler that can compile one .c file to one .obj file.
That version would be experimental. But if I can figure out how to write
.exe, then I can probably do .obj, and allow it to be done on individual
modules.

However... that would mean either relying on an external linker, or
creating a linker (which needs to process dll files directly), neither
of which is appealing.

And also, the 'USP' of this project would be instant compilation and
execution of an application, otherwise there are plenty of C compilers
that will do the job (written by people who respect the language more
than me as well).
--
bartc
m***@gmail.com
2017-07-04 08:55:32 UTC
Post by bartc
Post by m***@gmail.com
Post by bartc
But note that I've stopped work on this version of that compiler, other
than for things such as updates to headers.
Sorry to hear that you have stopped working on mcc64.c.
(It'll be back. But too much C recently has started to fry my brain so
need to get back to working on a saner language for a while.)
Post by m***@gmail.com
size_t wcslen(const wchar_t*);
int raise(int sig);
A definition of isnan() is missing from math.h.
The include file winsock2.h is missing.
The function wfopen (respectively _wfopen) is also missing.
Is this still for Seed7 sources (rather than Seed7-generated C)? As I
said before, it will be more effective for me to whizz through the
sources. I've tried it once but didn't know what files are supposed to
compile, and which shouldn't, but I can try again and use gcc as a guide.
Yes, this is still for Seed7. I created a makefile for mcc64 (mk_mcc64.mak)
and a small frontend for mcc64. The frontend (mcc.c) calls mcc64, nasm (to
produce an .obj) and gcc (to link it together to an executable). Executing
chkccomp.c succeeds. There are several hiccups with chkccomp.c, e.g. I am
currently not sure if system() succeeds in calling mcc.exe. But I get a
version.h file, so that I can compile the sources of the Seed7 interpreter
afterwards. There I get errors about missing functions, etc. (see above for
the most obvious findings). All of this is very experimental. Therefore it
is currently not part of the Seed7 release.
Post by bartc
Post by m***@gmail.com
There also seems to be a problem when the file name of an include
Lex error include?
or
Lex error Can't find include file
#define STDIO "stdio.h"
#include STDIO
?
#define FILE <sys/stat.h>
#include FILE
I can't do. (And if it involves building a filename from the tokens
'sys', '/', 'stat', '.' and 'h', then I don't want to! Sheesh...)
I think a filename as a string is okay (I am in the office, so I cannot
check).

Regards,
Thomas Mertes
bartc
2017-07-04 11:36:58 UTC
Post by m***@gmail.com
Post by bartc
Is this still for Seed7 sources (rather than Seed7-generated C)? As I
said before, it will be more effective for me to whizz through the
sources. I've tried it once but didn't know what files are supposed to
compile, and which shouldn't, but I can try again and use gcc as a guide.
Yes, this is still for Seed7. I created a makefile for mcc64 (mk_mcc64.mak)
and a small frontend for mcc64. The frontend (mcc.c) calls mcc64, nasm (to
produce an .obj) and gcc (to link it together to an executable). Executing
chkccomp.c succeeds. There are several hiccups with chkccomp.c, e.g. I am
currently not sure if system() succeeds in calling mcc.exe. But I get a
version.h file, so that I can compile the sources of the Seed7 interpreter
afterwards. There I get errors about missing functions, etc. (see above for
the most obvious findings). All of this is very experimental. Therefore it
is currently not part of the Seed7 release.
That sounds promising. As I said, I'll do another pass through the
sources later. (Compiling the 170Kloc sqlite.c, using my own headers
rather than another compiler's as I've used for that, is another good
way of picking up missing elements of headers.)

But with other test programs that I have managed to compile and run (e.g.
Lua, Tiny C, TeX), they execute but then diverge in behaviour from the
same program built with a solid compiler like gcc. This might be a
subtle difference, or it might be not so subtle.

I'm still looking for an approach to help track down those code
generator problems that doesn't involve delving deeply into the source
code of the applications, as that is hopelessly impractical.

(I may end up just replacing the whole code generator with a
simpler, more conservative (and slower) one. Then if that works, I have
two versions that I can trace and compare.)
--
Bartc
Keith Thompson
2017-07-04 20:04:51 UTC
bartc <***@freeuk.com> writes:
[...]
Post by bartc
That sounds promising. As I said, I'll do another pass through the
sources later. (Compiling the 170Kloc sqlite.c, using my own headers
rather than another compiler's as I've used for that, is another good
way of picking up missing elements of headers.)
Wouldn't comparing your own headers to the standard be even better?
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
bartc
2017-07-04 23:51:57 UTC
Post by Keith Thompson
[...]
Post by bartc
That sounds promising. As I said, I'll do another pass through the
sources later. (Compiling the 170Kloc sqlite.c, using my own headers
rather than another compiler's as I've used for that, is another good
way of picking up missing elements of headers.)
Wouldn't comparing your own headers to the standard be even better?
That would be great; where can I download them?

Do they contain definitions for _fstati64? For _wenviron? For struct
_utimbuf? For hundreds of other names that are commonly used in C programs?

Does the standard even know about headers such as sys/stat.h and utime.h?
--
bartc
Keith Thompson
2017-07-05 00:12:35 UTC
Post by bartc
Post by Keith Thompson
[...]
Post by bartc
That sounds promising. As I said, I'll do another pass through the
sources later. (Compiling the 170Kloc sqlite.c, using my own headers
rather than another compiler's as I've used for that, is another good
way of picking up missing elements of headers.)
Wouldn't comparing your own headers to the standard be even better?
That would be great; where can I download them?
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

No, they're not in the form of header files that you can copy and
install. It's a standard document, not an implementation.
Post by bartc
Do they contain definitions for _fstati64? For _wenviron? For struct
_utimbuf? For hundreds of other names that are commonly used in C programs?
No. Aren't those Microsoft-specific? Microsoft has documentation.
Post by bartc
Does the standard even know about headers such as sys/stat.h and utime.h?
As you know, the C standard doesn't. <sys/stat.h> and <utime.h>
are specified by POSIX. If you want to support POSIX, you'll need
to find and read the appropriate documentation.

I have no idea what standard(s) you want to support, but if you want
to support at least ISO C, you should consult the ISO C standard.
That could have let you avoid an incorrect definition of wcslen(),
and missing declarations of raise() and isnan(), mentioned upthread.

Yes, there's more work to do than just adding the ISO C required
declarations to your headers. Do that work, or don't.
m***@gmail.com
2017-07-04 20:57:55 UTC
Post by bartc
Post by m***@gmail.com
Post by bartc
Is this still for Seed7 sources (rather than Seed7-generated C)? As I
said before, it will be more effective for me to whizz through the
sources. I've tried it once but didn't know what files are supposed to
compile, and which shouldn't, but I can try again and use gcc as a guide.
Yes, this is still for Seed7. I created a makefile for mcc64 (mk_mcc64.mak)
and a small frontend for mcc64. The frontend (mcc.c) calls mcc64, nasm (to
produce an .obj) and gcc (to link it together to an executable). Executing
chkccomp.c succeeds. There are several hiccups with chkccomp.c, e.g. I am
currently not sure if system() succeeds in calling mcc.exe. But I get a
version.h file, so that I can compile the sources of the Seed7 interpreter
afterwards. There I get errors about missing functions, etc. (see above for
the most obvious findings). All of this is very experimental. Therefore it
is currently not part of the Seed7 release.
That sounds promising. As I said, I'll do another pass through the
sources later. (Compiling the 170Kloc sqlite.c, using my own headers
rather than another compiler's as I've used for that, is another good
way of picking up missing elements of headers.)
But with other test programs that I have managed to compile and run (e.g.
Lua, Tiny C, TeX), they execute but then diverge in behaviour from the
same program built with a solid compiler like gcc. This might be a
subtle difference, or it might be not so subtle.
I'm still looking for an approach to help track down those code
generator problems that doesn't involve delving deeply into the source
code of the applications, as that is hopelessly impractical.
A test suite is helpful to find weaknesses systematically.
Probably a C test suite would be the best solution.
For Seed7 I have a test suite. So, if compiling Seed7 succeeds,
mcc64 can be tested indirectly.

I found more missing things in the header files of mcc64.c.
When I compile and run chkccomp.c, it shows me that some functions are
missing:
_wchdir
_wgetcwd
_wmkdir
_wrmdir
_wchmod
_wremove
_wrename
_wsystem
_wfopen

Additionally a definition of struct utimbuf is missing from utime.h.

You can try it yourself with the newest release of Seed7
(seed7_05_20170702.tgz). To compile Seed7 with mcc64 I use make7,
which is also part of the Seed7 package. Any other make utility is
also possible.

To get the executable of make7 you need to compile Seed7 with gcc.
Afterwards you can compile make7 with: s7c make7 and save the
executable make7.exe away to some directory in the path.

To compile Seed7 with mcc64 you need the makefile mk_mcc64w.mak,
which follows:

-------------------- begin mk_mcc64w.mak --------------------
# Makefile to compile Seed7 with mcc and make7. Commands executed by: cmd.exe
# To compile use a Windows console and call:
#   ..\bin\make7 -f mk_mcc64w.mak depend
#   ..\bin\make7 -f mk_mcc64w.mak
# When your make utility uses Unix commands, you should use mk_msys.mak instead.
# When the nmake utility from Windows is available, you can use mk_nmake.mak instead.
# When you are using the MSYS console from MinGW you should use mk_msys.mak instead.

# CFLAGS = -O2 -fomit-frame-pointer -funroll-loops -Wall
# CFLAGS = -O2 -fomit-frame-pointer -Wall -Wstrict-prototypes -Winline -Wconversion -Wshadow -Wpointer-arith
# CFLAGS = -O2 -g -ffunction-sections -fdata-sections $(INCLUDE_OPTIONS) -Wall -Wstrict-prototypes -Winline -Wconversion -Wshadow -Wpointer-arith
CFLAGS = $(INCLUDE_OPTIONS)
# CFLAGS = -O2 -g -pg -Wall -Wstrict-prototypes -Winline -Wconversion -Wshadow -Wpointer-arith
# CFLAGS = -O2 -Wall -Wstrict-prototypes -Winline -Wconversion -Wshadow -Wpointer-arith
# CFLAGS = -O2 -pg -Wall -Wstrict-prototypes -Winline -Wconversion -Wshadow -Wpointer-arith
# CFLAGS = -O2 -funroll-loops -Wall -pg
LDFLAGS =
# LDFLAGS = -Wl,--gc-sections,--stack,8388608,--subsystem,windows
# LDFLAGS = -pg
# LDFLAGS = -pg -lc_p
SYSTEM_LIBS = -lm -lws2_32
# SYSTEM_LIBS = -lm -lws2_32 -lgmp
SYSTEM_CONSOLE_LIBS =
SYSTEM_DRAW_LIBS = -lgdi32
SEED7_LIB = seed7_05.a
CONSOLE_LIB = s7_con.a
DRAW_LIB = s7_draw.a
COMP_DATA_LIB = s7_data.a
COMPILER_LIB = s7_comp.a
ALL_S7_LIBS = ..\bin\$(COMPILER_LIB) ..\bin\$(COMP_DATA_LIB) ..\bin\$(DRAW_LIB) ..\bin\$(CONSOLE_LIB) ..\bin\$(SEED7_LIB)
# CC = g++
CC = mcc
GET_CC_VERSION_INFO = $(CC) --version >

BIGINT_LIB_DEFINE = USE_BIG_RTL_LIBRARY
BIGINT_LIB = big_rtl
# BIGINT_LIB_DEFINE = USE_BIG_GMP_LIBRARY
# BIGINT_LIB = big_gmp

MOBJ = s7.obj
POBJ = runerr.obj option.obj primitiv.obj
LOBJ = actlib.obj arrlib.obj biglib.obj binlib.obj blnlib.obj bstlib.obj chrlib.obj cmdlib.obj conlib.obj dcllib.obj \
drwlib.obj enulib.obj fillib.obj fltlib.obj hshlib.obj intlib.obj itflib.obj kbdlib.obj lstlib.obj pcslib.obj \
pollib.obj prclib.obj prglib.obj reflib.obj rfllib.obj sctlib.obj setlib.obj soclib.obj sqllib.obj strlib.obj \
timlib.obj typlib.obj ut8lib.obj
EOBJ = exec.obj doany.obj objutl.obj
AOBJ = act_comp.obj prg_comp.obj analyze.obj syntax.obj token.obj parser.obj name.obj type.obj \
expr.obj atom.obj object.obj scanner.obj literal.obj numlit.obj findid.obj \
error.obj infile.obj libpath.obj symbol.obj info.obj stat.obj fatal.obj match.obj
GOBJ = syvarutl.obj traceutl.obj actutl.obj executl.obj blockutl.obj \
entutl.obj identutl.obj chclsutl.obj arrutl.obj
ROBJ = arr_rtl.obj bln_rtl.obj bst_rtl.obj chr_rtl.obj cmd_rtl.obj con_rtl.obj dir_rtl.obj drw_rtl.obj fil_rtl.obj \
flt_rtl.obj hsh_rtl.obj int_rtl.obj itf_rtl.obj pcs_rtl.obj set_rtl.obj soc_rtl.obj sql_rtl.obj str_rtl.obj \
tim_rtl.obj ut8_rtl.obj heaputl.obj numutl.obj sigutl.obj striutl.obj \
sql_base.obj sql_lite.obj sql_my.obj sql_oci.obj sql_odbc.obj sql_post.obj
DOBJ = $(BIGINT_LIB).obj cmd_win.obj dir_win.obj dll_win.obj fil_win.obj pcs_win.obj pol_sel.obj stat_win.obj tim_win.obj
OBJ = $(MOBJ)
SEED7_LIB_OBJ = $(ROBJ) $(DOBJ)
DRAW_LIB_OBJ = gkb_rtl.obj drw_win.obj gkb_win.obj
CONSOLE_LIB_OBJ = kbd_rtl.obj con_win.obj
COMP_DATA_LIB_OBJ = typ_data.obj rfl_data.obj ref_data.obj listutl.obj flistutl.obj typeutl.obj datautl.obj
COMPILER_LIB_OBJ = $(POBJ) $(LOBJ) $(EOBJ) $(AOBJ) $(GOBJ)

MSRC = s7.c
PSRC = runerr.c option.c primitiv.c
LSRC = actlib.c arrlib.c biglib.c binlib.c blnlib.c bstlib.c chrlib.c cmdlib.c conlib.c dcllib.c \
drwlib.c enulib.c fillib.c fltlib.c hshlib.c intlib.c itflib.c kbdlib.c lstlib.c pcslib.c \
pollib.c prclib.c prglib.c reflib.c rfllib.c sctlib.c setlib.c soclib.c sqllib.c strlib.c \
timlib.c typlib.c ut8lib.c
ESRC = exec.c doany.c objutl.c
ASRC = act_comp.c prg_comp.c analyze.c syntax.c token.c parser.c name.c type.c \
expr.c atom.c object.c scanner.c literal.c numlit.c findid.c \
error.c infile.c libpath.c symbol.c info.c stat.c fatal.c match.c
GSRC = syvarutl.c traceutl.c actutl.c executl.c blockutl.c \
entutl.c identutl.c chclsutl.c arrutl.c
RSRC = arr_rtl.c bln_rtl.c bst_rtl.c chr_rtl.c cmd_rtl.c con_rtl.c dir_rtl.c drw_rtl.c fil_rtl.c \
flt_rtl.c hsh_rtl.c int_rtl.c itf_rtl.c pcs_rtl.c set_rtl.c soc_rtl.c sql_rtl.c str_rtl.c \
tim_rtl.c ut8_rtl.c heaputl.c numutl.c sigutl.c striutl.c \
sql_base.c sql_lite.c sql_my.c sql_oci.c sql_odbc.c sql_post.c
DSRC = $(BIGINT_LIB).c cmd_win.c dir_win.c dll_win.c fil_win.c pcs_win.c pol_sel.c stat_win.c tim_win.c
SRC = $(MSRC)
SEED7_LIB_SRC = $(RSRC) $(DSRC)
DRAW_LIB_SRC = gkb_rtl.c drw_win.c gkb_win.c
CONSOLE_LIB_SRC = kbd_rtl.c con_win.c
COMP_DATA_LIB_SRC = typ_data.c rfl_data.c ref_data.c listutl.c flistutl.c typeutl.c datautl.c
COMPILER_LIB_SRC = $(PSRC) $(LSRC) $(ESRC) $(ASRC) $(GSRC)

s7: ..\bin\s7.exe ..\prg\s7.exe
..\bin\s7 -l ..\lib level
@echo.
@echo Use 'make s7c' (with your make command) to create the compiler.
@echo.

s7c: ..\bin\s7c.exe ..\prg\s7c.exe
@echo.
@echo Use 'make test' (with your make command) to check Seed7.
@echo.

..\bin\s7.exe: $(OBJ) $(ALL_S7_LIBS)
$(CC) $(LDFLAGS) $(OBJ) $(ALL_S7_LIBS) $(SYSTEM_DRAW_LIBS) $(SYSTEM_CONSOLE_LIBS) $(SYSTEM_LIBS) $(SYSTEM_DB_LIBS) -o ..\bin\s7

..\prg\s7.exe: ..\bin\s7.exe
copy ..\bin\s7.exe ..\prg /Y

..\bin\s7c.exe: ..\prg\s7c.exe
copy ..\prg\s7c.exe ..\bin /Y

..\prg\s7c.exe: ..\prg\s7c.sd7 $(ALL_S7_LIBS)
..\bin\s7 -l ..\lib ..\prg\s7c -l ..\lib -b ..\bin -O2 ..\prg\s7c

clear: clean

clean:
del mcc64.exe
del mcc.exe
del *.obj
del *.asm
del ..\bin\*.a
del ..\bin\s7.exe
del ..\bin\s7c.exe
del ..\prg\s7.exe
del ..\prg\s7c.exe
del depend
del macros
del chkccomp.h
del version.h
del setwpath.exe
del wrdepend.exe
del sudo.exe
@echo.
@echo Use 'make depend' (with your make command) to create the dependencies.
@echo.

distclean: clean
copy level_bk.h level.h /Y

test:
..\bin\s7 -l ..\lib ..\prg\chk_all build
@echo.
@echo Use 'sudo make install' (with your make command) to install Seed7.
@echo Or open a console as administrator, go to the directory seed7/src
@echo and use 'make install' (with your make command) to install Seed7.
@echo.

install: setwpath.exe
.\setwpath.exe add ..\bin

uninstall: setwpath.exe
.\setwpath.exe remove ..\bin

dep: depend

strip:
strip ..\bin\s7.exe

mcc64.exe: mcc64.c
gcc -m64 -O3 mcc64.c -o mcc64.exe

mcc.exe: mcc.c mcc64.exe
mcc64 mcc.c
nasm -fwin64 mcc.asm
gcc -o mcc.exe mcc.obj

chkccomp.h:
echo #define LIST_DIRECTORY_CONTENTS "dir" >> chkccomp.h
echo #define STAT_MISSING >> chkccomp.h
echo #define MYSQL_DLL "libmariadb.dll", "libmysql.dll" >> chkccomp.h
echo #define MYSQL_USE_DLL >> chkccomp.h
echo #define SQLITE_DLL "sqlite3.dll" >> chkccomp.h
echo #define SQLITE_USE_DLL >> chkccomp.h
echo #define POSTGRESQL_DLL "libpq.dll" >> chkccomp.h
echo #define POSTGRESQL_USE_DLL >> chkccomp.h
echo #define ODBC_LIBS "-lodbc32" >> chkccomp.h
echo #define ODBC_DLL "odbc32.dll" >> chkccomp.h
echo #define ODBC_USE_LIB >> chkccomp.h
echo #define OCI_DLL "oci.dll" >> chkccomp.h
echo #define OCI_USE_DLL >> chkccomp.h

version.h: chkccomp.h mcc.exe
echo #define PATH_DELIMITER '\\' > version.h
echo #define SEARCH_PATH_DELIMITER ';' >> version.h
echo #define NULL_DEVICE "NUL:" >> version.h
echo #define WITH_SQL >> version.h
echo #define CONSOLE_WCHAR >> version.h
echo #define OS_STRI_WCHAR >> version.h
echo #define os_fstat _fstati64 >> version.h
echo #define DEFINE_WSTATI64_EXT >> version.h
echo #define os_lstat wstati64Ext >> version.h
echo #define os_stat wstati64Ext >> version.h
echo #define os_stat_orig _wstati64 >> version.h
echo #define os_stat_struct struct _stati64 >> version.h
echo #define os_fseek fseeko64 >> version.h
echo #define os_ftell ftello64 >> version.h
echo #define os_off_t off64_t >> version.h
echo #define os_environ _wenviron >> version.h
echo #define os_putenv _wputenv >> version.h
echo #define os_getch _getwch >> version.h
echo #define QUOTE_WHOLE_SHELL_COMMAND >> version.h
echo #define USE_WINSOCK >> version.h
echo #define $(BIGINT_LIB_DEFINE) >> version.h
echo #define OBJECT_FILE_EXTENSION ".obj" >> version.h
echo #define LIBRARY_FILE_EXTENSION ".a" >> version.h
echo #define EXECUTABLE_FILE_EXTENSION ".exe" >> version.h
echo #define C_COMPILER "$(CC)" >> version.h
echo #define GET_CC_VERSION_INFO_OPTIONS "--version >" >> version.h
echo #define CC_OPT_DEBUG_INFO "-g" >> version.h
echo #define CC_OPT_NO_WARNINGS "-w" >> version.h
echo #define CC_FLAGS "" >> version.h
echo #define CC_ERROR_FILDES 2 >> version.h
echo #define LINKER_OPT_NO_DEBUG_INFO "-Wl,--strip-debug" >> version.h
echo #define LINKER_OPT_OUTPUT_FILE "-o " >> version.h
echo #define LINKER_FLAGS "$(LDFLAGS)" >> version.h
echo #define SYSTEM_LIBS "$(SYSTEM_LIBS)" >> version.h
echo #define SYSTEM_CONSOLE_LIBS "$(SYSTEM_CONSOLE_LIBS)" >> version.h
echo #define SYSTEM_DRAW_LIBS "$(SYSTEM_DRAW_LIBS)" >> version.h
$(GET_CC_VERSION_INFO) cc_vers.txt
$(CC) chkccomp.c -lm -o chkccomp
.\chkccomp.exe version.h
del chkccomp.exe
del cc_vers.txt
echo #define SEED7_LIB "$(SEED7_LIB)" >> version.h
echo #define CONSOLE_LIB "$(CONSOLE_LIB)" >> version.h
echo #define DRAW_LIB "$(DRAW_LIB)" >> version.h
echo #define COMP_DATA_LIB "$(COMP_DATA_LIB)" >> version.h
echo #define COMPILER_LIB "$(COMPILER_LIB)" >> version.h
$(CC) -o setpaths setpaths.c
.\setpaths.exe "S7_LIB_DIR=$(S7_LIB_DIR)" "SEED7_LIBRARY=$(SEED7_LIBRARY)" >> version.h
del setpaths.exe
$(CC) setwpath.c -o setwpath
$(CC) wrdepend.c -o wrdepend
$(CC) sudo.c -w -o sudo

depend: version.h
.\wrdepend.exe $(CFLAGS) -M $(SRC) "> depend"
.\wrdepend.exe $(CFLAGS) -M $(SEED7_LIB_SRC) ">> depend"
.\wrdepend.exe $(CFLAGS) -M $(CONSOLE_LIB_SRC) ">> depend"
.\wrdepend.exe $(CFLAGS) -M $(DRAW_LIB_SRC) ">> depend"
.\wrdepend.exe $(CFLAGS) -M $(COMP_DATA_LIB_SRC) ">> depend"
.\wrdepend.exe $(CFLAGS) -M $(COMPILER_LIB_SRC) ">> depend"
@echo.
@echo Use 'make' (with your make command) to create the interpreter.
@echo.

level.h:
..\bin\s7 -l ..\lib level

..\bin\$(SEED7_LIB): $(SEED7_LIB_OBJ)
..\bin\call_ar r ..\bin\$(SEED7_LIB) $(SEED7_LIB_OBJ)

..\bin\$(CONSOLE_LIB): $(CONSOLE_LIB_OBJ)
..\bin\call_ar r ..\bin\$(CONSOLE_LIB) $(CONSOLE_LIB_OBJ)

..\bin\$(DRAW_LIB): $(DRAW_LIB_OBJ)
..\bin\call_ar r ..\bin\$(DRAW_LIB) $(DRAW_LIB_OBJ)

..\bin\$(COMP_DATA_LIB): $(COMP_DATA_LIB_OBJ)
..\bin\call_ar r ..\bin\$(COMP_DATA_LIB) $(COMP_DATA_LIB_OBJ)

..\bin\$(COMPILER_LIB): $(COMPILER_LIB_OBJ)
..\bin\call_ar r ..\bin\$(COMPILER_LIB) $(COMPILER_LIB_OBJ)

..\bin\%.exe: ..\prg\%.sd7 ..\bin\s7c.exe
..\bin\s7c.exe -l ..\lib -b ..\bin -O2 $<
move /Y $(<:.sd7=.exe) ..\bin

bas7: ..\bin\bas7.exe
calc7: ..\bin\calc7.exe
cat: ..\bin\cat.exe
comanche: ..\bin\comanche.exe
diff7: ..\bin\diff7.exe
find7: ..\bin\find7.exe
ftp7: ..\bin\ftp7.exe
ftpserv: ..\bin\ftpserv.exe
hd: ..\bin\hd.exe
make7: ..\bin\make7.exe
sql7: ..\bin\sql7.exe
sydir7: ..\bin\sydir7.exe
tar7: ..\bin\tar7.exe
toutf8: ..\bin\toutf8.exe
which: ..\bin\which.exe

utils: ..\bin\bas7.exe ..\bin\calc7.exe ..\bin\cat.exe ..\bin\comanche.exe ..\bin\diff7.exe \
..\bin\find7.exe ..\bin\ftp7.exe ..\bin\ftpserv.exe ..\bin\hd.exe ..\bin\make7.exe \
..\bin\sql7.exe ..\bin\sydir7.exe ..\bin\tar7.exe ..\bin\toutf8.exe ..\bin\which.exe

wc: $(SRC)
echo SRC:
wc $(SRC)
echo SEED7_LIB_SRC:
wc $(SEED7_LIB_SRC)
echo CONSOLE_LIB_SRC:
wc $(CONSOLE_LIB_SRC)
echo DRAW_LIB_SRC:
wc $(DRAW_LIB_SRC)
echo COMP_DATA_LIB_SRC:
wc $(COMP_DATA_LIB_SRC)
echo COMPILER_LIB_SRC:
wc $(COMPILER_LIB_SRC)

lint: $(SRC)
lint -p $(SRC) $(SYSTEM_DRAW_LIBS) $(SYSTEM_CONSOLE_LIBS) $(SYSTEM_LIBS) $(SYSTEM_DB_LIBS)

lint2: $(SRC)
lint -Zn2048 $(SRC) $(SYSTEM_DRAW_LIBS) $(SYSTEM_CONSOLE_LIBS) $(SYSTEM_LIBS) $(SYSTEM_DB_LIBS)

ifeq (depend,$(wildcard depend))
include depend
endif

ifeq (macros,$(wildcard macros))
include macros
endif
-------------------- end mk_mcc64w.mak --------------------

The makefile mk_mcc64w.mak assumes that mcc64.c and mcc.c are in the seed7/src
directory. The program mcc.c, which is a driver for mcc64.c, follows:

-------------------- begin mcc.c --------------------
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define COMMAND_SIZE 1024


int main (int argc, char *argv[])
{
    int pos;
    size_t len;
    char *source = NULL;
    char *assembler = NULL;
    char *obj = NULL;
    char *output = NULL;
    char *includeDir = NULL;
    int doLink = 1;
    char command[COMMAND_SIZE];

    for (pos = 1; pos < argc; pos++) {
        if (argv[pos][0] == '-') {
            switch (argv[pos][1]) {
                case 'o':
                    if (argv[pos][2] != '\0') {
                        output = &argv[pos][2];
                    } else if (pos < argc - 1) {
                        pos++;
                        output = argv[pos];
                    } else {
                        printf(" ***** Option %s without required value.\n", argv[pos]);
                    } /* if */
                    break;
                case 'c':
                    doLink = 0;
                    break;
                case 'I':
                    if (argv[pos][2] != '\0') {
                        includeDir = &argv[pos][2];
                    } else if (pos < argc - 1) {
                        pos++;
                        includeDir = argv[pos];
                    } else {
                        printf(" ***** Option %s without required value.\n", argv[pos]);
                    } /* if */
                    break;
                default:
                    printf(" ***** Unknown option: %s\n", argv[pos]);
                    break;
            } /* switch */
        } else {
            len = strlen(argv[pos]);
            if (len >= 3 && strcmp(&argv[pos][len - 2], ".c") == 0) {
                source = argv[pos];
                assembler = malloc(len + 5);
                strcpy(assembler, source);
                strcpy(&assembler[len - 2], ".asm");
                obj = malloc(len + 3);
                strcpy(obj, source);
                strcpy(&obj[len - 2], ".obj");
            } else {
                printf(" ***** Unknown parameter: %s\n", argv[pos]);
            } /* if */
        } /* if */
    } /* for */
    if (source != NULL) {
        if (includeDir != NULL) {
            len = strlen(includeDir);
            if (len > 1 && (includeDir[len - 1] == '/' ||
                            includeDir[len - 1] == '\\')) {
                sprintf(command, "mcc64 -i:%s %s", includeDir, source);
            } else {
                sprintf(command, "mcc64 -i:%s/ %s", includeDir, source);
            } /* if */
        } else {
            sprintf(command, "mcc64 %s", source);
        } /* if */
        printf("%s\n", command);
        system(command);
        sprintf(command, "nasm -fwin64 %s", assembler);
        printf("%s\n", command);
        system(command);
        if (doLink) {
            if (output != NULL) {
                sprintf(command, "gcc -o %s %s", output, obj);
                printf("%s\n", command);
                system(command);
            } else {
                sprintf(command, "gcc %s", obj);
                printf("%s\n", command);
                system(command);
            } /* if */
        } /* if */
    } else {
        printf(" ***** No input files.\n");
    } /* if */
    return 0;
}
-------------------- end mcc.c --------------------

Both (mk_mcc64w.mak and mcc.c) are experimental.
I hope that helps to compile Seed7 with mcc64.c.


Regards,
Thomas Mertes
--
Seed7 Homepage: http://seed7.sourceforge.net
Seed7 - The extensible programming language: User defined statements
and operators, abstract data types, templates without special
syntax, OO with interfaces and multiple dispatch, statically typed,
interpreted or compiled, portable, runs under linux/unix/windows.
bartc
2017-07-05 00:23:36 UTC
Post by m***@gmail.com
Post by bartc
That sounds promising. As I said, I'll do another pass through the
sources later. (Compiling the 170Kloc sqlite.c, using my own headers
rather than another compiler's as I've used for that, is another good
way of picking up missing elements of headers.)
I found more missing things in the header files of mcc64.c.
When I compile and run chkccomp.c it shows me that some functions are
Yeah, I found a LOT more than that!

However, I'm not sure this is going to work out. The standard/POSIX
library elements I can eventually put in place (however they must also
appear in MSVCRT.DLL or later versions such as MSVCR120.DLL).

The ones from windows.h are far more numerous, but that will be just a
hard slog to work through, adding one at a time. But it will have to be
specific to a project (eg. Seed7) as doing everything in windows.h is
not practical.

But then I came across a module that included winsock2.h. The MS version
of that is 4000 lines (and possibly other includes; I haven't checked).
The mingw version is 1200 lines but includes perhaps 15 other special
headers in a special directory structure.

This is not something I'm going to be able to port across, and I
wouldn't want to do that work. It's no longer just a list of individual
function declarations or types that can be added one at a time, but a
complete system that needs to be implemented as a whole. (And I've no idea
about winsock stuff so can't test it.)

So if Seed7 depends on that, then the mcc compiler is not going to work
with it. Not unless someone else comes in to look after supplying
headers. (Perhaps actually using ones in mingw, which seem to be public
domain, but they are very mingw-centric so the result is going to be
even more of a sprawling mess than the original, to get them adapted.)
Post by m***@gmail.com
You can try it yourself with the newest release of Seed7
(seed7_05_20170702.tgz). To compile Seed7 with mcc64 I use make7,
which is also part of the Seed7 package. Any other make utility is
also possible.
OK, I see you've put a lot of effort into this!

This morning [Tues] I tried compiling 150 .c files for Seed7, and mcc
failed on 55. gcc failed on about 20 (likely non-Windows), so around 35
to be worked on.

At the moment I think about half of those 35 now compile.

I will have a closer look at this build system tomorrow. And I'll update
the mcc64 and its headers then too.
Post by m***@gmail.com
if (len > 1 && (includeDir[len - 1] == '/' ||
includeDir[len - 1] == '\\')) {
sprintf(command, "mcc64 -i:%s %s", includeDir, source);
} else {
sprintf(command, "mcc64 -i:%s/ %s", includeDir, source);
Yeah, I should have added the trailing '/' logic so that it was
optional. It seemed simple enough to require people to type it; I didn't
expect someone else to have to code it!
--
bartc
jacobnavia
2017-07-05 14:12:03 UTC
Not unless someone else comes in to look after supplying headers.
Just install lcc-win and be done with it Bart
bartc
2017-07-05 15:06:56 UTC
Post by jacobnavia
Not unless someone else comes in to look after supplying headers.
Just install lcc-win and be done with it Bart
My project was a C compiler, which everyone keeps telling me is
completely separate from every other part of a practical language
system, including even standard headers.

Standard headers, OK, but I didn't bargain for all these other headers,
which apparently are so difficult to write that they HAVE to be
customised to each compiler, even on the same platform.

Anyway I have your compiler and a number of others. But it's rare that I
have a substantial program that compiles effortlessly with all of them.

And some have their own problems (tcc doesn't have winsock2.h either,
nor its plethora of dependent headers, although it's more likely to be
able to use mingw's headers with less change).

Because I disagree with the idea that platform-specific headers need to
be written in incompatible, non-standard code, with dedicated versions
that only work with one compiler (and often incomprehensibly), I'm not
going to compound the issue by doing the same!
--
Bartc
Ian Collins
2017-07-05 19:31:21 UTC
Post by bartc
Because I disagree with the idea that platform-specific headers need to
be written in incompatible, non-standard code, with dedicated versions
that only work with one compiler (and often incomprehensibly), I'm not
going to compound the issue by doing the same!
Considering many systems provide headers that work fine with any
compiler, that argument is nonsense.
--
Ian
bartc
2017-07-05 19:51:07 UTC
Post by Ian Collins
Post by bartc
Because I disagree with the idea that platform-specific headers need to
be written in incompatible, non-standard code, with dedicated versions
that only work with one compiler (and often incomprehensibly), I'm not
going to compound the issue by doing the same!
Considering many systems provide headers that work fine with any
compiler, that argument is nonsense.
Example, please. And preferably also a link to the header in question.

(Maybe it is different in Linux where the system provides a bunch of
headers in /usr/include or wherever, where they might have restrained
themselves from using gcc-isms, but the situation on Windows is different.

However I've just tried compiling:

#include <stdio.h>

(which for some reason has the need to pull in a couple of dozen more
headers), using the headers from /usr/include, and it fails with:

'__builtin_va_list'

being undefined. That is not a standard C feature, so what is it doing
in a universal header?)
--
bartc
Ian Collins
2017-07-05 20:03:10 UTC
Post by bartc
Post by Ian Collins
Post by bartc
Because I disagree with the idea that platform-specific headers need to
be written in incompatible, non-standard code, with dedicated versions
that only work with one compiler (and often incomprehensibly), I'm not
going to compound the issue by doing the same!
Considering many systems provide headers that work fine with any
compiler, that argument is nonsense.
Example, please. And preferably also a link to the header in question.
Probably everything other than Windows...
--
Ian
David Brown
2017-07-05 20:19:20 UTC
Post by bartc
Post by Ian Collins
Post by bartc
Because I disagree with the idea that platform-specific headers need to
be written in incompatible, non-standard code, with dedicated versions
that only work with one compiler (and often incomprehensibly), I'm not
going to compound the issue by doing the same!
Considering many systems provide headers that work fine with any
compiler, that argument is nonsense.
Example, please. And preferably also a link to the header in question.
(Maybe it is different in Linux where the system provides a bunch of
headers in /usr/include or wherever, where they might have restrianed
themselves from using gcc-isms, but the situation on Windows is different.
#include <stdio.h>
(which for some reason has the need to pull in a couple of dozen more
'__builtin_va_list'
being undefined. That is not a standard C feature, so what is it doing
in a universal header?)
There is a distinction here between OS headers and standard C library
headers. I suspect that when Ian says "many systems provide headers
that work fine with any compiler", he means many /operating systems/
provide headers that work with any compiler /targeting that OS/. Thus
if you look at, say, /usr/include/linux/time.h that is part of the Linux
system headers, then it is standard C and does not require anything
compiler-specific. (It turns out that many other Linux headers use
extensions supported by gcc, clang, icc, and probably several other
compilers - but which are still not standard. Presumably Ian is
thinking of other OS's.)

It is fine for an OS to make certain requirements of compilers in its
headers. For example, an OS will specify a certain ABI (like "int is
32-bit, long is 64-bit"), and can happily assume these things in its
headers.

There is nothing stopping MS from providing headers for the Windows APIs
in standard C, plus any (if needed) specified required extensions that
all compilers for Windows would have to support (such as for dealing
with different calling conventions). If MS choose not to specify these
requirements, and not to provide such standard headers, that is the
choice of MS - it is not a limitation of C. Other OS's (according to
Ian) manage it.

Standard C library headers, on the other hand, are less flexible - some
parts cannot be written in pure standard C. And some parts can give
more features using compiler-specific extensions. Much of it can be
independent, and there are many C standard libraries that are at least
/mostly/ independent of the compiler.

So <stdio.h> requires co-ordination with the compiler. But "windows.h"
does not.
Ben Bacarisse
2017-07-05 20:43:16 UTC
bartc <***@freeuk.com> writes:

<snip>
Post by m***@gmail.com
#include <stdio.h>
(which for some reason has the need to pull in a couple of dozen more
'__builtin_va_list'
being undefined. That is not a standard C feature, so what is it doing
in a universal header?)
What do you mean by a universal header?
--
Ben.
bartc
2017-07-05 20:54:35 UTC
Post by Ben Bacarisse
<snip>
Post by m***@gmail.com
#include <stdio.h>
(which for some reason has the need to pull in a couple of dozen more
'__builtin_va_list'
being undefined. That is not a standard C feature, so what is it doing
in a universal header?)
What do you mean by a universal header?
One containing generic C code not tied to an implementation (of C),
although what it defines may be specific to a platform.

I assume the headers in /usr/include would be of that kind.
--
bartc
Ben Bacarisse
2017-07-05 21:40:21 UTC
Post by bartc
Post by Ben Bacarisse
<snip>
Post by m***@gmail.com
#include <stdio.h>
(which for some reason has the need to pull in a couple of dozen more
'__builtin_va_list'
being undefined. That is not a standard C feature, so what is it doing
in a universal header?)
What do you mean by a universal header?
One containing generic C code not tied to an implementation (of C),
although what it may define things specific to a platform.
I assume the headers in /usr/include would be of that kind.
Ah. That's not a reliable assumption. It may well be true for some
"API" header files (for example my /usr/include contains sqlite3.h which
is probably compiler agnostic) but it is very unlikely to be true for
language headers.
--
Ben.
James R. Kuyper
2017-07-05 21:01:13 UTC
Post by Ben Bacarisse
<snip>
Post by m***@gmail.com
#include <stdio.h>
(which for some reason has the need to pull in a couple of dozen more
'__builtin_va_list'
being undefined. That is not a standard C feature, so what is it doing
in a universal header?)
What do you mean by a universal header?
Given the prior context of this discussion, I suspect that he's
referring to one of those headers which, according to Ian, "work fine
with any compiler". Why he thinks that <stdio.h> would be one of those
files is more of a mystery. As a C standard header, it's part of a C
implementation, and there's no particular reason why it should be
expected to work with any other implementation of C.

Now, it's commonplace for people to use a C compiler and a C standard
library provided by different vendors so there's a certain amount of
mix-and-match atmosphere which BartC doesn't seem to approve of.

However, there's nothing that says that arbitrary combinations of one
vendor's C compiler and a different vendor's C standard library are
guaranteed to work together to qualify as a conforming implementation of
C. That's something that needs to be determined on a case-by-case basis,
and it's not likely to be true unless at least one of the two vendors
went out of its way to make it true (this is, in fact, often the case).
bartc
2017-07-05 21:24:22 UTC
Post by James R. Kuyper
Post by Ben Bacarisse
<snip>
Post by m***@gmail.com
#include <stdio.h>
(which for some reason has the need to pull in a couple of dozen more
'__builtin_va_list'
being undefined. That is not a standard C feature, so what is it doing
in a universal header?)
What do you mean by a universal header?
Given the prior context of this discussion, I suspect that he's
referring to one of those headers which, according to Ian, "work fine
with any compiler". Why he thinks that <stdio.h> would be one of those
files is more of a mystery. As a C standard header, it's part of a C
implementation, and there's no particular reason why it should be
expected to work with any other implementation of C.
OK. For which C implementations is /usr/include/stdio.h for?
--
bartc
j***@verizon.net
2017-07-05 23:21:19 UTC
Post by bartc
Post by James R. Kuyper
Post by Ben Bacarisse
<snip>
Post by m***@gmail.com
#include <stdio.h>
(which for some reason has the need to pull in a couple of dozen more
'__builtin_va_list'
being undefined. That is not a standard C feature, so what is it doing
in a universal header?)
What do you mean by a universal header?
Given the prior context of this discussion, I suspect that he's
referring to one of those headers which, according to Ian, "work fine
with any compiler". Why he thinks that <stdio.h> would be one of those
files is more of a mystery. As a C standard header, it's part of a C
implementation, and there's no particular reason why it should be
expected to work with any other implementation of C.
OK. For which C implementations is /usr/include/stdio.h for?
That depends upon what's installed on your system. The corresponding file on my system says:

/* Define ISO C stdio on top of C++ iostreams.
Copyright (C) 1991, 1994-2012 Free Software Foundation, Inc.
This file is part of the GNU C Library.

What does it say on your system?

The GNU C Library can be used in combination with gcc to constitute a fully conforming implementation of C, but only if you bother setting the options that put it into fully conforming mode. Since I gather that you're opposed to such common-sense ideas as finding out what options those are, and how to use them, what you'll get if you use gcc without any options is an implementation of GnuC, a language distinct from C, though with a very strong family resemblance to C. This is not a good place to discuss that language; a forum devoted to GnuC would be more appropriate.
bartc
2017-07-05 23:51:04 UTC
Post by j***@verizon.net
Post by bartc
OK. For which C implementations is /usr/include/stdio.h for?
/* Define ISO C stdio on top of C++ iostreams.
Copyright (C) 1991, 1994-2012 Free Software Foundation, Inc.
This file is part of the GNU C Library.
What does it say on your system?
About the same. (This is in a copy of /usr/include and associated
directories that I have under Windows when I was testing things under
Windows.)
Post by j***@verizon.net
The GNU C Library can be used in combination with gcc to constitute a fully conforming implementation of C, but only if you bother setting the options that put it into fully conforming mode.
I'm using a generic C compiler. It won't understand any such options.

Since I gather that you're opposed to such common-sense ideas as
finding out what options those are, and how to use them, what you'll get
if you use gcc without using any options is an implementation of GnuC, a
language distinct from C, though with an very strong family resemblance
to C. This is not a good place to discuss that language, a forum devoted
to GnuC would be more appropriate.
You mention gcc a lot. So is it only for gcc? Keith says his version is
used by gcc, clang and tcc.

clang is a clone of gcc. tcc I found tends to use some of the same
headers as mingw. So it sounds like those three are very pally in having
to deal with each other's headers. But are the headers actually in more
or less generic C, or some gcc dialect?

Well, one of the first things in that file is __BEGIN_DECLS. Not very
standard, but that was a red herring (it's used for C++; otherwise it's
an empty macro).

My processing of this stdio.h - which includes a lot of other stuff -
still hits a brick wall at __builtin_va_list. This is used here (my
indentation):

#ifndef __GNUC_VA_LIST
#define __GNUC_VA_LIST
typedef __builtin_va_list __gnuc_va_list;
#endif

So it seems that builtin_va_list is something special that needs to be
known to any compiler using this set of headers.

From your comments, you seem to be suggesting that these headers are
for a language called GnuC, of which __builtin_va_list is one of the
features.

So I still haven't found a platform header than is in generic C and that
can be used by any C compiler. Ian didn't come back with an example
other than a glib remark (pun unintended).
--
bartc
Keith Thompson
2017-07-06 00:39:33 UTC
Post by bartc
Post by j***@verizon.net
Post by bartc
OK. For which C implementations is /usr/include/stdio.h for?
That depends upon what's installed on your system. The corresponding
/* Define ISO C stdio on top of C++ iostreams.
Copyright (C) 1991, 1994-2012 Free Software Foundation, Inc.
This file is part of the GNU C Library.
What does it say on your system?
About the same. (This is in a copy of /usr/include and associated
directories that I have under Windows when I was testing things under
Windows.)
Post by j***@verizon.net
The GNU C Library can be used in combination with gcc to constitute a
fully conforming implementation of C, but only if you bother setting
the options that put it into fully conforming mode.
I'm using a generic C compiler. It won't understand any such options.
Can you invoke your "generic C compiler" (whatever that means) in a way
that causes it to (attempt to) be fully conforming?

[...]
Post by bartc
clang is a clone of gcc. tcc I found tends to use some of the same
headers as mingw. So it sounds like those three are very pally in having
to deal with each other's headers. But are the headers actually in more
or less generic C, or some gcc dialect?
What do you mean by "each other's headers"? They all (can) use headers
provided by GNU libc (libc6-dev:amd64), not by each other.
Post by bartc
Well, one of the first things in that file is __BEGIN_DECLS. Not very
standard, but that was a red herring (it's used for C++ otherwise is an
empty macro).
So, not a problem.
Post by bartc
My processing of this stdio.h - which includes a lot of other stuff -
still hits a brick wall at __builtin_va_list. This is used here (my
When I feed "__builtin_va_list" to gcc or clang, it recognizes it as
a type name. When I do the same to tcc, it complains. Apparently
/usr/include/stdio.h manages to avoid referring to __builtin_va_list
when invoked by a compiler that doesn't understand it. I'll leave
it to you to figure out how.

[...]
Post by bartc
So it seems that builtin_va_list is something special that needs to be
known to any compiler using this set of headers.
It seems not.
Post by bartc
From your comments, you seem to be suggesting that these headers are
for a language called GnuC, of which __builtin_va_list is one of the
features.
So I still haven't found a platform header than is in generic C and that
can be used by any C compiler. Ian didn't come back with an example
other than a glib remark (pun unintended).
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
bartc
2017-07-06 01:13:34 UTC
Post by Keith Thompson
Post by bartc
I'm using a generic C compiler. It won't understand any such options.
Can you invoke your "generic C compiler" (whatever that means) in a way
that causes it to (attempt to) be fully conforming?
How would 'fully conforming' help it to make sense of non-standard
elements in a header?
Post by Keith Thompson
Post by bartc
My processing of this stdio.h - which includes a lot of other stuff -
still hits a brick wall at __builtin_va_list. This is used here (my
When I feed "__builtin_va_list" to gcc or clang, it recognizes it as
a type name. When I do the same to tcc, it complains. Apparently
/usr/include/stdio.h manages to avoid referring to __builtin_va_list
when invoked by a compiler that doesn't understand it.
So how does it define __gnuc_va_list? Can it do without? In that case
why is it needed at all?

Put yourself in the position of a generic C compiler - one that
understands only what's in the C standard - and it comes across this
typedef; how is it supposed to ignore it? Or is it supposed to try
it, see the errors, then hack a way around it?
Post by Keith Thompson
I'll leave
it to you to figure out how.
OK, don't tell me. I can bypass that problem and come onto the next:
size_t hasn't been defined. Exactly where it should be defined is a bit
of a mystery, but I can see stuff like this:

/* We need `size_t' for the following definitions. */
#ifndef __size_t
typedef __SIZE_TYPE__ __size_t;
# if defined __USE_XOPEN || __USE_XOPEN2K8
typedef __SIZE_TYPE__ size_t;
# endif
#else
/* The GNU CC stddef.h version defines __size_t as empty. We need a real
definition. */
# undef __size_t
# define __size_t size_t
#endif

Nice and tidy - not. But it looks like size_t may be defined on top of
the existing type __SIZE_TYPE__.

Is __SIZE_TYPE__ another built-in feature of GnuC? Or is it itself
defined elsewhere?

This is the kind of thing that does my head in. Is no one capable of
writing just a straight declaration? Getting back to my own sane
version, where stddef.h contains this:

typedef unsigned long long int size_t;

A breath of fresh air.

But back to the Linux headers. You really think there is nothing
remarkable about them, and that ANY C compiler should be able to make
use of those headers without problems, and without doing anything? Or
does it have to jump through some hoops first (ie. adapt itself to use
that specialised version)?
--
bartc
David Brown
2017-07-06 09:13:46 UTC
Post by bartc
So how does it define __gnuc_va_list? Can it do without? In that case
why is it needed at all?
Support for variable argument functions (va_list, va_arg, etc.) cannot
be written in pure standard C. The same applies (I think) to offsetof,
to the C99 type-generic maths functions (prior to C11 _Generic), and a
few other points.

It is not /possible/ to make headers like <stdarg.h> and <stdlib.h> that
are totally portable.

These headers are part of a /C standard library/, which is part of the
/implementation/ (along with the compiler). They are /not/ platform
headers, or OS headers. And headers like these require a certain amount
of cooperation between the library writers and the compiler writers.

It is quite possible to write a C standard library that works on a range
of systems with a range of compilers. It is quite possible to write a
compiler that works with a range of C standard libraries. Both of these
are done - glibc is used with many compilers, not just gcc. And gcc is
used with many libraries, not just glibc. But there is no such thing as
a "generic C compiler" or a "generic C standard library" that can be
used interchangeably.

So if you want your compiler to work with glibc headers and library (or
any other particular standard library), you have two choices. Modify
the headers and libraries to support your compiler, or modify your
compiler to support the library.
Post by bartc
Put yourself in the position of a generic C compiler - one that
understands only what's in the C standard - and it comes across this
typedef; how is it supposed to ignore it? Or is it supposed to try
it, see the errors, then hack a way around it?
Post by Keith Thompson
I'll leave
it to you to figure out how.
size_t hasn't been defined. Exactly where it should be defined is a bit
/* We need `size_t' for the following definitions. */
#ifndef __size_t
typedef __SIZE_TYPE__ __size_t;
# if defined __USE_XOPEN || __USE_XOPEN2K8
typedef __SIZE_TYPE__ size_t;
# endif
#else
/* The GNU CC stddef.h version defines __size_t as empty. We need a real
definition. */
# undef __size_t
# define __size_t size_t
#endif
Nice and tidy - not. But it looks like size_t may be defined on top of
the existing type __SIZE_TYPE__.
The good folks that specified the standard library decided that size_t
should be used in more than one header (<stdlib.h> and <stddef.h> at
least). They also said that you should be able to include either one of
these or both, in any order. They also failed to make a distinction
between things that are really tightly tied to the compiler details
(such as VA support, setjmp, offsetof, numeric limits, maths functions
that are implemented directly as cpu instructions), things that are
generic C code (sorting functions, other maths functions), and things
that are OS specific (file functions). I can't answer for other people,
but to me this is a bit unfortunate and makes the whole process of
writing libraries and putting together compiler implementations a good
deal more complicated than it could otherwise have been. However, that
is the way C is, and we who work with C, must live with it.

A consequence of that sort of thing is conditional compilation stuff
like you see above. This hides such complications from the user. So
the user - the programmer - can include <stdlib.h>, or <stddef.h>, or
whatever headers he wants, as often as he wants, in any order he wants,
and use "size_t" as he likes. He does not /care/ if "size_t" is defined
in one header or another, or in a compiler-specific fashion, or by
headers provided by the library vendor or by headers provided by the
compiler vendor. To the user, it all just works.

For /you/, as a compiler /implementer/, you need to figure out this
stuff. As a user, it doesn't matter as long as it works.
Post by bartc
Is __SIZE_TYPE__ another built-in feature of GnuC? Or is it itself
defined elsewhere?
This is the kind of thing that does my head in. Is no one capable of
writing just a straight declaration? Getting back to my own sane
typedef unsigned long long int size_t;
A breath of fresh air.
For a simple system, that is probably enough. Your libraries and
headers don't need to support a range of architectures, or a range of C
standards, or a range of operating systems, or a range of compilers, or
a range of compiler options. You don't need to be concerned about the
possibility of other headers from other places defining the identifier
in a different way (perhaps a macro rather than a typedef, or perhaps to
a different underlying type of the same size).

Of course it would be nicer if the C language made such simple
definitions practical in more serious use-cases. But again, that is not
the way the C world is, and if you want to live and work in that world
you have to deal with it. Remember, one of the guiding principles of
the C committee when choosing to make choices to the language - or
choosing to keep existing odd behaviour - is that they do not care how
much effort is involved for implementers (of the library or compiler).
They only care about the effort for programmers. And for a C
programmer, there is nothing wrong with the collection of preprocessor
stuff in the earlier definition - your nice, clean alternative provides
no advantages at all to the programmer.
Post by bartc
But back to the Linux headers. You really think there is nothing
remarkable about them, and that ANY C compiler should be able to make
use of those headers without problems, and without doing anything? Or
does it have to jump through some hoops first (ie. adapt itself to use
that specialised version)?
Have you looked at the /Linux/ headers, rather than the glibc standard C
library headers?

Again, I can't answer for other people, but a quick look at a few of the
Linux headers on my system suggest that they will work for any C
compilers that can generate code for Linux - but not for C compilers
that generate code for other systems or other targets. So I don't
expect to be able to use them with a compiler for an 8-bit AVR
microcontroller, even though that compiler is gcc.
bartc
2017-07-06 10:29:31 UTC
Post by David Brown
Post by bartc
So how does it define __gnuc_va_list? Can it do without? In that case
why is it needed at all?
Support for variable argument functions (va_list, va_arg, etc.) cannot
be written in pure standard C. The same applies (I think) to offsetof,
to the C99 type-generic maths functions (prior to C11 _Generic), and a
few other points.
It is not /possible/ to make headers like <stdarg.h> and <stdlib.h> that
are totally portable.
These headers are part of a /C standard library/, which is part of the
/implementation/ (along with the compiler). They are /not/ platform
headers, or OS headers.
We're talking about the headers which are in /usr/include of a Linux system.

I thought that was a special place, and not for containing specific
headers for specific C implementations, or even for non-C languages.

So the question I've been asking is: which C implementation are they
for? The answer appears to be that they are for a special version of C
called GnuC. But that isn't quite enough, as I don't think GnuC is a
compiler.

What I've been hinting at in the past is that Linux and gcc are so
closely intertwined that you can't prise them apart. But people keep
saying that that is not the case.
Post by David Brown
And headers like these require a certain amount
of cooperation between the library writers and the compiler writers.
There are a very small set of features that require 'magic'. For
example, if the same stddef.h was used for both 32- and 64-bit versions
of a compiler, then size_t might be defined differently. How does the
header know whether it's being invoked in 32- or 64-bit mode?

But as I said that is a very small set that can be localised (to one
special header for example).
Post by David Brown
It is quite possible to write a C standard library that works on a range
of systems with a range of compilers. It is quite possible to write a
compiler that works with a range of C standard libraries. Both of these
are done - glibc is used with many compilers, not just gcc. And gcc is
used with many libraries, not just glibc. But there is no such thing as
a "generic C compiler" or a "generic C standard library" that can be
used interchangeably.
So if you want your compiler to work with glibc headers and library (or
any other particular standard library), you have two choices. Modify
the headers and libraries to support your compiler, or modify your
compiler to support the library.
I'm more interested in Windows at the moment. And many headers that are
used by some applications do seem to have to come with the compiler. And
each compiler's headers do seem to be incompatible with any other.

But I'm not prepared to have to duplicate another compiler's features
just to use their headers. I don't believe that should be necessary /for
most libraries/.

Why does every compiler have a different set of headers for winsock2.h?
Does it really have to do a task so difficult that it requires intrinsic
language support?
Post by David Brown
For a simple system, that is probably enough. Your libraries and
headers don't need to support a range of architectures,
Why should they? You buy a machine with a fixed architecture, and it
either comes with the right headers, or there are available the headers
suited for that. I'm saying such headers can be 99% generic across
compilers, not across every possible platform.
Post by David Brown
Of course it would be nicer if the C language made such simple
definitions practical in more serious use-cases. But again, that is not
the way the C world is, and if you want to live and work in that world
you have to deal with it.
No you don't. Some people can make a stand. If I had the time and
inclination, and didn't prefer to work on my own saner stuff (which, for
some inexplicable reason, relaxes me rather than infuriates me) then I
believe I could write a set of headers which will work for any compiler.
Post by David Brown
Post by bartc
But back to the Linux headers. You really think there is nothing
remarkable about them, and that ANY C compiler should be able to make
use of those headers without problems, and without doing anything? Or
does it have to jump through some hoops first (ie. adapt itself to use
that specialised version)?
Have you looked at the /Linux/ headers, rather than the glibc standard C
library headers?
Again, I can't answer for other people, but a quick look at a few of the
Linux headers on my system suggest that they will work for any C
compilers that can generate code for Linux - but not for C compilers
that generate code for other systems or other targets.
I don't know what you mean. I have something called the Linux kernel
sources, which seem to consist of over 19,600 .h files. I suppose if I
had several lifetimes, I can browse through all of them. Otherwise
looking at random files is a waste of time.
--
bartc
Ben Bacarisse
2017-07-06 10:55:03 UTC
Permalink
Raw Message
bartc <***@freeuk.com> writes:
<snip>
Post by bartc
We're talking about the headers which are in /usr/include of a Linux system.
I thought that was a special place, and not for containing specific
headers for specific C implementations, or even for non-C languages.
So the question I've been asking is, for which C implementation are
they for? The answer appears to be that they are for a special version
of C called GnuC. But that isn't quite enough, as I don't think GnuC
is a compiler.
/usr/include is, traditionally, used to hold the include files for the
system's default compiler -- the one called cc (often these days just a
link).

When that's gcc (as it almost always is), the headers in /usr/include
are not for any one language but for gcc. For example, they will
include C++ headers as well.

Other headers there (usually in sub-directories) may be compiler
agnostic.
Post by bartc
What I've been hinting at in the past is that Linux and gcc are so
closely intertwined that you can't prise them apart. But people keep
saying that that is not the case.
My phone has a Linux system with no C compiler at all. I'd call that
prising them apart. But what you probably mean that Linux and gcc are
very closely bound up with each other, something no one has disagreed
with.

<snip>
--
Ben.
Keith Thompson
2017-07-06 16:10:25 UTC
Permalink
Raw Message
Ben Bacarisse <***@bsb.me.uk> writes:
[...]
Post by Ben Bacarisse
My phone has a Linux system with no C compiler at all. I'd call that
prising them apart. But what you probably mean that Linux and gcc are
very closely bound up with each other, something no one has disagreed
with.
(My phone has a Linux kernel and a C compiler, but that's beside
the point.)

The Linux kernel depends on gcc, or on some compiler that's very
close to it (I'm not sure whether clang can build the kernel).
gcc does not particularly depend on Linux. For example, gcc runs
on various non-Linux UNIX-like systems (and typically works with a
library other than glibc on such systems). It even runs on Windows.
I'm fairly sure that gcc is older than the Linux kernel.

As I recall, when you install gcc from source on Solaris (a non-Linux
UNIX system), it runs something called "fixincludes" that creates
modified versions of some Solaris-provided header files; gcc's
include file search path is then set to use the modified headers.
I don't believe that step is necessary on systems with a Linux
kernel and glibc.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
David Brown
2017-07-06 13:11:10 UTC
Permalink
Raw Message
Post by bartc
Post by David Brown
Post by bartc
So how does it define __gnuc_va_list? Can it do without? In that case
why is it needed at all?
Support for variable argument functions (va_list, va_arg, etc.) cannot
be written in pure standard C. The same applies (I think) to offsetof,
to the C99 type-generic maths functions (prior to C11 _Generic), and a
few other points.
It is not /possible/ to make headers like <stdarg.h> and <stdlib.h> that
are totally portable.
These headers are part of a /C standard library/, which is part of the
/implementation/ (along with the compiler). They are /not/ platform
headers, or OS headers.
We're talking about the headers which are in /usr/include of a Linux system.
I thought that was a special place, and not for containing specific
headers for specific C implementations, or even for non-C languages.
It is a traditional place for include files on a *nix system. Since
*nix systems traditionally have a C compiler (and it is gcc on Linux
systems, either gcc or clang on BSD systems, clang on MacOS, and could
be gcc or something else on other *nix) and a standard C library, this
is where the include files for these go. It is also full of platform
include files (for Linux, these are in the linux subdirectory), and lots
of other libraries put their include files there. And headers that are
distributed with the compiler itself, rather than a C library, will also
be there somewhere.

That does not mean that everything in that directory is a platform
header, or a standard library header, or compiler specific, or compiler
independent. There is a lot in there.
Post by bartc
So the question I've been asking is, for which C implementation are they
for? The answer appears to be that they are for a special version of C
called GnuC. But that isn't quite enough, as I don't think GnuC is a
compiler.
I don't know exact details, but glibc supports a range of compilers. It
certainly makes use of some gcc extensions (various __attribute__
annotations, for example). Many of these are supported on other
compilers (clang, icc, CodeWarrior, etc.). For some parts at least, a
compiler will have to support these extensions to use the headers.

Some of the Linux platform headers also appear to require a few of these
extensions.
Post by bartc
What I've been hinting at in the past is that Linux and gcc are so
closely intertwined that you can't prise them apart. But people keep
saying that that is not the case.
People say it is not the case, because it is not the case. And we keep
saying it, because you repeatedly get things mixed up. When a whole lot
of people say the same thing that contradicts /your/ views, perhaps it
is time to consider that /you/ might be wrong?

gcc does /not/ require Linux. I use gcc for a dozen different targets,
most of which are /not/ running Linux. Mostly the host computer is
Linux, but sometimes it is Windows. The targets range from 8-bit
devices to 64-bit devices. They use a variety of C libraries - in some
cases they are proprietary third-party libraries, sometimes the
general-purpose embedded library "newlib" (or "newlib-nano"), sometimes
an embedded Linux library (maybe glibc, maybe a different one), and
sometimes dedicated libraries written specifically for the target.

Linux does not require gcc to run either. When we make embedded Linux
systems, gcc is rarely included in the system.

Currently, gcc /is/ needed to /compile/ Linux - I don't believe the work
on modifying clang and the Linux kernel to work together is complete
yet. But remember, MSVC is needed to compile Windows - but would you
say that Windows and MSVC are "so intertwined that you can't prise them
apart" ?
Post by bartc
Post by David Brown
And headers like these require a certain amount
of cooperation between the library writers and the compiler writers.
There are a very small set of features that require 'magic'. For
example, if the same stddef.h was used for both 32- and 64-bit versions
of a compiler, then size_t might be defined differently. How does the
header know whether it's being invoked in 32- or 64-bit mode?
There is not much "magic" required, I agree. But there is some. And
there may be more features available when "magic" is used.

For example, there is a traditional implementation of "offsetof":

#define offsetof(st, m) ((size_t)&(((st *)0)->m))

This works fine for many compilers, but is actually undefined behaviour
- some compilers might generate unexpected code when this is used. If
you are using gcc or a compiler that supports this extension, you can use:

#define offsetof(st, m) __builtin_offsetof(st, m)

For gcc, this is not undefined behaviour.

And it is quite possible to see lots of extra "magic" used when
available. How do you know if "size_t" should be typedef'ed to
"unsigned long int" or "unsigned long long int", or something else?
Lots of compilers define internal names like __size_t__ or _SIZE_T or
other reserved identifiers. If your library header can use:

typedef __SIZE_TYPE__ size_t;

from the compiler, then it is /better/ than using:

typedef unsigned long int size_t;

or
typedef unsigned long long int size_t;

The compiler-determined version will be correct regardless of the OS or
its size ("unsigned long int" will work for 32-bit and 64-bit Linux, but
not 64-bit Windows). And compiler-specific features like this may give
the compiler more information in some cases.
Post by bartc
But as I said that is a very small set that can be localised (to one
special header for example).
That is certainly possible to do. It is common in embedded operating
systems, for example, to have a single header called "portability.h" or
"compiler.h" with compiler/target specific settings. Then the other
headers can use that file rather than having to use conditional
compilation in lots of places. Why doesn't glibc use this method? I
have no idea.
Post by bartc
Post by David Brown
It is quite possible to write a C standard library that works on a range
of systems with a range of compilers. It is quite possible to write a
compiler that works with a range of C standard libraries. Both of these
are done - glibc is used with many compilers, not just gcc. And gcc is
used with many libraries, not just glibc. But there is no such thing as
a "generic C compiler" or a "generic C standard library" that can be
used interchangeably.
So if you want your compiler to work with glibc headers and library (or
any other particular standard library), you have two choices. Modify
the headers and libraries to support your compiler, or modify your
compiler to support the library.
I'm more interested in Windows at the moment. And many headers that are
used by some applications do seem to have to come with the compiler. And
each compiler's headers do seem to be incompatible with any other.
But I'm not prepared to have to duplicate another compiler's features
just to use their headers. I don't believe that should be necessary /for
most libraries/.
For most libraries, it is not (or at least, "should not be") necessary.
Most libraries can be written in pure standard C. The C standard
library is different. And low-level OS libraries may also depend on
compiler specifics. (A classic example is the common "alloca" function
- it cannot be written in C, but it is part of the Posix standard.)
Post by bartc
Why does every compiler have a different set of headers for winsock2.h?
Does it really have to do a task so difficult that it requires intrinsic
language support?
Now that I /don't/ know. I expect there to be a small amount of "magic"
there, to get the right calling conventions (because MS never
established a standard ABI for Windows, and the calling convention used
for the Win32 API is different from that used by any C compiler). But
that should be handled by a single macro definition somewhere.
Post by bartc
Post by David Brown
For a simple system, that is probably enough. Your libraries and
headers don't need to support a range of architectures,
Why should they? You buy a machine with a fixed architecture, and it
either comes with the right headers, or there are available the headers
suited for that. I'm saying such headers can be 99% generic across
compilers, not across every possible platform.
Platform headers should be generic for all compilers suited for the
platform, within a reasonable range of supported standards. But
standard C library headers will only work with particular given
compilers. And if you make a C library that will work with many
compilers, it will necessarily have a large amount of compiler
adaptation pre-processing.
Post by bartc
Post by David Brown
Of course it would be nicer if the C language made such simple
definitions practical in more serious use-cases. But again, that is not
the way the C world is, and if you want to live and work in that world
you have to deal with it.
No you don't. Some people can make a stand. If I had the time and
inclination, and didn't prefer to work on my own saner stuff (which, for
some inexplicable reason, relaxes me rather than infuriates me) then I
believe I could write a set of headers which will work for any compiler.
Not a C standard library, no - you could not.

And even for the bits you /could/ write, and even picking a limited
range of compilers (say, the compilers you use at the moment for
Windows), then if you made them work /well/ for those compilers you'd
find they are full of macros and conditionals. When you look at a
header and find it has lots of gcc __attribute__ annotations marking parameters as
nonnull, or giving alignments, etc., they are there for good reason.
They let the compiler do a better job. So your new library header needs
to have these too - otherwise it is not as good as the old one.

Now, I am sure you could organise your new headers better than, say,
glibc, when starting from scratch. Putting more of the
compiler-specific stuff into a few common headers would IMHO be nicer
organisation. And I am sure you could tidy up some of the most obscure
and outdated systems and compilers.

But you can't make such a simple and neat set of headers as you think
you can, and still support a range of compilers and systems.
Post by bartc
Post by David Brown
Post by bartc
But back to the Linux headers. You really think there is nothing
remarkable about them, and that ANY C compiler should be able to make
use of those headers without problems, and without doing anything? Or
does it have to jump through some hoops first (ie. adapt itself to use
that specialised version)?
Have you looked at the /Linux/ headers, rather than the glibc standard C
library headers?
Again, I can't answer for other people, but a quick look at a few of the
Linux headers on my system suggest that they will work for any C
compilers that can generate code for Linux - but not for C compilers
that generate code for other systems or other targets.
I don't know what you mean. I have something called the Linux kernel
sources, which seem to consist of over 19,600 .h files. I suppose if I
had several lifetimes, I can browse through all of them. Otherwise
looking at random files is a waste of time.
I mean look at the /Linux/ headers, rather than the standard C headers.
You know which ones are the standard C headers - these are described in
the C standards. The Linux headers are quite easy to spot - they are
the ones in the Linux kernel sources you have looked at, or in the
/usr/include/linux directory.
s***@casperkitty.com
2017-07-06 19:22:25 UTC
Permalink
Raw Message
Post by David Brown
For most libraries, it is not (or at least, "should not be") necessary.
Most libraries can be written in pure standard C. The C standard
library is different. And low-level OS libraries may also depend on
compiler specifics. (A classic example is the common "alloca" function
- it cannot be written in C, but it is part of the Posix standard.)
If a platform specifies that variadic functions will have their arguments
laid out in a certain way, and specifies the format for va_list, then
compilers *which are suitable for low-level programming on that platform*
should make it possible to implement <stdarg.h> in a fashion that will
work on all such compilers. Likewise if a platform defines a function
like sbrk() which can be used to acquire memory, even if it has restrictions
such as 4K granularity, implementations which are suitable for low-level
programming will make it possible to implement malloc() and friends in C
code that will work on all such compilers.

Of course, the Standard does not require that all implementations be
suitable for low-level programming use, and doesn't say what features such
implementations should provide, but an ability to manipulate addresses and
convert them into usable pointers would seem a key prerequisite.
Post by David Brown
Platform headers should be generic for all compilers suited for the
platform, within a reasonable range of supported standards. But
standard C library headers will only work with particular given
compilers. And if you make a C library that will work with many
compilers, it will necessarily have a large amount of compiler
adaptation pre-processing.
Platform-dependent aspects will generally be unavoidable, but it will
usually not be difficult to write code for a platform that will work on
any compiler for that platform which by default interprets Undefined
Behavior as "behave in a documented characteristic of the environment"
in cases where environment documents a behavior and the Standard doesn't
mandate anything else.

While the Standard doesn't talk about what environments document what
characteristic behaviors, on most platforms it's pretty clear what sorts
of behaviors should be "passed through" by a compiler that is trying to
maximize compatibility with low-level programs.
David Kleinecke
2017-07-07 00:23:13 UTC
Permalink
Raw Message
Post by s***@casperkitty.com
Post by David Brown
For most libraries, it is not (or at least, "should not be") necessary.
Most libraries can be written in pure standard C. The C standard
library is different. And low-level OS libraries may also depend on
compiler specifics. (A classic example is the common "alloca" function
- it cannot be written in C, but it is part of the Posix standard.)
If a platform specifies that variadic functions will have their arguments
laid out in a certain way, and specifies the format for va_list, then
compilers *which are suitable for low-level programming on that platform*
should make it possible to implement <stdarg.h> in a fashion that will
work on all such compilers. Likewise if a platform defines a function
like sbrk() which can be used to acquire memory, even if it has restrictions
such as 4K granularity, implementations which are suitable for low-level
programming will make it possible to implement malloc() and friends in C
code that will work on all such compilers.
Of course, the Standard does not require that all implementations be
suitable for low-level programming use, and doesn't say what features such
implementations should provide, but an ability to manipulate addresses and
convert them into usable pointers would seem a key prerequisite.
Post by David Brown
Platform headers should be generic for all compilers suited for the
platform, within a reasonable range of supported standards. But
standard C library headers will only work with particular given
compilers. And if you make a C library that will work with many
compilers, it will necessarily have a large amount of compiler
adaptation pre-processing.
Platform-dependent aspects will generally be unavoidable, but it will
usually not be difficult to write code for a platform that will work on
any compiler for that platform which by default interprets Undefined
Behavior as "behave in a documented characteristic of the environment"
in cases where environment documents a behavior and the Standard doesn't
mandate anything else.
While the Standard doesn't talk about what environments document what
characteristic behaviors, on most platforms it's pretty clear what sorts
of behaviors should be "passed through" by a compiler that is trying to
maximize compatibility with low-level programs.
One solution - which the standard committee did not adopt -
would have been to "shall" that the arguments of a function
be laid out in the same way as a struct with the same
types as the parameter definitions.

I hope that their reason for not doing this was something
more substantial than that struct use semicolons and functions
commas.
Richard Damon
2017-07-07 03:13:24 UTC
Permalink
Raw Message
Post by David Kleinecke
One solution - which the standard committee did not adopt -
would have been to "shall" that the arguments of a function
be laid out in the same way as a struct with the same
types as the parameter definitions.
I hope that their reason for not doing this was something
more substantial than that struct use semicolons and functions
commas.
A better reason is that layout may not be easy to get, or violates the
platform's ABI.

One big issue is that many processors have downward growing stacks, and
place the parameters on the stack in the order declared (sort of needed
for variadic functions). For machines like this, given a function like

int foo(int a, int b)

the address of a is likely larger than the address of b, while for a
struct I believe that given

struct foo {
int a;
int b;
};

then for a given object of type foo, the address of a needs to be less
than the address of b.

Also, many ABI's put some of the parameters in registers for the call,
so they don't exist anywhere in memory by default.
Richard Bos
2017-07-07 05:47:15 UTC
Permalink
Raw Message
Post by Richard Damon
Post by David Kleinecke
One solution - which the standard committee did not adopt -
would have been to "shall" that the arguments of a function
be laid out in the same way as a struct with the same
types as the parameter definitions.
I hope that their reason for not doing this was something
more substantial than that struct use semicolons and functions
commas.
A better reason is that layout may not be easy to get, or violates the
platform's ABI.
Or that it is sometimes more efficient to pass some parameters in
registers and others on the stack, even in the same function call.

Richard
David Brown
2017-07-07 08:49:41 UTC
Permalink
Raw Message
Post by Richard Bos
Post by Richard Damon
Post by David Kleinecke
One solution - which the standard committee did not adopt -
would have been to "shall" that the arguments of a function
be laid out in the same way as a struct with the same
types as the parameter definitions.
I hope that their reason for not doing this was something
more substantial than that struct use semicolons and functions
commas.
A better reason is that layout may not be easy to get, or violates the
platform's ABI.
Or that it is sometimes more efficient to pass some parameters in
registers and others on the stack, even in the same function call.
Or that if the compiler knows all about a function and its uses (it is
static, and doesn't escape) then it can inline it, use different non-ABI
registers, re-arrange some of the parameters, replace some with
constants, etc.

And for some targets, the natural alignment and layout on the stack
(matching register "push" and "pop" instructions, for example) does not
match the alignments you want for efficient memory usage in structs.

Yes, I think the committee had lots of good reasons for not forcing
parameter layout to be like a struct!
s***@casperkitty.com
2017-07-07 14:30:09 UTC
Permalink
Raw Message
Post by David Brown
Or that if the compiler knows all about a function and its uses (it is
static, and doesn't escape) then it can inline it, use different non-ABI
registers, re-arrange some of the parameters, replace some with
constants, etc.
Of course, a compiler could do such things in cases where a function never
performs arithmetic on the addresses of its arguments nor exposes those
addresses to code that might do so, while still being compatible with code
that expects arguments to be laid out in the fashion consistent with its
ABI.

The Standard doesn't require that any implementations define an ABI at all,
but if an execution platform says that calls to outside functions must be
handled a certain way, and calls from outside functions will be handled that
way, it seems silly to say that a programmer needing semantics equivalent to
calling a function the compiler knows nothing about should have to jump
through hoops to achieve them.
David Kleinecke
2017-07-07 22:42:11 UTC
Permalink
Raw Message
Post by s***@casperkitty.com
Post by David Brown
Or that if the compiler knows all about a function and its uses (it is
static, and doesn't escape) then it can inline it, use different non-ABI
registers, re-arrange some of the parameters, replace some with
constants, etc.
Of course, a compiler could do such things in cases where a function never
performs arithmetic on the addresses of its arguments nor exposes those
addresses to code that might do so, while still being compatible with code
that expects arguments to be laid out in the fashion consistent with its
ABI.
The Standard doesn't require that any implementations define an ABI at all,
but if an execution platform says that calls to outside functions must be
handled a certain way, and calls from outside functions will be handled that
way, it seems silly to say that a programmer needing semantics equivalent to
calling a function the compiler knows nothing about should have to jump
through hoops to achieve them.
Thanks guys. But you haven't offered any compelling reasons.
At least nothing that compels me. How specific hardware
implementations handle arguments and structs is not a reason
the standard committee should have paid attention to.

It would be child's play for a compiler to turn a function's
parameter set (expressed either new or old-style) into a
struct specification and make the function call a one argument
call to the struct. Since the struct is not a normal C struct
it could be allocated on the stack, if one wanted to, rather
than the heap. The call argument would be a pointer so a void
function would have a null argument.

But the standards committee didn't do this. So it is all a
pipe dream. Perhaps people inventing C alternatives might give
heed. It would make C more elegant (by removing variadics).
Richard Damon
2017-07-08 02:24:40 UTC
Permalink
Raw Message
Post by David Kleinecke
Post by s***@casperkitty.com
Post by David Brown
Or that if the compiler knows all about a function and its uses (it is
static, and doesn't escape) then it can inline it, use different non-ABI
registers, re-arrange some of the parameters, replace some with
constants, etc.
Of course, a compiler could do such things in cases where a function never
performs arithmetic on the addresses of its arguments nor exposes those
addresses to code that might do so, while still being compatible with code
that expects arguments to be laid out in the fashion consistent with its
ABI.
The Standard doesn't require that any implementations define an ABI at all,
but if an execution platform says that calls to outside functions must be
handled a certain way, and calls from outside functions will be handled that
way, it seems silly to say that a programmer needing semantics equivalent to
calling a function the compiler knows nothing about should have to jump
through hoops to achieve them.
Thanks guys. But you haven't offered any compelling reasons.
At least nothing that compels me. How specific hardware
implementations handle arguments and structs is not a reason
the standard committee should have paid attention to.
This statement flies against the basic design goal that started the C
language. Many pieces of the rules were done the way they were
specifically to allow some machine to be able to do things the way that
were natural for it.
Post by David Kleinecke
It would be child's play for a compiler to turn a function's
parameter set (expressed either new or old-style) into a
struct specification and make the function call a one argument
call to the struct. Since the struct is not a normal C struct
it could be allocated on the stack, if one wanted to, rather
than the heap. The call argument would be a pointer so a void
function would have a null argument.
This might be a fine attitude for a limited purpose language, but C was
intended to serve as a system building language, and saying ALL
functions will take only one direct parameter which will be a pointer to
a struct of all the parameters just doesn't work out. I will note that
if you want to define your function that way, there is nothing
stopping you from doing so: just create the struct type and pass it.

One other big issue is that C was designed as a very much '1-pass'
language. It was quite possible to write a compiler that just marched
through the file and generated machine code output without much need to
look ahead or go back. and what was required tended to fall into the
sort of thing that the linker would tend to be able to do. Again, with a
downward-growing stack, the compiler might need to look ahead quite a bit
(at all the parameters to the function) to determine how to even start
putting results into it.
Post by David Kleinecke
But the standards committee didn't do this. So it is all a
pipe dream. Perhaps people inventing C alternatives might give
heed. It would make C more elegant (by removing variadics)
Try to implement something with anywhere near the power of printf
without them. They are also a very good example of a function that
couldn't have its call parameters converted into a struct. For the sort
of language that C was aiming to be, variadics were important.
David Kleinecke
2017-07-08 03:00:40 UTC
Permalink
Raw Message
Post by Richard Damon
Post by David Kleinecke
Thanks guys. But you haven't offered any compelling reasons.
At least nothing that compels me. How specific hardware
implementations handle arguments and structs is not a reason
the standard committee should have paid attention to.
This statement flies against the basic design goal that started the C
language. Many pieces of the rules were done the way they were
specifically to allow some machine to be able to do things the way that
were natural for it.
Is there anything other than ++/-- that is aimed at specific
hardware?
Post by Richard Damon
Post by David Kleinecke
It would be child's play for a compiler to turn a function's
parameter set (expressed either new or old-style) into a
struct specification and make the function call a one argument
call to the struct. Since the struct is not a normal C struct
it could be allocated on the stack, if one wanted to, rather
than the heap. The call argument would be a pointer so a void
function would have a null argument.
This might be a fine attitude for a limited purpose language, but C was
intended to serve as a system building language, and saying ALL
functions will take only one direct parameter which will be a pointer to
a struct of all the parameters just doesn't work out.
I don't think you read my post carefully enough. I didn't
change what the programmer writing C code would do. I suggested
a change in how the compiler translated what the programmer
wrote.
Post by Richard Damon
One other big issue is that C was designed as a very much '1-pass'
language. It was quite possible to write a compiler that just marched
through the file and generated machine code output without much need to
look ahead or go back. and what was required tended to fall into the
sort of thing that the linker would tend to be able to do. Again, with a
down growing stack, the compiler might need to look ahead quite a bit
(at all the parameters to the function) to determine how to even start
putting results into it.
But that is, once again, child's play for the compiler (because it
has seen the function prototype) for all but the variadic functions.
With a variadic, all that is required is temporary compilation to an
internal stack, followed by turning the stack over into the actual
code. But maybe it would be better to write variadic arguments into
the heap rather than the stack. Lots of options for the compiler
writer.
Post by Richard Damon
Try to implement something with anywhere near the power of printf
without them. They are also a very good example of a function that
couldn't have its call parameters converted into a struct. For the sort
of language that C was aiming to be, variadics were important.
Of course I keep variadic functions. What I really meant (and I regret
making yet another mistake) was losing the macros that implement
the variadics in real-life C.
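For readers following along, the macros in question are the ones from
<stdarg.h>. A minimal sketch (not from the thread) of a variadic function
written with them:

```c
#include <stdarg.h>

/* Sum `count` int arguments using the <stdarg.h> macros. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);   /* the caller must really pass ints */
    va_end(ap);
    return total;
}
```

As with printf, nothing checks at runtime that the caller actually passed
`count` ints; the two sides just have to agree.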
s***@casperkitty.com
2017-07-08 05:18:13 UTC
Permalink
Raw Message
Post by David Kleinecke
Post by Richard Damon
This might be a fine attitude for a limited purpose language, but C was
intended to serve as a system building language, and saying ALL
functions will take only one direct parameter which will be a pointer to
a struct of all the parameters just doesn't work out.
I don't think you read my post carefully enough. I didn't
change what the programmer writing C code would do. I suggested
a change in how the compiler translated what the programmer
wrote.
C implementations suitable for systems programming use are often called
upon to interact with pieces of code over which the compiler has no
control. If I need a C program to call an outside library function which
expects arguments to be laid out in memory a certain way, I need the
compiler to lay out its arguments that way. The C Standard makes no
distinction between declarations of functions in other compilation
units processed by the same compiler, versus those outside the compiler's
control. If a platform has one common method for laying out function
arguments, having a compiler use that convention all the time may be easier
than requiring that programmers make a distinction not provided for by the
Standard.
Ben Bacarisse
2017-07-08 11:19:51 UTC
Permalink
Raw Message
Post by David Kleinecke
Post by Richard Damon
Post by David Kleinecke
Thanks guys. But you haven't offered any compelling reasons.
At least nothing that compels me. How specific hardware
implementations handle arguments and structs are not reasons
the standard committee should have paid attention to.
This statement flies against the basic design goal that started the C
language. Many pieces of the rules were done the way they were
specifically to allow some machine to be able to do things the way that
were natural for it.
Is there anything other than ++/-- that is aimed at specific
hardware?
++ and -- were not aimed at any specific hardware.

But that's not really what Richard is saying. For an example, read what
little C has to say about << and >> or even about + in signed types.
The lack of prescription is designed to allow as many implementations as
possible to use a single machine instruction.
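A small illustration (mine, not from the thread) of that lack of
prescription: unsigned shifts are fully specified, while the signed cases
are deliberately left open so any machine's native instruction will do:

```c
#include <limits.h>

/* Unsigned shifts are fully specified, so every implementation can map
   them straight onto its shift instruction. */
unsigned half_u(unsigned x)
{
    return x >> 1;
}

/* For signed operands the standard says less on purpose: right-shifting
   a negative value is implementation-defined, and signed overflow such
   as INT_MAX + 1 is undefined.  That looseness is what lets each machine
   use its native instruction without extra fix-up code. */
int half_s(int x)
{
    return x >> 1;   /* implementation-defined for x < 0 */
}
```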

<snip>
Post by David Kleinecke
Of course I keep variadic functions. What I really meant (and I regret
making yet another mistake) was losing the macros that implement
the variadics in real-life C.
Languages that pass arguments as structs (often called tuples) can have
a neat mechanism for variadic functions only when the type system for
tuples is far more sophisticated than C's is for structs. I don't see
how your suggestion would help with variadic functions at all, unless you
are planning other major overhauls.
--
Ben.
David Kleinecke
2017-07-08 17:14:12 UTC
Permalink
Raw Message
Post by Ben Bacarisse
Post by David Kleinecke
Post by Richard Damon
Post by David Kleinecke
Thanks guys. But you haven't offered any compelling reasons.
At least nothing that compels me. How specific hardware
implementations handle arguments and structs are not reasons
the standard committee should have paid attention to.
This statement flies against the basic design goal that started the C
language. Many pieces of the rules were done the way they were
specifically to allow some machine to be able to do things the way that
were natural for it.
Is there anything other than ++/-- that is aimed at specific
hardware?
++ and -- were not aimed at any specific hardware.
It is my understanding that these exist solely because there
were hardware instructions doing these things on the machine
Ritchie was using.
Post by Ben Bacarisse
But that's not really what Richard is saying. For an example, read what
little C has to say about << and >> or even about + in signed types.
The lack of prescription is designed to allow as many implementations as
possible to use a single machine instruction.
I have observed that the standard says '+' yields the "sum"
but never defines "sum".

However this is a different kind of accommodation to hardware
than what I was talking about. Being vague to cover a multitude
of possibilities is one thing. Adding a language feature to
allow a specific hardware capability is another. For example,
C has no simple way to use the rotation instructions of the
Intel 86 family. I don't know that anyone would use them were
they somehow expressible in C but they are there.
Post by Ben Bacarisse
<snip>
Post by David Kleinecke
Of course I keep variadic functions. What I really meant (and I regret
making yet another mistake) was losing the macros that implement
the variadics in real-life C.
Languages that pass arguments as structs (often called tuples) can have
a neat mechanism for variadic functions only when the type system for
tuples is far more sophisticated than C's is for structs. I don't see
how your suggestion would help with variadic functions at all, unless you
are planning other major overhauls.
I don't see that at all. On one side the calling program builds
the argument structure on the basis of the types of the arguments.
On the other side printf, for example, builds the argument
structure from its % control entries. Everybody knows one must be
sure the two match - but the runtime is not required to check.
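A minimal sketch of the proposed translation, with names invented purely
for illustration (no compiler actually does this): the compiler would turn
a two-argument call into a one-pointer call on a generated struct:

```c
/* What the programmer writes stays add(2, 3); under the proposed scheme
   the compiler would generate roughly the following.  All names here are
   hypothetical. */
struct add_args { int a; int b; };

int add_via_struct(const struct add_args *p)
{
    return p->a + p->b;
}

/* The call site add(2, 3) would compile to something like: */
int call_add(void)
{
    struct add_args t = { 2, 3 };   /* argument record built by the caller */
    return add_via_struct(&t);
}
```

The open question in the thread is what the struct type would be for a
variadic call, where the member list differs at every call site.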
Post by Ben Bacarisse
--
Ben.
Ben Bacarisse
2017-07-08 17:34:08 UTC
Permalink
Raw Message
Post by David Kleinecke
Post by Ben Bacarisse
Post by David Kleinecke
Post by Richard Damon
Post by David Kleinecke
Thanks guys. But you haven't offered any compelling reasons.
At least nothing that compels me. How specific hardware
implementations handle arguments and structs are not reasons
the standard committee should have paid attention to.
This statement flies against the basic design goal that started the C
language. Many pieces of the rules were done the way they were
specifically to allow some machine to be able to do things the way that
were natural for it.
Is there anything other than ++/-- that is aimed at specific
hardware?
++ and -- were not aimed at any specific hardware.
It is my understanding that these exist solely because there
were hardware instructions doing these things on the machine
Ritchie was using.
That's a common misconception. See Ritchie's "The Development of the C
Language" internal memo. (I have no link right now -- it's moved about
in the past).
Post by David Kleinecke
Post by Ben Bacarisse
But that's not really what Richard is saying. For an example, read what
little C has to say about << and >> or even about + in signed types.
The lack of prescription is designed to allow as many implementations as
possible to use a single machine instruction.
I have observed that the standard says '+' yields the "sum"
but never defines "sum".
There's little need since it's a well-known concept. The freedom comes
from what is permitted on overflow.
Post by David Kleinecke
However this is a different kind of accommodation to hardware
than what I was talking about. Being vague to cover a multitude
of possibilities is one thing. Adding a language feature to
allow a specific hardware capability is another.
Of course, but I don't think that's the case. Do you have an example?

C has many features that are designed to make efficient use of hardware
capabilities, but they are, by and large, not specific hardware
capabilities. For example, banning functions from being nested means
that only the simplest and most widely available stack accesses are
needed. Pointers are clearly intended to map to addresses, but pretty
much every machine has suitable addresses. These things are not
inspired by specific hardware.

BTW, rotates are used in many cryptographic functions and modern
compilers are good at turning the equivalent pair of shifts into a
rotate instruction.
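The shift pair Ben refers to is a well-known idiom; a common form
(assuming a 32-bit unsigned int):

```c
/* Portable 32-bit left rotate.  Masking n and using (-n & 31) avoids a
   shift by 32, which would be undefined; compilers such as GCC and Clang
   recognize this pattern and emit a single rotate instruction where the
   hardware has one. */
unsigned rotl32(unsigned x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> (-n & 31));
}
```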
Post by David Kleinecke
For example,
C has no simple way to use the rotation instructions of the
Intel 86 family. I don't know that anyone would use them were
they somehow expressible in C but they are there.
But this is a counter-example, isn't it? If some early machine at Bell
Labs. had had a rotate instruction and such an operator had got into C
you would have an example of C being designed for specific hardware.
But that did not happen, either because no such machine had rotate, or
because C was not designed as you seem to think.
Post by David Kleinecke
Post by Ben Bacarisse
<snip>
Post by David Kleinecke
Of course I keep variadic functions. What I really meant (and I regret
making yet another mistake) was losing the macros that implement
the variadics in real-life C.
Languages that pass arguments as structs (often called tuples) can have
a neat mechanism for variadic functions only when the type system for
tuples is far more sophisticated than C's is for structs. I don't see
how your suggestion would help with variadic functions at all, unless you
are planning other major overhauls.
I don't see that at all.
That's fine. Maybe you could show us what you mean with an example?

<snip>
--
Ben.
Keith Thompson
2017-07-08 20:35:28 UTC
Permalink
Raw Message
[...]
Post by Ben Bacarisse
Post by David Kleinecke
Post by Ben Bacarisse
Post by David Kleinecke
Is there anything other than ++/-- that is aimed at specific
hardware?
++ and -- were not aimed at any specific hardware.
It is my understanding that these exist solely because there
were hardware instructions doing these things on the machine
Ritchie was using.
That's a common misconception. See Ritchie's "The Development of the C
Language" internal memo. (I have no link right now -- it's moved about
in the past).
https://www.bell-labs.com/usr/dmr/www/chist.html

It's not an internal memo. It was published at the second History of
Programming Languages conference (HOPL II) in 1993.

Thompson went a step further by inventing the ++ and --
operators, which increment or decrement; their prefix or postfix
position determines whether the alteration occurs before or
after noting the value of the operand. They were not in the
earliest versions of B, but appeared along the way. People
often guess that they were created to use the auto-increment
and auto-decrement address modes provided by the DEC PDP-11
on which C and Unix first became popular. This is historically
impossible, since there was no PDP-11 when B was developed. The
PDP-7, however, did have a few `auto-increment' memory cells,
with the property that an indirect memory reference through
them incremented the cell. This feature probably suggested
such operators to Thompson; the generalization to make them
both prefix and postfix was his own. Indeed, the auto-increment
cells were not used directly in implementation of the operators,
and a stronger motivation for the innovation was probably his
observation that the translation of ++x was smaller than that
of x=x+1.

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Ben Bacarisse
2017-07-08 22:36:57 UTC
Permalink
Raw Message
Post by Keith Thompson
[...]
Post by Ben Bacarisse
Post by David Kleinecke
Post by Ben Bacarisse
Post by David Kleinecke
Is there anything other than ++/-- that is aimed at specific
hardware?
++ and -- were not aimed at any specific hardware.
It is my understanding that these exist solely because there
were hardware instructions doing these things on the machine
Ritchie was using.
That's a common misconception. See Ritchie's "The Development of the C
Language" internal memo. (I have no link right now -- it's moved about
in the past).
https://www.bell-labs.com/usr/dmr/www/chist.html
It's not an internal memo. It was published at the second History of
Programming Languages conference (HOPL II) in 1993.
I didn't know that (obviously!). Thank you for the reference.

<snip>
--
Ben.
David Kleinecke
2017-07-09 03:09:47 UTC
Permalink
Raw Message
Post by Ben Bacarisse
Post by David Kleinecke
Post by Ben Bacarisse
Post by David Kleinecke
Post by Richard Damon
Post by David Kleinecke
Thanks guys. But you haven't offered any compelling reasons.
At least nothing that compels me. How specific hardware
implementations handle arguments and structs are not reasons
the standard committee should have paid attention to.
This statement flies against the basic design goal that started the C
language. Many pieces of the rules were done the way they were
specifically to allow some machine to be able to do things the way that
were natural for it.
Is there anything other than ++/-- that is aimed at specific
hardware?
++ and -- were not aimed at any specific hardware.
It is my understanding that these exist solely because there
were hardware instructions doing these things on the machine
Ritchie was using.
That's a common misconception. See Ritchie's "The Development of the C
Language" internal memo. (I have no link right now -- it's moved about
in the past).
Post by David Kleinecke
Post by Ben Bacarisse
But that's not really what Richard is saying. For an example, read what
little C has to say about << and >> or even about + in signed types.
The lack of prescription is designed to allow as many implementations as
possible to use a single machine instruction.
I have observed that the standard says '+' yields the "sum"
but never defines "sum".
There's little need since it's a well-known concept. The freedom comes
from what is permitted on overflow.
Post by David Kleinecke
However this is a different kind of accommodation to hardware
than what I was talking about. Being vague to cover a multitude
of possibilities is one thing. Adding a language feature to
allow a specific hardware capability is another.
Of course, but I don't think that's the case. Do you have an example?
C has many features that are designed to make efficient use of hardware
capabilities, but they are, by and large, not specific hardware
capabilities. For example, banning functions from being nested means
that only the simplest and most widely available stack accesses are
needed. Pointers are clearly intended to map to addresses, but pretty
much every machine has suitable addresses. These things are not
inspired by specific hardware.
BTW, rotates are used in many cryptographic functions and modern
compilers are good at turning the equivalent pair of shifts into a
rotate instruction.
Post by David Kleinecke
For example,
C has no simple way to use the rotation instructions of the
Intel 86 family. I don't know that anyone would use them were
they somehow expressible in C but they are there.
But this is a counter-example, isn't it? If some early machine at Bell
Labs. had had a rotate instruction and such an operator had got into C
you would have an example of C being designed for specific hardware.
But that did not happen, either because no such machine had rotate, or
because C was not designed as you seem to think.
My contention was that C should not accommodate to specific
hardware features. I stated that it did not do so with the
single exception of ++/--. It appears that the situation with
++/-- I knew was an urban legend and even ++/-- is not an
exception. But some people felt that the intentional vagueness
of the standard was somehow an accommodation. It is, however
praiseworthy it may be, not the kind of accommodation I had in mind.

C has no counterpart to the sign bit as it is used in Intel 86
rotation hardware and I don't see how rotating through the sign
bit can be squeezed into C. Of course, every situation where
rotating through the sign bit would be useful can be coded using
less direct-to-hardware code in C. That is, a compiler optimizer
would have to be able to recognize when rotating through the sign
bit makes sense (and to know that the hardware will support it).
Post by Ben Bacarisse
Post by David Kleinecke
Post by Ben Bacarisse
<snip>
Post by David Kleinecke
Of course I keep variadic functions. What I really meant (and I regret
making yet another mistake) was losing the macros that implement
the variadics in real-life C.
Languages that pass arguments as structs (often called tuples) can have
a neat mechanism for variadic functions only when the type system for
tuples is far more sophisticated than C's is for structs. I don't see
how your suggestion would help with variadic functions at all, unless you
are planning other major overhauls.
I don't see that at all.
That's fine. Maybe you could show us what you mean with an example?
Maybe you show with an example why the type system needs to be
"far more sophisticated".
Richard Damon
2017-07-09 18:03:39 UTC
Permalink
Raw Message
Post by David Kleinecke
My contention was that C should not accommodate to specific
hardware features. I stated that it did not do so with the
single exception of ++/--. It appears that the situation with
++/-- I knew was an urban legend and even ++/-- is not an
exception. But some people felt that the intentional vagueness
of the standard was somehow an accommodation. It is, however
praiseworthy it may be, not the kind of accommodation I had in mind.
C has no counterpart to the sign bit as it is used in Intel 86
rotation hardware and I don't see how rotating through the sign
bit can be squeezed into C. Of course, every situation where
rotating through the sign bit would be useful can be coded using
less direct-to-hardware code in C. That is, a compiler optimizer
would have to be able to recognize when rotating through the sign
bit makes sense (and to know that the hardware will support it).
I think you misunderstand my comment. C does not provide direct access
to 'special' hardware features, but is defined loosely enough that it
can normally use the 'normal' hardware instructions directly without
needing to add code to handle corner cases where the particular machine
doesn't handle them 'to a standard'. Things like overflow and signed
shifts are only vaguely defined, letting the program get useful results
in the normal cases, but letting the implementation avoid extra overhead
to handle the less important/common cases.

I suspect that a Rotate operation wasn't defined in the language due to
the lack of a common useful base definition usable on all machines.
There may be some without a rotate instruction, and some may only have
rotate with carry, and defining the state of the 'Carry' bit doesn't
well match the C view of the machine.
Ben Bacarisse
2017-07-09 23:42:16 UTC
Permalink
Raw Message
David Kleinecke <***@gmail.com> writes:
<snip>
Post by David Kleinecke
My contention was that C should not accommodate to specific
hardware features. I stated that it did not do so with the
single exception of ++/--. It appears that the situation with
++/-- I knew was an urban legend and even ++/-- is not an
exception. But some people felt that the intentional vagueness
of the standard was somehow an accommodation. It is, however
praiseworthy it may be, not the kind of accommodation I had in mind.
I now have no idea where you stand on what you seemed to suggest were
errors the committee made by taking into account how common hardware
does things. C is stamped-through with the consequences of design
decisions made to allow simple, efficient implementation on a variety of
modern hardware. That's a good thing for C (it would not have been
right for many other languages) but you seemed to suggest otherwise:

"How specific hardware implementations handle arguments and structs
are not reasons the standard committee should have paid attention to."

<snip>
<snip>
Post by David Kleinecke
Post by Ben Bacarisse
Post by David Kleinecke
Post by Ben Bacarisse
Post by David Kleinecke
Of course I keep variadic functions. I really meant (and I regret
making yet another mistake) was losing the macros that implement
the variadics in real life C.
Languages that pass arguments as structs (often called tuples) can have
a neat mechanism for variadic functions only when the type system for
tuples is far more sophisticated than C's is for structs. I don't see
how you suggestion would help with variadic functions at all, unless you
are planning other major overhauls.
I don't see that at all.
That's fine. Maybe you could show us what you mean with an example?
Maybe you show with an example why the type system needs to be
"far more sophisticated".
I don't see how I can. I asked you because I don't understand your
proposal. Passing arguments as a struct would appear to limit the
arguments in the same way as simply declaring them. You must have
something more in mind.

Languages where a function can view the arguments as a tuple include
Juila and Python. You then get variadic functions for free because you
can do so much more with a tuple in these languages than what you can do
with a pointer to a struct in C (even when it's a void *).

(Lots of languages opt for using arrays or lists in this role.)
--
Ben.
David Kleinecke
2017-07-10 02:15:26 UTC
Permalink
Raw Message
Post by Ben Bacarisse
<snip>
Post by David Kleinecke
My contention was that C should not accommodate to specific
hardware features. I stated that it did not do so with the
single exception of ++/--. It appears that the situation with
++/-- I knew was an urban legend and even ++/-- is not an
exception. But some people felt that the intentional vagueness
of the standard was somehow an accommodation. It is, however
praiseworthy it may be, not the kind of accommodation I had in mind.
I now have no idea where you stand on what you seemed to suggest were
errors the committee made by taking into account how common hardware
does things. C is stamped-through with the consequences of design
decisions made to allow simple, efficient implementation on a variety of
modern hardware. That's a good thing for C (it would not have been
I was commenting on a design choice that the Standards
Committee did not make. I don't consider that suggesting
the committee made a mistake.

There are several posters to comp.lang.c who are designing
near-C languages and this kind of matter should interest
them. The rest of us can learn something by looking at
what the committee did not do.

To take another example - the Standards Committee did not
undertake to revise the operator precedences even though
K&R expressed the opinion (in the 1978 book) that they weren't
correct. I think the committee did right.
David Kleinecke
2017-07-10 02:24:06 UTC
Permalink
Raw Message
Post by Ben Bacarisse
<snip>
Post by David Kleinecke
Post by Ben Bacarisse
Post by David Kleinecke
Post by Ben Bacarisse
Post by David Kleinecke
Of course I keep variadic functions. I really meant (and I regret
making yet another mistake) was losing the macros that implement
the variadics in real life C.
Languages that pass arguments as structs (often called tuples) can have
a neat mechanism for variadic functions only when the type system for
tuples is far more sophisticated than C's is for structs. I don't see
how you suggestion would help with variadic functions at all, unless you
are planning other major overhauls.
I don't see that at all.
That's fine. Maybe you could show us what you mean with an example?
Maybe you show with an example why the type system needs to be
"far more sophisticated".
I don't see how I can. I asked you because I don't understand your
proposal. Passing arguments as a struct would appear to limit the
arguments in the same way as simply declaring them. You must have
something more in mind.
Languages where a function can view the arguments as a tuple include
Julia and Python. You then get variadic functions for free because you
can do so much more with a tuple in these languages than what you can do
with a pointer to a struct in C (even when it's a void *).
(Lots of languages opt for using arrays or lists in this role.)
I described what I had in mind in another recent post - but I
don't remember exactly where.

My underlying agenda is to simplify the support statements
for free-standing programs. The C89 standard suggests four
headers from the library must be included (actually "must"
for a "conforming free-standing" program - free-standing
programs that do not conform are not so burdened). Three
of these headers are data. But the fourth is the variadic
macros. It would be nice to get those macros out of the
way.
s***@casperkitty.com
2017-07-08 05:12:48 UTC
Permalink
Raw Message
Post by Richard Damon
One other big issue is that C was designed as a very much '1-pass'
language. It was quite possible to write a compiler that just marched
through the file and generated machine code output without much need to
look ahead or go back. and what was required tended to fall into the
sort of thing that the linker would tend to be able to do. Again, with a
down growing stack, the compiler might need to look ahead quite a bit
(at all the parameters to the function) to determine how to even start
putting results into it.
Pushing parameters in right-to-left order doesn't require any fancy look-
ahead logic. Without any optimization logic, a function call
foo(a+b,c+d,e+f) could be processed by observing one will have to make
a call to foo (let's say it's the 573rd function call thus far), then
generating:

jmp c573_prep
c573_go:
call _foo
jmp c573_done
c573_p0:
load and push(_a)
load and push(_b)
pop two numbers and push result
jmp c573_go
c573_p1:
load and push(_c)
load and push(_d)
pop two numbers and push result
jmp c573_p0
c573_p2:
load and push(e)
load and push(f)
pop two numbers and push result
jmp c573_p1
c573_prep .equ __c573_p2
c573_done:

Obviously that code would be horribly inefficient, but a peephole optimizer
could buffer some code before writing it and look for opportunities to
clean things up. For example, if before writing a piece of code of the form

jmp label4
label1:
... action 1
jmp label3

it encounters a jmp to label1, the code from label1 to the jmp label3 can
be moved from the previous location to the location of the jmp. Note that
if the buffer fills up and has to be written before out-of-order bits of
code get cleaned up, code will be less efficient than ideal, but it will
still work.
David Kleinecke
2017-07-08 17:14:34 UTC
Permalink
Raw Message
Post by s***@casperkitty.com
Post by Richard Damon
One other big issue is that C was designed as a very much '1-pass'
language. It was quite possible to write a compiler that just marched
through the file and generated machine code output without much need to
look ahead or go back. and what was required tended to fall into the
sort of thing that the linker would tend to be able to do. Again, with a
down growing stack, the compiler might need to look ahead quite a bit
(at all the parameters to the function) to determine how to even start
putting results into it.
Pushing parameters in right-to-left order doesn't require any fancy look-
ahead logic. Without any optimization logic, a function call
foo(a+b,c+d,e+f) could be processed by observing one will have to make
a call to foo (let's say it's the 573rd function call thus far), then
jmp c573_prep
call _foo
jmp c573_done
load and push(_a)
load and push(_b)
pop two numbers and push result
jmp c573_go
load and push(_c)
load and push(_d)
pop two numbers and push result
jmp c573_p0
load and push(e)
load and push(f)
pop two numbers and push result
jmp c573_p1
c573_prep .equ __c573_p2
Obviously that code would be horribly inefficient, but a peephole optimizer
could buffer some code before writing it and look for opportunities to
clean things up. For example, if before writing a piece of code of the form
jmp label4
... action 1
jmp label3
it encounters a jmp to label1, the code from label1 to the jmp label3 can
be moved from the previous location to the location of the jmp. Note that
if the buffer fills up and has to be written before out-of-order bits of
code get cleaned up, code will be less efficient than ideal, but it will
still work.
I agree. But I assume a backend on the compiler that straightens
out all these things.

That is, in terms of your example, I see four "segments" and
parts of two others. A segment is all the code from a label
to a jump. The parser creates a collection of segments. The
backend untangles this collection. If there is only one jump
to some label (and it's not a conditional jump) the segment
ending in the jump is immediately followed by the segment after
the label. So the backend changes your example to

jmp c573_prep
c573_prep .equ __c573_p2
c573_p2:
load and push(e)
load and push(f)
pop two numbers and push result
jmp c573_p1
c573_p1:
load and push(_c)
load and push(_d)
pop two numbers and push result
jmp c573_p0
c573_p0:
load and push(_a)
load and push(_b)
pop two numbers and push result
jmp c573_go
c573_go:
call _foo
jmp c573_done
c573_done:

and all the jumps and labels drop away giving

load and push(e)
load and push(f)
pop two numbers and push result
load and push(_c)
load and push(_d)
pop two numbers and push result
load and push(_a)
load and push(_b)
pop two numbers and push result
call _foo
just as expected. (I would use the computing stack a bit
differently - but never mind.)
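The untangling pass described here can be sketched as a toy model (mine,
not anyone's actual backend): each segment records the single segment it
jumps to, and emitting them in chain order makes the jump/label pairs
drop away:

```c
/* Toy model of the segment-untangling pass: segment i ends in an
   unconditional jump to next_seg[i], or falls off the end when
   next_seg[i] < 0.  With at most one incoming jump per segment, walking
   the chain from the entry segment yields the emission order.  The table
   mirrors the p2 -> p1 -> p0 -> go -> done chain in the example. */
enum { NSEG = 5 };

static const int next_seg[NSEG] = { 1, 2, 3, 4, -1 };

int emit_order(int entry, int order[])
{
    int n = 0;
    int s = entry;

    while (s >= 0 && n < NSEG) {
        order[n++] = s;      /* emit this segment next */
        s = next_seg[s];     /* follow its sole outgoing jump */
    }
    return n;   /* number of segments emitted */
}
```

A real backend would also have to handle conditional jumps and labels
with multiple incoming jumps, which is where the jumps that cannot be
dropped remain.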
s***@casperkitty.com
2017-07-08 21:59:22 UTC
Permalink
Raw Message
Post by David Kleinecke
Post by s***@casperkitty.com
jmp c573_prep
call _foo
jmp c573_done
load and push(_a)
load and push(_b)
pop two numbers and push result
jmp c573_go
load and push(_c)
load and push(_d)
pop two numbers and push result
jmp c573_p0
load and push(e)
load and push(f)
pop two numbers and push result
jmp c573_p1
c573_prep .equ __c573_p2
Obviously that code would be horribly inefficient, but a peephole optimizer
could buffer some code before writing it and look for opportunities to
clean things up.
I agree. But I assume a backend on the compiler that straightens
out all these things.
That is, in terms of your example, I see four "segments" and
parts of two others. A segment being all the code from a label
to a jump. The parser creates a collection of segments. The
backend untangles this collection. If there is only one jump
to some label (and it's not a conditional jump) the segment
ending in the jump is immediately followed by the segment after
the label. So the backend changes your example to
jmp c573_prep
c573_prep .equ __c573_p2
load and push(e)
load and push(f)
pop two numbers and push result
jmp c573_p1
load and push(_c)
load and push(_d)
pop two numbers and push result
jmp c573_p0
load and push(_a)
load and push(_b)
pop two numbers and push result
jmp c573_go
call _foo
jmp c573_done
and all the jumps and labels drop away giving
load and push(e)
load and push(f)
pop two numbers and push result
load and push(_c)
load and push(_d)
pop two numbers and push result
load and push(_a)
load and push(_b)
pop two numbers and push result
call _foo
just as expected. (I would use the computing stack a bit
differently - but never mind.)
If enough buffering is available to hold all the pending segments, they can
be output in execution order without need for jumps. My point was that even
if a compiler doesn't have enough memory to buffer such things, it can still
generate correct (albeit inefficient) code without such ability. While it
is common for compilers to generate assembly code, it's also possible to go
direct to machine code which can be written to disk if a compiler is able to
backpatch it later. If writing needs to be sequential, backpatching can be
done at runtime if the compiler includes a list of patches in the program to
be executed.
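A minimal sketch in C of the backpatching idea described above (opcode and encoding invented purely for illustration): emit a jump whose target is not yet known, record where its offset field lives, and patch the bytes once the target address is known.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy code buffer; a real compiler would write this to disk and
   either seek back to patch or record the patches for later. */
enum { CODE_MAX = 64 };
static unsigned char code[CODE_MAX];
static size_t code_len;

/* Emit a one-byte "jmp" opcode (0xE9 here, purely illustrative)
   followed by a 4-byte placeholder offset; return the position of
   the placeholder so it can be patched later. */
static size_t emit_jump(void)
{
    code[code_len++] = 0xE9;
    size_t fixup = code_len;
    memset(code + code_len, 0, 4);   /* placeholder offset */
    code_len += 4;
    return fixup;
}

/* Patch a previously emitted placeholder with the real target,
   stored little-endian. */
static void patch_jump(size_t fixup, long target)
{
    unsigned long u = (unsigned long)target;
    for (int i = 0; i < 4; i++)
        code[fixup + i] = (unsigned char)(u >> (8 * i));
}
```

If the output must be written sequentially, the (fixup, target) pairs can instead be saved into the emitted program and applied at startup, as suggested above.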
David Kleinecke
2017-07-09 03:19:59 UTC
Permalink
Raw Message
Post by s***@casperkitty.com
Post by David Kleinecke
Post by s***@casperkitty.com
jmp c573_prep
call _foo
jmp c573_done
load and push(_a)
load and push(_b)
pop two numbers and push result
jmp c573_go
load and push(_c)
load and push(_d)
pop two numbers and push result
jmp c573_p0
load and push(e)
load and push(f)
pop two numbers and push result
jmp c573_p1
c573_prep .equ __c573_p2
Obviously that code would be horribly inefficient, but a peephole optimizer
could buffer some code before writing it and look for opportunities to
clean things up.
I agree. But I assume a backend on the compiler that straightens
out all these things.
That is, in terms of your example, I see four "segments" and
parts of two others. A segment being all the code from a label
to a jump. The parser creates a collection of segments. The
backend untangles this collection. If there is only one jump
to some label (and it's not a conditional jump) the segment
ending in the jump is immediately followed by the segment after
the label. So the backend changes your example to
jmp c573_prep
c573_prep .equ __c573_p2
load and push(e)
load and push(f)
pop two numbers and push result
jmp c573_p1
load and push(_c)
load and push(_d)
pop two numbers and push result
jmp c573_p0
load and push(_a)
load and push(_b)
pop two numbers and push result
jmp c573_go
call _foo
jmp c573_done
and all the jumps and labels drop away giving
load and push(e)
load and push(f)
pop two numbers and push result
load and push(_c)
load and push(_d)
pop two numbers and push result
load and push(_a)
load and push(_b)
pop two numbers and push result
call _foo
just as expected. (I would use the computing stack a bit
differently - but no mind.)
If enough buffering is available to hold all the pending segments, they can
be output in execution order without need for jumps. My point was that even
if a compiler doesn't have enough memory to buffer such things, it can still
generate correct (albeit inefficient) code without such ability. While it
is common for compilers to generate assembly code, it's also possible to go
direct to machine code which can be written to disk if a compiler is able to
backpatch it later. If writing needs to be sequential, backpatching can be
done at runtime if the compiler includes a list of patches in the program to
be executed.
I use segments and jumps freely to express the output of
the parser in as simple a way as possible. Then I add a
backend (post-processor?) to re-arrange the code in the
best order. Essentially I convert the structured code back
into spaghetti code and then apply some general algorithms
to locate loops and so on. This sounds off-base but it seems
to work more reliably than many other approaches. I hesitate
to call it optimization because it rationalizes rather than
improves. Since the parser + postprocessor machinery is
already there it is obviously the way to implement function
calls as well.
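The splice rule described above (a segment ending in an unconditional jump absorbs its target when that label has exactly one incoming jump) can be sketched as follows. This is a toy model, not the poster's actual code; segment contents are omitted and labels double as array indices.

```c
#include <assert.h>

enum { MAX_SEG = 16 };

struct segment {
    int label;        /* label that starts this segment */
    int jump_to;      /* label this segment jumps to, or -1 if none */
    int next;         /* index of segment placed after this one, or -1 */
};

static struct segment seg[MAX_SEG];
static int nseg;
static int jump_count[MAX_SEG];   /* incoming jumps per label */

/* One merge pass: whenever a segment's jump target is a label with
   exactly one incoming (unconditional) jump, link the target segment
   directly after it and drop the jump. */
static void merge_segments(void)
{
    for (int i = 0; i < nseg; i++)
        if (seg[i].jump_to >= 0)
            jump_count[seg[i].jump_to]++;

    for (int i = 0; i < nseg; i++) {
        int t = seg[i].jump_to;
        if (t >= 0 && jump_count[t] == 1) {
            for (int j = 0; j < nseg; j++)
                if (seg[j].label == t) {
                    seg[i].next = j;      /* splice target after i */
                    seg[i].jump_to = -1;  /* jump no longer needed */
                    break;
                }
        }
    }
}
```

Repeating the pass until nothing changes untangles a whole collection of segments into straight-line order, which is the effect shown in the example above.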
s***@casperkitty.com
2017-07-07 06:23:13 UTC
Permalink
Raw Message
Post by Richard Damon
Post by David Kleinecke
One solution - which the standard committee did not adopt -
would have been to "shall" that the arguments of a function
be laid out in the same way as a struct with the same
types as the parameter definitions.
I hope that their reason for not doing this was something
more substantial than that struct use semicolons and functions
commas.
A better reason is that layout may not be easy to get, or violates the
platform's ABI.
One big issue is that many processors have downward growing stacks, and
place the parameters on the stack in the order declared (sort of needed
for variadic functions); for machines like this, given a function like
int foo(int a, int b)
the address of a is likely larger than the address of b, while for a
struct I believe that given
struct foo {
int a;
int b;
};
then for a given object of type foo, the address of a needs to be less
than the address of b.
On many (probably most) C implementations with a downward-growing
stack, arguments are passed right-to-left, so they end up on the stack
in order of increasing address. Such behavior was likely inherited from
the 1974 Unix compiler.

On many platforms the layout of arguments in memory won't match that of
a struct containing the same items. Among other things individually-pushed
objects on the stack may have different alignment requirements from structure
members. Further, on some platforms it will often be more efficient to
pass some arguments in registers than on the stack. This may be handled by
having directives which are platform-specific (but usually shared among
quality implementations for the platform) to control how arguments should
be passed to different functions, or by using a register-passing convention
that will allow a called function to use a fixed sequence of code to rearrange
the stack to the way it would be if all arguments were passed right to left.
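The contrast drawn above can be made concrete: the standard guarantees that struct members are laid out in declaration order at increasing offsets, while the relative addresses of function parameters are whatever the ABI chose. A small sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Struct members must appear in declaration order at increasing
   offsets, so a's offset is always below b's. */
struct foo {
    int a;
    int b;
};

static int a_before_b(void)
{
    return offsetof(struct foo, a) < offsetof(struct foo, b);
}

/* Parameter addresses, by contrast, depend on the calling
   convention; comparing them tells you about the implementation,
   not the language. */
static void show_param_addresses(int a, int b)
{
    printf("&a = %p, &b = %p\n", (void *)&a, (void *)&b);
}
```

On a machine that pushes arguments right-to-left onto a downward-growing stack, `show_param_addresses` will typically print `&a` below `&b`, but nothing in the standard requires that.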
Ben Bacarisse
2017-07-07 09:27:44 UTC
Permalink
Raw Message
David Kleinecke <***@gmail.com> writes:
<snip>
Post by David Kleinecke
One solution - which the standard committee did not adopt -
would have been to "shall" that the arguments of a function
be laid out in the same way as a struct with the same
types as the parameter definitions.
I hope that their reason for not doing this was something
more substantial than that struct use semicolons and functions
commas.
Perhaps enough has been said, but I'll just add that, at the time the
committee was first standardising C, semicolons /were/ used. Function
definitions looked like this:

f(x, y, n, p)
double x, y;
int n, *p;
{ ...
}

I've always thought it a shame that C did not keep this compact style
for prototypes by simply putting the declarations into the ()s. A
function taking more than one const unsigned long long * argument is
overly wordy in my opinion.
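For comparison, here is a prototype-style equivalent of the old-style definition quoted above (taking the fourth parameter to be the int pointer, with placeholder bodies invented for illustration), alongside the kind of fully spelled-out parameter list being criticized:

```c
#include <assert.h>

/* Prototype-style form of the K&R-style definition above. */
int f(double x, double y, int n, int *p)
{
    (void)x; (void)y; (void)p;   /* unused in this illustration */
    return n;                    /* placeholder body */
}

/* Two const unsigned long long * parameters, each spelled out in
   full -- the wordiness the post refers to. */
int equal_u64(const unsigned long long *a, const unsigned long long *b)
{
    return *a == *b;
}
```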
--
Ben.
Keith Thompson
2017-07-06 16:18:07 UTC
Permalink
Raw Message
[...]
Post by bartc
Post by David Brown
These headers are part of a /C standard library/, which is part of the
/implementation/ (along with the compiler). They are /not/ platform
headers, or OS headers.
We're talking about the headers which are in /usr/include of a Linux system.
I thought that was a special place, and not for containing specific
headers for specific C implementations, or even for non-C languages.
I'm not sure where you got that idea, but it's not true, or at least
it's not that simple.
Post by bartc
So the question I've been asking is, for which C implementation are
they for? The answer appears to be that they are for a special version
of C called GnuC. But that isn't quite enough, as I don't think GnuC
is a compiler.
GNU C (which is what the gcc documentation calls it) is a dialect
of the C language supported by the gcc compiler. gcc is a compiler,
which can be part of a C implementation; GNU C is a language.

The files in /usr/include are *part* of a C implementation.
There are a number of different C implementations that use them,
or that use some of them. Some of those implementations support
the GNU C dialect. Others do not.

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
David Brown
2017-07-06 18:42:23 UTC
Permalink
Raw Message
Post by Keith Thompson
[...]
Post by bartc
Post by David Brown
These headers are part of a /C standard library/, which is part of the
/implementation/ (along with the compiler). They are /not/ platform
headers, or OS headers.
We're talking about the headers which are in /usr/include of a Linux system.
I thought that was a special place, and not for containing specific
headers for specific C implementations, or even for non-C languages.
I'm not sure where you got that idea, but it's not true, or at least
it's not that simple.
Post by bartc
So the question I've been asking is, for which C implementation are
they for? The answer appears to be that they are for a special version
of C called GnuC. But that isn't quite enough, as I don't think GnuC
is a compiler.
GNU C (which is what the gcc documentation calls it) is a dialect
of the C language supported by the gcc compiler. gcc is a compiler,
which can be part of a C implementation; GNU C is a language.
The files in /usr/include are *part* of a C implementation.
There are a number of different C implementations that use them,
or that use some of them. Some of those implementations support
the GNU C dialect. Others do not.
To be accurate - /some/ of the files there are part of a C
implementation. When I install "libsqlite3-dev" package on my Linux
system, and it puts "sqlite3.h" and "sqlite3ext.h" in /usr/include,
these files do not make up part of any C implementation.
Keith Thompson
2017-07-06 19:02:00 UTC
Permalink
Raw Message
[...]
Post by David Brown
Post by Keith Thompson
The files in /usr/include are *part* of a C implementation.
There are a number of different C implementations that use them,
or that use some of them. Some of those implementations support
the GNU C dialect. Others do not.
To be accurate - /some/ of the files there are part of a C
implementation. When I install "libsqlite3-dev" package on my Linux
system, and it puts "sqlite3.h" and "sqlite3ext.h" in /usr/include,
these files do not make up part of any C implementation.
Quite right, thanks.

Though I suppose one could argue that they implement an extension:

A conforming implementation may have extensions (including
additional library functions), provided they do not alter the
behavior of any strictly conforming program. (N1570 4p6)
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Tim Rentsch
2017-07-10 16:06:48 UTC
Permalink
Raw Message
Post by Keith Thompson
[...]
Post by David Brown
Post by Keith Thompson
The files in /usr/include are *part* of a C implementation.
There are a number of different C implementations that use them,
or that use some of them. Some of those implementations support
the GNU C dialect. Others do not.
To be accurate - /some/ of the files there are part of a C
implementation. When I install "libsqlite3-dev" package on my Linux
system, and it puts "sqlite3.h" and "sqlite3ext.h" in /usr/include,
these files do not make up part of any C implementation.
Quite right, thanks.
[...]
Only if the implementation in question documents them.
Tim Rentsch
2017-07-10 16:20:15 UTC
Permalink
Raw Message
[Please excuse the broken threading, my news client got confused
and messed things up irrecoverably :(]
[...]
%i or %d would do but would be less clear. Are you saying that %x cannot
be used portably to print negative numbers in a hexadecimal form? UB
usually means that anything could happen, doesn't it? In this case, does
that technically mean more is at risk that just getting erroneous output?
That's correct. The "%x" format requires an argument of type unsigned
int. You can get away with giving it an argument of type signed int
*if* the value is representable in both types (this guarantee is made in
a non-normative footnote).
printf("0x%x\n", -1);
has undefined behavior. In practice, it's very very likely to
print the value of UINT_MAX in hexadecimal (typically 0xffffffff).
It would print a different value on a system that uses something
other than 2's-complement (you're unlikely to encounter such a
system). A conforming compiler *could* recognize that the behavior
is undefined and reject it, or generate code that does something
unexpected, but that's unlikely.
What's the "right" way to print signed numbers as hex?
Convert to unsigned.
printf("0x%x\n", (unsigned)-1);
has defined behavior that matches the likely behavior of
printf("0x%x\n", -1). Note that this won't print the value of -1;
-1 and 0xffffffff (equivalently 4294967295) are two distinct values.
int n = -1;
if (n >= 0) {
printf("0x%x\n", (unsigned)n);
}
else {
printf("-0x%x\n", (unsigned)-n);
}
There might still be a glitch for n==INT_MIN, since in that case -n can
overflow.
Here is a way that doesn't use if(), and also has no problem
with INT_MIN:

int n = -1;
printf( "%c0x%x\n", " -"[n<0], n<0 ? 0u-n : n );
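Wrapped up as a function, the convert-to-unsigned approach from the posts above (using `0u - n` for the negative case so that `n == INT_MIN` needs no special treatment) might look like this sketch; the function name is invented:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Format a signed int in hex, printing a leading '-' for negative
   values. 0u - (unsigned)n computes the magnitude in unsigned
   arithmetic, which is well defined even for INT_MIN. */
static int signed_hex(char *buf, size_t size, int n)
{
    if (n < 0)
        return snprintf(buf, size, "-0x%x", 0u - (unsigned)n);
    return snprintf(buf, size, "0x%x", (unsigned)n);
}
```

Unlike `printf("0x%x", n)`, this never passes a negative value to `%x`, so the behavior is defined for every int.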
David Brown
2017-07-06 09:21:06 UTC
Permalink
Raw Message
Post by bartc
Is __SIZE_TYPE__ another built-in feature of GnuC? Or is it itself
defined elsewhere?
This is the kind of thing that does my head in. Is no one capable of
writing just a straight declaration? Getting back to my own sane
typedef unsigned long long int size_t;
A breath of fresh air.
Just another note about this - redefining typedefs is not allowed in
C99, even if the types are the same. (It /is/ allowed in C11.) So if
you have that typedef in <stddef.h>, and also in <stdlib.h>, then you
will have a conflict in C99 (and C90) if both headers are included.

Of course, you can allow such duplicate typedefs as an extension in your
compiler. You can also allow them just within standard headers - they
are part of the implementation and you can have special rules there.

But if you are writing a library that can be used with different
compilers, and different options, you have to be much more careful - and
you end up with ugly conditional compilation.
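The "ugly conditional compilation" mentioned above typically takes the form of a guard macro around the typedef, so that two headers agreeing on the same alias do not trip the C90/C99 rule against repeated typedefs. A hypothetical sketch (all names invented):

```c
#include <assert.h>

/* Guard pattern for a library header: the typedef is only seen
   once per translation unit, no matter how many cooperating
   headers use the same guard. */
#ifndef MYLIB_HAVE_U64
#define MYLIB_HAVE_U64
typedef unsigned long long mylib_u64;
#endif
```

Each header that wants the alias repeats the same guarded block; under C11 the guard is unnecessary (identical typedefs may be repeated), but it keeps the header usable under the older standards.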
bartc
2017-07-06 09:50:25 UTC
Permalink
Raw Message
Post by David Brown
Post by bartc
This is the kind of thing that does my head in. Is no one capable of
writing just a straight declaration? Getting back to my own sane
typedef unsigned long long int size_t;
Just another note about this - redefining typedefs is not allowed in
C99, even if the types are the same. (It /is/ allowed in C11.) So if
you have that typedef in <stddef.h>, and also in <stdlib.h>,
My stdlib includes stddef which contains these defs (I thought that was
what it was for), and the latter can be guarded. So such things, NULL
etc, are only defined in one place.

then you
Post by David Brown
will have a conflict in C99 (and C90) if both headers are included.
Of course, you can allow such duplicate typedefs as an extension in your
compiler.
This:

typedef unsigned int size_t;
typedef unsigned int size_t;
typedef unsigned int size_t;
typedef unsigned int size_t;

compiles without warnings, in default modes, across 5 compilers
including gcc. Only MSVC raises actual errors. But I guess they like to
do their own thing.

With that amount of support, then user programs are also likely to use
that feature, so you may be obliged to allow it.
--
bartc
Ian Collins
2017-07-06 10:00:51 UTC
Permalink
Raw Message
Post by bartc
typedef unsigned int size_t;
typedef unsigned int size_t;
typedef unsigned int size_t;
typedef unsigned int size_t;
compiles without warnings, in default modes, across 5 compilers
including gcc. Only MSVC raises actual errors. But I guess they like to
do their own thing.
gcc ~/temp/x.c -std=c99
/home/ian/temp/x.c:4:24: error: redefinition of typedef 'size_t'
/home/ian/temp/x.c:3:22: note: previous declaration of 'size_t' was here
/home/ian/temp/x.c:5:24: error: redefinition of typedef 'size_t'
/home/ian/temp/x.c:4:24: note: previous declaration of 'size_t' was here
/home/ian/temp/x.c:6:24: error: redefinition of typedef 'size_t'
/home/ian/temp/x.c:5:24: note: previous declaration of 'size_t' was here
--
Ian
Ben Bacarisse
2017-07-06 10:18:18 UTC
Permalink
Raw Message
bartc <***@freeuk.com> writes:
<snip>
Post by bartc
typedef unsigned int size_t;
typedef unsigned int size_t;
typedef unsigned int size_t;
typedef unsigned int size_t;
compiles without warnings, in default modes, across 5 compilers
including gcc.
Few compilers are conforming in default mode.
Post by bartc
Only MSVC raises actual errors. But I guess they like to do their own
thing.
With that amount of support, then user programs are also likely to use
that feature, so you may be obliged to allow it.
If you implement C11 you /must/ allow it. If you implement C99 or
earlier you /must/ produce a diagnostic. If you implement "Bart C",
like gcc implements "GNU C" (in default mode), you are free to do what
you like.

On a practical level, quite a bit of standards conformance can be
handled by setting flags to indicate what messages to give. You parse
and process C11 but turn on warnings (or errors) about duplicate
typedefs, _Generic, mixed statements and declarations and so on as need
be.
--
Ben.
Keith Thompson
2017-07-06 16:36:30 UTC
Permalink
Raw Message
[...]
Post by bartc
Post by David Brown
Post by bartc
typedef unsigned long long int size_t;
Just another note about this - redefining typedefs is not allowed in
C99, even if the types are the same. (It /is/ allowed in C11.) So if
you have that typedef in <stddef.h>, and also in <stdlib.h>,
My stdlib includes stddef which contains these defs (I thought that was
what it was for), and the latter can be guarded. So such things, NULL
etc, are only defined in one place.
Then you have a non-conforming implementation. Standard headers do not
include each other unless specifically stated (for example, <inttypes.h>
includes <stdint.h>).

For this translation unit:

#include <stdlib.h>
ptrdiff_t p;

ptrdiff_t is an undeclared identifier, and must be diagnosed. (See
N1570 7.1.3; the identifier "ptrdiff_t" is reserved only if <stddef.h>
is #included.)

[...]
Post by bartc
typedef unsigned int size_t;
typedef unsigned int size_t;
typedef unsigned int size_t;
typedef unsigned int size_t;
compiles without warnings, in default modes, across 5 compilers
including gcc. Only MSVC raises actual errors. But I guess they like to
do their own thing.
With that amount of support, then user programs are also likely to use
that feature, so you may be obliged to allow it.
Not if you want a conforming C90 or C99 implementation. If you're not
interested in conformance, please say so.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
bartc
2017-07-09 20:26:06 UTC
Permalink
Raw Message
Post by Keith Thompson
Post by bartc
typedef unsigned int size_t;
typedef unsigned int size_t;
typedef unsigned int size_t;
typedef unsigned int size_t;
compiles without warnings, in default modes, across 5 compilers
including gcc. Only MSVC raises actual errors. But I guess they like to
do their own thing.
With that amount of support, then user programs are also likely to use
that feature, so you may be obliged to allow it.
Not if you want a conforming C90 or C99 implementation. If you're not
interested in conformance, please say so.
Didn't I just say above that 5 out of 6 compilers I tried /on my platform/
allow it? Perhaps they're not too interested in conformance either!
Except for MSVC, which is usually the one that has poor C99 conformance.

The fact remains that if someone writes a program with a duplicate
typedef, and compiles and runs it on one or more of these 5 out of 6
with default options, then in order for a new compiler to build the same
code, it will also have to allow it.

The same goes for a dozen features that strictly ought not to be allowed.

C doesn't help by allowing so many things to be duplicated or
multiply-defined that it is easy to get the idea that anything goes.
/It/ should have been much stricter.
--
Bartc
Ian Collins
2017-07-10 05:54:20 UTC
Permalink
Raw Message
Post by bartc
Post by Keith Thompson
Post by bartc
typedef unsigned int size_t;
typedef unsigned int size_t;
typedef unsigned int size_t;
typedef unsigned int size_t;
compiles without warnings, in default modes, across 5 compilers
including gcc. Only MSVC raises actual errors. But I guess they like to
do their own thing.
With that amount of support, then user programs are also likely to use
that feature, so you may be obliged to allow it.
Not if you want a conforming C90 or C99 implementation. If you're not
interested in conformance, please say so.
Didn't I just say above that 5 out of 6 compilers I tried /on my platform/
allow it?
You said "compiles without warnings, in default modes, across 5
compilers". Presumably none of them conform to C90 or C99 in default mode.

None of the compilers on my system do either; gcc defaults to its own
conventions and the native compiler defaults to C11, so the code
compiles fine...
Post by bartc
The fact remains that if someone writes a program with a duplicate
typedef, and compiles and runs it on one or more of these 5 out of 6
with default options, then in order for a new compiler to build the same
code, it will also have to allow it.
Anyone who builds with default options on most compilers is asking for
trouble, duplicate typedefs or not.
--
Ian
bartc
2017-07-10 08:52:53 UTC
Permalink
Raw Message
Post by Ian Collins
Didn't I [say] above that 5 out of 6 compilers I tried /on my platform/
allow it?
You said "compiles without warnings, in default modes, across 5
compilers". Presumably none of them conform to C90 or C99 in default mode.
None of the compilers on my system do either; gcc defaults to its own
conventions and the native compiler defaults to C11, so the code
compiles fine...
I tried this on my gcc:

c:\c>type c.c
typedef unsigned int size_t;
typedef unsigned int size_t;
typedef unsigned int size_t;
typedef unsigned int size_t;

c:\c>\tdm\bin\gcc -Wextra -Wall -std=c99 -c c.c

c:\c>\tdm\bin\gcc --version
gcc (tdm64-1) 5.1.0 ...

So even with C99, gcc says nothing. I can't help but think that it's not
taking this fatal error seriously! It needs -Wpedantic to give even a
warning.
Post by Ian Collins
Anyone who builds with default options on most compilers is asking for
trouble, duplicate typedefs or not.
Be that as it may, if a programmer does use default options (perhaps so
that anyone else can also compile with default options), then you want
to be able to compile that code if possible, and not fail to compile
purely on this technicality.

And as I showed above, someone can give quite a lot of options, and the
code could still pass.

However I /will/ insist on reporting some errors when /I/ think they are
serious enough. Repeating a typedef for the same name, I don't consider
serious provided the same type is involved.
--
bartc
Ian Collins
2017-07-10 09:19:55 UTC
Permalink
Raw Message
Post by bartc
Post by Ian Collins
Didn't I [say] above that 5 out of 6 compilers I tried /on my platform/
allow it?
You said "compiles without warnings, in default modes, across 5
compilers". Presumably none of them conform to C90 or C99 in default mode.
None of the compilers on my system do either, gcc defaults to it's own
conventions and the native compiler defaults to C11, so the code
compiles fine...
c:\c>type c.c
typedef unsigned int size_t;
typedef unsigned int size_t;
typedef unsigned int size_t;
typedef unsigned int size_t;
c:\c>\tdm\bin\gcc -Wextra -Wall -std=c99 -c c.c
c:\c>\tdm\bin\gcc --version
gcc (tdm64-1) 5.1.0 ...
So even with C99, gcc says nothing. I can't but think that it's not
taking this fatal error seriously! It needs -Wpedantic to give even a
warning.
Maybe there was a bug in gcc 5? I have 5.3.1 on my box and it also
reports nothing. 4.x and 6.x do issue the expected diagnostics.
Post by bartc
Post by Ian Collins
Anyone who builds with default options on most compilers is asking for
trouble, duplicate typedefs or not.
Be that as it may, if a programmer does use default options (perhaps so
that anyone else can also compile with default options), then you want
to be able to compile that code if possible, and not fail to compile
purely on this technicality.
Using the default options to be able to compile that code if possible is
a great way of avoiding portability... I had no end of problems porting
poorly written gcc/Linux code to the native compiler on my system.
Thankfully most open-source code now uses sensible options to improve
portability.
Post by bartc
And as I showed above, someone can give quite a lot of options, and the
code could still pass.
Probably down to a bug!
Post by bartc
However I /will/ insist on reporting some errors when /I/ think they are
serious enough. Repeating a typedef for the same name, I don't consider
serious provided the same type is involved.
Then you won't conform to the old standards, so I assume you are aiming
for C11?
--
Ian
bartc
2017-07-10 09:37:21 UTC
Permalink
Raw Message
Post by Ian Collins
Post by bartc
c:\c>\tdm\bin\gcc -Wextra -Wall -std=c99 -c c.c
c:\c>\tdm\bin\gcc --version
gcc (tdm64-1) 5.1.0 ...
So even with C99, gcc says nothing. I can't but think that it's not
taking this fatal error seriously! It needs -Wpedantic to give even a
warning.
Maybe there was a bug in gcc 5? I have 5.3.1 on my box and is also
reports nothing. 4.x and 6.x do issue the expected diagnostics.
I also did this:

c:\c>gcc -Wextra -Wall -std=c99 -c c.c

c:\c>gcc --version
gcc (GCC) 4.8.1 ...

('gcc' is a rogue installation somewhere on my machine, but is still a
working 4.x version.)
Post by Ian Collins
Using the default options to be able to compile that code if possible is
a great way of avoiding portability... I had no end of problems porting
poorly written gcc/Linux code to the native compiler on my system.
Thankfully most opensource code no uses sensible options to improve
portability.
My open source stuff stipulates just one option: -m32 or -m64 depending
on whether there is a 32 or 64 in the filename.

For other compilers that is specified in words as I have no idea what
the option would be (actually, there are usually separate compilers for
32 and 64 bits).

The idea of stipulating a long list of obscure, gcc-centric options as a
way of /improving/ portability (on top of bash scripts, makefiles et al)
is rather perverse...
Post by Ian Collins
Post by bartc
However I /will/ insist on reporting some errors when /I/ think they are
serious enough. Repeating a typedef for the same name, I don't consider
serious provided the same type is involved.
Then you won't conform to the old standards, so I assume you are aiming
for C11?
I'm not paying much attention to such standards at the moment. Other
things have priority, such as whether something works at all.
--
bartc
Ian Collins
2017-07-10 09:44:13 UTC
Permalink
Raw Message
Post by bartc
Post by Ian Collins
Post by bartc
c:\c>\tdm\bin\gcc -Wextra -Wall -std=c99 -c c.c
c:\c>\tdm\bin\gcc --version
gcc (tdm64-1) 5.1.0 ...
So even with C99, gcc says nothing. I can't but think that it's not
taking this fatal error seriously! It needs -Wpedantic to give even a
warning.
Maybe there was a bug in gcc 5? I have 5.3.1 on my box and is also
reports nothing. 4.x and 6.x do issue the expected diagnostics.
c:\c>gcc -Wextra -Wall -std=c99 -c c.c
c:\c>gcc --version
gcc (GCC) 4.8.1 ...
('gcc' is a rogue installation somewhere on my machine, but is still a
working 4.x version.)
Yes, odd. 4.5 complains, 4.9 does not!
Post by bartc
Post by Ian Collins
Using the default options to be able to compile that code if possible is
a great way of avoiding portability... I had no end of problems porting
poorly written gcc/Linux code to the native compiler on my system.
Thankfully most opensource code no uses sensible options to improve
portability.
My open source stuff stipulates just one option: -m32 or -m64 depending
on whether there is a 32 or 64 in the filename.
For other compilers that is specified in words as I have no idea what
the option would be (actually, there are usually separate compilers for
32 and 64 bits).
Not often, at least in Unix land there is one compiler for both.
Post by bartc
The idea of stipulating a long list of obscure, gcc-centric options as a
way of /improving/ portability (on top of bash scripts, makefiles et al)
is rather perverse...
I don't know about Windows (Visual Studio pays lip service to C
standard compatibility), but every other compiler I use accepts -std=cxx
as an option which should be enough for decent portability.
--
Ian
Keith Thompson
2017-07-10 16:17:45 UTC
Permalink
Raw Message
Ian Collins <ian-***@hotmail.com> writes:
[...]
Post by Ian Collins
Post by bartc
c:\c>type c.c
typedef unsigned int size_t;
typedef unsigned int size_t;
typedef unsigned int size_t;
typedef unsigned int size_t;
c:\c>\tdm\bin\gcc -Wextra -Wall -std=c99 -c c.c
c:\c>\tdm\bin\gcc --version
gcc (tdm64-1) 5.1.0 ...
So even with C99, gcc says nothing. I can't but think that it's not
taking this fatal error seriously! It needs -Wpedantic to give even a
warning.
That's correct. `gcc -Wextra -Wall -std=c99` is not a conforming C99
compiler. `gcc -Wextra -Wall -Wpedantic -std=c99` is (or at least
attempts to be).
Post by Ian Collins
Maybe there was a bug in gcc 5? I have 5.3.1 on my box and is also
reports nothing. 4.x and 6.x do issue the expected diagnostics.
Not a bug, see above.

"gcc -std=..." is intended to correctly compile *correct* C code in
accordance with the specified standard. It deliberately does not
issue all required diagnostics. You need to use "-pedantic", or
"-Wpedantic" (I believe they're equivalent), or "-pedantic-errors"
to get full conformance (modulo bugs).

I personally dislike many of the choices the authors of gcc have
made regarding how the compiler behaves in its default mode. But at
least it provides a mode that does attempt to conform fully to the
C standard.
Post by Ian Collins
Post by bartc
Post by Ian Collins
Anyone who builds with default options on most compilers is asking for
trouble, duplicate typedefs or not.
Be that as it may, if a programmer does use default options (perhaps so
that anyone else can also compile with default options), then you want
to be able to compile that code if possible, and not fail to compile
purely on this technicality.
So you want to support extensions supported by other compilers. That's
great. But if you do not diagnose duplicate typedefs in some mode, then
you do not have a conforming C90 or C99 compiler.
Post by Ian Collins
Using the default options to be able to compile that code if possible is
a great way of avoiding portability... I had no end of problems porting
poorly written gcc/Linux code to the native compiler on my system.
Thankfully most opensource code no uses sensible options to improve
portability.
Post by bartc
And as I showed above, someone can give quite a lot of options, and the
code could still pass.
probably down to a bug!
Post by bartc
However I /will/ insist on reporting some errors when /I/ think they are
serious enough. Repeating a typedef for the same name, I don't consider
serious provided the same type is involved.
One of the things I dislike about gcc is that its authors have made
decisions about which diagnostics, required by the standard, they
consider important and which are not, and have imposed those decisions
on their users. Fortunately they've (grudgingly, it seems) provided
options to enforce the standard's rules, and to make violations fatal.

You seem to be following a similar path.
Post by Ian Collins
Then you won't conform to the old standards, so I assume you are aiming
for C11?
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2017-07-10 16:23:35 UTC
Post by Keith Thompson
So you want to support extensions supported by other compilers. That's
great. But if you do not diagnose duplicate typedefs in some mode, then
you do not have a conforming C90 or C99 compiler.
All that is required for conformance is that a compiler have a mode in
which it will produce a diagnostic if given any program containing a
constraint violation. If there exists any command-line flag that would cause the
compiler to produce a diagnostic regardless of the supplied source text,
but would not otherwise interfere with operation, such a flag would be
sufficient to satisfy the Standard with regard to all source texts that do
not contain #error directives.
Keith Thompson
2017-07-10 16:44:27 UTC
Post by s***@casperkitty.com
Post by Keith Thompson
So you want to support extensions supported by other compilers. That's
great. But if you do not diagnose duplicate typedefs in some mode, then
you do not have a conforming C90 or C99 compiler.
All that is required for conformance is that a compiler have a mode in
which it will produce a diagnostic if given any program containing a
constraint violation.
It also has to diagnose violations of syntax rules and (this is the hard
part) it has to implement the semantics correctly. (Yes, it can do so
only for the "one program" of 5.2.4.1 if the author doesn't care about
usefulness.)
Post by s***@casperkitty.com
If there exists any command-line flag that would cause the
compiler to produce a diagnostic regardless of the supplied source text,
but would not otherwise interfere with operation, such a flag would be
sufficient to satisfy the Standard with regard to all source texts that do
not contain #error directives.
Sure, you could have a simple perverse implementation that's
conforming *except* for programs that contain #error directives.
What would be the point of that?

It would be slightly more interesting to have a perverse
implementation that actually conforms to the standard, which means
handling #error correctly. (I've considered implementing such a
thing myself, mostly as a joke.)

But since we were talking about real-world conforming *useful*
implementations, not about perverse minimally conforming
implementations, I'm not sure why you felt the need to bring up a
point that (a) has already been discussed here, and (b) contributes
nothing to the current discussion.
bartc
2017-07-10 17:46:37 UTC
Post by Keith Thompson
Post by bartc
However I /will/ insist on reporting some errors when /I/ think they are
serious enough. Repeating a typedef for the same name, I don't consider
serious provided the same type is involved.
One of the things I dislike about gcc is that its authors have made
decisions about which diagnostics, required by the standard, they
consider important and which are not, and have imposed those decisions
on their users. Fortunately they've (grudgingly, it seems) provided
options to enforce the standard's rules, and to make violations fatal.
You seem to be following a similar path.
Yes, some of gcc's decisions are poor, reporting unnecessary warnings
while serious matters are just let through.

But apparently that's fine because you just have to use the right
combination of options to make it work exactly how you want. That's if
you're even aware of every possible issue that could need that treatment.

As for my approach, the errors I generate are usually obviously wrong
and, while sometimes it is possible to continue compiling, usually it
won't do so until a problem is fixed (since compilation is instant,
frequent recompiling from scratch has no penalty).

Calling a function without its declaration in scope is one example. Most
compilers, including gcc (unless you transform it into YOUR-GCC with a
long list of behaviour-modifying options), will not treat that as a
compile error.
--
bartc
Tim Rentsch
2017-07-10 16:54:16 UTC
[...]
Post by Ian Collins
Post by bartc
c:\c>type c.c
typedef unsigned int size_t;
typedef unsigned int size_t;
typedef unsigned int size_t;
typedef unsigned int size_t;
c:\c>\tdm\bin\gcc -Wextra -Wall -std=c99 -c c.c
c:\c>\tdm\bin\gcc --version
gcc (tdm64-1) 5.1.0 ...
So even with C99, gcc says nothing. I can't help but think that it's not
taking this fatal error seriously! It needs -Wpedantic to give even a
warning.
Maybe there was a bug in gcc 5? I have 5.3.1 on my box and it also
reports nothing. 4.x and 6.x do issue the expected diagnostics.
Post by bartc
Post by Ian Collins
Anyone who builds with default options on most compilers is asking for
trouble, duplicate typedefs or not.
Be that as it may, if a programmer does use default options (perhaps so
that anyone else can also compile with default options), then you want
to be able to compile that code if possible, and not fail to compile
purely on this technicality.
Using the default options to be able to compile that code if possible
is a great way of avoiding portability... I had no end of problems
porting poorly written gcc/Linux code to the native compiler on my
system. Thankfully most open-source code now uses sensible options to
improve portability.
You are wasting your breath. Bart has been shown the way to
water many times, but obviously he would rather keep drinking the
kool-aid instead. Just let him drink the kool-aid! Responding
only makes things worse...
bartc
2017-07-10 17:33:56 UTC
Post by Tim Rentsch
Post by Ian Collins
Post by bartc
Be that as it may, if a programmer does use default options (perhaps so
that anyone else can also compile with default options), then you want
to be able to compile that code if possible, and not fail to compile
purely on this technicality.
Using the default options to be able to compile that code if possible
is a great way of avoiding portability... I had no end of problems
porting poorly written gcc/Linux code to the native compiler on my
system. Thankfully most opensource code no uses sensible options to
improve portability.
You are wasting your breath. Bart has been shown the way to
water many times, but obviously he would rather keep drinking the
kool-aid instead. Just let him drink the kool-aid! Responding
only makes things worse...
PLEASE STOP MAKING PERSONAL REMARKS.

I really don't know what your problem is. Please go back and play with
your toys while some of us try and have a grown-up discussion about C
and C implementations without any name-calling or mickey-taking.

--------------------------------------------------------------

Now, here's an example of using duplicate typedefs in REAL CODE for the
purpose of testing assertions:

#ifndef SORTPP_PASS
#define C_ASSERT(e) typedef char __C_ASSERT__[(e)?1:-1]
#else
#define C_ASSERT(e) /* nothing */
#endif

...

C_ASSERT((sizeof(XSAVE_FORMAT) & (XSAVE_ALIGN - 1)) == 0);
C_ASSERT((FIELD_OFFSET(XSAVE_AREA, Header) & (XSAVE_ALIGN - 1)) == 0);
C_ASSERT(MINIMAL_XSTATE_AREA_LENGTH == 512 + 64);

This ends up, if no asserts fail, in declaring these three typedefs:

typedef char __C_ASSERT__[1];
typedef char __C_ASSERT__[1];
typedef char __C_ASSERT__[1];

So here, duplicate typedefs /have/ to be allowed in order to make the
technique work. This was seen in an MS Windows header, but I also saw it
in an lccwin64 header.

I only saw it because my compiler failed to match the three arrays (I'll
look into why shortly).

I want to be able to at least part-process windows.h (to investigate how
it might be transformed into a flattened, generic representation) but
that means being tolerant of certain styles of code.

Stubbornly refusing to do so because 'oh it doesn't conform to C99'
really isn't going to help. In fact, this example exactly demonstrates
the point I made in the first paragraph above.
--
bartc
s***@casperkitty.com
2017-07-10 19:05:02 UTC
Post by bartc
Now, here's an example of using duplicate typedefs in REAL CODE for the
#ifndef SORTPP_PASS
#define C_ASSERT(e) typedef char __C_ASSERT__[(e)?1:-1]
#else
#define C_ASSERT(e) /* nothing */
#endif
Why not

#define C_ASSERT(e) extern int dummy_assert[(e)?1:-1]

or

#define C_ASSERT(e) extern int dummy_assert_function(int x[][(e)?1:-1])

Those should be allowable in the same contexts as the typedef version,
but should not cause any trouble at link time if nothing tries to actually
access the dummy array or call the dummy function.

j***@verizon.net
2017-07-06 16:08:03 UTC
...
Post by bartc
Post by j***@verizon.net
The GNU C Library can be used in combination with gcc to constitute a fully
conforming implementation of C, but only if you bother setting the options
that put it into fully conforming mode.
I'm using a generic C compiler. It won't understand any such options.
Which compiler is it? Most C compilers, even the "generic" ones, have a fairly large number of options they recognize. In particular, there's very few C compilers that fully conform to any version of the C standard unless you select the options that tell it to do so.

...
Post by bartc
You mention gcc a lot. ...
That's because it's one of the most widely used compilers, particularly on the Linux platforms I use at work and at home.

. > ... So is it only for gcc? Keith says his version is
Post by bartc
used by gcc, clang and tcc.
Keith's probably right - he has more experience with those other compilers than I do.

...
Post by bartc
Well, one of the first things in that file is __BEGIN_DECLS. Not very
standard, but that was a red herring (it's used for C++ otherwise is an
empty macro).
That's precisely the technique that is commonly used to allow such headers to be used by a wide variety of compilers (including, in this case, C++ compilers).

...
Post by bartc
From your comments, you seem to be suggesting that these headers are
for a language called GnuC, of which __builtin_va_list is one of the
features.
It's not the language GnuC, but the compiler, gcc, which is relevant. gcc uses those same headers when implementing GnuC or when implementing C90, C99, C2011, as well as the various versions of C++. A lot of the complexity that bothers you so much is due to the fact that those same headers are used for all of those different languages. Most of the complexity comes from the fact that it needs to work for all of those different languages AND on a wide variety of hardware and operating systems.
Post by bartc
So I still haven't found a platform header than is in generic C and that
can be used by any C compiler. Ian didn't come back with an example
other than a glib remark (pun unintended).
Do you have any third-party libraries installed? If a library's documentation claims that it can be used with a wide variety of C compilers, that library's headers will, of necessity, be fairly portable. That doesn't guarantee that they are written in generic C, but the parts that are not generic will be controlled by conditional compilation (#if, etc) or by the use of macros whose definition depends upon which compiler is being used.

Example: one of the libraries I use the most is the HDF4 library. It's usable on a fairly wide variety of platforms:
<https://support.hdfgroup.org/release4/platforms.html>. If you download it <https://support.hdfgroup.org/release4/obtain.html>, you'll find that the headers are written in fairly generic C. I don't know whether you'd consider them sufficiently generic, but they're not loaded down with the huge number of gcc-isms that you'll find in gcc's versions of the standard library headers, nor are they loaded down with the huge number of MSVC-isms that you'll find in Microsoft's versions of those same headers.
Keith Thompson
2017-07-05 23:21:53 UTC
bartc <***@freeuk.com> writes:
[...]
Post by bartc
OK. For which C implementations is /usr/include/stdio.h for?
It depends.

On my system (Ubuntu 16.10), the file /usr/include/stdio.h
is provided by the libc6-dev:amd64 package, so it's for any C
implementation that uses libc6-dev:amd64 as its library. It appears
that gcc, clang, and tcc all use that file when I compile a program
containing `#include <stdio.h>`.

I would guess that it's similar on your system.
s***@casperkitty.com
2017-07-05 21:33:33 UTC
Post by m***@gmail.com
#include <stdio.h>
(which for some reason has the need to pull in a couple of dozen more
'__builtin_va_list'
being undefined. That is not a standard C feature, so what is it doing
in a universal header?)
The Standard requires that including <stdio.h> declare a prototype for
functions like vsprintf without requiring that <stdarg.h> be included
first (it expressly states that standard header files may be included in
any order). On the other hand, a program which includes <stdio.h> but not <stdarg.h>, and doesn't invoke vsprintf, would be allowed to create a static
symbol with the name va_list.

These two requirements together effectively mean that stdio.h must define
a type compatible with va_list, but cannot define a type with that name.
That can be worked around by using a reserved identifier for that type, but
any <stdio.h> file that does so will be dependent upon having something else
in the system define such an identifier.

Some header files like <stdarg.h> need to make use of implementation-specific
features, but many others like <stdio.h> wouldn't need to do so in the
absence of the problem mentioned above. Outside of such issues, most
standard header files could simply provide the definitions given within
the Standard.
jacobnavia
2017-07-06 14:01:53 UTC
Post by bartc
Post by jacobnavia
Not unless someone else comes in to look after supplying headers.
Just install lcc-win and be done with it Bart
My project was a C compiler, which everyone keeps telling me is
completely separate from every other part of a practical language
system, including even standard headers.
Standard headers, OK, but I didn't bargain for all these other headers,
which apparently are so difficult to write that they HAVE to be
customised to each compiler, even on the same platform.
Anyway I have your compiler and a number of others. But it's rare that I
have a substantial program that compiles effortlessly with all of them.
And some have their own problems (tcc doesn't have winsock2.h either,
nor its plethora of dependent headers, although it's more likely to be
able to use mingw's headers with less change).
Because I disagree with the idea that platform-specific headers need to
be written in incompatible, non-standard code, with dedicated versions
that only work with one compiler (and often incomprehensibly), I'm not
going to compound the issue by doing the same!
The windows headers from lcc-win do not use anything other than standard
C as far as I remember. I eliminated all the stuff specific to MSVC.

Just use them.
Keith Thompson
2017-07-06 16:39:37 UTC
Post by jacobnavia
Post by bartc
Post by jacobnavia
Not unless someone else comes in to look after supplying headers.
Just install lcc-win and be done with it Bart
My project was a C compiler, which everyone keeps telling me is
completely separate from every other part of a practical language
system, including even standard headers.
Nobody told you that they're *completely* separate. The components of
an implementation have to work together.

[...]
Post by jacobnavia
The windows headers from lcc-win do not use anything other than standard
C as far as I remember. I eliminated all the stuff specific to MSVC.
Just use them.
Are those headers licensed in a way that won't cause any problems?
For example, could someone bundle them with a commercial product without
your permission?
bartc
2017-07-05 20:24:33 UTC
Post by bartc
Post by m***@gmail.com
I found more missing things in the header files of mcc64.c.
When I compile and run chkccomp.c it shows me that some functions are
I will have a closer look at this build system tomorrow. And I'll update
the mcc64 and its headers then too.
I've put more information about what I'm up to here:

https://github.com/bartg/langs/blob/master/bccproj/Seed7.md
--
bartc
Ike Naar
2017-07-05 05:06:43 UTC
Post by m***@gmail.com
len = strlen(includeDir);
if (len > 1 && (includeDir[len - 1] == '/' ||
includeDir[len - 1] == '\\')) {
sprintf(command, "mcc64 -i:%s %s", includeDir, source);
} else {
sprintf(command, "mcc64 -i:%s/ %s", includeDir, source);
} /* if */
That doesn't work if includeDir contains the single-character string "/".
Is there a reason for testing 'len > 1' instead of 'len > 0' ?
s***@casperkitty.com
2017-07-04 16:45:08 UTC
Post by bartc
Post by m***@gmail.com
There seems to be also a problem, when the file name of an include
Lex error include?
or
Lex error Can't find include file
#define STDIO "stdio.h"
#include STDIO
Although there may conceivably be some implementations where everything
between the first and last quote marks in an "include" directive is
regarded as a filename (even if that name would contain quotes) and some
programs might rely upon that, many things would be much cleaner if the
Standard specified that string literals in #include files should be
concatenated the same as elsewhere in the absence of any compelling reason
for an implementation to do otherwise, and also specified an alternative
string-literal-based form for <header names>.

If many source files located at different points in the source tree need to
make reference to some headers associated with the foo library, using

#include FOOLIB_PATH "defs.h"
#include FOOLIB_PATH "graphics.h"

and using '-DFOOLIB_PATH="/file/path/to/foo_lib"' on the command line to
set the location of that library seems much cleaner than having to either
have everyplace that includes those headers know their whereabouts, or
else having to include the directory containing the foo library headers
in the global include search path (which could then break any code that
would need to use some other library's graphics.h header).
James R. Kuyper
2017-07-05 15:01:50 UTC
On Tuesday, July 4, 2017 at 00:25:44 UTC+2, Bart wrote:
...
Post by bartc
#define FILE <sys/stat.h>
#include FILE
I can't do. (And if it involves building a filename from the tokens
'sys', '/', 'stat', '.' and 'h', then I don't want to! Sheesh...)
I find it hard to imagine how it would be difficult to do that. However:

"A preprocessing directive of the form

# include pp-tokens new-line

(that does not match one of the two previous forms) is permitted. The
preprocessing tokens after include in the directive are processed just
as in normal text. (Each identifier currently defined as a macro name is
replaced by its replacement list of preprocessing tokens.) The directive
resulting after all replacements shall match one of the two previous
forms.170) The method by which a sequence of preprocessing tokens
between a < and a > preprocessing token pair or a pair of " characters
is combined into a single header name preprocessing token is
implementation-defined." (6.10.2p4)

"implementation-defined" gives you a lot of freedom - certainly enough
to justify breaking such code (for instance, it could be defined as
always resulting in an empty file name). So just define the combination
method to be whatever is convenient for your implementation, the only
significant constraint is that the result of the combination must be a
single preprocessing token - which means that you will have to figure
out where the bracketing "" or <> tokens are.
bartc
2017-07-05 15:19:14 UTC
Post by James R. Kuyper
...
Post by bartc
#define FILE <sys/stat.h>
#include FILE
I can't do. (And if it involves building a filename from the tokens
'sys', '/', 'stat', '.' and 'h', then I don't want to! Sheesh...)
I find it hard to imagine how it would be difficult to do that.
It is just Wrong.

Imagine writing a string literal which has to be constructed by
concatenating identifiers and symbols. So:

Hello!

is two tokens. And it can go wrong; this works (looks for a file NULL.h):

#include <stdio.h>
#include <NULL.h>

But:

#include <stdio.h>
#define FILE <NULL.h>
#include FILE

gives an error (assuming that NULL is (void*)0 or similar). You can
think up other examples where a macro 'stdio' exists, and it screws up any
<stdio.h> which is part of a macro.

And why doesn't that first NULL also expand to (void*)0? See, there are
lots of inconsistencies. Only C can complicate something that ought to
be so straightforward.
--
bartc
j***@verizon.net
2017-07-05 20:23:31 UTC
Post by bartc
Post by James R. Kuyper
...
Post by bartc
#define FILE <sys/stat.h>
#include FILE
I can't do. (And if it involves building a filename from the tokens
'sys', '/', 'stat', '.' and 'h', then I don't want to! Sheesh...)
I find it hard to imagine how it would be difficult to do that.
It is just Wrong.
Imagine writing a string literal which has to be constructing by
Hello!
#include <stdio.h>
#include <NULL.h>
#include <stdio.h>
#define FILE <NULL.h>
#include FILE
gives an error (assuming that NULL is (void*)0 or similar). You can
think up other examples where a macro 'stdio' exists, and it screws any
<stdio.h> which is part of a macro.
And why doesn't that first NULL also expand to (void*)0?
The rules are set up that way so that people can have a choice of whether or not macro substitution is performed. Think how inconvenient it would be if someone you had no control over has placed a relevant header in a file they chose to name "NULL.h". Other people have complicated needs that require construction of the file name using preprocessing directives. The rules as written allow for both options.

If you don't want macro substitution to be performed, use #include <filename> or #include "filename". If you do want macro substitution to be performed, arrange that the relevant #include directive doesn't match either of those two forms until after macro substitutions have been performed (this implies that at least one of the two bracketing characters must be created by macro substitution).
bartc
2017-07-05 20:50:17 UTC
[Syntax of #include]
Post by j***@verizon.net
Post by bartc
And why doesn't that first NULL also expand to (void*)0?
The rules are set up that way so that people can have a choice of whether or not macro substitution is performed. Think how inconvenient it would be if someone you had no control over has placed a relevant header in a file they chose to name "NULL.h". Other people have complicated needs that require construction of the file name using preprocessing directives. The rules as written allow for both options.
If you don't want macro substitution to be performed, use #include <filename> or #include "filename". If you do want macro substitution to be performed, arrange that the relevant #include directive doesn't match either of those two forms until after macro substitutions have been performed (this implies that at least one of the two bracketing characters must be created by macro substitution).
6.4.7 says Header Names are one of:

< h-char-sequence >
" q-char-sequence "

They are not defined in terms of tokens. p3 further goes on to talk
about characters.

But when expanding macros, eg. the expansion of FILE here:

#define FILE "<stdio.h>"

the results will be a stream of tokens, not characters.

However A.3/6.10 then define an include directive as:

# include pp-tokens new-line

This contradicts the stuff about header names.

pp-tokens is a list of preprocessing tokens. And a single preprocessing
token can be a header name. As well as a punctuator, and punctuators can
include < and >.

Presumably if '#include <' is seen, the < will be the start of a header
name, not a punctuator. But then an identifier may be seen that is a
macro that expands to something starting with <. But now the stream
after that < token will be further tokens not characters.....?

Oh, there was something else. I think it is possible to have this:

#include "abc" "/" "def.h"

The strings concatenate. But these aren't normal strings, but
q-char-sequences. Do the same string concatenation rules apply to those?

Yes, it's all perfectly clear!
--
bartc
s***@casperkitty.com
2017-07-05 21:39:43 UTC
Post by bartc
#include "abc" "/" "def.h"
The strings concatenate. But these aren't normal strings, but
q-char-sequences. Do the same string concatenation rules apply to those?
The Standard would allow compilers to interpret the above as including a
file "abc/def.h", and it would seem logical and convenient for compilers
to do so absent a compelling reason to do otherwise. I haven't found
any compilers that allow such concatenation, however.
j***@verizon.net
2017-07-05 22:05:24 UTC
On Wednesday, July 5, 2017 at 4:50:28 PM UTC-4, Bart wrote:
...
Post by bartc
< h-char-sequence >
" q-char-sequence "
They are not defined in terms of tokens.
Correct. A header-name is a type of pre-processing token (6.4p1). It wouldn't make sense for it to be made up of tokens (pre-processing or otherwise).

...
Post by bartc
# include pp-tokens new-line
This contradicts the stuff about header names.
No, it does not. Since a header name is a pp-token,

#include header-name newline

matches the syntax given in 6.10p1. Sections 6.10.2p2 and p3 describe what happens if there's exactly one pp-token between #include and newline, and that token is a header-name. 6.10.2p4 describes what happens in all other cases where a line starts with #include followed by at least one pre-processing token.
Post by bartc
pp-tokens is a list of preprocessing tokens. And a single preprocessing
token can be a header name. As well as a punctuator, and punctuators can
include < and >.
Presumably if '#include <' is seen, the < will be the start of a header
name, not a punctuator.
That's a reasonable presumption, but it's not guaranteed. Like any other type of pre-processing token, you need to make sure that it matches the grammar rules for a header name before parsing it as such. If it doesn't match those rules, then the '<' will have to be parsed as a punctuator, and the rest of the line will have to be parsed as other kinds of pre-processing tokens.
Post by bartc
... But then an identifier may be seen that is a
macro that expands to something starting with <. But now the stream
after that < token will be further tokens not characters.....?
You make it more complicated than it needs to be, and then complain about the complication you created. Here's how you should handle it:

1. Identify a line starting with # and include as a #include preprocessing directive.
2. If followed immediately by a '<', check whether the remaining tokens qualify as the first kind of header-name. Otherwise, if followed by a '"', check whether the remaining tokens qualify as the second kind of header name. Otherwise, parse the rest of the line as a sequence of one or more of the other types of pre-processing tokens.
3. If that third option applies, perform macro replacement on that set of pre-processing tokens. Then combine those tokens into a single pre-processing token, using whatever method you find convenient. If the result of that combination doesn't match one of the first two forms for a #include directive, generate a diagnostic, and then proceed in whatever fashion you think appropriate.
Post by bartc
#include "abc" "/" "def.h"
The strings concatenate.
But these aren't normal strings, but
q-char-sequences. Do the same string concatenation rules apply to those?
q-char-sequences aren't string literals. The string concatenation rules don't apply to them. The only clause that tells you what to do with a q-char-sequence is 6.10.2p3, and that clause only applies if there's only a single q-char-sequence bracketed by a single pair of double quotes. Therefore, you have to use 6.10.2p4 instead. That requires parsing those characters as three separate string-literals, and then combining them into a single pre-processing token in a manner that you're free to define. Assuming that the combination method you choose to define results in a proper header-name, the entire pre-processing directive, including the header-name, gets replaced by the content of the corresponding header. Otherwise, the behavior is undefined, and you can do whatever you want.
All of this occurs during translation phase 4. String literal concatenation doesn't occur until translation phase six, and therefore doesn't get a chance to apply to those three string literals - they've long since been combined, interpreted, and discarded.
bartc
2017-07-05 22:26:05 UTC
Post by j***@verizon.net
Post by bartc
Presumably if '#include <' is seen, the < will be the start of a header
name, not a punctuator.
That's a reasonable presumption, but it's not guaranteed. Like any other type of pre-processing token, you need to make sure that it matches the grammar rules for a header name before parsing it as such. If it doesn't match those rules, then the '<' will have to be parsed as a punctuator, and the rest of the line will have to be parsed as other kinds of pre-processing tokens.
Post by bartc
... But then an identifier may be seen that is a
macro that expands to something starting with <. But now the stream
after that < token will be further tokens not characters.....?
You make it more complicated than it needs to be,
No. It's complicated because it is.

and then complain about the complication you created. Here's how you
Post by j***@verizon.net
1. Identify a line starting with # and include as a #include preprocessing directive.
2. If followed immediately by a '<', check whether the remaining tokens qualify as the first kind of header-name.
Yeah, it's so easy! What are you reading at this point immediately after
#include, characters or tokens?

Well, both can yield "<". Then what, do you have to do a
character-by-character pass to check if this is a <...> name, or is it a
token pass?

What if the "<" is the result of a macro expansion? In which case a
character stream is no longer available, only tokens.

Or are there two separate handlers for "<", one for an actual "<" just
after include, and one for when "<" is a token from a macro expansion?

The same applies to the '"' after include, but now, if you try and read
it in token mode, you will get a whole string. You will need to peek at
the next significant character. And the " that occurs in a macro
expansion again needs different handling.
Post by j***@verizon.net
Otherwise, if followed by a '"', check whether the remaining tokens
Stop there: if reading as tokens, then do you start reading from just
before the ", or just after? If just before, you will get a string token
first; does that have to be examined to see if it's a q-string rather
than a regular string?

And if just after, then you will be looking at a token sequence followed
by a lone ", which will be an error. And you will have to deconstruct
each token into characters to see if they all qualify as a q-string.

It's a mishmash of character and token processing. I'm surprised
actually that so many compilers manage to make it work. But please don't
say it's simple.
Post by j***@verizon.net
Post by bartc
#include "abc" "/" "def.h"
The strings concatenate.
But these aren't normal strings, but
q-char-sequences. Do the same string concatenation rules apply to those?
q-char-sequences aren't string literals. The string concatenation rules don't apply to them.
OK. I misunderstood an example by supercat like this:

#include PATH "file.h"

as code that currently works.
--
bartc
j***@verizon.net
2017-07-05 23:58:16 UTC
Post by bartc
Post by j***@verizon.net
Post by bartc
Presumably if '#include <' is seen, the < will be the start of a header
name, not a punctuator.
That's a reasonable presumption, but it's not guaranteed. Like any other
type of pre-processing token, you need to make sure that it matches the
grammar rules for a header name, before parsing it as such. If it doesn't
match those rules, then the '<' will have to be parsed as a punctuator, and
the rest of the line will have to be parsed as other kinds of pre-processing
tokens.
Post by bartc
... But then an identifier may be seen that is a
macro that expands to something starting with <. But now the stream
after that < token will be further tokens not characters.....?
You make it more complicated than it needs to be,
No. It's complicated because it is.
Almost everything seems complicated to you; that's no indication of real-world complexity. This is complicated, but it's not as complicated as you make it out to be.
Post by bartc
and then complain about the complication you created. Here's how you
Post by j***@verizon.net
1. Identify a line starting with # and include as a #include preprocessing directive.
2. If followed immediately by a '<', check whether the remaining tokens
Sorry - "tokens" should have been characters.
Post by bartc
qualify as the first kind of header-name.
Yeah, it's so easy! What are you reading at this point immediately after
#include, characters or tokens?
You should be doing the same thing you always do after you complete reading one token (in this case, "include"): you should start reading characters until you've read far enough to determine that they are a match to one of the applicable grammar rules. In this case, if the line matches the first or second form for the #include directive, you won't be able to prove that fact until you've read the terminating newline, so you'll have to defer recognition of <filename.h> or "this.that" as header names until after that point.

If the line doesn't match one of those two forms, you can stop as soon as you've read the first character that is inconsistent with either form. Then go back to the place immediately after the "include" token, and parse the rest of the line as ordinary pre-processing tokens.
Post by bartc
What if the "<" is the result of a macro expansion? In which case a
character stream is no longer available, only tokens.
Then you're solidly in the context of the third form of the #include directive, and how you combine those pre-processing tokens into a single preprocessing token is implementation-defined. That means you're free to define whatever method you find most convenient for combining them.

For header-name, identifier, pp-number, character-constant, and string-literal tokens, I don't see how you could possibly correctly implement the semantics of C unless you retain the corresponding sequence of characters in some form as part of your representation of the token. For punctuator tokens, you could just represent them with a code number, but for each code number, there's a unique string that can be formed containing the associated punctuator character(s). That being the case, for every pre-processing token there's an associated string of characters that either must be retained, or can trivially be re-created. Concatenating all of those strings would seem a simple task to me, and therefore the obvious combination algorithm to use.

But for some reason you consider it too complicated, in which case you can choose some combination algorithm that you consider simpler. In particular, if it really bothers you, the algorithm can consist of discarding all of those tokens and replacing them with an empty header name. As long as you document that this is the algorithm you're using, that won't violate conformance.
Post by bartc
Post by j***@verizon.net
Otherwise, if followed by a '"', check whether the remaining tokens
Stop there: if reading as tokens,
Sorry, that was a typo. It should have said "characters".
Post by bartc
It's a mishmash of character and token processing. I'm surprised
actually that so many compilers manage to make it work. But please don't
say it's simple.
I didn't say it was simple, only that it's simpler than you make it out to be.
Post by bartc
Post by j***@verizon.net
Post by bartc
#include "abc" "/" "def.h"
The strings concatenate.
But these aren't normal strings, but
q-char-sequences. Do the same string concatenation rules apply to those?
q-char-sequences aren't string literals. The string concatenation rules don't apply to them.
#include PATH "file.h"
as code that was working now.
It might be, depending upon the implementation-defined aspects of that code. The standard mandates parsing that code as the third form of the #include directive, which means parsing PATH as an identifier and "file.h" as a string literal. I didn't pay any attention to his example, but I'd assume that PATH is the name of a macro whose replacement is a string literal. The standard requires that the macro replacement be performed, and then that string literal would be combined with "file.h". How that combination is done is implementation-defined, but the most obvious, and probably the most common, way to combine those string literals would be to concatenate them the same way that would have applied in translation phase six, if they had survived to that translation phase. If so, the resulting pre-processing token would match the second form of the #include directive, and would therefore be interpreted as a header-name.
s***@casperkitty.com
2017-07-05 22:45:50 UTC
Post by j***@verizon.net
Post by bartc
Presumably if '#include <' is seen, the < will be the start of a header
name, not a punctuator.
That's a reasonable presumption, but it's not guaranteed. Like any other type of pre-processing token, you need to make sure that it matches the grammar rules for a header name, before parsing it as such. If it doesn't match those rules, then the '<' will have to be parsed as a punctuator, and the rest of the line will have to be parsed as other kinds of pre-processing tokens.
Does the Standard define the behavior of any source text which contains
a processed line (after backslash-newline elimination) that starts with

#include <

and does not immediately follow that < with the remainder of the name of
a standard header? What other constructs would it define for a line that
starts with the above?
j***@verizon.net
2017-07-06 00:09:48 UTC
Post by s***@casperkitty.com
Post by j***@verizon.net
Post by bartc
Presumably if '#include <' is seen, the < will be the start of a header
name, not a punctuator.
That's a reasonable presumption, but it's not guaranteed. Like any other
type of pre-processing token, you need to make sure that it matches the
grammar rules for a header name, before parsing it as such. If it doesn't
match those rules, then the '<' will have to be parsed as a punctuator, and
the rest of the line will have to be parsed as other kinds of pre-processing
tokens.
Does the Standard define the behavior of any source text which contains
a processed line (after backslash-newline elimination) that starts with
#include <
and does not immediately follow that < with the remainder of the name of
a standard header? What other constructs would it define for a line that
starts with the above?
It does not fully define the behavior, but it does specify what should be done with that line in 6.10.2p4. A key part of that process is implementation-defined, which is why I can't say that it's defined by the standard. There's a shall that could be violated by the result, and if it is, the behavior is undefined. However, if that implementation-defined process does not produce a result that violates the "shall", there's nothing problematic about the final result.

For example, an implementation is free to define that if the resulting list of pre-processing tokens is

punctuator: <
identifier: filename
punctuator: .
identifier: txt
punctuator: "

then those tokens should be combined to form the single preprocessing token "filename.txt", making

#include <filename.txt"

equivalent to

#include "filename.txt"
Tim Rentsch
2017-07-10 16:41:48 UTC
Post by j***@verizon.net
Post by s***@casperkitty.com
Post by bartc
Presumably if '#include <' is seen, the < will be the start of a header
name, not a punctuator.
That's a reasonable presumption, but it's not guaranteed. Like any other
type of pre-processing token, you need to make sure that it matches the
grammar rules for a header name, before parsing it as such. If it doesn't
match those rules, then the '<' will have to be parsed as a punctuator, and
the rest of the line will have to be parsed as other kinds of pre-processing
tokens.
Does the Standard define the behavior of any source text which contains
a processed line (after backslash-newline elimination) that starts with
#include <
and does not immediately follow that < with the remainder of the name of
a standard header? What other constructs would it define for a line that
starts with the above?
It does not fully define the behavior, but it does specify what
should be done with that line in 6.10.2p4. A key part of that
process is implementation-defined, which is why I can't say that it's
defined by the standard. There's a shall that could be violated by
the result, and if it is, the behavior is undefined. However, if
that implementation-defined process does not produce a result that
violates the "shall", there's nothing problematic about the final
result.
Did you overlook 6.4 p3?
Post by j***@verizon.net
For example, an implementation is free to define that if the
resulting list of pre-processing tokens is
punctuator: <
identifier: filename
punctuator: .
identifier: txt
punctuator: "
then those tokens should be combined to form
the single preprocessing token "filename.txt", [...]
As I read the Standard, an implementation could do anything it
chooses in such a circumstance, because of an explicit statement
of undefined behavior given in 6.4 p3:

If a ' or a " character matches the last category, the
behavior is undefined.

(Note that "the last category" is "each non-white-space character
that cannot be one of the above".)

ISTM that a single quote or double quote by itself (outside of a
comment of course, and not part of a string or character literal)
is _always_ undefined behavior. Yes?
j***@verizon.net
2017-07-10 16:53:24 UTC
Post by Tim Rentsch
Post by j***@verizon.net
Post by s***@casperkitty.com
Post by bartc
Presumably if '#include <' is seen, the < will be the start of a header
name, not a punctuator.
That's a reasonable presumption, but it's not guaranteed. Like any other
type of pre-processing token, you need to make sure that it matches the
grammar rules for a header name, before parsing it as such. If it doesn't
match those rules, then the '<' will have to be parsed as a punctuator, and
the rest of the line will have to be parsed as other kinds of pre-processing
tokens.
Does the Standard define the behavior of any source text which contains
a processed line (after backslash-newline elimination) that starts with
#include <
and does not immediately follow that < with the remainder of the name of
a standard header? What other constructs would it define for a line that
starts with the above?
It does not fully define the behavior, but it does specify what
should be done with that line in 6.10.2p4. A key part of that
process is implementation-defined, which is why I can't say that it's
defined by the standard. There's a shall that could be violated by
the result, and if it is, the behavior is undefined. However, if
that implementation-defined process does not produce a result that
violates the "shall", there's nothing problematic about the final
result.
Did you overlook 6.4 p3?
Yes, I did. That does not, of course, render such an implementation non-conforming, but I should have chosen a different example, which avoided that issue. Possibly one that involves converting all characters to lower case?
Post by Tim Rentsch
Post by j***@verizon.net
For example, an implementation is free to define that if the
resulting list of pre-processing tokens is
punctuator: <
identifier: filename
punctuator: .
identifier: txt
punctuator: "
then those tokens should be combined to folementation-define procerm
the single preprocessing token "filename.txt", [...]
As I read the Standard, an implementation could do anything it
chooses in such a circumstance, because of an explicit statement
If a ' or a " character matches the last category, the
behavior is undefined.
(Note that "the last category" is "each non-white-space character
that cannot be one of the above".)
ISTM that a single quote or double quote by itself (outside of a
comment of course, and not part of a string or character literal)
is _always_ undefined behavior. Yes?