Post by j***@yahoo.co.in
I think, the main problem with the newbies like me is that we haven't
worked on vast number of different processor architectures. The
architectures that I have worked on have same size and same
representation for all types of pointers and they can be converted
to int/long without losing any information. ...
Can you please give an example (or point out some links)
of some architectures where different pointers have different size
and representation, where the pointers are not plain integers ?
They are not as common today as they once were. On the other
hand, they are more common today than they were before. :-)
(That is, they were common, then they became rare, and now they
are less rare than they were in the intermediate time.)
In the old days of "mainframes" and "minicomputers", there were
a lot of machines that were "word-addressed" (rather than "byte-
addressed", as most of today's machines are). That is, while
addresses were still integral units, "address 0" might contain
the machine's first 36-bit word, then "address 1" contains the
second group of 36 bits, and so on.
This was in fact the case on Univac 11xx-series architectures. To
make a C compiler work on such a machine, you have two options:
- make "char" 36 bits, or
- make "char" 9 bits, and pack 4 "char"s into a machine word.
For many reasons, the latter was usually (maybe always) chosen.
The underlying hardware was, peculiarly, "semi-capable" of addressing
a 9-bit "quarterword" within a word: if you knew *which* quarter
word to select at compile time, you could do that; but if you had
to pick one out at runtime, you could not.
(I never got into the details of the few C compilers available for
11xx machines, so this is about all I can say about those.)
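To make the "pack 4 chars into a word" option concrete, here is a minimal sketch of how a "char *" might be modelled on a word-addressed 36-bit machine. It is not taken from any actual 11xx compiler: the names (struct qw_ptr, qw_load, qw_next), the use of a uint64_t to hold a 36-bit word, and the choice of numbering quarter 0 from the most significant end are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: each 36-bit machine word sits in the low 36 bits
   of a uint64_t; quarter 0 is assumed to be the most significant. */
#define QW_BITS     9
#define QW_PER_WORD 4
#define QW_MASK     0x1FFu

/* A simulated "char *": a word address plus a quarterword index 0..3. */
struct qw_ptr {
    uint32_t word;     /* index of the 36-bit word */
    unsigned quarter;  /* which 9-bit quarter of that word */
};

/* Fetch the 9-bit "char" the pointer designates. */
unsigned qw_load(const uint64_t *mem, struct qw_ptr p)
{
    assert(p.quarter < QW_PER_WORD);
    unsigned shift = QW_BITS * (QW_PER_WORD - 1 - p.quarter);
    return (unsigned)((mem[p.word] >> shift) & QW_MASK);
}

/* Equivalent of "cp + 1": step the quarter, carrying into the word. */
struct qw_ptr qw_next(struct qw_ptr p)
{
    assert(p.quarter < QW_PER_WORD);
    if (++p.quarter == QW_PER_WORD) {
        p.quarter = 0;
        p.word++;
    }
    return p;
}
```

Note that the quarter index is an ordinary runtime value here, which is exactly what the hardware described above could not select directly; a real compiler would have had to emit a shift-and-mask sequence much like qw_load.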
On some Cray machines, the same problem led to putting the byte
offset for the (64-bit) machine word address in some of the high
order bits of a "char *" pointer, so if you had some machine word
at address 0x1234, and set a "char *" to point to it:
    char *cp = (char *)0x1234;
    printf("cp = %16.16lx; cp + 1 = %16.16lx\n", (long)cp, (long)(cp + 1));
the values printed would be 0000000000001234 and 0001000000001234
respectively. The high bits count 0 to 7, then the low bits
increment.
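The increment rule just described can be simulated on any machine by treating the pointer as a 64-bit integer. This is only a sketch: the field position (a 3-bit offset starting at bit 48) is inferred from the printed value 0001000000001234 above, and the names OFF_SHIFT, OFF_MASK and charptr_inc are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: byte offset (0..7) in bits 48..50, word address below. */
#define OFF_SHIFT 48
#define OFF_MASK  ((uint64_t)7 << OFF_SHIFT)

/* Simulate "cp + 1" for a char pointer held as a 64-bit integer. */
uint64_t charptr_inc(uint64_t cp)
{
    assert((cp >> 51) == 0);  /* this model uses no bits above the offset */

    uint64_t off  = (cp & OFF_MASK) >> OFF_SHIFT;
    uint64_t word = cp & ~OFF_MASK;

    off = (off + 1) & 7;      /* high bits count 0 to 7 ... */
    if (off == 0)
        word++;               /* ... then the low bits increment */
    return word | (off << OFF_SHIFT);
}
```

Starting from 0x1234, one call yields 0x0001000000001234, seven calls yield 0x0007000000001234, and the eighth yields 0x1235. The sketch ignores what happens if the word address itself overflows into the offset field.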
On some models of the PR1ME, the same problem occurred, but all 32
bits of the 32-bit address of the 16-bit-wide machine word were in
use, so "char *" became 48 bits wide, with the extra 16 bits holding
a single bit specifying which byte of the 16-bit word to use.
On the Data General Eclipse (MV/10000 series), which evolved out
of the Nova, the hardware could address individual 8-bit bytes
directly. However, there were *two* native pointer formats, one
for 8-bit bytes, and one for 16-bit words. The C compiler used
both: "char *" used 8-bit-byte pointers, and "short *" and other
larger data types (int and long) used 16-bit-word pointers. The
difference between byte and word pointer was that the word pointer
had one extra bit at the top, the "indirect" bit, while the byte
pointer had one extra bit at the bottom, the "byte offset". To
convert a byte pointer to a word pointer, one did a right-shift,
discarding the byte-offset and introducing a zero-bit for the
"indirect" bit; to convert a word pointer to a byte pointer, one
did a left-shift, discarding the "indirect" bit and introducing
a zero bit for the byte offset.
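The two shifts can be written out as follows. This is a sketch on plain 32-bit integers rather than real Eclipse pointers; the function names and the sample values are made up for the example, and the ring and segment fields are not modelled separately.

```c
#include <assert.h>
#include <stdint.h>

/* Byte pointer -> word pointer: the byte-offset bit falls off the
   bottom and a zero arrives at the top, in the "indirect" position. */
uint32_t byte_to_word(uint32_t bp)
{
    return bp >> 1;
}

/* Word pointer -> byte pointer: the "indirect" bit falls off the top
   and a zero byte-offset bit arrives at the bottom. */
uint32_t word_to_byte(uint32_t wp)
{
    return (uint32_t)(wp << 1);
}

/* The round trip is lossy: an odd byte pointer comes back even. */
void eclipse_roundtrip_check(void)
{
    uint32_t bp = 0x00002469;        /* second byte of word 0x1234 */
    uint32_t wp = byte_to_word(bp);  /* byte offset dropped */
    assert(wp == 0x00001234);
    assert(word_to_byte(wp) == 0x00002468);
}
```

This is why a cast such as (char *)(int *)cp is not a no-op on such a machine: it generates code, and it can change the value.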
No doubt there were others; those are the only ones I know off-hand.
We also have the IBM AS/400, even today; but its main "odd" feature
for pointers is that function pointers are much wider than data
pointers. (Data pointers seem to be 64 bits, so should fit in a
"long long".)
Today, we are seeing a resurgence of the "dual mode" systems that
were common in the 80286 days, except instead of "16-bit pointer"
and "32-bit pointer", we now have "32-bit pointer" and "64-bit
pointer". In some cases, the "mode" (32 or 64 bit addressing) of
the machine is per-process, but in at least one, it is per-stack-
frame: on the V9 sparc, one indicates that the current stack frame
is operating in 64-bit mode by offsetting the stack pointer by 2047:
a %sp whose numerical value is congruent to zero mod 8 indicates
a 32-bit frame, while one whose numerical value is congruent to 7
mod 8 indicates a 64-bit frame. (Thus, it should be possible to
call 32-bit libraries from 64-bit programs, and vice versa, provided
the 32-bit code is only dealing with addresses in the lowest four
gigabytes.)
Post by j***@yahoo.co.in
On such architectures, what happens when a pointer is converted
to an integer or vice versa?
This depends on the machine. If there is an integral type wide
enough, it generally "just works". The conversion is usually
entirely straightforward: for instance, on the Eclipse, converting
a pointer -- whether byte or word -- to a 32-bit integer leaves
the bit pattern unchanged. This allows you to inspect or set the
byte-offset or indirect bit, whichever the pointer has. Note that
this means the ring and segment numbers are shifted one bit left
or right respectively (pointers on the Eclipse were split into
"ring", "segment", and "base address", along with the indirect or
byte-offset bit at the top or bottom).
Probably the PR1ME simply could not convert "char *" to integral.
On the 80286, you had to use a "long" to store a 32-bit pointer.
On some of today's 32-and/or-64-bit machines, you may have to use
a "long long" to store a pointer. (C compiler writers can choose
from various models: "ILP32" means int, long, and pointers are all
32 bits; "I32LP64" means "int is 32 bits, long and pointers are 64
bits"; and "IL32P64" means "int and long are 32 bits, pointers are
64 bits". While I32LP64 is clearly much more elegant, some compiler
writers have chosen IL32P64 to accommodate poorly-written C code
that assumes "int" and "long" are identical. Of course, this breaks
C code that assumes "long" and "pointer" are identical -- but so
it goes. Note that only ILP32 keeps broken code "working"; but it
gives up the advantage of the 64-bit address space!)
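A program can find out which of these models its compiler uses by comparing type widths. The sketch below does only that; the enum, its constants, and the function names are invented for the example, and it reports anything outside the three models named above as "other".

```c
#include <assert.h>
#include <limits.h>

enum model { MODEL_OTHER, MODEL_ILP32, MODEL_I32LP64, MODEL_IL32P64 };

/* Classify a set of widths (in bits) using the names from the text. */
enum model classify(unsigned int_bits, unsigned long_bits, unsigned ptr_bits)
{
    assert(int_bits <= long_bits);  /* sanity check: long is never narrower */

    if (int_bits == 32 && long_bits == 32 && ptr_bits == 32)
        return MODEL_ILP32;
    if (int_bits == 32 && long_bits == 64 && ptr_bits == 64)
        return MODEL_I32LP64;
    if (int_bits == 32 && long_bits == 32 && ptr_bits == 64)
        return MODEL_IL32P64;
    return MODEL_OTHER;
}

/* Classify whatever implementation this file is compiled with. */
enum model this_implementation(void)
{
    return classify((unsigned)(sizeof(int) * CHAR_BIT),
                    (unsigned)(sizeof(long) * CHAR_BIT),
                    (unsigned)(sizeof(void *) * CHAR_BIT));
}
```

Code that stores a pointer in a "long" is safe only where this returns MODEL_ILP32 or MODEL_I32LP64; under MODEL_IL32P64 the upper half of the pointer would be lost.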
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.