Discussion:
Good hash for pointers
Malcolm McLean
2024-05-23 11:11:19 UTC
Permalink
What is a good hash function for pointers to use in portable ANSI C?

The pointers are nodes of a tree, which are read only, and I want to
associate read/write data with them. So potentially a large number of
pointers, and they might be consecutively ordered if they are taken from
an array, or they might be returned from repeated calls to malloc() with
small allocations. Obviously I have no control over pointer size or
internal representation.
--
Check out Basic Algorithms and my other books:
https://www.lulu.com/spotlight/bgy1mm
Richard Harnden
2024-05-23 14:37:44 UTC
Permalink
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
All your pointers from malloc are going to be unique.
All of their low bits are going to be zero, because they need to align
on some n-bit boundary.
All of their high bit are going to be the same, because that's just how
it works.

So just take the middle: hash = ((intptr_t) ptr) >> 4 & 0xffff;

Good enough?
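Spelled out as a function (a sketch; note that uintptr_t is a C99 type, a point that comes up later in the thread):

```c
#include <stdint.h>

/* Middle-bits hash: the low bits are zero from alignment and the
   high bits are effectively constant, so keep 16 bits from the middle. */
static unsigned mid_hash(void *ptr)
{
    return (unsigned) (((uintptr_t) ptr >> 4) & 0xFFFFu);
}
```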
Keith Thompson
2024-05-23 22:51:29 UTC
Permalink
Post by Richard Harnden
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
All your pointers from malloc are going to be unique.
All of their low bits are going to be zero, because they need to align
on some n-bit boundary.
All of their high bit are going to be the same, because that's just
how it works.
So just take the middle: hash = ((intptr_t) ptr) >> 4 & 0xffff;
I'd use uintptr_t, not intptr_t. I prefer not to think about how
bitwise operations work on signed values unless I have to.

You're assuming a 16-bit hash. I have no idea whether that suits
Malcolm's requirements.

Another approach might be to divide the pointer representation into
N-bit chunks and xor them together.
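A minimal sketch of that chunk-and-xor idea, folding the pointer's bytes into 16-bit chunks (function name invented; it reads the pointer's representation rather than converting it to an integer, so it needs nothing beyond C89):

```c
#include <stddef.h>

/* xor-fold the bytes of the pointer representation into a 16-bit value */
static unsigned hash_ptr_xor(void *p)
{
    unsigned char *b = (unsigned char *) &p;
    unsigned h = 0;
    size_t i;

    for (i = 0; i < sizeof p; i++)
        h ^= (unsigned) b[i] << (8 * (i & 1)); /* alternate low/high byte */
    return h & 0xFFFFu;
}
```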
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Malcolm McLean
2024-05-23 23:42:29 UTC
Permalink
Post by Keith Thompson
Post by Richard Harnden
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
All your pointers from malloc are going to be unique.
All of their low bits are going to be zero, because they need to align
on some n-bit boundary.
All of their high bit are going to be the same, because that's just
how it works.
So just take the middle: hash = ((intptr_t) ptr) >> 4 & 0xffff;
I'd use uintptr_t, not intptr_t. I prefer not to think about how
bitwise operations work on signed values unless I have to.
You're assuming a 16-bit hash. I have no idea whether that suits
Malcolm's requirements.
Another approach might be to divide the pointer representation into
N-bit chunks and xor them together.
I have a tree structure which represents an XML document. So potentially
very large, though the code is for the Baby X resource compiler, and
people are not likely to want to embed absolutely massive XML data files
into C source.

I'm implementing XPath queries for it, and my algorithm requires some
workspace data, actually just a boolean flag, associated with each node.
So I chose a hash table as the most efficient way of providing the flag,
without having to touch the XMLNODE structure itself.

The current parser allocates the nodes with repeated small requests to
malloc(). However of course this is the sort of thing that could be
optimised to use a fixed block allocator. The XPath module shouldn't
break as a result.
--
Check out Basic Algorithms and my other books:
https://www.lulu.com/spotlight/bgy1mm
Kaz Kylheku
2024-05-23 20:34:13 UTC
Permalink
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
I don't think that it's practical to try to do this in some portable
way, let alone to an old dialect of C from 1989.

Let's say we have a type uintptr_t to which we can convert a void *
pointer.

In the TXR Lisp run time, here is what I did (paraphrased):

switch (CHAR_BIT * sizeof (void *)) {
case 32:
    return ((uintptr_t) ptr) >> 4;
case 64:
    return ((uintptr_t) ptr) >> 5;
}

The idea is that on 32 bits, we suspect that pointers from malloc may be
aligned to a 16-byte boundary, so the least significant four bits don't
tell us anything useful; they are likely all zero, which is bad. On 64
bits, we throw away five bits.

This is actually an important hashing function because it's used
for hashing objects that use eq equality even when compared using
equal.

For instance, all structs for which an equality substitute is not
defined.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Tim Rentsch
2024-05-23 22:49:42 UTC
Permalink
Post by Malcolm McLean
What is a good hash function for pointers to use in portable
ANSI C?
I have a preliminary question. Do you really mean ANSI C, or
is C99 acceptable?
Malcolm McLean
2024-05-23 23:43:59 UTC
Permalink
Post by Tim Rentsch
Post by Malcolm McLean
What is a good hash function for pointers to use in portable
ANSI C?
I have a preliminary question. Do you really mean ANSI C, or
is C99 acceptable?
C89 is better.
But the pass has been sold.
--
Check out Basic Algorithms and my other books:
https://www.lulu.com/spotlight/bgy1mm
Tim Rentsch
2024-05-23 23:52:03 UTC
Permalink
Post by Malcolm McLean
Post by Tim Rentsch
Post by Malcolm McLean
What is a good hash function for pointers to use in portable
ANSI C?
I have a preliminary question. Do you really mean ANSI C, or
is C99 acceptable?
C89 is better.
But the pass has been sold.
I'm not asking which you think is better. I'm asking about
what your requirements are.
Malcolm McLean
2024-05-24 00:28:45 UTC
Permalink
Post by Tim Rentsch
Post by Malcolm McLean
Post by Tim Rentsch
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
I have a preliminary question. Do you really mean ANSI C, or
is C99 acceptable?
C89 is better.
But the pass has been sold.
I'm not asking which you think is better. I'm asking about
what your requirements are.
C 89.
I don't want to pull in C99 types and so on just for a hash function.
--
Check out Basic Algorithms and my other books:
https://www.lulu.com/spotlight/bgy1mm
Tim Rentsch
2024-05-24 01:39:56 UTC
Permalink
Post by Malcolm McLean
Post by Tim Rentsch
Post by Malcolm McLean
Post by Tim Rentsch
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
I have a preliminary question. Do you really mean ANSI C, or
is C99 acceptable?
C89 is better.
But the pass has been sold.
I'm not asking which you think is better. I'm asking about
what your requirements are.
C 89.
I don't want to pull in C99 types and so on just for a hash function.
In that case I think you are stuck with using a half-baked
solution. The standard integer types available in C89 just
aren't a good fit in a 64-bit world.
bart
2024-05-24 10:14:36 UTC
Permalink
Post by Tim Rentsch
Post by Malcolm McLean
Post by Tim Rentsch
Post by Malcolm McLean
Post by Tim Rentsch
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
I have a preliminary question. Do you really mean ANSI C, or
is C99 acceptable?
C89 is better.
But the pass has been sold.
I'm not asking which you think is better. I'm asking about
what your requirements are.
C 89.
I don't want to pull in C99 types and so on just for a hash function.
In that case I think you are stuck with using a half-baked
solution. The standard integer types available in C89 just
aren't a good fit in a 64-bit world.
I assume the C89 implementation is one that can target current 64 bit
machines.

Then char, short, int, long long will almost certainly have widths of 8,
16, 32 and 64 bits respectively.

(I don't know if 'long long' was part of C89, but it sounds like Malcolm
just doesn't want to be bothered with stdint.h, and any compiler used is
likely to support it.

I can't stand it either. Just today I wrote these two lines:

typedef unsigned long long u64;
typedef unsigned char byte;

to avoid stdint.h and its ugly set of types.)
Malcolm McLean
2024-05-24 11:05:19 UTC
Permalink
Post by bart
Post by Tim Rentsch
Post by Malcolm McLean
Post by Malcolm McLean
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
I have a preliminary question.  Do you really mean ANSI C, or
is C99 acceptable?
C89 is better.
But the pass has been sold.
I'm not asking which you think is better.  I'm asking about
what your requirements are.
C 89.
I don't want to pull in C99 types and so on just for a hash function.
In that case I think you are stuck with using a half-baked
solution.  The standard integer types available in C89 just
aren't a good fit in a 64-bit world.
I assume the C89 implementation is one that can target current 64 bit
machines.
Then char, short, int, long long will almost certainly have widths of 8,
16, 32 and 64 bits respectively.
(I don't know if 'long long' was part of C89, but it sounds like Malcolm
just doesn't want to be bothered with stdint.h, and any compiler used is
likely to support it.
  typedef unsigned long long u64;
  typedef unsigned char byte;
to avoid stdint.h and its ugly set of types.)
The code is very clean.

And it seems I've got to introduce a lot of ugliness for an efficient
hash function. But pointers will of course be in registers, and the hash
index should also be in a register. And the hash lookup is in the inner
loop, and it is called for every node in the XML document which is
visited. So the hash function should be an all-register, inline function.

Then I can't be the first person to want to use a pointer as a key for a
hash table.

And I don't want things to break if pointer representation changes.
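One all-register candidate that stays within C89 types might be a multiplicative hash (a sketch; 2654435761 is Knuth's golden-ratio constant for 32 bits, and the pointer-to-integer cast is implementation-defined and may truncate where unsigned long is narrower than a pointer):

```c
/* Sketch: multiplicative pointer hash, C89 types only.
   Low product bits are weak because malloc'd pointers have zero
   low bits, so take a window of bits further up. */
static unsigned hash_node(const void *p)
{
    unsigned long x = (unsigned long) p;      /* implementation-defined */
    x *= 2654435761UL;                        /* Knuth's 32-bit constant */
    return (unsigned) ((x >> 12) & 0xFFFFUL); /* 16-bit table index */
}
```

Being a single multiply and shift, it should compile to a handful of register instructions.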
--
Check out Basic Algorithms and my other books:
https://www.lulu.com/spotlight/bgy1mm
Chris M. Thomasson
2024-05-24 17:49:42 UTC
Permalink
On 5/24/2024 4:05 AM, Malcolm McLean wrote:
[...]
Post by Malcolm McLean
Then I can't be the first person to want to use a pointer as a key for
a hash table.
And I don't want things to break if pointer representation changes.
Check this out, it relies on hashing pointers, the multex... ;^)

https://groups.google.com/g/comp.lang.c++/c/sV4WC_cBb9Q/m/SkSqpSxGCAAJ

It can be used for many different things...
Chris M. Thomasson
2024-05-24 17:51:45 UTC
Permalink
Post by Chris M. Thomasson
[...]
Post by Malcolm McLean
Then I can't be the first person to want to use a pointer as a key for
a hash table.
And I don't want things to break if pointer representation changes.
Check this out, it relies on hashing pointers, the multex... ;^)
https://groups.google.com/g/comp.lang.c++/c/sV4WC_cBb9Q/m/SkSqpSxGCAAJ
It can be used for many different things...
The naive hash is:
______________________
// A table of locks
struct mutex_table
{
    std::vector<std::mutex> m_locks;

    mutex_table(std::size_t size) : m_locks(size) { assert(size > 0); }

    // totally contrived simple hash...
    std::size_t hash(void const* ptr)
    {
        std::uintptr_t ptr_uint = (std::uintptr_t)ptr;
        return (std::size_t)(((ptr_uint << 9) * 103) % m_locks.size());
    }
};
______________________

;^)
Tim Rentsch
2024-05-24 13:18:15 UTC
Permalink
Post by bart
Post by Tim Rentsch
Post by Malcolm McLean
Post by Tim Rentsch
Post by Malcolm McLean
Post by Tim Rentsch
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
I have a preliminary question. Do you really mean ANSI C, or
is C99 acceptable?
C89 is better.
But the pass has been sold.
I'm not asking which you think is better. I'm asking about
what your requirements are.
C 89.
I don't want to pull in C99 types and so on just for a hash
function.
In that case I think you are stuck with using a half-baked
solution. The standard integer types available in C89 just
aren't a good fit in a 64-bit world.
I assume the C89 implementation is one that can target current 64
bit machines.
Then char, short, int, long long will almost certainly have widths
of 8, 16, 32 and 64 bits respectively.
C89 doesn't have long long.
Post by bart
(I don't know if 'long long' was part of C89, but it sounds like
Malcolm just doesn't want to be bothered with stdint.h, and any
compiler used is likely to support it.
What he said was C89. He didn't mention stdint.h. I take
him at his word. If what he wants is something different,
he should say clearly what it is, and not make people guess
about it. (To be clear this recommendation is intended for
every questioner, not just Malcolm.)
Malcolm McLean
2024-05-24 14:07:28 UTC
Permalink
Post by Tim Rentsch
Post by bart
Post by Tim Rentsch
Post by Malcolm McLean
Post by Tim Rentsch
Post by Malcolm McLean
Post by Tim Rentsch
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
I have a preliminary question. Do you really mean ANSI C, or
is C99 acceptable?
C89 is better.
But the pass has been sold.
I'm not asking which you think is better. I'm asking about
what your requirements are.
C 89.
I don't want to pull in C99 types and so on just for a hash
function.
In that case I think you are stuck with using a half-baked
solution. The standard integer types available in C89 just
aren't a good fit in a 64-bit world.
I assume the C89 implementation is one that can target current 64
bit machines.
Then char, short, int, long long will almost certainly have widths
of 8, 16, 32 and 64 bits respectively.
C89 doesn't have long long.
Post by bart
(I don't know if 'long long' was part of C89, but it sounds like
Malcolm just doesn't want to be bothered with stdint.h, and any
compiler used is likely to support it.
What he said was C89. He didn't mention stdint.h. I take
him at his word. If what he wants is something different,
he should say clearly what it is, and not make people guess
about it. (To be clear this recommendation is intended for
every questioner, not just Malcolm.)
cJSON.c is C89. And these additions to the resource compiler were
inspired by a mention of cJSON.c here.

So a C89 hash function from a pointer to an unsigned int would be ideal. However
it might be impossible to write one which is both efficient in terms of
machine instructions and a good hash function in that it distributes the
hashes evenly given an uneven distribution of keys. And pointers
returned from repeated calls to malloc() are going to have an odd
distribution of values.
--
Check out Basic Algorithms and my other books:
https://www.lulu.com/spotlight/bgy1mm
Scott Lurndal
2024-05-24 14:51:47 UTC
Permalink
Post by Malcolm McLean
cJSON.c is C89. And these additions to the resource compiler were
inspired by a mention of cJSON.c here.
So a C89 hash function from a pointer to an unsigned int would be ideal. However
it might be impossible to write one which is both efficient in terms of
machine instructions and a good hash function in that it distributes the
hashes evenly given an uneven distribution of keys. And pointers
returned from repeated calls to malloc() are going to have an odd
distribution of values.
Just stick the bool in your DOM[*] nodes. Forget about const.


[*] Document Object Model (i.e. tree structure).
Tim Rentsch
2024-05-25 09:49:00 UTC
Permalink
Post by Tim Rentsch
Post by bart
Post by Tim Rentsch
Post by Malcolm McLean
Post by Tim Rentsch
Post by Malcolm McLean
Post by Tim Rentsch
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
I have a preliminary question. Do you really mean ANSI C, or
is C99 acceptable?
C89 is better.
But the pass has been sold.
I'm not asking which you think is better. I'm asking about
what your requirements are.
C 89.
I don't want to pull in C99 types and so on just for a hash function.
In that case I think you are stuck with using a half-baked
solution. The standard integer types available in C89 just
aren't a good fit in a 64-bit world.
I assume the C89 implementation is one that can target current 64
bit machines.
Then char, short, int, long long will almost certainly have widths
of 8, 16, 32 and 64 bits respectively.
C89 doesn't have long long.
Post by bart
(I don't know if 'long long' was part of C89, but it sounds like
Malcolm just doesn't want to be bothered with stdint.h, and any
compiler used is likely to support it.
What he said was C89. He didn't mention stdint.h. I take
him at his word. If what he wants is something different,
he should say clearly what it is, and not make people guess
about it. (To be clear this recommendation is intended for
every questioner, not just Malcolm.)
cJSON.c is C89. And these additions to the resource compiler were
inspired by a mention of cJSON.c here.
So a C89 hash function from a pointer to an unsigned int would be ideal. However
it might be impossible to write one which is both efficient in terms
of machine instructions and a good hash function in that it
distributes the hashes evenly given an uneven distribution of
keys. And pointers returned from repeated calls to malloc() are going
to have an odd distribution of values.
I suggest you look for a way to write and use a hash function that
uses unsigned long long without that causing cJSON.c to need
compiling by a post-C89 ruleset.
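One way to confine that (a sketch; the header name and typedef are invented for illustration) is a compile-time switch, so strict-C89 translation units fall back to unsigned long while everything else gets a pointer-sized type:

```c
/* ptrhash.h (hypothetical): choose a wide unsigned type without
   forcing C99 on the rest of the project. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <stdint.h>
typedef uintptr_t ptrhash_uint;     /* exactly fits a pointer */
#else
typedef unsigned long ptrhash_uint; /* best a strict C89 unit can do */
#endif
```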
David Brown
2024-05-24 15:00:16 UTC
Permalink
Post by bart
Post by Tim Rentsch
Post by Malcolm McLean
Post by Malcolm McLean
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
I have a preliminary question.  Do you really mean ANSI C, or
is C99 acceptable?
C89 is better.
But the pass has been sold.
I'm not asking which you think is better.  I'm asking about
what your requirements are.
C 89.
I don't want to pull in C99 types and so on just for a hash function.
In that case I think you are stuck with using a half-baked
solution.  The standard integer types available in C89 just
aren't a good fit in a 64-bit world.
I assume the C89 implementation is one that can target current 64 bit
machines.
Targeting 64-bit machines does not imply support for 64-bit types using
standard C89 integer types. You will probably find that most, if not all,
C89 compilers that target 64-bit processors but have 32-bit "long" will
have extensions for a 64-bit type. (It won't be an "extended integer
type", as defined by the C standards, but it will be good enough for the
job.) But if you use such extensions then you are not using standard C89.
Post by bart
Then char, short, int, long long will almost certainly have widths of 8,
16, 32 and 64 bits respectively.
(I don't know if 'long long' was part of C89, but it sounds like Malcolm
just doesn't want to be bothered with stdint.h, and any compiler used is
likely to support it.
C89 does not have "long long". And "long" is 32-bit on 64-bit Windows.
There is no way in C89 to pick an integer type that is at least or
exactly 64 bits. There is also no way to refer an integer type that is
big enough to support an integer conversion from a pointer. (In C99, in
<stdint.h>, you have "uintptr_t" and "intptr_t" which can be used if and
only if the platform supports an integer type big enough for the integer
conversion of pointers.)
Malcolm McLean
2024-05-24 16:10:45 UTC
Permalink
Post by David Brown
Post by bart
Post by Tim Rentsch
Post by Malcolm McLean
Post by Malcolm McLean
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
I have a preliminary question.  Do you really mean ANSI C, or
is C99 acceptable?
C89 is better.
But the pass has been sold.
I'm not asking which you think is better.  I'm asking about
what your requirements are.
C 89.
I don't want to pull in C99 types and so on just for a hash function.
In that case I think you are stuck with using a half-baked
solution.  The standard integer types available in C89 just
aren't a good fit in a 64-bit world.
I assume the C89 implementation is one that can target current 64 bit
machines.
Targeting 64-bit machines does not imply support for 64-bit types using
standard C89 integer types.  You will probably find that most, if not all,
C89 compilers that target 64-bit processors but have 32-bit "long" will
have extensions for a 64-bit type.  (It won't be an "extended integer
type", as defined by the C standards, but it will be good enough for the
job.)  But if you use such extensions then you are not using standard C89.
At the moment I would expect most people who run the Baby X resource
compiler to have a desktop machine with several gigabytes of memory
installed and 64 bit pointers.
But whilst the file xpath.c is being developed to enable the resource
compiler to specify fields, of course it is a portable C file. I would
hope that people might use it elsewhere. Maybe on very strange
architectures. And of course I want it to run on those architectures
without modification.
Post by David Brown
Post by bart
Then char, short, int, long long will almost certainly have widths of
8, 16, 32 and 64 bits respectively.
(I don't know if 'long long' was part of C89, but it sounds like
Malcolm just doesn't want to be bothered with stdint.h, and any
compiler used is likely to support it.
C89 does not have "long long".  And "long" is 32-bit on 64-bit Windows.
There is no way in C89 to pick an integer type that is at least or
exactly 64 bits.  There is also no way to refer an integer type that is
big enough to support an integer conversion from a pointer.  (In C99, in
<stdint.h>, you have "uintptr_t" and "intptr_t" which can be used if and
only if the platform supports an integer type big enough for the integer
conversion of pointers.)
So if we use uintptr_t the code might break.
And if we use unsigned long we have to assume maximum 32 bits.
However it's unlikely anyone will have an xml file with more than a
billion nodes.
--
Check out Basic Algorithms and my other books:
https://www.lulu.com/spotlight/bgy1mm
Michael S
2024-05-24 16:27:48 UTC
Permalink
On Fri, 24 May 2024 17:10:45 +0100
Post by Malcolm McLean
So if we use uintptr_t the code might break.
And if we use unsigned long we have to assume maximum 32 bits.
However it's unlikely anyone will have an xml file with more than a
billion nodes.
It's not about the number of nodes. It's that good hash functions tend
to use 64-bit multiplication.
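For example, a Fibonacci-style multiplicative hash (a sketch; needs a 64-bit unsigned type, i.e. C99's unsigned long long or a compiler extension):

```c
/* Sketch: 64-bit Fibonacci hashing. The constant is 2^64 divided by
   the golden ratio; the top 16 bits of the product form the index. */
static unsigned long long fib_hash64(unsigned long long x)
{
    return (x * 0x9E3779B97F4A7C15ULL) >> 48;
}
```

The multiply mixes every input bit into the high bits of the product, which is exactly where a shift-only hash of aligned pointers is weakest.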
David Brown
2024-05-24 07:41:54 UTC
Permalink
Post by Malcolm McLean
Post by Malcolm McLean
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
I have a preliminary question.  Do you really mean ANSI C, or
is C99 acceptable?
C89 is better.
But the pass has been sold.
I'm not asking which you think is better.  I'm asking about
what your requirements are.
C 89.
I don't want to pull in C99 types and so on just for a hash function.
Tim is right that the <stdint.h> types are /vastly/ better for this sort
of thing. And you won't find any C89/C90 compilers that don't support
<stdint.h> even in C89/C90 mode. You might choose to avoid other C99
features, in case you have any users that insist on C89/C90, but you
can't get anywhere with just the standard C89/C90 types. On a platform
such as 64-bit Windows, pointers are 64-bit but the biggest C89/C90
integer type is "unsigned long", which is 32-bit on that platform. You
can't get at the upper bits of the pointer without compiler-specific
extensions - and if you allow extensions, you might as well allow
<stdint.h> and make life far simpler for everyone.

If you insist on using C89/C90, make sure that the code is compatible
(with the same semantics) for more modern standards. There are few
fanatics left who think C89/C90 is "better" than C99 or newer standards,
so the majority of potential users will want C99, C11, or - perhaps most
likely - a compiler-specific variation of one of these. It would be a
shame if your code relied on pre-C99 integer promotion rules, or used
implicit int, or had identifiers that conflict with newer standards.
These would undoubtedly be far worse for users than any purely imagined
problems caused by #include <stdint.h>.

(As a pedantic quibble, I believe ANSI invariably re-publish ISO C
standards as updated ANSI C standards. So "ANSI C" currently refers to
C17. It is best, IMHO, to explicitly say C89 and/or C90 rather than
"ANSI C".)
Chris M. Thomasson
2024-05-24 00:32:04 UTC
Permalink
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
The pointers are nodes of a tree, which are read only, and I want to
associate read/write data with them. So potentially a large number of
pointers, and they might be consecutively ordered if they are taken from
an array, or they might be returned from repeated calls to malloc() with
small allocations. Obviously I have no control over pointer size or
internal representation.
I kind of remember an old one for 32-bit pointers. It was something like
multiply by a prime and shift to the right by 7 or 8. Shit, I think. It
was way back on comp.programming.threads.
Chris M. Thomasson
2024-05-24 01:59:19 UTC
Permalink
Post by Chris M. Thomasson
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
The pointers are nodes of a tree, which are read only, and I want to
associate read/write data with them. So potentially a large number of
pointers, and they might be consecutively ordered if they are taken
from an array, or they might be returned from repeated calls to
malloc() with small allocations. Obviously I have no control over
pointer size or internal representation.
I kind of remember an old one for 32-bit pointers. It was something like
multiply by a prime and shift to the right by 7 or 8. Shit, I think. It
was way back on comp.programming.threads.
There was an old paper from Microsoft that had a pointer hash. It was
used to index into a table of locks. I need to find it.
jak
2024-05-24 02:09:41 UTC
Permalink
Post by Chris M. Thomasson
Post by Chris M. Thomasson
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
The pointers are nodes of a tree, which are read only, and I want to
associate read/write data with them. So potentially a large number of
pointers, and they might be consecutively ordered if they are taken
from an array, or they might be returned from repeated calls to
malloc() with small allocations. Obviously I have no control over
pointer size or internal representation.
I kind of remember an old one for 32-bit pointers. It was something
like multiply by a prime and shift to the right by 7 or 8. Shit, I
think. It was way back on comp.programming.threads.
There was an old paper from Microsoft that had a pointer hash. It was
used to index into a table of locks. I need to find it.
<https://wiki.osdev.org/Global_Descriptor_Table>
Bonita Montero
2024-05-24 18:28:39 UTC
Permalink
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
The pointers are nodes of a tree, which are read only, and I want to
associate read/write data with them. So potentially a large number of
pointers, and they might be consecutively ordered if they are taken from
an array, or they might be returned from repeated calls to malloc() with
small allocations. Obviously I have no control over pointer size or
internal representation.
Use FNV.
Malcolm McLean
2024-05-24 18:57:17 UTC
Permalink
Post by Bonita Montero
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
The pointers are nodes of a tree, which are read only, and I want to
associate read/write data with them. So potentially a large number of
pointers, and they might be consecutively ordered if they are taken
from an array, or they might be returned from repeated calls to
malloc() with small allocations. Obviously I have no control over
pointer size or internal representation.
Use FNV.
Here's an attempt.

/* FNV-1 hash of a pointer's representation */
static unsigned int hash(void *address)
{
    int i;
    unsigned long answer = 2166136261UL;   /* 32-bit FNV offset basis */
    unsigned char *byte = (unsigned char *) &address;

    for (i = 0; i < sizeof(void *); i++)
    {
        answer *= 16777619UL;              /* 32-bit FNV prime */
        answer ^= byte[i];
    }
    return (unsigned int) (answer & 0xFFFFFFFFUL);
}

Now what will compilers make of that?
--
Check out Basic Algorithms and my other books:
https://www.lulu.com/spotlight/bgy1mm
bart
2024-05-24 23:54:34 UTC
Permalink
Post by Malcolm McLean
Post by Bonita Montero
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
The pointers are nodes of a tree, which are read only, and I want to
associate read/write data with them. So potentially a large number of
pointers, and they might be consecutively ordered if they are taken
from an array, or they might be returned from repeated calls to
malloc() with small allocations. Obviously I have no control over
pointer size or internal representation.
Use FNV.
Here's an attempt.
/* FNV hash of a pointer */
static unsigned int hash(void *address)
{
    int i;
    unsigned long answer = 2166136261;
    unsigned char *byte = (unsigned char *) &address;
    for (i = 0; i < sizeof(void *); i++)
    {
        answer *= 16777619;
        answer ^= byte[i];
    }
    return (unsigned int) (answer & 0xFFFFFFFF);
}
Now what will compilers make of that?
Compiler, or performance?

I tried this function with the test program shown below. I used it to
populate a hash table of 64K entries with pointers from successive calls
to malloc.

Results, in terms of clashes, for different numbers N of entries
populated out of 64K were:

10000 1100
30000 12000
50000 67000
60000 216000
65535 5500000 (largest N possible)

Results were rather variable as malloc produces different patterns of
pointers on different runs. These were simply the results from the first run.

Was this good? I'd no idea. But as a comparison, I used my own hash
function, normally used to hash identifiers, shown below the main
program as the function 'bchash'.

If this is substituted instead, the results were:

10000 230
30000 3800
50000 10300
60000 50300
65535 2700000

Hash tables need a certain amount of free capacity to stay efficient, so
3/4 full (about 50K/64K) is about right.

Again I don't know if these figures are good, they are just better than
from your hash() function, for the inputs I used, within this test, and
for this size of table.

No doubt there are much better ones.



------------------------------------------
#include <stdio.h>
#include <stdlib.h>

static unsigned int hash(void *address)
{
    int i;
    unsigned long answer = 2166136261;
    unsigned char *byte = (unsigned char *) &address;

    for (i = 0; i < sizeof(void *); i++)
    {
        answer *= 16777619;
        answer ^= byte[i];
    }
    return (unsigned int) (answer & 0xFFFFFFFF);
}

void* table[65536];

int main(void) {
    void* p;
    int clashes=0, wrapped;
    int j;

    for (int i=0; i<30000; ++i) {
        p = malloc(1);
        j = hash(p) & 65535;

        wrapped=0;
        while (table[j]) {
            ++clashes;
            ++j;
            if (j>65535) {
                if (wrapped) { puts("Table full"); exit(1);}
                wrapped=1;
                j=0;
            }
        }
        table[j] = p;
    }
    printf("Clashes %d\n", clashes);
}


------------------------------------------

static unsigned int bchash(void *address)
{
    int i;
    unsigned long hsum = 0;
    unsigned char *byte = (unsigned char *) &address;

    for (i = 0; i < sizeof(void *); i++) {
        hsum = (hsum<<4) - hsum + byte[i];
    }
    return (hsum<<5) - hsum;
}
Tim Rentsch
2024-05-25 09:12:28 UTC
Permalink
Post by bart
Post by Malcolm McLean
Post by Bonita Montero
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
The pointers are nodes of a tree, which are read only, and I want
to associate read/write data with them. So potentially a large
number of pointers, and they might be consecutively ordered if they
are taken from an array, or they might be returned from repeated
calls to malloc() with small allocations. Obviously I have no
control over pointer size or internal representation.
Use FNV.
Here's an attempt.
/* FNV hash of a pointer */
static unsigned int hash(void *address)
{
int i;
unsigned long answer = 2166136261;
unsigned char *byte = (unsigned char *) &address;
for (i = 0; i < sizeof(void *); i++)
{
answer *= 16777619;
answer ^= byte[i];
}
return (unsigned int) (answer & 0xFFFFFFFF);
}
Now what will compilers make of that?
Compiler, or performance?
I tried this function with the test program shown below. I used it to
populate a hash table of 64K entries with pointers from successive
calls to malloc.
Results, in terms of clashes, for different numbers N of entries
10000 1100
30000 12000
50000 67000
60000 216000
65535 5500000 (largest N possible)
Results were rather variable, as malloc produces different patterns of
pointers on different runs. These were simply the result from the
first run.
Was this good? I'd no idea. But as a comparison, I used my own hash
function, normally used to hash identifiers, shown below the main
program as the function 'bchash'.
10000 230
30000 3800
50000 10300
60000 50300
65535 2700000
Hash tables need a certain amount of free capacity to stay efficient,
so 3/4 full (about 50K/64K) is about right.
Again I don't know if these figures are good, they are just better
than from your hash() function, for the inputs I used, within this
test, and for this size of table.
No doubt there are much better ones.
------------------------------------------
#include <stdio.h>
#include <stdlib.h>
static unsigned int hash(void *address)
{
int i;
unsigned long answer = 2166136261;
unsigned char *byte = (unsigned char *) &address;
for (i = 0; i < sizeof(void *); i++)
{
answer *= 16777619;
answer ^= byte[i];
}
return (unsigned int) (answer & 0xFFFFFFFF);
}
void* table[65536];
int main(void) {
void* p;
int clashes=0, wrapped;
int j;
for (int i=0; i<30000; ++i) {
p = malloc(1);
j = hash(p) & 65535;
wrapped=0;
while (table[j]) {
++clashes;
++j;
if (j>65535) {
if (wrapped) { puts("Table full"); exit(1);}
wrapped=1;
j=0;
}
}
table[j] = p;
}
printf("Clashes %d\n", clashes);
}
------------------------------------------
static unsigned int bchash(void *address)
{
int i;
unsigned long hsum = 0;
unsigned char *byte = (unsigned char *) &address;
for (i = 0; i < sizeof(void *); i++) {
hsum = (hsum<<4) - hsum + byte[i];
}
return (hsum<<5) - hsum;
}
It looks like your hash function was tuned for this testing
setup. With different choices for testing it does much
worse.

The testing is done with malloc() blocks all of the same
requested size, and that size is 1. Typical workloads
are likely to be both larger and more variable.

When adding a new entry finds a collision with an old
entry, linear probing is used to find a free slot. It's
well understood that linear probing suffers badly as the
load factor goes up. Better to take a few high bits of
the hash value -- as few as five or six is fine -- to
have the reprobe sequences have different strides.
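That stride suggestion can be sketched like this (a hedged sketch; the
table size and the 6-bit stride width are my choices, matching the
65536-slot table in bart's test program):

```c
#include <assert.h>

#define TABLE_SIZE 65536u   /* power of two, as in bart's test program */

/* Hedged sketch of the reprobing Tim describes: the initial slot
   comes from the low bits of the hash, the stride from a few high
   bits.  The stride is forced odd so it is coprime to the
   power-of-two table size and the probe sequence visits every slot. */
static unsigned probe_slot(unsigned hash, unsigned attempt)
{
    unsigned slot   = hash & (TABLE_SIZE - 1);
    unsigned stride = ((hash >> 26) << 1) | 1u;  /* top 6 bits of a 32-bit hash, forced odd */
    return (slot + attempt * stride) & (TABLE_SIZE - 1);
}
```

Attempt 0 gives the plain slot; later attempts step by a hash-dependent
odd stride, so two keys that collide on the first probe usually diverge
immediately.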

Your hash function is expensive to compute, moreso even
than the "FNV" function shown earlier. In a case like
this one where the compares are cheap, it's better to
have a dumb-but-fast hash function that might need a
few more looks to find an open slot, because the cost
of looking is so cheap compared to computing the hash
function.
bart
2024-05-25 11:28:49 UTC
Permalink
Post by Tim Rentsch
It looks like your hash function was tuned for this testing
setup. With different choices for testing it does much
worse.
It wasn't tuned at all; it was a copy of what I've long used within my
lexers.

It is designed to give a good spread when inputs are similar or change
sequentially. (Well, not exactly designed; more trial and error.)
Post by Tim Rentsch
The testing is done with malloc() blocks all of the same
requested size, and that size is 1. Typical workloads
are likely to be both larger and more variable.
When adding a new entry finds a collision with an old
entry, linear probing is used to find a free slot. It's
well understood that linear probing suffers badly as the
load factor goes up. Better to take a few high bits of
the hash value -- as few as five or six is fine -- to
have the reprobe sequences have different strides.
Your hash function is expensive to compute, moreso even
than the "FNV" function shown earlier.
You mean Malcolm's version? That also uses a loop. Neither needs to
process all 8 bytes of a 64-bit pointer; 6 will do, and probably just 4.
And such a loop can be trivially unrolled.

In a case like
Post by Tim Rentsch
this one where the compares are cheap, it's better to
have a dumb-but-fast hash function that might need a
few more looks to find an open slot, because the cost
of looking is so cheap compared to computing the hash
function.
My first thought in this thread was to just make use of the pointer
value directly. My version was written only to compare against Malcolm's
FNV code.

If I try the other suggestions, RH's and KK's which simply shift the
pointer value (similar to my idea), the results I got were interesting:
sometimes there were zero clashes (almost as though bits 4 to 19 of the
pointer were a perfect hash), but sometimes there were millions.

But when I tried to mix up the input values, then all methods seemed to
converge to similar results.

In that case you might as well use the simplest and fastest method.

However it really needs testing within the actual application.
Tim Rentsch
2024-05-25 18:12:43 UTC
Permalink
Post by Tim Rentsch
[...]
It looks like your hash function was tuned for this testing
setup. With different choices for testing it does much
worse.
It wasn't tuned at all; it was a copy of what I've long used
within my lexers.
It is designed to give a good spread when inputs are similar or
change sequentially. (Well, not exactly designed; more trial
and error.)
The trial and error process you mention had the effect of tuning
your function for the testing setup used, whether you meant it to
or not.
Post by Tim Rentsch
The testing is done with malloc() blocks all of the same
requested size, and that size is 1. Typical workloads
are likely to be both larger and more variable.
When adding a new entry finds a collision with an old
entry, linear probing is used to find a free slot. It's
well understood that linear probing suffers badly as the
load factor goes up. Better to take a few high bits of
the hash value -- as few as five or six is fine -- to
have the reprobe sequences have different strides.
Your hash function is expensive to compute, moreso even
than the "FNV" function shown earlier.
You mean Malcolm's version? That also uses a loop. Neither
need to process all 8 bytes of a 64-bit pointer; 6 will do, and
probably just 4.
And such a loop can be trivially unrolled.
Both your function and the function posted by Malcolm are slow.
It's just that yours is slower. It isn't hard to produce a
hash function of equal quality that runs more than twice as
fast.
Post by Tim Rentsch
In a case like
this one where the compares are cheap, it's better to
have a dumb-but-fast hash function that might need a
few more looks to find an open slot, because the cost
of looking is so cheap compared to computing the hash
function.
My first thought in this thread was to just make use of the
pointer value directly. My version was written only to compare
against Malcolm's FNV code.
If I try the other suggestions, RH's and KK's which simply
shift the pointer value (similar to my idea), the results I got
were interesting: sometimes there were zero clashes (almost as
though bits 4 to 19 of the pointer were a perfect hash), but
sometimes there were millions.
But when I tried to mix up the input values, then all methods
seemed to converge to similar results.
In that case you might as well use the simplest and fastest
method.
However it really needs testing within the actual application.
Let me tell you a little secret. Hash functions are like random
number generators: good ones never do exceptionally well or
exceptionally poorly. The reason for this is simple - whenever
there are scenarios where one does well then inevitably there are
other scenarios where it does poorly. A really high quality hash
function will never do badly, no matter what mix of inputs is
given. The key takeaway here is to try as many different input
sets as you can, but don't look for candidates that do well;
rather, throw away any that do poorly on any input mix. The ones
that are left are all, with high probability, just fine no matter
what application they are used in. That is not to say there is
no need to test within the intended application, it's always good
to do a sanity check, but the point of the sanity check is just
to make sure the earlier process didn't miss something; it isn't
the right way to compare alternatives.
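The reject-the-bad-ones process Tim describes can be mechanised: bucket
the hash values from one input mix and score uniformity with a
chi-squared ratio. This is a sketch of my own, not Tim's actual test
harness; a good hash stays near 1.0 on every mix, and any candidate
that strays far on some mix gets thrown away.

```c
#include <stddef.h>

/* Score one input mix: count bucket loads for the given hash values
   and return the chi-squared statistic divided by its degrees of
   freedom.  A uniform hash gives a ratio near 1.0; clumpy hashes
   give much larger values. */
static double chi2_ratio(const unsigned *hashes, size_t n, unsigned nbuckets)
{
    unsigned load[4096] = {0};          /* assumes nbuckets <= 4096 */
    double expect = (double)n / nbuckets, chi2 = 0.0;

    for (size_t i = 0; i < n; i++)
        load[hashes[i] % nbuckets]++;
    for (unsigned b = 0; b < nbuckets; b++) {
        double d = (double)load[b] - expect;
        chi2 += d * d / expect;
    }
    return chi2 / (nbuckets - 1);
}
```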
bart
2024-05-25 19:31:22 UTC
Permalink
Post by Tim Rentsch
Post by Tim Rentsch
[...]
It looks like your hash function was tuned for this testing
setup. With different choices for testing it does much
worse.
It wasn't tuned at all; it was a copy of what I've long used
within my lexers.
It is designed to give a good spread when inputs are similar or
change sequentially. (Well, not exactly designed; more trial
and error.)
The trial and error process you mention had the effect of tuning
your function for the testing setup used, whether you meant it to
or not.
Post by Tim Rentsch
The testing is done with malloc() blocks all of the same
requested size, and that size is 1. Typical workloads
are likely to be both larger and more variable.
When adding a new entry finds a collision with an old
entry, linear probing is used to find a free slot. It's
well understood that linear probing suffers badly as the
load factor goes up. Better to take a few high bits of
the hash value -- as few as five or six is fine -- to
have the reprobe sequences have different strides.
Your hash function is expensive to compute, moreso even
than the "FNV" function shown earlier.
You mean Malcolm's version? That also uses a loop. Neither
need to process all 8 bytes of a 64-bit pointer; 6 will do, and
probably just 4.
And such a loop can be trivially unrolled.
Both your function and the function posted by Malcolm are slow.
It's just that yours is slower.
With unoptimised code, mine is up to 50% faster (suggesting an
inherently faster method).

Optimised via gcc -O3, it's 17% slower.

If I duplicate the tests in my language, which has a meagre optimiser,
then mine is 3 times the speed (I haven't looked at why; something about
the MM code that keeps locals out of registers I guess):

Unopt gcc-O3 My lang opt (uses 64 bits)

MM hash 2.7 0.34 1.9 seconds
BC hash 1.7 0.4 0.6

There are anyway any number of tweaks that can be made; no need to do
all 64 bits for a start, and the loops can be manually unrolled.
Suggesting it is slower is premature.
Post by Tim Rentsch
It isn't hard to produce a
hash function of equal quality that runs more than twice as
fast.
With or without the aid of an optimising compiler? You can't always tell
with the latter whether it's you who's clever, or the compiler.
Post by Tim Rentsch
Let me tell you a little secret. Hash functions are like random
number generators: good ones never do exceptionally well or
exceptionally poorly. The reason for this is simple - whenever
there are scenarios where one does well then inevitably there are
other scenarios where it does poorly. [...] The ones
that are left are all, with high probability, just fine no matter
what application they are used in. That is not to say there is
no need to test within the intended application, it's always good
to do a sanity check, but the point of the sanity check is just
to make sure the earlier process didn't miss something; it isn't
the right way to compare alternatives.
So what's a good one for use within the identifier lookup of a
compiler's lexer?
Tim Rentsch
2024-05-26 05:54:59 UTC
Permalink
Post by bart
Post by Tim Rentsch
Post by Tim Rentsch
[...]
It looks like your hash function was tuned for this testing
setup. With different choices for testing it does much
worse.
It wasn't tuned at all; it was a copy of what I've long used
within my lexers.
It is designed to give a good spread when inputs are similar or
change sequentially. (Well, not exactly designed; more trial
and error.)
The trial and error process you mention had the effect of tuning
your function for the testing setup used, whether you meant it to
or not.
Post by Tim Rentsch
The testing is done with malloc() blocks all of the same
requested size, and that size is 1. Typical workloads
are likely to be both larger and more variable.
When adding a new entry finds a collision with an old
entry, linear probing is used to find a free slot. It's
well understood that linear probing suffers badly as the
load factor goes up. Better to take a few high bits of
the hash value -- as few as five or six is fine -- to
have the reprobe sequences have different strides.
Your hash function is expensive to compute, moreso even
than the "FNV" function shown earlier.
You mean Malcolm's version? That also uses a loop. Neither
need to process all 8 bytes of a 64-bit pointer; 6 will do, and
probably just 4.
And such a loop can be trivially unrolled.
Both your function and the function posted by Malcolm are slow.
It's just that yours is slower.
With unoptimised code, mine is up to 50% faster (suggesting an
inherently faster method).
Optimised via gcc -O3, it's 17% slower.
On my test platform the Malcolm code runs between dead even up
to about 44% faster, depending on optimization level, except
for -O0 where it runs about 12% slower. Note that these results
represent only one measurement of each case, run on only one
target environment.
Post by bart
If I duplicate the tests in my language, which has a meagre optimiser,
then mine is 3 times the speed (I haven't looked at why; something
Unopt gcc-O3 My lang opt (uses 64 bits)
MM hash 2.7 0.34 1.9 seconds
BC hash 1.7 0.4 0.6
There are anyway any number of tweaks that can be made; no need to do
all 64 bits for a start, and the loops can be manually
unrolled. Suggesting it is slower is premature.
I am simply reporting results of empirical observations made on
my test system. Obviously different environments might produce
different results.
Post by bart
Post by Tim Rentsch
It isn't hard to produce a
hash function of equal quality that runs more than twice as
fast.
With or without the aid of an optimising compiler? You can't always
tell with the latter whether it's you who's clever, or the compiler.
As long as both functions are run at the same level of optimization
I don't think it matters much. For the sake of concreteness we can
stipulate code quality comparable to gcc -O1 if that helps you.
Post by bart
Post by Tim Rentsch
Let me tell you a little secret. Hash functions are like random
number generators: good ones never do exceptionally well or
exceptionally poorly. The reason for this is simple - whenever
there are scenarios where one does well then inevitably there are
other scenarios where it does poorly. [...] The ones
that are left are all, with high probability, just fine no matter
what application they are used in. That is not to say there is
no need to test within the intended application, it's always good
to do a sanity check, but the point of the sanity check is just
to make sure the earlier process didn't miss something; it isn't
the right way to compare alternatives.
So what's a good one for use within the identifier lookup of a
compiler's lexer?
That depends a lot on what sort of lookup is done and on the
degree to which the hash computation is integrated into the
lexing code. How many entries will the hash table have?
That also makes a difference. If the hash table is never
more than 50% full then we can expect most lookups will
succeed on the first try and almost all will succeed in
no more than two tries.
Bonita Montero
2024-05-25 15:00:11 UTC
Permalink
Post by Tim Rentsch
Your hash function is expensive to compute, moreso even
than the "FNV" function shown earlier. In a case like
this one where the compares are cheap, it's better to
have a dumb-but-fast hash function that might need a
few more looks to find an open slot, because the cost
of looking is so cheap compared to computing the hash
function.
A (size_t)pointer * LARGE_PRIME can be sufficient,
ignoring the overflow.
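If the low bits of the product are used directly, this scheme inherits
the pointer's zero low bits; the usual repair is to keep the product's
top bits instead. A hedged sketch of multiplicative (Fibonacci)
hashing; the constant (2^64/phi) and the function name are mine, not
from the thread:

```c
#include <stdint.h>

/* Multiplicative (Fibonacci) hashing sketch.  Multiplying cannot set
   bits below the input's lowest set bit, so the product's *top* bits
   are returned rather than its bottom bits; they are well mixed even
   when the pointer's low bits are all zero. */
static unsigned fib_hash16(uintptr_t p)
{
    return (unsigned)((p * 11400714819323198485ull) >> 48);  /* 16-bit result */
}
```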
Tim Rentsch
2024-05-25 17:40:22 UTC
Permalink
Post by Bonita Montero
Post by Tim Rentsch
Your hash function is expensive to compute, moreso even
than the "FNV" function shown earlier. In a case like
this one where the compares are cheap, it's better to
have a dumb-but-fast hash function that might need a
few more looks to find an open slot, because the cost
of looking is so cheap compared to computing the hash
function.
A (size_t)pointer * LARGE_PRIME can be sufficient,
ignoring the overflow.
Plenty fast but the output quality is poor. I tested
this scheme against four other hash functions, and in
four out of five workloads it was always dead last, by
a noticeable margin.
Malcolm McLean
2024-05-25 17:56:29 UTC
Permalink
Post by Tim Rentsch
Post by Bonita Montero
Post by Tim Rentsch
Your hash function is expensive to compute, moreso even
than the "FNV" function shown earlier. In a case like
this one where the compares are cheap, it's better to
have a dumb-but-fast hash function that might need a
few more looks to find an open slot, because the cost
of looking is so cheap compared to computing the hash
function.
A (size_t)pointer * LARGE_PRIME can be sufficient,
ignoring the overflow.
Plenty fast but the output quality is poor. I tested
this scheme against four other hash functions, and in
four out of five workloads it was always dead last, by
a noticeable margin.
The lower bits of a pointer are often all zeroes, and multiplying by any
integer will not set them. That is catastrophic for a hash.
--
Check out Basic Algorithms and my other books:
https://www.lulu.com/spotlight/bgy1mm
Tim Rentsch
2024-05-25 18:23:55 UTC
Permalink
Post by Tim Rentsch
Post by Bonita Montero
Post by Tim Rentsch
Your hash function is expensive to compute, moreso even
than the "FNV" function shown earlier. In a case like
this one where the compares are cheap, it's better to
have a dumb-but-fast hash function that might need a
few more looks to find an open slot, because the cost
of looking is so cheap compared to computing the hash
function.
A (size_t)pointer * LARGE_PRIME can be sufficient,
ignoring the overflow.
Plenty fast but the output quality is poor. I tested
this scheme against four other hash functions, and in
four out of five workloads it was always dead last, by
a noticeable margin.
The lower bits of a pointer are often all zeroes. And multiplying
by any integer will not set them. And that is catastrophic for a
hash.
It can be but it doesn't have to be. It depends on how the hash
value is used to determine a place or places to look in the
table.
Janis Papanagnou
2024-05-25 21:13:31 UTC
Permalink
Post by Malcolm McLean
Post by Tim Rentsch
Post by Bonita Montero
Post by Tim Rentsch
Your hash function is expensive to compute, moreso even
than the "FNV" function shown earlier. In a case like
this one where the compares are cheap, it's better to
have a dumb-but-fast hash function that might need a
few more looks to find an open slot, because the cost
of looking is so cheap compared to computing the hash
function.
A (size_t)pointer * LARGE_PRIME can be sufficient,
ignoring the overflow.
Plenty fast but the output quality is poor. I tested
this scheme against four other hash functions, and in
four out of five workloads it was always dead last, by
a noticeable margin.
The lower bits of a pointer are often all zeroes. And multiplying by any
integer will not set them. And that is catastrophic for a hash.
I haven't read all the thread, but if I'd be going to tackle that
I'd first try to map the pointer onto a fitting integral type (if
that's possible; sizeof()), or first "fold" (XOR) the upper/lower
parts of the pointer (so that zero-sequences, if they appear, also
won't affect the outcome too badly as a side effect) to reduce the
size of the data, and finally use any established hash algorithm
on the (folded/reduced) integral type.
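That folding step might look like this (my reading of the idea, hedged;
the function name is mine):

```c
#include <stdint.h>

/* XOR the upper half of the pointer's integer value onto the lower
   half, so that identical high bits and zeroed low bits both still
   influence the result, then feed the folded value to any ordinary
   integer hash.  Call as fold_ptr_bits((uintptr_t)p). */
static uint32_t fold_ptr_bits(uintptr_t v)
{
#if UINTPTR_MAX > 0xFFFFFFFFu
    v ^= v >> 32;               /* fold 64-bit pointers down to 32 bits */
#endif
    return (uint32_t)v;
}
```

The #if guard keeps the shift legal on platforms where uintptr_t is
only 32 bits wide.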

Shall the created hash codes also be portable across platforms?

Janis
Malcolm McLean
2024-05-25 22:07:45 UTC
Permalink
Post by Janis Papanagnou
Post by Malcolm McLean
Post by Tim Rentsch
Post by Bonita Montero
Post by Tim Rentsch
Your hash function is expensive to compute, moreso even
than the "FNV" function shown earlier. In a case like
this one where the compares are cheap, it's better to
have a dumb-but-fast hash function that might need a
few more looks to find an open slot, because the cost
of looking is so cheap compared to computing the hash
function.
A (size_t)pointer * LARGE_PRIME can be sufficient,
ignoring the overflow.
Plenty fast but the output quality is poor. I tested
this scheme against four other hash functions, and in
four out of five workloads it was always dead last, by
a noticeable margin.
The lower bits of a pointer are often all zeroes. And mutlipying by any
integer will not set them. And that is catastrophic for a hash.
I haven't read all the thread, but if I'd be going to tackle that
I'd first try to map the pointer onto a fitting integral type (if
that's possible; sizeof()), or first "fold" (XOR) the upper/lower
parts of the pointer (so that zero-sequences, if they appear, also
won't affect the outcome too badly as a side effect) to reduce the
size of the data, and finally use any established hash algorithm
on the (folded/reduced) integral type.
Shall the created hash codes also be portable across platforms?
Yes, the Baby X resource compiler should be portable to any hosted
platform with a reasonably up-to-date C compiler.

But we also need portability in another sense. The hash table is hashing
nodes of an XML document model created by the functions in the file
xmlparser2.c. These make repeated calls to malloc() for small
allocations of nodes. But I have no control over either pointer
representation or the platform's implementation of malloc(). So the
pointers could be at seemingly random locations in memory, or they could
be evenly spaced at increments of 64 bytes, and thus differ in only a
few bits. Then if someone wants to help out with this project, an
obvious thing to do would be to speed up the xml parser by replacing the
calls to malloc() with a custom allocator. All the XPath functions are
in the file xpath.c. They shouldn't suddenly hit the treacle because
someone has changed the memory allocation system in xmlparser2.c.
--
Check out Basic Algorithms and my other books:
https://www.lulu.com/spotlight/bgy1mm
bart
2024-05-25 22:42:13 UTC
Permalink
Post by Malcolm McLean
Post by Janis Papanagnou
I haven't read all the thread, but if I'd be going to tackle that
I'd first try to map the pointer onto a fitting integral type (if
that's possible; sizeof()), or first "fold" (XOR) the upper/lower
parts of the pointer (so that zero-sequences, if they appear, also
won't affect the outcome too badly as a side effect) to reduce the
size of the data, and finally use any established hash algorithm
on the (folded/reduced) integral type.
Shall the created hash codes also be portable across platforms?
Yes, the Baby X resource compiler should be portable to any hosted
platform with a reasonable up to date C compiler.
But we also need portability in another sense. The hash table is hashing
nodes of an XML document model created by the functions in the file
xmlparser2.c. These make repeated calls to malloc() for small
allocations of nodes. But I have no control over either pointer
representation or the platform's implementation of malloc(). So the
pointers could be at seemingly random locations in memory, or they could
be evenly spaced at increments of 64 bytes, and thus differing in only a
few bits.
OK, parsing is a task I'm familiar with. Parsing a normal language into
a syntax tree doesn't sound too different from parsing XML into a tree
of nodes.

Such a task can be done at up to several million lines per second for
source code. Are you getting throughput considerably below that?

It's not clear to me what the hashing is for on the nodes. What exactly
is the input being looked up, and what is being compared? Because
something sounds off: you don't normally look up a pointer to try and
find it in a table; you already have the pointer pointing at the data!
Post by Malcolm McLean
Then if someone wants to help out with this project, an
obvious thing to do would be to speed up the xml parser by replacing the
calls to malloc() with a custom allocator.
If that's a bottleneck (you need to establish that; perhaps use a double
call to malloc, and see if it slows down significantly), then the custom
allocator is simple: just allocate from a pool. It's simpler still if
you don't need to free that memory before the program ends.
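Such a pool allocator can be sketched in a dozen lines (hedged; this is
not Baby X code, and it deliberately never frees individual nodes, on
the stated assumption that the memory lives until program exit):

```c
#include <stdlib.h>
#include <stddef.h>

/* Minimal bump ("pool") allocator: nodes are carved sequentially
   from large malloc'd blocks.  Exhausted blocks are simply
   abandoned, which is fine when everything lives to program end. */
#define POOL_BLOCK 65536u

static char  *pool_ptr;
static size_t pool_left;

static void *pool_alloc(size_t n)
{
    n = (n + 15) & ~(size_t)15;        /* keep 16-byte alignment */
    if (n > pool_left) {
        size_t blk = n > POOL_BLOCK ? n : POOL_BLOCK;
        pool_ptr = malloc(blk);        /* earlier blocks stay allocated */
        if (pool_ptr == NULL) return NULL;
        pool_left = blk;
    }
    void *r = pool_ptr;
    pool_ptr  += n;
    pool_left -= n;
    return r;
}
```

Replacing per-node malloc() calls with pool_alloc() also tends to make
the node addresses evenly spaced, which is exactly the input pattern
the pointer hash has to cope with.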
Richard Harnden
2024-05-26 18:58:31 UTC
Permalink
Post by bart
It's not clear to me what the hashing is for on the nodes. What exactly
is the input being looked up, and what is being compared? Because
something sounds off: you don't normally look up a pointer to try and
find it in a table; you already have the pointer pointing at the data!
This.

Are you really hashing the pointer?
Kaz Kylheku
2024-05-26 22:42:08 UTC
Permalink
Post by bart
It's not clear to me what the hashing is for on the nodes. What exactly
is the input being looked up, and what is being compared? Because
something sounds off: you don't normally look up a pointer to try and
find it in a table; you already have the pointer pointing at the data!
This happens when you need to associate objects with external
information which is not in those objects.

One such situation is when you replace use of strings as hash keys with
interned strings (atoms). Atoms are typically pointers and so hash that way.

This is a smart thing to do because atoms are faster to work with.
Comparing whether two values are the same atom is just a pointer
comparison, and hashing the atom is just a pointer hash: much faster
than a string hash, and with more easily assured good behavior.

The interning mechanism maintains its own hash table which maps the
strings to the atoms. By that mechanism, two identical strings are
mapped to the same atom. That table is only used at the boundaries of
the system when external data is read. The data contains identifiers
which are interned, and then they are manipulated as atoms.

Lisp languages have a symbol type, which is a kind of interned
string. Symbols are regularly used as keys into dictionary structures
like hash tables or assocation lists.

Certain important properties are put into the symbol itself, so
that no hashing is required to find them.

In classical Lisps, the global value of a variable named by a symbol
is stored in the symbol itself. The property list of a symbol is also
stored in the symbol.
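The side-table pattern Kaz describes can be sketched in a few lines (my
own hedged sketch; the names put/get are mine, and the mixing step is
only an illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Read-only objects are keyed by their addresses in a small
   open-addressing side table that carries the external data. */
#define SLOTS 1024u                     /* power of two */

struct assoc { const void *key; int value; };
static struct assoc assoc_table[SLOTS];
static int demo_a, demo_b;              /* stand-ins for tree nodes */

static unsigned slot_of(const void *p)
{
    uintptr_t h = (uintptr_t)p;
    h ^= h >> 16;                       /* cheap mix; an assumption */
    return (unsigned)(h & (SLOTS - 1));
}

static int put(const void *key, int value)
{
    unsigned i = slot_of(key);
    while (assoc_table[i].key && assoc_table[i].key != key)
        i = (i + 1) & (SLOTS - 1);      /* linear probe */
    assoc_table[i].key = key;
    assoc_table[i].value = value;
    return value;
}

static int get(const void *key)
{
    unsigned i = slot_of(key);
    while (assoc_table[i].key != key)
        i = (i + 1) & (SLOTS - 1);      /* assumes key was inserted */
    return assoc_table[i].value;
}
```

Lookup and equality are plain pointer operations, which is the speed
advantage over string keys that Kaz's post describes.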
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Bonita Montero
2024-05-26 16:05:03 UTC
Permalink
Post by Malcolm McLean
Post by Bonita Montero
Post by Tim Rentsch
Your hash function is expensive to compute, moreso even
than the "FNV" function shown earlier.  In a case like
this one where the compares are cheap, it's better to
have a dumb-but-fast hash function that might need a
few more looks to find an open slot, because the cost
of looking is so cheap compared to computing the hash
function.
A (size_t)pointer * LARGE_PRIME can be sufficient,
ignoring the overflow.
Plenty fast but the output quality is poor.  I tested
this scheme against four other hash functions, and in
four out of five workloads it was always dead last, by
a noticeable margin.
The lower bits of a pointer are often all zeroes. And multiplying by
any integer will not set them. And that is catastrophic for a hash.
If you take a large prime they will be scrambled, and if the prime is
large enough there will be a good, even distribution.
Bonita Montero
2024-05-26 16:07:32 UTC
Permalink
Post by Bonita Montero
Post by Malcolm McLean
Post by Bonita Montero
Post by Tim Rentsch
Your hash function is expensive to compute, moreso even
than the "FNV" function shown earlier.  In a case like
this one where the compares are cheap, it's better to
have a dumb-but-fast hash function that might need a
few more looks to find an open slot, because the cost
of looking is so cheap compared to computing the hash
function.
A (size_t)pointer * LARGE_PRIME can be sufficient,
ignoring the overflow.
Plenty fast but the output quality is poor.  I tested
this scheme against four other hash functions, and in
four out of five workloads it was always dead last, by
a noticeable margin.
The lower bits of a pointer are often all zeroes. And multiplying by
any integer will not set them. And that is catastrophic for a hash.
If you take a large integer they will be scrambled and if the
prime is large enough there will be a good equal distribution.
If you've got a 64-bit uintptr_t and take the largest prime that
fits into 64 bits - 18446744073709551557 - the distribution
should be perfectly even.
Bonita Montero
2024-05-26 16:04:13 UTC
Permalink
Post by Bonita Montero
Post by Tim Rentsch
Your hash function is expensive to compute, moreso even
than the "FNV" function shown earlier. In a case like
this one where the compares are cheap, it's better to
have a dumb-but-fast hash function that might need a
few more looks to find an open slot, because the cost
of looking is so cheap compared to computing the hash
function.
A (size_t)pointer * LARGE_PRIME can be sufficient,
ignoring the overflow.
Plenty fast but the output quality is poor. ...
If the prime is large enough there's no regular distribution.
Tim Rentsch
2024-05-26 16:24:58 UTC
Permalink
Post by Bonita Montero
Post by Bonita Montero
Post by Tim Rentsch
Your hash function is expensive to compute, moreso even
than the "FNV" function shown earlier. In a case like
this one where the compares are cheap, it's better to
have a dumb-but-fast hash function that might need a
few more looks to find an open slot, because the cost
of looking is so cheap compared to computing the hash
function.
A (size_t)pointer * LARGE_PRIME can be sufficient,
ignoring the overflow.
Plenty fast but the output quality is poor. ...
If the prime is large enough there's no regular distribution.
Your conjecture is contradicted by empirical facts.
Bonita Montero
2024-05-26 16:36:25 UTC
Permalink
Post by Tim Rentsch
Post by Bonita Montero
Post by Bonita Montero
Post by Tim Rentsch
Your hash function is expensive to compute, moreso even
than the "FNV" function shown earlier. In a case like
this one where the compares are cheap, it's better to
have a dumb-but-fast hash function that might need a
few more looks to find an open slot, because the cost
of looking is so cheap compared to computing the hash
function.
A (size_t)pointer * LARGE_PRIME can be sufficient,
ignoring the overflow.
Plenty fast but the output quality is poor. ...
If the prime is large enough there's no regular distribution.
Your conjecture is contradicted by empirical facts.
There are no empirical facts for that, since two times the
chosen prime is beyond the 64-bit address space.
Tim Rentsch
2024-05-26 17:20:55 UTC
Permalink
Post by Bonita Montero
Post by Tim Rentsch
Post by Bonita Montero
Post by Bonita Montero
Post by Tim Rentsch
Your hash function is expensive to compute, moreso even
than the "FNV" function shown earlier. In a case like
this one where the compares are cheap, it's better to
have a dumb-but-fast hash function that might need a
few more looks to find an open slot, because the cost
of looking is so cheap compared to computing the hash
function.
A (size_t)pointer * LARGE_PRIME can be sufficient,
ignoring the overflow.
Plenty fast but the output quality is poor. ...
If the prime is large enough there's no regular distribution.
Your conjecture is contradicted by empirical facts.
There are no empirical facts for that since two times the
taken prime is beyond the 64 bit address space.
You don't listen very well do you?

I say the output quality is poor because I have run tests that
show the poor output quality. I've done that with a prime of my
own choosing and also with 18446744073709551557, the value you
suggested. In both cases the test runs show results that are
clearly worse than all the other hash functions tested, including
bart's and malcolm's. Furthermore the results for your suggested
calculation are worse across the board, on every variety of
dynamic workload in my test suite.

Your proposed hash function is too weak to be taken as a serious
candidate.
Bonita Montero
2024-05-26 17:39:14 UTC
Permalink
Post by Tim Rentsch
I say the output quality is poor because I have run tests that
show the poor output quality.
If you choose a prime whose double is beyond 64 bits there's an
equal distribution among the 64 bit modulos.
Post by Tim Rentsch
I've done that with a prime of my own choosing and also with
18446744073709551557, the value you suggested.
Show me your code.
Bonita Montero
2024-05-26 17:54:17 UTC
Permalink
Post by Bonita Montero
Post by Tim Rentsch
I say the output quality is poor because I have run tests that
show the poor output quality.
If you choose a prime whose double is beyond 64 bits there's an
equal distribution among the 64 bit modulos.
Post by Tim Rentsch
I've done that with a prime of my own choosing and also with
18446744073709551557, the value you suggested.
Show me your code.
These are the bucket loads of my hash algorithm for inputs that are
multiples of eight, i.e. uint64_t-aligned data:

#include <iostream>
#include <vector>

using namespace std;

int main()
{
vector<size_t> buckets( (1ull << 30) / sizeof(size_t) );
for( size_t b = buckets.size(); b; )
++buckets[(size_t)(--b * 8u * 0xFFFFFFFFFFFFFFc5u) % buckets.size()];
vector<size_t> loads;
for( size_t f : buckets )
{
if( loads.size() <= f )
loads.resize( f + 1 );
++loads[f];
}
for( size_t l = 0; size_t ld : loads )
cout << ++l << ": " << ld << endl;
}

1: 117440512
2: 0
3: 0
4: 0
5: 0
6: 0
7: 0
8: 0
9: 16777216
Bonita Montero
2024-05-27 06:07:00 UTC
Permalink
This shows a perfectly equal distribution if I take a 32-bit integer
and multiply it by the largest 32-bit prime:

#include <iostream>
#include <vector>
#include <cstdlib> // for EXIT_FAILURE

using namespace std;

int main()
{
vector<uint8_t> buckets( 1ull << 32 );
for( size_t b = buckets.size(); b; )
if( !++buckets[(uint32_t)--b * 0xFFFFFFFBu % buckets.size()] )
return EXIT_FAILURE;
vector<size_t> loads;
for( uint8_t ld : buckets )
{
if( loads.size() <= ld )
loads.resize( ld + 1 );
++loads[ld];
}
for( size_t l = 0; size_t ld : loads )
cout << l++ << ": " << 100.0 * (ptrdiff_t)ld /
(ptrdiff_t)buckets.size() << "%" << endl;
}


Result:

0: 0%
1: 100%
Ben Bacarisse
2024-05-28 10:07:53 UTC
Permalink
Post by Bonita Montero
This shows the perfect equal distribution if I take a 32 bit integer
There was no need to write this program. It just illustrates a specific
example of a basic theorem where the multiplier and modulus are
co-prime.
Post by Bonita Montero
#include <iostream>
#include <vector>
#include <random>
using namespace std;
int main()
{
vector<uint8_t> buckets( 1ull << 32 );
for( size_t b = buckets.size(); b; )
if( !++buckets[(uint32_t)--b * 0xFFFFFFFBu % buckets.size()] )
return EXIT_FAILURE;
vector<size_t> loads;
for( uint8_t ld : buckets )
{
if( loads.size() <= ld )
loads.resize( ld + 1 );
++loads[ld];
}
for( size_t l = 0; size_t ld : loads )
cout << l++ << ": " << 100.0 * (ptrdiff_t)ld /
(ptrdiff_t)buckets.size() << "%" << endl;
}
0: 0%
1: 100%
--
Ben.
Bonita Montero
2024-05-30 08:10:23 UTC
Permalink
Post by Ben Bacarisse
Post by Bonita Montero
This shows the perfect equal distribution if I take a 32 bit integer
There was no need to write this program. It just illustrates a specific
example of a basic theorem where the multiplier and modulus are
co-prime.
For Tim it was necessary.
Bonita Montero
2024-05-30 09:27:17 UTC
Permalink
I further improved my code to show something which has nothing to do
with the initial question. If I multiply not an indexed value but a
random value by the prime, I still get a random value. The distribution
of the bucket depths is then 1 / (e * N!), i.e. a Poisson distribution
with mean one (the balls-into-bins limit):

#include <iostream>
#include <vector>
#include <random>
#include <cstdint>
#include <type_traits> // for conditional_t

using namespace std;

int main()
{
constexpr bool SIXTEEN = true;
using width_t = conditional_t<SIXTEEN, uint16_t, uint32_t>;
constexpr width_t PRIME = SIXTEEN ? 65521 : 0xFFFFFFFBu;
constexpr size_t N = (size_t)(width_t)-1 + 1;
vector<uint8_t> buckets;
auto distrib = [&]( auto gen )
{
buckets.resize( 0 );
buckets.resize( N );
for( size_t b = buckets.size(); b; )
if( !++buckets[(uint32_t)gen( (width_t)--b ) * PRIME & (width_t)-1] )
return;
vector<size_t> loads;
for( uint8_t ld : buckets )
{
if( loads.size() <= ld )
loads.resize( ld + 1 );
++loads[ld];
}
for( size_t l = 0; ptrdiff_t ld : loads )
cout << l++ << ": " << 100.0 * ld / N << "%" << endl;
};
distrib( []( width_t i ) { return i; } );
cout << endl;
mt19937_64 mt;
uniform_int_distribution<uint32_t> uid( 0, -1 );
distrib( [&]( width_t ) { return uid( mt ); } );
}
Tim Rentsch
2024-05-31 02:26:55 UTC
Permalink
Post by Bonita Montero
Post by Ben Bacarisse
Post by Bonita Montero
This shows the perfect equal distribution if I take a 32 bit integer
There was no need to write this program. It just illustrates a specific
example of a basic theorem where the multiplier and modulus are
co-prime.
For Tim it was necessary.
No it was not. I already understood this before my posting
of Sunday, May 26.
Tim Rentsch
2024-05-31 02:27:48 UTC
Permalink
Post by Bonita Montero
Post by Tim Rentsch
I say the output quality is poor because I have run tests that
show the poor output quality.
If you choose a prime whose double is beyond 64 bits there's an
equal distribution among the 64 bit modulos.
Post by Tim Rentsch
I've done that with a prime of my own choosing and also with
18446744073709551557, the value you suggested.
Show me your code.
Oh get real. It's not my job to make up for your
laziness.
Michael S
2024-06-02 07:45:06 UTC
Permalink
On Thu, 30 May 2024 19:27:48 -0700
Post by Tim Rentsch
Post by Bonita Montero
Post by Tim Rentsch
I say the output quality is poor because I have run tests that
show the poor output quality.
If you choose a prime whose double is beyond 64 bits there's an
equal distribution among the 64 bit modulos.
Post by Tim Rentsch
I've done that with a prime of my own choosing and also with
18446744073709551557, the value you suggested.
Show me your code.
Oh get real. It's not my job to make up for your
laziness.
So, what were your conclusions?
Ignoring the speed of computation, would something like cryptographic
hash scaled to bucket size be a best hash for this type of application?
Or some sort of less thorough grinding of the bits is better?
Chris M. Thomasson
2024-06-02 19:42:51 UTC
Permalink
Post by Michael S
On Thu, 30 May 2024 19:27:48 -0700
Post by Tim Rentsch
Post by Bonita Montero
Post by Tim Rentsch
I say the output quality is poor because I have run tests that
show the poor output quality.
If you choose a prime whose double is beyond 64 bits there's an
equal distribution among the 64 bit modulos.
Post by Tim Rentsch
I've done that with a prime of my own choosing and also with
18446744073709551557, the value you suggested.
Show me your code.
Oh get real. It's not my job to make up for your
laziness.
So, what were your conclusions?
Ignoring the speed of computation, would something like cryptographic
hash scaled to bucket size be a best hash for this type of application?
Or some sort of less thorough grinding of the bits is better?
Use this one... lol! I laugh because it's not exactly fast... ;^)

http://fractallife247.com/test/hmac_cipher/ver_0_0_0_1?ct_hmac_cipher=eb1e27ed692b3cc98275fd002178baa0cbfc2e520a749d70ba7b4028b83982f4b42fd9d33a0f03e69b37fdbf3b63853b33c822afdf32f9810bdc542690e1bab726db1b9b9787d50432f521aa424296a83c319f2a299492ff72a9b61ae641f164cdbb864e2f1c047c6767a83c7678a0e30d63c935bb7c3326ffb3203179ee08e406851cb8e572c5005d81900ee87281e0743f3e41e64323776330408d76d6eb0cd569144a81cbbfa9b0a3cb94bfea4e25bfd94ec008b30573a2046178b02b9c855ec98a6eaa507d14fece8064
Chris M. Thomasson
2024-06-03 19:35:58 UTC
Permalink
On 6/2/2024 12:42 PM, Chris M. Thomasson wrote:
[...]
Post by Chris M. Thomasson
Use this one... lol! I laugh because it's not exactly fast... ;^)
http://fractallife247.com/test/hmac_cipher/ver_0_0_0_1?ct_hmac_cipher=eb1e27ed692b3cc98275fd002178baa0cbfc2e520a749d70ba7b4028b83982f4b42fd9d33a0f03e69b37fdbf3b63853b33c822afdf32f9810bdc542690e1bab726db1b9b9787d50432f521aa424296a83c319f2a299492ff72a9b61ae641f164cdbb864e2f1c047c6767a83c7678a0e30d63c935bb7c3326ffb3203179ee08e406851cb8e572c5005d81900ee87281e0743f3e41e64323776330408d76d6eb0cd569144a81cbbfa9b0a3cb94bfea4e25bfd94ec008b30573a2046178b02b9c855ec98a6eaa507d14fece8064
FWIW, here is what you should get if you click on the link:

Tim Rentsch
2024-06-02 23:02:05 UTC
Permalink
Post by Michael S
On Thu, 30 May 2024 19:27:48 -0700
Post by Tim Rentsch
Post by Bonita Montero
Post by Tim Rentsch
I say the output quality is poor because I have run tests that
show the poor output quality.
If you choose a prime whose double is beyond 64 bits there's an
equal distribution among the 64 bit modulos.
Post by Tim Rentsch
I've done that with a prime of my own choosing and also with
18446744073709551557, the value you suggested.
Show me your code.
Oh get real. It's not my job to make up for your
laziness.
So, what were your conclusions?
Ignoring the speed of computation, would something like
cryptographic hash scaled to bucket size be a best hash for
this type of application? Or some sort of less thorough
grinding of the bits is better?
Factors that matter:
will hashing be done using rehashing or aggregates (lists
or trees, etc) grouped under the first probe location?
(or both?)

can keys be compared for ordering or just for equality?

how expensive is it to test for key equality?
(a) when the keys are equal
(b) when the keys are not equal

32-bit hash values suffice for (a) re-hashing schemes with
tables up to perhaps 2**30 entries, or (b) schemes that
use direct hashing and bucket aggregation with tables up
to 2**32 slots. Anything bigger should use 64-bit hash
values.

hash functions should always produce a full-size hash value
(either 32 bits or 64 bits). Fitting the hash value to
the size of the table is the responsibility of the hash
table management code, not the hash function.

The goal:
An ideal hash function has the property that changing any
single bit of the input key has a 50% chance of flipping
any output bit, with no correlation between output bit
m and output bit n (m != n) for changing a given input
bit, and no correlation between changes to output bit n
for changing input bit i and changing input bit j (i != j)

Evaluating hash functions:
One way to evaluate a hash function is to measure changes
in output bits as a function of changing various inputs
bits, either exhaustively or statistically, looking for an
ideal distribution as described above. Usually doing this
isn't very helpful, because most hash functions don't come
close to the ideal, and it's hard to interpret results that
are not close to the ideal, even though those hashes may be
perfectly fine in practice

So basically we are left with comparing a candidate hash
function, performance-wise, to other hash functions that
are going to have good performance results. Three obvious
choices for those: a slow but high quality hash such as a
cryptographic hash; using a good random number generator
to produce keys where the hash value is equal to the key
value; comparing against theoretical ideals. Looking at
each in turn:

High-quality-but-slow-hash: it isn't hard to make a hash
function that approximates an ideal hash, as long as we
are willing to have it run like a turtle. Or simply use
one of the many known cryptographic hashes and fold the
output down to a reasonably size hash values (most of
these cryptographic hash functions produce large hash
values, 128 bits and up, which isn't a great fit for
ordinary hash table use). In my recent testing I simply
rolled my own high quality hash, which ran about 10 times
slower than the slowest of the other hashes I looked at.

Using a good RNG to produce keys the same as the hash
value: I expect this method to be self-explanatory.

Comparing against theoretical ideals: it isn't hard to
do a calculation that computes a statistical average for
the two kinds of hashing methods. More specifically,

(a) for re-hashing, compute the average number of
probes taken for a hash table with k out of n
slots already filled, and use that to compute
the average number of probes need to fill the
hash table up to, say, 85%

(b) for direct hashing, compute the average number
of slots occupied after inserting some number
of values representing a fixed fraction of the
table size. For example, if we have a table
with 100,000 slots, if we insert 100,000 values
then on the average about 63% of the slots
should be occupied (very close to 1 - 1/e).

Use one or more of the above comparison values (they
should all be very close) to compare against the
performance of the candidate hash function for each
of the possible workloads. My workload set included:

(a) results of malloc() with fixed sized blocks,
with a size going from 1 to some modest value
(50 or 100)

(b) results of malloc() with randomly chosen sizes
based on some distribution

(c) sequential addresses in an array, varying the
element size between 1 and some modest value
(maybe 50)

(d) results of malloc() with a deterministic set
of sizes. I tried various ones of these, such
as size(i) == i/17 + 1 (filling a table of
65536 entries varying percentages full).

What we would like to see:
for every workload tested, results within 10 or 20%
of the comparison metric used

What we don't want to see (this is important):
any result far away from the comparison metric on any
workload. Importantly this includes results that are
significantly _better_ than the comparison metric.
The reason for this is that doing a lot better on one
test inevitably means doing worse on a different test.
The same is true of random number generators: the
best ones look "averagely random" all the time, never
either "sub random" or "super random".

Of course we always want high quality, but the standards
are higher for (a) direct hashing, and/or (b) cases where
comparing keys is expensive. Note that if comparing keys
is expensive we might choose to store the full hash value
with each table entry, to make "trivial reject" on key
comparison be faster.

Probably much of the above was obvious. I make no apologies for
that. Also it may seem disjointed or unorganized. If so then
sorry about that chief, it's a big topic and there's a lot of
ground to cover, and I tried to hit the most important
highlights.
Michael S
2024-06-03 07:50:05 UTC
Permalink
On Sun, 02 Jun 2024 16:02:05 -0700
Post by Tim Rentsch
Post by Michael S
On Thu, 30 May 2024 19:27:48 -0700
Post by Tim Rentsch
Post by Bonita Montero
Post by Tim Rentsch
I say the output quality is poor because I have run tests that
show the poor output quality.
If you chose a prime whose double is beyond 64 bit there's an
equal distribution among the 64 bit modulos.
Post by Tim Rentsch
I've done that with a prime of my own choosing and also with
18446744073709551557, the value you suggested.
Show me your code.
Oh get real. It's not my job to make up for your
laziness.
So, what were your conclusions?
Ignoring the speed of computation, would something like
cryptographic hash scaled to bucket size be a best hash for
this type of application? Or some sort of less thorough
grinding of the bits is better?
<snip>
Post by Tim Rentsch
Probably much of the above was obvious. I make no apologies for
that. Also it may seem disjointed or unorganized. If so then
sorry about that chief, it's a big topic and there's a lot of
ground to cover, and I tried to hit the most important
highlights.
It sounds like you *postulate* that crypto is an ideal.
I am less interested in axioms and more in your experimental findings.
Tim Rentsch
2024-06-04 01:02:21 UTC
Permalink
Post by Michael S
On Sun, 02 Jun 2024 16:02:05 -0700
Post by Tim Rentsch
Post by Michael S
On Thu, 30 May 2024 19:27:48 -0700
[... concerning quality of hash functions ...]
Post by Michael S
Post by Tim Rentsch
Post by Michael S
So, what were your conclusions?
Ignoring the speed of computation, would something like
cryptographic hash scaled to bucket size be a best hash for
this type of application? Or some sort of less thorough
grinding of the bits is better?
<snip>
Post by Tim Rentsch
Probably much of the above was obvious. I make no apologies for
that. Also it may seem disjointed or unorganized. If so then
sorry about that chief, it's a big topic and there's a lot of
ground to cover, and I tried to hit the most important
highlights.
It sounds like you *postulate* that crypto is an ideal.
I think it's more accurate to say that I _expect_ an established
crypto-quality hash function such as MD5 to be good approximation
to an ideal hash function. I expect that because (a) the design
criteria for such hash functions resembles or includes the design
criteria for a near-ideal hash function, (b) the people who have
done these are smart people with (usually) a fair amount of
experience doing such things, and (c) the established ones have
gone through a lot of use and testing by the user community, who
likely would have found any serious weaknesses.
Post by Michael S
I am less interested in axioms and more in your experimental
findings.
I'm not sure what you're looking for here. The testing I did for
the various recently posted hash functions was simply running a
few ad hoc test scenarios and comparing the results against each
other and also against the results of hash function of my own
devising likely to produce high quality results. One piece of
empirical evidence: in all of the tests I ran, that hash function
never gave results that varied by more than 10% or so, no matter
what test was done. By contrast, most or all of the other hash
functions sometimes gave results that varied by a factor of 4 or
more, depending on what scenario was being tested. Let me say
again that the test scenarios were ad hoc, and I was not being
systematic but just eyeballing the numbers.

I have a fair amount of experience (at least hundreds of hours)
experimenting with various random number generators, including
several of my own devising. The experience includes systematic
testing using extensive test suites (done by people who really
know what they are doing, which in this arena does not include
myself). Random number generators and hash functions have a lot
of principles in common. Suppose we have a hash function that
takes a 64-bit key and produces a 32-bit hash value. If we use
that hash function with simply successive 64-bit integer values,
and the outputs pass the statistical tests of a good random
number generator test suite, you can be sure that hash function
has very high quality. A key observation about random number
generators is this: if the output values are N bits, generally
the amount of internal state should be at least 2N bits. The
same thing is true of hash functions. If a hash function
produces, say, 32-bit values, internally there should be at least
64 bits of state that is mushed together in various ways, finally
being combined in the final step to produce the 32-bit result.
Another design principle is to replicate information. For
example, if we are hashing a character string to produce a 32-bit
hash value, we might combine each character in the string with
two different 32-bit "accumulators", perhaps adding into one and
xoring into the other, and rotating them by different amounts
after each character scanned. Composing code on the fly (not
compiled, and certainly not tested):

unsigned
string_hash( const char *s ){
unsigned a = 12345, b = 0xabcdef;
unsigned char c; /* unsigned, so high-bit chars can't sign-extend */
while( (c = *s++) ){
a += c, b ^= c;
a = a << 23 | a >> (32-23); /* rotate; assumes 32-bit unsigned */
b = b << 10 | b >> (32-10);
a -= b; // why not? ;)
}
return a ^ b;
}

This code has a fair chance of producing acceptable hash values
for strings that have at least three characters. (Let me say
again that I have done no testing at all.)

So I hope this posting has some of what you're looking for. And
if it doesn't, well, then I am as much in the dark as ever. :)
Michael S
2024-06-04 08:38:39 UTC
Permalink
On Mon, 03 Jun 2024 18:02:21 -0700
Post by Tim Rentsch
Post by Michael S
I am less interested in axioms and more in your experimental
findings.
I'm not sure what you're looking for here.
I'll give an example.
You said that some of the variants had 4x differences between cases.
From my perspective, if you found a hash function that performs up to 3
times better* than a "crypto-alike" hash in the majority of tests and is
1.33x worse than "crypto-alike" in a few other tests, it's something
that I'd consider a valuable option.

* - i.e. produces 3x fewer collisions at, say, an occupancy ratio of 0.7
Tim Rentsch
2024-06-05 05:10:41 UTC
Permalink
Post by Michael S
On Mon, 03 Jun 2024 18:02:21 -0700
Post by Tim Rentsch
Post by Michael S
I am less interested in axioms and more in your experimental
findings.
I'm not sure what you're looking for here.
I'd give an example.
You said that some of the variants had 4x differences between cases.
From my perspective, if you found a hash function that performs up to 3
times better* than a "crypto-alike" hash in the majority of tests and is
1.33x worse than "crypto-alike" in a few other tests, it's something
that I'd consider a valuable option.
* - i.e. produces 3x fewer collisions at, say, an occupancy ratio of 0.7
Okay. For the sake of discussion let me call a "crypto-alike" hash
an "average hash" (meaning in some sense statistically average, or
never very far from random hash values).

Unfortunately the example you describe is something that won't
happen and can't happen. I say it won't happen because for all
the hash functions I've looked at, I've never seen one that is
"better than average" most of the time and perhaps a little bit
"worse than average" only occasionally. The reverse situation
pops up fairly often: a hash function that is a little bit
"better than average" in a small number of cases, and "no
better than average or worse than average (sometimes by quite
a lot)" in most cases.

For the second part, I say it can't happen because there isn't
enough headroom for the amount of performance improvement you
mention. For an occupancy rate of 0.7, an average hash function
using a rehashing approach uses only 1.7 probes per insertion
(with a minimum of 1) to fill the table. There is no way to
get a dramatic performance improvement. Even with an 85% load
factor, an average hash function takes just a little over 2
probes (roughly 2.2 probes) per value inserted.

Going the other direction, I've seen examples of hash functions
that in some circumstances are _worse_ than average by a factor of
10 or more. The bad examples just come up - I don't especially go
looking for them. The small possible upside gain is basically
never worth the much larger potential downside risk.
Bonita Montero
2024-06-03 14:34:37 UTC
Permalink
Post by Michael S
So, what were your conclusions?
Ignoring the speed of computation, would something like cryptographic
hash scaled to bucket size be a best hash for this type of application?
Or some sort of less thorough grinding of the bits is better?
There's no need for a crypto-hash here.
Michael S
2024-06-03 14:46:04 UTC
Permalink
On Mon, 3 Jun 2024 16:34:37 +0200
Post by Bonita Montero
Post by Michael S
So, what were your conclusions?
Ignoring the speed of computation, would something like
cryptographic hash scaled to bucket size be a best hash for this
type of application? Or some sort of less thorough grinding of the
bits is better?
There's no need for a crypto-hash here.
Do you think I don't know?
Crypto hash is just an example of near-ideal pseudo-random uniformity.
Bonita Montero
2024-06-03 15:54:30 UTC
Permalink
Post by Michael S
On Mon, 3 Jun 2024 16:34:37 +0200
Post by Bonita Montero
Post by Michael S
So, what were your conclusions?
Ignoring the speed of computation, would something like
cryptographic hash scaled to bucket size be a best hash for this
type of application? Or some sort of less thorough grinding of the
bits is better?
There's no need for a crypto-hash here.
Do you think I don't know?
Crypto hash is just an example of near-ideal pseudo-random uniformity.
As I've shown, for pointers you get a perfectly equal distribution by
multiplying by an appropriate prime.
bart
2024-06-03 16:24:36 UTC
Permalink
Post by Bonita Montero
Post by Michael S
On Mon, 3 Jun 2024 16:34:37 +0200
Post by Bonita Montero
Post by Michael S
So, what were your conclusions?
Ignoring the speed of computation, would something like
cryptographic hash scaled to bucket size be a best hash for this
type of application? Or some sort of less thorough grinding of the
bits is better?
There's no need for a crypto-hash here.
Do you think I don't know?
Crypto hash is just an example of near-ideal pseudo-random uniformity.
As I've shown for pointers you get a perfect equal distribution with
multiplying by an appropriate prime.
A pointer with 8-byte or 16-byte alignment will have the bottom 3-4 bits
zero.

No matter what number you multiply them by, prime or otherwise, those
3-4 bits will always be zero.

If you mask the result to fit a table of size power-of-two, then the
resulting index can only ever refer to every 8th or every 16th
slot; there will be 8-16x as many clashes as there ought to be.

So some extra work is needed to get around that, for example
right-shifting before masking as some here have done, something you have
never mentioned.
Michael S
2024-06-03 17:16:46 UTC
Permalink
On Mon, 3 Jun 2024 17:24:36 +0100
Post by bart
Post by Bonita Montero
Post by Michael S
On Mon, 3 Jun 2024 16:34:37 +0200
Post by Bonita Montero
Post by Michael S
So, what were your conclusions?
Ignoring the speed of computation, would something like
cryptographic hash scaled to bucket size be a best hash for this
type of application? Or some sort of less thorough grinding of
the bits is better?
There's no need for a crypto-hash here.
Do you think I don't know?
Crypto hash is just an example of near-ideal pseudo-random
uniformity.
As I've shown for pointers you get a perfect equal distribution with
multiplying by an appropriate prime.
A pointer with 8-byte or 16-byte alignment will have the bottom 3-4
bits zero.
No matter what number you multiply them by, prime or otherwise, those
3-4 bits will always be zero.
If you mask the result to fit a table of size power-of-two, then the
resulting index can only ever refer to every 8th or every 16th
slot; there will be 8-16x as many clashes as there ought to be.
So some extra work is needed to get around that, for example
right-shifting before masking as some here have done, something you
have never mentioned.
According to my understanding, Bonita and Tim are discussing a hash
generator whose output is not used as is. They assume that the index of
the slot will be calculated as (Hash(key)*bucket_size)/(Hash_max+1).
For big enough Hash_max (Bonita suggests 2**63-1), poor quality of
few LS bits of Hash(key) does not matter.
bart
2024-06-03 18:48:29 UTC
Permalink
Post by Michael S
On Mon, 3 Jun 2024 17:24:36 +0100
Post by bart
Post by Bonita Montero
Post by Michael S
On Mon, 3 Jun 2024 16:34:37 +0200
Post by Bonita Montero
Post by Michael S
So, what were your conclusions?
Ignoring the speed of computation, would something like
cryptographic hash scaled to bucket size be a best hash for this
type of application? Or some sort of less thorough grinding of
the bits is better?
There's no need for a crypto-hash here.
Do you think I don't know?
Crypto hash is just an example of near-ideal pseudo-random
uniformity.
As I've shown for pointers you get a perfect equal distribution with
multiplying by an appropriate prime.
A pointer with 8-byte or 16-byte alignment will have the bottom 3-4
bits zero.
No matter what number you multiply them by, prime or otherwise, those
3-4 bits will always be zero.
If you mask the result to fit a table of size power-of-two, then the
resulting index can only ever refer to every 8th or every 16th
slot; there will be 8-16x as many clashes as there ought to be.
So some extra work is needed to get around that, for example
right-shifting before masking as some here have done, something you
have never mentioned.
According to my understanding, Bonita and Tim are discussing a hash
generator whose output is not used as is.
For big enough Hash_max (Bonita suggests 2**63-1), poor quality of
few LS bits of Hash(key) does not matter.
Tim might, I'm not so sure about BM.
Post by Michael S
The lower bits of a pointer are often all zeroes. And multiplying by
any integer will not set them. And that is catastrophic for a hash.
BM:
If you take a large integer they will be scrambled and if the
prime is large enough there will be a good equal distribution.

I've reiterated MM's point, which BM just ignored.
Post by Michael S
They assume that index of
the slot will be calculated as (Hash(key)*bucket_size)/(Hash_max+1).
I don't remember seeing that anywhere. But, bucket_size is the size of
the hash-table?

And Hash_max+1 is likely to be a power of two?

I prefer my hash values to be evaluated independently of table size.
They usually don't have a meaningful maximum value, other than the 64
bits of their type, and I'd rather they didn't have sequences of zero
bits especially at the bottom end.

If hash_max was 2**63-1, say, then dividing by hash_max+1 would probably
give you zero anyway, especially if you first multiplied by a bucket
size that was a power of two, as all the interesting bits would
disappear past the top bit!
Michael S
2024-06-03 19:41:00 UTC
Permalink
On Mon, 3 Jun 2024 19:48:29 +0100
Post by bart
I don't remember seeing that anywhere. But, bucket_size is the size
of the hash-table?
Yes
Post by bart
And Hash_max+1 is likely to be a power of two?
Yes. At very least 2**32.
Post by bart
I prefer my hash values to be evaluated independently of table size.
They usually don't have a meaningful maximum value, other than the 64
bit of their type, and I'd rather they didn't have sequences of zero
bits especially at the bottom end.
Bottom is in the eye of beholder :-)
Post by bart
If hash_max was 2**63-1,
63 was my mistake; he suggests 2**64-1, it seems.
Post by bart
say, then dividing by hash_max+1 would
probably give you zero anyway, especially if you first multiplied by
a bucket size that was a power of two, as all the interesting bits
would disappear past the top bit!
Huh?
If bucket size is 2**n then
index = Hash(key)*2**n/2**64 == Hash(key) >> (64-n);

Maybe you are thinking about C language arithmetic? That's not what I
had in mind. In the post above I used the equations in their integer
math meaning rather than their C arithmetic meaning.
Michael S
2024-06-03 19:51:27 UTC
Permalink
On Mon, 3 Jun 2024 19:48:29 +0100
Post by bart
Tim might, I'm not so sure about BM.
You are probably right about Bonita. His/her last post(s) make no sense.
Tim Rentsch
2024-06-03 23:51:44 UTC
Permalink
[...]
Post by Michael S
For big enough Hash_max (Bonita suggests 2**63-1), poor quality
of few LS bits of Hash(key) does not matter.
Tim might, [...]
I really don't know where you could have gotten that idea.
Tim Rentsch
2024-06-04 00:01:26 UTC
Permalink
Post by Michael S
On Mon, 3 Jun 2024 17:24:36 +0100
Post by bart
Post by Bonita Montero
Post by Michael S
On Mon, 3 Jun 2024 16:34:37 +0200
Post by Bonita Montero
Post by Michael S
So, what were your conclusions?
Ignoring the speed of computation, would something like
cryptographic hash scaled to bucket size be a best hash for this
type of application? Or some sort of less thorough grinding of
the bits is better?
There's no need for a crypto-hash here.
Do you think I don't know?
Crypto hash is just an example of near-ideal pseudo-random
uniformity.
As I've shown, for pointers you get a perfect equal distribution by
multiplying by an appropriate prime.
A pointer with 8-byte or 16-byte alignment will have the bottom 3-4
bits zero.
No matter what number you multiply them by, prime or otherwise, those
3-4 bits will always be zero.
If you mask the result to fit a table whose size is a power of two, then
the resulting index can only ever refer to every 8th or every 16th
slot; there will be 8-16x as many clashes as there ought to be.
So some extra work is needed to get around that, for example
right-shifting before masking as some here have done, something you
have never mentioned.
According to my understanding, Bonita and Tim are discussing hash
generator which output is not used as is. They assume that index of
the slot will be calculated as (Hash(key)*bucket_size)/(Hash_max+1).
To clarify, I have been talking about hash functions that deliver
a "full sized" hash value, either 32 or 64 bits, without regard
to how hash table management code might make use of that value.
I think that's sort of what you said, but it seemed reasonable
to offer a clarification.

As a point of information, I've decided to adopt a policy of
not reading any further postings from Bonita Montero, who
seems to have nothing useful or interesting to say.
Bonita Montero
2024-06-03 18:25:43 UTC
Permalink
Post by bart
A pointer with 8-byte or 16-byte alignment will have the bottom 3-4 bits
zero.
No matter what number you multiply them by, prime or otherwise, those
3-4 bits will always be zero.
Since the modulus is coprime to the prime being multiplied by, this
isn't true.
Michael S
2024-06-03 16:50:22 UTC
Permalink
On Mon, 3 Jun 2024 17:54:30 +0200
Post by Bonita Montero
Post by Michael S
On Mon, 3 Jun 2024 16:34:37 +0200
Post by Bonita Montero
Post by Michael S
So, what were your conclusions?
Ignoring the speed of computation, would something like
cryptographic hash scaled to bucket size be a best hash for this
type of application? Or some sort of less thorough grinding of the
bits is better?
There's no need for a crypto-hash here.
Do you think I don't know?
Crypto hash is just an example of near-ideal pseudo-random
uniformity.
As I've shown, for pointers you get a perfect equal distribution by
multiplying by an appropriate prime.
What you have shown is, if we borrow terminology from the
characterization of analog-to-digital converters, integral uniformity
for a ramp input. That is not sufficient. A good hash needs both
integral and differential uniformity.
According to Tim Rentsch, your method does not have the latter.

What do you do exactly? Multiply by a big prime modulo 2**64?
By intuition, without testing, I'd say that it does not sound like enough.
By intuition, without testing, doing it twice with two different primes,
then adding (or xoring) the results together and multiplying again
by a third prime sounds like enough.
By intuition, without testing, I'd rather use composite multiplication
factors instead of primes, choosing each factor as a product of 3-4
non-small primes, with all 9-12 primes involved being different.

But there should be simpler procedures that are just as good.
Bonita Montero
2024-06-03 18:31:41 UTC
Permalink
Post by Michael S
What you have shown is, if we borrow terminology from the
characterization of analog-to-digital converters, integral uniformity
for a ramp input. That is not sufficient. A good hash needs both
integral and differential uniformity.
The modulus and the prime being multiplied with are coprime, so the
equal distribution is perfect. You won't get any regular distribution
of the values with regular input.
Post by Michael S
According to Tim Rentsch, your method does not have the latter.
I falsified his claim and he stopped responding.
Michael S
2024-06-04 21:59:16 UTC
Permalink
On Sun, 26 May 2024 10:20:55 -0700
Post by Tim Rentsch
Post by Bonita Montero
Post by Tim Rentsch
Post by Bonita Montero
Post by Bonita Montero
Post by Tim Rentsch
Your hash function is expensive to compute, moreso even
than the "FNV" function shown earlier. In a case like
this one where the compares are cheap, it's better to
have a dumb-but-fast hash function that might need a
few more looks to find an open slot, because the cost
of looking is so cheap compared to computing the hash
function.
A (size_t)pointer * LARGE_PRIME can be sufficient,
ignoring the overflow.
Plenty fast but the output quality is poor. ...
If the prime is large enough there's no regular distribution.
Your conjecture is contradicted by empirical facts.
There are no empirical facts for that since two times the chosen
prime is beyond the 64-bit address space.
You don't listen very well do you?
I say the output quality is poor because I have run tests that
show the poor output quality. I've done that with a prime of my
own choosing and also with 18446744073709551557, the value you
suggested. In both cases the test runs show results that are
clearly worse than all the other hash functions tested, including
bart's and malcolm's. Furthermore the results for your suggested
calculation are worse across the board, on every variety of
dynamic workload in my test suite.
Your proposed hash function is too weak to be taken as a serious
candidate.
18446744073709551557 is indeed very bad (too close to 2**64).
But I tried another prime, and at least in my simple tests it performs
rather well.
Maybe my tests are too simplistic; I don't pretend to understand much
about it.

// hash_aes4.c
// Reference. Hopefully behaves similarly to crypto hash
#include <stdint.h>
#include <string.h>
#include <x86intrin.h>

uint64_t hash_ref(void* key)
{
uint64_t ukey = (uint64_t)key;
__m128i xkey = _mm_set_epi64x(ukey, 42);
static const uint64_t bar[8] = {
0xBB09BBCC90B24BF2, 0x825C622FF2792A01,
0x94F0535CB06D4060, 0x939C756246DBFD1D,
0x5B835E01A7E14CA1, 0xAC2BDAFC023CDD06,
0xE0B6A4735B774AEC, 0x9CAFB43E7DDE494C,
};
for (int i = 0; i < 4; ++i) {
__m128i rkey;
memcpy(&rkey, &bar[i*2], sizeof(rkey));
xkey = _mm_aesenc_si128(xkey, rkey);
}
memcpy(&ukey, &xkey, sizeof(ukey));
return ukey;
}

// hash_bonita.c
// Bonita's with different prime
#include <stdint.h>

uint64_t hash_foo(void* key)
{
uint64_t ukey = (uint64_t)key;
static const uint64_t LARGE_PRIME = 0xbb09bbcc90b24bcdull;
return ukey * LARGE_PRIME;
}


// ptr_hash.c
// test bench
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

uint64_t hash_foo(void*);
uint64_t hash_ref(void*);

static void test(
void** pointers, size_t n_pointers,
size_t* table, size_t tab_sz,
uint64_t (*hash_function)(void*));

static const char UsageStr[] =
"ptr_hash - examine performance of pointer hash function\n"
"Usage\n"
"ptr_hash n m sz1 [sz2]\n"
"where\n"
" n - the size of hash table\n"
" m - # of pointers to put in the table\n"
" sz1 - minimal size of allocated blocks\n"
" sz2 - [optional] maximal size of allocated blocks. Default = sz1\n"
;
int main(int argz, char** argv)
{
if (argz < 4) {
fprintf(stderr, "%s", UsageStr);
return 1;
}

size_t params[4];
for (int arg_i = 1; arg_i < 5 && arg_i < argz; ++arg_i) {
long val = strtol(argv[arg_i], NULL, 0);
if (val <= 0) {
fprintf(stderr
,"Bad parameter %s. Please specify positive number\n"
,argv[arg_i]);
return 1;
}
params[arg_i-1] = val;
}
size_t n = params[0];
size_t m = params[1];
size_t sz1 = params[2];
size_t sz2 = sz1;
if (argz > 4)
sz2 = params[3];
if (sz1 > sz2) {
size_t tmp = sz1; sz1 = sz2; sz2 = tmp;
}

int ret = 1;
void **pointers = malloc(m * sizeof(*pointers));
if (pointers) {
size_t* table = malloc(n * sizeof(*table));
if (table) {
ret = 0;
srand(1);
const unsigned __int128 RAND_SCALE = (unsigned __int128)RAND_MAX + 1;
for (size_t i = 0; i < m; ++i) {
size_t r = rand();
size_t sz = (unsigned __int128)(sz2-sz1+1)*r / RAND_SCALE + sz1;
void* block = malloc(sz);
if (!block) {
ret = 1;
perror("malloc()");
for (size_t k = 0; k < i; ++k)
free(pointers[k]);
break;
}
pointers[i] = block;
}
if (ret == 0) {
for (size_t i = 0; i < m; ++i)
free(pointers[i]); // we don't need the allocated blocks for the test
printf("ref: ");
test(pointers, m, table, n, hash_ref);
printf("uut: ");
test(pointers, m, table, n, hash_foo);
}
free(table);
} else {
perror("malloc()");
}
free(pointers);
} else {
perror("malloc()");
}

return ret;
}

static void test(
void** pointers, size_t n_pointers,
size_t* table, size_t tab_sz,
uint64_t (*hash_function)(void*))
{
for (size_t i = 0; i < tab_sz; ++i)
table[i] = 0;
// simulate hash operation
for (size_t i = 0; i < n_pointers; ++i) {
uint64_t h = hash_function(pointers[i]);
table[(size_t)(((unsigned __int128)tab_sz * h)>>64)] += 1;
}
// statistics
size_t occupied = 0;
size_t collisions = 0;
size_t worst = 0;
for (size_t i = 0; i < tab_sz; ++i) {
size_t c = table[i];
occupied += c != 0;
collisions += c > 1;
if (worst < c)
worst = c;
}
printf(
"%zu keys, %zu slots, %zu occupied %zu collisions. Worst collision: %zu\n"
,n_pointers
, tab_sz
, occupied
, collisions
, worst
);
}


Compiled as:
gcc -s -Wall -O2 -march=native ptr_hash.c hash_bonita.c hash_aes4.c


Run as:
./a 100000 75000 80 1000

Result:
ref: 75000 keys, 100000 slots, 52799 occupied 17431 collisions. Worst collision: 7
uut: 75000 keys, 100000 slots, 53201 occupied 17139 collisions. Worst collision: 6
Bonita Montero
2024-06-05 09:10:24 UTC
Permalink
Post by Michael S
18446744073709551557 is indeed very bad (too close to 2**64).
It doesn't matter how close the prime is to 2 ^ 64 since the
results are chopped.
Michael S
2024-06-05 09:34:50 UTC
Permalink
On Wed, 5 Jun 2024 11:10:24 +0200
Post by Bonita Montero
Post by Michael S
18446744073709551557 is indeed very bad (too close to 2**64).
It doesn't matter how close the prime is to 2 ^ 64 since the
results are chopped.
I have no idea what you are talking about.
18446744073709551557 == -59 (mod 2**64).
In effect that's a very small factor. With a factor like that, for the
typical case of pointers that are <= 2**40 and a table size in the
1000s or 10,000s, all keys (== pointers) would be pushed into the same
(uppermost) slot of the table, with no hashing at all.
You can run my test bench and see it yourself.

On the other hand, a prime in the range (2**63..2**64) that does not
have too many bits set or cleared works very well. But the majority of
non-primes with the same properties work just as well.
Bonita Montero
2024-06-05 10:05:07 UTC
Permalink
Post by Michael S
I have no idea what you are talking about.
18446744073709551557 == -59.
That's a very small number.
You should take the value as unsigned. Addition and subtraction work
the same signed and unsigned, but not multiplication and division.
As I've shown, you get a perfect equal distribution if the input
also has an equal distribution.
Michael S
2024-06-05 10:11:24 UTC
Permalink
On Wed, 5 Jun 2024 12:05:07 +0200
Post by Bonita Montero
Post by Michael S
I have no idea what you are talking about.
18446744073709551557 == -59.
That's a very small number.
You should take the value as unsigned. Addition and subtraction work
the same signed and unsigned, but not multiplication and division.
As I've shown, you get a perfect equal distribution if the input
also has an equal distribution.
You had not shown shite.
If you can not understand simple things on paper then run my test
bench.
Tim Rentsch
2024-06-05 15:58:46 UTC
Permalink
Post by Michael S
On Sun, 26 May 2024 10:20:55 -0700
[..comments on a hash function that simply multiplies the key
by a large prime (18446744073709551557) to give the hash
value, and noting that tests have shown poor performance..]
Post by Michael S
18446744073709551557 is indeed very bad (too close to 2**64).
But I tried another prime, and at least in my simple tests it performs
rather well.
Maybe my tests are too simplistic; I don't pretend to understand much
about it.
[lots of C code]
gcc -s -Wall -O2 -march=native ptr_hash.c hash_bonita.c hash_aes4.c
I couldn't get the AES-based hash function to compile. There
was a complaint about not being able to inline an 'always-inline'
function. I played around with it for a while but didn't get
anywhere.

I did get your own hash function out and put it into my little
test rig. There may be summary comments below.
Post by Michael S
./a 100000 75000 80 1000
ref: 75000 keys, 100000 slots, 52799 occupied 17431 collisions. [...]
uut: 75000 keys, 100000 slots, 53201 occupied 17139 collisions. [...]
Both of these occupancy rates are close to the expected theoretical
average, which I calculated to be 52764. If I had been more
ambitious I might have done some statistical sampling to get some
idea of the standard deviation, but I wasn't that energetic
today. :)

If we're trying to write a general hash function, it should work
well across a wide range of input key sets, and also for different
kinds of hash table management. Let's look at the output side
first. There are two independent axes for hash tables: table
size, and direct mapping versus using rehashing. For table size,
the two prominent choices are some prime in the appropriate range,
or else a power of two. Clearly any size could be chosen, but
primes have the nice property that they divide up the space of hash
values very evenly, whereas powers of two allow a cheaper way of
reducing a hash value to the range of table indices (anding versus a
modulo operation). A good hash function should work well with both.
The choice of direct mapping versus rehashing depends on expected
occupancy rates (which I am used to calling "load factor"). For a
load factor of 75 to 80 percent or less, generally rehashing is
better. Direct mapping has the advantage that it can handle load
factors above 100%, at a cost of dealing with whatever subsidiary
data structure (usually linked lists) is used when two or more keys
map to the same index in the hash table. (I expect you know all
that, I wrote it mostly just to clarify my own thinking.)

For the recent testing I did, I chose a power of two (65536) for the
table size, and rehashing rather than direct mapping. Both of these
choices make it easier to evaluate how well hash functions perform.
Having a power-of-two table size reveals weaknesses in the hash
function more readily than taking the hash value modulo a prime;
using rehashing allows an easy computation of how many probes on
average are needed to do an insertion, which is an easier metric
to use for comparisons than occupancy rates (which typically need
some statistical calculations to appreciate the meaning of different
resulting occupancy rates).

Let me emphasize that I don't mean to advocate any of the different
hash table models over the others. The key point is that a good
hash function should work with all of them.

On the input side, it's important to have a wide range of input
key sets, with different sorts of distributions. The input key
sets might include: linear addresses out of a single array, so
a+i*m with i in the range [0..k], for different values of m (to
be thorough all m in the range [1..64], all multiples of 2 from
64 to 128, all multiples of 4 from 128 to 256, all multiples of 8
from 256 to 512, and so forth up to all multiples of 64 from
2048 to 4096); results of malloc() using fixed sizes (all of the
m values from the last example); results of malloc() using a
range of sizes, for several kinds of random distributions);
results of malloc() using linearly increasing sizes; and other
kinds of variations. In my testing I used several fixed sizes to
malloc() all less than 60; rand()%997 + 43 for malloc() sizes;
malloc( i/7 + 1 ) for each key value sub i; malloc( i/17 + 1 )
for each key value sub i; linear addressing in an array, with
various multipliers (ad hoc rather than systematic choices);
quadratic addressing in an array (which means array + i*i*m, for
index i and multiplier choice m) -- here again the choices for
m were ad hoc rather than systematic.

An interesting result under the "in array" choices is that they
sometimes would change from run to run, presumably as the array was
put at different starting positions. Sometimes the differences were
surprisingly large.

Like I said before, a good hash function should produce average
or near-average values for all of these scenarios. It might be
nice to see some above-average results under some workloads, but
they don't make up for well below-average results under other
workloads. The situation is pretty much the opposite of what
happens with quicksort: the worst case behavior of quicksort is
quadratic, but we know that in practice that almost never
happens, so quicksort is used because its typical or average
performance is better than that of other choices. With hashing,
better-than-average performance is rare, and doesn't give much
benefit, but worse-than-average performance is more likely, and
can be disastrous. Obviously there can be other factors that
might motivate other choices in some situations (eg, perfect
hashing), but for a general-use hash function the first design
criterion should be close-to-average behavior under all scenarios
tested.
Michael S
2024-06-05 16:51:59 UTC
Permalink
On Wed, 05 Jun 2024 08:58:46 -0700
Post by Tim Rentsch
I couldn't get the AES-based hash function to compile. There
was a complaint about not being able to inline an 'always-inline'
function. I played around with it for a while but didn't get
anywhere.
Not that using my reference hash is particularly important (I needed it
as a quick way to get something I could rely on; you already have
something else with similar properties), but it's still worth
investigating.

There are two possibilities:
1. gcc compiler's -march=native logic incorrectly detects CPU type on
your computer.
2. you CPU really does not support AES instructions.

If the former, you can force the compiler's hand with the -maes flag.
If the latter, then you can compile it but won't be able to run my
"reference" hash function on this particular computer.

What CPU are you using?
Tim Rentsch
2024-06-06 04:24:40 UTC
Permalink
Post by Michael S
On Wed, 05 Jun 2024 08:58:46 -0700
Post by Tim Rentsch
I couldn't get the AES-based hash function to compile. There
was a complaint about not being able to inline an 'always-inline'
function. I played around with it for a while but didn't get
anywhere.
Not that using my reference hash is particularly important (I needed it
as a quick way to get something I could rely on; you already have
something else with similar properties) but it's still worth
investigation.
1. gcc compiler's -march=native logic incorrectly detects CPU type on
your computer.
2. you CPU really does not support AES instructions.
If the former, you can force the compiler's hand with the -maes flag.
If the latter, then you can compile it but won't be able to run my
"reference" hash function on this particular computer.
I finally got it to work. I tried -maes before, and it still
failed, but maybe I was using some other option that messed
things up. In any case it works now, and the AES hash gives
great results AFAICT.
Michael S
2024-06-05 16:59:05 UTC
Permalink
On Wed, 05 Jun 2024 08:58:46 -0700
Post by Tim Rentsch
I did get your own hash function out and put it into my little
test rig. There may be summary comments below.
<snip>

As you have probably already noticed, I like bottom lines.
What is a bottom line here?
Did you encounter cases in which the almost-Bonita's-but-not-quite
hash function performed badly?
Tim Rentsch
2024-06-06 04:40:27 UTC
Permalink
Post by Michael S
On Wed, 05 Jun 2024 08:58:46 -0700
Post by Tim Rentsch
I did get your own hash function out and put it into my little
test rig. There may be summary comments below.
<snip>
As you probably already paid attention, I like bottom lines.
What is a bottom line here?
The bottom line is that both of the multiply-by-a-large-prime hashes
should be avoided.
Post by Michael S
Did you encounter cases in which the almost-Bonita's-but-not-quite
hash function performed badly?
Yes. A clear example is using directly mapped hash table that
has a power of two elements, with inputs all from malloc( 48 )
calls. All keys map to one of only 1024 entries, in a hash
table that has 65536 slots.
Michael S
2024-06-06 08:00:09 UTC
Permalink
On Wed, 05 Jun 2024 21:40:27 -0700
Post by Tim Rentsch
Post by Michael S
On Wed, 05 Jun 2024 08:58:46 -0700
Post by Tim Rentsch
I did get your own hash function out and put it into my little
test rig. There may be summary comments below.
<snip>
As you probably already paid attention, I like bottom lines.
What is a bottom line here?
The bottom line is both the multiply-by-a-large-prime hashes
should be avoided.
I am not convinced.
So far I have seen no disadvantages for the particular application
mentioned by Malcolm.
Of course, it is no good for Malcolm, because it can't be coded in what
Malcolm erroneously calls "ANSI C". It is not easy to code even in
ISO C11. But in my own practice that does not matter. I happily use
language extensions, like gnu __int128 or Microsoft's __umulh(), when
they are the best tool for the job.
Post by Tim Rentsch
Post by Michael S
Did you you encounter cases in which almost-bonita's-but-not-quite
hash function performed badly?
Yes. A clear example is using directly mapped hash table that
has a power of two elements, with inputs all from malloc( 48 )
calls. All keys map to one of only 1024 entries, in a hash
table that has 65536 slots.
I don't see it.
$ ./a 0x10000 0xC000 48
ref: 49152 keys, 65536 slots, 34590 occupied 11295 collisions. Worst collision: 7
uut: 49152 keys, 65536 slots, 35791 occupied 11872 collisions. Worst collision: 3

It sounds like in your tests you don't use 'linear scaling to table
size' for translating the hash value to a table index. IMHO, you should.
Maybe for some particular hash functions other methods of translation
also work, but I see no reason to use other methods. This one is quite
fast, robust, and looks to me like the most logical. For power-of-2 sized
tables it degenerates into a simple right shift.
Michael S
2024-06-06 10:35:34 UTC
Permalink
On Wed, 05 Jun 2024 08:58:46 -0700
Post by Tim Rentsch
If we're trying to write a general hash function, it should work
well across a wide range of input key sets, and also for different
kinds of hash table management. Let's look at the output side
first. There are two independent axes for hash tables: table
size, and direct mapping versus using rehashing. For table size,
the two prominent choices are some prime in the appropriate range,
or else a power of two. Clearly any size could be chosen, but
primes have the nice property that they divide up the space of hash
values very evenly, whereas powers of two allow a cheaper way of
reducing a hash value to the range of table indices (anding versus a
modulo operation). A good hash function should work well with both.
The choice of direct mapping versus rehashing depends on expected
occupancy rates (which I am used to calling "load factor"). For a
load factor of 75 to 80 percent or less, generally rehashing is
better. Direct mapping has the advantage that it can handle load
factors above 100%, at a cost of dealing with whatever subsidiary
data structure (usually linked lists) is used when two or more keys
map to the same index in the hash table. (I expect you know all
that, I wrote it mostly just to clarify my own thinking.)
I never had an interest in the implementation of hash methods; typically
I just took what the language or library provides and used it.
All my knowledge of the internals is 50-year-old news (TAoCP, and no, I
did not read it 50 years ago, but I did read it for the first time
slightly less than 40 years ago, and I am not sure I have re-read this
particular topic since then).
So, I am not sure that I understood everything above.
In particular, w.r.t. rehashing, IIRC, Knuth came to the conclusion that
a simple circular increment of the index is superior to more elaborate
methods. I don't know if that conclusion was revised in the following
years; I am not even sure that I recollect it correctly.

Despite (or due to?) my relative ignorance of the topic, I'd dare to
disagree with a couple of your points:
1. I don't think a hash function should be evaluated in isolation
from the rest of the algorithm. IMHO, they should be co-developed. There
is nothing wrong with a hash function that works well with one particular
"back end" and poorly with all others, as long as the limitations are
clearly stated.
2. I don't like the "modulo" method of reducing a hash value to a range.
I don't like it for any chosen bucket count, not just for powers of
two, but even for primes. Of course, for power-of-two sizes my dislike
of it is deeper.
Bonita Montero
2024-06-07 18:53:49 UTC
Permalink
I tested the AES hash function with a C++20 program. The hash-function
is also re-written in C++-style:

#include <iostream>
#include <utility>
#include <vector>
#include <cstdint>
#include <algorithm>
#include <execution>
#include <thread>
#include <span>
#include <intrin.h>

using namespace std;

uint64_t MichaelsHash( uint64_t key );

int main()
{
static_assert(sizeof(size_t) == 8,"");
// number of buckets
#if defined(NDEBUG)
constexpr ptrdiff_t N_BUCKETS = 1ull << 32;
#else
constexpr ptrdiff_t N_BUCKETS = 1 << 16;
#endif
// don't zero-initialize update-array
union ndi_sz { size_t s; ndi_sz() {} };
vector<ndi_sz> rawUpdates( N_BUCKETS );
// native size_t span to update-array
span<size_t> updates( &rawUpdates[0].s, N_BUCKETS );
int hc = thread::hardware_concurrency();
// thread partitition end to update-array
ptrdiff_t end = N_BUCKETS;
// partitition index to update array
ptrdiff_t p = hc;
vector<jthread> threads;
threads.reserve( hc );
do
{
// begin to update array
ptrdiff_t begin = (ptrdiff_t)((double)--p / hc * N_BUCKETS);
threads.emplace_back( [&]( size_t begin, size_t end )
{
for( ; begin != end; ++begin )
updates[begin] = MichaelsHash( begin ) % N_BUCKETS;
}, begin, end );
// next end to update array is the begin before
end = begin;
} while( p > 0 );
// wait for all threads to finish
threads.resize( 0 );
// sort update array parallel
sort( execution::parallel_policy(), updates.begin(), updates.end() );
vector<uint8_t> buckets( N_BUCKETS );
// update buckets according to the update-list
for( size_t u : updates )
if( !++buckets[u] )
return EXIT_FAILURE;
// release memory of the update list
rawUpdates.clear();
rawUpdates.shrink_to_fit();
// sum up loads
vector<size_t> loads;
for( size_t bucketLoad : buckets )
{
if( loads.size() <= bucketLoad )
loads.resize( bucketLoad + 1 );
++loads[bucketLoad];
}
// print loads percentage
for( size_t ld = 0; ptrdiff_t nLoad : loads )
cout << ld++ << ": " << (double)nLoad / (ptrdiff_t)buckets.size() << endl;
}

uint64_t MichaelsHash( uint64_t key )
{
__m128i xkey = _mm_set_epi64x( key, 42 );
using bar_t = pair<uint64_t, uint64_t>;
    static bar_t const bars[4] =
{
{ 0xBB09BBCC90B24BF2, 0x825C622FF2792A01 },
{ 0x94F0535CB06D4060, 0x939C756246DBFD1D },
{ 0x5B835E01A7E14CA1, 0xAC2BDAFC023CDD06 },
{ 0xE0B6A4735B774AEC, 0x9CAFB43E7DDE494C },
};
for( bar_t const &bar : bars )
{
__m128i rkey;
rkey.m128i_u64[0] = bar.first;
rkey.m128i_u64[1] = bar.second;
xkey = _mm_aesenc_si128( xkey, rkey );
}
return xkey.m128i_u64[0];
}

The hash function is tested with 2^32 buckets. The problem with that
is that the bucket counters are updated with totally random memory
accesses, which is extremely slow. So I first prepare an update-list,
which is a vector of size_t's holding the indices of the buckets to be
updated later. The memory for this update-list is written linearly,
which is very fast. The update-list is filled using the available
number of threads. The list is sorted after that, and quicksorting also
has linear memory access. To improve that further, the sort is done
with parallel execution. The update-list takes 2^32 size_t's, which is
32GiB. The sorted update list results in linear updates of the buckets.
This sounds rather complex, but I first tried it with random updates
to the buckets on a single core, and on my 7950X Zen4 CPU I stopped
after 20min.
The resulting load depths follow a Poisson distribution with mean 1
(P(k) = 1 / (e * k!)):

C:\Users\Boni\Documents\Source\MichaelAesHash>timep x64\Release\MichaelAesHash.exe
0: 36.7875%
1: 36.7884%
2: 18.3945%
3: 6.13053%
4: 1.53304%
5: 0.306619%
6: 0.0510489%
7: 0.00729787%
8: 0.000908948%
9: 0.000102189%
10: 1.07335e-05%
11: 8.14907e-07%
12: 4.65661e-08%
real 53.783s
user 11m:39.984s
sys 6.141s
cylces 3.446.315.603.052
Bonita Montero
2024-06-09 11:35:25 UTC
Permalink
With the code I've shown I demonstrated that the AES hash function
is statistically uniform and thereby a good hash function. But
I was still discontent with the speed of the code.
Usually you're not able to update the buckets of a hash table with
multiple threads without any locking. But with my code the buckets
are only simple counters which can be updated atomically.
So I could reduce the code and partition the input range to the
hash function across the available number of cores. Here's the code:

#include <iostream>
#include <vector>
#include <cstdint>
#include <thread>
#include <span>
#include <intrin.h>

using namespace std;

uint64_t MichaelsHash( uint64_t key );

int main( int argc, char ** )
{
#if defined(NDEBUG)
constexpr ptrdiff_t N_BUCKETS = 1ull << 32;
#else
constexpr ptrdiff_t N_BUCKETS = 1 << 16;
#endif
vector<uint8_t> buckets( N_BUCKETS );
int hc = thread::hardware_concurrency(), p = hc;
vector<jthread> threads;
threads.reserve( hc );
ptrdiff_t end = N_BUCKETS, begin;
do
begin = (ptrdiff_t)((double)--p / hc * N_BUCKETS),
threads.emplace_back( [&]( size_t begin, size_t end )
{
span<atomic_uint8_t> atomics( (atomic_uint8_t *)buckets.data(),
buckets.size() );
for( ; begin != end; ++begin )
if( !++atomics[MichaelsHash( begin ) % (size_t)N_BUCKETS] )
terminate();
}, begin, end );
while( (end = begin) );
threads.resize( 0 );
// sum up loads
vector<size_t> loads;
for( size_t bucketLoad : buckets )
{
if( loads.size() <= bucketLoad )
loads.resize( bucketLoad + 1 );
++loads[bucketLoad];
}
// print loads percentage
for( size_t ld = 0; ptrdiff_t nLoad : loads )
cout << ld++ << ": " << 100.0 * nLoad / (ptrdiff_t)buckets.size() <<
"%" << endl;
}

uint64_t MichaelsHash( uint64_t key )
{
__m128i xkey = _mm_set_epi64x( key, 42 );
using bar_t = pair<uint64_t, uint64_t>;
static bar_t const bars[4] =
{
{ 0xBB09BBCC90B24BF2, 0x825C622FF2792A01 },
{ 0x94F0535CB06D4060, 0x939C756246DBFD1D },
{ 0x5B835E01A7E14CA1, 0xAC2BDAFC023CDD06 },
{ 0xE0B6A4735B774AEC, 0x9CAFB43E7DDE494C },
};
for( bar_t const &bar : bars )
xkey = _mm_aesenc_si128( xkey, _mm_set_epi64x( bar.second, bar.first ) );
return xkey.m128i_u64[0];
}

Now the code is about six times faster and I get an eight-times
speedup over single-threaded processing with the same code. Of
course the results are still the same.
Richard Harnden
2024-06-09 11:40:27 UTC
Permalink
Post by Bonita Montero
With the code I've shown I demonstrated that the AES hash function
is statistically uniform and thereby a good hash function. But
I was still discontent with the speed of the code.
Usually you're not able to update the buckets of a hash table with
multiple threads without any locking. But with my code the buckets
are only simple counters which can be updated atomically.
So I could reduce the code and partition the input range to the
#include <iostream>
This is not C
Bonita Montero
2024-06-09 13:09:16 UTC
Permalink
Post by Richard Harnden
Post by Bonita Montero
With the code I've shown I proved that the hash AES hashfunction
is totally stochastically and thereby a good hash function. But
I was still discontent with the speed of the code.
Usually you're not able to update the buckets of a hashtable with
multiple threads without any locking. But with my code the buckets
are only simple counters which could be updated atomically.
So I could reduce the code and partitioned the input range to the
#include <iostream>
This is not C
It's about the general principle of Michael's hash and how to prove
that it has a distribution of bucket loads similar to that of a random
number generator. I did it in C++ because that's only half the code.
Malcolm McLean
2024-06-10 00:34:21 UTC
Permalink
Post by Bonita Montero
uint64_t MichaelsHash( uint64_t key )
{
    __m128i xkey = _mm_set_epi64x( key, 42 );
    using bar_t = pair<uint64_t, uint64_t>;
    static bar_t const bars[8] =
    {
        { 0xBB09BBCC90B24BF2, 0x825C622FF2792A01 },
        { 0x94F0535CB06D4060, 0x939C756246DBFD1D },
        { 0x5B835E01A7E14CA1, 0xAC2BDAFC023CDD06 },
        { 0xE0B6A4735B774AEC, 0x9CAFB43E7DDE494C },
    };
    for( bar_t const &bar : bars )
        xkey = _mm_aesenc_si128( xkey, _mm_set_epi64x( bar.second,
bar.first ) );
    return xkey.m128i_u64[0];
}
Now the code is about six times faster and I get an eightfold
speedup over single-threaded processing with the same code. Of
course the results are still the same.
Thank you.
I have your permission to drop that in?
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
Tim Rentsch
2024-06-10 01:31:15 UTC
Permalink
Post by Malcolm McLean
Post by Bonita Montero
uint64_t MichaelsHash( uint64_t key )
{
__m128i xkey = _mm_set_epi64x( key, 42 );
using bar_t = pair<uint64_t, uint64_t>;
static bar_t const bars[8] =
{
{ 0xBB09BBCC90B24BF2, 0x825C622FF2792A01 },
{ 0x94F0535CB06D4060, 0x939C756246DBFD1D },
{ 0x5B835E01A7E14CA1, 0xAC2BDAFC023CDD06 },
{ 0xE0B6A4735B774AEC, 0x9CAFB43E7DDE494C },
};
for( bar_t const &bar : bars )
xkey = _mm_aesenc_si128( xkey, _mm_set_epi64x( bar.second,
bar.first ) );
return xkey.m128i_u64[0];
}
Now the code is about six times faster and I get an eightfold
speedup over single-threaded processing with the same code. Of
course the results are still the same.
I have your permission to drop that in?
Note that this code was cribbed from Michael S. If you
think it's important to ask permission, I think he is
the one you should be asking.

By the way, I thought you were looking for code that works
in standard C and is acceptable under C90 rules. Have you
changed your mind about that? The code above is a far
cry from C, let alone C90.
Michael S
2024-06-10 12:14:53 UTC
Permalink
On Sun, 09 Jun 2024 18:31:15 -0700
Post by Tim Rentsch
Post by Malcolm McLean
Post by Bonita Montero
uint64_t MichaelsHash( uint64_t key )
{
__m128i xkey = _mm_set_epi64x( key, 42 );
using bar_t = pair<uint64_t, uint64_t>;
static bar_t const bars[8] =
{
{ 0xBB09BBCC90B24BF2, 0x825C622FF2792A01 },
{ 0x94F0535CB06D4060, 0x939C756246DBFD1D },
{ 0x5B835E01A7E14CA1, 0xAC2BDAFC023CDD06 },
{ 0xE0B6A4735B774AEC, 0x9CAFB43E7DDE494C },
};
for( bar_t const &bar : bars )
xkey = _mm_aesenc_si128( xkey, _mm_set_epi64x( bar.second, bar.first ) );
return xkey.m128i_u64[0];
}
Now the code is about six times faster and I get an eightfold
speedup over single-threaded processing with the same code. Of
course the results are still the same.
I have your permission to drop that in?
Note that this code was cribbed from Michael S.
I don't permit the use of my name on the variant modified by Bonita.
I'd prefer my name not be used even on my own variant.
This code was never intended for production use. It is intended for use
as a reference, as something that is "certainly random enough and then
many billions of times more than enough". It is much, much slower than
necessary for a non-crypto hash.
On the other hand, it is probably billions of billions of times weaker
than what is now considered "good enough" for a crypto hash.

If you ask my personal opinion, I think that multiplication modulo
2**64 by a prime (or even a non-prime) close in absolute value to 2**63.5
is sufficient, as long as it is used with an appropriate method of
converting the hash value to a table index, which, for this hash
function, happens to be linear scaling by the size of the hash table.
Ironically, that was the method originally proposed by Bonita. The only
thing (s)he didn't understand was that factors that contain many
consecutive '0' or '1' bits, especially in the more significant bits,
should be avoided.
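A minimal C sketch of the scheme described above, with two liberties taken: the multiplier is the common Fibonacci-hashing constant 2**64/phi (an odd value without long runs of '0' or '1' bits) rather than a prime near 2**63.5, and the table size is a power of two, so the linear scaling reduces to taking the top bits of the product.

```c
#include <stdint.h>
#include <stddef.h>

/* Multiplicative hash: multiply modulo 2**64 by an odd constant
   whose bits are well mixed. 0x9E3779B97F4A7C15 is 2**64/phi, a
   conventional illustrative choice, not one from this thread. */
uint64_t mul_hash(uint64_t key)
{
    return key * 0x9E3779B97F4A7C15ULL;
}

/* "Linear scaling" to a table of 2**table_bits buckets: the top
   table_bits bits of the product select the bucket. */
size_t bucket_index(uint64_t key, unsigned table_bits)
{
    return (size_t)(mul_hash(key) >> (64 - table_bits));
}
```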
Post by Tim Rentsch
If you
think it's important to ask permission, I think he is
the one you should be asking.
By the way, I thought you were looking for code that works
in standard C, and acceptable under C90 rules. Have you
changed your mind about that? The code above is a far
cry from C, let alone C90.
Exactly.
I think that all methods that are both good and fast rely on 64-bit
unsigned integer arithmetic. And it seems to me that C90 does not
provide it in a portable manner.
So, under portable C90 constraints, one has to choose between good and
fast.
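For illustration, a "good but not fast" option under those constraints is to hash the bytes of the pointer's object representation with 32-bit arithmetic only. This is a sketch: the FNV-1a constants are standard, but treating a pointer's bytes as a key is implementation-dependent, and the function name is invented here.

```c
#include <stddef.h>
#include <string.h>

/* C90-portable pointer hash: no uint64_t, no uintptr_t. FNV-1a
   over the bytes of the pointer value, using unsigned long, which
   C90 guarantees to be at least 32 bits. */
unsigned long hash_ptr_c90(const void *p)
{
    unsigned char bytes[sizeof p];
    unsigned long h = 2166136261UL;   /* 32-bit FNV offset basis */
    size_t i;

    memcpy(bytes, &p, sizeof p);
    for (i = 0; i < sizeof p; i++) {
        h ^= bytes[i];
        h *= 16777619UL;              /* 32-bit FNV prime */
    }
    return h & 0xFFFFFFFFUL;          /* mask in case long is wider */
}
```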

Bonita Montero
2024-06-10 07:35:40 UTC
Permalink
Post by Malcolm McLean
Post by Bonita Montero
uint64_t MichaelsHash( uint64_t key )
{
     __m128i xkey = _mm_set_epi64x( key, 42 );
     using bar_t = pair<uint64_t, uint64_t>;
     static bar_t const bars[8] =
     {
         { 0xBB09BBCC90B24BF2, 0x825C622FF2792A01 },
         { 0x94F0535CB06D4060, 0x939C756246DBFD1D },
         { 0x5B835E01A7E14CA1, 0xAC2BDAFC023CDD06 },
         { 0xE0B6A4735B774AEC, 0x9CAFB43E7DDE494C },
     };
     for( bar_t const &bar : bars )
         xkey = _mm_aesenc_si128( xkey, _mm_set_epi64x( bar.second,
bar.first ) );
     return xkey.m128i_u64[0];
}
Now the code is about six times faster and I get an eightfold
speedup over single-threaded processing with the same code. Of
course the results are still the same.
Thank you.
I have your permission to drop that in?
Yes, but the table has a copy-paste bug. It's declared eight pairs long
and the remaining pairs are default-initialized. It should look
like this:

static bar_t const bars[] =
{
{ 0xBB09BBCC90B24BF2, 0x825C622FF2792A01 },
{ 0x94F0535CB06D4060, 0x939C756246DBFD1D },
{ 0x5B835E01A7E14CA1, 0xAC2BDAFC023CDD06 },
{ 0xE0B6A4735B774AEC, 0x9CAFB43E7DDE494C }
};
Bonita Montero
2024-05-26 18:06:11 UTC
Permalink
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
The pointers are nodes of a tree, which are read only, and I want to
associate read/write data with them. So potentially a large number of
pointers, and they might be consecutively ordered if they are taken from
an array, or they might be returned from repeated calls to malloc() with
small allocations. Obviously I have no control over pointer size or
internal representation.
Can you explain this? The only purpose I'm aware of where hashing
pointers makes sense is taking them as keys for an API talking to
external code. And an operating system kernel could take pointers as
handles and verify that they're legal with a hashtable lookup.
Malcolm McLean
2024-05-26 19:10:00 UTC
Permalink
Post by Bonita Montero
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
The pointers are nodes of a tree, which are read only, and I want to
associate read/write data with them. So potentially a large number of
pointers, and they might be consecutively ordered if they are taken
from an array, or they might be returned from repeated calls to
malloc() with small allocations. Obviously I have no control over
pointer size or internal representation.
Can you explain this? The only purpose I'm aware of where hashing
pointers makes sense is taking them as keys for an API talking to
external code. And an operating system kernel could take pointers as
handles and verify that they're legal with a hashtable lookup.
Yes.
I have an XML parser, which creates a tree of XML data, with child and
sibling nodes. XML documents can be very large, so this tree could
contain a lot of nodes.

Now I am implementing an XPath query function. XPath passes an
expression like "//book/author/name" and picks out all nodes with the
element tag "name" which are children of a node with the element tag
"author" and grandchildren of a node with the element tag "book".

Now the algorithm I have chosen requires me to associate some temporary
data with each node as workspace (specifically, I mark a node as
deleted). However, the nodes have no "deleted" member.
And we shouldn't have to change the nodes to write secondary code that
works on the structures.

So the obvious thing to do is to just use a hash table with pointer
values as keys. I can then associate as much data as I want with a node,
without having to hack and change the parser. But I'll need that data
for every node each time I traverse the tree. So the hash-table lookup
is going to be the most expensive, rate-limiting step.
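That side table can be sketched in C as follows. The names (mark_deleted, is_deleted, xpath_marks) are invented here, not from the actual parser, and the table is a fixed-size linear-probing table with no resizing or entry removal, purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 1024u  /* power of two, illustration only */

struct mark { const void *node; int deleted; };
static struct mark xpath_marks[NBUCKETS];

/* Mix the pointer bits so alignment-zeroed low bits still spread
   addresses across buckets. */
static size_t slot_for(const void *node)
{
    uintptr_t h = (uintptr_t)node;
    h ^= h >> 4;          /* fold out the zeroed alignment bits */
    h *= 0x9E3779B9u;     /* cheap multiplicative mix */
    return (size_t)(h & (NBUCKETS - 1));
}

/* Linear probing; the table must stay less than full. */
static struct mark *find_mark(const void *node)
{
    size_t i = slot_for(node);
    for (;;) {
        if (xpath_marks[i].node == node || xpath_marks[i].node == NULL)
            return &xpath_marks[i];
        i = (i + 1) & (NBUCKETS - 1);
    }
}

/* Attach the "deleted" flag to a read-only node. */
void mark_deleted(const void *node)
{
    struct mark *m = find_mark(node);
    m->node = node;
    m->deleted = 1;
}

int is_deleted(const void *node)
{
    struct mark *m = find_mark(node);
    return m->node == node && m->deleted;
}
```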
--
Check out Basic Algorithms and my other books:
https://www.lulu.com/spotlight/bgy1mm
Bonita Montero
2024-05-26 18:24:31 UTC
Permalink
Post by Malcolm McLean
What is a good hash function for pointers to use in portable ANSI C?
The pointers are nodes of a tree, which are read only, and I want to
associate read/write data with them. So potentially a large number of
pointers, and they might be consecutively ordered if they are taken from
an array, or they might be returned from repeated calls to malloc() with
small allocations. Obviously I have no control over pointer size or
internal representation.
That's what std::hash<void *> maps to on libstdc++:

template<typename _Tp>
struct hash<_Tp*> : public __hash_base<size_t, _Tp*>
{
size_t
operator()(_Tp* __p) const noexcept
{ return reinterpret_cast<size_t>(__p); }
};

So if libstdc++ directly casts the pointer to a size_t ... ;-)
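The identity cast gets away with this partly because libstdc++'s unordered containers use prime bucket counts; with a power-of-two table, the low bits of malloc results, which alignment forces to zero, would cluster the keys. A small C check of that alignment guarantee (C11 requires malloc to return storage aligned for max_align_t):

```c
#include <stdint.h>
#include <stdlib.h>
#include <stddef.h>

/* Returns 1 if a fresh malloc result has its low-order bits zeroed
   by alignment, i.e. the address is a multiple of
   _Alignof(max_align_t), as C11 guarantees. */
int malloc_low_bits_are_zero(void)
{
    void *p = malloc(24);
    int aligned = (p != NULL) &&
                  ((uintptr_t)p % _Alignof(max_align_t) == 0);
    free(p);
    return aligned;
}
```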