Farmhash.Sharp is a .NET port of Google’s Farmhash algorithm for calculating 32bit and 64bit non-cryptographic hashes. Farmhash.Sharp has great performance characteristics when calculating 64bit hashes, especially on short strings or a subsequence of byte arrays. See benchmarks comparing other libraries to Farmhash.Sharp.
Install from NuGet:
dotnet add package Farmhash.Sharp
Farmhash.Sharp is built against NET Standard 1.0 so can be ran on any of the following:
Since 0.4, Farmhash.Sharp contains methods that are part of the public API and
unsafe. Almost all platforms should be unaffected by this detail.
The 32bit hash will output a four byte hash (32 bits in length), while the 64bit hash will output an eight byte hash (64 bits in length).
Let’s see the API in action (this time with a byte array).
Congratulations, you’re now an expert at using the Farmhash.Sharp library!
The way Farmhash.Sharp knows how to hash a string of an arbitrary encoding is to look at the raw bytes that compose the string.
For information on disabling the 32bit preference, see the following blog post.
On a regular basis, I need a non-cryptographic hash calculated from a portion of a byte array. This byte array
may end up becoming a string, but allocating a string just to throw it away for
GetHashCode caused too much
GC pressure and resulted in degraded performance. Additionally,
GetHashCode produces only a 32bit hash
My requirements became:
At the time there was not a package on NuGet that satisified these requirements, so I ported farmhash to .NET.
Using BenchmarkDotNet, the FarmHash.Sharp benchmarking code pits several non-cryptographic hash functions against each other in terms of throughput.
Benchmarking was done on the following machine:
BenchmarkDotNet=v0.10.3.0 OS=Microsoft Windows NT 6.2.9200.0 Processor=Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz, ProcessorCount=8 Frequency=3914059 ticks, Resolution=255.4893 ns
To run the benchmarks on the linux machine:
Please do not take these graphs as absolute truths. Run the benchmark code yourself to confirm findings.
For the first graph we will compare a cryptographic (albeit a bad one, as it’s MD5) with Farmhash.Sharp, which is a non-cryptographic hash function. Whereas a non-cryptographic function only has to optimize against collisions and speed, a cryptographic function needs to also minimize pathological input.
Without getting bogged down into too many specifics, Farmhash easily crushes MD5.
What may be surprising is that depending on the runtime Farmhash is running on, the throughput can be dramatically affected. To show this, I’ve restricted the data to only show the 64bit hash of Farmhash across different Clr Jit runtimes to see which Jit wins.
I suppose the .NET team should be commended, as the latest Jit (their 64bit Ryu Jit) has 5-10x more throughput than the old Jit with results more pronounced against the legacy 32bit Jit.
Does mono have the same behavior?
Nope. 32bit and 64bit Mono have approximately the same throughput for 64bit Farmhash. If you have a keen eye, you may have noticed that the y axis scale changed, which naturally lends itself to the question of how Mono, Clr, and the new Core runtime compare against each other.
For both 32bit and 64bit Farmhash functions, the 64bit core and 64bit ryu runtimes win across any sized payload. Both the core and ryu probably use a lot of the same code under under the hood.
The following benchmark was done:
Please click on the image for a better look!
In each configuration, which library has the highest relative throughput compared to competitors in the same row?
Previous heatmap detailed relative throughput, but that was for each facet’s (configuration’s) payload size. How can one tell what configuration yields the highest throughput at a given payload size. Welcome to the next heatmap.
What are some takeaways? Well, if you are constrained to a platform you are deploying, you’ll choose the library that performed the best relative to others according to your constraints. If you’re interested in highest throughput:
A good question would be how much efficiency is lost because we’re using
C# and not C++, as the original farmhash algorithm uses C++. You can find the
benchmark code here.
It uses two versions of the algorithm, one that uses hardware acceleration
(SIMD instructions), denoted by
in the graph, and another compilation that does not use hardware acceleration.
I’m pleased to report that for small payloads (<= 25 bytes), Farmhash.Sharp is around about the fastest if not the fastest. It’s only at larger payloads do we see C++’s lead extend as hardware acceleration becomes more effective. Still, for large payloads, Farmhash.Sharp has half the throughput as hardware accelerated C++, which in my opinion, is quite impressive.
When deploying on a 64bit application, always choose the 64bit Farmhash version. If, for whatever reason, Farmhash isn’t for you, choose xxHash found in Ravendb.
Code used to generate the graphs can be found in analysis.R in the github repo.
The library is available under MIT, which allows modification and redistribution for both commercial and non-commercial purposes. For more information see the License file in the GitHub repository.