The 32bit hash will output a four byte hash (32 bits in length), while the 64bit hash will output an eight byte hash (64 bits in length).
Farmhash.Sharp hashes a string, whatever its encoding, by looking at the raw bytes that make up the string. This is great for convenience, but may not be the best approach when working directly with byte arrays or with strings across encodings.
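To make the point about raw bytes concrete: a .NET string is stored as UTF-16, so its raw bytes generally differ from the bytes another encoding produces for the same text. A hash computed over one should therefore not be expected to match a hash computed over the other. The sketch below uses only `System.Text.Encoding` and makes no Farmhash.Sharp calls; `RawBytesDemo` is a name made up for illustration.

```csharp
using System;
using System.Text;

public static class RawBytesDemo
{
    // UTF-16 (little endian) bytes, the layout .NET strings use on common hardware.
    public static byte[] Utf16Bytes(string text) => Encoding.Unicode.GetBytes(text);

    // Bytes of the same text after encoding it as ASCII.
    public static byte[] AsciiBytes(string text) => Encoding.ASCII.GetBytes(text);

    public static void Main()
    {
        string text = "Hello";
        // Two bytes per character in UTF-16, one byte per character in ASCII.
        Console.WriteLine(BitConverter.ToString(Utf16Bytes(text))); // 48-00-65-00-6C-00-6C-00-6F-00
        Console.WriteLine(BitConverter.ToString(AsciiBytes(text))); // 48-65-6C-6C-6F
    }
}
```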
To get bytes from our text, we need to decide on the encoding. Common examples are ASCII, UTF-8, and Windows-1252. In this tutorial, we’re going to keep things simple and assume that our text is encoded as ASCII.
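One caveat with the ASCII assumption is that `Encoding.ASCII` quietly replaces any character outside the ASCII range with `?`, which would change the bytes being hashed. A minimal sketch of a guard against that follows; the `AsciiText` helper is hypothetical and not part of Farmhash.Sharp.

```csharp
using System;
using System.Text;

public static class AsciiText
{
    // Returns the ASCII bytes of the text, rejecting characters that
    // Encoding.ASCII would otherwise silently replace with '?'.
    public static byte[] ToBytes(string text)
    {
        if (text == null)
        {
            throw new ArgumentNullException(nameof(text));
        }

        foreach (char c in text)
        {
            if (c > 127)
            {
                throw new ArgumentException("Text contains a non-ASCII character: " + c, nameof(text));
            }
        }

        return Encoding.ASCII.GetBytes(text);
    }

    public static void Main()
    {
        byte[] bytes = ToBytes("Hello");
        Console.WriteLine(bytes.Length); // 5
    }
}
```

The resulting array, together with its length, is the kind of input a byte-oriented hash function works on.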
With our bytes handy, it is now time to calculate the hash! Choosing which hash to use may be the hardest decision in this library. If you need the lowest probability of collisions, then your choice is simple: go with Hash64. If you need the fastest speed, then it depends on the architecture of the machine the code runs on and how your project is compiled.
See the benchmarking section for concrete numbers.
For more information on disabling the 32bit preference, see the following blog post.
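Because the fastest choice depends on how the process actually runs, it can help to check that at run time. An executable built with the 32bit preference enabled runs as a 32bit process even on a 64bit operating system. The sketch below uses only the standard `Environment` class; `ProcessBitness` is a made-up name.

```csharp
using System;

public static class ProcessBitness
{
    // 64 when the current process runs as 64bit, otherwise 32.
    public static int Bits() => Environment.Is64BitProcess ? 64 : 32;

    public static void Main()
    {
        Console.WriteLine("Process: " + Bits() + "bit");
        Console.WriteLine("Operating system is 64bit: " + Environment.Is64BitOperatingSystem);
    }
}
```

If the process reports 32bit on a 64bit operating system, the 32bit preference is a likely cause.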
And for good measure, let’s see the API in action once again.
Congratulations, you’re now an expert at using the Farmhash.Sharp library!
Using BenchmarkDotNet, the Farmhash.Sharp benchmarking code pits several non-cryptographic hash functions against each other in terms of throughput.
Benchmarking was done on the following machine:
BenchmarkDotNet=v0.10.3.0
OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz, ProcessorCount=8
Frequency=3914059 ticks, Resolution=255.4893 ns
To run the benchmarks on the Linux machine:
Please do not take these graphs as absolute truths. Run the benchmark code yourself to confirm findings.
For the first graph we will compare a cryptographic hash function (albeit a bad one, as it’s MD5) with Farmhash.Sharp, which is a non-cryptographic hash function. Whereas a non-cryptographic function only has to optimize for speed and a low collision rate, a cryptographic function must also hold up against pathological input.
Without getting bogged down in too many specifics, Farmhash easily crushes MD5.
What may be surprising is that the throughput of Farmhash can be dramatically affected by the runtime it is running on. To show this, I’ve restricted the data to the 64bit hash of Farmhash across different CLR JIT runtimes to see which JIT wins.
I suppose the .NET team should be commended, as the latest JIT (their 64bit RyuJIT) has 5-10x more throughput than the old JIT, with results even more pronounced against the legacy 32bit JIT.
Does Mono have the same behavior?
Nope. 32bit and 64bit Mono have approximately the same throughput for 64bit Farmhash. If you have a keen eye, you may have noticed that the y axis scale changed, which naturally raises the question of how Mono, the CLR, and the new Core runtime compare against each other.
For both the 32bit and 64bit Farmhash functions, the 64bit Core and 64bit RyuJIT runtimes win across payloads of any size. Core and RyuJIT probably share a lot of the same code under the hood.
Restricting the runtimes to just core-64bit, net-ryu-64bit, and mono-64bit, let’s see how Farmhash.Sharp stacks up against other non-cryptographic functions. I’ll present all the graphs first with a brief synopsis afterwards. The graphs show the throughput of each hash function relative to the fastest hash function in that category, so the higher the bar, the better that hash function is for that payload size.
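The normalization behind such relative charts can be sketched in a few lines. This is not the code that produced the graphs, only an illustration of dividing each throughput by the fastest one in its category; `RelativeThroughput` and the sample values are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RelativeThroughput
{
    // Divides every throughput by the largest one, so the fastest entry maps to 1.0.
    public static Dictionary<string, double> Normalize(IDictionary<string, double> throughput)
    {
        if (throughput.Count == 0)
        {
            throw new ArgumentException("At least one measurement is required.", nameof(throughput));
        }

        double fastest = throughput.Values.Max();
        if (fastest <= 0)
        {
            throw new ArgumentException("Throughput values must be positive.", nameof(throughput));
        }

        return throughput.ToDictionary(kv => kv.Key, kv => kv.Value / fastest);
    }

    public static void Main()
    {
        // Placeholder numbers for illustration only, not benchmark results.
        var sample = new Dictionary<string, double> { { "hashA", 200.0 }, { "hashB", 100.0 } };
        foreach (var kv in Normalize(sample))
        {
            Console.WriteLine(kv.Key + ": " + kv.Value);
        }
    }
}
```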
A good question would be how much efficiency is lost because we’re using C# and not C++, as the original farmhash algorithm uses C++. You can find the benchmark code here. It uses two versions of the algorithm: one that uses hardware acceleration (SIMD instructions), marked as such in the graph, and another compilation that does not use hardware acceleration.
I’m pleased to report that for small payloads (<= 25 bytes), Farmhash.Sharp is among the fastest, if not the fastest. Only at larger payloads do we see C++’s lead extend as hardware acceleration becomes more effective. Still, for large payloads, Farmhash.Sharp has half the throughput of hardware accelerated C++, which, in my opinion, is quite impressive.
When deploying a 64bit application, always choose the 64bit Farmhash version. If, for whatever reason, Farmhash isn’t for you, choose the xxHash implementation found in RavenDB.
Code used to generate the graphs can be found in analysis.R in the GitHub repo.
The library is available under MIT, which allows modification and redistribution for both commercial and non-commercial purposes. For more information see the License file in the GitHub repository.