Hashing (Part I.5)

‹ Back to Public Key Cryptography

In the previous article we have discussed the basics of public key cryptography and I wanted to continue with an introduction into the way it is used in most cryptocurrencies, using Bitcoin as an example. While doing research on the subject, however, it occurred to me that there is an important concept we have not covered at all, namely hashing. For cryptocurrencies this concept is vital, as hashes are one of the basic elements underlying blockchain technology. This article aims to provide an intro to hashes as a general concept, to cryptographic hashes, as a prerequisite to how they can be used to create digital signatures (while noting that there is a lot more to it than what is covered here).

Hashing (without qualifiers) is the procedure of mapping a larger-sized input to a smaller-sized output. It is used a lot in programming in general (for instance, the hash map is one of the first data structures you learn about in Algorithms and Data Structures), and for good reason – it helps you organise things into groups of the same sort so you can later retrieve what you need very fast, since you know beforehand where you put it. Let me provide an example:

Let’s say you want to sort your contacts on your smartphone into groups of the same sort so you can find them easier. The input data of this program would be the set of contacts you have in your phone. The output keys or hashes are the groups you want to sort them into, for example “Family”, “School”, “Work” etc. These groups will probably have many members in them and most of them will already be known to you, since you likely call them regularly and you don’t search for them that often. However, think how useful it is to have a group “Emergencies” where you can store the number of the road assist company, plumber etc. and find them using this key or label instantly when you need them. Hopefully this is data you don’t need very often, but when you do need it, you need it fast. The mapping function that takes your contacts and hashes them into the right keys is the most tedious part of this exercise – it involves you going through each one of your contacts and deciding what group they belong to. The point is that once this is done, you will not have to go through each of them again to search for the value you need, since by knowing the key you can find the right contacts contained therein.

The same idea is applied when hashing any data, except it usually involves a mapping function that looks at intrinsic properties of the data itself and maps it to the right key using some collection of rules. In addition, in the procedure discussed above, we have intentionally allowed more data values to be sorted into the same key, but this is usually undesirable in actual usage, being called a collision. There are ways to mitigate it (such as chaining, which basically entails allowing it to happen, storing multiple values at the same key, pretty much what we have done above) and re-hashing (using a tie-breaking mechanism to create a new hash that is unique – there are multiple strategies for this).

So, based on the above, we can define the following:

d, k

Where d is the data, k is the key, and

H(d) -> k

Where H is the mapping or hash function that determines the key k for input d

Cryptographic hash functions are built on this same basic principle, while at the same time exhibiting certain additional properties:

The cryptographic hash function used in Bitcoin for signing is called SHA-256. It produces a hash value of 256 bits, which is usually represented as a string of 64 characters in hexadecimal notation. You can play a little bit with the toy hashing app below to get a sense of how it works:

Your hash will appear here!

The signature is then created based on the hash – but more on that soon.

‹ Back to Public Key Cryptography

References

  1. Usually when discussing input length in this context, we refer to the length of a byte array