Mastering Ethereum Challenge Part 4 - Cryptographic Hash Functions... say what?

WTF is a cryptographic hash function?


Cryptographic hash functions (CHFs) are important primitives that help make blockchains the significant force they are today. A CHF takes any kind of data, e.g. text, audio, video or complex computer code, and transforms it into a fixed-length output called a hash.

Hashing data lets us refer to that data by its hash and verify its integrity. If we can verify that the data has not been tampered with throughout its life, we can trust it more.
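Here is a minimal sketch of an integrity check: record a reference hash when the data is created, then recompute and compare later. It uses SHA-256 from Python's standard library as a stand-in, because keccak256 is not built into Python. The message strings are made up for illustration.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hash of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


original = b"Pay Alice 10 ETH"
reference = fingerprint(original)  # recorded when the data was created

received = b"Pay Alice 90 ETH"  # a tampered copy

print(fingerprint(original) == reference)  # True: untouched data matches
print(fingerprint(received) == reference)  # False: the change is detected
```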

There are many kinds of cryptographic hash functions. For example:

  • Bitcoin uses SHA-256.
  • Ethereum uses keccak256.
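In Python, SHA-256 lives in the standard library's hashlib module. This sketch also shows hashlib.sha3_256, which is a trap for Ethereum work: it implements the final NIST SHA-3 standard, whose padding differs from the original Keccak that Ethereum adopted, so its output does not match keccak256. The sha256d helper is an illustrative name for the double SHA-256 that Bitcoin applies to block headers and transactions.

```python
import hashlib

data = b"a"

# SHA-256: the hash function Bitcoin is built on
print(hashlib.sha256(data).hexdigest())
# ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb


def sha256d(payload: bytes) -> bytes:
    """Double SHA-256: hash the input, then hash the resulting digest again."""
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()


print(sha256d(data).hex())

# NIST SHA3-256 is NOT Ethereum's keccak256: the padding differs, so this
# output differs from keccak256("a") = 3ac22516...f9a5f1cb
print(hashlib.sha3_256(data).hexdigest())
```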

Note that hashing is not encryption: there is no key, and a hash cannot be "decrypted" back into the original data. Calling a cryptographic hash function looks like this:

keccak256("insert any data in here")

Note that keccak256 is not built into JavaScript itself, so to run it in a JavaScript console you need an Ethereum library such as web3.js or ethers.js. If you'd rather not install anything, you can visit here to play around with keccak256 directly on the webpage. It's quite fun 😄.

For example:

keccak256("a") gives 3ac225168df54212a25c1c01fd35bebfea408fdac2e31ddd6f80a4bbf9a5f1cb.

keccak256("ab") gives 67fad3bfa1e0321bd021ca805ce14876e50acac8ca8532eda8cbf924da565160.

Adding just one letter produces a completely different hash.
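Since keccak256 is not in Python's standard library, here is a minimal from-scratch sketch of it in Python, written for readability rather than speed. It assumes Ethereum's variant: the original Keccak padding (a 0x01 byte, zeros, then 0x80), a 136-byte rate and a 32-byte output. Python's hashlib.sha3_256 is the later NIST SHA-3 standard with different padding, so it gives different hashes and is not a drop-in substitute. All function names here are illustrative. If the sketch is correct, running it on b"a" and b"ab" reproduces the two hashes above. For real projects, prefer a maintained, audited library.

```python
MASK64 = (1 << 64) - 1

# Round constants for the iota step (24 rounds of Keccak-f[1600])
ROUND_CONSTANTS = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]

# Rotation offsets for the rho step, indexed by lane position x + 5*y
ROTATIONS = [
    0, 1, 62, 28, 27,
    36, 44, 6, 55, 20,
    3, 10, 43, 25, 39,
    41, 45, 15, 21, 8,
    18, 2, 61, 56, 14,
]


def rotl64(value: int, shift: int) -> int:
    """Rotate a 64-bit lane left by `shift` bits."""
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def keccak_f1600(state: list[int]) -> list[int]:
    """Apply the 24-round Keccak-f[1600] permutation to 25 64-bit lanes."""
    for rc in ROUND_CONSTANTS:
        # theta: mix each column with its two neighbouring columns
        c = [state[x] ^ state[x + 5] ^ state[x + 10] ^ state[x + 15] ^ state[x + 20]
             for x in range(5)]
        d = [c[(x - 1) % 5] ^ rotl64(c[(x + 1) % 5], 1) for x in range(5)]
        state = [state[i] ^ d[i % 5] for i in range(25)]
        # rho + pi: rotate each lane and move it to a new position
        b = [0] * 25
        for x in range(5):
            for y in range(5):
                b[y + 5 * ((2 * x + 3 * y) % 5)] = rotl64(state[x + 5 * y], ROTATIONS[x + 5 * y])
        # chi: non-linear mixing along each row
        state = [
            b[x + 5 * y] ^ (~b[(x + 1) % 5 + 5 * y] & b[(x + 2) % 5 + 5 * y])
            for y in range(5) for x in range(5)
        ]
        # iota: break symmetry with the round constant
        state[0] ^= rc
    return state


def keccak256(data: bytes) -> bytes:
    """Return the 32-byte Keccak-256 digest of `data` (Ethereum's keccak256)."""
    rate = 136  # bytes absorbed per permutation call (1088 bits)
    padded = bytearray(data) + b"\x01"         # original Keccak padding starts with 0x01
    padded += b"\x00" * (-len(padded) % rate)  # zero-fill to a multiple of the rate
    padded[-1] |= 0x80                         # final padding bit
    state = [0] * 25
    for offset in range(0, len(padded), rate):
        block = padded[offset:offset + rate]
        for i in range(rate // 8):
            state[i] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        state = keccak_f1600(state)
    # squeeze: the first 4 lanes (32 bytes) form the digest
    return b"".join(lane.to_bytes(8, "little") for lane in state[:4])


print(keccak256(b"a").hex())
print(keccak256(b"ab").hex())
```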

Thus, the input data could be:

  • A video
  • A song
  • A piece of text
  • A complex computer program

But the output is always a hash of the same length. For keccak256 that is 256 bits, written as 64 hexadecimal characters.
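To see the fixed output length in action, this sketch hashes inputs of very different sizes. It uses SHA-256 from Python's standard library as a stand-in, since keccak256 is not built in; both produce 32-byte (256-bit) hashes. The 1 MB of zero bytes is a placeholder for something like a song or video file.

```python
import hashlib

inputs = {
    "empty": b"",
    "one letter": b"a",
    "a sentence": b"The quick brown fox jumps over the lazy dog",
    "1 MB of zeros": bytes(1_000_000),  # placeholder for a song or video file
}

for name, data in inputs.items():
    digest = hashlib.sha256(data).digest()
    # Input sizes vary wildly, but the digest is always 32 bytes (256 bits)
    print(f"{name}: {len(data)} bytes in -> {len(digest)} bytes out")
```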

Why do cryptographic hash functions matter?


CHFs matter because they are:

  • Deterministic - the same input always gives the same output. This makes a CHF such as keccak256 an easy reference for verifying data integrity: hash the data again and check that the result matches.

  • Infeasible to reverse (preimage resistant) - a good cryptographic hash function is easy to compute and check, but given only a hash, it is infeasible to work out an input that produces it. As an analogy, it is easy to find two numbers that multiply to 20. But if I asked you for two numbers (other than 1 and the number itself) that multiply to 123,345,567, it would be much harder... yes? A cryptographic hash function is designed so that working backwards is harder still: for an input that is hard to guess, there is no known shortcut better than trying candidate inputs one by one.

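  • Collision resistant - it is computationally infeasible to find two different inputs that produce the same hash. Collisions must exist, because there are infinitely many possible inputs but only a fixed number of possible outputs, yet nobody can find one in practice. A related property is the avalanche effect: a tiny change in the input radically changes the output.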
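The determinism and avalanche behaviour can be checked directly. This sketch uses Python's hashlib.sha256 as a stand-in for keccak256 and counts how many of the 256 output bits flip when "a" becomes "ab". For a good hash you would expect roughly half to flip, though the exact count depends on the inputs. This illustrates the properties; it does not prove collision resistance.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest, standing in for keccak256."""
    return hashlib.sha256(data).digest()


def differing_bits(x: bytes, y: bytes) -> int:
    """Count the bit positions at which two equal-length digests differ."""
    return sum(bin(a ^ b).count("1") for a, b in zip(x, y))


# Deterministic: the same input always gives the same digest
print(h(b"a") == h(b"a"))  # True

# Avalanche: adding one letter flips a large share of the 256 output bits
print(differing_bits(h(b"a"), h(b"ab")), "of 256 bits differ")
```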


How are cryptographic hash functions used in blockchains?


The cool thing about hash functions is that:

  • You can use the output of one hash function as the input to another. For example, keccak256("67fad3bfa1e0321bd021ca805ce14876e50acac8ca8532eda8cbf924da565160") gives an output of
    63a9f18b64ca5a98ad9dba59259edb0710892614501480a9bed568d98450c151, which you can feed into the next hash, and so on. Blockchains use this idea: each block stores the hash of the block before it, creating a chain of historical data that is always properly referenced. Changing any earlier data changes every hash that follows, so tampering is easy to detect and hard to hide.

  • Storing hashed passwords is safer than storing plain-text passwords. If there is a security breach, the hacker only gains the hash 67fad3bfa1e0321bd021ca805ce14876e50acac8ca8532eda8cbf924da565160, not the password itself, and cannot simply reverse it. One caveat: a very short password like ab can still be found quickly by hashing every short string until one matches, which is why real systems add a random salt to each password and use deliberately slow hash functions such as bcrypt, scrypt or Argon2.
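The chaining idea from the first point can be sketched as a toy in Python. It is not how any real blockchain encodes blocks (real blocks hash a structured header that includes the previous block's hash): it uses SHA-256 as a stand-in for keccak256, simply concatenates the previous hash with a text record, and the records and all-zero starting hash are made up for illustration. Editing the second record leaves the first hash intact but changes every hash from that point on.

```python
import hashlib

GENESIS = "0" * 64  # made-up starting value for the first link


def link(prev_hash: str, record: str) -> str:
    """Hash the previous hash together with the new record (SHA-256 stand-in)."""
    return hashlib.sha256((prev_hash + record).encode()).hexdigest()


def build_chain(records: list[str]) -> list[str]:
    """Return one hash per record, each depending on every record before it."""
    hashes, prev = [], GENESIS
    for record in records:
        prev = link(prev, record)
        hashes.append(prev)
    return hashes


records = ["Alice pays Bob 5", "Bob pays Carol 2", "Carol pays Dave 1"]
original = build_chain(records)

tampered_records = list(records)
tampered_records[1] = "Bob pays Carol 200"  # quietly rewrite history
tampered = build_chain(tampered_records)

# The first hash is untouched; every hash from the edit onwards changes
print([a == b for a, b in zip(original, tampered)])  # [True, False, False]
```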

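A hash hides a password only if the password is hard to guess. This sketch first brute-forces an unsalted SHA-256 hash of ab by trying all 676 two-letter lowercase strings, then stores a password with a random salt and PBKDF2 (hashlib.pbkdf2_hmac), used here as a standard-library stand-in for bcrypt or Argon2. The iteration count is an illustrative value, and crack_two_letters, hash_password and check_password are hypothetical helper names.

```python
import hashlib
import hmac
import itertools
import os
import string
from typing import Optional

ITERATIONS = 600_000  # illustrative work factor; choose it from current guidance


def crack_two_letters(leaked_hex: str) -> Optional[str]:
    """Try every two-letter lowercase string against an unsalted SHA-256 hash."""
    for letters in itertools.product(string.ascii_lowercase, repeat=2):
        guess = "".join(letters)
        if hashlib.sha256(guess.encode()).hexdigest() == leaked_hex:
            return guess
    return None


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for storage, using a fresh random salt per password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def check_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


leaked = hashlib.sha256(b"ab").hexdigest()
print(crack_two_letters(leaked))  # ab

salt, stored = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, stored))  # True
print(check_password("ab", salt, stored))  # False
```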
Summary


Cool, now we have learned:

  • Cryptographic hash functions
  • What they are
  • Why they matter
  • How they are used

More to come.