Randomness as a concept

Introduction to Randomness

By: Ulysse Pavloff

Randomness could be defined as the unpredictability of events considering the information previously available.

You could also see randomness as a lack of information. If you have enough knowledge about a system and you're able to predict its future state, the randomness disappears. A cast of the dice wouldn't be random if you could know what number you are going to get, depending on how you threw the dice.

Randomness in computer science

Why do we need random numbers in the first place? Randomness is needed when you seek unpredictability, when prediction would bias the ‘game'. For instance, if an online poker player knew the algorithm used to shuffle the cards, he could predict the cards of everyone and win the game easily. There are a lot of trivial examples for games, but we'll see examples of domains where randomness is paramount.

Knowing that randomness is associated with unpredictability, how do computers cope with randomness? Considering the fact that they are deterministic machines that compute code without ambiguity, they should not be able to produce something unpredictable. And they don't. They do exactly as they're told (counter example).

So, how can computers produce seemingly random results without inherent randomness?

Pseudo randomness

A computer is unable to escape determinism, and that is a good thing in itself: you don't want a computer to obtain different results for the same operation. A computer is a deterministic machine that should always output the same result for a given input.

That is why true randomness needs to be created by something other than basic computing. For this we need entropy. Entropy is to be viewed as a loss of information about the system. More entropy implies less information which means more uncertainty.

Randomness in computers has thus tried a myriad of means to reach the true randomness holy grail. Among the techniques used, we can mention radioactive decay, user activity, atmospheric noise, and so on. This is a way to ‘catch' entropy and use it for the generation of a random number.

For instance, here is the message you get if you want to generate a GPG key in Linux (type gpg --gen-key in terminal):

We need to generate a lot of random bytes. It is a good idea to perform some other action (type on the keyboard, move the mouse, utilize the disks) during the prime generation; this gives the random number generator a better chance to gain enough entropy.

But what is pseudo-randomness? It is a phenomenon that seemingly produces randomness but is completely repeatable.

To circumvent the fact that computing cannot produce true randomness on its own, computers and programs generally use a seed. The seed act as a generator influencing every expected random choice. And although it will seem random, you can replicate the results of every code/experiment if you have the same seed.

That is why usually when you train neural networks in AI you specify the seed. This way anyone with your code and dataset will be able to replicate your AI and verify your results. You can look at the paper with code website which links code to scientific papers, mostly in the machine learning field. Most of the code there has a specific seed already included allowing anyone to reproduce the results.

Aside from being used for flipping coins, creating unpredictability in games, or replicating results in AI, randomness is also crucial for cryptography, and de facto, blockchains.

Randomness for cryptography

One important use case of randomness is cryptography. Whether it is for key generation or transactions, you can never be too careful about how randomness is created.

In 2013, a major flaw in the random generation (RNG) of Android wallets has been exploited, leading to at least 55.821 bitcoins being stolen. The vulnerability stemmed from the inherent function java.security.SecureRandom , ironically not so secured.

The attack worked as follows: each time a user makes a Bitcoin transaction, she needs to prove the ownership of the private key linked to the public key generating the coins transfer. This proof is done using the ECDSA algorithm. To sign a transaction, a user needs three things:

Her private key.
Transaction/message hash.
A random number.

The random number should differ between two transactions, otherwise, anyone can compute the private key.

To understand how you can compute the key you need to know about the cryptography behind it; if you are interested check the following part.

Explanation of the cryptography:

Bitcoin uses the mathematics of elliptic curves digital signatures algorithm (ECDSA). Let us say that Alice has the private and public key pair $(K_{p r i v}, K_{p u b}) (K_{priv}, K_{pub})$ where $K_{p u b} = K_{p r i v} * g K_{pub} = K_{priv} ∗ g$ . g being the generator point.

To sign a message Alice needs to do the following:

Select a random integer $k k$ .

Compute $k * g = (x_{1}, y_{1}) k ∗ g = (x_{1}, y_{1})$ .

Compute $r = x_{1} (m o d n) r = x_{1} \pmod{n}$ . (if $r = 0 r=0$ go to step 1).

Compute $k^{- 1} (m o d n) k^{-1} \pmod{n}$ . Where $k^{- 1} k^{-1}$ is the multiplicative inverse and satisfies $k * k^{- 1} (m o d n) = 1 k ∗ k^{-1} \pmod{n} = 1$ .

Compute the hash of the message $m : e = h a s h (m) (m o d n) m: e = hash(m) (mod n)$ .

Compute $s = k^{- 1} * (e + K_{p r i v} * r) (m o d n) s = k^{-1} ∗(e + K_{priv} ∗ r) \pmod{n}$ . (if $s = 0 s=0$ go to step 1)

Alice's signature for the message is $(r, s) (r,s)$ .

Now if we find two signatures of Alice: $(r, s) (r,s)$ and $(r, s^{'}) (r,s')$ , using the same random integer $k k$ for two different messages $m m$ and $m^{'} m'$ , we can derive the private key $K_{p r i v} K_{priv}$ of Alice.

Indeed, we start by finding that $e = hash (m) (m o d n) e = \texttt{hash}(m) \pmod{n}$ and $e^{'} = hash (m^{'}) (m o d n) e' = \texttt{hash}(m') \pmod{n}$ .

Since k is the same in the two messages we have $s - s^{'} = k^{- 1} (e - e^{'}) (m o d n) s - s' = k^{-1} (e - e') \pmod{n}$ , leading to $k = (e - e^{'}) / (s - s^{'}) (m o d n) k = (e - e')/(s - s') \pmod{n}$ . At this point, we figured out the ‘random' integer.

Next, we use the equation where the only unknown variable is $K_{p r i v} K_{priv}$ :

$s = k^{- 1} (e + K_{p r i v} \times r) (m o d n) \Leftrightarrow K_{p r i v} = \frac{s k - e}{r} (m o d n) s = k^{-1}(e+K_{priv}\times r)\pmod{n} \Leftrightarrow K_{priv} = \dfrac{sk-e}{r}\pmod{n}$

The use of a deterministic number rather than a random one is now the current method to avoid these vulnerabilities.

To alleviate this problem, secure wallets now change public key addresses after every transaction to avoid this kind of attack among others.

Since a lot of cryptography is based on secret keys, one should not create a secret key in a way that is replicable. What is the point of hiding your recovery phrase in the most secret place if it can be brute-forced easily? For instance, let's say you used python to create your recovery phrase with this naïve code:

import random
from eth_account import Account
bits = random.getrandbits(256)
print(bits)
>>> 42113660129043767433473637172206104164089819165490969477913202590807626782422
private_key = hex(bits)
print(private_key)
>>> 0x5d1b7ca7c93d0a88226a04a9d2d9909ebd522bd12de62ef3a99df33fa81e96d6
account = Account.from_key(private_key)
account.address
>>> '0x2986eF55C9BC9364c7306BddBe5030C0559b0DD9'

If someone were to send this to his friend saying something like: “hey you can use this code if you want to create your own private key! I just created mine, you can now send me eth at: 0x2986eF55C9BC9364c7306BddBe5030C0559b0DD9 :)” Now his friends know the code and the time around when it was used. If she is a little bit mischievous then she might try to find his private key. Because most native libraries for randomness aren't intended for cryptography, they're not very secure. In our case, the random library of Python uses the current system time of the computer if no seed is specified. So your mischievous friend can just brute-force with seeds around the time you've created your key, and voilà!

Randomness in Blockchain

Although blockchains appear to be filled with randomness with every public key stemming from a random private key and every hash being unpredictable before being computed, once you want to execute a code that uses randomness on a blockchain, you struggle to find a way.

So, how is randomness produced in blockchain, when every single piece of information is shared on the network? Finding a suitable seed for a ‘good' source of randomness is problematic for two reasons:

if the seed is set before the involvement of participants (casting the dice, drawing the cards, etc.), then, the code and the seed will be visible to everyone (Everyone can thus predict the results).
if the seed is set after the involvement of participants, you risk choosing a manipulable seed. As a matter of fact, what source of randomness will you choose? Data is added to the blockchain through miners, at the arrival of new blocks. Depending on what information you are using from a block, a miner can influence it. And if you draw your seed from a third party, such as an oracle, your trust is based on their reliability.

Each way has its flaws, the first one is obvious, and we will explore the second one later on.

Blockchains using randomness in their protocols

Several blockchains use randomness to determine the role of agents participating in their protocol. Most notorious examples are Cardano, Algorand, Polkadot, and Ethereum. They use randomness to determine the roles of validators, such as who is allowed to propose the next block.

The three first examples all use what is called VRF (Verifiable Random Function)². The VRF is a function that takes as input a secret key and a nonce and outputs a pseudo-random value. The output from which derives the secret key is then used as a seed for randomness. The perk of VRF is the Verifiable part, once you have your output, you can prove that this output was correctly computed and hasn't been tampered with. (Note: Chainlink also uses VRF to provide verifiable randomness in the smart contracts)

Ethereum uses a different mechanism called RANDAO. To produce a random seed, it uses the randomness of every participant such that at least one honest participant ensures randomness.

Conclusion

Understanding how randomness is used and where it stems from is the best way to avoid any vulnerabilities. A deep understanding of randomness in blockchain requires an understanding of cryptography. We will explore in further detail different ways of using cryptography for randomness in the next article.

Notes:

The function rand() generate a random number between 0 and 1 in many programming languages.
The creator of VRF, Silvio Micali, is also the founder of Algorand. He is also one of the guys who conceived the Zero Knowledge Proof you might have heard of.

Resources:

Now, let's have a look at that spell...