The Ethereum Blockchain Paradigm (Part 1)
Introduction
Odds are you’ve heard about the Ethereum blockchain, whether or not you know what it is. It’s been in the news a lot lately, including the cover of some major magazines, but reading those articles can be like gibberish if you don’t have a foundation for what exactly Ethereum is.
So what is it?
In essence, it’s a public database that keeps a permanent record of digital transactions. Importantly, this database doesn’t require any central authority to maintain and secure it. Instead, it operates as a “trustless” transactional system — a framework in which individuals can make peer-to-peer transactions without needing to trust a third party OR one another.
Still confused? That’s where this post comes in. My aim is to explain how Ethereum functions at a technical level, without complex math or scary-looking formulas. Even if you’re not a programmer, I hope you’ll walk away with at least a better grasp of the tech. If some parts are too technical and difficult to grasp, that’s totally fine! There’s really no need to understand every little detail. I recommend just focusing on understanding things at a broader level.
Many of the topics covered in this post are a breakdown of the concepts discussed in the yellow paper. I’ve added my own explanations and diagrams to make understanding Ethereum easier.
Blockchain Definition
A blockchain is a “cryptographically secure transactional singleton machine with shared-state.” [1]
That’s a mouthful, isn’t it?
Let’s break it down.
- “Cryptographically secure” means that the creation of digital currency is secured by complex mathematical algorithms that are obscenely hard to break. Think of a firewall of sorts. They make it nearly impossible to cheat the system (e.g. create fake transactions, erase transactions, etc.)
- “Transactional singleton machine” means that there’s a single canonical instance of the machine responsible for all the transactions being created in the system. In other words, there’s a single global truth that everyone believes in.
- “With shared-state” means that the state stored on this machine is shared and open to everyone.
Ethereum implements this blockchain paradigm.
1. The Ethereum blockchain paradigm explained
The Ethereum blockchain is essentially a transaction-based state machine. In computer science, a state machine refers to something that will read a series of inputs and, based on those inputs, will transition to a new state.
With Ethereum’s state machine, we begin with a “genesis state.” This is analogous to a blank slate before any transactions have happened on the network. When transactions are executed, this genesis state transitions into some final state. At any point in time, this final state represents the current state of Ethereum.
The state of Ethereum has millions of transactions. These transactions are grouped into “blocks.” A block contains a series of transactions, and each block is chained together with its previous block.
To cause a transition from one state to the next, a transaction must be valid. For a transaction to be considered valid, it must go through a validation process known as mining. Mining is when a group of nodes (i.e. computers) expend their compute resources to create a block of valid transactions.
Any node on the network that declares itself as a miner can attempt to create and validate a block. Lots of miners from around the world try to create and validate blocks at the same time. Each miner provides a mathematical “proof” when submitting a block to the blockchain, and this proof acts as a guarantee: if the proof exists, the block must be valid.
For a block to be added to the main blockchain, the miner must prove it faster than any other competitor miner. The process of validating each block by having a miner provide mathematical proof is known as a “proof of work.”
A miner who validates a new block is rewarded with a certain amount of value for doing this work. What is that value? The Ethereum blockchain uses an intrinsic digital token called “Ether.” Every time a miner proves a block, new Ether tokens are generated and awarded.
You might wonder: what guarantees that everyone sticks to one chain of blocks? How can we be sure that there doesn’t exist a subset of miners who will decide to create their own chain of blocks?
Earlier, we defined a blockchain as a transactional singleton machine with a shared state. Using this definition, we can understand the correct current state is a single global truth, which everyone must accept. Having multiple states (or chains) would ruin the whole system because it would be impossible to agree on which state was the correct one. If the chains were to diverge, you might own 10 coins on one chain, 20 on another, and 40 on another. In this scenario, there would be no way to determine which chain was the most “valid.”
Whenever multiple paths are generated, a “fork” occurs. We typically want to avoid forks, because they disrupt the system and force people to choose which chain they “believe” in.
To determine which path is most valid and prevent multiple chains, Ethereum uses a mechanism called the “GHOST protocol.”
“GHOST” = “Greedy Heaviest Observed Subtree”
In simple terms, the GHOST protocol says we must pick the path that has had the most computation done upon it. One way to determine that path is to use the block number of the most recent block (the “leaf block”), which represents the total number of blocks in the current path (not counting the genesis block). The higher the block number, the longer the path and the greater the mining effort that must have gone into arriving at the leaf. Using this reasoning allows us to agree on the canonical version of the current state.
2. Components of Ethereum/All about Accounts 🤯
Now that we have developed a basic understanding of Ethereum blockchain, let’s dive deeper into the components of its system:
Accounts
The global “shared-state” of Ethereum is comprised of many small objects (“accounts”) that are able to interact with one another through a message-passing framework. Each account has a state associated with it and a 20-byte address. An address in Ethereum is a 160-bit identifier that is used to identify any account.
There are two types of accounts:
- Externally owned accounts are controlled by private keys and have no code associated with them.
- Contract accounts are controlled by their contract code and have code associated with them.
Externally owned accounts vs. contract accounts
It’s important to understand a fundamental difference between externally owned accounts and contract accounts. An externally owned account can send messages to other externally owned accounts OR to other contract accounts by creating and signing a transaction using its private key. A message between two externally owned accounts is simply a value transfer. But a message from an externally owned account to a contract account activates the contract account’s code, allowing it to perform various actions (e.g. transfer tokens, write to internal storage, mint new tokens, perform some calculation, create new contracts, etc.).
Unlike externally owned accounts, contract accounts can’t initiate new transactions on their own. Instead, contract accounts can only fire transactions in response to other transactions they have received (from an externally owned account or from another contract account). We’ll learn more about contract-to-contract calls in the “Transactions and Messages” section.
Therefore, any action that occurs on the Ethereum blockchain is always set in motion by transactions fired from externally controlled accounts.
Account state
The account state consists of four components, which are present regardless of the type of account:
- nonce: If the account is an externally owned account, this number represents the number of transactions sent from the account’s address. If the account is a contract account, the nonce is the number of contracts created by the account.
- balance: The number of Wei owned by this address. There are 1e+18 Wei per Ether.
- storageRoot: A hash of the root node of a Merkle Patricia tree (we’ll explain Merkle trees later on). This tree encodes the hash of the storage contents of this account and is empty by default.
- codeHash: The hash of the EVM (Ethereum Virtual Machine — more on this later) code of this account. For contract accounts, this is the code that gets hashed and stored as the codeHash. For externally owned accounts, the codeHash field is the hash of the empty string.
World state
Okay, so we know that Ethereum’s global state consists of a mapping between account addresses and the account states. This mapping is stored in a data structure known as a Merkle Patricia tree.
A Merkle tree (or also referred as “Merkle trie”) is a type of binary tree composed of a set of nodes with:
- a large number of leaf nodes at the bottom of the tree that contains the underlying data
- a set of intermediate nodes, where each node is the hash of its two child nodes
- a single root node, also formed from the hash of its two-child node, representing the top of the tree
The data at the bottom of the tree is generated by splitting the data that we want to store into chunks, then splitting the chunks into buckets, and then taking the hash of each bucket and repeating the same process until the total number of hashes remaining becomes only one: the root hash.
This tree is required to have a key for every value stored inside it. Beginning from the root node of the tree, the key should tell you which child node to follow to get to the corresponding value, which is stored in the leaf nodes. In Ethereum’s case, the key/value mapping for the state tree is between addresses and their associated accounts, including the balance, nonce, codeHash, and storageRoot for each account (where the storageRoot is itself a tree).
This same trie structure is used also to store transactions and receipts. More specifically, every block has a “header” which stores the hash of the root node of three different Merkle trie structures, including:
- State trie
- Transactions trie
- Receipts trie
The ability to store all this information efficiently in Merkle tries is incredibly useful in Ethereum for what we call “light clients” or “light nodes.” Remember that a blockchain is maintained by a bunch of nodes. Broadly speaking, there are two types of nodes: full nodes and light nodes.
A full archive node synchronizes the blockchain by downloading the full chain, from the genesis block to the current head block, executing all of the transactions contained within. Typically, miners store the full archive node, because they are required to do so for the mining process. It is also possible to download a full node without executing every transaction. Regardless, any full node contains the entire chain.
But unless a node needs to execute every transaction or easily query historical data, there’s really no need to store the entire chain. This is where the concept of a light node comes in. Instead of downloading and storing the full chain and executing all of the transactions, light nodes download only the chain of headers, from the genesis block to the current head, without executing any transactions or retrieving any associated state. Because light nodes have access to block headers, which contain hashes of three tries, they can still easily generate and receive verifiable answers about transactions, events, balances, etc.
The reason this works is because hashes in the Merkle tree propagate upward — if a malicious user attempts to swap a fake transaction into the bottom of a Merkle tree, this change will cause a change in the hash of the node above, which will change the hash of the node above that, and so on until it eventually changes the root of the tree.
Any node that wants to verify a piece of data can use something called a “Merkle proof” to do so. A Merkle proof consists of:
- A chunk of data to be verified and it’s hash
- The root hash of the tree
- The “branch” (all of the partner hashes going up along the path from the chunk to the root)
Anyone reading the proof can verify that the hashing for that branch is consistent all the way up the tree, and therefore that the given chunk is actually at that position in the tree.
In summary, the benefit of using a Merkle Patricia tree is that the root node of this structure is cryptographically dependent on the data stored in the tree, and so the hash of the root node can be used as a secure identity for this data. Since the block header includes the root hash of the state, transactions, and receipts trees, any node can validate a small part of the state of Ethereum without needing to store the entire state, which can be potentially unbounded in size.
3. Gas and Payments
One very important concept in Ethereum is the concept of fees. Every computation that occurs as a result of a transaction on the Ethereum network incurs a fee — there’s no free lunch! This fee is paid in a denomination called “gas.”
Gas is the unit used to measure the fees required for a particular computation. Gas price is the amount of Ether you are willing to spend on every unit of gas, and is measured in “gwei.” “Wei” is the smallest unit of Ether, where ¹¹⁸ Wei represents 1 Ether. One gwei is 1,000,000,000 Wei.
With every transaction, a sender sets a gas limit and gas price. The product of gas price and gas limit represents the maximum amount of Wei that the sender is willing to pay for executing a transaction.
For example, let’s say the sender sets the gas limit to 50,000 and a gas price to 20 gwei. This implies that the sender is willing to spend at most 50,000 x 20 gwei = 1,000,000,000,000,000 Wei = 0.001 Ether to execute that transaction.
Remember that the gas limit represents the maximum gas the sender is willing to spend money on. If they have enough Ether in their account balance to cover this maximum, they’re good to go. The sender is refunded for any unused gas at the end of the transaction, exchanged at the original rate.
In the case that the sender does not provide the necessary gas to execute the transaction, the transaction runs “out of gas” and is considered invalid. In this case, the transaction processing aborts, and any state changes that occurred are reversed, such that we end up back at the state of Ethereum prior to the transaction. Additionally, a record of the transaction failing gets recorded, showing what transaction was attempted and where it failed. And since the machine already expended effort to run the calculations before running out of gas, logically, none of the gas is refunded to the sender.
Where exactly does this gas money go? All the money spent on gas by the sender is sent to the “beneficiary” address, which is typically the miner’s address. Since miners are expending the effort to run computations and validate transactions, miners receive the gas fee as a reward.
In the case that the sender does not provide the necessary gas to execute the transaction, the transaction runs “out of gas” and is considered invalid. In this case, the transaction processing aborts, and any state changes that occurred are reversed, such that we end up back at the state of Ethereum prior to the transaction. Additionally, a record of the transaction failing gets recorded, showing what transaction was attempted and where it failed. And since the machine already expended effort to run the calculations before running out of gas, logically, none of the gas is refunded to the sender.
There are fees for storage, too
Not only is gas used to pay for computation steps, but it is also used to pay for storage usage. The total fee for storage is proportional to the smallest multiple of 32 bytes used.
Fees for storage have some nuanced aspects. For example, since increased storage increases the size of the Ethereum state database on all nodes, there’s an incentive to keep the amount of data stored small. For this reason, if a transaction has a step that clears an entry in the storage, the fee for executing that operation of is waived, AND a refund is given for freeing up storage space.
What’s the purpose of fees?
One important aspect of the way Ethereum works is that every single operation executed by the network is simultaneously affected by every full node. However, computational steps on the Ethereum Virtual Machine are very expensive. Therefore, Ethereum smart contracts are best used for simple tasks, like running simple business logic or verifying signatures and other cryptographic objects, rather than more complex uses, like file storage, email, or machine learning, which can put a strain on the network. Imposing fees prevents users from overtaxing the network.
Ethereum is a Turing complete language. (In short, a Turing machine is a machine that can simulate any computer algorithm (for those not familiar with Turing machines, check out this and this). This allows for loops and makes Ethereum susceptible to the halting problem, a problem in which you cannot determine whether or not a program will run infinitely. If there were no fees, a malicious actor could easily try to disrupt the network by executing an infinite loop within a transaction, without any repercussions. Thus, fees protect the network from deliberate attacks.
You might be thinking, “why do we also have to pay for storage?” Well, just like computation, storage on the Ethereum network is a cost that the entire network has to take the burden of.
4. Transaction and messages
We noted earlier that Ethereum is a transaction-based state machine. In other words, transactions occurring between different accounts are what move the global state of Ethereum from one state to the next.
In the most basic sense, a transaction is a cryptographically signed piece of instruction that is generated by an externally owned account, serialized, and then submitted to the blockchain.
There are two types of transactions: message calls and contract creations (i.e. transactions that create new Ethereum contracts).
All transactions contain the following components, regardless of their type:
- nonce: a count of the number of transactions sent by the sender.
- gasPrice: the number of Wei that the sender is willing to pay per unit of gas required to execute the transaction.
- gasLimit: the maximum amount of gas that the sender is willing to pay for executing this transaction. This amount is set and paid upfront, before any computation is done.
- to: the address of the recipient. In a contract-creating transaction, the contract account address does not yet exist, and so an empty value is used.
- value: the amount of Wei to be transferred from the sender to the recipient. In a contract-creating transaction, this value serves as the starting balance within the newly created contract account.
- v, r, s: used to generate the signature that identifies the sender of the transaction.
- init (only exists for contract-creating transactions): An EVM code fragment that is used to initialize the new contract account. init is run only once, and then is discarded. When init is first run, it returns the body of the account code, which is the piece of code that is permanently associated with the contract account.
- data (optional field that only exists for message calls): the input data (i.e. parameters) of the message call. For example, if a smart contract serves as a domain registration service, a call to that contract might expect input fields such as the domain and IP address.
We learned in the “Accounts” section that transactions — both message calls and contract-creating transactions — are always initiated by externally owned accounts and submitted to the blockchain. Another way to think about it is that transactions are what bridge the external world to the internal state of Ethereum.
But this doesn’t mean that contracts can’t talk to other contracts. Contracts that exist within the global scope of Ethereum’s state can talk to other contracts within that same scope. The way they do this is via “messages” or “internal transactions” to other contracts. We can think of messages or internal transactions as being similar to transactions, with the major difference that they are NOT generated by externally owned accounts. Instead, they are generated by contracts. They are virtual objects that, unlike transactions, are not serialized and only exist in the Ethereum execution environment.
When one contract sends an internal transaction to another contract, the associated code that exists on the recipient contract account is executed.
One important thing to note is that internal transactions or messages don’t contain a gasLimit. This is because the gas limit is determined by the external creator of the original transaction (i.e. some externally owned account). The gas limit that the externally owned account sets must be high enough to carry out the transaction, including any sub-executions that occur as a result of that transaction, such as contract-to-contract messages. If, in the chain of transactions and messages, a particular message execution runs out of gas, then that message’s execution will revert, along with any subsequent messages triggered by the execution. However, the parent execution does not need to revert.
Blocks
All transactions are grouped together into “blocks.” A blockchain contains a series of such blocks that are chained together.
In Ethereum, a block consists of:
- the block header
- information about the set of transactions included in that block
- a set of other block headers for the current block’s ommers.
5. Ommers explained
What the heck is an “ommer?” An ommer is a block whose parent is equal to the current block’s parent’s parent. Let’s take a quick dive into what ommers are used for and why a block contains the block headers for ommers.
Because of the way Ethereum is built, block times are much lower (~15 seconds) than those of other blockchains, like Bitcoin (~10 minutes). This enables faster transaction processing. However, one of the downsides of shorter block times is that more competing block solutions are found by miners. These competing blocks are also referred to as “orphaned blocks” (i.e. mined blocks do not make it into the main chain).
The purpose of ommers is to help reward miners for including these orphaned blocks. The ommers that miners include must be “valid,” meaning within the sixth generation or smaller of the present block. After six children, stale orphaned blocks can no longer be referenced (because including older transactions would complicate things a bit).
Ommer blocks receive a smaller reward than a full block. Nonetheless, there’s still some incentive for miners to include these orphaned blocks and reap a reward.
Block header
Let’s get back to blocks for a moment. We mentioned previously that every block has a block “header,” but what exactly is this?
A block header is a portion of the block consisting of:
- parentHash: a hash of the parent block’s header (this is what makes the block set a “chain”)
- ommersHash: a hash of the current block’s list of ommers
- beneficiary: the account address that receives the fees for mining this block
- stateRoot: the hash of the root node of the state trie (recall how we learned that the state trie is stored in the header and makes it easy for light clients to verify anything about the state)
- transactionsRoot: the hash of the root node of the trie that contains all transactions listed in this block
- receiptsRoot: the hash of the root node of the trie that contains the receipts of all transactions listed in this block
- logsBloom: a Bloom filter (data structure) that consists of log information
- difficulty: the difficulty level of this block
- number: the count of current block (the genesis block has a block number of zero; the block number increases by 1 for each each subsequent block)
- gasLimit: the current gas limit per block
- gasUsed: the sum of the total gas used by transactions in this block
- timestamp: the unix timestamp of this block’s inception
- extraData: extra data related to this block
- mixHash: a hash that, when combined with the nonce, proves that this block has carried out enough computation
- nonce: a hash that, when combined with the mixHash, proves that this block has carried out enough computation
Notice how every block header contains three trie structures for:
- state (stateRoot)
- transactions (transactionsRoot)
- receipts (receiptsRoot)
These trie structures are nothing but the Merkle Patricia tries we discussed earlier.
Additionally, there are a few terms from the above description that are worth clarifying. Let’s take a look.
Logs
Ethereum allows for logs to make it possible to track various transactions and messages. A contract can explicitly generate a log by defining “events” that it wants to log.
A log entry contains:
- the logger’s account address,
- a series of topics that represent various events carried out by this transaction, and
- any data associated with these events.
Logs are stored in a bloom filter, which stores the endless log data in an efficient manner.
Transaction receipt
Logs stored in the header come from the log information contained in the transaction receipt. Just as you receive a receipt when you buy something at a store, Ethereum generates a receipt for every transaction. As you’d expect, each receipt contains certain information about the transaction. This receipt includes items like:
- the block number
- block hash
- transaction hash
- gas used by the current transaction
- cumulative gas used in the current block after the current transaction has been executed
- logs created when executing the current transaction
- ..and so on
Block difficulty
The “difficulty” of a block is used to enforce consistency in the time it takes to validate blocks. The genesis block has a difficulty of 131,072, and a special formula is used to calculate the difficulty of every block thereafter. If a certain block is validated more quickly than the previous block, the Ethereum protocol increases that block’s difficulty.
The difficulty of the block affects the nonce, which is a hash that must be calculated when mining a block, using the proof-of-work algorithm.
The relationship between the block’s difficulty and nonce is mathematically formalized as:
The only way to find a nonce that meets a difficulty threshold is to use the proof-of-work algorithm to enumerate all of the possibilities. The expected time to find a solution is proportional to the difficulty — the higher the difficulty, the harder it becomes to find the nonce, and so the harder it is to validate the block, which in turn increases the time it takes to validate a new block. So, by adjusting the difficulty of a block, the protocol can adjust how long it takes to validate a block.