Part II · The Classic Era Chapter 7

Multisig: The m-of-n Threshold

"What is needed is an electronic payment system based on cryptographic proof instead of trust."—Satoshi Nakamoto, Bitcoin: A Peer-to-Peer Electronic Cash System

Trust is a single point of failure. A company whose treasury depends on one private key is one stolen laptop, one disgruntled employee, one compromised backup away from losing everything. A family's inheritance locked behind one key dies with its holder. An escrow arrangement between two parties who do not trust each other requires a third—but that third must hold power without holding funds.

Bitcoin solves this with multisig: a script that requires \(m\) valid signatures from a designated set of \(n\) public keys. A 2-of-3 multisig means that any two of three keyholders can authorize spending; losing one key does not lose the funds, and compromising one key does not compromise the treasury. The construction is simple, the security improvement is profound, and the mechanism—OP_CHECKMULTISIG—is one of the most consequential opcodes in Bitcoin Script.

In Chapter 6, we saw a 2-of-3 multisig hiding inside a P2SH wrapper. This chapter pulls the curtain back. We examine the opcode itself, its infamous off-by-one bug, the bare multisig format that preceded P2SH, the key-ordering rules that catch the unwary, and the \(m\)-of-\(n\) combinations that make distributed custody possible.

7.1The OP_CHECKMULTISIG Opcode

OP_CHECKMULTISIG (opcode 0xae) is the engine that powers every multisig transaction—bare or wrapped. It takes a structured set of arguments from the stack and verifies that at least \(m\) of \(n\) provided public keys have valid corresponding signatures.

7.1.1Parameter Format

The opcode expects the stack to be arranged as follows (top at right):

OP_CHECKMULTISIG Stack Layout (top → right)
PositionItemDescription
BottomOP_0Dummy byte (off-by-one bug)
sig_1sig_mm signatures
mSignature threshold
pk_1pk_nn public keys
TopnKey count

The opcode pops these values in reverse order:

  1. Pop n (the number of public keys).
  2. Pop n public keys.
  3. Pop m (the required signature count).
  4. Pop m signatures.
  5. Pop one additional item (the dummy byte—see §7.2).

If at least \(m\) of the \(n\) keys have valid matching signatures, the opcode pushes TRUE (1); otherwise it pushes FALSE (0).

7.1.2The Verification Algorithm

OP_CHECKMULTISIG does not try every possible pairing of signatures to keys. Instead, it performs a linear scan: it walks through the keys in order, attempting to match each remaining signature against the current key.

Linear Scan Algorithm
StepAction
1Set key pointer \(\leftarrow\) first key, sig pointer \(\leftarrow\) first signature
2Check: does the current signature match the current key?
3aIf yes: advance both pointers (signature matched)
3bIf no: advance key pointer only (try next key)
4If all \(m\) signatures matched: push TRUE
5If remaining keys < remaining signatures: push FALSE

This algorithm has a critical consequence: signatures must appear in the same order as their corresponding keys. If key 1 appears before key 2 in the script, then sig 1 must appear before sig 2 in the scriptSig. The scan never backtracks—once a key is skipped, it cannot be revisited. We explore this constraint in detail in §7.5.

7.2The Off-By-One Bug

The most famous bug in Bitcoin Script is hiding in plain sight. OP_CHECKMULTISIG pops \(n + m + 3\) items from the stack: \(n\) keys, \(m\) signatures, the two count values, and one extra item that is never used for anything.

Satoshi's Off-By-One

In the original Bitcoin implementation, OP_CHECKMULTISIG contains a loop counter error that causes it to pop one more stack item than it should. This extra item—the "dummy byte"—is consumed and discarded. Every multisig scriptSig must include this dummy value as its first element, or the script will fail when the opcode tries to pop from an empty stack.

Here is the actual code from Bitcoin Core's script/interpreter.cpp (EvalScript, the OP_CHECKMULTISIG case):

// src/script/interpreter.cpp — OP_CHECKMULTISIG

int i = 1;  BUG: should be 0

// Pop n (number of public keys)
int nKeysCount = CScriptNum(stacktop(-i), fRequireMinimal).getint();
int ikey = ++i;           // i = 2  (would be 1 if initialized correctly)
i += nKeysCount;           // i = 2 + n  (should be 1 + n)

// Pop m (number of signatures)
int nSigsCount = CScriptNum(stacktop(-i), fRequireMinimal).getint();
int isig = ++i;            // i = 3 + n  (should be 2 + n)
i += nSigsCount;           // i = 3 + n + m  (should be 2 + n + m)

// Verify signatures against keys (linear scan)...

// Clean up: pop everything we consumed
while (i-- > 1)           CONSEQUENCE: pops n+m+2 items instead of n+m+1
    popstack(stack);

// The extra item that was consumed — the "dummy byte"
if ((flags & SCRIPT_VERIFY_NULLDUMMY) && stacktop(-1).size())
    return set_error(serror, SCRIPT_ERR_SIG_NULLDUMMY);
popstack(stack);           BIP 147: dummy must be OP_0
The bug and its consequence
The fix (BIP 147, deployed 2017)

Because i starts at 1 instead of 0, every stacktop(-i) call indexes one position deeper than necessary. By the time the cleanup loop runs, i has accumulated to 3 + n + m instead of 2 + n + m, so one extra stack item is consumed. That extra item is the dummy byte — OP_0 — that every multisig scriptSig must prepend. Since this bug is in consensus code, fixing it would be a hard fork — every node would need to agree simultaneously that the old behavior is invalid. Instead, the Bitcoin community adopted the bug as part of the protocol.

7.2.1BIP 147: NULLDUMMY

For years, the dummy byte could be any value—miners and nodes accepted anything as the throwaway item. This created a malleability vector: a third party could change the dummy byte in a transaction's scriptSig without invalidating the signature, altering the txid.

BIP 147 (deployed with SegWit activation in August 2017) closed this hole. It requires the dummy element to be exactly OP_0 (an empty byte array). Any other value causes the transaction to be rejected. In our Chapter 6 specimen, the scriptSig begins with 00—that single byte is Satoshi's off-by-one bug, permanently enshrined in consensus.

7.3Bare Multisig

Before P2SH existed (pre-April 2012), the only way to create a multisig output was to place the entire script directly in the scriptPubKey. This is bare multisig—the unadorned form.

7.3.1The Bare Multisig scriptPubKey

Using the three compressed public keys from our Chapter 6 specimen, a bare 2-of-3 multisig scriptPubKey would be:

Bare 2-of-3 Multisig scriptPubKey (105 bytes)
HexOpcodeMeaning
52OP_2\(m = 2\) signatures required
21OP_PUSHBYTES_33Push 33-byte compressed key
02ca…0de7Public key 133 bytes
21OP_PUSHBYTES_33Push 33-byte compressed key
03e5…495fPublic key 233 bytes
21OP_PUSHBYTES_33Push 33-byte compressed key
03ee… ec21Public key 333 bytes
53OP_3\(n = 3\) keys total
aeOP_CHECKMULTISIGVerify \(m\)-of-\(n\)

This is identical to the redeem script from Chapter 6—because the redeem script is a bare multisig script. The difference is where it lives: in bare multisig, these 105 bytes sit in every UTXO that references this output, bloating the UTXO set. In P2SH, only a 23-byte hash occupies the UTXO, and the full script appears only once—at spending time.

7.3.2The Bare Multisig scriptSig

The scriptSig for a bare multisig is simpler than its P2SH counterpart because it does not need to include the redeem script:

Bare 2-of-3 Multisig scriptSig (146 bytes)
HexElementSize
00OP_0 (dummy byte)1 byte
47OP_PUSHBYTES_71Push 71 bytes
3044…3201DER signature 1 + SIGHASH_ALL71 bytes
48OP_PUSHBYTES_72Push 72 bytes
3045…4f01DER signature 2 + SIGHASH_ALL72 bytes

Total: \(1 + 1 + 71 + 1 + 72 = 146\) bytes. Compare this to the P2SH scriptSig (253 bytes): the difference is exactly the redeem script and its push opcode (\(2 + 105 = 107\) bytes).

BIP 11: M-of-N Standard Transactions

BIP 11, authored by Gavin Andresen in October 2011, standardized bare multisig transactions. Before BIP 11, multisig scripts existed in Bitcoin's opcode set but were classified as "non-standard" by default—most nodes would not relay them. BIP 11 made outputs matching the pattern OP_m <keys> OP_n OP_CHECKMULTISIG standard for \(n \leq 3\). This limit of three keys was deliberate: it constrained the maximum scriptPubKey size for bare multisig to 105 bytes (with compressed keys). For larger \(n\), P2SH became the required wrapper.

7.4Stack Execution: Bare 2-of-3

Let us trace the full execution of a bare 2-of-3 multisig, using the same keys and signatures from Chapter 6. In bare multisig, there is no two-phase evaluation—the scriptSig and scriptPubKey concatenate and execute as a single script.

Stack Execution — Bare 2-of-3 Multisig
StepOperationActionStack (top \(\to\) right)
1OP_0Push dummy byte[0]
2Push 71 bytesPush signature 1[0, s1]
3Push 72 bytesPush signature 2[0, s1, s2]
4OP_2Push \(m = 2\)[0, s1, s2, 2]
5Push 33 bytesPush public key 1[…, 2, pk1]
6Push 33 bytesPush public key 2[…, pk1, pk2]
7Push 33 bytesPush public key 3[…, pk2, pk3]
8OP_3Push \(n = 3\)[…, pk3, 3]
9OP_CHECKMULTISIGVerify 2-of-3[TRUE]

At step 9, OP_CHECKMULTISIG pops \(3 + 2 + 3 = 8\) items: 3 (key count), three public keys, 2 (sig count), two signatures, and the dummy 0. It runs the linear scan:

  1. Compare s1 against pk1. Match? Yes. Advance both pointers.
  2. Compare s2 against pk2. Match? Yes. Advance both pointers.
  3. All \(m = 2\) signatures matched. Push TRUE.

The script succeeds. The transaction is valid.

Bare vs P2SH: Same Engine, Different Packaging

Compare this nine-step trace with Chapter 6's two-phase evaluation. The same OP_CHECKMULTISIG call happens in both cases—same keys, same signatures, same linear scan. The only difference is where the script lives: in bare multisig, it is in the scriptPubKey from the moment the output is created; in P2SH, it is hidden until spending. P2SH adds the hash-check overhead (Phase 1) but removes 82 bytes from the UTXO set.

🔧 Interactive: Script VM — Bare Multisig

7.5Key Ordering: The Linear Scan Constraint

The linear scan algorithm (§7.1) imposes a strict rule: signatures must be provided in the same order as their corresponding public keys appear in the script. The scan pointer moves only forward through the key list—it never backtracks.

Order Matters

If a 2-of-3 script lists keys in order [A, B, C], then valid signature combinations are:

But sig_B, sig_A is invalid—even though both signatures are genuine—because the scanner matches sig_B to key B, then cannot backtrack to match sig_A to key A (which was already passed).

🔧 Interactive: Linear Scan Visualizer

7.5.1Why Not Check All Permutations?

A brute-force approach would try every possible pairing of \(m\) signatures to \(n\) keys. For a 2-of-3, that is only \(\binom{3}{2} = 3\) combinations—trivial. But for a 15-of-15, checking all orderings would require \(15!\) comparisons (over \(1.3 \times 10^{12}\)), creating a denial-of-service vulnerability. The linear scan keeps the algorithm \(O(n)\), ensuring that even the maximum-size multisig validates in constant time relative to \(n\).

7.5.2Wallet Implementation

Wallets that construct multisig transactions must sort signatures to match the key order in the redeem script. The standard approach:

  1. For each available private key, produce a signature.
  2. Map each signature to its public key's position in the script.
  3. Order the signatures by key position (ascending).
  4. Prepend the dummy OP_0 byte.

BIP 67 ("Deterministic Pay-to-script-hash multi-signature addresses through public key sorting") standardized lexicographic ordering of public keys in the redeem script itself, eliminating ambiguity about which key is "first."

7.6P2SH-Wrapped Multisig

Chapter 6 presented the full anatomy of a P2SH-wrapped 2-of-3 multisig. Here we summarize the wrapping mechanism and its advantages.

7.6.1The Wrapping Process

To create a P2SH multisig output:

  1. Construct the redeem script. This is identical to the bare multisig scriptPubKey: OP_m <keys> OP_n OP_CHECKMULTISIG.
  2. Hash the redeem script. Compute Hash160 (SHA-256 followed by RIPEMD-160) of the serialized redeem script \(\to\) 20-byte script hash.
  3. Create the P2SH output. The scriptPubKey is OP_HASH160 OP_PUSHBYTES_20 <20-byte hash> OP_EQUAL (23 bytes: a9 14 + 20 bytes + 87).
  4. Derive the address. Base58Check-encode the script hash with version byte 05 \(\to\) a "3-address."

The sender sees only the 23-byte output and the 3-address. The full multisig script remains private until spending.

7.6.2Why P2SH Dominates

P2SH-wrapped multisig replaced bare multisig almost entirely after BIP 16 activation in April 2012:

PropertyBare MultisigP2SH Multisig
Max \(n\) (standard)315
scriptPubKey size (2-of-3)105 bytes23 bytes
UTXO set costHigh (full script)Low (hash only)
Privacy before spendNone (keys visible)Full (only hash visible)
Address formatNone3-prefix
Sender complexityMust know full scriptNeeds only hash

The UTXO-set savings alone would justify P2SH. But the privacy advantage is equally important: a bare multisig output reveals the number of keys, the threshold, and all public keys from the moment of creation. A P2SH output reveals nothing until it is spent.

7.7The m-of-n Landscape

Any combination where \(1 \leq m \leq n \leq 20\) is valid in consensus. In practice, standardness rules and size constraints narrow the field.

7.7.1Size Formulas

For \(n\) compressed keys (33 bytes each):

Redeem script = 1 + n × (1 + 33) + 1 + 1 = 3 + 34n bytes

scriptSig (P2SH) = 1 + m × (1 + 72) + push(redeem) + redeem eq:ssig-size

The push opcode for the redeem script depends on its size: 1 byte for scripts \(\leq 75\) bytes (OP_PUSHBYTES), 2 bytes for scripts \(76\)–\(255\) bytes (OP_PUSHDATA1), or 3 bytes for scripts \(256\)–\(520\) bytes (OP_PUSHDATA2).

7.7.2Common Configurations

\(m\)\(n\)Redeem (bytes)scriptSig (bytes, max)scriptPubKey (bytes)Use Case
113711223Single-sig (unusual)
127114623Recovery backup
227121923Lightning channels
2310525423Standard custody
3517339523Corporate treasury
5724160923Board governance
11155131,32023Maximum P2SH
15155131,61223Max threshold
The 520-Byte Limit

P2SH imposes a maximum redeem script size of 520 bytes—a consensus rule, not merely a standardness limit. With compressed keys (\(34n + 3\) bytes per the formula above), the maximum \(n\) is: \[ 34n + 3 \leq 520 \Rightarrow n \leq 15.2 \Rightarrow n_{\max} = 15 \] This is why P2SH multisig caps at 15 keys. For higher \(n\), SegWit's P2WSH (Chapter 10) raises the script size limit to 10,000 bytes.

7.7.3Signing Combinations

A \(m\)-of-\(n\) multisig has \(nm\) valid signing combinations:

ConfigurationValid Combinations
2-of-3\(\binom{3}{2} = 3\)
3-of-5\(\binom{5}{3} = 10\)
5-of-7\(\binom{7}{5} = 21\)
11-of-15\(\binom{15}{11} = 1{,}365\)

These numbers matter for key management: a 3-of-5 setup tolerates the loss of any 2 keys and the compromise of any 2 keys simultaneously—a dramatic improvement over single-signature custody.

7.8Multisig in Practice

The Rise of Multisig Custody

Multisig evolved from a theoretical capability to the backbone of institutional Bitcoin custody:

7.8.1Lightning Network: 2-of-2

Every Lightning payment channel is built on a 2-of-2 multisig funding transaction. Both parties deposit bitcoin into a shared 2-of-2 output; neither can spend unilaterally. This is the simplest non-trivial multisig, enabling trustless off-chain payments through commitment transactions that both parties sign but do not broadcast. We examine Lightning channels in detail in Chapter 18.

7.8.2The Security Model

The security of \(m\)-of-\(n\) multisig rests on two properties:

For a 2-of-3 setup: lose 1 key and still spend; compromise 1 key and still safe. For 3-of-5: lose 2 keys or have 2 compromised, and funds remain secure. The art of multisig design is choosing \(m\) and \(n\) to balance these two risks for the specific threat model.

7.9What We Learned

7.9.1Looking Ahead

Multisig gave Bitcoin distributed custody, but at a cost: the full redeem script—all \(n\) public keys and the threshold—is revealed on-chain when the output is spent. In Part III, Segregated Witness reduces this cost through the witness discount (Chapter 8) and P2WSH extends the script size limit far beyond P2SH's 520 bytes (Chapter 10). And in Part IV, Taproot's MuSig2 protocol (Chapter 12) achieves a remarkable feat: \(n\)-of-\(n\) multisig that looks, on-chain, exactly like a single-signature spend—32 bytes, one Schnorr signature, no keys enumerated, no threshold revealed.

*Exercises

Litmus (L)

  1. What opcode is 0xae?
  2. How many items does OP_CHECKMULTISIG pop from the stack for a 2-of-3 multisig?
  3. Why must the scriptSig begin with OP_0?
  4. What is the maximum number of keys in a P2SH multisig?
  5. In a 3-of-5 multisig, how many keys can be lost without losing funds?

Hands-On (H)

  1. Compute the redeem script size for a 7-of-11 multisig with compressed keys. Is this valid under P2SH's 520-byte limit?
  2. Using the redeem script size formula from §7.7.1, find the largest \(n\) for which an \(n\)-of-\(n\) P2SH multisig is standard.
  3. A 2-of-3 script lists keys [A, B, C]. List all valid signature orderings in the scriptSig. Which orderings are invalid, and why?
  4. Compute the maximum scriptSig size for a 7-of-11 P2SH multisig.

Proofs and Reasoning (P)

  1. Prove that the linear scan algorithm is correct: if \(m\) valid signatures are present in the correct order, the algorithm always accepts.
  2. Explain why the off-by-one bug cannot be fixed without a hard fork. What would happen if a new node enforced the "correct" behavior?
  3. Why does bare multisig impose a lower maximum \(n\) than P2SH multisig, even though both use the same opcode?

Connections (C)

  1. Lightning funding transactions use 2-of-2 multisig. Why is \(m = n\) appropriate for a payment channel, even though it offers zero fault tolerance?
  2. Compare the on-chain footprint of a P2SH 3-of-5 multisig spend with a Taproot (P2TR) key-path spend. How many bytes does Taproot save?

Bridge (B)

  1. P2WSH (Chapter 10) raises the script size limit from 520 to 10,000 bytes. What is the maximum \(n\) for an \(n\)-of-\(n\) P2WSH multisig with compressed keys?
  2. Schnorr-based MuSig2 (Chapter 12) aggregates \(n\) public keys into a single key. Explain why this eliminates the on-chain cost of enumerating all \(n\) keys.

*Solutions

L1. OP_CHECKMULTISIG.

L2. \(n + m + 3 = 3 + 2 + 3 = 8\) items: three public keys, two signatures, two count values (\(m\) and \(n\)), and the dummy byte.

L3. Because of Satoshi's off-by-one bug. OP_CHECKMULTISIG pops one extra item from the stack beyond what is logically needed. The OP_0 dummy byte is consumed by this bug. BIP 147 requires the dummy to be exactly OP_0 (empty byte array); any other value is now invalid.

L4. 15. The P2SH redeem script has a 520-byte consensus limit. With compressed keys: \(34n + 3 \leq 520 \Rightarrow n \leq 15\).

L5. \(n - m = 5 - 3 = 2\) keys can be lost.

H1. Redeem script size for 7-of-11: \(3 + 34 \times 11 = 3 + 374 = 377\) bytes. Since \(377 \leq 520\), this is valid under P2SH.

H2. \(34n + 3 \leq 520 \Rightarrow 34n \leq 517 \Rightarrow n \leq 15.2\). So \(n_{\max} = 15\). A 15-of-15 P2SH multisig has a redeem script of \(3 + 34 \times 15 = 513\) bytes, which fits within the 520-byte limit.

H3. Keys in order [A, B, C]. Valid signature orderings in scriptSig (after dummy OP_0):

Invalid orderings (and why):

H4. 7-of-11 P2SH scriptSig (maximum):

Total: \(1 + 511 + 3 + 377 = 892\) bytes.

P1. Let keys be \(K_1, \ldots , K_n\) and signatures \(S_1, \ldots , S_m\) where \(S_j\) is valid for \(K_f(j)\) and \(f(1) < f(2) < \cdot s < f(m)\) (signatures are in key order).

The algorithm maintains a key pointer \(k\) (initially 1) and a sig pointer \(s\) (initially 1). At each step:

Since signatures are ordered, when the algorithm reaches \(S_s\), we know \(f(s) \geq k\) (because all earlier signatures matched at positions \(< f(s)\)). The key pointer will advance through \(K_k, K_k+1, \ldots\) until it reaches \(K_f(s)\), at which point \(S_s\) matches. This happens for each \(s \in \{1, \ldots, m\}\), so all \(m\) signatures match. The algorithm never runs out of keys because \(f(m) \leq n\), so there are always enough keys remaining for the remaining signatures.

P2. The off-by-one bug is in consensus code—the rules that determine whether a transaction is valid. If a new node enforced the "correct" behavior (not popping the dummy), it would reject transactions that include the dummy byte (standard multisig transactions that the rest of the network considers valid). This new node would fork off the network, rejecting blocks that contain valid-by-old-rules multisig transactions. Conversely, transactions without the dummy byte would fail on old nodes. Either way, the network splits. This is a hard fork by definition: a change that makes previously valid transactions invalid (or vice versa).

P3. Bare multisig places the full script in the scriptPubKey, which is part of the transaction output and stored in the UTXO set. Bitcoin Core's relay policy (not consensus, but standardness) limits bare multisig to \(n \leq 3\) to prevent UTXO bloat. P2SH multisig places only a 23-byte hash in the scriptPubKey; the full script appears in the scriptSig (input data, not stored in the UTXO set). The consensus limit for P2SH is the 520-byte redeem script cap (\(n \leq 15\)). The difference is not about the opcode but about where the data lives and which policy governs it.

C1. In a Lightning payment channel, both parties must cooperate to update the channel state (signing new commitment transactions). If either party becomes unresponsive, the channel can still be force-closed by broadcasting the latest commitment transaction—this is handled by the commitment/HTLC mechanism, not by multisig fault tolerance. The \(m = n = 2\) ensures that neither party can unilaterally spend the funding output without the other's consent, which is the fundamental security property of a payment channel. Fault tolerance (\(n - m = 0\)) is intentional: losing one key should not allow spending—it should trigger the on-chain dispute mechanism.

C2. P2SH 3-of-5 multisig spend:

P2TR key-path spend:

Savings: the P2TR spend reveals 64 witness bytes (discounted at \(14\) weight) vs. 395 non-witness bytes in P2SH. In weight units: P2TR key-path \(\approx 64\) WU for the signature; P2SH multisig \(\approx 395 \times 4 = 1,580\) WU for the scriptSig. Taproot saves over 95% of the signature weight.

B1. P2WSH maximum: \(34n + 3 \leq 10,000 \Rightarrow 34n \leq 9,997 \Rightarrow n \leq 294\). In consensus, the maximum \(n\) value OP_CHECKMULTISIG accepts is 20 (a separate consensus rule). So the practical maximum under P2WSH is \(n = 20\), limited by the opcode, not the script size.

B2. MuSig2 aggregates \(n\) public keys into a single 32-byte aggregate key \(X\) using a non-interactive key aggregation protocol. All \(n\) participants contribute partial signatures, which are summed into a single 64-byte Schnorr signature that is valid for \(X\). On-chain, the output contains only \(X\) (32 bytes), and the spend reveals only the aggregate signature (64 bytes). No individual public keys are enumerated, no threshold is revealed, and the transaction is indistinguishable from a single-signer spend. This eliminates the \(O(n)\) on-chain cost of traditional multisig.

← Ch. 6 Ch. 8 →