Trust is a single point of failure. A company whose treasury depends on one private key is one stolen laptop, one disgruntled employee, one compromised backup away from losing everything. A family's inheritance locked behind one key dies with its holder. An escrow arrangement between two parties who do not trust each other requires a third—but that third must hold power without holding funds.
Bitcoin solves this with multisig: a script that requires \(m\) valid signatures from a designated set of \(n\) public keys. A 2-of-3 multisig means that any two of three keyholders can authorize spending; losing one key does not lose the funds, and compromising one key does not compromise the treasury. The construction is simple, the security improvement is profound, and the mechanism—OP_CHECKMULTISIG—is one of the most consequential opcodes in Bitcoin Script.
In Chapter 6, we saw a 2-of-3 multisig hiding inside a P2SH wrapper. This chapter pulls the curtain back. We examine the opcode itself, its infamous off-by-one bug, the bare multisig format that preceded P2SH, the key-ordering rules that catch the unwary, and the \(m\)-of-\(n\) combinations that make distributed custody possible.
OP_CHECKMULTISIG (opcode 0xae) is the engine that powers every multisig transaction—bare or wrapped. It takes a structured set of arguments from the stack and verifies that at least \(m\) of \(n\) provided public keys have valid corresponding signatures.
The opcode expects the stack to be arranged as follows (top at right):
| Position | Item | Description |
|---|---|---|
| Bottom | OP_0 | Dummy byte (off-by-one bug) |
sig_1 … sig_m | m signatures | |
m | Signature threshold | |
pk_1 … pk_n | n public keys | |
| Top | n | Key count |
The opcode pops these values in reverse order:
n (the number of public keys).n public keys.m (the required signature count).m signatures.If at least \(m\) of the \(n\) keys have valid matching signatures, the opcode pushes TRUE (1); otherwise it pushes FALSE (0).
OP_CHECKMULTISIG does not try every possible pairing of signatures to keys. Instead, it performs a linear scan: it walks through the keys in order, attempting to match each remaining signature against the current key.
| Step | Action |
|---|---|
| 1 | Set key pointer \(\leftarrow\) first key, sig pointer \(\leftarrow\) first signature |
| 2 | Check: does the current signature match the current key? |
| 3a | If yes: advance both pointers (signature matched) |
| 3b | If no: advance key pointer only (try next key) |
| 4 | If all \(m\) signatures matched: push TRUE |
| 5 | If remaining keys < remaining signatures: push FALSE |
This algorithm has a critical consequence: signatures must appear in the same order as their corresponding keys. If key 1 appears before key 2 in the script, then sig 1 must appear before sig 2 in the scriptSig. The scan never backtracks—once a key is skipped, it cannot be revisited. We explore this constraint in detail in §7.5.
The most famous bug in Bitcoin Script is hiding in plain sight. OP_CHECKMULTISIG pops \(n + m + 3\) items from the stack: \(n\) keys, \(m\) signatures, the two count values, and one extra item that is never used for anything.
In the original Bitcoin implementation, OP_CHECKMULTISIG contains a loop counter error that causes it to pop one more stack item than it should. This extra item—the "dummy byte"—is consumed and discarded. Every multisig scriptSig must include this dummy value as its first element, or the script will fail when the opcode tries to pop from an empty stack.
Here is the actual code from Bitcoin Core's script/interpreter.cpp (EvalScript, the OP_CHECKMULTISIG case):
// src/script/interpreter.cpp — OP_CHECKMULTISIG int i = 1; BUG: should be 0 // Pop n (number of public keys) int nKeysCount = CScriptNum(stacktop(-i), fRequireMinimal).getint(); int ikey = ++i; // i = 2 (would be 1 if initialized correctly) i += nKeysCount; // i = 2 + n (should be 1 + n) // Pop m (number of signatures) int nSigsCount = CScriptNum(stacktop(-i), fRequireMinimal).getint(); int isig = ++i; // i = 3 + n (should be 2 + n) i += nSigsCount; // i = 3 + n + m (should be 2 + n + m) // Verify signatures against keys (linear scan)... // Clean up: pop everything we consumed while (i-- > 1) CONSEQUENCE: pops n+m+2 items instead of n+m+1 popstack(stack); // The extra item that was consumed — the "dummy byte" if ((flags & SCRIPT_VERIFY_NULLDUMMY) && stacktop(-1).size()) return set_error(serror, SCRIPT_ERR_SIG_NULLDUMMY); popstack(stack); BIP 147: dummy must be OP_0
Because i starts at 1 instead of 0, every stacktop(-i) call indexes one position deeper than necessary. By the time the cleanup loop runs, i has accumulated to 3 + n + m instead of 2 + n + m, so one extra stack item is consumed. That extra item is the dummy byte — OP_0 — that every multisig scriptSig must prepend. Since this bug is in consensus code, fixing it would be a hard fork — every node would need to agree simultaneously that the old behavior is invalid. Instead, the Bitcoin community adopted the bug as part of the protocol.
For years, the dummy byte could be any value—miners and nodes accepted anything as the throwaway item. This created a malleability vector: a third party could change the dummy byte in a transaction's scriptSig without invalidating the signature, altering the txid.
BIP 147† (deployed with SegWit activation in August 2017) closed this hole. It requires the dummy element to be exactly OP_0 (an empty byte array). Any other value causes the transaction to be rejected. In our Chapter 6 specimen, the scriptSig begins with 00—that single byte is Satoshi's off-by-one bug, permanently enshrined in consensus.
Before P2SH existed (pre-April 2012), the only way to create a multisig output was to place the entire script directly in the scriptPubKey. This is bare multisig—the unadorned form.
Using the three compressed public keys from our Chapter 6 specimen, a bare 2-of-3 multisig scriptPubKey would be:
| Hex | Opcode | Meaning |
|---|---|---|
52 | OP_2 | \(m = 2\) signatures required |
21 | OP_PUSHBYTES_33 | Push 33-byte compressed key |
02ca…0de7 | Public key 1 | 33 bytes |
21 | OP_PUSHBYTES_33 | Push 33-byte compressed key |
03e5…495f | Public key 2 | 33 bytes |
21 | OP_PUSHBYTES_33 | Push 33-byte compressed key |
03ee… ec21 | Public key 3 | 33 bytes |
53 | OP_3 | \(n = 3\) keys total |
ae | OP_CHECKMULTISIG | Verify \(m\)-of-\(n\) |
This is identical to the redeem script from Chapter 6—because the redeem script is a bare multisig script. The difference is where it lives: in bare multisig, these 105 bytes sit in every UTXO that references this output, bloating the UTXO set. In P2SH, only a 23-byte hash occupies the UTXO, and the full script appears only once—at spending time.
The scriptSig for a bare multisig is simpler than its P2SH counterpart because it does not need to include the redeem script:
| Hex | Element | Size |
|---|---|---|
00 | OP_0 (dummy byte) | 1 byte |
47 | OP_PUSHBYTES_71 | Push 71 bytes |
3044…3201 | DER signature 1 + SIGHASH_ALL | 71 bytes |
48 | OP_PUSHBYTES_72 | Push 72 bytes |
3045…4f01 | DER signature 2 + SIGHASH_ALL | 72 bytes |
Total: \(1 + 1 + 71 + 1 + 72 = 146\) bytes. Compare this to the P2SH scriptSig (253 bytes): the difference is exactly the redeem script and its push opcode (\(2 + 105 = 107\) bytes).
BIP 11†, authored by Gavin Andresen in October 2011, standardized bare multisig transactions. Before BIP 11, multisig scripts existed in Bitcoin's opcode set but were classified as "non-standard" by default—most nodes would not relay them. BIP 11 made outputs matching the pattern OP_m <keys> OP_n OP_CHECKMULTISIG standard for \(n \leq 3\). This limit of three keys was deliberate: it constrained the maximum scriptPubKey size for bare multisig to 105 bytes (with compressed keys). For larger \(n\), P2SH became the required wrapper.
Let us trace the full execution of a bare 2-of-3 multisig, using the same keys and signatures from Chapter 6. In bare multisig, there is no two-phase evaluation—the scriptSig and scriptPubKey concatenate and execute as a single script.
| Step | Operation | Action | Stack (top \(\to\) right) |
|---|---|---|---|
| 1 | OP_0 | Push dummy byte | [0] |
| 2 | Push 71 bytes | Push signature 1 | [0, s1] |
| 3 | Push 72 bytes | Push signature 2 | [0, s1, s2] |
| 4 | OP_2 | Push \(m = 2\) | [0, s1, s2, 2] |
| 5 | Push 33 bytes | Push public key 1 | […, 2, pk1] |
| 6 | Push 33 bytes | Push public key 2 | […, pk1, pk2] |
| 7 | Push 33 bytes | Push public key 3 | […, pk2, pk3] |
| 8 | OP_3 | Push \(n = 3\) | […, pk3, 3] |
| 9 | OP_CHECKMULTISIG | Verify 2-of-3 | [TRUE] |
At step 9, OP_CHECKMULTISIG pops \(3 + 2 + 3 = 8\) items: 3 (key count), three public keys, 2 (sig count), two signatures, and the dummy 0. It runs the linear scan:
s1 against pk1. Match? Yes. Advance both pointers.s2 against pk2. Match? Yes. Advance both pointers.TRUE.The script succeeds. The transaction is valid.
Compare this nine-step trace with Chapter 6's two-phase evaluation. The same OP_CHECKMULTISIG call happens in both cases—same keys, same signatures, same linear scan. The only difference is where the script lives: in bare multisig, it is in the scriptPubKey from the moment the output is created; in P2SH, it is hidden until spending. P2SH adds the hash-check overhead (Phase 1) but removes 82 bytes from the UTXO set.
The linear scan algorithm (§7.1) imposes a strict rule: signatures must be provided in the same order as their corresponding public keys appear in the script. The scan pointer moves only forward through the key list—it never backtracks.
If a 2-of-3 script lists keys in order [A, B, C], then valid signature combinations are:
But sig_B, sig_A is invalid—even though both signatures are genuine—because the scanner matches sig_B to key B, then cannot backtrack to match sig_A to key A (which was already passed).
A brute-force approach would try every possible pairing of \(m\) signatures to \(n\) keys. For a 2-of-3, that is only \(\binom{3}{2} = 3\) combinations—trivial. But for a 15-of-15, checking all orderings would require \(15!\) comparisons (over \(1.3 \times 10^{12}\)), creating a denial-of-service vulnerability. The linear scan keeps the algorithm \(O(n)\), ensuring that even the maximum-size multisig validates in constant time relative to \(n\).
Wallets that construct multisig transactions must sort signatures to match the key order in the redeem script. The standard approach:
OP_0 byte.BIP 67† ("Deterministic Pay-to-script-hash multi-signature addresses through public key sorting") standardized lexicographic ordering of public keys in the redeem script itself, eliminating ambiguity about which key is "first."
Chapter 6 presented the full anatomy of a P2SH-wrapped 2-of-3 multisig. Here we summarize the wrapping mechanism and its advantages.
To create a P2SH multisig output:
OP_m <keys> OP_n OP_CHECKMULTISIG.OP_HASH160 OP_PUSHBYTES_20 <20-byte hash> OP_EQUAL (23 bytes: a9 14 + 20 bytes + 87).05 \(\to\) a "3-address."The sender sees only the 23-byte output and the 3-address. The full multisig script remains private until spending.
P2SH-wrapped multisig replaced bare multisig almost entirely after BIP 16 activation in April 2012:
| Property | Bare Multisig | P2SH Multisig |
|---|---|---|
| Max \(n\) (standard) | 3 | 15 |
| scriptPubKey size (2-of-3) | 105 bytes | 23 bytes |
| UTXO set cost | High (full script) | Low (hash only) |
| Privacy before spend | None (keys visible) | Full (only hash visible) |
| Address format | None | 3-prefix |
| Sender complexity | Must know full script | Needs only hash |
The UTXO-set savings alone would justify P2SH. But the privacy advantage is equally important: a bare multisig output reveals the number of keys, the threshold, and all public keys from the moment of creation. A P2SH output reveals nothing until it is spent.
Any combination where \(1 \leq m \leq n \leq 20\) is valid in consensus. In practice, standardness rules and size constraints narrow the field.
For \(n\) compressed keys (33 bytes each):
Redeem script = 1 + n × (1 + 33) + 1 + 1 = 3 + 34n bytes
scriptSig (P2SH) = 1 + m × (1 + 72) + push(redeem) + redeem eq:ssig-size
The push opcode for the redeem script depends on its size: 1 byte for scripts \(\leq 75\) bytes (OP_PUSHBYTES), 2 bytes for scripts \(76\)–\(255\) bytes (OP_PUSHDATA1), or 3 bytes for scripts \(256\)–\(520\) bytes (OP_PUSHDATA2).
| \(m\) | \(n\) | Redeem (bytes) | scriptSig (bytes, max) | scriptPubKey (bytes) | Use Case |
|---|---|---|---|---|---|
| 1 | 1 | 37 | 112 | 23 | Single-sig (unusual) |
| 1 | 2 | 71 | 146 | 23 | Recovery backup |
| 2 | 2 | 71 | 219 | 23 | Lightning channels |
| 2 | 3 | 105 | 254 | 23 | Standard custody |
| 3 | 5 | 173 | 395 | 23 | Corporate treasury |
| 5 | 7 | 241 | 609 | 23 | Board governance |
| 11 | 15 | 513 | 1,320 | 23 | Maximum P2SH |
| 15 | 15 | 513 | 1,612 | 23 | Max threshold |
P2SH imposes a maximum redeem script size of 520 bytes—a consensus rule, not merely a standardness limit. With compressed keys (\(34n + 3\) bytes per the formula above), the maximum \(n\) is: \[ 34n + 3 \leq 520 \Rightarrow n \leq 15.2 \Rightarrow n_{\max} = 15 \] This is why P2SH multisig caps at 15 keys. For higher \(n\), SegWit's P2WSH (Chapter 10) raises the script size limit to 10,000 bytes.
A \(m\)-of-\(n\) multisig has \(nm\) valid signing combinations:
| Configuration | Valid Combinations |
|---|---|
| 2-of-3 | \(\binom{3}{2} = 3\) |
| 3-of-5 | \(\binom{5}{3} = 10\) |
| 5-of-7 | \(\binom{7}{5} = 21\) |
| 11-of-15 | \(\binom{15}{11} = 1{,}365\) |
These numbers matter for key management: a 3-of-5 setup tolerates the loss of any 2 keys and the compromise of any 2 keys simultaneously—a dramatic improvement over single-signature custody.
Multisig evolved from a theoretical capability to the backbone of institutional Bitcoin custody:
Every Lightning payment channel is built on a 2-of-2 multisig funding transaction. Both parties deposit bitcoin into a shared 2-of-2 output; neither can spend unilaterally. This is the simplest non-trivial multisig, enabling trustless off-chain payments through commitment transactions that both parties sign but do not broadcast. We examine Lightning channels in detail in Chapter 18.
The security of \(m\)-of-\(n\) multisig rests on two properties:
For a 2-of-3 setup: lose 1 key and still spend; compromise 1 key and still safe. For 3-of-5: lose 2 keys or have 2 compromised, and funds remain secure. The art of multisig design is choosing \(m\) and \(n\) to balance these two risks for the specific threat model.
OP_CHECKMULTISIG verifies \(m\)-of-\(n\) signatures using a linear scan algorithm—signatures must be ordered to match their corresponding keys.OP_0.Multisig gave Bitcoin distributed custody, but at a cost: the full redeem script—all \(n\) public keys and the threshold—is revealed on-chain when the output is spent. In Part III, Segregated Witness reduces this cost through the witness discount (Chapter 8) and P2WSH extends the script size limit far beyond P2SH's 520 bytes (Chapter 10). And in Part IV, Taproot's MuSig2 protocol (Chapter 12) achieves a remarkable feat: \(n\)-of-\(n\) multisig that looks, on-chain, exactly like a single-signature spend—32 bytes, one Schnorr signature, no keys enumerated, no threshold revealed.
*Exercises
0xae?OP_CHECKMULTISIG pop from the stack for a 2-of-3 multisig?OP_0?*Solutions
L1. OP_CHECKMULTISIG.
L2. \(n + m + 3 = 3 + 2 + 3 = 8\) items: three public keys, two signatures, two count values (\(m\) and \(n\)), and the dummy byte.
L3. Because of Satoshi's off-by-one bug. OP_CHECKMULTISIG pops one extra item from the stack beyond what is logically needed. The OP_0 dummy byte is consumed by this bug. BIP 147 requires the dummy to be exactly OP_0 (empty byte array); any other value is now invalid.
L4. 15. The P2SH redeem script has a 520-byte consensus limit. With compressed keys: \(34n + 3 \leq 520 \Rightarrow n \leq 15\).
L5. \(n - m = 5 - 3 = 2\) keys can be lost.
H1. Redeem script size for 7-of-11: \(3 + 34 \times 11 = 3 + 374 = 377\) bytes. Since \(377 \leq 520\), this is valid under P2SH.
H2. \(34n + 3 \leq 520 \Rightarrow 34n \leq 517 \Rightarrow n \leq 15.2\). So \(n_{\max} = 15\). A 15-of-15 P2SH multisig has a redeem script of \(3 + 34 \times 15 = 513\) bytes, which fits within the 520-byte limit.
H3. Keys in order [A, B, C]. Valid signature orderings in scriptSig (after dummy OP_0):
OP_0, sig_A, sig_B (A matches key 1, B matches key 2)OP_0, sig_A, sig_C (A matches key 1, C matches key 3)OP_0, sig_B, sig_C (B matches key 2, C matches key 3)Invalid orderings (and why):
OP_0, sig_B, sig_A: Scanner matches B to key 2, then tries A against key 3—fails (no backtrack).OP_0, sig_C, sig_A: Scanner matches C to key 3, then has no more keys for A—fails.OP_0, sig_C, sig_B: Scanner matches C to key 3, then has no more keys for B—fails.H4. 7-of-11 P2SH scriptSig (maximum):
Total: \(1 + 511 + 3 + 377 = 892\) bytes.
P1. Let keys be \(K_1, \ldots , K_n\) and signatures \(S_1, \ldots , S_m\) where \(S_j\) is valid for \(K_f(j)\) and \(f(1) < f(2) < \cdot s < f(m)\) (signatures are in key order).
The algorithm maintains a key pointer \(k\) (initially 1) and a sig pointer \(s\) (initially 1). At each step:
Since signatures are ordered, when the algorithm reaches \(S_s\), we know \(f(s) \geq k\) (because all earlier signatures matched at positions \(< f(s)\)). The key pointer will advance through \(K_k, K_k+1, \ldots\) until it reaches \(K_f(s)\), at which point \(S_s\) matches. This happens for each \(s \in \{1, \ldots, m\}\), so all \(m\) signatures match. The algorithm never runs out of keys because \(f(m) \leq n\), so there are always enough keys remaining for the remaining signatures.
P2. The off-by-one bug is in consensus code—the rules that determine whether a transaction is valid. If a new node enforced the "correct" behavior (not popping the dummy), it would reject transactions that include the dummy byte (standard multisig transactions that the rest of the network considers valid). This new node would fork off the network, rejecting blocks that contain valid-by-old-rules multisig transactions. Conversely, transactions without the dummy byte would fail on old nodes. Either way, the network splits. This is a hard fork by definition: a change that makes previously valid transactions invalid (or vice versa).
P3. Bare multisig places the full script in the scriptPubKey, which is part of the transaction output and stored in the UTXO set. Bitcoin Core's relay policy (not consensus, but standardness) limits bare multisig to \(n \leq 3\) to prevent UTXO bloat. P2SH multisig places only a 23-byte hash in the scriptPubKey; the full script appears in the scriptSig (input data, not stored in the UTXO set). The consensus limit for P2SH is the 520-byte redeem script cap (\(n \leq 15\)). The difference is not about the opcode but about where the data lives and which policy governs it.
C1. In a Lightning payment channel, both parties must cooperate to update the channel state (signing new commitment transactions). If either party becomes unresponsive, the channel can still be force-closed by broadcasting the latest commitment transaction—this is handled by the commitment/HTLC mechanism, not by multisig fault tolerance. The \(m = n = 2\) ensures that neither party can unilaterally spend the funding output without the other's consent, which is the fundamental security property of a payment channel. Fault tolerance (\(n - m = 0\)) is intentional: losing one key should not allow spending—it should trigger the on-chain dispute mechanism.
C2. P2SH 3-of-5 multisig spend:
P2TR key-path spend:
OP_1 + 32-byte x-only key)Savings: the P2TR spend reveals 64 witness bytes (discounted at \(14\) weight) vs. 395 non-witness bytes in P2SH. In weight units: P2TR key-path \(\approx 64\) WU for the signature; P2SH multisig \(\approx 395 \times 4 = 1,580\) WU for the scriptSig. Taproot saves over 95% of the signature weight.
B1. P2WSH maximum: \(34n + 3 \leq 10,000 \Rightarrow 34n \leq 9,997 \Rightarrow n \leq 294\). In consensus, the maximum \(n\) value OP_CHECKMULTISIG accepts is 20 (a separate consensus rule). So the practical maximum under P2WSH is \(n = 20\), limited by the opcode, not the script size.
B2. MuSig2 aggregates \(n\) public keys into a single 32-byte aggregate key \(X\) using a non-interactive key aggregation protocol. All \(n\) participants contribute partial signatures, which are summed into a single 64-byte Schnorr signature that is valid for \(X\). On-chain, the output contains only \(X\) (32 bytes), and the spend reveals only the aggregate signature (64 bytes). No individual public keys are enumerated, no threshold is revealed, and the transaction is indistinguishable from a single-signer spend. This eliminates the \(O(n)\) on-chain cost of traditional multisig.