Note: This has been disclosed to the security mailing list. Filing it publicly is the agreed next step.
Scripted-diff makes large mechanical refactors reviewable. The author includes a small script, the "recipe", between -BEGIN VERIFY SCRIPT- and -END VERIFY SCRIPT- markers in the commit message. Contributors then review the recipe instead of reviewing the mechanical diff line by line. Then, commit-script-check.sh runs the recipe and checks that the result matches the committed diff. The premise is that reviewing a few lines of shell is less error-prone than reviewing a long mechanical diff.
Today, that script recipe is passed to eval as-is, with no sanitization or sandboxing.
This means the security of the mechanism depends solely on reviewers correctly reading the exact program that will later run on their machines, which is a weaker assumption than it looks.
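To make the mechanism concrete, here is a minimal Python model of the extraction step. It is not the actual commit-script-check.sh code (which is shell); the function name `extract_recipe` and the sample commit message are hypothetical and only illustrate that whatever sits between the markers is taken verbatim.

```python
BEGIN = "-BEGIN VERIFY SCRIPT-"
END = "-END VERIFY SCRIPT-"


def extract_recipe(commit_message: str) -> str:
    """Return the text between the markers, byte for byte, or raise if absent.

    Illustrative model only: nothing here inspects or sanitizes the content.
    """
    lines = commit_message.splitlines()
    try:
        start = lines.index(BEGIN)
        end = lines.index(END, start + 1)
    except ValueError:
        raise ValueError("commit message has no complete VERIFY SCRIPT block")
    return "\n".join(lines[start + 1:end])


# Hypothetical commit message, for illustration.
message = """scripted-diff: rename foo to bar

-BEGIN VERIFY SCRIPT-
sed -i 's/foo/bar/g' src/example.cpp
-END VERIFY SCRIPT-
"""
recipe = extract_recipe(message)
```

In the current design, the string returned by a step like this is what gets handed to eval.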
There are two separate problems:
1) Reviewers of the script may not see the same program the shell executes.
2) Even when the program is visible and looks innocent, the recipe is still arbitrary shell that can do more than what is expected. The diff check only confirms reproducibility, not intent.
For the first case, a way to bypass the reviewer's eyes is to embed commands in Unicode characters that are not displayed in standard browsers, terminals, and editors. Zero-width characters such as U+200B, U+200C, U+200D, U+FEFF, and others are actual characters in the byte stream that are not rendered. Pairs or sequences of them can encode data inside what appears to reviewers as ordinary whitespace.
That gives an attacker a way to hide commands inside a scripted-diff block. The visible recipe can look benign, while the shell receives something else. The result is arbitrary code execution on any system that runs commit-script-check.sh against that commit.
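As an illustration of the review gap, the following sketch lists the format-category (Cf) code points in a string. The helper name `hidden_code_points` and the two sample strings are assumptions made for this example; the check itself relies only on Python's standard `unicodedata` module.

```python
import unicodedata
from typing import List, Tuple


def hidden_code_points(text: str) -> List[Tuple[int, str]]:
    """Return (index, 'U+XXXX') for every Unicode format (Cf) character."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


# Two strings that typically render identically but differ in content.
visible = "sed -i s/a/b/ f.cpp"
tampered = "sed -i\u200b s/a/b/\u200d f.cpp"

found = hidden_code_points(tampered)
```

Here `visible` and `tampered` compare unequal even though most tools display them the same way, and `found` pinpoints where the extra characters sit.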
This is the class of issues described by Boucher and Anderson in the Trojan Source paper: https://arxiv.org/abs/2111.00169. The paper focuses on source code, but the same review gap applies here because the verifier executes a raw script from the commit message.
The immediate patch one might think of is to limit the script block to printable ASCII, which closes the Unicode-based vectors. But while necessary, that is not enough on its own.
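A minimal sketch of such a filter, assuming "printable ASCII" means the range 0x20 to 0x7E plus the newline that separates recipe lines. The function name `is_printable_ascii` is hypothetical, and whether tabs should be admitted is a policy choice this sketch answers with "no".

```python
def is_printable_ascii(script: str) -> bool:
    """True if every character is in 0x20..0x7E or is a line feed."""
    return all(ch == "\n" or 0x20 <= ord(ch) <= 0x7E for ch in script)
```

Under these assumptions a recipe containing U+200B, a tab, or a carriage return is rejected before it ever reaches the shell.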
Why a printable ASCII-only fix is not sufficient
The structural problem is that eval runs whatever bytes are inside the markers, no matter how those bytes got there. There are at least two paths the ASCII filter doesn't touch.
1) ASCII has its own invisible characters
A pure-ASCII payload can be encoded in whitespace patterns. Tabs and spaces are different bytes but often look identical in editors, review tools and terminals. Trailing whitespace is also frequently invisible. A small decoder can reconstruct a payload from those patterns.
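The round trip below is a toy illustration of that idea, assuming a filter that still admits tab characters. The encoding (space for a 0 bit, tab for a 1 bit) and the names `encode_ws` and `decode_ws` are made up for this example; the payload is a harmless two-byte string.

```python
def encode_ws(payload: bytes) -> str:
    """Encode bytes as whitespace: space = 0 bit, tab = 1 bit."""
    bits = "".join(f"{byte:08b}" for byte in payload)
    return bits.replace("0", " ").replace("1", "\t")


def decode_ws(whitespace: str) -> bytes:
    """Inverse of encode_ws; ignores any incomplete trailing byte."""
    bits = "".join("1" if ch == "\t" else "0" for ch in whitespace if ch in " \t")
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


# The visible part of the line looks normal; the payload rides in the
# trailing whitespace, which many tools do not display.
line = "sed -i 's/a/b/' f.cpp" + encode_ws(b"hi")
trailing = line[len(line.rstrip(" \t")):]
recovered = decode_ws(trailing)
```

Every character in `line` is plain ASCII, so a check that only rejects non-ASCII bytes would let it through.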
2) The payload doesn't have to be in the commit message at all
This is the XZ utils pattern (CVE-2024-3094). The malicious bytes lived in binary test fixtures, and the build machinery extracted them at build time. Reviewers were watching the build scripts, not the test fixtures, because test fixtures don't look like code.
The same pattern applies here. The recipe can stay completely clean while the payload lives in another file added by the same PR or a different one, anywhere reviewers don't read carefully line by line. The recipe doesn't need to contain anything suspicious.
The main point here is that with eval, the only limit is the attacker's creativity.
Why the diff check isn't sufficient
The verifier checks that the recipe reproduces the committed diff. That is a reproducibility check, not a safety check.
The recipe and the diff are submitted by the same author. The check confirms they're consistent with each other, not that either is safe. The recipe only needs to look benign; the diff itself or what the script executes does not have to.
Even worse, reviewers rarely read the full diff. The whole premise of scripted-diff is that they focus on the recipe instead, which is exactly the dangerous reading mode for attacks that split behavior across different inputs.
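The distinction between reproducibility and safety can be shown with a small model. Everything in it is hypothetical: the tree is a plain dict of path to contents, recipes are Python callables standing in for shell, and `verify` stands in for the diff comparison the verifier performs.

```python
from typing import Callable, Dict, List

Tree = Dict[str, str]  # path -> file contents (toy stand-in for a work tree)
side_effects: List[str] = []  # records anything a recipe does besides editing


def benign_recipe(tree: Tree) -> Tree:
    """Rename foo to bar in every file, and nothing else."""
    return {path: text.replace("foo", "bar") for path, text in tree.items()}


def recipe_with_side_effect(tree: Tree) -> Tree:
    """Produce the same tree, but also do something unrelated to the diff."""
    side_effects.append("did something unrelated to the diff")
    return benign_recipe(tree)


def verify(recipe: Callable[[Tree], Tree], parent: Tree, committed: Tree) -> bool:
    """Model of the check: does running the recipe reproduce the commit?"""
    return recipe(dict(parent)) == committed


parent = {"a.cpp": "int foo = 1;"}
committed = {"a.cpp": "int bar = 1;"}
ok_benign = verify(benign_recipe, parent, committed)
ok_other = verify(recipe_with_side_effect, parent, committed)
```

Both recipes pass the check, because the check only compares outputs. What happened while the second one ran is outside its view.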
What scripted-diffs actually do for us
Before suggesting a way forward, here is what scripted-diffs are actually used for. I wrote an analyzer that walks every scripted-diff commit in the repository history and classifies the transformations based on the tools used and the outcome.
Out of 394 scripted-diff commits, about 95% fall into a small set of operations:
- 73% full-identifier or regex-anchored renames
- 14% function-call argument restructuring
- 8% literal text substitutions
- 5% line or inline deletions
- 5% copyright bumps
- 4% file moves or renames
- 4% include-path updates
- 1% namespace prefix or member renames
Note: The percentages add to more than 100% because some commits contain more than one transformation.
The commands themselves are not that many. Across all 394 commits, this is what we have:
- sed: 709 invocations across 335 commits
- xargs: 74 invocations across 61 commits
- git grep: 53 invocations across 38 commits
- git mv: 37 invocations across 18 commits
- ./contrib/devtools/copyright_header.py: 19 invocations across 13 commits
- grep: 15 invocations across 14 commits
- perl: 11 invocations across 10 commits
- git ls-files: 7 invocations across 7 commits
Everything else appears only a handful of times: find, git rm, git apply, git diff, git show, git archive, tar, mkdir, clang-format-diff, and bash -c.
So the current mechanism gives a lot of freedom, but the actual use cases are much narrower: mostly text rewrites and file moves.
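For readers who want to reproduce this kind of count, here is a rough sketch of tallying the leading command of each pipeline stage in a recipe. It is not the analyzer used for the numbers above: the function name `leading_commands` is hypothetical, git subcommands are all counted as `git`, and commands launched indirectly (for example through xargs) are not counted.

```python
import shlex
from collections import Counter

SEPARATORS = {"|", "||", "&&", ";", "&"}


def leading_commands(recipe: str) -> Counter:
    """Count the first word of every command in a recipe (crude heuristic)."""
    counts: Counter = Counter()
    for line in recipe.splitlines():
        # punctuation_chars=True makes shell operators separate tokens
        # while quoted text such as 's|foo|bar|g' stays in one piece.
        lexer = shlex.shlex(line, posix=True, punctuation_chars=True)
        lexer.whitespace_split = True
        expect_command = True
        for token in lexer:
            if token in SEPARATORS:
                expect_command = True
            elif expect_command:
                counts[token] += 1
                expect_command = False
    return counts


# Hypothetical recipe, for illustration.
sample = "git grep -l foo | xargs sed -i 's|foo|bar|g'\ngit mv a.h b.h"
counts = leading_commands(sample)
```

A real analyzer would also need to handle subshells, redirections, and multi-line constructs, which this sketch ignores.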
Some real examples
Some existing scripted-diff recipes are dense enough that reviewing them requires shell expertise. For example, 0184d33b from PR #31072 and 9d1dbbd from PR #29404.
Those are merged, reviewed commits, and I am not suggesting there is anything wrong with them. They are useful examples because they show the review burden we already accept today.
The question is whether the verifier needs to give recipes full shell power for this class of change.
How other projects handle similar workflows
The Linux kernel uses Coccinelle for refactors and API migrations. The tool has no primitive for running arbitrary commands.
LLVM and Chromium use clang-rename for similar refactors. Such tools operate on the AST through a fixed set of typed transformations, not by running shell. The user invokes a specific rename rule or a clang-tidy check, with no way to run arbitrary code.
The common pattern is a fixed set of operations: none of these projects let a contributor's refactor commit run arbitrary code on a reviewer's or CI machine.
Investigated Paths
1) Adopting clang-rename
This covers C++ identifier renames cleanly (~74% of historical scripted-diffs), but the remaining 26% (file moves, non-C++ rewrites, and so on) needs separate handling.
So clang-rename is useful, but it is not a full replacement for scripted-diff.
2) Sandboxing
We could keep the eval call but restrict what the attacker can do: scope it to the repository directory, with no network and no IPC. Linux has several tools for this: bubblewrap, firejail, nsjail, landlock.
This would limit data exfiltration, writes outside the repo, and other related attacks. But alone it doesn't solve the overall review problem. Sandboxing restricts what the recipe does, not which recipe runs.
So sandboxing plus rejecting anything outside printable ASCII is a reasonable defense, but it adds platform-specific machinery that has to work on every developer's setup, which may not be the best trade-off.
3) Restrict shell
The idea here is to keep eval, but constrain what recipes can run: a list of permitted commands, ASCII filtering, no control flow.
The issue is that many of the commands recipes use are themselves arbitrary-code-execution tools. perl -e runs arbitrary Perl, find -exec runs whatever you point it at, GNU sed's e flag runs shell, and xargs can re-launch shell via xargs sh -c '...'.
So, to make this actually work, we would have to filter at the flag level and trace indirect invocation through tools like these.
In essence, if we go down this route, we would be creating a complex typed-grammar approach, just with shell underneath.
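To show why a command-level allowlist falls short, here is a deliberately naive checker. The allowlist contents and the names `split_pipeline` and `naive_allowlist_ok` are assumptions for this sketch, and it only understands pipes written with surrounding spaces.

```python
import shlex
from typing import List

# Hypothetical allowlist built from commonly used recipe commands.
ALLOWED_COMMANDS = {"sed", "git", "xargs", "perl", "find", "grep"}


def split_pipeline(line: str) -> List[List[str]]:
    """Split a command line into pipeline stages of tokens."""
    stages: List[List[str]] = []
    current: List[str] = []
    for token in shlex.split(line):
        if token == "|":
            stages.append(current)
            current = []
        else:
            current.append(token)
    stages.append(current)
    return stages


def naive_allowlist_ok(line: str) -> bool:
    """Accept a line if every stage starts with an allowed command."""
    return all(
        stage and stage[0] in ALLOWED_COMMANDS for stage in split_pipeline(line)
    )


accepted = naive_allowlist_ok("git grep -l foo | xargs sh -c 'echo arbitrary'")
```

The line is accepted because each stage starts with an allowed command, even though the second stage hands control to `sh -c`. Catching that requires looking at arguments, not just command names.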
4) Replace shell with typed primitives
We can drop shell entirely and replace it with a small set of operations with named arguments. Instead of treating the recipe as code, treat it as data. The verifier reads the recipe, validates each operation against a schema, and dispatches to a well-reviewed Python implementation.
Two primitives cover most of what historical scripted-diffs do:
- RENAME mode old new files: text substitution in a list of files. mode is one of word (matches whole identifiers only, like a portable \b...\b), literal (exact string match), or regex (Python's regex).
- RENAME_FILE src dst: file move via git mv.
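The three RENAME modes could be sketched on a single string as follows. This is an illustration, not the code in the WIP branch: the function name `apply_rename` and the identifier character class used for word mode are assumptions.

```python
import re

# Assumed definition of an identifier character for "word" mode.
IDENT_CHAR = r"[A-Za-z0-9_]"


def apply_rename(mode: str, old: str, new: str, text: str) -> str:
    """Apply one RENAME operation to a string and return the result."""
    if mode == "literal":
        return text.replace(old, new)
    if mode == "word":
        # Match `old` only when it is not part of a longer identifier.
        pattern = rf"(?<!{IDENT_CHAR}){re.escape(old)}(?!{IDENT_CHAR})"
        # A function replacement keeps `new` literal (no backslash escapes).
        return re.sub(pattern, lambda match: new, text)
    if mode == "regex":
        return re.sub(old, new, text)
    raise ValueError(f"unknown RENAME mode: {mode!r}")


renamed = apply_rename(
    "word", "DB_LOAD_OK", "DBErrors::LOAD_OK",
    "if (r == DB_LOAD_OK) x = MY_DB_LOAD_OK;",
)
```

In word mode the standalone `DB_LOAD_OK` is rewritten while `MY_DB_LOAD_OK` is left alone, which is the behavior a plain substring replace would get wrong.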
Example
-BEGIN VERIFY SCRIPT-
RENAME literal "enum DBErrors" "enum class DBErrors" src/wallet/walletdb.h
RENAME word DB_LOAD_OK DBErrors::LOAD_OK src/**/*.cpp src/**/*.h
RENAME word DB_CORRUPT DBErrors::CORRUPT src/**/*.cpp src/**/*.h
-END VERIFY SCRIPT-
A regex-mode rewrite, for example simplifying HexStr(buf.begin(), buf.end()) to HexStr(buf):
-BEGIN VERIFY SCRIPT-
RENAME regex 'HexStr\(([^(]+)\.begin\(\), *([^(]+)\.end\(\)\)' 'HexStr(\1)' src/**/*.cpp src/**/*.h
-END VERIFY SCRIPT-
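Treating the recipe as data means parsing and validating it before anything runs. The sketch below assumes shell-like quoting rules (via `shlex.split`) and a dict-based operation format; both are illustrative choices, and `parse_recipe` is a hypothetical name rather than the function in the WIP branch.

```python
import shlex
from typing import Dict, List

MODES = {"word", "literal", "regex"}


def parse_recipe(recipe: str) -> List[Dict[str, object]]:
    """Parse recipe lines into validated operations; reject anything else."""
    operations: List[Dict[str, object]] = []
    for lineno, line in enumerate(recipe.splitlines(), start=1):
        if not line.strip():
            continue
        tokens = shlex.split(line)  # tokenizes only, never executes
        name, args = tokens[0], tokens[1:]
        if name == "RENAME":
            if len(args) < 4 or args[0] not in MODES:
                raise ValueError(f"line {lineno}: RENAME needs mode old new files")
            operations.append({
                "op": "RENAME", "mode": args[0],
                "old": args[1], "new": args[2], "files": args[3:],
            })
        elif name == "RENAME_FILE":
            if len(args) != 2:
                raise ValueError(f"line {lineno}: RENAME_FILE needs src dst")
            operations.append({"op": "RENAME_FILE", "src": args[0], "dst": args[1]})
        else:
            raise ValueError(f"line {lineno}: unknown operation {name!r}")
    return operations


ops = parse_recipe(
    'RENAME literal "enum DBErrors" "enum class DBErrors" src/wallet/walletdb.h\n'
    "RENAME word DB_LOAD_OK DBErrors::LOAD_OK src/**/*.cpp src/**/*.h\n"
)
```

Anything that is not a known operation with the expected arguments is rejected at parse time, so there is no path from recipe text to command execution in this model.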
What this would mean for developers
Developers can still iterate locally the same way they do today, with the new functions.
We can provide a terminal tool cli-scripted-diff.py that can be used in the following way:
cli-scripted-diff.py RENAME word some_name some_other_name <files>
The verifier and the CLI share the same implementation, so what runs on the contributor's machine runs identically in CI.
This has several advantages:
- The recipe is data, not code. There is no shell injection surface.
- The recipe cannot read arbitrary files or execute anything else. The XZ-style "payload in another file" pattern no longer applies.
- Reviewers read operation inputs; they no longer have to learn a different set of commands for each recipe.
- Cross-platform behavior. No new dependencies (python only).
Proposed path
We could go in stages or all at once:
- Add a check that scripted-diff blocks contain only printable characters, plus tests.
- Introduce the typed scripted-diff with the RENAME and RENAME_FILE operations.
Happy to hear thoughts about this. Can open the PR at any time. The typed functions approach can be found in the following WIP branch https://github.com/furszy/bitcoin-core/commits/2026_scripted-diff/