Measure peak stack utilization #188

gmaxwell opened this issue on January 11, 2015

gmaxwell commented at 5:30 AM on January 11, 2015: contributor

GCC's -fstack-usage can report the stack usage per function, now that we've eliminated the variable-length arrays.

We should probably commit to some value as part of the interface. If we do, some thought should be given to future batch/semi-batch verification.
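As a rough illustration of what -fstack-usage reports, here is a minimal sketch with two hypothetical functions (sum_fixed and sum_vla are made up for this example and are not part of the library). Compiling the file with something like `gcc -O0 -c -fstack-usage example.c` writes an `example.su` file next to the object file, with one line per function giving its location, its frame size in bytes, and a qualifier (static, dynamic, or dynamic,bounded). The exact byte counts depend on compiler, target and flags, so none are shown here.

```c
#include <stddef.h>

#define TABLE_ENTRIES 64 /* hypothetical size, chosen for illustration */

/* Fixed-size local array: the frame size is known at compile time, so
 * GCC is expected to report this function with the "static" qualifier. */
unsigned long sum_fixed(void) {
    unsigned long table[TABLE_ENTRIES];
    unsigned long sum = 0;
    size_t i;
    for (i = 0; i < TABLE_ENTRIES; i++) table[i] = i;
    for (i = 0; i < TABLE_ENTRIES; i++) sum += table[i];
    return sum;
}

/* Variable-length array (C99): the frame size depends on n, so GCC can
 * only report the fixed part and marks the function "dynamic".
 * n must be greater than zero. */
unsigned long sum_vla(size_t n) {
    unsigned long table[n];
    unsigned long sum = 0;
    size_t i;
    for (i = 0; i < n; i++) table[i] = i;
    for (i = 0; i < n; i++) sum += table[i];
    return sum;
}
```

With optimization enabled the compiler may remove the arrays entirely, which is one reason the reported numbers are tied to a specific build configuration.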
gmaxwell added the label documentation on Jan 11, 2015
gmaxwell assigned gmaxwell on Jan 11, 2015
gmaxwell added this to the milestone initial release on Aug 31, 2015
theStack commented at 1:56 PM on May 13, 2026: contributor

This article describes a few different techniques that can be used to measure peak stack usage: https://interrupt.memfault.com/blog/measuring-stack-usage

It would be nice to determine this at compile-time. So far I've never managed to achieve satisfying results with -fstack-usage (which only shows stack usage per single function, i.e. it needs additional tooling on top to determine peak usage for nested calls; maybe the provided tools have gotten better since I last looked into them, or I used them incorrectly). I remember having used stack painting in the past, and it worked reasonably well for microcontrollers, but I have no idea whether it is practical at all on non-bare-metal systems.
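The "additional tooling on top" amounts to walking the call graph: the worst case for a function is its own frame plus the deepest worst case among its callees. A minimal sketch follows, assuming a hand-written call graph without recursion or function pointers. The function names and frame sizes are invented for illustration; in practice they would come from the .su files and a separately extracted call graph.

```c
#include <stddef.h>

#define MAX_CALLEES 4

typedef struct {
    const char *name;         /* hypothetical function name */
    size_t frame_bytes;       /* per-function frame size (made-up numbers) */
    int callees[MAX_CALLEES]; /* indices into graph[], terminated by -1 */
} fn_info;

/* Hypothetical call graph: api_verify calls parse_input and multiply,
 * and multiply calls field_op. */
static const fn_info graph[] = {
    /* 0 */ {"api_verify", 96, {1, 2, -1}},
    /* 1 */ {"parse_input", 48, {-1}},
    /* 2 */ {"multiply", 512, {3, -1}},
    /* 3 */ {"field_op", 160, {-1}},
};

/* Worst-case stack for graph[fn]: its own frame plus the deepest callee
 * chain. Assumes the graph is acyclic. */
size_t worst_case_stack(int fn) {
    size_t deepest = 0;
    int i;
    for (i = 0; i < MAX_CALLEES && graph[fn].callees[i] >= 0; i++) {
        size_t d = worst_case_stack(graph[fn].callees[i]);
        if (d > deepest) deepest = d;
    }
    return graph[fn].frame_bytes + deepest;
}
```

For the made-up graph above, worst_case_stack(0) follows the api_verify, multiply, field_op chain and returns 96 + 512 + 160 = 768. A real tool would also have to account for per-call overhead and for calls the static graph cannot see.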

Maybe something like https://github.com/bitcoin/bitcoin/pull/33079 could be done in CI? (IIRC @hebasto brought this idea up in person).
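For the stack painting technique mentioned above, one way to try it on a hosted POSIX system is to run the code under test in a thread with a caller-supplied stack. The sketch below makes several assumptions: pthreads are available, the stack grows downward, and workload is a hypothetical stand-in for a real library call. The result includes whatever the threading runtime itself places on the supplied stack, so it is an upper estimate rather than an exact figure.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PAINT_BYTE 0xA5

/* Hypothetical workload standing in for a library call: it touches a
 * 4 KiB local buffer so that the stack usage is visible. */
static void *workload(void *arg) {
    volatile unsigned char buf[4096];
    size_t i;
    (void)arg;
    for (i = 0; i < sizeof(buf); i++) buf[i] = 0;
    return NULL;
}

/* Runs fn(arg) on a freshly painted stack of stack_size bytes and
 * returns the number of bytes between the top of the stack and the
 * lowest byte that was overwritten. Returns 0 on failure.
 * Assumes a downward-growing stack. */
static size_t measure_peak_stack(void *(*fn)(void *), void *arg, size_t stack_size) {
    void *stack = NULL;
    unsigned char *p;
    pthread_attr_t attr;
    pthread_t thread;
    size_t untouched = 0;
    int ok;

    if (posix_memalign(&stack, 4096, stack_size) != 0) return 0;
    memset(stack, PAINT_BYTE, stack_size); /* paint the whole stack */

    if (pthread_attr_init(&attr) != 0) { free(stack); return 0; }
    ok = pthread_attr_setstack(&attr, stack, stack_size) == 0
      && pthread_create(&thread, &attr, fn, arg) == 0;
    pthread_attr_destroy(&attr);
    if (!ok) { free(stack); return 0; }
    pthread_join(thread, NULL);

    /* Scan from the lowest address upward for the first repainted byte. */
    p = (unsigned char *)stack;
    while (untouched < stack_size && p[untouched] == PAINT_BYTE) untouched++;
    free(stack);
    return stack_size - untouched;
}
```

Calling measure_peak_stack(workload, NULL, 256 * 1024) should return at least 4096, since the workload's buffer lies inside the painted region. A byte that the code happens to write as 0xA5 at the deepest point would be missed, which is a known limitation of painting.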
real-or-random commented at 2:08 PM on May 13, 2026: contributor

I agree that this is useful, but is there actual demand for it?
real-or-random added the label assurance on May 13, 2026
theStack commented at 3:40 PM on May 13, 2026: contributor

I agree that this is useful, but is there actual demand for it?

Fair question, probably not from users.

As a developer, I'd find it at least a nice to have. E.g. for #1765, I'm still wondering how stack utilization would compare to existing API functions from other modules and if the (supposedly significantly) higher usage could lead to problems. E.g. the batch inversion candidates for label scanning are allocated on the stack, so the chosen batch size directly influences peak stack usage: https://github.com/theStack/secp256k1/blob/dedde955a391dd726c806aa69eb2c67333207d49/src/modules/silentpayments/main_impl.h#L646-L647 (on the other hand, it's unlikely that anyone would ever want to do scanning on systems with limited stack size, so maybe in this concrete example it's not that much of a concern).
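To make the link between batch size and stack usage concrete, here is a minimal sketch of a stack-allocated batch whose size is checked against a budget at compile time. All names and numbers (BATCH_SIZE, STACK_BUDGET_BYTES, candidate, process_batch) are hypothetical and are not taken from the linked code; the check uses a C89-compatible negative-array-size trick.

```c
#include <stddef.h>

#define BATCH_SIZE 8            /* hypothetical batch size */
#define STACK_BUDGET_BYTES 1024 /* hypothetical per-call budget for the batch */

/* Hypothetical stand-in for one batch entry. */
typedef struct {
    unsigned char bytes[40];
} candidate;

#define BATCH_BYTES (sizeof(candidate) * BATCH_SIZE)

/* Compile-time check: the array size becomes -1, and compilation fails,
 * if the batch no longer fits in the budget. */
typedef char batch_fits_in_budget[(BATCH_BYTES <= STACK_BUDGET_BYTES) ? 1 : -1];

/* Processes n input bytes in chunks of at most BATCH_SIZE entries, using
 * a fixed-size batch on the stack. The "work" here is just a sum; a real
 * implementation would operate on the batch instead. */
unsigned long process_batch(const unsigned char *in, size_t n) {
    candidate batch[BATCH_SIZE]; /* BATCH_BYTES of stack, independent of n */
    unsigned long sum = 0;
    size_t done = 0;
    size_t i;

    while (done < n) {
        size_t chunk = (n - done < BATCH_SIZE) ? n - done : BATCH_SIZE;
        for (i = 0; i < chunk; i++) batch[i].bytes[0] = in[done + i];
        for (i = 0; i < chunk; i++) sum += batch[i].bytes[0];
        done += chunk;
    }
    return sum;
}
```

Raising BATCH_SIZE to 32 in this sketch would push BATCH_BYTES to 1280 and break the build, which is the kind of signal a reviewer or CI could act on.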
gmaxwell commented at 4:24 PM on May 13, 2026: contributor

In theory the library is currently less safe to use on embedded devices with small stacks; arguably it's less safe to use on any system with less than the 8MB stacks, or whatever the default is, on the systems it's developed and tested on. The problem is exacerbated by the fact that the execution environments most likely to have a limited stack are the same ones where tools like valgrind or memcheck are hardest to use, or where it's even harder to run the ordinary tests. Places where smaller-than-desktop stacks exist include practically every embedded device, but also some specialized desktop cases, e.g. the multimedia codecs in firefox (and I'd WAG chrome) run in threads with cut-down stacks to reduce the memory usage from spinning up lots of concurrent decoders.

Testing aggressively isn't necessarily sufficient to assure that there is no memory corruption from exceeding the stack, but even if it were, a random update (including a security fix) could increase the stack usage without disclosure (including unintentionally). And of course memory corruption during signing could lead to signatures that functionally leak keys, and during validation it could lead to incorrect acceptance or rejection, or cause security or correctness issues elsewhere in the host application.

In the C calling convention the amount of stack needed to call a function is effectively part of the interface (if you don't have enough, things will fail!), but unfortunately it's not something normally documented. On desktops it's "solved" by making the available space enormous and hoping for the best.

For libsecp256k1 the need for memory is usually pretty minimal and 100% predictable (or at least not data dependent), which should make it easy to manage. OTOH stuff like batch operations and the very prudent avoidance of (failure-prone and embedded-unfriendly) runtime memory allocation makes it tempting for developers to just shove tons of crap on the stack. I think it would be very easy to ship updates that break small embedded signers and never have CI or reviewers notice. It may even have happened before, since culturally embedded developers have not historically been as inclined to participate in the open, and after hitting an issue they may just throw out libsecp256k1 in favor of a less securely developed embedded-only library (as we learned some did before the signing tables were made const so they could go in flash), or fix it themselves in local patches.
real-or-random commented at 5:48 PM on May 15, 2026: contributor

We should probably commit to some value as part of the interface.

How is that possible given that we ship only source? Won't the stack usage depend on the compiler?
gmaxwell commented at 11:50 PM on May 17, 2026: contributor

Other than alignments, the usage should be a straightforward function of the source.

Contributors

gmaxwell

theStack

real-or-random

Labels

assurance, user-documentation

Milestone
stable release (1.0.0-rc.1)

Linked

#692 Don't put an absurd amount of data onto the stack in some configs