RISC Zero’s Path to The First Formally Verified RISC-V zkVM
Jacob Weightman
TL;DR
RISC Zero is advancing ZK security with the upcoming R0VM 2.0 release. Our goal: a zkVM that’s both incredibly fast and provably secure, so developers never have to compromise.
Through our collaboration with Veridise, we’re using Picus to formally verify determinism across components of our zkVM. This automated verification process targets underconstrained bugs, which research shows account for about 96% of ZK circuit vulnerabilities. By proving our circuits have exactly one valid output for any given input, we’re on track to become the first RISC-V zkVM with formal guarantees against the most prevalent class of ZK security flaws. Our initial work has already confirmed determinism for our entire Keccak accelerator and large parts of our new RISC-V circuit. We hope to have a full proof for the RISC-V circuit soon. This approach combines rigorous security with agile development, enabling continuous formal verification as our circuits evolve. With R0VM 2.0, ZK security isn’t just stronger, it’s provable.
The End of Underconstrained Bugs
The unfortunate reality is that building secure ZK software is hard. The fortunate news is that RISC Zero is about to get a lot more secure. We’re in the final stages of developing R0VM 2.0, which, in addition to being several times faster than our previous versions, includes formal proofs of a shiny new security property for some of our circuits: determinism.
First, a little bit of background. The lion’s share of security defects in ZK software comes from circuits. One of the big reasons to use a zkVM like RISC Zero’s is that it makes it simple to secure your application: you can write it in a high-level language like Rust and don’t have to write your own circuits. Unfortunately, you still have to trust that we got ours right. Circuits encode a computation as a set of constraints: a bunch of equations that describe the relationship between inputs and outputs for the computation. The thing that makes writing circuits hard is that if you forget an equation, you allow extra “outputs” you didn’t mean to. These bugs are called underconstrained bugs, and they allow malicious provers to convincingly lie about the result of a computation, which breaks soundness and is very often a critical vulnerability.
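To make this concrete, here is a minimal sketch of the classic “is zero” gadget over a toy 97-element field. This is our own illustration, not RISC Zero circuit code: the gadget should output z = 1 exactly when x = 0, using a helper witness inv that an honest prover sets to the field inverse of x. Forgetting the second constraint opens a soundness hole:

```rust
// Toy prime field arithmetic for the sketch (P = 97 is illustrative only).
const P: u64 = 97;
fn mul(a: u64, b: u64) -> u64 { a * b % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }

// Constraint 1: z = 1 - x * inv
fn c1(x: u64, inv: u64, z: u64) -> bool { z == sub(1, mul(x, inv)) }
// Constraint 2: x * z = 0 -- the easy one to forget.
fn c2(x: u64, z: u64) -> bool { mul(x, z) == 0 }

fn main() {
    let x = 5; // a nonzero input, so the honest output is z = 0
    let honest = (39, 0); // inv = 5^-1 mod 97 = 39, since 5 * 39 = 195 = 1 mod 97
    let malicious = (0, 1); // setting inv = 0 forces z = 1, claiming "x is zero"
    for (inv, z) in [honest, malicious] {
        let full = c1(x, inv, z) && c2(x, z);
        let buggy = c1(x, inv, z); // constraint 2 forgotten
        println!("witness z = {z}: full system accepts: {full}, buggy system accepts: {buggy}");
    }
    // The full system accepts only z = 0, but the underconstrained system
    // accepts both z = 0 and z = 1: two "outputs" for one input.
}
```

With both constraints in place, a nonzero x can never be passed off as zero; with only the first, a malicious prover can convincingly “prove” that it is.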
Now let’s connect this with determinism. A deterministic computation is one where the outputs are fully determined by the inputs — in other words, for any data you pass in, you get a predictable output. A nondeterministic computation, on the other hand, might give any of several different results. So if we’re trying to write a circuit for a deterministic computation, like evaluating a hash function or simulating a RISC-V processor, and we can prove that our circuit is actually deterministic, then we know we don’t have any underconstrained bugs. The reason is quite simple: there are no unintended ways to run our computation, because there’s exactly one!
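As a worked example of the definition (again a toy sketch of ours, not production code), compare a system that constrains y = x² with one that only constrains y² = x². Brute-force enumeration over a small field shows that the first admits exactly one output per input, while the second admits two:

```rust
const P: u64 = 97; // toy field size, small enough for exhaustive enumeration

fn main() {
    for x in [0u64, 3, 10] {
        // System A constrains y = x^2: deterministic, one output per input.
        let a: Vec<u64> = (0..P).filter(|&y| y == x * x % P).collect();
        // System B constrains y^2 = x^2: nondeterministic, y and -y both pass.
        let b: Vec<u64> = (0..P).filter(|&y| y * y % P == x * x % P).collect();
        println!("x = {x}: A accepts {a:?}, B accepts {b:?}");
    }
    // For x = 3, system B accepts y = 3 and y = 94 (which is -3 mod 97):
    // two valid outputs for one input, exactly the shape of an
    // underconstrained bug.
}
```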
Thanks to a fruitful collaboration with Veridise, we are able to use their Picus tool to mathematically prove (under some relatively standard assumptions) that some of our circuits are deterministic, and therefore completely free of underconstrained bugs. This is a really big deal — some recent research suggests a whopping two out of three ZK bugs fall into this category, so a circuit that is proven deterministic is proven to have zero bugs of the single most common kind. So far, we have proven that our entire Keccak accelerator circuit is deterministic, as well as some large parts of our new RISC-V circuit. We will soon follow with a proof for the whole circuit, which will be the first such proof for a RISC-V zkVM.
Show me the data!
Let’s back up those numbers, because it should seem a bit outrageous that two thirds of bugs fall into such a narrow category. This number is based on this recent work sponsored by the Ethereum Foundation and 0xPARC, representing a broad collaboration across the ZK space, which gives a useful system model for a ZK application, as well as a dataset for reasoning about the frequency of different kinds of bugs. After scouring audit reports, security advisories, and issue trackers for a bunch of projects in the space, that team created a corpus of real ZK bugs and defined a taxonomy of all the vulnerabilities they encountered. At a high level, they broke down ZK application stacks into four layers:
Circuit layer — the mathematical description of “the computation to be proved,” which is processed by the frontend. For our zkVM, that computation is simulating a RISC-V processor running a user program, and it is implemented in the Zirgen DSL.
Frontend layer — the translation from the circuit description into the artifacts used by the prover and verifier. For us, this includes our executor, preflight, and the outputs of the Zirgen compiler.
Backend layer — the core proof system, including the prover and verifier logic.
Integration layer — the surrounding code that connects the proof system to the rest of the application, such as the services or contracts that invoke the verifier.
Bugs in any layer can pose a serious security risk. However, the team behind this research found that bugs are not uniformly distributed across the layers. In our experience at RISC Zero, constraint bugs in the circuit are by far the most commonly discovered vulnerabilities, and the data confirm this is true across projects. The vast majority — 99 out of 141, or about 70% of bugs in the corpus — occur in the circuit layer of their respective stacks. This is a strong indicator that the circuit layer is the right place to start.
It turns out that there’s actually an even better place to focus within the circuit layer. They further broke down circuit vulnerabilities into three kinds (illustrated with toy examples after this list):
Underconstrained bugs — the circuit is missing some checks, and so some witnesses of invalid computations are accepted as valid. This breaks soundness.
Overconstrained bugs — the circuit has checks that are too strong or contradictory, and so some witnesses of valid computations cannot be accepted as valid. This breaks completeness.
Computational bugs — the witness generation procedure computes bad witnesses, which causes “honest” provers to generate invalid proofs. This breaks completeness.
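To make the three categories concrete, here are tiny toy illustrations of ours (plain integer arithmetic rather than field arithmetic, and not drawn from the cited corpus):

```rust
fn main() {
    let x = 3i64;
    // Underconstrained: we meant y = x^2 but only constrained y^2 = x^4,
    // so y = -9 satisfies the system alongside y = 9. Soundness breaks.
    for y in [9i64, -9] {
        assert_eq!(y * y, x * x * x * x); // both witnesses accepted
    }
    // Overconstrained: constraining both y = x^2 and y = x is contradictory
    // for x = 3, so the honest witness y = 9 is rejected. Completeness breaks.
    let y = x * x;
    assert!(!(y == x * x && y == x));
    // Computational: the constraints are right, but witness generation
    // mistakenly computes y = x + x; the honest prover's witness fails.
    let bad_witness = x + x;
    assert!(bad_witness != x * x);
    println!("all three failure modes demonstrated");
}
```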
It turns out that almost all vulnerabilities in the circuit layer were of the first kind: 95 out of 99, or about 96% of circuit bugs in the corpus, are underconstrained bugs. Taken across all layers of the system, 95 out of 141, or more than two thirds, of all of the bugs in the corpus are underconstrained circuit bugs. This category dwarfs all other kinds of security vulnerabilities in the dataset, so if we could completely eliminate a single class of bugs in our software, this is the one to choose.
Thinking about it a bit, this seems like a relatively natural result — it’s fundamentally easier to test for completeness issues than for soundness issues. To catch a completeness bug, you simply have to run the “honest” prover in a way that appropriately exercises the constraints and witness generation procedure. But to catch a soundness bug, you have to come up with the right dishonest prover to take advantage of the underconstrained part of the circuit. Approaches like fuzzing can help with this, but it’s a rare developer that reaches for fuzz testing in day-to-day development work, and sometimes exploiting a missing constraint requires more carefully crafted witnesses than simple fuzzing approaches can provide. Suffice it to say, it’s much harder to test well for soundness than for completeness, which also makes soundness an ideal area to apply formal methods.
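The asymmetry is easy to see in the toy is_zero gadget from earlier (still our own sketch, not production code). An ordinary test suite drives the honest witness generator, and every honest witness satisfies the buggy system with constraint 2 deleted, so all the tests pass and the soundness hole goes unnoticed:

```rust
const P: u64 = 97; // same toy field as before

// Modular inverse via Fermat's little theorem: x^(P-2) mod P.
fn inv_mod(x: u64) -> u64 {
    let (mut base, mut exp, mut acc) = (x % P, P - 2, 1u64);
    while exp > 0 {
        if exp & 1 == 1 { acc = acc * base % P; }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

// Honest witness generation: inv = x^-1 (or 0 when x = 0), z = is_zero(x).
fn honest_witness(x: u64) -> (u64, u64) {
    if x == 0 { (0, 1) } else { (inv_mod(x), 0) }
}

fn main() {
    // "Completeness testing": run the honest prover on every input and
    // check it against the buggy system that is missing constraint 2.
    for x in 0..P {
        let (inv, z) = honest_witness(x);
        assert_eq!(z, (1 + P - x * inv % P) % P); // constraint 1 only
    }
    // Every assertion passes, so the missing constraint is never noticed.
    // Catching it requires searching for a *dishonest* witness instead,
    // e.g. (inv, z) = (0, 1) with x != 0 -- which is what fuzzers and
    // formal tools are for.
    println!("honest-prover tests all passed; soundness bug undetected");
}
```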
Towards Perfect Constraints
So, hopefully you’re convinced that underconstrained circuit bugs pose the greatest security threat to a ZK system. How can we actually prove that we don’t have any such bugs? Veridise gave us a very convincing answer in an early meeting where we were considering selecting them for an audit of the new circuit for R0VM 2.0. One of their auditors and the inventor of Picus, Shankara Pailoor, showed up to the meeting with two underconstrained bugs already in hand before the audit had even started, which he had found automatically by running the tool on the circuit code. Needless to say, we chose them, and we’ve been collaborating on “full proofs of determinism” for the circuits ever since.
So how does Picus actually prove this? At a high level, the approach is pretty simple — it takes in a representation of the circuit’s constraints, along with information about which variables are “inputs” and which are “outputs.” Then, using a sophisticated blend of static analysis and SMT solving, Picus attempts to prove that if the inputs are deterministic then the outputs are also deterministic. Static analysis can be used to quickly prove large numbers of variables are deterministic using rules like “if a is deterministic and b is deterministic, then a + b is also deterministic.” SMT solvers, on the other hand, do an “exhaustive search” of the possible values that the variables can take on, using fancy search algorithms to quickly narrow down the search space. There is a tradeoff between these two approaches: static analysis is fast but tends to get stuck without finishing, whereas SMT solvers are much better at finding solutions if they exist but can be very slow. Combining the two gives us the best of both worlds, and turns out to be essential for determinism proofs on large circuits. At the end of analysis, Picus always gives one of three kinds of answers: the circuit is deterministic; the circuit is nondeterministic, with a counterexample; or the proof took too long and it gave up. With some tuning and hints about the constraint description for a particular circuit, we can guide Picus towards a proof or counterexample.
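To give a feel for the static-analysis half, here is a little fixed-point propagation sketch of ours. It is not Picus’s actual algorithm or data model, just the shape of the idea: starting from the declared inputs, keep marking a variable deterministic whenever a constraint pins it down as a function of already-deterministic variables.

```rust
use std::collections::HashSet;

// Each toy constraint fixes `out` as a function of `ins`,
// e.g. c = a + b or d = c * c.
struct Constraint {
    ins: Vec<&'static str>,
    out: &'static str,
}

fn main() {
    // Toy circuit: c = a + b; d = c * c; e = d + x,
    // where x is an unconstrained nondeterministic witness.
    let constraints = [
        Constraint { ins: vec!["a", "b"], out: "c" },
        Constraint { ins: vec!["c", "c"], out: "d" },
        Constraint { ins: vec!["d", "x"], out: "e" },
    ];
    // The declared inputs are deterministic by assumption.
    let mut det: HashSet<&str> = HashSet::from(["a", "b"]);
    // Fixed-point loop: if every operand of a constraint is deterministic,
    // then the value it pins down is deterministic too.
    loop {
        let before = det.len();
        for c in &constraints {
            if c.ins.iter().all(|v| det.contains(v)) {
                det.insert(c.out);
            }
        }
        if det.len() == before {
            break;
        }
    }
    // Proves a, b, c, d deterministic; e is left over because x is free.
    // Leftovers like e are what the SMT solver then searches exhaustively.
    println!("proved deterministic: {det:?}");
}
```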
The Keccak Circuit is Deterministic!
In collaboration with Veridise, we’ve been able to extract a full description of our Keccak accelerator circuit into Picus and prove that it’s deterministic, up to some relatively mild assumptions. For one, we need to define the inputs to the Keccak circuit: these are the data to be hashed (which are read in nondeterministically), and a nondeterministic flag that indicates whether there is more data to be hashed. Note that treating this flag as an input is harmless — if the “more data” flag is prematurely set to false, it is as if the rest of the input did not occur at all, and so the computed hash is just the hash of the truncated data. Furthermore, for all circuits we also take all “public inputs” (those shared with the verifier) to be Picus inputs: since those values are known to the verifier, they are fully determined. The outputs are simply all of the values that get written to and committed in the STARK trace. So if Picus proves the circuit is deterministic, that means there is a unique witness for those inputs.
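Stated formally (our paraphrase of the property, writing C for the constraint system, i for the designated inputs, and w for all remaining trace values), uniqueness of the witness is:

```latex
\forall i,\, w_1,\, w_2:\quad C(i, w_1) \,\land\, C(i, w_2) \;\Longrightarrow\; w_1 = w_2
```

In other words, no two distinct satisfying witnesses can share the same inputs.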
There are a few complications here: first, the full STARK trace is made of a series of “rows,” each with the same set of constraints applied. It would be a waste to repeat the same analysis over all of the rows, and it would furthermore require us to repeat the proof for all trace lengths we end up using. Even worse, STARK traces are equivalent “up to rotation,” such that you can actually “start the computation” at some arbitrary row in the middle of the table. The solution here is to use a special CycleCounter component to distinguish a unique “cycle 0” (proven in our upcoming whitepaper). Observe that cycle 0 references no other rows of the trace, and is itself deterministic — this becomes the base case for induction. Then, for each subsequent row, assume all previous rows in the trace back to cycle 0 are deterministic, and show that the row in question is deterministic. By induction, we have now shown that the whole STARK trace is deterministic. The caveat here is that Picus doesn’t verify that “backs” (references to earlier rows) in the circuit never reach past cycle 0 (outside of the independently verified CycleCounter), but this is straightforward to check by observing that the maximum back distance is 4, and that the first 4 cycles never use backs that reach past cycle 0.
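The argument has the shape of a standard strong induction (in our notation, D(n) means “row n of the trace is fully determined by the inputs”):

```latex
D(0) \quad\text{(base case: cycle 0 references no earlier rows)}

\forall n > 0:\; \bigl(D(0) \land D(1) \land \dots \land D(n-1)\bigr) \;\Longrightarrow\; D(n)

\therefore\; \forall n:\; D(n) \quad\text{(every row, hence the whole trace, is deterministic)}
```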
One last thing worth calling out is our “trusted code assumptions” — with any machine-checked proof, there is always some piece of code that we assume operates without bugs. In the case of this work, we are of course dependent on the correctness of Picus and the underlying SMT solvers. We also assume the correctness of the compiler’s translation from Zirgen to Picus. Furthermore, there is a possibility that the Zirgen compiler introduces an underconstrained bug during the compilation process by making some illegal transformation. To mitigate this last point, we translate from one of the Zirgen compiler’s intermediate representations rather than, for example, the language’s surface syntax. That means we don’t need to trust the correctness of the Zirgen compiler pipeline up to where we translate to Picus, but we do need to trust everything after. To illustrate, this diagram shows the Zirgen compiler pipeline; the red arrows represent code that must be trusted for the determinism proof to be convincing, whereas the green code can contain bugs without affecting the validity of our determinism proof.
These “trusted code assumptions” are currently the weakest link in the proof. However, it should be noted that trusting the correctness of the compiler is a requirement for building on top of any circuit DSL, and trusting the correctness of the formal methods tools is a foundational assumption for any formal methods work. We could relax these assumptions in the future by formally verifying the translation to Picus and extracting the Picus proof into another proof checker like Lean4. In the meantime, these assumptions haven’t stopped the verification effort from contributing real value — we’ve been able to identify many real underconstrained bugs in our circuits ahead of release, and Picus makes a convincing argument that we’ve found them all. This gives us more confidence in the correctness of our circuits than we’ve ever had before. The best part? It’s blazingly fast — it takes less than 3 minutes to rerun the proof, verifying over 45,000 Picus constraints generated from 87 Zirgen components, which means we can check that every future circuit and compiler change preserves this security property.
2.0 and Beyond
We have been able to complete this proof for our Keccak circuit, and work on the new rv32im circuit is well underway. This represents a first and very significant milestone in our formal methods journey, and it rules out the largest class of ZK security bugs for significant parts of our zkVM. As we continue towards full verification, we will keep looking for high-impact ways to deliver meaningful security guarantees with formal methods.