Revelio: Agent Harness Is as Important as the Model for Cybersecurity

$300 in compute, 19 potential security issues found across nine projects. Here is how we did it.

The findings, up front

We randomly selected nine heavily fuzzed OSS-Fuzz projects and scanned them with Revelio, our end-to-end vulnerability detection agent that generates verifiable proof-of-concept (PoC) inputs.

After about one hour per project and $300 in spending, Revelio uncovered:

  • 14 security-related issues (confirmed by manual validation)

  • 5 requested CVEs, confirmed by maintainers

These are not recently introduced bugs: the vulnerabilities behind two of the assigned CVEs had been in the code for almost 10 years. Confirmed vulnerabilities in DNSMasq and OpenEXR can lead to heap out-of-bounds reads and adjacent heap corruption. These projects are critical: DNSMasq is widely deployed in routers and embedded devices, and OpenEXR is used across major video studios and render firms.

And these are just the tip of the iceberg. Revelio found various types of vulnerabilities, including seven integer overflows, six heap buffer overflows, and multiple use-after-frees, stack overflows, and out-of-bounds reads and writes. By uncovering these issues, Revelio helps protect software that depends on the affected libraries, either directly or transitively.

Meet Revelio: cost-efficient vulnerability detection

We introduce Revelio, an end-to-end, cost-efficient vulnerability detection agent. We build Revelio on several key insights:

  • Inspired by how human experts approach vulnerability discovery, we apply a hypothesis-then-confirm process systematically, file by file and at scale. High review coverage of the codebase combined with runtime verification makes vulnerability discovery affordable and reliable.

  • Cheap tokens make exhaustive scanning (to form hypotheses about potential vulnerabilities) economically practical.

  • PoC generation and sanitizer-based validation help avoid false positives.

The workflow is as follows.

First stage, hypothesis generation. We scan each file separately with a sub-agent. This stage performs large-scale per-file scanning to generate vulnerability hypotheses, then triages and filters them into a list of high-confidence candidates. Some might be AI slop or otherwise wrong, but our second stage gives us a way to tell which ones are real, and the exhaustive search helps us avoid missing vulnerabilities. It is effectively a brute-force static analysis across the entire codebase. Inspired by The Bitter Lesson by Richard Sutton, we lean into scale: rather than limiting the search space with handcrafted heuristics, we take advantage of cheap tokens to examine everything.
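To make the first stage concrete, here is a minimal Python sketch of per-file scanning followed by triage. It is an illustration under stated assumptions, not Revelio's implementation: the `Hypothesis` record, the `Scanner` callable (a stand-in for the LLM sub-agent), the file-suffix list, and the 0.7 confidence threshold are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable, List


@dataclass
class Hypothesis:
    file: str
    function: str
    bug_class: str
    confidence: float  # scanner's self-reported score in [0, 1] (assumed)


# Hypothetical scanner interface: (relative path, source text) -> hypotheses.
# In a real system this would wrap an LLM call; here it is just a callable.
Scanner = Callable[[str, str], List[Hypothesis]]

SOURCE_SUFFIXES = {".c", ".cc", ".cpp", ".h", ".hpp"}  # assumed file filter


def iter_source_files(root: Path) -> Iterable[Path]:
    """Yield every source file under root, with no heuristic pruning."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            yield path


def generate_hypotheses(root: Path, scan_file: Scanner) -> List[Hypothesis]:
    """Scan each file independently and collect all raw hypotheses."""
    found: List[Hypothesis] = []
    for path in iter_source_files(root):
        text = path.read_text(errors="replace")
        found.extend(scan_file(path.relative_to(root).as_posix(), text))
    return found


def triage(hyps: List[Hypothesis], min_confidence: float = 0.7) -> List[Hypothesis]:
    """Drop low-confidence and duplicate hypotheses, best first."""
    seen = set()
    kept: List[Hypothesis] = []
    for h in sorted(hyps, key=lambda h: h.confidence, reverse=True):
        key = (h.file, h.function, h.bug_class)
        if h.confidence >= min_confidence and key not in seen:
            seen.add(key)
            kept.append(h)
    return kept


def toy_scanner(path: str, text: str) -> List[Hypothesis]:
    """Toy stand-in for the sub-agent: flags any file that calls memcpy."""
    if "memcpy" in text:
        return [Hypothesis(path, "unknown", "heap-buffer-overflow", 0.8)]
    return []
```

Calling `triage(generate_hypotheses(Path("path/to/project"), toy_scanner))` would produce the candidate list handed to the confirmation stage. The point of the design is that nothing is pruned before scanning; filtering happens only after every file has been examined.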

Second stage, PoC confirmation. Verification is critical. AI coding agents are not reliable, and some of the candidate vulnerabilities from the first stage won’t be real. For memory corruption vulnerabilities, code sanitizers provide reliable validation. In this stage, Revelio iteratively generates a PoC by trial and error, that is, an input that triggers the vulnerability, and then verifies that running the program on this input produces a code sanitizer error report. This check is deterministic, and a test case that triggers a sanitizer error typically indicates a high-risk issue, so we can be confident that Revelio is finding real issues. The system is guided to behave like a realistic attacker, focusing only on publicly accessible attack surfaces. While it’s relatively easy to trigger sanitizer alerts, the goal is to produce meaningful, actionable confirmations.
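The confirmation check can be sketched as follows. This is a hedged illustration, not Revelio's code: it assumes the target was built with AddressSanitizer, that it takes the PoC file path as its last command-line argument, and that ASan prints its usual `==PID==ERROR: AddressSanitizer: <kind>` line to stderr. The function names and the `propose_poc` callback (standing in for the agent's next attempt) are hypothetical.

```python
import re
import subprocess
from typing import Callable, List, Optional, Tuple

# First line of an AddressSanitizer report, e.g.
# "==4242==ERROR: AddressSanitizer: heap-buffer-overflow on address ..."
ASAN_ERROR = re.compile(r"==\d+==ERROR: AddressSanitizer: ([\w-]+)")


def parse_sanitizer_report(stderr: str) -> Optional[str]:
    """Return the reported bug kind, or None if there is no ASan error."""
    match = ASAN_ERROR.search(stderr)
    return match.group(1) if match else None


def confirm_poc(target_cmd: List[str], poc_path: str,
                timeout_s: float = 10.0) -> Optional[str]:
    """Run the sanitizer-instrumented target on one PoC input."""
    try:
        proc = subprocess.run(
            target_cmd + [poc_path],
            capture_output=True,
            text=True,
            errors="replace",
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None  # a hang is not treated as a confirmation in this sketch
    return parse_sanitizer_report(proc.stderr)


def iterate_until_confirmed(
    propose_poc: Callable[[str], str],
    check: Callable[[str], Optional[str]],
    max_attempts: int = 5,
) -> Optional[Tuple[str, str]]:
    """Trial-and-error loop: propose, check, feed the result back, repeat."""
    feedback = ""
    for attempt in range(max_attempts):
        poc = propose_poc(feedback)
        kind = check(poc)
        if kind is not None:
            return poc, kind
        feedback = f"attempt {attempt + 1}: no sanitizer report"
    return None
```

A hypothesis counts as confirmed only when `iterate_until_confirmed` returns a PoC together with the sanitizer's bug kind; hypotheses that never produce a report within the attempt budget are dropped.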

Using Claude Haiku 4.5 for hypothesis generation and Claude Sonnet 4.6 for PoC confirmation, Revelio cost roughly $300 to scan nine projects (~24k LLM calls), with a per-project cost ranging from $18 to $46.
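For a rough sense of scale, the averages implied by these figures can be worked out directly. The inputs are the rounded numbers reported in this post, so the results are approximations, not measured values.

```python
# Rounded figures from this post; the derived averages are approximate.
total_cost_usd = 300.0
projects = 9
llm_calls = 24_000
reported_issues = 19  # 14 manually validated issues + 5 requested CVEs

avg_per_project = total_cost_usd / projects        # about $33.33
avg_per_call = total_cost_usd / llm_calls          # about $0.0125
avg_per_issue = total_cost_usd / reported_issues   # about $15.79

print(f"per project: ${avg_per_project:.2f}")
print(f"per LLM call: ${avg_per_call:.4f}")
print(f"per reported issue: ${avg_per_issue:.2f}")
```

The per-project average of about $33 sits inside the reported $18 to $46 range, as expected.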

Takeaway: you don’t need a secret model or complex orchestration to find real security issues. You need an effective, affordable, and reliable harness.

We next discuss how Revelio tackles the limitations of fuzzing and provides better agentic vulnerability detection.

Fuzzing is not enough

All of the projects audited by Revelio are continuously fuzzed under OSS-Fuzz, several of them for years. Yet Revelio still discovered 19 security issues in files that had already been fuzzed.

It is not news that a project can be heavily fuzzed while still containing high-severity bugs. Fuzzers explore the input space, guided mainly by coverage feedback. They are powerful, but real-world codebases contain rare branches, subtle API interactions, and edge-case logic that coverage heuristics may under-prioritize, leading to missed vulnerabilities.

Agentic approaches like Revelio can better explore hard-to-reach branches through constraint reasoning and input construction, and can exploit subtle API interactions by understanding project-level context.
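As a toy illustration of constraint reasoning and input construction, consider a hypothetical image header whose width and height each pass a plausible-looking limit, while their product wraps around in 32-bit arithmetic. The format, the `TOYF` magic, and the limits below are invented for this sketch and are not taken from any of the scanned projects. Instead of mutating inputs until something breaks, the input is derived directly from the arithmetic.

```python
import struct

MAGIC = b"TOYF"        # hypothetical file magic
MAX_DIM = 1 << 17      # per-dimension limit that looks safe on its own
U32_MASK = 0xFFFFFFFF  # models unsigned 32-bit wraparound in C


def parse_header(data: bytes):
    """Return (allocated, needed) sizes for a toy image header.

    'allocated' mimics C code that multiplies in 32-bit unsigned
    arithmetic; 'needed' is the true product of the dimensions.
    """
    if len(data) < 12 or data[:4] != MAGIC:
        raise ValueError("bad magic")
    width, height = struct.unpack("<II", data[4:12])
    if width == 0 or height == 0 or width > MAX_DIM or height > MAX_DIM:
        raise ValueError("bad dimensions")
    needed = width * height
    allocated = needed & U32_MASK
    return allocated, needed


def is_under_allocated(data: bytes) -> bool:
    """True when the wrapped allocation is smaller than the data written."""
    allocated, needed = parse_header(data)
    return allocated < needed


def craft_overflow_header() -> bytes:
    """Solve for dimensions within MAX_DIM whose product exceeds 2**32.

    2**16 * (2**16 + 1) = 2**32 + 2**16, which wraps to 2**16.
    """
    width = 1 << 16
    height = (1 << 16) + 1
    return MAGIC + struct.pack("<II", width, height)
```

Here `is_under_allocated(craft_overflow_header())` is true: both dimensions pass their individual checks, yet the allocation would be 65,536 elements for a payload of over four billion. In a real target, this is the kind of mismatch a sanitizer would then flag as a heap buffer overflow.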

Call to action: the security race is on

The economics have flipped:

  • Attackers can now afford to scan every public codebase their target depends on. Cheap tokens mean exhaustive coverage is no longer gated by a six-figure audit budget.

  • Our experience shows that scanning nine projects for about $300 in total, at roughly an hour per project, can surface several CVE-worthy bugs. Accordingly, defenders should be scanning their own code preemptively.

We built Revelio to democratize routine agentic security auditing as a complement to your existing security tools like fuzzing. Open source maintainers, you are welcome to request a Revelio scan of your repository. Findings are disclosed privately through your preferred channel.

Sign up for Revelio:


Meet our team: Yiwei Hou, Hao Wang, Muxi Lyu, Marius Momeu, Dawn Song, Koushik Sen, David Wagner.

Eric Nguyen and Taige Yang also contributed.