We Pentested a Complex API Application with AigentX AI Agent - Then Verified the Results with Source Code Review

Komodo Research
23 hours ago

How AigentX found 24 real vulnerabilities, confirmed what wasn't exploitable, and discovered risks that code review alone could never catch.

📄 Full Reports: AigentX Pentest Report · Claude Source Code Review

Why crAPI?

When we set out to benchmark AigentX in a real-world application security scenario, we didn't want an easy target. We chose crAPI - the Completely Ridiculous API - a deliberately vulnerable application created by OWASP to simulate the kind of complex, multi-service, business-logic-heavy environment that real-world penetration testers encounter.

crAPI isn't a simple toy app with obvious SQL injection and weak passwords. It's a multi-microservice platform built on Spring Boot, Go, and Python/Django, backed by both PostgreSQL and MongoDB, running behind a reverse proxy on AWS EC2. It has a vehicle management system, a community forum, a mechanic service, a shop with payment logic, a file upload pipeline with FFmpeg processing, and a JWT-based authentication chain spanning all three services. In other words: it's complex by design.

We chose it precisely because it would expose any weakness in AigentX. If the agent struggled with business logic, mixed tech stacks, or realistic authentication flows, this would be the test that revealed it.

What AigentX Did

auth-and-jwt — Authentication flaws and JWT manipulation
bola-and-access-control — Broken Object Level Authorization across vehicle, order, and report endpoints
injection-and-ssrf — SQL/NoSQL injection and server-side request forgery
business-logic-upload-and-coupon — FFmpeg injection, file upload abuse, coupon manipulation, and shop logic
api-bola-and-mass-assignment — Broken function-level authorization and mass assignment
automated-disclosure-headers — Information disclosure and security header analysis

Phase 4 independently validated and exploited every candidate finding, demonstrated end-to-end attack chains, and filtered out anything that didn't hold up under exploitation. Every confirmed finding came with full HTTP request/response evidence. The result: 24 confirmed vulnerabilities — 14 High, 6 Medium, 4 Low — plus 11 confirmed-clean tests where AigentX actively tested a risk category, found no exploitable vulnerability, and documented the evidence. Zero false positives.

The Standout Findings

Authentication Collapse: The JWT Trilogy

The most severe finding wasn't a single vulnerability — it was a broken authentication chain with three distinct failure modes, any one of which was sufficient for complete authentication bypass.

Finding 1: JWT Signature Validation Completely Absent

The identity service accepted tokens regardless of their signature. A forged token with alg:none, an empty signature, and sub: admin@example.com returned HTTP 200 with the admin's full profile, credit balance, and ROLE_ADMIN assignment. Tokens with garbage signatures and tokens expired since the year 2001 were equally accepted.
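To make the alg:none case concrete, here is a minimal Python sketch of how such an unsigned token is assembled and of a pre-check that would reject it. It is an illustration only, not crAPI's code or the exact token AigentX sent: the function names and the RS256 allowlist are assumptions, and a real verifier must additionally check the signature cryptographically and enforce the exp claim.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the stripped padding."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def forge_unsigned_token(subject: str) -> str:
    """Build a token whose header says alg "none" and whose signature is empty."""
    header = {"alg": "none", "typ": "JWT"}
    payload = {"sub": subject}
    segments = [
        b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    ]
    return ".".join(segments) + "."  # trailing dot marks the empty signature


def passes_precheck(token: str, allowed_algs=("RS256",)) -> bool:
    """Hypothetical pre-check: allowlisted alg and a non-empty signature segment.

    This is only the first gate; it does not verify the signature itself.
    """
    try:
        header_seg, _payload_seg, signature_seg = token.split(".")
        header = json.loads(b64url_decode(header_seg))
    except ValueError:  # wrong segment count, bad base64, or bad JSON
        return False
    if not isinstance(header, dict):
        return False
    return header.get("alg") in allowed_algs and signature_seg != ""
```

A service that applied even this first gate would have refused the forged admin token before looking at its claims.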

Finding 2: OTP Bypass — Cross-User Account Takeover

The password reset endpoint accepted OTP 000000 - and in fact any 6-digit value - for any account, without a prior forgot-password request, without rate limiting, and without lockout. AigentX confirmed this by resetting a target account's password using a different user's token, then logging in with the new credentials. All 10 repeating-digit OTPs succeeded.
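For contrast, the three controls this finding reports as missing (a prior forgot-password request, attempt limiting, and lockout) can be sketched in a few lines of Python. This is a hypothetical in-memory model, not the crAPI identity service: the class name, the three-attempt limit, and the five-minute lifetime are all assumptions chosen for illustration.

```python
import hmac
import secrets
import time
from typing import Optional

MAX_ATTEMPTS = 3       # assumed lockout threshold
OTP_TTL_SECONDS = 300  # assumed OTP lifetime


class OtpStore:
    """In-memory record of pending password-reset requests, keyed by email."""

    def __init__(self) -> None:
        self._pending = {}

    def issue(self, email: str, now: Optional[float] = None) -> str:
        """Create a fresh random 6-digit OTP for a forgot-password request."""
        now = time.time() if now is None else now
        otp = f"{secrets.randbelow(10**6):06d}"
        self._pending[email] = {
            "otp": otp,
            "expires": now + OTP_TTL_SECONDS,
            "attempts": 0,
        }
        return otp

    def verify(self, email: str, submitted: str, now: Optional[float] = None) -> bool:
        """Accept only the issued OTP, within its lifetime and attempt budget."""
        now = time.time() if now is None else now
        entry = self._pending.get(email)
        if entry is None:
            return False  # no prior forgot-password request for this account
        if now > entry["expires"] or entry["attempts"] >= MAX_ATTEMPTS:
            del self._pending[email]  # expired or locked out: force a new request
            return False
        entry["attempts"] += 1
        if hmac.compare_digest(entry["otp"].encode(), submitted.encode()):
            del self._pending[email]  # single use
            return True
        return False
```

Under this model, submitting 000000 without a pending request fails immediately, and a run of repeating-digit guesses exhausts the attempt budget after three tries.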

Finding 3: Tokens Not Revoked After Password Change

Tokens issued before a password change remained fully valid afterward. Combined with a non-functional logout endpoint (HTTP 404), there was no mechanism for a victim to invalidate a compromised session under any circumstances.
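One common way to close this gap is to compare a token's issued-at time against the moment the password last changed, and to keep a list of explicitly revoked token IDs for logout. The Python sketch below assumes tokens carry the standard iat, exp, and jti claims; the Account fields and the function name are hypothetical and are not taken from crAPI.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical account record holding the state needed for revocation."""
    email: str
    password_changed_at: int = 0  # Unix seconds of the last password change
    revoked_token_ids: set = field(default_factory=set)  # filled by logout


def token_still_valid(claims: dict, account: Account, now: int) -> bool:
    """Reject expired tokens, logged-out tokens, and tokens older than the password."""
    if claims.get("exp", 0) <= now:
        return False
    if claims.get("jti") in account.revoked_token_ids:
        return False
    # A token issued before the last password change is treated as revoked.
    return claims.get("iat", 0) >= account.password_changed_at
```

With a check like this, changing the password invalidates every earlier session in one step, and a working logout endpoint only needs to add the token's jti to the revoked set.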

These three findings combine into a complete account takeover chain: forge a token or bypass OTP to access any account, and the original token continues working even after the victim changes their password.

The Infrastructure Pivot: SSRF + Exposed Credentials

The mechanic service included an SSRF vulnerability where a user-controlled URL was fetched server-side with no allowlist or network restriction. AigentX chained this with a second finding: the application's .env file was publicly accessible without authentication, exposing PostgreSQL and MongoDB credentials in plaintext. The combined attack chain: directly retrieve database credentials via GET /.env, confirm SSRF exfiltration by fetching the same file through the mechanic endpoint, probe the internal Docker network to discover internal services, and probe the AWS Instance Metadata Service. Each step was independently validated with HTTP evidence.
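The missing allowlist and network restriction can be sketched with Python's standard library. This is a simplified illustration under stated assumptions: the allowed hostname is invented, the Docker-range address in the usage note is hypothetical, and a production check would also need to validate the address the hostname actually resolves to at connection time.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allowlist; the real set of legitimate destinations is not in the report.
ALLOWED_HOSTS = {"mechanic-partner.example.com"}


def is_internal_address(ip: str) -> bool:
    """True for private, loopback, and link-local addresses (e.g. cloud metadata)."""
    addr = ipaddress.ip_address(ip)
    return addr.is_private or addr.is_loopback or addr.is_link_local


def is_allowed_destination(url: str) -> bool:
    """Allow only http(s) URLs whose hostname is explicitly allowlisted."""
    try:
        parts = urlsplit(url)
    except ValueError:  # malformed URL
        return False
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname.lower()
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return host in ALLOWED_HOSTS  # a name, not an IP literal
    return False  # IP literals are never allowlisted in this sketch
```

Here is_allowed_destination("http://169.254.169.254/latest/meta-data/") and a Docker-range target such as "http://172.18.0.5:8080/" are both refused, while is_internal_address is the check to apply to whatever address an allowlisted name resolves to.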

Business Logic: The Shop That Paid You to Shop

Two business logic findings hit the shop's payment model from different angles. First, any authenticated user could create a coupon with an arbitrary monetary value - an endpoint intended for administrators had no role check at all. Creating a coupon for $10,000 and redeeming it worked without restriction. Second, placing a shop order with a negative quantity caused the server to add credit rather than deduct it. One request with quantity: -1000 on a $10 item credited $10,000 to the user's account. No rate limiting, no input validation, no detection mechanism.
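The arithmetic behind the negative-quantity flaw is easy to reproduce. The Python sketch below is a hypothetical model of the balance update, not crAPI's shop code: naive_charge mirrors the reported behavior, and validated_charge shows one possible input check. The starting balance of 100 is an assumed value.

```python
from decimal import Decimal


def naive_charge(balance: Decimal, unit_price: Decimal, quantity: int) -> Decimal:
    """Deduct the order total with no validation, as the finding describes."""
    return balance - unit_price * quantity


def validated_charge(balance: Decimal, unit_price: Decimal, quantity: int) -> Decimal:
    """Same deduction, but only for positive integer quantities the user can afford."""
    if isinstance(quantity, bool) or not isinstance(quantity, int) or quantity < 1:
        raise ValueError("quantity must be a positive integer")
    total = unit_price * quantity
    if total > balance:
        raise ValueError("insufficient balance")
    return balance - total


# 100 - (10 * -1000) = 100 + 10000: the "charge" becomes a credit.
print(naive_charge(Decimal("100"), Decimal("10"), -1000))  # prints 10100
```

Subtracting a negative total is the whole bug: with quantity -1000 on a $10 item, the deduction of -$10,000 adds $10,000 to the balance, matching the credit reported above.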

The Validation Layer: Why We Brought in Source Code Review

The results looked strong, but we wanted to be rigorous, not just optimistic. A runtime penetration test operates without full visibility into the codebase, so we ran Claude on the crAPI source code: a static code review with no runtime access and no knowledge of what AigentX had already found. The code review surfaced 12 findings, including 5 rated Critical and 6 rated High. Then we compared.

The Comparison: What It Revealed

Category 1: Full Confirmation

Every finding that the code review identified as exploitable was confirmed by AigentX at runtime. The JWT authentication failures, the SSRF vulnerability, the BOLA on vehicle location and order endpoints, the business logic flaws in the coupon and shop systems - all present, all exploited, all evidenced with HTTP proof. No gaps.

Category 2: Code Present, Not Exploitable — AigentX Got It Right

Two findings from the static code review did not survive runtime testing:

SQL injection in coupon redemption: the source code concatenates SQL strings rather than using parameterized queries, and a code reviewer would correctly flag that as a SQL injection risk. AigentX tested it and found no exploitable vulnerability: the runtime behavior produced no SQL errors or data leakage, and parameterized queries appear to be in use elsewhere.
Path traversal in report download: the source code contains a path traversal vector, but Spring Boot's router normalizes path traversal sequences before they reach the filesystem.

AigentX correctly classified both as "not confirmed" rather than inflating the finding count. That distinction matters: a noisy tool that reports every risky code pattern as exploitable creates alert fatigue and erodes trust.
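To show why a reviewer flags string-concatenated SQL on sight, here is a small Python sketch contrasting the two query styles. It uses an in-memory SQLite database as a stand-in (crAPI itself runs on PostgreSQL and MongoDB), and the table layout and coupon codes are invented for illustration. It demonstrates the general pattern only and says nothing about why the crAPI endpoint was not exploitable at runtime.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway coupons table with two hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE coupons (code TEXT PRIMARY KEY, amount INTEGER)")
    conn.executemany(
        "INSERT INTO coupons VALUES (?, ?)",
        [("WELCOME10", 10), ("STAFF500", 500)],
    )
    return conn


def lookup_concatenated(conn: sqlite3.Connection, code: str) -> list:
    """Risky pattern: user input is spliced directly into the SQL text."""
    query = "SELECT code, amount FROM coupons WHERE code = '" + code + "'"
    return conn.execute(query).fetchall()


def lookup_parameterized(conn: sqlite3.Connection, code: str) -> list:
    """Safer pattern: the driver binds the value, so it is never parsed as SQL."""
    return conn.execute(
        "SELECT code, amount FROM coupons WHERE code = ?", (code,)
    ).fetchall()


conn = make_demo_db()
payload = "x' OR '1'='1"
print(len(lookup_concatenated(conn, payload)))   # prints 2: every coupon is returned
print(len(lookup_parameterized(conn, payload)))  # prints 0: no coupon has that code
```

Whether a flagged pattern like this is reachable and exploitable in a deployed system is exactly the question that runtime testing answers.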

Category 3: Runtime-Only Findings — What Code Review Couldn't See

This is where AigentX's value as a runtime agent becomes clearest. The source code review had no way to discover these findings — they require actually interacting with the running system:

The .env file exposed on the public web root — database credentials for both PostgreSQL and MongoDB, accessible without authentication.
Internal Docker network topology confirmed reachable via SSRF.
AWS Instance Metadata Service reachable from the application server.
OTP bypass — the password reset logic appeared sound in code; the bypass was a runtime behavior requiring active exploitation to discover.
Negative quantity order grants unlimited credit — pure runtime business logic invisible to static analysis.
Stored XSS via SVG upload — accepted, stored verbatim, and returned in the dashboard response.
Unrestricted file upload — PHP webshells, double-extension files, and SVG payloads all accepted and stored.
Token revocation absent — only discoverable by actually changing a password and testing whether the previous token still works.
CORS misconfiguration — Access-Control-Allow-Origin: * on authenticated endpoints.
Missing HTTP security headers — the frontend served no security headers whatsoever, and the server version was disclosed in both response headers and 404 error pages.
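
The last two items in this list are simple to check mechanically once response headers are in hand. The Python sketch below is a hypothetical audit helper, not part of AigentX: the baseline of expected headers is an assumed one, and the sample Server value in the usage note is invented.

```python
# Assumed baseline; teams typically tailor this list to their own policy.
EXPECTED_HEADERS = (
    "content-security-policy",
    "strict-transport-security",
    "x-content-type-options",
    "x-frame-options",
)


def audit_headers(headers: dict, authenticated: bool) -> list:
    """Return a list of human-readable issues found in one response's headers."""
    normalized = {name.lower(): value for name, value in headers.items()}
    issues = [f"missing {name}" for name in EXPECTED_HEADERS if name not in normalized]
    # A wildcard origin is reported only when the endpoint requires authentication.
    if authenticated and normalized.get("access-control-allow-origin", "").strip() == "*":
        issues.append("wildcard CORS origin on authenticated endpoint")
    # Heuristic: a digit in the Server header usually means a version string.
    if any(ch.isdigit() for ch in normalized.get("server", "")):
        issues.append("server version disclosed")
    return issues
```

For example, a response carrying only Server: nginx/1.25.3 and Access-Control-Allow-Origin: * on an authenticated endpoint would yield six issues: four missing headers, the wildcard origin, and the version disclosure.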

What This Means

The comparison between AigentX's runtime assessment and the source code review produced a clear and defensible conclusion:

AigentX found everything that matters: every exploitable vulnerability identified through source code review was independently confirmed at runtime, with full HTTP proof. Nothing was missed.
AigentX distinguished exploitable from non-exploitable: where source code analysis flagged risk, AigentX tested it, and where the runtime environment mitigated the risk, AigentX documented that accurately rather than reporting it as a finding.
AigentX found what code review cannot: deployment misconfigurations, runtime behavior, infrastructure exposure, and business logic flaws are invisible to static analysis. A penetration test that doesn't actually interact with the running system will miss them every time.

For AppSec teams evaluating AI-assisted pentesting, the question is never just "how many findings did it produce?" The harder questions are: Are those findings real? Did it miss anything that matters? Does it understand what's actually exploitable? On all three counts, this assessment gave us reassuring answers.

AigentX is an AI-powered application security assessment agent developed by KomodoSec.

Unleash the power of AI to validate your web-application security with KomodoSec AigentX.
