Using LLMs to triage vulnerability scan output: risks and guardrails

An LLM cannot tell you whether CVE-2024-XXXX is exploitable in your environment. It can pre-rank a 14,000-row Tenable export by likely business impact in 90 seconds, if you constrain it correctly.

The pattern

Treat the LLM as a *re-ranker*, not a classifier. Feed it the scanner's structured fields plus a tiny business-context block per asset, and ask it to produce a numeric priority and a one-line rationale. Never let it invent or alter a CVSS score.
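A minimal sketch of assembling that input, one finding at a time. The column names (`plugin_id`, `cve`, `cvss`, `exploit_available`, `asset_id`), the `ASSET_CONTEXT` dictionary, and the sample rows are all assumptions for illustration, not a real Tenable schema. Map them to whatever your export actually contains.

```python
import csv
import io
import json

# Hypothetical column names; rename to match your scanner export.
SCANNER_FIELDS = ("plugin_id", "cve", "cvss", "exploit_available", "asset_id")

# Hypothetical business-context block, maintained outside the scanner.
ASSET_CONTEXT = {
    "web-01": {"exposure": "internet", "owner": "payments", "criticality": "high"},
    "hr-db-02": {"exposure": "internal", "owner": "hr", "criticality": "medium"},
}
UNKNOWN_ASSET = {"exposure": "unknown", "owner": "unknown", "criticality": "unknown"}


def build_inputs(row: dict) -> tuple[str, str]:
    """Return (finding_json, asset_json) for one scanner row."""
    # Pass through only the structured fields; free-text columns stay out.
    finding = {name: row.get(name, "") for name in SCANNER_FIELDS}
    # Normalise the flag so the model sees a real boolean, not "true"/"yes".
    flag = str(finding["exploit_available"]).strip().lower()
    finding["exploit_available"] = flag in ("true", "yes", "1")
    asset = ASSET_CONTEXT.get(row.get("asset_id", ""), UNKNOWN_ASSET)
    return json.dumps(finding, sort_keys=True), json.dumps(asset, sort_keys=True)


# Made-up rows with placeholder CVE IDs, standing in for the real export.
SAMPLE_CSV = """plugin_id,cve,cvss,exploit_available,asset_id,notes
10001,CVE-2099-0001,9.8,true,web-01,free text the model should not see
10002,CVE-2099-0002,5.3,false,hr-db-02,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
pairs = [build_inputs(r) for r in rows]
```

Whitelisting fields keeps free-text columns, where prompt-injection strings tend to hide, away from the model. Unknown assets get an explicit `"unknown"` context so they are not silently treated as internal.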

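Constrained prompt

```python
# Assumes the template is filled with PROMPT.format(finding=..., asset=...).
# The doubled braces keep the literal JSON example from being parsed as a
# format placeholder, which would otherwise raise a KeyError.
PROMPT = '''You re-rank vulnerability findings.
Inputs (JSON): {finding}
Asset context (JSON): {asset}

Return ONLY JSON: {{"priority": 1-5, "rationale": "<= 25 words"}}
Rules:
- Do NOT modify CVSS, CVE, or any field from the scanner.
- Priority 5 only if asset.exposure == "internet" AND finding.exploit_available == true.
- If unsure, return priority 3.
'''
```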
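The rules in the prompt are requests, not guarantees, so it is worth re-applying them in code once the reply comes back. The sketch below assumes the model's reply arrives as a string and makes no model call. The function name `enforce_rules` and the choice to downgrade a disallowed priority 5 to 4 are illustrative assumptions, not part of the prompt.

```python
import json


def enforce_rules(raw: str, finding: dict, asset: dict) -> dict:
    """Parse the model reply and re-apply the prompt's rules deterministically."""
    fallback = {"priority": 3, "rationale": "unparseable model output; defaulted"}
    try:
        reply = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(reply, dict):
        return fallback

    priority = reply.get("priority")
    rationale = reply.get("rationale")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(priority, bool) or not isinstance(priority, int):
        return fallback
    if not 1 <= priority <= 5 or not isinstance(rationale, str):
        return fallback

    # Priority 5 is reserved for internet-exposed assets with a known exploit.
    may_be_5 = (
        asset.get("exposure") == "internet"
        and finding.get("exploit_available") is True
    )
    if priority == 5 and not may_be_5:
        priority = 4  # assumed policy: downgrade rather than discard

    # Hold the rationale to the 25-word limit the prompt asks for.
    rationale = " ".join(rationale.split()[:25])
    return {"priority": priority, "rationale": rationale}
```

Anything that fails to parse falls back to priority 3, mirroring the "if unsure" rule, so a malformed reply can never push a finding to the top or bottom of the queue.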

Risk

If the LLM hallucinates a CVE ID into the rationale, your auditors will catch it. Before persisting anything, validate every scanner field against the source CSV and strip or reject any rationale that cites a CVE ID the finding does not carry.
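One way to sketch that validation, assuming each record about to be persisted carries the scanner fields alongside the model's `priority` and `rationale`. The field names and the function `check_against_source` are hypothetical, and the CVE IDs in the usage example are placeholders.

```python
import re

# Matches the standard CVE ID shape: CVE-<year>-<4 or more digits>.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def check_against_source(source_row: dict, persisted: dict) -> list[str]:
    """Return a list of problems; an empty list means the record may be stored."""
    problems = []
    # Scanner-owned fields must match the export exactly.
    for field in ("cve", "cvss", "plugin_id"):
        if persisted.get(field) != source_row.get(field):
            problems.append(f"{field} differs from source")
    # Any CVE ID mentioned in the rationale must be the finding's own.
    allowed = {source_row.get("cve", "").upper()}
    for cve_id in CVE_PATTERN.findall(persisted.get("rationale", "")):
        if cve_id.upper() not in allowed:
            problems.append(f"rationale cites unknown {cve_id.upper()}")
    return problems


source = {"cve": "CVE-2099-0001", "cvss": "9.8", "plugin_id": "10001"}
record = dict(source, priority=5, rationale="Chained with CVE-2099-9999")
print(check_against_source(source, record))
```

Returning a list of problems rather than raising lets the pipeline quarantine the record and keep the reasons as evidence.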

Audit perspective

From an audit standpoint the LLM is just another control input, so the same evidence rules apply: the prompt version pinned in git, the model and temperature logged with each run, and a sample of decisions reviewed by a human each cycle.
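Those three pieces of evidence could be captured roughly as follows. This is a sketch under assumptions: the record layout, the function names, the `example-model` identifier, and the commit hash are all placeholders, and the 5% review rate in the usage note is illustrative rather than a recommended figure.

```python
import hashlib
import json
import random
from datetime import datetime, timezone


def run_record(prompt_text: str, model: str, temperature: float, git_commit: str) -> dict:
    """Evidence to store alongside every triage run."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "temperature": temperature,
        "prompt_git_commit": git_commit,
        # Hash of the exact prompt text, so a drifted prompt is detectable.
        "prompt_sha256": hashlib.sha256(prompt_text.encode("utf-8")).hexdigest(),
    }


def review_sample(finding_ids: list[str], rate: float, seed: int) -> list[str]:
    """Pick a reproducible random sample of decisions for human review."""
    if not finding_ids:
        return []
    size = min(len(finding_ids), max(1, round(len(finding_ids) * rate)))
    # Logging the seed lets an auditor re-derive the same sample later.
    return sorted(random.Random(seed).sample(finding_ids, size))


record = run_record("You re-rank vulnerability findings.", "example-model", 0.0, "abc1234")
print(json.dumps(record, indent=2))
```

Calling `review_sample(ids, 0.05, seed=7)` on 200 finding IDs yields 10 of them, and the same 10 every time the seed is reused, which is what makes the sample defensible as evidence.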

Audit insight

Document the LLM step in your vulnerability-management procedure. An undocumented AI-in-the-loop step is a finding waiting to happen under ISO/IEC 42001 and the EU AI Act.

#ai #llm #vulnerability-management #guardrails

Rudy Prasetiya

IT GRC, cybersecurity & audit practitioner. Writes about controls that actually hold.