[un]prompted II

The AI Security Practitioner Conference

Coming Back This September, San Francisco

[un]prompted is back for the second time in (or around) September, in SF. Dates, CFP, and registration to be announced in the next few days.


Whether you’re a CISO Excel jockey or a researcher sniffing for the scent of bits, we see you as part of our wider AI security practitioner community.

[un]prompted is an intimate, raw, and fun gathering for the professionals actually doing the work, from offense to threat hunting to program building to national policy. No fluff. No filler. Just sharp talks, real demos, and conversations that matter.

Material from the first [un]prompted!

YouTube videos are out. NotebookLM with all conference data can be found here.
To hear when the next [un]prompted is happening, join our Slack.
– Gadi Evron, CFP Chair, [un]prompted and CEO, Knostic.

Agenda:

Evening events:

 

Stage 1
March 4, 2026 | Full conference day

08:30 – 09:00 Gathering & Mingling
09:00 – 09:10

Opening Words

Gadi Evron, CEO, Knostic. CFP and Committee Chair, [un]prompted.
09:10 – 09:35

200 Bugs/Week/Engineer: How We Rebuilt Trail of Bits Around AI

Dan Guido, CEO, Trail of Bits
AI isn’t a feature you “adopt.” It is a force that commoditizes effort and shortens the half-life of best practices, especially in security work where trust, evidence, and privacy are non-negotiable. In this talk, I’ll explain the strategy I’m using to turn Trail of Bits into an AI-native consulting firm. The core idea is a compounding operating system built from incentives, defaults, guardrails, and verification loops that let humans and autonomous agents ship high-rigor work at dramatically higher throughput.
You’ll see the concrete artifacts that make this real: internal and external skills repositories, a curated marketplace for third-party skills, opinionated configuration baselines, and sandboxing patterns. Then I’ll cover what changes when AI output scales. Pricing, staffing, and delivery models evolve when discovery becomes abundant.
Finally, I’ll show what’s next in the full vision: building a firm that compounds faster than the ecosystem changes, in a way others can copy as a playbook rather than a vendor pitch.
09:35 – 10:00

8 Minutes to Admin. We Caught It in the Wild. Welcome to VibeHacking

Sergej Epp, CISO, Sysdig
We caught two AI-assisted attack campaigns: an 8-minute AWS escalation from stolen creds to full admin, and EtherRAT, a fileless Node.js implant using Ethereum smart contracts for C2. Neither campaign introduced novel attack primitives. Both compressed known techniques to speeds and scales that break traditional detection models. This talk dissects both operations from primary forensic evidence, introduces a behavioral methodology for attributing AI-assistance when proof is impossible, and shows why blockchain C2, the attacker’s resilience play, is actually the defender’s greatest forensic gift.
10:00 – 10:25

macOS Vulnerability Research: Augmenting Apple’s Source Code and OS Logs with AI Agents

Olivia Gallucci, Security Engineer, Datadog
Have you ever wondered how macOS and iOS work under the hood? While Apple is known for its closed ecosystem, did you know that significant portions of macOS and iOS are open source, including security components? For researchers, learning how to analyze and exploit this open-source code, especially with the help of AI, is a game-changer. This talk walks through how we can operationalize Apple’s partial open-source codebase for offensive security: specifically, through the lens of reverse engineering, fuzzing, and vulnerability discovery. We’ll cover how to integrate generative AI and AI tooling into a workflow for automating the triage of open-source diffs, identification of code changes with high exploit potential, and prioritization of fuzzing targets within the shared macOS/iOS codebase.
10:25 – 10:55 Coffee break
10:55 – 11:20

Promp2Pwn – LLMs Winning at Pwn2Own

Georgi G, Director of Research, Interrupt Labs
We built an agentic AI to hunt bugs for Pwn2Own and it delivered. Among the issues it found was a vulnerability in Samsung’s own AI assistant, Bixby. In this talk, we’ll show how we wired it up, what worked, what didn’t, and why letting machines hunt bugs made Pwn2Own fun again.
11:20 – 11:45

Breaking the Lethal Trifecta (Without Ruining Your Agents)

Andrew Bullen, AI Security Lead, Stripe
Prompt injection remains the elephant in the AI Security room—there’s no deterministic defense, yet the urgency driving AI adoption means many teams feel forced to either accept the risk or hobble their agents with overly restrictive policies. But there’s a third path: containment. In this talk, I’ll walk through the architectural guardrails Stripe adopted to protect our agent platform, showing how you can give agents powerful tools while ensuring minimal damage if prompt injection occurs. I’ll cover strategies for preventing data exfiltration through controlled egress, share UI patterns for human confirmation flows to balance oversight with usability, and demonstrate how to enforce these guardrails at CI-time using tool annotations.
11:45 – 12:10

Building Secure Agentic Systems: Lessons from Daily-Driver Agents

Brooks McMillin, AI Security Researcher & Security Engineer, Dropbox
No polished demos or theoretical architectures – this talk shows what actually breaks when you build agents you use every day. I’ll walk through real patterns from building specialized agents with shared infrastructure: capability bounding to prevent tool abuse, prompt injection detection that required real-world tuning, multi-agent memory isolation failures (and the fix), and OAuth device flow for headless operation. Expect live demos, actual code, and honest discussion of security decisions that worked as well as the ones I had to fix after they broke.
12:10 – 12:25

Rethinking how we evaluate security agents for real-world use

Mudita Khurana, Staff Security Engineer, Airbnb
Security agents are gaining momentum across industry, but the way we evaluate them remains rooted in narrow, outcome-only benchmarks. These evaluations tell us whether an agent produced a correct answer, but not “how” it arrived there or whether that behavior will remain stable once deployed.
In practice, security is not a sequence of isolated tasks. It is a connected, end-to-end workflow that follows a find → confirm exploit → patch → validate loop. Agents that perform well on task-specific benchmarks often fail in these multi-stage settings due to contextual loss and brittle transitions across steps.
This talk introduces a practical, capability-centric framework for evaluating security agents. It emphasizes observability into how agents plan, reason, use tools, and carry context across the security lifecycle, enabling teams to better judge whether an agent is ready for real-world use.
12:25 – 13:30 Lunch break
13:30 – 13:55

Securing Workspace GenAI at Google Speed: Surviving the Perfect Storm

Nicolas Lidzborski, Principal Engineer, Google Workspace Security
GenAI agents are currently navigating a perilous “Perfect Storm” defined by the dangerous intersection of three key vulnerabilities: access to sensitive data, exposure to untrusted content, and the capability to execute external commands. This technical deep dive will unveil the architectural principles and defense strategies utilized to protect Gemini and the Google Workspace ecosystem from this toxic convergence.
Moving beyond mere hypothetical discussions, this session provides a detailed breakdown of real-world attacks, specifically, a vulnerability where an attacker could hijack an agent simply through a calendar invitation. Attendees will acquire practical insights into Google’s rigorous defense-in-depth blueprint, covering advanced prompt injection defenses, strategic chaining policies for sandboxing rogue agent actions, and thorough sanitization techniques for hazardous outputs.
13:55 – 14:20

Operation Pale Fire: How We Red-Teamed Our Own AI Agent

Wes Ring, Block
Josiah Peedikayil
The best defense is a good offense. When we released goose, Block’s open source AI agent, we recognized the need to proactively identify how attackers will attempt to abuse it. Enter: Operation Pale Fire.
14:20 – 14:45

Training BrowseSafe: Lessons from Detecting Prompt Injection in Production Browser Agents

Kyle Polley, Member of Technical Staff, Security, Perplexity
Deploying AI agents that browse the web on behalf of users creates a critical security challenge: how do we prevent malicious websites from hijacking agent behavior through embedded prompt injections? This presentation shares our experience training and deploying BrowseSafe, a defense system now protecting browser agents in production.
We’ll cover the model training pipeline, including how we built BrowseSafe-Bench—a realistic benchmark with attacks embedded in high-entropy HTML pages that mirror actual web content. Our fine-tuned Mixture-of-Experts model (Qwen-30B) achieves F1 scores of ~0.91 while maintaining sub-100ms latency requirements for production deployment. The training process revealed key insights: attacks using linguistic camouflage, multilingual instructions, and visible text placement proved most challenging to detect, while traditional academic benchmarks significantly overestimate real-world detection accuracy.
More importantly, we’ll discuss what we’ve observed in the wild since deployment. Real-world attack patterns, adversarial evolution, false positive challenges in diverse web content, and the data flywheel approach that continuously improves the model through production feedback all provide lessons for building robust security in agentic systems. This talk offers practical insights for security teams deploying AI agents that interact with untrusted web content at scale.
14:45 – 15:05 Coffee break
15:05 – 15:30

Exploring the AI Automation Boundary for Threat Hunting at Datadog

Arthi Nagarajan, Software Engineer for Internal Threat Detection, Datadog
Modern threat hunting isn’t limited by a lack of telemetry—it’s limited by humans’ ability to quickly navigate overwhelming amounts of it. At Datadog, we explored how AI can help security practitioners work across massive volumes of telemetry with diverse schemas. We automated three parts of the threat hunting workflow: hypothesis-driven query generation, iterative refinement, and narrowing toward pivotal evidence.
In this talk, we share the pitfalls and wins of our journey evolving a single agent into an orchestrator-subagent system. We focus on our learnings about trust, hallucinations, and evaluations amidst real-world constraints and tradeoffs that formed our definition of the automation boundary: Where AI accelerates defensive work, where it creates new risk, and the design decisions that establish trust with threat hunters.
15:30 – 15:55

Detection & Deception Engineering in the Matrix

Bob Rudis, V.P. Data Science, Security Research, & Detection+Deception Engineering, GreyNoise Labs
Glenn Thorpe, Sr. Director, Security Research & Detection Engineering, GreyNoise Intelligence
GreyNoise built an AI agent — Orbie — that operates on internet-scale honeypot data to surface emergent threats, identify campaigns, and write detection rules. We’re sharing what works, what doesn’t, and the specific campaigns we caught that traditional methods missed. You’ll see how domain expert knowledge embedded in tooling lets LLMs operate on billions of network sessions, and why that matters more than the model you choose.
15:55 – 17:00 Mingling & Something sweet

Stage 2
March 4, 2026 | Full conference day

Stage 2 opens at 9:10
09:10 – 09:35

Total Recon: How We Discovered 1000s of Open Agents in the Wild

Avishai Efrat, Senior Security Researcher, Zenity
Roey Ben Chaim, Staff Engineer, Zenity
AI agents quietly created a new external attack surface: copilots, custom agents, AI middleware and various deployments that ship to the internet – often without anyone realizing they are reachable, enumerable, or over-permissioned. In this talk we’ll show how attackers can already find your agents in the wild, shedding light on the technical details that enable this kind of malicious activity – including how we used these details to find 1000s of exposed agents. We’ll then explain how to measure exposure, show proof that relying on obscurity fails, and cover how to detect threat-actor agent-focused recon before it turns into an impactful attack. We’ll cap it all off by dropping PowerPwn – a recon tool you can use to test your own exposure.
09:35 – 10:00

Your Agent Works for Me Now

Johann Rehberger, Red Team Director
Agentic AI used in personal assistants, developer tools, and enterprise platforms can be infected with promptware, engineered prompts that act like malware.
This talk demonstrates attacks and exploit chains, including delayed tool invocation and intent activation tricks that bypass existing mitigations. Attacks enable persistence, lateral movement across agentic systems, promptware-powered C2, and data exfiltration.
Several of the exploit demos have not been publicly disclosed before, including attacks against Gemini, Copilot and others.
Many of these issues are not edge cases or unknown problems. Even where simple fixes exist, new and more powerful AI systems keep reintroducing known vulnerability classes while increasing scale and blast radius at the same time. By shipping agents with insecure defaults, responsibility is pushed onto end users.
10:00 – 10:25

Capability-Based Authorization for AI Agents: Warrants That Survive Prompt Injection

Niki Aimable Niyikiza, Senior Security Engineer & AI Security Researcher, Snap
Prompt injection filters and coarse IAM roles consistently fail in multi-agent setups.
I’ll show a working alternative: treating agent authority as ephemeral, cryptographic warrants that attenuate on delegation (inspired by Macaroons/UCAN), task-scoped, holder-bound, and verified offline by tools in microseconds. Even a fully compromised agent can’t escalate or exfiltrate beyond its bounds.
Live demos in LangChain/LangGraph multi-agent workflows, benchmarks against adaptive injection/escalation attacks, and an honest look at remaining gaps (e.g., constraints that require runtime context).
Audience Takeaways:
1) Why identity-based authorization fails for AI agents
2) How capability tokens bound blast radius without blocking legitimate use
3) Practical patterns for delegation in multi-agent systems
10:25 – 10:55 Coffee break
10:55 – 11:20

Injecting Security Context During Vibe Coding

Srajan Gupta, Senior Security Engineer, Dave
Vibe coding with AI tools like Cursor is fast, but it quietly bypasses traditional AppSec controls. In this talk, we demo an MCP server that injects security context directly into the AI coding loop. Before code is generated, it pulls threat models, security requirements, and OWASP guidance for your task. After generation, it checks the output for vulnerabilities and verifies that it meets the security standards.
11:20 – 11:45

Source to Sink: How to Improve LLM First-Party Vuln Discovery

Scott Behrens, Principal Security Engineer, Netflix
Justice Cassel, Application & GenAI Security, Netflix
We got tired of LLMs crying wolf about every string concatenation, so we built an agentic pipeline that thinks before it screams. This talk explores how to improve the accuracy and actionability of LLM-driven first-party vulnerability discovery in real-world codebases. If you’ve ever mass-closed 200 AI-generated “findings,” this talk is your therapy session.
11:45 – 12:10

The Parseltongue Protocol: A Deep Dive into 100+ Textual Obfuscation Methods

Joey Melo, AI Red Teaming Specialist, CrowdStrike
Large Language Models are designed with robust multilingual and multi-encoding support, but this versatility creates a new security vulnerability. This talk presents the results of a systematic empirical study in which 100+ encoding and encryption techniques were used against 9 leading AI models with over 17,000 malicious prompts, revealing significant gaps in current AI safety systems. Attendees will gain critical insights into the evolving prompt injection attack surface and learn which encoding mechanisms pose the greatest threat to LLM security.
12:10 – 12:25

Why Most ML Vulnerability Detection Fails (And What Actually Worked for Kernel Bugs)

Jenny Guanni Qu, AI Researcher, Pebblebed
We tried the obvious approaches to ML-based vulnerability detection. Most failed. This talk covers the counterintuitive lessons from training on 125K Linux kernel commits: why “hard negatives” hurt performance, why subsystem boundaries are where bugs hide, and why the average kernel security bug survives 2.1 years undetected. Practical takeaways for anyone building vuln discovery systems.
12:25 – 13:30 Lunch break
13:30 – 13:55

1.8M Prompts, 30 Alerts: Hunting Abuse in a User-Defined Agent Ecosystem

Matt Rittinghouse, Lead Security Data Scientist, Salesforce
Millie Huang, Staff Security Data Scientist, Salesforce
How do you secure 12,000 autonomous agents when anyone can build one? Static rules alone can’t catch abuse in a user-defined ecosystem without drowning your SOC in noise. Join us at the front lines of real, productionized Agentforce defense, where we process millions of daily prompts across thousands of organizations. We’ll show you how we created meaningful and contextual behavioral baselines like Asset Rarity and Query Complexity, distilling a flood of unpredictable activity into fewer than 30 high-fidelity daily alerts.
13:55 – 14:20

AI Security with Guarantees

Ilia Shumailov, CEO, AI Sequrity Company
In this talk I will describe how one can run modern AI agents in a way that comes with security guarantees, even for the most complex setups such as computer use.
14:20 – 14:45

From OSINT Chaos to Knowledge Graph: Building Production-Scale AI-Powered Threat Intelligence

Dongdong Sun, Senior Staff Machine Learning Engineer, Palo Alto Networks
How do you turn millions of unstructured threat reports into a queryable knowledge graph? This talk walks through a production AI pipeline that extracts threats and relationships from raw OSINT data—and the architectural decisions that make it actually work at scale.
14:45 – 15:05 Coffee break
15:05 – 15:30

Beyond the Chatbot: Delivering an Agentic SOC for Real-World Defense

Peter Smith, Director, Agentic SOC Product Management, Salesforce
Ravi Kiran Sharma (RK), Lead Security Engineer, Salesforce
Moving beyond the “copilot” era of simple Q&A, the next frontier in security operations is the Agentic SOC—a system where autonomous agents plan, reason, and act. But building this requires moving away from monolithic “black box” models toward a Polyphonic (Supervisor-Worker) architecture.
15:30 – 15:55

Are Your LLM’s Safety Mechanisms Intact? Detecting Backdoors with White-Box Analysis

Akash Mukherje, Cofounder, Realm Labs
These approaches implicitly assume that correct behavior implies intact safety mechanisms. In this talk, I’ll show why that assumption can fail.
I’ll present hands-on experiments exploring a class of LLM backdoors that selectively weaken refusal behavior while continuing to appear compliant under standard evaluations. Instead of relying on black-box judgments, this work uses a white-box analysis approach: first identifying internal signals associated with refusal behavior, then examining how those signals change when a model is backdoored and triggered. The key observation is that safety can degrade internally even when outputs still look acceptable, making output-only testing insufficient for these threats.
The talk focuses on what this means for practitioners building and operating secure AI systems. I’ll discuss how white-box analysis can provide more transparent safety signals, where it fits in the AI/ML lifecycle (e.g., pre-deployment checks or model upgrades), and how it complements existing benchmarks and red-teaming. I’ll also cover the technique’s practical limitations and other possible uses.
Attendees should leave with a concrete understanding of how backdoors can target safety mechanisms themselves, why black-box evaluations can miss these failures, and how white-box analysis can improve transparency when assessing the integrity of LLM safety behavior.
15:55 – 17:00 Mingling & Something sweet