[un]prompted

The AI Security Practitioner Conference

March 3-4, The Hibernia, San Francisco

Whether you’re a CISO Excel jockey or a researcher sniffing for the scent of bits, we see you as part of our wider AI security practitioner community.

[un]prompted is an intimate, raw, and fun gathering for the professionals actually doing the work, from offense to threat hunting to program building to national policy. No fluff. No filler. Just sharp talks, real demos, and conversations that matter.

March 3-4, The Hibernia, San Francisco.

Agenda is Out!

This was a complex selection process, with almost 500 talks submitted – many of them in the final two days. But we got it done, and the agenda has now been released. The agenda is subject to change. Thank you for your patience as we worked through the process.
– Gadi Evron, CFP Chair, [un]prompted and CEO, Knostic.

Agenda:

Evening events:


Stage 1
March 3, 2026 | Full conference day

08:30 – 09:00 Gathering & Mingling
09:00 – 09:10

Opening Words – “Research conferences aren’t effective.”

Gadi Evron, CEO, Knostic. CFP Chair, [un]prompted
A presentation originally given by Joe Stewart at ACoD, many years ago.
Some of us are introverts, and even if we’re not, it’s difficult to know who in the crowd we should speak with. Who can help us with what we need? Who can we help?
Beyond random encounters with 3-12 people, how do we make interactions effective?
We have a plan.
09:10 – 09:20 Move between rooms
09:20 – 09:35

Evaluating Threats & Automating Defense: How Google is Advancing Code Security

Heather Adkins, VP of Security Engineering, Google
Four Flynn, VP Security and Privacy, Google Deepmind
Our discussion will focus on advancing code security: we’ll provide a comprehensive overview of Google’s AI security strategy, show how we evaluate emerging cyberattack capabilities, and demonstrate how tools like CodeMender are helping build intrinsically safer software.
09:35 – 10:00

The Hard Part Isn’t Building the Agent: On Measuring Agent Effectiveness to Improve It

Joshua Saxe, AI Security Technical Lead, Meta
As AI coding tools drive the cost of building security agents toward zero, the hard problem becomes knowing whether they’ll actually work in the wild against real attacks and vulnerabilities we haven’t seen before. This talk shares a practical journey from naive precision/recall metrics on old data toward multi-dimensional evaluation that captures reasoning quality, evidence gathering, and tool-calling logic – and shows how proper measurement unlocks automated agent improvement using genetic algorithms and AI coding tools. Live demo included.
10:00 – 10:25

Security Guidance as a Service: Building an AI-Native Blueprint for Defensive Security

Shruti Datta Gupta, Product Security Engineer, Adobe
Chandrani Mukherjee, Product Security Engineer, Adobe
Providing consistent security guidance at scale is hard, especially in AI-first environments. This session explores how we built an AI-Native Security Guidance as a Service that centralizes security knowledge and powers multiple defensive AI capabilities with consistent, evaluated and bespoke guidance.
10:25 – 10:45 Coffee break
10:45 – 11:10

Guardrails beyond Vibes: Shipping Security Agents in Production

Jeffrey Zhang, Security Engineer, Stripe
Siddh Shah, Software Engineer, Stripe
In this talk, we’ll share how Stripe is using AI agents to streamline two high-friction security workflows: threat modeling and security request routing. We’ll cover the practical design choices that made these agents reliable in practice – modular orchestrator/child architectures, targeted tools, structured inputs/outputs, and validation to reduce variance and improve determinism. We’ll also walk through how we measure and improve agent quality over time using offline and online evaluation loops, including how we handle subjective outputs in threat modeling versus higher-signal feedback in routing. The session closes with concrete lessons on what worked and what didn’t when automating security workflows without losing user trust.
11:10 – 11:35

Code Is Free: Securing Software in the Agentic Future

Paul McMillan, Security Engineer, OpenAI
Ryan Lopopolo, Member of Technical Staff, OpenAI
If you have a perfect software security program, this talk is not for you. For everyone else, join us in an AI-maximalist vision of a future you can implement today. Your engineers are using LLMs to write your code, why aren’t they using them for security? We’ll talk about engineering-first ways to improve the security of your projects with zero-friction additions. Want a new security invariant? Just ask the model—Code is Free.
11:35 – 12:00

AI Agents for Exploiting “Auth-by-One” Errors

Brendan Dolan-Gavitt, AI Researcher, XBOW
Vincent Olesen, AI Researcher, XBOW
Modern web applications support a dizzying array of mechanisms to authenticate users and determine whether they are authorized to access application resources. Unfortunately, these mechanisms are largely bespoke, and finding vulnerabilities in such systems has traditionally been the domain of human researchers.
In this talk, we will present techniques for finding—and, importantly, validating—access control flaws using AI agents. Starting with strict validators that can identify when we have successfully logged in to an account (for AuthN validation) and (for AuthZ validation) when we can access a protected resource, our key insight is that these validators allow us to build capable attack agents for exploiting auth vulnerabilities. We will demo these techniques by showing real-world examples of exploits we have discovered in production systems.
12:00 – 12:25

Developing & Deploying AI Fingerprints for Advanced Threat Detection

Natalie Isak, Software Engineer, Microsoft
Waris Gill, Applied Scientist, Microsoft
As LLM-powered services proliferate, so do prompt injection attacks, but privacy regulations prevent sharing raw threat data across organizational boundaries. This talk introduces BinaryShield, a privacy-preserving fingerprinting system that enables cross-service threat intelligence without exposing sensitive user prompts. We’ll cover the research behind the approach (arXiv:2509.05608) and share practical deployment applications (including a demo!) for threat intelligence.
Research Paper Link: Cross-Service Threat Intelligence in LLM Services using Privacy-Preserving Fingerprints
12:25 – 13:30 Lunch break
13:30 – 13:55

When Passports Execute: Exploiting AI-Driven KYC Pipelines

Sean Park, Principal Threat Researcher, TrendAI
Modern KYC workflows increasingly delegate passport parsing, database writes, and customer verification to AI-driven extraction agents. This workflow is assumed to be safe because it is “just extraction,” tightly scoped by schema, and wrapped in compliance controls. In practice, it is an execution environment. We show how document-embedded injects and compliance controls together steer AI agents into cross-record reads and writes, enabling data theft and exfiltration without bypassing access controls.
This research goes beyond a one-off agent or MCP exploit. We present a scalable exploitation approach that generalizes across KYC extraction agents, using LLM-generated, high-success payloads and validating the attack with a tool-using Claude Code extraction agent. A document-embedded inject can steer the agent, while regulatory verification workflows complete the exploit chain.
13:55 – 14:20

FENRIR: AI Hunting for AI Zero-Days at Scale

Peter Girnus, Senior Threat Researcher, TrendAI
Derek Chen, Vulnerability Researcher, TrendAI
Academic research shows LLM-assisted vulnerability discovery works—IRIS achieves 2.5x improvement over CodeQL, Google’s Big Sleep found a critical SQLite zero-day. But can it work at production scale? FENRIR has discovered 100+ vulnerabilities across AI infrastructure since mid-2025, with 21 CVEs patched including multiple CVSS 9.8 RCEs. This talk presents FENRIR’s multi-stage verification pipeline: static analysis pre-triage, two-layer LLM validation (L1 prune → L2 deep-verify), and confidence-based human routing. We’ll cover what worked (research-backed context generation, CWE-specific agents, pattern recognition for bypass detection), what failed (pure automation’s false positives, generic prompts, insufficient context), and the hybrid model that emerged. Live demo: FENRIR analyzing AI framework code and surfacing candidates for human triage.
14:20 – 14:35

AI Notetakers: The Most Important Person in the Room

Joe Sullivan, CEO, Ukraine Friends and Joe Sullivan Security
The most important attendee in your meetings isn’t a person anymore. It’s the AI notetaker. This system assigns action items, determines what was important, and creates the official record. When facts need revisiting, its summary is treated as impartial evidence.
This talk covers four areas:
Steering: Techniques for influencing what the notetaker captures. Call it manipulation or strategic communication, the methods work and people are already using them.
Risk: The governance gap when notetakers become infrastructure. Shadow deployments, vendor fragility, consent liability, discovery exposure.
Opportunity: A reliable system of record for incident response.
Framework: Enterprise readiness spanning policies, program building, and the full meeting lifecycle.
14:35 – 14:55 Coffee break
14:55 – 15:20

AI go Beep Boop!

Adam Laurie (Major Malfunction), Hardware Hacker turned CISO, Alpitronic
Hardware hacking with AI at the controls. Literally. I gave Claude my hardware lab: Laptop, USB hub, XYZ platform, PICO2, Jlink-pro, Oscilloscope, Chipshouter and some targets. Within 7 minutes it had pwned an LPC chip I had failed to glitch for 6 weeks solid. Within a month it had rewritten my entire glitching platform and now while I sleep it hacks new targets and integrates other solutions and attacks.
15:20 – 15:45

Zeal of the Convert: Taming Shai-Hulud with AI

Rami McCarthy, Principal Security Researcher, Wiz
2025 was the year of Shai-Hulud: a series of attacks leaking massive amounts of victim data onto GitHub, ungraciously scheduled for whenever I was traveling. As a responder, these internet-scale incidents were a real-world lab for evolving AI capabilities. This talk is a raw post-mortem of moving from simple “vibe-coded” scrapers to multi-agent triage engines that parallelize victimology and automate secret-impact analysis. Demos will drive a conversation on what actually worked, where the ground has shifted, and how “lazy” AI will let you down. Walk away with prompts, scripts, skills, and lessons from my scars.
15:45 – 16:10

Anatomy of an Agentic Personal AI Infrastructure

Daniel Miessler, Founder, Unsupervised Learning
A deep dive into my Personal AI Infrastructure system, and the open-source project that mirrors it.
16:10 – 16:35

Black-hat LLMs

Nicholas Carlini, Research Scientist, Anthropic
Large language models are now capable of automating attacks that were previously only possible for human adversaries. In this talk, I discuss several ways that adversaries could misuse current models to cause harm at both a larger scale and a lower cost than they do today. For example, we find that recent state-of-the-art models can now find 0-day vulnerabilities in large software projects that have been extensively tested by humans for decades. These new capabilities will alter the threat landscape and require that we rethink security in the coming years.
16:35 – 17:00

Vibe Check: Security Failures in AI-Assisted IDEs

Piotr Ryciak, AI Red Teamer, Mindgard
AI IDEs and coding agents expand the practical attack surface of development workflows by introducing new paths from untrusted workspace inputs to high-impact actions. This talk presents a catalog of exploitation patterns derived from vulnerability research across major AI-assisted IDEs and agents, including OpenAI Codex, Amazon Kiro, Google Antigravity, Cursor, and others, with a mix of issues already patched and others in active remediation. We organize findings by attacker effort and trigger model: zero-click paths, one-click paths, autorun behavior, and time-delayed execution. The talk is demo-driven and then generalizes beyond the demos to a repeatable playbook and checklist that security teams and builders can apply to assess and harden any AI-assisted IDE deployment.
17:00 – 18:00 Mingling & Something sweet

Stage 2
March 3, 2026 | Full conference day

Stage 2 opens at 9:35
09:35 – 10:00

Establishing AI Governance Without Stifling Innovation: Lessons Learned

Billy Norwood, CISO, FFF Enterprises
Strategy and implementation of a risk-based AI governance committee in a healthcare services firm and our successes and failures along the way.
10:00 – 10:25

Enterprise AI Governance at Snowflake: Balancing Innovation and Risk

Ragini Ramalingam, Director, Snowflake
As generative AI technologies continue to evolve, organizations are working to thoughtfully balance innovation with appropriate governance. In this session, Ragini Ramalingam, Director of Enterprise Security at Snowflake, shares perspectives on supporting responsible AI adoption within a large, dynamic enterprise environment. She will discuss practical approaches to establishing governance frameworks, fostering cross-functional collaboration, and embedding security considerations into emerging technologies—helping organizations enable innovation in a structured, risk-aware manner.
10:25 – 10:45 Coffee break
10:45 – 11:10

Three Phases of AI Adoption: From GPU Lottery to Enterprise Agreements

Chase Hasbrouck, Chief of Forensics/Malware Analysis, U.S. Army Cyber Command
The Army’s path to enterprise AI shows a pattern every organization will face: deployment constraints shape adoption more than security policies. In 2023, fragmented research previews meant high innovation but no institutional knowledge. In 2024, centralized solutions with token budgets killed experimentation. Power users burned through monthly allocations in one or two queries, exactly the people you most want to encourage. In 2025, enterprise agreements removed cost barriers, but now we’re grappling with cultural change: convincing people the tool is actually usable, then dealing with downstream implications when they believe us. As an early power user applying AI to incident response and forensics in Army Cyber, I helped my organization navigate each phase, and can share my lessons learned. (Disclaimer: Personal experience only, not official Army positions.)
11:10 – 11:35

SIFT – FIND EVIL!! I Gave Claude Code R00t on the DFIR SIFT Workstation

Rob T. Lee, Chief AI Officer (CAIO), Chief of Research, SANS Institute
Sounds reckless. Turns out it’s less reckless than letting state actors be the only ones with agentic AI. Anthropic’s GTG-1002 report showed adversaries running Claude Code at 80-90% autonomous execution. Your adversary has an AI. You have tab-completion. I wired the same tool into SIFT via Model Context Protocol—timeline generation, memory analysis, malware sweeps, all via natural language. By the end, you’ll see me type “SIFT!! Find Evil!” and watch it actually work. Mostly. This is what 40+ hours of testing taught me.
11:35 – 12:00

“Can You See What Your AI Saw?”: GenAI Endpoint Observability for Detection Engineers

Mika Ayenson, Threat Research & Detection Engineer, Elastic
As GenAI coding assistants become standard developer tools, detection engineers face a new challenge: understanding what happens when AI executes commands on behalf of users. This talk explores the current state of GenAI endpoint observability from a practitioner’s perspective, what telemetry exists today, where the gaps are, and why the industry needs standardized schemas for AI activity. Through real queries and telemetry examples, we’ll walk through techniques for correlating AI-spawned processes across multi-level ancestry chains, discuss blind spots that surprised us during testing, and make the case for extending and adopting OpenTelemetry semantic conventions to cover GenAI tool activity on endpoints.
12:00 – 12:25

Detecting GenAI Threats at Scale with YARA-Like Semantic Rules

Mohamed Nabeel, Sr Principal Researcher, Palo Alto Networks
Traditional YARA rules revolutionized malware hunting, but they fail against semantic GenAI threats like prompt injection, brand impersonation, and disinformation campaigns. SYARA (Super YARA) extends YARA’s beloved syntax with multi-modal semantic detection—combining string matching, embeddings, ML classifiers, and LLMs in a single rule. In this hands-on session, you’ll learn to hunt GenAI-era threats including direct/indirect prompt injection, phishing detection using perceptual hashes, malicious intent identification, and disinformation detection. We’ll demonstrate why semantic detection at scale requires efficient layered approaches rather than expensive LLM-only solutions, achieving 98% detection rates at <100ms latency and $0.001/query—orders of magnitude faster and cheaper than LLM-based approaches.
12:25 – 13:30 Lunch break
13:30 – 13:55

The Advent of Confidential AI

Raghu Yeluri, Fellow and lead architect, Confidential AI
Confidential AI is a hardware-based security approach that protects sensitive data and AI models during active processing by keeping information encrypted even while being computed on, extending beyond traditional encryption that only secures data at rest or in transit.
The technology relies on Trusted Execution Environments (TEEs) – secure hardware enclaves within processors (CPUs, GPUs, Accelerators) that decrypt data only within isolated spaces invisible to operating systems, cloud providers, or administrators. Along with remote attestation, this approach protects inferencing data, prompts and context info, thus enabling the deployment of enterprise critical applications in public cloud and hybrid cloud environments.
This talk will cover the technology components available for Confidential AI, along with real-world deployments and two example use-cases that would be of interest to other practitioners.
13:55 – 14:20

Tenderizing the Target: Soaking Code in Synthetic Vulnerabilities

Aaron Grattafiori, Principal Offensive AI Security Researcher, NVIDIA
Skyler Bingham, Principal Applied Researcher, NVIDIA
Marinade is an agentic workflow we built to solve a fundamental problem in security testing: getting realistic vulnerable applications that aren’t contrived CTF challenges or overused training targets like DVWA. The idea is to point it at some source code—Django, Spring Boot, Java, Rails, whatever—and it works to analyze the codebase, understand the attack surface, and inject realistic, exploitable vulnerabilities that blend naturally into the existing code while preserving functionality. We’ve found that AI is surprisingly adept at weakening security controls rather than clumsily removing them, producing bugs that look like genuine developer mistakes in a given programming language or app, and each injected vulnerability ships with a validation script proving exploitability to avoid false positives. Marinade lets you generate a large-scale synthetic corpus of vulnerable applications from real-world, production-quality codebases, opening up new possibilities for scanner evaluation, red team training, and security tool benchmarking.
14:20 – 14:35

Hooking Coding Agents with the Cedar Policy Language

Matt Maisel, CTO and Cofounder, Sondera
Coding agents wield dangerous access to your code and terminal, and prompt injection renders soft guardrails useless. This talk demonstrates a reference monitor using Rust hooks and Cedar policies to deterministically intercept every shell command, file read, and other actions. We’ll live demo forbidding exfiltration and destructive behaviors, leaving you with an open-source tool compatible with Cursor, Claude Code, and GitHub Copilot CLI.
14:35 – 14:55 Coffee break
14:55 – 15:20

Glass-Box Security: Operationalizing Mechanistic Interpretability for Defending AI Agents

Carl Hurd, Co-Founder & CTO, Starseer
Perimeter defenses are failing against the next generation of AI agents. This talk introduces “Glass-Box Security,” a paradigm shift that utilizes Mechanistic Interpretability and Latent Space Geometry to monitor a model’s internal state for malicious intent and data exfiltration. We will explore why true observability requires a return to self-hosted infrastructure and present the Starseer architecture—a technical reference for building an “Internal EDR.” Attendees will learn to replace fragile regex filters with “semantic tripwires” that detect deception and code leakage at the neuron level, long before the model generates output.
15:20 – 15:45

The AI Security Larsen Effect: How to Stop the Feedback Loop

Maxim Kovalsky, Managing Director, AI Security CoE, Consortium Networks
The AI security market has 60+ vendors, and your VAR just sent 15 one-pagers. OWASP tells you what can go wrong. NIST tells you how to govern. Neither tells you which risks actually matter for YOUR architecture or HOW to implement controls given your existing stack. This talk introduces a capability-based framework that zeros in on the risks that are actually relevant, helps you decide how to address them (configure what you own, buy something new, or build it yourself), and—as a consequence—produces a rational vendor shortlist instead of analysis paralysis. Live demo with a realistic scenario: agentic healthcare chatbot, PHI data, existing Azure and CrowdStrike stack. We’ll go from “we need AI security” to implementation clarity in under 20 minutes.
15:45 – 16:10

Kinetic Risk: Securing and Governing Physical AI in the Wild

Padma Apparao, Architecting AI solutions, Intel
When AI leaves the screen and enters the physical world, failure shifts from misinformation to kinetic damage. Physical AI is fundamentally different from traditional AI: while performance and throughput dominate system design, the potential for physical harm means security, risk, and governance must be built in from the start. This talk explains why Vision-Language-Action (VLA) models powering robotics and autonomous machines require system-level thinking beyond model accuracy. We examine VLA-specific security risks such as sensor spoofing and embodied instruction manipulation that can lead to unsafe physical actions. The talk also explores why existing governance frameworks like the EU AI Act and NIST AI RMF fall short for adaptive, non-deterministic AI systems operating in dynamic, real-world environments. Finally, we address the organizational friction between engineering, safety, and risk teams as Physical AI scales into production. Real-world examples are used throughout to illustrate performance, security, governance, and organizational challenges.
The audience will leave with practical reference architecture ideas, recommendations for evolving governance frameworks, and actionable guidance for securing physical AI implementations, all framed around a “safety-first” mindset where innovation leads even without “Ctrl-Z”.
16:10 – 16:35

Trajectory-Aware Post-Training of Open-Weight Models for Security Agents

Aaron Brown, Agentic AI Builder, AWS
Madhur Prashant, Applied AI/ML Engineer, AWS
Everyone talks about AI agents for security, but almost no one talks about how to post-train the underlying open-weight models that power them. Frontier APIs work for prototypes, but scaling autonomous security operations requires fine-tuned small language models optimized for your specific tooling, reasoning patterns, and operational constraints. This talk presents a complete open-source pipeline for trajectory-aware post-training of open-weight SLMs for cybersecurity tasks covering environment setup, data collection and refinement, reward function design, and a two-stage SFT to GRPO training recipe running on NVIDIA DGX Spark. We’ll release training configs, the evaluation harness, and fine-tuned GLM-4.7 30B Flash weights on HuggingFace.
16:35 – 17:00

AI Found 12 Zero-Days In OpenSSL. What Does It Mean For The Industry?

Adam Krivka, AI Security Researcher, AISLE
Ondrej Vlcek, Co-founder & CEO, AISLE
OpenSSL is one of the most audited codebases on the planet. Its January 2026 security update fixed 12 vulnerabilities — all of which were found and reported by our AI system. Three had been hiding in the codebase for over two decades. In parallel, we’ve identified hundreds of other vulnerabilities across critical infrastructure projects like curl, the Linux kernel, and wolfSSL.
AI has fundamentally changed the economics of vulnerability discovery. What once required elite expertise and months of manual auditing can now be done in hours. Exploits can be engineered by autonomous agents. The cost of offensive capability is rapidly shrinking.
This talk explores what it takes to make AI vulnerability discovery production-grade — and why organizations that don’t adopt these systems to defend their software will be outpaced by adversaries who do.
17:00 – 18:00 Mingling & Something sweet

Stage 1
March 4, 2026 | Full conference day

08:30 – 09:00 Gathering & Mingling
09:00 – 09:10

Opening Words

Gadi Evron, CEO, Knostic. CFP and Committee Chair, [un]prompted.
09:10 – 09:35

200 Bugs/Week/Engineer: How We Rebuilt Trail of Bits Around AI

Dan Guido, CEO, Trail of Bits
AI isn’t a feature you “adopt.” It is a force that commoditizes effort and shortens the half-life of best practices, especially in security work where trust, evidence, and privacy are non-negotiable. In this talk, I’ll explain the strategy I’m using to turn Trail of Bits into an AI-native consulting firm. The core idea is a compounding operating system built from incentives, defaults, guardrails, and verification loops that let humans and autonomous agents ship high-rigor work at dramatically higher throughput.
You’ll see the concrete artifacts that make this real: internal and external skills repositories, a curated marketplace for third-party skills, opinionated configuration baselines, and sandboxing patterns. Then I’ll cover what changes when AI output scales. Pricing, staffing, and delivery models evolve when discovery becomes abundant.
Finally, I’ll show what’s next in the full vision. It is to build a firm that compounds faster than the ecosystem changes, and to do it in a way others can copy as a playbook rather than a vendor pitch.
09:35 – 10:00

8 Minutes to Admin. We Caught It in the Wild. Welcome to VibeHacking

Sergej Epp, CISO, Sysdig
We caught two AI-assisted attack campaigns—an 8-minute AWS escalation from stolen creds to full admin, and EtherRAT, a fileless Node.js implant using Ethereum smart contracts for C2. Neither campaign introduced novel attack primitives. Both compressed known techniques to speeds and scales that break traditional detection models. This talk dissects both operations from primary forensic evidence, introduces a behavioral methodology for attributing AI assistance when proof is impossible, and shows why blockchain C2—the attacker’s resilience play—is actually the defender’s greatest forensic gift.
10:00 – 10:25

macOS Vulnerability Research: Augmenting Apple’s Source Code and OS Logs with AI Agents

Olivia Gallucci, Security Engineer, Datadog
Have you ever wondered how macOS and iOS work under the hood? While Apple is known for its closed ecosystem, did you know that significant portions of macOS and iOS are open source, including security components? For researchers, learning how to analyze and exploit this open-source code, especially with the help of AI, is a game-changer. This talk walks through how we can operationalize Apple’s partial open-source codebase for offensive security: specifically, through the lens of reverse engineering, fuzzing, and vulnerability discovery. We’ll cover how to integrate generative AI and AI tooling into a workflow for automating the triage of open-source diffs, identification of code changes with high exploit potential, and prioritization of fuzzing targets within the shared macOS/iOS codebase.
10:25 – 10:55 Coffee break
10:55 – 11:20

Prompt2Pwn – LLMs Winning at Pwn2Own

Georgi G, Director of Research, Interrupt Labs
We built an agentic AI to hunt bugs for Pwn2Own and it delivered. Among the issues it found was a vulnerability in Samsung’s own AI assistant, Bixby. In this talk, we’ll show how we wired it up, what worked, what didn’t, and why letting machines hunt bugs made Pwn2Own fun again.
11:20 – 11:45

Breaking the Lethal Trifecta (Without Ruining Your Agents)

Andrew Bullen, AI Security Lead, Stripe
Prompt injection remains the elephant in the AI Security room—there’s no deterministic defense, yet the urgency driving AI adoption means many teams feel forced to either accept the risk or hobble their agents with overly restrictive policies. But there’s a third path: containment. In this talk, I’ll walk through the architectural guardrails Stripe adopted to protect our agent platform, showing how you can give agents powerful tools while ensuring minimal damage if prompt injection occurs. I’ll cover strategies for preventing data exfiltration through controlled egress, share UI patterns for human confirmation flows to balance oversight with usability, and demonstrate how to enforce these guardrails at CI-time using tool annotations.
11:45 – 12:10

Building Secure Agentic Systems: Lessons from Daily-Driver Agents

Brooks McMillin, AI Security Researcher & Security Engineer, Dropbox
No polished demos or theoretical architectures – this talk shows what actually breaks when you build agents you use every day. I’ll walk through real patterns from building specialized agents with shared infrastructure: capability bounding to prevent tool abuse, prompt injection detection that required real-world tuning, multi-agent memory isolation failures (and the fix), and OAuth device flow for headless operation. Expect live demos, actual code, and honest discussion of security decisions that worked as well as the ones I had to fix after they broke.
12:10 – 12:25

Rethinking how we evaluate security agents for real-world use

Mudita Khurana, Staff Security Engineer, Airbnb
Security agents are gaining momentum across industry, but the way we evaluate them remains rooted in narrow, outcome-only benchmarks. These evaluations tell us whether an agent produced a correct answer, but not “how” it arrived there or whether that behavior will remain stable once deployed.
In practice, security is not a sequence of isolated tasks. It is a connected, end-to-end workflow that follows a find → confirm exploit → patch → validate loop. Agents that perform well on task-specific benchmarks often fail in these multi-stage settings due to contextual loss and brittle transitions across steps.
This talk introduces a practical, capability-centric framework for evaluating security agents that emphasizes observability into how agents plan, reason, use tools, and carry context across the security lifecycle, enabling teams to better judge whether an agent is ready for real-world use.
12:25 – 13:30 Lunch break
13:30 – 13:55

Securing Workspace GenAI at Google Speed: Surviving the Perfect Storm

Nicolas Lidzborski, Principal Engineer, Google Workspace Security
GenAI agents are currently navigating a perilous “Perfect Storm” defined by the dangerous intersection of three key vulnerabilities: access to sensitive data, exposure to untrusted content, and the capability to execute external commands. This technical deep dive will unveil the architectural principles and defense strategies utilized to protect Gemini and the Google Workspace ecosystem from this toxic convergence.
Moving beyond mere hypothetical discussions, this session provides a detailed breakdown of real-world attacks, specifically, a vulnerability where an attacker could hijack an agent simply through a calendar invitation. Attendees will acquire practical insights into Google’s rigorous defense-in-depth blueprint, covering advanced prompt injection defenses, strategic chaining policies for sandboxing rogue agent actions, and thorough sanitization techniques for hazardous outputs.
13:55 – 14:20

Operation Pale Fire: How We Red-Teamed Our Own AI Agent

Wes Ring, Block
Josiah Peedikayil
The best defense is a good offense. When we released goose, Block’s open source AI agent, we recognized the need to proactively identify how attackers will attempt to abuse it. Enter: Operation Pale Fire.
14:20 – 14:45

Training BrowseSafe: Lessons from Detecting Prompt Injection in Production Browser Agents

Kyle Polley, Member of Technical Staff, Security, Perplexity
Deploying AI agents that browse the web on behalf of users creates a critical security challenge: how do we prevent malicious websites from hijacking agent behavior through embedded prompt injections? This presentation shares our experience training and deploying BrowseSafe, a defense system now protecting browser agents in production.
We’ll cover the model training pipeline, including how we built BrowseSafe-Bench—a realistic benchmark with attacks embedded in high-entropy HTML pages that mirror actual web content. Our fine-tuned Mixture-of-Experts model (Qwen-30B) achieves F1 scores of ~0.91 while meeting sub-100ms latency requirements for production deployment. The training process revealed key insights: attacks using linguistic camouflage, multilingual instructions, and visible text placement proved most challenging to detect, while traditional academic benchmarks significantly overestimate real-world detection accuracy.
More importantly, we’ll discuss what we’ve observed in the wild since deployment. Real-world attack patterns, adversarial evolution, false positive challenges in diverse web content, and the data flywheel approach that continuously improves the model through production feedback all provide lessons for building robust security in agentic systems. This talk offers practical insights for security teams deploying AI agents that interact with untrusted web content at scale.
14:45 – 15:05 Coffee break
15:05 – 15:30

Exploring the AI Automation Boundary for Threat Hunting at Datadog

Arthi Nagarajan, Software Engineer for Internal Threat Detection, Datadog
Modern threat hunting isn’t limited by a lack of telemetry—it’s limited by humans’ ability to quickly navigate overwhelming amounts of it. At Datadog, we explored how AI can help security practitioners work across massive volumes of telemetry with diverse schemas. We automated three parts of the threat hunting workflow: hypothesis-driven query generation, iterative refinement, and narrowing toward pivotal evidence.
In this talk, we share the pitfalls and wins of our journey evolving a single agent into an orchestrator-subagent system. We focus on our learnings about trust, hallucinations, and evaluations amidst real-world constraints and tradeoffs that formed our definition of the automation boundary: where AI accelerates defensive work, where it creates new risk, and the design decisions that establish trust with threat hunters.
15:30 – 15:55

Detection & Deception Engineering in the Matrix

Bob Rudis, V.P. Data Science, Security Research, & Detection+Deception Engineering, GreyNoise Labs
Glenn Thorpe, Sr. Director, Security Research & Detection Engineering, GreyNoise Intelligence
GreyNoise built an AI agent — Orbie — that operates on internet-scale honeypot data to surface emergent threats, identify campaigns, and write detection rules. We’re sharing what works, what doesn’t, and the specific campaigns we caught that traditional methods missed. You’ll see how domain expert knowledge embedded in tooling lets LLMs operate on billions of network sessions, and why that matters more than the model you choose.
15:55 – 17:00 Mingling & Something sweet

Stage 2
March 4, 2026 | Full conference day

Stage 2 opens at 9:10
09:10 – 09:35

Total Recon: How We Discovered 1000s of Open Agents in the Wild

Avishai Efrat, Senior Security Researcher, Zenity
Roey Ben Chaim, Staff Engineer, Zenity
AI agents quietly created a new external attack surface: copilots, custom agents, AI middleware, and various deployments that ship to the internet – often without anyone realizing they are reachable, enumerable, or over-permissioned. In this talk we’ll show how attackers can already find your agents in the wild, shedding light on the technical details that enable this kind of malicious activity – including how we used these details to find 1000s of exposed agents. We’ll follow up by explaining how to measure exposure, show proof that obscurity fails, and describe how to detect threat-actor, agent-focused recon before it turns into an impactful attack. We’ll cap it all off by dropping PowerPwn, a recon tool you can use to test your own exposure.
09:35 – 10:00

Your Agent Works for Me Now

Johann Rehberger, Red Team Director
Agentic AI used in personal assistants, developer tools, and enterprise platforms can be infected with promptware, engineered prompts that act like malware.
This talk demonstrates attacks and exploit chains, including delayed tool invocation and intent activation tricks that bypass existing mitigations. Attacks enable persistence, lateral movement across agentic systems, promptware-powered C2, and data exfiltration.
Several of the exploit demos have not been publicly disclosed before, including attacks against Gemini, Copilot and others.
Many of these issues are not edge cases or unknown problems. Even where simple fixes exist, new and more powerful AI systems keep reintroducing known vulnerability classes while increasing scale and blast radius at the same time. By shipping agents with insecure defaults, vendors push responsibility onto end users.
10:00 – 10:25

Capability-Based Authorization for AI Agents: Warrants That Survive Prompt Injection

Niki Aimable Niyikiza, Senior Security Engineer & AI Security Researcher, Snap
Prompt injection filters and coarse IAM roles consistently fail in multi-agent setups.
I’ll show a working alternative: treating agent authority as ephemeral cryptographic warrants that attenuate on delegation (inspired by Macaroons/UCAN) and are task-scoped, holder-bound, and verified offline by tools in microseconds. Even a fully compromised agent can’t escalate or exfiltrate beyond its bounds.
Live demos in LangChain/LangGraph multi-agent workflows, benchmarks against adaptive injection/escalation attacks, and an honest look at remaining gaps (e.g., constraints that require runtime context).
Audience Takeaways:
1) Why identity-based authorization fails for AI agents
2) How capability tokens bound blast radius without blocking legitimate use
3) Practical patterns for delegation in multi-agent systems
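The attenuation property this talk describes can be sketched in a few lines. Below is a minimal, illustrative Macaroons-style HMAC chain — not the speaker's implementation, and all names (`mint`, `attenuate`, `verify`) are assumptions — showing how any holder can narrow a warrant's scope without the root key, while only narrowing is possible:

```python
import hmac
import hashlib

def _chain(key: bytes, msg: str) -> bytes:
    # Each caveat extends the HMAC chain; the previous signature is the key.
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()

def mint(root_key: bytes, agent_id: str, caveats: list[str]) -> dict:
    """Issuer creates a warrant: the signature chains over every caveat."""
    sig = _chain(root_key, agent_id)
    for c in caveats:
        sig = _chain(sig, c)
    return {"id": agent_id, "caveats": list(caveats), "sig": sig}

def attenuate(warrant: dict, caveat: str) -> dict:
    """Any holder can add (never remove) a caveat without the root key."""
    return {
        "id": warrant["id"],
        "caveats": warrant["caveats"] + [caveat],
        "sig": _chain(warrant["sig"], caveat),
    }

def verify(root_key: bytes, warrant: dict) -> bool:
    """Tool-side offline check: recompute the full chain and compare."""
    sig = _chain(root_key, warrant["id"])
    for c in warrant["caveats"]:
        sig = _chain(sig, c)
    return hmac.compare_digest(sig, warrant["sig"])
```

Because dropping a caveat invalidates the recomputed chain, a compromised agent cannot widen its own authority — it can only delegate something narrower.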
10:25 – 10:55 Coffee break
10:55 – 11:20

Injecting Security Context During Vibe Coding

Srajan Gupta, Senior Security Engineer, Dave
Vibe coding with AI tools like Cursor is fast, but it quietly bypasses traditional AppSec controls. In this talk, we demo an MCP server that injects security context directly into the AI coding loop. Before code is generated, it pulls threat models, security requirements, and OWASP guidance for your task. After generation, it verifies the output for vulnerabilities and checks that it meets security standards.
11:20 – 11:45

Source to Sink: How to Improve LLM First-Party Vuln Discovery

Scott Behrens, Principal Security Engineer, Netflix
Justice Cassel, Application & GenAI Security, Netflix
We got tired of LLMs crying wolf about every string concatenation, so we built an agentic pipeline that thinks before it screams. This talk explores how to improve the accuracy and actionability of LLM-driven first-party vulnerability discovery in real-world codebases. If you’ve ever mass-closed 200 AI-generated “findings,” this talk is your therapy session.
11:45 – 12:10

The Parseltongue Protocol: A Deep Dive into 100+ Textual Obfuscation Methods

Joey Melo, AI Red Teaming Specialist, CrowdStrike
Large Language Models are designed with robust multilingual and multi-encoding support, but this versatility creates a new security vulnerability. This talk presents the results of a systematic empirical study in which 100+ encoding and encryption techniques were used against 9 leading AI models with over 17,000 malicious prompts, revealing significant gaps in current AI safety systems. Attendees will gain critical insights into the evolving prompt injection attack surface and learn which encoding mechanisms pose the greatest threat to LLM security.
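To make the attack surface concrete, here is a small illustrative sketch (not from the study itself) of the kind of reversible transforms an encoding-aware model may decode implicitly, which is what lets obfuscated instructions slip past text-level filters:

```python
import base64
import codecs

def obfuscations(text: str) -> dict[str, str]:
    # A few of the many reversible transforms a multilingual,
    # multi-encoding LLM can often undo without being asked to.
    return {
        "base64":   base64.b64encode(text.encode()).decode(),
        "rot13":    codecs.encode(text, "rot13"),
        "hex":      text.encode().hex(),
        "reversed": text[::-1],
    }
```

A filter that scans only the surface string sees none of the original tokens, while the model may still recover and follow the underlying instruction.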
12:10 – 12:25

Why Most ML Vulnerability Detection Fails (And What Actually Worked for Kernel Bugs)

Jenny Guanni Qu, AI Researcher, Pebblebed
We tried the obvious approaches to ML-based vulnerability detection. Most failed. This talk covers the counterintuitive lessons from training on 125K Linux kernel commits: why “hard negatives” hurt performance, why subsystem boundaries are where bugs hide, and why the average kernel security bug survives 2.1 years undetected. Practical takeaways for anyone building vuln discovery systems.
12:25 – 13:30 Lunch break
13:30 – 13:55

1.8M Prompts, 30 Alerts: Hunting Abuse in a User-Defined Agent Ecosystem

Matt Rittinghouse, Lead Security Data Scientist, Salesforce
Millie Huang, Staff Security Data Scientist, Salesforce
How do you secure 12,000 autonomous agents when anyone can build one? Static rules alone can’t catch abuse in a user-defined ecosystem without drowning your SOC in noise. Join us at the front lines of real, productionized Agentforce defense, where we process millions of daily prompts across thousands of organizations. We’ll show you how we created meaningful and contextual behavioral baselines like Asset Rarity and Query Complexity, distilling a flood of unpredictable activity into fewer than 30 high-fidelity daily alerts.
13:55 – 14:20

AI Security with Guarantees

Ilia Shumailov, CEO, AI Sequrity Company
In this talk I will describe how one can run modern AI agents in a way that comes with security guarantees, even for the most complex setups, such as computer use.
14:20 – 14:45

From OSINT Chaos to Knowledge Graph: Building Production-Scale AI-Powered Threat Intelligence

Dongdong Sun, Senior Staff Machine Learning Engineer, Palo Alto Networks
How do you turn millions of unstructured threat reports into a queryable knowledge graph? This talk walks through a production AI pipeline that extracts threats and relationships from raw OSINT data—and the architectural decisions that make it actually work at scale.
14:45 – 15:05 Coffee break
15:05 – 15:30

Beyond the Chatbot: Delivering an Agentic SOC for Real-World Defense

Peter Smith, Director, Agentic SOC Product Management, Salesforce
Ravi Kiran Sharma (RK), Lead Security Engineer, Salesforce
Moving beyond the “copilot” era of simple Q&A, the next frontier in security operations is the Agentic SOC—a system where autonomous agents plan, reason, and act. But building this requires moving away from monolithic “black box” models toward a Polyphonic (Supervisor-Worker) architecture.
15:30 – 15:55

Are Your LLM’s Safety Mechanisms Intact? Detecting Backdoors with White-Box Analysis

Akash Mukherje, Cofounder, Realm Labs
Standard black-box, output-only safety evaluations implicitly assume that correct behavior implies intact safety mechanisms. In this talk, I’ll show why that assumption can fail.
I’ll present hands-on experiments exploring a class of LLM backdoors that selectively weaken refusal behavior while continuing to appear compliant under standard evaluations. Instead of relying on black-box judgments, this work uses a white-box analysis approach: first identifying internal signals associated with refusal behavior, then examining how those signals change when a model is backdoored and triggered. The key observation is that safety can degrade internally even when outputs still look acceptable, making output-only testing insufficient for these threats.
The talk focuses on what this means for practitioners building and operating secure AI systems. I’ll discuss how white-box analysis can provide more transparent safety signals, where it fits in the AI/ML lifecycle (e.g., pre-deployment checks or model upgrades), and how it complements existing benchmarks and red-teaming. I’ll also cover practical limitations and further possibilities of this technique.
Attendees should leave with a concrete understanding of how backdoors can target safety mechanisms themselves, why black-box evaluations can miss these failures, and how white-box analysis can improve transparency when assessing the integrity of LLM safety behavior.
15:55 – 17:00 Mingling & Something sweet