Important Dates

Timeline (Tentative)

Workshop Schedule
Event                                      Start time  End time
Opening Remarks                            9:00        9:15
Invited Talk #1: Prof. Bo Li               9:15        9:45
Invited Talk #2: Prof. Chaowei Xiao        9:45        10:15
Contributed Talk #1                        10:15       10:30
Coffee Break                               10:30       10:45
Invited Talk #3: Prof. Ziwei Liu           10:45       11:15
Invited Talk #4: Prof. Florian Tramèr      11:15       11:45
Contributed Talk #2                        11:45       12:00
Lunch                                      12:00       13:30
Invited Talk #5: Dr. Nouha Dziri           13:30       14:00
Invited Talk #6: Prof. Yaodong Yang        14:00       14:30
Invited Talk #7: Prof. Aditi Raghunathan   14:30       15:00
Poster Session #1                          15:00       16:00
Challenge Session                          16:00       16:30
Poster Session #2                          16:30       17:00

Call for Papers

Vision-language agents, embodied or digital systems powered by multimodal foundation models, are rapidly emerging as a central paradigm for intelligent perception, decision-making, and human-AI interaction. These agents integrate perception (vision), cognition (language and reasoning), and action (planning and control) within a unified framework, thereby bridging the gap between visual recognition and autonomous behavior. This integration, however, also broadens the attack surface: beyond traditional pixel-level perturbations, vision-language agents are exposed to adversarial prompts, instruction injections, and jailbreak manipulations, which can disrupt reasoning chains, mislead perception, or induce harmful actions. To foster the development of safe, robust, and trustworthy vision-language agents for real-world applications, we invite submissions on both theoretical and practical aspects of adversarial machine learning, with a specific focus on the safety of vision-language agents. We welcome research contributions on topics including, but not limited to:
  • Attacks and defenses for vision-language agents
  • Datasets and benchmarks for evaluating vision-language agents
  • Adversarial and jailbreak attacks on vision-language agents
  • Improving the robustness of agents or deep learning systems
  • Interpreting and understanding model robustness, especially for agentic AI
  • Adversarial attacks for social good
  • Alignment of vision-language agents
Format: Submitted papers (.pdf) must use the CVPR 2026 Author Kit (LaTeX/Word zip file), be anonymized, and follow the CVPR 2026 author instructions. The workshop considers two types of submissions: (1) Long Paper: limited to 8 pages, excluding references; (2) Extended Abstract: limited to 4 pages, including references. Accepted papers have the option to be included in the CVF and IEEE Xplore proceedings.

Submission Site: https://openreview.net/group?id=thecvf.com/CVPR/2026/Workshop/Advml
Submission Deadline (both Paper and Supplementary Material): March 5, 2026, 11:59 PM (UTC±0)


Sponsors

[Sponsor logos]