Projects

Research organized by theme across AI safety, risk analysis, and trustworthy AI.

Attack

  1. CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
    Qibing Ren, Chang Gao, Jing Shao, Junchi Yan, Xin Tan, Wai Lam, and Lizhuang Ma
    ACL, 2024
  2. PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety
    Zaibin Zhang, Yongting Zhang, Lijun Li, Hongzhi Gao, Lijun Wang, Huchuan Lu, Feng Zhao, Yu Qiao, and Jing Shao
    ACL, 2024
  3. LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts
    Qibing Ren, Hao Li, Dongrui Liu, Zhanxu Xie, Xiaoya Lu, Yu Qiao, Lei Sha, Junchi Yan, Lizhuang Ma, and Jing Shao
    ACL, 2025
  4. Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection
    Ziqi Miao, Yi Ding, Lijun Li, and Jing Shao
    EMNLP (Main Conference), 2025

Evaluation

  1. Assessment of Multimodal Large Language Models in Alignment with Human Values
    Zhelun Shi, Zhipin Wang, Hongxing Fan, Zaibin Zhang, Lijun Li, Yongting Zhang, Zhenfei Yin, Lu Sheng, Yu Qiao, and Jing Shao
    arXiv, 2024
  2. SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models
    Lijun Li, Bowen Dong, Ruohui Wang, Xuhao Hu, Wangmeng Zuo, Dahua Lin, Yu Qiao, and Jing Shao
    ACL, 2024
  3. ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models
    Zhelun Shi*, Zhipin Wang*, Hongxing Fan*, Zhenfei Yin, Lu Sheng†, Yu Qiao, and Jing Shao†
    arXiv, 2023
  4. LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
    Zhenfei Yin*, Jiong Wang*, JianJian Cao*, Zhelun Shi*, Dingning Liu, Mukai Li, Lu Sheng, Lei Bai†, Xiaoshui Huang, Zhiyong Wang, Jing Shao†, and Wanli Ouyang
    NeurIPS, 2023
  5. Benchmarking Omni-Vision Representation through the Lens of Visual Realms
    Yuanhan Zhang, Zhenfei Yin, Jing Shao, and Ziwei Liu
    ECCV, 2022
  6. ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis
    Yinan He, Bei Gan, Siyu Chen, Yichun Zhou, Guojun Yin, Luchuan Song, Lu Sheng, Jing Shao, and Ziwei Liu
    CVPR, 2021
  7. RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents
    Jingyi Yang, Shuai Shao, Dongrui Liu, and Jing Shao
    arXiv, 2025
  8. IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks
    Xiaoya Lu, Zeren Chen, Xuhao Hu, Yijin Zhou, Weichen Zhang, and Dongrui Liu
    arXiv, 2025

Frontier AI Risk Analysis

  1. Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models
    Chen Qian, Jie Zhang, Wei Yao, Dongrui Liu, Zhenfei Yin, Yu Qiao, Yong Liu, and Jing Shao
    ACL, 2024
  2. From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
    Chaochao Lu, Chen Qian, Guodong Zheng, Hongxing Fan, Hongzhi Gao, Jie Zhang, Jing Shao, Jingyi Deng, Jinlan Fu, Kexin Huang, Kunchang Li, Lijun Li, Limin Wang, Lu Sheng, Meiqi Chen, Ming Zhang, Qibing Ren, Sirui Chen, Tao Gui, Wanli Ouyang, Yali Wang, Yan Teng, Yaru Wang, Yi Wang, Yinan He, Yingchun Wang, Yixu Wang, Yongting Zhang, Yu Qiao, Yujiong Shen, Yurong Mou, Yuxi Chen, Zaibin Zhang, Zhelun Shi, Zhenfei Yin, and Zhipin Wang
    Technical Report, 2024
  3. REEF: Representation Encoding Fingerprints for Large Language Models
    Jie Zhang
    ICLR, 2025
  4. The Tug of War Within: Mitigating the Fairness-Privacy Conflicts in Large Language Models
    Chen Qian, Dongrui Liu, Jie Zhang, Yong Liu, and Jing Shao
    arXiv, 2024
  5. OASIS: Open Agent Social Interaction Simulations with One Million Agents
    Ziyi Yang, Zaibin Zhang, Zirui Zheng, Yuxian Jiang, Ziyue Gan, Zhiyu Wang, Zijian Ling, Jinsong Chen, Martz Ma, Bowen Dong, Prateek Gupta, Shuyue Hu, Zhenfei Yin, Guohao Li, Xu Jia, Lijun Wang, Bernard Ghanem, Huchuan Lu, Chaochao Lu, Wanli Ouyang, Yu Qiao, Philip Torr, and Jing Shao
    arXiv, 2024

Risk Mitigation

  1. SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
    Yongting Zhang, Lu Chen, Guodong Zheng, Yifeng Gao, Rui Zheng, Jinlan Fu, Zhenfei Yin, Senjie Jin, Yu Qiao, Xuanjing Huang, Feng Zhao, Tao Gui, and Jing Shao
    arXiv, 2024
  2. X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Compromising Usability
    Xiaoya Lu, Dongrui Liu, Yi Yu, Luxin Xu, and Jing Shao
    EMNLP (Findings), 2025