Projects
Research organized by theme across AI safety, risk analysis, and trustworthy AI.
Attack
- CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion. ACL, 2024
- PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety. ACL, 2024
- LLMs Know Their Vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts. ACL, 2025
Evaluation
- SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models. ACL, 2024
- ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models. arXiv, 2023
- LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark. NeurIPS, 2023
- Benchmarking Omni-Vision Representation through the Lens of Visual Realms. ECCV, 2022
- ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis. CVPR, 2021
Frontier AI Risk Analysis
- Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models. ACL, 2024
- From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities. Technical Report, 2024
- REEF: Representation Encoding Fingerprints for Large Language Models. ICLR, 2025
- The Tug of War Within: Mitigating the Fairness-Privacy Conflicts in Large Language Models. arXiv, 2024